Rafael Vera Marañón, Senior Data Engineer & Data Architect

Projects

Agentic Data Quality on Databricks

Featured

A metadata-driven data quality pattern on Databricks Free Edition, using AI-assisted checks while keeping the architecture close to production concepts.

Databricks, Data quality, Metadata, AI

Bronze-Silver-Gold Lakeflow pipeline

Featured

A complete Databricks Lakeflow pipeline using public TPC-H data and a medallion architecture.

Lakehouse, Databricks, Pipeline design, Spark

Column-level PII encryption on Databricks

Featured

A Databricks Free Edition exercise focused on column-level handling of personally identifiable information.

Security, Databricks, PII, Spark

Tiny analytics agent on Databricks

Featured

A small Databricks Free Edition agent that uses a foundation model and a Python execution tool for simple analytics questions.

Databricks, AI, Agents, MLflow

Airbyte and PostgreSQL replication

A data replication project focused on moving PostgreSQL data with Airbyte.

ETL, PostgreSQL, Replication

AWS EMR and Apache Spark data engineering project

Practical Spark processing setup on Amazon EMR.

AWS, EMR, Spark, Data engineering

AWS Lambda ETL to Power BI API

Serverless ETL experiment with Lambda, API Gateway, Docker, FastAPI and Power BI streaming.

AWS, Lambda, FastAPI, Power BI

Azure Databricks demographics pipeline with Power BI

Medallion-style demographics pipeline on Azure Databricks with Power BI reporting.

Azure, Databricks, Lakehouse, Power BI

BigQuery transformations with dbt, Airflow and Kubernetes

Automated BigQuery transformation workflow using dbt, Airflow, Kubernetes and GitHub Actions.

GCP, BigQuery, dbt, Airflow

Controlled OpenClaw setup on Raspberry Pi

Runbook-oriented setup for OpenClaw on a Raspberry Pi, linked to the delta maintenance advisor repository.

Agents, AI, Automation, Raspberry Pi

Data format benchmark with Spark

A Spark benchmark comparing Parquet, Delta Lake, ORC, Avro and JSON.

Spark, Scala, Delta Lake, Benchmarking

Docker CI/CD with Jenkins and SonarQube

A CI/CD exercise using Git, Docker, Jenkins and SonarQube around a small Python application.

CI/CD, Docker, Jenkins, Python

FastAPI microservices on Kubernetes

Simple FastAPI REST API deployed and exposed from Kubernetes.

FastAPI, Kubernetes, Docker, Microservices

FastAPI on AKS with Terraform

Infrastructure and deployment notes for FastAPI microservices on Azure AKS.

Azure, Terraform, FastAPI, Kubernetes

FastAPI on EKS with Terraform

Infrastructure and deployment notes for FastAPI microservices on AWS EKS.

AWS, Terraform, FastAPI, Kubernetes

Foreign trade ETL

An ETL project for foreign trade data using dlt, dbt, DuckDB and AWS.

ETL, AWS, dbt, DuckDB

IoT monitoring with Spark Structured Streaming

A smart-farm monitoring system using Scala, Apache Spark Structured Streaming and Kafka.

Streaming, Spark, Kafka, Scala

Marinas MCP Server

A first MCP server proof of concept using FastMCP, Azure Web Apps and MongoDB.

MCP, Azure, MongoDB, AI

PySpark ETL with S3 persistence

A PySpark ETL project applying SOLID-oriented structure and AWS S3 persistence.

PySpark, AWS, ETL, Python

Python, Airflow, Azure and Tableau data pipeline

End-to-end pipeline using Python extraction, Airflow orchestration, Azure services and Tableau analytics.

Azure, Airflow, Python, Tableau

Real-time sentiment pipeline with Spark, OpenAI, Kafka and Elasticsearch

Streaming sentiment analysis pipeline using Spark, OpenAI, Kafka and Elasticsearch.

Spark, Kafka, AI, OpenAI

Smart City real-time data engineering on AWS

Real-time mobility pipeline using Kafka, Spark, S3, Glue, Redshift, Lambda and Power BI.

AWS, Kafka, Spark, Streaming

SmartCart master final project

Final project for the Data Engineering and Architecture master's degree at EOI.

Data engineering, Architecture, FastAPI, Kafka

Snowflake, Airflow, dbt and Cosmos data workflow

Snowflake pipeline orchestrated with Airflow, dbt, Cosmos and Snowpark.

Snowflake, Airflow, dbt, Snowpark

Snowflake, Snowpark and Streamlit analytics app

Snowflake data application work using Snowpark, Streamlit and basic machine learning.

Snowflake, Snowpark, Streamlit, Python

Spark jobs orchestrated with Airflow

Submitting Python and Scala Spark jobs through Airflow.

Spark, Airflow, Python, Scala

Spring Boot and Kafka Streams data flow

Kafka producer, stream processor and consumer built with Spring Boot.

Java, Spring Boot, Kafka, Streaming

SWRO desalination ML on Databricks

Machine learning and distributed processing applied to desalination data using Spark on Databricks.

Databricks, Spark, Machine learning, AI

Syneratech master final project

Final project for the Big Data and Business Analytics master's degree at EOI.

Big Data, Analytics, PostgreSQL, Neo4j