Projects

Featured
Agentic Data Quality on Databricks
A metadata-driven data quality pattern on Databricks Free Edition, using AI-assisted checks while keeping the architecture close to production concepts.

Featured
Bronze-Silver-Gold Lakeflow pipeline
A complete Databricks Lakeflow pipeline using public TPC-H data and a medallion architecture.

Featured
Column-level PII encryption on Databricks
A Databricks Free Edition exercise focused on column-level handling of personally identifiable information.

Featured
Tiny analytics agent on Databricks
A small Databricks Free Edition agent that uses a foundation model and a Python execution tool for simple analytics questions.

Airbyte and PostgreSQL replication
A data replication project focused on moving PostgreSQL data with Airbyte.

AWS EMR and Apache Spark data engineering project
Practical Spark processing setup on Amazon EMR.

AWS Lambda ETL to Power BI API
Serverless ETL experiment with Lambda, API Gateway, Docker, FastAPI and Power BI streaming.

Azure Databricks demographics pipeline with Power BI
Medallion-style demographics pipeline on Azure Databricks with Power BI reporting.

BigQuery transformations with dbt, Airflow and Kubernetes
Automated BigQuery transformation workflow using dbt, Airflow, Kubernetes and GitHub Actions.

Controlled OpenClaw setup on Raspberry Pi
Runbook-oriented setup for OpenClaw on a Raspberry Pi, linked to the delta maintenance advisor repository.

Data format benchmark with Spark
A Spark benchmark comparing Parquet, Delta Lake, ORC, Avro and JSON.

Docker CI/CD with Jenkins and SonarQube
A CI/CD exercise using Git, Docker, Jenkins and SonarQube around a small Python application.

FastAPI microservices on Kubernetes
Simple FastAPI REST API deployed and exposed from Kubernetes.

FastAPI on AKS with Terraform
Infrastructure and deployment notes for FastAPI microservices on Azure AKS.

FastAPI on EKS with Terraform
Infrastructure and deployment notes for FastAPI microservices on AWS EKS.

Foreign trade ETL
An ETL project for foreign trade data using dlt, dbt, DuckDB and AWS.

IoT monitoring with Spark Structured Streaming
A smart-farm monitoring system using Scala, Apache Spark Structured Streaming and Kafka.

Marinas MCP Server
A first MCP server proof of concept using FastMCP, Azure Web Apps and MongoDB.

PySpark ETL with S3 persistence
A PySpark ETL project applying SOLID-oriented structure and AWS S3 persistence.

Python, Airflow, Azure and Tableau data pipeline
End-to-end pipeline using Python extraction, Airflow orchestration, Azure services and Tableau analytics.

Real-time sentiment pipeline with Spark, OpenAI, Kafka and Elasticsearch
Streaming sentiment analysis pipeline using Spark, OpenAI, Kafka and Elasticsearch.

Smart City real-time data engineering on AWS
Real-time mobility pipeline using Kafka, Spark, S3, Glue, Redshift, Lambda and Power BI.

SmartCart master's final project
Final project for the Data Engineering and Architecture master's degree at EOI.

Snowflake, Airflow, dbt and Cosmos data workflow
Snowflake pipeline orchestrated with Airflow, dbt, Cosmos and Snowpark.

Snowflake, Snowpark and Streamlit analytics app
Snowflake data application work using Snowpark, Streamlit and basic machine learning.

Spark jobs orchestrated with Airflow
Submitting Python and Scala Spark jobs through Airflow.

Spring Boot and Kafka Streams data flow
Kafka producer, stream processor and consumer built with Spring Boot.

SWRO desalination ML on Databricks
Machine learning and distributed processing applied to desalination data using Spark on Databricks.

Syneratech master's final project
Final project for the Big Data and Business Analytics master's degree at EOI.