Rafael Vera Marañón Senior Data Engineer & Data Architect

Project

PySpark ETL with S3 persistence

A PySpark ETL project applying SOLID-oriented structure and AWS S3 persistence.

Context
Public PySpark project documented on Medium.
Problem
ETL examples can become hard to maintain when extraction, transformation and persistence concerns are mixed together.
Solution
Structured a PySpark ETL pipeline so that extraction, transformation and persistence are separate responsibilities, with the output persisted to S3.
Impact
Demonstrates attention to maintainable code structure in data engineering workflows.
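
The separation of concerns described under Solution can be sketched as follows. This is a minimal illustration and is not taken from the project's repository: the class names (CsvExtractor, DropDuplicates, S3ParquetLoader, EtlPipeline), the choice of CSV input and Parquet output, and any paths are hypothetical. Each stage sits behind a small interface, so the pipeline depends only on those interfaces and a stage can be swapped without touching the others.

```python
from dataclasses import dataclass
from typing import TYPE_CHECKING, Protocol

if TYPE_CHECKING:  # imported for type hints only; not needed at runtime
    from pyspark.sql import DataFrame, SparkSession


class Extractor(Protocol):
    def extract(self) -> "DataFrame": ...


class Transformer(Protocol):
    def transform(self, df: "DataFrame") -> "DataFrame": ...


class Loader(Protocol):
    def load(self, df: "DataFrame") -> None: ...


@dataclass
class CsvExtractor:
    """Hypothetical extractor: reads a CSV file with a header row."""
    spark: "SparkSession"
    path: str

    def extract(self) -> "DataFrame":
        return self.spark.read.csv(self.path, header=True, inferSchema=True)


class DropDuplicates:
    """Hypothetical transformation: removes exact duplicate rows."""

    def transform(self, df: "DataFrame") -> "DataFrame":
        return df.dropDuplicates()


@dataclass
class S3ParquetLoader:
    """Hypothetical loader: writes Parquet to an S3 URI (assumed s3a:// scheme)."""
    target: str

    def load(self, df: "DataFrame") -> None:
        df.write.mode("overwrite").parquet(self.target)


@dataclass
class EtlPipeline:
    """Wires the three stages together; knows nothing about CSV, Spark or S3."""
    extractor: Extractor
    transformer: Transformer
    loader: Loader

    def run(self) -> None:
        raw = self.extractor.extract()
        clean = self.transformer.transform(raw)
        self.loader.load(clean)
```

With a real SparkSession, the pipeline would be assembled as EtlPipeline(CsvExtractor(spark, "input.csv"), DropDuplicates(), S3ParquetLoader("s3a://example-bucket/output/")).run(), where the file name and bucket are placeholders. Writing to an s3a:// path also assumes the Hadoop S3A connector and AWS credentials are configured for the Spark session. Because the pipeline only relies on the three interfaces, each stage can be unit-tested with in-memory stand-ins, without Spark or S3.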

Stack

PySpark, AWS S3, Python, ETL, SOLID

Links

This project is included because the Medium article points to a public implementation repository.