Rafael Vera Marañón Senior Data Engineer & Data Architect

Project

Data format benchmark with Spark

A Spark benchmark comparing Parquet, Delta Lake, ORC, Avro and JSON.

Context
Public benchmarking project documented on Medium.
Problem
File format decisions often rely on generic guidance rather than hands-on measurements.
Solution
Built a benchmark project to compare read and write behavior across common big data formats.
Impact
Provides a concrete basis for discussing storage format trade-offs.

Stack

Apache Spark, Scala, JMH, Delta Lake, Parquet

Links

This project keeps the storage format discussion grounded in executable code.