Project

Data format benchmark with Spark

A Spark benchmark comparing Parquet, Delta Lake, ORC, Avro and JSON.

Context: Public benchmarking project documented on Medium.
Problem: File format decisions often rely on generic guidance rather than hands-on measurements.
Solution: Built a benchmark project to compare read and write behavior across common big data formats.
Impact: Provides a concrete basis for discussing storage format trade-offs.
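The following is a minimal sketch of how read and write behavior can be compared across these formats in Spark. It is not the project's own code. It assumes a Spark build with the spark-avro and delta-spark packages on the classpath. The object name FormatTimingSketch, the scratch path and the synthetic one-million-row dataset are all hypothetical. It times a single write and a single full-scan read per format with System.nanoTime. Unlike a JMH harness, it has no warm-up and no repeated iterations, so its numbers will be noisy.

```scala
import org.apache.spark.sql.{SaveMode, SparkSession}

// Hypothetical sketch: naive wall-clock timing of one write and one read per format.
// Assumes spark-sql, spark-avro and delta-spark are on the classpath.
object FormatTimingSketch {

  // Runs a block once and returns the elapsed wall-clock time in milliseconds.
  def timeMs(block: => Any): Long = {
    val start = System.nanoTime()
    block
    (System.nanoTime() - start) / 1000000L
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("format-timing-sketch")
      // Delta Lake session extension and catalog, used for format("delta")
      .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
      .config("spark.sql.catalog.spark_catalog", "org.apache.spark.sql.delta.catalog.DeltaCatalog")
      .getOrCreate()

    // Synthetic input (hypothetical): an id column and a derived string column
    val df = spark.range(1000000L).selectExpr("id", "cast(id % 100 as string) as bucket")

    val formats = Seq("parquet", "delta", "orc", "avro", "json")
    val baseDir = "/tmp/format-timing-sketch" // hypothetical scratch location

    for (fmt <- formats) {
      val path = s"$baseDir/$fmt"
      val writeMs = timeMs {
        df.write.format(fmt).mode(SaveMode.Overwrite).save(path)
      }
      // count() forces a full scan so the read is actually executed
      val readMs = timeMs {
        spark.read.format(fmt).load(path).count()
      }
      println(f"$fmt%-8s write=$writeMs%6d ms  read=$readMs%6d ms")
    }

    spark.stop()
  }
}
```

The JSON read in this sketch also includes schema inference, because no schema is supplied. A JMH-based harness would instead wrap each write and read in a benchmark method so that warm-up and repeated measurements are handled for you.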

Stack

Apache Spark, Scala, JMH, Delta Lake, Parquet

This project keeps the storage format discussion grounded in executable code.