Project
Data format benchmark with Spark
A Spark benchmark comparing Parquet, Delta Lake, ORC, Avro and JSON.
- Context: Public benchmarking project documented on Medium.
- Problem: File format decisions often rely on generic guidance rather than hands-on measurements.
- Solution: Built a benchmark project to compare read and write behavior across common big data formats.
- Impact: Provides a concrete basis for discussing storage format trade-offs.
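To illustrate what comparing read and write behavior can look like, here is a minimal sketch, not the project's actual code. It times one write and one full read per format using plain wall-clock timing instead of a JMH harness. The object name, the `timeMs` helper, the synthetic data set, and the row count are all hypothetical. It assumes Spark 3.0 or later (for the `noop` sink) and covers only formats built into Spark; Avro and Delta Lake need the `spark-avro` and Delta Lake packages on the classpath.

```scala
import java.nio.file.Files

import org.apache.spark.sql.{DataFrame, SparkSession}

// Hypothetical sketch: naive single-run timing, no warm-up and no JMH.
object FormatBenchmarkSketch {

  // Runs the block once and returns the elapsed wall-clock time in milliseconds.
  def timeMs(block: => Unit): Long = {
    val start = System.nanoTime()
    block
    (System.nanoTime() - start) / 1000000L
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("format-benchmark-sketch")
      .master("local[*]")
      .getOrCreate()

    // Synthetic data set; schema and row count are arbitrary assumptions.
    val df: DataFrame = spark.range(0L, 1000000L)
      .selectExpr("id", "cast(id as string) as label", "id % 100 as bucket")

    // Formats that ship with Spark. "avro" and "delta" would need extra packages.
    val formats = Seq("parquet", "orc", "json")
    val baseDir = Files.createTempDirectory("format-bench").toString

    for (format <- formats) {
      val path = s"$baseDir/$format"

      val writeMs = timeMs {
        df.write.format(format).mode("overwrite").save(path)
      }

      // The "noop" sink (Spark 3.0+) materializes every row and column
      // without writing output, so the read is not reduced to a metadata lookup.
      val readMs = timeMs {
        spark.read.format(format).load(path)
          .write.format("noop").mode("overwrite").save()
      }

      println(s"$format: write=${writeMs} ms, read=${readMs} ms")
    }

    spark.stop()
  }
}
```

A single timed run like this is sensitive to JVM warm-up and caching effects, which is the kind of noise a harness such as JMH is designed to control through warm-up iterations and repeated measurements.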
Stack
Apache Spark, Scala, JMH, Delta Lake, Parquet
Links
This project keeps the storage format discussion grounded in executable code.