Thursday, March 24, 2016

Avro vs Parquet: Apple vs Orange!

Comparing Avro vs Parquet is like comparing apples to oranges!
By definition, Avro is a data serialization system, while Parquet is a data storage format.
The format in which data is stored on disk or sent over the network is different from the format in which it lives in memory. The process of converting in-memory data into a format that can be stored on disk or sent over the network is called serialization. The reverse process is called deserialization.
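To make the round trip concrete, here is a minimal sketch in Python. It uses the standard library's json module purely as a stand-in serializer (it is not Avro), and the record contents are made up for illustration.

```python
import json

# An in-memory object: a Python dict holding a hypothetical record.
record = {"id": 42, "name": "alice", "scores": [1.5, 2.5]}

# Serialization: turn the in-memory object into bytes that could be
# written to disk or sent over the network.
payload = json.dumps(record).encode("utf-8")

# Deserialization: turn those bytes back into an equivalent in-memory object.
restored = json.loads(payload.decode("utf-8"))

print(type(payload).__name__, restored == record)
```

The bytes in payload are what would actually travel over the wire or land in a file; the dict only exists inside the running program.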
Data can be serialized in two main kinds of format: text and binary. Examples of text formats are CSV, XML, and JSON. Avro, by contrast, is a binary format for data serialization.
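The difference between a text and a binary encoding can be sketched with Python's standard library alone. Note the assumptions: the record and its field layout are hypothetical, and the struct module is only a stand-in for a binary encoder. This is not Avro's actual wire format, which is driven by a schema.

```python
import json
import struct

# Hypothetical record: (id, temperature, active flag).
record = (1234567, 21.5, True)

# Text format: human-readable JSON, field names repeated in every record.
text_bytes = json.dumps(
    {"id": record[0], "temperature": record[1], "active": record[2]}
).encode("utf-8")

# Binary format: a fixed little-endian layout agreed on in advance:
# 4-byte int, 8-byte double, 1-byte bool. The layout plays the role of a schema.
LAYOUT = "<id?"
binary_bytes = struct.pack(LAYOUT, *record)

# Reading the binary form back requires knowing the same layout.
decoded = struct.unpack(LAYOUT, binary_bytes)

print(len(text_bytes), len(binary_bytes), decoded == record)
```

The binary form is smaller but not readable without the layout, which is the usual trade-off between the two families of formats.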
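Parquet, on the other hand, is a columnar storage format designed for efficient compression and encoding. So if you're looking for a way to store data compactly in the Hadoop ecosystem, Parquet is the better fit.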
For more info, visit https://parquet.apache.org.
