Why does HDFS serialize using protocol buffers, not Java serialization APIs?

Question

Why does HDFS use a protocol buffer instead of the Java serialization API?

What if I want to send an object from a data node to another data node via Java serialization?

I've tried a couple of things, and I get the following error: java.io.WriteAbortedException: writing aborted; java.io.NotSerializableException: java.lang.Thread

Tags: java, hadoop, serialization, hdfs, protocol-buffers

Answer


Because formats with an external schema definition, like Protocol Buffers, are more space efficient than built-in Java serialization, which produces very verbose output: the stream embeds class names, field names, and metadata alongside the actual data.
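To see the overhead concretely, here is a minimal sketch (the `Point` class is a made-up example, not anything from HDFS) that serializes an object holding a single 4-byte int with `ObjectOutputStream` and prints the resulting stream size:

```java
import java.io.ByteArrayOutputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical record carrying 4 bytes of actual payload.
class Point implements Serializable {
    private static final long serialVersionUID = 1L;
    int x;
    Point(int x) { this.x = x; }
}

public class SerializationOverhead {
    public static void main(String[] args) throws Exception {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buf)) {
            out.writeObject(new Point(42));
        }
        // The stream also contains the class name, field name, and
        // stream headers, so it is far larger than the 4-byte payload.
        System.out.println("serialized size: " + buf.size() + " bytes");
    }
}
```

A schema-based format like Protocol Buffers avoids this per-record metadata because the schema is defined externally and shared by writer and reader.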

HDFS can store data in different formats. Formats that provide the best space efficiency while not being overly CPU-intensive are generally preferred. Some formats are designed for a specific goal, which helps with data processing:

  • Avro which is row oriented
  • Parquet which is column oriented

The java.io.NotSerializableException: java.lang.Thread exception shows that you are attempting to serialize an object that holds a Thread, and Thread does not implement Serializable.

