Use Pickle5 instead of PyArrow's serialization/deserialization?

Question

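It seems that PyArrow has deprecated the pa.serialize() and pa.deserialize() methods and recommends using other options, such as Pickle5.

Using Pickle5 (protocol 5) does not seem to match the performance of PyArrow's deprecated serialization methods. Is there any proper replacement for pa.serialize() and pa.deserialize()?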
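For context, the main feature pickle protocol 5 adds (PEP 574) is out-of-band buffers: an object can expose its raw memory as a `pickle.PickleBuffer`, and if `dumps()` is given a `buffer_callback`, that memory is handed to the callback instead of being copied into the pickle stream. The receiver then passes the same buffers, in order, to `loads(..., buffers=...)`. The sketch below uses only the standard library (Python 3.8+). `Blob` is a hypothetical stand-in for an object holding a large binary payload, such as a NumPy array; it is not part of any library.

```python
import pickle


class Blob:
    """Hypothetical container for a large binary payload."""

    def __init__(self, payload):
        self.payload = payload

    def __reduce_ex__(self, protocol):
        if protocol >= 5:
            # Expose the raw memory as a PickleBuffer so the pickler can
            # send it out-of-band instead of copying it into the stream.
            return Blob, (pickle.PickleBuffer(self.payload),)
        # Protocols <= 4 cannot carry a PickleBuffer, so fall back to a copy.
        return Blob, (bytes(self.payload),)


original = Blob(bytearray(b"\x01" * 1_000_000))

# In-band: the 1 MB payload is copied into the pickle bytes.
in_band = pickle.dumps(original, protocol=5)

# Out-of-band: the payload goes to buffer_callback; the stream stays tiny.
buffers = []
out_of_band = pickle.dumps(original, protocol=5, buffer_callback=buffers.append)

print(len(in_band), len(out_of_band), len(buffers))

# loads() must receive the same buffers, in the same order.
restored = pickle.loads(out_of_band, buffers=buffers)
assert bytes(restored.payload) == bytes(original.payload)
```

Within one process the buffers are just references to the original memory. Across processes, they would have to be transferred separately, for example through shared memory; how that is done is up to the application.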

Here is simplified code comparing PyArrow and Pickle for serialization/deserialization:

import time

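import numpy as np
import pickle5  # backport of pickle protocol 5 for Python < 3.8; on 3.8+ the stdlib pickle module accepts protocol=5
import pyarrow as pa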


class Person:
    def __init__(self, Thumbnail: np.ndarray = None):
        if Thumbnail is not None:
            self.Thumbnail: np.ndarray = Thumbnail
        else:
            self.Thumbnail: np.ndarray = np.random.rand(256, 256, 3)


def serialize_Person(person):
    return {'Thumbnail': person.Thumbnail}


def deserialize_Person(person):
    return Person(person['Thumbnail'])


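# Register Person with a custom serializer/deserializer pair.
# Note: SerializationContext, pa.serialize() and pa.deserialize() are deprecated since pyarrow 2.0.0.
context = pa.SerializationContext()
context.register_type(Person, 'Person', custom_serializer=serialize_Person, custom_deserializer=deserialize_Person)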

PERSONS = [Person() for i in range(100)]

# PyArrow: serialize the list to a single buffer, then deserialize it (round trip)
t1 = time.time()
persons_serialized = pa.serialize(PERSONS, context=context).to_buffer()
persons_deserialized = pa.deserialize(persons_serialized, context=context)
t2 = time.time()
print(f'PyArrow Time => {t2 - t1}')

# Pickle protocol 5, in-band (no buffer_callback): round trip of the same list
t1 = time.time()
persons_pickled = pickle5.dumps(PERSONS, protocol=5)
persons_depickled = pickle5.loads(persons_pickled)
t2 = time.time()
print(f'Pickle Time => {t2 - t1}')

The output on my system is:

PyArrow Time => 0.04499983787536621
Pickle Time => 0.2220008373260498
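One plausible contributor to the gap is that `pickle5.dumps(PERSONS, protocol=5)` is called without a `buffer_callback`, so every thumbnail array is copied into the pickle bytes and copied out again on load. The sketch below is an assumption-laden variant, not a confirmed fix: it re-declares `Person` so it is self-contained, registers a reducer with the standard `copyreg.pickle()` as a rough counterpart of `context.register_type(...)` (the name `reduce_Person` is hypothetical), and pickles with out-of-band buffers. It relies on NumPy's protocol-5 support (NumPy 1.16+) and Python 3.8+; it uses `time.perf_counter()`, which is better suited to short benchmarks than `time.time()`.

```python
import copyreg
import pickle  # Python 3.8+; the pickle5 backport exposes the same API on older versions
import time

import numpy as np


class Person:
    def __init__(self, Thumbnail: np.ndarray = None):
        self.Thumbnail = Thumbnail if Thumbnail is not None else np.random.rand(256, 256, 3)


def reduce_Person(person):
    # Hypothetical reducer, similar in role to serialize_Person/deserialize_Person:
    # on unpickling, Person(thumbnail) is called to rebuild the object.
    return Person, (person.Thumbnail,)


# Tell pickle to use reduce_Person for Person instances.
copyreg.pickle(Person, reduce_Person)

PERSONS = [Person() for _ in range(100)]

t1 = time.perf_counter()
buffers = []
# Each contiguous array's memory goes to buffers.append instead of being
# copied into persons_pickled.
persons_pickled = pickle.dumps(PERSONS, protocol=5, buffer_callback=buffers.append)
persons_depickled = pickle.loads(persons_pickled, buffers=buffers)
t2 = time.perf_counter()
print(f'Pickle (out-of-band) Time => {t2 - t1}')
```

This only measures an in-process round trip; if the data must cross a process or network boundary, the cost of moving `buffers` has to be added, which is where a fair comparison with `pa.serialize()` would need care.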

Tags: python, serialization, pickle, pyarrow, apache-arrow
