One column has an int, the other has a list of ints. How do I convert the DataFrame into a numpy rec array of these pairs?

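Problem description

This is a follow-up to this question:

Best data type (in terms of speed/RAM) for millions of pairs of a single int paired with a batch (2 to 100) of ints

It asks what the best way is to store pairs of a single int and a batch of ints.

The answer was to use np.rec, a convenient way to create mixed-type arrays that lets us place each single number next to its batch.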
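
To make that layout concrete, here is a minimal sketch of such a record array. The field names `key` and `batch`, the explicit dtype, and the shortened sample values are assumptions for illustration; they are not taken from the linked answer.

```python
import numpy as np

# Assumed layout: one int32 field plus one object field that can hold
# an int32 array of any length. The field names are hypothetical.
pair_dtype = np.dtype([("key", np.int32), ("batch", object)])

pairs = np.rec.array(
    [
        (2955637, np.array([2557706, 7612432, 9348232], dtype=np.int32)),
        (822302, np.array([2579065, 14360524], dtype=np.int32)),
    ],
    dtype=pair_dtype,
)

keys = pairs.key              # int32 array holding the single ints
first_batch = pairs.batch[0]  # int32 array paired with the first key
```

Because the batches have different lengths, the second field stores references to separate arrays rather than one contiguous block.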

The result of that code looks like this:

rec.array([( 2955637, array([ 2557706,  7612432,  9348232,   462772,  8018521,  1811275,
        9230331,  7023852,  9392270,  4693741,  7854644,  5233547,
       12446986,  9534800,  2133753,  5971332,  2156690, 12031365,
        4433539, 11607217,  3461811,  5361706, 11282946, 14548809,
        8109194,  1199299,  7576507, 12035216,  6635766,  4158077,
        5403991,   212711,  1703853,  2094248,  7005438,   951244,
        6314059, 11616582, 13002385,   761714, 14016603, 14981654,
        8946411, 10050035,   658239,  1693614], dtype=int32)),
           (  822302, array([ 2579065, 14360524,  4489101, 14753709,  7440511,  2202626,
         504487,  8539709,  6309347,  9028007,  4103133,  6899943,
        9391766,  1104058, 10155666,  2845288, 10488737,  1728141,
        3976034, 13648527,  6125367, 14690826,  7387347,  7766092,
        8717468,  4088448,  2051190,  7914318, 14346922, 13792566,
       10343601], dtype=int32)),
           ( 7777177, array([ 7067232, 11850092, 10343145,  2705178,  9676842, 13392954],
      dtype=int32)),
           ( 7094192, array([  667930,  2256509,  2860846,  8740657,  3188292,   616645,
       12264189,  3827714,  1197702, 11838296,  8450768,  6224672,
       10233979,   720212, 13010797, 10508000,   485815,  4040839,
        5690852,  8699534,  7200456,  9946306, 14594793,   406437,
        5148634, 11229656,  5497334,  3438910,  8301374,  9274725,
        4141693,  8846590, 14372346,  1294167,  6341159,  7003319,
        7803775, 13882589,  4289922, 14872568,  8094153,  3783601,
       12847787, 13833383,  2996757, 12961865,  4205083, 12390923,
        5705005,  8842488,  6230348,  5690850,  7154638, 10787173,
       10200101, 13943625,   373645,  5115795,  7105045,   899756,
        6020046], dtype=int32)),
           ( 3913008, array([ 5132516,   309940,  7487946,  2927897,  6294641,   701812,
       11043226,  7788088,  7465944,  2077922, 13552610,  6345947,
         187965, 14830364,  8483266,  8128046,  3227008,  4159033,
       12652217,  1919861,  4529511,  2186353,  7407808,  5604777,
       13500413,   786580,  7588024,   303460, 13426737,  7131729,
        8763962,  5498921, 13099372,  4330432,  5795060,  8424029,
       14073436,  2315788,  5657156, 10177080,  4476134, 13418083,
        6874374,  1786599,  8115421, 11373555,  1186217,  1098336,
         160627,  9177101, 14888415, 11619492, 13326025, 13129137,
       10589806,  2659293,  7845901,  6619936,  1939703,  7692026],
      dtype=int32)),

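In my case, the data is stored in a pandas DataFrame. In each row, one column holds an int and the other holds a Python list of ints.

How do I convert this into the np.rec array format above, e.g.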

 rec.array([( int, array([ bunch of ints]) ), (int, array([ bunch of ints]) ), . . . . 

The first pair would be the first row, the second pair the second row, and so on.

Tags: python, pandas, numpy

Solution


Data:

import numpy as np
import pandas as pd

data = np.rec.array([( 2955637, np.array([ 2557706,  7612432,  9348232,   462772,  8018521,  1811275,
        9230331,  7023852,  9392270,  4693741,  7854644,  5233547,
       12446986,  9534800,  2133753,  5971332,  2156690, 12031365,
        4433539, 11607217,  3461811,  5361706, 11282946, 14548809,
        8109194,  1199299,  7576507, 12035216,  6635766,  4158077,
        5403991,   212711,  1703853,  2094248,  7005438,   951244,
        6314059, 11616582, 13002385,   761714, 14016603, 14981654,
        8946411, 10050035,   658239,  1693614], dtype=np.int32)),
           (  822302, np.array([ 2579065, 14360524,  4489101, 14753709,  7440511,  2202626,
         504487,  8539709,  6309347,  9028007,  4103133,  6899943,
        9391766,  1104058, 10155666,  2845288, 10488737,  1728141,
        3976034, 13648527,  6125367, 14690826,  7387347,  7766092,
        8717468,  4088448,  2051190,  7914318, 14346922, 13792566,
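       10343601], dtype=np.int32))],
           # Explicit dtype: recent NumPy versions cannot infer a field type from arrays of different lengths.
           dtype=[("f0", np.int64), ("f1", object)])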

DataFrame:

df = pd.DataFrame(data)

To np.rec.array:

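# Pair each int with its array, then build the record array with an explicit dtype;
# recent NumPy versions cannot infer a field type from arrays of different lengths.
d2 = list(zip(df.f0.tolist(), df.f1.tolist()))
d2 = np.rec.array(d2, dtype=[("f0", np.int64), ("f1", object)])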

Final result:

print(type(d2))
>>> <class 'numpy.recarray'>
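
The question starts from a DataFrame whose second column holds plain Python lists rather than NumPy arrays. A minimal sketch for that case follows, assuming hypothetical column names `key` and `batch` and a small made-up frame. Each list is converted to an int32 array first, and the dtype is passed explicitly so NumPy does not have to infer a type for the variable-length field.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: the column names "key" and "batch" are assumptions.
df = pd.DataFrame({
    "key": [2955637, 822302, 7777177],
    "batch": [
        [2557706, 7612432, 9348232],
        [2579065, 14360524],
        [7067232, 11850092, 10343145, 2705178],
    ],
})

# One int32 field plus an object field holding a variable-length int32 array.
pair_dtype = np.dtype([("key", np.int32), ("batch", object)])

# Convert each Python list to an int32 array and pair it with its key.
records = [
    (key, np.asarray(batch, dtype=np.int32))
    for key, batch in zip(df["key"].tolist(), df["batch"].tolist())
]

rec = np.rec.array(records, dtype=pair_dtype)
```

The rows keep their DataFrame order, so `rec[0]` corresponds to the first row, `rec[1]` to the second, and so on.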
