Extract parts of a tuple to create two other tuples

Problem description

I have this dataset:

    duplicates                  id  userid  timestamp_date
0   (007, us1, us2, 6, 7, 1)    b   us1      1
1   (001, us1, us2, 1, 9, 8)    b   us2      7
2   (009, us1, us2, 1, 28, 27)  b   us1      8
3   (007, us1, us2, 6, 7, 1)    c   us2      9
4   (009, us2, us1, 1, 29, 28)  c   us4     10



import pandas as pd
d = pd.DataFrame({'duplicates': [("007", "us1", "us2", 6, 7, 1), ("001", "us1", "us2", 1, 9, 8), ("009", "us1", "us2", 1, 28, 27),
                                 ("007", "us1", "us2", 6, 7, 1), ("009", "us2", "us1", 1, 29, 28)],
     'id': ["b", "b", "b", "c", "c"],
     'userid': ["us1", "us2", "us1", "us2", "us4"],
     'timestamp_date': [1, 7, 8, 9, 10]})

The tuples I want to extract look like this: tuple(a, b, c, d, e, f) -> tuple(a, b, null, e) and tuple(a, c, d, f).

So the result should be:

    duplicates             id
0   (007, us1, null, 7)    b
1   (007, us2, 6, 1)       b
2   (001, us1, null, 9)    b
3   (001, us2, 1, 8)       b
4   (009, us1, null, 28)   b
5   (009, us2, 1, 27)      b
6   (007, us1, null, 7)    c
7   (007, us2, 6, 1)       c
8   (009, us2, null, 29)   c
9   (009, us1, 1, 28)      c

# Python has no `null`; None is used here as the null placeholder.
e = pd.DataFrame({'duplicates': [("007", "us1", None, 7), ("007", "us2", 6, 1),
                                 ("001", "us1", None, 9), ("001", "us2", 1, 8),
                                 ("009", "us1", None, 28), ("009", "us2", 1, 27),
                                 ("007", "us1", None, 7), ("007", "us2", 6, 1),
                                 ("009", "us2", None, 29), ("009", "us1", 1, 28)],
     'id': ["b", "b", "b", "b", "b", "b", "c", "c", "c", "c"]})

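I don't like asking a question without code, but I really don't know where to start, and I couldn't find another question that covers this. I tried using zip together with apply(), but I don't think that is the right approach, since I couldn't even get the runtime errors to stop.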

Tags: python-3.x, pandas, tuples

Solution


You can use .apply() to turn each tuple into a list of two tuples, then call .explode() so that each tuple in the list gets its own row:

d = (d.assign(duplicates=d['duplicates'].apply(lambda x: [(x[0], x[1], None, x[4]), (x[0], x[2], x[3], x[5])]))
      .explode('duplicates')
      .drop(columns=['userid', 'timestamp_date']))

print(d)

Prints:

             duplicates id
0   (007, us1, None, 7)  b
0      (007, us2, 6, 1)  b
1   (001, us1, None, 9)  b
1      (001, us2, 1, 8)  b
2  (009, us1, None, 28)  b
2     (009, us2, 1, 27)  b
3   (007, us1, None, 7)  c
3      (007, us2, 6, 1)  c
4  (009, us2, None, 29)  c
4     (009, us1, 1, 28)  c
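
Note that .explode() repeats the original row label for both tuples, which is why the index above reads 0, 0, 1, 1, and so on. The following is a self-contained sketch of the same approach that also renumbers the rows to 0..9, as in the expected output of the question. The helper name split_duplicate and the variable result are illustrative and not part of the original answer, and the final reset_index(drop=True) is an added step that assumes a fresh index is wanted.

```python
import pandas as pd


def split_duplicate(t):
    """Split (a, b, c, d, e, f) into [(a, b, None, e), (a, c, d, f)]."""
    a, b, c, d, e, f = t
    return [(a, b, None, e), (a, c, d, f)]


# Same data as in the question.
d = pd.DataFrame({
    "duplicates": [("007", "us1", "us2", 6, 7, 1), ("001", "us1", "us2", 1, 9, 8),
                   ("009", "us1", "us2", 1, 28, 27), ("007", "us1", "us2", 6, 7, 1),
                   ("009", "us2", "us1", 1, 29, 28)],
    "id": ["b", "b", "b", "c", "c"],
    "userid": ["us1", "us2", "us1", "us2", "us4"],
    "timestamp_date": [1, 7, 8, 9, 10],
})

result = (
    d.assign(duplicates=d["duplicates"].apply(split_duplicate))  # one list of two tuples per row
     .explode("duplicates")                                      # one tuple per row
     .drop(columns=["userid", "timestamp_date"])
     .reset_index(drop=True)                                     # assumption: renumber rows 0..9
)

print(result)
```

Unpacking the tuple by name inside the helper mirrors the a..f notation from the question and raises a ValueError if a tuple does not have exactly six elements, which may be easier to debug than an index error inside a lambda.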
