Add a value to a DF if a column value matches a value in another DF's list column

Problem description

I have a DataFrame, DF1:

+---------------+                                                               
|        colName|
+---------------+
|              a|
|              m|
|              f|
|              o|
+---------------+

And another one, DF2:

+---------------+                                                               
|            col|
+---------------+
|    [a,b,b,c,d]|
|      [e,f,g,h]|
|        [i,j,k]|
|    [l,m,n,o,p]|
+---------------+

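If any element of the list stored in DF2.col is in DF1.colName, the row should be flagged with 1, otherwise 0. The new DataFrame (or DF2) should look like this: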

+---------------+---------------+                                                               
|            col|           bool|
+---------------+---------------+
|    [a,b,b,c,d]|              1|              #Since "a" was in `DF1.colName`
|      [e,f,g,h]|              1|              #Since "f" was in `DF1.colName`
|        [i,j,k]|              0|              #Since no element was in `DF1.colName`
|    [l,m,n,o,p]|              1|              #Since "m" and "o" were in `DF1.colName`
+---------------+---------------+

I previously considered using a UserDefinedFunction and the pandas function isin(), but to no avail. Any guidance would be greatly appreciated. Thank you.

Tags: python, dataframe, pyspark

Solution

Assuming DF1 and DF2 are pandas DataFrames, you can convert the values to sets and use isdisjoint:

s = set(DF1.colName)
DF2['bool'] = DF2['col'].apply(lambda x: not set(x).isdisjoint(s)).astype(int)

print(DF2)
               col  bool
0  [a, b, b, c, d]     1
1     [e, f, g, h]     1
2        [i, j, k]     0
3  [l, m, n, o, p]     1
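
The snippet above relies on DF1 and DF2 already existing. A minimal self-contained sketch follows; the way the two frames are constructed here is an assumption based on the tables in the question, not something the original post shows.

```python
import pandas as pd

# Assumed reconstruction of the frames shown in the question.
DF1 = pd.DataFrame({"colName": ["a", "m", "f", "o"]})
DF2 = pd.DataFrame({
    "col": [
        ["a", "b", "b", "c", "d"],
        ["e", "f", "g", "h"],
        ["i", "j", "k"],
        ["l", "m", "n", "o", "p"],
    ]
})

# Build the lookup set once so it is not rebuilt for every row.
s = set(DF1["colName"])

# isdisjoint is True when the two sets share no element, so negate it
# to get True for rows that contain at least one value from DF1.colName.
DF2["bool"] = DF2["col"].apply(lambda x: not set(x).isdisjoint(s)).astype(int)

print(DF2["bool"].tolist())  # [1, 1, 0, 1]
```

Building the set outside the lambda keeps the per-row work down to one small set construction and one membership scan.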

Or use intersection: converting the result to bool gives False for an empty set, and converting that to integer maps True, False to 1, 0:

s = set(DF1.colName)
DF2['bool'] = DF2['col'].apply(lambda x: bool(s.intersection(x))).astype(int)

print(DF2)
               col  bool
0  [a, b, b, c, d]     1
1     [e, f, g, h]     1
2        [i, j, k]     0
3  [l, m, n, o, p]     1
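
Since the question mentions isin(), here is a sketch of a different technique that avoids the Python-level lambda: explode the lists into one row per element, test membership with isin, then collapse back to one flag per original row. This is an alternative to the answer above, not part of it, and the frame construction is again assumed from the tables in the question. It needs a pandas version that provides Series.explode.

```python
import pandas as pd

# Assumed reconstruction of the frames shown in the question.
DF1 = pd.DataFrame({"colName": ["a", "m", "f", "o"]})
DF2 = pd.DataFrame({
    "col": [
        ["a", "b", "b", "c", "d"],
        ["e", "f", "g", "h"],
        ["i", "j", "k"],
        ["l", "m", "n", "o", "p"],
    ]
})

# One row per list element; each element keeps the index of its source row.
exploded = DF2["col"].explode()

# Flag elements that appear in DF1.colName, then reduce per original row:
# a row gets 1 if any of its elements matched.
matched = exploded.isin(DF1["colName"]).groupby(level=0).any()

# Assignment aligns on the index, so each flag lands on its source row.
DF2["bool"] = matched.astype(int)

print(DF2["bool"].tolist())  # [1, 1, 0, 1]
```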
