首页 > 解决方案 > 在python中按行来自两个数据帧的不常见值

问题描述

我有两个数据框df1df2. 两者中的第一列都是客户 ID,它是int,但其他列包含各种字符串值。我想df3为每个客户 ID 生成一个新的数据框,其中包含一组df2df1.

例子:

df1

     v1 v2 v3 v4
cust            
1     A  B  B  A
2     A  A  A  A
3     B  B  A  A
4     B  C  A  A

df2

     v1 v2 v3 v4
cust            
1     A  A  C  B
2     A  A  C  B
3     C  B  B  A
4     C  B  B  A

预期输出:

cust
1       {C}
2    {B, C}
3       {C}
4        {}

标签: pythonpandasdataframe

解决方案


In [2]: df_2 = pd.DataFrame({"KundelID" : list(range(1,11)),
   ...:               'V1' : list('AACCBBBCCC'),
   ...:               'V2' : list('AABBBCCCAA'),
   ...:               'V3' : list('CCBBBBBAAB'),
   ...:               'V4' : list('BBAACAAAAB')})
   ...: df_1 = pd.DataFrame({"KundelID" : list(range(1,11)),
   ...:               'V1' : list('AABBCCCCCC'),
   ...:               'V2' : list('BABCCCCAAA'),
   ...:               'V3' : list('BAAAAABBBB'),
   ...:               'V4' : list('AAAACCCCBB')})

In [3]: df_1
Out[3]: 
   KundelID V1 V2 V3 V4
0         1  A  B  B  A
1         2  A  A  A  A
2         3  B  B  A  A
3         4  B  C  A  A
4         5  C  C  A  C
5         6  C  C  A  C
6         7  C  C  B  C
7         8  C  A  B  C
8         9  C  A  B  B
9        10  C  A  B  B

In [4]: df_2
Out[4]: 
   KundelID V1 V2 V3 V4
0         1  A  A  C  B
1         2  A  A  C  B
2         3  C  B  B  A
3         4  C  B  B  A
4         5  B  B  B  C
5         6  B  C  B  A
6         7  B  C  B  A
7         8  C  C  A  A
8         9  C  A  A  A
9        10  C  A  B  B

In [7]: pd.DataFrame({"KundeID" : df_2.KundelID,
   ...:             'Not-in-df_1' : [','.join([i for i in df_2_ if not i in df_1_]) if [i for i in df_2_ if not i in df_1_] else None for df_1_,df_2_ in zip(df_1.T[1:].apply(np.unique), df_2.T[1:].apply(np.unique))]})
Out[7]: 
   KundeID Not-in-df_1
0        1           C
1        2         B,C
2        3           C
3        4        None
4        5           B
5        6           B
6        7           A
7        8        None
8        9        None
9       10        None



推荐阅读