python - 将日期框架中列的值与熊猫中其他数据框架的列值进行比较
问题描述
我有两个数据框,如果 df1 的“A”列中的值包含在 df2 中“B”列的单元格中,我想将数据框 df2 中的行插入到 df1。如果是这种情况,那么我想在 df1 中的“A”列的匹配值下方插入行。需要插入的行是根据列“关键字”从 df2 中提取的,就像我从“关键字”列中的一个“测试”到以下“测试”一样。
这是数据框:
df1
A Keyword B C
m55 m32\nm83\nm18 123
m56 m12 546
m68
m32
m83
m65
m73 m77\nm78 558
m23
m98
m77
m18
m4
m12
m78
我有第二个数据框(df2),我从中提取行并插入 df1。我正在从“关键字”列中的“测试”到“测试”中提取行。
df2
Keyword Matches C
test m32\nm83\nm18 123
something
something
something
test
something
something
test m12 546
something
test m77\nm78 558
test
something
所以,最后我需要这个:
df1
A Keyword B C
m55 m32\nm83\nm18 123
m56 m12 546
m68
m32
test m32\nm83\nm18 123
something
something
something
m83
test m32\nm83\nm18 123
something
something
something
m65
m73 m77\nm78 558
m23
m98
m77
test m77\nm78 558
m18
test m32\nm83\nm18 123
something
something
something
m4
m12
test m12 546
something
m78
test m77\nm78 558
这个怎么做?
编辑:
首先我有这个功能:
def insert_row(idx, df, df_insert):
return df.iloc[:idx, ].append(df_insert).append(df.iloc[idx:, ]).reset_index(drop = True)
然后我创建了从 df2 中提取行的函数:
def TestStepsReturn(df, SearchingElement):
TestCaseList = df.index[df["Keyword"] == "test"].tolist()
TestCaseList = np.asarray(TestCaseList)
try:
idx = TestCaseList[TestCaseList <= df.index[df["Matches"].str.contains(SearchingElement)][0]].max()
idx = np.where(TestCaseList == idx)
if idx[0][0]!=(len(TestCaseList)-1):
return df.loc[TestCaseList[idx[0][0]]:TestCaseList[idx[0][0]+1]-1]
return df.loc[TestCaseList[idx[0][0]]:]
except:
return pd.DataFrame(columns=df.columns)#return the empty data frame with the same columns names
然后我创建了另一个使用TestStepsReturn的:
def SerchIDs(dfFidx, df1, df2, SearchingColumn):
for feature in dfFidx:
feature += i
df_new = TestStepsReturn(df2, df1.loc[feature, SearchingColumn])#
df1 = insert_row(feature+1, df1, df_new)
i += int(df_new.size/len(df_new.columns.values))
return df
然后我有代码:
dfFidx = df1.index
df1 = SerchIDs(dfFidx, df1, df2, "A")
解决方案
如果第一行在列中Matches
也有值,则解决方案有效:test
Keyword
#groups for get cumulative sum with comparing test value and not missing values
df2['g1'] = df2['Keyword'].eq('test').cumsum()
df2['g2'] = df2['Matches'].notna().cumsum()
#get only first groups with test
df2 = df2[df2.groupby('g2')['g1'].transform('min') == df2['g1']]
#add index value from df1 with merge
df11 = df1.reset_index()[['index','B']].dropna().rename(columns={'B':'Matches', 'index':'idx'})
df2 = df2.merge(df11, on='Matches', how='left').drop(['g1','g2'], axis=1)
df2['idx'] = df2['idx'].ffill().astype(int)
print (df2)
Keyword Matches C idx
0 test m32\nm83\nm18 123.0 0
1 something NaN NaN 0
2 something NaN NaN 0
3 something NaN NaN 0
4 test m12 546.0 1
5 something NaN NaN 1
6 test m77\nm78 558.0 6
#create dictionary of DataFrames - key is index of df1
d2 = dict(tuple(df2.groupby('idx')))
#print (d2)
d = df1['B'].dropna().to_dict()
d1 = {k: df1.index[df1['A'].str.contains("|".join(v.split("\\n")))] for k, v in d.items()}
#in loop create new index and append to list
L = []
for k, v in d1.items():
df = d2[k]
for x in v:
cum = np.cumsum(np.repeat(1, len(df)) / (len(df) + 1))
df.index = np.repeat(x, len(df)) + cum
L.append(df.copy())
#join all together
df = pd.concat([df1] + L).sort_index().reset_index(drop=True)
print (df)
A B C Keyword Matches idx
0 m55 m32\nm83\nm18 123.0 NaN NaN NaN
1 m56 m12 546.0 NaN NaN NaN
2 m68 NaN NaN NaN NaN NaN
3 m32 NaN NaN NaN NaN NaN
4 NaN NaN 123.0 test m32\nm83\nm18 0.0
5 NaN NaN NaN something NaN 0.0
6 NaN NaN NaN something NaN 0.0
7 NaN NaN NaN something NaN 0.0
8 m83 NaN NaN NaN NaN NaN
9 NaN NaN 123.0 test m32\nm83\nm18 0.0
10 NaN NaN NaN something NaN 0.0
11 NaN NaN NaN something NaN 0.0
12 NaN NaN NaN something NaN 0.0
13 m65 NaN NaN NaN NaN NaN
14 m73 m77\nm78 558.0 NaN NaN NaN
15 m23 NaN NaN NaN NaN NaN
16 m98 NaN NaN NaN NaN NaN
17 m77 NaN NaN NaN NaN NaN
18 NaN NaN 558.0 test m77\nm78 6.0
19 m18 NaN NaN NaN NaN NaN
20 NaN NaN 123.0 test m32\nm83\nm18 0.0
21 NaN NaN NaN something NaN 0.0
22 NaN NaN NaN something NaN 0.0
23 NaN NaN NaN something NaN 0.0
24 m4 NaN NaN NaN NaN NaN
25 m12 NaN NaN NaN NaN NaN
26 NaN NaN 546.0 test m12 1.0
27 NaN NaN NaN something NaN 1.0
28 m78 NaN NaN NaN NaN NaN
29 NaN NaN 558.0 test m77\nm78 6.0
推荐阅读
- haskell - 如何从 Haskell 的 SDL 绑定中的 RenderConfig 中提取值
- python - Python 对象不与类似对象排序
- reactjs - 功能 setState 计数器
- sql - 如何从表中插入数据并对其进行透视
- azure-functions - 具有多个输出绑定的 Azure 函数 - Blob 和 HTTP
- wordpress - WordPress 在 Publish 上发布随机添加带有字体的跨度
- sql - 由 java.sql.SQLException 引起:无效的列索引
- javascript - 如何让我的 JavaScript(如果正确)在倒计时后执行?
- jenkins - Jenkins 如何在远程 slave 上执行代码?
- html - ie 上的 jpg 渲染错误