Match dict values against the values of two DataFrame columns and replace a third column's value with the dict key

Question

I have a pandas DataFrame like this:

Index | Line Item                   |        Insertion Order                    | Creative Type
_________________________________________________________________________________________________
1     | blbl 33 dEs '300x600' Q3    | hello 444                                 | UNKNOWN
2     | QQQ4 Hello trueview Apple   | something 68793274                        | UNKNOWN
3     |   A useless  string         | pre-roll Video <10 tttt 89 CASIO          | UNKNOWN
4     | Something not in dict       | Neither here                              | UNKNOWN

and a dict like this:

 dct = {
'RISING STARS': ['300x600', 'Box 300x600', '300x250', 'Box 300x250', 'Classic Skin', 'Main Banner', 'Half Banner', 'Masthead', 'Push Bar', 'Strip', 'In Image', 'Mix formati display rising'],
'VIDEO': ['trueview', 'Video Banner', 'Video in Picture', 'Videobox', 'Mid-roll Video', 'Pre-roll+Inread', 'Pre-roll Video <10', 'Pre-roll Video =10', 'Pre-roll Video =15', 'Pre-roll Video =20', 'Pre-roll Video =30' ,'Pre-roll Video >30','Inread / Intext / Outstream','Mix formati video','Post-roll Video','Inread XXX (Landscape/Vertical/Square)', 'Pre-roll Video Sponsored Session' ,'Pre-roll Video Viewmax' ,'Pre-roll Video Takeover']}

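I want to fill the Creative Type column of my DataFrame as follows: if the value in the Line Item or the Insertion Order column contains one of the values in the dict, the corresponding row of Creative Type should get the name of that value's dict key (as row 3 shows, the comparison should ignore case). If there is no match, the corresponding row of Creative Type should get NaN.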
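To pin down what "match" means here, the rule can be restated row by row in plain Python. This is only a sketch, assuming a value matches when it occurs case-insensitively as a standalone token (not glued to letters or digits) in either column; `creative_type` is a hypothetical helper name and the dict is a shortened excerpt of `dct`.

```python
import math
import re


# Hypothetical helper, not from the question: restates the rule for one row,
# assuming a case-insensitive, standalone-token match in either column.
def creative_type(line_item: str, insertion_order: str, dct: dict) -> object:
    for key, values in dct.items():  # keys are tried in dict order
        for value in values:
            # re.escape: values such as 'Pre-roll+Inread' contain regex metacharacters
            pattern = rf"(?<!\w){re.escape(value)}(?!\w)"
            # search each column separately so a value cannot span both columns
            if any(re.search(pattern, col, flags=re.IGNORECASE)
                   for col in (line_item, insertion_order)):
                return key
    return math.nan  # no value found in either column


# Shortened excerpt of the question's dct
dct = {
    "RISING STARS": ["300x600", "300x250", "Masthead"],
    "VIDEO": ["trueview", "Pre-roll Video <10", "Pre-roll+Inread"],
}

print(creative_type("blbl 33 dEs '300x600' Q3", "hello 444", dct))  # RISING STARS
print(creative_type("A useless  string", "pre-roll Video <10 tttt 89 CASIO", dct))  # VIDEO
print(creative_type("Something not in dict", "Neither here", dct))  # nan
```

Applied with `df.apply(..., axis=1)` this would be slow on large frames, but it is a handy reference to check a vectorized version against.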

The expected output is:

Index | Line Item                   |        Insertion Order                    | Creative Type
_________________________________________________________________________________________________
1     | blbl 33 dEs '300x600' Q3    | hello 444                                 | RISING STARS
2     | QQQ4 Hello trueview Apple   | something 68793274                        | VIDEO
3     |   A useless  string         | pre-roll Video <10 tttt 89 CASIO          | VIDEO
4     | Something not in dict       | Neither here                              | NaN

What is the simplest way to do this (ideally one that is also computationally cheap)?

Tags: python pandas dataframe dictionary

Solution


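Build a replacement dict by inverting the given key-value pairs, i.e. map every value in each list to its corresponding key (as a case-insensitive regex that matches the whole string whenever the value occurs in it). Then join the Line Item and Insertion Order columns and use Series.replace with regex=True, so each string matching one of the values is replaced by the corresponding key. Finally, mask the strings that could not be replaced (those left unchanged) with NaN: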

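import re
# Invert dct into {pattern: key}; re.escape protects values such as 'Pre-roll+Inread',
# and the lookarounds act like \b but also work next to '(' or ')'.
r = {rf'(?i).*?(?<!\w){re.escape(z)}(?!\w).*': x for x, y in dct.items() for z in y}
s = df['Line Item'].add(':' + df['Insertion Order'])  # search both columns at once
df['Creative Type'] = s.replace(r, regex=True).mask(lambda x: x.eq(s))  # unchanged rows -> NaN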

                   Line Item                   Insertion Order Creative Type
1   blbl 33 dEs '300x600' Q3                         hello 444  RISING STARS
2  QQQ4 Hello trueview Apple                something 68793274         VIDEO
3          A useless  string  pre-roll Video <10 tttt 89 CASIO         VIDEO
4      Something not in dict                      Neither here           NaN
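The approach above runs one full-string regex per dict value, i.e. roughly 30 passes over the column. A possibly cheaper variant (a swapped-in technique, not part of the answer, and not benchmarked here) is a single alternation of all values passed to `Series.str.extract`, followed by a lookup in the inverted dict. The sketch assumes the same case-insensitive, standalone-token matching and uses a shortened excerpt of `dct`; the names `lookup`, `alternation` and `pattern` are illustrative.

```python
import re

import pandas as pd

# Shortened excerpt of the question's dct
dct = {
    "RISING STARS": ["300x600", "Box 300x600", "Masthead", "Half Banner"],
    "VIDEO": ["trueview", "Pre-roll Video <10", "Pre-roll+Inread",
              "Inread XXX (Landscape/Vertical/Square)"],
}

df = pd.DataFrame({
    "Line Item": ["blbl 33 dEs '300x600' Q3", "QQQ4 Hello trueview Apple",
                  "A useless  string", "Something not in dict"],
    "Insertion Order": ["hello 444", "something 68793274",
                        "pre-roll Video <10 tttt 89 CASIO", "Neither here"],
}, index=[1, 2, 3, 4])

# Invert dct once: lowercased value -> key, so any casing found in the text maps back.
lookup = {v.lower(): k for k, vals in dct.items() for v in vals}

# One alternation for all values. Longest values first, so a value that is a
# prefix of another does not shadow it at the same position.
alternation = "|".join(re.escape(v) for v in sorted(lookup, key=len, reverse=True))
pattern = rf"(?<!\w)({alternation})(?!\w)"

s = df["Line Item"] + ":" + df["Insertion Order"]
# str.extract returns the leftmost match per row, or NaN where nothing matches.
found = s.str.extract(pattern, flags=re.IGNORECASE, expand=False)
# NaN stays NaN through .str.lower() and .map(), which gives the required NaN rows.
df["Creative Type"] = found.str.lower().map(lookup)
print(df[["Line Item", "Creative Type"]])
```

One behavioral difference: when a row contains values belonging to two different keys, `str.extract` reports the leftmost occurrence in the text, whereas with the `replace` approach the key that comes first in `dct` wins.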
