Change text in a pandas column based on a list of names

Problem description

Background

I have the following sample df:

import pandas as pd

Names = [['Jon', 'Mith', 'jon', 'John'],
         ['Mark', 'Marky', 'marcs'],
         ['Bob', 'bobby', 'Bobs']]
df = pd.DataFrame({'Text': ['Jon J Mmith is Here and jon John from ',
                            'When he came Mark was Marky but not marcs so',
                            'I like Bob and bobby and also Bobs diner '],
                   'P_ID': [1, 2, 3],
                   'P_Name': Names})

# rearrange columns
df = df[['Text', 'P_ID', 'P_Name']]
df


    Text                                       P_ID  P_Name
0   Jon J Mmith is Here and jon John from       1   [Jon, Smith, jon, John]
1   When he came Mark was Marky but not marcs   2   [Mark, Marky, marcs]
2   I like Bob and bobby and also Bobs diner    3   [Bob, bobby, Bobs]

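This df is a variation of the "old question" seen here: Alter text in pandas column based on names. The only difference between the old question's df and this new one is the format of the names in the P_Name column, shown below: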

 #old names from old question
 array(['Mmith, Jon J', 'Hider, Mary', 'Doe, Jane Ann', 'Tucker, Tom'], dtype=object) 
 #new names from new question
 array([list(['Jon', 'Smith', 'jon', 'John']),
        list(['Mark', 'Marky', 'marcs']), list(['Bob', 'bobby', 'Bobs'])], dtype=object)

Goal

In the Text column, replace every value that corresponds to a value found in P_Name (e.g. [Jon, Mmith, jon, John]) with **BLOCK**.

Problem

When I use the solution from the "old question", taken from Alter text in pandas column based on names:

 df['Text'].replace(df['P_Name'].str.split(', *').apply(lambda l: ' '.join(l[::-1])),'**BLOCK**',regex=True)

I get the following error:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-79-895f7ea46849> in <module>()
----> 1 df['Text'].replace(df['P_Name'].str.split(', *').apply(lambda l: ' '.join(l[::-1])),'**BLOCK**',regex=True)



/usr/local/Cellar/python3/3.6.1/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pandas/core/series.py in apply(self, func, convert_dtype, args, **kwds)
   2353             else:
   2354                 values = self.asobject
-> 2355                 mapped = lib.map_infer(values, f, convert=convert_dtype)
   2356 
   2357         if len(mapped) and isinstance(mapped[0], Series):

pandas/_libs/src/inference.pyx in pandas._libs.lib.map_infer (pandas/_libs/lib.c:66645)()

<ipython-input-79-895f7ea46849> in <lambda>(l)
----> 1 df['Text'].replace(df['P_Name'].str.split(', *').apply(lambda l: ' '.join(l[::-1])),'**BLOCK**',regex=True)

TypeError: 'float' object is not subscriptable
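The error most likely comes from the first step of the old solution rather than from replace itself. A minimal sketch, assuming a reasonably recent pandas: .str.split works element by element, and for cells that hold lists instead of strings it returns NaN rather than a split result; the lambda then slices that float NaN, which raises the same TypeError as in the traceback. The two-row p_name Series is a hypothetical, cut-down version of the P_Name column.

```python
import pandas as pd

# Hypothetical, cut-down version of the P_Name column: each cell is a list.
p_name = pd.Series([["Jon", "Mmith", "jon", "John"], ["Mark", "Marky", "marcs"]])

# .str.split expects strings; list cells come back as NaN instead of lists.
split = p_name.str.split(", *")
print(split.isna().tolist())

# The old lambda then slices a float NaN, reproducing the traceback's error.
error_message = None
try:
    split.apply(lambda l: " ".join(l[::-1]))
except TypeError as exc:
    error_message = str(exc)
print(error_message)
```

In other words, the split/reverse/join step only made sense for the old "Last, First" strings; with lists there is nothing left to split.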

Desired result

I would like the following, similar to the answer in the "old question" Alter text in pandas column based on names:

               Text                                       P_ID  P_Name
0   **BLOCK** J **BLOCK** is Here and **BLOCK** **BLOCK** from       1   [Jon, Smith, jon, John]
1   When he came **BLOCK** was **BLOCK** but not **BLOCK**         2   [Mark, Marky, marcs]
2   I like **BLOCK** and **BLOCK** and also **BLOCK** diner        3   [Bob, bobby, Bobs]

Question

Given that each cell of my P_Name column now contains a list of names, how can I achieve my desired result?

Tags: regex, pandas, list, text, apply

Solution

IIUC, you need series.replace, which accepts a list as its to_replace argument:

to_replace : str, regex, list, dict, Series, int, float, or None

df=df.assign(Text=df.Text.replace(df.P_Name,'**BLOCK**',regex=True))
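Depending on the pandas version, passing a whole Series as to_replace may not behave as intended: recent releases treat a Series passed as to_replace as dict-like and may raise a ValueError when value is a scalar, and a plain list of patterns is applied to every row rather than only to the row it came from. A version-independent alternative, sketched below, builds one regex per row from that row's P_Name list and applies it with re.sub. block_names is a hypothetical helper name. Two further assumptions: the \b word boundaries (so that Mark does not swallow the start of Marky), and the use of Mmith in place of Mith, because neither Mith nor Smith occurs in the text while the desired output blocks Mmith.

```python
import re

import pandas as pd

# Sample data from the question; 'Mmith' replaces 'Mith' (assumption, see above).
df = pd.DataFrame({
    "Text": [
        "Jon J Mmith is Here and jon John from ",
        "When he came Mark was Marky but not marcs so",
        "I like Bob and bobby and also Bobs diner ",
    ],
    "P_ID": [1, 2, 3],
    "P_Name": [
        ["Jon", "Mmith", "jon", "John"],
        ["Mark", "Marky", "marcs"],
        ["Bob", "bobby", "Bobs"],
    ],
})


def block_names(text, names, token="**BLOCK**"):
    """Replace each whole-word occurrence of any name in `names` with `token`.

    Hypothetical helper: matching is case-sensitive, and names are escaped so
    characters such as '.' inside a name are matched literally.
    """
    if not names:  # an empty alternation would match at every word boundary
        return text
    # Longest first, so the alternation tries 'Marky' before 'Mark'.
    alternatives = sorted((re.escape(n) for n in names), key=len, reverse=True)
    pattern = r"\b(?:" + "|".join(alternatives) + r")\b"
    # A function replacement inserts `token` verbatim (no backslash processing).
    return re.sub(pattern, lambda m: token, text)


# Pair each Text cell with the name list from the same row.
df["Text"] = [block_names(t, n) for t, n in zip(df["Text"], df["P_Name"])]
print(df["Text"].tolist())
```

Because the pattern is rebuilt for each row, names listed in one row never affect the text of another row, which matches the desired result above.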
