python - 如何在 Python 的 pandas 数据框中的新列中将所有名称与常见的 3 字母集映射?
问题描述
我有一个熊猫数据框df
,如下所示:
Unnamed: 0 Characters Length Characters Split A B C D Names with common 3-letters
0 FROKDUWJU 9 [FRO, KDU, WJU] FRO KDU WJU NaN
1 IDJWPZSUR 9 [IDJ, WPZ, SUR] IDJ WPZ SUR NaN
2 UCFURKIRODCQ 12 [UCF, URK, IRO, DCQ] UCF URK IRO DCQ
3 ORI 3 [ORI] ORI NaN NaN NaN
4 PROIRKIQARTIBPO 15 [PRO, IRK, IQA, RTI, BPO] PRO IRK IQA RTI
5 QAZWREDCQIBR 12 [QAZ, WRE, DCQ, IBR] QAZ WRE DCQ IBR
6 PLPRUFSWURKI 12 [PLP, RUF, SWU, RKI] PLP RUF SWU RKI
7 FROIEUSKIKIR 12 [FRO, IEU, SKI, KIR] FRO IEU SKI KIR
8 ORIUWJZSRFRO 12 [ORI, UWJ, ZSR, FRO] ORI UWJ ZSR FRO
9 URKIFJVUR 9 [URK, IFJ, VUR] URK IFJ VUR NaN
10 RUFOFR 6 [RUF, OFR] RUF OFR NaN NaN
11 IEU 3 [IEU] IEU NaN NaN NaN
12 PIMIEU 6 [PIM, IEU] PIM IEU NaN NaN
在最后一列,具有常见 3 个字母的名称中,我想列出第一列中在其名称中设置了常见 3 个字母的所有名称。例如,在第一行,我想列出所有名称中包含 FRO、KRU 和 WJU 的名称。这些名称的 3 个字母拆分也可以在“字符拆分”或 A、B、C 和 D 列中找到以供参考。
为了逐步进行讨论,我需要扫描给定行中名称中存在的 3 字母集是否也存在于其余行中的任何名称中。如果它存在,我需要将另一行的相应名称作为列表添加到“具有常见 3 个字母的名称”列中。例如,在随附的屏幕截图中,在 C 列中,黄色突出显示的单元格的名称具有共同的 3 字母集,并且名称在同一行中。
实现此目的的适当方法是什么?我应该使用函数还是循环语句?
注意:df.to_dict()
看起来如下:
{'Unnamed: 0': {0: 'FROKDUWJU',
1: 'IDJWPZSUR',
2: 'UCFURKIRODCQ',
3: 'ORI',
4: 'PROIRKIQARTIBPO',
5: 'QAZWREDCQIBR',
6: 'PLPRUFSWURKI',
7: 'FROIEUSKIKIR',
8: 'ORIUWJZSRFRO',
9: 'URKIFJVUR',
10: 'RUFOFR',
11: 'IEU',
12: 'PIMIEU'},
'Characters Length': {0: 9,
1: 9,
2: 12,
3: 3,
4: 15,
5: 12,
6: 12,
7: 12,
8: 12,
9: 9,
10: 6,
11: 3,
12: 6},
'Characters Split': {0: ['FRO', 'KDU', 'WJU'],
1: ['IDJ', 'WPZ', 'SUR'],
2: ['UCF', 'URK', 'IRO', 'DCQ'],
3: ['ORI'],
4: ['PRO', 'IRK', 'IQA', 'RTI', 'BPO'],
5: ['QAZ', 'WRE', 'DCQ', 'IBR'],
6: ['PLP', 'RUF', 'SWU', 'RKI'],
7: ['FRO', 'IEU', 'SKI', 'KIR'],
8: ['ORI', 'UWJ', 'ZSR', 'FRO'],
9: ['URK', 'IFJ', 'VUR'],
10: ['RUF', 'OFR'],
11: ['IEU'],
12: ['PIM', 'IEU']},
'A': {0: 'FRO',
1: 'IDJ',
2: 'UCF',
3: 'ORI',
4: 'PRO',
5: 'QAZ',
6: 'PLP',
7: 'FRO',
8: 'ORI',
9: 'URK',
10: 'RUF',
11: 'IEU',
12: 'PIM'},
'B': {0: 'KDU',
1: 'WPZ',
2: 'URK',
3: nan,
4: 'IRK',
5: 'WRE',
6: 'RUF',
7: 'IEU',
8: 'UWJ',
9: 'IFJ',
10: 'OFR',
11: nan,
12: 'IEU'},
'C': {0: 'WJU',
1: 'SUR',
2: 'IRO',
3: nan,
4: 'IQA',
5: 'DCQ',
6: 'SWU',
7: 'SKI',
8: 'ZSR',
9: 'VUR',
10: nan,
11: nan,
12: nan},
'D': {0: nan,
1: nan,
2: 'DCQ',
3: nan,
4: 'RTI',
5: 'IBR',
6: 'RKI',
7: 'KIR',
8: 'FRO',
9: nan,
10: nan,
11: nan,
12: nan},
'Names with common 3-letters': {0: '',
1: '',
2: '',
3: '',
4: '',
5: '',
6: '',
7: '',
8: '',
9: '',
10: '',
11: '',
12: ''}}
解决方案
可能有一种更快的方法来搜索和创建列表,但这有效:
# create a different temporary, column (you can't search the Characters Split column directly as the 3 letter combinations aren't honored
df['patrn'] = df.apply( lambda x: '|'.join(x['Characters Split']), axis=1)
def find_matches(x):
# print(x.name) # index number
new_df = df[~df.index.isin([x.name])] # all rows except current index
return set(new_df.loc[df['patrn'].str.contains(x['patrn'], case=False)]['Unnamed: 0'].tolist())
df['Names with common 3-letters'] = df.apply(lambda x: find_matches(x), axis=1)
df
输出
Unnamed: 0 Characters Length Characters Split A B C D Names with common 3-letters patrn
0 FROKDUWJU 9 [FRO, KDU, WJU] FRO KDU WJU NaN {FROIEUSKIKIR, ORIUWJZSRFRO} FRO|KDU|WJU
1 IDJWPZSUR 9 [IDJ, WPZ, SUR] IDJ WPZ SUR NaN {} IDJ|WPZ|SUR
2 UCFURKIRODCQ 12 [UCF, URK, IRO, DCQ] UCF URK IRO DCQ {URKIFJVUR, QAZWREDCQIBR} UCF|URK|IRO|DCQ
3 ORI 3 [ORI] ORI NaN NaN NaN {ORIUWJZSRFRO} ORI
4 PROIRKIQARTIBPO 15 [PRO, IRK, IQA, RTI, BPO] PRO IRK IQA RTI {} PRO|IRK|IQA|RTI|BPO
5 QAZWREDCQIBR 12 [QAZ, WRE, DCQ, IBR] QAZ WRE DCQ IBR {UCFURKIRODCQ} QAZ|WRE|DCQ|IBR
6 PLPRUFSWURKI 12 [PLP, RUF, SWU, RKI] PLP RUF SWU RKI {RUFOFR} PLP|RUF|SWU|RKI
7 FROIEUSKIKIR 12 [FRO, IEU, SKI, KIR] FRO IEU SKI KIR {PIMIEU, FROKDUWJU, ORIUWJZSRFRO, IEU} FRO|IEU|SKI|KIR
8 ORIUWJZSRFRO 12 [ORI, UWJ, ZSR, FRO] ORI UWJ ZSR FRO {FROKDUWJU, FROIEUSKIKIR, ORI} ORI|UWJ|ZSR|FRO
9 URKIFJVUR 9 [URK, IFJ, VUR] URK IFJ VUR NaN {UCFURKIRODCQ} URK|IFJ|VUR
10 RUFOFR 6 [RUF, OFR] RUF OFR NaN NaN {PLPRUFSWURKI} RUF|OFR
11 IEU 3 [IEU] IEU NaN NaN NaN {PIMIEU, FROIEUSKIKIR} IEU
12 PIMIEU 6 [PIM, IEU] PIM IEU NaN NaN {FROIEUSKIKIR, IEU} PIM|IEU
推荐阅读
- javascript - 根据兄弟的宽度动态设置 CSS border-bottom-left-radius
- python - 创建嵌套列表并使用 for 循环内联将它们配对
- java - 服务:亚马逊 S3;状态码:404;错误代码:NoSuchBucket;从 IBM Cloud Object Storage 获取文件时遇到问题
- json - 使用 PyMongo 从 MongoDB 的集合中检索许多文档时如何避免 JSON 序列化错误?
- regex - 如何用 square(var) 查找和替换所有形式的 pow(var,2)?
- mysql - MySQL 右连接的计数速度要快一个数量级
- android - onFocus 和 onBlur 没有在 Android 电视上被调用
- python - 尝试在 setup.py 中指定的同一包中安装 Python 扩展模块(Cython)时发现 ModuleNotFound
- python - 在 python 中打印钻石
- javascript - JavaScript 用 HTML 实体 ejs 替换撇号