python - Split a string column based on values in a list and add the result to another column in a Pandas DataFrame
Problem description
I have a pandas DataFrame:
import pandas as pd
data = {"Column1": ["258 E SONORA ST SAN",
"57474 SAXONY WAY APT 223 WESLEY",
"62748 CALIFORNIA ST APT 2 SAN",
"3211 LONGLAKE DR FERN",
"420 S PLYMOUTH CT APT 265",
"AHLONA L LABARRE POA -L 274 NESTLINGWOOD DR LONG",
"224-22 141 STREET RICHMOND",
"15624 274TH ST CAMBRIA",
"778 SANTO DOMINGO AVE SW PALM",
"261 BROADMOOR DR SOUTH SIOUX"],
"Colum2" : ["BERNARDINO", "CHAPEL", "FRANCISCO", "CREEK", "CHICAGO", "VALLEY", "HILL", "HEIGHTS", "BAY", "CITY"]}
df = pd.DataFrame(data)
df
Output:
   Column1                                           Colum2
0  258 E SONORA ST SAN                               BERNARDINO
1  57474 SAXONY WAY APT 223 WESLEY                   CHAPEL
2  62748 CALIFORNIA ST APT 2 SAN                     FRANCISCO
3  3211 LONGLAKE DR FERN                             CREEK
4  420 S PLYMOUTH CT APT 265                         CHICAGO
5  AHLONA L LABARRE POA -L 274 NESTLINGWOOD DR LONG  VALLEY
6  224-22 141 STREET RICHMOND                        HILL
7  15624 274TH ST CAMBRIA                            HEIGHTS
8  778 SANTO DOMINGO AVE SW PALM                     BAY
9  261 BROADMOOR DR SOUTH SIOUX                      CITY
I have a list of values on which I need to split the strings in Column1:
split_city = ["ST","DR", "STREET", "AVE SW"]
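I would also like the split to fall after APT and the digits that follow it.
How can I split a column based on the values in a list and move the remainder into another column of a Pandas DataFrame?
Desired output: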
   Column1                                      Colum2
0  258 E SONORA ST                              SAN BERNARDINO
1  57474 SAXONY WAY APT 223                     WESLEY CHAPEL
2  62748 CALIFORNIA ST APT 2                    SAN FRANCISCO
3  3211 LONGLAKE DR                             FERN CREEK
4  420 S PLYMOUTH CT APT 265                    CHICAGO
5  AHLONA L LABARRE POA -L 274 NESTLINGWOOD DR  LONG VALLEY
6  224-22 141 STREET                            RICHMOND HILL
7  15624 274TH ST                               CAMBRIA HEIGHTS
8  778 SANTO DOMINGO AVE SW                     PALM BAY
9  261 BROADMOOR DR                             SOUTH SIOUX CITY
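Solution
I don't know whether Pandas has a nice built-in way to do this, but with so many edge cases here, it is easier to join the two columns back into full addresses and then apply a regular expression than to split on the values in your list (the pattern also covers apartment numbers):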
import re
import pandas as pd
pattern = re.compile(r"([-\d\w ]*)\s(ST|WAY|DR|STREET|AVE|N|S|E|W|SW|SE|NW|NE|APT \d*)\s([\w ]*)")
column1 = ["258 E SONORA ST SAN",
"57474 SAXONY WAY APT 223 WESLEY",
"62748 CALIFORNIA ST APT 2 SAN",
"3211 LONGLAKE DR FERN",
"420 S PLYMOUTH CT APT 265",
"AHLONA L LABARRE POA -L 274 NESTLINGWOOD DR LONG",
"224-22 141 STREET RICHMOND",
"15624 274TH ST CAMBRIA",
"778 SANTO DOMINGO AVE SW PALM",
"261 BROADMOOR DR SOUTH SIOUX"]
column2 = ["BERNARDINO", "CHAPEL", "FRANCISCO", "CREEK", "CHICAGO", "VALLEY", "HILL", "HEIGHTS", "BAY", "CITY"]
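# Join each address fragment with its city fragment so the regex sees the full address.
combined = [" ".join(t) for t in zip(column1, column2)]
streets = []
cities = []
for matches in (pattern.findall(s) for s in combined):
    # Each match is a 3-tuple: (leading text, last street token, trailing city text).
    *street, city = matches[0]
    streets.append(" ".join(street))
    cities.append(city)
df = pd.DataFrame({"street": streets, "city": cities})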
Output:
In [10]: pd.DataFrame({"street": streets, "city": cities})
Out[10]:
   street                                       city
0  258 E SONORA ST                              SAN BERNARDINO
1  57474 SAXONY WAY APT 223                     WESLEY CHAPEL
2  62748 CALIFORNIA ST APT 2                    SAN FRANCISCO
3  3211 LONGLAKE DR                             FERN CREEK
4  420 S PLYMOUTH CT APT 265                    CHICAGO
5  AHLONA L LABARRE POA -L 274 NESTLINGWOOD DR  LONG VALLEY
6  224-22 141 STREET                            RICHMOND HILL
7  15624 274TH ST                               CAMBRIA HEIGHTS
8  778 SANTO DOMINGO AVE SW                     PALM BAY
9  261 BROADMOOR DR                             SOUTH SIOUX CITY
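For comparison, the same idea can probably be kept inside pandas with Series.str.cat and Series.str.extract. The following is only a sketch, not the answer author's method: it builds the pattern from the question's split_city list instead of a hand-written token list, and it assumes that the last split token (or "APT" plus digits) in each combined address marks the end of the street part. The names alternatives, parts, and the group names street and city are made up for illustration.

```python
import re

import pandas as pd

df = pd.DataFrame({
    "Column1": ["258 E SONORA ST SAN",
                "420 S PLYMOUTH CT APT 265",
                "224-22 141 STREET RICHMOND",
                "778 SANTO DOMINGO AVE SW PALM"],
    "Colum2": ["BERNARDINO", "CHICAGO", "HILL", "BAY"],
})
split_city = ["ST", "DR", "STREET", "AVE SW"]

# Build one alternation from the list, plus the "APT <digits>" case (an assumption
# about what the question means by splitting after APT and the numeric characters).
alternatives = "|".join([re.escape(tok) for tok in split_city] + [r"APT \d+"])

# The greedy ".*" makes the street group end at the LAST token that is followed
# by whitespace; everything after that whitespace becomes the city.
pattern = rf"^(?P<street>.*\b(?:{alternatives}))\s+(?P<city>.+)$"

# Join the two columns row by row, then pull both named groups out at once.
combined = df["Column1"].str.cat(df["Colum2"], sep=" ")
parts = combined.str.extract(pattern)

# Rows that do not match the pattern would come back as NaN in both columns.
df["Column1"] = parts["street"]
df["Colum2"] = parts["city"]
print(df)
```

Because the token must be followed by whitespace, "ST" cannot accidentally match the start of "STREET", so the order of the entries in split_city should not matter here. Addresses whose street type is missing from the list and that have no APT part are left as NaN rather than raising an error, so they are easy to find and handle separately.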