Split a string column based on values in a list and add the result to another column in a Pandas DataFrame

Problem description

I have a pandas DataFrame:

import pandas as pd
data = {"Column1": ["258 E SONORA ST SAN",
                    "57474 SAXONY WAY APT 223 WESLEY",  
                    "62748 CALIFORNIA ST APT 2 SAN",    
                    "3211 LONGLAKE DR FERN",    
                    "420 S PLYMOUTH CT APT 265",    
                    "AHLONA L LABARRE POA -L 274 NESTLINGWOOD DR LONG", 
                    "224-22 141 STREET RICHMOND",   
                    "15624 274TH ST CAMBRIA",   
                    "778 SANTO DOMINGO AVE SW PALM",    
                    "261 BROADMOOR DR SOUTH SIOUX"],    

        "Colum2" : ["BERNARDINO", "CHAPEL", "FRANCISCO", "CREEK", "CHICAGO", "VALLEY", "HILL", "HEIGHTS", "BAY", "CITY"]}

df = pd.DataFrame(data)

df

Output


                  Column1                                 Colum2
0   258 E SONORA ST SAN                                 BERNARDINO
1   57474 SAXONY WAY APT 223 WESLEY                     CHAPEL
2   62748 CALIFORNIA ST APT 2 SAN                       FRANCISCO
3   3211 LONGLAKE DR FERN                               CREEK
4   420 S PLYMOUTH CT APT 265                           CHICAGO
5   AHLONA L LABARRE POA -L 274 NESTLINGWOOD DR LONG    VALLEY
6   224-22 141 STREET RICHMOND                          HILL
7   15624 274TH ST CAMBRIA                              HEIGHTS
8   778 SANTO DOMINGO AVE SW PALM                       BAY
9   261 BROADMOOR DR SOUTH SIOUX                        CITY

I have a list of values at which I need to split the strings in Column1:

split_city = ["ST","DR", "STREET", "AVE SW"]

I also want to split after APT and the numeric characters that follow it.

How can I split a column based on the values in a list and add the split-off part to another column in a Pandas DataFrame?

Desired output


                  Column1                                 Colum2
0   258 E SONORA ST                                     SAN BERNARDINO
1   57474 SAXONY WAY APT 223                            WESLEY CHAPEL
2   62748 CALIFORNIA ST APT 2                           SAN FRANCISCO
3   3211 LONGLAKE DR                                    FERN CREEK
4   420 S PLYMOUTH CT APT 265                           CHICAGO
5   AHLONA L LABARRE POA -L 274 NESTLINGWOOD DR         LONG VALLEY
6   224-22 141 STREET                                   RICHMOND HILL
7   15624 274TH ST                                      CAMBRIA HEIGHTS
8   778 SANTO DOMINGO AVE SW                            PALM BAY
9   261 BROADMOOR DR                                    SOUTH SIOUX CITY

Tags: python, pandas, dataframe

Solution


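I don't know whether there is a nice way to do this in Pandas, but because there are so many edge cases here, it is easier to combine the two address parts and then apply a regular expression than to split on each entry of your list (the pattern also covers apartments):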


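import re

import pandas as pd

# Group 1: everything before the last street suffix, direction, or "APT <digits>"
# Group 2: that suffix, direction, or apartment; group 3: the rest (the city)
pattern = re.compile(r"([-\d\w ]*)\s(ST|WAY|DR|STREET|AVE|N|S|E|W|SW|SE|NW|NE|APT \d*)\s([\w ]*)")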


column1 = ["258 E SONORA ST SAN",
           "57474 SAXONY WAY APT 223 WESLEY",
           "62748 CALIFORNIA ST APT 2 SAN",
           "3211 LONGLAKE DR FERN",
           "420 S PLYMOUTH CT APT 265",
           "AHLONA L LABARRE POA -L 274 NESTLINGWOOD DR LONG",
           "224-22 141 STREET RICHMOND",
           "15624 274TH ST CAMBRIA",
           "778 SANTO DOMINGO AVE SW PALM",
           "261 BROADMOOR DR SOUTH SIOUX"]


column2 = ["BERNARDINO", "CHAPEL", "FRANCISCO", "CREEK", "CHICAGO", "VALLEY", "HILL", "HEIGHTS", "BAY", "CITY"]


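# Join each address with its city fragment so the regex sees the full string
combined = [" ".join(t) for t in zip(column1, column2)]
streets = []
cities = []
for t in (pattern.findall(s) for s in combined):
    # t[0] is the (prefix, suffix, city) tuple of the first match
    *street, city = t[0]
    streets.append(" ".join(street))
    cities.append(city)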


df = pd.DataFrame({"street": streets, "city": cities})

Output:

In [10]: pd.DataFrame({"street": streets, "city": cities})
Out[10]:
                                        street              city
0                              258 E SONORA ST    SAN BERNARDINO
1                     57474 SAXONY WAY APT 223     WESLEY CHAPEL
2                    62748 CALIFORNIA ST APT 2     SAN FRANCISCO
3                             3211 LONGLAKE DR        FERN CREEK
4                    420 S PLYMOUTH CT APT 265           CHICAGO
5  AHLONA L LABARRE POA -L 274 NESTLINGWOOD DR       LONG VALLEY
6                            224-22 141 STREET     RICHMOND HILL
7                               15624 274TH ST   CAMBRIA HEIGHTS
8                     778 SANTO DOMINGO AVE SW          PALM BAY
9                             261 BROADMOOR DR  SOUTH SIOUX CITY
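
The pattern above hardcodes its suffixes, whereas the question asks to split on the entries of a list. As a rough sketch only, the alternation could instead be built from `split_city`, with `APT <digits>` added as an extra split point. The helper names `build_splitter` and `split_address` are hypothetical and not part of the original answer. The sketch assumes that the last matching token in the combined string is always the right place to split, which would break for a city that itself begins with a listed suffix.

```python
import re


def build_splitter(suffixes):
    """Compile a pattern that splits after the last listed suffix or 'APT <digits>'."""
    # Longer entries first, so an entry is never shadowed by a shorter prefix of it.
    escaped = [re.escape(s) for s in sorted(suffixes, key=len, reverse=True)]
    tokens = "|".join([r"APT\s+\d+"] + escaped)
    # The greedy .* makes the *last* matching token the split point.
    return re.compile(rf"^(.*\b(?:{tokens}))\s+(.+)$")


def split_address(street_part, city_part, pattern):
    """Return (street, city) after moving leading city words out of street_part."""
    combined = f"{street_part} {city_part}"
    match = pattern.match(combined)
    if match is None:
        # No split point found: leave the row unchanged.
        return street_part, city_part
    return match.group(1), match.group(2)


split_city = ["ST", "DR", "STREET", "AVE SW"]
pattern = build_splitter(split_city)

print(split_address("258 E SONORA ST SAN", "BERNARDINO", pattern))
print(split_address("57474 SAXONY WAY APT 223 WESLEY", "CHAPEL", pattern))
print(split_address("420 S PLYMOUTH CT APT 265", "CHICAGO", pattern))
```

With a DataFrame, one possible way to use this is row-wise, for example `df.apply(lambda r: split_address(r["Column1"], r["Colum2"], pattern), axis=1, result_type="expand")`, though for large frames a vectorized string method may be preferable.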
