A more idiomatic way to extract column values and assign them back to a DataFrame (Pandas)

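Problem description

I have an existing routine that works fine. But since I'm new to Python, I find my code ugly and I'm looking for ways to improve it.

My program works like this: I created a class that takes a full address string, which I need to parse. The class therefore has four attributes: address, city, state, and zip code.

Here is the class: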

class Address:

    def __init__(self, fulladdress):
        self.fulladdress = fulladdress.split(",")
        self.address = self.get_address()
        self.city = self.get_city()
        stateandzip = str(self.fulladdress[-1]).strip()
        self.statezip = stateandzip.split(" ")
        self.state = self.get_state()
        self.zipcode = self.get_zipcode()

    def get_address(self):
        # Everything before the last two parts (city and "state zip") is the
        # street address; with fewer than three parts this yields ''.
        return ", ".join(part.strip() for part in self.fulladdress[:-2])

    def get_city(self):
        if len(self.fulladdress) > 1:  # need at least "city, state zip"
            address = self.fulladdress[-2]
            return address.strip()
        else:
            return ''

    def get_state(self):
        if len(self.fulladdress) > 0:
            return self.statezip[0]
        else:
            return ''

    def get_zipcode(self):
        if len(self.statezip) > 1:  # the zip code is the second token, if present
            return self.statezip[1]
        else:
            return ''

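Now my existing routine needs to append this result to a DataFrame, based on its Address column. I parse the address data with df.iterrows() because I don't know how to use the Address class with the df.apply method.

Here is the routine: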

import pandas as pd
import datahelper as dh
import address as ad

# Find the header name of the Address column
address_header = dh.findcolumn('Address', list(df.columns))
header_loc = df.columns.get_loc(address_header)

address = []
city = []
state = []
zipcode = []

for index, row in df.iterrows():
    # Rows without an address get empty placeholders
    if not row[address_header]:
        address.append('')
        city.append('')
        state.append('')
        zipcode.append('')
        continue

    # extract details from the address
    address_data = ad.Address(row[address_header])

    address.append(address_data.address)
    city.append(address_data.city)
    state.append(address_data.state)
    zipcode.append(address_data.zipcode)

df[address_header] = address
df.insert(header_loc + 1, 'City', city)
df.insert(header_loc + 2, 'State', state)
df.insert(header_loc + 3, 'Zip Code', zipcode)

I would appreciate it if someone could point me in the right direction. Thanks in advance.

By the way, dh is a datahelper module where I keep all my helper functions.

def findcolumn(searchstring, columns):
    # Prefer an exact match; otherwise return the first column name that
    # contains the search string, or None if nothing matches.
    if searchstring in columns:
        return searchstring
    return next((c for c in columns if searchstring in c), None)

Given the sample data below in the Address column, this is the output I want.

df = pd.DataFrame({'Address': ['Rubin Center Dr Ste, Fort Mill, SC 29708', 'Miami, FL 33169']})

The output should be:

Address             | City      | State | Zip Code
---------------------------------------------------
Rubin Center Dr Ste | Fort Mill | SC    | 29708
---------------------------------------------------
                    | Miami     | FL    | 33169
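
For the sample data and target layout above, the split could also be sketched without a Python-level loop by using Series.str.extract with named groups. The regular expression here is an assumption, not part of the original routine: it expects every address to end in "City, ST 12345", with an optional street part in front, and it will not cover other address formats.

```python
import pandas as pd

df = pd.DataFrame({'Address': ['Rubin Center Dr Ste, Fort Mill, SC 29708', 'Miami, FL 33169']})

# Assumed pattern: optional street part, then "City, ST 12345" at the end.
pattern = (
    r'^(?:(?P<Address>.*),\s*)?'  # optional street part (everything before the city)
    r'(?P<City>[^,]+),\s*'        # city: the comma-free chunk before the state
    r'(?P<State>[A-Z]{2})\s+'     # two-letter state code
    r'(?P<Zip>\d{5})\s*$'         # five-digit zip code
)

# Named groups become column names; unmatched groups come back as NaN.
parts = (
    df['Address'].str.extract(pattern)
    .fillna('')
    .rename(columns={'Zip': 'Zip Code'})
)
print(parts)
```

Rows that do not match the pattern at all come back as empty strings in every column, so it is worth checking for those before overwriting the original data.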

Tags: python, pandas, dataframe

Solution
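
One possible approach, sketched under assumptions rather than offered as the definitive answer: apply a parsing function to the Address column and let it return a pd.Series, which pandas expands into one column per field. The function split_address and the constant NEW_COLS are hypothetical names; split_address is a compact stand-in for the Address class from the question and follows the same comma-splitting rules.

```python
import pandas as pd

NEW_COLS = ['Address', 'City', 'State', 'Zip Code']

def split_address(full):
    """Hypothetical stand-in for the Address class: returns [address, city, state, zip]."""
    # Missing or empty values map to four empty strings
    if not isinstance(full, str) or not full.strip():
        return ['', '', '', '']
    parts = [p.strip() for p in full.split(',')]
    state_zip = parts[-1].split()
    state = state_zip[0] if state_zip else ''
    zipcode = state_zip[1] if len(state_zip) > 1 else ''
    city = parts[-2] if len(parts) > 1 else ''
    street = ', '.join(parts[:-2])  # everything before the city
    return [street, city, state, zipcode]

df = pd.DataFrame({'Address': ['Rubin Center Dr Ste, Fort Mill, SC 29708', 'Miami, FL 33169']})

# Returning a Series from the applied function expands the result into a DataFrame
parsed = df['Address'].apply(lambda s: pd.Series(split_address(s), index=NEW_COLS))

# Overwrite the address column and insert the new columns right after it
loc = df.columns.get_loc('Address')
df['Address'] = parsed['Address']
for offset, col in enumerate(NEW_COLS[1:], start=1):
    df.insert(loc + offset, col, parsed[col])

print(df)
```

Building a Series per row is convenient but tends to be slow on large frames. In that case, pd.DataFrame(df['Address'].map(split_address).tolist(), columns=NEW_COLS, index=df.index) builds the same parsed frame, provided it runs before the Address column is overwritten.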

