Applying an operation to concatenate some rows in a DataFrame returns None

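Problem description

I have some addresses to clean up.

In the address1 column, some entries are just a number. They should be a number plus a street name, like the first three rows.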

import numpy as np
import pandas as pd
df = pd.DataFrame({'address1':['15 Main Street','10 High Street','5 Other Street',np.nan,'15','12'],
                  'address2':['New York','LA','London','Tokyo','Grove Street','Garden Street']})

print(df)

         address1       address2
0  15 Main Street       New York
1  10 High Street             LA
2  5 Other Street         London
3             NaN          Tokyo
4              15   Grove Street
5              12  Garden Street

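I am trying to write a function that checks whether address1 is a number and, if it is, concatenates address1 with the street name from address2, then clears address2.

My expected output is below. Indexes 4 and 5 now have complete address1 entries: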

           address1  address2
0    15 Main Street  New York
1    10 High Street        LA
2    5 Other Street    London
3               NaN     Tokyo
4   15 Grove Street       NaN <---
5  12 Garden Street       NaN <---

My attempt with .apply():

def f(x):

    try:
        #if address1 is int
        if isinstance(int(x['address1']), int):

            # create new address using address1 + address 2
            newaddress = str(x['address1']) +' '+ str(x['address2'])

            # delete address2
            x['address2'] = np.nan

            # return newaddress to address1 column
            return newadress

    except:
        pass

Applying the function:

df['address1'] = df.apply(f,axis=1)

However, the address1 column is now all None.

I have tried several variations of this function but cannot get it to work. Any help would be appreciated.

Tags: python, pandas, apply

Solution

You can create a mask and update only the matching rows:

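# True where address1 parses as a number (NaN and full addresses become False)
mask = pd.to_numeric(df.address1, errors='coerce').notna()
# Append the street name from address2 to the bare house number
df.loc[mask, 'address1'] = df.loc[mask, 'address1'] + ' ' + df.loc[mask, 'address2']
# Clear address2 on the rows that were merged
df.loc[mask, 'address2'] = np.nan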

Output:

           address1  address2
0    15 Main Street  New York
1    10 High Street        LA
2    5 Other Street    London
3               NaN     Tokyo
4   15 Grove Street       NaN
5  12 Garden Street       NaN
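
As for why the original attempt produced all None: the function returns `newadress`, a misspelling of `newaddress`, so the numeric rows raise a NameError that the bare `except` swallows. The other rows raise a ValueError inside `int(...)` and are swallowed the same way. Every path therefore falls off the end of the function and returns None. In addition, setting `x['address2']` inside the function is not written back to `df`, because only the returned value is assigned to `address1`. If you prefer to stay with `.apply()`, one possible fix is to return the whole row. The sketch below is only one way to do it; the name `merge_if_numeric` is made up for illustration, and it assumes "numeric" means a string made up only of digits.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'address1': ['15 Main Street', '10 High Street', '5 Other Street', np.nan, '15', '12'],
    'address2': ['New York', 'LA', 'London', 'Tokyo', 'Grove Street', 'Garden Street'],
})

def merge_if_numeric(row):
    """Fold address2 into address1 when address1 is a digits-only string.

    Hypothetical helper for illustration; always returns a row, never None.
    """
    addr = row['address1']
    # NaN is a float, so the isinstance check skips missing values
    if isinstance(addr, str) and addr.strip().isdigit():
        row = row.copy()  # avoid mutating the object pandas passed in
        row['address1'] = f"{addr.strip()} {row['address2']}"
        row['address2'] = np.nan
    return row

# Returning a Series per row makes apply rebuild both columns at once
fixed = df.apply(merge_if_numeric, axis=1)
print(fixed)
```

The mask-based version above is usually preferable on large frames, since it avoids calling a Python function once per row.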
