How to replace the start and end of a column's values with certain characters in a Python dataframe

Problem description

I have a dataframe that looks like this:

 clients_x                 clients_y              coords_x               coords_y 
7110001002                7100019838    -23.63013,-46.704887  -23.657433,-46.744095   
7110001002                7100021875    -23.63013,-46.704887    -23.7729,-46.591366   
7110001002                0700245857    -23.63013,-46.704887      -23.7074,-46.5698 
[7110052941, 7110107795]  7100019838        -23.609,-46.6974  -23.657433,-46.744095
[7110052941, 7110107795]  7100021875        -23.609,-46.6974    -23.7729,-46.591366
[7110052941, 7110107795]  0700245857        -23.609,-46.6974       -23.7074,-46.569

What I want is for every value in the clients_x column to start with "[" and end with "]". My expected output is therefore:

 clients_x                 clients_y              coords_x               coords_y 
[7110001002]                7100019838    -23.63013,-46.704887  -23.657433,-46.744095   
[7110001002]                7100021875    -23.63013,-46.704887    -23.7729,-46.591366   
[7110001002]                0700245857    -23.63013,-46.704887      -23.7074,-46.5698 
[7110052941, 7110107795]  7100019838        -23.609,-46.6974  -23.657433,-46.744095
[7110052941, 7110107795]  7100021875        -23.609,-46.6974    -23.7729,-46.591366
[7110052941, 7110107795]  0700245857        -23.609,-46.6974       -23.7074,-46.569

To do this, I first tried something like this:

df["clients_x"] = "[" + df["clients_x"] + "]"

However, this adds "[" and "]" to the start and end of every value, so the rows that already have brackets end up with them doubled. The output is:

 clients_x                 clients_y              coords_x               coords_y 
[7110001002]                7100019838    -23.63013,-46.704887  -23.657433,-46.744095   
[7110001002]                7100021875    -23.63013,-46.704887    -23.7729,-46.591366   
[7110001002]                0700245857    -23.63013,-46.704887      -23.7074,-46.5698 
[[7110052941, 7110107795]]  7100019838        -23.609,-46.6974  -23.657433,-46.744095
[[7110052941, 7110107795]]  7100021875        -23.609,-46.6974    -23.7729,-46.591366
[[7110052941, 7110107795]]  0700245857        -23.609,-46.6974       -23.7074,-46.569

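To avoid this problem, I tried the following code. Essentially, I want to add "[" and "]" to the start and end of every value in the clients_x column that starts with a digit.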

df['clients_x'] = df['clients_x'].mask(df['clients_x'].astype(str).str.startswith(r'^\d'), f'[{df.clients_x}]')

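However, this line of code produces output identical to my original dataframe. If anyone has any ideas on how to solve this, I would really appreciate the help.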

Tags: python, regex, pandas, replace

Solution
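
A note on why the mask attempt in the question changes nothing: str.startswith compares against a literal prefix rather than a regular expression, so no value starts with the characters ^\d and the condition is False on every row. Separately, f'[{df.clients_x}]' formats the whole Series into one string instead of wrapping each value. The sketch below is only an illustration on two sample values from the question, assuming the column holds strings; the variable names are made up for the example.

```python
import pandas as pd

# Two sample values taken from the clients_x column in the question.
s = pd.Series(["7110001002", "[7110052941, 7110107795]"], name="clients_x")

# str.startswith treats its argument as literal text, so '^\d' never matches.
literal_mask = s.str.startswith(r"^\d")

# str.match applies a regular expression anchored at the start of each value.
regex_mask = s.str.match(r"\d")

# mask replaces values where the condition is True; the replacement is built
# element-wise rather than by formatting the whole Series into one string.
fixed = s.mask(regex_mask, "[" + s + "]")
print(fixed.tolist())
```

With the regex-based condition, only the value that starts with a digit is wrapped, and the already bracketed value is left alone.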


Using np.where -

import numpy as np
df['clients_x'] = np.where(df['clients_x'].str.startswith('['), df['clients_x'], '[' + df['clients_x'] + ']')

Using Series.where -

df['clients_x'].where(df['clients_x'].str.startswith('['), '[' + df['clients_x'] + ']')

Output

0               [7110001002]
1               [7110001002]
2               [7110001002]
3    [7110052941,7110107795]
4    [7110052941,7110107795]
5    [7110052941,7110107795]
Name: clients_x, dtype: object
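
For anyone who wants to try this end to end, here is a minimal sketch that rebuilds part of the sample frame from the question and applies the same "wrap only if not already wrapped" idea with Series.where. It assumes every clients_x value is (or can be converted to) a string; wrap_in_brackets is a hypothetical helper name, not something from pandas.

```python
import pandas as pd

# Sample frame rebuilt from the question; values are assumed to be strings.
df = pd.DataFrame({
    "clients_x": ["7110001002"] * 3 + ["[7110052941, 7110107795]"] * 3,
    "clients_y": ["7100019838", "7100021875", "0700245857"] * 2,
})


def wrap_in_brackets(col: pd.Series) -> pd.Series:
    """Hypothetical helper: wrap values in [] unless they already start with '['."""
    text = col.astype(str)
    already_wrapped = text.str.startswith("[")
    # where keeps values where the condition is True and replaces the rest.
    return text.where(already_wrapped, "[" + text + "]")


df["clients_x"] = wrap_in_brackets(df["clients_x"])
print(df["clients_x"].tolist())
```

Because the condition checks for an existing opening bracket, running the helper a second time leaves the column unchanged.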
