Pandas - Replace text in a column with dummy values

Problem description

I have a DataFrame containing some customer information, as shown below:

Customer Name, Purchase Date
Kevin, 2020-01-10
Scott, 2020-02-01
Mark, 2020-04-01
Peter, 2020-06-12

I want to replace the values in the "Customer Name" column with dummy values such as "Customer 1", "Customer 2", and so on. Expected output:

Customer Name, Purchase Date
Customer 1, 2020-01-10
Customer 2, 2020-02-01
Customer 3, 2020-04-01
Customer 4, 2020-06-12

I would like the numbering to be derived from the DataFrame's shape (its number of rows).

Tags: pandas

Solution


If all values are unique, use the index values converted to strings. This assumes the default RangeIndex (0, 1, 2, ...):

# Shift the 0-based index to 1-based numbering, then convert it to strings
df['Customer Name'] = 'Customer ' + (df.index + 1).astype(str)
print(df)
  Customer Name Purchase Date
0    Customer 1    2020-01-10
1    Customer 2    2020-02-01
2    Customer 3    2020-04-01
3    Customer 4    2020-06-12
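
The index-based version relies on the default integer index. As a minimal sketch, assuming the rows should simply be numbered by position, the labels can instead be built from the row count (`df.shape[0]`), which also works when the index is not 0, 1, 2, ... The sample data and the custom index below are illustrative assumptions, not part of the original answer.

```python
import pandas as pd

# Illustrative data; the non-default index is an assumption made to show
# that the numbering does not depend on the index values.
df = pd.DataFrame(
    {
        "Customer Name": ["Kevin", "Scott", "Mark", "Peter"],
        "Purchase Date": ["2020-01-10", "2020-02-01", "2020-04-01", "2020-06-12"],
    },
    index=[10, 20, 30, 40],
)

# Number the rows 1..n by position, where n is the number of rows.
n_rows = df.shape[0]
df["Customer Name"] = [f"Customer {i}" for i in range(1, n_rows + 1)]

print(df["Customer Name"].tolist())
# ['Customer 1', 'Customer 2', 'Customer 3', 'Customer 4']
```

Assigning a plain list fills the column positionally, so the index labels are left untouched.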

If you need to number the unique values of the Customer Name column instead, use factorize:

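import pandas as pd

# factorize assigns a 0-based integer code to each distinct name; add 1 for 1-based labels
s = pd.Series((pd.factorize(df['Customer Name'])[0] + 1), index=df.index).astype(str)
df['Customer Name'] = 'Customer ' + s
print(df)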
  Customer Name Purchase Date
0    Customer 1    2020-01-10
1    Customer 2    2020-02-01
2    Customer 3    2020-04-01
3    Customer 4    2020-06-12
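
To make the factorize step easier to follow, here is a small self-contained sketch that unpacks both return values of `pd.factorize`. The sample data is an assumption for illustration. Note that, by default, missing values receive the code -1, so they would end up labelled "Customer 0" unless handled separately.

```python
import pandas as pd

# Illustrative data (an assumption, mirroring the question's sample).
df = pd.DataFrame(
    {
        "Customer Name": ["Kevin", "Scott", "Mark", "Peter"],
        "Purchase Date": ["2020-01-10", "2020-02-01", "2020-04-01", "2020-06-12"],
    }
)

# factorize returns (codes, uniques): codes are 0-based integers assigned
# in order of first appearance; uniques holds the distinct original values.
codes, uniques = pd.factorize(df["Customer Name"])

# Shift to 1-based numbering and build the labels positionally.
df["Customer Name"] = [f"Customer {code + 1}" for code in codes]

print(list(codes))    # [0, 1, 2, 3]
print(list(uniques))  # ['Kevin', 'Scott', 'Mark', 'Peter']
```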

The two approaches give different results when the column contains duplicate values:

print(df)
  Customer Name Purchase Date
0          Mark    2020-01-10
1         Scott    2020-02-01
2          Mark    2020-04-01
3         Peter    2020-06-12

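# Index-based: every row gets its own number, even for a repeated name
df['Customer Name1'] = 'Customer ' + (df.index + 1).astype(str)
# factorize-based: a repeated name always gets the same number
s = pd.Series((pd.factorize(df['Customer Name'])[0] + 1), index=df.index).astype(str)
df['Customer Name2'] = 'Customer ' + s
print(df)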
  Customer Name Purchase Date Customer Name1 Customer Name2
0          Mark    2020-01-10     Customer 1     Customer 1
1         Scott    2020-02-01     Customer 2     Customer 2
2          Mark    2020-04-01     Customer 3     Customer 1
3         Peter    2020-06-12     Customer 4     Customer 3
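
If the mapping between real names and dummy labels needs to be kept (for example, to apply the same labels to another table), one possible alternative is an explicit lookup dictionary built with `Series.unique()` and applied with `Series.map()`. This is a sketch rather than part of the original answer; the name `lookup` and the sample data are assumptions, and it assumes the column has no missing values.

```python
import pandas as pd

# Illustrative data with a repeated name (an assumption for this sketch).
df = pd.DataFrame(
    {
        "Customer Name": ["Mark", "Scott", "Mark", "Peter"],
        "Purchase Date": ["2020-01-10", "2020-02-01", "2020-04-01", "2020-06-12"],
    }
)

# Build a name -> label lookup; unique() keeps the order of first appearance.
lookup = {
    name: f"Customer {i}"
    for i, name in enumerate(df["Customer Name"].unique(), start=1)
}

# Replace each name with its label; repeated names share one label.
df["Customer Name"] = df["Customer Name"].map(lookup)

print(lookup)
# {'Mark': 'Customer 1', 'Scott': 'Customer 2', 'Peter': 'Customer 3'}
```

The resulting labels match the factorize-based column above, and `lookup` can be stored or inverted if the original names ever need to be recovered.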
