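python - Replace values in a pandas column with a default value for missing keys
Question
I have several simple functions that I need to apply to every row of certain columns in my dataframe. The dataframe is very large, with more than 10 million rows. It looks like this: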
Date       location  city         number  value
12/3/2018  NY        New York     2       500
12/1/2018  MN        Minneapolis  3       600
12/2/2018  NY        Rochester    1       800
12/3/2018  WA        Seattle      2       400
I have functions like this one:
def normalized_location(row):
    if row['city'] == "Minneapolis":
        return "FCM"
    elif row['city'] == "Seattle":
        return "FCS"
    else:
        return "Other"
Then I apply it with:
df['Normalized Location'] = df.apply(normalized_location, axis=1)
This is very slow. How can I make it more efficient?
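For reference, here is a minimal self-contained reproduction of the row-wise approach. It assumes the DataFrame is built from the sample table above (with Date kept as plain strings) and compares against "Minneapolis" without a leading space, so that the Minneapolis row actually matches. With axis=1, DataFrame.apply calls the Python function once per row and passes each row as a Series, which is why it scales poorly to 10+ million rows.

```python
import pandas as pd

# Sample data from the question (assumption: Date kept as plain strings).
df = pd.DataFrame({
    "Date": ["12/3/2018", "12/1/2018", "12/2/2018", "12/3/2018"],
    "location": ["NY", "MN", "NY", "WA"],
    "city": ["New York", "Minneapolis", "Rochester", "Seattle"],
    "number": [2, 3, 1, 2],
    "value": [500, 600, 800, 400],
})


def normalized_location(row):
    # Called once per row; `row` is a pandas Series holding that row's values.
    if row["city"] == "Minneapolis":
        return "FCM"
    elif row["city"] == "Seattle":
        return "FCS"
    else:
        return "Other"


# Row-wise apply: one Python function call per row, hence slow on large frames.
df["Normalized Location"] = df.apply(normalized_location, axis=1)
print(df["Normalized Location"].tolist())
# ['Other', 'FCM', 'Other', 'FCS']
```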
Solution
We can use map with a defaultdict.
from collections import defaultdict
d = defaultdict(lambda: 'Other')
d.update({"Minneapolis": "FCM", "Seattle": "FCS"})
df['normalized_location'] = df['city'].map(d)
print(df)
        Date  location         city  number  value  normalized_location
0  12/3/2018        NY     New York       2    500                Other
1  12/1/2018        MN  Minneapolis       3    600                  FCM
2  12/2/2018        NY    Rochester       1    800                Other
3  12/3/2018        WA      Seattle       2    400                  FCS
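For performance reasons, this bypasses the fillna call that a plain dict would require (Series.map with an ordinary dict produces NaN for missing keys). The approach generalizes easily to more replacements: add more key/value pairs to the dictionary.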
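The fillna call mentioned here is what the plain-dict version needs: Series.map with an ordinary dict returns NaN for keys it does not contain, so the gaps have to be filled in a second pass. The sketch below compares the two. To illustrate adding more replacements, it uses a hypothetical extended mapping (the "Rochester" to "FCR" entry is invented for this example and does not come from the question).

```python
from collections import defaultdict

import pandas as pd

cities = pd.Series(["New York", "Minneapolis", "Rochester", "Seattle"])

# Hypothetical extended mapping: "FCR" for Rochester is illustrative only.
mapping = {"Minneapolis": "FCM", "Seattle": "FCS", "Rochester": "FCR"}

# Plain dict: unmatched keys become NaN, so a second pass with fillna is needed.
via_fillna = cities.map(mapping).fillna("Other")

# defaultdict: the default is supplied during the lookup itself, no fillna pass.
d = defaultdict(lambda: "Other", mapping)
via_default = cities.map(d)

print(via_fillna.tolist())
# ['Other', 'FCM', 'FCR', 'FCS']
print(via_default.tolist())
# ['Other', 'FCM', 'FCR', 'FCS']
```

Both versions give the same labels here. One hedged caveat: pandas looks each value up as d[key], so a defaultdict will typically also store every unmatched city (here "New York") as a new key mapped to "Other". This means d grows as a side effect of the map call.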