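python - Split (explode) pandas DataFrame string entries into separate rows, across multiple columns
Problem description
I have a DataFrame that looks like this:
I need to replace "European Union" and split (explode) it into its member countries, as in the following example:
I tried replacing "European Union" using a dictionary that maps it to its members, and then splitting it with the following lines of code: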
test_disc['countryname'] = test_disc['countryname'].replace({'European Union': 'Austria, Belgium, Bulgaria, Croatia, Cyprus, Czechia, Denmark, Estonia, Finland, France, Germany, Greece, Hungary, Ireland,Italy, Latvia, Lithuania, Luxembourg, Malta, Netherlands,Poland, Portugal, Romania, Slovakia, Slovenia, Spain, Sweden'})
test_disc[['iso_2', 'iso_3', 'countryname', 'país afetado','year',
'SPS emergenciais', 'SPS regulares']].astype(str).apply(lambda x:
x.str.split(',').explode()).reset_index()
However, I get the following error: "ValueError: cannot reindex from a duplicate axis"
Solution
When using explode, you should convert only the target columns to lists, not all columns.
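A likely reason the original attempt fails is that explode repeats the index label of every row it expands, so the per-column results no longer share a unique index that pandas can realign. The minimal sketch below only shows that repeated index; the toy Series is an assumption made for illustration and is not part of the question's data.

```python
import pandas as pd

# Toy column: one plain value and one comma-separated value
s = pd.Series(["JP", "Austria, Belgium"], name="countryname")

# Split into lists, then turn each list element into its own row
exploded = s.str.split(r",\s*").explode()

# The second row became two rows that share index label 1
index_labels = exploded.index.tolist()
values = exploded.tolist()
print(index_labels)  # [0, 1, 1]
print(values)        # ['JP', 'Austria', 'Belgium']
```

Calling DataFrame.explode on one named column at a time avoids the problem, because pandas repeats the other columns of the same row itself.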
Demo data
import pandas as pd

data = [
    {'iso_2': 0, 'iso_3': 'NaN', 'countryname': 'JP', 'país afetado': 'US', 'year': 2015, 'SPS emergenciais': 0, 'SPS regulares': 0},
    {'iso_2': 1, 'iso_3': 'NaN', 'countryname': 'European Union', 'país afetado': 'China', 'year': 2015, 'SPS emergenciais': 0, 'SPS regulares': 0},
    {'iso_2': 2, 'iso_3': 'NaN', 'countryname': 'US', 'país afetado': 'European Union', 'year': 2015, 'SPS emergenciais': 0, 'SPS regulares': 0},
    {'iso_2': 3, 'iso_3': 'NaN', 'countryname': 'European Union', 'país afetado': 'European Union', 'year': 2015, 'SPS emergenciais': 0, 'SPS regulares': 0},
]
df = pd.DataFrame(data)
df
iso_2 iso_3 countryname país afetado year SPS emergenciais \
0 0 NaN JP US 2015 0
1 1 NaN European Union China 2015 0
2 2 NaN US European Union 2015 0
3 3 NaN European Union European Union 2015 0
SPS regulares
0 0
1 0
2 0
3 0
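Process:
for col in ['país afetado', 'countryname']:
    # Replace the group name with a comma-separated list of members
    df[col] = df[col].replace({'European Union': 'Austria, Belgium, Netherlands,Poland'})
    # Split on commas, with or without trailing whitespace
    df[col] = df[col].str.split(r',\s*')
# Explode one target column at a time; the other columns are repeated
df_result = df.explode('countryname').explode('país afetado')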
Result:
iso_2 iso_3 countryname país afetado year SPS emergenciais
0 0 NaN JP US 2015 0
1 1 NaN Austria China 2015 0
1 1 NaN Belgium China 2015 0
1 1 NaN Netherlands China 2015 0
1 1 NaN Poland China 2015 0
2 2 NaN US Austria 2015 0
2 2 NaN US Belgium 2015 0
2 2 NaN US Netherlands 2015 0
2 2 NaN US Poland 2015 0
3 3 NaN Austria Austria 2015 0
3 3 NaN Austria Belgium 2015 0
3 3 NaN Austria Netherlands 2015 0
3 3 NaN Austria Poland 2015 0
3 3 NaN Belgium Austria 2015 0
3 3 NaN Belgium Belgium 2015 0
3 3 NaN Belgium Netherlands 2015 0
3 3 NaN Belgium Poland 2015 0
3 3 NaN Netherlands Austria 2015 0
3 3 NaN Netherlands Belgium 2015 0
3 3 NaN Netherlands Netherlands 2015 0
3 3 NaN Netherlands Poland 2015 0
3 3 NaN Poland Austria 2015 0
3 3 NaN Poland Belgium 2015 0
3 3 NaN Poland Netherlands 2015 0
3 3 NaN Poland Poland 2015 0
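The result above keeps the original index labels, so labels such as 1, 2 and 3 are repeated. If a unique index is needed afterwards, one option is to reset it. The sketch below wraps the same replace-then-explode idea in a helper and resets the index at the end. The names expand_groups and GROUPS, the shortened member list and the small demo frame are assumptions made for illustration, and the mapping uses lists directly instead of splitting a comma-separated string.

```python
import pandas as pd

# Hypothetical mapping from a group name to its members (shortened list)
GROUPS = {"European Union": ["Austria", "Belgium", "Netherlands", "Poland"]}


def expand_groups(df, columns, groups):
    """Return a copy of df with group names in `columns` expanded to one row per member."""
    out = df.copy()
    for col in columns:
        # Swap each group name for its member list; other values stay scalars
        out[col] = out[col].map(lambda value: groups.get(value, value))
    for col in columns:
        # explode() repeats the row once per list element and leaves scalars alone
        out = out.explode(col)
    # Replace the repeated index labels with a fresh 0..n-1 index
    return out.reset_index(drop=True)


demo = pd.DataFrame(
    {
        "countryname": ["JP", "European Union"],
        "país afetado": ["US", "European Union"],
        "year": [2015, 2015],
    }
)
expanded = expand_groups(demo, ["countryname", "país afetado"], GROUPS)
print(len(expanded))  # 17 rows: 1 for JP/US plus 4 x 4 for the EU/EU row
```

Exploding two columns one after the other yields every combination of members, including pairs where both columns hold the same country, exactly as in the result shown above.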