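pandas - pandas DataFrame: extracting data from a 2-column DataFrame
Question
I have a 2-column df whose structure is repetitive but irregular: each block has a "name" row, then a "code"/"w" header row followed by a varying number of code rows. I want to extract the name, code and w values. Here is the DataFrame: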
import pandas as pd
# assign the frame to df so the solution below can refer to it
df = pd.DataFrame([('name','john'),
('date','NaN'),
('curr','NaN'),
('code','w'),
('123',0.4),
('456',0.5),
('789','0.1'),
('name','Elsa'),
('date','NaN'),
('curr','NaN'),
('code','w'),
('112',0.3),
('243',0.3),
('789','0.3'),
('351','0.1')
])
I would like to extract this:
name code w
john 123 0.4
john 456 0.5
john 789 0.1
elsa 112 0.3
elsa 243 0.3
elsa 789 0.3
elsa 351 0.1
How can I do this? Thank you.
Solution
Use:
# new column 3: the value from column 1 on the 'name' rows, NaN on all other rows
df[3] = df.loc[df[0] == 'name', 1]
# forward fill so every row carries the name of its block
df[3] = df[3].ffill()
# drop the header rows (name, date, curr, code) and reorder the columns to [3, 0, 1]
df = df.loc[~df[0].isin(['name', 'date', 'curr', 'code']), [3, 0, 1]]
# set the column names
df.columns = ['name', 'code', 'w']
print(df)
    name code    w
4   john  123  0.4
5   john  456  0.5
6   john  789  0.1
11  Elsa  112  0.3
12  Elsa  243  0.3
13  Elsa  789  0.3
14  Elsa  351  0.1
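The same idea can also be written as a single chain that does not modify df in place. This is only a sketch, not part of the original answer: the names header_keys, names and out are made up for illustration, and resetting the index and converting w to numbers are extra steps that the question did not ask for.

```python
import pandas as pd

# Same data as in the question (note that some w values are strings)
df = pd.DataFrame([('name', 'john'), ('date', 'NaN'), ('curr', 'NaN'),
                   ('code', 'w'), ('123', 0.4), ('456', 0.5), ('789', '0.1'),
                   ('name', 'Elsa'), ('date', 'NaN'), ('curr', 'NaN'),
                   ('code', 'w'), ('112', 0.3), ('243', 0.3), ('789', '0.3'),
                   ('351', '0.1')])

# Hypothetical helper list: labels in column 0 that mark header rows
header_keys = ['name', 'date', 'curr', 'code']

# Keep column 1 only on the 'name' rows, then forward fill down each block
names = df[1].where(df[0] == 'name').ffill()

out = (
    df.assign(name=names)                                  # attach the block name
      .loc[~df[0].isin(header_keys), ['name', 0, 1]]       # keep only the code rows
      .set_axis(['name', 'code', 'w'], axis=1)             # rename the columns
      .reset_index(drop=True)                              # optional: fresh 0..n-1 index
)

# Optional: w mixes floats and strings in the source data, so make it numeric
out['w'] = pd.to_numeric(out['w'])

print(out)
```

Because df itself is left unchanged, the chain can be rerun safely; drop the last two steps if the original index or the raw w values are needed.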