First column name spans the entire dataframe

Problem description

I am trying to add column names to a dataframe read from a file that has no header row.

Data (UCI.csv, no header):

1.52101,13.64,4.49,1.10,71.78,0.06,8.75,0.00
2,1.51761,13.89,3.60,1.36,72.73,0.48,7.83,0.00
3,1.51618,13.53,3.55,1.54,72.99,0.39,7.78,0.00

My attempt to add the column names:

import pandas as pd

col_names=['Id','RI','Na','Mg','Al','Si','K','Ca','Ba','Fe','Glass Type']
uci=pd.read_csv('UCI.csv', delimiter=',',header=None, names=col_names)

But the first column (Id) ends up holding each entire row, and all the other columns are NaN:

Output:

Id  RI  Na  Mg  Al  Si  K   Ca  Ba  Fe  Glass Type
0   1,1.52101,13.64,4.49,1.10,71.78,0.06,8.75,0.00...   NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
1   2,1.51761,13.89,3.60,1.36,72.73,0.48,7.83,0.00...   NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
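Before changing the read_csv call, it can help to look at the raw lines exactly as they are stored. The sketch below is an assumed diagnostic, not part of the original question: `raw` is a hypothetical in-memory stand-in for UCI.csv whose lines are wrapped in double quotes, and the code reports whether every inspected line starts and ends with `"`.

```python
from io import StringIO

# Hypothetical stand-in for UCI.csv: each line wrapped in double quotes.
raw = StringIO('"1,1.52101,13.64,4.49,1.10,71.78,0.06,8.75,0.00"\n'
               '"2,1.51761,13.89,3.60,1.36,72.73,0.48,7.83,0.00"\n')

# Take at most the first 3 raw lines, without the trailing newline.
first_lines = [line.rstrip("\n") for _, line in zip(range(3), raw)]
for line in first_lines:
    print(repr(line))  # repr makes surrounding quotes visible

# A line that starts and ends with '"' is read by the CSV parser as ONE field.
wrapped = bool(first_lines) and all(
    s.startswith('"') and s.endswith('"') for s in first_lines
)
print("lines wrapped in quotes:", wrapped)
```

For the real file, replace `raw` with a file object, e.g. `with open('UCI.csv') as raw:` around the same code.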

Tags: pandas

Solution


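With ordinary (unquoted) data you get NaNs only in the last columns, because the names list has more entries than there are fields in each row: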

import pandas as pd
from io import StringIO

temp=u"""1.52101,13.64,4.49,1.10,71.78,0.06,8.75,0.00
2,1.51761,13.89,3.60,1.36,72.73,0.48,7.83,0.00
3,1.51618,13.53,3.55,1.54,72.99,0.39,7.78,0.00"""
# after testing, replace StringIO(temp) with 'filename.csv'
# (pd.compat.StringIO is not available in recent pandas; io.StringIO is)

col_names=['Id','RI','Na','Mg','Al','Si','K','Ca','Ba','Fe','Glass Type']
df = pd.read_csv(StringIO(temp), names=col_names)

print (df)
        Id        RI     Na    Mg     Al     Si     K    Ca   Ba  Fe  \
0  1.52101  13.64000   4.49  1.10  71.78   0.06  8.75  0.00  NaN NaN   
1  2.00000   1.51761  13.89  3.60   1.36  72.73  0.48  7.83  0.0 NaN   
2  3.00000   1.51618  13.53  3.55   1.54  72.99  0.39  7.78  0.0 NaN   

   Glass Type  
0         NaN  
1         NaN  
2         NaN  
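The all-NaN columns come purely from the length mismatch between `names` and the number of fields per row. A minimal sketch, assuming rows shaped like the ones in the question, that counts fields with Python's csv module before parsing and predicts which trailing columns will be empty:

```python
import csv
from io import StringIO

import pandas as pd

col_names = ['Id','RI','Na','Mg','Al','Si','K','Ca','Ba','Fe','Glass Type']
temp = """1,1.52101,13.64,4.49,1.10,71.78,0.06,8.75,0.00
2,1.51761,13.89,3.60,1.36,72.73,0.48,7.83,0.00"""

# Set of distinct field counts across all rows.
field_counts = {len(row) for row in csv.reader(StringIO(temp))}
print("fields per row:", field_counts)
print("names given:", len(col_names))

# Every name beyond the widest row becomes a column filled with NaN.
missing = len(col_names) - max(field_counts)
print("all-NaN columns:", col_names[-missing:] if missing > 0 else [])

# Confirm with pandas itself.
df = pd.read_csv(StringIO(temp), names=col_names)
print(df[col_names[-missing:]].isna().all().all())
```

Here each row has 9 fields against 11 names, so `Fe` and `Glass Type` are the empty ones.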

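But it seems your data is different: each line is wrapped in double quotes (one at the start and one at the end), so the parser reads the whole line as a single quoted field. To fix this, add the parameter quoting=3 (csv.QUOTE_NONE), which makes the parser treat quote characters as ordinary text: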

import csv
from io import StringIO

import pandas as pd

temp=u'''"1,1.52101,13.64,4.49,1.10,71.78,0.06,8.75,0.00"
"2,1.51761,13.89,3.60,1.36,72.73,0.48,7.83,0.00"
"3,1.51618,13.53,3.55,1.54,72.99,0.39,7.78,0.00"'''
# after testing, replace StringIO(temp) with 'filename.csv'

col_names=['Id','RI','Na','Mg','Al','Si','K','Ca','Ba','Fe','Glass Type']
# quoting=3 is csv.QUOTE_NONE: quote characters are kept as plain text
df = pd.read_csv(StringIO(temp), names=col_names, quoting=csv.QUOTE_NONE)

print (df)
   Id       RI     Na    Mg    Al     Si     K    Ca     Ba  Fe  Glass Type
0  "1  1.52101  13.64  4.49  1.10  71.78  0.06  8.75  0.00" NaN         NaN
1  "2  1.51761  13.89  3.60  1.36  72.73  0.48  7.83  0.00" NaN         NaN
2  "3  1.51618  13.53  3.55  1.54  72.99  0.39  7.78  0.00" NaN         NaN

# finally, strip the leftover quote characters manually
df['Id']  = df['Id'].str.strip('"')
df['Ba']  = df['Ba'].str.strip('"').astype(float)
print (df)
  Id       RI     Na    Mg    Al     Si     K    Ca    Ba  Fe  Glass Type
0  1  1.52101  13.64  4.49  1.10  71.78  0.06  8.75  0.00 NaN         NaN
1  2  1.51761  13.89  3.60  1.36  72.73  0.48  7.83  0.00 NaN         NaN
2  3  1.51618  13.53  3.55  1.54  72.99  0.39  7.78  0.00 NaN         NaN
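If it is not known in advance which columns carry a stray quote, a loop over the non-numeric columns avoids naming them one by one. This is a variation on the cleanup above, not part of the original answer, and it assumes every affected column should end up numeric; unlike the answer, it also turns `Id` into an integer instead of leaving it as a string:

```python
import csv
from io import StringIO

import pandas as pd

temp = '''"1,1.52101,13.64,4.49,1.10,71.78,0.06,8.75,0.00"
"2,1.51761,13.89,3.60,1.36,72.73,0.48,7.83,0.00"
"3,1.51618,13.53,3.55,1.54,72.99,0.39,7.78,0.00"'''

col_names = ['Id','RI','Na','Mg','Al','Si','K','Ca','Ba','Fe','Glass Type']
df = pd.read_csv(StringIO(temp), names=col_names, quoting=csv.QUOTE_NONE)

# Columns that failed to parse as numbers still contain a quote character;
# strip it and convert the remaining text to numbers.
for col in df.select_dtypes(exclude="number").columns:
    df[col] = pd.to_numeric(df[col].str.strip('"'))

print(df.dtypes)
```

`pd.to_numeric` raises an error if a column still holds non-numeric text after stripping, which makes unexpected data visible rather than silently keeping strings.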

Verifying the problem (the same quoted data, read without quoting=3):

from io import StringIO

col_names=['Id','RI','Na','Mg','Al','Si','K','Ca','Ba','Fe','Glass Type']
print (pd.read_csv(StringIO(temp), names=col_names))
                                               Id  RI  Na  Mg  Al  Si   K  Ca  \
0  1,1.52101,13.64,4.49,1.10,71.78,0.06,8.75,0.00 NaN NaN NaN NaN NaN NaN NaN   
1  2,1.51761,13.89,3.60,1.36,72.73,0.48,7.83,0.00 NaN NaN NaN NaN NaN NaN NaN   
2  3,1.51618,13.53,3.55,1.54,72.99,0.39,7.78,0.00 NaN NaN NaN NaN NaN NaN NaN   

   Ba  Fe  Glass Type  
0 NaN NaN         NaN  
1 NaN NaN         NaN  
2 NaN NaN         NaN  
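Python's standard csv module applies the same default quoting convention, which makes the difference easy to see on a single line. A small sketch using `csv.reader`, with the line copied from the quoted sample above:

```python
import csv

# One line exactly as it appears in the quoted file.
line = '"1,1.52101,13.64,4.49,1.10,71.78,0.06,8.75,0.00"'

# Default quoting: commas inside the quotes are data,
# so the whole line comes back as a single field.
default_row = next(csv.reader([line]))
print(default_row)

# QUOTE_NONE: quote characters get no special treatment, so the line
# splits on every comma and the quotes stay attached to the end fields.
none_row = next(csv.reader([line], quoting=csv.QUOTE_NONE))
print(none_row)
```

The first result matches the single filled `Id` column in the output above; the second matches the `"1` and `0.00"` values that the quoting=3 version leaves for manual cleanup.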
