Adding headers to a .data file in Pandas

Question

Given a file with the extension .data, I read it with pd.read_fwf("./input.data", sep=",", header = None)

Out:

    0
0   63.0,1.0,1.0,145.0,233.0,1.0,2.0,150.0,0.0,2.3...
1   67.0,1.0,4.0,160.0,286.0,0.0,2.0,108.0,1.0,1.5...
2   67.0,1.0,4.0,120.0,229.0,0.0,2.0,129.0,1.0,2.6...
3   37.0,1.0,3.0,130.0,250.0,0.0,0.0,187.0,0.0,3.5...
4   41.0,0.0,2.0,130.0,204.0,0.0,2.0,172.0,0.0,1.4...
... ...
292 57.0,0.0,4.0,140.0,241.0,0.0,0.0,123.0,1.0,0.2...
293 45.0,1.0,1.0,110.0,264.0,0.0,0.0,132.0,0.0,1.2...
294 68.0,1.0,4.0,144.0,193.0,1.0,0.0,141.0,0.0,3.4...
295 57.0,1.0,4.0,130.0,131.0,0.0,0.0,115.0,1.0,1.2...
296 57.0,0.0,2.0,130.0,236.0,0.0,2.0,174.0,0.0,0.0...

How can I add the following column names to it? Thanks.

col_names = ["age", "sex", "cp", "restbp", "chol", "fbs", "restecg", 
           "thalach", "exang", "oldpeak", "slope", "ca", "thal", "num"]

Update:

pd.read_fwf("./input.data", names = col_names)

Out:

    age sex cp  restbp  chol    fbs restecg thalach exang   oldpeak slope   ca  thal    num
0   63.0,1.0,1.0,145.0,233.0,1.0,2.0,150.0,0.0,2.3...   NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
1   67.0,1.0,4.0,160.0,286.0,0.0,2.0,108.0,1.0,1.5...   NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
2   67.0,1.0,4.0,120.0,229.0,0.0,2.0,129.0,1.0,2.6...   NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
3   37.0,1.0,3.0,130.0,250.0,0.0,0.0,187.0,0.0,3.5...   NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
4   41.0,0.0,2.0,130.0,204.0,0.0,2.0,172.0,0.0,1.4...   NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
292 57.0,0.0,4.0,140.0,241.0,0.0,0.0,123.0,1.0,0.2...   NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
293 45.0,1.0,1.0,110.0,264.0,0.0,0.0,132.0,0.0,1.2...   NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
294 68.0,1.0,4.0,144.0,193.0,1.0,0.0,141.0,0.0,3.4...   NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
295 57.0,1.0,4.0,130.0,131.0,0.0,0.0,115.0,1.0,1.2...   NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
296 57.0,0.0,2.0,130.0,236.0,0.0,2.0,174.0,0.0,0.0...   NaN NaN NaN NaN NaN NaN

Tags: pandas, read-fwf

Solution


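If you check the documentation for read_fwf, it says:

Read a table of fixed-width formatted lines into DataFrame.

Your file is comma-separated rather than fixed-width, so read_fwf puts each whole line into the first column. When the file has a separator, use read_csv instead: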

import pandas as pd

col_names = ["age", "sex", "cp", "restbp", "chol", "fbs", "restecg", 
           "thalach", "exang", "oldpeak", "slope", "ca", "thal", "num"]

# names= assigns the headers; read_csv then treats every line as data
df = pd.read_csv("input.data", names=col_names)
print(df)

      age  sex   cp  restbp   chol  fbs  restecg  thalach  exang  oldpeak  \
0    63.0  1.0  1.0   145.0  233.0  1.0      2.0    150.0    0.0      2.3   
1    67.0  1.0  4.0   160.0  286.0  0.0      2.0    108.0    1.0      1.5   
2    67.0  1.0  4.0   120.0  229.0  0.0      2.0    129.0    1.0      2.6   
3    37.0  1.0  3.0   130.0  250.0  0.0      0.0    187.0    0.0      3.5   
4    41.0  0.0  2.0   130.0  204.0  0.0      2.0    172.0    0.0      1.4   
..    ...  ...  ...     ...    ...  ...      ...      ...    ...      ...   
292  57.0  0.0  4.0   140.0  241.0  0.0      0.0    123.0    1.0      0.2   
293  45.0  1.0  1.0   110.0  264.0  0.0      0.0    132.0    0.0      1.2   
294  68.0  1.0  4.0   144.0  193.0  1.0      0.0    141.0    0.0      3.4   
295  57.0  1.0  4.0   130.0  131.0  0.0      0.0    115.0    1.0      1.2   
296  57.0  0.0  2.0   130.0  236.0  0.0      2.0    174.0    0.0      0.0   

     slope   ca  thal  num  
0      3.0  0.0   6.0    0  
1      2.0  3.0   3.0    1  
2      2.0  2.0   7.0    1  
3      3.0  0.0   3.0    0  
4      1.0  0.0   3.0    0  
..     ...  ...   ...  ...  
292    2.0  0.0   7.0    1  
293    2.0  0.0   7.0    1  
294    2.0  2.0   7.0    1  
295    2.0  1.0   7.0    1  
296    2.0  1.0   3.0    1  

[297 rows x 14 columns]
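
To see the difference without the original file, here is a minimal sketch that feeds the first two rows through both readers. The `sample` string is an assumption: it is rebuilt from the first two rows of the output above and stands in for input.data, and the names `fwf_df`, `csv_df`, and `alt_df` are illustrative only.

```python
import io

import pandas as pd

col_names = ["age", "sex", "cp", "restbp", "chol", "fbs", "restecg",
             "thalach", "exang", "oldpeak", "slope", "ca", "thal", "num"]

# Assumed stand-in for input.data: the first two rows, rebuilt from the output above.
sample = (
    "63.0,1.0,1.0,145.0,233.0,1.0,2.0,150.0,0.0,2.3,3.0,0.0,6.0,0\n"
    "67.0,1.0,4.0,160.0,286.0,0.0,2.0,108.0,1.0,1.5,2.0,3.0,3.0,1\n"
)

# read_fwf looks for whitespace-aligned columns; the lines contain none,
# so each line ends up as a single string column.
fwf_df = pd.read_fwf(io.StringIO(sample), header=None)

# read_csv splits on commas (its default separator) and applies the names.
csv_df = pd.read_csv(io.StringIO(sample), names=col_names)

# Alternative: read without headers first, then assign the names afterwards.
alt_df = pd.read_csv(io.StringIO(sample), header=None)
alt_df.columns = col_names

print(fwf_df.shape)
print(csv_df.shape)
```

Passing names= at read time and assigning df.columns afterwards should give the same frame here; the first is shorter, while the second can be handy when the DataFrame has already been loaded.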
