首页 > 解决方案 > 在熊猫python中添加列

问题描述

在数据框中添加列后,我没有得到正确的输出文件。这是我的输入文件

   Security Wise Delivery Position - Compulsory Rolling Settlement
   10,MTO,01022018,592287763,0001583
   Trade Date <01-FEB-2018>,Settlement Type <N>,Settlement No <2018023>,Settlement Date <05-FEB-2018>
   Record Type,Sr No,Name of Security,Quantity Traded,Deliverable Quantity(gross across client level),% of Deliverable Quantity to Traded Quantity
   20,1,20MICRONS,EQ,53466,27284,51.03
   20,2,3IINFOTECH,EQ,7116046,3351489,47.10
   20,3,3MINDIA,EQ,2613,1826,69.88
   20,4,5PAISA,EQ,8463,5230,61.80
   20,5,63MOONS,EQ,324922,131478,40.46

期望输出

 20,1,20MICRONS,EQ,53466,27284,51.03,01022018
 20,2,3IINFOTECH,EQ,7116046,3351489,47.10,01022018
 20,3,3MINDIA,EQ,2613,1826,69.88,01022018
 20,4,5PAISA,EQ,8463,5230,61.80,01022018
 20,5,63MOONS,EQ,324922,131478,40.46,01022018

我的代码

 import pandas as pd
 df = pd.read_csv('C:/Working/dalal/MTO_11052018.DAT', sep='\t',skiprows=1)
 df=df.iloc[1]
 l1=list(str(df).split(","))
 l2=l1[2]
 df2=pd.read_csv('C:/Working/dalal/MTO_11052018.DAT',sep='\t',skiprows=3)
 df2['Trans_dt']=df2.apply(lambda row:[l2],axis=1)
 df2.to_csv('C:/Working/dalal/deldata/MTO_11052018.OUT',sep=',')

我没有得到预期。请帮忙

标签: pythonpandas

解决方案


我认为需要header=1第二行到列,nrows=0没有行并且usecols=[2]只读第三列:

import pandas as pd

temp=u"""Security Wise Delivery Position - Compulsory Rolling Settlement
10,MTO,01022018,592287763,0001583
Trade Date <01-FEB-2018>,Settlement Type <N>,Settlement No <2018023>,Settlement Date <05-FEB-2018>
Record Type,Sr No,Name of Security,Quantity Traded,Deliverable Quantity(gross across client level),% of Deliverable Quantity to Traded Quantity
20,1,20MICRONS,EQ,53466,27284,51.03
20,2,3IINFOTECH,EQ,7116046,3351489,47.10
20,3,3MINDIA,EQ,2613,1826,69.88
20,4,5PAISA,EQ,8463,5230,61.80
20,5,63MOONS,EQ,324922,131478,40.46"""
#after testing replace 'pd.compat.StringIO(temp)' to 'C:/Working/dalal/MTO_11052018.DAT'
a = pd.read_csv(pd.compat.StringIO(temp), nrows=0, header=1, usecols=[2]).columns
print (a)
Index(['01022018'], dtype='object')

然后读取所有必要的数据和assign新列:

#after testing replace 'pd.compat.StringIO(temp)' to 'C:/Working/dalal/MTO_11052018.DAT'    
df = pd.read_csv(pd.compat.StringIO(temp), skiprows=3).assign(Trans_dt=a[0])
print (df)
    Record Type   ...    Trans_dt
20            1   ...     1022018
20            2   ...     1022018
20            3   ...     1022018
20            4   ...     1022018
20            5   ...     1022018

[5 rows x 7 columns]

df2.to_csv('C:/Working/dalal/deldata/MTO_11052018.OUT')
#if columns names is necessary remove
df2.to_csv('C:/Working/dalal/deldata/MTO_11052018.OUT', header=None)

或者如果需要默认范围索引,则类似:

#after testing replace 'pd.compat.StringIO(temp)' to 'C:/Working/dalal/MTO_11052018.DAT'    
df = pd.read_csv(pd.compat.StringIO(temp), skiprows=3).rename_axis('val').reset_index().assign(Trans_dt=a[0])
print (df)
   val    ...     Trans_dt
0   20    ...      1022018
1   20    ...      1022018
2   20    ...      1022018
3   20    ...      1022018
4   20    ...      1022018

[5 rows x 8 columns]

如果列名不重要:

#after testing replace 'pd.compat.StringIO(temp)' to 'C:/Working/dalal/MTO_11052018.DAT'
df = pd.read_csv(pd.compat.StringIO(temp), skiprows=4, header=None).assign(Trans_dt=a[0])
print (df)
    0  1           2   3        4        5      6  Trans_dt
0  20  1   20MICRONS  EQ    53466    27284  51.03   1022018
1  20  2  3IINFOTECH  EQ  7116046  3351489  47.10   1022018
2  20  3     3MINDIA  EQ     2613     1826  69.88   1022018
3  20  4      5PAISA  EQ     8463     5230  61.80   1022018
4  20  5     63MOONS  EQ   324922   131478  40.46   1022018

最后:

df2.to_csv('C:/Working/dalal/deldata/MTO_11052018.OUT', index=False)
#if columns names is necessary remove
df2.to_csv('C:/Working/dalal/deldata/MTO_11052018.OUT', index=False, header=None)

推荐阅读