Why doesn't the concat method match columns across multiple files?

Problem description

I am trying to merge three files into a single file. Here are the field names in File1.

"IDRSSD"    RIAD4097    RIAD4235    RIAD4239    RIAD4341    RIAD4797    RIAD4843    RIAD4844    RIAD4845    RIAD4846    RIADB523    RIADB524    RIADB525    
    NON INT INCM OF INTL BUSINESS   PROV FOR LOAN LOSS, INTL BUSINESS   NON-INT EXPENSE, INTL BUSINESS  NET INCM ATTRIB TO INTL BUSINESS    INCOME TAX ATTRIB TO INTL BUSINESS  NET NON-INT INC(EXP) ATTRIB TO INT O    EST PRETAX INC ATTRIG TO INT OPR    ADJ TO PRETAX INC FOR INTERNAL ALLOC    EST PRETAX INC ATTRIB TO INT OPR AFT    GROSS INTEREST INCOME (INTERNL OPER)    GROSS INTEREST EXPENSE (INTERNL OPER    NET INTEREST INCOME (INTERNATL OPER)    

Here are the field names in File2.

"IDRSSD"    RIADC899    RIADC900    RIADC902    RIADC903    RIADC904    RIADC905    RIADC907    RIADC908    RIADC909    RIADC911    RIADC913    RIADC914    RIADGW64    RIADJA28    RIADKW02    
    TOTAL INTEREST INCOME IN FOREIGN OFF    TOT INTEREST EXPENSE IN FOREIGN OFFI    NONINT INCOME FRGN OFFICS:TRADNG REV    NONINT INC FRG OFFCS:INVMT BKG,ADVRY    NONINT INC FRG OFFCS:NT SECURIZATION    NONINT INC FRGN OFFICS:OTHER NONINTE    TOTAL NONINT EXPENSE IN FOREIGN OFFI    ADJMTS TO PRETAX INC FOREIGN OFFICES    APPLICABLE INCOME TAXES NET INC ATTRIBUTABLE TO FRGN OFFICES    ELIMINATIONS ARISING FRM CONSOLIDATN    CONSOLIDTD NET INC ATTRIBTLE FRGN OF    DISCONTINUED OPERATIONS, NET OF APPL    'Realized gains/losses on held-to-ma    Provision for loan and lease losses 

Here are the field names in File3.

"IDRSSD"    RIADC899    RIADC900    RIADC902    RIADC903    RIADC904    RIADC905    RIADC907    RIADC908    RIADC909    RIADC911    RIADC913    RIADC914    RIADGW64    RIADJA28    RIADKW02    
    TOTAL INTEREST INCOME IN FOREIGN OFF    TOT INTEREST EXPENSE IN FOREIGN OFFI    NONINT INCOME FRGN OFFICS:TRADNG REV    NONINT INC FRG OFFCS:INVMT BKG,ADVRY    NONINT INC FRG OFFCS:NT SECURIZATION    NONINT INC FRGN OFFICS:OTHER NONINTE    TOTAL NONINT EXPENSE IN FOREIGN OFFI    ADJMTS TO PRETAX INC FOREIGN OFFICES    APPLICABLE INCOME TAXES NET INC ATTRIBUTABLE TO FRGN OFFICES    ELIMINATIONS ARISING FRM CONSOLIDATN    CONSOLIDTD NET INC ATTRIBTLE FRGN OF    DISCONTINUED OPERATIONS, NET OF APPL    'Realized gains/losses on held-to-ma    Provision for loan and lease losses 

Now I run this code.

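import glob
import os

import pandas as pd

path = "C:\\Users\\ryans\\OneDrive\\Desktop\\schemas\\"

# Collect every tab-delimited .txt file in the folder.
all_files = glob.glob(os.path.join(path, "*.txt"))

all_df = []
for f in all_files:
    # Only the first line of each file is used as the header row.
    df = pd.read_csv(f, delimiter='\t')
    # Record which file each row came from.
    df['file'] = os.path.basename(f)
    all_df.append(df)

# Stack the frames; columns are aligned by name and sorted alphabetically.
df_append = pd.concat(all_df, ignore_index=True, sort=True)
df_append.to_csv("C:\\Users\\ryans\\OneDrive\\Desktop\\merged.csv")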
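
For reference, here is a minimal sketch of how pd.concat lines columns up when the inputs do not share all of their field names. The two toy frames and their values are made up for illustration; only the column names are borrowed from the files above.

```python
import pandas as pd

# Two toy frames that share only the "IDRSSD" column (values are invented).
df1 = pd.DataFrame({"IDRSSD": [1], "RIAD4097": [10.0]})
df2 = pd.DataFrame({"IDRSSD": [2], "RIADC899": [20.0]})

# Columns are matched by name; sort=True orders the resulting columns
# alphabetically, and cells with no source value are filled with NaN.
combined = pd.concat([df1, df2], ignore_index=True, sort=True)

print(list(combined.columns))
print(combined)
```

Each row keeps values only under the columns that existed in its source frame, so rows from File1 should be empty under the File2/File3 field names and vice versa.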

Now, when I filter column O, I see this.

RIADC899
TOTAL INTEREST INCOME IN FOREIGN OFF
TOTAL INTEREST INCOME IN FOREIGN OFF

Here are three screenshots of the data after importing it into Excel (Figures 1 to 3; the images are not available here).

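Shouldn't the field names line up? Seeing 'TOTAL INTEREST INCOME IN FOREIGN OFF' twice makes perfect sense, but why does the field named 'RIADC899' end up in the same column? What is going wrong here?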

Tags: python, python-3.x, pandas

Solution
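
One possible reading, offered as a sketch rather than a confirmed answer: each file appears to have two header-like lines, a row of field codes followed by a row of descriptions. pd.read_csv uses only the first line as the header, so the description line is read as an ordinary data row. Because to_csv also writes the index as the first column, the merged sheet would have the index in column A, IDRSSD in B, the twelve File1 fields in C through N, and RIADC899 as the header of column O. Under that assumption, 'RIADC899' is simply the column name, and the two 'TOTAL INTEREST INCOME IN FOREIGN OFF' values are the description rows from File2 and File3. The sample text below is a made-up stand-in for one file, and using skiprows=[1] to drop the description line is an assumption about what the desired output is.

```python
import io

import pandas as pd

# Hypothetical stand-in for one file: code row, description row, one data row.
raw = (
    "IDRSSD\tRIADC899\tRIADC900\n"
    "\tTOTAL INTEREST INCOME IN FOREIGN OFF\tTOT INTEREST EXPENSE IN FOREIGN OFFI\n"
    "37\t100\t40\n"
)

# Default read: the description line becomes the first data row.
as_is = pd.read_csv(io.StringIO(raw), delimiter="\t")

# skiprows=[1] skips the second line of the file (0-based line index 1),
# so only the code row is used as the header and real data follows it.
cleaned = pd.read_csv(io.StringIO(raw), delimiter="\t", skiprows=[1])

print(as_is)
print(cleaned)
```

If the descriptions are needed, they could instead be read separately and kept as a code-to-description lookup; passing index=False to to_csv would also keep the index out of the spreadsheet.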

