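python - Why doesn't the concat method match up the columns across multiple files?
Problem description
I am trying to merge three files into a single file. Here are the field names in File1.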
"IDRSSD" RIAD4097 RIAD4235 RIAD4239 RIAD4341 RIAD4797 RIAD4843 RIAD4844 RIAD4845 RIAD4846 RIADB523 RIADB524 RIADB525
NON INT INCM OF INTL BUSINESS PROV FOR LOAN LOSS, INTL BUSINESS NON-INT EXPENSE, INTL BUSINESS NET INCM ATTRIB TO INTL BUSINESS INCOME TAX ATTRIB TO INTL BUSINESS NET NON-INT INC(EXP) ATTRIB TO INT O EST PRETAX INC ATTRIG TO INT OPR ADJ TO PRETAX INC FOR INTERNAL ALLOC EST PRETAX INC ATTRIB TO INT OPR AFT GROSS INTEREST INCOME (INTERNL OPER) GROSS INTEREST EXPENSE (INTERNL OPER NET INTEREST INCOME (INTERNATL OPER)
Here are the field names in File2.
"IDRSSD" RIADC899 RIADC900 RIADC902 RIADC903 RIADC904 RIADC905 RIADC907 RIADC908 RIADC909 RIADC911 RIADC913 RIADC914 RIADGW64 RIADJA28 RIADKW02
TOTAL INTEREST INCOME IN FOREIGN OFF TOT INTEREST EXPENSE IN FOREIGN OFFI NONINT INCOME FRGN OFFICS:TRADNG REV NONINT INC FRG OFFCS:INVMT BKG,ADVRY NONINT INC FRG OFFCS:NT SECURIZATION NONINT INC FRGN OFFICS:OTHER NONINTE TOTAL NONINT EXPENSE IN FOREIGN OFFI ADJMTS TO PRETAX INC FOREIGN OFFICES APPLICABLE INCOME TAXES NET INC ATTRIBUTABLE TO FRGN OFFICES ELIMINATIONS ARISING FRM CONSOLIDATN CONSOLIDTD NET INC ATTRIBTLE FRGN OF DISCONTINUED OPERATIONS, NET OF APPL 'Realized gains/losses on held-to-ma Provision for loan and lease losses
Here are the field names in File3.
"IDRSSD" RIADC899 RIADC900 RIADC902 RIADC903 RIADC904 RIADC905 RIADC907 RIADC908 RIADC909 RIADC911 RIADC913 RIADC914 RIADGW64 RIADJA28 RIADKW02
TOTAL INTEREST INCOME IN FOREIGN OFF TOT INTEREST EXPENSE IN FOREIGN OFFI NONINT INCOME FRGN OFFICS:TRADNG REV NONINT INC FRG OFFCS:INVMT BKG,ADVRY NONINT INC FRG OFFCS:NT SECURIZATION NONINT INC FRGN OFFICS:OTHER NONINTE TOTAL NONINT EXPENSE IN FOREIGN OFFI ADJMTS TO PRETAX INC FOREIGN OFFICES APPLICABLE INCOME TAXES NET INC ATTRIBUTABLE TO FRGN OFFICES ELIMINATIONS ARISING FRM CONSOLIDATN CONSOLIDTD NET INC ATTRIBTLE FRGN OF DISCONTINUED OPERATIONS, NET OF APPL 'Realized gains/losses on held-to-ma Provision for loan and lease losses
Now I run this code.
import os, glob
import pandas as pd
path = "C:\\Users\\ryans\\OneDrive\\Desktop\\schemas\\"
all_files = glob.glob(os.path.join(path, "*.txt"))
all_df = []
for f in all_files:
    df = pd.read_csv(f, delimiter='\t')
    df['file'] = os.path.basename(f)
    all_df.append(df)
df_append = pd.concat(all_df, ignore_index=True, sort=True)
df_append.to_csv("C:\\Users\\ryans\\OneDrive\\Desktop\\merged.csv")
Now, when I filter column O, I see this.
RIADC899
TOTAL INTEREST INCOME IN FOREIGN OFF
TOTAL INTEREST INCOME IN FOREIGN OFF
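Here are three screenshots of the data after importing it into Excel.
Shouldn't the field names line up? Seeing 'TOTAL INTEREST INCOME IN FOREIGN OFF' twice makes perfect sense, but why does the field named 'RIADC899' end up in the same column? What is wrong here?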
Solution
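One plausible reading of the output in the question, not confirmed by the original post: the columns do line up. pd.concat aligns frames by column label, and pd.read_csv treats only the first line of each file as the header, so 'RIADC899' is the column name and the description line from File2 and File3 is read as an ordinary data row beneath it. The sketch below reproduces that with two small in-memory files and then skips the description line with skiprows=[1]. The file contents, the load helper, and the numeric rows (37, 100, 250) are made up for illustration; the real files may contain no data rows at all, in which case skipping the description line leaves empty frames.

```python
import io
import pandas as pd

# Hypothetical stand-ins for two tab-delimited files: line 1 holds the field
# codes, line 2 the descriptions, line 3 a made-up data row (an assumption).
file_a = "IDRSSD\tRIAD4097\n\tNON INT INCM OF INTL BUSINESS\n37\t100\n"
file_b = "IDRSSD\tRIADC899\n\tTOTAL INTEREST INCOME IN FOREIGN OFF\n37\t250\n"


def load(text, skip_descriptions):
    """Read one file; optionally drop the description line (file line 2)."""
    skip = [1] if skip_descriptions else None
    return pd.read_csv(io.StringIO(text), delimiter="\t", skiprows=skip)


# As in the question: the description line is parsed as the first data row,
# so it shows up as a value in the column named after the field code.
naive = pd.concat(
    [load(file_a, False), load(file_b, False)], ignore_index=True, sort=True
)

# Skipping the description line leaves only the field codes as the header;
# concat still aligns by column name and fills missing cells with NaN.
clean = pd.concat(
    [load(file_a, True), load(file_b, True)], ignore_index=True, sort=True
)

print(naive)
print(clean)
```

If the descriptions need to be kept, reading them separately (for example with header=None) and storing them as a lookup from field code to description is one option; which one fits depends on what the merged file is meant to contain.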