Pandas DataFrame TypeError: list indices must be integers or slices, not str

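Problem description

For my current project, I plan to crawl all CSV files in a given folder, filter each file's contents by a specific word, and then save the filtered DataFrame as a new file whose name carries a suffix containing the search keyword.

However, the script below raises TypeError: list indices must be integers or slices, not str on the line df2 = df[df['tag'] == "Sales"], which suggests a problem with the data type.

I have already tried adding a generic data type definition such as dtype='unicode', but that did not solve the problem. Is there a smart tweak to make this work?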

import pandas as pd
import csv
import glob

# Crawl over all CSV files within folder
df = glob.glob(r'/Users/name/SEC/Merged/*.csv')

# Filter by key word "Sales"
df2 = df[df['tag'] == "Sales"]

# Remove duplicates
df2 = df2.drop_duplicates(subset=None, keep='first', inplace=False)

# Save as new file that includes the name of the "input" file as well as the extension '-sales'.
df2.to_csv(basename+'-sales.csv')

# Sanity check print command
print(df2)

Tags: python, pandas, dataframe

Solution


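glob.glob returns a plain Python list of file path strings, not a DataFrame, so df['tag'] tries to index a list with a string and raises the TypeError. Setting dtype cannot help, because no file is ever read. Loop over the paths and read each one into a DataFrame: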

import glob
import os

import pandas as pd

# Crawl over all CSV files within folder
for csv_path in glob.glob(r'/Users/name/SEC/Merged/*.csv'):
    # Skip output files from an earlier run, which also match *.csv
    if csv_path.endswith('-sales.csv'):
        continue

    df = pd.read_csv(csv_path)

    # Filter by key word "Sales"
    df2 = df[df['tag'] == "Sales"]

    # Remove duplicates
    df2 = df2.drop_duplicates()

    # Save as a new file named after the input file, with the suffix '-sales'
    # (splitext strips '.csv', so the result is 'name-sales.csv', not 'name.csv-sales.csv')
    basename = os.path.splitext(csv_path)[0]
    df2.to_csv(basename + '-sales.csv')

    # Sanity check print command
    print(df2)
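
The same loop can also be wrapped in a function, which makes it easy to try on a throwaway folder before pointing it at real data. This is only a minimal sketch: filter_csv_folder and its parameters (column, keyword, suffix) are hypothetical names, not part of pandas, and it assumes every CSV in the folder has the filter column. It also passes index=False, which the original script does not, so the row index is not written to the output file.

```python
from pathlib import Path

import pandas as pd


def filter_csv_folder(folder, column="tag", keyword="Sales", suffix="-sales"):
    """Filter every CSV in `folder` and write `<name><suffix>.csv` next to it.

    Hypothetical helper for illustration. Returns the list of written paths.
    """
    written = []
    # sorted() collects the matches up front, so files written inside the
    # loop are not picked up during the same run
    for csv_path in sorted(Path(folder).glob("*.csv")):
        # Skip files produced by an earlier run, which also match *.csv
        if csv_path.stem.endswith(suffix):
            continue

        df = pd.read_csv(csv_path)

        # Keep rows whose `column` equals `keyword`, then drop exact duplicates
        filtered = df[df[column] == keyword].drop_duplicates()

        # 'report.csv' becomes 'report-sales.csv' in the same folder
        out_path = csv_path.with_name(csv_path.stem + suffix + ".csv")
        filtered.to_csv(out_path, index=False)  # assumption: index not wanted
        written.append(out_path)
    return written
```

Calling filter_csv_folder(r'/Users/name/SEC/Merged') would then process the folder from the question and return the paths of the files it wrote.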
