,python,pandas,csv,command-line-arguments"/>

pandas read_csv raises ValueError: Invalid file path or buffer object type

Problem description

I want to read a CSV file that is passed as a command-line argument. I thought I could use argparse's FileType object directly, but I get an error.

from argparse import ArgumentParser, FileType
from pandas import read_csv

if __name__ == "__main__":
    parser = ArgumentParser()
    parser.add_argument("input_file_path", help="Input CSV file", type=FileType('r'), nargs=1)
    df = read_csv(parser.parse_args().input_file_path, sep="|")
    print(df.to_string())

When I run the program with the command below, pandas read_csv cannot read the FileType object. What am I missing?

python csv_splitter.py test.csv

Traceback (most recent call last):
  File "csv_splitter.py", line 7, in <module>
    df = read_csv(parser.parse_args().input_file_path, sep="|")
  File "C:\Users\kakkrah\AppData\Roaming\Python\Python38\site-packages\pandas\io\parsers.py", line 605, in read_csv
    return _read(filepath_or_buffer, kwds)
  File "C:\Users\kakkrah\AppData\Roaming\Python\Python38\site-packages\pandas\io\parsers.py", line 457, in _read
    parser = TextFileReader(filepath_or_buffer, **kwds)
  File "C:\Users\kakkrah\AppData\Roaming\Python\Python38\site-packages\pandas\io\parsers.py", line 814, in __init__
    self._engine = self._make_engine(self.engine)
  File "C:\Users\kakkrah\AppData\Roaming\Python\Python38\site-packages\pandas\io\parsers.py", line 1045, in _make_engine
    return mapping[engine](self.f, **self.options)  # type: ignore[call-arg]
  File "C:\Users\kakkrah\AppData\Roaming\Python\Python38\site-packages\pandas\io\parsers.py", line 1862, in __init__
    self._open_handles(src, kwds)
  File "C:\Users\kakkrah\AppData\Roaming\Python\Python38\site-packages\pandas\io\parsers.py", line 1357, in _open_handles
    self.handles = get_handle(
  File "C:\Users\kakkrah\AppData\Roaming\Python\Python38\site-packages\pandas\io\common.py", line 558, in get_handle
    ioargs = _get_filepath_or_buffer(
  File "C:\Users\kakkrah\AppData\Roaming\Python\Python38\site-packages\pandas\io\common.py", line 371, in _get_filepath_or_buffer
    raise ValueError(msg)
ValueError: Invalid file path or buffer object type: <class 'list'>

Tags: python, pandas, csv, command-line-arguments

Solution


Even when you ask for only one file with nargs=1, the argument parser still gives you a list containing that one file object:

print(parser.parse_args().input_file_path)
# [<_io.TextIOWrapper>]
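The difference is easy to see without pandas at all. The sketch below uses plain string arguments instead of FileType (so nothing has to exist on disk) and compares nargs=1 with leaving nargs out; the variable names are only illustrative.

```python
from argparse import ArgumentParser

# With nargs=1, argparse wraps the single value in a one-item list.
with_nargs = ArgumentParser()
with_nargs.add_argument("input_file_path", nargs=1)
list_result = with_nargs.parse_args(["test.csv"]).input_file_path

# Without nargs, the same positional argument is stored as a bare value.
without_nargs = ArgumentParser()
without_nargs.add_argument("input_file_path")
bare_result = without_nargs.parse_args(["test.csv"]).input_file_path

print(list_result)  # ['test.csv']
print(bare_result)  # test.csv
```

The same rule applies when type=FileType('r') is used: nargs=1 yields a list holding one open file, while omitting nargs yields the open file itself.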

read_csv cannot read a list of files (even a list of just one!), so simply extract the single file:

df = pd.read_csv(parser.parse_args().input_file_path[0])
#                                                   ^^^
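An alternative to indexing with [0] is to drop nargs=1, so that argparse hands back the file object directly. Below is a minimal sketch of the question's script rewritten that way. The load_frame helper and its argv parameter are my own additions for illustration, and the with block is there only to close the handle that FileType opened.

```python
from argparse import ArgumentParser, FileType
from pandas import read_csv


def load_frame(argv=None):
    """Parse the command line and read the pipe-separated CSV it names."""
    parser = ArgumentParser()
    # No nargs here: argparse stores the opened file itself, not a one-item list.
    parser.add_argument("input_file_path", help="Input CSV file", type=FileType("r"))
    args = parser.parse_args(argv)
    # Close the handle that FileType opened once pandas has finished reading.
    with args.input_file_path as handle:
        return read_csv(handle, sep="|")
```

Calling print(load_frame().to_string()) under the usual if __name__ == "__main__": guard should then behave like the original script was meant to. Passing argv=None makes argparse fall back to sys.argv.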

If you really do have multiple paths, use concat with a generator:

df = pd.concat(pd.read_csv(p) for p in paths)

Or concat with map:

df = pd.concat(map(pd.read_csv, paths))
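Both one-liners use read_csv's default comma separator. For the pipe-separated files in the question, the generator form is the easier one to extend, because extra keyword arguments can be passed inline. Here is a small sketch, assuming two in-memory buffers as stand-ins for the real paths; ignore_index=True is an optional extra not in the original answer.

```python
from io import StringIO

import pandas as pd

# Two in-memory buffers stand in for the entries of `paths` (illustrative data).
paths = [StringIO("a|b\n1|2\n"), StringIO("a|b\n3|4\n")]

# ignore_index=True renumbers the rows 0..n-1 instead of repeating each
# file's own index in the combined frame.
df = pd.concat((pd.read_csv(p, sep="|") for p in paths), ignore_index=True)
print(df.to_string())
```

With map, passing sep="|" would need something like functools.partial(pd.read_csv, sep="|") or a lambda, which is why the generator is used here.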
