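python - Merging multiple CSV files into one CSV
Problem description
I am trying to merge about 5000 CSV sheets into a single CSV. The individual CSV files all have the same structure, so the code should be simple, but I keep getting a "file not found" error message.
Here is the code: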
import glob
import os
import pandas as pd

csv_paths = set(glob.glob("folder_containing_csvs/*.csv"))
full_csv_path = "folder_containing_csvs/full_df.csv"
# Exclude the merged output file from the inputs.
csv_paths -= set([full_csv_path])
for csv_path in csv_paths:
    print("csv_path", csv_path)
    df = pd.read_csv(csv_path, sep="\t")
    # Write the header only if the output file does not exist yet.
    df[sorted(list(df.columns.values))].to_csv(full_csv_path, mode="a", header=not os.path.isfile(full_csv_path), sep="\t", index=False)
full_df = pd.read_csv(full_csv_path, sep="\t", encoding='utf-8')
full_df
The code produces the following error message:
---------------------------------------------------------------------------
FileNotFoundError Traceback (most recent call last)
<ipython-input-47-11ffadd03e3e> in <module>
----> 1 full_df = pd.read_csv(full_csv_path, sep="\t", encoding='utf-8')
2 full_df
~/.local/lib/python3.6/site-packages/pandas/io/parsers.py in read_csv(filepath_or_buffer,
sep, delimiter, header, names, index_col, usecols, squeeze, prefix, mangle_dupe_cols, type,
engine, converters, true_values, false_values, skipinitialspace, skiprows, skipfooter,
nrows, na_values, keep_default_na, na_filter, verbose, skip_blank_lines, parse_dates,
infer_datetime_format, keep_date_col, date_parser, dayfirst, cache_dates, iterator,
chunksize, compression, thousands, decimal, lineterminator, quotechar, quoting, doublequote,
escapechar, comment, encoding, dialect, error_bad_lines, warn_bad_lines, delim_whitespace,
low_memory, memory_map, float_precision)
686 )
687
--> 688 return _read(filepath_or_buffer, kwds)
689
690
~/.local/lib/python3.6/site-packages/pandas/io/parsers.py in _read(filepath_or_buffer, kwds)
452
453 # Create the parser.
--> 454 parser = TextFileReader(fp_or_buf, **kwds)
455
456 if chunksize or iterator:
~/.local/lib/python3.6/site-packages/pandas/io/parsers.py in __init__(self, f, engine, **kwds)
946 self.options["has_index_names"] = kwds["has_index_names"]
947
--> 948 self._make_engine(self.engine)
949
950 def close(self):
~/.local/lib/python3.6/site-packages/pandas/io/parsers.py in _make_engine(self, engine)
1178 def _make_engine(self, engine="c"):
1179 if engine == "c":
-> 1180 self._engine = CParserWrapper(self.f, **self.options)
1181 else:
1182 if engine == "python":
~/.local/lib/python3.6/site-packages/pandas/io/parsers.py in __init__(self, src, **kwds)
1991 if kwds.get("compression") is None and encoding:
1992 if isinstance(src, str):
-> 1993 src = open(src, "rb")
1994 self.handles.append(src)
1995
FileNotFoundError: [Errno 2] No such file or directory: 'folder_containing_csvs/full_df.csv'
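One plausible reading of this traceback: the final read fails because full_df.csv was never written, which is what happens when the glob pattern matches nothing (for example, when the relative folder path does not resolve from the notebook's working directory) and the loop body never runs. The sketch below checks for that before any merging starts. The helper name find_input_csvs and its parameters are hypothetical and not part of the original code.

```python
import glob
import os


def find_input_csvs(folder, output_name="full_df.csv"):
    """Return the sorted CSV paths in `folder`, excluding the merged output file."""
    # Resolve against the current working directory so the error message
    # shows exactly where Python looked.
    folder = os.path.abspath(folder)
    if not os.path.isdir(folder):
        raise FileNotFoundError(f"Input folder does not exist: {folder}")
    output_path = os.path.join(folder, output_name)
    paths = sorted(
        p for p in glob.glob(os.path.join(folder, "*.csv"))
        if os.path.abspath(p) != output_path
    )
    if not paths:
        # An empty match means the merge loop would never create the output file.
        raise FileNotFoundError(f"No input CSV files matched in: {folder}")
    return paths
```

Calling find_input_csvs("folder_containing_csvs") before the loop would turn a silent empty match into an explicit error that names the absolute folder that was searched.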
Solution
Try this:
import os
import pandas as pd

loc_path = "/path/to/folder/of/csv's"  # placeholder: the folder containing the CSV files
files = os.listdir(loc_path)
files = [file for file in files if file.endswith('.csv')]
# Now load them into a list.
dfs = []
for file in files:
    dfs.append(pd.read_csv(os.path.join(loc_path, file), sep='\t'))
# Concatenate the list of DataFrames:
df = pd.concat(dfs)
# Write df with df.to_csv to a location of your choice.
I just read the part about 5000 CSV sheets. How many rows are you expecting?
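If the combined row count turns out to be too large to hold comfortably in memory, an alternative is to append each file to the output as it is read, so only one input is loaded at a time. This is a minimal sketch, assuming tab-separated inputs with identical columns as described in the question. The function name merge_csvs and its parameters are hypothetical.

```python
import glob
import os

import pandas as pd


def merge_csvs(folder, output_name="full_df.csv", sep="\t"):
    """Append every CSV in `folder` to one output file, one input at a time."""
    output_path = os.path.join(folder, output_name)
    # Start from a clean file so reruns do not append duplicate rows.
    if os.path.exists(output_path):
        os.remove(output_path)
    inputs = sorted(
        p for p in glob.glob(os.path.join(folder, "*.csv"))
        if os.path.abspath(p) != os.path.abspath(output_path)
    )
    columns = None
    for i, path in enumerate(inputs):
        df = pd.read_csv(path, sep=sep)
        if columns is None:
            # Fix the column order once so every appended block lines up.
            columns = sorted(df.columns)
        # Write the header only for the first file.
        df[columns].to_csv(output_path, mode="a", header=(i == 0), sep=sep, index=False)
    return output_path, len(inputs)
```

The function returns the output path and the number of input files merged. A count of 0 means nothing matched and no output file was created, so it is worth checking before reading the result back.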