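python - How do I split a csv file in Python so that each smaller file keeps the header?
Problem description
I used the code from here to split a csv file into many smaller files (scroll down to see the full code): https://dzone.com/articles/splitting-csv-files-in-python
The file is split successfully and keeps its structure, but the header has disappeared. I suspect something is wrong with the arguments I pass to pd.read_csv().
Could you please take a look at this?
Input file: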
Text Header tag
0 textbody1 Y
1 textbody2 N
2 textbody2 Y
Result (the structure is still there, but the header has disappeared from my split csv files):
0 textbody1 Y
1 textbody2 N
2 textbody2 Y
See the full script below:
import pandas as pd
#csv file name to be read in
in_csv = 'iii_baiterEmailTagged.csv'
#get the number of lines of the csv file to be read
number_lines = sum(1 for row in (open(in_csv)))
#size of rows of data to write to the csv,
#you can change the row size according to your need
rowsize = 10000
#start looping through data writing it to a new file for each set
for i in range(1,number_lines,rowsize):
    df = pd.read_csv(in_csv,
          header=None,
          nrows = rowsize,#number of rows to read at each loop
          skiprows = i)#skip rows that have been read
    #csv to write data to a new file with indexed name. input_1.csv etc.
    out_csv = 'Enronset' + str(i) + '.csv'
    df.to_csv(out_csv,
          index=False,
          header=False,
          mode='a',#append data to csv file
          chunksize=rowsize)#size of data to append for each loop
Thanks
Solution
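You are skipping the first line of the file in the for loop (the range starts at 1 instead of 0):
for i in range(1,number_lines,rowsize):
You also explicitly tell pandas that there is no header to read (just omit this argument):
pd.read_csv(...,header=None)
And you do not write a header either (replace False with True):
df.to_csv(...,header=False,...)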
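The effect of each of these arguments can be checked on a tiny in-memory file. The following is a small sketch rather than part of the original script; the sample data and the variable names are made up for illustration.

```python
import io
import pandas as pd

# Hypothetical sample data standing in for the real input file.
sample = "Text,tag\ntextbody1,Y\ntextbody2,N\ntextbody2,Y\n"

# header=None: the first line is treated as data; columns are numbered 0, 1.
no_header = pd.read_csv(io.StringIO(sample), header=None)

# Default behaviour: the first line becomes the column names.
with_header = pd.read_csv(io.StringIO(sample))

# header=False drops the column names on output; header=True writes them.
# Without a path, to_csv returns the csv text as a string.
without_names = with_header.to_csv(index=False, header=False)
with_names = with_header.to_csv(index=False, header=True)

# An integer skiprows skips lines from the top of the file, header line
# included, so the first remaining line is then parsed as the header.
skipped = pd.read_csv(io.StringIO(sample), skiprows=2)

# A list-like skiprows can skip data lines while keeping line 0 (the header).
kept = pd.read_csv(io.StringIO(sample), skiprows=range(1, 3))

print(list(no_header.columns), list(with_header.columns))
print(list(skipped.columns), list(kept.columns))
```

Note the difference between the last two calls: when rows are skipped by count, a data row ends up as the header of every chunk after the first, whereas skipping by line index keeps the real header.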
Here is a complete working script:
import pandas as pd
#csv file name to be read in
in_csv = 'iii_baiterEmailTagged.csv'
#get the number of lines of the csv file to be read (header line included)
with open(in_csv) as f:
    number_lines = sum(1 for row in f)
#size of rows of data to write to the csv,
#you can change the row size according to your need
rowsize = 10000
#start looping through data writing it to a new file for each set;
#number_lines - 1 is the number of data rows, excluding the header
for i in range(0, number_lines - 1, rowsize):
    df = pd.read_csv(in_csv,
                     nrows=rowsize,  #number of rows to read at each loop
                     skiprows=range(1, i + 1))  #skip data rows already read, keep line 0 (the header)
    #csv to write data to a new file with indexed name. Enronset0.csv etc.
    out_csv = 'Enronset' + str(i) + '.csv'
    df.to_csv(out_csv,
              index=False,
              header=True,  #write the column names to every output file
              mode='w')  #overwrite, so re-running the script does not append a second copy
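As an alternative, pandas can do the chunking itself: passing chunksize to pd.read_csv returns an iterator of DataFrames, and every chunk carries the column names. The sketch below assumes that approach; the function name split_csv and the out_prefix parameter are hypothetical and not part of the original script.

```python
import pandas as pd

def split_csv(in_csv, out_prefix, rowsize=10000):
    """Write successive chunks of in_csv to numbered files, each with the header row."""
    out_paths = []
    # chunksize makes read_csv return an iterator of DataFrames;
    # the header is parsed once and every chunk keeps the column names.
    for n, chunk in enumerate(pd.read_csv(in_csv, chunksize=rowsize)):
        out_csv = f"{out_prefix}{n}.csv"
        chunk.to_csv(out_csv, index=False, header=True)
        out_paths.append(out_csv)
    return out_paths

# Example call (file names taken from the script above):
# split_csv('iii_baiterEmailTagged.csv', 'Enronset', rowsize=10000)
```

This reads the input only once and does not need the line count, which also avoids miscounting when quoted fields contain line breaks.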