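regex - Add a double quote to the first line of a csv file from the command line
Problem description
I have this csv file, and I noticed that the opening quote was not added during the export. In Ubuntu, if I type: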
head -n 1 file.csv
I get this output:
801","40116","Hazelnut MT -L","Thursday Promo","Large","","5.9000","","801","1.0000","","3.6500","2.2500",".0000","default","","","","","Chatime","02/06/2014","09125a9cfffd4143a00e73e3b62f15f2","CB01","",".0000","5.9000","6.9000",".0000",".0000",".0000",".0000",".0000",".0000","0","","0","0","0","","","","","","","","","Modern Milk Tea","","","0","","","1","0","","","","","","","","0","Hau Chan","","","","","","","","","","0","","","","","","","-1","","","","","","","","","","","","0","00000000420714AA","2014-06-02","1900-01-01","","","","","","","","","","","","","","","","","","","","","","","","","","","","","","","","","","","","","","","","","","","","","","","","","","","","",""
Is there some kind of command that can help me add the missing opening quote?
Solution
This should work in every POSIX shell:
printf \" | cat - file.csv > repaired-file.csv
If you are happy with the result, you can overwrite the original:
mv repaired-file.csv file.csv
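Before running this on the real file, the approach can be tried on a tiny sample. The following is a minimal sketch; the sample rows and the names sample.csv and repaired-sample.csv are made up for illustration.

```shell
# Work in a scratch directory so nothing real is touched.
demo_dir=$(mktemp -d)

# Two-line sample whose first line lacks the opening quote (made-up data).
printf '801","40116","Hazelnut MT -L"\n"802","40117","Other"\n' > "$demo_dir/sample.csv"

# Emit one double quote, then stream the file unchanged after it.
printf '"' | cat - "$demo_dir/sample.csv" > "$demo_dir/repaired-sample.csv"

# The first line should now start with a quote; later lines are untouched.
head -n 1 "$demo_dir/repaired-sample.csv"
```

The repaired file should be exactly one byte larger than the sample, which is a cheap sanity check before overwriting anything.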
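Since your file is 70 GB, you may want to avoid creating a second file, but that is harder than it looks. Of course there are things like sed's in-place option (-i) and the sponge utility from moreutils, but they do not work in place the way you might expect. Both sed -i and sponge either use a temporary file or hold the whole file in memory (which no longer works for 70 GB). A nice study of true in-place editing can be found in this blog post. Conclusion: no standard tool supports true in-place editing. However, the following perl one-liner should work (already adapted to your needs).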
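perl <<'EOF'
use strict;
use warnings;
use Tie::File;
# Tie the file's lines to an array; Tie::File reads records lazily rather than loading the whole file.
my @a;
tie @a, 'Tie::File', 'path/to/your/file' or die "Cannot tie file: $!";
# Prepend the missing quote to the first line; the change is written back to the same file.
$a[0] = '"' . $a[0];
untie @a;
EOF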
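The point about sed -i not being truly in place can be checked on a small file by comparing inode numbers before and after the edit. This is a sketch assuming GNU sed and GNU stat (as found on Ubuntu); the file name sample.csv and its contents are made up.

```shell
# Scratch file with a missing opening quote on line 1 (made-up data).
inode_dir=$(mktemp -d)
inode_file="$inode_dir/sample.csv"
printf '801","x"\n"802","y"\n' > "$inode_file"

# Record the inode, run the "in-place" edit, and record the inode again.
inode_before=$(stat -c %i "$inode_file")
sed -i '1s/^/"/' "$inode_file"
inode_after=$(stat -c %i "$inode_file")

# Different numbers mean sed wrote a new file and renamed it over the old one.
echo "before: $inode_before  after: $inode_after"
```

If the two numbers differ, the edit went through a temporary copy, so it needs roughly as much free disk space as the file itself.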
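Benchmark
Out of curiosity, I ran the commands discussed here and measured their run times.
The 9.3 GiB input file f was generated with seq 1000000000 > f. Before timing each command, I always cleared f from the cache using sync && echo 3 | sudo tee /proc/sys/vm/drop_caches. My system has enough memory to hold the entire file, but I monitored memory usage manually; every command used only a few KB of memory.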
1m 05s   printf \" | cat - f > f2; mv f2 f
1m 32s   perl … # script from above
25m 57s  sed -i '1s/^/"/' f   (also used 100% CPU the whole time)
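I was a bit surprised myself that the cat command is faster than the perl script. It makes sense, though: the perl script does a lot of seeking (as can be seen with strace), whereas cat simply copies.
Summary: if you have enough disk space, use the cat command. If the file is larger than the free disk space remaining on your system, use the perl script.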
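The rule of thumb in the summary could be sketched as a small shell helper. The function names pick_method and suggest_method are hypothetical, GNU stat is assumed for the file size, and the free space is read from the fourth column of df -P -k (in KiB).

```shell
# Hypothetical helper: given the file size and the free space (both in bytes),
# print "cat" if a full copy plus one extra byte fits, otherwise "perl".
pick_method() {
    if [ $(( $1 + 1 )) -le "$2" ]; then
        echo cat
    else
        echo perl
    fi
}

# Hypothetical wrapper: look up the real numbers for a given file.
suggest_method() {
    size_bytes=$(stat -c %s "$1")                                   # GNU stat
    avail_kib=$(df -P -k "$(dirname "$1")" | awk 'NR==2 {print $4}')
    pick_method "$size_bytes" $(( avail_kib * 1024 ))
}

# Usage (path is a placeholder): suggest_method path/to/your/file
```

The check only looks at the file system that holds the file, since the repaired copy is written next to the original.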