Adding a double quote to the first line of a CSV file from the command line

Question

I have a CSV file and noticed that the opening quote was not added during export. On Ubuntu, if I run:

head -n 1 file.csv

I get this output:

801","40116","Hazelnut MT -L","Thursday Promo","Large","","5.9000","","801","1.0000","","3.6500","2.2500",".0000","default","","","","","Chatime","02/06/2014","09125a9cfffd4143a00e73e3b62f15f2","CB01","",".0000","5.9000","6.9000",".0000",".0000",".0000",".0000",".0000",".0000","0","","0","0","0","","","","","","","","","Modern Milk Tea","","","0","","","1","0","","","","","","","","0","Hau Chan","","","","","","","","","","0","","","","","","","-1","","","","","","","","","","","","0","00000000420714AA","2014-06-02","1900-01-01","","","","","","","","","","","","","","","","","","","","","","","","","","","","","","","","","","","","","","","","","","","","","","","","","","","","",""

Is there a command that can help me add the missing opening quote?

Tags: regex, bash, shell, command-line

Solution


This should work in every POSIX shell:

printf \" | cat - file.csv > repaired-file.csv

If you are happy with the result, you can overwrite the original:

mv repaired-file.csv file.csv
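
Before overwriting a large file, it may be worth trying the repair on a small sample first. The following is only a sketch: prepend_quote is a hypothetical helper name, the temporary directory and file names are made up for the demo, and the sample rows are shortened or invented stand-ins for the real data.

```shell
#!/bin/sh
# prepend_quote SRC DST: write a copy of SRC to DST with one leading double quote.
# (Hypothetical helper name; it wraps the printf | cat pipeline shown above.)
prepend_quote() {
    printf '"' | cat - "$1" > "$2"
}

# Demo on a throwaway file. The first row is shortened from the question's
# line; the second row is invented to show that later lines stay untouched.
demo_dir=$(mktemp -d)
printf '%s\n' '801","40116","Hazelnut MT -L"' '"second","row"' > "$demo_dir/sample.csv"

prepend_quote "$demo_dir/sample.csv" "$demo_dir/repaired.csv"

# Inspect the first line of the repaired copy before replacing anything.
head -n 1 "$demo_dir/repaired.csv"
```

If the first line of the copy now starts with a quote and the remaining lines are unchanged, the same pipeline can be run on the real file.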

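Since your file is 70 GB, you may want to avoid creating a second file, but that is harder than it looks. There are tools such as sed's in-place option (-i) and sponge from moreutils, but they do not work in place the way you might expect: both sed -i and sponge either use a temporary file or hold the whole file in memory, which is no longer feasible at 70 GB. A good investigation of true in-place editing can be found in this blog post. Its conclusion: no standard tool supports true in-place editing. The following Perl script should work, though (it is already adapted to your needs).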

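perl <<'EOF'
  use strict;
  use warnings;
  use Tie::File;
  # Tie the file's lines to an array; changes to the array are written to the file.
  tie my @a, 'Tie::File', 'path/to/your/file' or die "Cannot tie file: $!";
  $a[0] = '"' . $a[0];   # prepend the missing opening quote to the first line
  untie @a;              # flush pending writes and release the file
EOF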

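Benchmark

Out of interest, I ran the commands discussed here and measured their run times.

The 9.3 GiB input file f was created with seq 1000000000 > f. Before timing each command, I always cleared the cache with sync && echo 3 | sudo tee /proc/sys/vm/drop_caches. My system has enough memory to hold the whole file, but I monitored memory usage manually: all commands used only a few KB of memory.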

  • printf \" | cat - f > f2; mv f2 f   1m 05s
  • perl … # script from above         1m 32s
  • sed -i '1s/^/"/' f            25m 57s (also used 100% CPU the whole time)
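
The measurement procedure can be tried on a much smaller file. This is a rough sketch, not the original benchmark script: the line count of 1,000,000, the temporary directory, and the use of date +%s for coarse timing are all assumptions chosen so that it runs quickly and without root.

```shell
#!/bin/sh
# Sketch of the benchmark procedure, scaled down (assumed size: 1,000,000 lines).
bench_dir=$(mktemp -d)
cd "$bench_dir" || exit 1
seq 1000000 > f

# On Linux the cache can be dropped before each run (requires root):
#   sync && echo 3 | sudo tee /proc/sys/vm/drop_caches

# Coarse wall-clock timing of the cat-based approach, in whole seconds.
start=$(date +%s)
printf '"' | cat - f > f2 && mv f2 f
end=$(date +%s)
echo "cat approach: $((end - start))s"

head -n 1 f   # prints: "1
```

At this size every variant finishes almost instantly, so the differences reported above should only become visible with inputs of several GiB.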

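I was a bit surprised myself that the cat command is faster than the perl script. It makes sense, though: the perl script does a lot of seeking (as can be seen with strace), whereas cat simply copies.

Summary: if you have enough disk space, use the cat command. If the file is larger than the free disk space remaining on the system, use the perl script.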
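
The choice between the two approaches could be checked with a short script. This is a minimal sketch under some assumptions: fits_on_disk is a hypothetical helper name, the repaired copy is assumed to be written to the same filesystem as the original, and the comparison ignores any space that other processes may claim in the meantime.

```shell
#!/bin/sh
# fits_on_disk FILE: succeed if the filesystem holding FILE has more free
# space than FILE's size, i.e. a second copy would fit next to the original.
# (Hypothetical helper; assumes the copy goes to the same filesystem.)
fits_on_disk() {
    bytes=$(wc -c < "$1")
    file_kb=$(( (bytes + 1023) / 1024 ))                  # size, rounded up to KiB
    free_kb=$(df -Pk "$1" | awk 'NR == 2 { print $4 }')   # available KiB
    [ "$free_kb" -gt "$file_kb" ]
}

# Demo on a small temporary file.
sample=$(mktemp)
echo 'sample data' > "$sample"
if fits_on_disk "$sample"; then
    echo "enough space: use the cat command"
else
    echo "not enough space: use the perl script"
fi
```

df -P requests the portable output format, in which the fourth column of the second line is the available space.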

