perl - Replace the double-quote qualifier in data using sed, awk, or perl
Problem description
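I have a txt file that uses | as the delimiter and " as the qualifier. I want to change the qualifier to the ~ symbol. The problem is that the actual column values also contain double quotes.
I need to change the qualifier without removing the double quotes inside the column values. Here is one sample record: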
"Live Your Dreams: Be You"|"20 Feb 2018"|"2 formats and editions"|"Are you being swept away by life being busy? Are things seemingly out of your control? Do you want to calm the chaos in your life? Are you ready to transform your life? In
"Live Your Dreams"
now AMAZON BESTSELLER, readers are shown how to take immediate control of their mental, emotional, physical and entrepreneurial destiny."|"All this and more as you immerse yourself in the story that opens up like scenes from "a Bollywood movie""|"Indian Edition"
I have tried sed and awk, based on answers from Stack Overflow and unix.com, but the double quotes inside the columns cause problems.
Expected output:
~Live Your Dreams: Be You~|~20 Feb 2018~|~2 formats and editions~|~Are you being swept away by life being busy? Are things seemingly out of your control? Do you want to calm the chaos in your life? Are you ready to transform your life? In
"Live Your Dreams"
now AMAZON BESTSELLER, readers are shown how to take immediate control of their mental, emotional, physical and entrepreneurial destiny.~|~All this and more as you immerse yourself in the story that opens up like scenes from "a Bollywood movie"~|~Indian Edition~
Code tried: sed 's_"([^*])"_~\1~_g' data.txt > tdata.txt
Result of the above sed command:
"Live Your Dreams: Be You~|~20 Feb 2018~|~2 formats and editions~|~Are you being swept away by life being busy? Are things seemingly out of your control? Do you want to calm the chaos in your life? Are you ready to transform your life? In
"Live Your Dreams"
now AMAZON BESTSELLER, readers are shown how to take immediate control of their mental, emotional, physical and entrepreneurial destiny.~|~All this and more as you immerse yourself in the story that opens up like scenes from "a Bollywood movie"~|~Indian Edition~
Any help with an awk, sed, or Perl script would be greatly appreciated.
Thanks in advance, Prabhu
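Solution
What you actually have is malformed CSV data in which the delimiter character is |.
It is malformed because the "inner" quotes are not escaped: in a CSV field that contains quotes, each quote should be doubled, like this: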
1,2,"field,with,commas","this field ""contains quotes"" that are duplicated"
# ..................................^^...............^^
Suppose your input data can be fixed to look like this:
"Live Your Dreams: Be You"|"20 Feb 2018"|"2 formats and editions"|"Are you being swept away by life being busy? Are things seemingly out of your control? Do you want to calm the chaos in your life? Are you ready to transform your life? In
""Live Your Dreams""
now AMAZON BESTSELLER, readers are shown how to take immediate control of their mental, emotional, physical and entrepreneurial destiny."|"All this and more as you immerse yourself in the story that opens up like scenes from ""a Bollywood movie"""|"Indian Edition"
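Once the inner quotes on lines 2 and 3 are properly escaped, you can use a CSV parser to change the quote character on output. Perl's Text::CSV parser can handle fields that contain newlines: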
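perl -MText::CSV -e '
    # Read pipe-delimited, double-quoted records; binary => 1 allows embedded newlines
    open my $fh, "<:encoding(UTF-8)", shift(@ARGV) or die "Cannot open input: $!";
    binmode STDOUT, ":encoding(UTF-8)";
    my $csv_in  = Text::CSV->new({ quote_char => "\"", sep_char => "|", binary => 1 });
    # Write with ~ as the qualifier; always_quote wraps every field, not only those that need it
    my $csv_out = Text::CSV->new({ quote_char => "~", escape_char => "~", sep_char => "|", binary => 1, always_quote => 1 });
    while (my $row = $csv_in->getline($fh)) {
        $csv_out->say(*STDOUT, $row);
    }
    $csv_in->eof or $csv_in->error_diag();
' file.csv
Output: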
~Live Your Dreams: Be You~|~20 Feb 2018~|~2 formats and editions~|~Are you being swept away by life being busy? Are things seemingly out of your control? Do you want to calm the chaos in your life? Are you ready to transform your life? In
"Live Your Dreams"
now AMAZON BESTSELLER, readers are shown how to take immediate control of their mental, emotional, physical and entrepreneurial destiny.~|~All this and more as you immerse yourself in the story that opens up like scenes from "a Bollywood movie"~|~Indian Edition~