Replacing the double-quote qualifier in data using sed, awk, or perl

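Problem description

I have txt files that use | as the delimiter and " as the text qualifier. I want to change the qualifier to the ~ character. The problem is that the actual column values themselves contain double quotes.

I need to change the qualifier without removing the double quotes inside the column values. Here is one sample record: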

"Live Your Dreams: Be You"|"20 Feb 2018"|"2 formats and editions"|"Are you being swept away by life being busy? Are things seemingly out of your control? Do you want to calm the chaos in your life? Are you ready to transform your life? In 
"Live Your Dreams"
now AMAZON BESTSELLER, readers are shown how to take immediate control of their mental, emotional, physical and entrepreneurial destiny."|"All this and more as you immerse yourself in the story that opens up like scenes from "a Bollywood movie""|"Indian Edition"

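I have already tried sed and awk, following answers on Stack Overflow and unix.com, but the double quotes inside the column values cause problems.

Expected output: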

~Live Your Dreams: Be You~|~20 Feb 2018~|~2 formats and editions~|~Are you being swept away by life being busy? Are things seemingly out of your control? Do you want to calm the chaos in your life? Are you ready to transform your life? In 
"Live Your Dreams"
now AMAZON BESTSELLER, readers are shown how to take immediate control of their mental, emotional, physical and entrepreneurial destiny.~|~All this and more as you immerse yourself in the story that opens up like scenes from "a Bollywood movie"~|~Indian Edition~

Attempted code: sed 's_"([^*])"_~\1~_g' data.txt > tdata.txt

Result of the above sed command:

"Live Your Dreams: Be You~|~20 Feb 2018~|~2 formats and editions~|~Are you being swept away by life being busy? Are things seemingly out of your control? Do you want to calm the chaos in your life? Are you ready to transform your life? In 
"Live Your Dreams"
now AMAZON BESTSELLER, readers are shown how to take immediate control of their mental, emotional, physical and entrepreneurial destiny.~|~All this and more as you immerse yourself in the story that opens up like scenes from "a Bollywood movie"~|~Indian Edition~

Any help with an awk, sed, or Perl script would be much appreciated.

Thanks in advance, Prabhu

Tags: perl, awk, sed

Solution


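What you actually have is malformed CSV data in which the separator character is |.

It is malformed because the "inner" quotes are not escaped: in a CSV field that contains quotes, each quote should be doubled, like this: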

1,2,"field,with,commas","this field ""contains quotes"" that are duplicated"
# ..................................^^...............^^
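
The doubling rule can be sketched for a single field value as follows. This is only an illustration: csv_quote is a hypothetical helper name, not part of any tool mentioned here, and it handles one value at a time rather than a whole file.

```shell
# Hypothetical helper: quote one field value the CSV way.
# Every embedded double quote is doubled, then the whole value is wrapped in quotes.
csv_quote() {
    printf '"%s"\n' "$(printf '%s' "$1" | sed 's/"/""/g')"
}

csv_quote 'this field "contains quotes" that are duplicated'
# prints: "this field ""contains quotes"" that are duplicated"
```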

If your input data can be fixed to look like this:

"Live Your Dreams: Be You"|"20 Feb 2018"|"2 formats and editions"|"Are you being swept away by life being busy? Are things seemingly out of your control? Do you want to calm the chaos in your life? Are you ready to transform your life? In 
""Live Your Dreams""
now AMAZON BESTSELLER, readers are shown how to take immediate control of their mental, emotional, physical and entrepreneurial destiny."|"All this and more as you immerse yourself in the story that opens up like scenes from ""a Bollywood movie"""|"Indian Edition"

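If the inner quotes on lines 2 and 3 are escaped properly, you can use a CSV parser to convert the quote characters on output. Perl's Text::CSV parser can handle fields that contain newlines: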

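perl -MText::CSV -e '
    # Open the input file (first argument) as UTF-8
    open my $fh, "<:encoding(UTF-8)", shift(@ARGV) or die "Cannot open input: $!";
    # Reader: fields quoted with ", separated by |; binary => 1 allows embedded newlines
    my $csv_in = Text::CSV->new({ quote_char => "\"", sep_char => "|", binary => 1 });
    # Writer: fields quoted with ~, separated by |
    my $csv_out = Text::CSV->new({ quote_char => "~", escape_char => "~", sep_char => "|", binary => 1 });
    while (my $row = $csv_in->getline($fh)) {
        $csv_out->say(STDOUT, $row);
    }
    # Report a parse error if reading stopped before end of file
    $csv_in->eof or $csv_in->error_diag();
' file.csv

Output:
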
~Live Your Dreams: Be You~|~20 Feb 2018~|~2 formats and editions~|~Are you being swept away by life being busy? Are things seemingly out of your control? Do you want to calm the chaos in your life? Are you ready to transform your life? In 
"Live Your Dreams"
now AMAZON BESTSELLER, readers are shown how to take immediate control of their mental, emotional, physical and entrepreneurial destiny.~|~All this and more as you immerse yourself in the story that opens up like scenes from "a Bollywood movie"~|~Indian Edition~
