csv - Merging two csv files, can't get rid of newline
Problem description
I am merging two csv files. For simplicity, I am showing relevant columns only. There are more than four columns in both files.
file_a.csv
col2, col6, col7, col17
a, b, c, 145
e, f, g, 101
x, y, z, 243
file_b.csv
col2, col6, col7, col17
a, b, c, 88
e, f, g, 96
x, k, l, 222
Output should look like this:
col2, col6, col7, col17, col18
a, b, c, 145, 88
e, f, g, 101, 96
So col17 of file_b is added to file_a as col18 when the contents of col2, col6 and col7 match.
I tried this:
awk -F, 'NR == FNR {a[$2,$6,$7] = $17;next;} {if (! (b = a[$2,$6,$7])) b = "N/A";print $0,FS,b;}' file_a.csv file_b.csv > out.csv
The output looks like this:
col2, col6, col7, col17,
, col18
a, b, c, 145
, 88
e, f, g, 101
, 96
So the column 17 from file_b I am trying to add does get added but shows up on a new line.
I think this is because there are carriage returns after each line of file_a and file_b. In Notepad++, I can see CRLF. But I can't get rid of them. Also, I would rather not go through two steps: getting rid of carriage returns first and then merging. Instead, if I can bypass the carriage returns during the merge, it will be much faster.
Also, I will appreciate it if you could tell me how to get rid of the spaces before and after the comma separating the merged column. Note that I put spaces between the columns and commas for the other columns for better readability. That is not how it is in the actual files. But there are indeed spaces between col17 and "," and col18 in the merged file and I don't know why.
If you insist on marking this as a duplicate, kindly explain in a comment below how the answers to the previous question(s) address my issue. I tried figuring it out from those previous similar questions and I failed.
Solution
Try this (GNU awk):
awk -F, -v RS="[\r\n]+" 'NR == FNR {a[$2,$6,$7] = $17;next;} {b=a[$2,$6,$7]; print $0 FS (b? b : "N/A")}' file_a.csv file_b.csv
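The problems you had:
1. The carriage returns: these are handled by RS="[\r\n]+", which treats any run of \r and \n characters as the record separator. Note that this also skips empty lines; use RS="\r\n" instead if you do not want that. A multi-character or regular-expression RS is not specified by POSIX, which is why this needs GNU awk.
2. The spaces: awk's default OFS is a single space. In your print statement you separated the items with commas (print $0,FS,b), so awk inserted OFS between them. Separate the expressions with a space, or simply write them next to each other, and they are concatenated with nothing in between.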
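If GNU awk is not available, the same two fixes can be sketched in a more portable way by stripping the trailing \r from each record with sub() rather than relying on a regex RS. The example below is only an illustration under several assumptions: the file names sample_a.csv, sample_b.csv and merged.csv are hypothetical, the layout is reduced to four columns (key in $1,$2,$3, value in $4) instead of the real columns 2, 6, 7 and 17, the second file is read first so that the row from the first file comes out in front, and only matching rows are printed (no "N/A" fallback), as in the desired output shown in the question.

```shell
# Build two small CRLF-terminated sample files (hypothetical 4-column layout;
# the real files use columns 2, 6, 7 and 17).
printf 'k1,k2,k3,val\r\na,b,c,145\r\ne,f,g,101\r\nx,y,z,243\r\n' > sample_a.csv
printf 'k1,k2,k3,val\r\na,b,c,88\r\ne,f,g,96\r\nx,k,l,222\r\n' > sample_b.csv

# sample_b.csv is read first: its value column is remembered per key.
# sample_a.csv is read second: matching rows are printed with that value appended.
awk -F, '
    { sub(/\r$/, "") }                        # drop the trailing CR; fields are re-split
    NR == FNR { seen[$1,$2,$3] = $4; next }   # first file: store the value for this key
    ($1,$2,$3) in seen { print $0 FS seen[$1,$2,$3] }   # concatenation, so no OFS space
' sample_b.csv sample_a.csv > merged.csv

cat merged.csv
```

Because the items after print are concatenated rather than separated by commas, no spaces appear around the new column, and because the \r is removed before the fields are used, the appended value stays on the same line. The output is written with plain \n line endings.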