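r - How to conditionally merge 2 or more lines of text into 1 line in R
Problem description
I want to read a Mozilla Thunderbird Sent file into R. Sometimes 2 or more lines have to be put on 1 line. These are the lines that end with a comma (","), for example: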
From: Frans <frans@zeenit.nl>
Subject: volledig overzicht beschikbaar
To: aldjan@gmail.com, clen@zeenit.nl, pinge1@zeenit.nl,
griepje@zeenit.nl, Jowialj@live.com, pelicaan@hotmail.com,
pico11@zeenit.nl
Date: Mon, 21 Mar 2016 14:17:09 +0100
After merging:
From: Frans <frans@zeenit.nl>
Subject: volledig overzicht beschikbaar
To: aldjan@gmail.com, clen@zeenit.nl, pinge1@zeenit.nl, griepje@zeenit.nl, Jowialj@live.com, pelicaan@hotmail.com, pico11@zeenit.nl
Message-ID: <56EFF455.5000006@zeenit.nl>
Date: Mon, 21 Mar 2016 14:17:09 +0100
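Solution
I would not rely on the comma alone. With grep you can identify the line that carries the To: tag and paste together every line from there up to the line carrying the next tag, Message-ID: or Date:.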
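cleanHeader <- function(header) {
  # Index of the line that starts the To: field
  line.to <- grep("^To", header)
  # Index of the first Message-ID: or Date: line, which ends the To: field
  line.next <- grep("^Date|^Mess", header)[1]
  # Join the To: line and its continuation lines, separated by one space
  new.to <- paste(header[line.to:(line.next - 1)], collapse=" ")
  # Reassemble: lines before To:, the merged To: line, then the rest
  c(header[1:(line.to - 1)], new.to, header[line.next:length(header)])
}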
Result
cleanHeader(header1)
[1] "From: Frans <frans@zeenit.nl>"
[2] "Subject: volledig overzicht beschikbaar"
[3] "To: aldjan@gmail.com, clen@zeenit.nl, pinge1@zeenit.nl, griepje@zeenit.nl, Jowialj@live.com, pelicaan@hotmail.com, pico11@zeenit.nl"
[4] "Date: Mon, 21 Mar 2016 14:17:09 +0100"
cleanHeader(header2)
[1] "From: Frans <frans@zeenit.nl>"
[2] "Subject: volledig overzicht beschikbaar"
[3] "To: aldjan@gmail.com, clen@zeenit.nl, pinge1@zeenit.nl, griepje@zeenit.nl, Jowialj@live.com, pelicaan@hotmail.com, pico11@zeenit.nl"
[4] "Message-ID: <56EFF455.5000006@zeenit.nl>"
[5] "Date: Mon, 21 Mar 2016 14:17:09 +0100"
Data:
tmp <- tempfile()
cat("From: Frans <frans@zeenit.nl>
Subject: volledig overzicht beschikbaar
To: aldjan@gmail.com, clen@zeenit.nl, pinge1@zeenit.nl,
griepje@zeenit.nl, Jowialj@live.com, pelicaan@hotmail.com,
pico11@zeenit.nl
Date: Mon, 21 Mar 2016 14:17:09 +0100", file=tmp, sep="\n")
header1 <- readLines(tmp)
cat("From: Frans <frans@zeenit.nl>
Subject: volledig overzicht beschikbaar
To: aldjan@gmail.com, clen@zeenit.nl, pinge1@zeenit.nl, griepje@zeenit.nl, Jowialj@live.com, pelicaan@hotmail.com, pico11@zeenit.nl
Message-ID: <56EFF455.5000006@zeenit.nl>
Date: Mon, 21 Mar 2016 14:17:09 +0100", file=tmp, sep="\n")
header2 <- readLines(tmp)
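The same idea of keying on header tags rather than on trailing commas can be generalized so that any field, not just To:, is unfolded. The following is only a sketch, not part of the original answer: the name unfoldHeader is hypothetical, and it assumes that every real header line starts with a tag made of letters, digits and hyphens followed by a colon, and that no continuation line starts that way (a wrapped Subject line beginning with "Re:" would be misread as a new field).

```r
# Hypothetical helper (an illustration, not from the original answer):
# append every line that does not start with a "Name:" tag to the
# header line that precedes it.
unfoldHeader <- function(header) {
  # TRUE for lines that open a new field, e.g. "To:" or "Message-ID:"
  is.tag <- grepl("^[A-Za-z][A-Za-z0-9-]*:", header)
  # The running count of tag lines labels each line with its field number
  field <- cumsum(is.tag)
  # Trim stray whitespace, then paste the lines of each field together
  unname(vapply(split(trimws(header), field), paste, character(1),
                collapse = " "))
}

# Same content as header1 above, written inline so the block is self-contained
header <- c(
  "From: Frans <frans@zeenit.nl>",
  "Subject: volledig overzicht beschikbaar",
  "To: aldjan@gmail.com, clen@zeenit.nl, pinge1@zeenit.nl,",
  "griepje@zeenit.nl, Jowialj@live.com, pelicaan@hotmail.com,",
  "pico11@zeenit.nl",
  "Date: Mon, 21 Mar 2016 14:17:09 +0100"
)
unfolded <- unfoldHeader(header)
print(unfolded)
```

Unlike cleanHeader, this version does not need to know which tag follows To:, at the cost of depending on the tag pattern assumed above.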