How to tabulate text output with newlines into a data frame?

Problem description

This is the structure of the block of text I am working with:

reprEx <- "] WITHDRAWALS\nDATE DESCRIPTION AMOUNT\n04/01 Quickpay With Zelle Payment To Mike T 819018100 $1,450.00\n04/01 Quickpay With Zelle Payment To Mandy Doid 809012906 2,665.00"

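I would like to take the text on each new line and split the elements of that line into the corresponding data frame columns. For example, each line's date should go into a DATE column, the transaction description into a DESCRIPTION column, and the number at the end of the line into an AMOUNT column. Here is an example of the output I want in the data frame: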

desiredResult <- data.frame(DATE = c("04/01", "04/01"),
                            DESCRIPTION = c("Quickpay With Zelle Payment To Mike T 819018100", "Quickpay With Zelle Payment To Mandy Doid 809012906"),
                            AMOUNT = c("$1,450.00", "2,665.00"))
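A minimal base-R sketch of the first part of this, splitting the block into one string per line and keeping only the transaction lines. It assumes every transaction line begins with an MM/DD date followed by whitespace; allLines and txLines are illustrative names only.

```r
reprEx <- "] WITHDRAWALS\nDATE DESCRIPTION AMOUNT\n04/01 Quickpay With Zelle Payment To Mike T 819018100 $1,450.00\n04/01 Quickpay With Zelle Payment To Mandy Doid 809012906 2,665.00"

# Break the block into one element per line
allLines <- strsplit(reprEx, "\n", fixed = TRUE)[[1]]

# Keep only lines that start with an MM/DD date (assumed to mark a transaction);
# this drops the "] WITHDRAWALS" line and the column-header line
txLines <- grep("^\\d{2}/\\d{2}\\s", allLines, value = TRUE, perl = TRUE)

print(txLines)
```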

Tags: r, regex

Solution


How about this as a start? This solution uses str_extract_all from the stringr package:

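library(stringr)

desiredResult <- data.frame(
  # MM/DD date at the start of each transaction line
  DATE = unlist(str_extract_all(reprEx, "\\d{2}/\\d{2}")),
  # text between the date (lookbehind) and the amount (lookahead)
  DESCRIPTION = unlist(str_extract_all(reprEx, "(?<=[0-9]{2}/[0-9]{2}\\s)[\\s\\w$]+(?=\\d{1,3},\\d{3}\\.\\d{2})")),
  # amounts such as 1,450.00; a leading "$" is not part of the match
  AMOUNT = unlist(str_extract_all(reprEx, "\\d{1,3},\\d{3}\\.\\d{2}"))
)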

Output:

desiredResult
   DATE                                          DESCRIPTION   AMOUNT
1 04/01    Quickpay With Zelle Payment To Mike T 819018100 $ 1,450.00
2 04/01 Quickpay With Zelle Payment To Mandy Doid 809012906  2,665.00
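The dollar sign lands in DESCRIPTION because the character class [\\s\\w$] accepts "$", so the greedy match only stops where the lookahead sees the digits of the amount; the second match also keeps the space before 2,665.00. A small sketch that inspects the raw matches (print() quotes them, so trailing characters are visible) and strips the surrounding whitespace with base R's trimws(); desc is an illustrative name.

```r
library(stringr)

reprEx <- "] WITHDRAWALS\nDATE DESCRIPTION AMOUNT\n04/01 Quickpay With Zelle Payment To Mike T 819018100 $1,450.00\n04/01 Quickpay With Zelle Payment To Mandy Doid 809012906 2,665.00"

# Same DESCRIPTION pattern as in the first version of the answer
desc <- unlist(str_extract_all(
  reprEx,
  "(?<=[0-9]{2}/[0-9]{2}\\s)[\\s\\w$]+(?=\\d{1,3},\\d{3}\\.\\d{2})"
))

# Quoted output shows the trailing "$" on row 1 and the trailing space on row 2
print(desc)

# trimws() removes the whitespace but not the "$" on the first row
print(trimws(desc))
```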

If you want to get rid of the dollar sign in the DESCRIPTION column, you can do this:

desiredResult <- data.frame(
  DATE = unlist(str_extract_all(reprEx, "[0-9]{2}/[0-9]{2}")),
  # alternative 1: amount without "$"; alternative 2: amount preceded by "$" (the "$" stays out of the match)
  DESCRIPTION = unlist(str_extract_all(reprEx, "(?<=[0-9]{2}/[0-9]{2}\\s)[\\s\\w]+(?=\\d{1,3},\\d{3}\\.\\d{2})|(?<=[0-9]{2}/[0-9]{2}\\s)[\\s\\w]+(?=\\$\\d{1,3},\\d{3}\\.\\d{2})")),
  AMOUNT = unlist(str_extract_all(reprEx, "\\d{1,3},\\d{3}\\.\\d{2}"))
)

Output:

desiredResult
   DATE                                          DESCRIPTION   AMOUNT
1 04/01     Quickpay With Zelle Payment To Mike T 819018100  1,450.00
2 04/01 Quickpay With Zelle Payment To Mandy Doid 809012906  2,665.00
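Because the three columns come from three independent str_extract_all() calls, a line that matches one pattern but not another leaves the vectors with different lengths, which can stop data.frame() with a "differing number of rows" error or pair values from different lines; the AMOUNT column also loses the "$" shown in desiredResult. A different technique from the one above is to match each transaction line once with capture groups via stringr's str_match(). This is a sketch, assuming every transaction line is "date, description, amount" separated by whitespace with an optional "$" before the amount; allLines, txLines, m and result are illustrative names.

```r
library(stringr)

reprEx <- "] WITHDRAWALS\nDATE DESCRIPTION AMOUNT\n04/01 Quickpay With Zelle Payment To Mike T 819018100 $1,450.00\n04/01 Quickpay With Zelle Payment To Mandy Doid 809012906 2,665.00"

# Split into lines and keep only those that start with an MM/DD date
allLines <- str_split(reprEx, "\n")[[1]]
txLines <- allLines[str_detect(allLines, "^\\d{2}/\\d{2}\\s")]

# One pattern per line. str_match() returns a matrix: column 1 is the whole
# match, the following columns are the capture groups:
#   (1) the date, (2) the description (lazy, so it stops before the amount),
#   (3) the amount, with an optional leading "$"
m <- str_match(txLines, "^(\\d{2}/\\d{2})\\s+(.*?)\\s+(\\$?[\\d,]+\\.\\d{2})$")

result <- data.frame(
  DATE = m[, 2],
  DESCRIPTION = m[, 3],
  AMOUNT = m[, 4],
  stringsAsFactors = FALSE
)
print(result)
```

A transaction line that does not fit the pattern comes back as an all-NA row instead of shifting values between rows, so malformed lines can be found with is.na(m[, 1]).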
