How to merge several columns into a single semicolon-separated column in R

Problem description

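I want to merge several columns and create one column that holds a semicolon-separated list (or something like a dictionary in Python).
Basically, I have this data frame (blanks are missing values):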

ID  Event Category  Start Time  End Time    Account No.   Dosage  Doctor's_ID
1    Stroke          1/1/2011       
1   Admitted         1/6/2011               24287939                  5487
1   Diagnosed        1/25/2011      
6   Diagnosed        1/1/2011       
6   Drug       A     1/2/2011   1/10/2011                  "high"
6   Drug       B     1/7/2011   1/20/2011   35287930      "medium"
10  Drug       A    1/3/2011    1/6/2011                   "low"
10  Drug       B    1/9/2011    1/13/2011                  "high"
10  Stroke          1/8/2011        

I want to create a column, attributes, that merges several columns into one, with the entries separated by semicolons.

The output file (it can be a text file) should look like this:

  ID    Event Category  Start Time  End Time    attributes
    1    Stroke          1/1/2011       
    1   Admitted         1/6/2011               Account No.="24287939"; Doctor's_ID="5487"
    1   Diagnosed        1/25/2011      
    6   Diagnosed        1/1/2011       
    6   Drug       A     1/2/2011   1/10/2011   Dosage="high"
    6   Drug       B     1/7/2011   1/20/2011   Account No.="35287930"; Dosage="medium"
    10  Drug       A    1/3/2011    1/6/2011    Dosage="low"
    10  Drug       B    1/9/2011    1/13/2011   Dosage="high"
    10  Stroke          1/8/2011        

My goal is to write a text file in which the columns are separated by tabs ("\t") and the attribute data (the last column) is a list separated by ";".

More details on the required output are given here: http://www.cs.umd.edu/hcil/eventflow/manual/chapter_start.html#1.4

How can I do this in R? Thanks.

Tags: r, text, merge

Solution


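One option is to use the apply function and pass it the row-wise data of the last 3 columns. The nice part about apply is that each row is passed to the function as a named vector, where the name at each position matches the column name.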
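
To make the named-vector behaviour concrete, here is a minimal sketch on a made-up two-row data frame (`toy` and its columns are assumptions for illustration, not the question's data). It pastes every name to its value without filtering, which shows both that the column names are available inside the function and that missing values come through as the text "NA" unless they are dropped first.

```r
# Toy data frame, made up for illustration only.
toy <- data.frame(Dosage  = c("high", NA),
                  Account = c(NA, "35287930"),
                  stringsAsFactors = FALSE)

# apply() converts the data frame to a matrix and calls the function once
# per row; x is a character vector whose names are the column names.
pairs <- unname(apply(toy, 1, function(x) {
  paste(names(x), x, sep = "=", collapse = ";")
}))

print(pairs)
# [1] "Dosage=high;Account=NA"     "Dosage=NA;Account=35287930"
```

The unwanted "NA" entries are the reason the missing values have to be filtered out before pasting.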

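Now, first combine each name of the named vector with its value using paste, then merge the resulting pairs into a single string using the collapse = ";" argument of paste0. The solution is as follows: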

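# Keep the first four columns and build Attribute from columns 5 to 7.
# apply() passes each row as a named character vector: drop the NA entries,
# paste each name to its value with "=", then collapse the pairs with ";".
cbind(df[1:4], Attribute =
   apply(df[, 5:7], 1, function(x) paste0(paste(names(x[!is.na(x)]), x[!is.na(x)], sep = "="),
   collapse = ";")))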
# ID Event.Category Start.Time  End.Time                             Attribute
# 1  1         Stroke   1/1/2011      <NA>
# 2  1       Admitted   1/6/2011      <NA> Account.No.=24287939;Doctor.s_ID=5487
# 3  1      Diagnosed  1/25/2011      <NA>
# 4  6      Diagnosed   1/1/2011      <NA>
# 5  6   Drug       A   1/2/2011 1/10/2011                           Dosage=high
# 6  6   Drug       B   1/7/2011 1/20/2011    Account.No.=35287930;Dosage=medium
# 7 10   Drug       A   1/3/2011  1/6/2011                            Dosage=low
# 8 10   Drug       B   1/9/2011 1/13/2011                           Dosage=high
# 9 10         Stroke   1/8/2011      <NA>
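
The output above leaves the values unquoted and is only printed, while the question asks for name="value" pairs in a tab-delimited text file. The following is a hedged sketch of one way to get there, not the answer's own code. The small data frame `df2`, the helper `make_attributes`, and the file name `events.txt` are all assumptions made for illustration. `trimws` is used as a guard because apply converts the data frame to a character matrix, and that conversion can pad numeric columns with spaces.

```r
# Small stand-in for the question's data; check.names = FALSE keeps the
# original column names such as "Account No." and "Doctor's_ID".
df2 <- data.frame(
  ID            = c(1, 1, 6, 6),
  Event         = c("Stroke", "Admitted", "Drug", "Drug"),
  "Account No." = c(NA, "24287939", NA, "35287930"),
  Dosage        = c(NA, NA, "high", "medium"),
  "Doctor's_ID" = c(NA, "5487", NA, NA),
  check.names = FALSE, stringsAsFactors = FALSE
)

# Hypothetical helper: build name="value" pairs from the non-missing entries.
# sprintf() returns character(0) when nothing is kept, so an all-NA row
# collapses to an empty string.
make_attributes <- function(x) {
  keep <- !is.na(x)
  paste(sprintf('%s="%s"', names(x)[keep], trimws(x[keep])), collapse = "; ")
}

attr_cols <- c("Account No.", "Dosage", "Doctor's_ID")
out <- cbind(df2[setdiff(names(df2), attr_cols)],
             attributes = apply(df2[attr_cols], 1, make_attributes))

# Write a tab-delimited text file. quote = FALSE keeps write.table from
# wrapping fields in extra quotes; na = "" writes missing values as blanks.
out_file <- file.path(tempdir(), "events.txt")  # placeholder location
write.table(out, file = out_file, sep = "\t", quote = FALSE,
            row.names = FALSE, na = "")
```

Selecting the attribute columns by name rather than by position keeps the sketch working if the column order changes.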

Data:

df <- read.table(text =
'ID  "Event Category"  "Start Time"  "End Time"    "Account No."   Dosage  Doctor\'s_ID
1   Stroke          1/1/2011         NA          NA                NA       NA      
1   Admitted         1/6/2011        NA       24287939      NA            5487
1   Diagnosed        1/25/2011      NA          NA                NA       NA
6   Diagnosed        1/1/2011       NA          NA                NA       NA
6   "Drug       A"     1/2/2011   1/10/2011       NA           "high"         NA
6   "Drug       B"     1/7/2011   1/20/2011   35287930      "medium"         NA
10  "Drug       A"    1/3/2011    1/6/2011          NA         "low"         NA
10  "Drug       B"    1/9/2011    1/13/2011         NA         "high"         NA
10  Stroke          1/8/2011        NA          NA                NA       NA',
stringsAsFactors = FALSE, header = TRUE)
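
The printed result shows Account.No. and Doctor.s_ID rather than the original names because read.table by default converts column names into syntactically valid R names. If the original names are wanted in the attribute strings, passing check.names = FALSE is one option. A minimal sketch on a made-up two-row table (the `toy_*` objects are assumptions for illustration):

```r
txt <- 'ID "Account No." Dosage
1 24287939 NA
6 NA high'

# Default behaviour: names are passed through make.names().
toy_default <- read.table(text = txt, header = TRUE, stringsAsFactors = FALSE)

# check.names = FALSE keeps the header exactly as written.
toy_raw <- read.table(text = txt, header = TRUE, stringsAsFactors = FALSE,
                      check.names = FALSE)

names(toy_default)  # "ID" "Account.No." "Dosage"
names(toy_raw)      # "ID" "Account No." "Dosage"
```

Columns with such names then have to be referred to by position or with backticks or quotes, for example toy_raw[["Account No."]].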
