Remove part of a multi-line string after a pattern in R

Problem description

I am trying to use the gsub function to remove everything from ROW FORMAT SERDE onward, but it is not working. Any suggestions?

x <- c("CREATE TABLE `cld_ml_bi_eng.iris`(", "  `sepal_length` double, ", 
  "  `sepal_width` double, ", "  `petal_length` double, ", "  `petal_width` double, ", 
  "  `species` string)", "ROW FORMAT SERDE ", "  'org.apache.hadoop.hive.ql.io.orc.OrcSerde' ", 
  "STORED AS INPUTFORMAT ", "  'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat' ", 
  "OUTPUTFORMAT ", "  'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'", 
  "LOCATION", "  'hdfs://haprod/warehouse/tablespace/managed/hive/cld_ml_bi_eng.db/iris'", 
  "TBLPROPERTIES (", "  'bucketing_version'='2', ", "  'transactional'='true', ", 
  "  'transactional_properties'='default', ", "  'transient_lastDdlTime'='1636686825')")

Here is my gsub call:

gsub(pattern = "(ROW FORMAT SERDE).*", replacement = "\\1", x = x)

My expected output:

c("CREATE TABLE `cld_ml_bi_eng.iris`(", "  `sepal_length` double, ", 
  "  `sepal_width` double, ", "  `petal_length` double, ", "  `petal_width` double, ", 
  "  `species` string)")

Tags: r

Solution


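One approach is to use grep to find the index of the element in the input vector that starts with the text ROW FORMAT SERDE. Then subset the input vector to the elements before that index and paste them into a single string: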

paste0(x[1:(grep("^ROW FORMAT SERDE", x)-1)], collapse="")

[1] "CREATE TABLE `cld_ml_bi_eng.iris`(  `sepal_length` double,   `sepal_width` double,   `petal_length` double,   `petal_width` double,   `species` string)"
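
The paste0 call above returns one collapsed string, while the expected output in the question is a character vector. The following is a minimal sketch of a variant that keeps the vector form and also shows why the original gsub call has no visible effect. It uses a shortened stand-in for the question's `x`; the names `unchanged`, `idx`, and `kept` are introduced here only for illustration.

```r
# Shortened stand-in for the question's vector `x` (same structure, fewer lines).
x <- c("CREATE TABLE `cld_ml_bi_eng.iris`(",
       "  `sepal_length` double, ",
       "  `species` string)",
       "ROW FORMAT SERDE ",
       "  'org.apache.hadoop.hive.ql.io.orc.OrcSerde' ",
       "STORED AS INPUTFORMAT ")

# gsub() is applied to each element separately, so ".*" can only consume
# the rest of the element that contains the marker, never later elements.
unchanged <- gsub("(ROW FORMAT SERDE).*", "\\1", x)

# Index of the first element that starts with the marker (NA if absent).
idx <- grep("^ROW FORMAT SERDE", x)[1]

# Keep the elements before the marker as a character vector.
# seq_len() returns an empty index if the marker is the first element;
# if the marker is missing, the input is returned unchanged.
kept <- if (is.na(idx)) x else x[seq_len(idx - 1)]

print(kept)
```

Because gsub processes each element on its own, the only change it makes is stripping the trailing space from the "ROW FORMAT SERDE " element; the vector keeps its full length. Subsetting by index avoids that limitation, and wrapping `kept` in paste0(..., collapse = "") gives the single-string form if that is preferred.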

