string - Splitting a string column with multiple delimiters into an array
Problem description
I have a DataFrame with a column like this:
Title
"Over the Hill,to the Poorhouse"
"Wilson"
"Darling Lili"
"The Ten Commandments"
"12 Angry Men"
"Twelve Monkeys"
"1776"
"1941"
"Chacun sa nuit"
"2001: A Space Odyssey"
"20,000 Leagues Under the Sea"
"20,000 Leagues Under the Sea"
"24,7: Twenty Four Seven"
"Twin Falls Idaho"
"Three Kingdoms: Resurrection of the Dragon"
.......
.......
I want to convert this column into arrays like these:
[Over, the, Hill, to, the, Poorhouse]
[Wilson]
[Darling, Lili]
[The, Ten, Commandments]
[12, Angry, Men]
[Twelve, Monkeys]
[1776]
[1941]
[Chacun, sa, nuit]
[2001, , A, Space, Odyssey]
[20, 000, Leagues, Under, the, Sea]
[20, 000, Leagues, Under, the, Sea]
[24, 7, , Twenty, Four, Seven]
[Twin, Falls, Idaho]
[Three, Kingdoms, , Resurrection, of, the, Dragon]
So I would end up with these two columns:
Title Title_Words
Over the Hill,to the Poorhouse [Over, the, Hill, to, the, Poorhouse]
Wilson [Wilson]
Darling Lili [Darling, Lili]
The Ten Commandments [The, Ten, Commandments]
12 Angry Men [12, Angry, Men]
Twelve Monkeys [Twelve, Monkeys]
1776 [1776]
1941 [1941]
Chacun sa nuit [Chacun, sa, nuit]
2001: A Space Odyssey [2001, , A, Space, Odyssey]
20,000 Leagues Under the Sea [20, 000, Leagues, Under, the, Sea]
20,000 Leagues Under the Sea [20, 000, Leagues, Under, the, Sea]
24,7: Twenty Four Seven [24, 7, , Twenty, Four, Seven]
Twin Falls Idaho [Twin, Falls, Idaho]
Three Kingdoms: Resurrection of the Dragon [Three, Kingdoms, , Resurrection, of, the, Dragon]
The problem is that the strings can contain several delimiters: spaces, commas, and colons.
How can this be done?
Solution
Try this:
df.withColumn("Title_Words", split(col("Title"), "\\s+|[,:]"))
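To see what that pattern does without a Spark session, here is a minimal sketch in plain Python using the standard `re` module. It is not Spark code: Spark's `split` uses Java regular expressions, and I am assuming they behave the same as Python's for this simple pattern and these ASCII titles. The helper names `split_title` and `split_title_compact` are made up for illustration.

```python
import re

# Same pattern as in the answer: a run of whitespace, OR a single comma/colon.
# Because ": " is matched as two separate delimiters (":" and then " "),
# an empty string appears between them, as in the expected output above.
PATTERN = re.compile(r"\s+|[,:]")

# Hypothetical variant: treat any run of whitespace, commas and colons
# as ONE delimiter, so no empty strings are produced.
COMPACT_PATTERN = re.compile(r"[\s,:]+")


def split_title(title):
    """Split the way the answer's pattern does (keeps empty strings)."""
    return PATTERN.split(title)


def split_title_compact(title):
    """Split on runs of delimiters (no empty strings for these titles)."""
    return COMPACT_PATTERN.split(title)


if __name__ == "__main__":
    titles = [
        "Over the Hill,to the Poorhouse",
        "2001: A Space Odyssey",
        "24,7: Twenty Four Seven",
    ]
    for t in titles:
        print(t, "->", split_title(t), "|", split_title_compact(t))
```

The first helper reproduces the arrays asked for in the question, including the empty element after a colon. The second is only worth using if those empty elements are unwanted; the same character-class pattern should also work as the second argument to Spark's `split`, though I have not run it there.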