Separating dictionaries and text

Problem description

I have many sentences like the one below (this is a single sentence, not several):

'Hello , I , am, fine.' ,{'type': 'bold', 'text': 'Multi class f1 score'}, {'type': 'mention', 'text': '@Abhishek'}, ' Singh you can continue with the deep learning specialization from Andrew Ng. It is very much informative and lots to learn and its very smplified and for the certificate you can apply for financial aid option..the courses will be available in 15 days'

I want to split the text and the dictionaries into separate parts, for example:

1. Hello , I , am, fine.
2. {'type': 'bold', 'text': 'Multi class f1 score'}
3. {'type': 'mention', 'text': '@Abhishek'}
4. Singh you can continue with the deep learning specialization from Andrew Ng. It is very much informative and lots to learn and its very smplified and for the certificate you can apply for financial aid option..the courses will be available in 15 days

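Splitting on "," will not help, because it causes two problems:

  1. The key-value pairs inside each dictionary, such as {'type': 'mention', 'text': '@Abhishek'}, would be split apart at their commas.
  2. I would lose the commas inside part 1, so its content would no longer stay together.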
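
The two problems above can be reproduced with a shortened version of the sentence. This is only an illustrative sketch; the variable names `sentence` and `parts` are not from the original post.

```python
# Shortened, illustrative version of the sentence from the question.
sentence = "'Hello , I , am, fine.' ,{'type': 'bold', 'text': 'Multi class f1 score'}"

# A plain comma split ignores quotes and braces.
parts = sentence.split(",")

# Both the quoted text and the dictionary are cut at their internal commas.
for part in parts:
    print(repr(part))
```

The quoted text ends up in four fragments and the dictionary in two, so neither can be recovered as a unit.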

Note that the text may also contain emoji in UTF-8 encoded form.

How can I do this?

Tags: python, dictionary, nlp

Solution

You can use a regular expression to split the content the way you want:

import re

string = "'Hello , I , am, fine.' ,{'type': 'bold', 'text': 'Multi class f1 score'}, {'type': 'mention', 'text': '@Abhishek'}, ' Singh you can continue with the deep learning specialization from Andrew Ng. It is very much informative and lots to learn and its very smplified and for the certificate you can apply for financial aid option..the courses will be available in 15 days'"
results = re.findall(r"'(.*?)'|({.*?})", string)
results = [item for elem in results for item in elem if len(item)] # Clean empty records
for e in results:
    print(e)

This returns:

Hello , I , am, fine.
{'type': 'bold', 'text': 'Multi class f1 score'}
{'type': 'mention', 'text': '@Abhishek'}
 Singh you can continue with the deep learning specialization from Andrew Ng. It is very much informative and lots to learn and its very smplified and for the certificate you can apply for financial aid option..the courses will be available in 15 days
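
As a possible alternative, not part of the original answer: if every sentence is valid Python literal syntax (quoted strings and dictionaries separated by commas), the standard library's `ast.literal_eval` can parse it as a tuple. The sketch below rests on that assumption and uses a shortened sample with an emoji added for illustration. It returns the dictionaries as real `dict` objects rather than as strings.

```python
import ast

# Shortened sample; the trailing emoji (U+1F600) is added here for illustration only.
string = (
    "'Hello , I , am, fine.' ,"
    "{'type': 'bold', 'text': 'Multi class f1 score'}, "
    "{'type': 'mention', 'text': '@Abhishek'}, "
    "' Singh you can continue \U0001F600'"
)

# Comma-separated literals form a tuple expression, which literal_eval
# evaluates safely (only literals are allowed, no arbitrary code).
parts = ast.literal_eval(string)

for part in parts:
    print(type(part).__name__, repr(part))
```

Both approaches depend on the quoting being consistent. The pattern '(.*?)' stops at the first apostrophe it meets, so text containing an unescaped apostrophe (for example "it's") would be cut short by the regular expression and would make `ast.literal_eval` raise an error. A sentence with only one element would also come back from `ast.literal_eval` as that single object rather than as a tuple.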
