Extracting words from a string

Problem description

Sample input:

'note - Part model D3H6 with specifications X30G and Y2A is having features 12H89.'

Expected output:

['D3H6', 'X30G', 'Y2A', '12H89']

My code:

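import re

note = 'note - Part model D3H6 with specifications X30G and Y2A is having features 12H89.'
split_note = re.split(r'[.;,\s]\s*', note)
# accepts any token made only of letters and/or digits
pattern = re.compile(r"^[a-zA-Z0-9]+$")
alphaList = []
for a in split_note:
    if pattern.match(a):
        alphaList.append(a)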

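I need to extract all the alphanumeric words (tokens that contain both letters and digits) from the split string and store them in a list.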

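The code above does not produce the expected output: because the pattern accepts any run of letters and/or digits, it also keeps ordinary words such as 'note', 'Part' and 'model'.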
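One way to keep the question's split-then-filter approach is to tighten the pattern so that a token must also contain at least one letter and at least one digit. A minimal sketch, assuming that is what "alphanumeric word" means here (the name alpha_list and the lookahead pattern are illustrative, not taken from the question):

```python
import re

note = 'note - Part model D3H6 with specifications X30G and Y2A is having features 12H89.'

# Same split as in the question: break on '.', ';', ',' or whitespace.
split_note = re.split(r'[.;,\s]\s*', note)

# Assumption: a token qualifies only if it is purely alphanumeric
# AND contains at least one letter and at least one digit (two lookaheads).
pattern = re.compile(r'^(?=.*[A-Za-z])(?=.*[0-9])[A-Za-z0-9]+$')

alpha_list = [token for token in split_note if pattern.match(token)]
print(alpha_list)  # ['D3H6', 'X30G', 'Y2A', '12H89']
```

Tokens such as '-' and the empty string left by the trailing period fail the [A-Za-z0-9]+ part, and plain words fail the digit lookahead.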

Tags: python

Solution


This might do the trick (note that the example uses a slightly shortened input string):

import re

# input string
stri = "Part model D3H6 with specifications X30 and Y2 is having features 12H89"
# tokenize into words (raw string avoids invalid-escape warnings for \w and \-)
split = re.findall(r"[A-Z]{2,}(?![a-z])|[A-Z][a-z]+(?=[A-Z])|[\'\w\-]+", stri)
# this statement keeps only the words that contain both letters and digits
print([word for word in split if re.match(r'^(?=.*[a-zA-Z])(?=.*[0-9])', word)])

#output: ['D3H6', 'X30', 'Y2', '12H89']
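The answer's code runs on a shortened string. As a cross-check, here is a sketch that applies the same letter-and-digit condition to the question's exact sample input in a single re.findall call, with no separate tokenization step. The combined pattern is a hypothetical alternative, not part of the original answer:

```python
import re

note = 'note - Part model D3H6 with specifications X30G and Y2A is having features 12H89.'

# \b...\b delimits a run of letters/digits; the two lookaheads, checked at the
# start of each run, require at least one letter and at least one digit in it.
pattern = re.compile(r'\b(?=[A-Za-z0-9]*[A-Za-z])(?=[A-Za-z0-9]*[0-9])[A-Za-z0-9]+\b')

print(pattern.findall(note))  # ['D3H6', 'X30G', 'Y2A', '12H89']
```

Because [A-Za-z0-9]* cannot cross spaces or punctuation, both lookaheads only inspect the current word, and the trailing '.' after 12H89 is never included in the match.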
