How do I extract the tuple pairs in a list that match a regular expression?

Problem description

I have a list of tuples:

s = [(0, 'NEW'), (1, 'YOUTUBE'), (2, 'VIDEO'), (3, 'OUT'), (4, 'NOW:TOTTENHAM'), (5, 'NEWS'), (6, 'TRANSFER'), (7, 'WINDOW'), (8, 'UPDATE'), (9, '손흥민'), (10, 'Son'), (11, 'Award'), (12, 'Link'), (13, 'to'), (14, 'Premier'), (15, 'League'), (16, 'Defen...'), (17, 'TOTTENHAM'), (18, 'NEWS'), (19, 'TRANSFER'), (20, 'WINDOW'), (21, 'UPDATE'), (22, 'Carabao'), (23, 'Cup'), (24, 'Win'), (25, 'Final.'), (26, '손흥민'), (27, 'Son'), (28, 'Contract')]

I am trying to use this regular expression to extract all tuples whose word is non-ASCII:

pattern = r'[^\x00-\x7F]+'

The expected output is:

[(9, '손흥민'),(26, '손흥민')]

I tried the following, but it does not work and raises TypeError: 'int' object is not subscriptable

res = [[tup for tup in sub_list if re.match(r'[^\x00-\x7F]+', tup[1])] for sub_list in s]

Tags: python, string

Solution


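The simplest solution is to use the str.isascii method (available since Python 3.7), which returns True only when every character in the string is ASCII.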

>>> s = [
...     (0, "NEW"),
...     (1, "YOUTUBE"),
...     (2, "VIDEO"),
...     (3, "OUT"),
...     (4, "NOW:TOTTENHAM"),
...     (5, "NEWS"),
...     (6, "TRANSFER"),
...     (7, "WINDOW"),
...     (8, "UPDATE"),
...     (9, "손흥민"),
...     (10, "Son"),
...     (11, "Award"),
...     (12, "Link"),
...     (13, "to"),
...     (14, "Premier"),
...     (15, "League"),
...     (16, "Defen..."),
...     (17, "TOTTENHAM"),
...     (18, "NEWS"),
...     (19, "TRANSFER"),
...     (20, "WINDOW"),
...     (21, "UPDATE"),
...     (22, "Carabao"),
...     (23, "Cup"),
...     (24, "Win"),
...     (25, "Final."),
...     (26, "손흥민"),
...     (27, "Son"),
...     (28, "Contract"),
... ]
>>>
>>> print([(index, item) for (index, item) in s if not item.isascii()])
[(9, '손흥민'), (26, '손흥민')]
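
For completeness, the regex approach from the question can also be made to work. The TypeError arises because s is a flat list of tuples: the outer loop already yields (index, word) pairs, so the inner loop iterates over the integer and the string inside each pair, and tup[1] is then applied to an int. The following is a minimal sketch with a single loop, using a shortened sample of the list; it uses re.search rather than re.match on the assumption that a word should qualify if it contains a non-ASCII character anywhere, not only at the start.

```python
import re

# Shortened sample of the list from the question.
s = [(8, "UPDATE"), (9, "손흥민"), (10, "Son"), (25, "Final."), (26, "손흥민")]

# One or more characters outside the ASCII range (single backslashes in a raw string).
pattern = re.compile(r"[^\x00-\x7F]+")

# Each element of s is already an (index, word) pair, so one loop is enough.
res = [(index, word) for index, word in s if pattern.search(word)]
print(res)  # [(9, '손흥민'), (26, '손흥민')]
```

On this data the result is the same as the isascii version; the regex form is mainly useful when the pattern needs to be narrowed later, for example to a specific script.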
