python - How to extract the tuples in a list that match a regular expression?
Problem description
I have a list of tuples:
s = [(0, 'NEW'), (1, 'YOUTUBE'), (2, 'VIDEO'), (3, 'OUT'), (4, 'NOW:TOTTENHAM'), (5, 'NEWS'), (6, 'TRANSFER'), (7, 'WINDOW'), (8, 'UPDATE'), (9, '손흥민'), (10, 'Son'), (11, 'Award'), (12, 'Link'), (13, 'to'), (14, 'Premier'), (15, 'League'), (16, 'Defen...'), (17, 'TOTTENHAM'), (18, 'NEWS'), (19, 'TRANSFER'), (20, 'WINDOW'), (21, 'UPDATE'), (22, 'Carabao'), (23, 'Cup'), (24, 'Win'), (25, 'Final.'), (26, '손흥민'), (27, 'Son'), (28, 'Contract')]
I am trying to extract all tuples whose word is non-ASCII, using this regular expression:
pattern = r'[^\x00-\x7F]+'
The expected output is:
[(9, '손흥민'),(26, '손흥민')]
I tried the following, but it does not work and raises TypeError: 'int' object is not subscriptable:
res = [[tup for tup in sub_list if re.match(r'[^\x00-\x7F]+', tup[1])] for sub_list in s]
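Solution
The simplest solution is to use the str.isascii method (available since Python 3.7), which returns True only when every character in the string is ASCII.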
>>> s = [
... (0, "NEW"),
... (1, "YOUTUBE"),
... (2, "VIDEO"),
... (3, "OUT"),
... (4, "NOW:TOTTENHAM"),
... (5, "NEWS"),
... (6, "TRANSFER"),
... (7, "WINDOW"),
... (8, "UPDATE"),
... (9, "손흥민"),
... (10, "Son"),
... (11, "Award"),
... (12, "Link"),
... (13, "to"),
... (14, "Premier"),
... (15, "League"),
... (16, "Defen..."),
... (17, "TOTTENHAM"),
... (18, "NEWS"),
... (19, "TRANSFER"),
... (20, "WINDOW"),
... (21, "UPDATE"),
... (22, "Carabao"),
... (23, "Cup"),
... (24, "Win"),
... (25, "Final."),
... (26, "손흥민"),
... (27, "Son"),
... (28, "Contract"),
... ]
>>>
>>> print([(index, item) for (index, item) in s if not item.isascii()])
[(9, '손흥민'), (26, '손흥민')]
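If you would rather keep the regular expression, the original attempt can be repaired. The TypeError comes from the nested comprehension: s is a flat list, so each sub_list is already a tuple such as (0, 'NEW'), and iterating over it yields the integer 0, on which tup[1] fails. The following is a minimal sketch of a regex-based version, assuming a shortened sample of the list; the names non_ascii and res are only illustrative.

```python
import re

# Shortened sample of the list from the question.
s = [(8, 'UPDATE'), (9, '손흥민'), (10, 'Son'), (26, '손흥민'), (27, 'Son')]

# Compile once: one or more characters outside the ASCII range.
non_ascii = re.compile(r'[^\x00-\x7F]+')

# Iterate over the tuples directly; there is no nested list to unpack.
res = [(index, word) for index, word in s if non_ascii.search(word)]
print(res)  # [(9, '손흥민'), (26, '손흥민')]
```

Here re.search is used instead of re.match so that a word containing a non-ASCII character anywhere, not only at the start, is kept. That makes the result agree with the not item.isascii() check above.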