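python - Get the part of a token after a specific character
Problem description
I want to extract part of each token in a text file. So far I have written the following code: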
from collections import Counter
import re
freq_dist = set()
words = re.findall(r'[\w+]+', open('output.txt').read())
freq_dist = Counter(words).most_common(10)
print(freq_dist)
My output.txt looks like this:
Türkiye+Noun ,+Punc terörizm+Noun+Gen ve+Conj kitle+Noun imha+Noun silah+Noun+A3pl+P3sg+Gen küresel+Adj düzey+Noun+Loc olus+Verb+Caus+PastPart+P3sg tehdit+Noun+Gen boyut+Noun+P3sg karsi+Adj+P3sg+Loc ,+Punc tüm+Det ülke+Noun+A3pl+Gen yay+Verb+Pass+Inf2+Gen önle+Verb+Pass+Inf2+P3sg hedef+Noun+A3pl+P3sg+Acc paylas+Verb+PastPart+P3pl ,+Punc daha+Noun güven+Noun+With ve+Conj istikrar+Noun+With bir+Num dünya+Noun düzen+Noun+P3sg için+PostpPCGen birlik+Noun+Loc çaba+Noun göster+Verb+PastPart+P3pl bir+Num asama+Noun+Dat gel+Verb+Pass+Inf2+P3sg+Acc samimi+Adj ol+Verb+ByDoingSo arzula+Verb+Prog2+Cop .+Punc
Ab+Noun ile+PostpPCNom gümrük+Noun Alan+Noun+P3sg+Loc+Rel kurumsal+Adj iliski+Noun+A3pl
club+Noun toplanti+Noun+A3pl+P3sg
Türkiye+Noun+Gen -+Punc At+Noun gümrük+Noun isbirlik+Noun+P3sg komite+Noun+P3sg ,+Punc Ankara+Noun Anlasma+Noun+P3sg+Gen 6+Num madde+Noun+P3sg uyar+Verb+When ortaklik+Noun rejim+Noun+P3sg+Gen uygula+Verb+Pass+Inf2+P3sg+Acc ve+Conj gelis+Verb+Inf2+P3sg+Acc sagla+Verb+Inf1 üzere+PostpPCNom ortaklik+Noun Konsey+Noun+P3sg+Gen 2+Num /+Punc 69+Num sayili+Adj karar+Noun+P3sg ile+Conj teknik+Noun komite+Noun mahiyet+Noun+P3sg+Loc kur+Verb+Pass+Narr+Cop .+Punc
nispi+Adj
nisbi+Adj
görece+Adj+With
izafi+Adj
obur+Adj
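I want to take the part of each token after the first + sign and count how often each such part occurs, listed in descending order. For example, from Türkiye+Noun I want the +Noun part, from terörizm+Noun+Gen I want Noun+Gen, and from isbirlik+Noun+P3sg I want Noun+P3sg. I then want to list these parts by descending count, that is, by how many times +Noun or +Noun+Gen appears in the text.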
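Solution
How about splitting your input on whitespace?
from collections import Counter
# encoding='utf-8' is an assumption about how output.txt was saved
with open('output.txt', encoding='utf-8') as f:
    # split() with no argument splits on spaces and newlines alike;
    # tokens without a '+' are skipped instead of raising IndexError
    words = [word.split('+', 1)[1] for word in f.read().split() if '+' in word]
freq_dist = Counter(words).most_common(10)
print(freq_dist)
This gives you: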
[('Noun', 16), ('Punc', 8), ('Adj', 8), ('Noun+P3sg', 6), ('Num', 5), ('Conj', 4), ('Noun+Gen', 3), ('Noun+P3sg+Gen', 3), ('Noun+Loc', 2), ('Verb+PastPart+P3pl', 2)]
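The question asks for the counts of all suffixes in descending order, not only the top ten. A minimal sketch of that variant follows. The function name `suffix_counts` and the shortened `sample` string are illustrative and not part of the original answer. It relies on `Counter.most_common()` returning every entry, highest count first, when called without an argument.

```python
from collections import Counter


def suffix_counts(text):
    """Count the part after the first '+' of each whitespace-separated token."""
    suffixes = []
    for token in text.split():
        # partition returns (head, sep, tail); sep is '' when there is no '+'
        _, sep, tail = token.partition('+')
        if sep:
            suffixes.append(tail)
    # With no argument, most_common() returns all entries in descending order
    return Counter(suffixes).most_common()


# Shortened excerpt of output.txt, used here only for illustration
sample = "Türkiye+Noun ,+Punc terörizm+Noun+Gen ve+Conj kitle+Noun\nnispi+Adj"
for suffix, count in suffix_counts(sample):
    print(f"+{suffix}\t{count}")
```

Prefixing each suffix with `+` when printing matches the `+Noun` notation used in the question, while the stored keys stay free of the leading sign.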