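python - Is there a way to analyze a text file against these criteria?
Question
I need to write a program that analyzes a piece of text in a file and then computes:
- the number of words
- the average length of a word
- how many times each word occurs
- how many words start with each letter of the alphabet
So far I have managed the first two items, as shown below: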
file_name = input('Please enter the full name of the file: ')
with open(file_name, 'r') as f:
    # split() with no argument skips blank lines and repeated spaces,
    # so no empty strings are counted as zero-length words
    w = [len(word) for line in f for word in line.split()]
total_w = len(w)
avg_w = sum(w) / total_w
print('The total number of words in this file is:', total_w)
print('The average length of the words in this file is:', avg_w)
Answer
collections.Counter makes this relatively simple. I use re.findall(r'[\w]+', data) to find the words (a word being any run of letters, digits and underscores). Adjust as needed.
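To see what "adjust as needed" means in practice, here is a small sketch comparing the regex tokenizer with plain whitespace splitting. The sample string is made up for illustration and is not from the original question.

```python
import re
from collections import Counter

# Hypothetical sample text, used only for illustration.
sample = "Hello, world! Hello again; it's 2 o'clock."

# Regex tokenizer: every run of letters, digits and underscores is a word,
# so punctuation is dropped and "it's" is cut into "it" and "s".
regex_words = re.findall(r'\w+', sample)

# Whitespace tokenizer: punctuation stays attached to the words,
# so "Hello," and "Hello" are counted as two different words.
space_words = sample.split()

regex_counts = Counter(regex_words)
space_counts = Counter(space_words)

print(regex_words)
print(space_words)
```

Which one fits depends on the input: the regex version merges "Hello," with "Hello" but breaks up contractions, while the split version keeps contractions intact but treats trailing punctuation as part of the word.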
import re
from collections import Counter

fn = input('Please enter the full name of the file: ')
with open(fn, 'r') as f:
    words = Counter(re.findall(r'[\w]+', f.read()))
    # use words = Counter(f.read().split()) if everything is split by spaces
    # adjust the regular expression depending on whether you want or don't want
    # stuff like numbers to be counted as "words"

print('Total number of words:', sum(words.values()))
# this is weighted by word occurrence, not sure whether this is correct
print('Average length of words:',
      sum(len(w) * o for w, o in words.items()) / sum(words.values()))
print('Word occurrence:', words)
# this only shows letters that actually occur. If you need all letters of
# the alphabet, you have to add the rest
print('Start letter occurrence:', Counter(w[0] for w in words.elements()))
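The last comment notes that letters which never start a word are missing from the output. One possible way to fill in the rest is sketched below. It assumes that upper and lower case should be folded together and that only the 26 ASCII letters matter. Neither is stated in the question, and the helper name start_letter_counts is made up for this example.

```python
import string
from collections import Counter

def start_letter_counts(words):
    """Return a dict mapping each letter a-z to the number of words starting with it.

    `words` is a Counter of word -> occurrences; every occurrence is counted.
    Case folding and the ASCII-only alphabet are assumptions of this sketch.
    """
    counts = Counter(w[0].lower() for w in words.elements())
    # A Counter returns 0 for missing keys, so absent letters come out as 0.
    # Words starting with digits or underscores are left out of the table.
    return {letter: counts[letter] for letter in string.ascii_lowercase}

# Hypothetical input, standing in for the Counter built from the file.
words = Counter(["Apple", "avocado", "banana", "apple", "2nd"])
table = start_letter_counts(words)
print(table['a'], table['b'], table['z'])
```

Because the result is a plain dict built in alphabetical order, it can be printed letter by letter with a simple loop over table.items().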