Python - splitting into letters instead of words

Problem description

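I'm trying to take some text files from a folder, split the text into words, count the words, sort the counts into a list, and write the result to a file. Everything works, except that instead of splitting into words it splits the text into letters and counts those. It seems like it should be easy to fix, but I can't work out what I'm doing wrong. Thanks in advance.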

import os
import os.path
import string

prefix_path = ("C:/Users/User/Desktop/Python/sampleTexts")
files = [f for f in os.listdir(prefix_path) if f.endswith(".txt")]
files.sort()
files = [os.path.join(prefix_path,name) for name in files]

textOut = open("texthere.txt", "w", encoding="utf-8")

def readText(file):
    for i in file:
        with open(i, "r", encoding= "utf-8") as f:
            textin = f.read()
    first_dict= dict()      
    
    for i in textin:
        i = i.strip()
        i = i.lower()
        i = i.translate(i.maketrans("","", string.punctuation)) 
        words = i.split()

        for word in words:
            if word in first_dict:
                first_dict[word] = first_dict[word] + 1
            else:
                first_dict[word] = 1

    sorted_dict = sorted(first_dict.items(), key= lambda x: x[1], reverse=True)
    for key, val in sorted_dict:
        print(key," :", val)

    for key, val in sorted_dict:
        textOut.write(key + " :" + str(val) + "\n")
    textOut.close()

readText(files)

Tags: python, python-3.x, split

Solution


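f.read() returns the entire text file as one string, so when you iterate over it with for i in textin, you are iterating over each character. Calling i.split() on a single character can at most give back that one character, which is why letters get counted instead of words. What you probably want is: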

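for line in f.readlines():  # inside the with block: one string per line
    for word in line.split():  # split() with no arguments splits on whitespace
        # count the whole word, using first_dict from the question
        first_dict[word] = first_dict.get(word, 0) + 1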
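
To put the pieces together, here is a minimal sketch of how the whole count could look with line-by-line reading. The function names count_words and write_counts are hypothetical helpers and not part of the original code, and collections.Counter is swapped in for the hand-written dictionary. The sketch also counts inside the file loop, since in the question textin is overwritten on every pass and only the last file would be counted.

```python
import string
from collections import Counter


def count_words(paths):
    """Count words across all files in paths (hypothetical helper)."""
    # Translation table that deletes ASCII punctuation characters.
    strip_punct = str.maketrans("", "", string.punctuation)
    counts = Counter()
    for path in paths:
        with open(path, "r", encoding="utf-8") as f:
            for line in f:  # iterating a file object yields lines, not characters
                cleaned = line.lower().translate(strip_punct)
                counts.update(cleaned.split())  # split() yields whole words
    return counts


def write_counts(counts, out_path):
    """Write 'word :count' lines, most frequent first (hypothetical helper)."""
    with open(out_path, "w", encoding="utf-8") as out:
        for word, n in counts.most_common():
            out.write(f"{word} :{n}\n")
```

With the files list built as in the question, this would be called as write_counts(count_words(files), "texthere.txt"). Opening the output file inside write_counts, rather than at module level, means it is closed even if reading one of the inputs fails.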
