How do I count how often each item appears in a txt file and print each item's name with the number of times it appears?

Problem description

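I am writing a small program that reads from a text file containing many items we bought at the grocery store. The program is part of a larger application in which I integrate Python and C++, but for simplicity I have isolated this part of the application because it seems to be where the problem is.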

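The problem is that the first item in the text file (Spinach) appears 5 times in the file, but the program prints some garbage characters, then Spinach, then 1 as the number of times Spinach appears in the file. It should be 5. Further down the list, Spinach is printed again, this time with 4 as its count. Spinach should be printed only once, with 5 as the number of times it appears in the txt file, for example: Spinach - 5.

[Image: screenshot of the output]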

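I am not sure whether the problem is in the freq = {} dictionary. Could someone please help me find out what is causing it? Please be specific, because I am just learning Python. The code of the .py file and the list of items in the .txt file are shown below.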

Thanks in advance for your help.

Application.py



    
def wordFrequency(item):
    # Called from the C++ side with one argument: the item name to count.
    count = 0  # number of lines in the file that equal item

    with open('items.txt') as myfile:  # the with block closes the file automatically
        lines = myfile.readlines()  # read all the lines of the file

        for line in lines:
            if line.strip("\n") == item:  # drop the trailing newline before comparing
                count += 1
    return count


# Display only
def displayWordFrequency():
    with open('items.txt') as myfile:
        lines = myfile.readlines()

    freq = {}  # maps each item name to the number of times it appears
    for line in lines:
        name = line.strip("\n")  # drop the trailing newline
        if name in freq:  # already seen: increment its count
            freq[name] += 1
        else:  # first occurrence: start at 1
            freq[name] = 1

    for key, value in freq.items():  # key is the item name, value is its count
        print(f"{key} - {value}")


displayWordFrequency()

items.txt

Spinach
Radishes
Broccoli
Peas
Cranberries
Broccoli
Potatoes
Cucumbers
Radishes
Cranberries
Peaches
Zucchini
Potatoes
Cranberries
Cantaloupe
Beets
Cauliflower
Cranberries
Peas
Zucchini
Peas
Onions
Potatoes
Cauliflower
Spinach
Radishes
Onions
Zucchini
Cranberries
Peaches
Yams
Zucchini
Apples
Cucumbers
Broccoli
Cranberries
Beets
Peas
Cauliflower
Potatoes
Cauliflower
Celery
Cranberries
Limes
Cranberries
Broccoli
Spinach
Broccoli
Garlic
Cauliflower
Pumpkins
Celery
Peas
Potatoes
Yams
Zucchini
Cranberries
Cantaloupe
Zucchini
Pumpkins
Cauliflower
Yams
Pears
Peaches
Apples
Zucchini
Cranberries
Zucchini
Garlic
Broccoli
Garlic
Onions
Spinach
Cucumbers
Cucumbers
Garlic
Spinach
Peaches
Cucumbers
Broccoli
Zucchini
Peas
Celery
Cucumbers
Celery
Yams
Garlic
Cucumbers
Peas
Beets
Yams
Peas
Apples
Peaches
Garlic
Celery
Garlic
Cucumbers
Garlic
Apples
Celery
Zucchini
Cucumbers
Onions

Tags: python, dictionary

Solution


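You can do this with a dictionary comprehension, looping over a set of the data to remove duplicates. To preserve the order, you have to look back at the original list.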

# see question for full list
s = """Spinach
Radishes
Broccoli
Peas
Cranberries
Broccoli
Potatoes
Cucumbers
...
Celery
Zucchini
Cucumbers
Onions"""

s = s.split('\n') # get the data as list

s_dict = {k: s.count(k) for k in set(s)}
original_indices = sorted(map(s.index, set(s)))

print('\n'.join(' - '.join((s[i], str(s_dict[s[i]]))) for i in original_indices))
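The same counts can also be obtained in a single pass with collections.Counter from the standard library. The following is a minimal sketch, not part of the original answer: the items list is a short hypothetical sample standing in for the full data, and it assumes Python 3.7 or later, where Counter keeps keys in first-seen order.

```python
from collections import Counter

# Hypothetical short sample standing in for the full list in the question.
items = ["Spinach", "Radishes", "Broccoli", "Spinach", "Peas", "Broccoli", "Spinach"]

# Counter tallies every element in a single pass over the list.
counts = Counter(items)

# On Python 3.7+, iteration follows first-seen order, so no index lookup is needed.
report = [f"{name} - {n}" for name, n in counts.items()]
print("\n".join(report))
```

Calling s.count(k) for every key rescans the list once per distinct item, whereas this version reads the data only once.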

Edit

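If you are working with a dictionary and order matters, it is better to use the implementation from the standard library's collections module.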

import collections

# s is the list of item names, defined as above

d = collections.OrderedDict()
for i in s:
    if i in d:
        d[i] += 1
    else:
        d[i] = 1

for k, v in d.items():
    print(k, '-', v)

Output

Spinach - 5
Radishes - 3
Broccoli - 7
Peas - 8
Cranberries - 10
Potatoes - 5
Cucumbers - 9
Peaches - 5
Zucchini - 10
Cantaloupe - 2
Beets - 3
Cauliflower - 6
Onions - 4
Yams - 5
Apples - 4
Celery - 6
Limes - 1
Garlic - 8
Pumpkins - 2
Pears - 1
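
The answer does not explain the garbage characters printed before the first Spinach. One possible cause, which is an assumption and is not confirmed by the question, is a UTF-8 byte order mark (BOM) at the start of the file: it would make the first line differ from the other Spinach lines, giving one entry with count 1 and another with count 4. A minimal sketch under that assumption opens the file with the utf-8-sig encoding, which drops a leading BOM if one is present. The function name count_items and the demo file are hypothetical.

```python
import os
import tempfile


def count_items(path):
    """Count how often each non-empty line occurs, keeping first-seen order."""
    freq = {}
    # Assumption: the file is UTF-8, possibly with a BOM; "utf-8-sig" drops a leading BOM.
    with open(path, encoding="utf-8-sig") as f:
        for line in f:
            name = line.strip()
            if name:  # skip blank lines
                freq[name] = freq.get(name, 0) + 1
    return freq


# Demo with a hypothetical file that starts with a BOM.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "items_demo.txt")
    with open(demo_path, "w", encoding="utf-8-sig") as f:
        f.write("Spinach\nPeas\nSpinach\n")
    result = count_items(demo_path)

for name, n in result.items():
    print(f"{name} - {n}")
```

If the real file has no BOM, utf-8-sig behaves like plain utf-8, so the change is harmless in that case.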
