Python: Display two files in 3-column format and count the frequency of shared words

Problem description

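I am trying to write a program that takes two files specified on the command line. Both files should be in 3-column format; only the first column contains the words whose frequencies I need, and the other columns hold other information. I need to get the frequency of every word shared between the two files and add those frequencies to a dictionary.

This is what I have so far (the first 21 lines run without errors, so I think that part is fine; the problem starts when I try to go further and analyze the frequencies):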

import sys
from nltk.corpus import stopwords

count_dicts = []

## get the filenames from the command line
filename1 = sys.argv[1]
filename2 = sys.argv[2]

## open the first file for reading
infile1 = open(filename1, 'r')
## open the second file for reading
infile2 = open(filename2, 'r')

# initialize the counters
line_counter = diff_counter = 0

## for each line in file 1
for line1 in infile1:
    # also read a line from file 2
    line2 = infile2.readline()

#define frequency count function
def Count_Frequency(infile1, infile2):

    #Creating an empty dictionary
    freq1 = {}

    for item in infile1, infile2:
        if (item in freq1):
            freq1[item] += 1
        else:
            #freq1[item] = 1

    #print first frequency dictionary
    for key, value in freq1.items():
        print (key, value)

comb_freq = Counter(infile1, infile2)
print(comb_freq)

Tags: python, dictionary, word-frequency

Solution
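The posted code has a few separate problems. The first for loop reads infile1 to the end, so nothing is left to read afterwards. `for item in infile1, infile2` iterates over a tuple of the two file objects rather than over words. The `else:` branch has only a commented-out body, which is a syntax error. `Counter` is never imported from `collections`, and it accepts at most one positional argument, so `Counter(infile1, infile2)` would fail even with the import.

Below is a minimal sketch of one way to do this, not a definitive implementation. It assumes the columns are separated by whitespace and that "frequency" means how many lines of each file have the word in the first column. The helper names `count_first_column` and `shared_frequencies` and the script name in the usage message are made up for illustration.

```python
import sys
from collections import Counter


def count_first_column(path):
    """Count how often each word occurs in the first column of a file."""
    counts = Counter()
    with open(path, "r", encoding="utf-8") as infile:
        for line in infile:
            columns = line.split()  # assumption: whitespace-separated columns
            if columns:             # skip blank lines
                counts[columns[0]] += 1
    return counts


def shared_frequencies(path1, path2):
    """Return {word: (count in file 1, count in file 2)} for shared words."""
    freq1 = count_first_column(path1)
    freq2 = count_first_column(path2)
    # dict key views support set operations; & keeps words found in both
    shared = freq1.keys() & freq2.keys()
    return {word: (freq1[word], freq2[word]) for word in sorted(shared)}


if __name__ == "__main__":
    if len(sys.argv) == 3:
        result = shared_frequencies(sys.argv[1], sys.argv[2])
        for word, (n1, n2) in result.items():
            print(word, n1, n2)
    else:
        # shared_words.py is a hypothetical script name
        print("usage: python shared_words.py FILE1 FILE2")
```

Each file is opened in its own `with` block and counted independently, so the two files do not need to have the same number of lines. If a single combined count per word is preferred, the tuple `(freq1[word], freq2[word])` can be replaced with `freq1[word] + freq2[word]`.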

