Using a program in another file gives different output

Problem description

I have a fairly unusual problem with my code that I haven't run into before, and I could use some guidance.

Here is an attempt at a short explanation:

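Basically, I have a program made up of many functions that all feed into one main function. It takes data from the files passed to it and produces output based on a number of factors. Running this function inside the file itself gives the correct result; however, if I import the function and run it from main.py, it gives very, very incorrect output.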

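I'll do my best to show only the minimum amount of code in this post, so here is the GitHub. Please use it for further reference and to see what is going on. I don't know of any site I could use to link and run my code for this purpose.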

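sentiment_analysis.py is the file that contains all the functions. main.py is the file that makes use of all this, and driver.py is the file my professor provided for testing this assignment.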

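Basic assignment description (skip this if it isn't needed to answer the question): take Twitter data from the given file, along with keywords that have associated happiness values. Take all the data, split it into timezone regions (approximations based on the given point values, not actual time zones), and then return basic information about the data in the file, i.e. the average happiness, the total number of keyword tweets, and the total number of tweets for each timezone region.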

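Running sentiment_analysis currently gives correct output, based on a lot of testing.
Running main and driver gives incorrect output. For example, tweets2 has 25 lines of Twitter data in total, but using the driver returns a total of 91 tweets and keyword tweets (eastern data, 4th test scenario in driver.py) instead of the 15 total tweets expected for that region.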
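
A count that grows from 15 to 91 is the kind of symptom that state carried over between calls can produce. The following is a minimal sketch of that effect, assuming the region lists (eastern_list and so on) are module-level lists defined outside the functions shown, which the excerpt suggests but does not confirm. `add_eastern` is a hypothetical stand-in, not a function from the original file.

```python
# Hypothetical stand-in for a module-level region list (assumption: the
# original file defines eastern_list at module scope).
eastern_list = []


def add_eastern(tweets):
    """Append tweets to the shared list and report its current size."""
    for tweet in tweets:
        eastern_list.append(tweet)
    return len(eastern_list)


first_call = add_eastern(["a", "b", "c"])
second_call = add_eastern(["a", "b", "c"])  # same input, larger result
print(first_call, second_call)
```

If the module is run on its own and calls the function once, the list is filled only once. A driver that calls it several times within one process would see the counts grow with each call.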

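I've spent about 3 hours testing scenarios and printing out different information to try to debug this, with no luck. If anyone knows why calling it from a different file would return different output, that would be great.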

Below are the three most important functions in the file; the first is the one that gets called from the other file.

def compute_tweets(tweets, keywords):
    try: 
        with open(tweets, encoding="utf-8", errors="ignore") as f: # opens the file 
            tweet_list = f.read().splitlines() # reads and splitlines the file. Gets rid of the \n
            print(tweet_list)
        with open(keywords, encoding="utf-8", errors="ignore") as f:
            keyword_dict = {k: int(v) for line in f for k,v in [line.strip().split(',')]}
        # instead of opening this file normally i am using dictionary comprehension to turn the entire file into a dictionary
        # instead of the standard list which would come from using the readlines() function.
        
        determine_timezone(tweet_list) # this will run the function to split all pieces of the file into region specific ones
        eastern = calculations(keyword_dict, eastern_list)
        central = calculations(keyword_dict, central_list)
        mountain = calculations(keyword_dict, mountain_list)
        pacific = calculations(keyword_dict, pacific_list)

        return final_calculation(eastern, central, mountain, pacific)
        

    except FileNotFoundError as excpt:
        empty_list = [] 
        print(excpt)
        print("One or more of the files you entered does not exist.")
        return empty_list

# Constants for Timezone Detection
    # eastern begin
p1 = [49.189787, -67.444574]
p2 = [24.660845, -67.444574]
    # Central begin, eastern end
p3 = [49.189787, -87.518395]
# p4 = [24.660845, -87.518395]      - Not needed
    # Mountain begin, central end
p5 = [49.189787, -101.998892]
# p6 = [24.660845, -101.998892]     - Not needed
    # Pacific begin, mountain end
p7 = [49.189787, -115.236428]
# p8 = [24.660845, -115.236428]     - Not needed
    # pacific end, still pacific
p9 = [49.189787, -125.242264]
# p10 = [24.660845, -125.242264]

def determine_timezone(tweet_list):
    for index, tweet in enumerate(tweet_list): # takes in index and tweet data and creates a for loop
        long_lat = get_longlat(tweet) # determines the longlat for the tweet that is currently needed to work on
        if float(long_lat[0]) <= float(p1[0]) and float(long_lat[0]) >= float(p2[0]):
            if float(long_lat[1]) <= float(p1[1]) and float(long_lat[1]) > float(p3[1]):
                # this is testing for the eastern region
                eastern_list.append(tweet_list[index])
            elif float(long_lat[1]) <= float(p3[1]) and float(long_lat[1]) > float(p5[1]):
                # testing for the central region
                central_list.append(tweet_list[index])
            elif float(long_lat[1]) <= float(p5[1]) and float(long_lat[1]) > float(p7[1]):
                # testing for mountain region
                mountain_list.append(tweet_list[index])
            elif float(long_lat[1]) <= float(p7[1]) and float(long_lat[1]) >= float(p9[1]):
                # testing for pacific region
                pacific_list.append(tweet_list[index])
            else:
                # if nothing is found, continue to the next element in the tweet data and do nothing
                continue
        else:
            # if the latitude is outside the covered band, then also continue
            continue

def calculations(keyword_dict, tweet_list):
    # - Constants for calculations and returns
    total_tweets = 0
    total_keyword_tweets = 0
    average_happiness = 0
    happiness_sum = 0

    for entry in tweet_list: # saying for each piece of the tweet list
        word_list = input_splitting(entry) # run through the input splitting for list of words
        total_tweets += 1 # add one to total tweets
        keyword_happened_counter = 0 # this is used to know if the word list has already had a keyword tweet. Needs to be
        # reset to 0 again in this spot.
        for word in word_list:  # for each word in that word list 
            for key, value in keyword_dict.items(): # take the key and respective value for each item in the dict
                # print("key:", key, "val:", value)
                if word == key: # if the word we got is the same as the key value
                    if keyword_happened_counter == 0: # and the keyword counter hasn't gone up
                        total_keyword_tweets += 1 # add one to the total keyword tweets
                        keyword_happened_counter += 1 # then add one to keyword happened counter
                    happiness_sum += value # and, if we have a keyword tweet, no matter what add to the happiness sum
                else:
                    continue # if we don't have a word == key, continue iterating.
    if total_keyword_tweets != 0:
        average_happiness = happiness_sum / total_keyword_tweets # calculation for the average happiness value
    else:
        average_happiness = 0
    return [average_happiness, total_keyword_tweets, total_tweets] # returning a list of info in proper order
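
As a side note on the calculation step, the inner loop over every dictionary item can be replaced by a direct lookup. The sketch below is one possible compact form of the same arithmetic, not the original implementation. `summarize` is a hypothetical name, and it assumes each tweet has already been split into a list of words (the job input_splitting does in the original, whose code is not shown).

```python
def summarize(keyword_dict, tweets_as_words):
    """Return [average_happiness, total_keyword_tweets, total_tweets].

    tweets_as_words is an iterable of word lists (assumption: the words
    have already been split and cleaned, as input_splitting would do).
    """
    total_tweets = 0
    total_keyword_tweets = 0
    happiness_sum = 0
    for words in tweets_as_words:
        total_tweets += 1
        # Look each word up directly instead of scanning every dict item.
        scores = [keyword_dict[w] for w in words if w in keyword_dict]
        if scores:
            total_keyword_tweets += 1
            happiness_sum += sum(scores)
    if total_keyword_tweets:
        average = happiness_sum / total_keyword_tweets
    else:
        average = 0
    return [average, total_keyword_tweets, total_tweets]


result = summarize(
    {"happy": 10, "sad": 1},
    [["so", "happy"], ["no", "keywords"], ["happy", "sad"]],
)
print(result)
```

As in the original, every matching word adds to the happiness sum, while a tweet counts as a keyword tweet at most once.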

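I apologize for the wall of text and code. I'm new to posting here and I'm trying to include all the relevant information. If anyone knows a better way than using GitHub and code blocks, please let me know.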

Thanks in advance.

Tags: python, python-3.x

Solution
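
No answer is recorded here. One possible explanation, offered as an assumption because the definitions of eastern_list, central_list, mountain_list and pacific_list are not shown: if those are module-level lists, determine_timezone appends to them on every call and nothing clears them, so a driver that calls compute_tweets several times would keep accumulating tweets from earlier runs. A sketch of one way to avoid that is to build fresh lists on each call. `classify_region` and `split_by_region` are hypothetical names, and points are passed as (lat, lon, tweet) triples rather than going through get_longlat, whose implementation isn't shown.

```python
# Boundary values copied from the question's constants.
LAT_MAX, LAT_MIN = 49.189787, 24.660845
EAST_EDGE, CENTRAL_EDGE = -67.444574, -87.518395
MOUNTAIN_EDGE, PACIFIC_EDGE, WEST_EDGE = -101.998892, -115.236428, -125.242264


def classify_region(lat, lon):
    """Return the region name for a point, or None if it falls outside."""
    if not (LAT_MIN <= lat <= LAT_MAX):
        return None
    if CENTRAL_EDGE < lon <= EAST_EDGE:
        return "eastern"
    if MOUNTAIN_EDGE < lon <= CENTRAL_EDGE:
        return "central"
    if PACIFIC_EDGE < lon <= MOUNTAIN_EDGE:
        return "mountain"
    if WEST_EDGE <= lon <= PACIFIC_EDGE:
        return "pacific"
    return None


def split_by_region(points):
    """Group (lat, lon, tweet) triples into lists created fresh on every call."""
    regions = {"eastern": [], "central": [], "mountain": [], "pacific": []}
    for lat, lon, tweet in points:
        name = classify_region(lat, lon)
        if name is not None:
            regions[name].append(tweet)
    return regions


sample = [
    (40.7, -74.0, "nyc"),
    (41.9, -87.63, "chicago"),
    (60.0, -100.0, "too far north"),
]
first = split_by_region(sample)
second = split_by_region(sample)  # same result, nothing carried over
```

Under this design, compute_tweets would pass the returned lists to calculations instead of reading shared names, so repeated calls from a driver start from empty lists each time. If the module-level lists had to stay, clearing them at the start of compute_tweets would be another option.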

