python - Running the program from another file gives different output
Problem description
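My code has a fairly unusual problem that I haven't run into before, and I could use some guidance.
Here is an attempt at a short explanation:
Basically, I have a program with many functions tied to one main function. It takes data from the files passed to it and produces output based on a number of factors. Running this function within its own file gives the correct result, but if I import the function and run it in main.py, the output is very, very wrong.
I'll try to keep the code in this post to a minimum, so here is the GitHub repository; please use it for further reference and to see what is going on. I don't know of any site where I could link and run my code for this purpose.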
sentiment_analysis.py is the file containing all the functions, main.py is the file that uses them, and driver.py is the file my professor provided for testing this assignment.
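Basic assignment description (skip it if it isn't needed to answer the question): take Twitter data from the given file, along with keywords that have associated happiness values. Split all the data into time zone regions (based on approximations from given coordinate points, not actual time zones), then return basic information about the data in the file, i.e. the average happiness, the total number of keyword tweets, and the total number of tweets for each time zone region.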
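Running sentiment_analysis.py directly currently gives the correct output across a large number of tests.
Running main.py and driver.py gives incorrect output. For example, tweets2 contains 25 lines of Twitter data in total, but the driver returns a total of 91 tweets and keyword tweets (eastern data, the 4th test scenario in driver.py) instead of the expected total of 15 tweets for that region.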
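I've spent about 3 hours testing scenarios and printing different information to try to debug this, with no luck. If anyone knows why calling it from a different file returns different output, that would be great.
Below are the three most important functions in the file; the first one is the function called from the other file.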
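One explanation consistent with these numbers, offered as an assumption since the full code is only on GitHub: compute_tweets and determine_timezone append to eastern_list, central_list, mountain_list and pacific_list but never create them, so they appear to be module-level lists. A module's top-level code runs only once per process (later imports reuse the cached module), so if driver.py calls compute_tweets for several test scenarios, every call keeps appending to the same lists and the counts grow. The minimal reproduction below uses hypothetical names to show the effect:

```python
# Hypothetical stand-ins for a module-level region list and the function
# that fills it; the real ones would live in sentiment_analysis.py.
eastern_list = []  # created once, when the module is first executed


def determine_timezone_buggy(tweets):
    # Appends to the shared module-level list, which is never cleared.
    for tweet in tweets:
        eastern_list.append(tweet)


def count_eastern(tweets):
    determine_timezone_buggy(tweets)
    return len(eastern_list)


first = count_eastern(["t1", "t2", "t3"])
second = count_eastern(["t1", "t2", "t3"])  # same input, larger result
print(first, second)  # prints: 3 6
```

A single call, as when running sentiment_analysis.py on its own for one test, would start from empty lists, which would fit the correct results described above.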
```python
def compute_tweets(tweets, keywords):
    try:
        with open(tweets, encoding="utf-8", errors="ignore") as f:  # opens the file
            tweet_list = f.read().splitlines()  # reads and splits the file into lines, dropping the \n
        print(tweet_list)
        with open(keywords, encoding="utf-8", errors="ignore") as f:
            keyword_dict = {k: int(v) for line in f for k, v in [line.strip().split(',')]}
            # instead of opening this file normally I am using a dictionary comprehension to turn the
            # entire file into a dictionary instead of the list that readlines() would give.
        determine_timezone(tweet_list)  # splits all entries of the file into region-specific lists
        # eastern_list, central_list, mountain_list and pacific_list are not created in this
        # function; they are module-level names not shown in this excerpt.
        eastern = calculations(keyword_dict, eastern_list)
        central = calculations(keyword_dict, central_list)
        mountain = calculations(keyword_dict, mountain_list)
        pacific = calculations(keyword_dict, pacific_list)
        return final_calculation(eastern, central, mountain, pacific)
    except FileNotFoundError as excpt:
        empty_list = []
        print(excpt)
        print("One or more of the files you entered does not exist.")
        return empty_list
```
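The dictionary comprehension in compute_tweets assumes every line of the keywords file is exactly word,value. A blank line (for example an extra empty line at the end of the file) makes line.strip().split(',') return [''], and unpacking that into k, v raises ValueError. If the keyword files might contain such lines (an assumption; the files are not shown), a plain loop is easier to guard. load_keywords is a hypothetical helper name used only for this sketch:

```python
def load_keywords(lines):
    """Build {word: happiness_value} from an iterable of "word,value" lines.

    Accepts an open file object or any iterable of strings; blank lines are skipped.
    """
    keyword_dict = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines instead of failing to unpack them
        word, value = line.split(",")
        keyword_dict[word.strip()] = int(value)  # int() tolerates surrounding spaces
    return keyword_dict


# Usage with a list standing in for an open keywords file:
keywords = load_keywords(["happy,8\n", "\n", "sad, 2\n"])
print(keywords)  # prints: {'happy': 8, 'sad': 2}
```

With a real file this would be called as load_keywords(f) inside the existing with open(...) block.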
```python
# Constants for Timezone Detection
# eastern begin
p1 = [49.189787, -67.444574]
p2 = [24.660845, -67.444574]
# Central begin, eastern end
p3 = [49.189787, -87.518395]
# p4 = [24.660845, -87.518395] - Not needed
# Mountain begin, central end
p5 = [49.189787, -101.998892]
# p6 = [24.660845, -101.998892] - Not needed
# Pacific begin, mountain end
p7 = [49.189787, -115.236428]
# p8 = [24.660845, -115.236428] - Not needed
# pacific end, still pacific
p9 = [49.189787, -125.242264]
# p10 = [24.660845, -125.242264]


def determine_timezone(tweet_list):
    for index, tweet in enumerate(tweet_list):  # takes in index and tweet data and creates a for loop
        long_lat = get_longlat(tweet)  # determines the longlat for the tweet currently being worked on
        if float(long_lat[0]) <= float(p1[0]) and float(long_lat[0]) >= float(p2[0]):
            if float(long_lat[1]) <= float(p1[1]) and float(long_lat[1]) > float(p3[1]):
                # this is testing for the eastern region
                eastern_list.append(tweet_list[index])
            elif float(long_lat[1]) <= float(p3[1]) and float(long_lat[1]) > float(p5[1]):
                # testing for the central region
                central_list.append(tweet_list[index])
            elif float(long_lat[1]) <= float(p5[1]) and float(long_lat[1]) > float(p7[1]):
                # testing for mountain region
                mountain_list.append(tweet_list[index])
            elif float(long_lat[1]) <= float(p7[1]) and float(long_lat[1]) >= float(p9[1]):
                # testing for pacific region
                pacific_list.append(tweet_list[index])
            else:
                # if nothing is found, continue to the next element in the tweet data and do nothing
                continue
        else:
            # if nothing is found for the longitude, then also continue
            continue
```
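A sketch of determine_timezone that builds fresh region lists on every call and returns them, so repeated calls from driver.py could not accumulate state. The boundaries are copied from p1, p2, p3, p5, p7 and p9 above. get_longlat is passed in as a parameter and is assumed to return a (latitude, longitude) pair for one tweet line, as the comparisons in the original imply; the constant names and the fake_longlat parser in the usage part are hypothetical:

```python
# Latitude band from p1/p2 and longitude edges from p1, p3, p5, p7, p9.
NORTH, SOUTH = 49.189787, 24.660845
EAST_EDGE = -67.444574
CENTRAL_EDGE = -87.518395
MOUNTAIN_EDGE = -101.998892
PACIFIC_EDGE = -115.236428
WEST_EDGE = -125.242264


def determine_timezone(tweet_list, get_longlat):
    """Return a new dict of region lists on every call (no module-level state)."""
    regions = {"eastern": [], "central": [], "mountain": [], "pacific": []}
    for tweet in tweet_list:
        lat, lon = (float(x) for x in get_longlat(tweet))
        if not SOUTH <= lat <= NORTH:
            continue  # outside the latitude band: ignore the tweet
        if CENTRAL_EDGE < lon <= EAST_EDGE:
            regions["eastern"].append(tweet)
        elif MOUNTAIN_EDGE < lon <= CENTRAL_EDGE:
            regions["central"].append(tweet)
        elif PACIFIC_EDGE < lon <= MOUNTAIN_EDGE:
            regions["mountain"].append(tweet)
        elif WEST_EDGE <= lon <= PACIFIC_EDGE:
            regions["pacific"].append(tweet)
    return regions


def fake_longlat(tweet):
    # Hypothetical parser for this demo only: tweets are "lat,lon" strings.
    lat, lon = tweet.split(",")
    return lat, lon


tweets = ["40.7,-74.0", "41.9,-87.6", "34.0,-118.2"]
first = determine_timezone(tweets, fake_longlat)
second = determine_timezone(tweets, fake_longlat)  # same result, nothing carried over
print(len(first["eastern"]), len(second["eastern"]))  # prints: 1 1
```

compute_tweets would then call regions = determine_timezone(tweet_list, get_longlat) and pass regions["eastern"] and the other lists to calculations. Clearing the four global lists at the start of compute_tweets would also work, but returning fresh lists keeps each call independent.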
```python
def calculations(keyword_dict, tweet_list):
    # - Constants for calculations and returns
    total_tweets = 0
    total_keyword_tweets = 0
    average_happiness = 0
    happiness_sum = 0
    for entry in tweet_list:  # for each entry in the tweet list
        word_list = input_splitting(entry)  # run through the input splitting for a list of words
        total_tweets += 1  # add one to total tweets
        keyword_happened_counter = 0  # tracks whether this tweet has already counted as a keyword tweet;
        # it must be reset to 0 here for every tweet.
        for word in word_list:  # for each word in that word list
            for key, value in keyword_dict.items():  # take the key and its value for each item in the dict
                # print("key:", key, "val:", value)
                if word == key:  # if the word is the same as the key
                    if keyword_happened_counter == 0:  # and the keyword counter hasn't gone up yet
                        total_keyword_tweets += 1  # add one to the total keyword tweets
                        keyword_happened_counter += 1  # then add one to the keyword happened counter
                    happiness_sum += value  # for every keyword match, add its value to the happiness sum
                else:
                    continue  # if word != key, keep iterating
    if total_keyword_tweets != 0:
        average_happiness = happiness_sum / total_keyword_tweets  # calculation for the average happiness value
    else:
        average_happiness = 0
    return [average_happiness, total_keyword_tweets, total_tweets]  # returning a list of info in the proper order
```
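The inner loop over keyword_dict.items() compares each word against every key, although a dictionary can answer "is this word a keyword?" with a single lookup. A sketch of calculations that uses membership tests instead while keeping the same counting rules (a tweet counts once as a keyword tweet, and every keyword occurrence adds its value). split_words stands in for the question's input_splitting helper, which is not shown; str.split is used here only as a placeholder:

```python
def calculations(keyword_dict, tweet_list, split_words):
    """Return [average_happiness, keyword_tweets, total_tweets] for one region."""
    total_tweets = 0
    keyword_tweets = 0
    happiness_sum = 0
    for entry in tweet_list:
        total_tweets += 1
        # One dict lookup per word replaces the loop over keyword_dict.items().
        values = [keyword_dict[w] for w in split_words(entry) if w in keyword_dict]
        if values:
            keyword_tweets += 1  # counted once per tweet, however many keywords it has
            happiness_sum += sum(values)
    average = happiness_sum / keyword_tweets if keyword_tweets else 0
    return [average, keyword_tweets, total_tweets]


keywords = {"happy": 8, "sad": 2}
tweets = ["so happy today", "sad sad day", "nothing here"]
result = calculations(keywords, tweets, str.split)
print(result)  # prints: [6.0, 2, 3]
```

Because dictionary keys are unique, each word matches at most one key in the original loop as well, so the totals should be the same; only the number of comparisons changes.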
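Sorry for the wall of text and code. I'm new to posting here and I'm trying to include all the relevant information... If anyone knows a better way than using GitHub and code blocks, please let me know.
Thanks in advance.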