Python nested for loop runs only once
Problem description
I'm having a problem with a nested for loop (`for doc in query`) that runs only once. It sits inside `for item in news_items`, which I have verified iterates 10 times, and the `for doc in query` loop should iterate 9 times. When I print `doc`, it prints 9 documents, but when I add an if/else check on each document's content, the check only runs during one pass of the outer loop. I would expect 9 x 10 outputs, since every item from the parent loop should be compared against every doc in `query`, but I only get 9. I searched Stack Overflow and found nothing relevant. In the other programming languages I work with I don't see why this wouldn't work, but maybe I'm missing something, since I'm fairly new to Python (1 week).
```python
# Imports used below; `driver` (a Selenium WebDriver) and `db` (a Firestore client)
# are assumed to be created elsewhere in the script.
from datetime import date

from google.cloud import firestore
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait


def scrape(url):
    # GET DATE AT THE TIME OF CRAWL START
    today = date.today()
    d1 = today.strftime("%d/%m/%Y")
    # D2 is used for query only
    d2 = today.strftime("%Y%m%d")
    # LOAD URL IN DRIVER
    driver.get(url)
    try:
        news_container = WebDriverWait(driver, 10).until(
            EC.presence_of_element_located((By.CLASS_NAME, "FlashNews-Box-Root"))
        )
        # array of items
        news_items = news_container.find_elements_by_class_name("FlashNews-Box-Item")
        refresher_ref = db.collection(u'news').document('sources').collection('refresher_news')
        # query for last article
        query = refresher_ref.order_by(u'article_timestamp', direction=firestore.Query.DESCENDING).limit(10).stream()
        for item in news_items:
            print("News items found: " + str(len(news_items)))
            try:
                # image is optional so we need to try it
                try:
                    item_image = item.find_element_by_class_name("FlashNews-Box-ItemImage").find_element_by_tag_name(
                        "img").get_attribute("src")
                except Exception as e:
                    item_image = "unavailable"
                # time will be added to the same day as when this was ran, since this will run often and compare
                # article texts, we won't have issue with wrong dates
                item_time = item.find_element_by_class_name("FlashNews-Box-ItemTime").text + " " + d1
                item_time_query_temp = item.find_element_by_class_name("FlashNews-Box-ItemTime").text.replace(":", "")
                # normalize timestamp for sorting
                if len(item_time_query_temp) == 3:
                    item_time_query_temp = "0" + item_time_query_temp
                item_time_query = d2 + item_time_query_temp
                item_text = item.find_element_by_class_name("FlashNews-Box-ItemText").text
                item_redirect = item.find_element_by_class_name("FlashNews-Box-ItemText").find_element_by_tag_name(
                    "a").get_attribute("href")
                result = {"article_time": item_time, "article_url": item_redirect, "article_image": item_image,
                          "article_text": item_text, "article_timestamp": item_time_query}
                # print(result)
                # save data to firestore - check for last item in firestore, then add this article
                is_new = True
                print("Printing 10x")
                # THIS EXECUTES ONLY ONCE?
                for doc in query:
                    # print(str(len(query)))
                    current_doc = doc.to_dict()
                    # print(current_doc)
                    # print("Iteration: " + current_doc['article_text'])
                    # print("Old: " + current_doc["article_text"] + " New: " + item_text)
                    if current_doc['article_text'] == item_text:
                        print("Match")
                        # print(current_doc['article_text'] + item_text)
                        # print("Old: " + current_doc['article_text'] + " New: " + item_text)
                    else:
                        print("Mismatch")
                        # print(current_doc['article_text'] + item_text)
                        # print("Skipping article as the text exists in last 10")
                    # else:
                    #     print("Old: " + current_doc['article_text'] + " New: " + item_text)
                # print(str(is_new))
                # if is_new:
                #     refresher_ref.add(result)
                #     print("Adding document")
            except Exception as e:
                print(e)
    except Exception as e:
        # HANDLE ERRORS
        print(e)
    print("Completed running.")
    # quit driver at the end of function run
    driver.quit()
```
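The symptom can be reproduced without Selenium or Firestore. This minimal sketch assumes that `.stream()` behaves like a Python generator; `fake_stream` and `count_inner_iterations` are hypothetical names, with `fake_stream` yielding 9 documents and the outer loop running 10 times like `news_items`.

```python
def fake_stream(n=9):
    """Hypothetical stand-in for Query.stream(): a one-shot generator of documents."""
    for i in range(n):
        yield {"article_text": f"article {i}"}


def count_inner_iterations(outer=10):
    """Count how often the inner loop body runs when the stream is created once."""
    query = fake_stream()  # created once, before the outer loop, as in scrape()
    count = 0
    for _ in range(outer):
        for doc in query:  # only the first outer pass receives any documents
            count += 1
    return count


print(count_inner_iterations())  # 9, not the expected 90
```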
Solution
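`query` isn't a list. `.stream()` returns a one-shot iterator of document snapshots (similar to a generator), so its contents can be consumed only once. The first pass of the outer loop exhausts it; on every later pass `for doc in query` finds nothing left and its body never runs, which is why you see 9 outputs instead of 90. To iterate over the documents on every pass of the outer loop, convert the result to a list once so that the contents are held in memory. For example,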
```python
# query for last article
query = refresher_ref.order_by(u'article_timestamp', direction=firestore.Query.DESCENDING).limit(10).stream()
query = list(query)
for item in news_items:
    ...
```
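The fix can be checked in isolation as well. This sketch again assumes a generator-like `stream()`; `fake_stream`, `count_with_list` and `is_new_article` are hypothetical names. Materializing the stream with `list()` lets the inner loop run on every outer pass. The commented-out code in the question appears to aim at skipping articles whose text already exists among the last stored ones, so the sketch also builds a set of existing texts once, turning each check into a single membership test instead of a nested loop. With real Firestore snapshots the texts would come from `doc.to_dict()['article_text']`, as in the question.

```python
def fake_stream(n=9):
    """Hypothetical stand-in for Query.stream(): a one-shot generator of documents."""
    for i in range(n):
        yield {"article_text": f"article {i}"}


def count_with_list(outer=10):
    """Count inner-loop iterations when the stream is materialized into a list first."""
    docs = list(fake_stream())  # materialize once; a list can be iterated repeatedly
    count = 0
    for _ in range(outer):
        for doc in docs:
            count += 1
    return count


def is_new_article(item_text, existing_texts):
    """Hypothetical helper: True when no stored article has the same text."""
    return item_text not in existing_texts


# Consume the stream once and keep only the texts, as a set for fast lookups.
existing_texts = {doc["article_text"] for doc in fake_stream()}

print(count_with_list())                                # 90
print(is_new_article("article 3", existing_texts))      # False
print(is_new_article("breaking news", existing_texts))  # True
```

A set fits here because the check only asks whether a text is already present; if the surrounding logic also needs the full documents, keeping the list alongside the set is an option.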