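python - Script is producing duplicate output
Problem description
I have a scraping bot that currently checks for changes to a button on two websites. All of the code is on GitHub, but here are the relevant parts.
This is the function that prints the output: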
def notify_difference(card, original_text):
    print("#######################################")
    print(f" {card.get_model()} STOCK ALERT ")
    print(f" {time.ctime()}")
    print(f"Button has changed from {original_text} to {card.get_button_text()} for {card.get_name()}.")
    if "newegg" in card.get_url():
        print(
            f"Add it to your cart: https://secure.newegg.com/Shopping/AddToCart.aspx?ItemList={card.get_item_id()}&Submit=ADD&target=NEWEGGCART\n\n")
    print(f"Current price: {card.get_price()}.")
    print(f"Please visit {card.get_url()} for more information.")
    print("#######################################")
    print("")
    print("")
This is the function that creates the request tasks:
async def get_stock():
    # Get the current time and append to the end of the url just to add some minor difference
    # between scrapes.
    t = int(round(time.time() * 1000))
    urls = {
        "..."
    }
    s = AsyncHTMLSession()
    tasks = (parse_url(s, url.split("-=")[1], url.split("-=")[0]) for url in urls)
    return await asyncio.gather(*tasks)
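Because asyncio.gather runs every parse_url call concurrently against the single shared card_set, the outcome of one pass can depend on which page finishes last if the same item ID appears on more than one of the scraped pages. Whether that happens here is an assumption, since the URL list is elided. The sketch below is a hypothetical stand-in (fake_parse and one_pass are not part of the bot): each "page" is a list of (item_id, button_text) pairs, and a random sleep replaces the network request.

```python
import asyncio
import random


async def fake_parse(page, card_set, changes):
    # Hypothetical stand-in for parse_url: a random delay replaces the HTTP
    # request, so the pages complete in an unpredictable order.
    await asyncio.sleep(random.random() / 100)
    for item_id, text in page:
        previous = card_set.get(item_id)
        if previous is not None and previous != text:
            changes.append((item_id, previous, text))
        # Last writer wins, just like card_set[card_id] = card.
        card_set[item_id] = text


async def one_pass(pages, card_set):
    # Run all "pages" concurrently against the same dict, as get_stock does.
    changes = []
    await asyncio.gather(*(fake_parse(page, card_set, changes) for page in pages))
    return changes


if __name__ == "__main__":
    card_set = {"N82E16814137601": "Sold Out"}
    pages = [[("N82E16814137601", "Sold Out")],
             [("N82E16814137601", "Add to cart")]]
    print(asyncio.run(one_pass(pages, card_set)), card_set)
```

Starting from "Sold Out", with one page reporting "Sold Out" and another reporting "Add to cart", the recorded changes include a transition to "Add to cart" whichever page finishes first, while the value left in card_set can differ from run to run.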
This is the code that fetches a URL and calls the class that parses the HTML:
async def parse_url(s, url, model):
    # Narrow HTML search down using HTML class selectors.
    r = await s.get(url)
    cards = r.html.find('.right-column')
    for item in cards:
        card = Card.create(item, model)
        if card is not None:
            card_id = card.get_item_id()
            if card_id in card_set.keys():
                if card_set[card_id].get_button_text() != card.get_button_text():
                    original_text = card_set[card_id].get_button_text()
                    if card.is_in_stock():
                        notify_difference(card, original_text)
            card_set[card_id] = card
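Note that parse_url only prints when the new state is in stock; a change back to "Sold Out" updates card_set without any output. If the listing flips between the two states across passes, every flip back into stock prints an alert that looks identical to the previous one. The hypothetical replay function below mimics that branch on a list of button texts seen on successive passes. It treats "Add to cart" as in stock, which is a simplifying assumption standing in for card.is_in_stock(), and returns the alerts that would be printed alongside the changes that would go unreported.

```python
def replay(observations, in_stock_text="Add to cart"):
    """Replay button texts seen on successive passes through parse_url.

    Returns (alerts, silent): alerts are changes into the in-stock text,
    which notify_difference would print; silent are all other changes,
    which only update card_set.
    """
    alerts, silent = [], []
    previous = None
    for i, text in enumerate(observations):
        if previous is not None and previous != text:
            change = (i, previous, text)
            # Assumption: "in stock" is equivalent to showing in_stock_text.
            if text == in_stock_text:
                alerts.append(change)
            else:
                silent.append(change)
        previous = text
    return alerts, silent


if __name__ == "__main__":
    seen = ["Sold Out", "Add to cart", "Sold Out", "Sold Out",
            "Add to cart", "Sold Out", "Add to cart"]
    alerts, silent = replay(seen)
    print("printed:", alerts)
    print("not printed:", silent)
```

Logging the silent changes as well (for example with a one-line print in an else branch) would show whether the three alerts in the sample output were separated by unreported changes back to "Sold Out".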
It all starts from __main__ here:
if __name__ == '__main__':
    print(f"{time.ctime()} ::: Checking Stock...")
    Util.clear_card_shelf()
    while True:
        card_set = Util.get_card_dict()
        try:
            asyncio.run(get_stock())
        except Exception as e:
            if "SSLError" in type(e).__name__:
                # SSL Error. Wait 8-15 seconds and try again.
                print(f"{time.ctime()} ::: {type(e).__name__} error. Retrying in 8-15 seconds...")
            else:
                print(f"{type(e).__name__} Exception: {str(e)}")
        Util.set_card_shelf(card_set)
        time.sleep(random.randint(8, 15))
Now look at this sample output and note the timestamps. The duplicates appear on later passes through the loop:
#######################################
3070 STOCK ALERT
Wed Nov 18 11:38:10 2020
Button has changed from Sold Out to Add to cart for MSI GeForce RTX 3070 DirectX 12 RTX 3070 VENTUS 3X OC 8GB 256-Bit GDDR6 PCI Express 4.0 HDCP Ready Video Card.
Add it to your cart: https://secure.newegg.com/Shopping/AddToCart.aspx?ItemList=N82E16814137601&Submit=ADD&target=NEWEGGCART
Current price: $549.99.
Please visit https://www.newegg.com/msi-geforce-rtx-3070-rtx-3070-ventus-3x-oc/p/N82E16814137601 for more information.
#######################################
#######################################
3070 STOCK ALERT
Wed Nov 18 11:40:12 2020
Button has changed from Sold Out to Add to cart for MSI GeForce RTX 3070 DirectX 12 RTX 3070 VENTUS 3X OC 8GB 256-Bit GDDR6 PCI Express 4.0 HDCP Ready Video Card.
Add it to your cart: https://secure.newegg.com/Shopping/AddToCart.aspx?ItemList=N82E16814137601&Submit=ADD&target=NEWEGGCART
Current price: $549.99.
Please visit https://www.newegg.com/msi-geforce-rtx-3070-rtx-3070-ventus-3x-oc/p/N82E16814137601 for more information.
#######################################
#######################################
3070 STOCK ALERT
Wed Nov 18 11:40:50 2020
Button has changed from Sold Out to Add to cart for MSI GeForce RTX 3070 DirectX 12 RTX 3070 VENTUS 3X OC 8GB 256-Bit GDDR6 PCI Express 4.0 HDCP Ready Video Card.
Add it to your cart: https://secure.newegg.com/Shopping/AddToCart.aspx?ItemList=N82E16814137601&Submit=ADD&target=NEWEGGCART
Current price: $549.99.
Please visit https://www.newegg.com/msi-geforce-rtx-3070-rtx-3070-ventus-3x-oc/p/N82E16814137601 for more information.
#######################################
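I can't for the life of me figure out why the output is duplicated. It happens every time. Is it a problem with the parallel requests? Is something wrong with shelve? Or is it something else entirely?
Help!
Additional code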
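If the repeats turn out to be genuine flip-flops and one alert per item within some window is enough, a small in-memory throttle is one option. AlertThrottle, should_alert and the 600-second default cooldown are hypothetical names and values, not part of the bot. Created once at module level (outside the while loop), it persists between passes without going through shelve.

```python
import time


class AlertThrottle:
    """Hypothetical helper: remember when each item last triggered an alert
    and suppress repeats that fall inside a cooldown window."""

    def __init__(self, cooldown_seconds=600, clock=time.monotonic):
        self.cooldown = cooldown_seconds
        # time.monotonic is unaffected by wall-clock adjustments; the clock
        # is injectable so the behaviour can be tested without sleeping.
        self.clock = clock
        self._last_alert = {}

    def should_alert(self, item_id):
        now = self.clock()
        last = self._last_alert.get(item_id)
        if last is not None and now - last < self.cooldown:
            # Suppressed calls do not reset the window.
            return False
        self._last_alert[item_id] = now
        return True
```

In parse_url the check could then read: if card.is_in_stock() and throttle.should_alert(card_id): notify_difference(card, original_text), with throttle = AlertThrottle() defined next to the other module-level state.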
Here are some of the utility functions referenced above:
import glob
import shelve
from os import path, remove


def get_card_dict():
    s = shelve.open('cards')
    stocks = s.items()
    stock_dict = convert_tuple_to_dict(stocks)
    s.close()
    return stock_dict


def set_card_shelf(dic):
    s = shelve.open('cards')
    s.update(dic)
    s.close()


def clear_card_shelf():
    if path.exists(f"cards.dat"):
        card_dat_list = glob.glob(f"cards.*")
        for card_dat in card_dat_list:
            remove(card_dat)


def convert_tuple_to_dict(tup):
    dic = {}
    for a, b in tup:
        dic.setdefault(a, b)
    return dic
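To rule shelve in or out, the round trip that set_card_shelf and get_card_dict perform can be exercised on its own. This sketch uses plain strings in place of pickled Card objects (an assumption: Card instances pickle and unpickle without losing their button text) and a temporary directory instead of the working directory. It also opens the shelf once with flag="n", which always creates a new, empty database and could stand in for the glob-and-delete in clear_card_shelf. round_trip is a hypothetical helper name.

```python
import os
import shelve
import tempfile


def round_trip(states):
    """Write each dict in `states` the way set_card_shelf does, read it back
    the way get_card_dict does, and return what was read after each write."""
    snapshots = []
    with tempfile.TemporaryDirectory() as tmp:
        name = os.path.join(tmp, "cards")
        # flag="n": always create a new, empty database.
        with shelve.open(name, flag="n"):
            pass
        for state in states:
            with shelve.open(name) as s:  # mirrors set_card_shelf
                s.update(state)
            with shelve.open(name) as s:  # mirrors get_card_dict
                snapshots.append(dict(s.items()))
    return snapshots


if __name__ == "__main__":
    print(round_trip([{"N82E16814137601": "Sold Out"},
                      {"N82E16814137601": "Add to cart"}]))
```

If each snapshot matches what was written, the persistence step itself is not producing the repeats. dict(s.items()) also does the same job as convert_tuple_to_dict here, because shelf keys are unique.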