Python scraping exits at sleep and miscounts rows

Problem description

I want to run the following program every 10 minutes, but after 10 minutes the program exits with "code = 0". I also want to add an ID to every row except the header written by writeheader. This works, but for subsequent entries it counts all lines, including the header.

How can I make this program run every 10 minutes? And how can I count only the data rows, excluding the writeheader row?

I'm very new to Python, so this is most likely not a major problem.

import time
import smtplib
import lxml
import requests
from bs4 import BeautifulSoup
import csv
import re
from datetime import datetime
import os.path


URL = 'https://stockx.com/de-de/supreme-jostens-world-famous-champion-ring-gold'

INTERVAL = 10

headers = {
    "User-Agent": 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.79 Safari/537.36'}

page = requests.get(URL, headers=headers)
soup = BeautifulSoup(page.content, "lxml")
search_for_class = soup.find_all(
    'div', class_="sale-value")
print(search_for_class)

x = str(soup.find('div', class_="sale-value"))
preis = str(re.sub("<.*?>", "", x))

print(preis)

datum = datetime.now().strftime('%Y-%m-%d')
uhrzeit = datetime.now().strftime('%H:%M:%S')
print(datum)
print(uhrzeit)

file_exists = os.path.isfile("test_file.csv")
if not file_exists:
    open('test_file.csv', 'w+')

with open('test_file.csv', 'r+') as csv_file:
    fieldnames = ['ID', 'Datum', 'Uhrzeit', 'Preis']
    writer = csv.DictWriter(
        csv_file, fieldnames=fieldnames, lineterminator='\n')
    entry = len(csv_file.readlines()) + 1

    # csv_file.write("\n")
    if not file_exists:
        writer.writeheader()
    writer.writerow({
        'ID': entry,
        'Datum': datum,
        'Uhrzeit': uhrzeit,
        'Preis': preis
    })
    csv_file.close()

time.sleep(INTERVAL * 60)

Tags: python, python-3.x

Solution


To make it run every 10 minutes, put a loop around everything that should run repeatedly.

To make the count correct, write the header row inside the `if not file_exists:` block when the file is first created. That way the `entry` count always includes the header row, so the IDs come out right.

As a simplification, you can open the file once, before the loop, and then increment `entry` on each pass through the loop. Use `csv_file.flush()` to make sure each new row is actually written out to the file.
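The counting behaviour can be checked in isolation with a small standard-library sketch (the temporary file path and the sample values here are assumptions for illustration, not part of the original script):

```python
import csv
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "test_file.csv")

# Write the header once, when the file is first created.
if not os.path.isfile(path):
    with open(path, "w") as f:
        f.write("ID,Datum,Uhrzeit,Preis\n")

with open(path, "r+") as csv_file:
    fieldnames = ["ID", "Datum", "Uhrzeit", "Preis"]
    writer = csv.DictWriter(csv_file, fieldnames=fieldnames, lineterminator="\n")
    # readlines() also counts the header line, so the first data row gets ID 1.
    entry = len(csv_file.readlines())
    for preis in ("100 EUR", "110 EUR"):
        writer.writerow({"ID": entry, "Datum": "2020-01-01",
                         "Uhrzeit": "12:00:00", "Preis": preis})
        csv_file.flush()  # push the row to disk before the next sleep
        entry += 1

with open(path) as f:
    print(f.read().splitlines())
```

Because the header is guaranteed to exist before `readlines()` runs, the two data rows get IDs 1 and 2 rather than starting at 0 or double-counting the header.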

import time
import smtplib
import lxml
import requests
from bs4 import BeautifulSoup
import csv
import re
from datetime import datetime
import os.path

URL = 'https://stockx.com/de-de/supreme-jostens-world-famous-champion-ring-gold'
INTERVAL = 10

headers = {
    "User-Agent": 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.79 Safari/537.36'}

file_exists = os.path.isfile("test_file.csv")
if not file_exists:
    with open('test_file.csv', 'w+') as f:
        f.write('ID,Datum,Uhrzeit,Preis\n')

with open('test_file.csv', 'r+') as csv_file:
    fieldnames = ['ID', 'Datum', 'Uhrzeit', 'Preis']
    writer = csv.DictWriter(
        csv_file, fieldnames=fieldnames, lineterminator='\n')
    entry = len(csv_file.readlines())

    while True:
        page = requests.get(URL, headers=headers)
        soup = BeautifulSoup(page.content, "lxml")
        search_for_class = soup.find_all(
            'div', class_="sale-value")
        print(search_for_class)

        x = str(soup.find('div', class_="sale-value"))
        preis = str(re.sub("<.*?>", "", x))

        print(preis)

        datum = datetime.now().strftime('%Y-%m-%d')
        uhrzeit = datetime.now().strftime('%H:%M:%S')
        print(datum)
        print(uhrzeit)

        writer.writerow({
            'ID': entry,
            'Datum': datum,
            'Uhrzeit': uhrzeit,
            'Preis': preis
        })
        csv_file.flush()
        entry += 1

        time.sleep(INTERVAL * 60)

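One caveat the answer does not mention: `time.sleep(INTERVAL * 60)` sleeps for ten minutes *after* each scrape finishes, so the time the request itself takes makes the schedule drift a little each iteration. If that matters, a small helper can keep runs on a fixed grid; `next_delay` is a hypothetical name, not part of the script above:

```python
import time


def next_delay(start, now, interval_s):
    # Seconds to sleep so runs stay on a fixed grid anchored at `start`,
    # regardless of how long the scrape itself took.
    elapsed = now - start
    return interval_s - (elapsed % interval_s)


# If the loop started at t=0 and the scrape finished at t=605 s,
# sleep 595 s so the next run lands exactly on the 1200 s mark.
print(next_delay(0, 605, 600))  # → 595
```

Inside the loop you would record `start = time.monotonic()` once before the `while True:` and replace the sleep with `time.sleep(next_delay(start, time.monotonic(), INTERVAL * 60))`.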