Requests through an opened URL

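Problem description

I'm thoroughly confused by all the posts about chained URL requests, and I haven't been able to fix this on my own. I'm trying to get some information from a web page, and then open a further "a href" link found on it, where more of the information I want is stored.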

from bs4 import BeautifulSoup
import requests
from csv import reader, writer, DictWriter, DictReader

source = requests.get("http://www.bda-ieo.it/test/Group.aspx?Lan=Ita")
soup = BeautifulSoup(source.text, "html.parser")


titolo_sezione = ""
table_row = ""
with open("genere.txt", "w", newline="") as txt_file:
    headers = ["GRUPPO MERCEOLOGICO", "CODICE MERCEOLOGICO", "ALIMENTO"]
    csv_writer = DictWriter(txt_file, fieldnames=headers, delimiter=';')
    csv_writer.writeheader()

for table_row in soup.find("table", id="tblResult").find_all("tr"):
    className = ""
    if table_row.get("class"):
        className = table_row.get("class").pop()

        if className == "testobold":
            titolo_sezione = table_row.text

        if className == "testonormale":
            for cds in table_row.find_all("td"):
                url = cds.get("a")

                urls = requests.get("http://www.bda-ieo.it/test/Groupfood.aspx?Lan=Ita + url")
                dage = BeautifulSoup(urls.text, "html.parser")


                alimenti = ""
                for alimenti in dage:
                    id_alimento, destra = alimenti.find_all("td")
                    codice = id_alimento.text
                    nome = destra.text
                    href = destra.a.get("href")

                print(f'{titolo_sezione}; {id_alimento.text}; {nome.text}')

The variable urls doesn't open any other page. Could someone help me make sense of this? I'm stuck on it.

Thanks

Tags: url, beautifulsoup, python-requests

Solution


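You need to rework some of the logic and read up a bit on string formatting. I've commented the places where I made changes. I'm not sure exactly what you want as output, but this should get you moving.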
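On the string-formatting point, here is a minimal sketch of why the original request never reached a per-group page: the `+ url` sat inside the quotes, so it was sent as literal text instead of being concatenated. The `href` value is a made-up placeholder, not one taken from the site, and `urljoin` from the standard library is shown only as one possible alternative to `%s` formatting.

```python
from urllib.parse import urljoin

BASE = "http://www.bda-ieo.it/test/"

# Hypothetical href, standing in for whatever the <a> tag actually holds.
href = "Groupfood.aspx?Lan=Ita&grp=01"

# Original mistake: the variable name is part of the string literal,
# so every request goes to the same (wrong) address.
url = href
broken = "http://www.bda-ieo.it/test/Groupfood.aspx?Lan=Ita + url"

# Three equivalent ways to actually insert the href into the address.
fixed_percent = "http://www.bda-ieo.it/test/%s" % href
fixed_fstring = f"{BASE}{href}"
fixed_join = urljoin(BASE, href)  # resolves href relative to BASE

print(broken)
print(fixed_join)
```

`urljoin` has the side benefit of also handling hrefs that are already absolute, but any of the three forms works when the href is a plain relative file name like the placeholder here.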

from bs4 import BeautifulSoup
import requests
from csv import reader, writer, DictWriter, DictReader

source = requests.get("http://www.bda-ieo.it/test/Group.aspx?Lan=Ita")
soup = BeautifulSoup(source.text, "html.parser")


titolo_sezione = ""
table_row = ""
with open("c:/test/genere.txt", "w", newline="") as txt_file:
    headers = ["GRUPPO MERCEOLOGICO", "CODICE MERCEOLOGICO", "ALIMENTO"]
    csv_writer = DictWriter(txt_file, fieldnames=headers, delimiter=';')
    csv_writer.writeheader()

for table_row in soup.find("table", id="tblResult").find_all("tr"):
    className = ""
    if table_row.get("class"):
        className = table_row.get("class").pop()

        if className == "testobold":
            titolo_sezione = table_row.text

        if className == "testonormale":
            for cds in table_row.find_all("a", href=True): #<-- the hrefs are in the <a> tags within the <td> tags. So you need to find <a> tags that have href
                url = cds['href'] #<--- get the href

                urls = requests.get("http://www.bda-ieo.it/test/%s" %url) #<--- use that stored string to put into the new url you'll be using
                dage = BeautifulSoup(urls.text, "html.parser") #<-- create BeautifulSoup object with that response
                dageTbl = dage.find("table", id="tblResult") #<--- find the table in this html now 
                if dageTbl:   #<--- if there is that table
                    for alimenti in dageTbl.find_all('tr', {'class':'testonormale'}): #<--- find the rows with the specific class
                        id_alimento, destra = alimenti.find_all("td") 
                        codice = id_alimento.text
                        nome = destra.text.strip() #<--- added strip() to remove whitespace
                        href = destra.a.get("href")

                        print(f'{titolo_sezione}; {codice}; {nome}') #<--- fixed string formatting here too

Output:

PATATE; 381; PATATE
PATATE; 50399; PATATE DOLCI
PATATE; 380; PATATE NOVELLE
PATATE; 3002; PATATE, FECOLA
PATATE; 100219; PATATE, POLVERE ISTANTANEA
PATATE; 382; PATATINE IN SACCHETTO
PATATE; 18; TAPIOCA
VEGETALI; 303; ASPARAGI DI BOSCO
VEGETALI; 304; ASPARAGI DI CAMPO
VEGETALI; 305; ASPARAGI DI SERRA
VEGETALI; 700484; ASPARAGI IN SCATOLA
VEGETALI; 8035; GERMOGLI DI ERBA MEDICA
...
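Note that the script only prints its results: the `with open(...)` block ends right after `writeheader()`, so the file contains the header line and nothing else. If the goal is a `;`-delimited file, one possible approach is to write each row while the file is still open. The sketch below is an assumption about the desired output, not part of the original answer; `write_rows` is a hypothetical helper, and the sample rows are copied from the output above.

```python
import io
from csv import DictWriter

HEADERS = ["GRUPPO MERCEOLOGICO", "CODICE MERCEOLOGICO", "ALIMENTO"]


def write_rows(txt_file, rows):
    """Write (gruppo, codice, alimento) tuples to an open text file.

    Hypothetical helper: the header and all rows are written through
    the same DictWriter, while the file object is still open.
    """
    csv_writer = DictWriter(txt_file, fieldnames=HEADERS, delimiter=";")
    csv_writer.writeheader()
    for row in rows:
        # Pair each value with its column name, in header order.
        csv_writer.writerow(dict(zip(HEADERS, row)))


# Sample rows taken from the printed output; in the real script these
# would be collected inside the scraping loop instead of printed.
rows = [
    ("PATATE", "381", "PATATE"),
    ("PATATE", "50399", "PATATE DOLCI"),
]

# An in-memory buffer keeps the demo free of side effects.
buffer = io.StringIO()
write_rows(buffer, rows)
print(buffer.getvalue())
```

With a real file, the call would go inside `with open("genere.txt", "w", newline="") as txt_file:` so that every row is written before the file closes.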
