Python web scraping problem: the CSV first contains the classes, then the information

Problem description

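I want to get information from a website by web scraping with Python (I'm learning it right now), but the CSV first contains the classes (the elements I take the information from) and only after them the information I want. I have watched YouTube videos several times and written the same code, but their output did not have the problem I am running into. Can anyone help me?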

Here is a screenshot of the CSV showing what it looks like after I click "Run".

Code:

import requests
from bs4 import BeautifulSoup
import csv
from itertools import zip_longest

Job_titles = []
Company_names = []
Location_names = []
Job_skills = []
Links = []
result = requests.get("https://wuzzuf.net/search/jobs/?q=python&a=hpb")
src = result.content
soup = BeautifulSoup(src, "lxml")
Job_titles = soup.find_all('h2', {"class":"css-m604qf"})
Company_names = soup.find_all('a', {"class":"css-17s97q8"})
Location_names = soup.find_all('span', {"class":"css-5wys0k"})
Job_skills = soup.find_all("div", {'class':"css-y4udm8"})

for i in range(len(Company_names)):
    Job_titles.append(Job_titles[i].text)
    Company_names.append(Company_names[i].text)
    Location_names.append(Location_names[i].text)
    Job_skills.append(Job_skills[i].text)

file_list = [Job_titles, Company_names, Location_names, Job_skills,]
exported = zip_longest(*file_list)
with open("C:/Users/Saleh saleh/Documents/jobtest.csv", "w") as myfile:
    wr = csv.writer(myfile)
    wr.writerow(["Job titles", "Company names", "Location", "Skills", "Links"])
    wr.writerows(exported)
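The cause is in the question's code itself: `Job_titles = soup.find_all(...)` rebinds `Job_titles` to the list of matched tags, so the empty list created at the top is discarded. The loop then appends each tag's `.text` to that same list, which ends up holding all the tags first and all the strings after them. The same happens to the other three lists, and `zip_longest` turns this into CSV rows of raw HTML (where the `class="..."` attributes are visible) followed by rows of text. The sketch below reproduces this with plain Python so it runs without network access or bs4; `FakeTag`, `buggy_titles` and `fixed_titles` are hypothetical names, and `FakeTag` models only the `.text` attribute and the HTML string of a real BeautifulSoup `Tag`.

```python
class FakeTag:
    """Hypothetical stand-in for a BeautifulSoup Tag: only .text is modeled."""

    def __init__(self, text):
        self.text = text

    def __str__(self):
        # csv.writer calls str() on each cell; for a real Tag that is its HTML
        return f'<h2 class="css-m604qf">{self.text}</h2>'


def buggy_titles(found):
    """Mirrors the question's loop for one column."""
    job_titles = []           # this empty list is thrown away on the next line
    job_titles = found        # now the same list object as the find_all() result
    for i in range(len(found)):   # range() is evaluated once, before appending
        job_titles.append(job_titles[i].text)
    return job_titles         # tags first, then their texts


def fixed_titles(found):
    """Keep the matched tags and the extracted strings in separate lists."""
    return [tag.text for tag in found]


def make_tags():
    # Fresh list each call, because buggy_titles() mutates its argument
    return [FakeTag("Python Developer"), FakeTag("Data Engineer")]


print([str(cell) for cell in buggy_titles(make_tags())])
print(fixed_titles(make_tags()))
```

In the question's code the same fix means giving the `find_all()` results their own names (for example a hypothetical `title_tags`) and appending `tag.text` to the original empty lists, instead of reusing one name for both.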

Tags: python, csv, web-scraping

Solution


To get the information from the site, you can use the following example:

import csv
import requests
from bs4 import BeautifulSoup


url = "https://wuzzuf.net/search/jobs/?q=python&a=hpb"
soup = BeautifulSoup(requests.get(url).content, "html.parser")

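# newline="" stops csv from producing blank rows on Windows; utf-8 keeps non-ASCII text intact
with open("data.csv", "w", newline="", encoding="utf-8") as f_out:
    writer = csv.writer(f_out)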
    writer.writerow(
        ["Job titles", "Company names", "Location", "Skills", "Links"]
    )

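    # Each job title is a link inside an <h2>; the company, location and
    # skills are the next matching elements that follow it in the page.
    for title in soup.select("h2 > a"):
        company_name = title.find_next("a")
        location = company_name.find_next("span")
        # first following <div> that has no class attribute (holds the skills)
        info = location.find_next("div", {"class": None})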

        writer.writerow(
            [
                title.text,
                company_name.text,
                location.text,
                # skills are links separated by "·"; join them with commas
                ",".join(
                    a.text.replace("·", "").strip() for a in info.select("a")
                ),
                title["href"],
            ]
        )
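The answer avoids the problem by writing one row per job while walking the page. If you would rather keep the question's structure of one list per column, the lists must contain only strings, and the header should name only the columns you actually fill (the question's header includes "Links", but no links list is passed to `zip_longest`). A minimal standard-library sketch, assuming the four lists already hold extracted text; `write_columns` and the sample values are hypothetical:

```python
import csv
import io
from itertools import zip_longest


def write_columns(stream, header, columns):
    """Write parallel per-column lists as CSV rows; missing cells become empty."""
    writer = csv.writer(stream)
    writer.writerow(header)
    # zip_longest pads shorter columns with fillvalue instead of dropping rows
    writer.writerows(zip_longest(*columns, fillvalue=""))


# Hypothetical sample data standing in for the scraped text
job_titles = ["Python Developer", "Data Engineer"]
company_names = ["Company A", "Company B"]
locations = ["Cairo, Egypt"]              # one value short on purpose
skills = ["Python,Django", "Python,SQL"]

buffer = io.StringIO()
write_columns(
    buffer,
    ["Job titles", "Company names", "Location", "Skills"],
    [job_titles, company_names, locations, skills],
)
print(buffer.getvalue())

# For a real file, open it with newline="" as the csv module docs recommend:
# with open("jobtest.csv", "w", newline="", encoding="utf-8") as f:
#     write_columns(f, header, columns)
```

Cells that contain commas, such as "Cairo, Egypt", are quoted automatically by `csv.writer`, so they stay in one column when the file is opened in a spreadsheet.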

This creates data.csv (shown opened in LibreOffice in the original screenshot).


