How can I export this web-scraped data to a CSV file?

Problem description

I'm new to coding and web scraping. I've been watching a lot of tutorials on YouTube, but I can't find a way to write this data to a CSV file. Can anyone help?

    import time

    import pandas as pd
    from bs4 import BeautifulSoup
    from fake_useragent import UserAgent
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options


    options = Options()
    options.add_argument("window-size=1400,600")
    ua = UserAgent()
    user_agent = ua.random
    print(user_agent)
    options.add_argument(f'user-agent={user_agent}')


    driver = webdriver.Chrome('/Users/raduulea/Documents/chromedriver', options=options)
    driver.get('https://www.immoweb.be/fr/recherche/immeuble-de-rapport/a-vendre')
    time.sleep(10)

    html = driver.page_source
    soup = BeautifulSoup(html, 'html.parser')

    results = soup.find_all("div", {"class": "result-xl"})

    for result in results:
        print(result.find("div", {"class": "title-bar-left"}).get_text())
        print(result.find("span", {"class": "result-adress"}).get_text())
        print(result.find("div", {"class": "xl-price rangePrice"}).get_text())
        print(result.find("div", {"class": "xl-surface-ch"}).get_text())
        print(result.find("div", {"class": "xl-desc"}).get_text())

Tags: python-3.x, selenium-webdriver, web-scraping, beautifulsoup, export-to-csv

Solution


Collect the data in a pandas DataFrame, then export the DataFrame to CSV. That is much easier than writing the file by hand.
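To see what the DataFrame-to-CSV step produces on its own, here is a minimal sketch with two made-up listings (the titles and prices are placeholders, not real immoweb data). Calling `to_csv()` without a path returns the CSV text, which makes the effect of the `index` parameter easy to inspect:

```python
import pandas as pd

# Two made-up listings, stored one list per column as in the answer.
# All lists must have the same length.
title = ["Immeuble A", "Immeuble B"]
price = ["250 000 €", "310 000 €"]

df = pd.DataFrame({"Title": title, "Price": price})

# With no path argument, to_csv returns the CSV text instead of writing a file.
print(df.to_csv())             # first column is pandas' 0, 1 row index
print(df.to_csv(index=False))  # only the Title and Price columns
```

Passing `index=False` keeps the row numbers out of the file. Note that if one of the lists ends up shorter than the others (for example because a listing had no description), `pd.DataFrame` raises a `ValueError` instead of building the table.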

    import time

    import pandas as pd
    from bs4 import BeautifulSoup
    from fake_useragent import UserAgent
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    from selenium.webdriver.chrome.service import Service

    # Browser options: fixed window size and a random user agent.
    options = Options()
    options.add_argument("window-size=1400,600")
    user_agent = UserAgent().random
    print(user_agent)
    options.add_argument(f"user-agent={user_agent}")

    # Selenium 4 passes the chromedriver path through a Service object.
    driver = webdriver.Chrome(service=Service("/Users/raduulea/Documents/chromedriver"), options=options)
    driver.get("https://www.immoweb.be/fr/recherche/immeuble-de-rapport/a-vendre")
    time.sleep(10)  # wait for the page to finish loading

    html = driver.page_source
    driver.quit()  # the HTML is already a string, so the browser can be closed
    soup = BeautifulSoup(html, "html.parser")

    results = soup.find_all("div", {"class": "result-xl"})
    # One list per CSV column.
    title = []
    address = []
    price = []
    surface = []
    desc = []
    for result in results:
        title.append(result.find("div", {"class": "title-bar-left"}).get_text().strip())
        address.append(result.find("span", {"class": "result-adress"}).get_text().strip())
        price.append(result.find("div", {"class": "xl-price rangePrice"}).get_text().strip())
        surface.append(result.find("div", {"class": "xl-surface-ch"}).get_text().strip())
        desc.append(result.find("div", {"class": "xl-desc"}).get_text().strip())

    # Build a table from the lists and write it out; index=False omits pandas' row numbers.
    df = pd.DataFrame({"Title": title, "Address": address, "Price": price, "Surface": surface, "Description": desc})
    df.to_csv("output.csv", index=False)
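The loop above assumes every listing contains all five elements: if one is missing, `result.find(...)` returns `None` and `.get_text()` raises an `AttributeError`. A minimal sketch of a more defensive version, assuming the class names from the question still match the page; `text_or_empty` is a hypothetical helper, and the inline `HTML` string is a made-up stand-in for `driver.page_source` so the example runs without a browser:

```python
import pandas as pd
from bs4 import BeautifulSoup

# Made-up stand-in for driver.page_source; the second listing has no description.
# Class names are copied from the question; the live immoweb markup may differ.
HTML = """
<div class="result-xl">
  <div class="title-bar-left"> Immeuble de rapport </div>
  <span class="result-adress"> 1000 Bruxelles </span>
  <div class="xl-price rangePrice"> 250 000 € </div>
  <div class="xl-surface-ch"> 300 m² </div>
  <div class="xl-desc"> Trois appartements </div>
</div>
<div class="result-xl">
  <div class="title-bar-left"> Maison de rapport </div>
  <span class="result-adress"> 4000 Liège </span>
  <div class="xl-price rangePrice"> 310 000 € </div>
  <div class="xl-surface-ch"> 220 m² </div>
</div>
"""


def text_or_empty(parent, tag, css_class):
    """Return the stripped text of the first matching element, or "" if it is missing."""
    element = parent.find(tag, class_=css_class)
    return element.get_text(strip=True) if element is not None else ""


soup = BeautifulSoup(HTML, "html.parser")
rows = []
for result in soup.find_all("div", class_="result-xl"):
    # One dict per listing keeps every column aligned, even when a field is absent.
    rows.append({
        "Title": text_or_empty(result, "div", "title-bar-left"),
        "Address": text_or_empty(result, "span", "result-adress"),
        "Price": text_or_empty(result, "div", "xl-price rangePrice"),
        "Surface": text_or_empty(result, "div", "xl-surface-ch"),
        "Description": text_or_empty(result, "div", "xl-desc"),
    })

df = pd.DataFrame(rows)
df.to_csv("output.csv", index=False)
```

In the real script you would pass `driver.page_source` to `BeautifulSoup` instead of `HTML`. A missing field then becomes an empty cell instead of an exception.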

Output: your CSV file will look like this:

[Image: output CSV]
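If you'd rather not depend on pandas, the standard-library `csv` module can write the same file. A minimal sketch, assuming the scraped values are collected as one dict per listing; the rows below are placeholders, and `write_csv` is a hypothetical helper name:

```python
import csv

# Placeholder rows in the same shape as the scraped data (one dict per listing).
rows = [
    {"Title": "Immeuble de rapport", "Address": "1000 Bruxelles", "Price": "250 000 €",
     "Surface": "300 m²", "Description": "Trois appartements"},
    {"Title": "Maison de rapport", "Address": "4000 Liège", "Price": "310 000 €",
     "Surface": "220 m²", "Description": ""},
]
FIELDNAMES = ["Title", "Address", "Price", "Surface", "Description"]


def write_csv(path, rows, fieldnames):
    # newline="" lets the csv module control line endings (avoids blank lines on Windows).
    # "utf-8-sig" writes a BOM so Excel typically recognises accents and the € sign.
    with open(path, "w", newline="", encoding="utf-8-sig") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)


write_csv("output.csv", rows, FIELDNAMES)
```

`DictWriter` writes the columns in the order given by `fieldnames`, so the header stays stable regardless of how each dict was built.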

