How to write web-scraping results to a CSV file

Problem description

from bs4 import BeautifulSoup
import requests
import csv
url = "https://coingecko.com/en"

page = requests.get(url)
html_doc = page.content
soup = BeautifulSoup(html_doc,"html.parser")
coinname = soup.find_all("div",attrs={"class":"coin-content center"})
coin_sign = soup.find_all("div",attrs={"class":"coin-icon mr-2 center flex-column"})
coinvalue = soup.find_all("td",attrs={"class":"td-price price text-right "})
marketcap = soup.find_all("td",attrs={"class":"td-market_cap cap "})
Liquidity = soup.find_all("td", attrs={"class": "td-liquidity_score lit text-right "})

coin_name = []
coinsign = []
Coinvalue = []
Marketcap = []
marketliquidity = []
for div in coinname:
    coin_name.append(div.a.span.text)

for sign in coin_sign:
    coinsign.append(sign.span.text)
for Value in coinvalue:
    Coinvalue.append(Value.a.span.text)
for cap in marketcap:
    Marketcap.append(cap.div.span.text)
for liquidity in Liquidity:
    marketliquidity.append(liquidity.a.span.text)
print(coin_name)
print(coinsign)
print(Coinvalue)
print(Marketcap)
print(marketliquidity)

I want to save the output to a CSV file with 5 columns. Column 1 is "coin_name", column 2 is "coinsign", column 3 is "coinvalue", column 4 is "Marketcap", and column 5 is "Marketliquidity". How can I do this?

I also want to limit the amount of data I receive: I only want 100 coin_name entries, but I am getting 200.

Tags: python, web-scraping, beautifulsoup, screen-scraping

Solution


from bs4 import BeautifulSoup
import requests
import csv

url = "https://coingecko.com/en"
page = requests.get(url)
soup = BeautifulSoup(page.content,"html.parser")

# Instead of assigning variables and looping, you can use list comprehensions.
names = [div.a.span.text for div in soup.find_all("div",attrs={"class":"coin-content center"})]
signs = [sign.span.text for sign in soup.find_all("div",attrs={"class":"coin-icon mr-2 center flex-column"})]
values = [value.a.span.text for value in soup.find_all("td",attrs={"class":"td-price price text-right "})]
caps = [cap.div.span.text for cap in soup.find_all("td",attrs={"class":"td-market_cap cap "})]
liquidities = [liquidity.a.span.text for liquidity in soup.find_all("td", attrs={"class": "td-liquidity_score lit text-right "})]

with open('coins.csv', mode='w',newline='') as coins:
    writer = csv.writer(coins, delimiter=',', quotechar='"')
    # Take only the first 100 coins
    for i in range(100):
        writer.writerow([names[i],signs[i],values[i],caps[i],liquidities[i]])

The output will be

Bitcoin,BTC,"$6,578.62","$113,894,498,118","$1,476,855,331"
Ethereum,ETH,$224.49,"$22,995,876,618","$1,256,303,216"
EOS,EOS,$5.73,"$5,193,319,905","$708,339,006"
XRP,XRP,$0.48,"$19,249,618,341","$564,378,978"
Litecoin,LTC,$57.80,"$3,388,966,637","$486,289,650"
NEO,NEO,$18.11,"$1,177,368,159","$160,733,208"
Monero,XMR,$113.64,"$1,871,890,512","$55,235,745"
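Note that indexing with `range(100)` raises an `IndexError` if any of the lists has fewer than 100 items (which can happen if the page layout changes). A more defensive variant of the CSV-writing step, sketched here with hypothetical sample data standing in for the scraped lists, pairs the columns with `zip` (which stops at the shortest list) and caps the row count with `itertools.islice`:

```python
import csv
from itertools import islice

# Hypothetical sample data standing in for the scraped lists
names = ["Bitcoin", "Ethereum", "EOS"]
signs = ["BTC", "ETH", "EOS"]
values = ["$6,578.62", "$224.49", "$5.73"]
caps = ["$113,894,498,118", "$22,995,876,618", "$5,193,319,905"]
liquidities = ["$1,476,855,331", "$1,256,303,216", "$708,339,006"]

with open("coins.csv", mode="w", newline="") as coins:
    writer = csv.writer(coins)
    # Optional header row matching the five requested columns
    writer.writerow(["coin_name", "coinsign", "coinvalue", "Marketcap", "Marketliquidity"])
    # zip stops at the shortest list; islice writes at most 100 rows
    for row in islice(zip(names, signs, values, caps, liquidities), 100):
        writer.writerow(row)
```

This writes the same comma-separated rows as the loop above, but never indexes past the end of a short list.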
