Unable to scrape a table from a website using BeautifulSoup

Problem description

I am trying to scrape this table: https://www.coingecko.com/en/coins/recently_added?page=1

Here is my code:

import requests
from bs4 import BeautifulSoup
import csv

root_url = "https://www.coingecko.com/en/coins/recently_added"
html = requests.get(root_url)
soup = BeautifulSoup(html.text, 'html.parser')

paging = soup.find("div",{"class":"row no-gutters tw-flex flex-column flex-lg-row tw-justify-end mt-2"}).find("ul",{"class":"pagination"}).find_all("a")
start_page = paging[1].text
last_page = paging[len(paging)-2].text

#
# outfile = open('gymlookup.csv','w', newline='')
# writer = csv.writer(outfile)
# writer.writerow(["Name", "Address", "Phone"])


pages = list(range(1,int(last_page)+1))
for page in pages:
    url = 'https://www.coingecko.com/en/coins/recently_added?page=%s' %(page)
    html = requests.get(url)
    soup = BeautifulSoup(html.text, 'html.parser')

    #print(soup.prettify())
    print ('Processing page: %s' %(page))

    coins = soup.findAll("div",{"class":"coingecko-table"})
    for element in coins:
        coin = element.find(class_='coin-name text-left tablesorter-header tablesorter-headerUnSorted')
        price = element.find(class_='price text-right sorter-numeric tablesorter-header tablesorter-headerUnSorted')
        print(coin,price)
        # hr = element.find('change1h').text
        # last_added = element.find('last_added').text

#         writer.writerow([coin, price, hr,last_added])
#
# outfile.close()
print('Done')

print(coin,price) does not print anything. I'm not sure why; any help is welcome :)

Tags: python, beautifulsoup

Solution


Just use pandas to get the table data.

Here's how:

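from io import StringIO

import pandas as pd
import requests

url = "https://www.coingecko.com/en/coins/recently_added?page=1"
# read_html returns a list of DataFrames, one per <table> found in the page.
# Wrapping the HTML in StringIO avoids the deprecation warning that newer
# pandas versions emit for literal HTML strings.
tables = pd.read_html(StringIO(requests.get(url).text), flavor="bs4")
# Drop the two unnamed leading columns; errors="ignore" skips any that are absent.
df = pd.concat(tables).drop(["Unnamed: 0", "Unnamed: 1"], axis=1, errors="ignore")
df.to_csv("your_table.csv", index=False)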

Output:

[Image not preserved: screenshot of the resulting table]
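
The question loops over every page, while the snippet above reads only page 1. A minimal sketch of one way to extend it is shown here. It assumes the HTML of each page has already been downloaded (for example with requests.get(url).text). The function name combine_pages and the two sample pages are hypothetical stand-ins, not the real CoinGecko markup.

```python
from io import StringIO

import pandas as pd

# Hypothetical stand-ins for two downloaded pages; the real markup may differ.
PAGE_1 = """<table>
<thead><tr><th></th><th>Coin</th><th>Price</th></tr></thead>
<tbody><tr><td>1</td><td>Foo</td><td>$1.00</td></tr></tbody>
</table>"""
PAGE_2 = """<table>
<thead><tr><th></th><th>Coin</th><th>Price</th></tr></thead>
<tbody><tr><td>2</td><td>Bar</td><td>$2.50</td></tr></tbody>
</table>"""


def combine_pages(pages_html):
    """Parse every <table> in each HTML string and stack them into one DataFrame."""
    frames = []
    for html in pages_html:
        # read_html returns a list of DataFrames, one per table in the page.
        frames.extend(pd.read_html(StringIO(html)))
    combined = pd.concat(frames, ignore_index=True)
    # Columns with an empty header cell are named "Unnamed: N" by pandas; drop them.
    unnamed = [c for c in combined.columns if str(c).startswith("Unnamed")]
    return combined.drop(columns=unnamed)


df = combine_pages([PAGE_1, PAGE_2])
print(df)
```

With the live site, the list passed to combine_pages would hold one downloaded page per page number. Whether the site serves those pages to a plain requests call is not something this sketch can guarantee.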


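For readers who prefer to stay with BeautifulSoup, here is a likely reason the original lookup finds nothing. Class names such as tablesorter-header and tablesorter-headerUnSorted are added in the browser by the jQuery tablesorter plugin, so they are probably absent from the HTML that requests downloads. A multi-word string passed to class_ must also match the whole class attribute exactly. The sketch below illustrates both points on hypothetical, simplified markup (SAMPLE_HTML and extract_rows are illustrative names). It then reads the data cells row by row instead of looking up header cells.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified markup; the real page's structure may differ.
SAMPLE_HTML = """
<div class="coingecko-table">
  <table>
    <thead>
      <tr>
        <th class="coin-name text-left">Coin</th>
        <th class="price text-right">Price</th>
      </tr>
    </thead>
    <tbody>
      <tr><td class="coin-name">Foo</td><td class="price">$1.00</td></tr>
      <tr><td class="coin-name">Bar</td><td class="price">$2.50</td></tr>
    </tbody>
  </table>
</div>
"""


def extract_rows(html):
    """Return (coin, price) text pairs from the body rows of the table."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for tr in soup.select("div.coingecko-table tbody tr"):
        coin = tr.select_one("td.coin-name")
        price = tr.select_one("td.price")
        if coin is None or price is None:
            continue  # skip rows that lack either cell
        rows.append((coin.get_text(strip=True), price.get_text(strip=True)))
    return rows


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
# The full class string from the question matches nothing here, because the
# tablesorter classes are not part of the static HTML.
missing = soup.find(
    class_="coin-name text-left tablesorter-header tablesorter-headerUnSorted"
)
rows = extract_rows(SAMPLE_HTML)
print(missing)
print(rows)
```

Matching on a single class (td.coin-name) is less brittle than matching the full class string, since it still works when other classes are added or removed.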