Getting HTML table data into Python with bs4

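Problem description

I'm trying to scrape a Twitch sub-count site to look at data for various Twitch channels. I want to be able to enter a username and get that channel's rank and current sub count.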

from urllib.request import urlopen, Request 
from bs4 import BeautifulSoup as soup 

url = "https://twitchanalysis.top/topsubs"
headers = {"User-Agent": "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2228.0 Safari/537.3"}

client = Request(url=url, headers=headers)
page_html = urlopen(client).read()
page_soup = soup(page_html, "html.parser")
db = {}
table = page_soup.find("table", id="topsubs_table")

for cell in page_soup.find_all('td')[3]:
    cell = page_soup.find_all('td')
    db[cell[2].text] = [cell[0].text, cell[3].text]
print(db)

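However, when I run this code it returns only the first channel. It should return every channel on the first page. I'm not sure what to do. Please help.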

Tags: python, beautifulsoup

Solution


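To get all the records, you have to iterate over the table's rows and read the cells of each row. The original loop iterates over the contents of a single td (the fourth one on the page) and, on every pass, reads the same first few cells from page_soup.find_all('td'), so only one entry is ever written to db.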
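The row-by-row idea can be tried offline on a tiny inline table before touching the live site. The sketch below is only an illustration: the HTML snippet, its column layout (including the "Logo" column), and the helper name parse_rows are assumptions made up for this example and may not match the real page's markup. It also skips non-data rows by counting cells rather than by matching the advertisement text, which is a substitution and not the check used in the full script.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the real page; the actual table may differ.
SAMPLE_HTML = """
<table id="topsubs_table">
  <tr><th>Rank</th><th>Logo</th><th>Channel</th><th>Subs</th></tr>
  <tr><td>1</td><td></td><td>gamesdonequick</td><td>35885</td></tr>
  <tr><td colspan="4">This is a spot for an advertisement</td></tr>
  <tr><td>2</td><td></td><td>xqcow</td><td>32624</td></tr>
</table>
"""


def parse_rows(html):
    """Return {channel: [rank, subs]} for every data row in the table."""
    page_soup = BeautifulSoup(html, "html.parser")
    table = page_soup.find("table", id="topsubs_table")
    result = {}
    for row in table.find_all("tr"):
        cells = row.find_all("td")
        # Header rows (th only) and ad rows (a single td) have fewer than 4 cells.
        if len(cells) < 4:
            continue
        channel = cells[2].get_text(strip=True)
        result[channel] = [
            cells[0].get_text(strip=True),
            cells[3].get_text(strip=True),
        ]
    return result


db = parse_rows(SAMPLE_HTML)
print(db)
```

The full script for the live page follows the same loop.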

from urllib.request import urlopen, Request
from bs4 import BeautifulSoup as soup

url = "https://twitchanalysis.top/topsubs"
headers = {"User-Agent": "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2228.0 Safari/537.3"}

# Send a browser-like User-Agent along with the request.
client = Request(url=url, headers=headers)
page_html = urlopen(client).read()
page_soup = soup(page_html, "html.parser")

db = {}
table = page_soup.find("table", id="topsubs_table")

# Skip the first row, then read the cells of each remaining row.
for row in table.find_all('tr')[1:]:
    cell = row.find_all('td')
    # Advertisement rows carry no channel data.
    if 'This is a spot for an advertisement' in cell[0].text:
        continue
    # Map channel name -> [rank, sub count].
    db[cell[2].text] = [cell[0].text, cell[3].text]
print(db)

Output

{'montanablack88': ['9', '23192'], 'nickmercs': ['4', '25314'], 'pokimane': ['36', '11173'], 'gladd': ['6', '24422'], 'criticalrole': ['16', '17529'], 'maximilian_dood': ['31', '12111'], 'ratirl': ['20', '15106'], 'cohhcarnage': ['12', '20752'], 'jasonr': ['40', '10313'], 'forsen': ['43', '9890'], 'teepee': ['23', '14090'], 'jerma985': ['41', '10180'], 'BobbyPoffGaming': ['19', '16281'], 'castro_1021': ['5', '24670'], 'drlupo': ['7', '23886'], 'alanzoka': ['30', '12521'], 'trainwreckstv': ['29', '13087'], 'noway4u_sir': ['21', '14611'], 'dakotaz': ['39', '10551'], 'ludwig': ['38', '10805'], 'rallied': ['42', '9893'], 'cdnthe3rd': ['48', '9103'], 'therealknossi': ['13', '20563'], 'lord_kebun': ['33', '11311'], 'xqcow': ['2', '32624'], 'littlesiha': ['44', '9693'], 'zerator': ['25', '13498'], 'chocotaco': ['27', '13373'], 'paymoneywubby': ['32', '11972'], 'timthetatman': ['14', '19526'], 'tfue': ['18', '16604'], 'auronplay': ['47', '9590'], 'sacriel': ['28', '13123'], 'lirik': ['17', '17497'], 'pestily': ['15', '18013'], 'rubius': ['35', '11223'], 'FORMAL': ['22', '14200'], 'drdisrespect': ['3', '29027'], 'admiralbahroo': ['10', '22752'], 'papaplatte': ['24', '13519'], 'nick28t': ['49', '9101'], 'joshog': ['34', '11284'], 'shlorox': ['37', '10980'], 'loltyler1': ['45', '9680'], 'gronkh': ['26', '13391'], 'gamesdonequick': ['1', '35885'], 'summit1g': ['8', '23787'], 'MOONMOON': ['11', '22220'], 'zanoxvii': ['46', '9677']}
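Since the goal in the question is to enter a username and get back the rank and sub count, a small lookup on top of db might look like the sketch below. The function name lookup is hypothetical, the three entries are copied from the output above, and the case-insensitive match is an assumption (the scraped names mix upper and lower case).

```python
# A few entries copied from the output above; in practice db would be
# the dictionary built by the scraping loop.
db = {
    "gamesdonequick": ["1", "35885"],
    "xqcow": ["2", "32624"],
    "MOONMOON": ["11", "22220"],
}


def lookup(db, username):
    """Return (rank, subs) as ints for username, ignoring case, or None if absent."""
    wanted = username.strip().lower()
    for name, (rank, subs) in db.items():
        if name.lower() == wanted:
            return int(rank), int(subs)
    return None


print(lookup(db, "moonmoon"))    # (11, 22220)
print(lookup(db, "not_listed"))  # None
```

Converting to int assumes the cells hold plain digits, as they do in the output shown; values with thousands separators would need cleaning first.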
