BeautifulSoup scraping an investing website; AttributeError: table issue

Problem description

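I have spent a few hours on what is probably a simple task and cannot figure it out: I am trying to scrape data from the following site, which contains a table.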

However, when I try to extract data from the table, I get an AttributeError on containers.findAll('td').

I have searched several sites, but it seems to work for everyone except me. Does anyone have an idea?

import requests
from bs4 import BeautifulSoup as soup
from urllib.request import urlopen as uReq

#OpenURL
url = requests.get('https://www.investing.com/equities/apple-computer-inc-balance-sheet',headers={'User-Agent': 'Mozilla/5.0'})

#DETERMINE FORMAT
content_page = soup(url.content,'html.parser')

containers = content_page.findAll('table', {'class':'genTbl reportTbl'})
containers.findAll('td')   ## This doesnt work for some reason.. 
                           ## also tried .find('td') &  ('tr') etc.

The data should then be extracted with a for loop, but since the call above does not work, I am stuck here.

A=[]

for row in containers.findAll("tr"):
    cells = row.findAll('td')
    states=row.findAll('th') #To store second column data
    if len(cells)==6: #Only extract table body not heading
        A.append(cells[0].find(text=True))

Or:

data = []
rows = table.find_all('tr')
for row in rows:
    cols = row.find_all('td')
    cols = [ele.text.strip() for ele in cols]
    data.append([ele for ele in cols if ele])

Tags: python, web-scraping, beautifulsoup, python-requests

Solution


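findAll returns a list-like ResultSet rather than a single tag, so calling findAll on the result itself raises AttributeError. Iterate over the matched tables instead. If you want the complete 'td' elements, use:

containers = content_page.findAll('table', {'class':'genTbl reportTbl'})

for i in containers:
    a = i.findAll('td')
    print(a)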
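To make the cause of the error concrete, here is a minimal sketch that runs offline. The HTML string is a hypothetical stand-in for the real page (only the class name comes from the question), and the cell values are placeholders. It shows that find_all gives back a list-like result that rejects a further find_all call, while find gives back a single tag that accepts it.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in markup; the real page's HTML is not reproduced here.
html = """
<table class="genTbl reportTbl">
  <tr><td>Total Assets</td><td>100</td></tr>
</table>
"""

page = BeautifulSoup(html, "html.parser")

# find_all returns a list-like ResultSet, even when only one table matches.
tables = page.find_all("table", {"class": "genTbl reportTbl"})

try:
    tables.find_all("td")  # the same mistake as in the question
    error_raised = False
except AttributeError:
    error_raised = True

# find returns a single Tag (or None if nothing matches), which supports find_all.
table = page.find("table", {"class": "genTbl reportTbl"})
cell_texts = [td.text for td in table.find_all("td")]

print(error_raised)
print(cell_texts)
```

If only one matching table is expected, using find (and checking for None) may be simpler than looping over a one-element result.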

If you only want the text of each 'td', use:

for i in containers:
    for td in i.findAll('td'):
        print(td.text) 
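Putting this together with the second loop from the question, the rows could be collected roughly as follows. This is only a sketch: the inline HTML and its values are made-up placeholders standing in for the downloaded page, and the assumption that header rows contain only 'th' cells may not hold for the real table.

```python
from bs4 import BeautifulSoup

# Placeholder markup and values, used here instead of a live request.
html = """
<table class="genTbl reportTbl">
  <tr><th>Period</th><th>2019</th><th>2018</th></tr>
  <tr><td>Total Assets</td><td>10</td><td>20</td></tr>
  <tr><td>Total Liabilities</td><td>5</td><td>8</td></tr>
</table>
"""

page = BeautifulSoup(html, "html.parser")
containers = page.find_all("table", {"class": "genTbl reportTbl"})

data = []
for table in containers:          # iterate over the ResultSet first
    for row in table.find_all("tr"):
        cols = [td.text.strip() for td in row.find_all("td")]
        if cols:                  # rows with only <th> cells yield no <td>, so skip them
            data.append(cols)

print(data)
```

With the real page, the html string would be replaced by url.content from the request in the question.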
