python - 为什么 BeautifulSoup 不抓取整个网页?
问题描述
前提:我对 Python 和网络抓取完全陌生。我正在尝试在此页面上抓取有关品牌的数据:https ://www.interbrand.com/best-brands/best-global-brands/2018/ranking/ ,但 BeautifulSoup 仅提取 html 到某一点. 那里的 html 中似乎没有什么奇怪的,因为在 BeautifulSoup 提取的标签之前有五个几乎相等的标签,没有任何问题。
我已经尝试过使用三种不同的解析器(内置的,lxml 和 html5lib),但我总是得到相同的结果。
这是代码:
import requests
page = requests.get("https://www.interbrand.com/best-brands/best-global-brands/2018/ranking/")
from bs4 import BeautifulSoup
soup = BeautifulSoup(page.content , 'html5lib')
print(soup.prettify())
解决方案
使用 Css 选择器获取输出。
from bs4 import BeautifulSoup
import requests
page = requests.get("https://www.interbrand.com/best-brands/best-global-brands/2018/ranking/")
soup = BeautifulSoup(page.content , 'lxml')
Brand=[]
Country=[]
Region=[]
Sector=[]
for brnd in soup.select('div.brand-name'):
Brand.append(brnd['title'])
for region in soup.select('div.brand-region'):
Region.append(region['title'])
for county in soup.select('div.brand-country'):
Country.append(county['title'])
for sector in soup.select('div.brand-sector'):
Sector.append(sector['title'])
print(Brand)
print(Region)
print(Country)
print(Sector)
输出:
['Brand name: Apple', 'Brand name: Google', 'Brand name: Amazon', 'Brand name: Microsoft', 'Brand name: Coca-Cola', 'Brand name: Samsung', 'Brand name: Toyota', 'Brand name: Mercedes-Benz', 'Brand name: Facebook', "Brand name: McDonald's", 'Brand name: Intel', 'Brand name: IBM', 'Brand name: BMW', 'Brand name: Disney', 'Brand name: Cisco', 'Brand name: GE', 'Brand name: Nike', 'Brand name: Louis Vuitton', 'Brand name: Oracle', 'Brand name: Honda', 'Brand name: SAP', 'Brand name: Pepsi', 'Brand name: Chanel', 'Brand name: American Express', 'Brand name: Zara', 'Brand name: J.P. Morgan', 'Brand name: IKEA', 'Brand name: Gillette', 'Brand name: UPS', 'Brand name: H&M', 'Brand name: Pampers', 'Brand name: Hermès', 'Brand name: Budweiser', 'Brand name: Accenture', 'Brand name: Ford', 'Brand name: Hyundai', 'Brand name: NESCAFÉ', 'Brand name: eBay', 'Brand name: Gucci', 'Brand name: Nissan', 'Brand name: Volkswagen', 'Brand name: Audi', 'Brand name: Philips', 'Brand name: Goldman Sachs', 'Brand name: Citi', 'Brand name: HSBC', 'Brand name: AXA', "Brand name: L'Oréal", 'Brand name: Allianz', 'Brand name: adidas', 'Brand name: Adobe', 'Brand name: Porsche', "Brand name: Kellogg's", 'Brand name: HP', 'Brand name: Canon', 'Brand name: Siemens', 'Brand name: Starbucks', 'Brand name: Danone', 'Brand name: Sony', 'Brand name: 3M', 'Brand name: Visa', 'Brand name: Nestlé', 'Brand name: Morgan Stanley', 'Brand name: Colgate', 'Brand name: Hewlett Packard Enterprise', 'Brand name: Netflix', 'Brand name: Cartier', 'Brand name: Huawei', 'Brand name: Banco Santander', 'Brand name: Mastercard', 'Brand name: Kia', 'Brand name: FedEx', 'Brand name: PayPal', 'Brand name: LEGO', 'Brand name: Salesforce.com', 'Brand name: Panasonic', 'Brand name: Johnson & Johnson', 'Brand name: Land Rover', 'Brand name: DHL', 'Brand name: Ferrari', 'Brand name: Discovery', 'Brand name: Caterpillar', 'Brand name: Tiffany & Co.', "Brand name: Jack Daniel's", 'Brand name: Corona', 'Brand name: KFC', 'Brand name: Heineken', 'Brand name: John Deere', 'Brand name: Shell', 'Brand name: MINI', 'Brand name: Dior', 'Brand name: Spotify', 'Brand name: Harley-Davidson', 'Brand name: Burberry', 'Brand name: Prada', 'Brand name: Sprite', 'Brand name: Johnnie Walker', 'Brand name: Hennessy', 'Brand name: Nintendo', 'Brand name: Subaru']
['Region: The Americas', 'Region: The Americas', 'Region: The Americas', 'Region: The Americas', 'Region: The Americas', 'Region: Asia Pacific', 'Region: Asia Pacific', 'Region: Europe & Africa', 'Region: The Americas', 'Region: The Americas', 'Region: The Americas', 'Region: The Americas', 'Region: Europe & Africa', 'Region: The Americas', 'Region: The Americas', 'Region: The Americas', 'Region: The Americas', 'Region: Europe & Africa', 'Region: The Americas', 'Region: Asia Pacific', 'Region: Europe & Africa', 'Region: The Americas', 'Region: Europe & Africa', 'Region: The Americas', 'Region: Europe & Africa', 'Region: The Americas', 'Region: Europe & Africa', 'Region: The Americas', 'Region: The Americas', 'Region: Europe & Africa', 'Region: The Americas', 'Region: Europe & Africa', 'Region: The Americas', 'Region: The Americas', 'Region: The Americas', 'Region: Asia Pacific', 'Region: Europe & Africa', 'Region: The Americas', 'Region: Europe & Africa', 'Region: Asia Pacific', 'Region: Europe & Africa', 'Region: Europe & Africa', 'Region: Europe & Africa', 'Region: The Americas', 'Region: The Americas', 'Region: Europe & Africa', 'Region: Europe & Africa', 'Region: Europe & Africa', 'Region: Europe & Africa', 'Region: Europe & Africa', 'Region: The Americas', 'Region: Europe & Africa', 'Region: The Americas', 'Region: The Americas', 'Region: Asia Pacific', 'Region: Europe & Africa', 'Region: The Americas', 'Region: Europe & Africa', 'Region: Asia Pacific', 'Region: The Americas', 'Region: The Americas', 'Region: Europe & Africa', 'Region: The Americas', 'Region: The Americas', 'Region: The Americas', 'Region: The Americas', 'Region: Europe & Africa', 'Region: Asia Pacific', 'Region: Europe & Africa', 'Region: The Americas', 'Region: Asia Pacific', 'Region: The Americas', 'Region: The Americas', 'Region: Europe & Africa', 'Region: The Americas', 'Region: Asia Pacific', 'Region: The Americas', 'Region: Europe & Africa', 'Region: The Americas', 'Region: Europe & Africa', 'Region: The Americas', 'Region: The Americas', 'Region: The Americas', 'Region: The Americas', 'Region: The Americas', 'Region: The Americas', 'Region: Europe & Africa', 'Region: The Americas', 'Region: Europe & Africa', 'Region: Europe & Africa', 'Region: Europe & Africa', 'Region: Europe & Africa', 'Region: The Americas', 'Region: Europe & Africa', 'Region: Europe & Africa', 'Region: The Americas', 'Region: Europe & Africa', 'Region: Europe & Africa', 'Region: Asia Pacific', 'Region: Asia Pacific']
['Country: United States', 'Country: United States', 'Country: United States', 'Country: United States', 'Country: United States', 'Country: South Korea', 'Country: Japan', 'Country: Germany', 'Country: United States', 'Country: United States', 'Country: United States', 'Country: United States', 'Country: Germany', 'Country: United States', 'Country: United States', 'Country: United States', 'Country: United States', 'Country: France', 'Country: United States', 'Country: Japan', 'Country: Germany', 'Country: United States', 'Country: France', 'Country: United States', 'Country: Spain', 'Country: United States', 'Country: Sweden', 'Country: United States', 'Country: United States', 'Country: Sweden', 'Country: United States', 'Country: France', 'Country: United States', 'Country: United States', 'Country: United States', 'Country: South Korea', 'Country: Switzerland', 'Country: United States', 'Country: Italy', 'Country: Japan', 'Country: Germany', 'Country: Germany', 'Country: Netherlands', 'Country: United States', 'Country: United States', 'Country: United Kingdom', 'Country: France', 'Country: France', 'Country: Germany', 'Country: Germany', 'Country: United States', 'Country: Germany', 'Country: United States', 'Country: United States', 'Country: Japan', 'Country: Germany', 'Country: United States', 'Country: France', 'Country: Japan', 'Country: United States', 'Country: United States', 'Country: Switzerland', 'Country: United States', 'Country: United States', 'Country: United States', 'Country: United States', 'Country: France', 'Country: China', 'Country: Spain', 'Country: United States', 'Country: South Korea', 'Country: United States', 'Country: United States', 'Country: Denmark', 'Country: United States', 'Country: Japan', 'Country: United States', 'Country: United Kingdom', 'Country: United States', 'Country: Italy', 'Country: United States', 'Country: United States', 'Country: United States', 'Country: United States', 'Country: Mexico', 'Country: United States', 'Country: Netherlands', 'Country: United States', 'Country: Netherlands', 'Country: United Kingdom', 'Country: France', 'Country: Sweden', 'Country: United States', 'Country: United Kingdom', 'Country: Italy', 'Country: United States', 'Country: United Kingdom', 'Country: France', 'Country: Japan', 'Country: Japan']
['Sector: Technology', 'Sector: Technology', 'Sector: Retail', 'Sector: Technology', 'Sector: Beverages', 'Sector: Technology', 'Sector: Automotive', 'Sector: Automotive', 'Sector: Technology', 'Sector: Restaurants', 'Sector: Technology', 'Sector: Business Services', 'Sector: Automotive', 'Sector: Media', 'Sector: Technology', 'Sector: Diversified', 'Sector: Sporting Goods', 'Sector: Luxury', 'Sector: Technology', 'Sector: Automotive', 'Sector: Technology', 'Sector: Beverages', 'Sector: Luxury', 'Sector: Financial Services', 'Sector: Apparel', 'Sector: Financial Services', 'Sector: Retail', 'Sector: FMCG', 'Sector: Logistics', 'Sector: Apparel', 'Sector: FMCG', 'Sector: Luxury', 'Sector: Alcohol', 'Sector: Business Services', 'Sector: Automotive', 'Sector: Automotive', 'Sector: Beverages', 'Sector: Retail', 'Sector: Luxury', 'Sector: Automotive', 'Sector: Automotive', 'Sector: Automotive', 'Sector: Electronics', 'Sector: Financial Services', 'Sector: Financial Services', 'Sector: Financial Services', 'Sector: Financial Services', 'Sector: FMCG', 'Sector: Financial Services', 'Sector: Sporting Goods', 'Sector: Technology', 'Sector: Automotive', 'Sector: FMCG', 'Sector: Technology', 'Sector: Electronics', 'Sector: Diversified', 'Sector: Restaurants', 'Sector: FMCG', 'Sector: Electronics', 'Sector: Diversified', 'Sector: Financial Services', 'Sector: FMCG', 'Sector: Financial Services', 'Sector: FMCG', 'Sector: Technology', 'Sector: Media', 'Sector: Luxury', 'Sector: Technology', 'Sector: Financial Services', 'Sector: Financial Services', 'Sector: Automotive', 'Sector: Logistics', 'Sector: Financial Services', 'Sector: FMCG', 'Sector: Business Services', 'Sector: Electronics', 'Sector: FMCG', 'Sector: Automotive', 'Sector: Logistics', 'Sector: Automotive', 'Sector: Media', 'Sector: Diversified', 'Sector: Luxury', 'Sector: Alcohol', 'Sector: Alcohol', 'Sector: Restaurants', 'Sector: Alcohol', 'Sector: Diversified', 'Sector: Energy', 'Sector: Automotive', 'Sector: Luxury', 'Sector: Media', 'Sector: Automotive', 'Sector: Luxury', 'Sector: Luxury', 'Sector: Beverages', 'Sector: Alcohol', 'Sector: Alcohol', 'Sector: Electronics', 'Sector: Automotive']
推荐阅读
- reactjs - React 生产构建未使用的代码拆分块
- html - 这HTML 元素未按预期工作
- java - 如何使用opencsv写入内容为csv的输出流?
- javascript - 用于 LIVE 徽章/图像的 HLS/m3u8 流检查器
- c++ - 如果在通过头文件包含类的构造函数时不使用大括号和类的构造函数,则会出现错误
- reactjs - useEffect Hook - 如何检测状态数组中对象属性的变化
- c++ - 枚举大小和变量的初始化
- javascript - Ant Design:如何在 React 中制作可折叠标签
- matlab - 计算 10000 x 10000 矩阵的互相关
- c# - 在下拉菜单中启用/禁用项目复选标记 (Unity)