Web scraper not pulling back all URLs

Problem description

I'm trying to pull back everything I need in a single Python run, using a single statement. This works when using an online Python IDE; however, when I put the same code into my Python 3.7, it only pulls back one result, apparently because of the indentation.

This is the only way I was able to get it to work at all in Python 3.7:

from bs4 import BeautifulSoup # BeautifulSoup is in bs4 package 
import requests

URLS = ['https://sc2replaystats.com/replay/playerStats/11204055/2618391', 
'https://sc2replaystats.com/replay/playerStats/11248131/352954', 
'https://sc2replaystats.com/replay/playerStats/11090108/1624902'] 

responses = []
for URL in URLS:
  response = requests.get(URL) 
  responses.append(response)
  for response in responses: 
    soup = BeautifulSoup(response.content, 'html.parser')

tb = soup.find('table', class_='table table-striped table-condensed')
for link in tb.find_all('tr'):
  name = link.find('span')
  if name is not None:
      print(name['title'])
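The reason this version only prints one table is the nesting: `soup` is reassigned on every pass through the loop, while the `tb`/`print` block sits after the loops at top level, so it runs once, against the last response only. The same shape can be sketched without the network (the data below is hypothetical, standing in for the three responses):

```python
pages = ['page-a', 'page-b', 'page-c']  # stand-ins for the three responses

# Same shape as the code above: the variable is reassigned inside the loop,
# but the processing step sits after the loop at top level.
for page in pages:
    parsed = page.upper()  # stand-in for BeautifulSoup(...)
printed = [parsed]         # stand-in for the tb/print block: runs once
print(printed)             # ['PAGE-C'] -- only the last page is processed

# Indenting the processing into the loop handles every page.
printed = []
for page in pages:
    parsed = page.upper()
    printed.append(parsed)
print(printed)             # ['PAGE-A', 'PAGE-B', 'PAGE-C']
```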

However, this is how it works correctly in the online Python 3.7:

from bs4 import BeautifulSoup # BeautifulSoup is in bs4 package 
import requests

URLS = ['https://sc2replaystats.com/replay/playerStats/11204055/2618391', 
'https://sc2replaystats.com/replay/playerStats/11248131/352954', 
'https://sc2replaystats.com/replay/playerStats/11090108/1624902'] 

responses = []
for URL in URLS:
  response = requests.get(URL) 
  responses.append(response)
for response in responses: 
  soup = BeautifulSoup(response.content, 'html.parser')

  tb = soup.find('table', class_='table table-striped table-condensed')
  for link in tb.find_all('tr'):
    name = link.find('span')
    if name is not None:
        print(name['title'])

The expected result is:

Hatchery
Hatchery
Spawningpool
Extractor
Banelingnest
Hatchery
Roachwarren
Extractor
Lair
Extractor
Extractor
Evolutionchamber
Spinecrawler
Extractor
Extractor
Hydraliskden
Extractor
Lurkerdenmp
Hatchery
Creeptumorqueen
Lurkerdenmp
Sporecrawler
Extractor
Spinecrawler
Sporecrawler
Sporecrawler
Sporecrawler
Sporecrawler
Sporecrawler
Sporecrawler
Hatchery
Spawningpool
Hatchery
Extractor
Roachwarren
Sporecrawler
Spinecrawler
Sporecrawler
Sporecrawler
Creeptumorqueen
Creeptumorqueen
Evolutionchamber
Evolutionchamber
Banelingnest
Creeptumor
Hatchery
Extractor
Creeptumor
Extractor
Creeptumorqueen
Sporecrawler
Lair
Sporecrawler
Extractor
Extractor
Creeptumor
Extractor
Roachwarren
Extractor
Creeptumorqueen
Creeptumor
Hatchery
Infestationpit
Creeptumor
Hatchery
Nydusnetwork
Creeptumor
Hatchery
Commandcenter
Supplydepot
Barracks
Refinery
Orbitalcommand
Commandcenter
Barracksreactor
Supplydepot
Factory
Refinery
Factorytechlab
Orbitalcommand
Starport
Bunker
Supplydepot
Supplydepot
Starporttechlab
Supplydepot
Barracks
Refinery
Supplydepot
Barracks
Engineeringbay
Refinery
Starportreactor
Factorytechlab
Supplydepot
Barracks
Supplydepot
Supplydepot
Supplydepot
Supplydepot
Supplydepot
Commandcenter
Barrackstechlab
Barracks
Barracks
Engineeringbay
Supplydepot
Barracksreactor
Barracksreactor
Supplydepot
Armory
Supplydepot
Supplydepot
Supplydepot
Orbitalcommand
Factory
Refinery
Refinery
Supplydepot
Factoryreactor
Supplydepot
Commandcenter
Barracks
Barrackstechlab
Planetaryfortress
Supplydepot
Supplydepot

However, when running it in my Python 3.7, the result is:

Commandcenter
Supplydepot
Barracks
Refinery
Orbitalcommand
Commandcenter
Barracksreactor
Supplydepot
Factory
Refinery
Factorytechlab
Orbitalcommand
Starport
Bunker
Supplydepot
Supplydepot
Starporttechlab
Supplydepot
Barracks
Refinery
Supplydepot
Barracks
Engineeringbay
Refinery
Starportreactor
Factorytechlab
Supplydepot
Barracks
Supplydepot
Supplydepot
Supplydepot
Supplydepot
Supplydepot
Commandcenter
Barrackstechlab
Barracks
Barracks
Engineeringbay
Supplydepot
Barracksreactor
Barracksreactor
Supplydepot
Armory
Supplydepot
Supplydepot
Supplydepot
Orbitalcommand
Factory
Refinery
Refinery
Supplydepot
Factoryreactor
Supplydepot
Commandcenter
Barracks
Barrackstechlab
Planetaryfortress
Supplydepot

In addition, here is the error I receive:

Python 3.7.4 (tags/v3.7.4:e09359112e, Jul  8 2019, 20:34:20) [MSC v.1916 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> from bs4 import BeautifulSoup  # BeautifulSoup is in bs4 package
>>> import requests
>>>
>>> URLS = [
...     'https://sc2replaystats.com/replay/playerStats/11204055/2618391',
...     'https://sc2replaystats.com/replay/playerStats/11248131/352954',
...     'https://sc2replaystats.com/replay/playerStats/11090108/1624902'
... ]
>>>
>>> responses = []
>>> for URL in URLS:
...     response = requests.get(URL)
...     responses.append(response)
... for response in responses:
  File "<stdin>", line 4
    for response in responses:
      ^
SyntaxError: invalid syntax
>>>     soup = BeautifulSoup(response.content, 'html.parser')
  File "<stdin>", line 1
    soup = BeautifulSoup(response.content, 'html.parser')
    ^
IndentationError: unexpected indent
>>>
>>>     tb = soup.find('table', class_='table table-striped table-condensed')
  File "<stdin>", line 1
    tb = soup.find('table', class_='table table-striped table-condensed')
    ^
IndentationError: unexpected indent
>>>     for link in tb.find_all('tr'):
  File "<stdin>", line 1
    for link in tb.find_all('tr'):
    ^
IndentationError: unexpected indent
>>>         name = link.find('span')
  File "<stdin>", line 1
    name = link.find('span')
    ^
IndentationError: unexpected indent
>>>         if name is not None:
  File "<stdin>", line 1
    if name is not None:
    ^
IndentationError: unexpected indent
>>>             print(name['title'])
  File "<stdin>", line 1
    print(name['title'])
    ^
IndentationError: unexpected indent
>>>
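The tracebacks above come from the interactive interpreter, not from the code itself. The `>>>` prompt compiles input in `'single'` mode, one statement at a time, so a paste that delivers a second top-level `for` before the first block has been closed by a blank line is rejected, and the already-indented lines that follow then arrive with no open block, producing the `IndentationError` cascade. A minimal reproduction of the first failure, with no network access (the names inside the string are never executed, only compiled):

```python
# Two top-level statements delivered as one unit, which is effectively what a
# paste into the >>> prompt does. 'single' is the mode the prompt compiles in.
src = (
    "for URL in URLS:\n"
    "    response = requests.get(URL)\n"
    "    responses.append(response)\n"
    "for response in responses:\n"
    "    pass\n"
)
try:
    compile(src, "<stdin>", "single")
    print("compiled")
except SyntaxError as exc:
    print("SyntaxError:", exc.msg)
```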

Tags: python-3.x

Solution
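The second (working) version of the code is already correct; the problem is only how it was run. In interactive mode a compound statement is terminated by a blank line, and the next top-level statement must not arrive before that, which is exactly what happens when a multi-block script is pasted at the `>>>` prompt. The standard-library `code.InteractiveConsole` emulates that prompt, so the paste can be replayed line by line; the two-loop shape below is a hypothetical stand-in for the fetch loop and the parse loop:

```python
import code
import contextlib
import io

def replay(lines):
    """Feed lines to an emulated >>> prompt and return anything sent to stderr."""
    console = code.InteractiveConsole()
    err = io.StringIO()
    with contextlib.redirect_stderr(err):
        for line in lines:
            console.push(line)
        console.push('')  # a final blank line closes any still-open block
    return err.getvalue()

# Hypothetical stand-ins for the fetch loop and the parse loop.
pasted = ['for i in [1, 2]:', '    x = i',
          'for j in [1, 2]:', '    y = j']
with_blank_line = ['for i in [1, 2]:', '    x = i', '',
                   'for j in [1, 2]:', '    y = j']

print(replay(pasted) != '')           # True: the paste is rejected with an error
print(replay(with_blank_line) == '')  # True: the blank line ends the first block
```

So the fix is either to insert a blank line before each new top-level block when pasting into the interpreter, or, more simply, to save the working version to a file and run it with `python` from the command line, where blank lines between blocks are not required.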

