Not sure what to put in soup.select (a variation on Automate the Boring Stuff with Python)

Question

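So I'm working through the "Downloading All XKCD Comics" chapter, but using NASA's daily photos instead. I've reached the point where I want the code to select a button that goes to the next page. On the page https://apod.nasa.gov/apod/ap191231.html, the back button is <, and I don't know how to select it.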

#downloads nasa's daily photos.
import requests, os, bs4

#loads web page
url = 'https://apod.nasa.gov/apod/ap191231.html'
os.makedirs('nasa_daily_photos2019', exist_ok=True) #makes directory for photos

while not url.endswith('191225.html'):
    print('Downloading page %s...' % url)
    res = requests.get(url)
    res.raise_for_status()

    soup = bs4.BeautifulSoup(res.text, 'html.parser')

    image_elem = soup.select('img') #finds image
    if image_elem == []:
        print('Could not find image.')
    else:
        image_url = 'https://apod.nasa.gov/apod/' + image_elem[0].get('src')
        print('Downloading image %s...' % (image_url))
        res = requests.get(image_url)
        res.raise_for_status()

        #save image to folder
        image_file = open(os.path.join('nasa_daily_photos2019', os.path.basename(image_url)), 'wb')
        for chunk in res.iter_content(100000):
            image_file.write(chunk)
        image_file.close()

    #now selects the '<' or previous page button
    prev_link = soup.select('a[<]')[0]
    url = 'https://apod.nasa.gov/apod/' + prev_link.get('href')

I get the error: raise SelectorSyntaxError(msg, self.pattern, index)
soupsieve.util.SelectorSyntaxError: Malformed attribute selector at position 1
  line 1:
a[<]
 ^



Tags: python, beautifulsoup

Solution


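You can use the CSS pseudo-class :contains(). The square brackets in a[<] are attribute-selector syntax, which is why the original selector is rejected as malformed. If you only want to select a single element, use .select_one() instead of .select().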

#now selects the '<' or previous page button
prev_link = soup.select_one('a:contains("<")')
url = 'https://apod.nasa.gov/apod/' + prev_link.get('href')
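
As far as I know, newer Soup Sieve releases spell this pseudo-class :-soup-contains() and treat :contains() as deprecated, so the selector above may emit a warning depending on the installed version. A version-independent alternative (a different technique from the selector above, not a replacement the answer proposed) is to loop over the links and compare their text exactly. The sketch below assumes the '<' link is written as &lt; inside an <a> tag. SAMPLE_HTML and find_prev_url are hypothetical names made up for illustration, and the sample markup only stands in for the real page.

```python
from urllib.parse import urljoin

import bs4

# Hypothetical sample markup standing in for the navigation links of a page.
SAMPLE_HTML = """
<html><body>
<a href="image/1912/example.jpg"><img src="image/1912/example_small.jpg"></a>
<a href="ap191230.html">&lt;</a> |
<a href="archivepix.html">Archive</a> |
<a href="ap200101.html">&gt;</a>
</body></html>
"""


def find_prev_url(html, page_url):
    """Return the absolute URL behind the '<' link, or None if there is none."""
    soup = bs4.BeautifulSoup(html, 'html.parser')
    for link in soup.find_all('a', href=True):
        # Compare the visible link text exactly, ignoring surrounding whitespace.
        if link.get_text(strip=True) == '<':
            # urljoin resolves the relative href against the current page URL.
            return urljoin(page_url, link['href'])
    return None


print(find_prev_url(SAMPLE_HTML, 'https://apod.nasa.gov/apod/ap191231.html'))
```

Returning None when no '<' link exists lets the calling loop stop cleanly instead of failing on a missing element, and urljoin avoids hard-coding the 'https://apod.nasa.gov/apod/' prefix.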
