How can I make this web scraper print only the song titles?

Problem description

    import requests
    from bs4 import BeautifulSoup

    url = 'https://www.officialcharts.com/charts/singles-chart'
    reqs = requests.get(url)
    soup = BeautifulSoup(reqs.text, 'html.parser')

    urls = []
    for link in soup.find_all('a'):
        print(link.get('href'))

    def chart_spider(max_pages):
        page = 1

        while page >= max_pages:
            url = "https://www.officialcharts.com/charts/singles-chart"
            source_code = requests.get(url)
            plain_text = source_code.text
            soup = BeautifulSoup(plain_text, 'html.parser')

            for link in soup.findAll('a', {"class": "title"}):
                href = "BAD HABITS" + link.title(href)
                print(href)
        page += 1

    chart_spider(1)

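I would like to know how to print only the song titles rather than the whole page. I want it to go through the top 100 chart and print every title. Thanks.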

Tags: python, parsing, beautifulsoup, python-requests

Solution


Here is one possible solution that changes your code as little as possible:

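    #!/usr/bin/env python3

    import requests
    from bs4 import BeautifulSoup

    URL = 'https://www.officialcharts.com/charts/singles-chart'

    def chart_spider():
        # Download the chart page and parse its HTML.
        source_code = requests.get(URL)
        plain_text = source_code.text
        soup = BeautifulSoup(plain_text, 'html.parser')
        # Select only the <div class="title"> elements instead of every link.
        for title in soup.find_all('div', {"class": "title"}):
            # contents[1] is the div's second child node, which this answer
            # assumes is the element wrapping the title text.
            print(title.contents[1].string)

    chart_spider()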

The output is every title found on the page, one per line.
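The selection step can also be tried offline, without depending on the live page. The following is only a minimal sketch: `SAMPLE_HTML` is made-up markup that assumes each title is wrapped in a `<div class="title">`, and `extract_titles` is a hypothetical helper name, not part of the answer above. It uses `get_text(strip=True)` rather than `contents[1]`, so it does not rely on the exact position of the child node.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the chart page; the real page
# structure may differ and can change over time.
SAMPLE_HTML = """
<div class="track">
  <div class="title">
    <a href="/songs/1">BAD HABITS</a>
  </div>
  <div class="artist"><a href="/artists/1">SOME ARTIST</a></div>
</div>
<div class="track">
  <div class="title">
    <a href="/songs/2">SECOND SONG</a>
  </div>
  <div class="artist"><a href="/artists/2">ANOTHER ARTIST</a></div>
</div>
"""

def extract_titles(html):
    """Return the text of every <div class="title"> in the given HTML."""
    soup = BeautifulSoup(html, 'html.parser')
    titles = []
    for div in soup.find_all('div', class_='title'):
        # get_text(strip=True) drops the surrounding whitespace nodes.
        text = div.get_text(strip=True)
        if text:
            titles.append(text)
    return titles

for title in extract_titles(SAMPLE_HTML):
    print(title)
```

To use it against the real chart, pass `requests.get(URL).text` to `extract_titles`, keeping in mind that the class name is an assumption about the page's markup.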

