Web scraping in Python

Problem description

My question is whether it is possible to get the number inside a span like this one:

<html junk>
 <div class="test">
     <span>
     55
     </span>
 </div>
</html junk>

As you can see, the span has no class or id.

My current code is just the default scraper boilerplate (with the user agent and URL removed):

import requests
from bs4 import BeautifulSoup

URL = ''

headers = {"User-Agent": ''}

page = requests.get(URL, headers=headers)

soup = BeautifulSoup(page.content, 'html.parser')

# Here is where the "55" should be found (the number changes over time, so I'm not looking for that exact value)
title = soup.find('') 

print(title)

Tags: python, html, web, web-scraping

Solution


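If I understand your question correctly, you are trying to get the number between the opening and closing span tags. If so, you can do it like this. Because the span has no class or id of its own, locate it through its parent div, which does have a class.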

import requests
from bs4 import BeautifulSoup

URL = ''

headers = {"User-Agent": ''}

page = requests.get(URL, headers=headers)

soup = BeautifulSoup(page.text, 'html.parser')

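# Look up the span through its parent <div class="test">; a bare find('span')
# would return the first span anywhere on the page.
span = soup.select_one('div.test span')

# get_text(strip=True) drops the newlines and indentation around the number.
title = span.get_text(strip=True) if span else None

print(title)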
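
To try the lookup without making a network request, here is a minimal sketch that parses the markup from the question directly. The names SAMPLE_HTML and extract_number are hypothetical, introduced only for illustration, and the sketch assumes the real page keeps the same div/span structure and that the value is a plain non-negative integer.

```python
from bs4 import BeautifulSoup

# Stand-in for page.text; mirrors the markup shown in the question (assumption).
SAMPLE_HTML = """
<html>
 <div class="test">
     <span>
     55
     </span>
 </div>
</html>
"""


def extract_number(html):
    """Return the integer inside <div class="test"> <span>, or None if absent."""
    soup = BeautifulSoup(html, "html.parser")
    # CSS selector: a <span> that is a descendant of a <div> with class "test".
    span = soup.select_one("div.test span")
    if span is None:
        return None
    # strip=True removes the newlines and indentation around the number.
    text = span.get_text(strip=True)
    # Only convert when the text is purely digits; otherwise report nothing found.
    return int(text) if text.isdigit() else None


number = extract_number(SAMPLE_HTML)
print(number)  # prints 55
```

Returning None instead of raising keeps the scraper from crashing if the page layout changes; the caller can then decide whether to retry or log the miss.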
