I need help extracting a specific piece of text from a webpage

Problem description

I am trying to assign the number 11101973 from this HTML file to a variable, but I need a way of grabbing only that number, without any of the extra information:

<div class="chooseSizeContainer" id="2SizeContainer" style="display:none;">
 <div class="chooseSizeLinkContainer active">
 <a id="US-13" href="javascript:void(0);"
 class="chooseSizeLink chooseSizeLinkActive"
 data-size="13"
 onclick="ProductDetails.changeSizeAffectedLinks(
 '11101973',
 '£ 135.95',
 '£ 135.95',
 '0',
 '£ 0.00saved!',
 '13',
 '13',
 '15',
 'false',
 'false',
 'false',
 'false',
 'unknown',
 'US-',
 '555088-015');">13</a>
 </div>

The page source is here if more info is needed: view-source:https://www.kickz.com/uk/jordan-basketball-retro-air-jordan-1-retro-high-og-black_varsity_red_sail_university_blue-107840036 Any help is appreciated!

Tags: python, web-scraping, beautifulsoup, python-requests

Solution


BeautifulSoup is meant for parsing HTML elements, not JavaScript variables. There are a few JavaScript parsers out there, but for a simple task like this I prefer a regex.

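import re
import requests

# URL of the product page, taken from the question
url = 'https://www.kickz.com/uk/jordan-basketball-retro-air-jordan-1-retro-high-og-black_varsity_red_sail_university_blue-107840036'
page = requests.get(url).text
# Capture the digits passed to collectAskInput(...)
theNumber = re.search(r'collectAskInput\((\d+)', page).group(1)
print(theNumber)
# 11101973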

It searches the page source for the number in this fragment:

onclick="return ProductDetails.collectAskInput(11101973)
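
For anyone who wants to try the pattern without fetching the page, here is a minimal offline sketch. It runs the answer's regex against the fragment quoted above, and also tries a variant aimed at the onclick attribute shown in the question. The helper name first_match, the abbreviated copy of the question's markup, and the changeSizeAffectedLinks pattern are assumptions added for illustration; they are not part of the original answer, and the live page may differ.

```python
import re

# Inline copies of the snippets quoted in this thread, so no network access is needed.
ANSWER_SNIPPET = 'onclick="return ProductDetails.collectAskInput(11101973)'

# Abbreviated copy of the anchor tag from the question (most arguments omitted).
QUESTION_SNIPPET = """<a id="US-13" href="javascript:void(0);"
 class="chooseSizeLink chooseSizeLinkActive"
 data-size="13"
 onclick="ProductDetails.changeSizeAffectedLinks(
 '11101973',
 '13',
 '555088-015');">13</a>"""


def first_match(pattern, text):
    """Return capture group 1 of the first match, or None if nothing matches."""
    match = re.search(pattern, text)
    return match.group(1) if match else None


# Pattern from the answer: the digits directly after "collectAskInput(".
ask_input_id = first_match(r'collectAskInput\((\d+)', ANSWER_SNIPPET)

# Hypothetical variant for the question's markup: the first quoted number after
# "changeSizeAffectedLinks(", allowing whitespace or newlines in between.
size_link_id = first_match(r"changeSizeAffectedLinks\(\s*'(\d+)'", QUESTION_SNIPPET)

print(ask_input_id)
print(size_link_id)
```

Returning None instead of calling .group(1) directly avoids an AttributeError when the site changes its markup and the pattern no longer matches.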
