python-2.7 - Unable to get some tags using soup.findAll?
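Problem description
This is an HTML snippet containing two kinds of tags, <code> and <img>. Notice that, if you scroll a little to the right, a <code> tag follows directly after each <img> tag on the same line.
Problem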
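I want to collect all of the <code> tags, and I am using bs4 for that, but I cannot get the <code> tags that come immediately after an <img> tag. I don't know why. Any ideas?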
<code style="display: none" id="bpr-guid-1535430">
{"data":{"mediaConfig":{"mprConfig":{"sizes":[{"width":60,"height":30,"$type":"com.linkedin.voyager.common.MediaProcessorSize"},{"width":60,"height":36,"$type":"com.linkedin.voyager.common.MediaProcessorSize"},{"width":90,"height":45,"$type":"com.linkedin.voyager.common.MediaProcessorSize"},{"width":90,"height":54,"$type":"com.linkedin.voyager.common.MediaProcessorSize"},{"width":100,"height":50,"$type":"com.linkedin.voyager.common.MediaProcessorSize"},{"width":100,"height":60,"$type":"com.linkedin.voyager.common.MediaProcessorSize"},{"width":100,"height":100,"$type":"com.linkedin.voyager.common.MediaProcessorSize"},{"width":120,"height":60,"$type":"com.linkedin.voyager.common.MediaProcessorSize"},{"width":120,"height":72,"$type":"com.linkedin.voyager.common.MediaProcessorSize"},{"width":127,"height":30,"$type":"com.linkedin.voyager.common.MediaProcessorSize"},{"width":127,"height":46,"$type":"com.linkedin.voyager.common.MediaProcessorSize"},{"width":150,"height":75,"$type":"com.linkedin.voyager.common.MediaProcessorSize"},{"width":150,"height":90,"$type":"com.linkedin.voyager.common.MediaProcessorSize"},{"width":191,"height":45,"$type":"com.linkedin.voyager.common.MediaProcessorSize"},{"width":191,"height":69,"$type":"com.linkedin.voyager.common.MediaProcessorSize"},{"width":200,"height":100,"$type":"com.linkedin.voyager.common.MediaProcessorSize"},{"width":200,"height":120,"$type":"com.linkedin.voyager.common.MediaProcessorSize"},{"width":200,"height":200,"$type":"com.linkedin.voyager.common.MediaProcessorSize"},{"width":254,"height":60,"$type":"com.linkedin.voyager.common.MediaProcessorSize"},{"width":254,"height":92,"$type":"com.linkedin.voyager.common.MediaProcessorSize"},{"width":337,"height":120,"$type":"com.linkedin.voyager.common.MediaProcessorSize"},{"width":400,"height":400,"$type":"com.linkedin.voyager.common.MediaProcessorSize"},{"width":506,"height":180,"$type":"com.linkedin.voyager.common.MediaProcessorSize"},{"width":674,"height":240,"$type":"com.lin
kedin.voyager.common.MediaProcessorSize"},{"width":750,"height":750,"$type":"com.linkedin.voyager.common.MediaProcessorSize"}],"filters":{"cover":"https://media.licdn.com/mpr/mpr/shrinknp_{width}_{height}{+id}","contain":"https://media.licdn.com/mpr/mpr/shrinknp_{width}_{height}{+id}","original":"https://media.licdn.com/media{+id}","fill":"https://media.licdn.com/mpr/mpr/shrink_{width}_{height}{+id}","$type":"com.linkedin.voyager.common.MediaProcessorFilters"},"$type":"com.linkedin.voyager.common.MediaProcessorConfig"},"$type":"com.linkedin.voyager.common.MediaConfig"},"$type":"com.linkedin.voyager.common.Configuration"},"included":[]}
</code>
<img src="data:image/gif;base64,R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7" style="display: none" class="datalet-bpr-guid-1535430"><code style="display: none" id="bpr-guid-1535431">
{"data":{"canBrowseProfiles":false,"reactivationFeaturesEligible":false,"canViewJobAnalytics":false,"canViewWVMP":false,"premiumFreeTrialEligible":true,"canViewCompanyInsights":false,"$type":"com.linkedin.voyager.premium.FeatureAccess"},"included":[]}
</code>
<code style="display: none" id="datalet-bpr-guid-1535431">
{"request":"/voyager/api/premium/featureAccess?name\u003DreactivationFeaturesEligible","status":200,"body":"bpr-guid-1535431"}
</code>
<img src="data:image/gif;base64,R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7" style="display: none" class="datalet-bpr-guid-1535431"><code style="display: none" id="bpr-guid-1535432">
{"data":{"companies":[],"$deletedFields":["paidProducts","postJobsEnabled"],"memberGroup":"FREE","showStaticLearning":false,"$type":"com.linkedin.voyager.common.Nav","$id":"M8x5UY0Zt6eGdBCiy+iKhA==,root"},"included":[]}
</code>
<code style="display: none" id="datalet-bpr-guid-1535432">
{"request":"/voyager/api/nav","status":200,"body":"bpr-guid-1535432"}
</code>
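Below is the code I am using in Python.
import sys
import requests
from bs4 import BeautifulSoup
from HTMLParser import HTMLParser  # Python 2.7 module name

h = HTMLParser()
companyname = sys.argv[1]  # search keyword taken from the command line
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:63.0) Gecko/20100101 Firefox/63.0',
}
url = 'https://www.linkedin.com/search/results/all/?keywords=' + companyname + '&origin=GLOBAL_SEARCH_HEADER'
req = requests.get(url, headers=headers)
finding = BeautifulSoup(req.content, 'lxml')
# Print every <code> tag found in the parsed response
for x in finding.findAll('code'):
    print(x)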
Solution
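One way to narrow this down is to check whether the missing <code> tags are present in the downloaded HTML at all, using a parser that is independent of bs4 and lxml. The following is a minimal sketch, not a confirmed fix. The class CodeTagCollector, the function collect_code_ids, and the shortened sample string are hypothetical names introduced here for illustration, and the code is written for Python 3 (in Python 2.7 the module is named HTMLParser).

```python
from html.parser import HTMLParser  # Python 3 module name


class CodeTagCollector(HTMLParser):
    """Records the id attribute of every <code> start tag it sees."""

    def __init__(self):
        HTMLParser.__init__(self)
        self.code_ids = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; tag names arrive lowercased
        if tag == "code":
            self.code_ids.append(dict(attrs).get("id"))


def collect_code_ids(html):
    """Return the ids of all <code> tags in html, in document order."""
    collector = CodeTagCollector()
    collector.feed(html)
    collector.close()
    return collector.code_ids


# Shortened stand-in for the markup in the question (JSON bodies and the
# image data are trimmed; this is an assumption, not the real response).
sample = (
    '<code style="display: none" id="bpr-guid-1535430">{"data":{}}</code>\n'
    '<img src="data:image/gif;base64,R0lGOD" style="display: none" '
    'class="datalet-bpr-guid-1535430">'
    '<code style="display: none" id="bpr-guid-1535431">{"data":{}}</code>\n'
    '<code style="display: none" id="datalet-bpr-guid-1535431">{"status":200}</code>\n'
)

print(collect_code_ids(sample))
```

Running collect_code_ids on req.text from the script in the question and comparing the number of ids with len(finding.findAll('code')) should indicate where the tags go missing. If both counts agree and the expected ids are absent, the server most likely returned different HTML than the browser shows. If the counts differ, the parser passed to BeautifulSoup is the more likely place to look.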