Unable to get some tags using soup.findAll?

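Problem description

The HTML below contains two kinds of tags, <code> and <img>. The thing to notice is that if you scroll the snippet a little to the right, you will see that an <img> tag is immediately followed by a <code> tag on the same line.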

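Question

The main problem is that I want all of the <code> tags. I am using bs4 for this, but I cannot get the <code> tags that come immediately after an <img> tag. I do not know why. Any ideas?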

<code style="display: none" id="bpr-guid-1535430">
      {&quot;data&quot;:{&quot;mediaConfig&quot;:{&quot;mprConfig&quot;:{&quot;sizes&quot;:[{&quot;width&quot;:60,&quot;height&quot;:30,&quot;$type&quot;:&quot;com.linkedin.voyager.common.MediaProcessorSize&quot;},{&quot;width&quot;:60,&quot;height&quot;:36,&quot;$type&quot;:&quot;com.linkedin.voyager.common.MediaProcessorSize&quot;},{&quot;width&quot;:90,&quot;height&quot;:45,&quot;$type&quot;:&quot;com.linkedin.voyager.common.MediaProcessorSize&quot;},{&quot;width&quot;:90,&quot;height&quot;:54,&quot;$type&quot;:&quot;com.linkedin.voyager.common.MediaProcessorSize&quot;},{&quot;width&quot;:100,&quot;height&quot;:50,&quot;$type&quot;:&quot;com.linkedin.voyager.common.MediaProcessorSize&quot;},{&quot;width&quot;:100,&quot;height&quot;:60,&quot;$type&quot;:&quot;com.linkedin.voyager.common.MediaProcessorSize&quot;},{&quot;width&quot;:100,&quot;height&quot;:100,&quot;$type&quot;:&quot;com.linkedin.voyager.common.MediaProcessorSize&quot;},{&quot;width&quot;:120,&quot;height&quot;:60,&quot;$type&quot;:&quot;com.linkedin.voyager.common.MediaProcessorSize&quot;},{&quot;width&quot;:120,&quot;height&quot;:72,&quot;$type&quot;:&quot;com.linkedin.voyager.common.MediaProcessorSize&quot;},{&quot;width&quot;:127,&quot;height&quot;:30,&quot;$type&quot;:&quot;com.linkedin.voyager.common.MediaProcessorSize&quot;},{&quot;width&quot;:127,&quot;height&quot;:46,&quot;$type&quot;:&quot;com.linkedin.voyager.common.MediaProcessorSize&quot;},{&quot;width&quot;:150,&quot;height&quot;:75,&quot;$type&quot;:&quot;com.linkedin.voyager.common.MediaProcessorSize&quot;},{&quot;width&quot;:150,&quot;height&quot;:90,&quot;$type&quot;:&quot;com.linkedin.voyager.common.MediaProcessorSize&quot;},{&quot;width&quot;:191,&quot;height&quot;:45,&quot;$type&quot;:&quot;com.linkedin.voyager.common.MediaProcessorSize&quot;},{&quot;width&quot;:191,&quot;height&quot;:69,&quot;$type&quot;:&quot;com.linkedin.voyager.common.MediaProcessorSize&quot;},{&quot;width&quot;:200,&quot;height&quot;:100,&quot;$type&quot;:&q
uot;com.linkedin.voyager.common.MediaProcessorSize&quot;},{&quot;width&quot;:200,&quot;height&quot;:120,&quot;$type&quot;:&quot;com.linkedin.voyager.common.MediaProcessorSize&quot;},{&quot;width&quot;:200,&quot;height&quot;:200,&quot;$type&quot;:&quot;com.linkedin.voyager.common.MediaProcessorSize&quot;},{&quot;width&quot;:254,&quot;height&quot;:60,&quot;$type&quot;:&quot;com.linkedin.voyager.common.MediaProcessorSize&quot;},{&quot;width&quot;:254,&quot;height&quot;:92,&quot;$type&quot;:&quot;com.linkedin.voyager.common.MediaProcessorSize&quot;},{&quot;width&quot;:337,&quot;height&quot;:120,&quot;$type&quot;:&quot;com.linkedin.voyager.common.MediaProcessorSize&quot;},{&quot;width&quot;:400,&quot;height&quot;:400,&quot;$type&quot;:&quot;com.linkedin.voyager.common.MediaProcessorSize&quot;},{&quot;width&quot;:506,&quot;height&quot;:180,&quot;$type&quot;:&quot;com.linkedin.voyager.common.MediaProcessorSize&quot;},{&quot;width&quot;:674,&quot;height&quot;:240,&quot;$type&quot;:&quot;com.linkedin.voyager.common.MediaProcessorSize&quot;},{&quot;width&quot;:750,&quot;height&quot;:750,&quot;$type&quot;:&quot;com.linkedin.voyager.common.MediaProcessorSize&quot;}],&quot;filters&quot;:{&quot;cover&quot;:&quot;https://media.licdn.com/mpr/mpr/shrinknp_{width}_{height}{+id}&quot;,&quot;contain&quot;:&quot;https://media.licdn.com/mpr/mpr/shrinknp_{width}_{height}{+id}&quot;,&quot;original&quot;:&quot;https://media.licdn.com/media{+id}&quot;,&quot;fill&quot;:&quot;https://media.licdn.com/mpr/mpr/shrink_{width}_{height}{+id}&quot;,&quot;$type&quot;:&quot;com.linkedin.voyager.common.MediaProcessorFilters&quot;},&quot;$type&quot;:&quot;com.linkedin.voyager.common.MediaProcessorConfig&quot;},&quot;$type&quot;:&quot;com.linkedin.voyager.common.MediaConfig&quot;},&quot;$type&quot;:&quot;com.linkedin.voyager.common.Configuration&quot;},&quot;included&quot;:[]}
    </code>

<img src="data:image/gif;base64,R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7" style="display: none" class="datalet-bpr-guid-1535430"><code style="display: none" id="bpr-guid-1535431">
  {&quot;data&quot;:{&quot;canBrowseProfiles&quot;:false,&quot;reactivationFeaturesEligible&quot;:false,&quot;canViewJobAnalytics&quot;:false,&quot;canViewWVMP&quot;:false,&quot;premiumFreeTrialEligible&quot;:true,&quot;canViewCompanyInsights&quot;:false,&quot;$type&quot;:&quot;com.linkedin.voyager.premium.FeatureAccess&quot;},&quot;included&quot;:[]}
</code>

<code style="display: none" id="datalet-bpr-guid-1535431">
  {"request":"/voyager/api/premium/featureAccess?name\u003DreactivationFeaturesEligible","status":200,"body":"bpr-guid-1535431"}
</code>

<img src="data:image/gif;base64,R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7" style="display: none" class="datalet-bpr-guid-1535431"><code style="display: none" id="bpr-guid-1535432">
  {&quot;data&quot;:{&quot;companies&quot;:[],&quot;$deletedFields&quot;:[&quot;paidProducts&quot;,&quot;postJobsEnabled&quot;],&quot;memberGroup&quot;:&quot;FREE&quot;,&quot;showStaticLearning&quot;:false,&quot;$type&quot;:&quot;com.linkedin.voyager.common.Nav&quot;,&quot;$id&quot;:&quot;M8x5UY0Zt6eGdBCiy+iKhA&#61;&#61;,root&quot;},&quot;included&quot;:[]}
</code>

<code style="display: none" id="datalet-bpr-guid-1535432">
  {"request":"/voyager/api/nav","status":200,"body":"bpr-guid-1535432"}
</code>

Below is the Python code I am using.

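# Python 2.7, as tagged; needs the requests, beautifulsoup4 and lxml packages.
import sys

import requests
from bs4 import BeautifulSoup
from HTMLParser import HTMLParser  # Python 2 module name

h = HTMLParser()  # created in the original snippet; not used below

companyname = sys.argv[1]  # company name passed on the command line

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:63.0) Gecko/20100101 Firefox/63.0',
}
url = 'https://www.linkedin.com/search/results/all/?keywords=' + companyname + '&origin=GLOBAL_SEARCH_HEADER'
req = requests.get(url, headers=headers)
finding = BeautifulSoup(req.content, 'lxml')

# Print every <code> tag that the parser found
for x in finding.findAll('code'):
    print(x)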

Tags: python-2.7, beautifulsoup

Solution
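
One way to narrow this down, offered as a diagnostic sketch rather than a confirmed fix, is to count the <code> elements with a second, independent parser and compare that number with what `finding.findAll('code')` returns. The example below assumes Python 3 and uses only the standard library (the question is tagged python-2.7, where the module is named `HTMLParser` instead of `html.parser`). `CodeTagCollector` and `sample` are hypothetical names, and the sample is a shortened, made-up stand-in for the markup in the question, not the real LinkedIn response.

```python
import json
from html.parser import HTMLParser  # Python 3; in Python 2.7 the module is named HTMLParser


class CodeTagCollector(HTMLParser):
    """Hypothetical helper: collect (id, text) for every <code> element."""

    def __init__(self):
        # convert_charrefs=True turns &quot; etc. back into plain characters
        super().__init__(convert_charrefs=True)
        self.blocks = []
        self._current_id = None
        self._buffer = None  # None while the parser is outside a <code> element

    def handle_starttag(self, tag, attrs):
        if tag == "code":
            self._current_id = dict(attrs).get("id")
            self._buffer = []

    def handle_data(self, data):
        if self._buffer is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "code" and self._buffer is not None:
            text = "".join(self._buffer).strip()
            self.blocks.append((self._current_id, text))
            self._buffer = None


# Shortened, made-up sample in the same shape as the markup in the question
sample = """
<img src="data:image/gif;base64,R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7" style="display: none" class="datalet-bpr-guid-1535431"><code style="display: none" id="bpr-guid-1535432">
  {&quot;data&quot;:{&quot;memberGroup&quot;:&quot;FREE&quot;},&quot;included&quot;:[]}
</code>

<code style="display: none" id="datalet-bpr-guid-1535432">
  {"request":"/voyager/api/nav","status":200,"body":"bpr-guid-1535432"}
</code>
"""

collector = CodeTagCollector()
collector.feed(sample)
collector.close()

print(len(collector.blocks))  # compare with len(finding.findAll('code'))
for code_id, text in collector.blocks:
    payload = json.loads(text)  # each block in the sample holds one JSON object
    print(code_id, sorted(payload))
```

To apply this to the real page, feed `req.text` to the collector instead of `sample`. If the count is higher than what BeautifulSoup reports for the same response, the tags are present in the HTML and the difference lies in how the chosen parser builds the tree, so trying another parser such as 'html.parser' would be a reasonable next step. If the counts match, the missing tags were probably never in the HTML that `requests` received.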

