Getting data from the wiki via HTML Agility Pack

Problem description

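I am trying to get data from the Wikipedia main page (https://en.wikipedia.org/wiki/Main_Page) using HTML Agility Pack. Could you help me with an XPath query that retrieves all the data in the "Did you know..." section? It contains a list of elements, and I want to get them.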

<h2 id="mp-dyk-h2" style="clear:both; margin:0.5em; background:#cef2e0; font-family:inherit; font-size:120%; font-weight:bold; border:1px solid #a3bfb1; color:#000; padding:0.2em 0.4em;"><span class="mw-headline" id="Did_you_know...">Did you know...</span></h2>
<div id="mp-dyk" style="padding:0.1em 0.6em 0.5em;">
<div class="dyk-img" style="float: right; margin-left: 0.5em;">
<div class="thumbinner mp-thumb" style="background: transparent; border: none; padding: 0; max-width: 180px;">
<a href="/wiki/File:Bridge_Of_Carron_(geograph_4810444)_(cropped).jpg" class="image" title="The Carron Bridge arching over the River Spey"><img alt="The Carron Bridge arching over the River Spey" src="//upload.wikimedia.org/wikipedia/commons/thumb/2/26/Bridge_Of_Carron_%28geograph_4810444%29_%28cropped%29.jpg/180px-Bridge_Of_Carron_%28geograph_4810444%29_%28cropped%29.jpg" decoding="async" width="180" height="120" srcset="//upload.wikimedia.org/wikipedia/commons/thumb/2/26/Bridge_Of_Carron_%28geograph_4810444%29_%28cropped%29.jpg/270px-Bridge_Of_Carron_%28geograph_4810444%29_%28cropped%29.jpg 1.5x, //upload.wikimedia.org/wikipedia/commons/thumb/2/26/Bridge_Of_Carron_%28geograph_4810444%29_%28cropped%29.jpg/360px-Bridge_Of_Carron_%28geograph_4810444%29_%28cropped%29.jpg 2x" data-file-width="3024" data-file-height="2016" /></a><div class="thumbcaption" style="padding: 0.25em 0; word-wrap: break-word;">The Carron Bridge arching over <span class="nowrap">the <a href="/wiki/River_Spey" title="River Spey">River Spey</a></span></div></div>
</div>
<ul><li>... that the <b><a href="/wiki/Carron_Bridge_(River_Spey)" title="Carron Bridge (River Spey)">Carron Bridge</a></b> <i>(pictured)</i> was the last cast-iron railway bridge to be built and used in Scotland?</li>
<li>... that <b><a href="/wiki/Isold%C3%A9_Elchlepp" title="Isoldé Elchlepp">Isoldé Elchlepp</a></b> began her career as a <a href="/wiki/Protest_song" title="Protest song">protest song</a> singer and later appeared as Wagner's <a href="/wiki/Lohengrin_(opera)" title="Lohengrin (opera)">Ortrud</a> at the <a href="/wiki/Bayreuth_Festival" title="Bayreuth Festival">Bayreuth Festival</a> and as Schoeck's <a href="/wiki/Penthesilea_(opera)" title="Penthesilea (opera)">Penthesilea</a> at the <a href="/wiki/Staatsoper_Hannover" title="Staatsoper Hannover">Staatsoper Hannover</a>?</li>
<li>... that an inscription at the <b><a href="/wiki/Tomb_of_Isa_Khan" title="Tomb of Isa Khan">tomb of Isa Khan</a></b> claims that it is an "asylum of paradise"?</li>

I have researched this, but the amount of material I would need to learn is very large for such a small task.

IEnumerable<string> listItemHtml = htmlDoc.DocumentNode.SelectNodes(
    @"//div[@class='Wiki']/div[@class='Did_you_know...']/ul/li")
    .Select(li => li.OuterHtml);
This is the query I am currently using for the search.

Tags: c#, html, html-agility-pack

Solution
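
No answer text survives on this page, so what follows is only a possible approach, not a confirmed fix. In the markup quoted in the question, the list sits directly inside `<div id="mp-dyk">`, while `Did_you_know...` is the `id` of a `<span>` in the heading and there is no `div` with class `Wiki`. An XPath that selects by `id` therefore looks more likely to match. The class name `DidYouKnowScraper` and the method name `ExtractItems` are hypothetical names chosen for illustration, and the sketch assumes the live page still uses the `mp-dyk` id shown above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

public static class DidYouKnowScraper
{
    // Returns the plain text of each <li> directly under <div id="mp-dyk"><ul>.
    // Assumption: the page structure matches the snippet quoted in the question.
    public static List<string> ExtractItems(HtmlDocument htmlDoc)
    {
        HtmlNodeCollection nodes =
            htmlDoc.DocumentNode.SelectNodes("//div[@id='mp-dyk']/ul/li");

        // SelectNodes returns null rather than an empty collection when nothing matches.
        if (nodes == null)
        {
            return new List<string>();
        }

        // InnerText drops the tags; DeEntitize turns entities such as &amp; back into characters.
        return nodes
            .Select(li => HtmlEntity.DeEntitize(li.InnerText).Trim())
            .ToList();
    }

    public static void Main()
    {
        // Downloads the live page, so this needs network access.
        var web = new HtmlWeb();
        HtmlDocument page = web.Load("https://en.wikipedia.org/wiki/Main_Page");

        foreach (string item in ExtractItems(page))
        {
            Console.WriteLine(item);
        }
    }
}
```

If the markup of each item is needed rather than its text, `li.OuterHtml` can be selected instead of `li.InnerText`, as in the original attempt. The null check matters in either case, because calling `.Select(...)` directly on the result of a non-matching `SelectNodes` throws a `NullReferenceException`.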

