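c# - Getting data from wiki with HTML Agility Pack
Problem description
I am trying to get data from the wiki main page (https://en.wikipedia.org/wiki/Main_Page) with HTML Agility Pack. Could you help me with an XPath query that returns all the data in the "Did you know..." section? It contains a list of elements, and I want to get each of them.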
<h2 id="mp-dyk-h2" style="clear:both; margin:0.5em; background:#cef2e0; font-family:inherit; font-size:120%; font-weight:bold; border:1px solid #a3bfb1; color:#000; padding:0.2em 0.4em;"><span class="mw-headline" id="Did_you_know...">Did you know...</span></h2>
<div id="mp-dyk" style="padding:0.1em 0.6em 0.5em;">
<div class="dyk-img" style="float: right; margin-left: 0.5em;">
<div class="thumbinner mp-thumb" style="background: transparent; border: none; padding: 0; max-width: 180px;">
<a href="/wiki/File:Bridge_Of_Carron_(geograph_4810444)_(cropped).jpg" class="image" title="The Carron Bridge arching over the River Spey"><img alt="The Carron Bridge arching over the River Spey" src="//upload.wikimedia.org/wikipedia/commons/thumb/2/26/Bridge_Of_Carron_%28geograph_4810444%29_%28cropped%29.jpg/180px-Bridge_Of_Carron_%28geograph_4810444%29_%28cropped%29.jpg" decoding="async" width="180" height="120" srcset="//upload.wikimedia.org/wikipedia/commons/thumb/2/26/Bridge_Of_Carron_%28geograph_4810444%29_%28cropped%29.jpg/270px-Bridge_Of_Carron_%28geograph_4810444%29_%28cropped%29.jpg 1.5x, //upload.wikimedia.org/wikipedia/commons/thumb/2/26/Bridge_Of_Carron_%28geograph_4810444%29_%28cropped%29.jpg/360px-Bridge_Of_Carron_%28geograph_4810444%29_%28cropped%29.jpg 2x" data-file-width="3024" data-file-height="2016" /></a><div class="thumbcaption" style="padding: 0.25em 0; word-wrap: break-word;">The Carron Bridge arching over <span class="nowrap">the <a href="/wiki/River_Spey" title="River Spey">River Spey</a></span></div></div>
</div>
<ul><li>... that the <b><a href="/wiki/Carron_Bridge_(River_Spey)" title="Carron Bridge (River Spey)">Carron Bridge</a></b> <i>(pictured)</i> was the last cast-iron railway bridge to be built and used in Scotland?</li>
<li>... that <b><a href="/wiki/Isold%C3%A9_Elchlepp" title="Isoldé Elchlepp">Isoldé Elchlepp</a></b> began her career as a <a href="/wiki/Protest_song" title="Protest song">protest song</a> singer and later appeared as Wagner's <a href="/wiki/Lohengrin_(opera)" title="Lohengrin (opera)">Ortrud</a> at the <a href="/wiki/Bayreuth_Festival" title="Bayreuth Festival">Bayreuth Festival</a> and as Schoeck's <a href="/wiki/Penthesilea_(opera)" title="Penthesilea (opera)">Penthesilea</a> at the <a href="/wiki/Staatsoper_Hannover" title="Staatsoper Hannover">Staatsoper Hannover</a>?</li>
<li>... that an inscription at the <b><a href="/wiki/Tomb_of_Isa_Khan" title="Tomb of Isa Khan">tomb of Isa Khan</a></b> claims that it is an "asylum of paradise"?</li>
I have researched this, but the amount I would need to learn is very large for such a small task.
IEnumerable<string> listItemHtml = htmlDoc.DocumentNode.SelectNodes(
@"//div[@class='Wiki']/div[@class='Did_you_know...']/ul/li")
.Select(li => li.OuterHtml);
The code above is what I currently use for the search.
Solution
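In the markup quoted in the question, no element has `class='Wiki'` or `class='Did_you_know...'`; the list sits in a `<ul>` directly under `<div id="mp-dyk">`. A minimal sketch along those lines is shown below. It assumes the HtmlAgilityPack NuGet package is installed and that the live page still uses the `mp-dyk` id, which may change over time. The names `DidYouKnowScraper` and `ExtractItems` are hypothetical and chosen only for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack; // NuGet package: HtmlAgilityPack

public static class DidYouKnowScraper
{
    // Hypothetical helper: returns the text of each <li> directly under
    // <div id="mp-dyk">/<ul>, based on the markup quoted in the question.
    public static List<string> ExtractItems(HtmlDocument htmlDoc)
    {
        // Select by id, not class: the container is <div id="mp-dyk">.
        HtmlNodeCollection nodes =
            htmlDoc.DocumentNode.SelectNodes("//div[@id='mp-dyk']/ul/li");

        // SelectNodes returns null (not an empty collection) when nothing matches.
        if (nodes == null)
            return new List<string>();

        return nodes
            // InnerText drops the tags; DeEntitize turns &amp; etc. back into characters.
            .Select(li => HtmlEntity.DeEntitize(li.InnerText).Trim())
            .ToList();
    }

    public static void Main()
    {
        // Download and parse the live page (assumes network access).
        var web = new HtmlWeb();
        HtmlDocument htmlDoc = web.Load("https://en.wikipedia.org/wiki/Main_Page");

        foreach (string item in ExtractItems(htmlDoc))
            Console.WriteLine(item);
    }
}
```

If the markup of each entry is needed instead of its text, `li.OuterHtml` can be projected in place of `li.InnerText`, as in the original attempt.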