c# - HtmlAgilityPack: removing nodes
Question
I'm really new to HtmlAgilityPack. I have the following HTML document:
<a href="https://twitter.com/RedGiantNews" target="_blank"><img
src="http://image.e.redgiant.com/lib/998.png" width="24" border="0"
alt="Twitter" title="Twitter" class="smImage"></a><a
href="https://www.facebook.com/RedGiantSoftware" target="_blank"><img
src="http://image.e.redgiant.com/lib/db5.png" width="24" border="0"
alt="Facebook" title="Facebook" class="smImage"></a>
http://click.e.redgiant.com/?qs=d2ad061f
<a href="https://www.instagram.com/redgiantnews/" target="_blank"><img
src="http://image.e.redgiant.com/aa10-f8747e56f06d.png" width="24"
border="0" alt="Instagram" title="Instagram" class="smImage"></a>
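I am trying to remove all images, by which I mean all <img ...> nodes (if that is the right term) from the HTML document. I tried the following code from another StackOverflow solution, but to no avail: it returns the same HTML as above: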
var sb = new StringBuilder();
doc.LoadHtml(inputHTml);
foreach (var node in doc.DocumentNode.ChildNodes)
{
    if (node.Name != "img" && node.Name != "a")
    {
        sb.Append(node.InnerHtml);
    }
}
Solution
static string OutputHtml = @"<a href=""https://twitter.com/RedGiantNews"" target=""_blank""><img
src=""http://image.e.redgiant.com/lib/998.png"" width=""24"" border=""0""
alt=""Twitter"" title=""Twitter"" class=""smImage""></a><a
href = ""https://www.facebook.com/RedGiantSoftware"" target=""_blank""><img
src = ""http://image.e.redgiant.com/lib/db5.png"" width=""24"" border=""0""
alt=""Facebook"" title=""Facebook"" class=""smImage""></a>
<a href = ""https://www.instagram.com/redgiantnews/"" target=""_blank""><img
src = ""http://image.e.redgiant.com/aa10-f8747e56f06d.png"" width=""24""
border=""0"" alt=""Instagram"" title=""Instagram"" class=""smImage""></a>";
I removed the stray link ( http://click.e.redgiant.com/?qs=d2ad061f ) from the original HTML string.
Method 1:
public static string RemoveAllImageNodes(string html)
{
    var document = new HtmlAgilityPack.HtmlDocument();
    document.LoadHtml(html);
    // SelectNodes returns null (not an empty collection) when nothing matches,
    // so guard before iterating.
    var nodes = document.DocumentNode.SelectNodes("//img");
    if (nodes != null)
    {
        foreach (var node in nodes)
        {
            node.Remove();
            //node.Attributes.Remove("src"); // This only removes the src attribute from the <img> tag
        }
    }
    return document.DocumentNode.OuterHtml;
}
Method 2:
public static string RemoveAllImageNodes(string html)
{
    var document = new HtmlAgilityPack.HtmlDocument();
    document.LoadHtml(html);
    // Only query for <img> nodes when the markup contains an <img tag.
    // The check ignores case so that <IMG ...> is caught as well.
    if (document.DocumentNode.InnerHtml.IndexOf("<img", StringComparison.OrdinalIgnoreCase) >= 0)
    {
        foreach (var eachNode in document.DocumentNode.SelectNodes("//img"))
        {
            eachNode.Remove();
            //eachNode.Attributes.Remove("src"); // This only removes the src attribute from the <img> tag
        }
    }
    return document.DocumentNode.OuterHtml;
}
Output HTML:
<a href="https://twitter.com/RedGiantNews" target="_blank"></a>
<a href="https://www.facebook.com/RedGiantSoftware" target="_blank"></a>
<a href="https://www.instagram.com/redgiantnews/" target="_blank"></a>
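As an alternative sketch (not one of the methods from the answer), the same output can probably be reached without XPath by using Descendants("img"), which yields an empty sequence rather than null when the document contains no images. The class name ImageStripper and the shortened sample input are assumptions made for illustration. The ToList() snapshot is there because nodes are removed while the tree would otherwise still be enumerated.

```csharp
using System;
using System.Linq;
using HtmlAgilityPack;

public static class ImageStripper // hypothetical helper name, not from the original answer
{
    public static string RemoveAllImageNodes(string html)
    {
        var document = new HtmlDocument();
        document.LoadHtml(html);

        // Descendants is lazy, so snapshot the matches before mutating the tree.
        var images = document.DocumentNode.Descendants("img").ToList();
        foreach (var image in images)
        {
            image.Remove();
        }

        return document.DocumentNode.OuterHtml;
    }

    public static void Main()
    {
        // Shortened sample input, assumed for illustration.
        string input = "<a href=\"https://twitter.com/RedGiantNews\" target=\"_blank\">"
                     + "<img src=\"http://image.e.redgiant.com/lib/998.png\" width=\"24\"></a>";

        Console.WriteLine(RemoveAllImageNodes(input));

        // Markup without any <img> passes through without throwing.
        Console.WriteLine(RemoveAllImageNodes("<p>no images here</p>"));
    }
}
```

Because the sequence is simply empty when nothing matches, this variant needs neither a null check nor a string pre-check.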
Output HTML after removing only the "src" attribute from the "img" tags:
<a href="https://twitter.com/RedGiantNews" target="_blank"><img width="24" border="0" alt="Twitter" title="Twitter" class="smImage"></a>
<a href="https://www.facebook.com/RedGiantSoftware" target="_blank"><img width="24" border="0" alt="Facebook" title="Facebook" class="smImage"></a>
<a href="https://www.instagram.com/redgiantnews/" target="_blank"><img width="24" border="0" alt="Instagram" title="Instagram" class="smImage"></a>
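The output above corresponds to swapping the Remove() call for the commented-out Attributes.Remove("src") line. A minimal sketch of that variant follows. The names ImageSourceStripper and RemoveImageSources and the shortened input are made up for illustration, and the null check is included because SelectNodes returns null when no node matches.

```csharp
using System;
using HtmlAgilityPack;

public static class ImageSourceStripper // hypothetical name, not from the original answer
{
    public static string RemoveImageSources(string html)
    {
        var document = new HtmlDocument();
        document.LoadHtml(html);

        var nodes = document.DocumentNode.SelectNodes("//img");
        if (nodes != null)
        {
            foreach (var node in nodes)
            {
                // Keeps the <img> element itself but drops its src attribute.
                node.Attributes.Remove("src");
            }
        }

        return document.DocumentNode.OuterHtml;
    }

    public static void Main()
    {
        // Shortened sample input, assumed for illustration.
        string input = "<a href=\"https://twitter.com/RedGiantNews\">"
                     + "<img src=\"http://image.e.redgiant.com/lib/998.png\" alt=\"Twitter\"></a>";

        Console.WriteLine(RemoveImageSources(input));
    }
}
```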