c# - How to search for XElement attribute values across multiple XML files in C#?
Problem description
My XML content looks like this:
<p class="toc-title"><a id="page_5"></a>Inhoud</p>
<p class="toc-fm"><a href="___.html#foreword">Woord vooraf</a></p>
<p class="toc-fm"><a href="___.html#Inleiding">Inleiding: wat is verslaving?</a></p>
<p class="toc-ch"><a href="___.html#Chapter01"><span class="toc-num">1. </span>Verslaving en leegte</a></p>
<p class="toc-h1"><a href="___.html#h1_1"><i>Eten</i></a></p>
<p class="toc-ch"><a href="___.html#Chapter02"><span class="toc-num">2. </span>Zelfafwijzing en zelfveroordeling</a></p>
<p class="toc-h1"><a href="___.html#h1_2"><i>Social media</i></a></p>
<p class="toc-ch"><a href="___.html#Chapter03"><span class="toc-num">3. </span>Beperking van je vrijheid</a></p>
<p class="toc-h1"><a href="___.html#h1_3"><i>Macht, aanzien en bezit</i></a></p>
<p class="toc-h1"><a href="___.html#h1_4"><i>Pornoverslaving</i></a></p>
<p class="toc-ch"><a href="___.html#Chapter04"><span class="toc-num">4. </span>Verslaving en het brein</a></p>
<p class="toc-h1"><a href="___.html#h1_5"><i>Roken</i></a></p>
<p class="toc-ch"><a href="___.html#Chapter05"><span class="toc-num">5. </span>Risicofactoren en beschermende factoren voor verslaving: uitdagingen voor de sociale en kerkelijke omgeving</a></p>
<p class="toc-h1"><a href="___.html#h1_6"><i>Gamen</i></a></p>
<p class="toc-ch"><a href="___.html#Chapter06"><span class="toc-num">6. </span>Verslaving als psychiatrische stoornis: psychiatrische klachten in combinatie met verslavingsproblematiek</a></p>
<p class="toc-h1"><a href="___.html#h1_7"><i>Medicijnverslaving</i></a></p>
<p class="toc-ch"><a href="___.html#Chapter07"><a id="page_6"></a><span class="toc-num">7. </span>Verslaving in het gezin</a></p>
<p class="toc-h1"><a href="___.html#h1_8"><i>Afhankelijkheid</i></a></p>
<p class="toc-ch"><a href="___.html#Chapter08"><span class="toc-num">8. </span>Verslaving en geloof: wetenschappelijk onderzoek</a></p>
<p class="toc-h1"><a href="___.html#h1_9"><i>Alcohol</i></a></p>
<p class="toc-ch"><a href="___.html#Chapter09"><span class="toc-num">9. </span>Is verslaving een ziekte?</a></p>
<p class="toc-h1"><a href="___.html#h1_10"><i>Drugs</i></a></p>
<p class="toc-ch"><a href="___.html#Chapter10"><span class="toc-num">10. </span>Herstel in vier relaties: behandeling en begeleiding bij verslavingsproblematiek</a></p>
<p class="toc-h1"><a href="___.html#h1_11"><i>Woede en gekrenktheid</i></a></p>
<p class="toc-ch"><a href="___.html#Chapter11"><span class="toc-num">11. </span>De man op de bank: aandachtspunten en valkuilen voor het pastoraat</a></p>
<p class="toc-h1"><a href="___.html#h1_12"><i>Gokken</i></a></p>
<p class="toc-bm"><a href="___.html#Literatuur">Geraadpleegde literatuur</a></p>
<p class="toc-bm"><a href="___.html#Personalia">Personalia</a></p>
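I have n XML files, anywhere from 15 to 350 of them. I want to take the value of each <a href> attribute and search all of the XML files (including this one) for an element whose Attribute("id") has the same value, if there is one. If it is found, I replace the ___ placeholder with the name of the XML file in which the match was found.
I have done the following, but it is very slow. Can you help me do this in a faster way?
Here is my code: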
string pathFolder = TextBoxPath.Text;
DirectoryInfo directoryInfo = new DirectoryInfo(pathFolder);
string pathParent = directoryInfo.Parent.FullName;
string textFolder = Path.Combine(pathParent, "Text");
Regex filePattern = new Regex("\\d{13}");
if (ePUBv2CheckBox.IsChecked == true)
{
    fileFolder = pathFolder;
}
else
{
    fileFolder = textFolder;
}
string getISBNFile = Directory.GetFiles(fileFolder, "*.css", SearchOption.AllDirectories)
    .Where(fileName => filePattern.IsMatch(Path.GetFileNameWithoutExtension(fileName))).FirstOrDefault();
string pathFileNameFolder = Path.GetFileNameWithoutExtension(getISBNFile);
List<string> getAllChapters = new List<string>();
List<string> getAllChaptersButThis = new List<string>();
if (ePUBv2CheckBox.IsChecked == true)
{
    getAllChapters = Directory.GetFiles(fileFolder, "*.html")
        .Where(name => !(Path.GetFileName(name).Contains(pathFileNameFolder) || Path.GetFileName(name).ToLower().Contains("cover")))
        .ToList();
}
else
{
    getAllChapters = Directory.GetFiles(fileFolder, "*.xhtml")
        .Where(name => !(Path.GetFileName(name).ToLower().Contains("cover")))
        .ToList();
}
foreach (var eachChapter in getAllChapters)
{
    string nameForSaving = Path.GetFileName(eachChapter);
    XDocument newChapter = XDocument.Load(eachChapter);
    XNamespace newNamespace = newChapter.Root.GetDefaultNamespace();
    List<XElement> hrefAttributes = newChapter.Descendants(newNamespace + "a")
        .Where(at => at.Attribute("href") != null
            && (at.Attribute("href").Value.Contains(".xhtml")
                || at.Attribute("href").Value.Contains(".html")))
        .ToList();
    if (hrefAttributes.Count() > 0)
    {
        foreach (XElement hrefUnique in hrefAttributes)
        {
            string hrefValue = hrefUnique.Attribute("href").Value;
            string hrefLink = hrefValue.Substring(hrefValue.IndexOf("#") + 1);
            foreach (var anotherChapter in getAllChapters)
            {
                string fileName = Path.GetFileNameWithoutExtension(anotherChapter);
                fileName = fileName.Substring(fileName.IndexOf("_") + 1).Replace("_", String.Empty);
                XDocument temporaryDocument = XDocument.Load(anotherChapter);
                foreach (XElement breakChapter in temporaryDocument.Descendants())
                {
                    List<XElement> getListID = breakChapter.Descendants().Where(at => at.Attribute("id") != null
                        && at.Attribute("id").Value.Equals(hrefLink, StringComparison.InvariantCultureIgnoreCase))
                        .ToList();
                    if (getListID.Count() > 0 || fileName.Equals(hrefLink, StringComparison.InvariantCultureIgnoreCase))
                    {
                        string getChapterFile = getAllChapters.FirstOrDefault(ch => Path.GetFileName(ch)
                            .Contains(fileName));
                        hrefUnique.SetAttributeValue("href", Path.GetFileName(getChapterFile) + "#" + hrefLink);
                        break;
                    }
                }
                newChapter.Save(fileFolder + "\\" + nameForSaving);
            }
        }
    }
}
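Solution
If the goal is just to update all <a> elements that have an href attribute starting with "___", you only need to search for those elements and update the attribute value. I'm not sure what you are doing with the rest of the code, but what you need is something like this: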
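// Requires: using System.IO; using System.Xml.Linq; using System.Xml.XPath;
void UpdateFile(string path)
{
    var doc = XDocument.Load(path);
    var filename = Path.GetFileNameWithoutExtension(path);
    // Note: "//a" only matches <a> elements that are in no namespace.
    foreach (var a in doc.XPathSelectElements("//a[starts-with(@href, '___.')]"))
    {
        var href = a.Attribute("href");
        // Replace the "___" placeholder with this file's name (without extension).
        href.Value = href.Value.Replace("___", filename);
    }
    doc.Save(path);
}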
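If the fragment after "#" really does have to be looked up across all files, as the question describes, much of the cost in the original code likely comes from reloading every chapter for every link. A minimal sketch of one alternative follows: load each file once, build a dictionary from id value to file name, then resolve each placeholder href with a single lookup. The names HrefResolver, Resolve and ResolveFolder are hypothetical and not part of the original post. The sketch assumes the first file that declares an id wins, uses an ordinal case-insensitive comparison (the question uses InvariantCultureIgnoreCase), and leaves out the question's fallback of matching the fragment against the file name.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper, not from the original post.
public static class HrefResolver
{
    private const string Placeholder = "___";

    // In-memory step: rewrites placeholder hrefs and returns the paths of
    // the documents that were modified.
    public static List<string> Resolve(IReadOnlyList<(string FilePath, XDocument Doc)> chapters)
    {
        // Pass 1: index every id value once (assumption: first declaration wins).
        var idToFile = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);
        foreach (var (filePath, doc) in chapters)
        {
            foreach (var element in doc.Descendants())
            {
                var id = (string)element.Attribute("id");
                if (id != null && !idToFile.ContainsKey(id))
                    idToFile[id] = Path.GetFileName(filePath);
            }
        }

        // Pass 2: resolve each "___...#fragment" href with a dictionary lookup.
        var changed = new List<string>();
        foreach (var (filePath, doc) in chapters)
        {
            bool modified = false;
            // Compare local names so the match also works inside the XHTML namespace.
            foreach (var a in doc.Descendants().Where(e => e.Name.LocalName == "a"))
            {
                var href = a.Attribute("href");
                if (href == null || !href.Value.StartsWith(Placeholder, StringComparison.Ordinal))
                    continue;
                int hash = href.Value.IndexOf('#');
                if (hash < 0)
                    continue;
                string fragment = href.Value.Substring(hash + 1);
                if (idToFile.TryGetValue(fragment, out string target))
                {
                    href.Value = target + "#" + fragment;
                    modified = true;
                }
            }
            if (modified)
                changed.Add(filePath);
        }
        return changed;
    }

    // File-system wrapper: loads each file once and saves only the modified ones.
    public static void ResolveFolder(string folder, string searchPattern)
    {
        var chapters = Directory.GetFiles(folder, searchPattern)
            .Select(p => (FilePath: p, Doc: XDocument.Load(p)))
            .ToList();
        var byPath = chapters.ToDictionary(c => c.FilePath, c => c.Doc);
        foreach (string path in Resolve(chapters))
            byPath[path].Save(path);
    }
}
```

A call such as HrefResolver.ResolveFolder(fileFolder, "*.xhtml") would then replace the nested loops. Links whose fragment matches no id are left untouched, so they can still be handled separately, for example with the file-name fallback from the question.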