vb.net - Removing duplicates from a dictionary
Question
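Hi, I have a dictionary that is populated with entities matched by a regular expression. It extracts all the data correctly, except that it also brings in duplicates. How can I prevent duplicate data from getting in?
Here is my code: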
Dim largeFilePath As String = newMasterFilePath
Dim lines1 = File.ReadLines(largeFilePath).ToList 'don't use ReadAllLines
Dim reg = New Regex("\<\!NOTATION.*$|\<\!ENTITY.*$", RegexOptions.IgnoreCase)
Dim entities = From line In lines1
               Where reg.IsMatch(line)
Dim dictionary As New Dictionary(Of Integer, String)
Dim idx = -1
For Each s In entities
    idx = lines1.IndexOf(s, idx + 1)
    dictionary.Add(idx, s)
Next
Dim deletedItems = 0
For Each itm In dictionary
    lines1.RemoveAt(itm.Key - deletedItems)
    deletedItems += 1
Next
For Each s In dictionary.Values
    lines1.Insert(1, s)
Next
I expect only one entry for each item.
Here is the sample file:
<!DOCTYPE DOC PUBLIC "-//USA-DOD//DTD 38784STD-BV7//EN"[
<!ENTITY cdcs_5-35.wmf SYSTEM "graphics\CDCS_5-35.wmf" NDATA wmf>
<!ENTITY cdcs_2-2a.wmf SYSTEM "graphics\CDCS_2-2A.wmf" NDATA wmf>
<!NOTATION bmp SYSTEM "bmp">
<!NOTATION svg SYSTEM "svg">
<!NOTATION png SYSTEM "png">
<doc service="xs" docid="BKw46" docstat="formal" verstatpg="ver" cycle="1" chglevel="1">
<front numcols="1">
<idinfo>
<?Pub Lcl _divid="100" _parentid="0">
<tmidno>Life with Pets</tmidno>
<chgnum>Change 1</chgnum>
<chgdate>2 August 2018</chgdate>
<chghistory>
<!NOTATION bmp SYSTEM "bmp">
<!NOTATION svg SYSTEM "svg">
<!NOTATION png SYSTEM "png">
<chginfo>
<chgtxt>Change 1</chgtxt>
<date>2 August 2018</date>
</front>
<!ENTITY cdcs_2-19.wmf SYSTEM "graphics\CDCS_2-19.wmf" NDATA wmf>
<!ENTITY cdcs_3-5.wmf SYSTEM "graphics\CDCS_3-5.wmf" NDATA wmf>
<!ENTITY cdcs_4-48.wmf SYSTEM "graphics\CDCS_4-48.wmf" NDATA wmf>
<body numcols="1">
<chapter>
<title>This is chapter 1</title>
<!ENTITY cdcs_2-5.wmf SYSTEM "graphics\CDCS_2-5.wmf" NDATA wmf>
<!ENTITY cdcs_2-24.wmf SYSTEM "graphics\CDCS_2-24.wmf" NDATA wmf>
<para0>
<title>Climb the ladder immedietly</title>
<para>Retrieve the cat.</para></para0></chapter>
<chapter>
<title>Don't forget to feed the dog</title>
<!ENTITY cdcs_2-5.wmf SYSTEM "graphics\CDCS_2-5.wmf" NDATA wmf>
<!ENTITY cdcs_2-24.wmf SYSTEM "graphics\CDCS_2-24.wmf" NDATA wmf>
<para0>
<!ENTITY cdcs_4-48.wmf SYSTEM "graphics\CDCS_4-48.wmf" NDATA wmf>
<title>Prep for puppies</title>
<para>Puppies are cute</para></para0>
</chapter>
</body>
</doc>
Thank you for helping me with this. Max
Answer
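Alternatively, check for a duplicate before adding. The dictionary keys are line indexes, which are always unique, so the check has to be made against the stored values:
For Each s In entities
    ' Advance idx for every match so the next search starts after this line.
    idx = lines1.IndexOf(s, idx + 1)
    ' Add the line only if the same text has not been stored yet.
    If Not dictionary.ContainsValue(s) Then
        dictionary.Add(idx, s)
    End If
Next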
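Note that skipping a duplicate only keeps it out of the dictionary; the repeated line is still present in lines1, because the removal loop in the question deletes only the indexes stored as keys. A minimal sketch of one way to handle both steps in a single pass follows. It is not the asker's code: the module name DeclarationHoister and the function HoistDeclarations are hypothetical, and it assumes that two declarations count as duplicates only when their text is identical (including whitespace) and that the declarations should land directly after the first line in their original order, whereas the question's Insert(1, s) loop reverses them.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Linq
Imports System.Text.RegularExpressions

' Hypothetical helper module; the names are illustrative, not from the question.
Module DeclarationHoister
    ' Matches any line that contains a NOTATION or ENTITY declaration.
    Private ReadOnly DeclPattern As New Regex("<!(NOTATION|ENTITY)", RegexOptions.IgnoreCase)

    ' Returns a new list in which each distinct declaration line appears once,
    ' directly after the first line; all other lines keep their order.
    Function HoistDeclarations(lines As IEnumerable(Of String)) As List(Of String)
        Dim seen As New HashSet(Of String)(StringComparer.Ordinal)
        Dim declarations As New List(Of String)
        Dim others As New List(Of String)

        For Each line In lines
            If DeclPattern.IsMatch(line) Then
                ' HashSet.Add returns False when the text was already stored,
                ' so repeated declarations are dropped here.
                If seen.Add(line) Then declarations.Add(line)
            Else
                others.Add(line)
            End If
        Next

        ' Insert after the first line (the DOCTYPE line in the sample file).
        others.InsertRange(Math.Min(1, others.Count), declarations)
        Return others
    End Function

    Sub Main()
        Dim sample = {"<!DOCTYPE DOC [",
                      "<!NOTATION bmp SYSTEM ""bmp"">",
                      "<doc>",
                      "<!NOTATION bmp SYSTEM ""bmp"">",
                      "</doc>"}
        For Each line In HoistDeclarations(sample)
            Console.WriteLine(line)
        Next
    End Sub
End Module
```

A HashSet lookup is roughly constant time, while Dictionary.ContainsValue scans every stored value, so this form may scale better on large files. With the question's variables, the call would look like lines1 = HoistDeclarations(lines1).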