Cannot loop over items from the second path onward

Problem description

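I have run into a problem where the component variable does not change from the second .mht path onward. The sample files are file1.mht, file2.mht and file3.mht, and their contents are, in file order, aaaaaa, bbbbbb and cccccc. Expected output: file1.mht aaaaaa file2.mht bbbbbb file3.mht cccccc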

Current result: file1.mht aaaaaa file2.mht aaaaaa file3.mht aaaaaa file1.mht aaaaaa

using HtmlAgilityPack;
using System;
using System.Collections.Generic;
using System.Data;
using System.IO;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;
using System.Threading.Tasks;
using System.Configuration;
using System.Collections.Specialized;

namespace ConsoleApp3
{
    class Program
    {
        static void Main(string[] args)
        {
            DirectoryInfo mht_file = new DirectoryInfo(@"C:\Users\manchunl\Desktop\ADVI-test\");
            string mht_text = "";

            foreach (FileInfo f in mht_file.GetFiles("*.mht"))
            {
                try
                {
                    using (StreamReader sr = new StreamReader(f.FullName))
                    {
                        string line;

                        while ((line = sr.ReadLine()) != null)
                        {
                            if (line.EndsWith("="))
                            {
                                line = line.Substring(0, line.Length - 1);
                            }
                            mht_text += line;
                        }
                    }

                    int start_index = mht_text.IndexOf("<HTML ");
                    int end_index = mht_text.IndexOf("</HTML>");

                    mht_text = mht_text.Substring(start_index, end_index + 7 - start_index);

                    mht_text = mht_text.Replace("=0D", "");
                    mht_text = mht_text.Replace("=00", "");
                    mht_text = mht_text.Replace("=0A", "");
                    mht_text = mht_text.Replace("=3D", "=");

                    HtmlDocument doc = new HtmlDocument();
                    doc.LoadHtml(mht_text);

                    var table = doc.DocumentNode.SelectSingleNode("//table[3]");
                    string component = table.SelectSingleNode(".//tr[4]").SelectSingleNode(".//td[2]").InnerHtml;

                    Console.WriteLine(f.FullName + "  " + component);

                    File.AppendAllText(@"C:\Users\manchunl\Desktop\ADVI-test\result\dataCollection.txt", f.FullName + component + Environment.NewLine);

                }
                catch (Exception e)
                {

                }

            }
            Console.ReadKey();
        }


    }

}

Tags: c#

Solution


General advice: minimize variable scope. mht_text is used only inside the loop, so it should not be shared between iterations.

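Your bug is that the declaration string mht_text = ""; sits outside the loop. As a result, the variable is not empty when the second iteration starts, and the second file's text is appended to what is left over from the first.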
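A minimal sketch of the fix, assuming the per-file work is moved into a helper. The helper name ExtractHtml, the class name MhtFix and the in-memory sample lines are hypothetical stand-ins for the real .mht files; HtmlAgilityPack parsing is left out to keep the example self-contained. Because the buffer is created inside the helper, every file starts from an empty buffer:

```csharp
using System;
using System.Collections.Generic;
using System.Text;

static class MhtFix
{
    // Hypothetical helper: joins the lines of ONE file and returns its <HTML>...</HTML> block.
    public static string ExtractHtml(IEnumerable<string> lines)
    {
        // Local buffer, created fresh on every call, so nothing leaks between files.
        var text = new StringBuilder();
        foreach (string line in lines)
        {
            // Drop the trailing "=" (soft line break), as the original loop does.
            bool softBreak = line.EndsWith("=", StringComparison.Ordinal);
            text.Append(softBreak ? line.Substring(0, line.Length - 1) : line);
        }

        string joined = text.ToString();
        int start = joined.IndexOf("<HTML", StringComparison.Ordinal);
        int end = joined.IndexOf("</HTML>", StringComparison.Ordinal);
        if (start < 0 || end < start)
        {
            throw new FormatException("No <HTML>...</HTML> block found.");
        }
        return joined.Substring(start, end + "</HTML>".Length - start);
    }

    public static void Main()
    {
        // In-memory stand-ins for file1.mht and file2.mht (assumed content).
        var files = new (string Name, string[] Lines)[]
        {
            ("file1.mht", new[] { "header", "<HTML>aaa=", "aaa</HTML>" }),
            ("file2.mht", new[] { "header", "<HTML>bbb=", "bbb</HTML>" }),
        };

        foreach (var file in files)
        {
            // Each call works on its own buffer, so file2 yields bbbbbb, not aaaaaa.
            Console.WriteLine(file.Name + "  " + ExtractHtml(file.Lines));
        }
    }
}
```

In the original program the equivalent change is simply to move string mht_text = ""; to the first line inside the foreach body. Using StringBuilder instead of += is an optional extra that avoids repeated string copies on large files.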

First iteration: mht_text = "<HTML>aaaaaa</HTML>".

Second iteration: mht_text = "<HTML>aaaaaa</HTML><HTML>bbbbbb</HTML>".

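start_index and end_index then both locate the first HTML block, because IndexOf returns the first match. Substring therefore extracts the first file's content again.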
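The effect can be reproduced without any files. The following sketch builds the buffer as it looks in the second iteration and applies the same IndexOf and Substring logic as the question's code. The class name SharedBufferDemo and the method name FirstHtmlBlock are hypothetical, and the buffer content is the simplified one used above, not real .mht data:

```csharp
using System;

static class SharedBufferDemo
{
    // Same index arithmetic as the question's code, applied to an arbitrary buffer.
    public static string FirstHtmlBlock(string mht_text)
    {
        // IndexOf always returns the FIRST occurrence of the search string.
        int start_index = mht_text.IndexOf("<HTML", StringComparison.Ordinal);
        int end_index = mht_text.IndexOf("</HTML>", StringComparison.Ordinal);
        return mht_text.Substring(start_index, end_index + 7 - start_index);
    }

    public static void Main()
    {
        // State of the shared buffer once the second file has been appended.
        string mht_text = "<HTML>aaaaaa</HTML>" + "<HTML>bbbbbb</HTML>";

        Console.WriteLine(FirstHtmlBlock(mht_text));
        // prints "<HTML>aaaaaa</HTML>"
    }
}
```

The bbbbbb block is present in the buffer but is never reached, which matches the repeated aaaaaa in the reported output.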

