Parsing slightly complex web-page text into variables

Question

I'm pulling text from a website and parsing it into variables. However, the string I get back is a bit complicated. On the site it looks like this:

Invoice #: 1267
Date: 4/16/2018 10:44:00 AM
PO #:
Reference:
Countermen: A/A

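The problem is that all of this comes back as a single string. The string also changes from order to order, because some orders have text entered in fields that others leave blank: some orders have every field filled in, while others have almost none.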

Invoice #:
1267

<br>

Date:
4/16/2018 10:44:00 AM

<br>

PO #:

<br>

Reference:

<br>

Countermen:
A/A

This is what appears when I inspect the web element.

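I want to parse the information into separate strings and integers for testing, and I'm struggling with the "dynamic" part of the string, since some strings will be longer and some shorter.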

If it helps, here are some images of the actual site:

[Image: the page's HTML source]

[Image: what the site displays]

Tags: c#, parsing, web

Answer


Assumptions:

  1. Each key and its value are separated by a colon (:)
  2. Each data point is separated by <br>
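Assumption 2 relies on the tag being written exactly as <br>. The question does not confirm how the raw page source spells it, and HTML also allows <br/>, <br /> or <BR>. If that turns out to matter, a minimal sketch like the following splits on any of those variants with a regular expression instead of a literal string. The names BrSplitter and SplitFields are hypothetical, chosen for illustration only.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class BrSplitter
{
    // Matches <br>, <br/>, <br /> and upper-case variants such as <BR>
    private static readonly Regex BrTag = new Regex(@"<br\s*/?>", RegexOptions.IgnoreCase);

    // Splits the raw text on any <br> variant, trims each fragment,
    // and drops fragments that are empty or whitespace-only
    public static string[] SplitFields(string raw)
    {
        return BrTag.Split(raw)
                    .Select(part => part.Trim())
                    .Where(part => part.Length > 0)
                    .ToArray();
    }

    public static void Main()
    {
        // Hypothetical input mixing three spellings of the tag
        var sample = "Invoice #: 1267<br/>Date: 4/16/2018 10:44:00 AM<BR>PO #:<br />";
        foreach (var field in SplitFields(sample))
        {
            Console.WriteLine(field);
        }
    }
}
```

Each returned fragment can then be split on its first colon exactly as in the code below.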

Given your sample data:

using System;
using System.Collections.Specialized;


public class Program
{
    public static void Main()
    {
        var str = @"Invoice #:
                    1267

                    <br>

                    Date:
                    4/16/2018 10:44:00 AM

                    <br>

                    PO #:

                    <br>

                    Reference:

                    <br>

                    Countermen:
                    A/A";

        //Array containing "raw string data"
        var raw = str.Split(new[]{"<br>"}, StringSplitOptions.RemoveEmptyEntries);

        //Just using a simple NVC, opt for something else based on your needs       
        var kvp = new NameValueCollection();

        //Go through the raw array we created earlier and
        // add the key/value pairs to our NameValueCollection, kvp
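        Array.ForEach(raw, s =>
        {
            //Because of date/time, we'll restrict colon to first occurrence
            var data = s.Split(new [] {":"}, 2, StringSplitOptions.None);

            //Skip fragments with no colon (e.g. whitespace after a trailing <br>),
            //otherwise data[1] would throw IndexOutOfRangeException
            if (data.Length < 2) return;

            kvp.Add(data[0].Trim(), data[1].Trim());
        });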


        /*
         * At this point, we have our "parsed" data in
         * key/value pairs, kvp and can use it as needed
         *
         */

        // We can loop through the kvp and simply display
        foreach(string k in kvp.Keys){
            Console.WriteLine("{0} = {1}", k, kvp[k]);
        }


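        // We can assign values to variables we create
        var invNum = kvp["Invoice #"];

        // Convert to int where a number is needed; TryParse returns false
        // (instead of throwing) when the field is missing, empty or non-numeric
        int invoiceNumber;
        bool hasInvoiceNumber = int.TryParse(invNum, out invoiceNumber);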
    }
}

Output:

Invoice # = 1267
Date = 4/16/2018 10:44:00 AM
PO # = 
Reference = 
Countermen = A/A
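Because the set of filled-in fields changes from order to order, it may help to wrap the parsing in a reusable method and convert values to typed variables only when they are present. The sketch below does this with a Dictionary<string, string> in place of the NameValueCollection (a swap made here for TryGetValue, not something the answer above requires). It assumes the date always uses the US-style "M/d/yyyy h:mm:ss tt" format seen in the sample. The names InvoiceFields, Parse, InvoiceNumber and InvoiceDate are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

public static class InvoiceFields
{
    // Parses "key: value" fragments separated by <br> into a dictionary.
    // Fields that are present but empty (e.g. "PO #:") map to "".
    public static Dictionary<string, string> Parse(string raw)
    {
        var fields = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);
        foreach (var part in raw.Split(new[] { "<br>" }, StringSplitOptions.RemoveEmptyEntries))
        {
            // Split on the first colon only, so the time "10:44:00" stays intact
            var pair = part.Split(new[] { ':' }, 2);
            if (pair.Length < 2) continue; // no colon: not a key/value fragment
            fields[pair[0].Trim()] = pair[1].Trim();
        }
        return fields;
    }

    // Returns the invoice number, or null when the field is missing, empty or not numeric
    public static int? InvoiceNumber(Dictionary<string, string> fields)
    {
        string value;
        int number;
        if (fields.TryGetValue("Invoice #", out value) && int.TryParse(value, out number))
            return number;
        return null;
    }

    // Returns the invoice date, assuming the "M/d/yyyy h:mm:ss tt" format from the sample
    public static DateTime? InvoiceDate(Dictionary<string, string> fields)
    {
        string value;
        DateTime date;
        if (fields.TryGetValue("Date", out value) &&
            DateTime.TryParseExact(value, "M/d/yyyy h:mm:ss tt", CultureInfo.InvariantCulture,
                                   DateTimeStyles.None, out date))
            return date;
        return null;
    }

    public static void Main()
    {
        var raw = "Invoice #: 1267 <br> Date: 4/16/2018 10:44:00 AM <br> PO #: <br> Reference: <br> Countermen: A/A";
        var fields = Parse(raw);

        Console.WriteLine(InvoiceNumber(fields));      // 1267
        Console.WriteLine(InvoiceDate(fields)?.Year);  // 2018
        Console.WriteLine(fields["PO #"].Length == 0); // True
    }
}
```

Returning nullable types (int?, DateTime?) lets the calling test code distinguish "field not filled in" from a real value without catching exceptions.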

Documentation: NameValueCollection class


