I need to split text into paragraphs to get a list of strings. The delimiters repeat in a given pattern

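Problem description

In this text, some lines contain the word Feature, followed by several more lines, until a " (quotation mark) is found.

I am interested in getting the part between these two delimiters, but only when Feature is the only word on its line.

For example: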

bla bla bla bla bla bla Feature
bla bla bla bla bla bla bla bla

Feature

ble bla bla bla bla

"bla bla bla bla bla blabla bla 
bla bla bla bla bla" Feature bla bla bla bla 

Feature 

bla bla bla bla bla

"bla bla bla bla bla blabla bla 
bla bla bla bla bla bla bla bla bla 

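The result would be: ble bla bla bla bla, bla bla bla bla bla

This pattern repeats over and over, and I need to extract the part between the word Feature and the following " and store the paragraphs in a list. Online I could only find ways to extract a single string, not a collection of them. I only want to extract when the word Feature is the only word on its line. The Split method doesn't work either, because the word Feature has to be alone on a line, and the quote has to be the next delimiter after it.

Another example: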

    bla bla bla bla

    Feature

    ble ble ble


    " blu blu blu feature "

    bli bli bli

    Feature
    blip blop ble

    blip blop blup

    " blo blo blo

The output for this would be: ble ble ble, blip blop ble blip blop blup

Thanks for your help

Tags: c#, regex, string

Solution

Does this do what you want? It captures the paragraphs:

Paragraph [0] - bla bla bla bla bla
Paragraph [1] - bla bla bla bla bla

If you need to capture different bits, you can adjust the regular expression.

using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

namespace ConsoleApplication1
{
    class Program
    {
        static int Main(string[] args)
        {
            string input = @"bla bla bla bla bla bla Feature
bla bla bla bla bla bla bla bla

Feature

bla bla bla bla bla

""bla bla bla bla bla blabla bla 
bla bla bla bla bla"" Feature bla bla bla bla 

Feature 

bla bla bla bla bla

""bla bla bla bla bla blabla bla 
bla bla bla bla bla bla bla bla bla ";

            // Matches:
            //   a line containing only Feature (optional whitespace before it)   ^\s*Feature
            //   trailing whitespace, then a line break (\r\n or \n)              \s*\r?\n
            //   then captures everything that isn't a quote                      ([^""]*)
            //   then the closing quote                                           ""
            // Multiline makes ^ match at the start of every line, not just the start of the input.
            Regex r = new Regex(@"^\s*Feature\s*\r?\n([^""]*)""", RegexOptions.Multiline);

            List<string> paragraphs = new List<string>();

            // Group 1 holds the text between the Feature line and the next quote.
            foreach (Match match in r.Matches(input))
                paragraphs.Add(match.Groups[1].Value.Trim());

            for (int i = 0; i < paragraphs.Count; i++)
                Console.WriteLine("Paragraph [{0}] - {1}", i, paragraphs[i]);

            Console.Read();
            return 0;
        }
    }
}
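
The second example in the question expects a paragraph that spans several lines to come back as one line ("blip blop ble blip blop blup"), which trimming alone does not give you. Here is a rough sketch of one way to adjust for that, assuming you want every run of whitespace inside a paragraph collapsed to a single space. The class name FeatureParagraphs and the method name Extract are made up for illustration, not part of any library.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper: the names are illustrative only.
public static class FeatureParagraphs
{
    // Feature alone on its line (only spaces or tabs around it),
    // a line break (\r\n or \n), then everything up to the next quote.
    private static readonly Regex Pattern = new Regex(
        @"^[ \t]*Feature[ \t]*\r?\n([^""]*)""",
        RegexOptions.Multiline);

    public static List<string> Extract(string text)
    {
        var paragraphs = new List<string>();

        foreach (Match match in Pattern.Matches(text))
        {
            // Assumption: line breaks inside a paragraph are not meaningful,
            // so collapse every whitespace run into a single space.
            string body = Regex.Replace(match.Groups[1].Value, @"\s+", " ").Trim();
            paragraphs.Add(body);
        }

        return paragraphs;
    }
}
```

Calling FeatureParagraphs.Extract on the second example text should return two entries, "ble ble ble" and "blip blop ble blip blop blup". One caveat: if a second line containing only Feature shows up before the next quote, it ends up inside the captured paragraph, so you would need extra handling if that can happen in your data.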
