RegEx for finding matches between quotes

Problem description

Tags: java, regex, regex-lookarounds, regex-negation, regex-greedy

Solution


You can handle this CSV-style mix with Java code, but the regex has to change.

Java

import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class Ideone
{

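    // Splits one CSV-style line into fields; a quoted field ("..." or “...”) may contain commas.
    public static List<String> getList(String value)
    {
        // Group 1: quoted field, quotes included. Group 2: unquoted field.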
        String regex = "(?:(?:^|,|\\r?\\n)\\s*)(?:(?:(\"[^\"\\\\]*(?:\\\\[\\S\\s][^\"\\\\]*)*\"|“[^“”\\\\]*(?:\\\\[\\S\\s][^“”\\\\]*)*”))(?:\\s*(?:(?=,|\\r?\\n)|$))|([^,]*)(?:\\s*(?:(?=,)|$)))"; 
        List<String> allMatches = new ArrayList<>();
        if ( !value.isEmpty() )
        {
            Matcher m = Pattern.compile( regex ).matcher( value );
            while ( m.find() ) {
                String str = m.group(2);       // unquoted field, or null if the field was quoted
                if ( str == null ) {
                    str = m.group(1);          // quoted field: strip the surrounding quote characters
                    str = str.replaceAll( "^[\"“”]|[\"“”]$", "" );
                }
                allMatches.add(str.trim());
            }
        }
        return allMatches;
    }


    public static void main(String[] args)
    {
        List<String>  result = getList("400,test,\"QT_don't split, this_QT\",15");
        System.out.println( result );

        result = getList("500,test,“LQT_don't split, this_RQT”,15");
        System.out.println( result );

        result = getList("600,test,\"QT_don't split, this_QT\",15");
        System.out.println( result );

    }
}

https://ideone.com/b8Wnz9

Output

[400, test, QT_don't split, this_QT, 15]
[500, test, LQT_don't split, this_RQT, 15]
[600, test, QT_don't split, this_QT, 15]
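
The quoted-field part of the pattern also accepts backslash escapes (`\\[\S\s]`), which none of the three sample inputs exercises. The following is a minimal sketch, not part of the original answer, of how an escaped quote inside a field could be handled. The class name `EscapedFields`, the method `parse`, and the unescaping step are assumptions added for illustration. The pattern is the one from the code above, split over several lines, with the curly quotes written as Unicode escapes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EscapedFields {

    // Same pattern as in the answer, split for readability.
    // Group 1: quoted field, quotes included. Group 2: unquoted field.
    // The curly quotes U+201C / U+201D are written as Unicode escapes.
    private static final Pattern FIELD = Pattern.compile(
          "(?:(?:^|,|\\r?\\n)\\s*)"
        + "(?:"
        +   "(\"[^\"\\\\]*(?:\\\\[\\S\\s][^\"\\\\]*)*\""
        +   "|\u201C[^\u201C\u201D\\\\]*(?:\\\\[\\S\\s][^\u201C\u201D\\\\]*)*\u201D)"
        +   "(?:\\s*(?:(?=,|\\r?\\n)|$))"
        + "|"
        +   "([^,]*)(?:\\s*(?:(?=,)|$))"
        + ")");

    static List<String> parse(String line) {
        List<String> fields = new ArrayList<>();
        if (line.isEmpty()) {
            return fields;                 // as in the original: empty input, no fields
        }
        Matcher m = FIELD.matcher(line);
        while (m.find()) {
            String field = m.group(2);
            if (field == null) {
                String quoted = m.group(1);
                // Drop the opening and closing quote characters.
                field = quoted.substring(1, quoted.length() - 1);
                // Assumed extra step: turn each backslash escape into the escaped character.
                field = field.replaceAll("\\\\([\\S\\s])", "$1");
            }
            fields.add(field.trim());
        }
        return fields;
    }

    public static void main(String[] args) {
        // Input text: 1,"say \"hi\", ok",2
        System.out.println(parse("1,\"say \\\"hi\\\", ok\",2"));
        // prints [1, say "hi", ok, 2]
    }
}
```

With the original `getList`, the same input keeps the backslashes in the second field (`say \"hi\", ok`); whether to drop them depends on how the data was produced.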

Regex expanded

 (?:
      (?: ^ | , | \r? \n )          # Delimiter: start of input, comma or newline
      \s*                           # optional leading whitespace
 )
 (?:                           # Quoted field
      (?:
           (                             # (1 start), quoted field, quotes included
                "                             # straight double quotes ""
                [^"\\]* 
                (?: \\ [\S\s] [^"\\]* )*      # backslash-escaped character, then more plain text
                "

             |                              # or

                “                             # left/right curly double quotes “”
                [^“”\\]* 
                (?: \\ [\S\s] [^“”\\]* )*
                ”
           )                             # (1 end)
      )
      (?:
           \s*                           # optional trailing whitespace
           (?:
                (?= , | \r? \n )              # Delimiter ahead, comma or newline
             |  $ 
           )
      )
   |                              # OR
      ( [^,]* )                     # (2), unquoted field
      (?:
           \s*                           # optional trailing whitespace
           (?:
                (?= , )                       # Delimiter ahead, comma
             |  $ 
           )
      )
 )
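
Java can compile a laid-out pattern like this directly when `Pattern.COMMENTS` is set, because that flag makes the engine ignore whitespace and `#` comments inside the pattern. Below is a hedged sketch under that assumption. It follows the grouping of the one-line pattern in the code (a quoted field with its quotes, or an unquoted field) but uses named groups. The class name `FreeSpacingFields`, the method `parse`, and the group names `quoted` and `plain` are hypothetical and not part of the original answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FreeSpacingFields {

    // Pattern.COMMENTS: whitespace in the pattern is ignored and '#' starts a comment
    // that runs to the end of the line. The curly quotes U+201C / U+201D are written
    // as Unicode escapes so the source does not depend on the file encoding.
    private static final Pattern FIELD = Pattern.compile(String.join("\n",
        "(?: (?: ^ | , | \\r? \\n ) \\s* )      # delimiter, optional leading whitespace",
        "(?:",
        "    (?<quoted>                          # quoted field, quotes included",
        "        \" [^\"\\\\]* (?: \\\\ [\\S\\s] [^\"\\\\]* )* \"",
        "      | \u201C [^\u201C\u201D\\\\]* (?: \\\\ [\\S\\s] [^\u201C\u201D\\\\]* )* \u201D",
        "    )",
        "    (?: \\s* (?: (?= , | \\r? \\n ) | $ ) )",
        "  |",
        "    (?<plain> [^,]* )                   # unquoted field",
        "    (?: \\s* (?: (?= , ) | $ ) )",
        ")"), Pattern.COMMENTS);

    static List<String> parse(String line) {
        List<String> fields = new ArrayList<>();
        if (line.isEmpty()) {
            return fields;                       // as in the original: empty input, no fields
        }
        Matcher m = FIELD.matcher(line);
        while (m.find()) {
            String field = m.group("plain");
            if (field == null) {
                String quoted = m.group("quoted");
                // Drop the opening and closing quote characters.
                field = quoted.substring(1, quoted.length() - 1);
            }
            fields.add(field.trim());
        }
        return fields;
    }

    public static void main(String[] args) {
        System.out.println(parse("500,test,\u201CLQT_don't split, this_RQT\u201D,15"));
    }
}
```

Named groups avoid the need to remember which number belongs to which alternative, at the cost of a slightly longer pattern.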
