Regex: match a Base64 string and remove non-matching text/characters

Problem description

I have a valid Base64-encoded string, someBase64String, to which an invalid suffix (e.g. _002_CWLP265MB136330847F70EDE0813A5AC4C3A80CWLP265MB1363GBRP_--) has been added.

I want to split the string back into the valid Base64-encoded prefix and the invalid suffix.

This regex matches a Base64-encoded string:

Regex rg = new Regex("^(?:[A-Za-z0-9+/]{4})*(?:[A-Za-z0-9+/]{2}==|[A-Za-z0-9+/]{3}=)?$"); 

rg.IsMatch(someBase64String) returns true if someBase64String is a valid Base64 string. (rg.Match returns a Match object rather than a bool; its Success property carries the same information.)

When the string --_002_CWLP265MB136330847F70EDE0813A5AC4C3A80CWLP265MB1363GBRP_-- is appended to someBase64String, conversion of the whole string from Base64 fails, and rg.IsMatch(someBase64String) returns false.
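
The check described above can be sketched as follows. This is only an illustration: the class name Base64Check, the helper IsBase64, and the sample bytes are assumptions and not part of the original code. It uses Regex.IsMatch, which returns a bool.

```csharp
using System;
using System.Text.RegularExpressions;

public static class Base64Check
{
    // Pattern from the question: the whole string must be padded Base64.
    private static readonly Regex Base64Pattern = new Regex(
        "^(?:[A-Za-z0-9+/]{4})*(?:[A-Za-z0-9+/]{2}==|[A-Za-z0-9+/]{3}=)?$");

    // Hypothetical helper: true when s matches the pattern above.
    public static bool IsBase64(string s) => Base64Pattern.IsMatch(s);

    public static void Main()
    {
        // Sample payload (assumed for illustration): encodes to "AQIDBA==".
        string valid = Convert.ToBase64String(new byte[] { 1, 2, 3, 4 });
        string broken = valid
            + "--_002_CWLP265MB136330847F70EDE0813A5AC4C3A80CWLP265MB1363GBRP_--";

        Console.WriteLine(IsBase64(valid));   // True
        Console.WriteLine(IsBase64(broken));  // False
    }
}
```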

For an invalid Base64 string like the one described above, I need to extract the part of the string that causes the conversion to fail, namely _002_CWLP265MB136330847F70EDE0813A5AC4C3A80CWLP265MB1363GBRP_-- in this case.

Tags: c#, regex, encoding, base64

Solution


Why not filter the characters with LINQ? SkipWhile drops the leading Base64 symbols and leaves the invalid suffix. If you want the Base64 symbols up to the invalid suffix instead, use TakeWhile instead of SkipWhile.

// Skip the leading run of Base64 alphabet characters; what remains is the invalid suffix.
string wrongSince = string.Concat(source
   .SkipWhile(c => c >= 'A' && c <= 'Z' ||
                   c >= 'a' && c <= 'z' ||
                   c >= '0' && c <= '9' ||
                   c == '+' ||
                   c == '/'));

Test:

using System;
using System.Linq;

string valid = "ABC+DEF+123";
string suffix = "_002_CWLP265MB136330847F70EDE0813A5AC4C3A80CWLP265MB1363GBRP_--";

string source = valid + suffix;

// Everything from the first non-Base64 character onwards.
string wrongSince = string.Concat(source
   .SkipWhile(c => c >= 'A' && c <= 'Z' ||
                   c >= 'a' && c <= 'z' ||
                   c >= '0' && c <= '9' ||
                   c == '+' ||
                   c == '/'));

// The leading run of Base64 alphabet characters.
string correctPrefix = string.Concat(source
   .TakeWhile(c => c >= 'A' && c <= 'Z' ||
                   c >= 'a' && c <= 'z' ||
                   c >= '0' && c <= '9' ||
                   c == '+' ||
                   c == '/'));

Console.WriteLine(wrongSince);
Console.WriteLine(correctPrefix);

Outcome:

_002_CWLP265MB136330847F70EDE0813A5AC4C3A80CWLP265MB1363GBRP_--
ABC+DEF+123
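
One caveat with the character filter above: it does not treat the padding character '=' as a Base64 symbol, and it does not check that the prefix consists of complete 4-character groups. A possible regex-based alternative, offered only as a sketch, reuses the pattern from the question without the trailing $ so that it matches the longest well-formed Base64 prefix. The names Base64Splitter, PrefixPattern and Split, as well as the sample bytes, are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class Base64Splitter
{
    // The question's pattern with the trailing '$' removed: it matches the
    // longest prefix made of 4-character groups plus an optional padded group.
    private static readonly Regex PrefixPattern = new Regex(
        "^(?:[A-Za-z0-9+/]{4})*(?:[A-Za-z0-9+/]{2}==|[A-Za-z0-9+/]{3}=)?");

    // Hypothetical helper: splits source into (well-formed prefix, remainder).
    public static (string Prefix, string Suffix) Split(string source)
    {
        // The pattern can match the empty string, so the match always succeeds.
        Match m = PrefixPattern.Match(source);
        return (m.Value, source.Substring(m.Length));
    }

    public static void Main()
    {
        // Sample payload (assumed for illustration): encodes to "AQIDBA==".
        string payload = Convert.ToBase64String(new byte[] { 1, 2, 3, 4 });
        string source = payload
            + "--_002_CWLP265MB136330847F70EDE0813A5AC4C3A80CWLP265MB1363GBRP_--";

        var (prefix, suffix) = Split(source);
        Console.WriteLine(prefix);  // AQIDBA==
        Console.WriteLine(suffix);  // --_002_CWLP265MB136330847F70EDE0813A5AC4C3A80CWLP265MB1363GBRP_--
    }
}
```

This version is stricter than the character filter: for the test input "ABC+DEF+123" followed by the suffix, it would return "ABC+DEF+" as the prefix and leave "123" at the start of the remainder, because "123" is not a complete group. Like the filter, it cannot tell where the payload ends if the appended text itself begins with Base64 alphabet characters.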
