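php - PHP preg_replace: Latin characters only
Problem description
While building the XML for Italian electronic invoices, I need to filter strings.
Only characters from one specific type are accepted: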
String1000LatinType
"[\p{IsBasicLatin}\p{IsLatin-1Supplement}]{1,1000}"
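I am not familiar with this range, but I think it covers:
a-z, A-Z, 0-9,
accented letters such as à ò ù è é ì and ç,
symbols such as , . _ - : ; '
and the space character.
I want to exclude every other symbol that can be typed directly on the keyboard, for example "£$%&/()=?^°§*+\|/<> and the tab character.
I tried to do the conversion with this function, but I am no regex expert: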
function sanitize($tag) {
    $newtag = preg_replace ("/[\p{Latin}A-Z0-9a-z\-\_\.\,\:\;' ]/", "", $tag);
    return $newtag;
}
$tag = "Qwerty 12345 £$%&/()=?^ èéòàùì +*°ç.,-_<>\/l'èok .,;:";
var_dump(sanitize($tag));
Can anyone help me?
I would like to get:
Qwerty 12345 èéòàùì ç.,-_l'èok .,;:
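A note on the attempt above: its character class is not negated, so it deletes the allowed characters instead of the forbidden ones, and `\p{Latin}` only matches multi-byte letters when the `u` modifier is set. The following is a minimal sketch of a negated whitelist, not a definitive implementation. The function name `keep_listed_chars` is hypothetical, the whitelist contains only the characters listed in the question (precomposed lowercase accents, no uppercase ones), and collapsing repeated spaces is an assumption made so that the result matches the expected line.

```php
<?php
// Hypothetical helper: keeps only the characters listed in the question.
// Assumes the source file and the input are both UTF-8 encoded.
function keep_listed_chars(string $tag): string {
    // [^...] negates the class, so every character NOT listed is removed.
    // The trailing "-" inside the class is a literal hyphen.
    // The u modifier makes PCRE treat pattern and subject as UTF-8.
    $clean = preg_replace("/[^A-Za-z0-9àòùèéìç,._:;' -]/u", "", $tag);
    // preg_replace() returns null on error (for example invalid UTF-8 input).
    if ($clean === null) {
        return "";
    }
    // Assumption: collapse the runs of spaces left behind by removed symbols.
    return preg_replace("/ {2,}/", " ", $clean) ?? "";
}
```

Called with the sample `$tag` from the question, this sketch should return the line shown above. Extending the whitelist (for example with uppercase accented letters) only means adding characters inside the brackets.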
Solution
After some testing, I wrote this function, which does what I need:
function sanitize_string_xml($string, $opzioni = array()) {
    $chr_map = array(
        // Windows codepage 1252
        "\xC2\x82" => "'", // U+0082⇒U+201A single low-9 quotation mark
        "\xC2\x84" => '"', // U+0084⇒U+201E double low-9 quotation mark
        "\xC2\x8B" => "'", // U+008B⇒U+2039 single left-pointing angle quotation mark
        "\xC2\x91" => "'", // U+0091⇒U+2018 left single quotation mark
        "\xC2\x92" => "'", // U+0092⇒U+2019 right single quotation mark
        "\xC2\x93" => '"', // U+0093⇒U+201C left double quotation mark
        "\xC2\x94" => '"', // U+0094⇒U+201D right double quotation mark
        "\xC2\x9B" => "'", // U+009B⇒U+203A single right-pointing angle quotation mark
        // Regular Unicode // U+0022 quotation mark (")
        // U+0027 apostrophe (')
        "\xC2\xAB" => '"', // U+00AB left-pointing double angle quotation mark
        "\xC2\xBB" => '"', // U+00BB right-pointing double angle quotation mark
        "\xE2\x80\x98" => "'", // U+2018 left single quotation mark
        "\xE2\x80\x99" => "'", // U+2019 right single quotation mark
        "\xE2\x80\x9A" => "'", // U+201A single low-9 quotation mark
        "\xE2\x80\x9B" => "'", // U+201B single high-reversed-9 quotation mark
        "\xE2\x80\x9C" => '"', // U+201C left double quotation mark
        "\xE2\x80\x9D" => '"', // U+201D right double quotation mark
        "\xE2\x80\x9E" => '"', // U+201E double low-9 quotation mark
        "\xE2\x80\x9F" => '"', // U+201F double high-reversed-9 quotation mark
        "\xE2\x80\xB9" => "'", // U+2039 single left-pointing angle quotation mark
        "\xE2\x80\xBA" => "'", // U+203A single right-pointing angle quotation mark
    );
    $type = isset($opzioni['Type']) ? $opzioni['Type'] : ""; // "IsBasicLatin" or "IsLatin"
    $lunghezzaMax = isset($opzioni['LunghezzaMax']) ? $opzioni['LunghezzaMax'] : "";
    if ( $type == "IsBasicLatin" ) {
        // Transliterate accented letters to ASCII, then drop everything outside printable ASCII.
        $unwanted_array = array( 'Š'=>'S', 'š'=>'s', 'Ž'=>'Z', 'ž'=>'z', 'À'=>'A', 'Á'=>'A', 'Â'=>'A', 'Ã'=>'A', 'Ä'=>'A', 'Å'=>'A', 'Æ'=>'A', 'Ç'=>'C', 'È'=>'E', 'É'=>'E',
            'Ê'=>'E', 'Ë'=>'E', 'Ì'=>'I', 'Í'=>'I', 'Î'=>'I', 'Ï'=>'I', 'Ñ'=>'N', 'Ò'=>'O', 'Ó'=>'O', 'Ô'=>'O', 'Õ'=>'O', 'Ö'=>'O', 'Ø'=>'O', 'Ù'=>'U',
            'Ú'=>'U', 'Û'=>'U', 'Ü'=>'U', 'Ý'=>'Y', 'Þ'=>'B', 'ß'=>'Ss', 'à'=>'a', 'á'=>'a', 'â'=>'a', 'ã'=>'a', 'ä'=>'a', 'å'=>'a', 'æ'=>'a', 'ç'=>'c',
            'è'=>'e', 'é'=>'e', 'ê'=>'e', 'ë'=>'e', 'ì'=>'i', 'í'=>'i', 'î'=>'i', 'ï'=>'i', 'ð'=>'o', 'ñ'=>'n', 'ò'=>'o', 'ó'=>'o', 'ô'=>'o', 'õ'=>'o',
            'ö'=>'o', 'ø'=>'o', 'ù'=>'u', 'ú'=>'u', 'û'=>'u', "ü" => "u", 'ý'=>'y', 'þ'=>'b', 'ÿ'=>'y' );
        $string = strtr( $string, $unwanted_array );
        $string = preg_replace('/[^\x{0020}-\x{007E}]+/u', '', $string);
    }
    if ( $type == "IsLatin" ) {
        // Keep printable ASCII plus the printable Latin-1 Supplement range (U+00A0 to U+00FF).
        $unwanted_array = array( 'Š'=>'S', 'š'=>'s', 'Ž'=>'Z', 'ž'=>'z' );
        $string = strtr( $string, $unwanted_array );
        $string = preg_replace('/[^\x{0020}-\x{007E}\x{00A0}-\x{00FF}]+/u', '', $string);
    }
    // Convert quotation marks outside the allowed range into plain quotes:
    $chr = array_keys ($chr_map); // but: for efficiency you should
    $rpl = array_values($chr_map); // pre-calculate these two arrays
    $string = str_replace($chr, $rpl, html_entity_decode($string, ENT_QUOTES, "UTF-8"));
    $string = str_replace(PHP_EOL, " ", $string);
    // Truncate before escaping and count characters, not bytes, so that neither a
    // multi-byte character nor an entity such as &amp; is cut in half (needs mbstring).
    if ( $lunghezzaMax != "" ) {
        $string = mb_substr($string, 0, (int) $lunghezzaMax, "UTF-8");
    }
    return htmlspecialchars($string);
}
Usage example:
$clear_string = sanitize_string_xml($dirty_string, array("Type" => "IsLatin", "LunghezzaMax" => 60));
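To double-check a sanitized value against the String1000LatinType pattern quoted in the question, something like the following sketch may help. PCRE does not understand the XSD block names `\p{IsBasicLatin}` and `\p{IsLatin-1Supplement}`, so the sketch spells them out as the code point ranges U+0000 to U+007F and U+0080 to U+00FF. The function name `is_string1000_latin` is hypothetical, and the check should be run on the text before XML escaping, since entities such as `&amp;` lengthen the string.

```php
<?php
// Hypothetical helper: true when $value matches the XSD pattern
// [\p{IsBasicLatin}\p{IsLatin-1Supplement}]{1,1000}.
function is_string1000_latin(string $value): bool {
    // Basic Latin = U+0000..U+007F, Latin-1 Supplement = U+0080..U+00FF.
    // \A and \z anchor the match to the whole string, as XSD patterns do.
    // With the u modifier, {1,1000} counts code points rather than bytes.
    // preg_match() returns false for invalid UTF-8, which also yields false here.
    return preg_match('/\A[\x{0000}-\x{00FF}]{1,1000}\z/u', $value) === 1;
}
```

A value that fails this check after sanitizing most likely still contains a character outside the two blocks or exceeds 1000 characters. Note that XML 1.0 itself forbids most control characters, which the pattern alone does not exclude.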