Allowed characters are being percent-encoded

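Problem description

The documentation for the String method addingPercentEncoding(withAllowedCharacters:) says:

Returns a new string made from the receiver by replacing all characters not in the specified set with percent-encoded characters.

The documentation for the predefined set CharacterSet.alphanumerics says:

Returns a character set containing the characters in Unicode General Categories L*, M*, and N*.

The L (Letter) category consists of 5 subcategories: Ll, Lm, Lt, Lu, Lo. So I assume L* means all subcategories of L.

I picked the Ll subcategory ( https://www.compart.com/en/unicode/category/Ll#UNC_SCRIPTS ) and chose the character "æ" (U+00E6). I can see that the alphanumerics character set does contain this character. But when I add percent encoding to a string containing it, the character gets percent-encoded anyway.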

"\u{E6}" // "æ"
CharacterSet.alphanumerics.contains("\u{E6}") // true
"æ".addingPercentEncoding(withAllowedCharacters: .alphanumerics) // "%C3%A6" 

// Let's try with "a"
"\u{61}" // "a"
CharacterSet.alphanumerics.contains("\u{61}") // true
"a".addingPercentEncoding(withAllowedCharacters: .alphanumerics) // "a"

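Why does this happen? The character is in the allowed set I passed in, so it should not be replaced, right?

I suspect it has to do with the fact that "a" (U+0061) is also 0x61 in UTF-8, whereas "æ" (U+00E6) is [0xC3, 0xA6] in UTF-8, not 0xE6. Or is it because it takes up more than 1 byte?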

String(data: Data([0x61]), encoding: .utf8)! // "a"
String(data: Data([0xC3, 0xA6]), encoding: .utf8)! // "æ"
String(data: Data([0xE6]), encoding: .utf8)! // crashes 
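
The byte-level difference can be inspected without crashing. The following is a minimal sketch, assuming only Foundation; the variable names are illustrative and not part of the original question. It shows that the scalar value of "æ" is 0xE6 while its UTF-8 encoding is two bytes, and that a lone 0xE6 byte is simply invalid UTF-8.

```swift
import Foundation

// The Unicode scalar value and the UTF-8 bytes are different things.
let scalarValue = "æ".unicodeScalars.first!.value   // 0xE6
let utf8Bytes = Array("æ".utf8)                     // [0xC3, 0xA6]

// A lone 0xE6 byte is not a complete UTF-8 sequence.
let invalid = Data([0xE6])

// The failable initializer returns nil, so force-unwrapping it is what crashes.
let strict = String(data: invalid, encoding: .utf8)

// The non-failable initializer repairs invalid input with U+FFFD instead.
let lossy = String(decoding: invalid, as: UTF8.self)

print(String(scalarValue, radix: 16), utf8Bytes, strict as Any, lossy)
```

Using `if let` (or the lossy initializer) in place of `!` avoids the crash when the bytes may not be valid UTF-8.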

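Update

Is it because the percent-encoding algorithm converts the string to Data and walks through it one byte at a time? It would look at 0xC3, which is not an allowed character, so that byte gets percent-encoded. Then it would look at 0xA6, which is not an allowed character either, so that byte gets percent-encoded too. So technically, an allowed character must be a single byte?

Tags: swift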

Solution

A truly allowed character must both be in the allowed character set and be an ASCII character. Thanks to @alobaili for pointing this out.
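
To make the rule concrete, here is a rough sketch of an encoder that behaves the way the rule describes. It is not Foundation's actual implementation; `sketchPercentEncode` is a hypothetical helper written only to illustrate the observed behavior: a scalar passes through unchanged only if it is ASCII and in the set, and everything else is escaped byte by byte from its UTF-8 encoding.

```swift
import Foundation

// Hypothetical helper, NOT Foundation's implementation. It only mirrors the
// behavior described above.
func sketchPercentEncode(_ string: String, allowed: CharacterSet) -> String {
    var result = ""
    for scalar in string.unicodeScalars {
        if scalar.isASCII && allowed.contains(scalar) {
            // ASCII scalars that are in the set are kept as they are.
            result.append(Character(scalar))
        } else {
            // Everything else is escaped one UTF-8 byte at a time.
            for byte in String(scalar).utf8 {
                result += String(format: "%%%02X", byte)
            }
        }
    }
    return result
}

print(sketchPercentEncode("æ", allowed: .alphanumerics))   // %C3%A6
print(sketchPercentEncode("a", allowed: .alphanumerics))   // a
```

Under this model, "æ" is escaped even though `CharacterSet.alphanumerics.contains("\u{E6}")` is true, which matches the result reported in the question.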

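If you are curious, the predefined set CharacterSet.alphanumerics contains 129172 characters in total, but only 62 of them are truly allowed when the set is passed to String's addingPercentEncoding(withAllowedCharacters:) method.

You can quickly check all the truly allowed characters in a particular CharacterSet like this: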

func inspect(charSet: CharacterSet) {
    var characters: [String] = []
    for char: UInt8 in 0..<128 { // ASCII range
        let u = UnicodeScalar(char)
        if charSet.contains(u) {
            characters.append(String(u))
        }
    }
    print("Characters:", characters.count)
    print(characters)
}

inspect(charSet: .alphanumerics) // [a-z, A-Z, 0-9]

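This is handy because you cannot simply iterate over a CharacterSet, and it is useful to know exactly which elements are allowed. For example, the documentation for the predefined CharacterSet.urlQueryAllowed only says:

Returns the character set for characters allowed in a query URL component.

Now we can find out what those allowed characters are: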

inspect(charSet: .urlQueryAllowed)

// Characters: 81
// ["!", "$", "&", "\'", "(", ")", "*", "+", ",", "-", ".", "/", "0", "1", "2", "3", "4", "5", "6", "7", "8", "9", ":", ";", "=", "?", "@", "A", "B", "C", "D", "E", "F", "G", "H", "I", "J", "K", "L", "M", "N", "O", "P", "Q", "R", "S", "T", "U", "V", "W", "X", "Y", "Z", "_", "a", "b", "c", "d", "e", "f", "g", "h", "i", "j", "k", "l", "m", "n", "o", "p", "q", "r", "s", "t", "u", "v", "w", "x", "y", "z", "~"]
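
Knowing the exact contents matters in practice. The list above includes "&", "=" and "+", so a query value containing them is left untouched by .urlQueryAllowed. The sketch below assumes a single query value is being encoded; the name `queryValueAllowed` is illustrative, not a Foundation constant.

```swift
import Foundation

// Assumption: we are encoding one query *value*, so the delimiters
// "&", "=" and "+" should be escaped even though .urlQueryAllowed permits them.
var queryValueAllowed = CharacterSet.urlQueryAllowed
queryValueAllowed.remove(charactersIn: "&=+")

let value = "a&b=c+d"

// With the predefined set, the delimiters pass through unchanged.
let loose = value.addingPercentEncoding(withAllowedCharacters: .urlQueryAllowed)

// With the narrowed set, they are percent-encoded.
let strict = value.addingPercentEncoding(withAllowedCharacters: queryValueAllowed)

print(loose ?? "nil", strict ?? "nil")
```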

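Just for fun

There is another (long but reliable) approach. It looks at every character in the set, not just the ASCII ones, and compares the string made of the character itself with the string obtained by adding percent encoding using an allowed set that contains only that character. When the two are equal, you know the character is truly allowed. The code is adapted from this helpful article.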

func inspect(charSet: CharacterSet) {
    var characters: [String] = []
    var allowed: [String] = []
    var asciiCount = 0
    for plane: UInt8 in 0..<17 {
        if charSet.hasMember(inPlane: plane) {
            let planeStart = UInt32(plane) << 16
            let nextPlaneStart = (UInt32(plane) + 1) << 16
            for char: UTF32Char in planeStart..<nextPlaneStart {
                if let u = UnicodeScalar(char), charSet.contains(u) {
                    let s = String(u)
                    characters.append(s)
                    
                    if s.addingPercentEncoding(withAllowedCharacters: CharacterSet([u])) == s {
                        allowed.append(s)
                    }
                    
                    if u.isASCII {
                        asciiCount += 1
                    }
                }
            }
        }
    }
    print("Characters:", characters.count)
    print("Allowed:", allowed.count)
    print("ASCII:", asciiCount)
}

inspect(charSet: .alphanumerics)

// Characters: 129172
// Allowed: 62
// ASCII: 62
