How to find a string within another string, ignoring some characters?

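Question

Background

Suppose you want to search for partial text within a formatted phone number, and you want to highlight the match.

For example, if you have the phone number "+972 50-123-4567" and you search for 2501, you should be able to highlight the text "2 50-1" within it.

Here is a hash map of more example queries and their expected results, where the text to search in is "+972 50-123-45678" and the allowed characters are "01234567890+*#":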

    val tests = hashMapOf(
            "" to Pair(0, 0),
            "9" to Pair(1, 2),
            "97" to Pair(1, 3),
            "250" to Pair(3, 7),
            "250123" to Pair(3, 11),
            "250118" to null,
            "++" to null,
            "8" to Pair(16, 17),
            "+" to Pair(0, 1),
            "+8" to null,
            "78" to Pair(15, 17),
            "5678" to Pair(13, 17),
            "788" to null,
            "+ " to Pair(0, 1),
            "  " to Pair(0, 0),
            "+ 5" to null,
            "+ 9" to Pair(0, 2)
    )

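The problem

You might think: why not just use "indexOf", or clean the string and then find the occurrence?

But that is wrong, because I want to mark the occurrence in the original text, ignoring some characters along the way. A match found in a cleaned copy gives indexes into the cleaned copy, not into the formatted text that has to be highlighted.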
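To illustrate the point, here is a minimal sketch of the clean-then-`indexOf` approach. The helper name `naiveFind` is made up for this example and is not part of the solution; it only shows that the returned range refers to the cleaned string, so applying it to the formatted text marks the wrong characters.

```kotlin
// Hypothetical helper: cleans the text, then searches the cleaned copy.
// The returned range indexes into the cleaned copy, NOT into `text`.
fun naiveFind(text: String, query: String, allowed: Set<Char>): Pair<Int, Int>? {
    val cleaned = text.filter { it in allowed }
    val start = cleaned.indexOf(query)
    return if (start < 0) null else Pair(start, start + query.length)
}

fun main() {
    val text = "+972 50-123-4567"
    val allowed = "0123456789+*#".toSet()

    val range = naiveFind(text, "2501", allowed)!!
    // Applying the cleaned-string range to the original text cuts the match short.
    println(text.substring(range.first, range.second)) // prints "2 50"
}
```

The text that should have been marked is "2 50-1", which is the range (3, 9) in the original string, while the naive range is (3, 7).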

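What I've tried

After working on this for a while, I actually found an answer. I just want to share it, and optionally see whether anyone can write better or shorter code that produces the same behavior.

I had an earlier solution that was much shorter, but it assumed that the query contains only allowed characters.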

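The question

Well, there is no question this time, since I found the answer myself.

But, again, if you can think of a more elegant and/or shorter solution that is as efficient as the one I wrote, please let me know.

I'm pretty sure regular expressions could be a solution here, but they tend to be unreadable at times, and they can also be much less efficient than hand-written code. It would still be nice to know how this kind of problem could be solved with them. Maybe I could also run a small benchmark on it.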
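For the regular-expression idea, one possible sketch is shown below. It is not the author's solution and has not been benchmarked against it; the names `findWithRegex` and `asRegexLiteral` are assumptions made for this example. The pattern is built from the allowed characters of the query, joined by a "gap" that matches any run of ignored characters.

```kotlin
// Hypothetical helper: a \uXXXX escape, so any character is taken literally by the regex engine.
fun Char.asRegexLiteral(): String = "\\u%04x".format(code)

// Returns the start (inclusive) and end (exclusive) of the match,
// Pair(0, 0) if the query has no allowed characters, or null if not found.
fun findWithRegex(text: String, query: String, allowed: Set<Char>): Pair<Int, Int>? {
    val wanted = query.filter { it in allowed }
    if (wanted.isEmpty()) return Pair(0, 0)
    // Matches any run (possibly empty) of characters outside the allowed set.
    val gap = "[^" + allowed.joinToString("") { it.asRegexLiteral() } + "]*"
    // Example: the query "250" becomes 2 <gap> 5 <gap> 0.
    val pattern = wanted.map { it.asRegexLiteral() }.joinToString(gap)
    val match = Regex(pattern).find(text) ?: return null
    return Pair(match.range.first, match.range.last + 1)
}

fun main() {
    val allowed = "01234567890+*#".toSet()
    println(findWithRegex("+972 50-123-45678", "250", allowed)) // (3, 7)
    println(findWithRegex("+972 50-123-45678", "+ 9", allowed)) // (0, 2)
    println(findWithRegex("+972 50-123-45678", "+8", allowed))  // null
}
```

Because the gap can never consume an allowed character, the first regex match starts and ends on the same characters that a character-by-character scan would pick.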

Tags: search, kotlin, text

Answer

OK, here is my solution, including a sample that tests it:

TextSearchUtil.kt

object TextSearchUtil {
    /**@return where the query was found. First integer is the start. The second is the last, excluding.
     * Special cases: Pair(0,0) if query is empty or ignored, null if not found.
     * @param text the text to search within. Only allowed characters are searched for. Rest are ignored
     * @param query what to search for. Only allowed characters are searched for. Rest are ignored
     * @param allowedCharactersSet the only characters we should be allowed to check. Rest are ignored*/
    fun findOccurrenceWhileIgnoringCharacters(text: String, query: String, allowedCharactersSet: HashSet<Char>): Pair<Int, Int>? {
        //get index of first char to search for
        var searchIndexStart = -1
        for ((index, c) in query.withIndex())
            if (allowedCharactersSet.contains(c)) {
                searchIndexStart = index
                break
            }
        if (searchIndexStart == -1) {
            //query contains only ignored characters, so it's like an empty one
            return Pair(0, 0)
        }
        //got index of first character to search for
        if (text.isEmpty())
        //need to search for a character, but the text is empty, so not found
            return null
        var mainIndex = 0
        while (mainIndex < text.length) {
            var searchIndex = searchIndexStart
            var isFirstCharToSearchFor = true
            var secondaryIndex = mainIndex
            var charToSearch = query[searchIndex]
            secondaryLoop@ while (secondaryIndex < text.length) {
                //skip ignored characters on query
                if (!isFirstCharToSearchFor)
                    while (!allowedCharactersSet.contains(charToSearch)) {
                        ++searchIndex
                        if (searchIndex >= query.length) {
                            //reached end of search while all characters were fine, so found the match
                            return Pair(mainIndex, secondaryIndex)
                        }
                        charToSearch = query[searchIndex]
                    }
                //skip ignored characters on text
                var c: Char? = null
                while (secondaryIndex < text.length) {
                    c = text[secondaryIndex]
                    if (allowedCharactersSet.contains(c))
                        break
                    else {
                        if (isFirstCharToSearchFor)
                            break@secondaryLoop
                        ++secondaryIndex
                    }
                }
                //reached end of text
                if (secondaryIndex == text.length) {
                    if (isFirstCharToSearchFor)
                    //couldn't find the first character anywhere, so failed to find the query
                        return null
                    break@secondaryLoop
                }
                //time to compare
                if (c != charToSearch)
                    break@secondaryLoop
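                ++searchIndex
                isFirstCharToSearchFor = false
                //skip ignored characters on query here too, so that a query ending with ignored characters still matches at the very end of the text
                while (searchIndex < query.length && !allowedCharactersSet.contains(query[searchIndex]))
                    ++searchIndex
                if (searchIndex >= query.length) {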
                    //reached end of search while all characters were fine, so found the match
                    return Pair(mainIndex, secondaryIndex + 1)
                }
                charToSearch = query[searchIndex]
                ++secondaryIndex
            }
            ++mainIndex
        }
        return null
    }
}
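As a possible shorter alternative, the sketch below cleans the text but also remembers where each kept character came from, so the range found by `indexOf` can be mapped back to the original string. The name `findViaIndexMap` is an assumption for this example, and unlike the function above it allocates a cleaned copy and an index list, so it trades memory for brevity.

```kotlin
// Returns the start (inclusive) and end (exclusive) of the match in `text`,
// Pair(0, 0) if the query has no allowed characters, or null if not found.
fun findViaIndexMap(text: String, query: String, allowed: Set<Char>): Pair<Int, Int>? {
    val wanted = query.filter { it in allowed }
    if (wanted.isEmpty()) return Pair(0, 0)
    // Positions in `text` of every allowed character, in order.
    val positions = text.indices.filter { text[it] in allowed }
    // The cleaned text; cleaned[i] is the character at text[positions[i]].
    val cleaned = text.filter { it in allowed }
    val start = cleaned.indexOf(wanted)
    if (start < 0) return null
    // Map the cleaned-string range back to positions in the original text.
    return Pair(positions[start], positions[start + wanted.length - 1] + 1)
}

fun main() {
    val allowed = "01234567890+*#".toSet()
    println(findViaIndexMap("+972 50-123-45678", "250123", allowed)) // (3, 11)
    println(findViaIndexMap("+972 50-123-45678", "78", allowed))     // (15, 17)
    println(findViaIndexMap("+972 50-123-45678", "788", allowed))    // null
}
```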

Sample usage to test it:

MainActivity.kt

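//assumes AndroidX; projects on the old support library import android.support.v7.app.AppCompatActivity instead
import android.os.Bundle
import android.util.Log
import androidx.appcompat.app.AppCompatActivity

class MainActivity : AppCompatActivity() {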

    override fun onCreate(savedInstanceState: Bundle?) {
        super.onCreate(savedInstanceState)
        setContentView(R.layout.activity_main)
        //
        val text = "+972 50-123-45678"
        val allowedCharacters = "01234567890+*#"
        val allowedPhoneCharactersSet = allowedCharacters.toHashSet()
        //
        val tests = hashMapOf(
                "" to Pair(0, 0),
                "9" to Pair(1, 2),
                "97" to Pair(1, 3),
                "250" to Pair(3, 7),
                "250123" to Pair(3, 11),
                "250118" to null,
                "++" to null,
                "8" to Pair(16, 17),
                "+" to Pair(0, 1),
                "+8" to null,
                "78" to Pair(15, 17),
                "5678" to Pair(13, 17),
                "788" to null,
                "+ " to Pair(0, 1),
                "  " to Pair(0, 0),
                "+ 5" to null,
                "+ 9" to Pair(0, 2)
        )
        for (test in tests) {
            val result = TextSearchUtil.findOccurrenceWhileIgnoringCharacters(text, test.key, allowedPhoneCharactersSet)
            val isResultCorrect = result == test.value
            val foundStr = if (result == null) null else text.substring(result.first, result.second)
            when {
                !isResultCorrect -> Log.e("AppLog", "checking query of \"${test.key}\" inside \"$text\" . Succeeded?$isResultCorrect Result: $result found String: \"$foundStr\"")
                foundStr == null -> Log.d("AppLog", "checking query of \"${test.key}\" inside \"$text\" . Succeeded?$isResultCorrect Result: $result")
                else -> Log.d("AppLog", "checking query of \"${test.key}\" inside \"$text\" . Succeeded?$isResultCorrect Result: $result found String: \"$foundStr\"")

            }
        }
        //
        Log.d("AppLog", "special cases:")
        Log.d("AppLog", "${TextSearchUtil.findOccurrenceWhileIgnoringCharacters("a", "c", allowedPhoneCharactersSet) == Pair(0, 0)}")
        Log.d("AppLog", "${TextSearchUtil.findOccurrenceWhileIgnoringCharacters("ab", "c", allowedPhoneCharactersSet) == Pair(0, 0)}")
        Log.d("AppLog", "${TextSearchUtil.findOccurrenceWhileIgnoringCharacters("ab", "cd", allowedPhoneCharactersSet) == Pair(0, 0)}")
        Log.d("AppLog", "${TextSearchUtil.findOccurrenceWhileIgnoringCharacters("a", "cd", allowedPhoneCharactersSet) == Pair(0, 0)}")
    }

}

If I want to highlight the result, I can use something like this:

    val pair = TextSearchUtil.findOccurrenceWhileIgnoringCharacters(text, "2501", allowedPhoneCharactersSet)
    if (pair == null)
        textView.text = text
    else {
        val wordToSpan = SpannableString(text)
        wordToSpan.setSpan(BackgroundColorSpan(0xFFFFFF00.toInt()), pair.first, pair.second, Spannable.SPAN_EXCLUSIVE_EXCLUSIVE)
        textView.setText(wordToSpan, TextView.BufferType.SPANNABLE)
    }
