How can I extract words from a string without using the split function?

Question

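How can I extract words from a string when the words are separated by punctuation, whitespace, digits, and so on, without using split, replace, or anything like re? I'm still learning Python, and the book suggests finding a solution without resorting to list and string methods.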

Example Input : The@Tt11end
Example Output: ["The", "Tt", "end"]

Here is my attempt so far:

def extract_words(sentence):

    words_list = []
    separator = [",",".",";","'","?","/","<",">","@","!","#","$","%","^","&","*","(",")","-","_","1","2","3","4","5","6","7","8","9"]
    counter= 0
    for i in range(len(sentence)):
        i=counter
        while(is_letter(sentence[i])):
            words+= sentence[i]
            i = i+1
            counter=counter+1
        words_list.append(words)
        words=""
    return words_list

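My logic is to read the string until I reach a non-letter character, append what I have read to the list of words, and then continue through the string from where I left off.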

However, the output is wrong:

['The', '', '', '', '', '', '', '', '', '', '']

Edit: here is my is_letter() function:

def is_letter(char):
    return ("A" <= char and char <= "Z") or \
    ("a" <= char and char <= "z")

Tags: python

Answer


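Your code has gotten tangled and is not stepping through the given sentence. The loop variable i is overwritten with counter on every pass, and counter only advances while letters are being read, so once it reaches the first non-letter, every later pass starts at that same index and appends an empty string.

You only need to iterate over the characters in the sentence: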

def is_letter(char):
    return ("A" <= char <= "Z") or ("a" <= char <= "z")

def extract_words(sentence):
    word = ""
    words_list = []
    for ch in sentence:
        if is_letter(ch):
            word += ch
        else:
            if word:
                words_list.append(word)
                word = ""
    if word:
        words_list.append(word)
    return words_list


print(extract_words('The@,Tt11end'))

Output:

['The', 'Tt', 'end']

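The code iterates over sentence one character at a time. If the character is a letter, it is added to the current word. If not, the current word (if there is one) is appended to the output list and reset. Finally, if the last character was a letter, one word is still left over, and it is appended to the output as well.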
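If you would rather keep the index-based idea from the question (read letters until a non-letter, then carry on from where you stopped), here is a minimal sketch of how that could look. The name extract_words_by_index is just an illustrative choice, not something from the question or the book, and it assumes the same is_letter() helper as above. The key difference from the original attempt is that the index also moves past the separators, and every access is guarded by a bounds check.

```python
def is_letter(char):
    # Same helper as above: True only for ASCII letters.
    return ("A" <= char <= "Z") or ("a" <= char <= "z")

def extract_words_by_index(sentence):
    # Hypothetical name; an index-based variant of extract_words.
    words_list = []
    i = 0
    n = len(sentence)
    while i < n:
        # Skip over any separators (anything that is not a letter).
        while i < n and not is_letter(sentence[i]):
            i += 1
        # Collect consecutive letters into one word.
        word = ""
        while i < n and is_letter(sentence[i]):
            word += sentence[i]
            i += 1
        # Only keep non-empty words (the string may end with separators).
        if word:
            words_list.append(word)
    return words_list

print(extract_words_by_index('The@Tt11end'))
```

Both versions should give the same result for the example input. The character-by-character loop is shorter, while this one stays closer to the logic described in the question.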

