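python - Searching a master string for punctuation-free strings and slicing out the matches with their punctuation, without libraries: is it possible?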
Problem description
I have this homework to do (no libraries allowed), and I underestimated the problem:
Suppose we have a list of strings: str_list = ["my head's", "free", "at last", "into alarm", "in another moment", "neck"]
What we know for certain is that every string is contained in master_string, that they appear in order, and that they contain no punctuation. (All of this is guaranteed by checks I ran earlier.)
Then we have the string: master_string = "'Come, my head's free at last!' said Alice in a tone of delight, which changed into alarm in another moment, when she found that her shoulders were nowhere to be found: all she could see, when she looked down, was an immense length of neck, which seemed to rise like a stalk out of a sea of green leaves that lay far below her."
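What I basically have to do here is find sequences of at least k consecutive strings from str_list that are contained in master_string (in this case k = 2). However, I underestimated the fact that each string in str_list can contain more than one word, so master_string.split() won't get me anywhere: it would mean asking something like if "my head's" == "my", which is of course false.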
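I was thinking of concatenating the strings one at a time and searching for them in master_string.strip(".,:;!?"). But if I find the matching sequences, I absolutely need to take them directly from master_string, because I need the punctuation in the result variable. That basically means taking slices straight out of master_string, but how is that possible? Is it even possible, or do I have to change my approach? This is driving me completely crazy, especially since no libraries are allowed.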
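In case you are wondering what the expected result is here:
["my head's free at last!", "into alarm in another moment,"]
(because both satisfy the condition of containing at least k strings from str_list), and "neck" will be saved in a discard_list because it does not satisfy that condition (it can't simply be thrown away with .pop(), because I need to do other things with the discarded values).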
Solution
Here is my solution:
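- Try to extend every string as far as possible, based on master_string and a limited set of punctuation characters (e.g. my head’s -> my head’s free at last!; free -> free at last!).
- Keep only the substrings that were extended at least k times.
- Remove redundant substrings (e.g. free at last!, which is already contained in my head’s free at last!).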
Here is the code:
str_list = ["my head’s", "free", "at last", "into alarm", "in another moment", "neck"]
master_string = "‘Come, my head’s free at last!’ said Alice in a tone of delight, which changed into alarm in another moment, when she found that her shoulders were nowhere to be found: all she could see, when she looked down, was an immense length of neck, which seemed to rise like a stalk out of a sea of green leaves that lay far below her."
punctuation_characters = ".,:;!?"  # punctuation characters that may follow a match
k = 1  # minimum number of successors (1 successor means a run of at least 2 strings)

def extend_string(current_str, successors_num=0):
    # check if the next token is a punctuation mark
    for punctuation_mark in punctuation_characters:
        if current_str + punctuation_mark in master_string:
            return extend_string(current_str + punctuation_mark, successors_num)
    # check if the next token is a proper successor
    for successor in str_list:
        if current_str + " " + successor in master_string:
            return extend_string(current_str + " " + successor, successors_num + 1)
    # cannot extend the string anymore
    return current_str, successors_num

# extend every string and keep those with at least k successors
extended_strings = []
for s in str_list:
    extended_string, successors_num = extend_string(s)
    if successors_num >= k:
        extended_strings.append(extended_string)

extended_strings.sort(key=len)  # sorting by ascending length

# drop every string that is contained in a longer one
result_list = []
for es in extended_strings:
    result_list = [s2 for s2 in result_list if s2 not in es]
    result_list.append(es)

print(result_list)  # result: ['my head’s free at last!', 'into alarm in another moment,']
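The question also asks for slices taken directly from master_string and for a discard_list, which the code above does not build. A minimal sketch of an index-based variant follows, using only built-in string methods. The function name find_runs and its parameters are hypothetical, and it assumes (as the question states) that the strings occur in master_string in order; here k counts the strings in a run, as in the question. It also matches plain substrings, so an entry such as "neck" would match inside a longer word like "necklace".

```python
def find_runs(str_list, master_string, k=2, punctuation=".,:;!?"):
    """Return (result_list, discard_list) built from slices of master_string."""
    # Locate every entry left to right; entries are assumed to appear in order.
    spans = []
    pos = 0
    for s in str_list:
        start = master_string.find(s, pos)
        if start == -1:
            raise ValueError("not found in master_string: " + s)
        end = start + len(s)
        spans.append((s, start, end))
        pos = end

    # Group entries that are separated by exactly one space in master_string.
    groups = []
    for span in spans:
        if groups and master_string[groups[-1][-1][2]:span[1]] == " ":
            groups[-1].append(span)
        else:
            groups.append([span])

    result_list, discard_list = [], []
    for group in groups:
        if len(group) < k:
            # Runs that are too short go to the discard list.
            discard_list.extend(s for s, _, _ in group)
            continue
        start, end = group[0][1], group[-1][2]
        # Extend the slice over punctuation that directly follows the run.
        while end < len(master_string) and master_string[end] in punctuation:
            end += 1
        result_list.append(master_string[start:end])
    return result_list, discard_list


str_list = ["my head's", "free", "at last", "into alarm", "in another moment", "neck"]
master_string = "'Come, my head's free at last!' said Alice in a tone of delight, which changed into alarm in another moment, when she found that her shoulders were nowhere to be found: all she could see, when she looked down, was an immense length of neck, which seemed to rise like a stalk out of a sea of green leaves that lay far below her."

result_list, discard_list = find_runs(str_list, master_string, k=2)
print(result_list)   # ["my head's free at last!", 'into alarm in another moment,']
print(discard_list)  # ['neck']
```

Because the start and end positions are kept, every result is a real slice of master_string, so the punctuation comes along without any stripping or re-insertion.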