Generate all possible substrings in sequence

Question

I'm looking for a library, or an efficient way, to do the following in Python:

Input: 
"He was hungry"

Desired Output:
[["He", "was", "hungry"],
 ["He was", "hungry"],
 ["He", "was hungry"],
 ["He was hungry"]]

Tags: python, substring, permutation, n-gram

Answer


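Here is a recursive approach: for an input of N words, compute every possible joining of the first N-1 words, then for each of those choose whether to append the last word as its own element or to join it onto the rightmost element.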

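def iter_joinings(items):
    if len(items) == 0:
        return
    elif len(items) == 1:
        # Yield a copy so the caller's list is never mutated downstream.
        yield list(items)
    else:
        right = items[-1]
        # Extend every joining of the first N-1 words in two ways.
        for left_a in iter_joinings(items[:-1]):
            left_b = left_a.copy()
            # Option 1: the last word becomes its own element.
            left_a.append(right)
            yield left_a
            # Option 2: the last word is joined onto the rightmost element.
            left_b[-1] = left_b[-1] + " " + right
            yield left_b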

s = "He was hungry"
for result in iter_joinings(s.split()):
    print(result)

Result:

['He', 'was', 'hungry']
['He', 'was hungry']
['He was', 'hungry']
['He was hungry']

Here is an iterative version, just in case you have a 999-element input and don't want to hit Python's maximum recursion depth:

import itertools

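def iter_joinings(items):
    if not items:
        return  # product() rejects a negative repeat, so handle empty input
    # One decision per gap between words: True starts a new element,
    # False joins the word onto the current rightmost element.
    for decisions in itertools.product((False, True), repeat=len(items)-1):
        result = [items[0]]
        for idx, should_append in enumerate(decisions, 1):
            if should_append:
                result.append(items[idx])
            else:
                result[-1] = result[-1] + " " + items[idx]
        yield result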

s = "He was hungry"
for result in iter_joinings(s.split()):
    print(result)
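
Note that the iterative version yields the same joinings as the recursive one but in the opposite order, because itertools.product starts with the all-False decisions (everything joined into one element). If the order matters, one possible tweak is to list True before False. This is only a sketch; the name iter_joinings_split_first is made up for illustration:

```python
import itertools

def iter_joinings_split_first(items):
    # Hypothetical variant of the iterative version: trying True (start a
    # new element) before False (join) makes the fully split result come
    # out first, matching the recursive version's order.
    if not items:
        return
    for decisions in itertools.product((True, False), repeat=len(items) - 1):
        result = [items[0]]
        for idx, should_append in enumerate(decisions, 1):
            if should_append:
                result.append(items[idx])
            else:
                result[-1] = result[-1] + " " + items[idx]
        yield result

for result in iter_joinings_split_first("He was hungry".split()):
    print(result)
```

This should print the four lists in the same order as the recursive result shown earlier.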

...though an input that large would take on the order of 10^300 bytecode instructions to process either way, so this is unlikely to be a problem in practice.
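
That figure comes from counting the results: each of the N-1 gaps between words is independently either a split or a join, so there are 2^(N-1) joinings. A quick check of the arithmetic, as a sketch (the helper name count_joinings is hypothetical):

```python
def count_joinings(n_words):
    # Each of the n_words - 1 gaps is independently a split or a join.
    # An empty input yields no joinings at all.
    return 2 ** (n_words - 1) if n_words > 0 else 0

print(count_joinings(3))              # 4, one per result for "He was hungry"
print(len(str(count_joinings(999))))  # 301 digits, i.e. roughly 10^300
```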

