首页 > 解决方案 > Longest Substring with no Y Divide and Conquer

问题描述

Before I carry on to the problem, I should note that I know there are much easier ways to solve this problem without using divide and conquer; however, the point of solving this problem under this restriction is that I actually want to learn how to tackle problems with divide and conquer. I am good at recognizing correct solutions, but implementing my own D&C strategy is not a skill I currently have.

The problem is this: given a string, find the longest substring that does not contain the letter 'y'. For example, longestNoY("abydefyhi") should return "def".

My first approach to tackle this problem was to determine the base cases. If we had a string of length 2, we would want to return the non-y components (or empty string if both characters were 'y'). If we had a string of length 1, we would return it if it is not a 'y'.

So the first part should look like this:

def longestNoY(string, start, end):
     #Conquer
     if start == end:
          if string == 'y': return ''
          return string
     if start + 1 == end:
          if string == "yy": return ''
          if string[0] == 'y': return string[1]
          return string[0]
     ....

Next, I knew that I would need to recursively call the function for each half of the parent string. I also knew that I wanted the function to return the longer of the two children, except if the sum of the lengths of the two children was equal to the length of the parent, then the function should return the parent because there were no 'y's in the children.

    #Divide and Partial Implementation of Rejoin
    ....
    middle = (start + end) // 2
    leftString = longestNoY(string, start, middle)
    rightString = longestNoY(string, middle, end)
    if len(leftString) + len(rightString) == len(string): return string
    ....

The part I am having trouble with now would best be explained with an example:

0 1 2     3 4   5 6   7 8 
a b   y   d e | f y   h i
a b   y | d e | f y | h i
a b | y | d e | f y | h i 

The longest substring in the left side is either "ab" or "de", but we know that "de" is adjacent to an 'f' which would make "def" the longest. I don't know exactly how to carry on with this problem. Please do not give me a program to solve this problem.

标签: algorithmdivide-and-conquerlongest-substring

解决方案


这是可能的。但是你每次都需要返回四个值:从“切片”左端开始的最长子序列(这可以为零),“中间”的最长子序列,在右端结束的最长子序列的“切片”(这也可以为零),并且如果字符串只是一个非 Y 字符序列(布尔值)。事实上,第四个元素可以通过检查前三个元素中的一个是否等于长度来导出,但这可能更容易实现。

为什么这很重要?因为一系列非ys可以“穿过”一个分界线。例如:

abcdeYfghi   jklYmnopqr

如果我们在中间拆分它(或任何其他不是“恒定”和“休息”的方式)。

所以这里递归我们有几种情况:

  1. 空字符串返回(0, 0, 0, True),
  2. 以外的非空字符串Y,我们返回(1, 1, 1, True)
  3. Y对于我们返回的单例字符串(0, 0, 0, False)
  4. 将字符串一分为二的递归情况,然后在结果上应用“合并”逻辑。

“合并逻辑”相当复杂,尤其是因为两个“子切片”都可能只包含非Y字符串。切片后,我们因此获得了两个三元组(a0, a1, a2, a3)(b0, b1, b2, b3),我们产生了一个三元组(c0, c1, c2, c3)

如果a3 = Trueb3 = True,那么当然这意味着当前切片也不包含 Y。所以我们可以得出:

c3 = a3 and b3

给定a3成立,那么它认为c0 = a0 + b0从那时a0起没有 Y,因此左侧“序列”与子序列的整个长度加上右侧部分的左子序列相同。如果a3不成立,只是c0a0

给定b3成立,则它成立,c2 = a2 + b2原因与上述相同,如果不是,则a2 = b2.

现在中间的元素是三个元素中的最大值:

  1. 左切片中间的元素a1
  2. 右切片中间的元素b1;和
  3. 和的总和,a2因为b0可以重叠,所以这是两者的总和。

因此,我们返回树的最大值。

所以在 Python 中,这看起来像:

def longestNoY(string, start, end):
    if start == end:
        return (0, 0, 0, True)
    elif start+1 == end:
        if string[start] == 'y':
            return (0, 0, 0, False)
        else:
            return (1, 1, 1, True)
    else:
        mid = (start + end)//2
        a0, a1, a2, a3 = longestNoY(string, start, mid)
        b0, b1, b2, b3 = longestNoY(string, mid, end)
        c3 = a3 and b3
        c0 = a0 + a3 * b0
        c2 = b2 + b3 * a2
        c1 = max(a1, b1, a2 + b0)
        return (c0, c1, c2, c3)

最终结果是元组中前三项的最大值。

对于给定的样本字符串,我们因此获得:

             (1, 1, 1, True) a
             (1, 1, 1, True) b
         (2, 2, 2, True) ab
             (0, 0, 0, False) y
             (1, 1, 1, True) d
         (0, 1, 1, False) yd
     (2, 2, 1, False) abyd
             (1, 1, 1, True) e
             (1, 1, 1, True) f
         (2, 2, 2, True) ef
             (0, 0, 0, False) y
                 (1, 1, 1, True) h
                 (1, 1, 1, True) i
             (2, 2, 2, True) hi
         (0, 2, 2, False) yhi
     (2, 2, 2, False) efyhi
 (2, 3, 2, False) abydefyhi
(2, 3, 2, False)

但话虽这么说,在我看来,它是一个不必要的复杂过程来构造一些东西,就时间复杂度而言,它与遍历相同,但通常更昂贵(函数调用、构造新对象等)。特别是因为线性遍历只是:

def longestNoY(string):
    mx = 0
    cur = 0
    for c in string:
        if c == 'y':
            mx = max(mx, cur)
            cur = 0
        else:
            cur += 1
    return mx

然而,这里的一个优点是上述算法可以用于并行化。例如,如果字符串很大,则可以使用上述字符串,以便每个内核都可以计算。然而,在这种情况下,在“核心”级别上使用迭代级别可能是有益的,并且仅使用上述内容来“分发”工作和“收集”结果。


推荐阅读