Comparing two lists of strings

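Problem description

I'm comfortable comparing two lists of integers or of strings; it gets trickier, however, when the strings in the two lists contain extra characters.

Suppose the output contains the following, which I break into a list of strings. In my code I call that list diff.

Output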

164c164
< Apples = 
---
> Apples = 0
168c168
< Berries = 
---
> Berries = false
218c218
< Cherries = 
---
> Cherries = 20
223c223
< Bananas = 
---
> Bananas = 10
233,234c233,234
< Lemons = 2
< Strawberries = 4
---
> Lemons = 4
> Strawberries = 2
264c264
< Watermelons = 
---
> Watermelons = 524288

The second list of strings contains the variables to ignore, which I want to compare against the first list.

>>> ignore
['Apples', 'Lemons']

My code:

>>> def str_compare (ignore, output):
...     flag = 0
...     diff = output.strip ().split ('\n')
...     if ignore:
...         for line in diff:
...             for i in ignore:
...                 if i in line:
...                     flag = 1
...             if flag:
...                 flag = 0
...             else:
...                 print (line)
... 
>>>

The code works: the lines for Apples and Lemons are omitted.

>>> str_compare(ignore, output)
164c164
---
168c168
< Berries = 
---
> Berries = false
218c218
< Cherries = 
---
> Cherries = 20
223c223
< Bananas = 
---
> Bananas = 10
233,234c233,234
< Strawberries = 4
---
> Strawberries = 2
264c264
< Watermelons = 
---
> Watermelons = 524288
>>>

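There must be a better way to compare the two lists of strings that isn't O(n^2). If my diff list didn't contain extra characters such as "Apples =", the two lists could be compared in O(n). Any suggestions or ideas on how to do the comparison without looping over the "ignore" variables for every diff element?

Update #1: To avoid confusion, and following a suggestion from the comments, I updated the code.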

>>> def str_compare (ignore, output):
...     diff = output.strip ().split ('\n')
...     if ignore:
...         for line in diff:
...             if not any ([i in line for i in ignore]):
...                 print (line)
...                 print ("---")
>>>

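Either way, it is still two loops: ignore is iterated for every diff element.

Tags: python, list, python-3.5, string-comparison

Solution

For efficiency, make ignore a set rather than a list. Use split to extract the variable name from each line.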

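>>> def str_compare (ignore, output):
...     ignore = set (ignore)                  # set membership is O(1) on average
...     diff = output.strip ().split ('\n')
...     for line in diff:
...         # Only '<' and '>' lines carry a variable name.
...         if line.startswith (('<', '>')):
...             parts = line.split ()
...             # A bare '<' or '>' (changed empty line) has no name; keep it.
...             if len (parts) < 2 or parts [1] not in ignore:
...                 print (line)
...         else:
...             print (line)                   # hunk headers and '---' separators
... 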

Output

>>> str_compare (ignore, output)
164c164
---
168c168
< Berries = 
---
> Berries = false
218c218
< Cherries = 
---
> Cherries = 20
223c223
< Bananas = 
---
> Bananas = 10
233,234c233,234
< Strawberries = 4
---
> Strawberries = 2
264c264
< Watermelons = 
---
> Watermelons = 524288
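
The output above still contains an orphaned "164c164" header and "---" separator for the Apples hunk, because every change line in that hunk was ignored. As a minimal sketch, assuming the input is always in diff's normal format as shown in the question, the lines can be buffered per hunk and a hunk dropped when none of its change lines survive. The names filter_diff and HUNK_HEADER are hypothetical, not part of the original answer.

```python
import re

# Hunk headers in normal diff output look like 164c164 or 233,234c233,234.
HUNK_HEADER = re.compile(r"^\d+(,\d+)?[acd]\d+(,\d+)?$")


def filter_diff(ignore, output):
    """Return the diff lines, minus ignored variables and emptied hunks."""
    ignore = set(ignore)       # O(1) average-case membership tests
    kept = []                  # lines of hunks that survive filtering
    hunk = []                  # lines of the hunk currently being read
    has_change = False         # True once the hunk keeps a '<' or '>' line

    for line in output.strip().split("\n"):
        if HUNK_HEADER.match(line):
            # A new hunk starts: keep the previous one only if non-empty.
            if has_change:
                kept.extend(hunk)
            hunk, has_change = [line], False
        elif line.startswith(("<", ">")):
            parts = line.split()
            if len(parts) > 1 and parts[1] in ignore:
                continue       # drop lines for ignored variables
            hunk.append(line)
            has_change = True
        else:
            hunk.append(line)  # the '---' separator
    if has_change:
        kept.extend(hunk)      # do not forget the final hunk
    return kept
```

Returning a list instead of printing makes the result easier to test or post-process; print("\n".join(filter_diff(ignore, output))) gives output comparable to the above. The line numbers in a kept header, such as 233,234c233,234, are left unchanged even when some lines of that hunk were dropped.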

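You can eliminate the need for the flag by splitting on "---\n" and joining with it again (a slightly more general solution than using a flag or printing "---" by hand).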
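
The suggestion is terse, so the following is only one possible reading of it: split the whole text on the separator line, filter each piece independently, and join the pieces back with the same separator. No flag is needed, and the separator is never printed by hand. The helper names is_ignored and filter_by_separator are hypothetical.

```python
SEPARATOR = "---\n"  # the line diff prints between the old and new sides


def is_ignored(line, ignore):
    """True if this is a '<' or '>' line whose variable name is in ignore."""
    if not line.startswith(("<", ">")):
        return False
    parts = line.split()
    return len(parts) > 1 and parts[1] in ignore


def filter_by_separator(ignore, output):
    """Filter each separator-delimited piece, then rejoin the pieces."""
    ignore = set(ignore)
    pieces = []
    for piece in output.split(SEPARATOR):
        # keepends=True preserves each line's newline for the final join.
        kept = [line for line in piece.splitlines(keepends=True)
                if not is_ignored(line, ignore)]
        pieces.append("".join(kept))
    return SEPARATOR.join(pieces)
```

Because the text is split and rejoined on the same string, every separator in the input reappears unchanged in the result.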

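Note that, in the worst case, checking whether string s2 contains s1 should cost roughly len(s1) * len(s2), whereas an equality test costs roughly max(len(s1), len(s2)). Python's implementation is quite good in the typical case, but linear-complexity algorithms do appear to exist (see http://monge.univ-mlv.fr/~mac/Articles-PDF/CP-1991-jacm.pdf). See also: algorithms for finding multiple string matches.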

