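python - Iterating over two lists one element at a time
Problem description
I have two identical lists. I want to take the first element of list 1 and compare it against every element of list 2; once that is done, I want to take the second element of list 1 and repeat, until every element of one list has been compared against every element of the other.
I have built a Levenshtein distance model and can successfully loop one string (which I hard-coded) through my second list. However, I need to make it more practical: the target strings should come from a list, and the code should move on to the next element once the previous one has been compared against the whole second list. I then want it to return only the values above a certain threshold, e.g. 80.00.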
my_list = address['Street'].tolist()
my_list
# Import numpy for the matrix used to compute the fuzzy match
import numpy as np

# Define the fuzzy-match function.
# Levenshtein distance is used because it can compare two strings of different lengths.
def string_match(seq1, seq2, ratio_calc=False):
    """Calculate the Levenshtein distance between two strings.

    If ratio_calc is True, return the Levenshtein similarity ratio
    (as a percentage) instead of the distance.

    For all i and j, distance[i, j] holds the Levenshtein distance between
    the first i characters of seq1 and the first j characters of seq2.
    """
    # Initialize matrix of zeros
    rows = len(seq1) + 1
    cols = len(seq2) + 1
    distance = np.zeros((rows, cols), dtype=int)

    # Fill the first column and first row with the prefix lengths.
    # Two separate loops, so the fill still happens when one string is empty.
    for i in range(1, rows):
        distance[i][0] = i
    for k in range(1, cols):
        distance[0][k] = k

    # Loop through the matrix to compute the cost of deletions, insertions and/or substitutions
    for col in range(1, cols):
        for row in range(1, rows):
            if seq1[row-1] == seq2[col-1]:
                cost = 0  # Same character at this position in both strings
            elif ratio_calc:
                # To align with the Python Levenshtein package, a substitution
                # costs 2 when computing the ratio
                cost = 2
            else:
                cost = 1  # A substitution costs 1 when computing plain distance
            distance[row][col] = min(distance[row-1][col] + 1,       # Cost of deletions
                                     distance[row][col-1] + 1,       # Cost of insertions
                                     distance[row-1][col-1] + cost)  # Cost of substitutions

    # Read the result from the bottom-right cell instead of the loop variables,
    # which are undefined when either string is empty
    result = distance[rows-1][cols-1]
    if ratio_calc:
        total = len(seq1) + len(seq2)
        if total == 0:
            return 100.0  # Choice made here: two empty strings count as identical
        # Computation of the Levenshtein Distance Ratio
        return round((total - result) / total * 100, 2)
    # print(distance)  # Uncomment to see the full cost matrix
    # This is the minimum number of edits needed to convert seq1 to seq2
    return result
Prev_addrs = my_list
target_addr = "830 Amsterdam ave"
for addr in Prev_addrs:
    distance = string_match(target_addr, addr, ratio_calc=True)
    print(distance)
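Solution
Ignoring all the code in your question that I consider irrelevant, here is how to do what I take to be the essence of your question, judging from the title and first paragraph.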
import itertools

def compare(a, b):
    print('compare({}, {}) called'.format(a, b))

list1 = list('ABCD')
list2 = list('EFGH')
# itertools.product yields every (a, b) pair: each element of list1 with each element of list2
for a, b in itertools.product(list1, list2):
    compare(a, b)
Output:
compare(A, E) called
compare(A, F) called
compare(A, G) called
compare(A, H) called
compare(B, E) called
compare(B, F) called
compare(B, G) called
compare(B, H) called
compare(C, E) called
compare(C, F) called
compare(C, G) called
compare(C, H) called
compare(D, E) called
compare(D, F) called
compare(D, G) called
compare(D, H) called
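The question also asks to keep only the pairs that score above a threshold such as 80.00. A minimal sketch of how that could be layered on top of `itertools.product` follows. The names `similarity`, `matches_above` and `unique_matches_above` are hypothetical, and `similarity` is a stand-in scorer built on `difflib.SequenceMatcher`, not the asker's `string_match`, so that the block runs on its own.

```python
import itertools
from difflib import SequenceMatcher


def similarity(a, b):
    """Stand-in scorer (an assumption): difflib ratio scaled to 0-100."""
    return round(SequenceMatcher(None, a, b).ratio() * 100, 2)


def matches_above(list1, list2, threshold=80.0, score=similarity):
    """Return (a, b, score) for every pair of list1 x list2 scoring above threshold."""
    results = []
    for a, b in itertools.product(list1, list2):
        s = score(a, b)
        if s > threshold:
            results.append((a, b, s))
    return results


def unique_matches_above(items, threshold=80.0, score=similarity):
    """Variant for a list compared with itself: each unordered pair is scored once."""
    results = []
    for a, b in itertools.combinations(items, 2):
        s = score(a, b)
        if s > threshold:
            results.append((a, b, s))
    return results


# Made-up sample data for illustration only
streets = ["830 Amsterdam ave", "830 Amsterdam avenue", "12 Main st"]
for a, b, s in unique_matches_above(streets):
    print(a, "|", b, "|", s)
```

To use the Levenshtein ratio from the question instead, something like `score=lambda a, b: string_match(a, b, ratio_calc=True)` could be passed in. Because the two lists in the question are identical, `itertools.product` would also compare every element with itself and visit each pair twice; `itertools.combinations(items, 2)` is one way to avoid both.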