python - 处理单个字典的嵌套循环
问题描述
我正在尝试根据他们合作的电影数量来构建演员/女演员图表。我提取了一个people_dict
看起来像这样的字典:
people_dict = {..., 3: ['American Pimp (1999)',
'Beats, Rhymes & Life: The Travels of a Tribe Called Quest (2011)',
'Gangsta Rap: The Glockumentary (2007)',
'Ghetto Physics (2010)',
'Mac Dre: Legend of the Bay (2014)',
'Menace II Society (1993)',
'Pimpalation: Return of the Trill (2006)',
'Porndogs: The Adventures of Sadie (2009)',
'Rhyme & Reason (1997)',
'Stop Pepper Palmer (2014)',
'Townbiz (2010)',
'Uprising: Hip Hop and the LA Riots (2012)'],...}
其中key
是演员/女演员ID,value
是他/她演过的所有电影的列表。
现在我试图找到演员/女演员之间的协作,因此我需要计算不同键的值之间的交集,如下所示:
for people_id, movie_list in people_dict.items():
for other_people_id, other_movie_list in people_dict.items():
if people_id < other_people_id:
intersection = set(movie_list) & set(other_movie_list)
if intersection:
edge_list.append((people_id, other_people_id))
但是,我的实现是嵌套循环,这本字典中有超过 100、000 个演员/女演员。可能需要几天时间才能给我结果。
那么我可以做些什么来加速我的代码。
编辑:
我意识到我需要提供更多关于我的问题的信息。
假设我有一个people_dict
只有三个演员/女演员,因为284
和602
都在电影中表演,'Electric Daisy Carnival Experience (2011)'
同时,另外两个人没有共同的电影660
,那么我只是期望它284
会602
被写在一个文本文件中,如284 602
.
people_dict = {284: ['DGK: Parental Advisory (2012)',
'Electric Daisy Carnival Experience (2011)',
'Malibu Horror Story (2014)'],
602: ['5 Sides of a Coin (2003)',
'As I AM: The Life and Times of DJ AM (2015)',
'Electric Daisy Carnival Experience (2011)',
'Hang the DJ (1998)'],
661: ['Every Boy Needs a Father (2013)',
'The Sketches of Matty Mad Man: Part Dos (2014)',
'Trippy: Spitting Rainbows (2014)',
'Vicky Foxx: Forever in Blue Jeans (2010)']}
解决方案
每次将两个列表转换为集合 - 为什么不从一开始就将 movie_list 设置为集合?
无论如何,主要问题是复杂度 O(n^2)。您真的需要将每个项目相互比较两次吗?
也许这会激发一些想法:
from collections import defaultdict
import functools
import time
import random
movies = range(100)
actors = range(1000)
def get_random_movies(max=10):
return random.sample(movies, random.randint(1, max))
people_dict = {actor_id: get_random_movies() for actor_id in actors}
def timeit(func):
@functools.wraps(func)
def newfunc(*args, **kwargs):
start_time = time.time()
func(*args, **kwargs)
elapsed_time = time.time() - start_time
print('function [{}] finished in {} ms'.format(
func.__name__, int(elapsed_time * 1000)))
return newfunc
@timeit
def usingDefaultDict():
md = defaultdict(set)
for k, v in people_dict.items():
for m in v:
md[m].add(k)
collaborations = {
actor_id: set([aid for actors in md.values() for aid in actors if actor_id in actors])
for actor_id in people_dict}
print(next(iter(collaborations.items())))
@timeit
def usingNestedLoops():
edge_list = []
for people_id, movie_list in people_dict.items():
for other_people_id, other_movie_list in people_dict.items():
if people_id < other_people_id:
intersection = set(movie_list) & set(other_movie_list)
if intersection:
edge_list.append((people_id, other_people_id))
collaborations = {
actor_id: set([aid for actors in edge_list for aid in actors if actor_id in actors])
for actor_id in people_dict}
print(next(iter(collaborations.items())))
usingNestedLoops()
usingDefaultDict()
这导致集合的字典:{actor_id:actors_collaborated_with}
(0, {0, 1, 512, 516, 519, 8, 520, 522, 12, 524, 14, 15, 529, 18, 532, 533, 534, 535, 26, 28, 29, 30, 31, 540, 545, 35, 36, 38, 39, 40, 43, 47, 48, 49, 50, 563, 564, 54, 568, 57, 58, 572, 63, 64, 65, 576, 67, 69, 70, 581, 582, 73, 74, 75, 583, 77, 587, 81, 594, 597, 86, 89, 605, 606, 97, 610, 100, 101, 617, 107, 621, 110, 622, 623, 629, 118, 121, 635, 124, 639, 129, 130, 131, 132, 133, 135, 650, 139, 140, 141, 653, 143, 655, 146, 147, 658, 660, 151, 154, 666, 669, 158, 160, 161, 673, 675, 164, 165, 166, 680, 169, 684, 175, 687, 177, 181, 693, 183, 184, 185, 698, 189, 190, 704, 705, 195, 197, 709, 710, 201, 713, 714, 715, 205, 718, 721, 210, 723, 213, 214, 215, 730, 222, 734, 735, 225, 737, 228, 230, 743, 232, 746, 748, 238, 240, 753, 243, 755, 246, 761, 762, 251, 763, 253, 766, 255, 767, 769, 258, 259, 261, 775, 776, 777, 779, 270, 273, 274, 275, 786, 277, 278, 787, 792, 281, 282, 795, 286, 287, 292, 294, 299, 813, 815, 304, 817, 308, 309, 821, 311, 824, 826, 828, 829, 830, 319, 321, 837, 328, 840, 331, 334, 846, 847, 849, 339, 340, 851, 342, 854, 344, 855, 352, 867, 868, 359, 872, 362, 363, 366, 367, 880, 882, 883, 372, 373, 886, 890, 892, 384, 387, 390, 902, 903, 393, 394, 908, 399, 913, 916, 918, 921, 411, 412, 421, 423, 936, 937, 426, 938, 430, 431, 432, 433, 942, 949, 950, 951, 952, 444, 445, 447, 960, 450, 964, 454, 457, 460, 461, 462, 463, 464, 972, 981, 471, 983, 986, 991, 995, 484, 997, 998, 487, 489, 495, 499, 502})
函数 [usingNestedLoops] 在 66736 毫秒内完成
(0, {0, 1, 512, 516, 519, 8, 520, 522, 12, 524, 14, 15, 529, 18, 532, 533, 534, 535, 26, 540, 29, 30, 31, 28, 545, 35, 36, 38, 39, 40, 43, 47, 48, 49, 50, 563, 564, 54, 568, 57, 58, 572, 63, 576, 65, 64, 67, 581, 582, 583, 70, 73, 74, 587, 69, 77, 75, 81, 594, 597, 86, 89, 605, 606, 97, 610, 100, 101, 617, 107, 621, 622, 623, 110, 629, 118, 121, 635, 124, 639, 129, 130, 131, 132, 133, 135, 650, 139, 140, 653, 141, 143, 655, 658, 146, 147, 660, 151, 154, 666, 669, 158, 160, 161, 673, 675, 164, 165, 166, 680, 169, 684, 175, 687, 177, 181, 693, 183, 184, 185, 698, 189, 190, 704, 705, 195, 197, 709, 710, 713, 714, 201, 715, 205, 718, 721, 210, 723, 213, 214, 215, 730, 734, 735, 222, 737, 225, 228, 230, 743, 232, 746, 748, 238, 240, 753, 755, 243, 246, 761, 762, 251, 763, 253, 766, 255, 767, 769, 258, 259, 261, 775, 776, 777, 779, 270, 273, 786, 274, 275, 277, 278, 787, 792, 281, 282, 795, 286, 287, 292, 294, 299, 813, 815, 304, 817, 308, 821, 309, 311, 824, 826, 828, 829, 830, 319, 321, 837, 840, 328, 331, 846, 847, 334, 849, 339, 851, 340, 342, 854, 344, 855, 352, 867, 868, 359, 872, 362, 363, 366, 367, 880, 882, 883, 372, 373, 886, 890, 892, 384, 387, 902, 903, 390, 393, 394, 908, 399, 913, 916, 918, 921, 411, 412, 421, 423, 936, 937, 426, 938, 430, 942, 432, 433, 431, 949, 950, 951, 952, 444, 445, 447, 960, 450, 964, 454, 457, 972, 460, 462, 461, 464, 463, 981, 471, 983, 986, 991, 995, 484, 997, 998, 487, 489, 495, 499, 502})
函数 [usingDefaultDict] 在 816 毫秒内完成
推荐阅读
- eclipse - 在 Tomcat 上运行 Vaadin (maven) 示例项目的问题
- apache-kafka - 通过在 ksqlDB 中翻转窗口生成零计数的可能方法
- r - R,从图中推断平均分数
- javascript - 在 WEBGL 片段着色器中确定屏幕的中心
- python - NGINX 单元,Fastapi:在可用的应用程序模块中找不到运行“python 3.8”的模块
- jenkins - Jenkins 声明性管道在使用 try catch 时失败,脚本标签中的最终块也在发布时
- arrays - 无法使用速度模板为每个循环迭代 json 数组
- python - 如何在 VirtualEnv 中安装 Python 3.6.0?而不是 3.9(关闭我)
- bootstrap-4 - 在 Laravel 8.4 中显示重复元素的引导程序
- vb.net - Visual Studio 创建安装程序