python - Is there anyway to solve this without using two for loop
问题描述
This is problem from my assignment in college
I have list(n elements) of tuple(2 elements = ID, text)
for example:
data = [('user1', 'text1'), ('user2', 'text1'),...]
and i want to find frequency of pair of 'ID' that have same 'text'
for example:
data = [('user1', 'text1'), ('user2', 'text1'), ('user3', 'text1'),
('user1', 'text2'), ('user2', 'text2'), ('user3', 'text2')]
frequency = {('user1', 'user2'): 2, ('user1', 'user3'): 2, ('user2', 'user3'): 2}
(pairs are ascending 'ID' from least to most)
only ('user1','user2')
not ('user2','user1')
but the multiple same 'text' with same pair will count as one.
for example:
data = [('user1', 'text1'), ('user2', 'text1'), ('user1', 'text1'), ('user2', 'text1')]
will result in
frequency = {('user1', 'user2'): 1}
here are my codes
def somefunction(data):
sametext = {}
for element in data:
if element[1] in sametext:
if element[0] not in sametext[element[1]]:
sametext[element[1]].append(element[0])
else:
sametext[element[2]] = [element[1]]
frequency = {}
for ID in sametext.values():
ID.sort()
index = 1
for x1 in ID:
for x2 in ID[index:]:
pair = (x1, x2)
if pair in frequency:
frequency[pair] += 1
else:
frequency[pair] = 1
index += 1
return frequency
I store 'ID' which have same text in dictionary to smaller the loop, but it isn't enough.
With data size of 1 million element, it took more than 1 min.
Is there a way to make this faster?
Thank for every comment
解决方案
是的,主要是因为您使用的是列表,请使用集合,如下所示:
from collections import defaultdict, Counter
from itertools import combinations
data = [('user1', 'text1'), ('user2', 'text1'), ('user3', 'text1'),
('user1', 'text2'), ('user2', 'text2'), ('user3', 'text2')]
def fun(d):
# group users by text
groups = defaultdict(set)
for user, text in d:
groups[text].add(user)
# compute the frequencies
# combinations is going to generate the pair interactions
counts = Counter(tuple(sorted(pair)) for v in groups.values() for pair in combinations(v, 2))
# convert to dictionary (the call to dict is optional)
return dict(counts)
frequencies = fun(data)
print(frequencies)
输出
{('user1', 'user2'): 2, ('user2', 'user3'): 2, ('user1', 'user3'): 2}
列表中的包含查询O(n)
与集合中O(1)
的平均值相比。另请参阅Counter和defaultdict的文档。
推荐阅读
- javascript - 删除第二个到最后一个项目数组
- c# - 具有聚合值和 ProgreessBar 顺序的 WPF ListView 列
- r - 我想用循环更改列名
- bash - 如何打印特定范围的行并使用 sed 或 awk 枚举它们
- python - 为什么我的 Node.js gRPC 客户端需要 3 秒才能向我的 Python gRPC 服务器发送请求?
- javascript - 使用 (yourdomain.com/username) 的 firebase.json 中的重写规则配置 Firebase 托管行为
- docker - Docker Compose 将一项服务附加到标准输入和标准输出
- reactjs - 反应中渲染和重新渲染之间的区别?
- angular - 在 Angular 8 中单击按钮时多次调用 Subscribe() 方法
- c# - 我的 ViewModel 流程正确吗?如何将其发送回控制器并在控制器操作上刷新它?