首页 > 解决方案 > Is there anyway to solve this without using two for loop

问题描述

This is problem from my assignment in college

I have list(n elements) of tuple(2 elements = ID, text)

for example:

data = [('user1', 'text1'), ('user2', 'text1'),...]

and i want to find frequency of pair of 'ID' that have same 'text'

for example:

data = [('user1', 'text1'), ('user2', 'text1'), ('user3', 'text1'), 
        ('user1', 'text2'), ('user2', 'text2'), ('user3', 'text2')]
frequency = {('user1', 'user2'): 2, ('user1', 'user3'): 2, ('user2', 'user3'): 2}

(pairs are ascending 'ID' from least to most)

only ('user1','user2')
not ('user2','user1')

but the multiple same 'text' with same pair will count as one.

for example:

data = [('user1', 'text1'), ('user2', 'text1'), ('user1', 'text1'), ('user2', 'text1')]

will result in

frequency = {('user1', 'user2'): 1}

here are my codes

def somefunction(data):
    sametext = {}
    for element in data:
        if element[1] in sametext:
            if element[0] not in sametext[element[1]]:
                sametext[element[1]].append(element[0])
        else:
            sametext[element[2]] = [element[1]]
    frequency = {}
    for ID in sametext.values():
        ID.sort()
        index = 1
        for x1 in ID:
            for x2 in ID[index:]:
                pair = (x1, x2)
                if pair in frequency:
                    frequency[pair] += 1
                else:
                    frequency[pair] = 1
            index += 1
    return frequency

I store 'ID' which have same text in dictionary to smaller the loop, but it isn't enough.

With data size of 1 million element, it took more than 1 min.

Is there a way to make this faster?

Thank for every comment

标签: python

解决方案


是的,主要是因为您使用的是列表,请使用集合,如下所示:

from collections import defaultdict, Counter
from itertools import combinations

data = [('user1', 'text1'), ('user2', 'text1'), ('user3', 'text1'),
        ('user1', 'text2'), ('user2', 'text2'), ('user3', 'text2')]


def fun(d):
    # group users by text
    groups = defaultdict(set)
    for user, text in d:
        groups[text].add(user)

    # compute the frequencies
    # combinations is going to generate the pair interactions
    counts = Counter(tuple(sorted(pair)) for v in groups.values() for pair in combinations(v, 2))

    # convert to dictionary (the call to dict is optional)
    return dict(counts)


frequencies = fun(data)

print(frequencies)

输出

{('user1', 'user2'): 2, ('user2', 'user3'): 2, ('user1', 'user3'): 2}

列表中的包含查询O(n)与集合中O(1)的平均值相比。另请参阅Counterdefaultdict的文档。


推荐阅读