Counting frequencies in tuples

Problem description

I have to count the frequency of each "Id" in tuples like the following:

('{44371-zwart,40793,41878,44747,44371-wit}',), 
('{46022,47917,48267,48343,48221}',), 
('{43566,43834,31726,23503,4488}',), 
('{21896,9391,32171,30984-wit-3942,27211}',), 
('{35306,16901,24027,44222,38597}',), 
('{40867,40872,41437,31421,35570-grijs}',), 
('{32481,35728,36463,32473,43719}',)

This is only a small part of the data (about 0.5%).

My current code:

from collections import Counter

cur.execute('SELECT similars FROM profiles')
data = cur.fetchall()
c = Counter(elem[0] for elem in data)

It returns the following:

{'{45110,46709,45109,45115,46462}': 1, 
'{38535,38529,38532,38527,38546}': 1, 
'{20062,17013,20634,21691,20622}': 1, 
'{21141,43588,39649,45900,17126}': 1, 
'{43552,41475,41478,32848,41477}': 1, 
'{42265,42266,43570,26203,28862}': 1, 
'{47874,47873,47878,47802-bruin,33101-avengers}': 1, 
'{26234,2401,30414,5655,16605}': 1, 
'{43405,43575,39649,21141,43195}': 1, 
'{35420,35422,35367,35418,35417}': 1, 
'{43195,47323,39649,43575,44454}': 1, 
'{9760,43572,9764,9768,9816}': 1}

The result I expect/want is:

{'12392': 2, '7862': 1, '12313': 41}

Tags: python, count, tuples

Solution


Since you are getting this output:

'{45110,46709,45109,45115,46462}': 1, '{38535,38529,38532,38527,38546}': 1, '{20062,17013,20634,21691,20622}': 1, '{21141,43588,39649,45900,17126}': 1, '{43552,41475,41478,32848,41477}': 1, '{42265,42266,43570,26203,28862}': 1, '{47874,47873,47878,47802-bruin,33101-avengers}': 1, '{26234,2401,30414,5655,16605}': 1, '{43405,43575,39649,21141,43195}': 1, '{35420,35422,35367,35418,35417}': 1, '{43195,47323,39649,43575,44454}': 1, '{9760,43572,9764,9768,9816}': 1

convert it to a dictionary, so that your first-level output looks like this:

dct = {'{45110,46709,45109,45115,46462}': 1, '{38535,38529,38532,38527,38546}': 1, '{20062,17013,20634,21691,20622}': 1, '{21141,43588,39649,45900,17126}': 1, '{43552,41475,41478,32848,41477}': 1, '{42265,42266,43570,26203,28862}': 1, '{47874,47873,47878,47802-bruin,33101-avengers}': 1, '{26234,2401,30414,5655,16605}': 1, '{43405,43575,39649,21141,43195}': 1, '{35420,35422,35367,35418,35417}': 1, '{43195,47323,39649,43575,44454}': 1, '{9760,43572,9764,9768,9816}': 1
}

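Now create an empty list, id_corpus. Get all the keys of this dictionary with dct.keys() and loop over them.
For each key, remove the opening and closing braces with the replace() method of the str class, then split the remaining string into a list with the split() method. Add this newly formed list to id_corpus. Remember not to use append here; add it with the + operator (or +=) so that id_corpus stays a flat list of ids. Finally, create an empty dictionary, corpus, and iterate over the elements of id_corpus: if an element is already in corpus, increase its value by 1; otherwise set its value to 1. In the code below, id_corpus is named lst and corpus is named sol.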
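To make the strip-and-split step concrete, here is a small sketch on a single key taken from the question's data. The variable names key, ids, extended, and appended are only illustrative. It also shows why += rather than append keeps the list flat.

```python
# One key as it appears in the data from the question.
key = '{44371-zwart,40793,41878,44747,44371-wit}'

# Drop the surrounding braces, then split on commas.
ids = key.replace('{', '').replace('}', '').split(',')
print(ids)

# += extends the list element by element, so it stays flat.
extended = []
extended += ids

# append would add the whole list as one nested element instead.
appended = []
appended.append(ids)

print(len(extended))  # 5
print(len(appended))  # 1
```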

Here is the final solution:

# I don't know what your data looks like
# or in what format you are getting it from MySQL,
# so I am starting from the output you posted.
# A more optimized approach could be developed
# if I knew more about the problem.
dct = {'{45110,46709,45109,45115,46462}': 1, '{38535,38529,38532,38527,38546}': 1, '{20062,17013,20634,21691,20622}': 1, '{21141,43588,39649,45900,17126}': 1, '{43552,41475,41478,32848,41477}': 1, '{42265,42266,43570,26203,28862}': 1, '{47874,47873,47878,47802-bruin,33101-avengers}': 1, '{26234,2401,30414,5655,16605}': 1, '{43405,43575,39649,21141,43195}': 1, '{35420,35422,35367,35418,35417}': 1, '{43195,47323,39649,43575,44454}': 1, '{9760,43572,9764,9768,9816}': 1}
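lst = []
for ky in dct.keys():
    # Strip the surrounding braces, then split the ids on commas
    ky = ky.replace('{', '')
    ky = ky.replace('}', '')
    ky = ky.split(',')
    # += extends lst with the individual ids (append would nest the list)
    lst += ky

# Count how often each id occurs
sol = dict()
for item_id in lst:
    if item_id in sol:
        sol[item_id] += 1
    else:
        sol[item_id] = 1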

print(sol)

Output

{'16605': 1, '44454': 1, '45900': 1, '20634': 1, '46462': 1, '35422': 1, '35420': 1, '17013': 1, '38532': 1, '47323': 1, '21141': 2, '43405': 1, '38527': 1, '17126': 1, '9816': 1, '38529': 1, '35418': 1, '45109': 1, '2401': 1, '41477': 1, '41478': 1, '41475': 1, '47802-bruin': 1, '26234': 1, '32848': 1, '35367': 1, '43195': 2, '20622': 1, '43588': 1, '35417': 1, '9760': 1, '38546': 1, '9764': 1, '28862': 1, '26203': 1, '9768': 1, '5655': 1, '39649': 3, '47874': 1, '43552': 1, '47873': 1, '38535': 1, '21691': 1, '30414': 1, '20062': 1, '43570': 1, '42266': 1, '42265': 1, '43575': 2, '46709': 1, '43572': 1, '47878': 1, '45110': 1, '33101-avengers': 1, '45115': 1}
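Note that the loop above iterates over the dictionary keys, so a row string that occurred more than once would still contribute its ids only once. As a possible alternative, here is a minimal sketch that counts ids directly from the rows. It assumes cur.fetchall() returns one-element tuples like the ones shown in the question. The rows list below is a hand-copied sample standing in for that result, and count_ids is a hypothetical helper name.

```python
from collections import Counter

def count_ids(rows):
    """Count each id across rows shaped like ('{a,b,c}',)."""
    counts = Counter()
    for (similars,) in rows:
        # strip('{}') removes the braces at both ends of the string.
        counts.update(similars.strip('{}').split(','))
    return counts

# Sample standing in for cur.fetchall(); strings copied from the question.
rows = [
    ('{43405,43575,39649,21141,43195}',),
    ('{43195,47323,39649,43575,44454}',),
    ('{21141,43588,39649,45900,17126}',),
]

counts = count_ids(rows)
print(counts['39649'])  # 3
print(counts['21141'])  # 2
```

Because Counter is a dict subclass, the result can be used wherever the plain dictionary from the solution above is expected.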
