Extracting domains from a list of email addresses

Problem description

I need to extract the domain from each email address in a dataset and find the 5 most common domains.

import re
from collections import Counter

with open("emails") as f:
    for email in f:
        # Match "@" followed by the domain characters
        domain = re.search(r'@[\w.-]+', email)
        if domain:
            print(domain.group())

 jbutt@gmail.com  http://www.bentonjohnbjr.com
 josephine_darakjy@darakjy.org  http://www.chanayjeffreyaesq.com
 art@venere.org http://www.chemeljameslcpa.com
 lpaprocki@hotmail.com  http://www.feltzprintingservice.com
 donette.foller@cox.net http://www.printingdimensions.com

Tags: python, computer-science, google-colaboratory

Solution


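This lists the top 5 domains:

import re
from collections import Counter

resultList = []
with open("emails", "r") as email:
    for x in email:
        # Capture only the domain characters after "@", so that
        # trailing spaces and the URL on the same line are excluded
        result = re.search(r'@([\w.-]+)', x)
        if result:  # skip lines that contain no email address
            resultList.append(result.group(1))
occurrence_count = Counter(resultList)
print(occurrence_count.most_common(5))

Output:

[('gmail.com', 1), ('darakjy.org', 1), ('venere.org', 1), ('hotmail.com', 1), ('cox.net', 1)]

The output is the 5 most common domains, each paired with its count. In the sample data from the question, every domain appears exactly once.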
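
As a variation on the approach above, here is a minimal sketch that wraps the counting in a reusable function and lowercases each domain so that `GMAIL.com` and `gmail.com` are counted together. The names `top_domains` and `DOMAIN_RE` are hypothetical, and the third sample line is invented for illustration; neither comes from the original answer.

```python
import re
from collections import Counter

# Hypothetical helper names; not part of the original answer.
DOMAIN_RE = re.compile(r'@([\w.-]+)')

def top_domains(lines, n=5):
    """Return the n most common email domains found in an iterable of lines."""
    counts = Counter()
    for line in lines:
        match = DOMAIN_RE.search(line)
        if match:  # skip blank lines or lines without an address
            counts[match.group(1).lower()] += 1
    return counts.most_common(n)

sample = [
    " jbutt@gmail.com  http://www.bentonjohnbjr.com",
    " art@venere.org http://www.chemeljameslcpa.com",
    " someone@GMAIL.com http://example.com",  # invented line to show case folding
    "",
]
print(top_domains(sample, n=2))
```

To run it against the file from the question, something like `with open("emails") as f: print(top_domains(f))` should work, assuming the file holds one address per line.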

