Pythonic way to identify names in URLs and match them against an existing set of names

Problem description

Hi, this is the problem I am trying to solve, but I am stuck.

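Given a list of URLs, I want to do the following:

  1. Extract the name from each URL
  2. Match the names found in the URLs against a dictionary of existing names
  3. Build one dictionary of all the names found, then split those names into two separate dictionaries: one for names that exist in the dictionary and one for names that do not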

Example:

INPUT : 
urls = ['www.twitter.com/users/aoba-joshi/$#fsd=43r', 
        'www.twitter.com/users/chrisbrown-e2/#4f=34ds', 
        'www.facebook.com/celebrity/neil-degrasse-tyson',
        'www.instagram.com/actor-nelson-bigetti']

# the key is the ID associated to the names, and the values are all the potential names

existing_names = {1 : ['chris brown', 'chrisbrown', 'Brown Chris', 'brownchris'] ,
                  2 : ['nelson bigetti', 'bigetti nelson', 'nelsonbigetti', 'bigettinelson'],
                  3 : ['neil degrasse tyson', 'tyson neil degreasse', 'tysonneildegrasse', 'neildegrassetyson']}


OUTPUT : 
# names_found will be a dictionary with the key as the URL and the values as the found name
names_found = {'www.twitter.com/users/aoba-joshi/$#fsd=43r' : 'aoba joshi',
               'www.twitter.com/users/chrisbrown-e2/#4f=34ds' : 'chris brown',
               'www.facebook.com/celebrity/neil-degrasse-tyson' : 'neil degrasse tyson',
               'www.instagram.com/actor-nelson-bigetti' : 'nelson bigetti'}

# existing_names_found is a dictionary where the keys are the found name, and the values are the corresponding list of names in the existing names dictionary

existing_names_found = {'chris brown' : ['chris brown', 'chrisbrown', 'Brown Chris', 'brownchris'],
                        'neil degrasse tyson' : ['neil degrasse tyson', 'tyson neil degreasse', 'tysonneildegrasse', 'neildegrassetyson'],
                        'nelson bigetti' : ['nelson bigetti', 'bigetti nelson', 'nelsonbigetti', 'bigettinelson']}

# new_names_found is a dictionary with the keys as the new name found, and the values as the url associated to the new found name
new_names_found = {'aoba joshi' : 'www.twitter.com/users/aoba-joshi/$#fsd=43r'}

Tags: python, arrays, list, dictionary, parsing

Solution


Well... if I understand correctly what you are trying to do, something like this should work:


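existing_names_found = {}
new_names_found = {}

for link in urls:
    link_split = link.split('/')
    # take the third path segment when there is one, otherwise the last one
    segment = link_split[2] if len(link_split) > 2 else link_split[-1]
    name = "".join(segment.split('-')).lower()     # chris-brown-xx => chrisbrownxx
    matched = False
    for key, value in existing_names.items():      # check if the name is in the list
        for name_x in value:
            name_x = name_x.replace(" ", "").lower()   # same as for name, but removing " "
            if name_x in name:
                existing_names_found[value[0]] = value
                matched = True
                break
    if not matched:
        new_names_found[name] = link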

(Sorry, I am typing this on my phone, but I hope it helps :))
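
For reference, here is a fuller sketch that produces all three dictionaries from the question. It is only one possible reading of the requirements, not a definitive implementation. The helper names extract_slug, compress and match_urls are hypothetical, and two assumptions are baked in: the name is the first path segment that contains a hyphen and only letters, digits and hyphens, and the first alias in each list is the display name.

```python
import re

def extract_slug(url):
    """Return the first path segment that looks like a hyphenated name, or None."""
    # Skip the host (first piece); accept only letters, digits and hyphens.
    for segment in url.split('/')[1:]:
        if '-' in segment and re.fullmatch(r'[A-Za-z0-9-]+', segment):
            return segment.lower()
    return None

def compress(text):
    """Lowercase and drop spaces/hyphens so 'Chris Brown' and 'chris-brown' compare equal."""
    return re.sub(r'[\s-]+', '', text.lower())

def match_urls(urls, existing_names):
    names_found, existing_names_found, new_names_found = {}, {}, {}
    for url in urls:
        slug = extract_slug(url)
        if slug is None:
            continue  # no name-like segment in this URL
        squashed = compress(slug)
        for aliases in existing_names.values():
            # Substring test, so 'actor-nelson-bigetti' still matches 'nelsonbigetti'.
            if any(compress(alias) in squashed for alias in aliases):
                canonical = aliases[0]  # assumption: first alias is the display name
                names_found[url] = canonical
                existing_names_found[canonical] = aliases
                break
        else:
            # No known alias matched: treat the slug itself as a new name.
            name = slug.replace('-', ' ')
            names_found[url] = name
            new_names_found[name] = url
    return names_found, existing_names_found, new_names_found

urls = ['www.twitter.com/users/aoba-joshi/$#fsd=43r',
        'www.twitter.com/users/chrisbrown-e2/#4f=34ds',
        'www.facebook.com/celebrity/neil-degrasse-tyson',
        'www.instagram.com/actor-nelson-bigetti']

existing_names = {1: ['chris brown', 'chrisbrown', 'Brown Chris', 'brownchris'],
                  2: ['nelson bigetti', 'bigetti nelson', 'nelsonbigetti', 'bigettinelson'],
                  3: ['neil degrasse tyson', 'tyson neil degreasse', 'tysonneildegrasse', 'neildegrassetyson']}

names_found, existing_names_found, new_names_found = match_urls(urls, existing_names)
```

Because the comparison is a plain substring test, a short alias can match inside a longer unrelated slug (for example 'chrisbrown' inside 'chrisbrownlow'), so treat the matches as candidates rather than certainties.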

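(Alternatively, you could check whether it contains both parts of the name, but that would fail on something like "Luke Luk" checked against "Luke O'Niel". There are many problematic cases like that.)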
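
To make that pitfall concrete, here is a small sketch under one interpretation of the remark above: checking each word of the name as a substring. The function names parts_contained and tokens_match are hypothetical, and the stricter whole-word comparison is just one way to avoid the false positive, not the only one.

```python
from collections import Counter

def parts_contained(candidate, text):
    """Naive check: every word of `candidate` occurs somewhere in `text` as a substring."""
    text = text.lower()
    return all(part in text for part in candidate.lower().split())

def tokens_match(candidate, text):
    """Stricter check: both strings consist of the same whole words, in any order."""
    return Counter(candidate.lower().split()) == Counter(text.lower().split())

# 'luke' and 'luk' are both substrings of "luke o'niel", so the naive check wrongly accepts it.
false_positive = parts_contained("Luke Luk", "Luke O'Niel")

# Comparing whole words rejects that pair but still accepts reordered names.
strict_rejects = tokens_match("Luke Luk", "Luke O'Niel")
reordered_ok = tokens_match("Brown Chris", "chris brown")
```

The stricter check gives up the ability to match run-together forms such as 'chrisbrown', so in practice it would probably need to be combined with the compressed-substring comparison.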

