Storing two different web URLs in a data structure

Problem description

I have two URLs that contain the string "hello".

https://us.search.yahoo.com/search?p=%22hello%22&fr=yfp-t&fp=1&toggle=1&cop=mss&ei=UTF-8

https://www.google.com/search?safe=strict&sxsrf=ACYBGNQKbsRTGVazpquHLRnglPuOj1xW9w%3A1576297488275&source=hp&ei=EGT0XcCLDoj4abrUjcAF&q=hello&oq=hello&gs_l=psy-ab.3..35i39j0i203l9.1236.2231..2801...1.0..0.56.247.5......0....1..gws-wiz.......0.LoDd2hNQIFQ&ved=0ahUKEwjA0-LepbTmAhUIfBoKHTpqA1gQ4dUDCAU&uact=5

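I am looking for a Python function that takes a search engine name and a string as parameters and builds a URL like the ones above, so that a user can call search_keyword(yahoo, 'hello') or search_keyword(google, 'hello').

My current difficulty is that I use different URL formats for the same engine, for example inserting double quotes to customize the search, as in Google or other engines. That increases the number of different URLs I have to handle while trying to write a function flexible enough to account for the different URL formats.

Tags: python-3.x, functional-programming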

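Solution

You can store the different search formats in a dict, using the search engine name as the key. Then use placeholders for the parts of the search URL that will later be replaced with values from the function's input. For example, __QUERY__ serves as the placeholder for the query string.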

url_format = {
    "yahoo": "https://us.search.yahoo.com/search?p=__QUERY__&fr=yfp-t&fp=1&toggle=1&cop=mss&ei=UTF-8",
    "google": "https://www.google.com/search?safe=strict&sxsrf=ACYBGNQKbsRTGVazpquHLRnglPuOj1xW9w%3A1576297488275&source=hp&ei=EGT0XcCLDoj4abrUjcAF&q=__QUERY__&oq=__QUERY__&gs_l=psy-ab.3..35i39j0i203l9.1236.2231..2801...1.0..0.56.247.5......0....1..gws-wiz.......0.LoDd2hNQIFQ&ved=0ahUKEwjA0-LepbTmAhUIfBoKHTpqA1gQ4dUDCAU&uact=5"
}

For the input that replaces the placeholder, you can use urllib.parse.quote_plus to encode it so that it is safe to use in a URL.

>>> import urllib.parse
>>> urllib.parse.quote_plus("hello")
'hello'
>>> urllib.parse.quote_plus('"quoted text"')
'%22quoted+text%22'
>>> urllib.parse.quote_plus("spec|@l ch@arac+3r$")
'spec%7C%40l+ch%40arac%2B3r%24'
>>> 
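
As a side note on why quote_plus fits here rather than plain quote, the small sketch below compares the two on the same input. The sample string is made up for illustration. quote_plus encodes spaces as "+" and also escapes "/", which is the form query strings conventionally use, while quote leaves "/" alone and encodes spaces as "%20".

```python
import urllib.parse

# Made-up sample input containing a slash, double quotes, and a space
text = 'a/b "c d"'

# quote_plus: space -> "+", and "/" is escaped (safe='' by default)
plus = urllib.parse.quote_plus(text)

# quote: space -> "%20", and "/" is left as-is (safe='/' by default)
plain = urllib.parse.quote(text)

print(plus)   # a%2Fb+%22c+d%22
print(plain)  # a/b%20%22c%20d%22
```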

Putting it all together:

import urllib.parse

def search_keyword(engine_name, query_string):
    # Store formats for each search engine with placeholders
    url_format = {
        "yahoo": "https://us.search.yahoo.com/search?p=__QUERY__&fr=yfp-t&fp=1&toggle=1&cop=mss&ei=UTF-8",
        "google": "https://www.google.com/search?safe=strict&sxsrf=ACYBGNQKbsRTGVazpquHLRnglPuOj1xW9w%3A1576297488275&source=hp&ei=EGT0XcCLDoj4abrUjcAF&q=__QUERY__&oq=__QUERY__&gs_l=psy-ab.3..35i39j0i203l9.1236.2231..2801...1.0..0.56.247.5......0....1..gws-wiz.......0.LoDd2hNQIFQ&ved=0ahUKEwjA0-LepbTmAhUIfBoKHTpqA1gQ4dUDCAU&uact=5"
    }

    url = url_format[engine_name]
    # Make sure to handle the case where the dict does not contain engine_name (KeyError)

    # Format the input params for URL use
    query_key = "__QUERY__"
    query = urllib.parse.quote_plus(query_string)

    # Replace placeholders
    url = url.replace(query_key, query)

    print(url)

search_keyword("yahoo", "hello")
# https://us.search.yahoo.com/search?p=hello&fr=yfp-t&fp=1&toggle=1&cop=mss&ei=UTF-8

search_keyword("google", "this has spaces")
# https://www.google.com/search?safe=strict&sxsrf=ACYBGNQKbsRTGVazpquHLRnglPuOj1xW9w%3A1576297488275&source=hp&ei=EGT0XcCLDoj4abrUjcAF&q=this+has+spaces&oq=this+has+spaces&gs_l=psy-ab.3..35i39j0i203l9.1236.2231..2801...1.0..0.56.247.5......0....1..gws-wiz.......0.LoDd2hNQIFQ&ved=0ahUKEwjA0-LepbTmAhUIfBoKHTpqA1gQ4dUDCAU&uact=5

search_keyword("google", '"quoted text"')
# https://www.google.com/search?safe=strict&sxsrf=ACYBGNQKbsRTGVazpquHLRnglPuOj1xW9w%3A1576297488275&source=hp&ei=EGT0XcCLDoj4abrUjcAF&q=%22quoted+text%22&oq=%22quoted+text%22&gs_l=psy-ab.3..35i39j0i203l9.1236.2231..2801...1.0..0.56.247.5......0....1..gws-wiz.......0.LoDd2hNQIFQ&ved=0ahUKEwjA0-LepbTmAhUIfBoKHTpqA1gQ4dUDCAU&uact=5

search_keyword("google", "spec|@l ch@arac+3r$")
# https://www.google.com/search?safe=strict&sxsrf=ACYBGNQKbsRTGVazpquHLRnglPuOj1xW9w%3A1576297488275&source=hp&ei=EGT0XcCLDoj4abrUjcAF&q=spec%7C%40l+ch%40arac%2B3r%24&oq=spec%7C%40l+ch%40arac%2B3r%24&gs_l=psy-ab.3..35i39j0i203l9.1236.2231..2801...1.0..0.56.247.5......0....1..gws-wiz.......0.LoDd2hNQIFQ&ved=0ahUKEwjA0-LepbTmAhUIfBoKHTpqA1gQ4dUDCAU&uact=5
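
The function above leaves two things open: what happens when the engine name is not in the dict, and how to get the quoted ("exact phrase") form without storing a second URL per engine. The following is only a sketch of one way to cover both. The names URL_FORMATS, build_search_url, and the exact parameter are hypothetical, and the shortened templates are an assumption made for readability; whether each engine accepts them without the extra parameters is not verified here.

```python
import urllib.parse

# Hypothetical module-level table. The templates are shortened for
# readability and keep only the query parameter (an assumption).
URL_FORMATS = {
    "yahoo": "https://us.search.yahoo.com/search?p=__QUERY__",
    "google": "https://www.google.com/search?q=__QUERY__",
}

def build_search_url(engine_name, query_string, exact=False):
    """Return the search URL for engine_name instead of printing it."""
    try:
        template = URL_FORMATS[engine_name]
    except KeyError:
        # Turn the bare KeyError into a message that lists the valid names
        known = ", ".join(sorted(URL_FORMATS))
        raise ValueError(
            f"unknown engine {engine_name!r}; expected one of: {known}"
        ) from None

    # Wrap the query in double quotes before encoding, so one template
    # per engine covers both the plain and the quoted search
    if exact:
        query_string = f'"{query_string}"'

    return template.replace("__QUERY__", urllib.parse.quote_plus(query_string))

print(build_search_url("yahoo", "hello", exact=True))
# https://us.search.yahoo.com/search?p=%22hello%22
```

Because the quotes are added before encoding, the number of stored formats stays at one per engine, which is the concern raised in the question.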
