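python - Need to optimize scraping code - selecting a URL with parameters
Question
This is a simple piece of code that builds a URL from search parameters. It does work, but I think it needs optimizing.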
def target_url(search_term, include_term, intext_term, target_site_in, page):
    # Replace spaces before the f-strings below are evaluated
    search_term = search_term.replace(' ', '+')
    base_template_0 = f'https://www.google.com/search?q={search_term}+"{include_term}"+intext:{intext_term}+site:{target_site_in}&hl=en&rlz='
    base_template_1 = f'https://www.google.com/search?q={search_term}+"{include_term}"+intext:{intext_term}&hl=en&rlz='
    base_template_2 = f'https://www.google.com/search?q={search_term}+"{include_term}"&hl=en&rlz='
    base_template_3 = f'https://www.google.com/search?q={search_term}&hl=en&rlz='
    base_url_0 = base_template_0.format(search_term)
    base_url_1 = base_template_1.format(search_term)
    base_url_2 = base_template_2.format(search_term)
    base_url_3 = base_template_3.format(search_term)
    url_template_0 = base_url_0 + '&start={}'
    url_template_1 = base_url_1 + '&start={}'
    url_template_2 = base_url_2 + '&start={}'
    url_template_3 = base_url_3 + '&start={}'
    # First page: return the base URL that matches the supplied parameters
    if page == 0 and search_term and include_term and intext_term and target_site_in:
        return base_url_0
    if page == 0 and search_term and include_term and intext_term:
        return base_url_1
    if page == 0 and search_term and include_term:
        return base_url_2
    if page == 0 and search_term:
        return base_url_3
    else:
        # Later pages: same selection, with the start offset appended
        if search_term and include_term and intext_term and target_site_in:
            return url_template_0.format(page)
        if search_term and include_term and intext_term:
            return url_template_1.format(page)
        if search_term and include_term:
            return url_template_2.format(page)
        if search_term:
            return url_template_3.format(page)
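There are four search parameters: search_term, include_term, intext_term, and target_site_in. The URL is built differently depending on which of them are supplied.
Please suggest a better way to optimize this.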
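Answer
Instead of keeping several template strings and choosing between them, you can write a function that builds the final search query for you: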
def get_search_query(search_term, include_term, intext_term, target_site_in):
    # Start from the base term, then append each operator only if it was given
    response = search_term.replace(' ', '+')
    if include_term:
        # Keep the quotes around the include term, as in the question's templates
        response = f'{response}+"{include_term}"'
    if intext_term:
        response = f"{response}+intext:{intext_term}"
    if target_site_in:
        response = f"{response}+site:{target_site_in}"
    return response
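As a rough illustration of the same idea, the optional operators can also be collected in a list and joined at the end. The name `build_query` and the default empty-string arguments are assumptions made for this sketch and are not part of the original answer.

```python
def build_query(search_term, include_term="", intext_term="", target_site_in=""):
    """Join only the non-empty search operators with '+' (hypothetical helper)."""
    parts = [search_term.replace(" ", "+")]
    if include_term:
        parts.append(f'"{include_term}"')       # exact-phrase operator
    if intext_term:
        parts.append(f"intext:{intext_term}")   # intext: operator
    if target_site_in:
        parts.append(f"site:{target_site_in}")  # site: operator
    return "+".join(parts)


print(build_query("web scraping"))
# web+scraping
print(build_query("web scraping", "python", "tutorial", "example.com"))
# web+scraping+"python"+intext:tutorial+site:example.com
```

Unlike the cascading templates in the question, each operator here is decided independently, so for example a site filter can be added without also supplying an intext term.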
Now you can call it from your function:
def target_url(search_term, include_term, intext_term, target_site_in, page):
    query = get_search_query(search_term, include_term, intext_term, target_site_in)
    url = f'https://www.google.com/search?q={query}&hl=en&rlz='
    if page != 0:
        # The question's templates use the "start" parameter for paging
        url = f"{url}&start={page}"
    return url
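One possible refinement, not part of the original answer: the query above is concatenated by hand, so characters such as quotes or ampersands inside a search term are not escaped. A minimal sketch using the standard library's `urllib.parse.urlencode` could look like the following. The function name `target_url_encoded` and the default argument values are assumptions for illustration.

```python
from urllib.parse import urlencode


def target_url_encoded(search_term, include_term="", intext_term="",
                       target_site_in="", page=0):
    """Build the search URL and let urlencode handle escaping (hypothetical helper)."""
    parts = [search_term]
    if include_term:
        parts.append(f'"{include_term}"')
    if intext_term:
        parts.append(f"intext:{intext_term}")
    if target_site_in:
        parts.append(f"site:{target_site_in}")

    # Same fixed parameters as in the question: hl=en and an empty rlz
    params = {"q": " ".join(parts), "hl": "en", "rlz": ""}
    if page:
        params["start"] = page  # offset parameter used by the question's code
    return "https://www.google.com/search?" + urlencode(params)


print(target_url_encoded("web scraping", "python", page=10))
# https://www.google.com/search?q=web+scraping+%22python%22&hl=en&rlz=&start=10
```

`urlencode` turns spaces into `+` and percent-encodes the quotes and colons, so the manual `replace(' ', '+')` step is no longer needed in this variant.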