python - multiprocessing ThreadPool stalls at the end
Problem description
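I wrote a script that "parses" all the domains in a file. After launch everything works fine, but when only a few domains are left it gets stuck. Sometimes the last few domains take a very long time to finish. I can't figure out what the problem is. Has anyone run into this? How can I fix it?
After launch everything is fast (as it should be) right up to the end. At the end, when a few domains remain, it stalls. It makes no difference whether there are 1,000 domains or 10,000.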
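The symptom described (fast until the end, then stuck, regardless of list size) is what a thread pool looks like when a few tasks take far longer than the rest: `imap_unordered` hands results back as they complete, so the `for` loop cannot finish before the slowest task does. A minimal sketch that simulates this with `time.sleep` in place of HTTP requests (`fake_check` and the delays are made up for illustration):

```python
import time
from multiprocessing.pool import ThreadPool


def fake_check(job):
    """Stand-in for Domain_checker: sleep for `delay` seconds, then return the name."""
    name, delay = job
    time.sleep(delay)
    return name


# 20 quick jobs plus one slow one, e.g. a server that takes a long time to answer.
jobs = [("fast%d" % i, 0.01) for i in range(20)] + [("slow", 2.0)]

start = time.monotonic()
with ThreadPool(5) as workers:
    # imap_unordered yields each result as soon as its task finishes,
    # so the loop can only end once the slowest task has returned.
    order = list(workers.imap_unordered(fake_check, jobs))
elapsed = time.monotonic() - start

print(order[-1], round(elapsed, 1))
```

The fast jobs come back almost immediately, "slow" is always the last result, and the total run time is roughly the slow job's delay. A request with no timeout against a host that never answers behaves like a delay that never ends.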
Full code:
import re
import sys
import json
import requests
from bs4 import BeautifulSoup
from multiprocessing.pool import ThreadPool
from requests.packages.urllib3.exceptions import InsecureRequestWarning
requests.packages.urllib3.disable_warnings(InsecureRequestWarning)
pool = 100
with open("Rules.json") as file:
    REGEX = json.loads(file.read())
ua = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; rv:60.0) Gecko/20100101 Firefox/60.0'}
def Domain_checker(domain):
    try:
        # Note: no timeout here, so a server that never answers blocks this worker thread indefinitely.
        r = requests.get("http://" + domain, verify=False, headers=ua)
        r.encoding = "utf-8"
        for company in REGEX.keys():
            for type in REGEX[company]:
                check_entry = 0
                for ph_regex in REGEX[company][type]:
                    if bool(re.search(ph_regex, r.text)) is True:
                        check_entry += 1
                # The rule matches only if every one of its regexes was found.
                if check_entry == len(REGEX[company][type]):
                    title = BeautifulSoup(r.text, "lxml")
                    Found_domain = "\nCompany: {0}\nRule: {1}\nURL: {2}\nTitle: {3}\n".format(company, type, r.url, title.title.text)
                    print(Found_domain)
                    with open("/tmp/__FOUND_DOMAINS__.txt", "a", encoding='utf-8', errors = 'ignore') as file:
                        file.write(Found_domain)
    except requests.exceptions.ConnectionError:
        pass
    except requests.exceptions.TooManyRedirects:
        pass
    except requests.exceptions.InvalidSchema:
        pass
    except requests.exceptions.InvalidURL:
        pass
    except UnicodeError:
        pass
    except requests.exceptions.ChunkedEncodingError:
        pass
    except requests.exceptions.ContentDecodingError:
        pass
    except AttributeError:
        pass
    except ValueError:
        pass
    return domain
if __name__ == '__main__':
    with open(sys.argv[1], "r", encoding='utf-8', errors = 'ignore') as file:
        Domains = file.read().split()
    pool = 100
    print("Pool = ", pool)
    results = ThreadPool(pool).imap_unordered(Domain_checker, Domains)
    string_num = 0
    for result in results:
        print("{0} => {1}".format(string_num, result))
        string_num += 1
    with open("/tmp/__FOUND_DOMAINS__.txt", encoding='utf-8', errors = 'ignore') as found_domains:
        found_domains = found_domains.read()
    print("{0}\n{1}".format("#" * 40, found_domains))
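Separately from the hang, the nested loop in `Domain_checker` counts how many of a rule's regexes match and compares that count with the number of patterns, i.e. a rule fires only when every pattern matches. A sketch of the same check written with `all()`, assuming `Rules.json` has the shape the loop implies (`{company: {rule: [regex, ...]}}`); `matching_rules` and `sample_rules` are hypothetical names:

```python
import re


def matching_rules(text, rules):
    """Return (company, rule) pairs whose patterns ALL match `text`.

    `rules` is assumed to look like {company: {rule_name: [regex, ...]}}.
    """
    hits = []
    for company, company_rules in rules.items():
        for rule_name, patterns in company_rules.items():
            # all() stops at the first pattern that does not match
            if all(re.search(p, text) for p in patterns):
                hits.append((company, rule_name))
    return hits


# Hypothetical rules, for illustration only.
sample_rules = {
    "ExampleCo": {
        "login": [r"<form", r"password"],
        "footer": [r"ExampleCo Ltd"],
    },
}
print(matching_rules("<form><input name='password'></form>", sample_rules))
```

Because `all()` short-circuits, a rule stops being checked at its first failing pattern. An empty pattern list still counts as a match, just as `0 == len([])` does in the original loop.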
Solution
requests.get("http://" + domain, headers=ua, verify=False, timeout=10)
Setting a timeout solved the problem.
Thanks to the user with the nickname "eri" (https://ru.stackoverflow.com/users/16574/eri) :)
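To see why the timeout helps, here is a standard-library sketch in which a local server accepts the connection but never answers, like the hosts that held the last pool threads. It uses `urllib.request` rather than `requests` only so that it runs without extra packages; `start_silent_server` and `fetch` are illustrative names, not part of the original script. With `timeout=1` the call gives up after about a second; without a timeout it would block indefinitely.

```python
import socket
import threading
import time
import urllib.request


def start_silent_server(stop):
    """Listen on a free localhost port, accept one client and never reply."""
    srv = socket.socket()          # IPv4 TCP socket
    srv.bind(("127.0.0.1", 0))     # port 0: let the OS choose a free port
    srv.listen()

    def serve():
        conn, _ = srv.accept()     # accept the client...
        stop.wait()                # ...and keep the connection open, silently
        conn.close()
        srv.close()

    threading.Thread(target=serve, daemon=True).start()
    return srv.getsockname()[1]


def fetch(url, timeout):
    """Return 'ok', or the name of the exception that ended the request."""
    # Empty ProxyHandler: ignore any http_proxy environment settings.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url, timeout=timeout) as resp:
            resp.read()
        return "ok"
    except OSError as exc:  # socket timeouts and URLError are both OSError subclasses
        return type(exc).__name__


stop = threading.Event()
port = start_silent_server(stop)
start = time.monotonic()
outcome = fetch("http://127.0.0.1:%d/" % port, timeout=1)
elapsed = time.monotonic() - start
stop.set()  # let the silent server close its sockets
print(outcome, round(elapsed, 1))
```

Two caveats when applying `timeout=10` to the original script. First, the `requests` timeout is not a limit on the whole download: it bounds the connection attempt and the gap between bytes received, so a server that trickles data can still keep a worker busy; a tuple such as `timeout=(5, 10)` sets the connect and read limits separately. Second, a read timeout raises `requests.exceptions.ReadTimeout`, which is not a subclass of `ConnectionError` (only `ConnectTimeout` is), so none of the existing `except` clauses catch it. `imap_unordered` re-raises a worker's exception inside the `for result in results` loop, which would stop the script; adding `except requests.exceptions.Timeout: pass` to `Domain_checker` keeps the loop running.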