python - multithreading for loop not working in Python with no errors
Problem description
I put together the script below to test multithreading.
I am trying to run the body of the for loop in threads, so that several URLs in the list can be processed in parallel.
The script doesn't raise an error, but it doesn't do anything either, and I am not sure why.
If I remove the multithreading pieces, it works fine.
Can anyone help me?
import multiprocessing.dummy as mp
import requests
import pandas as pd
import datetime

urls = [
    'http://google.co.uk',
    'http://bbc.co.uk/'
]

def do_print(s):
    check_data = pd.DataFrame([])
    now = datetime.datetime.now()
    try:
        response = requests.get(url)
    except:
        response = 'null'
    try:
        response_code = response.status_code
    except:
        response_code = 'null'
    try:
        response_content = response.content
    except:
        response_content = 'null'
    try:
        response_text = response.text
    except:
        response_text = 'null'
    try:
        response_content_type = response.headers['Content-Type']
    except:
        response_content_type = 'null'
    try:
        response_server = response.headers['Server']
    except:
        response_server = 'null'
    try:
        response_last_modified = response.headers['Last-Modified']
    except:
        response_last_modified = 'null'
    try:
        response_content_encoding = response.headers['Content-Encoding']
    except:
        response_content_encoding = 'null'
    try:
        response_content_length = response.headers['Content-Length']
    except:
        response_content_length = 'null'
    try:
        response_url = response.url
    except:
        response_url = 'null'
    if int(response_code) < 400:
        availability = 'OK'
    elif int(response_code) > 399 and int(response_code) < 500:
        availability = 'Client Error'
    elif int(response_code) > 499:
        availability = 'Server Error'
    if int(response_code) < 400:
        availability_score = 1
    elif int(response_code) > 399 and int(response_code) < 500:
        availability_score = 0
    elif int(response_code) > 499:
        availability_score = 0
    d = {
        'check_time': [now],
        'code': [response_code],
        'type': [response_content_type],
        'url': [response_url],
        'server': [response_server],
        'modified': [response_last_modified],
        'encoding': [response_content_encoding],
        'availability': [availability],
        'availability_score': [availability_score]
    }
    df = pd.DataFrame(data=d)
    check_data = check_data.append(df, ignore_index=True, sort=False)

if __name__ == "__main__":
    p = mp.Pool(4)
    p.map(do_print, urls)
    p.close()
    p.join()
Solution
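When I run the code I get an error, because it tries to evaluate int("null"). That happens because you have

    except:
        response_code = 'null'

If I change a handler to except Exception as ex: print(ex), I get an error saying that the variable url doesn't exist. That is true, because you wrote def do_print(s): but the body uses url, so it should be def do_print(url):. The bare except swallows the resulting NameError, response becomes the string 'null', every later attribute lookup on it fails in the same silent way, and int('null') is the first place where anything is raised.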
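To make the masking effect concrete, here is a minimal sketch that reproduces it without any network access. The function names `broken` and `diagnosed` are hypothetical and exist only for this illustration; `len(url)` stands in for `requests.get(url)`.

```python
def broken(s):
    # The parameter is called `s`, but the body refers to `url`,
    # mirroring the mismatch in the question.
    try:
        return len(url)   # raises NameError: `url` is not defined
    except:               # bare except hides the real cause
        return 'null'


def diagnosed(s):
    try:
        return len(url)
    except Exception as ex:
        # Returning (or printing) the exception exposes the actual problem.
        return str(ex)


print(broken('http://bbc.co.uk/'))     # the placeholder, no hint of the bug
print(diagnosed('http://bbc.co.uk/'))  # the NameError message
```

Catching a named exception type and reporting it, rather than using a bare except, is usually enough to find this kind of mistake quickly.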
BTW: instead of 'null' you could use the standard None and later check if response_code: before you try to convert it to an integer. Or simply skip the rest of the code when you get an error.
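One way to apply the None suggestion is sketched below. The helper name `classify` and the 'No Response' label are assumptions made for this example, not part of the original script; the thresholds are the ones used in the question. It tests `is None` explicitly rather than relying on truthiness.

```python
def classify(status_code):
    """Map an HTTP status code (or None) to (availability, availability_score)."""
    if status_code is None:
        # No response was received, so there is nothing to convert or compare.
        return 'No Response', 0
    if status_code < 400:
        return 'OK', 1
    if status_code < 500:
        return 'Client Error', 0
    return 'Server Error', 0


for code in (200, 404, 503, None):
    print(code, classify(code))
```

Because `response.status_code` is already an integer in requests, the int() conversion from the question is not needed once the placeholder string is gone.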
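Another problem: the worker function should end with return df, and the caller should collect the return values with results = p.map(...) and then use results to create the DataFrame check_data. As written, check_data is a local variable inside do_print, so whatever is appended to it is discarded when the function returns.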
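Putting the points above together, a corrected version might look like the following sketch. It is one possible arrangement, not the only one: instead of returning a one-row DataFrame from each call, the worker returns a plain dict and the DataFrame is built once from the list that `map` returns. The names `check_url` and `check_all`, the `error` column, the 'No Response' label and the 10-second timeout are assumptions made for this example. Building the frame in one step also avoids `DataFrame.append`, which was removed in pandas 2.0.

```python
import datetime
import multiprocessing.dummy as mp  # thread-based Pool with the multiprocessing API

import pandas as pd
import requests


def check_url(url):
    """Fetch one URL and return a dict describing the outcome."""
    row = {
        'check_time': datetime.datetime.now(),
        'url': url,
        'code': None,
        'type': None,
        'server': None,
        'availability': 'No Response',
        'availability_score': 0,
        'error': None,
    }
    try:
        response = requests.get(url, timeout=10)
    except requests.RequestException as ex:
        # Record the failure and skip the rest of the checks.
        row['error'] = str(ex)
        return row

    row['code'] = response.status_code
    row['url'] = response.url
    # .get() returns None when a header is missing, so no try/except is needed.
    row['type'] = response.headers.get('Content-Type')
    row['server'] = response.headers.get('Server')
    if response.status_code < 400:
        row['availability'], row['availability_score'] = 'OK', 1
    elif response.status_code < 500:
        row['availability'] = 'Client Error'
    else:
        row['availability'] = 'Server Error'
    return row


def check_all(urls, workers=4):
    """Run check_url over urls in a thread pool and collect the rows."""
    with mp.Pool(workers) as pool:
        rows = pool.map(check_url, urls)  # results come back in input order
    return pd.DataFrame(rows)
```

Calling `check_data = check_all(urls)` with the list from the question, under an `if __name__ == "__main__":` guard, then yields one row per URL, including rows for URLs that could not be fetched.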