Multithreading for loop not working in Python with no errors

Problem description

I put together the script below to test multithreading.

I am trying to run the for loop in threads, so that several URLs in the list can be processed in parallel.

The script doesn't raise an error, but it doesn't do anything either, and I am not sure why.

If I remove the multithreading pieces, it works fine.

Can anyone help me?

import multiprocessing.dummy as mp 
import requests
import pandas as pd
import datetime

urls = [
  'http://google.co.uk',
  'http://bbc.co.uk/'
]

def do_print(s):
    check_data = pd.DataFrame([])

    now = datetime.datetime.now()

    try:
        response = requests.get(url)
    except:
        response = 'null'

    try:
        response_code = response.status_code
    except:
        response_code = 'null'

    try:
        response_content = response.content
    except:
        response_content = 'null'

    try:        
        response_text = response.text
    except:
        response_text = 'null'

    try:
        response_content_type = response.headers['Content-Type']
    except:
        response_content_type = 'null'

    try:
        response_server = response.headers['Server']
    except:
        response_server = 'null'

    try:
        response_last_modified = response.headers['Last-Modified']
    except:
        response_last_modified = 'null'

    try:
        response_content_encoding = response.headers['Content-Encoding']
    except:
        response_content_encoding = 'null'

    try:
        response_content_length = response.headers['Content-Length']
    except:
        response_content_length = 'null'    

    try:
        response_url = response.url
    except:
        response_url = 'null'

    if int(response_code) <400:
        availability = 'OK'
    elif int(response_code) >399 and int(response_code) < 500:
        availability = 'Client Error'
    elif int(response_code) >499:
        availability = 'Server Error'

    if int(response_code) <400:
        availability_score = 1
    elif int(response_code) >399 and int(response_code) < 500:
        availability_score = 0
    elif int(response_code) >499:
        availability_score = 0

    d = {'check_time': [now], 'code': [response_code], 'type': [response_content_type], 'url': [response_url], 'server': [response_server], 'modified': [response_last_modified], 'encoding': [response_content_encoding], 'availability': [availability], 'availability_score': [availability_score]}
    df = pd.DataFrame(data=d)
    check_data = check_data.append(df ,ignore_index=True,sort=False)

if __name__=="__main__":
    p=mp.Pool(4)
    p.map(do_print, urls) 
    p.close()
    p.join()

Tags: python, python-3.x

Solution


When I run the code I get an error because it tries to evaluate int("null"), all because you have

except:
    response_code = 'null'

If I use except Exception as ex: print(ex), I get an error saying that the variable url doesn't exist. That is true, because you have def do_print(s): but it should be def do_print(url):
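
To see why the bare except hides the real problem, here is a minimal sketch that needs no network access. It keeps the question's mistake (parameter named s, body using url) and swaps requests.get(url) for len(url) purely as a stand-in; that swap is an assumption made for illustration.

```python
def do_print(s):
    """Same mistake as in the question: the parameter is `s`, the body uses `url`."""
    try:
        # Stand-in for requests.get(url); `url` is not defined in this scope
        return len(url)
    except Exception as ex:
        # Unlike a bare `except:`, this keeps the exception so it can be inspected
        return ex


error = do_print('http://bbc.co.uk/')
print(type(error).__name__, error)
```

The printed exception is a NameError about url, which a bare except: would have silently replaced with 'null'.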

BTW: instead of 'null' you could use the standard None and later check if response_code: before you try to convert it to an integer. Or simply skip the rest of the code when you get an error.
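
A rough sketch of that idea: keep None when the request fails and only compare the code when there is one. The helper name classify and the 'Unavailable' label are hypothetical; the other labels and scores come from the question's code.

```python
def classify(response_code):
    """Map an HTTP status code (or None) to an availability label and score."""
    if response_code is None:
        # The request failed, so there is no status code to compare
        return 'Unavailable', 0
    if response_code < 400:
        return 'OK', 1
    if response_code < 500:
        return 'Client Error', 0
    return 'Server Error', 0


availability, availability_score = classify(None)
```

This also replaces the two duplicated if/elif chains with a single function call, and it no longer needs int() because response.status_code is already an integer.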


Another problem: the worker function should end with return df (or some other return value), and you should collect what it returns

results = p.map(...)

and then use results to create the DataFrame check_data.
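
The pattern could look roughly like the sketch below. To keep it runnable offline, check_url is a hypothetical stand-in that makes no request and returns a plain dict per URL; in the real script it would call requests.get(url) and fill in the status code and headers. The names check_url, run_checks and rows are assumptions for this example.

```python
import datetime
import multiprocessing.dummy as mp  # thread-based Pool with the multiprocessing API

urls = [
    'http://google.co.uk',
    'http://bbc.co.uk/',
]


def check_url(url):
    """Worker: return one row describing the check instead of appending to a local."""
    # Placeholder values; the real worker would perform the HTTP request here
    return {'check_time': datetime.datetime.now(), 'url': url, 'code': None}


def run_checks(urls, workers=4):
    """Run check_url over all URLs in a thread pool and return the rows."""
    with mp.Pool(workers) as pool:
        # map() blocks until every worker is done and keeps the order of `urls`
        return pool.map(check_url, urls)


rows = run_checks(urls)
```

With pandas installed, check_data = pd.DataFrame(rows) then builds the combined table in one step, which avoids growing a DataFrame inside each thread.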

