How to make the following for loop use multiple cores in Python?

Problem description

Here is an ordinary Python script that runs correctly:

import pandas as pd

dataset = pd.read_csv(r'C:\Users\efthi\Desktop\machine_learning.csv')
registration = pd.read_csv(r'C:\Users\efthi\Desktop\studentVle.csv')

students = list()
result = list()
p = 350299
i = 749
interactions = 0
while i < 8659:
    student = dataset["id_student"][i]
    print(i)
    i += 1
    while p < 1917865:
        if student == registration['id_student'][p]:
            interactions += registration["sum_click"][p]
        p += 1
    students.insert(i, student)
    result.insert(i, interactions)
    p = 0
    interactions = 0

st = pd.DataFrame(students)  # create data frame
st.to_csv(r'C:\Users\efthi\Desktop\ttest.csv', index=False)  # write data frame to csv

st = pd.DataFrame(result)  # create data frame
st.to_csv(r'C:\Users\efthi\Desktop\results.csv', index=False)  # write data frame to csv

This is supposed to run on a much larger dataset, and I think it would be far more efficient to make use of my computer's multiple cores.

How can I implement it to use all 4 cores?

Tags: python-3.x, multithreading, optimization, parallel-processing

Solution


To execute any function in parallel, you can do something like the following:

import multiprocessing
import pandas as pd

def f(x):
    # Perform some computation on the chunk and return the result
    y = x  # placeholder: replace with your actual work
    return y

# Load your data
data = pd.read_csv('file.csv')

# Look at the multiprocessing docs to see why the
# "if __name__ == '__main__'" guard is necessary
if __name__ == '__main__':
    # Create a pool with 4 worker processes
    pool = multiprocessing.Pool(4)
    # Create jobs
    jobs = []
    for group in data['some_group'].unique():
        # apply_async submits each job; it starts as soon as a worker is free
        data_for_job = data[data.some_group == group]
        jobs.append(pool.apply_async(f, (data_for_job,)))
    # Collect the results (job.get() blocks until that job has finished)
    results = [job.get() for job in jobs]
    pool.close()
    pool.join()
    # Combine the results
    results_df = pd.concat(results)
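
If you would rather not manage the pool by hand, multiprocessing.Pool also works as a context manager on Python 3, and pool.map covers the common case of applying one function to a list of chunks. A minimal sketch of that form, reusing the hypothetical f and some_group column from above:

import multiprocessing
import pandas as pd

def f(chunk):
    # placeholder: replace with your real per-chunk computation
    return chunk

if __name__ == '__main__':
    data = pd.read_csv('file.csv')
    # one chunk per unique group value
    chunks = [data[data.some_group == g] for g in data['some_group'].unique()]
    # the context manager closes and joins the pool automatically;
    # pool.map blocks until every chunk is processed and keeps input order
    with multiprocessing.Pool(4) as pool:
        results = pool.map(f, chunks)
    results_df = pd.concat(results)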

Whatever your function does, for multiprocessing you:

  1. Create a pool with the number of processors you want
  2. Loop over your data in whatever way you want to chunk it
  3. Create a job with that chunk (using pool.apply_async() <- read the docs on this if it is confusing)
  4. Collect your results with job.get()
  5. Combine your results (a worked sketch applied to the question's data follows below)
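
Applied to the question's own data, the same pattern might look like the sketch below. The file paths and the id_student / sum_click column names are taken from the question; the way the students are split into four chunks is an assumption, and total_clicks is a hypothetical helper, not part of any library:

import multiprocessing
import pandas as pd

def total_clicks(args):
    # Sum "sum_click" for one chunk of students; runs in a worker process
    students, registration = args
    clicks = registration[registration['id_student'].isin(students)]
    # reindex so students with no interactions get 0,
    # matching what the question's loop records
    return clicks.groupby('id_student')['sum_click'].sum().reindex(students, fill_value=0)

if __name__ == '__main__':
    dataset = pd.read_csv(r'C:\Users\efthi\Desktop\machine_learning.csv')
    registration = pd.read_csv(r'C:\Users\efthi\Desktop\studentVle.csv')

    student_ids = dataset['id_student'].unique()
    # split the student ids into 4 chunks, one per core
    chunks = [student_ids[i::4] for i in range(4)]

    # each job gets its chunk of ids plus the registration table
    # (the table is pickled to each worker, which is fine for a sketch)
    with multiprocessing.Pool(4) as pool:
        results = pool.map(total_clicks, [(c, registration) for c in chunks])

    combined = pd.concat(results)
    combined.to_csv(r'C:\Users\efthi\Desktop\results.csv')

This writes one combined CSV of per-student totals rather than the two separate files in the question; adapt the output step as needed.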
