Using multiple GPUs (CNTK) with C++

Problem description

I use the CNTK library to train an LSTM on a single CPU or GPU. I don't understand how to change my code so that training runs on multiple GPUs (or CPUs). My current code is:

labels = InputVariable({ numOutputClasses }, DataType::Float, L"labels");
trainingLoss = CrossEntropyWithSoftmax(lstmModel, labels, L"lossFunction");
prediction   = ClassificationError(lstmModel, labels, L"classificationError");

// create learner
paramLearner = AdamLearner(lstmModel->Parameters(),
                           learningRate,
                           momentumSchedule,
                           false);

//create trainer
trainer = CreateTrainer(lstmModel, trainingLoss, prediction, vector<LearnerPtr>({ paramLearner }));

sampleShape = { inputDim };
labelsShape = { numOutputClasses };

classifierOutputVar = lstmModel->Output();
unordered_map<Variable, ValuePtr> argumentsOut;
double trainLossValue;

// run train
for (size_t i = 1; i <= countEpoch; ++i)
{
    cout << "Epoch " << i << ":" << countEpoch << endl;

    for (size_t k = 0; k < inputData.size(); ++k)
    {
        argumentsOut = { { classifierOutputVar, outputValue },
                         { prediction, predictionErrorValue } };

        featuresValue = Value::Create(sampleShape, inputData.at(k),  device);
        labelValue    = Value::Create(labelsShape, labelsData.at(k), device);
        argumentsIn = { { features, featuresValue }, { labels, labelValue } };

        trainer->TrainMinibatch(argumentsIn, true, argumentsOut, device);
        argumentsIn.clear();

        trainLossValue = trainer->PreviousMinibatchLossAverage();
        cout << "\tBatch " << k + 1 << ":" << inputData.size() << "\ttrainLossValueBatch: " << trainLossValue << endl;
    }

    saveModel(path);        
}
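
(The device used above is an ordinary DeviceDescriptor; as a minimal sketch, it can be chosen for single-device training like this, with the GPU id 0 as a placeholder:)

#include "CNTKLibrary.h"

using namespace CNTK;

// Run everything on one specific GPU ...
DeviceDescriptor device = DeviceDescriptor::GPUDevice(0);
// ... or on the CPU:
// DeviceDescriptor device = DeviceDescriptor::CPUDevice();
// ... or let CNTK pick the best available device:
// DeviceDescriptor device = DeviceDescriptor::UseDefaultDevice();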

My own attempt to solve this failed:

auto sync = MPICommunicator();

auto numWorkers = sync->Workers().size();
auto workerRank = sync->CurrentWorker().m_globalRank;

labels = InputVariable({ numOutputClasses }, DataType::Float, L"labels");
trainingLoss = CrossEntropyWithSoftmax(lstmModel, labels, L"lossFunction");
prediction = ClassificationError(lstmModel, labels, L"classificationError");

paramLearner = FSAdaGradLearner(lstmModel->Parameters(),
                                learningRate,
                                momentumSchedule,
                                false);

// reuse the communicator created above instead of constructing a second one
DistributedLearnerPtr distributedLearner =
    CreateDataParallelDistributedLearner(sync, paramLearner, 0);

trainer = CreateTrainer(lstmModel, trainingLoss, prediction, { distributedLearner });

It is still not clear how to actually run this on multiple GPUs (or CPUs)... I know that a MinibatchSource has to be created with CreateCompositeMinibatchSource; what I don't understand is how to build the MinibatchSourceConfig object from my arrays (containers of MFCC sequences).
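
For reference, here is a minimal sketch of how a MinibatchSourceConfig is typically assembled and handed to CreateCompositeMinibatchSource, assuming the MFCC sequences have first been written out in CTF text format (the file name Train.ctf, the stream names and the dimensions are placeholders; for HTK feature archives the HTKFeatureDeserializer/HTKMLFDeserializer helpers exist as well):

#include "CNTKLibrary.h"
#include <vector>

using namespace CNTK;

// Placeholder dimensions; substitute the real inputDim / numOutputClasses.
const size_t inputDim = 39;
const size_t numOutputClasses = 10;

MinibatchSourcePtr CreateTrainingSource()
{
    // One CTF deserializer exposing a "features" stream and a "labels" stream;
    // the training data is assumed to have been serialized into Train.ctf.
    std::vector<StreamConfiguration> streams = {
        StreamConfiguration(L"features", inputDim),          // dense MFCC frames
        StreamConfiguration(L"labels",   numOutputClasses)   // dense one-hot labels (pass isSparse=true for sparse CTF labels)
    };

    MinibatchSourceConfig config({ CTFDeserializer(L"Train.ctf", streams) });
    config.maxSamples = MinibatchSource::InfinitelyRepeat;   // keep sweeping over the data

    return CreateCompositeMinibatchSource(config);
}

Such a source replaces the manual Value::Create calls: the features and labels variables are bound to the source's streams via StreamInfo when each minibatch is built, as in the sketch further below.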

Tags: gpu, lstm, cntk

Solution
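
A rough sketch of the usual CNTK data-parallel pattern, not a verified drop-in for the code above: every MPI worker binds to its own GPU, pulls a disjoint slice of each minibatch by passing numWorkers and workerRank to GetNextMinibatch, and then calls TrainMinibatch exactly as before; the distributed learner created with CreateDataParallelDistributedLearner aggregates the gradients across the workers. The minibatchSource argument is assumed to come from something like the CreateTrainingSource helper sketched in the question; TrainDistributed and the stream names are made-up names.

#include "CNTKLibrary.h"
#include <unordered_map>
#include <iostream>

using namespace CNTK;

void TrainDistributed(const Variable& features, const Variable& labels,
                      const TrainerPtr& trainer,                // created with the distributed learner
                      const MinibatchSourcePtr& minibatchSource,
                      size_t minibatchSize, size_t maxSamples)
{
    auto communicator = MPICommunicator();
    size_t numWorkers = communicator->Workers().size();
    size_t workerRank = communicator->CurrentWorker().m_globalRank;

    // One process per GPU: worker i drives GPU i (an assumption; adjust the mapping as needed).
    auto device = DeviceDescriptor::GPUDevice((unsigned int)workerRank);

    const auto& featureStreamInfo = minibatchSource->StreamInfo(L"features");
    const auto& labelStreamInfo   = minibatchSource->StreamInfo(L"labels");

    size_t samplesSeen = 0;
    while (samplesSeen < maxSamples)
    {
        // numWorkers/workerRank make the source give every worker a disjoint
        // part of a global minibatch of minibatchSize samples.
        const auto& minibatchData = minibatchSource->GetNextMinibatch(
            /*minibatchSizeInSequences=*/0, minibatchSize, numWorkers, workerRank, device);
        if (minibatchData.empty())
            break;

        std::unordered_map<Variable, MinibatchData> arguments = {
            { features, minibatchData.at(featureStreamInfo) },
            { labels,   minibatchData.at(labelStreamInfo) }
        };

        // The data-parallel distributed learner aggregates gradients across
        // all workers inside this call.
        trainer->TrainMinibatch(arguments, device);

        samplesSeen += trainer->PreviousMinibatchSampleCount();
        std::cout << "Worker " << workerRank << " loss: "
                  << trainer->PreviousMinibatchLossAverage() << std::endl;
    }
}

If the data has to stay in memory instead, the TrainMinibatch calls from the question can be kept as they are, as long as each worker only iterates over its own share of inputData (for example the indices with k % numWorkers == workerRank); the gradient aggregation still happens inside the distributed learner. Either way the binary has to be launched with one MPI process per GPU, e.g. mpiexec -n 2 followed by the executable name.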

