首页 > 解决方案 > 如何使用带有简单浮点 I/O 数据的 Pytorch C++ API 的数据集?

问题描述

我正在尝试使用 Pytorch C++ API 来训练一个具有浮点输入和输出值的简单神经网络。值保存在“input.csv”和一个 output.csv“文件中。只有 10x10 个输入和输出表示否定。因此,如果输入为 1,则输出为 0。

我能够通过手动创建输入和输出张量来构建一个简单的神经网络。代码在我的 master 分支上,你可以在这里找到:master branch repo

但现在我想使用torch::data::make_data_loadertorch::data::datasets制作我自己的自定义数据集,并且能够打乱我的数据并为训练循环创建批量大小。

不幸的是,pytorch cpp API 的所有示例和教程都只展示了如何进行复杂的图像识别。对于像我这样的编程初学者来说,这些例子太复杂了。(几个月前才开始编程)

我自己尝试过创建一个自定义数据集类,如大多数示例中所示,然后使用该类制作数据加载器。你可以在这里找到它:数据集测试分支

我一直在关注中文教程,因为这是唯一对初学者友好的教程:中文教程

我想我想要完成的是这个python教程中的东西: pytorch dataset loader with python

这是我在 src/owndata.cpp 中定义的类:

#include <fstream>
#include <iostream>
#include <sstream>
#include <string>
#include <torch/torch.h>
#include <vector>
//#include "include/owndata.h"

//----------------------Funktionsprototyp Onelinevector----------------------

std::vector<float>
onelinevector(const std::vector<std::vector<float>> &invector);

//------------------------Funktionsprototyp csv2Dvector----------------------

std::vector<std::vector<float>> csv2Dvector(std::string inputFileName);

//-------------------Custom Dataset Class------------------------------------

class CustomDataset : public torch::data::datasets::Dataset<CustomDataset> {

private:
std::vector<std::vector<float>> xdata, ydata;
torch::Tensor outputtensor, inputtensor;
public:
CustomDataset(std::string xpath, std::string ypath) {
    xdata = csv2Dvector(xpath);
    ydata = csv2Dvector(ypath);

    //create input tensor:
    unsigned int ivsize = xdata.size() * xdata.front().size();
    std::vector<float> olinevec(ivsize);
    olinevec = onelinevector(xdata);

    inputtensor = torch::from_blob(
      olinevec.data(), {static_cast<unsigned int>(xdata.size()),
                        static_cast<unsigned int>(xdata.front().size())});
    //create output tensor:
    unsigned int ovsize = ydata.size() * ydata.front().size();
    std::vector<float> iovec(ovsize);
    iovec = onelinevector(ydata);

    outputtensor = torch::from_blob(
      iovec.data(), {static_cast<unsigned int>(ydata.size()),
                        static_cast<unsigned int>(ydata.front().size())});
};

torch::data::Example<> get(size_t index) override {

    torch::Tensor sample_input = inputtensor[index];
    torch::Tensor sample_output = outputtensor[index];

    return {sample_input.clone(), sample_output.clone()};
};

// Return the length of data
torch::optional<size_t> size() const override {
    return ydata.size();
  };

};

size_t batch_size = 5;

//-----------------------------NETZDEFINITION-----------------------------
struct MeinNetz : torch::nn::Module {
  MeinNetz() {
    fc1 = register_module("fc1", torch::nn::Linear(10, 10));
    fc2 = register_module("fc2", torch::nn::Linear(10, 10));
  }

  torch::Tensor forward(torch::Tensor x) {
    x = torch::relu(fc1->forward(x));
    x = fc2->forward(x);
    return x;
  }

  torch::nn::Linear fc1{nullptr}, fc2{nullptr};
};


//-------------------------------------main-Funktion-------------------------

int main() {

  auto custom_dataset = CustomDataset("input.csv","output.csv");
  auto data_loader = torch::data::make_data_loader<torch::data::samplers::SequentialSampler>(std::move(custom_dataset),batch_size);

  auto dataset_size = custom_dataset.size().value();
  int n_epochs = 50;

  auto net = std::make_shared<MeinNetz>();

  torch::optim::SGD optimizer(net->parameters(), 0.2);

  for(int epoch=1; epoch<=n_epochs; epoch++) {
    for(auto& batch: *data_loader) {
      auto data = batch.data;
      auto target = batch.target;

      data = data.to(torch::kF32);
      target = target.to(torch::kF32);

      optimizer.zero_grad();

      auto prediction = net->forward(data);
      auto loss = torch::mse_loss(prediction, target);

      loss.backward();

      optimizer.step();

      std::cout << "Epoch: " << epoch << " Loss: " 
      << loss.item<float>() << std::endl;
    }
  }

  return 0;
}

//--------------------------Funktionen--------------------------------------------

std::vector<float>
onelinevector(const std::vector<std::vector<float>> &invector) {

  std::vector<float> v1d;
  if (invector.size() == 0)
    return v1d;
  v1d.reserve(invector.size() * invector.front().size());

  for (auto &innervector : invector) {
    v1d.insert(v1d.end(), innervector.begin(), innervector.end());
  }

  return v1d;
}

//-------------------------csv2vector Funktionsdefinition--------------------------

std::vector<std::vector<float>> csv2Dvector(std::string inputFileName) {
  using namespace std;

  vector<vector<float>> data;
  ifstream inputFile(inputFileName);
  int l = 0;

  while (inputFile) {
    l++;
    string s;
    if (!getline(inputFile, s))
      break;
    if (s[0] != '#') {
      istringstream ss(s);
      vector<float> record;

      while (ss) {
        string line;
        if (!getline(ss, line, ','))
          break;
        try {
          record.push_back(stof(line));
        } catch (const std::invalid_argument e) {
          cout << "NaN found in file " << inputFileName << " line " << l
               << endl;
          e.what();
        }
      }

      data.push_back(record);
    }
  }

  if (!inputFile.eof()) {
    cerr << "Could not read file " << inputFileName << "\n";
    throw invalid_argument("File not found.");
  }

  return data;
}

这根本行不通。

PS:pytorch repo 中还有一个 Python 示例,但它比中文示例复杂得多:python example for image import 2。

标签: c++machine-learningpytorch

解决方案


推荐阅读