Can elements of a dynamic array be stored in different columns in HDF5?

Problem description

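I'm trying to store simulation data from C++ in HDF5 (I will analyze this data later in Python + Pandas). My goal is to organize all the data properly on the C++ side, so that later I only need to read it.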

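My problem is storing a dynamic array in separate HDF5 columns. I'm using H5::VarLenType to store the array, and that works, but the whole array ends up in a single column. That is not convenient for me: I need each value in its own column.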

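I can do this with a fixed-size array, where no temporary buffer of type hvl_t is needed. If I use the same approach for the variable-length array (looping over the elements, computing each offset by hand and inserting a member with its datatype), I get garbage data.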
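With a fixed-size array the ints are stored inside each row of the memory buffer, so a member at `offsetof(..., place_states) + i * sizeof(int)` points at real data. The sketch below only computes those offsets; `SimulationDataFixed` and `place_offset` are hypothetical names, and in the question's code each offset would be passed to `mtype.insertMember` together with `H5::PredType::NATIVE_INT`.

```cpp
#include <cstddef>  // offsetof, std::size_t

constexpr int N_PLACES = 3;
constexpr int MAX_NAME_LENGTH = 32;

// Hypothetical fixed-size variant of the row struct: the N_PLACES ints
// live inside the struct itself, one after another.
struct SimulationDataFixed {
    int simulation;
    int iteration;
    double time_elapsed;
    char fired_transition[MAX_NAME_LENGTH];
    int place_states[N_PLACES];
};

// Byte offset of element i of place_states inside one row. This is the value
// that would be used for a "Place_<i+1>" member of the compound type.
constexpr std::size_t place_offset(std::size_t i) {
    return offsetof(SimulationDataFixed, place_states) + i * sizeof(int);
}

// Every element lies inside the row, so each column reads actual values.
static_assert(place_offset(N_PLACES - 1) + sizeof(int) <= sizeof(SimulationDataFixed),
              "all places are stored inline in the row");
```

This works only because the array is part of the struct. A pointer member (or an `hvl_t`) stores just an address inline, and the values live elsewhere in memory.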

I learned this approach from this SO answer.

This is my proof of concept, which I will later add to my project:

#include <stddef.h>

#include <cstring>
#include <string>
#include <sstream>
#include <iostream>
#include "H5Cpp.h"

const int MAX_NAME_LENGTH = 32;
const int N_PLACES = 3;
const int N_ROWS = 3;
const std::string FileName("SimulationResults-test.h5");
const std::string DatasetName("SimulationData");
const std::string member_simulation("Simulation");
const std::string member_iteration("Iteration");
const std::string member_time_elapsed("Time_elapsed");
const std::string member_place_states("States");
const std::string member_fired_transition("Fired_transition");

typedef struct {
    int simulation;
    int iteration;
    double time_elapsed;
    char fired_transition[MAX_NAME_LENGTH];
    int * place_states;
} SimulationData;

typedef struct {
    int simulation;
    int iteration;
    double time_elapsed;
    char fired_transition[MAX_NAME_LENGTH];  // MAX_NAME_LENGTH
    hvl_t place_states;      // N_PLACES
} SimulationData_buffer;

int main(void) {
    // Data to write
    SimulationData states_simulation[N_ROWS];
    SimulationData_buffer states_simulation_buffer[N_ROWS];
    // {
    //     { 1, 0, 0.0, {0, 0, 0},  "T1"   },
    //     { 1, 1, 1.0, {0, 1, 0},  "T2"   },
    //     { 1, 2, 5.0, {0, 0, 1},  "T1"   }
    // };

    for (int i = 0; i< N_ROWS; i++) {
      states_simulation[i].simulation = 1;
      states_simulation[i].iteration = 0;
      states_simulation[i].time_elapsed = 0.0;


      // states_simulation[i].fired_transition = "T1";
      strncpy(states_simulation[i].fired_transition, "T1",
              sizeof(states_simulation[i].fired_transition) - 1);
      states_simulation[i].fired_transition[sizeof(states_simulation[i].fired_transition) - 1] = 0;

      states_simulation[i].place_states = new int[N_PLACES];

      states_simulation[i].place_states[0] = 0;
      states_simulation[i].place_states[1] = 10;
      states_simulation[i].place_states[2] = 20;
    }


    // Number of rows
    hsize_t dim[] = {sizeof(states_simulation) / sizeof(SimulationData)};

    // Rank of the dataspace (1: a one-dimensional array of rows)
    int rank = sizeof(dim) / sizeof(hsize_t);

    // Define the compound datatype passed to HDF5. The offsets are taken from
    // SimulationData_buffer, the struct that is actually written.
    H5::CompType mtype(sizeof(SimulationData_buffer));
    mtype.insertMember(member_simulation,
                      HOFFSET(SimulationData_buffer, simulation),
                      H5::PredType::NATIVE_INT);
    mtype.insertMember(member_iteration,
                      HOFFSET(SimulationData_buffer, iteration),
                      H5::PredType::NATIVE_INT);
    mtype.insertMember(member_time_elapsed,
                      HOFFSET(SimulationData_buffer, time_elapsed),
                      H5::PredType::NATIVE_DOUBLE);

    mtype.insertMember(member_fired_transition,
                      HOFFSET(SimulationData_buffer, fired_transition),
                      H5::StrType(H5::PredType::C_S1, MAX_NAME_LENGTH));


    auto vlen_id_places = H5::VarLenType(H5::PredType::NATIVE_INT);

    // Set different columns for the array  <-------------------------
    // auto offset = HOFFSET(SimulationData_buffer, place_states);
    // for (int i = 0; i < N_PLACES; i++) {
    //   std::stringstream ss;
    //   ss << "Place_" << i+1;
    //   auto new_offset = offset + i*sizeof(int);
    //   std::cout << offset << " -> " << new_offset <<  std::endl;
    //   mtype.insertMember(ss.str(),
    //                     new_offset,
    //                     H5::PredType::NATIVE_INT);
    // }
    // Set the column as an array <-----------------------------------
    mtype.insertMember("Places", HOFFSET(SimulationData_buffer, place_states), vlen_id_places);


    // Filling buffer
    for (int i = 0; i < N_ROWS; ++i) {
      states_simulation_buffer[i].simulation = states_simulation[i].simulation;
      states_simulation_buffer[i].iteration = states_simulation[i].iteration;
      states_simulation_buffer[i].time_elapsed = states_simulation[i].time_elapsed;


      strncpy(states_simulation_buffer[i].fired_transition,
              states_simulation[i].fired_transition,
              MAX_NAME_LENGTH);

      states_simulation_buffer[i].place_states.len = N_PLACES;
      states_simulation_buffer[i].place_states.p = states_simulation[i].place_states;

    }


    // preparation of a dataset and a file.
    H5::DataSpace space(rank, dim);
    H5::H5File *file = new H5::H5File(FileName, H5F_ACC_TRUNC);
    H5::DataSet *dataset = new H5::DataSet(file->createDataSet(DatasetName,
                                                              mtype,
                                                              space));

    H5::DataSet *dataset2 = new H5::DataSet(file->createDataSet("Prueba2",
                                                              mtype,
                                                              space));
    // Write
    dataset->write(states_simulation_buffer, mtype);
    dataset2->write(states_simulation_buffer, mtype);

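    delete dataset;
    delete dataset2;
    delete file;

    // Release the per-row arrays allocated with new[]
    for (int i = 0; i < N_ROWS; i++) {
      delete[] states_simulation[i].place_states;
    }
    return 0;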
}

It can be compiled with g++ h5-test-dynamic.cpp -o h5-test-dynamic -lhdf5_cpp -lhdf5 (the C++ wrapper library is listed before the C library it depends on).

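As mentioned, I need one column per value instead of an array in a single column. I don't understand why it doesn't work, since I set the pointer and the offset of the hvl_t variable correctly. If I uncomment the block that handles the offsets and datatypes manually and comment out the line right after it, I get garbage values.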

This is what I get:

[(1, 0, 0., b'T1', 3, 0, -971058832),
 (1, 0, 0., b'T1', 3, 0, -971058800),
 (1, 0, 0., b'T1', 3, 0, -971058768)]
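The garbage pattern is consistent with the in-memory layout of `hvl_t`, which HDF5 defines as a struct holding a `size_t len` followed by a `void *p`. Columns placed at `offset + i * sizeof(int)` therefore read the bytes of `len` and of the pointer, not the array contents. The sketch below reproduces this without HDF5; `HvlMirror` and `read_int_at` are hypothetical names, and the byte positions assume a 64-bit little-endian platform such as x86-64.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical stand-in with the same layout as HDF5's hvl_t
// (a size_t length followed by a data pointer), so this runs without HDF5.
struct HvlMirror {
    std::size_t len;
    void* p;
};

// Read the int stored at byte `offset` inside an HvlMirror. This mimics what a
// compound member of type NATIVE_INT at that offset reads from the buffer.
int read_int_at(const HvlMirror& v, std::size_t offset) {
    assert(offset + sizeof(int) <= sizeof(HvlMirror));
    int value = 0;
    std::memcpy(&value, reinterpret_cast<const unsigned char*>(&v) + offset, sizeof(int));
    return value;
}
```

With `len = 3` and `p` pointing at {0, 10, 20}, the reads at offsets 0, 4 and 8 return 3, 0 and the low 32 bits of the pointer. That matches the pattern 3, 0, -971058832 in the output above, where the third value also changes from row to row because each row has its own array address.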

This is the best I can get:

[(1, 0, 0., b'T1', array([ 0, 10, 20], dtype=int32)),
 (1, 0, 0., b'T1', array([ 0, 10, 20], dtype=int32)),
 (1, 0, 0., b'T1', array([ 0, 10, 20], dtype=int32))]

Tags: c++ arrays dynamic struct hdf5

Solution
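One possible approach, sketched here under the assumption that the number of places is known before the dataset is created, is to drop `hvl_t` and pack each row into a plain byte buffer whose layout is decided at run time: the fixed members first, then one `int` per place. `RowHeader`, `place_offset`, `row_size` and `pack_rows` are hypothetical helpers, not HDF5 API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

constexpr int MAX_NAME_LENGTH = 32;

// Fixed-size part of a row: the same members as the question's SimulationData.
struct RowHeader {
    int simulation;
    int iteration;
    double time_elapsed;
    char fired_transition[MAX_NAME_LENGTH];
};

// Byte offset of place i (0-based) inside one packed row. The places follow the
// header directly; sizeof(RowHeader) is a multiple of alignof(int).
std::size_t place_offset(std::size_t i) {
    return sizeof(RowHeader) + i * sizeof(int);
}

// Size of one packed row, rounded up to alignof(RowHeader) so that the double
// in the next row's header stays aligned.
std::size_t row_size(std::size_t n_places) {
    const std::size_t raw = place_offset(n_places);
    const std::size_t a = alignof(RowHeader);
    return (raw + a - 1) / a * a;
}

// Copy each header and its place values into one contiguous buffer, row after row.
// places[r] must point to n_places ints for row r.
std::vector<unsigned char> pack_rows(const std::vector<RowHeader>& headers,
                                     const std::vector<const int*>& places,
                                     std::size_t n_places) {
    assert(headers.size() == places.size());
    const std::size_t stride = row_size(n_places);
    std::vector<unsigned char> buf(headers.size() * stride, 0);
    for (std::size_t r = 0; r < headers.size(); ++r) {
        unsigned char* row = buf.data() + r * stride;
        std::memcpy(row, &headers[r], sizeof(RowHeader));
        std::memcpy(row + place_offset(0), places[r], n_places * sizeof(int));
    }
    return buf;
}
```

The compound type would then be built with `H5::CompType mtype(row_size(n_places))`, the header members inserted with `HOFFSET(RowHeader, ...)` as in the question, and one member per place added with `mtype.insertMember("Place_" + std::to_string(i + 1), place_offset(i), H5::PredType::NATIVE_INT)`. The buffer is written with `dataset->write(buf.data(), mtype)`. Because the values now sit inside each row, every `Place_i` should show up as its own field when the file is read back in Python.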

