Passing pointer to pointer to float from Java through JNA to a C dynamic library

Question

Catboost offers a dynamic C library, which theoretically can be used from any programming language.

I'm trying to call it through Java using JNA.

I'm having a problem with the CalcModelPrediction function, defined in the header file as follows:

EXPORT bool CalcModelPrediction(
    ModelCalcerHandle* calcer,
    size_t docCount,
    const float** floatFeatures, size_t floatFeaturesSize,
    const char*** catFeatures, size_t catFeaturesSize,
    double* result, size_t resultSize);

In Java, I've defined the interface function as follows:

public interface CatboostModel extends Library {
    public Pointer ModelCalcerCreate();
    public String GetErrorString();
    public boolean LoadFullModelFromFile(Pointer calcer, String filename);
    public boolean CalcModelPrediction(Pointer calcer, int docCount,
            PointerByReference floatFeatures, int floatFeaturesSize,
            PointerByReference catFeatures, int catFeaturesSize,
            Pointer result, int resultSize);
    public int GetFloatFeaturesCount(Pointer calcer);
    public int GetCatFeaturesCount(Pointer calcer);
}

and then I'm calling it like this:

CatboostModel catboost;
Pointer modelHandle;

catboost = Native.loadLibrary("catboostmodel", CatboostModel.class);
modelHandle = catboost.ModelCalcerCreate();
if (!catboost.LoadFullModelFromFile(modelHandle, "catboost_test.model"))
{
    throw new RuntimeException("Cannot load Catboost model.");
}

final PointerByReference ppFloatFeatures = new PointerByReference();
final PointerByReference ppCatFeatures = new PointerByReference();
final Pointer pResult = new Memory(Native.getNativeSize(Double.TYPE));

float[] floatFeatures = {0.5f, 0.8f, 0.3f, 0.3f, 0.1f, 0.5f, 0.4f, 0.8f, 0.3f, 0.3f} ;
String[] catFeatures = {"1", "2", "3", "4"};
int catFeaturesLength = 0;
for (String s : catFeatures)
{
    catFeaturesLength += s.length() + 1;
}

try
{
    final Pointer pFloatFeatures = new Memory(floatFeatures.length * Native.getNativeSize(Float.TYPE));
    for (int dloop=0; dloop<floatFeatures.length; dloop++) {
        pFloatFeatures.setFloat(dloop * Native.getNativeSize(Float.TYPE), floatFeatures[dloop]);
    }
    ppFloatFeatures.setValue(pFloatFeatures);

    final Pointer pCatFeatures = new Memory(catFeaturesLength * Native.getNativeSize(Character.TYPE));
    long offset = 0;
    for (final String s : catFeatures) {
        pCatFeatures.setString(offset, s);
        pCatFeatures.setMemory(offset + s.length(), 1, (byte)(0));
        offset += s.length() + 1;
    }
    ppCatFeatures.setValue(pCatFeatures);

}
catch (Exception e)
{
    throw new RuntimeException("Couldn't initialize parameters for catboost");
}

try
{
    if (!catboost.CalcModelPrediction(
                modelHandle,
                1,
                ppFloatFeatures, 10,
                ppCatFeatures, 4,
                pResult, 1
                ))
    {
        throw new RuntimeException("No prediction made: " + catboost.GetErrorString());
    }
    else
    {
        double[] result = pResult.getDoubleArray(0, 1);
        log.info("Catboost prediction: " + String.valueOf(result[0]));
        Assert.assertFalse("ERROR: Result empty", result.length == 0);
    }
}
catch (Exception e)
{
    throw new RuntimeException("Prediction failed: " + e);
}

I've tried passing Pointer, PointerByReference and Pointer[] to the CalcModelPrediction function in place of float **floatFeatures and char ***catFeatures but nothing worked. I always get a segmentation fault, presumably when the CalcModelPrediction function attempts to get elements of floatFeatures and catFeatures by calling floatFeatures[0][0] and catFeatures[0][0].

So the question is, what's the right way of passing a multi-dimensional array from Java through JNA into C, where it could be treated as a pointer to a pointer to a value?

Interestingly, the CalcModelPredictionFlat function, which accepts only float **floatFeatures and simply dereferences it as *floatFeatures, works perfectly fine when passed a PointerByReference.

UPDATE - 5.5.2018

Part 1

After trying to debug the segfault by slightly modifying the original Catboost .cpp and .h files and recompiling the libcatboost.so library, I found out that the segfault was due to me mapping size_t in C to int in Java. After fixing this, my interface function in Java looks like this:

public interface CatboostModel extends Library {
    public boolean LoadFullModelFromFile(Pointer calcer, String filename);
    public boolean CalcModelPrediction(Pointer calcer, size_t docCount,
            Pointer[] floatFeatures, size_t floatFeaturesSize,
            String[] catFeatures, size_t catFeaturesSize,
            Pointer result, size_t resultSize);
}

Where the size_t Class is defined as follows:

public static class size_t extends IntegerType {
    public size_t() { this(0); }
    public size_t(long value) { super(Native.SIZE_T_SIZE, value); }
}
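
To illustrate why the width matters, here is a small sketch that uses only the Java standard library, not JNA. It assumes a 64-bit little-endian platform where size_t is 8 bytes, and it only models one plausible mechanism: the callee reads a full 8-byte slot of which the caller meaningfully filled just the low 4 bytes. The class and method names are hypothetical, and what really happens inside a given JNA call depends on the platform's calling convention.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class SizeTWidthDemo {
    /**
     * Models an 8-byte argument slot: the low 4 bytes hold the int that was passed,
     * the high 4 bytes hold whatever happened to be there before.
     */
    static long readAsSizeT(int value, int staleUpperBits) {
        ByteBuffer slot = ByteBuffer.allocate(8).order(ByteOrder.LITTLE_ENDIAN);
        slot.putInt(0, value);          // the 32-bit value the caller intended to pass
        slot.putInt(4, staleUpperBits); // bytes the caller never initialised
        return slot.getLong(0);         // what a 64-bit size_t parameter would read
    }

    public static void main(String[] args) {
        // With clean upper bytes the value survives.
        System.out.println(readAsSizeT(10, 0));
        // With a single stale bit set, the count becomes 2^32 + 10.
        System.out.println(Long.toUnsignedString(readAsSizeT(10, 1)));
    }
}
```

A count inflated like this would send the native loop far past the end of the arrays, which is consistent with the segmentation fault described above.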

Part 2

Looking more into the Catboost code, I noticed that **floatFeatures is accessed by rows, like floatFeatures[i], while ***catFeatures is accessed by rows and columns, like catFeatures[i][catFeatureIdx].

After changing floatFeatures in Java to an array of Pointer, my code started to work with the model trained without categorical features, i.e. catFeatures length is zero.
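
The row-wise access means that floatFeatures[i] must point at the first float of document i. As a rough sketch of that layout, the following standard-library code packs equally sized rows back to back in native byte order and computes where each row starts. FloatRowPacker and its methods are hypothetical names, and no JNA is involved here. With JNA, the analogous step would presumably be to copy the same bytes into a Memory and obtain each row pointer with share(offset), but that part is an assumption and is not shown.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class FloatRowPacker {
    /** Packs equally sized rows back to back into one direct buffer in native byte order. */
    static ByteBuffer pack(float[][] rows) {
        int cols = rows.length == 0 ? 0 : rows[0].length;
        ByteBuffer buf = ByteBuffer.allocateDirect(rows.length * cols * Float.BYTES)
                                   .order(ByteOrder.nativeOrder());
        for (float[] row : rows) {
            if (row.length != cols) {
                throw new IllegalArgumentException("all rows must have the same length");
            }
            for (float v : row) {
                buf.putFloat(v);
            }
        }
        buf.flip(); // make the written bytes readable from position 0
        return buf;
    }

    /** Byte offset at which row i starts; a native floatFeatures[i] would point here. */
    static long rowOffset(int i, int cols) {
        return (long) i * cols * Float.BYTES;
    }

    public static void main(String[] args) {
        float[][] docs = { {0.4f, 0.8f, 0.3f}, {0.1f, 0.5f, 0.9f} };
        ByteBuffer packed = pack(docs);
        for (int i = 0; i < docs.length; i++) {
            long off = rowOffset(i, docs[0].length);
            System.out.println("row " + i + " starts at byte " + off
                    + ", first value " + packed.getFloat((int) off));
        }
    }
}
```

Keeping all rows in one block means a single allocation holds every document, and the table of row pointers is the only extra structure the native side needs.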

This trick, however, didn't work with catFeatures, which is accessed through a double subscript, [i][catFeatureIdx]. So for now, I modified the original Catboost code so that it accepts char **catFeatures, an array of strings. In the Java interface function, I declared catFeatures as String[]. Now I can only make predictions for one element at a time, which is not ideal.

Tags: java, c, jna, catboost

Solution


I've managed to get it all working with the original Catboost code and libcatboost.so.

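The Java interface function is defined like this. Note that to emulate a two-dimensional array (or pointer to pointer) of float values and strings, I used the Pointer[] type.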

public interface CatboostModel extends Library {
    public boolean LoadFullModelFromFile(Pointer calcer, String filename);
    public boolean CalcModelPrediction(Pointer calcer, size_t docCount,
            Pointer[] floatFeatures, size_t floatFeaturesSize,
            Pointer[] catFeatures, size_t catFeaturesSize,
            Pointer result, size_t resultSize);
}

After that, I populate the floatFeatures and catFeatures parameters like this (with some dummy data here). Note that for strings I'm using JNA's StringArray.

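float[] floatFeatures = {0.4f, 0.8f, 0.3f, 0.3f, 0.1f, 0.5f, 0.4f, 0.8f, 0.3f, 0.3f};
String[] catFeatures = {"1", "2", "3", "4"};

// One contiguous block of native floats holding the features of a single document.
final Pointer pFloatFeatures = new Memory(floatFeatures.length * Native.getNativeSize(Float.TYPE));
for (int dloop = 0; dloop < floatFeatures.length; dloop++) {
    pFloatFeatures.setFloat(dloop * Native.getNativeSize(Float.TYPE), floatFeatures[dloop]);
}
// One row pointer per document; both dummy documents share the same data here.
final Pointer[] ppFloatFeatures = new Pointer[2];
ppFloatFeatures[0] = pFloatFeatures;
ppFloatFeatures[1] = pFloatFeatures;

// StringArray holds the native char* entries for one document's categorical features.
final Pointer pCatFeatures = new StringArray(catFeatures);
// Sized to the number of documents (2), not the number of categorical features.
final Pointer[] ppCatFeatures = new Pointer[2];
ppCatFeatures[0] = pCatFeatures;
ppCatFeatures[1] = pCatFeatures;

// Room for one double per document (the question's code allocated space for only one).
final Pointer pResult = new Memory(2 * Native.getNativeSize(Double.TYPE));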

Finally, I pass these parameters to Catboost:

if (!catboost.CalcModelPrediction(
                modelHandle,
                new size_t(2L),
                ppFloatFeatures, new size_t((long)floatFeatures.length),
                ppCatFeatures, new size_t((long)catFeatures.length),
                pResult, new size_t(2L)
                ))
{
    throw new RuntimeException("No prediction made: " + catboost.GetErrorString());
}

To get the predictions, we can do the following:

double[] result = pResult.getDoubleArray(0, 2);
