Mapping over Dataset rows gives "The target type of this expression must be a functional interface"

Problem description

I'm trying to map over the rows of a Dataset but running into problems; the error is raised at map(r -> new MapFunction<r, List<Tuple3<Long, Integer, Double>>>(). The code is as follows:

    Dataset<Object> df1 = session.read().parquet(tableName).as(Encoders.bean(Object.class));

    JavaRDD<List<Tuple3<Long, Integer, Double>>> tempData = df1.map(r -> new MapFunction<r, List<Tuple3<Long, Integer, Double>>>(){

        // to get each sample individually
        List<Tuple2<String, Integer>> samples = zipWithIndex((r.getString(9).trim().split(",")));

        // Gets the unique identifier for that pos.
        Long snp = r.getString(1);

        // Calculates the distance for this pos for each sample.
        // i.e. 0|0 => 0, 0|1 => 1, 1|0 => 1, 1|1 => 2
        return samples.stream().map(t -> {
            String alleles = t._1();
            Integer patient = t._2();

            List<String> values = Arrays.asList(alleles.split("\\|"));

            Double firstAllele = Double.parseDouble(values.get(0));
            Double secondAllele = Double.parseDouble(values.get(1));

            // Returns the initial SNP id, patient id and the distance in form of Tuple.
            return new Tuple3<>(snp, patient, firstAllele + secondAllele);
        }).collect(Collectors.toList());
    });

Any help would be greatly appreciated.

Tags: java, apache-spark, java-8, functional-programming, parquet

Solution


Since the question lacks some clarity, I'm only offering untested suggestions.

Assuming you mean org.apache.spark.api.java.function.MapFunction.
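For context (a simplified sketch, not the exact source of the interface), MapFunction has a single abstract method, which is why the compiler expects the lambda's target type to be a functional interface:

```
public interface MapFunction<T, U> extends Serializable {
    U call(T value) throws Exception;
}
```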

I can see a couple of problems here:

  1. r is the lambda's row variable, yet it is used as a generic type argument in MapFunction<r, ...>; generic type arguments must be types (for example Row), not variables.

  2. The anonymous function isn't defined correctly. Either write it as a lambda expression, say,

```
JavaRDD<List<Tuple3<Long, Integer, Double>>> tempData = df1.map(r -> {
    // your definition here
});
```
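(A side note on the lambda route, untested and assuming df1 is read as a plain Dataset<Row>: if you stay on the Dataset API rather than converting to an RDD, Java usually needs an explicit cast to MapFunction plus an Encoder, because Dataset.map is overloaded; the column index and variable name below are only illustrative.)

```
Dataset<String> snpIds = df1.map(
        (MapFunction<Row, String>) row -> row.getString(1),  // the cast selects the MapFunction overload
        Encoders.STRING());
```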

Or, move your code into the call method of an anonymous class:

```

JavaRDD<List<Tuple3<Long, Integer, Double>>> tempData = df1.map(new MapFunction<theRespectiveType, List<Tuple3<Long, Integer, Double>>>() {
    @Override
    public List<Tuple3<Long, Integer, Double>> call(theRespectiveType variable) {
        // your implementation here
    }
});

```
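Putting the two points together, here is an untested sketch of how the whole pipeline could look. It converts the Dataset to a JavaRDD first, because JavaRDD.map takes org.apache.spark.api.java.function.Function, which is a functional interface, so a plain lambda compiles. session, tableName and the zipWithIndex helper are taken from the question's code, and r.getString(1) is parsed into a Long since getString returns a String:

```
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;

import scala.Tuple2;
import scala.Tuple3;

// ...

// Read the parquet file as a plain Dataset<Row>; no bean encoder is needed
// just to walk over the rows.
Dataset<Row> df1 = session.read().parquet(tableName);

// toJavaRDD() lets map(...) accept a plain lambda, since JavaRDD.map expects
// the functional interface org.apache.spark.api.java.function.Function.
JavaRDD<List<Tuple3<Long, Integer, Double>>> tempData = df1.toJavaRDD().map(r -> {

    // zipWithIndex is the helper from the question that pairs each sample
    // string with its index.
    List<Tuple2<String, Integer>> samples = zipWithIndex(r.getString(9).trim().split(","));

    // Unique identifier for this pos; parsed because getString returns a String.
    Long snp = Long.parseLong(r.getString(1));

    // Distance for this pos for each sample:
    // 0|0 => 0, 0|1 => 1, 1|0 => 1, 1|1 => 2
    return samples.stream().map(t -> {
        String alleles = t._1();
        Integer patient = t._2();

        List<String> values = Arrays.asList(alleles.split("\\|"));

        double firstAllele = Double.parseDouble(values.get(0));
        double secondAllele = Double.parseDouble(values.get(1));

        // SNP id, patient id and the distance as a Tuple3.
        return new Tuple3<>(snp, patient, firstAllele + secondAllele);
    }).collect(Collectors.toList());
});
```

If you would rather keep the result as a Dataset, you would also have to supply an Encoder for List<Tuple3<Long, Integer, Double>> (for example via Encoders.kryo), which is why the RDD route tends to be simpler here.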

