Mapping Dataset rows gives "The target type of this expression must be a functional interface"


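I am trying to map over the rows of a Dataset but keep running into problems. The error is reported at map(r -> new MapFunction<r, List<Tuple3<Long, Integer, Double>>>(). The code is as follows: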

Dataset<Object> df1 = session.read().parquet(tableName).as(Encoders.bean(Object.class));

        JavaRDD<List<Tuple3<Long, Integer, Double>>> tempData = df1.map(r -> new MapFunction<r, List<Tuple3<Long, Integer, Double>>>(){

            // to get each sample individually
            List<Tuple2<String, Integer>> samples = zipWithIndex((r.getString(9).trim().split(",")));

            // Gets the unique identifier for that pos.
            Long snp = r.getString(1);

            // Calculates the distance for this pos for each sample.
            // i.e. 0|0 => 0, 0|1 => 1, 1|0 => 1, 1|1 => 2
            return samples.stream().map(t -> {
                String alleles = t._1();
                Integer patient = t._2();

                List<String> values = Arrays.asList(alleles.split("\\|"));

                Double firstAllele = Double.parseDouble(values.get(0));
                Double secondAllele = Double.parseDouble(values.get(1));

                // Returns the initial SNP id, patient id and the distance in form of Tuple.
                return new Tuple3<>(snp, patient, firstAllele + secondAllele);


Tags: java, apache-spark, java-8, functional-programming, parquet



Assuming you mean org.apache.spark.api.java.function.MapFunction:


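  1. r is used as a type in the generic parameters of MapFunction<r ..>, but r is the lambda's parameter (a variable), not a type.

  2. The anonymous function is not defined correctly. Either write it as a lambda expression, for example: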


JavaRDD<List<Tuple3<Long, Integer, Double>>> tempData = df1.map(r -> {
    // your definition
});

or as an anonymous class:

JavaRDD<List<Tuple3<Long, Integer, Double>>> tempData = df1.map(new MapFunction<theRespectiveType, List<Tuple3<Long, Integer, Double>>>() {
    @Override
    public List<Tuple3<Long, Integer, Double>> call(theRespectiveType variable) {
        // your implementation here
    }
});
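
To make the two options concrete, here is a minimal sketch in plain Java with no Spark dependency. The MapFunction interface below is a hypothetical stand-in that only mirrors the single-method shape of Spark's interface (the real call method also declares throws Exception), and the class name MapFunctionSketch and the helper distances are made up for illustration. It shows the same function written once as a lambda and once as an anonymous class, using the allele-sum logic described in the question's comments.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for org.apache.spark.api.java.function.MapFunction.
// It has exactly one abstract method, which is what makes it a functional
// interface and therefore a valid target type for a lambda.
interface MapFunction<T, U> {
    U call(T value);
}

class MapFunctionSketch {

    // Sums the two alleles of each comma-separated genotype,
    // i.e. 0|0 => 0, 0|1 => 1, 1|0 => 1, 1|1 => 2.
    static List<Double> distances(String line) {
        List<Double> out = new ArrayList<>();
        for (String genotype : line.trim().split(",")) {
            String[] alleles = genotype.trim().split("\\|");
            out.add(Double.parseDouble(alleles[0]) + Double.parseDouble(alleles[1]));
        }
        return out;
    }

    // Option 1: a lambda. The lambda itself is the MapFunction; it does not
    // return a new MapFunction, and its parameter is a variable, not a type.
    static final MapFunction<String, List<Double>> AS_LAMBDA =
            line -> distances(line);

    // Option 2: an anonymous class. There is no "r ->" in front of it, and the
    // type arguments are real types (String here), never the parameter name.
    static final MapFunction<String, List<Double>> AS_ANONYMOUS_CLASS =
            new MapFunction<String, List<Double>>() {
                @Override
                public List<Double> call(String line) {
                    return distances(line);
                }
            };

    public static void main(String[] args) {
        System.out.println(AS_LAMBDA.call("0|0,0|1,1|0,1|1"));          // [0.0, 1.0, 1.0, 2.0]
        System.out.println(AS_ANONYMOUS_CLASS.call("0|0,0|1,1|0,1|1")); // [0.0, 1.0, 1.0, 2.0]
    }
}
```

The original error comes from mixing the two forms: r -> new MapFunction<...>() { ... } asks the compiler to treat an expression that produces a MapFunction as the body of yet another function. In Spark itself, the Java overload of Dataset.map also takes an Encoder for the result type and returns a Dataset rather than a JavaRDD, so the assignment target in the snippets above would likely need adjusting as well.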

