Scala Spark throws ArrayIndexOutOfBoundsException on count() and show() functions

Problem Description

I am running the following code in Scala Spark. Whenever I call an action function such as count() or show(), I get an ArrayIndexOutOfBoundsException. Printing the schema works fine.

val wordsDF = spark.read.format("bigquery")
  .option("table", "bigquery-public-data.samples.shakespeare")
  .load()
  .cache()

wordsDF.printSchema()
wordsDF.count()
wordsDF.show()

Error Stack Trace

Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 10582
    at com.thoughtworks.paranamer.BytecodeReadingParanamer$ClassReader.accept(BytecodeReadingParanamer.java:563)
    at com.thoughtworks.paranamer.BytecodeReadingParanamer$ClassReader.access$200(BytecodeReadingParanamer.java:338)
    at com.thoughtworks.paranamer.BytecodeReadingParanamer.lookupParameterNames(BytecodeReadingParanamer.java:103)
    at com.thoughtworks.paranamer.CachingParanamer.lookupParameterNames(CachingParanamer.java:90)
    at com.fasterxml.jackson.module.scala.introspect.BeanIntrospector$.getCtorParams(BeanIntrospector.scala:44)
    at com.fasterxml.jackson.module.scala.introspect.BeanIntrospector$.$anonfun$apply$1(BeanIntrospector.scala:58)
    at com.fasterxml.jackson.module.scala.introspect.BeanIntrospector$.$anonfun$apply$1$adapted(BeanIntrospector.scala:58)
    at scala.collection.TraversableLike.$anonfun$flatMap$1(TraversableLike.scala:240)
    at scala.collection.Iterator.foreach(Iterator.scala:937)
    at scala.collection.Iterator.foreach$(Iterator.scala:937)
    at scala.collection.AbstractIterator.foreach(Iterator.scala:1425)
    at scala.collection.IterableLike.foreach(IterableLike.scala:70)
    at scala.collection.IterableLike.foreach$(IterableLike.scala:69)
    at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
    at scala.collection.TraversableLike.flatMap(TraversableLike.scala:240)
    at scala.collection.TraversableLike.flatMap$(TraversableLike.scala:237)
    at scala.collection.AbstractTraversable.flatMap(Traversable.scala:104)
    at com.fasterxml.jackson.module.scala.introspect.BeanIntrospector$.findConstructorParam$1(BeanIntrospector.scala:58)
    at com.fasterxml.jackson.module.scala.introspect.BeanIntrospector$.$anonfun$apply$19(BeanIntrospector.scala:176)
    at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:233)
    at scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:32)
    at scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:29)
    at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:194)
    at scala.collection.TraversableLike.map(TraversableLike.scala:233)
    at scala.collection.TraversableLike.map$(TraversableLike.scala:226)
    at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:194)
    at com.fasterxml.jackson.module.scala.introspect.BeanIntrospector$.$anonfun$apply$14(BeanIntrospector.scala:170)
    at com.fasterxml.jackson.module.scala.introspect.BeanIntrospector$.$anonfun$apply$14$adapted(BeanIntrospector.scala:169)
    at scala.collection.TraversableLike.$anonfun$flatMap$1(TraversableLike.scala:240)
    at scala.collection.immutable.List.foreach(List.scala:388)
    at scala.collection.TraversableLike.flatMap(TraversableLike.scala:240)
    at scala.collection.TraversableLike.flatMap$(TraversableLike.scala:237)
    at scala.collection.immutable.List.flatMap(List.scala:351)
    at com.fasterxml.jackson.module.scala.introspect.BeanIntrospector$.apply(BeanIntrospector.scala:169)
    at com.fasterxml.jackson.module.scala.introspect.ScalaAnnotationIntrospector$._descriptorFor(ScalaAnnotationIntrospectorModule.scala:22)
    at com.fasterxml.jackson.module.scala.introspect.ScalaAnnotationIntrospector$.fieldName(ScalaAnnotationIntrospectorModule.scala:30)
    at com.fasterxml.jackson.module.scala.introspect.ScalaAnnotationIntrospector$.findImplicitPropertyName(ScalaAnnotationIntrospectorModule.scala:78)
    at com.fasterxml.jackson.databind.introspect.AnnotationIntrospectorPair.findImplicitPropertyName(AnnotationIntrospectorPair.java:467)
    at com.fasterxml.jackson.databind.introspect.POJOPropertiesCollector._addFields(POJOPropertiesCollector.java:351)
    at com.fasterxml.jackson.databind.introspect.POJOPropertiesCollector.collectAll(POJOPropertiesCollector.java:283)
    at com.fasterxml.jackson.databind.introspect.POJOPropertiesCollector.getJsonValueMethod(POJOPropertiesCollector.java:169)
    at com.fasterxml.jackson.databind.introspect.BasicBeanDescription.findJsonValueMethod(BasicBeanDescription.java:223)
    at com.fasterxml.jackson.databind.ser.BasicSerializerFactory.findSerializerByAnnotations(BasicSerializerFactory.java:348)
    at com.fasterxml.jackson.databind.ser.BeanSerializerFactory._createSerializer2(BeanSerializerFactory.java:210)
    at com.fasterxml.jackson.databind.ser.BeanSerializerFactory.createSerializer(BeanSerializerFactory.java:153)
    at com.fasterxml.jackson.databind.SerializerProvider._createUntypedSerializer(SerializerProvider.java:1203)
    at com.fasterxml.jackson.databind.SerializerProvider._createAndCacheUntypedSerializer(SerializerProvider.java:1157)
    at com.fasterxml.jackson.databind.SerializerProvider.findValueSerializer(SerializerProvider.java:481)
    at com.fasterxml.jackson.databind.SerializerProvider.findTypedValueSerializer(SerializerProvider.java:679)
    at com.fasterxml.jackson.databind.ser.DefaultSerializerProvider.serializeValue(DefaultSerializerProvider.java:107)
    at com.fasterxml.jackson.databind.ObjectMapper._configAndWriteValue(ObjectMapper.java:3559)
    at com.fasterxml.jackson.databind.ObjectMapper.writeValueAsString(ObjectMapper.java:2927)
    at org.apache.spark.rdd.RDDOperationScope.toJson(RDDOperationScope.scala:52)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:142)
    at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
    at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
    at org.apache.spark.sql.execution.SparkPlan.getByteArrayRdd(SparkPlan.scala:247)
    at org.apache.spark.sql.execution.SparkPlan.executeCollect(SparkPlan.scala:296)
    at org.apache.spark.sql.Dataset.$anonfun$count$1(Dataset.scala:2831)
    at org.apache.spark.sql.Dataset.$anonfun$count$1$adapted(Dataset.scala:2830)
    at org.apache.spark.sql.Dataset.$anonfun$withAction$2(Dataset.scala:3365)
    at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:78)
    at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:125)
    at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:73)
    at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3365)
    at org.apache.spark.sql.Dataset.count(Dataset.scala:2830)
    at Transform$.main(Transform.scala:29)
    at Transform.main(Transform.scala)

Spark Dependencies Used

<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.12</artifactId>
    <version>2.4.0</version>
</dependency>

<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-sql_2.12</artifactId>
    <version>2.4.0</version>
</dependency>

I'm trying to figure out what could be going wrong here.

Tags: scala, apache-spark, apache-spark-sql, dataset

Solution

Fixed it by updating the Spark version to 2.4.5 and the protobuf-java version to 3.11.4.
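For reference, a minimal sketch of what the updated pom.xml dependencies might look like after this fix. The Spark and protobuf-java versions come from the answer above; the `com.google.protobuf` groupId for protobuf-java is an assumption about the build (it is the standard coordinate for that artifact), and your project may instead need to pin it via dependencyManagement to override a transitive version.

```xml
<!-- Spark upgraded from 2.4.0 to 2.4.5 -->
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.12</artifactId>
    <version>2.4.5</version>
</dependency>

<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-sql_2.12</artifactId>
    <version>2.4.5</version>
</dependency>

<!-- protobuf-java pinned to 3.11.4 (groupId assumed) -->
<dependency>
    <groupId>com.google.protobuf</groupId>
    <artifactId>protobuf-java</artifactId>
    <version>3.11.4</version>
</dependency>
```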

