apache-spark - SparkHadoopWriter 在 UserProvider 出现 NPE 失败
问题描述
我正在使用 Spark 将数据写入 Hbase,我可以很好地读取数据,但写入失败并出现以下异常。我发现通过添加 *site.xml 和 hbase JAR 解决了类似的问题。但这对我不起作用。我正在尝试从一个表中读取数据,将数据写入另一个表。我可以很好地读取数据,但在写入时出现异常。
JavaPairRDD<ImmutableBytesWritable, Put> tablePuts = hBaseRDD.mapToPair(new PairFunction<Tuple2<ImmutableBytesWritable, Result>, ImmutableBytesWritable, Put>() {
@Override
public Tuple2<ImmutableBytesWritable, Put> call(Tuple2<ImmutableBytesWritable, Result> results) throws Exception {
byte[] accountId = results._2().getValue(Bytes.toBytes(COLFAMILY), Bytes.toBytes("accountId"));
String rowKey = new String(results._2().getRow();
String accountId2 = (Bytes.toString(accountId));
String prefix = getMd5Hash(rowKey);
String newrowKey = prefix + rowKey;
Put put = new Put( Bytes.toBytes(newrowKey) );
put.addColumn(Bytes.toBytes("def"), Bytes.toBytes("accountId"), accountId);
}
});
Job newAPIJobConfiguration = Job.getInstance(conf);
newAPIJobConfiguration.getConfiguration().set(TableOutputFormat.OUTPUT_TABLE, OUT_TABLE_NAME);
newAPIJobConfiguration.setOutputFormatClass(org.apache.hadoop.hbase.mapreduce.TableOutputFormat.class);
newAPIJobConfiguration.setOutputKeyClass(org.apache.hadoop.hbase.io.ImmutableBytesWritable.class);
newAPIJobConfiguration.setOutputValueClass(org.apache.hadoop.io.Writable.class);
tablePuts.saveAsNewAPIHadoopDataset(newAPIJobConfiguration.getConfiguration());
org.apache.hadoop.hbase.client.ConnectionFactory.createConnection(ConnectionFactory.java 的 org.apache.hadoop.hbase.security.UserProvider.instantiate(UserProvider.java:123) 的线程“主”java.lang.NullPointerException 中的异常:214) org.apache.hadoop.hbase.client.ConnectionFactory.createConnection(ConnectionFactory.java:119) org.apache.hadoop.hbase.mapreduce.TableOutputFormat.checkOutputSpecs(TableOutputFormat.java:177) org.apache。 spark.internal.io.HadoopMapReduceWriteConfigUtil.assertConf(SparkHadoopWriter.scala:387) at org.apache.spark.internal.io.SparkHadoopWriter$.write(SparkHadoopWriter.scala:71) at org.apache.spark.rdd.PairRDDFunctions$$ anonfun$saveAsNewAPIHadoopDataset$1.apply$mcV$sp(PairRDDFunctions.scala:1083) 在 org.apache.spark.rdd。PairRDDFunctions$$anonfun$saveAsNewAPIHadoopDataset$1.apply(PairRDDFunctions.scala:1081) at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopDataset$1.apply(PairRDDFunctions.scala:1081) at org.apache.spark.rdd。 RDDOperationScope$.withScope(RDDOperationScope.scala:151) 在 org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112) 在 org.apache.spark.rdd.RDD.withScope(RDD.scala:363)在 org.apache.spark.rdd.PairRDDFunctions.saveAsNewAPIHadoopDataset(PairRDDFunctions.scala:1081) 在 org.apache.spark.api.java.JavaPairRDD.saveAsNewAPIHadoopDataset(JavaPairRDD.scala:831) 在 com.voicebase.etl.s3tohbase.HbaseScan2 .main(HbaseScan2.java:148) 在 sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 在 sun.reflect.NativeMethodAccessorImpl。invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.apache.spark.deploy.JavaMainApplication .start(SparkApplication.scala:52) at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:879) at org.apache.spark.deploy.SparkSubmit $.doRunMain$1(SparkSubmit.scala:197) at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:227) at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:136 ) 在 org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)reflect.Method.invoke(Method.java:498) at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52) at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy $SparkSubmit$$runMain(SparkSubmit.scala:879) at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:197) at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala :227) 在 org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:136) 在 org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)reflect.Method.invoke(Method.java:498) at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52) at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy $SparkSubmit$$runMain(SparkSubmit.scala:879) at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:197) at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala :227) 在 org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:136) 在 org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)doRunMain$1(SparkSubmit.scala:197) at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:227) at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:136) at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)doRunMain$1(SparkSubmit.scala:197) at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:227) at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:136) at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
解决方案
推荐阅读
- mysql - 带有本地 mysql 主服务器的 azure mysql 数据库复制失败并出现 2003 错误
- c# - Entity Framework 6 中的 ScriptIgnore 属性破坏了外键属性
- javascript - Codeigniter Ajax 调用 404
- javascript - 将字母宽度设置为画布
- machine-learning - 从 StandardScaler 转移到 MinMaxScaler 会导致问题
- autodesk-forge - 上传较大的 IFS 文件时,Forge 服务真的很慢
- python - 如何即时将 CSV 文件写入 zip
- django - Django 管理员更改列表视图标题
- pdf - 以编程方式编辑 PDF 文件页面上的文本和图像
- azure - 针对不同客户的 Azure 认知服务,可以使用一种模型吗?