apache-spark - StringIndexOutOfBoundsException when calling sc.textFile("...").take(1)[0]
Problem description
I am trying to read a file with the textFile() method. However, when I then call take() on the result, a StringIndexOutOfBoundsException is raised. The file definitely exists.
schema_string = sc.textFile(schema_location).take(1)[0]
The error message I receive is below.
File "/home/spark-current/python/lib/pyspark.zip/pyspark/rdd.py", line 1376, in first
File "/home/spark-current/python/lib/pyspark.zip/pyspark/rdd.py", line 1325, in take
File "/home/spark-current/python/lib/pyspark.zip/pyspark/rdd.py", line 389, in getNumPartitions
File "/home/spark-current/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1257, in __call__
File "/home/spark-current/python/lib/pyspark.zip/pyspark/sql/utils.py", line 63, in deco
File "/home/spark-current/python/lib/py4j-0.10.7-src.zip/py4j/protocol.py", line 328, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o136.partitions.
: org.apache.hadoop.fs.azure.AzureException: java.lang.StringIndexOutOfBoundsException: String index out of range: 7
at org.apache.hadoop.fs.azure.AzureNativeFileSystemStore.createAzureStorageSession(AzureNativeFileSystemStore.java:942)
at org.apache.hadoop.fs.azure.AzureNativeFileSystemStore.initialize(AzureNativeFileSystemStore.java:439)
at org.apache.hadoop.fs.azure.NativeAzureFileSystem.initialize(NativeAzureFileSystem.java:1174)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2812)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:100)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2849)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2831)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:389)
at org.apache.hadoop.fs.Path.getFileSystem(Path.java:356)
at org.apache.hadoop.mapred.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:265)
at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:236)
at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:322)
at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:200)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:253)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:251)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:251)
at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:46)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:253)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:251)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:251)
at org.apache.spark.api.java.JavaRDDLike$class.partitions(JavaRDDLike.scala:61)
at org.apache.spark.api.java.AbstractJavaRDDLike.partitions(JavaRDDLike.scala:45)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:282)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:238)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.StringIndexOutOfBoundsException: String index out of range: 7
Solution
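The stack trace shows the exception is thrown inside AzureNativeFileSystemStore.createAzureStorageSession, i.e. while the wasb:// filesystem is being initialized, before any file data is read. That points at the storage URI or the fs.azure.* account configuration rather than the file itself; "String index out of range: 7" hints that some component the parser expects (for example the URI authority) is shorter than assumed. As a first diagnostic step, it can help to confirm the URI has the shape Hadoop's Azure connector expects, container@account.blob.core.windows.net. The helper below is a hypothetical sanity check written for this post (the function name and the exact checks are my own, not part of any Spark or Hadoop API):

```python
from urllib.parse import urlparse

def check_wasb_uri(uri):
    """Hypothetical sanity check for a wasb/wasbs storage URI.

    Verifies the general shape wasb[s]://container@account.blob.core.windows.net/path
    without contacting Azure. Returns "ok" or a short description of the problem.
    """
    parsed = urlparse(uri)
    if parsed.scheme not in ("wasb", "wasbs"):
        return "unexpected scheme: %r" % parsed.scheme
    # The authority must contain 'container@host'.
    if "@" not in parsed.netloc:
        return "authority %r is missing the 'container@' prefix" % parsed.netloc
    container, _, host = parsed.netloc.partition("@")
    if not container or not host.endswith(".blob.core.windows.net"):
        return "authority should look like container@<account>.blob.core.windows.net"
    return "ok"

print(check_wasb_uri("wasbs://mycontainer@myaccount.blob.core.windows.net/schemas/a.json"))
print(check_wasb_uri("wasbs://myaccount.blob.core.windows.net/a.json"))
```

If schema_location passes a check like this, the next thing worth inspecting would be the account-key setting in the Hadoop configuration (the fs.azure.account.key.&lt;account&gt;.blob.core.windows.net property), since createAzureStorageSession also parses that value.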