Cannot read a pipe-delimited CSV

Problem description

Input data:

Ord_value|other_data
12345|u1=876435;u5=4356|4357|4358;u15=Mr. Noodles,n/a,Great Value;u16=0.77,4.92,7.96;u17=4,1,7;

Details of the u-variables:

u1  = order id  -- single value
u5  = pid       -- a list
u15 = name      -- a list
u16 = price     -- a list
u17 = quantity  -- a list

Expected output:

Ord_value|orderid|pid|name|price|quantity
12345|876435|4356|Mr. Noodles|0.77|4
12345|876435|4357|n/a|4.92|1
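
For reference, the delimiters nest three levels deep in each record: "|" separates Ord_value from the rest, ";" separates the uN=value fields, and each list value is itself "|"-separated (u5) or ","-separated (u15/u16/u17). Below is a small plain-Scala sketch of that decomposition, using the sample line above (the object name DelimiterDemo is only for illustration):

object DelimiterDemo extends App {
  // other_data portion of the sample record above
  val otherData = "u1=876435;u5=4356|4357|4358;u15=Mr. Noodles,n/a,Great Value;u16=0.77,4.92,7.96;u17=4,1,7;"

  // ";" separates the uN=value fields, "=" separates the key from its value
  val fields = otherData.split(";").filter(_.nonEmpty).map { f =>
    val Array(k, v) = f.split("=", 2)
    k -> v
  }.toMap

  println(fields("u5").split("\\|").toList)  // List(4356, 4357, 4358)
  println(fields("u15").split(",").toList)   // List(Mr. Noodles, n/a, Great Value)
}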

I tried to read the file using a semicolon as the delimiter:

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql
import org.apache.spark.SparkContext
import org.apache.spark.SparkConf
import org.apache.spark.sql.functions._

object example3 {
  def main(args: Array[String]): Unit = {
    System.setProperty("hadoop.home.dir", "C:\\hadoop\\")
    val conf = new SparkConf().setAppName("first_demo").setMaster("local[*]")
    val sc = new SparkContext(conf)
    val spark = SparkSession.builder().getOrCreate()
    spark.sparkContext.setLogLevel("Error")
    import spark.implicits._
    // val rdd1 = sc.textFile("file:///C://Users//User//Desktop//example3.txt")
    // rdd1.map(x => x.split(";")).foreach(println)
    spark.read.option("delimiter", ";").option("header", "true")
      .load("file:///C://Users//User//Desktop//example3.txt")
      .show()
  }
}

I am unable to read the file; it throws the error below. It seems to be a file with a complicated structure.

Caused by: java.io.IOException: Could not read footer for file: FileStatus{path=file:/C:/Users/User/Desktop/example3.txt; isDirectory=false; length=95; replication=0; blocksize=0; modification_time=0; access_time=0; owner=; group=; permission=rw-rw-rw-; isSymlink=false}
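
The "Could not read footer" error is most likely not caused by the file being complex: spark.read.load(...) with no explicit format falls back to Spark's default data source (Parquet, governed by spark.sql.sources.default), so Spark tries to interpret the text file as Parquet and fails while reading its footer. A minimal sketch of the same call with the format spelled out (same path and options as above):

// Reading as CSV explicitly avoids the Parquet footer error,
// although a ";" delimiter still will not give the desired columns.
val csvAttempt = spark.read
  .format("csv")
  .option("delimiter", ";")
  .option("header", "true")
  .load("file:///C://Users//User//Desktop//example3.txt")
csvAttempt.show(false)

Equivalently, spark.read.option(...).csv(path) sets the format implicitly, which is what the pipe-delimited attempt below does.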

Then I tried:

val df1=spark.read.option("delimiter","|").csv("file:///C://Users//User//Desktop//example3.txt")

+-----+-----------------+----+------------------------------------------------------------------+
|_c0  |_c1              |_c2 |_c3                                                               |
+-----+-----------------+----+------------------------------------------------------------------+
|12345|u1=876435;u5=4356|4357|4358;u15=Mr. Noodles,n/a,Great Value;u16=0.77,4.92,7.96;u17=4,1,7;|
+-----+-----------------+----+------------------------------------------------------------------+
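
The second attempt gets further, but the four columns above show the underlying problem: "|" is both the top-level field delimiter and the internal separator of the u5 pid list (4356|4357|4358), so a plain pipe-delimited CSV read splits the pid list across _c2 and _c3. One way around this (a sketch, assuming the same spark session and implicits as in the code above) is to read each line as plain text and split only on the first "|":

// Read lines intact; custom parsing will handle the nested delimiters.
val lines = spark.read.textFile("file:///C://Users//User//Desktop//example3.txt")

val split = lines
  .filter(!_.startsWith("Ord_value"))   // drop the header line
  .map { line =>
    val Array(ordValue, rest) = line.split("\\|", 2)   // split on the FIRST "|" only
    (ordValue, rest)
  }
split.show(false)

From here each "uN=value" field can be parsed individually, as sketched under the solution below.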

Tags: scala, apache-spark

Solution
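
One possible approach (a sketch under the assumptions stated here, not a verified answer): read each line as text, split off Ord_value at the first "|", parse the "uN=value" fields into a map, and emit one output row per position of the zipped lists. The object name Example3Solution is mine, the path is the one from the question, and the code assumes that u5, u15, u16 and u17 always contain the same number of items.

import org.apache.spark.sql.SparkSession

object Example3Solution {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("pipe_csv_demo").master("local[*]").getOrCreate()
    spark.sparkContext.setLogLevel("ERROR")
    import spark.implicits._

    val path = "file:///C://Users//User//Desktop//example3.txt"

    val rows = spark.read.textFile(path)
      .filter(l => l.trim.nonEmpty && !l.startsWith("Ord_value"))   // skip blanks and the header
      .flatMap { line =>
        // Ord_value is everything before the first "|"; the rest is the uN=value blob
        val Array(ordValue, rest) = line.split("\\|", 2)

        // ";" separates the uN=value fields, "=" separates key from value
        val fields = rest.split(";").filter(_.nonEmpty).map { f =>
          val Array(k, v) = f.split("=", 2)
          k -> v
        }.toMap

        val orderId = fields.getOrElse("u1", "")
        val pids    = fields.getOrElse("u5", "").split("\\|")
        val names   = fields.getOrElse("u15", "").split(",")
        val prices  = fields.getOrElse("u16", "").split(",")
        val qtys    = fields.getOrElse("u17", "").split(",")

        // one output row per position in the (assumed equal-length) lists
        pids.indices.map(i => (ordValue, orderId, pids(i), names(i), prices(i), qtys(i)))
      }
      .toDF("Ord_value", "orderid", "pid", "name", "price", "quantity")

    rows.show(false)
    spark.stop()
  }
}

On the sample line this produces three rows (pids 4356, 4357 and 4358) in the shape of the expected output shown in the question.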

