Read a dataset line by line and convert each space-separated line to comma-separated in Scala

Problem description

I have a dataset with 200 rows, and each row contains 200 values. The values are separated by spaces. Here is a sample of the dataset (the first row).

-0.1100208269729097 0.1248460463105589 -0.01559138588255286 -0.01625839428292603 -0.05323888667281371 0.06722185430549973 -0.0490877148079949 -0.05039368886946847 0.0897270838973875 0.00754589058726465 -0.06693447805463611 -0.1193740974362337 -0.02214573804045866 0.02930806967704801 -0.009567144727872222 -0.02288991169653539 0.04256313697292451 -0.08190168271952417 0.008274133732539695 -0.02299227162395361 0.0111923018567119 -0.009872522389769637 0.06866110814693088 0.04622954799009332 0.05498202029091768 -0.06672541846259043 -0.05130079655965012 0.1107659505844031 0.07912810279475517 0.02246390669165305 -0.06997067603392053 -0.02069109953229961 -0.05191987832821615 -0.01971016519416264 -0.008691704006401698 -0.02963829527404451 0.02332929010677706 -0.1035585589634834 0.03801924036385142 -0.07035181096148016 -0.02460761051792025 0.05545479574143786 0.06632500394350074 -0.01693623441811409 -0.0202000922412099 0.0387732166529701 -0.06835009268170482 -0.06684471565316714 0.09737868086728406 -0.03776102176325794 -0.03087980353481784 -0.04630278791951752 -0.1129739647985331 0.09622849675187727 0.05975310144103099 -0.08083650075114446 0.05258346559791484 0.05583993856089118 -0.03916345795047688 -0.2981097687887527 0.04087798461219992 0.07153463501552468 0.07113045074135986 0.01717619972420815 -0.01893649865573213 -0.007503347735166889 0.06551854299072507 -0.005153581328393866 -0.08659840104899437 0.04864888731854276 0.08965801176651583 -0.004562179660153576 -0.1252787635844004 0.06896990208188783 -0.003925090827015415 -0.05755687748680104 -0.02544736897698906 0.02530385776038159 -0.125784848738536 0.07433650535349738 0.02153916317259382 0.04738213124034089 -0.03299623626264642 0.02073383160046674 -0.008966711746564809 0.04983292315200202 0.01974696673478601 -0.04419678420395467 -0.02442715323795661 -0.0694663145847256 0.1101497271416977 0.04200639135007367 -0.06082113335723243 -0.01473508072467703 0.01142600017146485 -0.03532257289246362 -0.02260329422449697 
0.05396810070565884 0.1581078158241939 -0.05426153505070038 -0.01534772560258162 -0.04461245038675606 -0.05082561044342486 0.003953621713155758 -0.09395992245069541 0.02029879424655968 0.09397373054431565 -0.01540603811173099 -0.00188325436669238 0.07341578917873427 -0.07930228379622654 -0.01519407550785842 0.01388266474816023 0.09152064522133056 0.0106446218365201 -0.2157572256227169 0.04804075039482639 0.01970079327929429 -0.04738197196862703 0.06770927522186629 0.1006260778362594 -0.06299061441376895 0.02961951153113571 0.01572783315493193 0.1349089347411493 -0.0242042239418958 -0.07337276266118564 -0.09620055007994345 0.04754719051788902 -0.04777964847293222 0.01477148963357754 0.06678924792453055 0.05579081171364433 -0.03405429131223387 0.03615588517175376 -0.1554971840439641 -0.04581567263300179 -0.07873107398807083 0.05966093431149457 -0.128446162280915 -0.05912532817875745 0.1194692701951161 0.1103496401807509 0.0153127716173752 0.01607453121383664 -0.07114032721360454 0.03276185612322021 0.1169776569257143 0.07706242373764424 0.04889932405415184 0.0008715101384050066 0.006894007893755344 0.04519320187367908 -0.001306669064508431 0.0291067296150834 -0.02697983215093226 -0.07374490898814057 -0.04408652590757124 0.118965444980577 0.08668199929217432 0.02704832616237655 0.01473294258443707 0.02049896556673346 -0.0569226246137925 -0.0120183686689177 -0.1007080842912528 0.03517628230997978 -0.2003177929062758 0.01491215547976228 0.04590546935765301 0.1670139443078561 -0.05992676476987346 0.07038240324837636 -0.003567431692839979 0.08197255057946093 -0.01384071718153512 0.01443837418022523 -0.0393556604031245 0.003264844777785919 -0.190455395258628 -0.09122702488367737 -0.007113243408323287 0.1221344569965773 -0.06583221256210335 0.002275841418885295 -0.02418590378253777 -0.02462843336523757 -0.1054326841702153 -0.009075125286585313 0.05233463322601897 -0.09944517224527978 0.08201627957443283 0.1144830692826725 -0.1488155291532296 0.001711351371442085 
0.06463339531524601 0.02089587578959802 -0.05699940762150812 0.01798950350182588 -0.01642350646709232

I tried to convert it to comma-separated data as follows. Here is my code:

import scala.io.Source

val bufferedSource1 = Source.fromFile(Path1 + name)
val lines1 : Iterator[String] = bufferedSource1.getLines()

val lines2 = lines1.toArray
println( lines2(0).toList )

The last line of code prints:

List(-, 0, ., 1, 1, 0, 0, 2, 0, 8, 2, 6, 9, 7, 2, 9, 0, 9, 7,  , 0, ., 1, 2, 4, 8, 4, 6, 0, 4, 6, 3, 1, 0, 5, 5, 8, 9,  , -, 0, ., 0, 1, 5, 5, 9, 1, 3, 8, 5, 8, 8, 2, 5, 5, 2, 8, 6,  , -, 0, ., 0, 1, 6, 2, 5, 8, 3, 9, 4, 2, 8, 2, 9, 2, 6, 0, 3,  , -, 0, ., 0, 5, 3, 2,.........

This gives me individual characters, but I want the whole line, split on spaces. How can I fix this?

Here is the rest of the conversion code:

 val data1 : Array[Array[Double]]  = lines2.flatMap{xz : String =>
  Seq (xz.replaceAll(" ", ",").split(",").map(_.toDouble) )
}.toArray

Tags: scala, io

Solution


 // Assumes an active SparkSession named `spark`, as in spark-shell
 import spark.implicits._

  val ds = List("-0.1100208269729097 0.1248460463105589 -0.01559138588255286 -0.01625839428292603 -0.05323888667281371 0.06722185430549973 -0.0490877148079949 -0.05039368886946847 0.0897270838973875 0.00754589058726465 -0.06693447805463611 -0.1193740974362337 -0.02214573804045866 0.02930806967704801 -0.009567144727872222 -0.02288991169653539 0.04256313697292451 -0.08190168271952417 0.008274133732539695 -0.02299227162395361 0.0111923018567119 -0.009872522389769637 0.06866110814693088 0.04622954799009332 0.05498202029091768 -0.06672541846259043 -0.05130079655965012 0.1107659505844031 0.07912810279475517 0.02246390669165305 -0.06997067603392053 -0.02069109953229961 -0.05191987832821615 -0.01971016519416264 ","-0.1100208269729097 0.1248460463105589 -0.01559138588255286 -0.01625839428292603 -0.05323888667281371 0.06722185430549973 -0.0490877148079949 -0.05039368886946847 0.0897270838973875 0.00754589058726465 -0.06693447805463611 -0.1193740974362337 -0.02214573804045866 0.02930806967704801 -0.009567144727872222 -0.02288991169653539 0.04256313697292451 -0.08190168271952417 0.008274133732539695 -0.02299227162395361 0.0111923018567119 -0.009872522389769637 0.06866110814693088 0.04622954799009332 0.05498202029091768 -0.06672541846259043 -0.05130079655965012 0.1107659505844031 0.07912810279475517 0.02246390669165305 -0.06997067603392053 -0.02069109953229961 -0.05191987832821615 -0.01971016519416264 ").toDS()

  ds.map(i=> i.split(" ").mkString(",")).show(false)



-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|-0.1100208269729097,0.1248460463105589,-0.01559138588255286,-0.01625839428292603,-0.05323888667281371,0.06722185430549973,-0.0490877148079949,-0.05039368886946847,0.0897270838973875,0.00754589058726465,-0.06693447805463611,-0.1193740974362337,-0.02214573804045866,0.02930806967704801,-0.009567144727872222,-0.02288991169653539,0.04256313697292451,-0.08190168271952417,0.008274133732539695,-0.02299227162395361,0.0111923018567119,-0.009872522389769637,0.06866110814693088,0.04622954799009332,0.05498202029091768,-0.06672541846259043,-0.05130079655965012,0.1107659505844031,0.07912810279475517,0.02246390669165305,-0.06997067603392053,-0.02069109953229961,-0.05191987832821615,-0.01971016519416264|
|-0.1100208269729097,0.1248460463105589,-0.01559138588255286,-0.01625839428292603,-0.05323888667281371,0.06722185430549973,-0.0490877148079949,-0.05039368886946847,0.0897270838973875,0.00754589058726465,-0.06693447805463611,-0.1193740974362337,-0.02214573804045866,0.02930806967704801,-0.009567144727872222,-0.02288991169653539,0.04256313697292451,-0.08190168271952417,0.008274133732539695,-0.02299227162395361,0.0111923018567119,-0.009872522389769637,0.06866110814693088,0.04622954799009332,0.05498202029091768,-0.06672541846259043,-0.05130079655965012,0.1107659505844031,0.07912810279475517,0.02246390669165305,-0.06997067603392053,-0.02069109953229961,-0.05191987832821615,-0.01971016519416264|
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
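
The Spark snippet above works on a Dataset, whereas the question reads a local file with scala.io.Source. The single characters in the question's output appear because calling toList on a String yields a List[Char]; splitting the line on whitespace gives the numeric tokens instead. A minimal sketch without Spark, assuming Scala 2.13 or later (needed for scala.util.Using). The object name SpaceToComma and its helper names are hypothetical, introduced here only for illustration:

```scala
import scala.io.Source
import scala.util.Using

// Hypothetical helper object; the names are illustrative, not from the original post.
object SpaceToComma {
  // Trim first so trailing blanks (present in the sample rows) do not
  // produce empty tokens, then split on runs of whitespace.
  def tokens(line: String): Array[String] = line.trim.split("\\s+")

  // One space-separated row -> one comma-separated row.
  def toCsvLine(line: String): String = tokens(line).mkString(",")

  // One space-separated row -> its numeric values.
  def toDoubles(line: String): Array[Double] = tokens(line).map(_.toDouble)

  // Read the whole file, skipping blank lines, and close it afterwards.
  def readMatrix(path: String): Array[Array[Double]] =
    Using.resource(Source.fromFile(path)) { src =>
      src.getLines().filter(_.trim.nonEmpty).map(toDoubles).toArray
    }
}
```

With the names from the question, SpaceToComma.toCsvLine(lines2(0)) would give the first row as a single comma-separated string, and SpaceToComma.readMatrix(Path1 + name) would give the whole file as an Array[Array[Double]].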
