Filtering an RDD by a list of specific characters

Problem description

I want to generate statistics from a password dictionary, "data". The results I'm after:

data_pass_week.count()
data_pass_medium.count()
data_pass_hard.count()

I have to check the characters of each word in the dictionary. For example, I want to put all the words that contain special characters into data_pass_medium, but it doesn't work. I think I've made a lot of mistakes. Can you help me?

scala> val data = sc.textFile("folder.txt")
scala> val specialChars = List('*', '@', '&', '=', '#', '?', '!', '%', '+', '-', '<', '>')
scala> val data2 = data.filter(ligne => for (i <- 0 to specialChars.size) ligne.contains(specialChars(i)))

The result:

java.lang.IllegalArgumentException: Unsupported class file major version 55
  at org.apache.xbean.asm6.ClassReader.<init>(ClassReader.java:166)
  at org.apache.xbean.asm6.ClassReader.<init>(ClassReader.java:148)
  at org.apache.xbean.asm6.ClassReader.<init>(ClassReader.java:136)
  at org.apache.xbean.asm6.ClassReader.<init>(ClassReader.java:237)
  at org.apache.spark.util.ClosureCleaner$.getClassReader(ClosureCleaner.scala:49)
  at org.apache.spark.util.FieldAccessFinder$$anon$3$$anonfun$visitMethodInsn$2.apply(ClosureCleaner.scala:517)
  at org.apache.spark.util.FieldAccessFinder$$anon$3$$anonfun$visitMethodInsn$2.apply(ClosureCleaner.scala:500)
  at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:733)
  at scala.collection.mutable.HashMap$$anon$1$$anonfun$foreach$2.apply(HashMap.scala:134)
  at scala.collection.mutable.HashMap$$anon$1$$anonfun$foreach$2.apply(HashMap.scala:134)
  at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:236)
  at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:40)
  at scala.collection.mutable.HashMap$$anon$1.foreach(HashMap.scala:134)
  at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:732)
  at org.apache.spark.util.FieldAccessFinder$$anon$3.visitMethodInsn(ClosureCleaner.scala:500)
  at org.apache.xbean.asm6.ClassReader.readCode(ClassReader.java:2175)
  at org.apache.xbean.asm6.ClassReader.readMethod(ClassReader.java:1238)
  at org.apache.xbean.asm6.ClassReader.accept(ClassReader.java:631)
  at org.apache.xbean.asm6.ClassReader.accept(ClassReader.java:355)
  at org.apache.spark.util.ClosureCleaner$$anonfun$org$apache$spark$util$ClosureCleaner$$clean$14.apply(ClosureCleaner.scala:307)
  at org.apache.spark.util.ClosureCleaner$$anonfun$org$apache$spark$util$ClosureCleaner$$clean$14.apply(ClosureCleaner.scala:306)
  at scala.collection.immutable.List.foreach(List.scala:392)
  at org.apache.spark.util.ClosureCleaner$.org$apache$spark$util$ClosureCleaner$$clean(ClosureCleaner.scala:306)
  at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:162)
  at org.apache.spark.SparkContext.clean(SparkContext.scala:2326)
  at org.apache.spark.rdd.RDD$$anonfun$filter$1.apply(RDD.scala:388)
  at org.apache.spark.rdd.RDD$$anonfun$filter$1.apply(RDD.scala:387)
  at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
  at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
  at org.apache.spark.rdd.RDD.withScope(RDD.scala:363)
  at org.apache.spark.rdd.RDD.filter(RDD.scala:387)
  at $anonfun$1.apply(<console>:27)
  at $anonfun$1.apply(<console>:27)
  at scala.collection.immutable.Range.foreach(Range.scala:160)
  ... 49 elided

Tags: scala, apache-spark

Solution
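
The stack trace is not caused by the filter logic at all. "java.lang.IllegalArgumentException: Unsupported class file major version 55" typically means the Spark shell is running on Java 11 (whose class files carry major version 55), which the ASM 6 library used by Spark 2.x's ClosureCleaner cannot parse; Spark only gained Java 11 support in version 3.0. Running spark-shell under Java 8 (for example by pointing JAVA_HOME at a Java 8 installation before launching it), or upgrading to Spark 3.x, should make this exception go away.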


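Separately, the filter predicate has two bugs of its own: a for loop without yield evaluates to Unit rather than the Boolean that filter expects, and 0 to specialChars.size runs one index past the end of the list (it would need until). Both problems disappear if the check is written with exists, which returns a Boolean and short-circuits on the first match. A minimal sketch, reusing the names from the question:

scala> val data = sc.textFile("folder.txt")
scala> val specialChars = List('*', '@', '&', '=', '#', '?', '!', '%', '+', '-', '<', '>')
scala> // keep every word that contains at least one special character
scala> val data_pass_medium = data.filter(ligne => specialChars.exists(c => ligne.contains(c)))
scala> data_pass_medium.count()

The words with no special character at all are simply the complement, data.filter(ligne => !specialChars.exists(c => ligne.contains(c))). The question doesn't state the exact rules for the data_pass_week and data_pass_hard buckets, but they would follow the same filter pattern. An equivalent formulation scans each word once instead of once per special character, ligne.exists(specialChars.contains), which pays off if the character list is turned into a Set.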