scala - Filter an RDD by a list of special characters
Problem Description
I want to generate statistics from a password dictionary, "data". The desired results:
data_pass_week.count()
data_pass_medium.count()
data_pass_hard.count()
I have to check the characters of each word in the dictionary. For example, I want to put into data_pass_medium every word that contains a special character, but it doesn't work. I think I have made several mistakes. Can you help me?
scala> val data = sc.textFile("folder.txt")
scala> val specialChars = List('*', '@', '&', '=', '#', '?', '!', '%', '+', '-', '<', '>')
scala> val data2 = data.filter(ligne => for (i <- 0 to specialChars.size) ligne.contains(specialChars(i)))
Result:
java.lang.IllegalArgumentException: Unsupported class file major version 55
at org.apache.xbean.asm6.ClassReader.<init>(ClassReader.java:166)
at org.apache.xbean.asm6.ClassReader.<init>(ClassReader.java:148)
at org.apache.xbean.asm6.ClassReader.<init>(ClassReader.java:136)
at org.apache.xbean.asm6.ClassReader.<init>(ClassReader.java:237)
at org.apache.spark.util.ClosureCleaner$.getClassReader(ClosureCleaner.scala:49)
at org.apache.spark.util.FieldAccessFinder$$anon$3$$anonfun$visitMethodInsn$2.apply(ClosureCleaner.scala:517)
at org.apache.spark.util.FieldAccessFinder$$anon$3$$anonfun$visitMethodInsn$2.apply(ClosureCleaner.scala:500)
at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:733)
at scala.collection.mutable.HashMap$$anon$1$$anonfun$foreach$2.apply(HashMap.scala:134)
at scala.collection.mutable.HashMap$$anon$1$$anonfun$foreach$2.apply(HashMap.scala:134)
at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:236)
at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:40)
at scala.collection.mutable.HashMap$$anon$1.foreach(HashMap.scala:134)
at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:732)
at org.apache.spark.util.FieldAccessFinder$$anon$3.visitMethodInsn(ClosureCleaner.scala:500)
at org.apache.xbean.asm6.ClassReader.readCode(ClassReader.java:2175)
at org.apache.xbean.asm6.ClassReader.readMethod(ClassReader.java:1238)
at org.apache.xbean.asm6.ClassReader.accept(ClassReader.java:631)
at org.apache.xbean.asm6.ClassReader.accept(ClassReader.java:355)
at org.apache.spark.util.ClosureCleaner$$anonfun$org$apache$spark$util$ClosureCleaner$$clean$14.apply(ClosureCleaner.scala:307)
at org.apache.spark.util.ClosureCleaner$$anonfun$org$apache$spark$util$ClosureCleaner$$clean$14.apply(ClosureCleaner.scala:306)
at scala.collection.immutable.List.foreach(List.scala:392)
at org.apache.spark.util.ClosureCleaner$.org$apache$spark$util$ClosureCleaner$$clean(ClosureCleaner.scala:306)
at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:162)
at org.apache.spark.SparkContext.clean(SparkContext.scala:2326)
at org.apache.spark.rdd.RDD$$anonfun$filter$1.apply(RDD.scala:388)
at org.apache.spark.rdd.RDD$$anonfun$filter$1.apply(RDD.scala:387)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:363)
at org.apache.spark.rdd.RDD.filter(RDD.scala:387)
at $anonfun$1.apply(<console>:27)
at $anonfun$1.apply(<console>:27)
at scala.collection.immutable.Range.foreach(Range.scala:160)
... 49 elided
Solution
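Two separate problems are at play here.

First, the exception has nothing to do with the filter logic itself. `Unsupported class file major version 55` means Spark's closure cleaner (which relies on a shaded ASM 6, visible as `org.apache.xbean.asm6` in the stack trace) is being asked to read class files compiled for Java 11, which ASM 6 cannot parse. Spark 2.x only supports Java 8, so either run the shell on a Java 8 JDK (e.g. by pointing `JAVA_HOME` at one) or move to Spark 3.x, which supports Java 11.

Second, even on a compatible JVM the predicate is wrong: `0 to specialChars.size` runs one index past the end of the list (it would need `until`), and a `for` loop without `yield` returns `Unit`, not the `Boolean` that `filter` expects. The idiomatic fix is `exists`. A minimal sketch of the medium case (the question does not define the weak/hard rules, so only the special-character rule is shown):

```scala
val data = sc.textFile("folder.txt")

val specialChars = List('*', '@', '&', '=', '#', '?', '!', '%', '+', '-', '<', '>')

// A word goes into data_pass_medium if it contains at least one special
// character. exists short-circuits on the first match, and there is no
// index arithmetic to get wrong.
val data_pass_medium = data.filter(ligne => specialChars.exists(c => ligne.contains(c)))

// Equivalent form, scanning each character of the line instead:
// val data_pass_medium = data.filter(ligne => ligne.exists(specialChars.contains))

data_pass_medium.count()
```

The weak and hard sets could be built the same way with their own predicates (e.g. negations or length checks), then counted with `count()` as in the question.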