apache-spark - PySpark word count: sorting by value
Problem description
I'm learning PySpark and trying the code below. Can someone help me understand what is going wrong?
>>> pairs=data.flatMap(lambda x:x.split(' ')).map(lambda x:(x,1)).reduceByKey(lambda a,b: a+ b)
>>> pairs.collect()
[(u'one', 1), (u'ball', 4), (u'apple', 4), (u'two', 4), (u'three', 1)]
pairs=data.flatMap(lambda x:x.split(' ')).map(lambda x:(x,1)).reduceByKey(lambda a,b: a+ b).map(lambda a,b: (b,a)).sortByKey()
I'm trying to sort by value, but the code above gives me this error:
19/09/25 08:55:07 WARN TaskSetManager: Lost task 1.0 in stage 36.0 (TID 67, dbsld0107.uhc.com, executor 2): org.apache.spark.api.python.PythonException: Traceback (most recent call last):
File "/opt/mapr/tmp/hadoop-mapr/nm-local-dir/usercache/avenkat9/appcache/application_1565204780647_2728325/container_e07_1565204780647_2728325_01_000003/pyspark.zip/pyspark/worker.py", line 177, in main
process()
File "/opt/mapr/tmp/hadoop-mapr/nm-local-dir/usercache/avenkat9/appcache/application_1565204780647_2728325/container_e07_1565204780647_2728325_01_000003/pyspark.zip/pyspark/worker.py", line 172, in process
serializer.dump_stream(func(split_index, iterator), outfile)
File "/opt/mapr/spark/spark-2.2.1/python/pyspark/rdd.py", line 2423, in pipeline_func
File "/opt/mapr/spark/spark-2.2.1/python/pyspark/rdd.py", line 2423, in pipeline_func
File "/opt/mapr/spark/spark-2.2.1/python/pyspark/rdd.py", line 2423, in pipeline_func
File "/opt/mapr/spark/spark-2.2.1/python/pyspark/rdd.py", line 346, in func
File "/opt/mapr/spark/spark-2.2.1/python/pyspark/rdd.py", line 1041, in <lambda>
File "/opt/mapr/spark/spark-2.2.1/python/pyspark/rdd.py", line 1041, in <genexpr>
TypeError: <lambda>() takes exactly 2 arguments (1 given)
Solution
I think you're trying to sort by value. Try this:
data.flatMap(lambda x:x.split(' ')).map(lambda x:(x,1)).reduceByKey(lambda a,b: a+ b).sortBy(lambda a:a[1]).collect()
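If you'd rather fix your own code: map passes each (word, count) pair to the lambda as a single tuple argument, so lambda a,b: (b,a) raises the TypeError shown in the traceback. Index into the tuple instead: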
data.flatMap(lambda x:x.split(' ')).map(lambda x:(x,1)).reduceByKey(lambda a,b: a+ b).map(lambda a:(a[1],a[0])).sortByKey().collect()
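The arity problem can be reproduced without a cluster. The following is a minimal plain-Python sketch, not Spark code: it uses the built-in map and sorted as stand-ins for RDD.map and RDD.sortBy, and the names pairs, swap_two_args, swapped and sorted_by_count are made up for illustration. The sample list is the collect() output from the question. On Python 3 the TypeError wording differs from the Python 2 message in the traceback, but the cause is the same.

```python
# Stand-in for the (word, count) pairs returned by pairs.collect() in the question.
pairs = [("one", 1), ("ball", 4), ("apple", 4), ("two", 4), ("three", 1)]

# A two-parameter lambda fails, because map hands over each pair as ONE tuple.
swap_two_args = lambda a, b: (b, a)
try:
    list(map(swap_two_args, pairs))
    error_message = None
except TypeError as exc:
    error_message = str(exc)

# Fix 1: take a single tuple argument and index into it to swap key and value.
swapped = list(map(lambda kv: (kv[1], kv[0]), pairs))

# Fix 2: skip the swap and sort directly on the count (the idea behind sortBy).
sorted_by_count = sorted(pairs, key=lambda kv: kv[1])

print(error_message)
print(swapped)
print(sorted_by_count)
```

The second fix mirrors the sortBy(lambda a:a[1]) answer: the key function receives the whole pair and returns the element to sort on, so the pairs keep their (word, count) shape.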