python - Running a job on MapReduce fails with error code 1
Problem description
It runs perfectly locally, but not on MapReduce. Is it because I use some third-party libraries in reducer.py?
In the GCP Console shell, I launch the MapReduce job with this command:
gcloud dataproc jobs submit hadoop --cluster cc-1 --region=us-central1 --jar file:///usr/lib/hadoop-mapreduce/hadoop-streaming.jar --files=mapper.py,reducer.py -- -mapper "mapper.py" -reducer "reducer.py" -input gs://2018weather/2018.csv -output gs://2018weather/output-streaming
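One detail worth checking with this invocation: because the scripts are passed as bare file names (`-mapper "mapper.py"`), each worker execs them directly, so they need a shebang line and executable permission (alternatively, the flags can be given as `-mapper "python mapper.py"`). A minimal mapper satisfying the streaming contract might look like this sketch — the CSV column layout here is an assumption, not the author's actual mapper:

```python
#!/usr/bin/env python
# mapper.py sketch: emit one tab-separated key/value pair per CSV row.
# The column layout (station id first, reading last) is hypothetical.
import sys

def map_line(line):
    """Turn one CSV row into a "key<TAB>value" string."""
    fields = line.rstrip("\n").split(",")
    return "%s\t%s" % (fields[0], fields[-1])

if __name__ == "__main__":
    for raw in sys.stdin:
        print(map_line(raw))
```

Without the shebang (or without `chmod +x mapper.py reducer.py` before submitting), the worker's attempt to exec the script fails and streaming reports the generic "subprocess failed with code 1".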
Here is the error message:
packageJobJar: [] [/usr/lib/hadoop-mapreduce/hadoop-streaming-2.9.2.jar] /tmp/streamjob3637931789919464651.jar tmpDir=null
19/11/22 00:58:35 INFO client.RMProxy: Connecting to ResourceManager at cc-1-m/10.128.0.21:8032
19/11/22 00:58:35 INFO client.AHSProxy: Connecting to Application History server at cc-1-m/10.128.0.21:10200
19/11/22 00:58:36 INFO client.RMProxy: Connecting to ResourceManager at cc-1-m/10.128.0.21:8032
19/11/22 00:58:36 INFO client.AHSProxy: Connecting to Application History server at cc-1-m/10.128.0.21:10200
19/11/22 00:58:38 INFO mapred.FileInputFormat: Total input files to process : 1
19/11/22 00:58:38 INFO mapreduce.JobSubmitter: number of splits:42
19/11/22 00:58:38 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1574381669902_0002
19/11/22 00:58:39 INFO impl.YarnClientImpl: Submitted application application_1574381669902_0002
19/11/22 00:58:39 INFO mapreduce.Job: The url to track the job: http://cc-1-m:8088/proxy/application_1574381669902_0002/
19/11/22 00:58:39 INFO mapreduce.Job: Running job: job_1574381669902_0002
19/11/22 00:58:49 INFO mapreduce.Job: Job job_1574381669902_0002 running in uber mode : false
19/11/22 00:58:49 INFO mapreduce.Job: map 0% reduce 0%
19/11/22 00:59:07 INFO mapreduce.Job: map 5% reduce 0%
19/11/22 00:59:13 INFO mapreduce.Job: map 12% reduce 0%
19/11/22 00:59:15 INFO mapreduce.Job: map 31% reduce 0%
19/11/22 00:59:16 INFO mapreduce.Job: map 33% reduce 0%
19/11/22 00:59:24 INFO mapreduce.Job: map 38% reduce 0%
19/11/22 00:59:35 INFO mapreduce.Job: map 45% reduce 0%
19/11/22 00:59:36 INFO mapreduce.Job: map 50% reduce 0%
19/11/22 00:59:37 INFO mapreduce.Job: map 60% reduce 0%
19/11/22 00:59:38 INFO mapreduce.Job: map 67% reduce 0%
19/11/22 00:59:40 INFO mapreduce.Job: map 71% reduce 0%
19/11/22 00:59:56 INFO mapreduce.Job: map 79% reduce 0%
19/11/22 00:59:58 INFO mapreduce.Job: map 83% reduce 0%
19/11/22 00:59:59 INFO mapreduce.Job: map 90% reduce 0%
19/11/22 01:00:00 INFO mapreduce.Job: map 100% reduce 0%
19/11/22 01:00:09 INFO mapreduce.Job: Task Id : attempt_1574381669902_0002_r_000000_0, Status : FAILED
Error: java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1
at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:325)
at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:538)
at org.apache.hadoop.streaming.PipeReducer.close(PipeReducer.java:134)
at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:453)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:392)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:177)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1893)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:171)
19/11/22 01:00:17 INFO mapreduce.Job: Task Id : attempt_1574381669902_0002_r_000000_1, Status : FAILED
Error: java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1
at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:325)
at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:538)
at org.apache.hadoop.streaming.PipeReducer.close(PipeReducer.java:134)
at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:453)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:392)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:177)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1893)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:171)
19/11/22 01:00:25 INFO mapreduce.Job: Task Id : attempt_1574381669902_0002_r_000000_2, Status : FAILED
Error: java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1
at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:325)
at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:538)
at org.apache.hadoop.streaming.PipeReducer.close(PipeReducer.java:134)
at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:453)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:392)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:177)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1893)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:171)
19/11/22 01:00:35 INFO mapreduce.Job: map 100% reduce 100%
19/11/22 01:00:35 INFO mapreduce.Job: Job job_1574381669902_0002 failed with state FAILED due to: Task failed task_1574381669902_0002_r_000000
Job failed as tasks failed. failedMaps:0 failedReduces:1
19/11/22 01:00:35 INFO mapreduce.Job: Counters: 43
File System Counters
FILE: Number of bytes read=0
FILE: Number of bytes written=8927219
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
GS: Number of bytes read=1171999949
GS: Number of bytes written=0
GS: Number of read operations=0
GS: Number of large read operations=0
GS: Number of write operations=0
HDFS: Number of bytes read=3234
HDFS: Number of bytes written=0
HDFS: Number of read operations=42
HDFS: Number of large read operations=0
HDFS: Number of write operations=0
Job Counters
Failed reduce tasks=4
Killed map tasks=2
Launched map tasks=42
Launched reduce tasks=4
Rack-local map tasks=42
Total time spent by all maps in occupied slots (ms)=3381160
Total time spent by all reduces in occupied slots (ms)=197616
Total time spent by all map tasks (ms)=845290
Total time spent by all reduce tasks (ms)=24702
Total vcore-milliseconds taken by all map tasks=845290
Total vcore-milliseconds taken by all reduce tasks=24702
Total megabyte-milliseconds taken by all map tasks=865576960
Total megabyte-milliseconds taken by all reduce tasks=50589696
Map-Reduce Framework
Map input records=33448213
Map output records=935
Map output bytes=28541
Map output materialized bytes=30663
Input split bytes=3234
Combine input records=0
Spilled Records=935
Failed Shuffles=0
Merged Map outputs=0
GC time elapsed (ms)=16391
CPU time spent (ms)=63120
Physical memory (bytes) snapshot=18240372736
Virtual memory (bytes) snapshot=107584892928
Total committed heap usage (bytes)=13746216960
File Input Format Counters
Bytes Read=1171999949
19/11/22 01:00:35 ERROR streaming.StreamJob: Job not successful!
Streaming Command Failed!
Solution
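The stack trace only says the reducer process exited with status 1; the actual Python traceback lands in the YARN task logs (reachable via the tracking URL above), not in this driver output. One way to make the cause visible there is to catch any exception at the top level of the reducer and print the traceback to stderr before exiting. A minimal sketch, assuming a sum-per-key reducer over the sorted "key<TAB>value" stream — the aggregation logic is hypothetical, not the author's reducer.py:

```python
#!/usr/bin/env python
# reducer.py sketch: sum values per key from the sorted stream that
# Hadoop streaming feeds on stdin. The schema here is an assumption.
import sys
import traceback

def reduce_stream(lines):
    """Yield "key<TAB>total" for each run of equal keys (input sorted by key)."""
    current_key, total = None, 0
    for line in lines:
        key, _, value = line.rstrip("\n").partition("\t")
        if key != current_key:
            if current_key is not None:
                yield "%s\t%d" % (current_key, total)
            current_key, total = key, 0
        total += int(value)
    if current_key is not None:
        yield "%s\t%d" % (current_key, total)

if __name__ == "__main__":
    try:
        for out in reduce_stream(sys.stdin):
            print(out)
    except Exception:
        # Surface the real error in the task's stderr log
        # instead of a bare exit code 1.
        traceback.print_exc(file=sys.stderr)
        sys.exit(1)
```

If the traceback in the task log turns out to be an ImportError, that would confirm the author's suspicion: third-party libraries used by reducer.py must also be installed on the worker nodes (e.g. via a Dataproc initialization action), since `--files` only ships the scripts themselves.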