How do I limit pyspark to ERROR-level logging by overriding the log4j.properties file on Cloud Dataproc?

Problem description

I'm on GCP and have been reading various posts about controlling pyspark logging through log4j.properties, but I couldn't get anything to work until I found a post about overriding the file. That sort of works, except that I now get an error about multiple log4j bindings on the classpath, like this:

[2019-09-23 20:38:48.495]Container exited with a non-zero exit code 134. Error file: prelaunch.err.
Last 4096 bytes of prelaunch.err :
/bin/bash: line 1: 23850 Aborted                 /usr/lib/jvm/java-8-openjdk-amd64/bin/java -server -Xmx38168m '-Dflogger.backend_factory=com.google.cloud.hadoop.repackaged.gcs.com.google.common.flogger.backend.log4j.Log4jBackendFactory#getInstance' -Djava.io.tmpdir=/hadoop/yarn/nm-local-dir/usercache/davidsc/appcache/application_1569259407961_0001/container_1569259407961_0001_01_000400/tmp '-Dspark.driver.port=33949' '-Dspark.rpc.message.maxSize=512' -Dspark.yarn.app.container.log.dir=/var/log/hadoop-yarn/userlogs/application_1569259407961_0001/container_1569259407961_0001_01_000400 -XX:OnOutOfMemoryError='kill %p' org.apache.spark.executor.CoarseGrainedExecutorBackend --driver-url spark://CoarseGrainedScheduler@davidsc-prod-m.c.unity-ads-ds-prd.internal:33949 --executor-id 368 --hostname davidsc-prod-w-0.c.unity-ads-ds-prd.internal --cores 8 --app-id application_1569259407961_0001 --user-class-path file:/hadoop/yarn/nm-local-dir/usercache/davidsc/appcache/application_1569259407961_0001/container_1569259407961_0001_01_000400/__app__.jar --user-class-path file:/hadoop/yarn/nm-local-dir/usercache/davidsc/appcache/application_1569259407961_0001/container_1569259407961_0001_01_000400/tensorflow-hadoop-1.6.0.jar > /var/log/hadoop-yarn/userlogs/application_1569259407961_0001/container_1569259407961_0001_01_000400/stdout 2> /var/log/hadoop-yarn/userlogs/application_1569259407961_0001/container_1569259407961_0001_01_000400/stderr
Last 4096 bytes of stderr :
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/lib/spark/jars/slf4j-log4j12-1.7.16.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/lib/hadoop/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]

I followed the link in the second-to-last log entry, but didn't find it very helpful.

I'm trying to reduce logging to errors only. Following the posts I found, I'm uploading a log4j.properties file with the following contents:

# Set everything to be logged to the console
log4j.rootCategory=ERROR, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c: %m%n

# Settings to quiet third party logs that are too verbose
log4j.logger.org.eclipse.jetty=ERROR
log4j.logger.org.eclipse.jetty.util.component.AbstractLifeCycle=ERROR
log4j.logger.org.apache.spark.repl.SparkIMain$exprTyper=ERROR
log4j.logger.org.apache.spark.repl.SparkILoop$SparkILoopInterpreter=ERROR

# Reduce verbosity for other spammy core classes.
log4j.logger.org.apache.spark=ERROR
log4j.logger.org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter=ERROR
log4j.logger.org.apache.hadoop.conf.Configuration.deprecation=ERROR
log4j.logger.org.spark-project.jetty.server.handler.ContextHandler=ERROR

# Spark 2.0 specific Spam
log4j.logger.org.spark_project.jetty.server.handler.ContextHandler=ERROR

# from https://stackoverflow.com/questions/40608412/how-can-set-the-default-spark-logging-level
log4j.logger.org.apache.spark.api.python.PythonGatewayServer=ERROR

I added that last entry about the Python API myself; maybe it introduces a conflicting logger? At the time, adding it seemed like a good idea.

In any case, what I upload is just a modified version of the file I found here:

/usr/lib/spark/conf/log4j.properties

which is a soft link to:

./etc/spark/conf.dist/log4j.properties

I copy it into place in my Cloud Dataproc cluster initialization. If I ssh into a worker and search for all the log4j.properties files, I find:

 find . 2>/dev/null | grep log4j.prop 2>/dev/null
./etc/pig/conf.dist/log4j.properties.template
./etc/pig/conf.dist/test-log4j.properties
./etc/hadoop/conf.empty/log4j.properties
./etc/zookeeper/conf.dist/rest/log4j.properties
./etc/zookeeper/conf.dist/log4j.properties
./etc/spark/conf.dist/log4j.properties.template
./etc/spark/conf.dist/log4j.properties
./ump-ltv-spark-log4j.properties.error

I can see that ./etc/spark/conf.dist/log4j.properties is indeed my new ERROR-level configuration.

I'm hoping I don't have to modify the ./etc/hadoop/conf.empty/log4j.properties file as well, which contains a lot more.
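The init-time copy described above could be sketched as a minimal Dataproc initialization action. This is a sketch under assumptions: the GCS bucket name is a placeholder, on a real node the target would be /etc/spark/conf.dist (the target of the /usr/lib/spark/conf symlink, per the `find` output above), and for a local dry run the script falls back to a temp directory.

```shell
#!/bin/bash
# Sketch of an initialization action that overrides Spark's log4j.properties.
# Assumptions: on a cluster node you would fetch your edited file from a GCS
# bucket (placeholder below) rather than writing it inline with a here-doc.
set -euo pipefail

# On a Dataproc node this would be /etc/spark/conf.dist; fall back to a
# temp directory so the script can also be dry-run locally.
SPARK_CONF_DIR="${SPARK_CONF_DIR:-$(mktemp -d)}"

# On a real cluster, roughly:
#   gsutil cp gs://YOUR_BUCKET/log4j.properties "${SPARK_CONF_DIR}/log4j.properties"
cat > "${SPARK_CONF_DIR}/log4j.properties" <<'EOF'
# Root logger: ERROR only, to the console
log4j.rootCategory=ERROR, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c: %m%n
EOF

echo "wrote ${SPARK_CONF_DIR}/log4j.properties"
```

The script only replaces the Spark copy; the Hadoop and Zookeeper copies listed above are left alone.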

Following another link, I can inspect my Spark launch command:

~$ SPARK_PRINT_LAUNCH_COMMAND=1 spark-shell
Spark Command: /usr/lib/jvm/java-8-openjdk-amd64/bin/java -cp /usr/lib/spark/conf/:/usr/lib/spark/jars/*:/etc/hadoop/conf/:/etc/hive/conf/:/usr/share/java/mysql.jar:/usr/local/share/google/dataproc/lib/* -Dscala.usejavacp=true -Dscala.usejavacp=true -Xmx106496m -Dflogger.backend_factory=com.google.cloud.hadoop.repackaged.gcs.com.google.common.flogger.backend.log4j.Log4jBackendFactory#getInstance org.apache.spark.deploy.SparkSubmit --class org.apache.spark.repl.Main --name Spark shell spark-shell

This makes me think GCP has already wired in a backend logger of its own? Maybe that's where these messages are coming from.
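An alternative that some posts suggest (an assumption here, not something verified on Dataproc in the question) is to leave the shared file alone and point the driver and executor JVMs at a specific log4j.properties through Spark's extraJavaOptions; the file path below is a placeholder:

```properties
# Hypothetical spark-defaults.conf fragment -- the path is a placeholder.
spark.driver.extraJavaOptions=-Dlog4j.configuration=file:/path/to/log4j.properties
spark.executor.extraJavaOptions=-Dlog4j.configuration=file:/path/to/log4j.properties
```

The same key/value pairs can presumably also be set per job, e.g. via the `--properties` flag of `gcloud dataproc jobs submit pyspark`, instead of cluster-wide.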

Tags: google-cloud-platform, pyspark, log4j, google-cloud-dataproc

Solution
