apache-flink - Flink shaded Hadoop S3 filesystems still require hdfs-default and hdfs-site config paths
Question
I am trying to use S3 for my state backend with Flink 1.6.0.
flink-conf.yaml
state.backend: filesystem
state.checkpoints.dir: s3://***/flink-checkpoints
state.savepoints.dir: s3://***/flink-savepoints
s3.access-key: *******
s3.secret-key: *******
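I have moved flink-s3-fs-hadoop-1.6.0.jar into the lib directory. The documentation does not mention any need for Hadoop configuration files with this approach. However, I am hitting the error below, and the log complains about missing Hadoop configuration paths.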
2018-08-24 23:25:17,829 INFO org.apache.flink.streaming.runtime.tasks.StreamTask - State backend is set to heap memory (checkpoints to filesystem "s3://***/flink-checkpoints")
2018-08-24 23:25:17,831 DEBUG org.apache.flink.fs.s3hadoop.shaded.org.apache.flink.runtime.fs.hdfs.AbstractFileSystemFactory - Creating Hadoop file system (backed by Hadoop s3a file system)
2018-08-24 23:25:17,831 DEBUG org.apache.flink.fs.s3hadoop.shaded.org.apache.flink.runtime.fs.hdfs.AbstractFileSystemFactory - Loading Hadoop configuration for Hadoop s3a file system
2018-08-24 23:25:17,872 DEBUG org.apache.flink.fs.s3hadoop.shaded.org.apache.flink.runtime.util.HadoopUtils - Cannot find hdfs-default configuration-file path in Flink config.
2018-08-24 23:25:17,873 DEBUG org.apache.flink.fs.s3hadoop.shaded.org.apache.flink.runtime.util.HadoopUtils - Cannot find hdfs-site configuration-file path in Flink config.
2018-08-24 23:25:17,873 DEBUG org.apache.flink.fs.s3hadoop.shaded.org.apache.flink.runtime.util.HadoopUtils - Could not find Hadoop configuration via any of the supported methods (Flink configuration, environment variables).
2018-08-24 23:25:17,878 INFO org.apache.flink.runtime.taskmanager.Task - Source: Custom Source -> Map -> Sink: Print to Std. Out (1/1) (ee0eeb00ea0f01043d90f6b8d3c0cc2e) switched from RUNNING to FAILED.
javax.xml.parsers.FactoryConfigurationError: Provider for class javax.xml.parsers.DocumentBuilderFactory cannot be created
at javax.xml.parsers.FactoryFinder.findServiceProvider(FactoryFinder.java:311)
at javax.xml.parsers.FactoryFinder.find(FactoryFinder.java:267)
at javax.xml.parsers.DocumentBuilderFactory.newInstance(DocumentBuilderFactory.java:120)
at org.apache.flink.fs.s3hadoop.shaded.org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:2565)
at org.apache.flink.fs.s3hadoop.shaded.org.apache.hadoop.conf.Configuration.loadResources(Configuration.java:2541)
at org.apache.flink.fs.s3hadoop.shaded.org.apache.hadoop.conf.Configuration.getProps(Configuration.java:2424)
at org.apache.flink.fs.s3hadoop.shaded.org.apache.hadoop.conf.Configuration.set(Configuration.java:1149)
at org.apache.flink.fs.s3hadoop.shaded.org.apache.hadoop.conf.Configuration.set(Configuration.java:1121)
at org.apache.flink.fs.s3hadoop.shaded.org.apache.flink.runtime.fs.hdfs.HadoopConfigLoader.loadHadoopConfigFromFlink(HadoopConfigLoader.java:101)
at org.apache.flink.fs.s3hadoop.shaded.org.apache.flink.runtime.fs.hdfs.HadoopConfigLoader.getOrLoadHadoopConfig(HadoopConfigLoader.java:80)
at org.apache.flink.fs.s3hadoop.shaded.org.apache.flink.runtime.fs.hdfs.AbstractFileSystemFactory.create(AbstractFileSystemFactory.java:55)
at org.apache.flink.core.fs.FileSystem.getUnguardedFileSystem(FileSystem.java:395)
at org.apache.flink.core.fs.FileSystem.get(FileSystem.java:318)
at org.apache.flink.core.fs.Path.getFileSystem(Path.java:298)
at org.apache.flink.runtime.state.filesystem.FsCheckpointStorage.<init>(FsCheckpointStorage.java:61)
at org.apache.flink.runtime.state.filesystem.FsStateBackend.createCheckpointStorage(FsStateBackend.java:443)
at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:257)
at org.apache.flink.runtime.taskmanager.Task.run(Task.java:711)
at java.lang.Thread.run(Thread.java:748)
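Am I missing something here? Any help is appreciated.
Answer
I had messed up my dependencies, and that is what caused this unrelated exception. I was trying out the Bucketing and Rolling Sink connectors, which need Hadoop dependencies. I first added them in Maven's provided scope, but then could not run the job from IntelliJ IDEA, so I switched them to compile scope and left them that way. As a result they were packaged into the artifact jar, and that caused this problem.
Lesson learned: never add Hadoop dependencies in the default (compile) scope. IntelliJ has an option in the run configuration to include dependencies declared in provided scope.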
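The scope change described in the answer can be sketched as a Maven dependency fragment. This is only an illustration: the answer does not say which Hadoop artifacts were involved, so `hadoop-common` and the `${hadoop.version}` property are assumptions here, not the author's actual pom.xml.

```xml
<!-- Hypothetical pom.xml fragment; the artifact and the version property are assumptions. -->
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-common</artifactId>
  <version>${hadoop.version}</version>
  <!-- provided: available on the compile classpath, but not packaged into the job jar -->
  <scope>provided</scope>
</dependency>
```

With provided scope the Hadoop classes stay out of the packaged job jar, which is the packaging the answer identifies as the cause. Running from the IDE then relies on the run-configuration option mentioned above.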