spring - How can I run Hadoop as part of a Spring application's test suite?
Problem description
I want to set up a simple "Hello, World!" example to learn how to use basic Hadoop features, such as storing and reading files with HDFS.
Is it possible to:
- run embedded Hadoop as part of my application?
- run embedded Hadoop in my tests?
I would like to set up a minimal Spring Boot application for this. What is the minimal Spring configuration required? There are plenty of examples showing how to read/write files with HDFS, but I still can't work out the Spring configuration I need. It's hard to figure out which libraries one really needs, because the Spring Hadoop examples appear to be outdated. Any help would be much appreciated.
Solution
You can easily use the Hadoop FileSystem API with any local POSIX filesystem, without a Hadoop cluster. The Hadoop API is very generic and provides many concrete implementations for different storage systems such as HDFS, S3, Azure Data Lake Store, etc.
You can embed HDFS within your application (i.e. run the Namenode and Datanodes within a single JVM process), but this is only reasonable for tests.
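As a minimal sketch of that point (class and path names here are illustrative, and the hadoop-common dependency is assumed to be on the classpath), the same FileSystem API can target the local disk with no cluster at all:

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class LocalFsDemo {

    // Writes a file through the Hadoop FileSystem API and reads it back.
    static String roundTrip() throws IOException {
        // FileSystem.getLocal returns an implementation backed by the local
        // filesystem, so no Namenode/Datanode processes are required.
        final Configuration conf = new Configuration();
        final FileSystem fs = FileSystem.getLocal(conf);

        final Path path = new Path(System.getProperty("java.io.tmpdir"), "hello.txt");
        try (FSDataOutputStream out = fs.create(path, true)) {
            out.write("Hello, World!".getBytes(StandardCharsets.UTF_8));
        }
        final byte[] buf = new byte[(int) fs.getFileStatus(path).getLen()];
        try (FSDataInputStream in = fs.open(path)) {
            in.readFully(buf);
        }
        fs.delete(path, false);
        return new String(buf, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        System.out.println(roundTrip());
    }
}
```

Swapping the local filesystem for HDFS later means changing configuration, not code.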
There is the Hadoop Minicluster, which you can start from the command line (CLI MiniCluster) or via the Java API in your unit tests with the MiniDFSCluster class found in the hadoop-minicluster package.
You can start the Mini Cluster with Spring by making a separate configuration for it and using it as @ContextConfiguration with your unit tests.
import java.io.IOException;
import java.util.UUID;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.MiniDFSCluster;
import org.junit.rules.TemporaryFolder;
import org.springframework.beans.factory.config.BeanDefinition;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Primary;
import org.springframework.context.annotation.Scope;

@org.springframework.context.annotation.Configuration
public class MiniClusterConfiguration {

    // JUnit's TemporaryFolder doubles as a disposable base directory for the cluster.
    @Bean(name = "temp-folder", initMethod = "create", destroyMethod = "delete")
    public TemporaryFolder temporaryFolder() {
        return new TemporaryFolder();
    }

    @Bean
    public Configuration configuration(final TemporaryFolder temporaryFolder) {
        final Configuration conf = new Configuration();
        conf.set(
                MiniDFSCluster.HDFS_MINIDFS_BASEDIR,
                temporaryFolder.getRoot().getAbsolutePath()
        );
        return conf;
    }

    // Namenode and Datanodes run inside this JVM; Spring shuts the cluster
    // down when the context closes.
    @Bean(destroyMethod = "shutdown")
    public MiniDFSCluster cluster(final Configuration conf) throws IOException {
        final MiniDFSCluster cluster = new MiniDFSCluster.Builder(conf)
                .clusterId(String.valueOf(this.hashCode()))
                .build();
        cluster.waitClusterUp();
        return cluster;
    }

    @Bean
    public FileSystem fileSystem(final MiniDFSCluster cluster) throws IOException {
        return cluster.getFileSystem();
    }

    // A fresh HDFS directory per injection point, so tests don't step on each other.
    @Bean
    @Primary
    @Scope(BeanDefinition.SCOPE_PROTOTYPE)
    public Path temp(final FileSystem fs) throws IOException {
        final Path path = new Path("/tmp", UUID.randomUUID().toString());
        fs.mkdirs(path);
        return path;
    }
}
You will inject FileSystem and a temporary Path into your tests, and as I've mentioned above, from the API standpoint there is no difference between a real cluster, a mini-cluster, and a local filesystem. Note that starting the cluster has a cost, so you likely want to avoid annotating your tests with @DirtiesContext(classMode = DirtiesContext.ClassMode.AFTER_EACH_TEST_METHOD), which would restart the cluster for every test method; Spring's default context caching lets the same cluster be reused across tests.
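Putting the pieces together, a test could look roughly like this (a sketch assuming JUnit 4 with spring-test on the classpath and the MiniClusterConfiguration class above; the test class and file names are illustrative):

```java
import static org.junit.Assert.assertEquals;

import java.nio.charset.StandardCharsets;

import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.junit.Test;
import org.junit.runner.RunWith;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.test.context.ContextConfiguration;
import org.springframework.test.context.junit4.SpringRunner;

@RunWith(SpringRunner.class)
@ContextConfiguration(classes = MiniClusterConfiguration.class)
public class HdfsRoundTripTest {

    @Autowired
    private FileSystem fs;   // backed by the MiniDFSCluster

    @Autowired
    private Path temp;       // a fresh /tmp/<uuid> directory from the prototype bean

    @Test
    public void writesAndReadsBack() throws Exception {
        final Path file = new Path(temp, "hello.txt");
        try (FSDataOutputStream out = fs.create(file)) {
            out.write("Hello, HDFS!".getBytes(StandardCharsets.UTF_8));
        }
        final byte[] buf = new byte[(int) fs.getFileStatus(file).getLen()];
        try (FSDataInputStream in = fs.open(file)) {
            in.readFully(buf);
        }
        assertEquals("Hello, HDFS!", new String(buf, StandardCharsets.UTF_8));
    }
}
```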
If you want this code to run on Windows, you will need a compatibility layer called winutils (which makes it possible to access the Windows filesystem in a POSIX-like way).
You have to point the HADOOP_HOME environment variable to it and, depending on the version, load its shared library:
String HADOOP_HOME = System.getenv("HADOOP_HOME");
System.setProperty("hadoop.home.dir", HADOOP_HOME);
System.setProperty("hadoop.tmp.dir", System.getProperty("java.io.tmpdir"));
// hadoop.dll ships with winutils and provides the native filesystem calls
final String lib = String.format("%s/lib/hadoop.dll", HADOOP_HOME);
System.load(lib);