java - Apache Spark - getting a Dataset written as CSV as a java.io.File
Problem description
A small question about Apache Spark: how can I get a Dataset's output as a java.io.File?
I would like to upload some java.io.File objects to a few destinations. The destinations are not databases but file stores such as Dropbox, S3, and similar.
The good news is that I already have some utility packages for this, and they work fine; I have tested them with non-Spark jobs.
public static void main(String[] args) {
    // Plain (non-Spark) upload of a local file with the existing utilities
    File myCSVfile = new File("/path/to/my/file.csv");
    SomeUtil.uploadfileToDropBox(myCSVfile);
    SomeOtherUtil.uploadFileToS3(myCSVfile);
    // this is working fine!
}
The code above runs successfully, and I am very happy with it.
Now I need to upload the file produced by a Spark job, using the same utilities.
So I tried the following:
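public static void main(String[] args) {
    // Read the input (reader options elided)
    final Dataset<Row> dataSetRow = sparkSession.read().[...].load();
    // Transform each row, using an explicit encoder for the result
    final Dataset<Row> dataSetRowTransformed = dataSetRow.map((MapFunction<Row, Row>) row -> doSomeComplexTransformation(row), getMyEncoder());
    // One partition, so Spark writes a single part file into the output directory
    dataSetRowTransformed.repartition(1).write().csv("/path/to/where/to/save/the/csv");
}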
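Some background on why this is awkward: DataFrameWriter.csv(path) treats path as an output directory, not as a file name. Spark fills that directory with one part-*.csv file per partition (here a single one, because of repartition(1)) plus marker files such as _SUCCESS and, on a local filesystem, typically hidden .crc checksum files. A minimal sketch for locating that single part file with java.nio.file, assuming the directory is on a filesystem the driver process can read directly (local mode or a shared mount; for output on HDFS or S3 the Hadoop FileSystem API would be needed instead). The class and method names are hypothetical:

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper class, not part of Spark.
final class SparkCsvOutput {

    private SparkCsvOutput() {
    }

    /**
     * Returns the single part-*.csv file that Spark wrote into outputDir.
     * Assumes outputDir is readable from the driver through the local filesystem.
     */
    static File findSinglePartFile(String outputDir) throws IOException {
        Path dir = Paths.get(outputDir);
        List<Path> parts;
        // Files.list must be closed to release the directory handle
        try (Stream<Path> entries = Files.list(dir)) {
            parts = entries
                    .filter(Files::isRegularFile)
                    .filter(p -> {
                        String name = p.getFileName().toString();
                        // Skips _SUCCESS and hidden checksum files such as .part-00000-...csv.crc
                        return name.startsWith("part-") && name.endsWith(".csv");
                    })
                    .collect(Collectors.toList());
        }
        if (parts.size() != 1) {
            throw new IllegalStateException(
                    "Expected exactly one part-*.csv file in " + dir + ", found " + parts.size());
        }
        return parts.get(0).toFile();
    }
}
```

Usage sketch, right after the write() call: File csv = SparkCsvOutput.findSinglePartFile("/path/to/where/to/save/the/csv"); then SomeUtil.uploadfileToDropBox(csv);. Note that the uploaded file keeps the name Spark generated (something like part-00000-<uuid>-c000.csv), which may or may not be acceptable at the destination.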
Sure enough, I can see the final CSV file that Spark generated in that folder, and I can open it.
However, I cannot find a way in the code to get it as a File so that I can upload it with the previous mechanism.
Question: how can I get the file I generated (I can see it, open it, and it is correct) as a java.io.File, so that I can upload it with the utility classes mentioned above, all within one Spark job?
Thank you
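If the upload needs a stable file name, or if repartition(1) is dropped to avoid funnelling all the data through a single task, another option is to concatenate the part files into one file of your choosing. A minimal sketch under the same assumption (output directory readable from the driver), additionally assuming the CSV is written without a header row, since with option("header", "true") every part file starts with its own header line. SparkCsvMerge, mergePartFiles, and the target path are hypothetical:

```java
import java.io.File;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper class, not part of Spark.
final class SparkCsvMerge {

    private SparkCsvMerge() {
    }

    /**
     * Concatenates every part-*.csv file in sparkOutputDir into targetFile,
     * ordered by file name (part-00000, part-00001, ...), and returns it as a File.
     * targetFile should live outside sparkOutputDir, and its parent directory must exist.
     */
    static File mergePartFiles(String sparkOutputDir, String targetFile) throws IOException {
        List<Path> parts;
        try (Stream<Path> entries = Files.list(Paths.get(sparkOutputDir))) {
            parts = entries
                    .filter(Files::isRegularFile)
                    .filter(p -> {
                        String name = p.getFileName().toString();
                        return name.startsWith("part-") && name.endsWith(".csv");
                    })
                    // Zero-padded part numbers sort lexicographically in partition order
                    .sorted()
                    .collect(Collectors.toList());
        }
        Path target = Paths.get(targetFile);
        // Creates the target, or truncates it if it already exists
        try (OutputStream out = Files.newOutputStream(target)) {
            for (Path part : parts) {
                // Append this part file's bytes to the target
                Files.copy(part, out);
            }
        }
        return target.toFile();
    }
}
```

For example: File result = SparkCsvMerge.mergePartFiles("/path/to/where/to/save/the/csv", "/tmp/result.csv"); followed by SomeOtherUtil.uploadFileToS3(result);, where /tmp/result.csv is a placeholder path outside the Spark output directory. Plain byte concatenation relies on each part file ending with a line separator, which Spark's CSV writer normally produces.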