Backing up ADLS Gen2

Problem description

I have a data lake and a data warehouse holding roughly 5-10 TB of data in Azure ADLS Gen2, in CSV and Delta formats. The ADLS account is configured with Performance/Tier = Standard/Hot, replication = GRS, and type = StorageV2.

What is the best way to back up my ADLS Gen2 data?

Tags: azure, azure-data-factory, databricks, azure-data-lake-gen2

Solution


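For raw data/folder backups, I use Microsoft's Data Movement service (the Azure Storage Data Movement Library) to copy blob directories from ADLS Gen2 to a storage account.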

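To do this, create a daily timer-triggered function that performs an incremental copy of the blob directory. You can configure it along these lines: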

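Every Monday, create a new folder (named by date) that holds a full backup, and keep adding the incremental changes to it through Sunday. Delete backup folders once they are more than a month old.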
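The weekly folder scheme above can be sketched in plain C#, independent of any storage SDK. This is a minimal sketch under assumptions that are not in the original answer: the `BackupSchedule` class, its method names, and the `backup-yyyy-MM-dd` folder naming are hypothetical, and "one month" is approximated as 30 days counted from the Monday that opens each folder.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

// Hypothetical helper: decides which backup folder a daily run writes to
// and which old folders are due for deletion.
public static class BackupSchedule
{
    private const string Prefix = "backup-";          // assumed naming convention
    private const string DateFormat = "yyyy-MM-dd";   // assumed naming convention

    // Returns the Monday that opens the backup week containing `date`.
    public static DateTime WeekStart(DateTime date)
    {
        // DayOfWeek has Sunday = 0, so shift to make Monday = 0 and Sunday = 6.
        int daysSinceMonday = ((int)date.DayOfWeek + 6) % 7;
        return date.Date.AddDays(-daysSinceMonday);
    }

    // A full copy is taken on Monday; every other day copies only the changes.
    public static bool IsFullBackupDay(DateTime date)
    {
        return date.DayOfWeek == DayOfWeek.Monday;
    }

    // Name of the folder that receives the backup taken on `date`.
    public static string FolderFor(DateTime date)
    {
        return Prefix + WeekStart(date).ToString(DateFormat, CultureInfo.InvariantCulture);
    }

    // Yields the folders whose opening Monday is more than `retentionDays` before `today`.
    // Names that do not match the assumed convention are ignored, never deleted.
    public static IEnumerable<string> ExpiredFolders(
        IEnumerable<string> folders, DateTime today, int retentionDays = 30)
    {
        DateTime cutoff = today.Date.AddDays(-retentionDays);
        foreach (string folder in folders)
        {
            if (!folder.StartsWith(Prefix, StringComparison.Ordinal))
            {
                continue;
            }

            DateTime weekStart;
            bool parsed = DateTime.TryParseExact(
                folder.Substring(Prefix.Length), DateFormat,
                CultureInfo.InvariantCulture, DateTimeStyles.None, out weekStart);

            if (parsed && weekStart < cutoff)
            {
                yield return folder;
            }
        }
    }
}
```

A daily run would call `FolderFor(DateTime.UtcNow)` to pick the destination directory, use `IsFullBackupDay` to choose between a full and an incremental copy, and pass the existing folder names to `ExpiredFolders` to find what to delete. Because retention is measured from the folder's Monday, the last incremental in a deleted folder can be up to six days newer than the cutoff; lengthen `retentionDays` if that matters.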

Here is my implementation.

    // Assumed namespaces (recent Data Movement Library releases); older releases
    // expose the blob types under Microsoft.WindowsAzure.Storage.Blob instead.
    using System.Threading;
    using System.Threading.Tasks;
    using Microsoft.Azure.Storage.Blob;
    using Microsoft.Azure.Storage.DataMovement;

    // BlobConfiguration, GetCloudBlobDirectoryAsync, the three callbacks and
    // TransferStatusToString are the author's own helpers and are not shown here.
    public async Task<string> CopyBlobDirectoryAsync(BlobConfiguration sourceBlobConfiguration, BlobConfiguration destBlobConfiguration, string blobDirectoryName)
    {
        CloudBlobDirectory sourceBlobDir = await GetCloudBlobDirectoryAsync(sourceBlobConfiguration.ConnectionString, sourceBlobConfiguration.ContainerName, blobDirectoryName);

        CloudBlobDirectory destBlobDir = await GetCloudBlobDirectoryAsync(destBlobConfiguration.ConnectionString, destBlobConfiguration.ContainerName, destBlobConfiguration.BlobDirectoryPath + "/" + blobDirectoryName);

        // You can also replace the source directory with a CloudFileDirectory instance to copy data from Azure File Storage. If so:
        //   1. If Recursive is set to true, SearchPattern is not supported. The Data Movement Library simply transfers all Azure files
        //      under the source CloudFileDirectory and its sub-directories.
        CopyDirectoryOptions options = new CopyDirectoryOptions()
        {
            Recursive = true
        };

        // Report per-file outcomes through the callbacks.
        DirectoryTransferContext context = new DirectoryTransferContext();
        context.FileTransferred += FileTransferredCallback;
        context.FileFailed += FileFailedCallback;
        context.FileSkipped += FileSkippedCallback;

        // Create the CancellationTokenSource used to cancel the transfer.
        CancellationTokenSource cancellationSource = new CancellationTokenSource();

        // Server-side asynchronous copy: the data is not routed through the machine running this code.
        TransferStatus transferStatus = await TransferManager.CopyDirectoryAsync(sourceBlobDir, destBlobDir, CopyMethod.ServiceSideAsyncCopy, options, context, cancellationSource.Token);

        return TransferStatusToString(blobDirectoryName, transferStatus);
    }
