Code/Script to recursively search S3 bucket to unzip (GZIP) and move specific file from ZIP to new location

Question

Hi, I have an S3 bucket containing gzip files. Within each archive there is a single TSV file that I want to move to a new folder or bucket (I don't really mind which). A new gzip file is added to the bucket each hour, so this script needs to be something I can schedule or trigger. Happy to use the CLI, Lambda or any other method! Pointers, links and help very much appreciated.

Tags: file, amazon-s3, aws-lambda, zip, gzip

Answer


OK, so the fudge way of doing this is to use local processing:

  1. Connect to S3

    AmazonS3Config config = new AmazonS3Config();
    config.ServiceURL = "https://s3-eu-west-1.amazonaws.com";
    AmazonS3Client s3Client = new AmazonS3Client(S3AccessKey, S3SecretKey, config);

  2. Copy the files you want to process

    S3DirectoryInfo dir = new S3DirectoryInfo(s3Client, bucketname, "jpbodenukproduction");
    dir.CopyToLocal(@"C:\S3Local");

  3. Decompress the gzip files (each contains a tar, which contains multiple files):

    string directorypath = @"C:\S3Local";
    DirectoryInfo directoryselected = new DirectoryInfo(directorypath);
    foreach (FileInfo FileToDecompress in directoryselected.GetFiles("*.gz"))
    {
        Decompress(FileToDecompress);
    }

    public static void Decompress(FileInfo fileToDecompress)
    {
        using (FileStream originalFileStream = fileToDecompress.OpenRead())
        {
            // Strip the ".gz" extension to get the output file name.
            string currentFileName = fileToDecompress.FullName;
            string newFileName = currentFileName.Remove(currentFileName.Length - fileToDecompress.Extension.Length);

            using (FileStream decompressedFileStream = File.Create(newFileName))
            {
                using (GZipStream decompressionStream = new GZipStream(originalFileStream, CompressionMode.Decompress))
                {
                    decompressionStream.CopyTo(decompressedFileStream);
                    Console.WriteLine("Decompressed: {0}", fileToDecompress.Name);
                }
            }
        }
    }
    
  4. Now process the tar files (using ICSharpCode.SharpZipLib):

    foreach (FileInfo TarFile in directoryselected.GetFiles("*.tar"))
    {
        var stream = File.OpenRead(TarFile.FullName);
        var tarArchive = ICSharpCode.SharpZipLib.Tar.TarArchive.CreateInputTarArchive(stream);
        tb1.Text = "Processing: " + TarFile.Name;
        try
        {
            tarArchive.ExtractContents(@"C:\S3Local\Trash\");
        }
        catch (Exception ziperror)
        {
            tb1.Text = "Delay error in TarUnzip: " + ziperror;
            Thread.Sleep(10000);
        }
        finally
        {
            tarArchive.Close();
            stream.Close();
        }
    }
    

Finally, do what you want with the decompressed files. I simply extracted the single file I needed, re-compressed it and moved it back to S3.
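
The re-compress step is not shown in the code above. A minimal sketch of how it might look, assuming the extracted TSV is sitting on local disk. `GzipHelper` and `CompressFile` are hypothetical names used only for illustration, and the upload back to S3 is left out:

```csharp
using System.IO;
using System.IO.Compression;

public static class GzipHelper
{
    // Hypothetical helper: writes "<sourcePath>.gz" next to the source file
    // and returns the path of the compressed file.
    public static string CompressFile(string sourcePath)
    {
        string targetPath = sourcePath + ".gz";
        using (FileStream source = File.OpenRead(sourcePath))
        using (FileStream target = File.Create(targetPath))
        using (GZipStream gzip = new GZipStream(target, CompressionMode.Compress))
        {
            // Disposing the GZipStream flushes the remaining compressed data to the target.
            source.CopyTo(gzip);
        }
        return targetPath;
    }
}
```

The resulting `.gz` file could then be uploaded with whichever S3 client call you already use.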

My plan is to convert this to Lambda next and run it on a schedule.
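
A Lambda function has only limited scratch space under /tmp, so the gzip step would probably be better done on streams than on local files. A rough sketch of that idea, assuming each object is small enough to hold in memory. `GzipStreams` and `DecompressToMemory` are hypothetical names, and the S3 download and upload calls are deliberately omitted:

```csharp
using System.IO;
using System.IO.Compression;

public static class GzipStreams
{
    // Hypothetical helper: reads gzip data from 'compressed' and returns the
    // decompressed bytes as a MemoryStream rewound to position 0.
    // The input stream is left open so the caller stays in charge of disposing it.
    public static MemoryStream DecompressToMemory(Stream compressed)
    {
        var output = new MemoryStream();
        using (var gzip = new GZipStream(compressed, CompressionMode.Decompress, leaveOpen: true))
        {
            gzip.CopyTo(output);
        }
        output.Position = 0;
        return output;
    }
}
```

The input could be the response stream of an S3 download, and the returned stream could be handed to a tar reader or uploaded as-is. For large archives, piping stream to stream without buffering everything in memory would be the safer design.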

