How to specify a Glue table and schema in a Glue crawler using CloudFormation

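Problem description

I am using a CloudFormation template (CFT) to create a Glue Database, a Glue Table with a schema, and a Glue Crawler; my code is below. I want the crawler to target the Glue table "myTestTable" and its schema, so that whenever the schema changes (a field is added or removed), the crawler automatically applies the change to that table.

How can I achieve this with CFT? Any help would be greatly appreciated.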

  GlueDatabase:
    Type: AWS::Glue::Database
    Properties:
      CatalogId: !Ref AWS::AccountId
      DatabaseInput:
        Name: myTestGlueDB

  GlueTable:
    Type: AWS::Glue::Table
    Properties:
      CatalogId: !Ref AWS::AccountId
      DatabaseName: !Ref GlueDatabase
      TableInput:
        Parameters:
          classification: parquet
        Name: myTestTable
        Owner: owner
        Retention: 0
        StorageDescriptor:
          Columns:
          - Name: productName
            Type: string
          - Name: productId
            Type: string
          InputFormat: org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat
          OutputFormat: org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat
          Compressed: false
          NumberOfBuckets: -1
          SerdeInfo:
            SerializationLibrary: org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe
            Parameters:
              serialization.format: '1'
          Location: s3://test-bucket
          BucketColumns: []
          SortColumns: []
          StoredAsSubDirectories: false
        PartitionKeys:
        - Name: id
          Type: string
        - Name: year
          Type: string
        - Name: month
          Type: string
        - Name: day
          Type: string
        TableType: EXTERNAL_TABLE

  GlueCrawler:
    Type: AWS::Glue::Crawler
    Properties:
      Name: myTestCrawler
      Role: GlueCrawlerRole
      DatabaseName: myTestGlueDB
      TablePrefix: myTable-
      Targets:
        S3Targets:
          - Path: s3://test-bucket
      SchemaChangePolicy:
        UpdateBehavior: UPDATE_IN_DATABASE
        DeleteBehavior: LOG 
      Configuration: "{\"Version\":1.0,\"Grouping\":{\"TableGroupingPolicy\":\"CombineCompatibleSchemas\"}}"

Tags: amazon-cloudformation, aws-glue, aws-glue-data-catalog

Solution
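
One possible approach, offered as a sketch rather than a verified answer: instead of pointing the crawler at the S3 path, point it at the existing catalog table through `CatalogTargets`, so that it updates "myTestTable" in place rather than creating new tables. The fragment below reuses the logical IDs `GlueDatabase` and `GlueTable` and the role name `GlueCrawlerRole` from the question. It assumes that `!Ref` on those resources returns the database name and the table name, and that with catalog targets the delete behavior has to stay `LOG`. `TablePrefix` and the grouping `Configuration` are left out on the assumption that they only matter when the crawler creates tables itself.

```yaml
  GlueCrawler:
    Type: AWS::Glue::Crawler
    Properties:
      Name: myTestCrawler
      Role: GlueCrawlerRole
      Targets:
        # Crawl the table already defined in the Data Catalog
        # instead of discovering new tables under an S3 path.
        CatalogTargets:
          - DatabaseName: !Ref GlueDatabase   # assumed to resolve to myTestGlueDB
            Tables:
              - !Ref GlueTable                # assumed to resolve to myTestTable
      SchemaChangePolicy:
        # Write added or removed columns back to the existing table.
        UpdateBehavior: UPDATE_IN_DATABASE
        # Assumption: catalog targets only accept LOG here.
        DeleteBehavior: LOG
```

With this layout the crawler reads the data location from the table's `StorageDescriptor.Location`, so the S3 path only needs to be maintained on the `GlueTable` resource.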

