amazon-web-services - ECS tasks stuck in PENDING when updating through CloudFormation
Question
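I have an ECS cluster that builds fine, but I run into trouble when I update its tasks through CloudFormation. The ECSService starts 6 new tasks that sit in PENDING while the 6 old tasks are still RUNNING. Sometimes it starts draining the old tasks and the deploy works, but other times none of the old tasks drain and the ECSService just gets stuck in UPDATE_IN_PROGRESS. How would I troubleshoot something like this?
My stack template is below.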
AWSTemplateFormatVersion: '2010-09-09'
Resources:
  ElasticLoadBalancer:
    Type: AWS::ElasticLoadBalancingV2::LoadBalancer
    Properties:
      SecurityGroups:
        - !Ref 'ELBSecurityGroup'
      Subnets:
        - !Ref 'InstanceSubnet'
        - !Ref 'SecondarySubnet'
      Scheme: internet-facing
  RedirectLoadBalancerListener:
    Type: AWS::ElasticLoadBalancingV2::Listener
    DependsOn: ECSServiceRole
    Properties:
      DefaultActions:
        - Type: forward
          TargetGroupArn: !Ref 'ECSTG'
      LoadBalancerArn: !Ref 'ElasticLoadBalancer'
      Port: '80'
      Protocol: HTTP
  RedirectLoadBalancerListenerRule:
    Type: AWS::ElasticLoadBalancingV2::ListenerRule
    DependsOn: RedirectLoadBalancerListener
    Properties:
      Actions:
        - Type: forward
          TargetGroupArn: !Ref 'ECSTG'
      Conditions:
        - Field: path-pattern
          Values:
            - /
      ListenerArn: !Ref 'RedirectLoadBalancerListener'
      Priority: '1'
  LoadBalancerListener:
    Type: AWS::ElasticLoadBalancingV2::Listener
    DependsOn: ECSServiceRole
    Properties:
      Certificates:
        - CertificateArn: !Ref 'SSLCertificateId'
      DefaultActions:
        - Type: forward
          TargetGroupArn: !Ref 'ECSTG'
      LoadBalancerArn: !Ref 'ElasticLoadBalancer'
      Port: '443'
      Protocol: HTTPS
  LoadBalancerListenerRule:
    Type: AWS::ElasticLoadBalancingV2::ListenerRule
    DependsOn: LoadBalancerListener
    Properties:
      Actions:
        - Type: forward
          TargetGroupArn: !Ref 'ECSTG'
      Conditions:
        - Field: path-pattern
          Values:
            - /
      ListenerArn: !Ref 'LoadBalancerListener'
      Priority: '1'
  ECSTG:
    DependsOn: ElasticLoadBalancer
    Type: AWS::ElasticLoadBalancingV2::TargetGroup
    Properties:
      HealthCheckIntervalSeconds: 6
      HealthCheckPath: /api/ping
      HealthCheckProtocol: HTTP
      HealthCheckTimeoutSeconds: 5
      HealthyThresholdCount: 2
      Port: 80
      Protocol: HTTP
      UnhealthyThresholdCount: 5
      VpcId: !Ref 'VPCId'
      TargetGroupAttributes:
        - Key: deregistration_delay.timeout_seconds
          Value: '20'
  AppSecurityGroup:
    Type: AWS::EC2::SecurityGroup
    Properties:
      GroupDescription: AppSecurityGroup
      SecurityGroupIngress:
        - IpProtocol: '-1'
          FromPort: '-1'
          ToPort: '-1'
          SourceSecurityGroupId: !Ref 'ELBSecurityGroup'
      VpcId: !Ref 'VPCId'
  Route53Entry:
    Type: AWS::Route53::RecordSetGroup
    Properties:
      HostedZoneName: !Join ['', [!Ref 'Route53HostedZone', .]]
      Comment: Zone apex alias targeted to myELB LoadBalancer.
      RecordSets:
        - Name: !Join [., [!Ref 'ApplicationHost', !Ref 'Route53HostedZone']]
          Type: A
          AliasTarget:
            HostedZoneId: !GetAtt [ElasticLoadBalancer, CanonicalHostedZoneID]
            DNSName: !GetAtt [ElasticLoadBalancer, DNSName]
  ELBSecurityGroup:
    Type: AWS::EC2::SecurityGroup
    Properties:
      GroupDescription: ELBSecurityGroup
      SecurityGroupIngress:
        - IpProtocol: tcp
          FromPort: '443'
          ToPort: '443'
          CidrIp: 0.0.0.0/0
        - IpProtocol: tcp
          FromPort: '80'
          ToPort: '80'
          CidrIp: 0.0.0.0/0
      VpcId: !Ref 'VPCId'
  CloudWatchAlarm:
    Type: AWS::CloudWatch::Alarm
    Properties:
      ActionsEnabled: true
      AlarmActions:
        - arn:aws:sns:us-east-1:6xxxxxxx:instance-alarm
      ComparisonOperator: LessThanOrEqualToThreshold
      Dimensions:
        - Name: LoadBalancer
          Value: !GetAtt [ElasticLoadBalancer, LoadBalancerFullName]
        - Name: TargetGroup
          Value: !GetAtt [ECSTG, TargetGroupFullName]
      EvaluationPeriods: 5
      MetricName: HealthyHostCount
      Namespace: AWS/ApplicationELB
      Period: 60
      Statistic: Maximum
      Threshold: 0
  LowOnCreditAlarm:
    Type: AWS::CloudWatch::Alarm
    Properties:
      ActionsEnabled: true
      AlarmActions:
        - arn:aws:sns:us-east-1:6xxxxxx:instance-alarm
      ComparisonOperator: LessThanThreshold
      Dimensions:
        - Name: AutoScalingGroupName
          Value: !Ref 'AutoScalingGroup'
      EvaluationPeriods: 1
      MetricName: CPUCreditBalance
      Namespace: AWS/EC2
      Period: 300
      Statistic: Average
      Threshold: 15
  Database:
    Type: AWS::RDS::DBInstance
    Properties:
      AllocatedStorage: '5'
      DBInstanceClass: db.t2.micro
      Engine: postgres
      BackupRetentionPeriod: 35
      EngineVersion: 9.5.2
      DBName: !If [RestoreDB, '', ekdb]
      MasterUsername: !Ref 'DBUser'
      MasterUserPassword: !Ref 'DBPassword'
      DBSecurityGroups:
        - !Ref 'DatabaseSecurityGroup'
      DBSubnetGroupName: !Ref 'DatabaseSubnetGroup'
      DBSnapshotIdentifier: !Ref 'DBSnapshot'
    DeletionPolicy: Snapshot
  DatabaseSecurityGroup:
    Type: AWS::RDS::DBSecurityGroup
    Properties:
      GroupDescription: DatabaseSecurityGroup
      DBSecurityGroupIngress:
        - EC2SecurityGroupId: !Ref 'AppSecurityGroup'
      EC2VpcId: !Ref 'VPCId'
  Redis:
    Type: AWS::ElastiCache::CacheCluster
    Properties:
      CacheNodeType: cache.t2.micro
      Engine: redis
      EngineVersion: 2.8.24
      NumCacheNodes: 1
      VpcSecurityGroupIds:
        - !Ref 'RedisSecurityGroup'
      CacheSubnetGroupName: !Ref 'RedisSubnetGroup'
  RedisSecurityGroup:
    Type: AWS::EC2::SecurityGroup
    Properties:
      GroupDescription: RedisSecurityGroup
      SecurityGroupIngress:
        - IpProtocol: tcp
          FromPort: '6379'
          ToPort: '6379'
          SourceSecurityGroupId: !Ref 'AppSecurityGroup'
      VpcId: !Ref 'VPCId'
  FrontendUser:
    Type: AWS::IAM::User
    Properties:
      Groups:
        - SynapseAppUsers
  BackendUser:
    Type: AWS::IAM::User
    Properties:
      Groups:
        - SynapseAppUsers
  FrontendUserAccessKey:
    Type: AWS::IAM::AccessKey
    Properties:
      UserName: !Ref 'FrontendUser'
  BackendUserAccessKey:
    Type: AWS::IAM::AccessKey
    Properties:
      UserName: !Ref 'BackendUser'
  S3BucketPolicy:
    Type: AWS::S3::BucketPolicy
    Properties:
      Bucket: !Ref 'S3Bucket'
      PolicyDocument:
        Statement:
          - Action: s3:GetObject
            Effect: Allow
            Resource: !Sub 'arn:aws:s3:::${S3Bucket}/*'
            Principal:
              AWS:
                - !GetAtt 'FrontendUser.Arn'
                - !GetAtt 'BackendUser.Arn'
          - Action: s3:PutObject
            Effect: Allow
            Resource: !Sub 'arn:aws:s3:::${S3Bucket}/*'
            Principal:
              AWS:
                - !GetAtt 'BackendUser.Arn'
          - Action: s3:PutObjectAcl
            Effect: Allow
            Resource: !Sub 'arn:aws:s3:::${S3Bucket}/*'
            Principal:
              AWS:
                - !GetAtt 'BackendUser.Arn'
          - Action:
              - s3:PutObjectAcl
              - s3:PutObject
              - s3:GetObject
              - s3:DeleteObject
            Effect: Allow
            Resource: !Sub 'arn:aws:s3:::${S3Bucket}/*'
            Principal:
              AWS:
                - arn:aws:iam::6xxxxxxx:user/filestack-v3-policy
  S3Bucket:
    Type: AWS::S3::Bucket
    Properties:
      AccessControl: AuthenticatedRead
      CorsConfiguration:
        CorsRules:
          - AllowedHeaders:
              - '*'
            AllowedMethods:
              - GET
              - PUT
              - POST
            AllowedOrigins:
              - '*'
            ExposedHeaders:
              - ETag
            MaxAge: 3000
    DeletionPolicy: Retain
  AppIamRole:
    Type: AWS::IAM::Role
    Properties:
      AssumeRolePolicyDocument:
        Statement:
          - Effect: Allow
            Principal:
              Service:
                - ec2.amazonaws.com
            Action:
              - sts:AssumeRole
      Path: /
      Policies:
        - PolicyName: app-iam-role
          PolicyDocument:
            Statement:
              - Effect: Allow
                Action:
                  - ecs:*
                  - ecr:*
                  - sns:*
                  - logs:*
                Resource: '*'
              - Effect: Allow
                Action:
                  - s3:PutObject
                  - s3:GetObject
                  - s3:PutObjectAcl
                  - s3:DeleteObject
                Resource: !GetAtt [S3Bucket, Arn]
  AppInstanceProfile:
    Type: AWS::IAM::InstanceProfile
    Properties:
      Path: /
      Roles:
        - !Ref 'AppIamRole'
  LaunchConfig:
    Type: AWS::AutoScaling::LaunchConfiguration
    Properties:
      AssociatePublicIpAddress: true
      ImageId: !FindInMap [AWSRegionToAMI, !Ref 'AWS::Region', AMIID]
      InstanceType: !If [IsExclusive, t2.medium, m4.large]
      IamInstanceProfile: !Ref 'AppInstanceProfile'
      SecurityGroups:
        - !Ref 'AppSecurityGroup'
      UserData: !Base64
        Fn::Join:
          - ''
          - - '#!/bin/bash -xe

              '
            - echo ECS_CLUSTER=
            - !Ref 'ECSCluster'
            - ' >> /etc/ecs/ecs.config

              '
            - 'yum install -y aws-cfn-bootstrap

              '
            - '/opt/aws/bin/cfn-signal -e $? '
            - ' --stack '
            - !Ref 'AWS::StackName'
            - ' --resource AutoScalingGroup '
            - ' --region '
            - !Ref 'AWS::Region'
  AutoScalingGroup:
    Type: AWS::AutoScaling::AutoScalingGroup
    Properties:
      LaunchConfigurationName: !Ref 'LaunchConfig'
      MinSize: 1
      MaxSize: 2
      DesiredCapacity: !If [IsExclusive, 1, 2]
      VPCZoneIdentifier:
        - !Ref 'InstanceSubnet'
      HealthCheckGracePeriod: 600
      HealthCheckType: ELB
    CreationPolicy:
      ResourceSignal:
        Timeout: PT15M
    UpdatePolicy:
      AutoScalingReplacingUpdate:
        WillReplace: 'true'
  DatabaseSubnetGroup:
    Type: AWS::RDS::DBSubnetGroup
    Properties:
      DBSubnetGroupDescription: Subnet Group for database
      SubnetIds:
        - !Ref 'SecondarySubnet'
        - !Ref 'InstanceSubnet'
  RedisSubnetGroup:
    Type: AWS::ElastiCache::SubnetGroup
    Properties:
      Description: Subnet Group for Redis
      SubnetIds:
        - !Ref 'SecondarySubnet'
        - !Ref 'InstanceSubnet'
  ECSCluster:
    Type: AWS::ECS::Cluster
  ECSService:
    DependsOn:
      - RedirectLoadBalancerListener
      - LoadBalancerListener
      - AutoScalingGroup
    Type: AWS::ECS::Service
    Properties:
      Cluster: !Ref 'ECSCluster'
      DesiredCount: !If [IsExclusive, 2, 6]
      Role: !Ref 'ECSServiceRole'
      TaskDefinition: !Ref 'TaskDefinition'
      LoadBalancers:
        - ContainerName: nginx
          ContainerPort: '80'
          TargetGroupArn: !Ref 'ECSTG'
  ECSServiceRole:
    Type: AWS::IAM::Role
    Properties:
      AssumeRolePolicyDocument:
        Statement:
          - Effect: Allow
            Principal:
              Service:
                - ecs.amazonaws.com
            Action:
              - sts:AssumeRole
      Path: /
      Policies:
        - PolicyName: ecs-service
          PolicyDocument:
            Statement:
              - Effect: Allow
                Action:
                  - elasticloadbalancing:DeregisterInstancesFromLoadBalancer
                  - elasticloadbalancing:DeregisterTargets
                  - elasticloadbalancing:Describe*
                  - elasticloadbalancing:RegisterInstancesWithLoadBalancer
                  - elasticloadbalancing:RegisterTargets
                  - ec2:Describe*
                  - ec2:AuthorizeSecurityGroupIngress
                Resource: '*'
  TaskDefinition:
    Type: AWS::ECS::TaskDefinition
    Properties:
      ContainerDefinitions:
        - Name: frontend
          Memory: '256'
          MemoryReservation: '32'
          Image: !Sub '6xxxxxxx0.dkr.ecr.us-east-1.amazonaws.com/frontend:${ImageTag}'
          LogConfiguration:
            LogDriver: awslogs
            Options:
              awslogs-group: !Ref 'ECSLogGroup'
              awslogs-region: !Ref 'AWS::Region'
              awslogs-stream-prefix: '[frontend]'
        - Name: backend
          Memory: '1024'
          MemoryReservation: '256'
          Links:
            - xray-daemon
          Environment:
            - Name: NODE_ENV
              Value: prod
            - Name: AWS_XRAY_DAEMON_ADDRESS
              Value: "xray-daemon:2000"
            - Name: APPLICATION_URL
              Value: !Sub 'https://${ApplicationHost}.${Route53HostedZone}'
            - Name: ACCOUNTS_TOKEN
              Value: !Ref AccountsToken
            - Name: ACCOUNTS_URL
              Value: !Ref 'AccountsUrl'
            - Name: HEAP_APPLICATION_ID
              Value: '3901275559'
            - Name: HUBSPOT_API_KEY
              Value: !Ref 'HubspotApiKey'
            - Name: USER_POOL
              Value: !Ref 'UserPool'
            - Name: POOL_CLIENTS
              Value: !Ref 'PoolClients'
            - Name: JWKS
              Value: !Ref 'JWKS'
            - Name: DATABASE_URL
              Value: !Sub ['postgresql://${DBUser}:${DBPassword}@${Address}:${Port}/ekdb',
                {Address: !GetAtt [Database, Endpoint.Address], Port: !GetAtt [Database,
                    Endpoint.Port]}]
            - Name: REDIS_URL
              Value: !Sub ['redis://${Address}:${Port}/', {Address: !GetAtt [Redis, RedisEndpoint.Address],
                  Port: !GetAtt [Redis, RedisEndpoint.Port]}]
            - Name: S3_FRONTEND_USER_ACCESS_KEY_ID
              Value: !Ref 'FrontendUserAccessKey'
            - Name: S3_FRONTEND_USER_SECRET
              Value: !GetAtt [FrontendUserAccessKey, SecretAccessKey]
            - Name: S3_BACKEND_USER_ACCESS_KEY_ID
              Value: !Ref 'BackendUserAccessKey'
            - Name: S3_BACKEND_USER_SECRET
              Value: !GetAtt [BackendUserAccessKey, SecretAccessKey]
            - Name: S3_BUCKET_NAME
              Value: !Ref 'S3Bucket'
            - Name: UPLOAD_STRATEGY
              Value: S3
            - Name: ACCOUNT_ID
              Value: !Ref 'AccountId'
            - Name: CHECK_ACCOUNT_ID
              Value: !Ref 'CheckAccountId'
            - Name: SNS_TOPIC_ARN
              Value: !Ref 'SNSTopicArn'
          Image: !Sub '6xxxxxx.dkr.ecr.us-east-1.amazonaws.com/backend:${ImageTag}'
          LogConfiguration:
            LogDriver: awslogs
            Options:
              awslogs-group: !Ref 'ECSLogGroup'
              awslogs-region: !Ref 'AWS::Region'
              awslogs-stream-prefix: '[backend]'
        - Name: nginx
          Memory: '256'
          MemoryReservation: '32'
          Links:
            - frontend
            - backend
            - pdf_viewer
            - preview
          Image: !Sub '67xxxxxx.dkr.ecr.us-east-1.amazonaws.com/nginx:${ImageTag}'
          LogConfiguration:
            LogDriver: awslogs
            Options:
              awslogs-group: !Ref 'ECSLogGroup'
              awslogs-region: !Ref 'AWS::Region'
              awslogs-stream-prefix: '[nginx]'
          PortMappings:
            - ContainerPort: 80
        - Name: pdf_viewer
          Memory: '256'
          MemoryReservation: '32'
          Image: !Sub '6xxxxxxxxx.dkr.ecr.us-east-1.amazonaws.com/pdf_viewer:${ImageTag}'
          LogConfiguration:
            LogDriver: awslogs
            Options:
              awslogs-group: !Ref 'ECSLogGroup'
              awslogs-region: !Ref 'AWS::Region'
              awslogs-stream-prefix: '[pdf_viewer]'
        - Name: preview
          Memory: '256'
          MemoryReservation: '32'
          Image: !Sub '6xxxxxxxx.dkr.ecr.us-east-1.amazonaws.com/preview:${ImageTag}'
          LogConfiguration:
            LogDriver: awslogs
            Options:
              awslogs-group: !Ref 'ECSLogGroup'
              awslogs-region: !Ref 'AWS::Region'
              awslogs-stream-prefix: '[preview]'
        - Name: xray-daemon
          Memory: '256'
          MemoryReservation: '32'
          Image: 'amazon/aws-xray-daemon'
          PortMappings:
            - ContainerPort: 2000
              HostPort: 0
              Protocol: "udp"
          LogConfiguration:
            LogDriver: awslogs
            Options:
              awslogs-group: !Ref 'ECSLogGroup'
              awslogs-region: !Ref 'AWS::Region'
              awslogs-stream-prefix: '[xray-daemon]'
  ECSLogGroup:
    Type: AWS::Logs::LogGroup
Parameters:
  CheckAccountId:
    Type: String
    Description: Should user's account id be checked while logging in to the instance?
    Default: 'yes'
  Route53HostedZone:
    Type: String
  SSLCertificateId:
    Type: String
    Description: Pass SSL id from AWS Certificate Manager to pass to ELB
  ApplicationHost:
    Type: String
    Description: 'Host to be applied as follows: {host}.{Route53HostedZone}'
  DBUser:
    Type: String
    Description: Username that the database should be accessible with
  DBPassword:
    Type: String
    Description: Password that the database user should have
  HtpasswdEntry:
    Type: String
    Description: This is the file that should be htpasswd entry file
  DBSnapshot:
    Type: String
    Description: Database Snapshot ID if you want to restore DB from snapshot
    Default: ''
  VPCId:
    Type: String
    Description: VPC Id to assosiate instance to. Pass this if you want to hide the
      instances behind pre-existing VPC
    Default: vpc-355a6b51
  InstanceSubnet:
    Type: String
    Description: Subnet on which the instance should be set up. Required if VPCId
      is set
    Default: subnet-beb826c8
  SecondarySubnet:
    Type: String
    Description: Subnet on which the RDS and ElastiCache group will be set up as well.
      Required if VPCId is set
    Default: subnet-04e39239
  AccountId:
    Type: String
    Description: AccountId. used to filter out users from Auth0
  AccountsUrl:
    Type: String
    Description: Accounts url eg. https://app.getsynapse.com/
  SNSTopicArn:
    Type: String
    Description: ARN of SNS Topic that will be use to communicate between different
      parts of the infrastructure
  HubspotApiKey:
    Type: String
    Description: Hubspot api key
  UserPool:
    Type: String
    Description: Cognito UserPool
  PoolClients:
    Type: String
    Description: Cognito PoolClients
  JWKS:
    Type: String
    Description: Cognito JWKS
  ImageTag:
    Type: String
    Description: Tag of docker images
  AccountsToken:
    Type: String
    Description: Token used for authenticating with Accounts
Conditions:
  RestoreDB: !Not [!Equals [!Ref 'DBSnapshot', '']]
  IsExclusive: !Not [!Equals [!Ref 'AccountId', N/a]]
Outputs:
  InstanceURL:
    Value: !Join ['', ["https://", !Ref 'ApplicationHost', ., !Ref 'Route53HostedZone']]
Mappings:
  AWSRegionToAMI:
    us-east-1:
      AMIID: ami-a7a242da
    us-east-2:
      AMIID: ami-b86a5ddd
    us-west-1:
      AMIID: none
    us-west-2:
      AMIID: none
    eu-west-1:
      AMIID: none
    eu-central-1:
      AMIID: none
    ap-northeast-1:
      AMIID: none
    ap-southeast-1:
      AMIID: none
    ap-southeast-2:
      AMIID: none
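Solution
Based on the comments, the problem appears to be related to the MaximumPercent and MinimumHealthyPercent parameters and their default values of 200 and 100, respectively:
MaximumPercent: if a service uses the rolling update (ECS) deployment type, the maximum percent parameter represents an upper limit on the number of tasks in the service that are allowed in the RUNNING or PENDING state during a deployment.
MinimumHealthyPercent: if a service uses the rolling update (ECS) deployment type, the minimum healthy percent represents a lower limit on the number of tasks in the service that must remain in the RUNNING state during a deployment.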
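The defaults of 200 and 100 mean that, for a service of 6 tasks, ECS may not stop any old task until its replacement is running, so up to 12 tasks run at the same time during a deployment. That appears to be too many for the container instances.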
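The arithmetic behind those numbers can be written out as a quick check. It assumes a desired count of 6 (the non-IsExclusive branch of DesiredCount in the template) and the rounding described in the ECS documentation, where the upper limit is rounded down and the lower limit is rounded up.

```latex
\[
N_{\max} = \left\lfloor 6 \times \frac{200}{100} \right\rfloor = 12,
\qquad
N_{\min} = \left\lceil 6 \times \frac{100}{100} \right\rceil = 6
\]
```

With a lower limit of 6, all 6 old tasks must stay RUNNING until new ones are healthy, so the cluster needs spare room for 6 additional tasks before anything can drain.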
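The suggested solution was to change the values to 150 and 50. For a service of 6 tasks, that caps a deployment at 9 tasks and lets ECS stop up to 3 old tasks before their replacements are healthy, so the deployment can proceed with a total of 6 tasks running (3 new and 3 old) until it completes.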
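As a minimal sketch of how those values could be applied to the template above, the ECSService resource can carry a DeploymentConfiguration block. Everything except the DeploymentConfiguration lines is copied from the question; 150 and 50 are the values suggested in the comments and may need tuning for your own cluster capacity.

```yaml
ECSService:
  DependsOn:
    - RedirectLoadBalancerListener
    - LoadBalancerListener
    - AutoScalingGroup
  Type: AWS::ECS::Service
  Properties:
    Cluster: !Ref 'ECSCluster'
    DesiredCount: !If [IsExclusive, 2, 6]
    # Added: rolling-update limits, as percentages of DesiredCount.
    # With 6 desired tasks: at most 9 tasks RUNNING or PENDING,
    # and at least 3 tasks kept RUNNING during the deployment.
    DeploymentConfiguration:
      MaximumPercent: 150
      MinimumHealthyPercent: 50
    Role: !Ref 'ECSServiceRole'
    TaskDefinition: !Ref 'TaskDefinition'
    LoadBalancers:
      - ContainerName: nginx
        ContainerPort: '80'
        TargetGroupArn: !Ref 'ECSTG'
```

Lowering MinimumHealthyPercent trades some serving capacity during the rollout for the ability to free resources on the existing container instances. If a deployment still stalls, the service's event messages in the ECS console usually state why tasks could not be placed.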