Getting java.net.SocketTimeoutException: connect timed out during AWS CodeBuild

Problem description

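During acceptance testing in AWS CodeBuild, we are able to download a .jar into the pipeline, but the command that invokes the .jar fails to execute (URLs and IPs in this example have been modified for obfuscation):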

[Container] 2020/07/08 14:53:37 Running command java -jar qa-jenkins-cli.jar -s https://example.com/jenkins/ -noCertificateCheck build RUN-l1-Regression -s -v -p ReasonForRun="AWS pipeline run" -p slavepool="DI" -p HOST_VALUES="127.0.0.1 sp.l1.example.com"
Skipping HTTPS certificate checks altogether. Note that this is not secure at all.
java.net.SocketTimeoutException: connect timed out
    at java.base/java.net.PlainSocketImpl.socketConnect(Native Method)
    at java.base/java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:399)
    at java.base/java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:242)
    at java.base/java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:224)
    at java.base/java.net.SocksSocketImpl.connect(SocksSocketImpl.java:403)
    at java.base/java.net.Socket.connect(Socket.java:609)
    at hudson.cli.CLI.connectViaCliPort(CLI.java:210)
    at hudson.cli.CLI.<init>(CLI.java:128)
    at hudson.cli.CLIConnectionFactory.connect(CLIConnectionFactory.java:72)
    at hudson.cli.CLI._main(CLI.java:479)
    at hudson.cli.CLI.main(CLI.java:390)
    Suppressed: java.io.EOFException: unexpected stream termination
        at hudson.remoting.ChannelBuilder.negotiate(ChannelBuilder.java:331)
        at hudson.remoting.Channel.<init>(Channel.java:422)
        at hudson.remoting.Channel.<init>(Channel.java:401)
        at hudson.remoting.Channel.<init>(Channel.java:397)
        at hudson.remoting.Channel.<init>(Channel.java:386)
        at hudson.remoting.Channel.<init>(Channel.java:378)
        at hudson.remoting.Channel.<init>(Channel.java:354)
        at hudson.cli.CLI.connectViaHttp(CLI.java:159)
        at hudson.cli.CLI.<init>(CLI.java:132)
        ... 3 more

[Container] 2020/07/08 14:54:01 Command did not exit successfully java -jar qa-jenkins-cli.jar -s https://example.com/jenkins/ -noCertificateCheck build RUN-l1-Regression -s -v -p ReasonForRun="AWS pipeline run" -p slavepool="DI" -p HOST_VALUES="127.0.0.1 sp.l1.example.com" exit status 255
[Container] 2020/07/08 14:54:01 Phase complete: PRE_BUILD State: FAILED
[Container] 2020/07/08 14:54:01 Phase context status code: COMMAND_EXECUTION_ERROR Message: Error while executing command: java -jar qa-jenkins-cli.jar -s https://example.com/jenkins/ -noCertificateCheck build RUN-l1-Regression -s -v -p ReasonForRun="AWS pipeline run" -p slavepool="DI" -p HOST_VALUES="127.0.0.1 sp.l1.example.com". Reason: exit status 255

This is the app-test-buildspec.yml (the wget works):

# build spec version.  keep at 0.2
# https://docs.aws.amazon.com/codebuild/latest/userguide/build-spec-ref.html#build-spec-ref-versions
version: 0.2

phases:
  pre_build:
    commands:
      #- echo "Installing jq (JSON parser)..."
      #- yum install -y jq gettext
      - echo "deploy_phase=${deploy_phase} developer_prefix=${developer_prefix} environment=${environment} account_id=${account_id} account_alias=${account_alias}"
      - $(cat version.json | jq -j '"export app_name=\(.app_name) app_version=\(.app_version) s3_version=\(.s3_version)"')
      - echo "app_name=${app_name} app_version=${app_version} s3_version=${s3_version} developer_prefix=${developer_prefix} environment=${environment}"
      - $(cat app-deploy.json | jq -j '"export UseFargate=\(.Parameters.UseFargate)"')
      - echo "UseFargate=${UseFargate}"
      - wget https://example.com/jenkins/jenkins-cli.jar -O qa-jenkins-cli.jar
      - java -jar qa-jenkins-cli.jar -s https://example.com/jenkins/ -noCertificateCheck build RUN-l1-Regression -s -v -p ReasonForRun="AWS pipeline run" -p slavepool="DI" -p HOST_VALUES="127.0.0.1 sp.l1.example.com"
  build:
    commands:
      - pip install boto3 pytest
      - pytest -o log_cli=true -o log_cli_level=INFO -v tests/test_ecs_cluster.py

artifacts:
  files:
    - '**/*'

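We set up DNS mirroring so that certain AWS processes can reach on-premises services, such as the test suite we are trying to run here. Because of the mirroring, the tests run inside a VPC. We know the mirroring works because the wget that retrieves the .jar file succeeds. We cannot see the failing call in the flow logs anywhere.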

Does anyone have any insight into what might be happening here?

Tags: java, jar, aws-codepipeline, aws-codebuild

Solution


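We found that the test .jar was trying to run the tests on another on-premises device. That device had firewall settings that caused the command's requests to be dropped at the firewall, with nothing coming back except a timeout.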
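Because the firewall dropped the packets instead of rejecting them, the only visible symptom was a connect timeout. The following is a minimal sketch, not something from the original setup, of a probe that could be run from the CodeBuild container to tell a silently dropped connection (timeout) from a closed port (refused). The class name `PortProbe`, the `Result` enum, and the 5-second timeout are all illustrative assumptions, and running it as a single source file assumes Java 11 or later on the build image.

```java
import java.io.IOException;
import java.net.ConnectException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.net.SocketTimeoutException;
import java.net.UnknownHostException;

// Hypothetical helper: classifies the outcome of one TCP connection attempt.
public class PortProbe {

    public enum Result { OPEN, REFUSED, TIMED_OUT, UNRESOLVED, OTHER_ERROR }

    public static Result probe(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            // Same call that fails in the stack trace, but with an explicit timeout.
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return Result.OPEN;
        } catch (SocketTimeoutException e) {
            // No answer at all within the timeout: consistent with packets being dropped.
            return Result.TIMED_OUT;
        } catch (ConnectException e) {
            // The remote side (or something on the path) actively rejected the
            // attempt, typically "Connection refused".
            return Result.REFUSED;
        } catch (UnknownHostException e) {
            // The host name did not resolve, which would point at DNS instead.
            return Result.UNRESOLVED;
        } catch (IOException e) {
            return Result.OTHER_ERROR;
        }
    }

    // Each argument is expected in host:port form.
    public static void main(String[] args) {
        for (String target : args) {
            int colon = target.lastIndexOf(':');
            if (colon < 0) {
                System.out.println(target + " -> expected host:port");
                continue;
            }
            String host = target.substring(0, colon);
            int port = Integer.parseInt(target.substring(colon + 1));
            // 5000 ms is an arbitrary illustrative timeout.
            System.out.println(target + " -> " + probe(host, port, 5000));
        }
    }
}
```

For instance, `java PortProbe.java example.com:443` could be run as a pre_build command ahead of the Jenkins CLI call. The host and port here are placeholders; the useful targets are whichever endpoints the test suite actually contacts. A `TIMED_OUT` result for a host that resolves correctly would suggest a firewall or routing drop rather than a DNS problem.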

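Lesson learned: if you run a hybrid system that combines AWS and on-premises resources, you must know exactly which resources are needed and where they are located. In large systems, process documentation may be inaccurate or missing. You need good tooling to trace the point where the problem occurs (Wireshark was a lifesaver here) so that you understand how to remediate it.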

