TLS errors when converting from a self-signed to a commercial certificate

Problem description

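When I installed our cluster, I used a self-signed certificate from our internal CA. Everything was fine until I started getting certificate errors from applications that I was deploying to the OKD cluster. Instead of trying to fix the errors one at a time, we decided to simply purchase a commercial certificate and install that. So we bought a SAN certificate with wildcards from GlobalSign (identical to the one we originally got from our internal CA), and I am trying to install it, with massive problems.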

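Keep in mind that I have tried dozens of iterations here. I am only documenting the last one I tried, in an attempt to figure out what the problem actually is. This is on my test cluster, which runs on virtual machines, and I revert to a snapshot after each attempt. The snapshot is the working cluster using the internal CA certificates.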

So, my first step was to build the CA file to pass in. I downloaded the GlobalSign root and intermediate certificates and put them into the ca-globalsign.crt file (PEM format).

When I run

openssl verify -CAfile ../ca-globalsign.crt labtest.mycompany.com.pem

I get:

labtest.mycompany.com.pem: OK

openssl x509 -in labtest.mycompany.com.pem -text -noout gives me (redacted)

Certificate:
    Data:
        Version: 3 (0x2)
        Serial Number:
            (redacted)
    Signature Algorithm: sha256WithRSAEncryption
        Issuer: C=BE, O=GlobalSign nv-sa, CN=GlobalSign Organization Validation CA - SHA256 - G2
        Validity
            Not Before: Apr 29 16:11:07 2019 GMT
            Not After : Apr 29 16:11:07 2020 GMT
        Subject: C=US, ST=(redacted), L=(redacted), OU=Information Technology, O=(redacted), CN=labtest.mycompany.com
        Subject Public Key Info:
            Public Key Algorithm: rsaEncryption
                Public-Key: (2048 bit)
                Modulus:
                    (redacted)
                Exponent: 65537 (0x10001)
        X509v3 extensions:
            X509v3 Key Usage: critical
                Digital Signature, Key Encipherment
            Authority Information Access:
                CA Issuers - URI:http://secure.globalsign.com/cacert/gsorganizationvalsha2g2r1.crt
                OCSP - URI:http://ocsp2.globalsign.com/gsorganizationvalsha2g2

            X509v3 Certificate Policies:
                Policy: 1.3.6.1.4.1.4146.1.20
                  CPS: https://www.globalsign.com/repository/
                Policy: 2.23.140.1.2.2

            X509v3 Basic Constraints:
                CA:FALSE
            X509v3 Subject Alternative Name:
                DNS:labtest.mycompany.com, DNS:*.labtest.mycompany.com, DNS:*.apps.labtest.mycompany.com
            X509v3 Extended Key Usage:
                TLS Web Server Authentication, TLS Web Client Authentication
            X509v3 Subject Key Identifier:
                (redacted)
            X509v3 Authority Key Identifier:
                (redacted)

            (redacted)

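That was on my local machine. Everything I know about SSL says the certificate is fine. The new files went into the project I use to hold my configuration, such as my OKD install.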
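As a cross-check on the SAN list in the dump above, here is a minimal Python sketch of wildcard matching, assuming the common rule that a wildcard covers exactly one leftmost DNS label. The function names san_matches and covered are hypothetical helpers, not part of any OKD tooling. It only shows which hostnames from the inventory the certificate can cover; it says nothing about how the cluster uses them.

```python
# SAN entries copied from the certificate dump above.
SAN_ENTRIES = [
    "labtest.mycompany.com",
    "*.labtest.mycompany.com",
    "*.apps.labtest.mycompany.com",
]


def san_matches(hostname, pattern):
    """Return True if hostname is covered by one SAN DNS pattern.

    Assumption: a leading "*" stands for exactly one non-empty label.
    """
    host_labels = hostname.lower().rstrip(".").split(".")
    pattern_labels = pattern.lower().rstrip(".").split(".")
    if len(host_labels) != len(pattern_labels):
        return False
    if pattern_labels[0] == "*":
        # The wildcard must match a real label; the rest must be identical.
        return host_labels[0] != "" and host_labels[1:] == pattern_labels[1:]
    return host_labels == pattern_labels


def covered(hostname, san_entries=SAN_ENTRIES):
    """Return True if any SAN entry covers the hostname."""
    return any(san_matches(hostname, pattern) for pattern in san_entries)


for name in (
    "console.labtest.mycompany.com",
    "registry.apps.labtest.mycompany.com",
    "a.b.labtest.mycompany.com",
):
    print(name, covered(name))
```

Under that one-label assumption, console.labtest.mycompany.com and registry.apps.labtest.mycompany.com are both covered, so name coverage alone does not explain the errors.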

Then I updated the certificate files in my Ansible inventory project and ran the command

ansible-playbook -i ../okd_install/inventory/okd_labtest_inventory.yml playbooks/redeploy-certificates.yml

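Everything I read in the docs tells me the playbook should simply run through its process and bring up the new certificates. That does not happen. When I use openshift_master_overwrite_named_certificates: false in my inventory file, the install completes, but it only replaces the certificate on the *.apps.labtest domain, while console.labtest keeps the original one. The cluster does come online, except that monitoring shows bad gateway in the cluster console.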

Now, if I run the command again with openshift_master_overwrite_named_certificates: true, my /var/log/containers/master-api*.log fills up with errors like these

{"log":"I0507 15:53:28.451851       1 logs.go:49] http: TLS handshake error from 10.128.0.56:46796: EOF\n","stream":"stderr","time":"2019-05-07T19:53:28.451894391Z"}
{"log":"I0507 15:53:28.455218       1 logs.go:49] http: TLS handshake error from 10.128.0.56:46798: EOF\n","stream":"stderr","time":"2019-05-07T19:53:28.455272658Z"}
{"log":"I0507 15:53:28.458742       1 logs.go:49] http: TLS handshake error from 10.128.0.56:46800: EOF\n","stream":"stderr","time":"2019-05-07T19:53:28.461070768Z"}
{"log":"I0507 15:53:28.462093       1 logs.go:49] http: TLS handshake error from 10.128.0.56:46802: EOF\n","stream":"stderr","time":"2019-05-07T19:53:28.463719816Z"}

and these

{"log":"I0507 15:53:29.355463       1 logs.go:49] http: TLS handshake error from 10.70.25.131:44424: remote error: tls: bad certificate\n","stream":"stderr","time":"2019-05-07T19:53:29.357218793Z"}
{"log":"I0507 15:53:29.357961       1 logs.go:49] http: TLS handshake error from 10.70.25.132:43128: remote error: tls: bad certificate\n","stream":"stderr","time":"2019-05-07T19:53:29.358779155Z"}
{"log":"I0507 15:53:29.357993       1 logs.go:49] http: TLS handshake error from 10.70.25.132:43126: remote error: tls: bad certificate\n","stream":"stderr","time":"2019-05-07T19:53:29.358790397Z"}
{"log":"I0507 15:53:29.405532       1 logs.go:49] http: TLS handshake error from 10.70.25.131:44428: remote error: tls: bad certificate\n","stream":"stderr","time":"2019-05-07T19:53:29.406873158Z"}
{"log":"I0507 15:53:29.527221       1 logs.go:49] http: TLS handshake error from 10.70.25.132:43130: remote error: tls: bad certificate\n","stream":"stderr","time":"2019-05-07T19:53

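The install also hangs on the Ansible task TASK [Remove web console pods]. It will sit there for hours. When I go onto the master console and run oc get pods, the openshift-web-console pod is in the terminating state. When I describe the pending pod that is trying to start, it comes back saying the hard disk is full. I am assuming that is because, with all of the TLS errors above, it cannot communicate with the storage system. It just stays there. I can get the cluster back up if I force delete the terminating pod, reboot the master, delete the new pod that is trying to start, and then reboot again. The web console then comes online, but all of my log files are flooded with those TLS errors. More concerning is that the install hangs at that spot, so I am assuming there are additional steps after bringing the web console online that will also cause me problems.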

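So I also tried redeploying the server CA. That caused problems because my new certificate is not a CA certificate. Then, when I ran just the redeploy CA playbook and let the cluster recreate the server CA, it completed fine, but when I tried to run redeploy-certificates.yml afterwards, I got the same results.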

Here is my inventory file

all:
  children:
    etcd:
      hosts:
        okdmastertest.labtest.mycompany.com:
    masters:
      hosts:
        okdmastertest.labtest.mycompany.com:
    nodes:
      hosts:
        okdmastertest.labtest.mycompany.com:
          openshift_node_group_name: node-config-master-infra
        okdnodetest1.labtest.mycompany.com:
          openshift_node_group_name: node-config-compute
          openshift_schedulable: True
    OSEv3:
      children:
        etcd:
        masters:
        nodes:
        # https://docs.okd.io/latest/install_config/persistent_storage/persistent_storage_glusterfs.html#overview-containerized-glusterfs
        # https://github.com/openshift/openshift-ansible/tree/master/playbooks/openshift-glusterfs
        # glusterfs:
      vars:
        openshift_deployment_type: origin
        ansible_user: root

        openshift_master_cluster_method: native
        openshift_master_default_subdomain: apps.labtest.mycompany.com
        openshift_install_examples: true

        openshift_master_cluster_hostname: console.labtest.mycompany.com
        openshift_master_cluster_public_hostname: console.labtest.mycompany.com
        openshift_hosted_registry_routehost: registry.apps.labtest.mycompany.com

        openshift_certificate_expiry_warning_days: 30
        openshift_certificate_expiry_fail_on_warn: false
        openshift_master_overwrite_named_certificates: true
        openshift_hosted_registry_routetermination: reencrypt

        openshift_master_named_certificates:
          - certfile: "/Users/me/code/devops/okd_install/certs/labtest/commercial.04.29.2019.labtest.mycompany.com.pem"
            keyfile: "/Users/me/code/devops/okd_install/certs/labtest/commercial.04.29.2019.labtest.mycompany.com.key"
            cafile: "/Users/me/code/devops/okd_install/certs/ca-globalsign.crt"
            names:
              - "console.labtest.mycompany.com"
              # - "labtest.mycompany.com"
              # - "*.labtest.mycompany.com"
              # - "*.apps.labtest.mycompany.com"
        openshift_hosted_router_certificate:
          certfile: "/Users/me/code/devops/okd_install/certs/labtest/commercial.04.29.2019.labtest.mycompany.com.pem"
          keyfile: "/Users/me/code/devops/okd_install/certs/labtest/commercial.04.29.2019.labtest.mycompany.com.key"
          cafile: "/Users/me/code/devops/okd_install/certs/ca-globalsign.crt"
        openshift_hosted_registry_routecertificates:
          certfile: "/Users/me/code/devops/okd_install/certs/labtest/commercial.04.29.2019.labtest.mycompany.com.pem"
          keyfile: "/Users/me/code/devops/okd_install/certs/labtest/commercial.04.29.2019.labtest.mycompany.com.key"
          cafile: "/Users/me/code/devops/okd_install/certs/ca-globalsign.crt"

        # LDAP auth
        openshift_master_identity_providers:
        - name: 'mycompany_ldap_provider'
          challenge: true
          login: true
          kind: LDAPPasswordIdentityProvider
          attributes:
            id:
            - dn
            email:
            - mail
            name:
            - cn
            preferredUsername:
            - sAMAccountName
          bindDN: 'ldapbind@int.mycompany.com'
          bindPassword: (redacted) 
          insecure: true
          url: 'ldap://dc-pa1.int.mycompany.com/ou=mycompany,dc=int,dc=mycompany,dc=com'

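What am I missing here? I thought the redeploy-certificates.yml playbook was designed to update the certificates. Why can't I get it to switch to my new commercial certificate? It is almost as if it replaces the certificates on the router (sort of) but breaks the internal server certificate in the process. I am really at my wits' end here, and I don't know what else to try.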

Tags: kubernetes, openshift, openshift-origin

Solution


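You should set openshift_master_cluster_hostname and openshift_master_cluster_public_hostname to two different hostnames. In your inventory both are set to console.labtest.mycompany.com. Both hostnames should also be resolvable through DNS. Your commercial certificate is used for the external access point.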

The openshift_master_cluster_public_hostname and openshift_master_cluster_hostname parameters in the Ansible inventory file, by default /etc/ansible/hosts, must be different. 
If they are the same, the named certificates will fail and you will need to re-install them.

# Native HA with External LB VIPs
openshift_master_cluster_hostname=internal.paas.example.com
openshift_master_cluster_public_hostname=external.paas.example.com
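A quick way to catch this before running the playbook is a small pre-flight check. The following is a minimal Python sketch, assuming the inventory vars have already been loaded into a plain dict; check_cluster_hostnames is a hypothetical helper, not part of openshift-ansible, and internal.labtest.mycompany.com is a made-up internal name used only for illustration. The second check (a named certificate listing the internal hostname) reflects my reading of why the two hostnames must differ and should be treated as an assumption.

```python
def check_cluster_hostnames(inventory_vars):
    """Return a list of problems found in the inventory vars (empty if none)."""
    problems = []
    internal = inventory_vars.get("openshift_master_cluster_hostname")
    public = inventory_vars.get("openshift_master_cluster_public_hostname")

    # Rule quoted from the documentation: the two hostnames must be different.
    if internal and public and internal.lower() == public.lower():
        problems.append(
            "cluster_hostname and cluster_public_hostname are both " + internal
        )

    # Assumption: a named certificate should not claim the internal hostname.
    for cert in inventory_vars.get("openshift_master_named_certificates", []):
        for name in cert.get("names", []):
            if internal and name.lower() == internal.lower():
                problems.append(
                    "named certificate lists the internal hostname " + name
                )
    return problems


# Values taken from the inventory in the question.
question_vars = {
    "openshift_master_cluster_hostname": "console.labtest.mycompany.com",
    "openshift_master_cluster_public_hostname": "console.labtest.mycompany.com",
    "openshift_master_named_certificates": [
        {"names": ["console.labtest.mycompany.com"]},
    ],
}

# Same inventory with a hypothetical, separate internal hostname.
fixed_vars = dict(
    question_vars,
    openshift_master_cluster_hostname="internal.labtest.mycompany.com",
)

for problem in check_cluster_hostnames(question_vars):
    print("PROBLEM:", problem)
print("fixed inventory problems:", check_cluster_hostnames(fixed_vars))
```

With the values from the question, both checks fire; once the internal hostname is a separate name, the list comes back empty.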

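It is also better to configure the certificate for each component one step at a time so that you can test as you go. For example, first configure the custom master host certificate and verify it. Then configure the custom wildcard certificate for the default router and verify it, and so on. Once every certificate redeployment task completes successfully on its own, you can run your commercial certificate maintenance with the full set of parameters.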
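For the "verify" part of each step, one option is to check which certificate an endpoint actually serves. The sketch below uses Python's standard ssl module; fetch_verified_cert and summarize_cert are hypothetical helper names. It assumes the master API listens on port 8443 (the usual default when the inventory does not override it) and the router on 443, and that ca-globalsign.crt is the CA bundle built in the question.

```python
import socket
import ssl


def fetch_verified_cert(host, port, cafile):
    """Connect to host:port, verify the chain against cafile only, and
    return the peer certificate as the dict produced by getpeercert().

    Raises ssl.SSLCertVerificationError if the served certificate does not
    chain to cafile or does not cover the hostname.
    """
    context = ssl.create_default_context(cafile=cafile)
    with socket.create_connection((host, port), timeout=10) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()


def summarize_cert(cert):
    """Pick the fields worth eyeballing out of a getpeercert() dict."""
    issuer = {key: value for rdn in cert.get("issuer", ()) for key, value in rdn}
    dns_names = [
        value for kind, value in cert.get("subjectAltName", ()) if kind == "DNS"
    ]
    return {
        "issuer_org": issuer.get("organizationName"),
        "issuer_cn": issuer.get("commonName"),
        "dns_names": dns_names,
        "not_after": cert.get("notAfter"),
    }


# Example calls, left as comments because they need network access:
#   summarize_cert(fetch_verified_cert(
#       "console.labtest.mycompany.com", 8443, "ca-globalsign.crt"))
#   summarize_cert(fetch_verified_cert(
#       "registry.apps.labtest.mycompany.com", 443, "ca-globalsign.crt"))
```

If the master step worked, the first call should report the GlobalSign issuer; a verification error there would suggest the endpoint is still serving a certificate signed by the cluster's own CA.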

For more details, refer to Configuring Custom Certificates. I hope this helps.

