Initialization action causing Jupyterhub internal error

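Problem description

I have a DAG that creates a Dataproc cluster with Jupyter and Anaconda as optional components, and I start the cluster with a custom script (an initialization action) that installs Python packages.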

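# Import path assumed for Airflow 1.10.x (contrib operators).
from airflow.contrib.operators.dataproc_operator import DataprocClusterCreateOperator


class CustomDataprocClusterCreateOperator(DataprocClusterCreateOperator):

    def __init__(self, *args, **kwargs):
        super(CustomDataprocClusterCreateOperator, self).__init__(*args, **kwargs)

    def _build_cluster_data(self):
        cluster_data = super(CustomDataprocClusterCreateOperator, self)._build_cluster_data()
        # Enable HTTP access to the component web interfaces.
        cluster_data['config']['endpointConfig'] = {
            'enableHttpPortAccess': True
        }
        # Request Jupyter and Anaconda as optional components.
        cluster_data['config']['softwareConfig']['optionalComponents'] = ['JUPYTER', 'ANACONDA']
        return cluster_data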

create_cluster = CustomDataprocClusterCreateOperator(
    task_id='create-%s-%s' % (cluster_name, user),
    cluster_name='%s-%s' % (cluster_name, user),
    project_id=project_id,
    num_workers=2,
    num_masters=2,
    master_machine_type='n1-standard-2',
    worker_machine_type='n1-standard-2',
    subnetwork_uri='projects/XXX-network/regions/europe-west1/subnetworks/prod',
    init_actions_uris=['gs://XXX/python-packages.sh'],
    master_disk_size=1000,
    worker_disk_size=1000,
    master_disk_type='pd-ssd',
    worker_disk_type='pd-ssd',
    storage_bucket=XXX,
    region='europe-west1',
    zone='europe-west1-b',
    auto_delete_ttl=36000,
    dag=dag
)

This is the python-packages.sh script:

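#!/bin/sh
# System packages: pip plus the headers needed to build psycopg2.
apt update
apt install python-pip -y
# -y added so the install does not stop at the confirmation prompt.
apt install libpq-dev python-dev -y
# Python packages; -U upgrades each one to its latest version.
pip install -U google-cloud-storage
pip install -U xlrd
pip install -U gcsfs
pip install -U tensorflow
pip install -U pymongo
pip install -U openpyxl
pip install -U psycopg2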

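It used to work fine, but since Friday the cluster still starts while accessing Jupyterhub returns a 500 Internal Server Error. I connected to the master node: it can access the bucket (running gsutil cat gs://XXX/python-packages.sh works), and the initialization script has been saved correctly at /etc/google-dataproc/startup-scripts/dataproc-initialization-script-0. I found no errors in /var/log/jupyter_notebook.log, so I am somewhat out of ideas.

Removing init_actions does make Jupyterhub work again, but I really do need these packages.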

Tags: google-cloud-platform, google-cloud-dataproc, jupyterhub

Solution
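
One way to narrow this down, offered as a sketch rather than a confirmed fix: since Jupyterhub works without the initialization action and fails with it, it may help to compare the installed Python packages on a working cluster and on a broken one. The working hypothesis, which the question does not confirm, is that one of the `pip install -U` lines changed a package that Jupyter depends on. The helpers below (`parse_freeze` and `diff_freeze` are hypothetical names introduced here) take the text output of `pip freeze` from both clusters and report every package whose version differs.

```python
def parse_freeze(text):
    """Parse `pip freeze` output into a {package_name: version} dict."""
    versions = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blanks, comments, and entries that are not plain name==version pins.
        if not line or line.startswith('#') or '==' not in line:
            continue
        name, version = line.split('==', 1)
        versions[name.strip().lower()] = version.strip()
    return versions


def diff_freeze(before_text, after_text):
    """Return {name: (before_version, after_version)} for packages that differ.

    A version of None means the package is absent on that side.
    """
    before = parse_freeze(before_text)
    after = parse_freeze(after_text)
    changed = {}
    for name in sorted(set(before) | set(after)):
        old, new = before.get(name), after.get(name)
        if old != new:
            changed[name] = (old, new)
    return changed
```

To use it, save the output of `pip freeze` on a cluster created without the initialization action and on one created with it, then pass the two texts to `diff_freeze`. If the script's `pip` and the Python environment that Jupyter runs in are different installations, it is worth capturing both, because only changes in Jupyter's own environment could plausibly explain the 500 error.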
