Google Cloud Storage - counting objects in a very large bucket

Problem description

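I am trying to speed up the way I count the objects (files) in a bucket. My current approach becomes very slow with large buckets (typically 50k objects or more). I want the total number of objects in the bucket, as well as a count of the objects older than a certain threshold.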

    $storage_client = GoogleCloudStorageNew::client();
    $bucket = $storage_client->bucket( $myBucketName );

    $this->threshold_hours = 3;
    $threshold = time() - ( 60 * 60 * $this->threshold_hours );
    $threshold_timestamp = date('Y-m-d H:i:s', $threshold);

    $params = [
                 'prefix'     => "$mls_id/",
                 'pageToken'  => null
              ];

    $this->total_images = 0;
    $this->total_old_images = 0;
    foreach ( $bucket->objects( $params ) as $object )
    {
        // always add to total
        $this->total_images++;

        $info = $object->info();
        $image_created = date( 'Y-m-d H:i:s', strtotime( $info['timeCreated']) );
        if ( $image_created < $threshold_timestamp )
        {
            $this->total_old_images++;
        }
    }

I wondered whether paginating the results might be faster, but I could not get pagination to work. With the same setup, I tried this:

    $page_token = null;
    $params = [
                 'prefix'      => "$mls_id/",
                 'maxResults'  => 5000,
                 'pageToken'   => null
              ];

    $this->total_images = 0;
    $this->total_old_images = 0;

    $this->threshold_hours = 3;
    $threshold = time() - ( 60 * 60 * $this->threshold_hours );
    $threshold_timestamp = date('Y-m-d H:i:s', $threshold);
    
    while ( $objectList = $bucket->objects($params) )
    {
        $params['pageToken'] = $objectList->nextResultToken();
        foreach ( $objectList as $object )
        {
            $this->total_images++;

            $info = $object->info();
            $image_created = date( 'Y-m-d H:i:s', strtotime( $info['timeCreated']) );
            if ( $image_created < $threshold_timestamp )
            {
                $this->total_old_images++;
            }
        }
    }

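But pagination does not work - maxResults does not limit the returned values to 5000; it just fetches everything. Am I misreading how maxResults/pageToken and nextResultToken() work together? Clearly I am, but what am I missing?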

Tags: php, iteration, google-cloud-storage

Solution
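
No answer was captured with this page, so what follows is only a hedged sketch, not a confirmed fix. As far as I understand the google/cloud-storage PHP client, the iterator returned by `$bucket->objects()` fetches the following pages on its own, so `maxResults` behaves as a per-request page size rather than a cap on the total (the library documents a separate `resultLimit` option for that). That would explain why a single `foreach` already walks every object, and why the `while` loop in the question never sees a falsy value. The helper `countObjects()` below is a hypothetical name, not part of the library. It works on any iterable whose items expose `info()`, and it compares Unix timestamps instead of formatted date strings. The commented usage assumes the `fields` option can trim each response to the name and creation time, which may reduce the payload but will not reduce the number of list requests.

```php
<?php
// Sketch only: countObjects() is a hypothetical helper, not a library function.

/**
 * Count all objects and those created before $thresholdTs (a Unix timestamp).
 *
 * @param iterable $objects     Items exposing info() with a 'timeCreated' entry.
 * @param int      $thresholdTs Objects created before this moment count as old.
 * @return array ['total' => int, 'old' => int]
 */
function countObjects(iterable $objects, int $thresholdTs): array
{
    $total = 0;
    $old = 0;
    foreach ($objects as $object) {
        $total++;
        $info = $object->info();
        // Compare numeric timestamps rather than formatted date strings.
        if (strtotime($info['timeCreated']) < $thresholdTs) {
            $old++;
        }
    }
    return ['total' => $total, 'old' => $old];
}

// Assumed usage with the real client (not executed here):
// $objects = $bucket->objects([
//     'prefix'     => "$mls_id/",
//     'maxResults' => 1000, // page size per request, not a total limit
//     'fields'     => 'items/name,items/timeCreated,nextPageToken',
// ]);
// $counts = countObjects($objects, time() - 3 * 3600);
```

Keeping the counting logic separate from the listing call makes it easy to check with stand-in objects, and the same function can be fed one page at a time if the pages are iterated explicitly.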

