Azure storage account backup loop

Problem description

I'm hoping someone can help me understand and solve this problem I'm facing by following a best-practice approach.

I have a large list of storage accounts, more than 54 of them. Each storage account contains multiple blobs and tables. I want to back up each storage account into another storage account.

So I started writing backup code: it loops over the blobs and tables of a given storage account and copies them into another storage account.

Here is the code for the blob and table loops:

from azure.core.exceptions import ResourceExistsError
from azure.cosmosdb.table.tableservice import TableService, ListGenerator
from azure.storage.blob import BlobClient, BlobServiceClient
from azure.storage.blob import ResourceTypes, AccountSasPermissions
from azure.storage.blob import generate_account_sas
from datetime import date, datetime, timedelta

# Date suffix (YYYYMMDD) appended to every backup container and table name
today = date.today().strftime("%Y%m%d")
print(today)

#================================ SOURCE ===============================
# Source client
connection_string = ''  # Connection string of the source storage account
account_key = ''  # Account key of the source storage account
# source_container_name = 'newblob' # Name of container which has blob to be copied
table_service_out = TableService(account_name='', account_key='')  # Source table service
table_service_in = TableService(account_name='', account_key='')   # Target (backup) table service

# Create the blob service client for the source storage account
client = BlobServiceClient.from_connection_string(connection_string)

all_containers = client.list_containers(include_metadata=True)
for container in all_containers:
    # Create an account-level SAS token used to read blobs from the source account
    sas_token = generate_account_sas(
        account_name=client.account_name,
        account_key=account_key,
        resource_types=ResourceTypes(object=True, container=True),
        permission=AccountSasPermissions(read=True, list=True),
        # start = datetime.now(),
        expiry=datetime.utcnow() + timedelta(hours=4)  # Token valid for 4 hours
    )

    
    print("==========================")
    print(container['name'], container['metadata'])
    
    # print("==========================")
    container_client = client.get_container_client(container.name)
    # print(container_client)
    blobs_list = container_client.list_blobs()
    for blob in blobs_list:
        # Blob client for the source blob, authenticated with the account SAS token
        source_blob = BlobClient(
            client.url,
            container_name=container['name'],
            blob_name=blob.name,
            credential=sas_token
        )
        # Hard-coded target storage account (this is the part I want to make dynamic)
        target_connection_string = ''
        target_account_key = ''
        source_container_name = container['name']
        target_blob_name = blob.name
        target_destination_blob = container['name'] + today
        print(target_blob_name)
        target_client = BlobServiceClient.from_connection_string(target_connection_string)
        # Create the dated backup container in the target account; ignore it if it already exists
        try:
            target_client.create_container(target_destination_blob)
        except ResourceExistsError:
            pass
        # Start a server-side copy of the source blob into the backup container
        new_blob = target_client.get_blob_client(target_destination_blob, target_blob_name)
        new_blob.start_copy_from_url(source_blob.url)
        print(f"Saving blob {target_blob_name} into {target_destination_blob}")
        

# Query 100 entities per request so the whole table is not loaded into memory at once
query_size = 100

# Copy one page of entities into the backup table; if the source table has more pages, recurse
def queryAndSaveAllDataBySize(source_tb_name, target_tb_name, resp_data: ListGenerator, table_out: TableService, table_in: TableService, query_size: int):
    for item in resp_data:
        # Remove the etag and Timestamp fields appended by the table service
        del item.etag
        del item.Timestamp
        print("insert data: " + str(item) + " into table: " + target_tb_name)
        table_in.insert_or_replace_entity(target_tb_name, item)
    if resp_data.next_marker:
        data = table_out.query_entities(table_name=source_tb_name, num_results=query_size, marker=resp_data.next_marker)
        queryAndSaveAllDataBySize(source_tb_name, target_tb_name, data, table_out, table_in, query_size)


tbs_out = table_service_out.list_tables()
print(tbs_out)

for tb in tbs_out:
    backup_table = tb.name + today
    # Create a table with the same name (plus the date suffix) in the backup storage account
    table_service_in.create_table(table_name=backup_table, fail_on_exist=False)
    # First page of entities from the source table
    data = table_service_out.query_entities(tb.name, num_results=query_size)
    queryAndSaveAllDataBySize(tb.name, backup_table, data, table_service_out, table_service_in, query_size)

But as you can see in my code, I have:

# Source client
connection_string = ''  # Connection string of the source storage account
account_key = ''  # Account key of the source storage account
# source_container_name = 'newblob' # Name of container which has blob to be copied
table_service_out = TableService(account_name='', account_key='')  # Source table service
table_service_in = TableService(account_name='', account_key='')   # Target (backup) table service

These are hard-coded connection strings and keys.
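Just to be explicit about what I mean by hard-coded: right now I paste the values straight into the script. At a minimum I could read them from environment variables (the variable names below are made up), but that still only covers a single source/backup pair per run:

import os

# Hypothetical environment variables holding the secrets for ONE source/backup pair
connection_string = os.environ["SOURCE_STORAGE_CONN"]         # source storage account connection string
account_key = os.environ["SOURCE_STORAGE_KEY"]                # source storage account key
target_connection_string = os.environ["BACKUP_STORAGE_CONN"]  # backup storage account connection string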

On the other hand, I'm creating all the backup storage accounts with Azure Bicep; these are the accounts into which the code above backs everything up.

But this is where I'm stuck: I can't come up with a proper way to make this dynamic.

In my first piece of code (the copy logic) I have hard-coded variables, which means I have to copy and paste the connection string from each storage account >> run the script >> wait for it to finish >> start over with the next storage account.

What I'd like to do is set up some kind of mapping between StorageA and StorageBackupA, where I can list all the connection strings and run the script just once, so that it loops over all of them and copies the blobs etc. into the assigned backup storage account.

I've been considering different approaches, for example two lists (one for StorageA and one for the backups) or a dictionary, but I'm struggling to work out the best way to build this step.
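To make the dictionary idea concrete, here is a rough sketch of what I have in mind (backup_map and backup_one_account are just placeholder names; the function body would be the copy logic from the script above, parameterised instead of hard-coded):

import os
from azure.storage.blob import BlobServiceClient

# Hypothetical mapping: one entry per source storage account, pointing at its backup account.
# Each value holds whatever the copy logic needs (connection strings and the source account key).
backup_map = {
    "storagea": {
        "source_conn": os.environ["STORAGEA_CONN"],
        "source_key": os.environ["STORAGEA_KEY"],
        "target_conn": os.environ["STORAGEBACKUPA_CONN"],
    },
    # ... one entry for each of the 54 accounts, or load this dict from a JSON/YAML file ...
}

def backup_one_account(source_conn, source_key, target_conn):
    # Placeholder body: the blob loop and the table loop from the script above would go here,
    # using these parameters instead of the hard-coded module-level variables
    # (source_key would feed generate_account_sas and TableService).
    source_client = BlobServiceClient.from_connection_string(source_conn)
    target_client = BlobServiceClient.from_connection_string(target_conn)
    print(f"Backing up {source_client.account_name} -> {target_client.account_name}")

# Run the whole backup once, looping over every source/backup pair
for name, cfg in backup_map.items():
    backup_one_account(cfg["source_conn"], cfg["source_key"], cfg["target_conn"])

Is something like this a reasonable direction, or is there a better pattern for keeping the source/backup pairs in sync with what Bicep deploys?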

I hope my description is clear enough; if you need more information, please don't hesitate to ask.

Thank you very much for any help you can provide.

Tags: azure, azure-devops, azure-blob-storage, azure-table-storage, azure-python-sdk
