Looping over Search Console days in the API to get a full day's data

Problem description

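I am trying to use the example pseudocode (under the Overview heading) at https://developers.google.com/webmaster-tools/search-console-api-original/v3/how-tos/all-your-data

and apply it to my script, because each call is limited to 25,000 rows. The idea is that, for each day, the first loop iteration fetches the maximum number of rows (25,000), and each following iteration uses the row after the last one fetched as its start row (i.e. 25,000).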
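The paging idea described here can be sketched roughly as follows. This is a minimal illustration, not the code from the linked guide: `fetch_all_rows`, `fetch_page` and the fake data source are hypothetical names, and `fetch_page(start_row, row_limit)` stands in for whatever call actually queries the API and returns a list of rows.

```python
def fetch_all_rows(fetch_page, row_limit=25000):
    """Collect every row by requesting consecutive pages of `row_limit` rows.

    `fetch_page(start_row, row_limit)` is a hypothetical callable that
    returns a list of rows (an empty list when nothing is left).
    """
    all_rows = []
    start_row = 0
    while True:
        rows = fetch_page(start_row, row_limit)
        all_rows.extend(rows)
        # A short (or empty) page means the last page has been reached.
        if len(rows) < row_limit:
            break
        # Otherwise continue from the row after the last one received.
        start_row += len(rows)
    return all_rows


# Stand-in data source for illustration only: 60 fake rows, paged by 25.
fake_rows = list(range(60))


def fake_fetch_page(start_row, row_limit):
    return fake_rows[start_row:start_row + row_limit]


result = fetch_all_rows(fake_fetch_page, row_limit=25)
```

The loop stops on the first page that comes back shorter than `row_limit`, so a day whose row count is an exact multiple of the limit costs one extra, empty request.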

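So far I am struggling to get this working, and I am not sure where to put the while loop.

I tried placing it as shown below, but the filters get lost in the loop (it only fetches mobile data, and even then it seems to miss some of the other filters):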

        startRowindexIterator = 0
        while True:

            for filter_set in generate_filters(page=pages, device=args.devices, country=args.countries):
                    print day
                    request = {
                        'startDate' : day,
                        'endDate' : day,
                        'dimensions' : ["page","query"],
                        'startRow' : startRowindexIterator * args.max_rows_per_day,
                        'rowLimit' : args.max_rows_per_day,
                        'dimensionFilterGroups' : [
                            {
                                "groupType" : "and",
                                "filters" : filter_set
                            }
                        ]
                    }

                    response = execute_request(service, args.property_uri, request)
                    print 'startRowIterator: '+ str(startRowindexIterator)
                    print 'maxRows: '+ str(args.max_rows_per_day)

                    if response is None:
                        logging.error("Request failed %s", json.dumps(request, indent=2))
                        continue

                    if 'rows' in response:
                        startRowindexIterator = startRowindexIterator + 1

                        print 'rows: '+ str(len(response['rows']))

                        if pages:
                            filters = [pages[0], 'worldwide', 'all_devices', args.url_type]
                        else:
                            filters = ['gsc_property', 'worldwide', 'all_devices', args.url_type]

                        filter_mapping = {'page': 0, 'country': 1, 'device': 2}
                        for _filter in filter_set:
                            filters[filter_mapping[_filter['dimension']]] = _filter['expression']

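The code is based on https://github.com/stephan765/Google-Search-Console-bulk-query/blob/master/search_console_query.py

I have modified it to use a service account and made some tweaks so that it works with my Python version and environment. Below is the rest of the relevant code (without the while loop added), which obviously means we are limited to 25,000 rows per device: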

     for day in date_range(start_date, end_date):
        output_file = os.path.join(
            args.output_location,
            "{}_{}.csv".format(args.url_type, day.strftime("%Y%m%d"))
        )
        day = day.strftime("%Y-%m-%d")
        output_rows = []

        for filter_set in generate_filters(page=pages, device=args.devices, country=args.countries):
                request = {
                    'startDate' : day,
                    'endDate' : day,
                    'dimensions' : ["page","query"],
                    'rowLimit' : args.max_rows_per_day,
                    'dimensionFilterGroups' : [
                        {
                            "groupType" : "and",
                            "filters" : filter_set
                        }
                    ]
                }

                response = execute_request(service, args.property_uri, request)

                if 'rows' in response:

                    if pages:
                        filters = [pages[0], 'worldwide', 'all_devices', args.url_type]
                    else:
                        filters = ['gsc_property', 'worldwide', 'all_devices', args.url_type]

                    filter_mapping = {'page': 0, 'country': 1, 'device': 2}
                    for _filter in filter_set:
                        filters[filter_mapping[_filter['dimension']]] = _filter['expression']

                    print len(response['rows'])


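I want to be able to loop until all records have been retrieved for each day and device. The end result, when I run a command such as:

./script.py https://example.com/ 2019-01-01 2019-01-01 --max-rows-per-day=25000

is that I get all the data for all devices for the given date.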

Tags: python, rest, google-search-console

Solution
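One possible way to structure this, offered as a minimal sketch rather than a verified fix: in the attempt above, `startRowindexIterator` appears to be shared by every filter set, and the `while True` loop has no exit condition. The paging loop could instead sit inside the `for filter_set` loop, with the start row reset for each filter set. `fetch_day` and its `execute` parameter are hypothetical names; `execute` is assumed to wrap the question's `execute_request(service, args.property_uri, request)` and to return `None`, or a dict without `'rows'`, when there is nothing (more) to return.

```python
def fetch_day(execute, filter_sets, day, row_limit=25000):
    """Fetch all rows for one day, paging each filter set independently.

    Returns a list of (filter_set, rows) pairs. `execute` is a hypothetical
    callable that takes a request dict and returns the API response dict
    (or None on failure).
    """
    results = []
    for filter_set in filter_sets:
        start_row = 0  # reset for every filter set, so no filter is skipped
        rows = []
        while True:
            request = {
                'startDate': day,
                'endDate': day,
                'dimensions': ['page', 'query'],
                'startRow': start_row,
                'rowLimit': row_limit,
                'dimensionFilterGroups': [
                    {'groupType': 'and', 'filters': filter_set}
                ],
            }
            response = execute(request)
            # A failed request or a response without 'rows' yields an empty
            # page, which ends paging for this filter set (possibly with
            # partial data if the failure happened mid-way).
            page = response.get('rows', []) if response else []
            rows.extend(page)
            if len(page) < row_limit:
                break
            start_row += len(page)
        results.append((filter_set, rows))
    return results
```

With the question's helpers, this might be called inside the `for day in date_range(...)` loop as `fetch_day(lambda r: execute_request(service, args.property_uri, r), generate_filters(page=pages, device=args.devices, country=args.countries), day, args.max_rows_per_day)`. Here `startRow` is advanced by the number of rows actually received rather than by an iteration counter multiplied by the limit.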

