python - 从谷歌分析 API v4 下载批处理报告
问题描述
我正在尝试获取 3 个月的报告,为此,我需要发出多个请求并将结果附加到列表中,因为 API 仅返回100,000
每个请求的行。有一个从名为 API 的变量返回nextPageToken
,我需要将其传递到下一个查询中以获取100,000
报告的下一行。我很难做到这一点。
这是我的代码:
def initialize_analyticsreporting():
'''Initializes an Analytics Reporting API V4 service object.
Returns:
An authorized Analytics Reporting API V4 service object.
'''
credentials = ServiceAccountCredentials.from_json_keyfile_name(
KEY_FILE_LOCATION, SCOPES)
# Build the service object.
analytics = build('analyticsreporting', 'v4', credentials=credentials)
return analytics
list = []
def get_report(analytics, pageTokenVariable):
return analytics.reports().batchGet(
body={
'reportRequests': [
{
'viewId': VIEW_ID,
'pageSize': 100000,
'dateRanges': [{'startDate': '90daysAgo', 'endDate': 'yesterday'}],
'metrics': [{'expression': 'ga:adClicks'}, {'expression': 'ga:impressions'}, {'expression': 'ga:adCost'}, {'expression': 'ga:CTR'}, {'expression': 'ga:CPC'}, {'expression': 'ga:costPerTransaction'}, {'expression': 'ga:transactions'}, {'expression': 'ga:transactionsPerSession'}, {'expression': 'ga:pageviews'}, {'expression': 'ga:timeOnPage'}],
"pageToken": pageTokenVariable,
'dimensions': [{'name': 'ga:adMatchedQuery'}, {'name': 'ga:campaign'}, {'name': 'ga:adGroup'}, {'name': 'ga:adwordsCustomerID'}, {'name': 'ga:date'}],
'orderBys': [{'fieldName': 'ga:impressions', 'sortOrder': 'DESCENDING'}],
'dimensionFilterClauses': [{
'filters': [{
'dimension_name': 'ga:adwordsCustomerID',
'operator': 'EXACT',
'expressions': 'abc',
'not': 'True'
}]
}],
'dimensionFilterClauses': [{
'filters': [{
'dimension_name': 'ga:adMatchedQuery',
'operator': 'EXACT',
'expressions': '(not set)',
'not': 'True'
}]
}]
}]
}
).execute()
analytics = initialize_analyticsreporting()
response = get_report(analytics, "0")
for report in response.get('reports', []):
pagetoken = report.get('nextPageToken', None)
print(pagetoken)
#------printing the pagetoken here returns `100,000` which is expected
columnHeader = report.get('columnHeader', {})
dimensionHeaders = columnHeader.get('dimensions', [])
metricHeaders = columnHeader.get(
'metricHeader', {}).get('metricHeaderEntries', [])
rows = report.get('data', {}).get('rows', [])
for row in rows:
# create dict for each row
dict = {}
dimensions = row.get('dimensions', [])
dateRangeValues = row.get('metrics', [])
# fill dict with dimension header (key) and dimension value (value)
for header, dimension in zip(dimensionHeaders, dimensions):
dict[header] = dimension
# fill dict with metric header (key) and metric value (value)
for i, values in enumerate(dateRangeValues):
for metric, value in zip(metricHeaders, values.get('values')):
# set int as int, float a float
if ',' in value or ',' in value:
dict[metric.get('name')] = float(value)
else:
dict[metric.get('name')] = float(value)
list.append(dict)
# Append that data to a list as a dictionary
# pagination function
while pagetoken: # This says while there is info in the nextPageToken get the data, process it and add to the list
response = get_report(analytics, pagetoken)
pagetoken = response['reports'][0]['nextPageToken']
print(pagetoken)
#------printing the pagetoken here returns `200,000` as is expected but the data being pulled is the same as for the first batch and so on. While in the loop the pagetoken is being incremented but it does not retrieve new data
for row in rows:
# create dict for each row
dict = {}
dimensions = row.get('dimensions', [])
dateRangeValues = row.get('metrics', [])
# fill dict with dimension header (key) and dimension value (value)
for header, dimension in zip(dimensionHeaders, dimensions):
dict[header] = dimension
# fill dict with metric header (key) and metric value (value)
for i, values in enumerate(dateRangeValues):
for metric, value in zip(metricHeaders, values.get('values')):
# set int as int, float a float
if ',' in value or ',' in value:
dict[metric.get('name')] = float(value)
else:
dict[metric.get('name')] = float(value)
list.append(dict)
df = pd.DataFrame(list)
print(df) # Append that data to a list as a dictionary
df.to_csv('full_dataset.csv', encoding="utf-8", index=False)
我试图传递pagetoken的错误在哪里?
解决方案
所以你正在更新pagetoken,pagetoken = response['reports'][0]['nextPageToken']
但你不应该rows
在while循环中用新数据更新吗?
像这样的东西。
while pagetoken:
response = get_report(analytics, pagetoken)
pagetoken = response['reports'][0].get('nextPageToken')
for report in reponse.get('reports', []):
rows = report.get('data', {}).get('rows', [])
for row in rows:
推荐阅读
- python - 使用 win32com.client.dispatch 函数在 python 中打开 Excel 和 Word 时的不同行为
- java - 什么与 Java 列表相关的微妙之处导致我的堆算法实现失败?
- unit-testing - 开玩笑如何测试是否调用了模拟函数
- c# - 在特定时间只运行一次 azure 函数?
- sql - 为什么 SQL Server 连接执行时间过长?
- python - 无法将“时间戳”类型与“int”类型进行比较 - 减去数据集
- sql - SQL Server:对象的权限
- node.js - 我是否需要明确“结束”由 Firestore 事件触发的 Firebase Cloud Functions?
- javascript - 如何自动创建可用性测试?
- flutter - 带“:”的 Dart 类与不带的 Dart 类