python - Python 中的 Youtube 数据 API nextPageToken 循环
问题描述
我从网上找到的许多不同的例子拼凑起来。
目标是:
- 在 youtube api 中搜索
- 将多个页面的搜索结果转换为 csv 文件
编辑:由于提供的答案之一,这是搜索循环的一个工作示例。现在,这会按预期循环最大次数(10),但是当执行时,现在的问题是CSV 文件
似乎在调用响应之后,即使有调用results
和writeCSV
之后,程序也会完成。
任何进一步的帮助将不胜感激!
from googleapiclient.discovery import build
from googleapiclient.errors import HttpError
import argparse
DEVELOPER_KEY = "dev-key"
YOUTUBE_API_SERVICE_NAME = "youtube"
YOUTUBE_API_VERSION = "v3"
youtube = build(YOUTUBE_API_SERVICE_NAME, YOUTUBE_API_VERSION, developerKey=DEVELOPER_KEY)
# -------------Build YouTube Search------------#
def youtubeSearch(query, order="relevance"):
# search 50 results per page
request = youtube.search().list(
q=query,
type="video",
order=order,
part="id,snippet",
maxResults="50",
relevanceLanguage='en',
videoDuration='long',
fields='nextPageToken, items(id,snippet)'
)
title = []
channelId = []
channelTitle = []
categoryId = []
videoId = []
viewCount = []
likeCount = []
dislikeCount = []
commentCount = []
favoriteCount = []
tags = []
category = []
videos = []
while request:
response = request.execute()
for search_result in response.get("items", []):
if search_result["id"]["kind"] == "youtube#video":
# append title and video for each item
title.append(search_result['snippet']['title'])
videoId.append(search_result['id']['videoId'])
# then collect stats on each video using videoId
stats = youtube.videos().list(
part='statistics, snippet',
id=search_result['id']['videoId']).execute()
channelId.append(stats['items'][0]['snippet']['channelId'])
channelTitle.append(stats['items'][0]['snippet']['channelTitle'])
categoryId.append(stats['items'][0]['snippet']['categoryId'])
favoriteCount.append(stats['items'][0]['statistics']['favoriteCount'])
viewCount.append(stats['items'][0]['statistics']['viewCount'])
# Not every video has likes/dislikes enabled so they won't appear in JSON response
try:
likeCount.append(stats['items'][0]['statistics']['likeCount'])
except:
# Good to be aware of Channels that turn off their Likes
print("Video titled {0}, on Channel {1} Likes Count is not available".format(
stats['items'][0]['snippet']['title'],
stats['items'][0]['snippet']['channelTitle']))
print(stats['items'][0]['statistics'].keys())
# Appends "Not Available" to keep dictionary values aligned
likeCount.append("Not available")
try:
dislikeCount.append(stats['items'][0]['statistics']['dislikeCount'])
except:
# Good to be aware of Channels that turn off their Likes
print("Video titled {0}, on Channel {1} Dislikes Count is not available".format(
stats['items'][0]['snippet']['title'],
stats['items'][0]['snippet']['channelTitle']))
print(stats['items'][0]['statistics'].keys())
dislikeCount.append("Not available")
# Sometimes comments are disabled so if they exist append, if not append nothing...
# It's not uncommon to disable comments, so no need to wrap in try and except
if 'commentCount' in stats['items'][0]['statistics'].keys():
commentCount.append(stats['items'][0]['statistics']['commentCount'])
else:
commentCount.append(0)
if 'tags' in stats['items'][0]['snippet'].keys():
tags.append(stats['items'][0]['snippet']['tags'])
else:
# I'm not a fan of empty fields
tags.append("No Tags")
request = youtube.search().list_next(
request, response)
# Break out of for-loop and if statement and store lists of values in dictionary
youtube_dict = {'tags': tags, 'channelId': channelId, 'channelTitle': channelTitle,
'categoryId': categoryId, 'title': title, 'videoId': videoId,
'viewCount': viewCount, 'likeCount': likeCount, 'dislikeCount': dislikeCount,
'commentCount': commentCount, 'favoriteCount': favoriteCount}
print("Search Completed...")
print("Total results: {0} \nResults per page: {1}".format(request['pageInfo']['totalResults'],
request['pageInfo']['resultsPerPage']))
print("Example output per item, snippet")
print(request['items'][0]['snippet'].keys())
# Assign first page of results (items) to item variable
items = request['items'] # 50 "items"
# Assign 1st results to title, channelId, datePublished then print
title = items[0]['snippet']['title']
channelId = items[0]['snippet']['channelId']
datePublished = items[0]['snippet']['publishedAt']
print("First result is: \n Title: {0} \n Channel ID: {1} \n Published on: {2}".format(title, channelId,
datePublished))
return youtube_dict
# Input query
print("Please input your search query")
q = input()
# Run YouTube Search
results = youtubeSearch(q)
# Display result titles
print("Top 3 results are: \n {0}, ({1}), \n {2}, ({3}),\n {4}, ({5})".format(results['title'][0],
results['channelTitle'][0],
results['title'][1],
results['channelTitle'][1],
results['title'][2],
results['channelTitle'][2]))
# -------------------------Save results------------------------------#
print("Input filename to store csv file")
file = "\\YouTube\\" + input() + ".csv"
def writeCSV(results, filename):
import csv
keys = sorted(results.keys())
with open(filename, "w", newline="", encoding="utf-8") as output:
writer = csv.writer(output, delimiter=",")
writer.writerow(keys)
writer.writerows(zip(*[results[key] for key in keys]))
writeCSV(results, file)
print("CSV file has been uploaded at: " + str(file))
解决方案
由于您使用的是 Google 的Python API 客户端库,因此在API 端点上实现结果集分页的Python 方式如下所示:Search.list
request = youtube.search().list(
q = 'A query',
part = 'id,snippet',
type = 'video',
maxResults = 50,
relevanceLanguage = 'en',
videoDuration = 'long'
)
while request:
response = request.execute()
for item in response['items']:
...
request = youtube.search().list_next(
request, response)
由于 Python 客户端库的实现方式,这很简单:根本不需要显式处理 API 响应对象的属性nextPageToken
和 API 请求参数pageToken
。
推荐阅读
- ruby-on-rails - 使用 ActiveStorage 构建 Rails 6 应用程序,通过 dokku 成功部署,现在部署后图像消失
- python - 在 Pandas 中,如何将函数应用于数据框的一行,其中行中的每个项目都应作为参数传递给函数?
- ios - iOS/SpriteKit:需要说明 SKEmitterNode 位置
- powershell - Windows Server 2019 任务计划程序中的 Powershell 命令?
- angular - 将 FormControl 绑定到指令中的输入
- android - 整个应用程序中的 SharedPreferences 不存储值
- php - PHP算法函数创建一个可能排列的数组(来自其他数组)
- c# - 从 Azure blob 存储和 ASP.NET Core 3 流式传输视频
- javascript - 如何使用音频标签在 HTML5 中播放音频数据?
- java - 无法在项目 springboot 上执行目标 org.apache.maven.plugins:maven-compiler-plugin:3.8.1:compile (default-compile)