How to extract the total number of commit pages for a GitHub repository

Problem description

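I am setting up a script to export all commits and pull requests for a large list of GitHub repositories (about 4,000).

Now that the basic idea of the script works, I need a way to loop through all the commit pages of a repository.

I found that I can export 100 commits per page. Some repositories have many more commits than that (for example 8,000), so I would need to loop through 80 pages.

I cannot find a way to extract the number of pages from the GitHub API.

So far I have a script that loops over all commits and exports them to a txt/csv file.

What I need is to know the total number of pages before I start looping over a repository's commits.

The following gives me the number of pages, but in a form I cannot use: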

curl -u "user:password" -I "https://api.github.com/repos/0chain/rocksdb/commits?per_page=100"

Result:

Link: <https://api.github.com/repositories/152923130/commits?per_page=100&page=2>; rel="next", <https://api.github.com/repositories/152923130/commits?per_page=100&page=75>; rel="last"

I need to use the value 75 (or whatever the value is for another repository) as a variable in a loop.

Like this:

repolist=`cat repolist.txt`
repolistarray=($(echo $repolist))
repolength=$(echo "${#repolistarray[@]}")

for (( i = 0; i <= $repolength; i++ )); do
    #here i need to extract the pagenumber
    pagenumber=$(curl -u "user:password" -I https://api.github.com/repos/$(echo "${repolistarray[i]}")/commits?per_page=100)

    for (( n = 1; n <= $pagenumber; n++ )); do
        curl -u "user:password" -s https://api.github.com/repos/$(echo "${repolistarray[i]}")/commits?per_page=100&page$(echo "$n") >committest.txt
    done
done

How can I get the "75" (or whatever the value is) out of

Link: <https://api.github.com/repositories/152923130/commits?per_page=100&page=2>; rel="next", <https://api.github.com/repositories/152923130/commits?per_page=100&page=75>; rel="last"

so that I can use it as "n"?

Tags: bash, shell, github, github-api, git-bash

Solution


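This is what @Poshi suggested in a comment: keep requesting the next page indefinitely until you hit an empty page, then break out of the inner loop and move on to the next repo.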

# this is the contents of a page past the last real page:
emptypage='[

]'

# here's a simpler way to iterate over each repo than using a bash array
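# (assumes repolist.txt holds whitespace-separated owner/repo entries)
tr -s '[:space:]' '\n' < repolist.txt | while read -r repo; do
  [ -n "$repo" ] || continue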

  # loop indefinitely
  page=0
  while true; do
    page=$((page + 1))

    # minor improvement: use a variable, not a file.
    # also, you don't need to echo variables, just use them
    result=$(curl -u "user:password" -s \
      "https://api.github.com/repos/$repo/commits?per_page=100&page=$page")

    # if the result is empty, break out of the inner loop
    [ "$result" = "$emptypage" ] && break

    echo "$result" > committest.txt
    # note that > overwrites (whereas >> appends),
    # so committest.txt will be overwritten with each new page.
    #
    # in the final version, you probably want to process the results here,
    # and then
    #
    #       echo "$processed_results"
    #     done > repo1.txt
    #   done
    #
    # to output once per repo, or
    #
    #       echo "$processed_results"
    #     done
    #   done > all_results.txt
    #
    # to output all results to a single file

  done
done
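
The question asked for the page count up front rather than a loop that runs until an empty page. As a minimal sketch of that alternative (not part of the original answer), the number can be pulled out of the Link response header shown in the question. The function name last_page_from_headers is hypothetical, and falling back to 1 when no rel="last" link is present is an assumption that such a response fits on a single page.

```shell
#!/usr/bin/env bash

# Sketch: read HTTP response headers on stdin and print the page number
# found in the rel="last" entry of the Link header.
# Assumption: no rel="last" entry means there is only one page, so print 1.
last_page_from_headers() {
  local last
  last=$(tr -d '\r' \
    | grep -i '^link:' \
    | tr ',' '\n' \
    | grep 'rel="last"' \
    | sed -n 's/.*[?&]page=\([0-9][0-9]*\).*/\1/p' \
    | head -n 1) || true
  echo "${last:-1}"
}

# Offline demo using the header from the question.
sample='Link: <https://api.github.com/repositories/152923130/commits?per_page=100&page=2>; rel="next", <https://api.github.com/repositories/152923130/commits?per_page=100&page=75>; rel="last"'
printf '%s\r\n' "$sample" | last_page_from_headers   # prints 75

# Hypothetical live usage (needs network access and valid credentials):
#   pages=$(curl -u "user:password" -sI \
#     "https://api.github.com/repos/$repo/commits?per_page=100" \
#     | last_page_from_headers)
#   for (( n = 1; n <= pages; n++ )); do
#     curl -u "user:password" -s \
#       "https://api.github.com/repos/$repo/commits?per_page=100&page=$n"
#   done
```

The sed pattern only accepts page= when it directly follows ? or &, so the per_page=100 parameter is not mistaken for the page number. Compared with looping until an empty page, this costs one extra header request per repo but gives the loop bound in advance.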
