How can I programmatically get the metadata of all packages available in the AUR on Arch Linux?

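Question

How can I programmatically get the metadata of all packages available in the AUR on Arch Linux, including those that are not installed locally? Preferably in Python.

I tried AurJson, a set of APIs for accessing package metadata, but its search requires a keyword of a minimum length before it will return any package metadata.

Tags: python, archlinux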

Answer


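This is an interesting question!

AUR packages

You can get a list of all AUR packages from https://aur.archlinux.org/packages.gz.

Then you can use the info request of the AurJson interface and batch multiple packages into each request (I'm not sure what the maximum per request is):

https://aur.archlinux.org/rpc.php/rpc/?v=5&type=info&arg[]=criu&arg[]=criu&arg[]=criu&arg[]=criu&arg[]=criu&arg[]=criu&arg[]=criu&arg[]=criu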
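To make the repeated arg[] parameter concrete, here is a small sketch of how such a query string can be built with the standard library. The helper name build_info_query is made up for illustration and is not part of any AUR client. Note that urlencode percent-encodes the square brackets, which is equivalent to the literal form once the server decodes it.

```python
from urllib.parse import urlencode


def build_info_query(names):
    """Build the query string for one info request (hypothetical helper).

    doseq=True turns the list into one arg[] parameter per package name.
    """
    return urlencode({'v': 5, 'type': 'info', 'arg[]': names}, doseq=True)


print(build_info_query(['criu', 'linux-git']))
# v=5&type=info&arg%5B%5D=criu&arg%5B%5D=linux-git
```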

Be sure to be nice and throttle your requests! Something like this will get you started:

import requests

# Download the list of all AUR package names (one name per line).
packages = requests.get('https://aur.archlinux.org/packages.gz').text.splitlines()
batch_size = 50  # number of packages per info request
package_infos = {}

while packages:
    # Take the next batch off the front of the list.
    batch, packages = packages[:batch_size], packages[batch_size:]
    for result in requests.get(
        'https://aur.archlinux.org/rpc.php/rpc/',
        params={'v': 5, 'type': 'info', 'arg[]': batch},  # a list becomes repeated arg[] parameters
    ).json()['results']:
        package_infos[result['Name']] = result
    break  # Replace this with throttling code :)

print(package_infos)

The result is

{'adwaita-dark-darose': {'Depends': ['gnome-themes-standard'],
                         'Description': 'Adwaita theme hacked to use my custom '
                                        'color scheme. (Dark blues instead of '
                                        'greys.)',
                         'FirstSubmitted': 1493136022,
                         'ID': 464990,
                         'Keywords': [],
                         'LastModified': 1511841278,
                         'License': ['GPL'],
                         'Maintainer': 'darose',
                         'MakeDepends': ['glib2', 'gtk3'],
                         'Name': 'adwaita-dark-darose',
                         'NumVotes': 3,
                         'OutOfDate': None,
                         'PackageBase': 'adwaita-dark-darose',
                         'PackageBaseID': 121780,
                         'Popularity': 0.024409,
                         'URL': 'none',
                         'URLPath': '/cgit/aur.git/snapshot/adwaita-dark-darose.tar.gz',
                         'Version': '3.22.3-10'},
 'atari-adventure': {'Depends': ['stella'],
                     'Description': 'The original Adventure game for the old '
                                    'Atari 2600 game console',
                     'FirstSubmitted': 1247592088,
                     'ID': 214107,
                     'Keywords': [],
                     'LastModified': 1437534447,
                     'License': ['unknown'],
                     'Maintainer': 'darose',
                     'Name': 'atari-adventure',
                     'NumVotes': 2,
                     'OutOfDate': None,
                     'PackageBase': 'atari-adventure',
                     'PackageBaseID': 28288,
                     'Popularity': 0,
                     'URL': 'http://www.atariage.com/software_page.html?SoftwareID=802',
                     'URLPath': '/cgit/aur.git/snapshot/atari-adventure.tar.gz',
                     'Version': '1.0-3'},
....
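As for the throttling itself, one simple option is to sleep for a fixed time between batches. The following is only a sketch under a few assumptions: the one-second delay is an arbitrary guess rather than a documented AUR limit, the function names are made up for illustration, and the network helper uses the standard library instead of requests so that the block has no third-party dependency.

```python
import json
import time
import urllib.parse
import urllib.request

AUR_RPC_URL = 'https://aur.archlinux.org/rpc.php/rpc/'  # same endpoint as above


def aur_info_batch(names):
    """Fetch the info results for one batch of package names (network call)."""
    query = urllib.parse.urlencode(
        {'v': 5, 'type': 'info', 'arg[]': names}, doseq=True)
    with urllib.request.urlopen(f'{AUR_RPC_URL}?{query}') as response:
        return json.load(response)['results']


def fetch_all_infos(names, fetch_batch=aur_info_batch, batch_size=50,
                    delay=1.0, sleep=time.sleep):
    """Collect info for all names, pausing `delay` seconds between batches.

    `delay` is an assumed value; `fetch_batch` and `sleep` can be swapped
    out, for example to test the batching logic without network access.
    """
    infos = {}
    for start in range(0, len(names), batch_size):
        if start:  # no need to wait before the very first request
            sleep(delay)
        for result in fetch_batch(names[start:start + batch_size]):
            infos[result['Name']] = result
    return infos
```

Calling fetch_all_infos(packages) with the list read from packages.gz would then walk through every batch. With tens of thousands of packages that takes a while, so it is probably worth caching the result on disk.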

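Arch packages

(I misread the original question, but here is the original answer.)

You can use Python's tarfile library to read the Arch database files, which, according to the Arch Linux wiki, are tar.gz files.

So assuming you have downloaded core.db/community.db/extra.db from a mirror (e.g. https://mirrors.edge.kernel.org/archlinux/core/os/x86_64/core.db / https://mirrors.edge.kernel.org/archlinux/community/os/x86_64/community.db / https://mirrors.edge.kernel.org/archlinux/extra/os/x86_64/extra.db, but please use a mirror that is closer to you), you can read them like this (Python 3):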

import tarfile
tf = tarfile.open('core.db', 'r:gz')
for member in tf.getmembers():
    if member.name.endswith('/desc'):
        with tf.extractfile(member) as fp:
            print(fp.read().decode())
            print('-' * 40)

This prints out the desc files in their raw format, e.g.

%FILENAME%
archlinux-keyring-20180404-1-any.pkg.tar.xz

%NAME%
archlinux-keyring

%VERSION%
20180404-1

%DESC%
Arch Linux PGP keyring

%CSIZE%
684236

%ISIZE%
948224

%MD5SUM%
9ba27bf598d60f2ea6320339289a2401

%SHA256SUM%
6f0f2f8d72742da18b61b7e4a1900d419c718b6d9dcad804763b80a12cc9abaf

%PGPSIG%
iQEzBAABCAAdFiEE82kWh9hnuBtRzgfZu+Q3cUhzKKkFAlrEfLMACgkQu+Q3cUhzKKmE7ggAgNjBAz6FkFqy2+Q0Rfzt0ZibYT/KW6ibQoKgpxDQNkzcl/1ZVzS4rkZRjHkBJd8fKI2n6NtiijwiQBPBsTI8t4+nVD19C4zZbDHzTdABm4EaDdJg+ya635Df8xMqt6GNzxV5DmABioSww2ebY9EuSwl3yvMNTQUI8hAjWPfOirDRZDic9DEYvhPabUn9NlLzShQeDIZP/R0ejDCfBIcu2NMX+NSUg41w0+LGrLNpqdnI+ej0n3X6NDkvCZwvvC3DPCWs1PAhFS5yC5dve4pDBjf8fLuJBPbRQJx6Se0K0CCoeUVA2V4ld2HLXor1aLG0bijF2QhMLzHmW4XxWbpWLA==

%URL%
https://projects.archlinux.org/archlinux-keyring.git/

%LICENSE%
GPL

%ARCH%
any

%BUILDDATE%
1522826386

%PACKAGER%
Bartłomiej Piotrowski <bpiotrowski@archlinux.org>

Edit: you can also parse the database entries into dicts with something like this:

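import collections

def read_aur_db_entry(fp):
    # fp is a binary file object for one desc file from the database.
    db_entry = collections.defaultdict(str)
    key = None
    for line in fp.readlines():
        # A line such as b'%NAME%\n' starts a new field.
        if line.startswith(b'%') and line.endswith(b'%\n'):
            key = line[1:-2].decode()
            continue
        # Any other line belongs to the current field's value.
        db_entry[key] += line.decode()
    return {key: value.strip() for (key, value) in db_entry.items()}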

So you get

{'ARCH': 'any',
 'BUILDDATE': '1522826386',
 'CSIZE': '684236',
 'DESC': 'Arch Linux PGP keyring',
 'FILENAME': 'archlinux-keyring-20180404-1-any.pkg.tar.xz',
 'ISIZE': '948224',
 'LICENSE': 'GPL',
 'MD5SUM': '9ba27bf598d60f2ea6320339289a2401',
 'NAME': 'archlinux-keyring',
 'PACKAGER': 'Bartłomiej Piotrowski <bpiotrowski@archlinux.org>',
 'PGPSIG': 'iQEzBAABCAAdFiEE82kWh9hnuBtRzgfZu+Q3cUhzKKkFAlrEfLMACgkQu+Q3cUhzKKmE7ggAgNjBAz6FkFqy2+Q0Rfzt0ZibYT/KW6ibQoKgpxDQNkzcl/1ZVzS4rkZRjHkBJd8fKI2n6NtiijwiQBPBsTI8t4+nVD19C4zZbDHzTdABm4EaDdJg+ya635Df8xMqt6GNzxV5DmABioSww2ebY9EuSwl3yvMNTQUI8hAjWPfOirDRZDic9DEYvhPabUn9NlLzShQeDIZP/R0ejDCfBIcu2NMX+NSUg41w0+LGrLNpqdnI+ej0n3X6NDkvCZwvvC3DPCWs1PAhFS5yC5dve4pDBjf8fLuJBPbRQJx6Se0K0CCoeUVA2V4ld2HLXor1aLG0bijF2QhMLzHmW4XxWbpWLA==',
 'SHA256SUM': '6f0f2f8d72742da18b61b7e4a1900d419c718b6d9dcad804763b80a12cc9abaf',
 'URL': 'https://projects.archlinux.org/archlinux-keyring.git/',
 'VERSION': '20180404-1'}
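Putting the two pieces together, the tarfile loop and the parser can be combined into a single function that returns every package in a database keyed by name. This is a rough sketch rather than a complete pacman database reader: parse_desc and read_repo_db are names chosen here for illustration, and multi-line fields are kept as newline-separated strings.

```python
import collections
import io
import tarfile


def parse_desc(fp):
    """Parse one desc file (binary file object) into {field: text}."""
    entry = collections.defaultdict(str)
    key = None
    for line in fp:
        if line.startswith(b'%') and line.endswith(b'%\n'):
            key = line[1:-2].decode()
            continue
        if key is not None:  # ignore anything before the first %FIELD% line
            entry[key] += line.decode()
    return {k: v.strip() for k, v in entry.items()}


def read_repo_db(path=None, fileobj=None):
    """Return {package name: parsed desc} for a database such as core.db."""
    packages = {}
    # 'r:*' lets tarfile detect the compression instead of assuming gzip.
    with tarfile.open(name=path, fileobj=fileobj, mode='r:*') as tf:
        for member in tf:
            if member.isfile() and member.name.endswith('/desc'):
                with tf.extractfile(member) as fp:
                    entry = parse_desc(fp)
                packages[entry['NAME']] = entry
    return packages
```

For example, read_repo_db('core.db')['archlinux-keyring']['VERSION'] would give the version string for the entry shown above, assuming that package is in the downloaded database.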
