首页 > 解决方案 > 将 API JSON 数据发送到 Exasol DB 表

问题描述

我正在使用来自虚拟 JSON 站点的假 JSON 数据,如下所示:

[
  {
    "postId": 1,
    "id": 1,
    "name": "id labore ex et quam laborum",
    "email": "Eliseo@gardner.biz",
    "body": "laudantium enim quasi est quidem magnam voluptate ipsam eos\ntempora quo necessitatibus\ndolor quam autem quasi\nreiciendis et nam sapiente accusantium"
  },
  {
    "postId": 1,
    "id": 2,
    "name": "quo vero reiciendis velit similique earum",
    "email": "Jayne_Kuhic@sydney.com",
    "body": "est natus enim nihil est dolore omnis voluptatem numquam\net omnis occaecati quod ullam at\nvoluptatem error expedita pariatur\nnihil sint nostrum voluptatem reiciendis et"
  }
]

我通过库读取 API 数据,requests然后转身将其发送到 Exasol DB 表中。请参阅下面的代码

import requests
import pyexasol

def get_api_data():
    r = requests.get("http://jsonplaceholder.typicode.com/comments")
    data = r.json()
    return data
    
def connection():
    session = pyexasol.connect_local_config('my_exasol')
    return session

def send_api_data():
    s = connection()
    data = get_api_data()
    for row in data:
        s.execute("""INSERT INTO TESTBED.TEST_API(postId, id, name, email, body) VALUES ({postId}, {id},{name},
        {email}, {body})""", {'postId': row['postId'], 'id': row['id'], 'name': row['name'], 'email': row['email'],
        'body': row['body']})

send_api_data()

这工作正常,问题是它非常慢。插入 500 条记录大约需要 2 分钟。我知道必须有一种更有效的方法来做到这一点。实际上,我将从一个包含数千条记录的 API 中提取数据,我希望将这些记录转过来并发送到数据库表中。

关于采取更好方法的任何想法?

标签: pythonjsonexasol

解决方案


在 Exasol 中执行单个插入语句很慢,因为它是基于列的数据库。您应该改用IMPORT。确保也阅读 pyexasol 的最佳实践。还要考虑启用压缩

对于您的示例,请尝试以下操作。以我为例,导入数据需要 0.7 秒。

import requests
import pyexasol
import pandas
import time

def get_api_data():
    r = requests.get("http://jsonplaceholder.typicode.com/comments")
    data = r.json()
    return data
    
def connection():
    session = pyexasol.connect_local_config('my_exasol')
    return session

def send_api_data():
    s = connection()
    data = get_api_data()

    data_for_import = [(row['postId'], row['id'], row['name'], row['email'], row['body']) for row in data]
    start = time.time()
    s.import_from_iterable(data_for_import, ("TESTBED","TEST_API"))
    print("Finished import after ", time.time() - start, " seconds")

send_api_data()

推荐阅读