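cassandra - Loading JSON data into Cassandra with dsbulk
Problem description
I think the dsbulk documentation is really lacking guidance on loading JSON files into Cassandra.
Here is part of the JSON file I am trying to load: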
[
  {
    "tags": [
      "r"
    ],
    "owner": {
      "reputation": 23,
      "user_id": 12235281,
      "user_type": "registered",
      "profile_image": "https://www.gravatar.com/avatar/60e28f52215bff12adb9758fc2cf86dd?s=128&d=identicon&r=PG&f=1",
      "display_name": "Me28",
      "link": "https://stackoverflow.com/users/12235281/me28"
    },
    "is_answered": false,
    "view_count": 3,
    "answer_count": 0,
    "score": 0,
    "last_activity_date": 1589053659,
    "creation_date": 1589053659,
    "question_id": 61702762,
    "link": "https://stackoverflow.com/questions/61702762/merge-dataframes-in-r-with-different-size-and-condition",
    "title": "Merge dataframes in R with different size and condition"
  },
  {
    "tags": [
      "python",
      "location",
      "pyautogui"
    ],
    "owner": {
      "reputation": 1,
      "user_id": 13507535,
      "user_type": "registered",
      "profile_image": "https://lh3.googleusercontent.com/a-/AOh14GgtdM9KrbH3X5Z33RCtz6xm_TJUSQS_S31deNYUcA=k-s128",
      "display_name": "lowhatex",
      "link": "https://stackoverflow.com/users/13507535/lowhatex"
    },
    "is_answered": false,
    "view_count": 2,
    "answer_count": 0,
    "score": 0,
    "last_activity_date": 1589053657,
    "creation_date": 1589053657,
    "question_id": 61702761,
    "link": "https://stackoverflow.com/questions/61702761/want-to-get-a-grip-of-this-pyautogui-command",
    "title": "Want to get a grip of this pyautogui command"
  }
]
This is how I have been trying to load it:
dsbulk load -url ./data_so1.json -k stackoverflow_t -t staging_t -h '182.14.0.1' -header false -u username -p password
This is the closest I have gotten; it pushes the file into Cassandra line by line, like this:
data
-------------------------------------------------------------------------------------------------------------------------------
"title": "'Microsoft.ACE.OLEDB.12.0' provider is not registered on the local machine giving exception on client"
"profile_image": "https://www.gravatar.com/avatar/05085ede54486bdaebefcf8363e081e2?s=128&d=identicon&r=PG&f=1",
"view_count": 422,
"question_id": 61702768,
"user_id": 12235281,
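This just takes each line as it is (commas included). I have tried mapping with the -m option, but could not get it to work.
What is the correct way to get these values into their respective columns?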
Solution
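The line-by-line result in the question is consistent with dsbulk reading the file through its default CSV connector, so every physical line becomes one record. A possible starting point is to select the JSON connector with -c json and, because the file is a single top-level array rather than one document per line, set --connector.json.mode to SINGLE_DOCUMENT. The sketch below is untested against the asker's cluster and makes assumptions not stated in the question: that staging_t has columns named after the JSON fields (question_id, title, tags, and so on), that tags is a list<text> column, and that owner is either a matching user-defined type or is handled separately. The CSV-only -header option is dropped.

```shell
#!/usr/bin/env bash
# Sketch only: keyspace, table, host and credentials are copied from the question.
# Assumption: staging_t has columns whose names match the JSON field names,
# so dsbulk can map fields to columns by name without an explicit -m mapping.
args=(
  load
  -c json                                 # use the JSON connector instead of the default CSV one
  --connector.json.mode SINGLE_DOCUMENT   # the file is one top-level JSON array of documents
  -url ./data_so1.json
  -k stackoverflow_t
  -t staging_t
  -h '182.14.0.1'
  -u username
  -p password
)

if command -v dsbulk >/dev/null 2>&1; then
  dsbulk "${args[@]}"
else
  # dsbulk is not installed here; show the command that would be run.
  echo "dsbulk not found on PATH; would run: dsbulk ${args[*]}" >&2
fi
```

If the column names differ from the JSON field names, an explicit mapping of the form -m 'field = column, ...' would be needed in addition; the exact mapping depends on the real table definition, which the question does not show.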