python - 读取 JSON 文件并转换为 Panda DataFrame
问题描述
我有一个包含这样的 JSON 格式数据的文本文件。
{"accountNumber": "737265056", "customerId": "737265056", "creditLimit": 5000.0, "availableMoney": 5000.0, "transactionDateTime": "2016-08-13T14:27:32", "transactionAmount": 98.55, "merchantName": "Uber", "acqCountry": "US", "merchantCountryCode": "US", "posEntryMode": "02", "posConditionCode": "01", "merchantCategoryCode": "rideshare", "currentExpDate": "06/2023", "accountOpenDate": "2015-03-14", "dateOfLastAddressChange": "2015-03-14", "cardCVV": "414", "enteredCVV": "414", "cardLast4Digits": "1803", "transactionType": "PURCHASE", "echoBuffer": "", "currentBalance": 0.0, "merchantCity": "", "merchantState": "", "merchantZip": "", "cardPresent": false, "posOnPremises": "", "recurringAuthInd": "", "expirationDateKeyInMatch": false, "isFraud": false}
{"accountNumber": "737265056", "customerId": "737265056", "creditLimit": 5000.0, "availableMoney": 5000.0, "transactionDateTime": "2016-10-11T05:05:54", "transactionAmount": 74.51, "merchantName": "AMC #191138", "acqCountry": "US", "merchantCountryCode": "US", "posEntryMode": "09", "posConditionCode": "01", "merchantCategoryCode": "entertainment", "cardPresent": true, "currentExpDate": "02/2024", "accountOpenDate": "2015-03-14", "dateOfLastAddressChange": "2015-03-14", "cardCVV": "486", "enteredCVV": "486", "cardLast4Digits": "767", "transactionType": "PURCHASE", "echoBuffer": "", "currentBalance": 0.0, "merchantCity": "", "merchantState": "", "merchantZip": "", "posOnPremises": "", "recurringAuthInd": "", "expirationDateKeyInMatch": false, "isFraud": false}
{"accountNumber": "737265056", "customerId": "737265056", "creditLimit": 5000.0, "availableMoney": 5000.0, "transactionDateTime": "2016-11-08T09:18:39", "transactionAmount": 7.47, "merchantName": "Play Store", "acqCountry": "US", "merchantCountryCode": "US", "posEntryMode": "09", "posConditionCode": "01", "merchantCategoryCode": "mobileapps", "currentExpDate": "08/2025", "accountOpenDate": "2015-03-14", "dateOfLastAddressChange": "2015-03-14", "cardCVV": "486", "enteredCVV": "486", "cardLast4Digits": "767", "transactionType": "PURCHASE", "echoBuffer": "", "currentBalance": 0.0, "merchantCity": "", "merchantState": "", "merchantZip": "", "cardPresent": false, "posOnPremises": "", "recurringAuthInd": "", "expirationDateKeyInMatch": false, "isFraud": false}
我想要的是得到一个像这样的熊猫数据框
accountNumber customerId creditLimit availableMoney transactionDateTime transactionAmount merchantName acqCountry merchantCountryCode posEntryMode ... echoBuffer currentBalance merchantCity merchantState merchantZip cardPresent posOnPremises recurringAuthInd expirationDateKeyInMatch isFraud
0 737265056 737265056 5000 5000.0 2016-08-13T14:27:32 98.55 Uber US US 02 ... NaN 0.0 NaN NaN NaN False NaN NaN False False
当我执行下面的代码时,只是读取 JSON 并转换为 Pandas,我没有得到值;但每个单元格也包括字典。我也试过 json_normalize,但问题不是所有列都对齐。任何帮助,将不胜感激。
import json
import pandas as pd
with open('test.txt', 'r') as f:
data = json.load(f)
df = pd.DataFrame(data)
0 1 2 3 4 5 6 7 8 9 ... 19 20 21 22 23 24 25 26 27 28
0 {"accountNumber": "737265056" "customerId": "737265056" "creditLimit": 5000.0 "availableMoney": 5000.0 "transactionDateTime": "2016-08-13T14:27:32" "transactionAmount": 98.55 "merchantName": "Uber" "acqCountry": "US" "merchantCountryCode": "US" "posEntryMode": "02" ... "echoBuffer": "" "currentBalance": 0.0 "merchantCity": "" "merchantState": "" "merchantZip": "" "cardPresent": false "posOnPremises": "" "recurringAuthInd": "" "expirationDateKeyInMatch": false "isFraud": false}
1 {"accountNumber": "737265056" "customerId": "737265056" "creditLimit": 5000.0 "availableMoney": 5000.0 "transactionDateTime": "2016-10-11T05:05:54" "transactionAmount": 74.51 "merchantName": "AMC #191138" "acqCountry": "US" "merchantCountryCode": "US" "posEntryMode": "09" ... "transactionType": "PURCHASE" "echoBuffer": "" "currentBalance": 0.0 "merchantCity": "" "merchantState": "" "merchantZip": "" "posOnPremises": "" "recurringAuthInd": "" "expirationDateKeyInMatch": false "isFraud": false}
解决方案
问题是您的“JSON”文件有点无效。基本上,每条记录都应该用逗号分隔,并且应该包含在这样的列表中 -
[{"accountNumber": "737265056", "customerId": "737265056", "creditLimit": 5000.0, "availableMoney": 5000.0, "transactionDateTime": "2016-08-13T14:27:32", "transactionAmount": 98.55, "merchantName": "Uber", "acqCountry": "US", "merchantCountryCode": "US", "posEntryMode": "02", "posConditionCode": "01", "merchantCategoryCode": "rideshare", "currentExpDate": "06/2023", "accountOpenDate": "2015-03-14", "dateOfLastAddressChange": "2015-03-14", "cardCVV": "414", "enteredCVV": "414", "cardLast4Digits": "1803", "transactionType": "PURCHASE", "echoBuffer": "", "currentBalance": 0.0, "merchantCity": "", "merchantState": "", "merchantZip": "", "cardPresent": false, "posOnPremises": "", "recurringAuthInd": "", "expirationDateKeyInMatch": false, "isFraud": false},
{"accountNumber": "737265056", "customerId": "737265056", "creditLimit": 5000.0, "availableMoney": 5000.0, "transactionDateTime": "2016-10-11T05:05:54", "transactionAmount": 74.51, "merchantName": "AMC #191138", "acqCountry": "US", "merchantCountryCode": "US", "posEntryMode": "09", "posConditionCode": "01", "merchantCategoryCode": "entertainment", "cardPresent": true, "currentExpDate": "02/2024", "accountOpenDate": "2015-03-14", "dateOfLastAddressChange": "2015-03-14", "cardCVV": "486", "enteredCVV": "486", "cardLast4Digits": "767", "transactionType": "PURCHASE", "echoBuffer": "", "currentBalance": 0.0, "merchantCity": "", "merchantState": "", "merchantZip": "", "posOnPremises": "", "recurringAuthInd": "", "expirationDateKeyInMatch": false, "isFraud": false},
{"accountNumber": "737265056", "customerId": "737265056", "creditLimit": 5000.0, "availableMoney": 5000.0, "transactionDateTime": "2016-11-08T09:18:39", "transactionAmount": 7.47, "merchantName": "Play Store", "acqCountry": "US", "merchantCountryCode": "US", "posEntryMode": "09", "posConditionCode": "01", "merchantCategoryCode": "mobileapps", "currentExpDate": "08/2025", "accountOpenDate": "2015-03-14", "dateOfLastAddressChange": "2015-03-14", "cardCVV": "486", "enteredCVV": "486", "cardLast4Digits": "767", "transactionType": "PURCHASE", "echoBuffer": "", "currentBalance": 0.0, "merchantCity": "", "merchantState": "", "merchantZip": "", "cardPresent": false, "posOnPremises": "", "recurringAuthInd": "", "expirationDateKeyInMatch": false, "isFraud": false}]
假设该片段保存在一个名为 的文件中test.json
,则以下内容应该有效 -
>>> import pandas as pd
>>> df = pd.read_json('test.json')
>>> df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 29 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 accountNumber 3 non-null int64
1 customerId 3 non-null int64
2 creditLimit 3 non-null int64
3 availableMoney 3 non-null int64
4 transactionDateTime 3 non-null object
5 transactionAmount 3 non-null float64
6 merchantName 3 non-null object
7 acqCountry 3 non-null object
8 merchantCountryCode 3 non-null object
9 posEntryMode 3 non-null int64
10 posConditionCode 3 non-null int64
11 merchantCategoryCode 3 non-null object
12 currentExpDate 3 non-null object
13 accountOpenDate 3 non-null object
14 dateOfLastAddressChange 3 non-null object
15 cardCVV 3 non-null int64
16 enteredCVV 3 non-null int64
17 cardLast4Digits 3 non-null int64
18 transactionType 3 non-null object
19 echoBuffer 3 non-null object
20 currentBalance 3 non-null int64
21 merchantCity 3 non-null object
22 merchantState 3 non-null object
23 merchantZip 3 non-null object
24 cardPresent 3 non-null bool
25 posOnPremises 3 non-null object
26 recurringAuthInd 3 non-null object
27 expirationDateKeyInMatch 3 non-null bool
28 isFraud 3 non-null bool
dtypes: bool(3), float64(1), int64(10), object(15)
memory usage: 761.0+ bytes
>>>
编辑:跟进回应 -
存在如何编辑字典文件以使其成为有效 JSON 的问题。
假设文件被调用test.txt
并且看起来像原始格式的文件(每行都有字典,没有逗号分隔,开头和结尾没有括号)。
import json
with open('test.txt', 'r') as file:
data = file.readlines()
data = [json.loads(d) for d in data]
df = pd.DataFrame(data)
现在你有了你的DataFrame,之后一切都一样了......
推荐阅读
- prometheus - 如何在普罗米修斯中获得速率的分位数
- javascript - 带有循环的jQuery字体颜色动画
- php - 拖放 jQuery 和 Ajax 上传文件并一起发送 html 表单数据
- sql-server - MasterDataService 2017 插件未在 Office 365 Excel 应用程序(SQL Server 2017)中连接
- ios - 将两个 ViewController 容器约束在一起以填满屏幕
- android - ConstraintLayout 没有为在 RecyclerView 上单击的项目设置 NavController
- ios - 是否可以从非 iOS 系统(如 Ubuntu)创建证书以签署 Apple Wallet Passes?
- c++ - c++ 模板模式可以取代不断增长的联合吗?
- javascript - 编写正确的 If 语句的问题
- python - 如何从另一个继承自 sales.order 的类中获取字段值?