python - 如何通过数据类型转换(十进制)将 pandas 数据框转换为自定义嵌套字典列表
问题描述
我有以下df
df = pd.DataFrame({'sku_id': {0: 4901301230676,
1: 4901301230638,
2: 8886451000583,
3: 8710428021304,
4: 37338022},
'stock': {0: 5, 1: 12, 2: 9, 3: 14, 4: 332},
'price': {0: 639.0, 1: 639.0, 2: 849.0, 3: 849.0, 4: 128.0},
'special_price': {0: '599', 1: '599', 2: '849', 3: '849', 4: 128.0}})
和自定义值
timestamp = 1580515200
我怎样才能将其转换为嵌套字典,如下所示
{ 'items' : [{'sku': '4901301230676',
'records': [{'timestamp': 1580515200,
'inv': 5,
'price': Decimal('639.0'),
'special_price': Decimal('599'),
'sales_added': False}]},
{'sku': '4901301230638',
'records': [{'timestamp': 1580515200,
'inv': 12,
'price': Decimal('639.0'),
'special_price': Decimal('599'),
'sales_added': False}]},
{'sku': '8886451000583',
'records': [{'timestamp': 1580515200,
'inv': 9,
'price': Decimal('849.0'),
'special_price': Decimal('849'),
'sales_added': False}]},
{'sku': '8710428021304',
'records': [{'timestamp': 1580515200,
'inv': 14,
'price': Decimal('849.0'),
'special_price': Decimal('849'),
'sales_added': False}]},
{'sku': '37338022',
'records': [{'timestamp': 1580515200,
'inv': 332,
'price': Decimal('128.0'),
'special_price': Decimal('128.0'),
'sales_added': False}]}]}
我努力了
from decimal import Decimal
nestedDict = {'items':[]}
for item in df.itertuples():
output = {"sku":str(item.sku_id),
'records': [
{
'timestamp':timestamp,
'inv':int(item.stock),
'price':Decimal(str(item.price)),
'special_price':Decimal(str(item.special_price)),
'sales_added': False
}
]
}
nestedDict['items'].append(output)
但这真的很慢,因为它使用了 itertuples 和 python for 循环。通过矢量化或其他方式执行此操作是否更好。
我希望处理大约 2000 万行
解决方案
这是我的尝试,让我知道它是否足够快:
In [3]: df = pd.DataFrame({'sku_id': {0: 4901301230676,
...: 1: 4901301230638,
...: 2: 8886451000583,
...: 3: 8710428021304,
...: 4: 37338022},
...: 'stock': {0: 5, 1: 12, 2: 9, 3: 14, 4: 332},
...: 'price': {0: 639.0, 1: 639.0, 2: 849.0, 3: 849.0, 4: 128.0},
...: 'special_price': {0: '599', 1: '599', 2: '849', 3: '849', 4: 128.0}})
...:
...: timestamp = 1580515200
...:
...: df["timestamp"] = timestamp
...: df["sales_added"] = False
...:
...: df.rename(columns={"stock": "inv"}, inplace=True)
...:
...: sku_id = df["sku_id"]
...:
...: df = df[["timestamp", "inv", "price", "special_price", "sales_added"]]
...:
...: {"items": pd.DataFrame({"sku": sku_id, "records": df.to_dict(orient="rec
...: ords")}).to_dict(orient="records")}
Out[3]:
{'items': [{'sku': 4901301230676,
'records': {'timestamp': 1580515200,
'inv': 5,
'price': 639.0,
'special_price': '599',
'sales_added': False}},
{'sku': 4901301230638,
'records': {'timestamp': 1580515200,
'inv': 12,
'price': 639.0,
'special_price': '599',
'sales_added': False}},
{'sku': 8886451000583,
'records': {'timestamp': 1580515200,
'inv': 9,
'price': 849.0,
'special_price': '849',
'sales_added': False}},
{'sku': 8710428021304,
'records': {'timestamp': 1580515200,
'inv': 14,
'price': 849.0,
'special_price': '849',
'sales_added': False}},
{'sku': 37338022,
'records': {'timestamp': 1580515200,
'inv': 332,
'price': 128.0,
'special_price': 128.0,
'sales_added': False}}]}
推荐阅读
- c# - AspNetCore 中缺少 Range 标头
- swiftui - DragGesture 没有更新
- php - 错误:在此服务器上找不到请求的 URL /login。(拉拉维尔)
- django - Django 如何在会话中将 JSON 数据保存在视图之外并在视图内检索保存的数据?
- kubernetes - 获取特定的 Kubernetes pod
- monaco-editor - 我应该如何在摩纳哥编辑器中为令牌添加悬停
- python - 如何进行条件 JSON 模式验证
- vue.js - 在动态加载的组件中获取自我
- android - DAO 注入问题 - 在 Hilt Android 中找不到 RoomDatabase
- azure - Azure Budget ARM 模板部署未经授权