首页 > 解决方案 > 如何通过数据类型转换(十进制)将 pandas 数据框转换为自定义嵌套字典列表

问题描述

我有以下df

df = pd.DataFrame({'sku_id': {0: 4901301230676,
  1: 4901301230638,
  2: 8886451000583,
  3: 8710428021304,
  4: 37338022},
 'stock': {0: 5, 1: 12, 2: 9, 3: 14, 4: 332},
 'price': {0: 639.0, 1: 639.0, 2: 849.0, 3: 849.0, 4: 128.0},
 'special_price': {0: '599', 1: '599', 2: '849', 3: '849', 4: 128.0}})

和自定义值

timestamp = 1580515200

我怎样才能将其转换为嵌套字典,如下所示

{ 'items' : [{'sku': '4901301230676',
  'records': [{'timestamp': 1580515200,
    'inv': 5,
    'price': Decimal('639.0'),
    'special_price': Decimal('599'),
    'sales_added': False}]},
 {'sku': '4901301230638',
  'records': [{'timestamp': 1580515200,
    'inv': 12,
    'price': Decimal('639.0'),
    'special_price': Decimal('599'),
    'sales_added': False}]},
 {'sku': '8886451000583',
  'records': [{'timestamp': 1580515200,
    'inv': 9,
    'price': Decimal('849.0'),
    'special_price': Decimal('849'),
    'sales_added': False}]},
 {'sku': '8710428021304',
  'records': [{'timestamp': 1580515200,
    'inv': 14,
    'price': Decimal('849.0'),
    'special_price': Decimal('849'),
    'sales_added': False}]},
 {'sku': '37338022',
  'records': [{'timestamp': 1580515200,
    'inv': 332,
    'price': Decimal('128.0'),
    'special_price': Decimal('128.0'),
    'sales_added': False}]}]}

我努力了

from decimal import Decimal
nestedDict = {'items':[]}
for item in df.itertuples():
        output = {"sku":str(item.sku_id),
                   'records': [
                       {
                           'timestamp':timestamp,
                           'inv':int(item.stock),
                           'price':Decimal(str(item.price)),
                           'special_price':Decimal(str(item.special_price)),
                           'sales_added': False
                       }
                   ]
                  }
        nestedDict['items'].append(output)

但这真的很慢,因为它使用了 itertuples 和 python for 循环。通过矢量化或其他方式执行此操作是否更好。

我希望处理大约 2000 万行

标签: pythonpandas

解决方案


这是我的尝试,让我知道它是否足够快:

In [3]: df = pd.DataFrame({'sku_id': {0: 4901301230676, 
   ...:   1: 4901301230638, 
   ...:   2: 8886451000583, 
   ...:   3: 8710428021304, 
   ...:   4: 37338022}, 
   ...:  'stock': {0: 5, 1: 12, 2: 9, 3: 14, 4: 332}, 
   ...:  'price': {0: 639.0, 1: 639.0, 2: 849.0, 3: 849.0, 4: 128.0}, 
   ...:  'special_price': {0: '599', 1: '599', 2: '849', 3: '849', 4: 128.0}}) 
   ...:  
   ...: timestamp = 1580515200 
   ...:  
   ...: df["timestamp"] = timestamp 
   ...: df["sales_added"] = False 
   ...:  
   ...: df.rename(columns={"stock": "inv"}, inplace=True) 
   ...:  
   ...: sku_id = df["sku_id"] 
   ...:  
   ...: df = df[["timestamp", "inv", "price", "special_price", "sales_added"]] 
   ...:  
   ...: {"items": pd.DataFrame({"sku": sku_id, "records": df.to_dict(orient="rec
   ...: ords")}).to_dict(orient="records")}     


Out[3]: 
{'items': [{'sku': 4901301230676,
   'records': {'timestamp': 1580515200,
    'inv': 5,
    'price': 639.0,
    'special_price': '599',
    'sales_added': False}},
  {'sku': 4901301230638,
   'records': {'timestamp': 1580515200,
    'inv': 12,
    'price': 639.0,
    'special_price': '599',
    'sales_added': False}},
  {'sku': 8886451000583,
   'records': {'timestamp': 1580515200,
    'inv': 9,
    'price': 849.0,
    'special_price': '849',
    'sales_added': False}},
  {'sku': 8710428021304,
   'records': {'timestamp': 1580515200,
    'inv': 14,
    'price': 849.0,
    'special_price': '849',
    'sales_added': False}},
  {'sku': 37338022,
   'records': {'timestamp': 1580515200,
    'inv': 332,
    'price': 128.0,
    'special_price': 128.0,
    'sales_added': False}}]}

推荐阅读