Reading a CSV in Pandas and getting the output in the required format

Problem description

I am new to pandas. I am reading a CSV file and trying to get the output as a dictionary.

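import pandas as pd

df = pd.read_csv('source.csv')
my_projects = ['WORLD', 'P&G', 'AVR', 'ABCD', 'Channel', 'Migration']
# .copy() avoids SettingWithCopyWarning when the 'count' column is added below
filtered_projects = df[(df['area'] == 'MY PROJECTS') & (df['name'].isin(my_projects))].copy()
filtered_projects['count'] = 1
# Sum only the 'count' column so that non-numeric columns are not aggregated
total_of_each_error = filtered_projects.groupby(['month', 'name', 'errors'])['count'].sum().reset_index()
# Convert the month values to full month names such as 'February'
total_of_each_error['month'] = pd.to_datetime(total_of_each_error['month']).dt.strftime('%B')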

The list of error types I want to count: ['Big', 'Small', 'Monitoring', 'Improvement']

The total_of_each_error DataFrame contains:

    month     name       errors       count
0   February  ABCD       Big          1
1   February  ABCD       Monitoring   3
2   February  WORLD      Small        1
3   February  Channel    Big          2
4   February  Channel    Small        1
5   February  Channel    Monitoring   1
6   February  AVR        Monitoring   1
7   April     WORLD      Monitoring   2
8   May       Migration  Big          1
9   May       Migration  Monitoring   2
10  June      P&G        Small        1
11  June      P&G        Monitoring   1
12  June      ABCD       Monitoring   1
13  June      WORLD      Improvement  1
14  July      P&G        Monitoring   1
15  July      ABCD       Small        1
16  July      ABCD       Monitoring   1

If a particular error does not occur in a given month, its count should be filled with zero. The output I want is a dictionary like this:

data = {'WORLD': {'categories': ['February', 'April', 'May', 'June', 'July'],
                'series': [{
                    'name': 'Big Issue',
                    'data': [0, 0, 0, 0, 0]  # Number of Bigs in those months
                    }, {
                    'name': 'Small Issue',
                    'data': [1, 0, 0, 0, 0]  # Number of Smalls in those months
                    }, {
                    'name': 'Monitoring',
                    'data': [0, 2, 0, 0, 0]  # Number of Monitorings in those months
                    }, {
                    'name': 'Improvement',
                    'data': [0, 0, 0, 1, 0]  # Number of Improvements in those months
                    }]
                },
        'P&G': {'categories': ['February', 'April', 'May', 'June', 'July'],
                'series': [{
                    'name': 'Big Issue',
                    'data': [0, 0, 0, 0, 0]
                    }, {
                    'name': 'Small Issue',
                    'data': [0, 0, 0, 1, 0]
                    }, {
                    'name': 'Monitoring',
                    'data': [0, 0, 0, 1, 1]
                    }, {
                    'name': 'Improvement',
                    'data': [0, 0, 0, 0, 0]
                    }]
                }
    }

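The expected output above is shown only for WORLD and P&G. The dictionary should contain one entry for every project in my_projects. The order of the months and of the data must be preserved.

Edit: corrected the wrong name values.

Tags: python-3.x, pandas

Solution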


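Sort and reshape your DataFrame into the right format (using df.groupby and .unstack()), then call the to_dict() method on the reshaped DataFrame to get the result you want. Example below.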

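import numpy as np
import pandas as pd

df = pd.DataFrame(
    data={'Month': ['Jan', 'Feb'] * 5,
          'Issue': ['Big Issue', 'Monitoring'] * 5,
          'value': np.arange(30, 40)})

# Count rows per (Month, Issue), then pivot Issue into columns.
# fill_value=0 puts a zero wherever a (Month, Issue) pair never occurs.
counts = df.groupby(['Month', 'Issue'])['value'].count().unstack(fill_value=0)
# Convert the reshaped frame (not the original df) to a nested dict
counts.to_dict()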
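
Applied to the table from the question, the same groupby/unstack idea could look roughly like the sketch below. It is only one possible approach and rests on a few assumptions: the sample rows are retyped from the question, the month order is taken from the order of first appearance in the data, and the names error_types and labels (the mapping from 'Big' to 'Big Issue' and so on) are made up for illustration, based on the expected output.

```python
import pandas as pd

# Sample rows retyped from the total_of_each_error table in the question.
rows = [
    ('February', 'ABCD', 'Big', 1), ('February', 'ABCD', 'Monitoring', 3),
    ('February', 'WORLD', 'Small', 1), ('February', 'Channel', 'Big', 2),
    ('February', 'Channel', 'Small', 1), ('February', 'Channel', 'Monitoring', 1),
    ('February', 'AVR', 'Monitoring', 1), ('April', 'WORLD', 'Monitoring', 2),
    ('May', 'Migration', 'Big', 1), ('May', 'Migration', 'Monitoring', 2),
    ('June', 'P&G', 'Small', 1), ('June', 'P&G', 'Monitoring', 1),
    ('June', 'ABCD', 'Monitoring', 1), ('June', 'WORLD', 'Improvement', 1),
    ('July', 'P&G', 'Monitoring', 1), ('July', 'ABCD', 'Small', 1),
    ('July', 'ABCD', 'Monitoring', 1),
]
total_of_each_error = pd.DataFrame(rows, columns=['month', 'name', 'errors', 'count'])

my_projects = ['WORLD', 'P&G', 'AVR', 'ABCD', 'Channel', 'Migration']
# Hypothetical helper names: the error types to report and their display labels.
error_types = ['Big', 'Small', 'Monitoring', 'Improvement']
labels = {'Big': 'Big Issue', 'Small': 'Small Issue',
          'Monitoring': 'Monitoring', 'Improvement': 'Improvement'}

# Assumption: the desired month order is the order of first appearance.
months = list(total_of_each_error['month'].unique())

# One row per (name, month), one column per error type.
counts = (total_of_each_error
          .groupby(['name', 'month', 'errors'])['count'].sum()
          .unstack('errors', fill_value=0))

# Reindex to every project/month pair and every error type, in the wanted
# order; combinations that never occurred become 0.
full_index = pd.MultiIndex.from_product([my_projects, months],
                                        names=['name', 'month'])
counts = counts.reindex(index=full_index, columns=error_types, fill_value=0)

data = {}
for project in my_projects:
    per_month = counts.loc[project]  # rows: months, columns: error types
    data[project] = {
        'categories': months,
        'series': [{'name': labels[e], 'data': per_month[e].tolist()}
                   for e in error_types],
    }

print(data['WORLD'])
```

The reindex step is what guarantees both the zero filling and the ordering: groupby sorts its keys alphabetically, so without it the months would come out as April, February, July, and so on.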
