Data exported irregularly to a csv file, with NaN values

Problem description

I have this data in a csv file:

               Date/Time      kWh  kVArh        kVA      PF
0    2021-01-01 00:30:00   471.84   0.00   943.6800  1.0000
1    2021-01-01 01:00:00   491.04   1.44   982.0842  1.0000
2    2021-01-01 01:30:00   475.20   0.00   950.4000  1.0000
3    2021-01-01 02:00:00   470.88   0.00   941.7600  1.0000
4    2021-01-01 02:30:00   466.56   0.00   933.1200  1.0000
...                  ...      ...    ...        ...     ...
9223 2021-07-14 04:00:00  1104.00  53.28  2210.5698  0.9988
9224 2021-07-14 04:30:00  1156.30  49.92  2314.7542  0.9991
9225 2021-07-14 05:00:00  1176.00  37.92  2353.2224  0.9995
9226 2021-07-14 05:30:00  1177.00  27.36  2354.6359  0.9997
9227 2021-07-14 06:00:00  1196.60  22.56  2393.6253  0.9998

I read it with this code, compute the hourly mean, and then export the result to a csv file:

import pandas as pd
file = pd.read_csv('Electricity_data.csv', 
                    sep = ',', 
                    skiprows = 0,
                    dayfirst = True,
                    parse_dates = ['Date/Time'])

pd_mean = file.groupby(pd.Grouper(key = 'Date/Time', freq = 'H')).mean().reset_index()

pd_mean.to_csv("data_1h_year_.csv")

However, when I run it, my final file has a gap.

Data before running the code (date: 03/01/2021):

              Date/Time     kWh  kVArh        kVA   PF
90  2021-02-01 21:30:00  496.83   0.00   993.6600  1.0
91  2021-02-01 22:00:00  486.72   0.00   973.4400  1.0
92  2021-02-01 22:30:00  490.08   0.00   980.1600  1.0
93  2021-02-01 23:00:00  503.00   1.92  1006.0073  1.0
94  2021-02-01 23:30:00  484.84   0.00   969.6800  1.0
95  2021-03-01 00:00:00  484.80   0.00   969.6000  1.0
96  2021-03-01 00:30:00  487.68   0.00   975.3600  1.0
97  2021-03-01 01:00:00  508.30   1.44  1016.6041  1.0
98  2021-03-01 01:30:00  488.66   0.00   977.3200  1.0
99  2021-03-01 02:00:00  486.24   0.00   972.4800  1.0
100 2021-03-01 02:30:00  495.36   1.44   990.7242  1.0
101 2021-03-01 03:00:00  484.32   0.00   968.6400  1.0
102 2021-03-01 03:30:00  485.76   0.00   971.5200  1.0
103 2021-03-01 04:00:00  492.48   1.44   984.9642  1.0
104 2021-03-01 04:30:00  476.16   0.00   952.3200  1.0
105 2021-03-01 05:00:00  477.12   0.00   954.2400  1.0

Data after running the code (date: 03/01/2021):

             Date/Time       kWh   kVArh         kVA       PF
45 2021-01-02 21:00:00  1658.650  292.32  3368.45000  0.98485
46 2021-01-02 22:00:00  1622.150  291.60  3296.34415  0.98420
47 2021-01-02 23:00:00  1619.300  261.36  3280.52380  0.98720
48 2021-01-03 00:00:00       NaN     NaN         NaN      NaN
49 2021-01-03 01:00:00       NaN     NaN         NaN      NaN
50 2021-01-03 02:00:00       NaN     NaN         NaN      NaN
51 2021-01-03 03:00:00       NaN     NaN         NaN      NaN
52 2021-01-03 04:00:00       NaN     NaN         NaN      NaN
53 2021-01-03 05:00:00       NaN     NaN         NaN      NaN
54 2021-01-03 06:00:00  1202.400  158.40  2425.57730  0.99140
55 2021-01-03 07:00:00  1209.375  168.00  2441.98105  0.99050
56 2021-01-03 08:00:00  1260.950  162.72  2542.89820  0.99175
57 2021-01-03 09:00:00  1308.975  195.60  2647.07935  0.98900
58 2021-01-03 10:00:00  1334.150  193.20  2696.17005  0.98965

I don't know why this happens: the mean is not computed for these hours, and the NaN values leave gaps in the final csv file.
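The NaN rows themselves come from how the hourly grouping works. A minimal sketch with made-up readings (the values and the missing hours are hypothetical) shows that pd.Grouper(freq='h') creates one bin for every hour between the first and last timestamp, so an hour that receives no rows is kept and filled with NaN rather than skipped.

```python
import pandas as pd

# Hypothetical half-hourly readings with no rows between 01:00 and 03:00
df = pd.DataFrame({
    "Date/Time": pd.to_datetime([
        "2021-01-03 00:00", "2021-01-03 00:30",
        "2021-01-03 03:00", "2021-01-03 03:30",
    ]),
    "kWh": [480.0, 490.0, 500.0, 510.0],
})

# The hourly grouper builds a bin for every hour from 00:00 to 03:00,
# including the 01:00 and 02:00 bins that contain no rows
hourly = df.groupby(pd.Grouper(key="Date/Time", freq="h")).mean()
print(hourly)  # the 01:00 and 02:00 rows show NaN
```

In other words, a NaN row marks an hour whose rows are missing from that part of the timeline, for example because their timestamps were parsed into a different date.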

Tags: python, pandas, nan, mean

Solution


Pandas is not interpreting your dates correctly (day and month get swapped), so specify the format yourself.
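A small illustration of the ambiguity, assuming (as this answer does) that the raw file stores dates month first; the raw string below is a hypothetical value for January 3rd. With dayfirst=True the first number is read as the day, so the value becomes March 1st, while an explicit format string leaves no room for guessing.

```python
import pandas as pd

raw = "01/03/2021 00:00"  # hypothetical raw value: January 3rd, written month first

# dayfirst=True reads the first number as the day -> March 1st
guessed = pd.to_datetime(raw, dayfirst=True)

# An explicit month-first format removes the ambiguity -> January 3rd
explicit = pd.to_datetime(raw, format="%m/%d/%Y %H:%M")

print(guessed)   # 2021-03-01 00:00:00
print(explicit)  # 2021-01-03 00:00:00
```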

Use the following code to solve your problem:

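import pandas as pd

# Read the column as text, then parse it with an explicit month-first format
# instead of letting pandas guess the day/month order
df = pd.read_csv('data.csv', sep=',', skiprows=0)
df['Date/Time'] = pd.to_datetime(df['Date/Time'], format='%m/%d/%Y %H:%M')

pd_mean = df.groupby(pd.Grouper(key='Date/Time', freq='h')).mean()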
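On pandas 2.0 or later, read_csv can also apply the format itself through its date_format parameter. The sketch below assumes that version and uses an in-memory CSV: three readings borrowed from the question, rewritten in the assumed month-first layout. With the dates parsed consistently, consecutive hours come out without NaN rows.

```python
import io

import pandas as pd

# Hypothetical CSV excerpt in the month-first layout assumed by the answer
csv_text = """Date/Time,kWh
01/02/2021 23:30,484.84
01/03/2021 00:00,484.80
01/03/2021 00:30,487.68
"""

# pandas >= 2.0: date_format applies one explicit format to the parsed column
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["Date/Time"],
                 date_format="%m/%d/%Y %H:%M")

# Hourly means: 23:00 on January 2nd, then 00:00 on January 3rd
pd_mean = df.groupby(pd.Grouper(key="Date/Time", freq="h")).mean()
print(pd_mean)
```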

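Check your parsed dates before grouping: consecutive half-hourly rows jump from February 1st to March 1st, which means day and month have been swapped: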

93  2021-02-01 23:00:00  # February, 1st
94  2021-02-01 23:30:00  # February, 1st
95  2021-03-01 00:00:00  # March, 1st
96  2021-03-01 00:30:00  # March, 1st
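On the full file, such a jump is easier to find programmatically. A minimal sketch, assuming the readings are meant to be exactly 30 minutes apart, flags every row whose step from the previous row is anything else; the four timestamps are the ones listed above, and on the real data the same check would run on df['Date/Time'].

```python
import pandas as pd

# The four parsed timestamps from the excerpt above
parsed = pd.Series(pd.to_datetime([
    "2021-02-01 23:00", "2021-02-01 23:30",
    "2021-03-01 00:00", "2021-03-01 00:30",
]))

# Half-hourly readings should advance by exactly 30 minutes per row;
# any other step points to a misparsed (or missing) timestamp
step = parsed.diff()
suspect = parsed[step.notna() & (step != pd.Timedelta(minutes=30))]
print(suspect)  # only the 2021-03-01 00:00 row
```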
