How do I extract the values whose date matches the date in a specific column? (in Python)

Problem description

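Consider the DataFrame built from the dictionary below. Two of its columns are 'datetime' and 'date_at_which_value_is_needed'. I want to create a new column that holds, as a list/Series, the values of the 'datetime' column whose date is the same as the value in the 'date_at_which_value_is_needed' column. Is there a way to do this without a loop?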

  {'datetime': {667: Timestamp('2019-11-08 10:00:00+0000', tz='UTC'),
      673: Timestamp('2019-11-08 16:00:00+0000', tz='UTC'),
      679: Timestamp('2019-11-08 22:00:00+0000', tz='UTC'),
      685: Timestamp('2019-11-09 04:00:00+0000', tz='UTC'),
      691: Timestamp('2019-11-11 10:00:00+0000', tz='UTC'),
      697: Timestamp('2019-11-11 16:00:00+0000', tz='UTC'),
      703: Timestamp('2019-11-11 22:00:00+0000', tz='UTC'),
      709: Timestamp('2019-11-12 04:00:00+0000', tz='UTC'),
      715: Timestamp('2019-11-12 10:00:00+0000', tz='UTC'),
      721: Timestamp('2019-11-12 16:00:00+0000', tz='UTC'),
      727: Timestamp('2019-11-12 22:00:00+0000', tz='UTC'),
      733: Timestamp('2019-11-13 04:00:00+0000', tz='UTC'),
      739: Timestamp('2019-11-13 10:00:00+0000', tz='UTC'),
      745: Timestamp('2019-11-13 16:00:00+0000', tz='UTC'),
      751: Timestamp('2019-11-13 22:00:00+0000', tz='UTC'),
      757: Timestamp('2019-11-14 04:00:00+0000', tz='UTC'),
      763: Timestamp('2019-11-14 10:00:00+0000', tz='UTC'),
      769: Timestamp('2019-11-14 16:00:00+0000', tz='UTC'),
      775: Timestamp('2019-11-14 22:00:00+0000', tz='UTC'),
      780: Timestamp('2019-11-15 04:00:00+0000', tz='UTC')},
     'date_at_which_value_is_needed': {667: Timestamp('2019-11-05 00:00:00+0000', tz='UTC'),
      673: Timestamp('2019-11-05 00:00:00+0000', tz='UTC'),
      679: Timestamp('2019-11-05 00:00:00+0000', tz='UTC'),
      685: Timestamp('2019-11-06 00:00:00+0000', tz='UTC'),
      691: Timestamp('2019-11-06 00:00:00+0000', tz='UTC'),
      697: Timestamp('2019-11-06 00:00:00+0000', tz='UTC'),
      703: Timestamp('2019-11-06 00:00:00+0000', tz='UTC'),
      709: Timestamp('2019-11-07 00:00:00+0000', tz='UTC'),
      715: Timestamp('2019-11-07 00:00:00+0000', tz='UTC'),
      721: Timestamp('2019-11-07 00:00:00+0000', tz='UTC'),
      727: Timestamp('2019-11-07 00:00:00+0000', tz='UTC'),
      733: Timestamp('2019-11-08 00:00:00+0000', tz='UTC'),
      739: Timestamp('2019-11-08 00:00:00+0000', tz='UTC'),
      745: Timestamp('2019-11-08 00:00:00+0000', tz='UTC'),
      751: Timestamp('2019-11-08 00:00:00+0000', tz='UTC'),
      757: Timestamp('2019-11-11 00:00:00+0000', tz='UTC'),
      763: Timestamp('2019-11-11 00:00:00+0000', tz='UTC'),
      769: Timestamp('2019-11-11 00:00:00+0000', tz='UTC'),
      775: Timestamp('2019-11-11 00:00:00+0000', tz='UTC'),
      780: Timestamp('2019-11-12 00:00:00+0000', tz='UTC')},
     'c': {667: 64.6475,
      673: 65.005,
      679: 65.0075,
      685: 65.0075,
      691: 65.0225,
      697: 65.5875,
      703: 65.6,
      709: 65.5625,
      715: 65.355,
      721: 65.475,
      727: 65.425,
      733: 65.0375,
      739: 65.9017,
      745: 66.1875,
      751: 66.15,
      757: 66.075,
      763: 65.695,
      769: 65.625,
      775: 65.66,
      780: 65.9525}}

For example, for the last row (index 780), the new column would contain the list:

[Timestamp('2019-11-12 04:00:00+0000', tz='UTC'), Timestamp('2019-11-12 10:00:00+0000', tz='UTC'), Timestamp('2019-11-12 16:00:00+0000', tz='UTC'), Timestamp('2019-11-12 22:00:00+0000', tz='UTC')]

Tags: python, pandas

Solution


Try this:

import pandas as pd
from pandas import Timestamp

data = {'datetime': {667: Timestamp('2019-11-08 10:00:00+0000', tz='UTC'),
      673: Timestamp('2019-11-08 16:00:00+0000', tz='UTC'),
      679: Timestamp('2019-11-08 22:00:00+0000', tz='UTC'),
      685: Timestamp('2019-11-09 04:00:00+0000', tz='UTC'),
      691: Timestamp('2019-11-11 10:00:00+0000', tz='UTC'),
      697: Timestamp('2019-11-11 16:00:00+0000', tz='UTC'),
      703: Timestamp('2019-11-11 22:00:00+0000', tz='UTC'),
      709: Timestamp('2019-11-12 04:00:00+0000', tz='UTC'),
      715: Timestamp('2019-11-12 10:00:00+0000', tz='UTC'),
      721: Timestamp('2019-11-12 16:00:00+0000', tz='UTC'),
      727: Timestamp('2019-11-12 22:00:00+0000', tz='UTC'),
      733: Timestamp('2019-11-13 04:00:00+0000', tz='UTC'),
      739: Timestamp('2019-11-13 10:00:00+0000', tz='UTC'),
      745: Timestamp('2019-11-13 16:00:00+0000', tz='UTC'),
      751: Timestamp('2019-11-13 22:00:00+0000', tz='UTC'),
      757: Timestamp('2019-11-14 04:00:00+0000', tz='UTC'),
      763: Timestamp('2019-11-14 10:00:00+0000', tz='UTC'),
      769: Timestamp('2019-11-14 16:00:00+0000', tz='UTC'),
      775: Timestamp('2019-11-14 22:00:00+0000', tz='UTC'),
      780: Timestamp('2019-11-15 04:00:00+0000', tz='UTC')},
     'date_at_which_value_is_needed': {667: Timestamp('2019-11-05 00:00:00+0000', tz='UTC'),
      673: Timestamp('2019-11-05 00:00:00+0000', tz='UTC'),
      679: Timestamp('2019-11-05 00:00:00+0000', tz='UTC'),
      685: Timestamp('2019-11-06 00:00:00+0000', tz='UTC'),
      691: Timestamp('2019-11-06 00:00:00+0000', tz='UTC'),
      697: Timestamp('2019-11-06 00:00:00+0000', tz='UTC'),
      703: Timestamp('2019-11-06 00:00:00+0000', tz='UTC'),
      709: Timestamp('2019-11-07 00:00:00+0000', tz='UTC'),
      715: Timestamp('2019-11-07 00:00:00+0000', tz='UTC'),
      721: Timestamp('2019-11-07 00:00:00+0000', tz='UTC'),
      727: Timestamp('2019-11-07 00:00:00+0000', tz='UTC'),
      733: Timestamp('2019-11-08 00:00:00+0000', tz='UTC'),
      739: Timestamp('2019-11-08 00:00:00+0000', tz='UTC'),
      745: Timestamp('2019-11-08 00:00:00+0000', tz='UTC'),
      751: Timestamp('2019-11-08 00:00:00+0000', tz='UTC'),
      757: Timestamp('2019-11-11 00:00:00+0000', tz='UTC'),
      763: Timestamp('2019-11-11 00:00:00+0000', tz='UTC'),
      769: Timestamp('2019-11-11 00:00:00+0000', tz='UTC'),
      775: Timestamp('2019-11-11 00:00:00+0000', tz='UTC'),
      780: Timestamp('2019-11-12 00:00:00+0000', tz='UTC')},
     'c': {667: 64.6475,
      673: 65.005,
      679: 65.0075,
      685: 65.0075,
      691: 65.0225,
      697: 65.5875,
      703: 65.6,
      709: 65.5625,
      715: 65.355,
      721: 65.475,
      727: 65.425,
      733: 65.0375,
      739: 65.9017,
      745: 66.1875,
      751: 66.15,
      757: 66.075,
      763: 65.695,
      769: 65.625,
      775: 65.66,
      780: 65.9525}}

# Convert the dictionary into a DataFrame
datesDf = pd.DataFrame.from_dict(data)
# Extract the date part of each datetime column
datesDf['date'] = datesDf['datetime'].dt.date
datesDf['date_needed'] = datesDf['date_at_which_value_is_needed'].dt.date

# Group the datetime values by date, collecting each day's values into a list
datesGrouped = datesDf.groupby('date')['datetime'].apply(list).to_frame()

# Left-join the original DataFrame with the grouped one on the needed date;
# rows whose needed date has no datetime values get NaN
result = datesDf.merge(datesGrouped, how='left', left_on='date_needed', right_on='date')

# Format the result: drop the helper columns and rename the merged columns
result = result.drop(['date', 'date_needed'], axis=1).rename(columns={"datetime_x": "datetime", "datetime_y": "datetime_col"})
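
If keeping the original row labels (667, 673, ...) matters, one possible alternative is to skip the merge and look the lists up with Series.map. The snippet below is only a minimal sketch of that idea, not part of the answer above: the four-row frame is a hypothetical stand-in for the question's data, and the names df, day, needed_day and per_day are made up for illustration.

```python
import pandas as pd

# Hypothetical four-row stand-in for the question's DataFrame.
df = pd.DataFrame(
    {
        "datetime": pd.to_datetime(
            ["2019-11-11 10:00", "2019-11-11 16:00",
             "2019-11-12 04:00", "2019-11-12 10:00"],
            utc=True,
        ),
        "date_at_which_value_is_needed": pd.to_datetime(
            ["2019-11-05", "2019-11-11", "2019-11-11", "2019-11-12"],
            utc=True,
        ),
    },
    index=[691, 697, 709, 715],
)

# Calendar date of each timestamp and of each "needed" value.
day = df["datetime"].dt.date
needed_day = df["date_at_which_value_is_needed"].dt.date

# One list of timestamps per calendar date.
per_day = df["datetime"].groupby(day).apply(list)

# Look up the list for each row's needed date.
# Dates that never occur in 'datetime' map to NaN.
df["datetime_col"] = needed_day.map(per_day)

print(df["datetime_col"])
```

Because map works row by row on the existing Series, the result stays aligned with the original index. Rows whose needed date has no matching timestamps (here index 691, which asks for 2019-11-05) end up with NaN rather than an empty list.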
