Machine learning model with varying input shape as time changes

Problem description

I am trying to predict the bookings of a stand-up comedian cafe. There are a lot of features I can use that have an effect on the number of sales (e.g. day of the year, weather, average sales last month, day of the week, average sales on that specific day of the week, etc.).

However, one of the features that correlates most strongly with the actual number of sales is the number of tickets already sold before the deadline. Customers can start making reservations 120 hours (5 days) before the ordering deadline (11:00 AM on the day of the show).

I would like to use this data as input for my machine learning algorithm. Currently I have created 120 columns in the dataframe, covering the 120 hours before the deadline up to the deadline itself. Column "hour_96" therefore shows the accumulated sales 4 days before the deadline, column "hour_24" shows the accumulated sales 24 hours before the deadline, etc.

If I now want to predict the sales 24 hours before the deadline, the columns "hour_24" through "hour_0" all contain "NaN" values. Since most algorithms can't deal with NaN values, I currently set these columns to 0. However, I think this is too simplistic and will result in a bad prediction model.

How do we deal with an input shape that changes as we obtain more data closer to the ordering deadline?

Tags: machine-learning, deep-learning

Solution


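As I understand it, you have a fixed number of columns, each holding the data for one predefined hour before the deadline. So in a sense the shape of the input never changes; only the validity of some input features does.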

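If the input shape is fixed and only the validity of the features (NaN or not) keeps changing, you can solve the problem by adding a mask for each input feature.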

For example, a valid hour_24 can be represented as hour_24 = 20 and mask_24 = 1, and an invalid hour_24 as hour_24 = 0 (or any other value) and mask_24 = 0.
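A minimal sketch of how such mask columns could be built with pandas, assuming the hourly columns are named hour_<h> as in the question. The helper name add_mask_columns, the mask_<h> naming and the toy three-column dataframe are illustrative assumptions, not part of the original post.

```python
import numpy as np
import pandas as pd

def add_mask_columns(df, hour_cols, fill_value=0.0):
    """Return a copy of df with one mask_<h> column per hour_<h> column.

    Hypothetical helper: mask = 1 where the value is observed, 0 where it
    is NaN; the NaN itself is replaced by fill_value.
    """
    out = df.copy()
    for col in hour_cols:
        mask_col = col.replace("hour_", "mask_")
        out[mask_col] = out[col].notna().astype(int)  # 1 = known, 0 = not yet known
        out[col] = out[col].fillna(fill_value)        # placeholder, flagged by the mask
    return out

# Toy data: three hourly columns instead of 120.
df = pd.DataFrame({
    "hour_48": [5.0, 8.0],
    "hour_24": [20.0, np.nan],
    "hour_0": [np.nan, np.nan],
})
masked = add_mask_columns(df, ["hour_48", "hour_24", "hour_0"])
```

The resulting frame has twice as many hourly features but no NaN values, so it can be fed to estimators that reject missing values.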

The algorithm itself then has to learn, from the mask that accompanies each feature, when to ignore that feature.
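For the model to learn this, the training set presumably has to contain masked rows as well. One possible way, sketched here under assumptions, is to take completed shows and hide everything that would not have been known at a given prediction time. The function name mask_after_cutoff, the toy numbers and the choice of cutoffs are hypothetical; the convention that hour_24 is still unknown when predicting 24 hours ahead follows the question.

```python
import numpy as np

def mask_after_cutoff(history, hours_before, cutoff, fill_value=0.0):
    """Hide values that are not yet known `cutoff` hours before the deadline.

    history:      (n_shows, n_hours) accumulated sales of completed shows
    hours_before: (n_hours,) hours before the deadline for each column
    cutoff:       prediction time, in hours before the deadline
    Returns an (n_shows, 2 * n_hours) array: filled values, then masks.
    """
    known = hours_before > cutoff                    # strictly earlier columns only
    values = np.where(known, history, fill_value)    # unknown columns get the placeholder
    mask = np.broadcast_to(known.astype(float), history.shape)
    return np.hstack([values, mask])

# Toy data: columns hour_48, hour_24, hour_0 for two completed shows.
hours_before = np.array([48, 24, 0])
history = np.array([[5.0, 20.0, 31.0],
                    [8.0, 15.0, 22.0]])
final_sales = np.array([31.0, 22.0])  # assumed target: total sales per show

# One masked copy of every show per simulated prediction time.
cutoffs = (24, 0)
X = np.vstack([mask_after_cutoff(history, hours_before, c) for c in cutoffs])
y = np.tile(final_sales, len(cutoffs))
```

Each show then appears once per cutoff with the same target, so the model sees both the early, mostly masked view and the later, more complete one.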

The linked answer explains in more detail how to mask the inputs.

