machine-learning - Machine learning model with varying input shape as time changes
Problem description
I am trying to predict the bookings of a stand-up comedy cafe. There are a lot of features I can use which have an effect on the number of sales (e.g. day of the year, weather, average sales last month, day of the week, average sales on the specific day of the week, etc.).
However, one of the features that correlates most strongly with the actual number of sales is the number of tickets already sold before the deadline. Customers can start making reservations 120 hours (5 days) before the ordering deadline (11:00 AM on the day of the show).
I would like to use this data as input for my machine learning algorithm. Currently I have created 120 columns in the dataframe, covering the 120 hours before the deadline up to the deadline itself. Column "hour_96" therefore shows the accumulated sales 4 days before the deadline, column "hour_24" shows the accumulated sales 24 hours before the deadline, etc.
If I now want to predict the sales 24 hours before the deadline, the columns "hour_24" through "hour_0" all contain NaN values. Since algorithms can't deal with NaN values, I currently set these columns to 0. However, I think this is too simplistic and will result in a bad prediction model.
How do we deal with a changing input shape, given that we obtain more data as we get closer to the ordering deadline?
Solution
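As I understand it, you have a fixed number of columns, each representing the data for one predefined hour before the deadline. So in a sense the shape of the input never changes; only the validity of some input features does.
If you have a fixed input shape and only the validity (NaN or not) of the features varies, you can solve the problem by using a mask for each input feature.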
For example, a valid hour_24 can be represented as hour_24 = 20 and mask_24 = 1, while an invalid hour_24 can be represented as hour_24 = 0 (or any other value) and mask_24 = 0.
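The value/mask pairing described here could be built with pandas roughly as follows. This is only a sketch under assumptions: the toy frame uses four hour columns instead of 120, the numbers are made up, and the helper name add_masks is hypothetical rather than anything from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: two shows, cumulative sales per hour before the deadline.
# NaN means that hour has not happened yet at prediction time.
df = pd.DataFrame({
    "hour_3": [5.0, 2.0],
    "hour_2": [8.0, 4.0],
    "hour_1": [12.0, np.nan],
    "hour_0": [np.nan, np.nan],
})

hour_cols = [c for c in df.columns if c.startswith("hour_")]


def add_masks(frame, cols, fill_value=0.0):
    """Add one mask_<h> column per hour_<h> column and fill the NaNs.

    mask_<h> is 1 where hour_<h> was observed and 0 where it was NaN.
    """
    out = frame.copy()
    for col in cols:
        mask_col = col.replace("hour_", "mask_")
        # Record validity first, then overwrite the NaNs with a placeholder.
        out[mask_col] = out[col].notna().astype(int)
        out[col] = out[col].fillna(fill_value)
    return out


masked = add_masks(df, hour_cols)
print(masked)
```

The number of columns doubles, but the input shape stays fixed, and a 0 that means "nothing sold yet" can now be told apart from a 0 that means "not known yet".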
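The algorithm itself then has to learn to ignore a given feature whenever the mask belonging to that feature is 0.
This answer explains how to mask inputs in more detail.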
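For a model to learn to ignore masked features, it presumably has to see masked rows during training as well. The answer does not spell this out, so the following is only one possible arrangement: take finished shows with a complete sales history and hide the late columns as if the prediction were made at several different cutoffs. The function name simulate_cutoff, the toy numbers, and the choice of the hour_0 value as the target are all assumptions made for illustration.

```python
import numpy as np


def simulate_cutoff(full, hours_left, fill_value=0.0):
    """Hide the columns that would be unknown `hours_left` hours before the deadline.

    `full` has shape (n_shows, n_hours); column j holds the cumulative sales
    (n_hours - 1 - j) hours before the deadline, so the last column is hour_0.
    Following the question, hour_<hours_left> down to hour_0 count as unknown.
    Returns an array of shape (n_shows, 2 * n_hours): values, then masks.
    """
    n_hours = full.shape[1]
    hours_before = np.arange(n_hours - 1, -1, -1)      # e.g. [4, 3, 2, 1, 0]
    mask = (hours_before > hours_left).astype(float)   # 1 = observed, 0 = hidden
    values = np.where(mask == 1, full, fill_value)     # mask broadcasts over rows
    mask_2d = np.broadcast_to(mask, full.shape)
    return np.hstack([values, mask_2d])


# Hypothetical complete histories for two finished shows (hour_4 .. hour_0).
full = np.array([
    [1.0, 3.0, 6.0, 10.0, 15.0],
    [0.0, 2.0, 2.0, 5.0, 9.0],
])
target = full[:, -1]  # assumption: the quantity to predict is the hour_0 total

# One copy of every show per simulated cutoff.
cutoffs = [0, 1, 2, 3]
X = np.vstack([simulate_cutoff(full, k) for k in cutoffs])
y = np.tile(target, len(cutoffs))

print(X.shape, y.shape)
```

Each show then contributes one training row per cutoff, so the same model is exposed to both nearly complete and mostly hidden histories.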