machine-learning - Forecasting.ForecastBySsa with multiple variables as input
Problem description
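I have this code to forecast a time series. I want to forecast based on a time series of prices and a related indicator.
So, along with the value to forecast, I want to pass a side value, but I can't tell whether it is taken into account, because the forecast does not change with or without it. How do I tell the algorithm to take these parameters into account?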
public static TimeSeriesForecast PerformTimeSeriesProductForecasting(List<TimeSeriesData> listToForecast)
{
var mlContext = new MLContext(seed: 1); //Seed set to any number so you have a deterministic environment
var productModelPath = "product_month_timeSeriesSSA.zip";
if (File.Exists(productModelPath))
{
File.Delete(productModelPath);
}
IDataView productDataView = mlContext.Data.LoadFromEnumerable<TimeSeriesData>(listToForecast);
var singleProductDataSeries = mlContext.Data.CreateEnumerable<TimeSeriesData>(productDataView, false).OrderBy(p => p.Date);
TimeSeriesData lastMonthProductData = singleProductDataSeries.Last();
const int numSeriesDataPoints = 2500; // Number of data points passed as both seriesLength and trainSize below
// Create and add the forecast estimator to the pipeline.
IEstimator<ITransformer> forecastEstimator = mlContext.Forecasting.ForecastBySsa(
outputColumnName: nameof(TimeSeriesForecast.NextClose),
inputColumnName: nameof(TimeSeriesData.Close), // This is the column being forecasted.
windowSize: 22, // Length of the window on the series used to build the SSA trajectory matrix.
seriesLength: numSeriesDataPoints, // This parameter specifies the number of data points that are used when performing a forecast.
trainSize: numSeriesDataPoints, // This parameter specifies the total number of data points in the input time series, starting from the beginning.
horizon: 5, // Number of values to forecast; 5 means the next 5 values of Close are forecast.
confidenceLevel: 0.98f, // Indicates the likelihood the real observed value will fall within the specified interval bounds.
confidenceLowerBoundColumn: nameof(TimeSeriesForecast.ConfidenceLowerBound), //This is the name of the column that will be used to store the lower interval bound for each forecasted value.
confidenceUpperBoundColumn: nameof(TimeSeriesForecast.ConfidenceUpperBound)); //This is the name of the column that will be used to store the upper interval bound for each forecasted value.
// Fit the forecasting model to the specified product's data series.
ITransformer forecastTransformer = forecastEstimator.Fit(productDataView);
// Create the forecast engine used for creating predictions.
TimeSeriesPredictionEngine<TimeSeriesData, TimeSeriesForecast> forecastEngine = forecastTransformer.CreateTimeSeriesEngine<TimeSeriesData, TimeSeriesForecast>(mlContext);
// Save the forecasting model so that it can be loaded within an end-user app.
forecastEngine.CheckPoint(mlContext, productModelPath);
ITransformer forecaster;
using (var file = File.OpenRead(productModelPath))
{
forecaster = mlContext.Model.Load(file, out DataViewSchema schema);
}
// We must create a new prediction engine from the persisted model.
TimeSeriesPredictionEngine<TimeSeriesData, TimeSeriesForecast> forecastEngine2 = forecaster.CreateTimeSeriesEngine<TimeSeriesData, TimeSeriesForecast>(mlContext);
// Get the prediction from the reloaded engine; it contains the next 5 forecasted values of Close, since that is the `horizon` specified when the forecast estimator was created.
TimeSeriesForecast prediction = forecastEngine2.Predict();
return prediction;
}
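TimeSeriesData has several properties, not only the value of the series I want to forecast. I just want to know whether they are taken into account when forecasting. Is there a better method for forecasting this kind of series, such as LSTM? Is that method available in ML.NET?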
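The two classes are not shown in the question. A minimal sketch of what they might look like is given here for orientation; the property names `Date`, `Close`, `NextClose`, `ConfidenceLowerBound` and `ConfidenceUpperBound` come from the code above, while `Indicator` is a hypothetical name standing in for the side value.

```csharp
using System;

// Input row. ForecastBySsa expects its input column to be of type float (Single).
public class TimeSeriesData
{
    public DateTime Date { get; set; }   // used only for ordering in the code above
    public float Close { get; set; }     // the column named in inputColumnName
    public float Indicator { get; set; } // hypothetical side value; not referenced by the estimator above
}

// Output row. ForecastBySsa writes vectors, one element per forecasted step (`horizon` elements).
public class TimeSeriesForecast
{
    public float[] NextClose { get; set; }
    public float[] ConfidenceLowerBound { get; set; }
    public float[] ConfidenceUpperBound { get; set; }
}
```

With `horizon: 5`, each array in the returned `TimeSeriesForecast` would hold five values.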
Solution
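ForecastBySsa is univariate: it reads only the column named in inputColumnName, so the other properties of TimeSeriesData are ignored, which is why the forecast does not change with or without them.
There is a new enhancement ticket for this: Multivariate Time based series forecasting to ML.Net
See the ticket: github.com/dotnet/machinelearning/issues/5638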
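Until something like that is available, one possible workaround (not taken from the ticket, only a sketch under stated assumptions) is to recast the problem as ordinary regression on lagged values, so that the side value becomes a feature. The example below assumes a single indicator series, a lag of one step, and a one-step-ahead forecast; `LaggedRow`, `ClosePrediction`, `PrevIndicator` and `LaggedRegressionSketch` are hypothetical names, and the Sdca regression trainer is chosen purely for illustration.

```csharp
using System;
using System.Collections.Generic;
using Microsoft.ML;
using Microsoft.ML.Data;

// Hypothetical training row: values observed at time t, used to predict Close at time t+1.
public class LaggedRow
{
    public float PrevClose { get; set; }     // Close at time t
    public float PrevIndicator { get; set; } // side value at time t (hypothetical name)
    public float Label { get; set; }         // Close at time t+1 (regression target)
}

public class ClosePrediction
{
    [ColumnName("Score")]
    public float PredictedClose { get; set; }
}

public static class LaggedRegressionSketch
{
    // Pair every observation with the Close value that follows it.
    public static List<LaggedRow> BuildRows(IReadOnlyList<float> close, IReadOnlyList<float> indicator)
    {
        if (close.Count != indicator.Count)
            throw new ArgumentException("Both series must have the same length.");

        var rows = new List<LaggedRow>();
        for (int t = 0; t + 1 < close.Count; t++)
        {
            rows.Add(new LaggedRow { PrevClose = close[t], PrevIndicator = indicator[t], Label = close[t + 1] });
        }
        return rows;
    }

    // Train on the lagged rows and predict the Close value one step past the end of the series.
    public static float PredictNext(IReadOnlyList<float> close, IReadOnlyList<float> indicator)
    {
        var mlContext = new MLContext(seed: 1);
        IDataView data = mlContext.Data.LoadFromEnumerable(BuildRows(close, indicator));

        // Both lagged values go into the feature vector, so the indicator influences the prediction.
        var pipeline = mlContext.Transforms
            .Concatenate("Features", nameof(LaggedRow.PrevClose), nameof(LaggedRow.PrevIndicator))
            .Append(mlContext.Transforms.NormalizeMinMax("Features"))
            .Append(mlContext.Regression.Trainers.Sdca(labelColumnName: "Label", featureColumnName: "Features"));

        ITransformer model = pipeline.Fit(data);
        var engine = mlContext.Model.CreatePredictionEngine<LaggedRow, ClosePrediction>(model);

        var latest = new LaggedRow
        {
            PrevClose = close[close.Count - 1],
            PrevIndicator = indicator[indicator.Count - 1]
        };
        return engine.Predict(latest).PredictedClose;
    }
}
```

Unlike ForecastBySsa, this sketch produces no confidence bounds and forecasts only one step ahead; a longer horizon would need either one model per step or feeding predictions back in, and future indicator values would have to be known or forecast separately.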