A Backcasting Approach for Anomaly Detection in Time Series Data

44th International Symposium on Forecasting, Dijon, France

Priyanga Dilini Talagala

July 1, 2024

Anomalies in Temporal Data

Dengue Outbreak

Major Health Problem in Sri Lanka.

Weekly Dengue Cases in Gampaha District, Sri Lanka

Data Source: https://denguedatahub.netlify.app/

Weekly Dengue Cases in Sri Lanka

Daily COVID-19 Confirmed Cases

Outbreak

  • An occurrence of a disease in a specific geographic area at a level significantly higher than the established baseline.

  • This increase can be either sudden or gradual.

What is an Anomaly?

  • We define an anomaly as an observation that is very unlikely given the backcasted distribution.

  • An anomaly is an observation that deviates significantly from the established typical behaviour.

Methodology

  • Backcasting is a planning method that starts with defining a desirable future and then works backwards to identify policies and programs that will connect that specified future to the present.

  • Applied to anomaly detection, backcasting lets us assess how well current or future observations fit historical trends and the influences behind them.
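One common way to produce backcasts with a forecasting model is to reverse the series in time, forecast it one step ahead, and read that forecast as a projection for the time point just before the data start. The sketch below illustrates this with simple exponential smoothing; the function name `ses_backcast`, the choice of simple exponential smoothing, and the example numbers are hypothetical and are not the implementation used in this work.

```python
def ses_backcast(window, alpha):
    """One-step backcast (illustrative sketch, not the author's code).

    Simple exponential smoothing is run over the time-reversed window
    (newest observation first). Its one-step-ahead forecast is then a
    projection for the time point just before the window starts.
    """
    reversed_window = list(reversed(window))
    level = reversed_window[0]  # initialise the level with the newest value
    for y in reversed_window[1:]:
        # Standard SES level update: new level = alpha * y + (1 - alpha) * old level
        level = alpha * y + (1 - alpha) * level
    # SES forecasts are flat, so the one-step forecast equals the final level
    return level


window = [12.0, 14.0, 13.0, 15.0, 30.0]  # hypothetical weekly counts, oldest first
print(ses_backcast(window, alpha=0.2))
```

With a low smoothing parameter (α = 0.2), the most recent value (30), which is processed first in reversed order, still pulls the backcast well above the earlier values.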

Off-line Phase

  • Build a model of a system’s typical behaviour.

  • The trend component is estimated using locally estimated scatterplot smoothing (LOESS).

  • Outbreaks of new or re-emerging diseases, such as SARS, MERS, or COVID-19, may not initially show clear seasonal patterns.

  • Their spread is often influenced by factors such as human behavior, travel, and public health interventions rather than environmental seasonality.
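A minimal LOESS trend estimator is sketched below, assuming local linear fits with tricube weights and no robustness iterations. The function `loess_trend`, its `frac` default, and the example series are hypothetical; in practice a library routine such as `lowess` in statsmodels would normally be used, and the smoothing settings used in this work are not stated here.

```python
import numpy as np


def loess_trend(y, frac=0.3):
    """Minimal LOESS trend (sketch): local linear regression with tricube
    weights at every time point, no robustness iterations."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    x = np.arange(n, dtype=float)
    k = min(max(int(np.ceil(frac * n)), 5), n)  # neighbours used in each local fit
    trend = np.empty(n)
    for i in range(n):
        dist = np.abs(x - x[i])
        idx = np.argsort(dist)[:k]  # k nearest time points
        h = dist[idx].max() or 1.0  # bandwidth: distance to the k-th neighbour
        w = (1.0 - (dist[idx] / h) ** 3) ** 3  # tricube weights
        sw = np.sqrt(w)
        # Local linear design centred at x[i], solved as weighted least squares
        X = np.column_stack([np.ones(k), x[idx] - x[i]])
        coef = np.linalg.lstsq(X * sw[:, None], y[idx] * sw, rcond=None)[0]
        trend[i] = coef[0]  # intercept = fitted trend value at x[i]
    return trend


t = np.arange(52)
series = 100 + 2 * t + 10 * np.sin(t / 3)  # hypothetical weekly series
trend = loess_trend(series, frac=0.25)
```

Because each local fit is linear, a series that is already a straight line is returned unchanged.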

Off-line Phase

  • Use the Exponential Smoothing State Space model with low smoothing parameters for the level and slope and a high damping parameter for the slope, so that the most recent observations keep a strong influence on the backcasts.
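The sketch below shows one way to backcast with a damped-trend exponential smoothing recursion: the window is reversed, the level and slope are updated with fixed parameters, and the one-step-ahead forecast of the reversed series is taken as the backcast. The function `damped_backcast`, the initial state (first value and first difference), and the parameter values are assumptions for illustration, not the model settings used in this work.

```python
def damped_backcast(window, alpha, beta, phi):
    """One-step backcast with a damped-trend exponential smoothing
    recursion run on the time-reversed window (illustrative sketch)."""
    ys = list(reversed(window))  # newest observation first
    # Simple heuristic initial state (assumption): first value, first difference
    level, slope = ys[0], ys[1] - ys[0]
    for y in ys[1:]:
        prev_level = level
        # Level: blend the new value with the damped projection of the old state
        level = alpha * y + (1 - alpha) * (prev_level + phi * slope)
        # Slope: blend the latest level change with the damped old slope
        slope = beta * (level - prev_level) + (1 - beta) * phi * slope
    # One-step-ahead forecast of the reversed series = backcast
    return level + phi * slope


window = [20.0, 22.0, 21.0, 24.0, 25.0, 27.0]  # hypothetical trend values, oldest first
print(damped_backcast(window, alpha=0.1, beta=0.05, phi=0.98))
```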

Build a model of a system’s typical behaviour.

Move the window one step ahead with each new data point.

For each new data subset, reinitialize the model state with the new data without changing the estimated parameters.

Generate one-step backward projections using the reinitialized backcasting model.

Compare the backcasted values with the actual trend values.
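The on-line steps above can be sketched as a loop, under the assumption that each backcast targets the trend value immediately before the current window. The helper names `damped_backcast` and `rolling_backcast_errors`, the window size, and the parameter values are hypothetical; the damped-trend recursion is repeated so the block runs on its own. Only the state (level and slope) is reinitialized from each window, while `alpha`, `beta`, and `phi` stay fixed.

```python
def damped_backcast(window, alpha, beta, phi):
    # Damped-trend exponential smoothing on the time-reversed window
    ys = list(reversed(window))
    level, slope = ys[0], ys[1] - ys[0]  # state reinitialized from this window only
    for y in ys[1:]:
        prev = level
        level = alpha * y + (1 - alpha) * (prev + phi * slope)
        slope = beta * (level - prev) + (1 - beta) * phi * slope
    return level + phi * slope


def rolling_backcast_errors(trend, window_size, alpha, beta, phi):
    """Slide a fixed-size window one step at a time and return, for each
    window, actual trend value minus backcast for the point just before it."""
    errors = []
    for end in range(window_size, len(trend)):
        window = trend[end - window_size + 1 : end + 1]  # window ending at `end`
        target = end - window_size  # index of the point just before the window
        backcast = damped_backcast(window, alpha, beta, phi)
        errors.append(trend[target] - backcast)
    return errors


trend = [float(v) for v in range(30)]  # hypothetical smooth trend
errors = rolling_backcast_errors(trend, window_size=8, alpha=0.1, beta=0.05, phi=0.98)
```

Large errors indicate windows whose recent observations are hard to reconcile with the earlier trend.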

Block Maxima Method for Anomalous Threshold Calculation

  • Select the backcasting errors from periods of typical behaviour

  • Divide the error data into blocks and extract the block maxima and minima

  • Fit a Generalized Extreme Value (GEV) distribution to the block maxima and another to the block minima to model extreme error values

  • Use the 95th percentile of the GEV fitted to the block maxima as the upper threshold and the 5th percentile of the GEV fitted to the block minima as the lower threshold
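The thresholding steps can be sketched with SciPy's `genextreme` distribution. This assumes non-overlapping blocks of equal size (an incomplete final block is dropped) and handles the minima by fitting a GEV to the negated minima; the function name `gev_thresholds`, the block size, and the simulated errors are hypothetical. Note that SciPy's shape parameter `c` has the opposite sign to the shape parameter ξ used in much of the extreme value literature.

```python
import numpy as np
from scipy.stats import genextreme


def gev_thresholds(errors, block_size, upper_q=0.95, lower_q=0.05):
    """Block-maxima thresholds (sketch): returns (lower, upper)."""
    e = np.asarray(errors, dtype=float)
    n_blocks = len(e) // block_size
    blocks = e[: n_blocks * block_size].reshape(n_blocks, block_size)  # drop incomplete block
    maxima = blocks.max(axis=1)
    minima = blocks.min(axis=1)
    # Upper threshold: 95th percentile of a GEV fitted to the block maxima
    c_max, loc_max, scale_max = genextreme.fit(maxima)
    upper = genextreme.ppf(upper_q, c_max, loc=loc_max, scale=scale_max)
    # Lower threshold: fit a GEV to the negated minima and map the quantile back,
    # so that P(block minimum <= lower) = lower_q
    c_min, loc_min, scale_min = genextreme.fit(-minima)
    lower = -genextreme.ppf(1 - lower_q, c_min, loc=loc_min, scale=scale_min)
    return lower, upper


rng = np.random.default_rng(0)
typical_errors = rng.normal(0.0, 1.0, 2000)  # hypothetical errors from typical behaviour
lower, upper = gev_thresholds(typical_errors, block_size=20)
new_errors = np.array([0.3, -0.5, 4.0])  # hypothetical new backcasting errors
is_anomaly = (new_errors > upper) | (new_errors < lower)
```

Errors above `upper` or below `lower` are flagged as anomalous.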

What Next?

  • Determine the optimal rolling window size for capturing typical behavior patterns.

  • Conduct further experiments with various weighted backcasting approaches beyond exponential smoothing.

  • Extend the algorithm to handle multivariate data streams.

Thank you

This work was supported in part by the RETINA research lab, funded by the OWSD, a program unit of the United Nations Educational, Scientific, and Cultural Organization (UNESCO).

Slides available at: prital.netlify.app

Parameters in the Exponential Smoothing State Space Model

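Low Smoothing Parameter for the Level (α)

  • Controls how much weight the newest observation receives when the level component is updated.

  • Effect: A low α updates the level slowly, so each new observation gets a small weight and observations processed earlier keep more of their influence, which smooths out short-term fluctuations. If the backcasts are produced by running the model on the time-reversed series, the most recent observations are processed first, so a low α lets them retain a strong influence on the backcasted values.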
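To see why a low α spreads the weight over many observations, the sketch below lists the weight α(1 − α)^k that the simple exponential smoothing level gives to the observation processed k steps before the latest update (ignoring the initial level). The function `ses_weights` and the two α values are illustrative assumptions.

```python
def ses_weights(alpha, n):
    """Weight given by the SES level to the observation processed k steps
    before the latest update, for k = 0..n-1 (initial level ignored)."""
    return [alpha * (1 - alpha) ** k for k in range(n)]


for alpha in (0.1, 0.8):
    print(alpha, [round(v, 4) for v in ses_weights(alpha, 10)])
```

With α = 0.1 the observation processed nine steps earlier still receives a weight of about 0.039, whereas with α = 0.8 its weight is below 0.000001.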

Low Smoothing Parameter for the Slope (β)

  • Determines how strongly the slope responds to changes in the level over time.

  • Effect: A low β value means the slope changes slowly, so a single change in the level has only a small effect on the trend (slope) component.

  • This helps the slope follow persistent trends while smoothing out noise.
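The sketch below tracks the slope through a single jump in the level using the undamped update b_t = β(ℓ_t − ℓ_{t−1}) + (1 − β) b_{t−1}, starting from a zero slope. The function `slope_path` and the example level changes are hypothetical.

```python
def slope_path(level_changes, beta):
    """Slope after each step under b_t = beta * (l_t - l_{t-1}) + (1 - beta) * b_{t-1},
    with no damping and a zero initial slope."""
    slope, path = 0.0, []
    for change in level_changes:
        slope = beta * change + (1 - beta) * slope
        path.append(slope)
    return path


changes = [0, 0, 10, 0, 0]  # a single jump of 10 in the level (hypothetical)
print(slope_path(changes, beta=0.05))  # low beta: slope moves only slightly
print(slope_path(changes, beta=0.5))   # high beta: slope reacts strongly
```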

High Damping Parameter for the Slope (φ)

  • Controls how quickly the trend (slope) component dies out in the projections: the slope is multiplied by φ (0 < φ ≤ 1) at each step, so the projected trend flattens towards a constant level.

  • Effect: A high φ (close to 1) means mild damping: the slope decays slowly, so the projections keep following the recent trend over short horizons while still flattening out over long horizons. This gives a smoother, more stable projection than an undamped trend.
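For reference, the damped-trend recursions in their usual component form are shown below, with level ℓ_t, slope b_t, smoothing parameters α and β (0 < α, β < 1), and damping parameter φ (0 < φ ≤ 1). Whether the model used in this work follows exactly this parameterization is an assumption.

```latex
\begin{aligned}
\ell_t &= \alpha\, y_t + (1-\alpha)\,(\ell_{t-1} + \phi\, b_{t-1}) \\
b_t &= \beta\,(\ell_t - \ell_{t-1}) + (1-\beta)\,\phi\, b_{t-1} \\
\hat{y}_{t+h \mid t} &= \ell_t + (\phi + \phi^{2} + \dots + \phi^{h})\, b_t
\end{aligned}
```

For φ < 1, the h-step projection approaches ℓ_t + φ b_t / (1 − φ) as h grows, which is the flattening described above; φ = 1 recovers an undamped linear trend.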