A Quick Note on Continuous Analysis
Analyzing data on the fly is tricky: data sets are unbounded, and real-time responses demand fast analysis. Incremental algorithms can be used for statistical analysis, set membership, regression-based learning, and both training of and prediction with learning algorithms, among others. In specific use cases, domain-specific algorithms also apply.
Measures like the mean, variance, moments, and correlation coefficients are easy to compute incrementally, but recursive algorithms for other measures, such as quantiles, are not so simple. Statistical measures both describe properties of the data and serve as estimates of model parameters in the context of hypothesis testing. For example, the sample mean is an estimator for the expected value of the distribution. Statistical moments of higher order provide information about the variance, the skewness, and the kurtosis of a distribution.
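To make the "easy to compute incrementally" claim concrete, here is a minimal sketch of a running mean and variance using Welford's method. The class name `RunningStats` and its fields are illustrative assumptions, not part of Swim or any other library.

```python
import math


class RunningStats:
    """Tracks count, mean, and variance one sample at a time (Welford's method).

    Illustrative sketch only; the name and interface are assumptions.
    """

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x):
        """Fold one new sample into the statistics in O(1) time and memory."""
        self.n += 1
        delta = x - self.mean          # deviation from the old mean
        self.mean += delta / self.n    # updated mean
        self._m2 += delta * (x - self.mean)  # uses old and new deviation

    @property
    def variance(self):
        """Unbiased sample variance; NaN until at least two samples arrive."""
        return self._m2 / (self.n - 1) if self.n > 1 else math.nan


stats = RunningStats()
for sample in [2, 4, 4, 4, 5, 5, 7, 9]:
    stats.update(sample)
print(stats.n, stats.mean, stats.variance)
```

Each update touches only three numbers, so memory stays constant no matter how long the stream runs. Quantiles have no comparably simple exact update, which is why they usually call for approximation schemes.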
Non-stationarity is important for many practical use cases such as predictive maintenance. In effect, the parameters of the sampled distribution may change over time, which calls for change detection, for example using a χ²-test or a t-test. Change detection is also important for deriving good sampling strategies, such as the size of the time window used for parameter estimation.
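As a rough illustration of t-test-based change detection, the sketch below compares a reference window against a recent window using Welch's t statistic. The function names, the two-window setup, and the threshold of 3.0 are all assumptions made for this example; a real deployment would choose the threshold from the t distribution at a desired significance level.

```python
import math
from statistics import mean, variance


def welch_t(reference, recent):
    """Welch's t statistic for the difference in means of two samples.

    Both windows need at least two values so the sample variance is defined.
    """
    n1, n2 = len(reference), len(recent)
    diff = mean(recent) - mean(reference)
    se = math.sqrt(variance(reference) / n1 + variance(recent) / n2)
    if se == 0.0:
        # Both windows are constant: no evidence, or unbounded evidence
        return 0.0 if diff == 0 else math.copysign(math.inf, diff)
    return diff / se


def mean_shifted(reference, recent, threshold=3.0):
    """Flag a change when |t| exceeds a (hypothetical) fixed threshold."""
    return abs(welch_t(reference, recent)) > threshold


baseline = [10.0, 10.2, 9.8, 10.1, 9.9]
steady = [10.1, 9.9, 10.0, 10.2, 9.8]
drifted = [12.0, 12.2, 11.8, 12.1, 11.9]
print(mean_shifted(baseline, steady), mean_shifted(baseline, drifted))
```

The window length is the tuning knob mentioned above: short windows react quickly but give noisy estimates, while long windows are stable but slow to notice a shift.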
Swim can deal with many of these incremental functions, and we continue to add more. But there is also a treasure trove of open source tools, for example in Apache Flink, that you can plug into Swim. We’ll take care of the hard parts, like distributed coherence…