This article describes a few approaches and algorithms for temporal neighborhood discretization from a couple of papers accepted in the conference ICDM (2009) and in the journal IDA (2014), authored by me and my fellow professors Dr. Aryya Gangopadhyay and Dr. Vandana Janeja at UMBC. Almost all of the texts / figures in this blog post are taken from the papers.
Neighborhood discovery is a precursor to knowledge discovery in complex and large datasets such as Temporal data, which is a sequence of data tuples measured at successive time instances. Hence instead of mining the entire data, we are
interested in dividing the huge data into several smaller intervals of interest which we call as temporal neighborhoods. In this article we propose 4 novel of algorithms to generate temporal neighborhoods through unequal depth discretization:
- Similarity-based simple Merging (SMerg)
- Markov Stationary distribution based Merging (StMerg)
- Greedy Merging (GMerg)
- Optimal Merging with Dynamic Programming (OptMerg).
The next figure shows the schematic diagram of the four algorithms:
All the algorithms start by dividing a given time series data into a few (user-chosen) initial number of bins (with equal-frequency binning) and then they proceed merging similar bins (resulting a few possibly unequal-width bins) using different approaches, finally the algorithms terminate if we obtain the user-specified number of output (merged) bins .
Both the algorithms SMerg and STMerg start by setting a Markov Model with the initial bins as the states (each state being summarized using the corresponding bin’s mean and variance) and Transition Matrix computed with the similarity (computed using a few different distribution-similarity computation metrics) in between the bins, as shown in the following figure:
The following distribution-distance measures are used to compute the similarity in between the bins (we assume the temporal data in each of the bins as normally distributed with the bin mean and the variance).
We discuss the results for a one dimensional special case here, however, our approach is generalizable to multidimensional temporal data.
The Markov Transition Matrix is defined as shown in the following figure as a block-diagonal matrix with a given Temporal window size w.
The following figure shows the pre-processing step Init-bin for setting up the Markov Model
- Similarity based merging (SMerg)In this approach we iteratively merge highly similar bins (based on a threshold). At every iteration the transition matrix is recomputed since bin population has changed.
2. Markov Stationary distribution based Merging (StMerg)
Given the initial equal-depth bins and the transition matrix T, we compute the steady-state vector at the stationary distribution, using power-iteration method. Once we have this vector we need to detect the spikes in the vector which indicate the split points such that the data on either side of any split point is very dissimilar and the data within two split points is highly similar. In order to detect these spikes we use Discrete Fourier Transform (DFT) and Inverse Discrete Fourier Transform (IDFT).
The following figure shows how the steady state vector approximate the pattern in the original data for a few synthetically generated and public transportation datasets, thereby achieving data compression (since number of states in the steady state vector =
# output bins obtained after bin merging << # initial bins).
3. Greedy Approach (GMerg)
This algorithm starts with the initial bins and starts merging the most similar adjacent bins right away (does not require a Markov Model). It is greedy in the sense that it chooses the current most similar intervals for merging and hence may find a local optimum instead of obtaining a globally optimal merging.
4. Optimal merging (OPTMerg)
Starting with the initial equal depth intervals (bins), we introduce the notion of optimal merging by defining optimal partitioning, as described in the following figures:
Next we use dynamic programming to find optimal partition, as shown in the next figure:
The following figures show the optimal partition trees obtained on a few datasets:
Experimental Results with Synthetic and CAT Lab data
The following figures show the output bins obtained for few of the datasets using few of the algorithms and distance measures.
From the regions in the following figures marked (a, b) we can see that the weekend days are clearly different from the weekdays marked as (d, e). Further if we consider a very high level granularity such as the region (c) using StMerg with the KL distance we can entirely demarcate the weekends as compared to the weekdays (the global outliers can be detected, marked by a circle).
In the next figure, temporal interval (a) and (b) obtained after merging the intervals shows that there is an anomaly in the data from 05/12/2009 23 : 03 : 39 to 05/13/2009 00 03 39. We indeed find that speed value recorded by the censor is 20, constant all over the time period, and we verified from CATT lab representative that this is due to a possible ”free flow speed condition” where the sensor may go into a powersave mode if the traffic intensity is very minimal or close to zero.
We also have extended the existing well-known and popular technique SAX (Symbolic Aggregate Approximation) to SAXMerg (where we merge the adjacent bins represented by same symbol) as shown below.
We then compare our results with those obtained using SAXMerg, as shown in the following figures.
It can be seen from the merged output bins from the next figure, the OPTMerg with Mahalanobis distance measure is more robust to local outliers than the SAXMerg is.
Some results with precision and recall: