A burst is a period of increased activity. Bursts are found by minimizing a cost function over a set of possible states (not bursting, plus several degrees of burstiness) with increasing event frequencies, where moving up a level is expensive and moving down a level is free (zero cost). The input table must supply at least three things: a Text Column (the events or topics to be tracked), a date or timestamp column (when each event happened), and a delimiter (the character that separates multiple events or topics within one cell). Given these, the algorithm detects bursts for each event or topic.
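The cost minimization described above can be sketched for the simplest case of two states (not bursting and bursting). This is a minimal illustration of the batched two-state model from Kleinberg's paper, not the code of this tool. The function name `detect_bursts`, the parameter names `s` (how much more frequent events are in the burst state) and `gamma` (the cost of moving up a level), and the flat upward transition cost are assumptions made for the example.

```python
import math

def detect_bursts(relevant, totals, s=2.0, gamma=1.0):
    """Return one state per time bin: 0 = not bursting, 1 = bursting.

    relevant[t] is the number of records in bin t that contain the value,
    totals[t] is the number of all records in bin t.
    """
    n = len(relevant)
    if sum(relevant) == 0:
        return [0] * n  # the value never occurs, so it never bursts

    # Expected event frequency in each state, clamped away from 0 and 1
    # so that the logarithms below are always defined.
    p0 = min(max(sum(relevant) / sum(totals), 1e-9), 0.9999)
    p1 = min(p0 * s, 0.9999)
    probs = (p0, p1)

    def fit_cost(state, r, d):
        # Negative binomial log-likelihood of seeing r hits in d records.
        # The binomial coefficient is the same for both states, so it is omitted.
        p = probs[state]
        return -(r * math.log(p) + (d - r) * math.log(1 - p))

    cost = [0.0, math.inf]  # the sequence is assumed to start in state 0
    back = []               # back[t][j] = best previous state for state j at bin t
    for r, d in zip(relevant, totals):
        new_cost, pointers = [], []
        for j in (0, 1):
            # Going up a level costs gamma; staying or going down is free.
            candidates = [cost[i] + (gamma if j > i else 0.0) for i in (0, 1)]
            best = 0 if candidates[0] <= candidates[1] else 1
            new_cost.append(candidates[best] + fit_cost(j, r, d))
            pointers.append(best)
        cost = new_cost
        back.append(pointers)

    # Walk the back-pointers from the cheapest final state to recover the path.
    state = 0 if cost[0] <= cost[1] else 1
    states = []
    for pointers in reversed(back):
        states.append(state)
        state = pointers[state]
    states.reverse()
    return states
```

Raising `gamma` makes the sketch more reluctant to enter the burst state, so short or weak rises in frequency are ignored.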
The algorithm takes 8 parameters.
Usually you will only need to change Date Column, Date Format, Text Column, and Text Separator when using Burst Detection. Please see 'Usage Hints'.
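To show how these four settings relate to the input table, here is a rough sketch that turns rows into per-value counts per year. It is an illustration only: the function name `count_tokens_per_year` and the row layout (a list of dictionaries) are hypothetical, and the date format uses Python's `strptime` syntax, which may differ from the format syntax this tool expects.

```python
from collections import Counter, defaultdict
from datetime import datetime

def count_tokens_per_year(rows, date_column, text_column, date_format, separator):
    """Count, for each year, how many rows contain each value.

    Returns (relevant, totals) where relevant[value][year] is the number of
    rows from that year containing the value and totals[year] is the number
    of all rows from that year.
    """
    relevant = defaultdict(Counter)
    totals = Counter()
    for row in rows:
        year = datetime.strptime(row[date_column], date_format).year
        totals[year] += 1
        # Split the cell on the separator; a value repeated within one row
        # is counted once for that row.
        values = {v.strip() for v in row[text_column].split(separator) if v.strip()}
        for value in values:
            relevant[value][year] += 1
    return relevant, totals

# Hypothetical example table with a year column and a '|'-separated text column.
rows = [
    {"Year": "2001", "Keywords": "network|burst"},
    {"Year": "2001", "Keywords": "network"},
    {"Year": "2002", "Keywords": "burst|burst"},
]
relevant, totals = count_tokens_per_year(rows, "Year", "Keywords", "%Y", "|")
```

Counts of this shape (hits per time bin and total records per time bin) are the kind of input a per-value burst model needs.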
Because the state machine is applied to each value separately, bursts are detected for each value independently of the others. This makes the algorithm suitable primarily when changes in the usage patterns of individual values are the area of interest. Cross-value comparisons of bursts are still possible, because a burst 'strength' is calculated.
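This page does not say how the strength is calculated. One definition, taken from Kleinberg's paper for the two-state batched model, is the burst weight: the total cost saved by being in the burst state rather than the base state over the duration of the burst. The sketch below assumes that definition; the function name `burst_weight` and the parameter `s` are hypothetical, and the tool may compute strength differently.

```python
import math

def burst_weight(relevant, totals, start, end, s=2.0):
    """Weight of a burst covering bins start..end (inclusive).

    Assumes the overall frequency sum(relevant) / sum(totals) lies strictly
    between 0 and 1, so that all logarithms are defined.
    """
    p0 = sum(relevant) / sum(totals)   # base-state frequency
    p1 = min(p0 * s, 0.9999)           # burst-state frequency
    weight = 0.0
    for r, d in zip(relevant[start:end + 1], totals[start:end + 1]):
        cost_base = -(r * math.log(p0) + (d - r) * math.log(1 - p0))
        cost_burst = -(r * math.log(p1) + (d - r) * math.log(1 - p1))
        # How much better the burst state explains this bin than the base state.
        weight += cost_base - cost_burst
    return weight
```

Because the weight is expressed in the same cost units for every value, two values with bursts in the same table can be ranked against each other even though their bursts were detected independently.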
Burst detection is particularly useful for examining trends in collections of texts or communities of conversation. Even words that are used comparatively rarely stand out if their frequency of usage changes over time, which is not the case in burst detection algorithms based on thresholds.
This algorithm exposes every option of the original C program that had any effect.
Note that the values in the Text Column must be a list of textual tokens. You can use Lowercase, Tokenize, Stem, and Stopword Text to normalize free-form text into this shape.
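As a rough picture of what "a list of textual tokens" means, the sketch below lowercases a free-form title, splits it into tokens, drops stopwords, and joins the result with a separator. It is not the Lowercase, Tokenize, Stem, and Stopword Text algorithm itself: stemming is left out, and the stopword list, the `|` separator, and the function name `normalize` are placeholders chosen for the example.

```python
import re

# A tiny illustrative stopword list; a real list would be much longer.
STOPWORDS = {"a", "an", "and", "in", "of", "the", "to"}

def normalize(text, separator="|"):
    """Turn free-form text into a separator-delimited list of tokens."""
    # Lowercase, then keep runs of letters and digits as tokens.
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    kept = [t for t in tokens if t not in STOPWORDS]
    return separator.join(kept)

normalized = normalize("Bursty and Hierarchical Structure in Streams")
```

Whatever separator is used here must match the Text Separator given to Burst Detection.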
The defaults are typically good choices, but more sophisticated models can be fitted by tweaking them in various ways.
J. Kleinberg. Bursty and Hierarchical Structure in Streams. Proc. 8th ACM SIGKDD Intl. Conf. on Knowledge Discovery and Data Mining, 2002.