This burst detection algorithm is an implementation of Jon Kleinberg's "Bursty and Hierarchical Structure in Streams". A burst is a period of increased activity. Bursts are found by minimizing a cost function over a set of possible states (one non-bursting state and several states of increasing burstiness), each with a higher event frequency than the one below it. Moving up a level is costly, while moving down a level is free. The algorithm is useful for analyzing text streams (such as emails, corpora, or publications) when you want to know how active the stream is over a period of time.
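The cost minimization described above can be sketched in Python. This is a minimal illustration, not this tool's implementation. It assumes the batched model from Kleinberg's paper: state i expects events at a rate of p0 * s^i, a batch's cost is its binomial negative log-likelihood, and climbing one level costs gamma * ln(n) for n batches. The function name `detect_bursts` and its arguments are hypothetical; `s`, `gamma`, and `bursting_states` are meant to play the roles of the Density scaling, Gamma, and Bursting States parameters.

```python
import math

def detect_bursts(relevant, totals, s=2.0, gamma=1.0, bursting_states=1):
    """Return the minimum-cost state (0 = not bursting) for each batch.

    relevant[t] is the number of events of interest in batch t and
    totals[t] is the number of all events in batch t.
    """
    n = len(relevant)
    if n == 0 or sum(relevant) == 0:
        return [0] * n
    k = bursting_states + 1                 # automaton states, incl. non-bursting
    p0 = sum(relevant) / sum(totals)        # baseline event rate
    # Expected rate per state, capped below 1 so the logarithms stay finite.
    rates = [min(p0 * s ** i, 0.9999) for i in range(k)]

    def fit_cost(i, r, d):
        # Binomial negative log-likelihood of r events out of d in state i.
        p = rates[i]
        return -(math.log(math.comb(d, r))
                 + r * math.log(p) + (d - r) * math.log(1 - p))

    def move_cost(i, j):
        # Going up is expensive (assumed gamma * ln(n) per level); going down is free.
        return (j - i) * gamma * math.log(n) if j > i else 0.0

    # Viterbi-style dynamic programming; the automaton starts in state 0.
    best = [move_cost(0, j) + fit_cost(j, relevant[0], totals[0]) for j in range(k)]
    back = []
    for t in range(1, n):
        prev = [min(range(k), key=lambda i: best[i] + move_cost(i, j))
                for j in range(k)]
        best = [best[prev[j]] + move_cost(prev[j], j)
                + fit_cost(j, relevant[t], totals[t]) for j in range(k)]
        back.append(prev)

    # Walk the back-pointers from the cheapest final state.
    state = min(range(k), key=lambda i: best[i])
    path = [state]
    for prev in reversed(back):
        state = prev[state]
        path.append(state)
    path.reverse()
    return path

# Seven batches of 20 events each; the term of interest spikes in batches 3 and 4.
states = detect_bursts([1, 1, 1, 10, 10, 1, 1], [20] * 7)
```

In this example `states` is `[0, 0, 0, 1, 1, 0, 0]`. Raising `gamma` to 10 makes the climb cost more than the two spiking batches save, so the same input yields no burst at all.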
- Gamma is the value that state transition costs are proportional to. A higher Gamma value results in higher transition costs. Use this parameter to control how easily the automaton can change states.
- Density scaling determines how much 'more bursty' each level is than the previous one. The higher the scaling value, the more active (bursty) the events must be at each level.
- Bursting States determines how many bursting states there will be, beyond the non-bursting state. A value of i bursting states is equal to i+1 automaton states.
- Date Column is the name of the column containing the date/time at which the events or topics occur.
- Date Format specifies how the date column will be interpreted as a date/time. See http://java.sun.com/j2se/1.4.2/docs/api/java/text/SimpleDateFormat.html for details.
- Batch Burst Length Unit specifies how to divide the date range into burstable units.
- Batch Burst Length specifies the number of burstable units per burstable period. For example, a length of 10 with a unit of years generates bursts by decade.
- Text Column is the name of the column whose values (tokens separated by a delimiter) are analyzed for bursts.
- Text Separator delimits the tokens in the text column. When constructing your tables, do not use a separator that appears as all or part of any token.
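To illustrate how the date and text parameters fit together, here is a minimal Python sketch of the input preparation step. It is an assumption about the general approach, not this tool's code. The function `batch_token_counts` and its arguments are hypothetical, the batch unit is fixed to years, and each row is counted once per batch for every distinct token it contains. Python's `strptime` codes (such as `%Y`) are used here in place of Java `SimpleDateFormat` patterns.

```python
from collections import Counter, defaultdict
from datetime import datetime

def batch_token_counts(rows, date_column, text_column,
                       date_format="%Y", separator="|", batch_length=1):
    """Group rows into batches of `batch_length` years and count tokens per batch."""
    totals = Counter()                 # batch start year -> number of rows
    per_token = defaultdict(Counter)   # token -> batch start year -> rows with token
    for row in rows:
        year = datetime.strptime(row[date_column], date_format).year
        batch = year - year % batch_length          # e.g. 1991 -> 1990 for decades
        totals[batch] += 1
        # Split on the separator, drop surrounding whitespace and empty tokens.
        tokens = {t.strip() for t in row[text_column].split(separator) if t.strip()}
        for token in tokens:
            per_token[token][batch] += 1
    return totals, per_token

# Hypothetical table with a "Year" date column and a "Keywords" text column.
rows = [
    {"Year": "1991", "Keywords": "burst|stream"},
    {"Year": "1999", "Keywords": "burst"},
    {"Year": "2003", "Keywords": "stream| burst"},
]
totals, per_token = batch_token_counts(rows, "Year", "Keywords", batch_length=10)
```

With a batch length of 10 the three rows fall into the 1990 and 2000 decades, so `totals` is `{1990: 2, 2000: 1}` and `per_token["burst"]` is `{1990: 2, 2000: 1}`. A token that contained the `|` character would be split in two, which is why the separator must not appear inside any token.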
J. Kleinberg. Bursty and Hierarchical Structure in Streams. Proc. 8th ACM SIGKDD Intl. Conf. on Knowledge Discovery and Data Mining, 2002.