Given a table with at least two columns, one of dates/timestamps (the time each event happens) and one of delimited text values (the events / topics to be targeted), this algorithm detects bursts in each event / topic. A burst is a period of increased activity, determined by minimizing a cost function that assumes a set of possible states (not bursting and various degrees of burstiness) with increasing event frequencies, where it is expensive to go up a level and cheap (zero-cost) to go down a level.
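The cost minimization described above can be sketched in a few lines. This is only an illustration, not the implementation behind this algorithm: it assumes Kleinberg-style costs (exponentially distributed gaps between events, and an upward transition cost of gamma times ln(n) per level), which this page does not spell out. The function name `burst_states` and its arguments are hypothetical; `gamma`, `scaling` and `bursting_states` play the roles of the Gamma, Density Scaling and Bursting States parameters.

```python
import math

def burst_states(gaps, scaling=2.0, gamma=1.0, bursting_states=1):
    """Return the minimum-cost state (0 = not bursting) for each gap.

    `gaps` are the positive time differences between consecutive events
    of ONE value. The cost formulas are an assumption (Kleinberg-style).
    """
    if not gaps:
        return []
    n, total = len(gaps), sum(gaps)
    k = bursting_states + 1                      # i bursting states -> i+1 states
    # Each level expects events `scaling` times more often than the previous one.
    rates = [(n / total) * scaling ** i for i in range(k)]

    def emit(i, gap):
        # Negative log-likelihood of a gap under an exponential with rate i.
        return rates[i] * gap - math.log(rates[i])

    def move(i, j):
        # Going up costs gamma * ln(n) per level; going down is free.
        return (j - i) * gamma * math.log(n) if j > i else 0.0

    cost = [0.0] + [math.inf] * (k - 1)          # start in the non-bursting state
    back = []
    for gap in gaps:                             # Viterbi forward pass
        new_cost, pointers = [], []
        for j in range(k):
            prev = min(range(k), key=lambda i: cost[i] + move(i, j))
            new_cost.append(cost[prev] + move(prev, j) + emit(j, gap))
            pointers.append(prev)
        cost = new_cost
        back.append(pointers)

    state = min(range(k), key=lambda j: cost[j])
    path = []
    for pointers in reversed(back):              # trace the cheapest path back
        path.append(state)
        state = pointers[state]
    path.reverse()
    return path

# Five slow gaps, ten fast gaps, five slow gaps.
states = burst_states([10] * 5 + [1] * 10 + [10] * 5)
```

In this example the ten short gaps are assigned state 1 and the long gaps state 0: staying in the bursting state saves more over ten short gaps than the one-time cost of going up a level. With only a few short gaps the saving would not cover that cost and no burst would be reported.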
The algorithm takes the following parameters.
- Gamma is the value that state transition costs are proportional to. A higher Gamma value results in higher transition costs. Use this parameter to control how easily the automaton can change states.
- Density Scaling determines how much 'more bursty' each level is beyond the previous one. The higher the scaling value, the more active (bursty) the events are in each level.
- Bursting States determines how many bursting states there will be, beyond the non-bursting state. A value of i bursting states corresponds to i+1 automaton states.
- Date Column is the name of the column with the date/time values at which the events / topics happen.
- Date Format specifies how the date column will be interpreted as a date/time. See http://java.sun.com/j2se/1.4.2/docs/api/java/text/SimpleDateFormat.html for details.
- Text Column is the name of the column with delimited values (tokens separated by a delimiter) for which bursts are computed.
- Text Separator delimits the tokens in the text column. When constructing your tables, do not use a separator that appears as the whole or a part of any token.
Usually you will only need to change Date Column, Date Format, Text Column, and Text Separator when using Burst Detection. Please see 'Usage Hints'.
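To make the roles of these four parameters concrete, here is a minimal sketch of how a table might be turned into one sorted timeline per token before bursts are computed. It is not the tool's own code: the function `events_by_token`, the column names `Year` and `Keywords`, and the `|` separator are all hypothetical. Note also that the format string here uses Python's `strptime` syntax (`%Y`), not the Java SimpleDateFormat syntax that the Date Format parameter expects.

```python
from collections import defaultdict
from datetime import datetime

def events_by_token(rows, date_column, date_format, text_column, separator):
    """Group event dates by token; each token gets its own sorted timeline."""
    timelines = defaultdict(list)
    for row in rows:
        when = datetime.strptime(row[date_column], date_format)
        for token in row[text_column].split(separator):
            token = token.strip()        # assumption: surrounding spaces are ignored
            if token:                    # assumption: empty tokens are skipped
                timelines[token].append(when)
    for dates in timelines.values():
        dates.sort()
    return dict(timelines)

# Hypothetical input table with a date column and a delimited text column.
rows = [
    {"Year": "2001", "Keywords": "networks|bursts"},
    {"Year": "1999", "Keywords": "bursts"},
    {"Year": "2003", "Keywords": "networks"},
]
timelines = events_by_token(rows, "Year", "%Y", "Keywords", "|")
```

Each token ends up with an independent list of dates, which is why a separator that also occurs inside a token would silently split that token into two unrelated timelines.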
Pros & Cons
Because a separate state machine is run for each value, bursts are detected for each value independently of the others. This makes the algorithm suitable primarily when changes in the usage patterns of individual values are the area of interest. Cross-value comparisons of bursts are still possible, because burst 'strength' is calculated.