...

  • Gamma is the value that state transition costs are proportional to. A higher Gamma value results in higher transition costs. Use this parameter to control how easily the automaton can change states.
  • Density Scaling determines how much 'more bursty' each level is than the previous one. The higher the scaling value, the more active (bursty) the event must be at each level.
  • Bursting States determines how many bursting states there will be, in addition to the non-bursting state. A value of i bursting states corresponds to i+1 automaton states.
  • Date Column is the name of the column containing the date/time at which the events/topics occur.
  • Date Format specifies how the date column will be interpreted as a date/time. See http://java.sun.com/j2se/1.4.2/docs/api/java/text/SimpleDateFormat.html for details.
  • Text Column is the name of the column whose values (tokens separated by a delimiter) are analyzed for bursts.
  • Text Separator is the delimiter between tokens in the text column. When constructing your tables, do not use a separator that occurs as all or part of any token.
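
How Gamma, Density Scaling and Bursting States interact can be illustrated with a minimal Java sketch. It assumes the batched automaton described in Kleinberg's paper, where state i expects a fraction p_i = p_0 * s^i of relevant events and moving up from state i to state j costs (j - i) * gamma * ln(n) for n batches. The class and method names are hypothetical and are not taken from the plugin's source.

```java
class BurstAutomatonSketch {
    /**
     * Expected fraction of relevant events in each state: p_i = p_0 * scaling^i.
     * Assumption: follows Kleinberg's batched model; the plugin may differ.
     */
    static double[] stateRates(double baseRate, double densityScaling, int burstingStates) {
        // i bursting states plus the one non-bursting state give i + 1 automaton states.
        double[] rates = new double[burstingStates + 1];
        for (int i = 0; i < rates.length; i++) {
            rates[i] = baseRate * Math.pow(densityScaling, i);
        }
        return rates;
    }

    /** Cost of moving between states; only upward moves are charged. */
    static double transitionCost(int from, int to, double gamma, int batchCount) {
        return to > from ? (to - from) * gamma * Math.log(batchCount) : 0.0;
    }

    public static void main(String[] args) {
        double[] rates = stateRates(0.01, 2.0, 3);
        System.out.println("automaton states: " + rates.length);
        System.out.println("cost 0 -> 2: " + transitionCost(0, 2, 1.0, 10));
    }
}
```

Under these assumptions, raising Gamma makes every upward move more expensive, and raising Density Scaling widens the gap between the rates of neighbouring states.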

Since we focus on scholarly data, the data is distributed into yearly batches before the burst computation starts. Please see Kleinberg [pg. 14]. We also fill in missing years with empty batches so that the batches are continuous by year. There will be no bursts for these empty batches. It is possible to add a scaling factor that batches by month, day, or hour, or even by a number of years per batch. However, the implementation needs more evaluation before this is made available.
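
The yearly batching can be sketched in Java as follows. This is an illustration under stated assumptions, not the plugin's code: each row is taken to be a {date, text} pair, a record counts as relevant when one of its tokens equals the token of interest, and the names YearlyBatchSketch and batchByYear are hypothetical.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Calendar;
import java.util.List;
import java.util.TreeMap;
import java.util.regex.Pattern;

class YearlyBatchSketch {
    /**
     * Groups rows into yearly batches. Each batch is {relevantRecords, totalRecords}.
     * Years with no records between the first and last year get an empty batch {0, 0}.
     */
    static TreeMap<Integer, int[]> batchByYear(List<String[]> rows, String dateFormat,
                                               String separator, String token) {
        SimpleDateFormat parser = new SimpleDateFormat(dateFormat);
        Calendar calendar = Calendar.getInstance();
        TreeMap<Integer, int[]> batches = new TreeMap<>();
        for (String[] row : rows) { // row[0] = date value, row[1] = text value
            try {
                calendar.setTime(parser.parse(row[0]));
            } catch (ParseException e) {
                throw new IllegalArgumentException("Unparseable date: " + row[0], e);
            }
            int[] batch = batches.computeIfAbsent(calendar.get(Calendar.YEAR), y -> new int[2]);
            // Pattern.quote keeps separators such as "|" from being read as regex syntax.
            for (String t : row[1].split(Pattern.quote(separator))) {
                if (t.trim().equals(token)) {
                    batch[0]++; // count the record once, however often the token repeats
                    break;
                }
            }
            batch[1]++;
        }
        if (!batches.isEmpty()) {
            int first = batches.firstKey();
            int last = batches.lastKey();
            for (int year = first; year <= last; year++) {
                batches.putIfAbsent(year, new int[2]); // empty batch for a missing year
            }
        }
        return batches;
    }
}
```

For example, rows dated 2001 and 2003 produce three batches, with 2002 present as an empty batch.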

The result will be generated into a CSV file with the following fields:

...

Pros & Cons

Because of the per-event state machine approach, bursts are computed for each event independently of the others. This makes the algorithm suitable primarily when changes in the usage pattern of individual events are the area of interest. Comparisons of bursts across events are possible, because a burst 'weight' is calculated for each burst. However, this algorithm only supports batching records by year. In the future, it will be extended to month, day, hour, or even a number of years per batch, based on user needs.
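
One way a burst weight can be computed is sketched below in Java. The page does not give the formula, so this assumes Kleinberg's two-state definition: the weight of a burst is the sum, over its batches, of the cost saved by being in the bursting state rather than the non-bursting state. The names BurstWeightSketch, cost and weight are hypothetical.

```java
class BurstWeightSketch {
    /**
     * Cost of seeing 'relevant' out of 'total' events in a state with rate p.
     * The binomial coefficient is omitted because it is the same in every state
     * and cancels when two states are compared.
     */
    static double cost(double p, int relevant, int total) {
        return -(relevant * Math.log(p) + (total - relevant) * Math.log(1.0 - p));
    }

    /**
     * Weight of a burst covering batches start..end (inclusive): the total cost
     * saved by using rate p1 instead of p0. Assumes 0 < p0 < p1 < 1.
     */
    static double weight(int[] relevant, int[] total, double p0, double p1, int start, int end) {
        double sum = 0.0;
        for (int t = start; t <= end; t++) {
            sum += cost(p0, relevant[t], total[t]) - cost(p1, relevant[t], total[t]);
        }
        return sum;
    }
}
```

Because the weight is expressed in the same units for every event, a larger weight indicates a stronger burst regardless of which event produced it. An empty batch contributes nothing to the sum.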

Applications

Burst detection is particularly useful for examining the trends in collections of texts or communities of conversation. Even words that are used comparatively little, but that change in frequency of usage over time, stand out, unlike in burst detection algorithms based on thresholds.

Implementation Details

This algorithm was ported to Java from the original C implementation and provides all options of the original C program that had any effect.

Usage Hints

Please read the Description section before continuing. This burst algorithm is a text-based burst detection that provides burst results in a hierarchical structure. However, it can also simply detect whether bursts exist if the bursting states parameter is set to 1.
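
With bursting states set to 1, the automaton has two states, and detection reduces to finding the cheapest sequence of states over the batches. The Java sketch below shows this with a small dynamic program. It assumes Kleinberg's batched two-state model (base rate p0 = R/D, burst rate p1 = p0 * scaling, upward transition cost gamma * ln(n)); it is not the plugin's actual code, and all names are hypothetical.

```java
class TwoStateBurstSketch {
    /** Cost of a batch in a state with rate p (binomial coefficient omitted). */
    static double cost(double p, int relevant, int total) {
        return -(relevant * Math.log(p) + (total - relevant) * Math.log(1.0 - p));
    }

    /** Returns, for each batch, whether the cheapest state sequence is bursting there. */
    static boolean[] detect(int[] relevant, int[] total, double scaling, double gamma) {
        int n = relevant.length;
        boolean[] burst = new boolean[n];
        long sumRelevant = 0, sumTotal = 0;
        for (int t = 0; t < n; t++) {
            sumRelevant += relevant[t];
            sumTotal += total[t];
        }
        // With no relevant events, or only relevant events, there is nothing to detect.
        if (sumRelevant == 0 || sumRelevant == sumTotal) return burst;

        double p0 = (double) sumRelevant / sumTotal;
        double p1 = Math.min(p0 * scaling, 0.9999); // keep the burst rate below 1
        double up = gamma * Math.log(n);            // cost of entering the burst state

        double[] best = {0.0, Double.POSITIVE_INFINITY}; // start in the non-bursting state
        int[][] previous = new int[n][2];
        for (int t = 0; t < n; t++) {
            double[] next = new double[2];
            for (int s = 0; s < 2; s++) {
                double fromLow = best[0] + (s == 1 ? up : 0.0);
                double fromHigh = best[1]; // staying high or dropping down is free
                previous[t][s] = fromLow <= fromHigh ? 0 : 1;
                next[s] = Math.min(fromLow, fromHigh)
                        + cost(s == 0 ? p0 : p1, relevant[t], total[t]);
            }
            best = next;
        }
        // Walk back from the cheaper final state to recover the state of every batch.
        int state = best[0] <= best[1] ? 0 : 1;
        for (int t = n - 1; t >= 0; t--) {
            burst[t] = state == 1;
            state = previous[t][state];
        }
        return burst;
    }
}
```

A larger gamma raises the cost of entering the burst state, so short or weak spikes are less likely to be reported.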

...