Child pages
  • Burst Detection

Versions Compared


  • This line was added.
  • This line was removed.
  • Formatting was changed.
Comment: Migrated to Confluence 4.0


Burst detection is particularly useful for examining the trends in collections of texts or communities of conversation. Even words that are used comparatively little, but that change in frequency of usage over time, stand out, unlike in burst detection algorithms based on thresholds.

Implementation Details

Wiki MarkupSince we are focus on scholarly data, the data will be distributed into batches (usually yearly batches) before the burst computation started. The burst detection algorithm was re-implemented in Java based on the origin C implementation. Please see Kleinberg \ [pg. 14\]. We replace the missing years with empty batches to make the batches continuously by year. There will no burst for these empty batches. Users can change the scaling factor for the batches to month, day, hour; even number of years per batch. The batching implementation will not consider the date fields with the scaling factors that are smaller than the user selected scaling factor. For example, if the days scaling factor is selected, the batching algorithm will remove the hour and minute fields in the date value.

Usage Hints

Please read the Description section before continuing. This burst algorithm is a text based burst detection that provide burst results in hierarchical structure. However, it is also capable to detect if the bursts exist by setting the bursting states to 1.


J. Kleinberg. Bursty and Hierarchical Structure in Streams. Proc. 8th ACM SIGKDD Intl. Conf. on Knowledge Discovery and Data Mining, 2002.

See Also

Wiki Markup