  Burst Detection

Because we focus on scholarly data, the data are split into yearly batches before the burst computation starts (see Kleinberg [pg. 14]). Missing years are filled with empty batches so that the batches are continuous by year; no bursts are reported for these empty batches. It would be possible to add a scaling factor so that a batch covers a month, a day, an hour, or even several years. However, that option needs more evaluation before it is made available. This algorithm was re-implemented in Java based on the original C implementation.
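The gap-filling step described above can be sketched as follows. This is a minimal illustration, not the plugin's actual code: the class name YearlyBatcher, the method fillMissingYears, and the choice of a sorted map from year to a list of text entries are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical helper (not part of the actual implementation): makes yearly
// batches continuous by inserting an empty batch for every missing year.
class YearlyBatcher {

    /**
     * Returns a new map with one entry for every year between the earliest
     * and latest year in {@code sparse}, in ascending order. Years that had
     * no entry in the input map to an empty list.
     */
    static SortedMap<Integer, List<String>> fillMissingYears(
            SortedMap<Integer, List<String>> sparse) {
        SortedMap<Integer, List<String>> filled = new TreeMap<>(sparse);
        if (filled.isEmpty()) {
            return filled; // nothing to fill
        }
        int firstYear = filled.firstKey();
        int lastYear = filled.lastKey();
        for (int year = firstYear; year <= lastYear; year++) {
            // Existing batches are kept; only the gaps receive an empty batch.
            filled.putIfAbsent(year, new ArrayList<>());
        }
        return filled;
    }
}
```

For example, input batches for 2001 and 2004 would come back as four batches (2001 through 2004), with 2002 and 2003 empty, so no words can burst in those two years.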

Usage Hints


Preprocessing steps:
You may need to normalize the free-form text in the Text Column before running burst detection. The burst detection algorithm does not modify the words in the Text Column, so different forms of a word, such as author, Author, and authors, are treated as different tokens. To avoid this, run the Lowercase, Tokenize, Stem, and Stopword Text algorithm on the Text Column first. The normalized result is a list of tokens (words) delimited by '|'.
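To show what the normalized form looks like, here is a minimal sketch of the lowercase and tokenize steps only. It is not the Lowercase, Tokenize, Stem, and Stopword Text algorithm itself: the class name TextNormalizer and the tokenization rule (split on anything that is not a letter or digit) are assumptions, and stemming and stopword removal are deliberately left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical illustration of the '|'-delimited token format. Stemming and
// stopword removal are NOT performed here.
class TextNormalizer {

    /** Lowercases the text, splits it into tokens, and joins them with '|'. */
    static String normalize(String text) {
        List<String> tokens = new ArrayList<>();
        // Assumed tokenization rule: any run of non-letter, non-digit
        // characters separates two tokens.
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!token.isEmpty()) { // skip the empty token from a leading separator
                tokens.add(token);
            }
        }
        return String.join("|", tokens);
    }
}
```

With this sketch, TextNormalizer.normalize("Author, author and Authors") yields "author|author|and|authors". Lowercasing merges Author and author, but authors remains a separate token and "and" is still present, which is why the stemming and stopword steps matter as well.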