Given a table with at least a Text Column (the events or topics to be targeted), a date/timestamp column (the time each event happens), and a delimiter that separates multiple events / topics within a cell, this algorithm detects bursts of each event / topic. Please see 'Usage Hints' for further guidance.
The algorithm takes 9 parameters.
- Gamma is the value that state transition costs are proportional to. A higher Gamma value results in higher transition costs. Use this parameter to control how easily the automaton can change states.
- Density Scaling determines how much 'more bursty' each level is than the previous one. The higher the scaling value, the more active (bursty) the event is at each level.
- Bursting States determines how many bursting states there will be beyond the non-bursting state. A value of i bursting states corresponds to i+1 automaton states.
- Date Column is the name of the column containing the date/time when the events / topics happen.
- Date Format specifies how the date column will be interpreted as a date/time. See http://java.sun.com/j2se/1.4.2/docs/api/java/text/SimpleDateFormat.html for details.
- Batch Length Unit specifies how to divide the date range into burstable units.
- Batch Length specifies the number of burstable units per burstable period. For example, a Batch Length of 10 with a unit of years generates bursts by decade.
- Text Column is the name of the column whose values (tokens separated by the delimiter) are analyzed for bursts.
- Text Separator delimits the tokens in the text column. When constructing your tables, do not use a separator that appears as the whole or a part of any token.
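To make the Date Format and Text Separator parameters concrete, here is a minimal Java sketch of how a single row might be interpreted. The class and method names are hypothetical, and the strict date parsing, whitespace trimming, and dropping of empty tokens are assumptions for illustration, not documented behavior of the tool.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.ArrayList;
import java.util.Date;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Sci2: interprets one date cell and one text cell.
final class RowParsingSketch {

    // Parse a date cell with the pattern supplied as Date Format.
    static Date parseDate(String cell, String dateFormat) throws ParseException {
        SimpleDateFormat format = new SimpleDateFormat(dateFormat);
        format.setLenient(false); // assumption: reject impossible values such as month 13
        return format.parse(cell.trim());
    }

    // Split a text cell on the literal Text Separator.
    // Assumption: surrounding whitespace is trimmed and empty tokens are dropped.
    static List<String> tokens(String cell, String separator) {
        List<String> result = new ArrayList<>();
        for (String token : cell.split(Pattern.quote(separator))) {
            String trimmed = token.trim();
            if (!trimmed.isEmpty()) {
                result.add(trimmed);
            }
        }
        return result;
    }

    public static void main(String[] args) throws ParseException {
        System.out.println(parseDate("2004-03-15", "yyyy-MM-dd"));
        // prints [burst detection, network science]
        System.out.println(tokens("burst detection|network science", "|"));
        // A separator that occurs inside a token breaks the token apart:
        // prints [C, |Java]
        System.out.println(tokens("C++|Java", "+"));
    }
}
```

The last call shows why the separator must not appear inside any token: splitting `C++|Java` on `+` no longer yields the intended tokens.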
Because of the by-event state machine approach, bursts are detected for each event independently of the others. This makes the algorithm suitable primarily when changes in the usage patterns of individual events are the area of interest. Comparisons of an event across burst levels are possible using the burst 'weight'. The algorithm can batch records by years, months, days, hours, or minutes.
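The roles of Gamma, Density Scaling, and Bursting States in such a state machine can be sketched as follows. This is a rough illustration that assumes the cost functions of Kleinberg's batched automaton (state rates that grow geometrically, upward transitions costing Gamma times the logarithm of the number of batches per level, and a binomial fit cost). The class and method names are hypothetical, and the actual implementation may differ in detail.

```java
// Hypothetical sketch of the per-event automaton costs; not the tool's actual code.
final class BurstCostSketch {

    // Expected fraction of records mentioning the event while in a given state:
    // the base rate multiplied by densityScaling once per burst level.
    // Assumption: the result stays strictly between 0 and 1.
    static double rate(double baseRate, double densityScaling, int level) {
        return baseRate * Math.pow(densityScaling, level);
    }

    // Cost of a state change: climbing costs gamma * ln(batchCount) per level,
    // dropping back to a lower state is free.
    static double transitionCost(int from, int to, double gamma, int batchCount) {
        return to > from ? (to - from) * gamma * Math.log(batchCount) : 0.0;
    }

    // Negative log of the binomial likelihood of seeing `relevant` out of
    // `total` records in one batch when the state's rate is p (0 < p < 1).
    static double fitCost(double p, int relevant, int total) {
        return -(logChoose(total, relevant)
                + relevant * Math.log(p)
                + (total - relevant) * Math.log(1.0 - p));
    }

    // ln of the binomial coefficient C(n, k), computed as a sum of logs.
    static double logChoose(int n, int k) {
        double sum = 0.0;
        for (int i = 1; i <= k; i++) {
            sum += Math.log(n - k + i) - Math.log(i);
        }
        return sum;
    }

    public static void main(String[] args) {
        double base = 0.1;     // hypothetical overall share of records with the event
        double scaling = 2.0;  // Density Scaling
        // A batch where 8 of 10 records mention the event fits the burst state better.
        System.out.println(fitCost(rate(base, scaling, 0), 8, 10));
        System.out.println(fitCost(rate(base, scaling, 2), 8, 10));
        // Raising Gamma makes entering the burst state more expensive.
        System.out.println(transitionCost(0, 1, 1.0, 20));
        System.out.println(transitionCost(0, 1, 2.0, 20));
    }
}
```

Under these assumptions, the automaton enters a burst state only when the improvement in fit cost outweighs the transition cost, which is how a larger Gamma suppresses short or weak bursts.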
Burst detection is particularly useful for examining the trends in collections of texts or communities of conversation. Even words that are used comparatively little, but that change in frequency of usage over time, stand out, unlike in burst detection algorithms based on thresholds.
Since we focus on scholarly data, the data is distributed into batches (usually yearly batches) before the burst computation starts. Please see Kleinberg \[pg. 14\]. We also replace missing years with empty batches so that the batches are continuous by year. There will be no bursts for these empty batches. Users can change the scaling factor for the batches to month, day, or hour, or to a number of years per batch. The batching implementation ignores date fields that are finer than the user-selected scaling factor. For example, if the days scaling factor is selected, the batching algorithm removes the hour and minute fields from the date value. This algorithm was re-implemented in Java based on the original C implementation.
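The batching described above can be sketched in Java as follows. This is only an illustration under stated assumptions: it uses `java.time` rather than whatever date classes the tool uses internally, the class and method names are hypothetical, and batches are assumed to be counted from the earliest truncated date.

```java
import java.time.LocalDateTime;
import java.time.temporal.ChronoUnit;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of the batching step; not the tool's actual code.
final class BatchingSketch {

    // Drop every date field finer than the selected unit,
    // e.g. DAYS removes the hour and minute fields.
    static LocalDateTime truncate(LocalDateTime value, ChronoUnit unit) {
        switch (unit) {
            case YEARS:
                return value.withDayOfYear(1).truncatedTo(ChronoUnit.DAYS);
            case MONTHS:
                return value.withDayOfMonth(1).truncatedTo(ChronoUnit.DAYS);
            default:
                return value.truncatedTo(unit); // DAYS, HOURS, MINUTES
        }
    }

    // Count records per batch. Batches with no records stay at zero, so the
    // sequence is continuous from the first to the last observed batch.
    static int[] countPerBatch(List<LocalDateTime> dates, ChronoUnit unit, int batchLength) {
        if (dates.isEmpty()) {
            return new int[0];
        }
        LocalDateTime first = truncate(dates.get(0), unit);
        LocalDateTime last = first;
        for (LocalDateTime date : dates) {
            LocalDateTime t = truncate(date, unit);
            if (t.isBefore(first)) first = t;
            if (t.isAfter(last)) last = t;
        }
        int batches = (int) (unit.between(first, last) / batchLength) + 1;
        int[] counts = new int[batches];
        for (LocalDateTime date : dates) {
            counts[(int) (unit.between(first, truncate(date, unit)) / batchLength)]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        List<LocalDateTime> dates = Arrays.asList(
                LocalDateTime.of(2001, 5, 3, 10, 0),
                LocalDateTime.of(2001, 11, 20, 9, 30),
                LocalDateTime.of(2004, 1, 1, 0, 0));
        // Yearly batches; 2002 and 2003 become empty batches: prints [2, 0, 0, 1]
        System.out.println(Arrays.toString(countPerBatch(dates, ChronoUnit.YEARS, 1)));
    }
}
```

Truncating before measuring the distance matters: two records 20 minutes apart on either side of midnight land in different daily batches only because their hour and minute fields are removed first.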
- Load the CSV file by choosing Load from the menu bar.
- Choose Analysis > Topical > Burst Detection from the Sci2 menu bar.
- A window will pop up listing the 9 input parameters.
- Usually you will only need to change Date Column, Date Format, Batch Length Unit, Batch Length, Text Column, and Text Separator when using Burst Detection. If you want an abstract view of the entire data, set the bursting states to 1. If you are interested in the hierarchical burst structure of each word, set the bursting states to a value greater than 1. Change the parameters based on your needs. Please see the 'Description' section for details of each parameter.
- Press OK once you are done adjusting the parameters. A new result, a CSV file, will be generated in the right panel. Save it and you can view the result with Excel.
- Please refer to the description for information on the result fields.