Data Preparation > Database > ISI > Extract Document Citation Network (Core and References)
Extracts the Document citation network from an ISI database.
Each Document in the input database, and each Document it references, is represented by a node. An edge is drawn between the nodes for two Documents if and only if one of them cites the other.
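The node and edge rules above can be sketched in Python. This is a minimal illustration, not the tool's implementation. It assumes a hypothetical in-memory mapping from each Core Document ID to the IDs of the Documents it references, standing in for the database tables, and it records each edge as a (citing, cited) pair.

```python
from typing import Dict, List, Set, Tuple


def extract_citation_network(
    citations: Dict[str, List[str]],
) -> Tuple[Set[str], Set[Tuple[str, str]]]:
    """Build nodes and edges for a core-and-references citation network.

    `citations` maps each Core Document ID to the IDs of the Documents it
    references (a hypothetical stand-in for the database tables).
    """
    nodes: Set[str] = set()
    edges: Set[Tuple[str, str]] = set()
    for citing, references in citations.items():
        nodes.add(citing)  # every Core Document becomes a node
        for cited in references:
            nodes.add(cited)  # every referenced Document becomes a node too
            edges.add((citing, cited))  # one edge per citing/cited pair
    return nodes, edges


if __name__ == "__main__":
    network = extract_citation_network(
        {"doc1": ["doc2", "ref_x"], "doc2": ["ref_x", "ref_y"]}
    )
    print(network)
```

Because edges are collected in a set, a Document that lists the same reference twice still yields a single edge, in line with the "if and only if" rule.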
Core Document vs. Non-Core Document
This algorithm distinguishes between Documents contained in your dataset and Documents in general. A Document in your dataset is called a "Core Document"; any other Document is a "Non-Core Document". Your Core Documents may (and probably do) reference Non-Core Documents.
The output network of this algorithm contains nodes for Non-Core Documents as well as Core Documents. For an algorithm that represents only Core Documents, see Extract Document Citation Network (Core Only).
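To make the Core/Non-Core distinction concrete, the following sketch filters a core-and-references network, held in a hypothetical in-memory form (a set of node IDs and a set of (citing, cited) pairs), down to Core Documents only. It illustrates how the two outputs differ; it is not necessarily how the Core Only algorithm is implemented.

```python
from typing import Set, Tuple

Edge = Tuple[str, str]


def restrict_to_core(
    nodes: Set[str], edges: Set[Edge], core: Set[str]
) -> Tuple[Set[str], Set[Edge]]:
    """Keep only Core Documents and the citations between them."""
    core_nodes = nodes & core  # Non-Core Documents are dropped
    # An edge survives only if both its citing and cited ends are Core.
    core_edges = {(a, b) for (a, b) in edges if a in core and b in core}
    return core_nodes, core_edges


if __name__ == "__main__":
    nodes = {"doc1", "doc2", "ref_x"}
    edges = {("doc1", "doc2"), ("doc1", "ref_x")}
    print(restrict_to_core(nodes, edges, core={"doc1", "doc2"}))
```

Every citation to a Non-Core Document disappears in the filtered network, which is why the core-and-references output is usually much larger.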
The output network will include the following data and metadata:
- Node (Document)
  - All of the Document's data from the Documents table.
  - A generated prettified label for identifying this Document.
  - A generated prettified string giving the Source of this Document (called 'SOURCE').
- Edge (Citation)
  - Currently, no metadata is provided on edges.
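The node attributes listed above could be assembled from a Documents table row roughly as follows. The column names (`first_author`, `year`, `source`, `volume`, `page`) and both generated formats are hypothetical; the tool's actual label and SOURCE formats are defined in its source code. Edges need no attributes, consistent with the list above.

```python
from typing import Any, Dict


def node_attributes(row: Dict[str, Any]) -> Dict[str, Any]:
    """Build node attributes from one (hypothetical) Documents table row."""
    attrs = dict(row)  # all of the row's data, copied unchanged
    # Hypothetical prettified label: first author and publication year.
    attrs["label"] = f"{row.get('first_author', '?')} ({row.get('year', '?')})"
    # Hypothetical SOURCE string: journal, volume and first page, skipping blanks.
    attrs["SOURCE"] = ", ".join(
        str(row[key])
        for key in ("source", "volume", "page")
        if row.get(key) is not None
    )
    return attrs


if __name__ == "__main__":
    print(node_attributes({"id": "doc1", "first_author": "SMITH J", "year": 2001,
                           "source": "J INFORMETR", "volume": 3, "page": 10}))
```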
To prepare the input, load an ISI file into the tool, then create a database from it using the ISI database loader.
It is strongly recommended that the database be cleaned before extracting any citation networks from it.
For a quick analysis of a small dataset, you may wish to merge Author entities with identical names. For a scientifically sound analysis of a larger dataset, generate Author merging suggestions (or write your own merging instructions from scratch) and then perform the merge.
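The quick identical-name merge can be sketched as grouping Author entities by exact name and mapping each group onto one surviving entity. The input shape (entity ID to name string) and the rule of keeping the smallest ID are assumptions made for illustration, not the tool's behavior.

```python
from typing import Dict, List


def merge_identical_names(authors: Dict[str, str]) -> Dict[str, str]:
    """Map each Author entity ID to the ID of the entity it is merged into.

    `authors` maps entity ID -> author name (a hypothetical stand-in for the
    Author table). Entities with exactly equal names are merged into the one
    with the smallest ID.
    """
    by_name: Dict[str, List[str]] = {}
    for author_id, name in authors.items():
        by_name.setdefault(name, []).append(author_id)
    merge_map: Dict[str, str] = {}
    for ids in by_name.values():
        keep = min(ids)  # hypothetical choice of surviving entity
        for author_id in ids:
            merge_map[author_id] = keep
    return merge_map


if __name__ == "__main__":
    print(merge_identical_names({"a1": "SMITH, J", "a2": "SMITH, J", "a3": "DOE, A"}))
```

Exact-name merging fuses different people who share a name and leaves one person's name variants split, which is why it suits only a quick look at a small dataset.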
Next, you will probably want to merge Source entities that are recognized variants of one another (for example, different abbreviations of the same journal title).
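Source merging can be pictured as a lookup in a table of recognized variants. In this sketch the variant table, its normalization rule (uppercase, collapsed whitespace) and the journal names are all hypothetical; the tool relies on its own list of recognized variants.

```python
from typing import Dict, List


def normalize(name: str) -> str:
    """Uppercase and collapse whitespace so trivial spelling differences match."""
    return " ".join(name.upper().split())


def merge_sources(sources: List[str], variants: Dict[str, str]) -> List[str]:
    """Replace each Source name with its canonical form if it is a known variant.

    `variants` maps a normalized variant spelling to a canonical name
    (a hypothetical variant table). Unknown names are returned unchanged.
    """
    return [variants.get(normalize(name), name) for name in sources]


if __name__ == "__main__":
    table = {"J EXAMPLE STUD": "JOURNAL OF EXAMPLE STUDIES"}
    print(merge_sources(["J Example  Stud", "Other Journal"], table))
```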
Finally, you must match References to Documents in your dataset; otherwise there are no citations to analyze.
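Reference matching can be sketched as looking up each reference's key in an index of the dataset's Documents. The matching key used here (first author, year, volume, first page) is a hypothetical choice, not necessarily the tool's criteria. The parser assumes a simple, well-formed cited-reference string such as `SMITH J, 2001, J INFORMETR, V3, P10`; real strings vary more than this.

```python
from typing import Dict, Optional, Tuple

# Hypothetical matching key: (first author, year, volume, first page).
Key = Tuple[str, int, str, str]


def parse_reference(cited: str) -> Key:
    """Parse a well-formed string like 'SMITH J, 2001, J INFORMETR, V3, P10'."""
    parts = [p.strip() for p in cited.split(",")]
    author, year = parts[0], int(parts[1])
    # Volume and page fields follow the source title (parts[2]).
    volume = next((p[1:] for p in parts[3:] if p.startswith("V")), "")
    page = next((p[1:] for p in parts[3:] if p.startswith("P")), "")
    return (author, year, volume, page)


def match_references(
    documents: Dict[str, Key], references: Dict[str, Key]
) -> Dict[str, Optional[str]]:
    """Map each reference ID to the matching Document ID, or None if unmatched."""
    index: Dict[Key, str] = {}
    for doc_id, key in documents.items():
        index.setdefault(key, doc_id)  # first Document with a key wins
    return {ref_id: index.get(key) for ref_id, key in references.items()}


if __name__ == "__main__":
    docs = {"doc1": ("SMITH J", 2001, "3", "10")}
    refs = {"r1": parse_reference("SMITH J, 2001, J INFORMETR, V3, P10"),
            "r2": ("DOE A", 1999, "12", "100")}
    print(match_references(docs, refs))
```

In this sketch a reference that finds no match keeps the value None, that is, it still points to a Non-Core Document.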
The specific query run by the tool can be found in the source code.