This algorithm is used to merge identical networks. Emphasis is put on merging edge attributes.
Pros & Cons
This algorithm is very generic: you can input networks in any format and it will attempt to merge them based on unique nodes and edges. The merged network can be used to analyse the parent networks side by side. A node or edge that exists in only one of the parent networks is still included in the merged network. Directed networks are not currently accepted. The algorithm assumes that the user knows which node attribute has unique values across the entire network, and uses it to disambiguate nodes from the two networks. If the user provides an attribute that is not actually unique, an unexpected network will result. At present the candidate attributes for the unique node identifier are not checked during the user input stage. In the future, the drop-down box would display only those attributes that actually have unique values.
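The planned filtering of identifier candidates could look roughly like the sketch below. It is an illustration only, not the plugin's code: the function name `unique_attribute_candidates` and the representation of nodes as plain dictionaries are assumptions made for the example.

```python
def unique_attribute_candidates(nodes):
    """Return the attribute names that are present, non-empty and distinct on every node.

    `nodes` is a list of dicts mapping attribute name -> value (hypothetical layout).
    """
    if not nodes:
        return []
    # Only an attribute carried by every node can identify every node.
    shared = set(nodes[0])
    for node in nodes[1:]:
        shared &= set(node)
    candidates = []
    for name in sorted(shared):
        values = [node[name] for node in nodes]
        # A candidate has no missing values and no repeated values.
        if None not in values and len(set(values)) == len(values):
            candidates.append(name)
    return candidates
```

Only the names returned by such a check would be offered in the drop-down box, which prevents the user from picking an attribute with repeated values.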
Once the input networks are validated, the algorithm starts. It works as follows:
- Metadata consisting of the edge and node schemas is captured for both networks.
- Based on this metadata, along with the node and edge attributes that are to be ignored, merging of the network metadata is initiated.
- First consider node schemas of both the input networks.
- For each attribute in the first schema do,
- Check if the attribute name is present in the other schema. If yes, then use the collision resolving prefix provided by the user for the first network to create a unique attribute name in the format - "<PREFIX>_<Current Attribute Name>". Add this new attribute name, the old attribute name & the data type to the resolved schema definitions.
- If no, then just add the current attribute name & data type to the resolved schema definitions.
- Do the above for the second network as well.
- Repeat the node schema procedure above for the edge schemas of both the input networks.
- The unified schemas of nodes & edges are used later on when generating a new network containing data from both the input networks.
- We create new maps of unique node identifier to node objects & unique edge identifiers to edge objects so that if duplicate nodes are encountered in other networks they can be promptly merged.
- Start processing each network file & create new node & edge objects, if needed.
- Processing of each file assumes the following flow,
- When node elements are encountered we first determine the unique identifier of each node element.
- Map of unique identifiers to node objects is checked for the current unique identifier.
- If it is present then we use the node object attached to that identifier for further processing.
- If not then we create a new node object & create a map entry from unique identifier to newly created node object.
- For each incoming node attribute we first check our mapping from old attribute name to new (resolved/unified) attribute & then use that to add attributes to the node object.
- When edge elements are encountered we first get the unique edge identifier. Unique edge identifier is constructed using the format - "<SOURCE NODE ID>$<TARGET NODE ID>".
- Map of unique identifiers to edge objects is checked for the current unique identifier.
- If it is present then we use the edge object attached to that identifier for further processing.
- If not then we create a new edge object & create a map entry from unique identifier to newly created edge object.
- For each incoming edge attribute we first check our mapping from old attribute name to new (resolved/unified) attribute & then use that to add attributes to the edge object.
- After the network asset (node & edge) objects are created, we use them & the final schema to create an output network file.
- First the node schema is set.
- Then each node object is traversed and a node row is created in the output file.
- Then the edge schema is set.
- Then each edge object is traversed and an edge row is created in the output file.
- The file thus created is provided to the user through the data manager. It is placed below the second network file.
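The schema resolution and merging steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the plugin's implementation: the names `resolve_schema` and `merge_networks`, the dictionary-based network layout, and the `source`/`target` keys are all hypothetical. The sketch also assumes that edge endpoints already hold unique node identifiers, that the identifier attribute is kept unprefixed by listing it among the ignored attributes, and it does not normalize endpoint order when building edge identifiers.

```python
def resolve_schema(first, second, first_prefix, second_prefix, ignored=()):
    """Return (unified schema, (renames for network 1, renames for network 2)).

    `first` and `second` map attribute name -> data type.
    Each renames dict maps old attribute name -> resolved attribute name.
    """
    unified = {}
    renames = ({}, {})
    passes = ((first, second, first_prefix), (second, first, second_prefix))
    for index, (schema, other, prefix) in enumerate(passes):
        for name, data_type in schema.items():
            if name in ignored:
                continue
            # A name used by both networks is made unique with this network's prefix.
            new_name = f"{prefix}_{name}" if name in other else name
            unified[new_name] = data_type
            renames[index][name] = new_name
    return unified, renames


def merge_networks(networks, id_attribute, node_renames, edge_renames):
    """Merge node and edge rows from each network into shared lookup maps."""
    nodes = {}  # unique node identifier -> node object
    edges = {}  # "<SOURCE NODE ID>$<TARGET NODE ID>" -> edge object
    for index, network in enumerate(networks):
        for row in network["nodes"]:
            key = row[id_attribute]
            # Reuse the node object if this identifier was already seen.
            node = nodes.setdefault(key, {id_attribute: key})
            for name, value in row.items():
                if name in node_renames[index]:
                    node[node_renames[index][name]] = value
        for row in network["edges"]:
            key = f"{row['source']}${row['target']}"
            edge = edges.setdefault(
                key, {"source": row["source"], "target": row["target"]}
            )
            for name, value in row.items():
                if name in edge_renames[index]:
                    edge[edge_renames[index][name]] = value
    return nodes, edges
```

Because both passes share the same lookup maps, a node or edge seen in only one network still ends up in the result, carrying only the attributes that its network supplied.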
The user has to provide 3 initial inputs: the 2 parent network files storing the information about the networks, and the name of the attribute that represents the unique node identifier. The algorithm checks beforehand whether the 2 networks have overlapping attribute names for nodes or edges. If so, 2 further inputs are requested from the user. Both are text inputs, each used as the prefix when renaming the colliding attributes of one network in the final network. The output is provided as a .NWB file containing the merged network. The following conditions must be met for the algorithm to run successfully:
- The input networks should be undirected.
- There should be at least 1 common node attribute between the 2 networks. This will be used to disambiguate nodes.
- Prefix values (if required) should be valid as described below,
- Prefix cannot be empty.
- Prefix cannot start with a number.
- Prefix for different networks cannot be the same.
- Prefix cannot be longer than 10 characters.
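The prefix rules listed above can be checked with a small validator such as the sketch below. The function name `prefix_errors` and the wording of the error messages are assumptions made for illustration; only the four rules themselves come from the list.

```python
MAX_PREFIX_LENGTH = 10


def prefix_errors(first_prefix, second_prefix):
    """Return a list of rule violations; an empty list means both prefixes are valid."""
    errors = []
    for label, prefix in (("first", first_prefix), ("second", second_prefix)):
        if not prefix:
            errors.append(f"{label} prefix cannot be empty")
            continue  # the remaining per-prefix rules need at least one character
        if prefix[0].isdigit():
            errors.append(f"{label} prefix cannot start with a number")
        if len(prefix) > MAX_PREFIX_LENGTH:
            errors.append(f"{label} prefix cannot be longer than {MAX_PREFIX_LENGTH} characters")
    # Identical prefixes would not disambiguate colliding attribute names.
    if first_prefix and first_prefix == second_prefix:
        errors.append("prefixes for the two networks cannot be the same")
    return errors
```

Returning every violation at once, rather than stopping at the first, lets the input dialog report all problems to the user in a single pass.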
The aggregate data plugin was authored, implemented, integrated and documented by Chintan Tank.