Network scaling algorithms such as the Pathfi=
nder algorithm are used to prune many different kinds of networks, incl=
uding citation networks, random networks, and social networks. However, thi=
s algorithm suffers from run time problems for large networks and online pr=
ocessing due to its O(n^{4}) time complexity. Hence another alterna=
tive, MST-Pathfinder algorithm, is used which prunes the original network t=
o get its PFNET(infinity,n-1) in just O(n^{2} * log n) time.

The underlying idea comes from the fact that the union (superposition) o= f all the Minimum Spanning Trees extracted= from a given network is equivalent to the PFNET resulting from the Pathfin= der algorithm parameterized by a specific set of values (r =3D infinity and= q =3D n-1), those usually considered in many different applications.

= =20The algorithm has a space complexity of 3 * n^{2} + n, which is =
very less compared to the Original Pathfinder approach's complexity of 2 * =
(n^{3} - n^{2}). Also its time complexity is O(n^{2 * log(n)) compared to O(n4) of the Original Pathfinder approac=
h. According to an experimental setup, this efficiency enables MST based ap=
proach to scale the network with 10,000 nodes and 100 million edges in appr=
oximately 129 seconds compared to more than 222 hours using Original approa=
ch.}

However, since it only provides an optimal solution the user cannot cont= rol the parameters,

=20- =20
which constrains the number of indirect pro= ximities examined in generating the network. The q parameter is an integer = value between 2 and n-1, inclusive where n is the number of nodes or items.= Value of q is fixed to n-1.*q*=20
defines the metric used for computing the d= istance of paths. The r parameter is a real number between 1 and infinity, = inclusive. Value of r is fixed to infinity.*r*=20

This, however, is mitigated by the fact that the networks resulting from= citation, cocitation, or term co-occurrence analysis are usually dense whe= n the categories are used as the unit for each node. Due to this fact, and = especially in the case of vast scientific domains with a high number of ent= ities (categories in our case) in the network, Pathfinder is usually parame= terized to r =3D infinity and q =3D n-1, obtain an schematic representation= of the most outstanding existing information by means of a network showing= just the most salient links.

=20One of the drawbacks of this algorithm is that it cannot be applied on d= irected networks, because the sorting process deals with undirected links. = Scaling of Directed networks using Pathfinder algorithm can be implemented = in the future.

=20Once the input file is validated the algorithm is started. It works as f= ollows,

=20- =20
- When reading through the nodes, initiate clusters (one for each node).<= /li>=20
- Next, when reading through the edges create tuples containing unique ed=
ge id, node 1, node 2 & edge weight. Depending upon users choice of how=
edge weight should be represented, normalize the edge weight i.e. If they =
chose similarity, normalize edge weights to be
*Dissimilarity*based= by inverting the edge weight. =20
- Sort edge tuples based on their weight. =20
- Iterate over the edge tuple i.e. for each edge e(u, v) do,=20
- =20
- For all the edges e(u', v') remaining in the edge list having edge weig=
ht equal to the current edge e(u, v) do,=20
- =20
- Remove e(u', v') from the edge list =20
- If nodes u' & v' are not in the same cluster then,=20
- =20
- Add e(u', v') to the final edges list, which holds the edges which will= belong to the final scaled network. =20
- Merge the clusters belonging to u' & v' such that smaller cluster i= s merged with the bigger cluster & smaller cluster is deleted. =20

=20

=20

=20
- For all the edges e(u', v') remaining in the edge list having edge weig=
ht equal to the current edge e(u, v) do,=20
- After all the edges are considered output the final edges list to a new= .NWB file. This will contain the nodes from the original network and only = those edges which are present in the final edges list. =20

The user has to provide 3 inputs; the file storing the information about=
the network, the column name that represents the edge weight and how the e=
dge weight should be considered (Dissimilarity or Similarity). The provided=
network should not contain edge weights of less than equal to 0 else a war=
ning is generated and default edge weight is assumed for that edge. The alg=
orithm assumes the edge weights to be a measure of dissimilarity i.e. More =
the weight, more is the dissimilarity. If the input edge weights are a meas=
ure of Similarity i.e. More the weight, less is the similarity then the use=
r can select *Similarity* option from the drop-down box. Only in the=
case of user selecting *Similarity* the edge weights will be invert=
ed. For example, in a co-authorship network, more number of articles author=
ed by 2 subjects equates to a stronger relationship. In this case the user =
should select *Similarity* option. The output will be provided as a =
.NWB file with a pruned network.

- =20
- Source Code =20

The original Pathfinder Network Scaling algorithm authored by Roger Schvaneveldt et al. was=
adapted with a MST based approach by *Arnaud Quirin et al.* is impl=
emented, integrated and documented by Chintan Tank. Also thanks to Micah Li=
nnemeier and Russell Duhon for providing guidance during design and impleme=
ntation. For the description I acknowledge Wikipedia.

- =20
- Schvaneveldt, Roger., Durso, Francis., Dearholt, Donald. Graph theoreti=
c foundations of pathfinder networks. In
*Computers & mathematics wi= th applications*, volume 15, pages 337-345, 1988. Link =20
- Schvaneveldt, Roger., Durso, Francis., Dearholt, Donald. Network struct= ures in Proximity Data. 1989. Link =20
- Schvaneveldt, Roger. Pathfinder associative networks : Studies in Knowl= edge organization. 1990. Link =20
- Guerrero-Bote, Vicente., Zapico-Alonso, Felipe., Espinosa-Calvo, Maria =
Eugenia., Crisostomo, Rocio Gomez., Moya-Anegnon, Felix. Binary Pathfinder:=
An improvement to the Pathfinder algorithm. In
*Information processing = & management*, volume 42, pages 1484-1490, 2006. Link =20
- Quirin, Arnaud., Cordon, Oscar., Guerrero-Bote, Vicente., Vargas-Quesad=
a, Benjamin., Moya-Anegnon, Felix. A quick MST-based algorithm to obtain Pa=
thfinder networks ((infinity), n - 1). In
*Journal of the American Socie= ty for Information Science and Technology*, volume 59, pages 1912-1924,= 2008. Link =20
- Quirin, Arnaud., Cordon, Oscar., Santamaria, J., Vargas-Quesada, Benjam=
in., Moya-Anegnon, Felix. A new variant of the Pathfinder algorithm to gene=
rate large visual science maps in cubic time. In
*Information processing= & management*, volume 44, pages 1611-1623, 2008. Link =20

The license could not be verified: License Certificate has expired!=20
Generate a Free license now.