Message-ID: <1039391283.24465.1571702696278.JavaMail.confluence@wiki.cns.iu.edu> Subject: Exported From Confluence MIME-Version: 1.0 Content-Type: multipart/related; boundary="----=_Part_24464_624837338.1571702696277" ------=_Part_24464_624837338.1571702696277 Content-Type: text/html; charset=UTF-8 Content-Transfer-Encoding: quoted-printable Content-Location: file:///C:/exported.html 4.10 Modeling (Why?)

4.10 Modeling (Why?)

Data models are grouped into two major types: descriptive models= and process models. Descriptive models aim to illustrate the majo= r features of a (typically static) data set, such as statistical patterns o= f article citation counts, networks of citations, individual differences in= citation practice, the composition of knowledge domains, or the identifica= tion of research fronts as indicated by new yet highly cited papers. Pr= ocess models or predictive models aim to simulate, statistically descr= ibe, or formally reproduce statistical and dynamic characteristics of inter= est.

4.10.1 Random Graph Model

The random graph model generates a graph that has a fixed number of node= s which are connected randomly by undirected edges, see Figure 4.17 (left).= The number of edges depends on a specified probability. The edge probabili= ty is chosen based on the number of nodes in the graph. The model most comm= only used for this purpose was introduced by Gilbert (Gilbert, 1959). This = is known as the G(n,p) model with n being the number of v= ertices and p the linking probability. The number of edges created= according to this model is not known in advance. Erd?s-R=C3=A9nyi introduc= ed a similar model where all the graphs with m edges are equally p= robable and m varies between 0 and n(n-1)/2 (Erd= ?s & R=C3=A9nyi, 1959). This is known as the G(n,m) model. The= degree distribution for this network is Poissonian, see Figure 4.17 (right= ). Figure 4.17: Random graph and its Poissonian node degree dis= tribution

Very few real world networks are random. However, random networks are a = theoretical construct that is well understood and their properties can be e= xactly solved. They are commonly used as a reference, e.g., in tests of net= work robustness and epidemic spreading (Batagelj & Brandes).
In the= Sci2 Tool, the random graph generator implements the G(n,p) model= by Gilbert. Run 'Modeling > Random Graph' and input the total = number of nodes in the network and their wiring probability. The output is = a network in which each pair of nodes is connected by an undirected edge wi= th the probability specified in the input.
A wiring probability of 0 wo= uld generate a network without any edges and a wiring probability of 1 with= n nodes will generate a network with (n-1) edges. The wi= ring probability should be chosen dependent on the number of vertices. For = a large number of vertices the wiring probability should be smaller.

4.10.2 Watts-Strogatz Sm= all World

A small-world network is one whose majority of nodes are not directly co= nnected to one another, but still can reach any other node via very few edg= es. It can be used to generate networks of any size. The degree distributio= n is almost Poissonian for any value of the rewiring probability (except in= the extreme case of rewiring probability zero, for which all nodes have eq= ual degree). The clustering coefficient is high until beta is close to 1, a= nd as beta approaches one, the distribution becomes Poissonian. This is bec= ause the graph becomes increasingly similar to an Erd?s-R=C3=A9nyi Random G= raph, see Figure 4.18. (Watts & Strogatz, 1998; Inc. Wikimedia Foundati= on, 2009). Figure 4.18: Small world graph (left) and its node degree di= stribution equation (right)

Small world properties are usually studied to explore networks with tuna= ble values for the average shortest path between pairs of nodes and a high = clustering coefficient. Networks with small values for the average shortest= path and large values for the clustering coefficient can be used to simula= te social networks, unlike ER random graphs, which have small average short= est path lengths, but low clustering coefficients.
The algorithm = requires four inputs: the number n of nodes of the network, the nu= mber k of initial neighbors of each node (the initial configuratio= n is a ring of nodes), the probability of rewiring the edges (which is a re= al number between 0 and 1), and the seed of a random number generator. The = network is built following the original prescription of Watts and Strogatz,= i.e., by starting from a ring of nodes each connected to the k nodes and b= y rewiring each edge with the specified probability. The algorithm run time= is O(kn).
Run with 'Modeling > Watts-Strogatz Small Wo= rld' and input 1000 nodes, 10 initial neighbors, and a rewiring probab= ility of 0.01 then compute the average shortest path and the clustering coe= fficient and verify that the former is small and the latter is relatively l= arge.

4.10.3 Barab=C3=A1si-Albert Scale Free Model

The Barab=C3=A1si-Albert (BA) model is an algorithm which generates a sc= ale-free network by incorporating growth and preferential attachment. Start= ing with an initial network of a few nodes, a new node is added at each tim= e step. Older nodes with a higher degree have a higher probability of attra= cting edges from new nodes. The probability of attachment is given by
<= span class=3D"confluence-embedded-file-wrapper"> .

The initial number of nodes in the network must be greater than two and = each of these nodes must have at least one connection. The final structure = of the network does not depend on the initial number of nodes in the networ= k. The degree distribution of the generated network is a power law with a s= caling coefficient of -3 (Barab=C3=A1si & Albert, 1999; Barab= =C3=A1si & Albert, 2002). Figure 4.19 shows the network on the left and= the probability distribution on a log-log scale on the right. Figure 4.19: Scale free graph (left) and its node degree dis= tribution (right)

This is the simplest known algorithm to generate a scale-free network. I= t can be applied to model undirected networks such as the collaboration net= work among scientists, the movie actor network, and other social networks w= here the connections between the nodes are undirected. However, it cannot b= e used to generate a directed network.
The inputs for the algorithm are= the number of initial nodes, the number of initial edges for a new node, a= nd the seed of a random number generator. The algorithm starts with the ini= tial number of nodes that are fully connected. At each time step, a new nod= e is generated with the initial number of edges. The probability of attachi= ng to an existing node is calculated by dividing the degree of an existing = node by the total number of edges. If this probability is greater than zero= and greater than the random number obtained from a random number generator= then an edge is attached between the two nodes. This is repeated in each t= ime step.
Run with 'Modeling > Barab=C3=A1si-Albert Scale Free M= odel' and a time step of around 1000, initial number of nodes= 2, and number of edges 1 in the input. Lay out and deter= mine the number and degree of highly connected nodes via 'Analysis >= Unweighted and Undirected > Degree Distribution' using the default= value. Plot node degree distribution using Gnuplot.