Saturday, 12 October 2013

Networks: Physics of the Web

The Web remains an untamed beast. Ever since its inception, routers and lines have been added continuously, without bounds, in an uncontrolled and decentralised manner; the very embodiment of digital anarchy. But is this network of networks inherently random? No. So how, then, does order emerge from the entropy of millions of links and nodes? Let's examine the Internet and the Web in the light of network theory and statistical mechanics.

The most fundamental qualitative feature of any network is its degree distribution. The degree of a node is the number of edges connected to it. Much of the Internet is an aggregation of low-degree nodes joined by a few high-degree hubs. An intriguing pattern appears in the degree distribution of the Internet at large: plotted on logarithmic axes it follows roughly a straight line, implying that the number p(k) of nodes with degree k obeys a power law, p(k) ∝ k^-a. The present value of a for the Internet is around 2.2. If the edges of a network were placed between nodes at random, the resulting degrees would instead follow a Poisson distribution, with the vast majority of nodes close to the mean degree and no high-degree hubs at all, much like the Erdős–Rényi random graph. The fact that the Internet follows a power law therefore makes it far from random; such networks are called 'scale-free'. Citation networks, where nodes are papers and edges are citations from one paper to another, are also scale-free.

So why do the Web and the Internet both tend to form similar scale-free networks? Conventional graph theory assumes that the number of nodes in a network is fixed and that links are placed at random. Both assumptions fail here: the Internet continually grows with new routers and the Web with new pages, and real networks show 'preferential attachment', meaning a new node is far more likely to connect to nodes that already have many links. A toy simulation of this growth rule is sketched below.

Now imagine that some nodes of a network abruptly fail or disappear. Around 3% of Internet routers are down at any given time, so what fraction of nodes would need to be removed before network performance suffers? We can run one test by removing nodes uniformly at random and another by deliberately removing the nodes with the highest degree. It turns out that for a scale-free network, random node removal has little to no effect, whereas targeting the hubs can be devastating; a removal experiment along these lines is sketched below as well.
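This growth rule is easy to simulate. The sketch below is a minimal pure-Python toy in the spirit of the Barabási–Albert model, not a model of the real Internet: the node count, the choice of m = 2 edges per newcomer and the seed are arbitrary illustration values. Listing each node once per edge it touches turns degree-proportional sampling into a plain uniform choice from that list.

```python
import random
from collections import Counter

def grow_network(n_nodes, m=2, seed=42):
    """Grow a graph one node at a time; each newcomer attaches m edges to
    existing nodes with probability proportional to their current degree."""
    rng = random.Random(seed)
    # Start from a small fully connected core of m + 1 nodes.
    edges = [(i, j) for i in range(m + 1) for j in range(i)]
    # Each endpoint appears here once per edge it belongs to, so a uniform
    # choice from this list is a degree-proportional choice of node.
    targets = [node for edge in edges for node in edge]
    for new in range(m + 1, n_nodes):
        chosen = set()
        while len(chosen) < m:
            chosen.add(rng.choice(targets))     # preferential attachment
        for old in chosen:
            edges.append((new, old))
            targets.extend((new, old))
    return edges

edges = grow_network(50_000)
degree = Counter(node for edge in edges for node in edge)
histogram = Counter(degree.values())
for k in sorted(histogram)[:10]:
    print(k, histogram[k])      # p(k) should look roughly straight on log-log axes
```

Plot the printed histogram on log-log axes and it should look approximately straight. Pure growth plus preferential attachment of this kind is known to give an exponent near 3, so extra ingredients are needed to reach the value of around 2.2 measured for the Internet.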
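The removal experiment at the end of the paragraph can be run in the same toy setting. The snippet below reuses the edges list from the previous sketch, knocks out roughly 3% of nodes either at random or by highest degree, and measures how much of the network stays in one connected piece; it is a rough illustration, not a faithful simulation of Internet routing.

```python
import random
from collections import defaultdict, deque

def largest_component(adjacency, removed):
    """Size of the biggest connected component once `removed` nodes are dropped."""
    seen, best = set(removed), 0
    for start in adjacency:
        if start in seen:
            continue
        queue, size = deque([start]), 0
        seen.add(start)
        while queue:                            # breadth-first search
            node = queue.popleft()
            size += 1
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        best = max(best, size)
    return best

# Adjacency lists built from the edge list of the previous sketch.
adjacency = defaultdict(set)
for a, b in edges:
    adjacency[a].add(b)
    adjacency[b].add(a)

n = len(adjacency)
k = int(0.03 * n)                               # the ~3% failure figure quoted above
random_failures = random.sample(sorted(adjacency), k)
targeted_attack = sorted(adjacency, key=lambda v: len(adjacency[v]), reverse=True)[:k]

print("random failures:", largest_component(adjacency, random_failures) / n)
print("targeted attack:", largest_component(adjacency, targeted_attack) / n)
```

On a scale-free toy graph like this, the random-failure run typically leaves almost the whole network connected, while the targeted attack on the hubs shrinks the giant component far more for the same 3% of nodes.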

The concept of 'six degrees of separation', popularised by Stanley Milgram's small-world experiments, suggests that anyone in the world can be connected to anyone else by a chain of five or six acquaintances. Does the Web follow this trend seen in social networks (small separation between nodes and a high degree of clustering)? Since we don't have a complete map of the entire Web (even search engines cover only around 16% of it), we have to use a small finite sample to make an inference about the whole. Using 'finite size scaling', one can estimate the mean shortest distance between two nodes, i.e. the number of clicks needed to get from one page to another. With around a billion nodes making up the Web, this brings the 'small world' effect to about 19 'clicks of separation'; the logarithmic fit behind that number is reproduced below. Not every pair of pages is actually connected by such a path, because the Web is a directed network: a link leading from one page to another does not mean an inverse link exists, so a path of 19 clicks is not guaranteed.

In most complex networks, nodes compete for links. We can model this by giving each node a 'fitness' that quantifies its ability to compete, and then assigning each node an energy level derived from its fitness to produce a Bose gas, with the lowest energy level representing the fittest node. The gas evolves in time as new energy levels are added, corresponding to new nodes joining the network. Two different outcomes can arise depending on the distribution of fitness (and hence of energy levels): (1) 'fit get rich', where occupation falls off smoothly as energy increases, so the fittest nodes do best but no single one dominates; (2) Bose–Einstein condensation, where the fittest node captures a finite share of all the links and shows up as a macroscopically occupied lowest energy level. A toy version of this competition appears below. Perhaps the Web is just another Bose–Einstein condensate?
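The 19-click figure comes from a logarithmic fit of exactly this kind. The lines below use the scaling form reported in the Albert, Jeong and Barabási study of the Web's diameter, d ≈ 0.35 + 2.06·log10(N); the exact coefficients matter less than the fact that the separation grows only logarithmically with the size of the Web.

```python
import math

# Finite-size scaling fit for the Web (Albert, Jeong and Barabasi, 1999):
# the mean shortest path between two pages grows only logarithmically with page count.
def clicks_of_separation(n_pages):
    return 0.35 + 2.06 * math.log10(n_pages)

print(clicks_of_separation(8e8))   # about 18.7 for the ~800 million pages of that study
print(clicks_of_separation(1e9))   # about 18.9, i.e. the '19 clicks of separation' above
```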
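Here is a minimal sketch of the fitness idea, in the spirit of the Bianconi–Barabási model; the uniform fitness distribution, m = 2 links per newcomer and the network size are illustration choices, and no attempt is made at the full statistical-mechanics mapping. Each new node simply picks its targets with probability proportional to fitness × degree.

```python
import random
from collections import Counter

def grow_fitness_network(n_nodes, m=2, seed=1):
    """Growth where a newcomer links to node i with probability proportional to
    fitness_i * degree_i, in the spirit of the Bianconi-Barabasi fitness model."""
    rng = random.Random(seed)
    fitness = [rng.random() for _ in range(n_nodes)]   # uniform fitness, for illustration
    degree = Counter({i: m for i in range(m + 1)})     # small fully connected core
    for new in range(m + 1, n_nodes):
        nodes = list(degree)
        weights = [fitness[i] * degree[i] for i in nodes]
        chosen = set()
        while len(chosen) < m:
            chosen.add(rng.choices(nodes, weights=weights)[0])
        for old in chosen:
            degree[old] += 1
        degree[new] = m
    return fitness, degree

fitness, degree = grow_fitness_network(5_000)
winner = max(degree, key=degree.get)
print("winner:", winner,
      "fitness:", round(fitness[winner], 3),
      "share of all links:", round(degree[winner] / sum(degree.values()), 4))
```

With uniform fitness this stays in the 'fit get rich' regime: the winner is a high-fitness early node holding a large but not dominant share of the links. Whether the system instead condenses, with one node grabbing a finite fraction of all links, depends on the shape of the fitness distribution in the full model.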
