Saturday, 12 October 2013

Networks- Physics of the Web

The Web remains an untamed beast. Ever since its inception, routers and lines have been added continuously, without bounds, in an uncontrolled and decentralised manner; the very embodiment of digital anarchy. But is this network of networks inherently random? No. How, then, does order emerge from the entropy of millions of links and nodes? Let's examine the Internet and the Web in the light of network theory and statistical mechanics.

The most fundamental qualitative feature of any network is its degree distribution. The degree of a node is the number of edges connected to it. Much of the Internet is an aggregation of low-degree nodes with a few high-degree hubs. An intriguing pattern arises in the degree distribution of the Internet at large: it follows roughly a straight line when plotted on a logarithmic scale, implying that the fraction p(k) of nodes with degree k obeys a power law, p(k) ∝ k^-a. The present value of a for the Internet is around 2.2. If, instead, the edges of a network were placed arbitrarily between nodes, the resulting degrees would obey a Poisson distribution (in which the majority of nodes have degrees fairly close to the mean value and high-degree hubs are entirely absent), much like the Erdős–Rényi random graph. So the fact that the Internet follows a power law makes it far from random, and hence 'scale-free'. Citation networks, where nodes symbolise papers and edges represent citations of one paper by another, are also scale-free.

So why do the Web and the Internet both have an affinity, and indeed a tendency, to form similar scale-free networks? Conventional graph theory assumes that the number of nodes in a network is static and that links are randomly distributed.
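To see how sharply the random benchmark differs from a scale-free network, here is a small stdlib-only sketch (function name and parameters are my own) that builds an Erdős–Rényi random graph and inspects its degrees: they cluster tightly around the mean, with nothing resembling a hub.

```python
import random

def erdos_renyi_degrees(n, mean_degree, seed=0):
    """Degree sequence of an Erdős–Rényi random graph G(n, p),
    with p chosen so the expected degree is mean_degree."""
    rng = random.Random(seed)
    p = mean_degree / (n - 1)
    deg = [0] * n
    # flip an independent coin for every possible pair of nodes
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:
                deg[i] += 1
                deg[j] += 1
    return deg

deg = erdos_renyi_degrees(2000, 4.0)
print(sum(deg) / len(deg))  # close to the mean degree of 4
print(max(deg))             # Poisson-like: the biggest 'hub' is barely above the mean
```

In a scale-free graph of the same size and mean degree, the largest hub would be an order of magnitude larger; that gap is what the straight line on the log-log plot encodes.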
Such assumptions fail given that the Internet continually evolves with new routers and the Web with new pages, and that real networks feature 'preferential attachment': new nodes have a high probability of forming connections with nodes that already have many links.

Now imagine that some nodes in a network are abruptly removed or disappear. Around 3% of Internet routers are destined to fail at any given time, so what percentage of nodes would need to be removed to affect network performance? We can perform one test by removing nodes uniformly at random, and another by deliberately removing the nodes with the highest degree. It turns out that for a scale-free network, random node removal has little to no effect, whereas targeting hubs can be destructive.
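Both ideas can be sketched together: grow a graph by preferential attachment, then compare the two removal tests by measuring the largest connected component that survives. This is an illustrative stdlib-only toy (function names and sizes are my own choices, not from any paper):

```python
import random
from collections import defaultdict

def pa_edges(n, m=2, seed=0):
    """Grow a graph by preferential attachment: each new node adds m edges,
    picking targets with probability proportional to their current degree."""
    rng = random.Random(seed)
    edges = [(i, j) for i in range(m + 1) for j in range(i)]  # small complete core
    # each node appears in this list once per incident edge, so a uniform
    # choice from it is exactly degree-proportional sampling
    targets = [v for e in edges for v in e]
    for new in range(m + 1, n):
        chosen = set()
        while len(chosen) < m:
            chosen.add(rng.choice(targets))
        for t in chosen:
            edges.append((new, t))
            targets += [new, t]
    return edges

def giant_size(edges, removed):
    """Size of the largest connected component after deleting `removed` nodes."""
    adj = defaultdict(set)
    for u, v in edges:
        if u not in removed and v not in removed:
            adj[u].add(v)
            adj[v].add(u)
    seen, best = set(), 0
    for start in adj:
        if start in seen:
            continue
        stack, size = [start], 0
        seen.add(start)
        while stack:
            x = stack.pop()
            size += 1
            for nb in adj[x]:
                if nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        best = max(best, size)
    return best

n = 5000
edges = pa_edges(n)
deg = defaultdict(int)
for u, v in edges:
    deg[u] += 1
    deg[v] += 1

k = n // 20  # remove 5% of nodes in each test
random_removed = set(random.Random(1).sample(range(n), k))
hubs_removed = set(sorted(deg, key=deg.get, reverse=True)[:k])

print(giant_size(edges, random_removed))  # network barely notices
print(giant_size(edges, hubs_removed))    # shrinks far more when hubs are targeted
```

Random failures mostly hit low-degree nodes (there are vastly more of them), which is why the scale-free topology tolerates them so well.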

The concept of 'six degrees of separation', proposed by Stanley Milgram, suggests that anyone in the world can be connected to anyone else by a chain of five or six acquaintances. Does the Internet follow this trend seen in social networks (small separation between nodes and a high degree of clustering)? Since we don't have a complete copy of the entire Web (even search engines cover only around 16% of it), we can use a small finite sample to make an inference about the whole. Using 'finite size scaling', you can quantify the mean shortest distance between two nodes (the number of clicks to get from one page to another). Given that around 1 billion nodes make up the Web, this brings the 'small world' effect to 19 'clicks of separation'. Not all pairs of nodes can be interconnected, however, because the Web is a directed network: a link leading from one page to another does not mean an inverse link exists, so such a path of 19 clicks is not guaranteed.

In most complex networks, nodes compete for links. We can model this by giving each node a 'fitness factor' which quantifies its ability to compete; energy levels can then be assigned to each node to produce a Bose gas (its lowest energy level representing the fittest node). The Bose gas evolves with time, adding new energy levels, which correspond to the addition of new nodes to the network. Two different outcomes can arise depending on the distribution of energy levels: (1) 'fit get rich': as the energy level increases, its occupancy decreases; (2) Bose–Einstein condensation: the fittest node gains a large percentage of all links, manifesting as a highly populated lowest energy level. Perhaps the Web is just another Bose–Einstein condensate?
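The '19 clicks' figure comes from the finite-size-scaling fit published by Albert, Jeong and Barabási (1999), who measured shortest paths on small crawled samples and extrapolated: the mean distance grows only logarithmically with the number of pages. A one-liner reproduces the estimate (the function name is mine; the constants are from the published fit):

```python
import math

def avg_clicks(n_pages):
    """Empirical fit from Albert, Jeong & Barabási (1999):
    mean shortest directed path ≈ 0.35 + 2.06 * log10(N)."""
    return 0.35 + 2.06 * math.log10(n_pages)

print(round(avg_clicks(8e8)))  # -> 19, for the ~800 million pages of the 1999 Web
print(round(avg_clicks(1e9)))  # -> 19, still: a larger Web adds barely a click
```

The logarithm is the whole point of the small-world effect: even a thousandfold growth of the Web would add only about six clicks to the average separation.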

Thursday, 3 October 2013

Homology- A Unified Definition

Homology is 'a word ripe for burning'. But how should it be defined? Superficially, it is often identified as similarity in morphology reflecting a common evolutionary origin; but can we give a more rigorous account? Like 'species', the many definitions of homology fall into two basic forms: developmental and taxic. The developmental approach is based on ontogeny: two characters are homologous if they share an identical set of developmental constraints. The taxic definition is based on cladistics and identifies a homologue as a synapomorphy (a trait that characterises a monophyletic group). Some complications arise with structural homology. For instance, the wings of bats and birds can be considered convergent, as they are differently arranged and lack common ancestry; yet they can be considered homologous at the level of the organism, because both evolved from the same pattern of vertebrate forelimb traceable to a common ancestor. Circular reasoning also arises with structural homology: it is used to build a phylogeny, and that phylogeny is subsequently used to infer homology. A phylogeny must instead be constructed on independent evidence before homology is proposed.

What about evo-devo? This is also unhelpful for a working definition of homology: different developmental pathways can converge on the same adult form, as with the various modes of gastrulation and the many routes of developmental regeneration in the hydroid Tubularia. Even embryonic developmental origin is of little use, as it relies on subsequent interactions between cells and fails to give a conserved adult morphology. Molecular markers such as genes succumb to hierarchical disconnect (whereby homologous characters produce non-homologous traits). A classic example is the gene PAX6 in eye development, which is found and transcribed in species as diverse as insects, humans, squids and even the primitive-eyed nemertines and platyhelminths.

Experiments involving grafting of Drosophila PAX6 into Drosophila limbs or wings can place eyes in incorrect positions; when mouse PAX6 is inserted into Drosophila, the ectopic eyes that form are Drosophila-like, not mouse-like. These grafting tests indicate that the adjustability for change lies not in the genes themselves but in the regulatory network of genes that controls their expression. The need to redefine homology at different hierarchical levels is also indicated by other characters. For a long time, arthropod compound eyes had been thought to have evolved independently of the vertebrate simple eye; this now seems improbable given the immense similarity between cephalopod and vertebrate eyes (commonly attributed to convergence). In essence, the gene initiating eye formation is homologous, but its expression is not necessarily homologous. Hierarchical disconnect also occurs in the reverse direction: non-homologous processes producing homologous characters. With the exception of urodele amphibians, all tetrapods develop tissue between their primordial digits, which is later removed by apoptosis. In newts and salamanders there is no such apoptosis, and the digits take a separate developmental pathway: the evolutionary hypothesis is that salamanders and newts (or one of their ancestral species) lost interdigital apoptosis, and that differential growth is a derived process. Novel genes substituted for older ones can likewise produce the same homologous morphology, through co-option of genes during evolution for very distinct functions.