Comparative Study
Genome Biol. 2002 Oct 25;3(11):research0064.
doi: 10.1186/gb-2002-3-11-research0064. Epub 2002 Oct 25.

The society of genes: networks of functional links between genes from comparative genomics

Itai Yanai et al. Genome Biol. 2002.

Abstract

Background: Comparative genomics provides at least three methods beyond traditional sequence similarity for identifying functional links between genes: the examination of common phylogenetic distributions, the analysis of conserved proximity along the chromosomes of multiple genomes, and observations of fusions of genes into a multidomain gene in another organism. We have previously generated the links according to each of these methods individually for 43 known microbial genomes. Here we combine these results to construct networks of functional associations.
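As a rough illustration of how links from the individual methods might be pooled into one network, here is a minimal Python sketch. It assumes a phylogenetic-profile link means two genes share an identical presence/absence pattern across genomes (as in the Figure 1a schematic). The function names, the toy profiles, and the proximity and fusion link sets are hypothetical and are not taken from the paper.

```python
from collections import defaultdict
from itertools import combinations


def profile_links(profiles):
    """Link every pair of genes with an identical presence/absence profile.

    `profiles` maps a gene name to a sequence of 0/1 flags, one per genome.
    (Hypothetical input format, chosen for illustration only.)
    """
    by_profile = defaultdict(list)
    for gene, profile in profiles.items():
        by_profile[tuple(profile)].append(gene)
    links = set()
    for genes in by_profile.values():
        for a, b in combinations(sorted(genes), 2):
            links.add((a, b))
    return links


def combine_links(method_links):
    """Merge per-method link sets into one network.

    Returns a dict mapping each undirected edge (a sorted gene pair) to the
    set of method names that support it.
    """
    combined = defaultdict(set)
    for method, links in method_links.items():
        for a, b in links:
            combined[tuple(sorted((a, b)))].add(method)
    return dict(combined)


# Toy data: genes A and B occur in the same genomes, C does not.
profiles = {"A": [1, 0, 1, 1], "B": [1, 0, 1, 1], "C": [0, 1, 1, 0]}
phylo = profile_links(profiles)

combined = combine_links({
    "phylogenetic_profile": phylo,
    "proximity": {("B", "C")},   # made-up link for illustration
    "fusion": {("B", "A")},      # made-up link for illustration
})
```

Keeping the set of supporting methods on each edge mirrors the color coding of the combined network in Figure 2, where a link can be attributed to one, two, or all three methods.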

Results: We show that the functional networks obtained by applying these methods have different topologies and that the information they provide is largely additive. In particular, the combined networks of functional links contain an average of 57% of an organism's complete genetic complement, uncover substantial portions of known pathways, and suggest the function of previously unannotated genes. In addition, the combined networks are qualitatively different from the networks obtained using individual methods. They have a dominant cluster that contains approximately 80%-90% of the genes, independent of genome size, and the dominant clusters show the small world behavior expected of a biological system, with global connectivity that is nearly random, and local properties that are highly ordered.
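The network quantities mentioned above (the dominant cluster, a local clustering measure, and a global path-length measure) can be sketched in plain Python. This is a minimal illustration using the standard Watts-Strogatz style definitions, not necessarily the exact procedure used in the paper; the helper names and the toy edge list are assumptions.

```python
from collections import deque


def build_adjacency(edges):
    """Build an undirected adjacency map from a list of (a, b) pairs."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def largest_component(adj):
    """Return the node set of the largest connected component (giant cluster)."""
    seen, best = set(), set()
    for start in adj:
        if start in seen:
            continue
        comp, queue = {start}, deque([start])
        while queue:
            node = queue.popleft()
            for nb in adj[node]:
                if nb not in comp:
                    comp.add(nb)
                    queue.append(nb)
        seen |= comp
        if len(comp) > len(best):
            best = comp
    return best


def clustering_coefficient(adj, nodes):
    """Mean fraction of a node's neighbour pairs that are themselves linked.

    Nodes with fewer than two neighbours contribute 0 to the mean.
    """
    total = 0.0
    for n in nodes:
        nbrs = list(adj[n])
        k = len(nbrs)
        if k < 2:
            continue
        linked = sum(
            1
            for i in range(k)
            for j in range(i + 1, k)
            if nbrs[j] in adj[nbrs[i]]
        )
        total += 2.0 * linked / (k * (k - 1))
    return total / len(nodes)


def characteristic_path_length(adj, nodes):
    """Mean shortest-path length over all node pairs of a connected node set."""
    total, pairs = 0, 0
    for source in nodes:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs if pairs else 0.0


# Toy network: a four-gene cluster plus one isolated pair.
edges = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d"), ("x", "y")]
adj = build_adjacency(edges)
giant = largest_component(adj)
giant_fraction = len(giant) / len(adj)
cc = clustering_coefficient(adj, giant)
cpl = characteristic_path_length(adj, giant)
```

A small-world network in this sense would show a path length close to that of a random network of the same size together with a much higher clustering coefficient.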

Conclusions: When the information on functional linkage provided by three emerging computational methods is combined, the integrated network uncovers large numbers of conserved pathways and identifies clusters of functionally related genes. It therefore shows considerable utility and promise as a tool for understanding genomic structure, and for guiding high throughput experimental investigations.

Figures

Figure 1
Three comparative genomics methods for identifying functional links between genes and the networks they produce. The schematics on the left show links between gene A (pink) and gene B (blue) based on (a) the same phyletic distribution across the known genomes, here arbitrarily labeled W, X, Y, Z, etc.; (b) their proximity on chromosomes from different genomes; and (c) fusion of A and B into one multidomain gene in another organism. On the right are networks found in H. pylori using each of the three methods. All network figures were made using the Pajek program [42].
Figure 2
The combined network in H. pylori. The networks of the three individual methods (shown in Figure 1) are superimposed. Links colored yellow were obtained by phylogenetic profiling, links colored blue by conserved chromosomal proximity, and links colored red by domain fusion. The remaining links are coded as composites of the three primary colors: purple, links found by both fusion and chromosomal proximity; green, links found by chromosomal proximity and phylogenetic profiling; orange, links found by phylogenetic profiling and fusion; and brown, links found by all three methods. The nodes highlighted in red and blue identify genes that participate in oxidative phosphorylation and aromatic amino-acid biosynthesis, respectively (see Figure 4).
Figure 3
Functional correlation of networks in terms of COG functional correlations and KEGG pathways. Each circle corresponds to a network of one of the four types of the observed networks (differently colored) for one of the 43 genomes. Each triangle corresponds to the mean of 100 shuffled versions of each of the observed networks (see Methods). Note the clear separation of the functional correlation between the observed and shuffled networks.
Figure 4
Local structure of the H. pylori combined network captures functionally related genes. (a) Oxidative phosphorylation genes; (b) genes involved in phenylalanine, tyrosine and tryptophan biosynthesis. The color of the links between genes is the same as in Figure 2. Gray lines indicate links with genes not ascribed to that pathway.
Figure 5
Combined networks reconstruct portions of known pathways that cannot be obtained by applying the methods independently. The blue spheres correspond to clusters of genes ascribed to a particular functional pathway (such as the ones described in Figure 4). The three-dimensional coordinates of the spheres correspond to the fraction of the clusters (in terms of nodes) that could have been recovered by each of the methods (the axes). The names of some of the pathways are shown. The E. coli genome was used and only clusters of seven or more genes are shown.
Figure 6
Ascribing function to uncharacterized genes on the basis of their locus in the network. A portion of the T. maritima network is shown. Genes of uncharacterized function are highlighted in green and genes with only a partial annotation in yellow. Genes involved in energy production are in blue. The color of the links is the same as in Figure 2. Gray lines indicate links with genes not shown. From the network locus of TM0885 and TM1367, we predict for the genes shown in blue a role in energy production.
Figure 7
Network properties. (a) Basic characteristics. For each of the 43 genomes, the number of nodes and edges of the four different networks is shown: those based on the three methods, namely chromosome proximity (red), domain fusion (blue) and phylogenetic profiling (yellow), and the combined networks (black). (b) Giant clusters. The networks are characterized by a giant cluster that relates a dominant fraction of the nodes in the network. In the combined networks of most of the genomes, the giant cluster accounts for 80-90% of the nodes, on average. (c) Universality. The characteristic path distance (global property) and clustering coefficient (local property) of the giant cluster of each network are mapped. Note that networks of the same method tend to cluster together. (d) Degree distributions. The histogram of the number of edges per node is shown on a log-log scale for each network type for Pseudomonas aeruginosa; similar distributions are observed for the other organisms. As in Figure 3, the circles denote values from the observed distributions while the triangles denote the degree distributions of the de novo random networks (see Materials and methods) with the same number of nodes and edges.
Figure 8
Global path through the networks. The nodes (genes) along this particular path of five links in the E. coli network are shown as circles. The symbols associated with each node represent the functional pathways in which the gene is annotated. Moon, phenylalanine, tyrosine and tryptophan biosynthesis; diamond, histidine metabolism; exclamation mark, phenylalanine metabolism; X, tyrosine metabolism; star, aminoacyl-tRNA biosynthesis; check mark, alanine and aspartate metabolism.
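The comparison described in the Figure 7d caption, an observed degree distribution set against a random network with the same number of nodes and edges, might be sketched as follows. This is a minimal illustration only: the paper's own randomization procedure is described in its Materials and methods and may differ, and the function names, gene labels, and seed are hypothetical.

```python
import random
from collections import Counter
from itertools import combinations


def degree_histogram(nodes, edges):
    """Count how many nodes have each degree (number of edges per node)."""
    degree = {n: 0 for n in nodes}
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    return Counter(degree.values())


def random_network(nodes, n_edges, seed=0):
    """Draw `n_edges` distinct node pairs uniformly at random.

    Enumerating all pairs is fine for a toy example but would be too
    memory-hungry for a genome-scale network.
    """
    rng = random.Random(seed)
    all_pairs = list(combinations(sorted(nodes), 2))
    return rng.sample(all_pairs, n_edges)


# Toy observed network with one hub gene (g1).
nodes = ["g1", "g2", "g3", "g4", "g5"]
edges = [("g1", "g2"), ("g1", "g3"), ("g1", "g4"), ("g4", "g5")]

observed_hist = degree_histogram(nodes, edges)
random_edges = random_network(nodes, len(edges))
random_hist = degree_histogram(nodes, random_edges)
```

Plotting both histograms on log-log axes would give the kind of side-by-side view the caption describes, with circles for the observed network and triangles for the random one.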

References

    1. Bar-Yam Y. Dynamics of Complex Systems. London: Addison Wesley Longman; 1997.
    2. Watts DJ, Strogatz SH. Collective dynamics of 'small-world' networks. Nature. 1998;393:440–442.
    3. Barabasi AL, Albert R. Emergence of scaling in random networks. Science. 1999;286:509–512.
    4. Uetz P, Giot L, Cagney G, Mansfield TA, Judson RS, Knight JR, Lockshon D, Narayan V, Srinivasan M, Pochart P, et al. A comprehensive analysis of protein-protein interactions in Saccharomyces cerevisiae. Nature. 2000;403:623–627.
    5. Ito T, Chiba T, Ozawa R, Yoshida M, Hattori M, Sakaki Y. A comprehensive two-hybrid analysis to explore the yeast protein interactome. Proc Natl Acad Sci USA. 2001;98:4569–4574.