About Wanda:

INTRODUCTION

Ray-finned fish (Actinopterygii) have more copies of many genes than species in their sister group Sarcopterygii (the lobe-finned fishes, amphibians, reptiles, birds, and mammals). Zebrafish (Danio rerio) and medaka (Oryzias latipes), for example, possess seven unlinked Hox gene clusters, almost twice as many as tetrapods such as mouse and human, which have four (1,2). Amores et al. (1) proposed that the extra Hox genes in zebrafish were produced during a genome duplication event before the evolution and radiation of Teleostei (3,4). In fact, several authors have speculated that the 'extra' genes produced during the proposed fish-specific genome duplication event somehow facilitated speciation in Actinopterygii (3,5-7). More recently, evidence for Hox cluster duplication has also been uncovered in the cichlid Oreochromis niloticus (8) and the pufferfish Fugu rubripes (9,10). Support for a large-scale gene duplication event in fish is not limited to the Hox clusters. Gene mapping and phylogenetic studies have identified a large number of other sarcopterygian genes with two zebrafish orthologs (11-15). The observation that many of these zebrafish 'paralogs' (16,17) are sister sequences in phylogenetic trees supports the hypothesis that they were formed after ray-finned fish and tetrapods diverged from one another. The observations that different paralogous pairs were formed at approximately the same time (15), that they are found throughout the genome, and that they show synteny with other duplicated genes (11,12,14) support the hypothesis that they were formed during a complete genome duplication event.

However, gene duplication appears to be a remarkably frequent event (18) and tetraploidy has occurred independently in numerous lineages within Actinopterygii (19-23). It is, therefore, possible that independent gene and genome duplication events have produced a pattern that only appears to reflect a single ancient genome duplication event (24).

Redundant genes produced by gene or genome duplication events are likely to be silenced and eventually lost (18,25,26). Nevertheless, numerous models have been put forward to explain the retention of duplicated genes and their divergence in sequence and function. Ohno (27) proposed that, if a duplicate is not turned into a pseudogene, a chance series of non-deleterious mutations might instead turn it into a gene with a new function. Although Ohno's model was widely adopted as an explanation for the evolution of functionally novel genes, it was also criticized, and numerous other models were put forward to explain both the retention and the functional divergence of genes (28-30). In the fish species for which gene expression has been best studied, namely the zebrafish, retained duplicates often appear to have subdivided the roles of their single-gene ancestors (31-34). If duplicated genes lose different regulatory subfunctions, each affecting different spatial and/or temporal expression patterns, then they must complement each other by jointly retaining the full set of subfunctions present in the ancestral gene. Degenerative mutations can therefore facilitate the retention of duplicate functional genes, because both duplicates now perform different but necessary subfunctions. Thus, the likelihood of gene retention following duplication seems to increase when genes subdivide the roles of their ancestors.

Recently, a model called 'divergent resolution' has been proposed (18,35), which suggests that the loss or silencing of duplicated genes might be more important to the evolution of species diversity than the evolution of new functions in duplicated genes. Divergent resolution occurs when different copies of a duplicated gene are lost in geographically separated populations; it can genetically isolate these populations should they become reunited (7). Divergent resolution of the 1,000 to tens of thousands of genes and their regulatory regions produced by large-scale gene duplications or a complete genome duplication event provides an alternative link between genome duplication and speciation in teleosts.
 

CONTENTS OF THE DATABASE AND AVAILABILITY

At the moment, the Wanda database lists more than 40 genes that occur once in human and mouse and twice in fish. In the near future, we expect the number of fish homologs to increase rapidly because of the genome sequencing projects for Fugu rubripes, its freshwater relative Tetraodon nigroviridis, and zebrafish, whose complete genome is planned to be sequenced in 2003. The majority of duplicates compiled so far are from zebrafish; however, the database also includes genes from other fish species, even when only one copy is known. Many of these genes are most closely related to one of the two zebrafish duplicates, indicating that they are probably one of a pair of genes (i.e., semi-orthologs with respect to the tetrapod homolog) (36).

International databases, such as EMBL (37) and GenBank (38), are checked daily for the submission of new fish genes by using the Current Sequence Awareness tool, developed by the Belgian EMBNet node (http://ben.vub.ac.be/). When new fish genes have been submitted, similarity searches (for instance, BLAST) are performed to find sarcopterygian and actinopterygian homologs. Phylogenetic trees are then constructed to determine the relationship between the new gene(s) and known fish duplicates.

The Wanda database allows users to determine quickly whether a known fish gene is one of a set of 'semi-orthologs'. Fish genes very often have names that provide no hint that they are one of several orthologs of a given tetrapod gene. Nevertheless, this information can be very important. For example, differences in gene expression patterns between tetrapods (such as mouse) and fish (such as zebrafish) can sometimes be explained by paralogs in fish that subdivide the roles of their tetrapod orthologs. Furthermore, the future possibility of performing local BLAST (39) searches will be useful for correctly naming new genes that have been isolated from fishes. For example, a new gene from a certain species may be most similar to one of two paralogous genes in another species of fish. Wanda then allows users to determine whether the newly sequenced gene is the 'a' or the 'b' copy, so that in the future these labels designate paralogous genes more consistently.

In addition to compiling duplicated genes in fishes and their vertebrate orthologs, Wanda will also provide: (i) references to literature and other databases reporting expression data for fish duplicates; (ii) nucleotide and amino acid sequence alignments and variability maps of fish paralogs; (iii) nucleotide and amino acid sequence alignments of fish duplicates and their sarcopterygian orthologs; (iv) phylogenetic trees showing the relative rates of evolution and the orthologous and paralogous relationships between duplicated fish genes and their sarcopterygian orthologs; (v) tables listing the results of studies that look for purifying or positive selection after duplication events; and (vi) cross-references to the international nucleotide sequence databases, such as EMBL and GenBank, and to other fish-specific databases, such as ZFIN (40).

A well-annotated database that compiles paralogous gene sequences in fishes is a valuable source of information for biologists and geneticists who are interested in the evolutionary and developmental consequences of duplication events, as well as for biologists concerned with the evolutionary consequences of large-scale gene duplications. Overall, the Wanda database aims to compare duplicated genes with their non-duplicated homologs in terms of structure, function, and evolutionary divergence. As this database grows, it will also provide the data necessary to test the ancient fish-specific genome duplication hypothesis. Combining phylogenetic trees with map data will help test the hypothesis that 'divergent resolution' has played a role in speciation within the ray-finned fishes.

The Wanda database is available via the WWW at URL Wanda/. Questions regarding the Wanda database should be addressed to: Yves Van de Peer, John Taylor, Jayabalan Joseph, or Axel Meyer.
 

ACKNOWLEDGEMENTS

This work was supported by a grant from the German Science Foundation (DFG PE 842/2-1). JST is indebted to the Natural Sciences and Engineering Research Council of Canada for a postdoctoral fellowship. YVdP is a Postdoctoral Fellow of the Fund for Scientific Research (Flanders).
 

REFERENCES

1. Amores,A., Force,A., Yan,Y.-L., Joly,L., Amemiya,C., Fritz,A., Ho,R.K., Langeland,J., Prince,V., Wang,Y.-L. et al. (1998) Zebrafish hox clusters and vertebrate genome evolution. Science, 282, 1711-1714.

2. Naruse,K., Fukamachi,S., Mitani,H., Kondo,M., Matsuoka,T., Kondo,S., Hanamura,N., Morita,Y., Hasegawa,K., Nishigaki,R. et al. (2000) A detailed linkage map of medaka, Oryzias latipes: comparative genomics and genome evolution. Genetics, 154, 1773-1784.

3. Wittbrodt,J., Meyer,A. and Schartl,M. (1998) More genes in fish? BioEssays, 20, 511-512.

4. Meyer,A. and Schartl,M. (1999) Gene and genome duplications in vertebrates: the one-to-four (-to-eight in fish) rule and the evolution of novel gene functions. Curr. Opin. Cell Biol., 11, 699-704.

5. Aparicio,S. (2000) Vertebrate evolution: recent perspectives from fish. Trends Genet., 16, 54-56.

6. Kappen,C. (2000) Analysis of a complete homeobox gene repertoire: implications for the evolution of diversity. Proc. Natl. Acad. Sci. USA, 97, 4481-4486.

7. Taylor,J., Van de Peer,Y. and Meyer,A. (2001) Genome duplication, divergent resolution and speciation. Trends Genet., 17, 299-301.

8. Málaga-Trillo,E. and Meyer,A. (2001) Genome duplications and accelerated evolution of Hox genes and cluster architecture in teleost fishes. Am. Zool., in press.

9. Aparicio,S., Hawker,K., Cottage,A., Mikawa,Y., Zuo,L., Venkatesh,B., Chen,E., Krumlauf,R. and Brenner,S. (1997) Organization of the Fugu rubripes Hox clusters: evidence for continuing evolution of vertebrate Hox complexes. Nature Genet., 16, 79-83.

10. Amores,A., Amemiya,C.T. and Postlethwait,J. (2001) Genome duplication and evolution of Hox clusters in teleosts. Am. Zool., in press.

11. Gates,M.A., Kim,L., Cardozo,T., Sirotkin,H.I., Dougan,S.T., Lashkari,D., Abagyan,R., Schier,A.F. and Talbot,W.S. (1999) A genetic linkage map for zebrafish: comparative analysis and localization of genes and expressed sequences. Genome Res., 9, 334-347.

12. Postlethwait,J.H., Woods,I.G., Ngo-Hazelett,P., Yan,Y.-L., Kelly,P.D., Chu,F., Huang,H., Hill-Force,A. and Talbot,W.S. (2000) Zebrafish comparative genomics and the origins of vertebrate chromosomes. Genome Res., 10, 1890-1902.

13. Robinson-Rechavi,M., Marchand,O., Escriva,H., Bardet,P.L., Zelus,D., Hughes,S. and Laudet,V. (2001) Euteleost fish genomes are characterized by expansion of gene families. Genome Res., 11, 781-788.

14. Woods,I.G., Kelly,P.D., Chu,F., Ngo-Hazelett,P., Yan,Y.-L., Huang,H., Postlethwait,J.H. and Talbot,W.S. (2000) A comparative map of the zebrafish genome. Genome Res., 10, 1903-1914.

15. Taylor,J., Van de Peer,Y., Braasch,I. and Meyer,A. (2001) Comparative genomics provides evidence for an ancient genome duplication in fish. Phil. Trans. Roy. Soc. B, 356, 1-19.

16. Fitch,W.M. (2000) Homology: a personal view on some of the problems. Trends Genet., 16, 227-231.

17. Mindell,D.P. and Meyer,A. (2001) Homology evolving. Trends Ecol. Evol., 16, 434-440.

18. Lynch,M. and Conery,J.S. (2000) The evolutionary fate and consequences of duplicate genes. Science, 290, 1151-1155.

19. Uyeno,T. and Smith,G.R. (1972) Tetraploid origin of the karyotype of catostomid fishes. Science, 175, 644-646.

21. Dingerkus,G. and Howell,W.M. (1976) Karyotypic analysis and evidence of tetraploidy in the North American paddlefish, Polyodon spathula. Science, 194, 842-844.

22. Allendorf,F.W. and Utter,F.M. (1976) Gene duplication in the family Salmonidae. III. Linkage between two duplicated loci coding for aspartate aminotransferase in the cutthroat trout (Salmo clarki). Hereditas, 82, 19-24.

23. Ferris,S.D. and Whitt,G.S. (1977) Loss of duplicate gene expression after polyploidisation. Nature, 265, 258-260.

24. Robinson-Rechavi,M., Marchand,O., Escriva,H. and Laudet,V. (2001) An ancestral whole-genome duplication may not have been responsible for the abundance of duplicated fish genes. Curr. Biol., 11, R458-R459.

25. Bailey,G.S., Poulter,R.T. and Stockwell,P.A. (1978) Gene duplication in tetraploid fish: model for gene silencing at unlinked duplicated loci. Proc. Natl. Acad. Sci. USA, 75, 5575-5579.

26. Li,W.-H. (1980) Rate of gene silencing at duplicate loci: a theoretical study and interpretation of data from tetraploid fishes. Genetics, 95, 237-258.

27. Ohno,S. (1970) Evolution by Gene Duplication. Springer Verlag, New York, NY.

28. Nowak,M.A., Boerlijst,M.C., Cooke,J. and Maynard Smith,J. (1997) Evolution of genetic redundancy. Nature, 388, 167-171.

29. Gibson,T.J. and Spring,J. (1998) Genetic redundancy in vertebrates: polyploidy and persistence of genes encoding multidomain proteins. Trends Genet., 14, 46-49.

30. Force,A., Lynch,M., Pickett,F.B., Amores,A., Yan,Y.-L. and Postlethwait,J. (1999) Preservation of duplicate genes by complementary, degenerative mutations. Genetics, 151, 1531-1545.

31. Ekker,M., Akimenko,M.A., Allende,M.L., Smith,R., Drouin,G., Langille,R.M., Weinberg,E.S. and Westerfield,M. (1997) Relationships among msx gene structure and function in zebrafish and other vertebrates. Mol. Biol. Evol., 14, 1008-1022.

32. Martínez-Barberá,J.P., Toresson,H., Da Rocha,S. and Krauss,S. (1997) Cloning and expression of three members of the zebrafish Bmp family: Bmp2a, Bmp2b and Bmp4. Gene, 198, 53-59.

33. Laforest,L., Brown,C.W., Poleo,G., Geraudie,J., Tada,M., Ekker,M. and Akimenko,M.-A. (1998) Involvement of the Sonic Hedgehog, patched 1 and bmp2 genes in patterning of the zebrafish dermal fin rays. Development, 125, 4175-4184.

34. Van de Peer,Y., Taylor,J.S., Braasch,I. and Meyer,A. (2001) The ghost of selection past: rates of evolution and functional divergence in anciently duplicated genes. J. Mol. Evol., 53, 434-444.

35. Lynch,M. and Force,A. (2000) The probability of duplicate gene preservation by subfunctionalization. Genetics, 154, 459-473.

36. Sharman,A.C. (1999) Some new terms for duplicated genes. Cell Dev. Biol., 10, 561-563.

37. Stoesser,G., Baker,W., van den Broek,A., Camon,E., Garcia-Pastor,M., Kanz,C., Kulikova,T., Lombard,V., Lopez,R., Parkinson,H. et al. (2001) The EMBL nucleotide sequence database. Nucleic Acids Res., 29, 17-21.

38. Wheeler,D.L., Church,D.M., Lash,A.E., Leipe,D.D., Madden,T.L., Pontius,J.U., Schuler,G.D., Schriml,L.M., Tatusova,T.A., Wagner,L. and Rapp,B.A. (2001) Database resources of the National Center for Biotechnology Information. Nucleic Acids Res., 29, 11-16.

39. Altschul,S.F., Gish,W., Miller,W., Myers,E.W. and Lipman,D.J. (1990) Basic local alignment search tool. J. Mol. Biol., 215, 403-410.

40. Sprague,J., Doerry,E., Douglas,S. and Westerfield,M. (2001) The Zebrafish Information Network (ZFIN): a resource for genetic, genomic and developmental research. Nucleic Acids Res., 29, 87-90.

41. Thompson,J.D., Gibson,T.J., Plewniak,F., Jeanmougin,F. and Higgins,D.G. (1997) The CLUSTAL_X windows interface: flexible strategies for multiple sequence alignment aided by quality analysis tools. Nucleic Acids Res., 25, 4876-4882.
 

Methods

BLASTp searches using human genes as query sequences were limited to Xenopus laevis, Gallus gallus, Mus musculus and Homo sapiens and to the Class Actinopterygii. tBLASTn was used to search the Tetraodon and Takifugu genome sequence databases. The translated amino acid sequences produced by these tBLASTn searches were added to the protein sequences retrieved by BLASTp. BLAST 'cut-off' values varied among genes and among species. In many cases it was easy to decide which genes to include in our phylogenetic analyses and which to exclude. For example, when human BMP2 was used as a BLASTp query against zebrafish sequences, the e values were e-122 or lower for bmp2 and bmp4 genes of various lengths (which were all included in the preliminary phylogenetic analyses) and 2e-54 for the next gene in the list. In some cases gene names influenced our identification of a cut-off value. When we used human OTX1 to search for zebrafish genes, there were two obvious cut-off points. Zebrafish otx1 had an e value of e-109 and the next gene, otx2, had an e value of 7e-52. We included otx2 and several other genes until reaching the next cut-off, which occurred between e-39 and e-18. It is important to note that we always used an obvious cut-off and never excluded or included genes based only upon their names. Whenever a cut-off was difficult to identify, we erred on the side of inclusion, keeping even genes that we suspected would not turn out to be orthologs. Our preliminary phylogenetic analyses then provided us with the opportunity to remove distantly related genes.
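The idea of an 'obvious cut-off' in a ranked BLAST hit list can be illustrated with a small sketch. The cut-offs in Wanda were chosen by inspection, so the code below is only an assumed formalisation: it flags every position where the e value jumps by at least a chosen number of orders of magnitude. The function name candidate_cutoffs, the min_gap threshold of 15 and the hit names hit_c and hit_d are hypothetical; the e values are those quoted above for the OTX1 search.

```python
import math

def candidate_cutoffs(hits, min_gap=15.0):
    """Return indices i where hits[:i] and hits[i:] are separated by a large gap.

    hits    : list of (name, e_value) pairs, sorted from best to worst hit.
    min_gap : assumed threshold, in orders of magnitude of the e value.
    """
    # Compare hits on a -log10(E) scale; clamp E = 0 to avoid log10(0).
    scores = [-math.log10(max(e, 1e-300)) for _, e in hits]
    cutoffs = []
    for i in range(1, len(scores)):
        if scores[i - 1] - scores[i] >= min_gap:
            cutoffs.append(i)
    return cutoffs

# e values from the OTX1 example; hit_c and hit_d are placeholder names.
hits = [
    ("otx1", 1e-109),
    ("otx2", 7e-52),
    ("hit_c", 1e-39),
    ("hit_d", 1e-18),
]

for i in candidate_cutoffs(hits):
    print("possible cut-off between", hits[i - 1][0], "and", hits[i][0])
```

With these values the sketch reports two candidate cut-offs, one after otx1 and one between the e-39 and e-18 hits. Choosing between such candidates still requires judgement of the kind described above.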

Sequence alignments

Protein sequences were aligned in BioEdit using the accessory application CLUSTALW.  Preliminary trees were reconstructed from these initial alignments using TREECON. These analyses identified sequences that differed only in length or by very few amino acid substitutions, which were then removed from the alignments. These preliminary phylogenetic analyses also identified very divergent genes (i.e., non-orthologous sequences), which were also removed from the alignments. Sequences were then realigned and only unambiguously aligned amino acid positions were retained.
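The removal of sequences that differ only in length or by very few substitutions can be sketched as follows. This is a minimal illustration and not the procedure used with TREECON: it assumes aligned sequences with '-' for gaps, and the function names, the max_diff threshold and the toy sequences are all hypothetical.

```python
def compare(a, b):
    """Return (differences, overlap) over columns where both sequences have a residue."""
    diffs = overlap = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            continue
        overlap += 1
        if x != y:
            diffs += 1
    return diffs, overlap

def drop_near_duplicates(alignment, max_diff=1):
    """Keep the longest sequence of each group of near-identical sequences.

    alignment : dict mapping sequence name to aligned sequence (equal lengths).
    max_diff  : assumed number of substitutions still counted as 'very few'.
    """
    def residues(seq):
        return sum(c != "-" for c in seq)

    kept = {}
    # Visit the longest sequences first so that partial copies are the ones dropped.
    for name, seq in sorted(alignment.items(), key=lambda kv: -residues(kv[1])):
        redundant = False
        for other in kept.values():
            diffs, overlap = compare(seq, other)
            if overlap > 0 and diffs <= max_diff:
                redundant = True
                break
        if not redundant:
            kept[name] = seq
    return kept

# Toy alignment (made-up sequences).
alignment = {
    "full": "MKTAYIAKQR",
    "partial": "--TAYIAK--",   # differs from "full" only in length
    "variant": "MKTAYLAKQR",   # one substitution relative to "full"
    "distinct": "MRSGWVPEHD",
}
print(sorted(drop_near_duplicates(alignment)))
```

Sequences with no overlapping residues are never treated as duplicates here, since nothing can be said about their similarity.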

In some cases, two sequence alignments are provided for the same gene. This happens when additional fish genes could be added but were available only as partial sequences. In these cases we show two alignments: one with more (partial) fish sequences, and one with more sites but without the partial fish sequences.

Phylogenetic Analysis

Trees were constructed with different methods. In general, we calculated Poisson-corrected genetic distances and reconstructed neighbour-joining trees using TREECON. For almost all analyses the most closely related human gene was used to root the trees. Most genes studied are members of multigene families and, therefore, the human outgroup sequence was usually a paralog of the human ingroup sequence. Confidence in topologies was assessed by 500 bootstrap replicates.
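As a minimal sketch of the two calculations mentioned above, the Poisson correction converts the observed proportion p of differing amino acid sites into a distance d = -ln(1 - p), and a bootstrap replicate is made by resampling alignment columns with replacement. The code assumes aligned sequences with '-' for gaps; the function names and toy sequences are hypothetical, and TREECON's own handling of gaps may differ.

```python
import math
import random

def p_distance(a, b):
    """Proportion of differing sites over columns where both sequences have a residue."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("sequences share no aligned residues")
    return sum(x != y for x, y in pairs) / len(pairs)

def poisson_distance(a, b):
    """Poisson-corrected distance d = -ln(1 - p)."""
    p = p_distance(a, b)
    if p >= 1.0:
        raise ValueError("distance undefined: every compared site differs")
    return -math.log(1.0 - p)

def bootstrap_replicate(alignment, rng):
    """Resample alignment columns with replacement; the same columns are used for all sequences."""
    length = len(next(iter(alignment.values())))
    columns = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[c] for c in columns) for name, seq in alignment.items()}

# Toy alignment (made-up sequences).
alignment = {"seq1": "MKTAYIAKQR", "seq2": "MKTAYLAKQR", "seq3": "MRSGWVPEHD"}
print(round(poisson_distance(alignment["seq1"], alignment["seq2"]), 4))

rng = random.Random(1)  # fixed seed so the example is repeatable
replicates = [bootstrap_replicate(alignment, rng) for _ in range(500)]
```

A distance matrix and tree would then be computed for each of the 500 replicates, and the bootstrap support of a group is the percentage of replicate trees in which it appears.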

In a number of cases we could improve our tree topologies considerably by taking mutational saturation into account. To this end, a Java-based application has been developed to visualize the amount of saturation in amino acid sequences. The program, called ASaturA, graphically displays the number of observed frequent and rare amino acid replacements between pairs of sequences against their overall evolutionary distance (see Figure).

Discrimination between frequent and rare amino acid replacements is based on substitution probability matrices (e.g., PAM and BLOSUM).  Evolutionary distances between sequences can then be computed from the fraction of unsaturated sites only and evolutionary trees inferred by pairwise distance methods (Van de Peer et al. 2002, Gene (in press)).
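The distinction between frequent and rare replacements can be illustrated with a short sketch. It is not the ASaturA implementation: the score table below is a tiny hand-written example in the style of a BLOSUM matrix with illustrative values only, and the rule that a score above zero marks a 'frequent' replacement is an assumption made for this example.

```python
# Toy substitution scores for a few residue pairs (illustrative values only).
TOY_SCORES = {
    frozenset("IV"): 3,
    frozenset("IL"): 2,
    frozenset("DE"): 2,
    frozenset("KR"): 2,
    frozenset("GW"): -2,
    frozenset("DW"): -4,
}

def count_replacements(a, b, scores=TOY_SCORES, threshold=0):
    """Count (frequent, rare) replacements between two aligned sequences.

    A replacement is called frequent when its score exceeds the assumed threshold.
    Gapped columns and identical residues are skipped.
    """
    frequent = rare = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-" or x == y:
            continue
        pair = frozenset((x, y))
        if pair not in scores:
            raise KeyError("no score for replacement %s/%s in the toy table" % (x, y))
        if scores[pair] > threshold:
            frequent += 1
        else:
            rare += 1
    return frequent, rare

# Made-up sequences: I/V and D/E are frequent, W/D is rare.
print(count_replacements("MIDKW", "MVEKD"))
```

Plotting these two counts for every pair of sequences against their evolutionary distance gives a saturation plot of the kind described above; frequent replacements are expected to level off first as distance increases.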

Maximum likelihood trees were constructed with TREE-PUZZLE, and maximum parsimony trees were constructed with PHYLIP or PAUP*.

In general, the trees shown in Wanda are distance trees unless other methods provide more convincing support for a certain topology.


Last update 7/12/2K1