BACKGROUND

Songbird research has had a major impact on neurobiology. The study of songbirds may illuminate relationships between genes, brains and behavior. The goal of the Songbird Neurogenomics (SoNG) Initiative is to facilitate the application of powerful new genomic methods to songbird research. Since 2001, a consortium of seven songbird investigators at six institutions in the U.S. has worked with the Keck Center for Comparative and Functional Genomics at the University of Illinois to establish:

All these resources have been made available to the larger scientific community. Much of the sequence information is already freely available on this website, and investigators can obtain individual cDNA clones for their own research projects at nominal cost.

This project is supported by the NIH, and will have both immediate and long-term consequences for research into a number of issues relevant to human health and disease, including molecular mechanisms involved in learning, control of sexual differentiation, effects of steroids on brain function and development, and processes that promote or limit brain cellular plasticity and repair.

The first goal of the SoNG Initiative was to produce a database for finding and retrieving gene sequences expressed in the zebra finch brain. The database, now in its third generation of development, is known as ESTIMA:Songbird. ESTIMA:Songbird is a database of Expressed Sequence Tags (ESTs), which are obtained by high-throughput sequencing of cDNA clones derived from zebra finch brain RNA. By operational definition, a gene is "expressed" if it gives rise to RNA. We call the sequences "tags" because each is only a partial sequence of an RNA: useful as a tag for measuring the amount of the corresponding RNA by hybridization techniques (e.g., DNA microarrays), and for predicting at least part of the protein sequence encoded by that RNA. Identifying the encoded protein can give insight into the probable functions of the gene, especially since protein sequences and functions are highly conserved in nature. If a zebra finch protein is nearly identical in sequence to a protein that is well characterized in many other species, one can be reasonably confident that the zebra finch protein has similar functional properties.

CURRENT CONTENT of ESTIMA:Songbird

  1. 86,784 high-quality ESTs representing RNAs expressed in the zebra finch brain
  2. Derived from several different cDNA libraries produced at Michigan State University, Duke University and the University of Illinois
  3. ESTs were assembled into 31,658 unique sequences, which were used for BLAST similarity searches against sequence databases for chicken and human (earlier generations also included annotation against mouse and rat). Annotations for each EST are derived from the results of these BLAST searches.
  4. Described in detail in Replogle et al. (submitted).
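The annotation step in item 3 can be pictured with a small sketch that keeps one best BLAST hit per assembled sequence. It assumes BLAST+ tabular output (`-outfmt 6`, default 12 columns) and a hypothetical e-value cutoff of 1e-10; the text does not state which output format, cutoff or tie-breaking rule ESTIMA:Songbird actually used, and the sequence IDs below are invented for illustration.

```python
"""Keep one best BLAST hit per assembled sequence as a simple annotation.

Assumptions (not taken from ESTIMA:Songbird): hits are in BLAST+ tabular
format (-outfmt 6, default 12 columns), and a hit is kept only if its
e-value is at or below a hypothetical cutoff (default 1e-10).
"""
from typing import Dict, Iterable, NamedTuple


class Hit(NamedTuple):
    subject: str     # sseqid: database sequence that was hit
    pident: float    # percent identity of the alignment
    evalue: float
    bitscore: float


def best_hits(lines: Iterable[str], max_evalue: float = 1e-10) -> Dict[str, Hit]:
    """Return the lowest-e-value hit (ties broken by higher bitscore) per query."""
    best: Dict[str, Hit] = {}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        cols = line.rstrip("\n").split("\t")
        query = cols[0]
        hit = Hit(cols[1], float(cols[2]), float(cols[10]), float(cols[11]))
        if hit.evalue > max_evalue:
            continue  # not significant under the assumed cutoff
        current = best.get(query)
        if current is None or (hit.evalue, -hit.bitscore) < (current.evalue, -current.bitscore):
            best[query] = hit
    return best


# Hypothetical tabular rows: qseqid sseqid pident length mismatch gapopen
# qstart qend sstart send evalue bitscore
SAMPLE_ROWS = [
    "contig_001\tchicken_A\t92.1\t300\t20\t2\t1\t300\t10\t309\t1e-120\t420",
    "contig_001\thuman_B\t80.0\t280\t50\t3\t5\t285\t20\t299\t1e-60\t230",
    "contig_002\tchicken_C\t60.0\t90\t30\t1\t1\t90\t1\t90\t0.5\t30",
]

if __name__ == "__main__":
    for query, hit in sorted(best_hits(SAMPLE_ROWS).items()):
        print(query, hit.subject, hit.evalue)
```

With the default cutoff, `contig_002` receives no annotation because its only hit (e-value 0.5) is not significant; under this sketch such a sequence counts toward the unannotated fraction rather than receiving a weak label.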

HISTORY

In the first generation of the database (2003-2005), 2,400 clones from the zebra finch brain cDNA library produced by Prof. Juli Wade at Michigan State University ("SB01") were partially sequenced (single reads from the 5' ends of inserts), yielding 1,840 ESTs with a redundancy of 9.56%. Next, a new normalized cDNA library ("SB02") was produced at the Keck Center of the University of Illinois. More than 18,000 of its clones were partially sequenced (single 5' reads), and a third cDNA library ("SB03") was then produced by subtractive depletion of these sequences from the SB02 library. More than 19,000 5' reads were obtained from SB03. In combination, 40,224 "filtered high-quality" sequences were obtained from SB01, SB02 and SB03. Using standard clustering and contig-assembly algorithms, these ESTs were found to represent 17,878 non-redundant sequences. These predicted gene products were annotated by BLAST sequence similarity searches against four external databases: the TIGR Gallus gallus (chicken) EST collection, NCBI chicken UniGene, Swiss-Prot and NR.aa. Approximately 76% of the zebra finch ESTs had highly significant hits against the chicken EST collection.

To produce the second generation of the database, 5' and 3' end reads from ~14,000 zebra finch brain cDNAs were obtained from the Jarvis, Wada et al. group at Duke University and combined with the SB01-SB03 sequences above, yielding a total of 58,211 filtered high-quality sequences. These assembled into 22,628 unique sequences, which were annotated by the Bioinformatics Unit of the Keck Center.

For the third (current) generation of the database, additional rounds of subtraction were performed on the original SB02 library, and additional sequences were incorporated from Xiao-Chieng Li (Rockefeller University).
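The EST and unique-sequence counts quoted for each generation can be compared with a few lines of arithmetic. This sketch assumes that "redundancy" means the fraction of ESTs that do not add a new unique sequence, (ESTs - unique) / ESTs; the text does not define the term, so the 9.56% figure for the first library may have been computed differently and is not reproduced here.

```python
"""Compare collection size with unique assembled sequences per generation.

Assumption: "redundancy" is taken here as (ESTs - unique) / ESTs. The
definition used by ESTIMA:Songbird itself is not stated in the text.
"""

# (generation label, filtered high-quality ESTs, unique assembled sequences)
GENERATIONS = [
    ("first (SB01-SB03)", 40_224, 17_878),
    ("second (+ Duke reads)", 58_211, 22_628),
    ("third (current)", 86_784, 31_658),
]


def redundancy(n_ests: int, n_unique: int) -> float:
    """Fraction of ESTs that fall into an already-represented unique sequence."""
    if n_ests <= 0 or not 0 < n_unique <= n_ests:
        raise ValueError("need 0 < n_unique <= n_ests")
    return (n_ests - n_unique) / n_ests


def ests_per_unique(n_ests: int, n_unique: int) -> float:
    """Average number of ESTs contributing to each unique sequence."""
    return n_ests / n_unique


if __name__ == "__main__":
    for label, n_ests, n_unique in GENERATIONS:
        print(f"{label:24s} redundancy={redundancy(n_ests, n_unique):.1%} "
              f"ESTs/unique={ests_per_unique(n_ests, n_unique):.2f}")
```

Under that assumed definition, the share of ESTs falling into already-represented sequences rises from roughly 56% in the first generation to roughly 64% in the current one.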