1. What can I find in the Panda Database?
We set up this database to present the entire DNA sequence of the giant panda Jingjing, assembled from 3.4 billion reads (176 Gbp of raw data) generated by the Illumina Genome Analyzer. In total, 36.2% of the genome was identified as transposable elements, and a reference set of 21,001 genes was created. We also identified the ncRNAs, CpG islands, and promoters in the genome. Beyond the regular annotation, 2.7 million heterozygous SNPs, 267,958 Indels, and 4,379 SVs were identified. All these features are presented in Map View and are freely available for download. Furthermore, a BLAST web service offers online alignment of query sequences against the panda genome.
2. How do I use this database?
There are six functional pages at the top level: Home, a short introduction to the panda project and database; Mapview, a genome browser that shows detailed annotation information for each genomic region; Blast, a homology search service against the panda genome; Download, data access through FTP; Help, equivalent to an FAQ, with answers to the main questions; and Links, related genome resources and bioinformatics tools. The help information on each page explains how to use it. If you have any further questions about the data, please contact email@example.com.
3. How did you assemble such a large genome from such short reads?
We assembled the short reads using SOAPdenovo, a genome assembler based on de Bruijn graph theory. First, we assembled the short reads from small insert-size libraries (<500 bp) into contigs using only sequence overlap information, and broke the contigs at the boundaries of ambiguous connections caused by repeat sequences. Then the paired-end information was used step by step, from the smallest (150 bp) to the largest (10 kb) insert size, to join the contigs into scaffolds. Finally, we used the paired-end information to retrieve reads and performed local assembly to fill the small gaps inside scaffolds.
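To give a feel for the contig step, here is a minimal sketch of a de Bruijn graph built from toy reads, with a walk that stops at the first ambiguous connection. It is an illustration only and not the SOAPdenovo implementation: the function names build_de_bruijn and extend_contig are hypothetical, and the sketch ignores reverse complements, sequencing errors, and all paired-end scaffolding and gap filling.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Map each (k-1)-mer to the set of (k-1)-mers that follow it in some read."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def extend_contig(graph, start):
    """Walk forward from `start` while the path is unambiguous.

    The walk stops when a node has zero or several successors, when the
    next node has several predecessors, or when a node would be revisited.
    This mimics breaking contigs at ambiguous (repeat) connections.
    """
    indegree = defaultdict(int)
    for successors in graph.values():
        for succ in successors:
            indegree[succ] += 1

    contig = start
    node = start
    visited = {start}
    while len(graph.get(node, ())) == 1:
        (nxt,) = graph[node]
        if indegree[nxt] != 1 or nxt in visited:
            break
        contig += nxt[-1]  # each step adds one new base
        visited.add(nxt)
        node = nxt
    return contig


# Toy example: three overlapping reads from the sequence ATGGCGTACC.
reads = ["ATGGCG", "GGCGTA", "CGTACC"]
graph = build_de_bruijn(reads, k=4)
print(extend_contig(graph, "ATG"))  # ATGGCGTACC
```

If two reads disagree after a shared prefix (for example ATGCA and ATGCT), the walk stops at the branch point, which is the kind of ambiguous connection at which contigs are broken.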
4. How did you call SNPs, Indels and SVs by sequencing only one panda?
We sequenced the diploid genome of a 3-year-old female panda chosen from the Chengdu Research Base of Giant Panda Breeding. Captive breeding of pandas follows the principle of maintaining genetic polymorphism, so there is high divergence between the two parental haploid genomes within a single panda. Using the assembled panda genome sequence as the reference, we realigned all the sequencing reads onto the genome to identify the heterozygous SNPs, small Indels, and SVs (structural variations).
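The idea of finding heterozygous sites from realigned reads can be sketched as follows. This is a simplified allele-fraction filter for illustration, not the calling method actually used for the panda genome. The function name call_heterozygous and the thresholds min_depth and min_fraction are assumptions chosen for the example.

```python
from collections import Counter


def call_heterozygous(bases, min_depth=10, min_fraction=0.25):
    """Return the two alleles at a site if it looks heterozygous, else None.

    `bases` holds the read bases aligned to one reference position.
    The thresholds are illustrative assumptions, not published values.
    """
    counts = Counter(b for b in bases if b in "ACGT")
    depth = sum(counts.values())
    if depth < min_depth:
        return None  # too few reads to make a call
    top = counts.most_common(2)
    if len(top) < 2:
        return None  # only one allele observed
    (allele1, _), (allele2, second_count) = top
    if second_count / depth >= min_fraction:
        return tuple(sorted((allele1, allele2)))
    return None  # second allele is rare enough to be sequencing error


print(call_heterozygous("AAAAAAGGGGGG"))  # ('A', 'G')
print(call_heterozygous("AAAAAAAAAAAG"))  # None
```

In a diploid genome, reads at a heterozygous site come from both parental haplotypes, so two alleles each appear in a substantial share of the reads. A single mismatching read is more likely a sequencing error.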
5. Contact us
Beijing Genomics Institute (BGI) - Shenzhen
Tel: +86 (0) 755 25273910
Fax: +86 (0) 755 25273620
Address: Complex Building, Beishan Industrial Zone, Beishan Road, Yantian District, Shenzhen, China, 518083.