The genome of Schistosoma haematobium

B Li, S Liu, L Yang, Z Xiong, Y Li, X Xu, F Chen, X Wu, G Zhang, X Fang, Y Kang, H Yang, J Wang, J Wang & Z Yan
Schistosoma haematobium is an important digenetic trematode found in the Middle East, India, Portugal and Africa. It is a major agent of schistosomiasis; more specifically, it is associated with urinary schistosomiasis. Adults are found in the venous plexuses around the urinary bladder, and the released eggs traverse the wall of the bladder, causing haematuria and fibrosis of the bladder. The bladder becomes calcified, and there is increased pressure on ureters and kidneys otherwise...

Hepatocellular carcinoma genomic data from the Asian Cancer Research Group

Z Kan, H Zheng, X Liu, S Li, TD Barber, Z Gong, H Gao, K Hao, J Xu, R Hauptschein, PA Rejto, J Fernandez, G Wang, Q Zhang, B Wang, R Chen, J Wang, NP Lee, WH Lee, PN Ariyaratne, C Tennakoon, FH Mulawadi, KF Wong, AM Liu … & The Asian Cancer Research Group
Hepatocellular carcinoma (HCC) is one of the most common solid tumors worldwide and represents the third leading cause of cancer deaths. Hepatitis B virus (HBV) is a major etiologic agent, leading to an increased risk of developing HCC, in particular in those with acute liver disease and cirrhosis. The Asian Cancer Research Group (ACRG) is an independent, not-for-profit company established to accelerate research and improve treatment for patients affected with the most commonly diagnosed cancers in Asia....

Type 2 Diabetes gut metagenome (microbiome) data from 368 Chinese samples and updated metagenome gene catalog

S Li, Y Guan, W Zhang, F Zhang, Z Cai, W Wu, D Zhang, Z Jie, S Liang, D Shen, Y Qin, R Xu, M Wang, M Gong, J Yu, Y Zhang, L Han, D Lu, P Wu, Y Dai, X Sun, Z Li, A Tang, S Zhong, X Li … & J Wang
We provide data from the sequenced and analyzed gut metagenome of 368 Chinese individuals with Type 2 Diabetes (T2D) and healthy controls used in a newly developed two-stage Metagenome-Wide Association Study (MWAS) aimed at identifying associations between gut microbiota and Type 2 Diabetes. The data here include an updated metagenome gene catalog, metagenome assemblies, genetic and functional markers associated with T2D, and a novel form of marker, a Metagenomic Linkage Marker (MGL), that...

Bisulfite-PCR combined with cloning Sanger sequencing data for validating DNA methylation level in Trichinella spiralis

F Gao, J Wang & G Ji
Trichinella spiralis is the smallest nematode parasite of humans and also infects rats, pigs and bears. Responsible for the disease trichinosis, it is often referred to as the “pork worm” because infection is usually caused by the consumption of undercooked pork products. Adults mature in the intestines of an intermediate host, and each female produces batches of larvae that bore through the intestinal wall and the lymphatic system. They are...

Single cell whole-exome sequences of bladder cancer from an individual

Y Li, X Xu, L Song, Y Hou, F Li, K Wu, H Wu, J Liang, M Jian, J Li, X Zhang, J Wang, H Yang & J Wang
This dataset contains single-cell and whole-tissue sequencing and annotation data from a muscle-invasive bladder transitional cell carcinoma from one individual. The available data include single-cell whole-exome sequences from 55 individual cells (44 from the tumor and 11 from normal adjacent tissue) and whole-tissue DNA sequence data from this cancer and the matched normal tissue. Additional data include alignments, SNP calls, and high-confidence somatic mutation calls with their allelic frequencies.

Genomic data of the Puerto Rican Parrot (Amazona vittata) from a locally funded project

TK Oleksyk, W Guiblet, JF Pombert, R Valentin & JC Martinez-Cruzado
These data represent the first assembly of a genome sequence for a critically endangered parrot (Amazona vittata) endemic to the United States, and also the first genome of a species from the diverse and ecologically important genus Amazona, native to South America and the Caribbean. One sample was selected from a non-reproductive female at the Rio Abajo Breeding Facility in Puerto Rico (IACUC#201109.1) and sequenced on the Illumina HiSeq platform with both fragment and paired-end sequencing...

The genome of Darwin’s Finch (Geospiza fortis)

G Zhang, P Parker, B Li, H Li & J Wang
The Medium Ground Finch, Geospiza fortis, is one of the species of finches first collected by Charles Darwin on the Galapagos Islands, and is emblematic for its involvement in the development of evolutionary theory by Darwin and for confirming the action of natural selection. It is also known as a Darwin's finch. We have sequenced a female individual at 115X coverage with HiSeq data and produced a high-quality draft genome assembly. The genome size of Geospiza fortis...

Updated genome assembly of YH: the first diploid genome sequence of a Han Chinese individual (version 2, 07/2012)

J Wang, Y Li, R Luo, B Liu, Y Xie, Z Li, X Fang, H Zheng, J Qin, B Yang, C Yu, P Ni, N Li, G Guo, J Ye, L Fang, Y Su, Asan, H Zheng, K Kristiansen, GK Wong, R Nielsen, R Durbin, L Bolund, X Zhang … & J Wang
Updated genomic data from the YH (Homo sapiens) diploid genome – the first sequenced Han Chinese individual, a representative of the Asian population. The genomic DNA used in this study came from an anonymous male Han Chinese individual who has no known genetic diseases. The original version of the YH genome was assembled based on 3.3 billion reads using the Illumina Genome Analyzer (see dataset doi:10.5524/100015). This latest (as of 07/2012) and improved version of...

Genomic data from an extinct Palaeo-Eskimo

M Rasmussen, Y Li, S Lindgreen, JS Pedersen, A Albrechtsen, I Moltke, M Metspalu, E Metspalu, T Kivisild, R Gupta, M Bertalan, K Nielsen, MT Gilbert, Y Wang, M Raghavan, PF Campos, HM Kamp, AS Wilson, A Gledhill, S Tridico, M Bunce, ED Lorenzen, J Binladen, X Guo, J Zhao … & E Willerslev
Available here is the genome of a male individual from an extinct Palaeo-Eskimo culture, the first known group of Homo sapiens to settle in Greenland. The DNA sample was obtained from ~4,000-year-old permafrost-preserved hair, and was shown to have very low modern DNA contamination. The diploid genome was sequenced to an average depth of 20x using Illumina GAII sequencing platforms, with 79% recovery. Correctly indexed reads were mapped to the human genome (hg18) with a...

Resources for the MeDUSA (Methylated DNA Utility for Sequence Analysis) MeDIP-seq computational analysis pipeline for the identification of differentially methylated regions, and associated methylome data from 18 wild-type and mutant mouse ES, NP and MEF cells

G Wilson, P Dharmi, Y Saito, D Cortazar, C Kunz, P Schär & S Beck
Here we present 18 genome-wide DNA methylation profiles of wild type and Thymine DNA glycosylase (Tdg) knockout cells, which serve as an excellent murine methylome resource. The 18 samples represent 6 biological cohorts: 6 samples were derived from mouse embryonic stem cells (3 Tdg+/-, 3 Tdg-/-), 6 samples were from mouse neural precursor cells (3 Tdg+/-, 3 Tdg-/-) and 6 samples were obtained from mouse embryonic fibroblasts (3 Tdg+/+, 3 Tdg-/-). Next generation sequencing was...

Genomic data from the Wuzhishan inbred pig (Sus scrofa)

X Fang, Z Huang, Y Li, Y Feng, Y Chen, X Jiang & L Yang
The pig (Sus scrofa) is an economically important food source and also serves as an important model organism. Here we present the sequence of one male inbred Wuzhishan pig (WZSP) at generation F20. DNA was extracted from the ear of this individual. We applied a whole-genome shotgun strategy and next-generation sequencing technologies, using the Illumina HiSeq 2000 for sequencing. In total, 340.53 Gb of raw data were generated and 210.79 Gb of high-quality data were...

Software and supporting material for “SOAPdenovo2: An empirically improved memory-efficient short read de novo assembly”

R Luo, B Liu, Y Xie, Z Li, W Huang, J Yuan, G He, Y Chen, Q Pan, Y Liu, J Tang, G Wu, H Zhang, Y Shi, Y Liu, C Yu, B Wang, Y Lu, C Han, D Cheung, SM Yiu, G Liu, X Zhu, S Peng, Y Li … & J Wang
SOAPdenovo2 is the latest de novo genome assembly package from BGI’s SOAP (short oligonucleotide analysis package) suite of tools (homepage here: http://soap.genomics.org.cn/). Compared to SOAPdenovo1, this new version has the advantage of a new algorithm design that reduces memory consumption in graph construction, resolves more repeat regions in contig assembly, increases coverage and length in scaffold construction, improves gap closure, and is optimized for large genomes. Using new sequencing data from the YH (Homo sapiens)...

Genomic data from the Pacific oyster (Crassostrea gigas)

G Zhang, X Fang, X Guo, L Li, R Luo, F Xu, P Yang, L Zhang, X Wang, H Qi, Y Zhu, L Yang & Z Huang
The Pacific oyster (Crassostrea gigas) belongs to one of the most species-rich but genomically poorly explored phyla, the Mollusca. In this study, DNA was extracted from an oyster derived from four generations of full-sib mating. Short-reads and a fosmid-pooling strategy were used for sequencing and assembly of the oyster genome. DNA extracted from the inbred oyster and another wild oyster was used for resequencing analysis. Finally, RNA-seq and small RNA from different organs at different...

