Quality control of next generation sequencing data using random forests as probability machines

Silke Szymczak, Hua Ling, Terri H Beaty & Joan E Bailey-Wilson
Introduction: Next-generation DNA sequencing technologies are a promising tool for identifying rare genetic variants that control susceptibility to complex diseases. Sequencing many individuals typically identifies thousands of rare variants, some of which are sequencing or alignment artifacts. Quality …