Data for: Poly(A) Dataset for PAS sequences and pseudo-PAS sequences Classification (fasta format)

Fahad Albalawi, Abderrazak Chahid, Xingang Guo, Somayah Albaradei, Arturo Magana-Mora, Boris R. Jankovic, Mahmut Uludag, Christophe Van Neste, Magbubah Essack, Taous-Meriem Laleg-Kirati & Vladimir B. Bajic
This dataset contains DNA sequences of the human genome hg38 from the GENCODE folder on the EBI FTP server (ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_28/GRCh38.primary_assembly.genome.fa.gz).

A - Positive set (PAS sequences)

From the GENCODE annotation for poly(A) (ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_28/gencode.v28.polyAs.gff3.gz), we selected the poly(A) signal annotations. Using the bedtools slop option, we extended each region 300 bp upstream and 300 bp downstream of the poly(A) hexamer. With the bedtools getfasta option, we extracted 606 bp fasta sequences from these regions (300 bp + the 6 bp hexamer + 300 bp). After eliminating duplicates, we obtained 37,516 presumed true functional poly(A) signal...
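The window arithmetic described above can be sketched in a few lines of Python. This is only an illustration of why a 300 bp extension on each side of a 6 bp hexamer gives 606 bp, and of what duplicate removal might look like. It is not the authors' pipeline, which used bedtools. The function names, the example coordinates, and the chromosome size are hypothetical, and the interval is assumed to use 0-based, half-open (BED-style) coordinates.

```python
def extend_interval(start, end, flank, chrom_size):
    """Extend a 0-based, half-open interval by `flank` bp on each side.

    The result is clamped to [0, chrom_size], so a window near a
    chromosome end can be shorter than the nominal length.
    """
    return max(0, start - flank), min(chrom_size, end + flank)


def drop_duplicates(sequences):
    """Remove duplicate sequences while keeping first-seen order."""
    return list(dict.fromkeys(sequences))


# Hypothetical hexamer occupying positions [1000, 1006) on a
# hypothetical 10,000 bp chromosome (assumed values, not from the dataset).
start, end = extend_interval(1000, 1006, flank=300, chrom_size=10_000)
window_length = end - start
print(start, end, window_length)  # prints: 700 1306 606
```

Windows that would run past a chromosome end come out shorter than 606 bp in this sketch. The description does not say how such cases were handled in the dataset.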