Introduction to Next-Generation Sequencing Data Analysis (PPT Courseware)

Overview

This slide deck introduces next-generation sequencing (NGS) data analysis. It covers the principles and workflow of resequencing, data structure and quality assessment (the FASTQ format and FastQC), the SRA database and data retrieval, and the use of Bowtie2, BWA, and SAMtools.


Introduction to Next-Generation Sequencing Data Analysis
童春发, 2013.12.23

Main topics
- Principles and workflow of resequencing
- Data structure and quality assessment
- The SRA database and data retrieval
- Using Bowtie2, BWA, and SAMtools

Principles and workflow of resequencing

Data structure and quality assessment
- FASTQ format
- FastQC

FASTQ format (http://en.wikipedia.org)
Each FASTQ record takes four lines: a sequence identifier line beginning with '@', the raw sequence, a separator line beginning with '+', and a quality string with one character per base.
Illumina sequence identifiers: with Casava 1.8 the format of the '@' line has changed.

Quality
A quality value Q is an integer mapping of p (i.e., the probability that the corresponding base call is incorrect).
- Phred quality score: Q = -10 log10(p)
- The Solexa pipeline (i.e., the software delivered with the Illumina Genome Analyzer) earlier used a mapping based on the odds p/(1-p): Q = -10 log10(p/(1-p))

Quality encoding
- Sanger format can encode a Phred quality score from 0 to 93 using ASCII 33 to 126.
- Illumina's newest version (1.8) of their pipeline CASAVA directly produces FASTQ in Sanger format.
- Solexa/Illumina 1.0 format can encode a Solexa/Illumina quality score from -5 to 62 using ASCII 59 to 126.
- Starting with Illumina 1.3 and before Illumina 1.8, the format encoded a Phred quality score from 0 to 62 using ASCII 64 to 126.
- Starting in Illumina 1.5 and before Illumina 1.8, the Phred scores 0 to 2 have a slightly different meaning.
ASCII stands for American Standard Code for Information Interchange.

FastQC
http://www.bioinformatics.babraham.ac.uk/projects/fastqc/
On Windows, double-click "run_fastqc.bat" to run FastQC.
The analysis results cover 11 modules. Each module is marked with one of three symbols:
- Green tick: normal
- Orange triangle: slightly abnormal
- Red cross: very unusual
The 11 modules:
 1. Basic Statistics
 2. Per Base Sequence Quality
 3. Per Sequence Quality Scores
 4. Per Base Sequence Content
 5. Per Base GC Content
 6. Per Sequence GC Content
 7. Per Base N Content
 8. Sequence Length Distribution
 9. Duplicate Sequences
10. Overrepresented Sequences
11. Overrepresented Kmers
Saving a Report

The SRA database and data retrieval
View and download SRR576183.
Use fastq-dump to convert the SRA file to FASTQ format:
    fastq-dump --split-files -DQ "+" ./SRR576183.sra
    fastq-dump --split-files -DQ "+" --gzip ./SRR576183.sra
Or download the FASTQ data directly:
    ftp://ftp.era.ebi.ac.uk/vol1/fastq/SRR576/SRR576183

Aligning reads to a reference sequence
Tools: BWA, Bowtie2, SOAP, SAMtools

BWA
http://bio-bwa.sourceforge.net/
https://github.com/lh3/bwa
Install:
    wget http://sourceforge.net/projects/bio-bwa/files/bwa-0.7.5a.tar.bz2
    tar -xjvf bwa-0.7.5a.tar.bz2
    cd bwa-0.7.5a
    make
Download test.tar.gz from ftp://202.119.214.193, then index the reference and align single-end and paired-end reads:
    ../bwa-0.7.5a/bwa index ref.fa
    ../bwa-0.7.5a/bwa mem ref.fa test_PE1.fa > aln-se.sam
    ../bwa-0.7.5a/bwa mem ref.fa test_PE1.fa test_PE2.fa > aln-pe.sam

Bowtie2
http://bowtie-bio.sourceforge.net/bowtie2/index.shtml
Download bowtie2-2.1.0-linux-x86_64.zip, then:
    unzip bowtie2-2.1.0-linux-x86_64.zip
    mv bowtie2-2.1.0 bowtie2
    cd bowtie2/example
    mkdir work
    cd work
Index a reference genome:
    ../../bowtie2-build ../reference/lambda_virus.fa lambda_virus
Align single-end reads:
    ../../bowtie2 -x lambda_virus -U ../reads/reads_1.fq -S eg1.sam
Align paired-end reads:
    ../../bowtie2 -x lambda_virus -1 ../reads/reads_1.fq -2 ../reads/reads_2.fq -S eg2.sam
Options: -U specifies unpaired reads; -S specifies the output file in SAM format.

SAM output
Each alignment record has the following tab-separated fields:
 1. Name of read that aligned.
 2. Sum of all applicable flags. Flags relevant to Bowtie are:
      1: The read is one of a pair
      2: The alignment is one end of a proper paired-end alignment
      4: The read has no reported alignments
      8: The read is one of a pair and has no reported alignments
     16: The alignment is to the reverse reference strand
     32: The other mate in the paired-end alignment is aligned to the reverse reference strand
     64: The read is mate 1 in a pair
    128: The read is mate 2 in a pair
 3. Name of reference sequence where alignment occurs.
 4. 1-based offset into the forward reference strand where leftmost character of the alignment occurs.
 5. Mapping quality.
 6. CIGAR string representation of alignment.
 7. Name of reference sequence where mate's alignment occurs. Set to = if the mate's reference sequence is the same as this alignment's, or * if there is no mate.
 8. 1-based offset into the forward reference strand where leftmost character of the mate's alignment occurs. Offset is 0 if there is no mate.
 9. Inferred fragment size. Size is negative if the mate's alignment occurs upstream of this alignment. Size is 0 if there is no mate.
10. Read sequence (reverse-complemented if aligned to the reverse strand).
11. ASCII-encoded read qualities (reversed if the read aligned to the reverse strand). The encoded quality values are on the Phred quality scale and the encoding is ASCII-offset by 33 (ASCII char !), similarly to a FASTQ file.
12. Optional fields. Fields are tab-separated. bowtie2 outputs zero or more of these optional fields for each alignment, depending on the type of the alignment.

Optional fields:
- AS:i: Alignment score. Only present if SAM record is for an aligned read.
- XS:i: Alignment score for second-best alignment. Only present if the SAM record is for an aligned read and more than one alignment was found for the read.
- YS:i: Alignment score for opposite mate in the paired-end alignment. Only present if the SAM record is for a read that aligned as part of a paired-end alignment.
- XN:i: The number of ambiguous bases in the reference covering this alignment. Only present if SAM record is for an aligned read.
- XM:i: The number of mismatches in the alignment. Only present if SAM record is for an aligned read.
- XO:i: The number of gap opens, for both read and reference gaps, in the alignment. Only present if SAM record is for an aligned read.
- XG:i: The number of gap extensions, for both read and reference gaps, in the alignment. Only present if SAM record is for an aligned read.
- NM:i: The edit distance; that is, the minimal number of one-nucleotide edits (substitutions, insertions and deletions) needed to transform the read string into the reference string. Only present if SAM record is for an aligned read.
- YP:i: Equals 1 if the read is part of a pair that has at least N concordant alignments, where N is the argument specified to -M plus one. Equals 0 if the read is part of a pair that has fewer than N alignments. E.g. if -M 2 is specified and 3 distinct, concordant paired-end alignments are found, YP:i:1 will be printed. If fewer than 3 are found, YP:i:0 is printed. Only present if SAM record is for a read that aligned as part of a paired-end alignment.
- YM:i: Equals 1 if the read aligned with at least N unpaired alignments, where N is the argument specified to -M plus one. Equals 0 if the read aligned with fewer than N unpaired alignments. E.g. if -M 2 is specified and 3 distinct, valid, unpaired alignments are found, YM:i:1 is printed. If fewer than 3 are found, YM:i:0 is printed. Only present if SAM record is for a read that Bowtie 2 attempted to align in an unpaired fashion.
- YF:Z: String indicating reason why the read was filtered out. Only appears for reads that were filtered out.
- MD:Z: A string representation of the mismatched reference bases in the alignment. Only present if SAM record is for an aligned read.
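As an illustration of the FLAG and quality fields in the SAM output described above, the following shell sketch splits a FLAG value into the bit meanings listed for Bowtie and converts one quality character to a Phred score, assuming the offset-33 encoding. The function names (decode_flag, qual_char_to_phred, phred_to_error_prob) and the short labels are made up for this example and are not part of Bowtie2 or SAMtools. The error probability uses the standard Phred relation p = 10^(-Q/10).

```shell
#!/bin/sh
# Illustrative helpers only; names and labels are hypothetical.

# Print the labels of all FLAG bits (1..128, as listed in the SAM output
# section) that are set in the given integer, comma-separated.
decode_flag() {
    flag=$1
    out=""
    for pair in \
        "1:paired" \
        "2:proper_pair" \
        "4:unaligned" \
        "8:mate_unaligned" \
        "16:reverse_strand" \
        "32:mate_reverse_strand" \
        "64:mate1" \
        "128:mate2"
    do
        bit=${pair%%:*}     # number before the colon
        name=${pair#*:}     # label after the colon
        if [ $(( flag & bit )) -ne 0 ]; then
            out="${out:+$out,}$name"
        fi
    done
    echo "${out:-none}"
}

# Convert one quality character to a Phred score.
# Assumes the ASCII offset of 33 used in SAM and Sanger-format FASTQ.
qual_char_to_phred() {
    code=$(printf '%d' "'$1")   # ASCII code of the character
    echo $(( code - 33 ))
}

# Convert a Phred score Q to the error probability p = 10^(-Q/10).
phred_to_error_prob() {
    awk -v q="$1" 'BEGIN { printf "%.6f\n", 10 ^ (-q / 10) }'
}

decode_flag 99            # paired,proper_pair,mate_reverse_strand,mate1
decode_flag 4             # unaligned
qual_char_to_phred I      # 40
phred_to_error_prob 40    # 0.000100
```

A FLAG of 99 is 1 + 2 + 32 + 64, which is why four labels are printed. The functions use plain global variables so that they also run in shells without the local keyword.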
SAMtools
http://samtools.sourceforge.net/

Install SAMtools
Download samtools-0.1.19.tar.bz2 and unpack it:
    tar -xjvf samtools-0.1.19.tar.bz2
Or clone the repository:
    git clone git://github.com/samtools/samtools.git
Then build (after git clone the directory is named samtools):
    cd samtools-0.1.19
    make

SAMtools: Primer Tutorial
http://biobits.org/samtools_primer.html
Steps covered:
- Sample Data Files
- Aligning Reads Using Bowtie2
- Converting SAM to BAM
- Sorting and Indexing
- Identifying Genomic Variants
- Understanding the VCF Format
- Visualizing Reads

Sample Data Files
    unzip samtools_primer-master.zip

Aligning Reads Using Bowtie2
    cd samtools_primer-master
    ~/bowtie2/bowtie2 -x indexes/e_coli -U simulated_reads/sim_reads.fq -S sim_reads_aligned.sam

Converting SAM to BAM
    ~/samtools-0.1.19/samtools view -b -S -o sim_reads_aligned.bam sim_reads_aligned.sam

Sorting and Indexing
    ~/samtools-0.1.19/samtools sort sim_reads_aligned.bam sim_reads_aligned.sorted
    ~/samtools-0.1.19/samtools index sim_reads_aligned.sorted.bam

Identifying Genomic Variants
    ~/samtools-0.1.19/samtools mpileup -g -f genomes/NC_008253.fna sim_reads_aligned.sorted.bam > sim_variants.bcf
    ~/samtools-0.1.19/bcftools/bcftools view -c -v sim_variants.bcf > sim_variants.vcf

Understanding the VCF Format
http://www.1000genomes.org/wiki/Analysis/Variant%20Call%20Format/vcf-variant-call-format-version-41

Visualizing Reads
    ~/samtools-0.1.19/samtools tview sim_reads_aligned.sorted.bam genomes/NC_008253.fna
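The non-interactive tutorial steps above (align, convert, sort, index, call variants) can be chained into one script. The sketch below is a dry-run wrapper that prints each command by default instead of executing it. The helper names (run, run_to, pipeline) and the variables DRY_RUN, BOWTIE2, SAMTOOLS, and BCFTOOLS are illustrative additions; the tool paths, file names, and options are the ones shown in the slides and follow samtools 0.1.19 syntax, so other versions may need different options.

```shell
#!/bin/sh
# Hypothetical wrapper around the tutorial commands; defaults match the slides.
BOWTIE2=${BOWTIE2:-$HOME/bowtie2/bowtie2}
SAMTOOLS=${SAMTOOLS:-$HOME/samtools-0.1.19/samtools}
BCFTOOLS=${BCFTOOLS:-$HOME/samtools-0.1.19/bcftools/bcftools}
DRY_RUN=${DRY_RUN:-1}   # 1 = only print commands, 0 = execute them

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Same as run, but the command's standard output goes to the file in $1.
run_to() {
    target=$1
    shift
    if [ "$DRY_RUN" = 1 ]; then
        echo "$* > $target"
    else
        "$@" > "$target"
    fi
}

# Run the tutorial steps for one output prefix, e.g. sim_reads_aligned.
# Expected to be called from inside the samtools_primer-master directory.
pipeline() {
    prefix=$1
    run "$BOWTIE2" -x indexes/e_coli -U simulated_reads/sim_reads.fq -S "$prefix.sam"
    run "$SAMTOOLS" view -b -S -o "$prefix.bam" "$prefix.sam"
    run "$SAMTOOLS" sort "$prefix.bam" "$prefix.sorted"
    run "$SAMTOOLS" index "$prefix.sorted.bam"
    run_to sim_variants.bcf "$SAMTOOLS" mpileup -g -f genomes/NC_008253.fna "$prefix.sorted.bam"
    run_to sim_variants.vcf "$BCFTOOLS" view -c -v sim_variants.bcf
}

pipeline sim_reads_aligned
```

Setting DRY_RUN=0 in the environment would execute the commands for real. The tview step is left out because it opens an interactive viewer.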
