(本文由GeneDock Senior Bioinformatics Engineer 许雄翻译撰写,转载请保留作者信息和原文链接)
usage
samtools
Program: samtools (Tools for alignments in the SAM format)
Version: 1.3 (using htslib 1.3)
Usage: samtools <command> [options]
Commands:
-- Indexing
dict create a sequence dictionary file
faidx index/extract FASTA
index index alignment
-- Editing
calmd recalculate MD/NM tags and '=' bases
fixmate fix mate information
reheader replace BAM header
rmdup remove PCR duplicates
targetcut cut fosmid regions (for fosmid pool only)
addreplacerg adds or replaces RG tags
-- File operations
collate shuffle and group alignments by name
cat concatenate BAMs
merge merge sorted alignments
mpileup multi-way pileup
sort sort alignment file
split splits a file by read group
quickcheck quickly check if SAM/BAM/CRAM file appears intact
fastq converts a BAM to a FASTQ
fasta converts a BAM to a FASTA
-- Statistics
bedcov read depth per BED region
depth compute the depth
flagstat simple stats
idxstats BAM index stats
phase phase heterozygotes
stats generate stats (former bamcheck)
-- Viewing
flags explain BAM flags
tview text alignment viewer
view SAM<->BAM<->CRAM conversion
depad convert padded BAM to unpadded BAM
升级日志
- 移除了过时的samtools sort in.bam out.prefix使用说明。如果你仍在使用-f,-o,或者out.prefix,则可转用-T PREFIX,-o FILE替代。
- 将bamshuf重命名成collate。
- mpileup命令在VCF/BCF中现将不可见的等位基因输出为<*>,替换掉了之前的X或,–output-tags注释加上了AD, ADF, ADR, INFO/AD, INFO/ADF, INFO/ADR,取代了之前的DV, DP4, DPR。
- mpileup现在在每个碱基位置采用BAQ计算,忽略了之前的-l 和 -r选项。
- samtools现在有一个configure脚本,这个脚本用来检查安装环境和帮助筛选HTSlib的安装环境。
- Samtools的Makefile文件现在支持可根据需要重写的CC/CPPFLAGS/CFLAGS/LDFLAGS/LIBS标准变量。
- 新增addreplacerg子命令,可用于增加或更改@RG header和添加RG:Z 记录项。
- rmdup不再立即终止,但仍不推荐大量使用。
- 合并上百万个headers现在只需花费合理的时间即可完成。
- samtools index 支持可选的输出路径选项。
- 修复了将远超过参考序列结束位置坐标给予未成功比对的reads时,calmd、targetcut、和潜在的 mpileup 字段错误。
- 如果你在源代码中使用bam_md.c’s bam_fillmd1_core(), bam_cap_mapQ(), or bam_prob_realn_core()这几个函数时,需要注意额外的ref_len选项。
- tview命令颜色方案改成了更加适应色盲者的颜色。
- samtools depad命令现可操作CIGAR记录项的N,而且支持CRAM文件。
- samtools stats 按列输出ATCG在每个碱基位的含量的N、other统计。
- samtools depth 增加-a选项,支持输出深度为0的位点。
- 增加了samtools dict子命令,与Picard用法一致,用于对参考序列建立字典。
- 在sam.h中增加了sam_index_load()和samfetch() API,提供了类似bam_fetch()迭代BAM或CRAM文件的方式。
- 修复了当写sam文件的时候,samopen()只能以wh模式写header。
- samtools fixmate - -又能以数据流的模式工作。
- 恢复了当写压缩级别为0的bam文件时,之前的samtools calmd -u选项。Samtools 1.0 到 1.2 版本写non-BGZF BAM文件不正确,以致其他工具不能读此bam文件。
- 恢复了bam_nt16_nt4_table[]到bam.h头文件。
建立参考基因组字典
samtools dict
About: Create a sequence dictionary file from a fasta file
Usage: samtools dict [options] <file.fa|file.fa.gz>
Options: -a, --assembly STR assembly
-H, --no-header do not print @HD line
-o, --output STR file to write out dict file [stdout]
-s, --species STR species
-u, --uri STR URI [file:///abs/path/to/file.fa]
time samtools dict ../testData/GRCh37.p13.genome.fa -o ../testData/GRCh37.p13.genome.fa.dict
real 0m53.134s
user 0m16.855s
sys 0m2.020s
建立参考基因组索引
time samtools faidx ../testData/GRCh37.p13.genome.fa
real 0m13.375s
user 0m12.553s
sys 0m0.771s
建好参考基因组索引之后,可检索出基因组上任意大小任意区域的序列
time samtools faidx ../testData/GRCh37.p13.genome.fa chr17:7564097-7565197
>chr17:7564097-7565197
CAGCCTGGGCAATGAGAGCGAAACTCTGTCTCAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AGATAACAAAAAGAACAAAAGAACAAATATATCTAATGTGTTCCAATTAAATTGAAATGC
TAATTTTGGCCATGTAATAATTCTCCTGTATGAGAACATCTCCTGTATTTGTTACTTGCT
TTCCTTGGTTTGTCTACCACTCCATATGTCACATCCACCTCATTCCAGCAGTTATATAGT
AAGTTTTCTTTTTTTTTTTCTCGTTTCTTTTTTTTTTTTTTTGATGGAGTCTCACTCTTG
TTGCCCAGGCTGGAGTACAGTAGCGTGATCTCAGCTCGCTGCAGCTTCCACCTCTCGAGT
TCAAGTGATTCTCCTGCCTCAGCCTCCCAAGTAGCTGGGATTACAGGCATGTGCCACCAT
GCCTGGCTAATTTTGTATTTTTAGTAGAGATGGGGTTTCACCATGTTGGTCAGGCTGGTC
TTGAACTCCTGACCTCATGTGATCCACTGCCTTGGACTCCCAAAGTGCTGGGATTACAGG
CGTGAGCCACTGCACCTGCCTAGCCTATACTAAGTTCTCTAAAATCTGGCAAGGCAAATA
GGACCCCCTCTCCATTCTGTTGCTTATTATTTACTTATTATTTACAATTTTATTATTTAT
GTTATTTTATATTATTTATTTTATTTATTGAGACAGTCTCCCGCTGTGTCGTCCAGGCTG
GAGTGCAGTGGCACGATTTCAGCTTACTGCAACCTCTGCCTCCCGGTTTCAAGGGATTCT
CCTGCCTCAGCCTCTCGAGTAGCTGGGATTATAGGTATGCACTACCACGCCTGGCTAAGT
TTTGTATTTTTAGTATAGACGGGGTTTTGCCATGTTGCCCAGACTGGTCCGGAACTCCTG
AGCTGAAAGCGATTCACCCGCCTTGGCCTCCCAAAGTGCTGGGATTACAGGCGGGAGCCA
CCGTGCCCGGCCTCCAGTATTTTGTTTATTTATTTTTTTTGAGACAGAGTCTCACTCTGT
TGCACAGGCTGGAGTGCAGTGGCACAATCTCTGCTCACTGCAACCTCCTCCTCCCTGGTT
AAGAGATCCTCCTGCCTCAGC
real 0m0.003s
user 0m0.000s
sys 0m0.005s
为bam文件建立索引(需要已排序的bam文件)
samtools index
Usage: samtools index [-bc] [-m INT] <in.bam> [out.index]
Options:
-b Generate BAI-format index for BAM files [default]
-c Generate CSI-format index for BAM files
-m INT Set minimum interval size for CSI indices to 2^INT [14]
time samtools index 5667a836534680001d0ce64c.sort.bam
real 2m5.484s
user 1m26.812s
sys 0m3.096s
对bam文件随机打乱顺序
samtools collate
Usage: samtools collate [-Ou] [-n nFiles] [-c cLevel] <in.bam> <out.prefix>
Options:
-O output to stdout
-u uncompressed BAM output
-l INT compression level [1]
-n INT number of temporary files [64]
--input-fmt-option OPT[=VAL]
Specify a single input file format option in the form
of OPTION or OPTION=VALUE
--output-fmt FORMAT[,OPT[=VAL]]...
Specify output format (SAM, BAM, CRAM)
--output-fmt-option OPT[=VAL]
Specify a single output file format option in the form
of OPTION or OPTION=VALUE
--reference FILE
Reference sequence FASTA FILE [null]
连接多个bam文件(不做排序)
Usage: samtools cat [-h header.sam] [-o out.bam] <in1.bam> [...]
合并多个排序后的bam文件并排序输出一个bam文件
samtools merge
Usage: samtools merge [-nurlf] [-h inh.sam] [-b <bamlist.fofn>] <out.bam> <in1.bam> [<in2.bam> ... <inN.bam>]
Options:
-n Sort by read names
-r Attach RG tag (inferred from file names)
-u Uncompressed BAM output
-f Overwrite the output BAM if exist
-1 Compress level 1
-l INT Compression level, from 0 to 9 [-1]
-R STR Merge file in the specified region STR [all]
-h FILE Copy the header in FILE to <out.bam> [in1.bam]
-c Combine @RG headers with colliding IDs [alter IDs to be distinct]
-p Combine @PG headers with colliding IDs [alter IDs to be distinct]
-s VALUE Override random seed
-b FILE List of input BAM filenames, one per line [null]
-@, --threads INT
Number of BAM/CRAM compression threads [0]
--input-fmt-option OPT[=VAL]
Specify a single input file format option in the form
of OPTION or OPTION=VALUE
-O, --output-fmt FORMAT[,OPT[=VAL]]...
Specify output format (SAM, BAM, CRAM)
--output-fmt-option OPT[=VAL]
Specify a single output file format option in the form
of OPTION or OPTION=VALUE
--reference FILE
Reference sequence FASTA FILE [null]
对参考基因组每个位点做碱基堆积,用于call SNP和INDEL
samtools mpileup
Usage: samtools mpileup [options] in1.bam [in2.bam [...]]
Input options:
-6, --illumina1.3+ quality is in the Illumina-1.3+ encoding
-A, --count-orphans do not discard anomalous read pairs
-b, --bam-list FILE list of input BAM filenames, one per line
-B, --no-BAQ disable BAQ (per-Base Alignment Quality)
-C, --adjust-MQ INT adjust mapping quality; recommended:50, disable:0 [0]
-d, --max-depth INT max per-BAM depth; avoids excessive memory usage [250]
-E, --redo-BAQ recalculate BAQ on the fly, ignore existing BQs
-f, --fasta-ref FILE faidx indexed reference sequence file
-G, --exclude-RG FILE exclude read groups listed in FILE
-l, --positions FILE skip unlisted positions (chr pos) or regions (BED)
-q, --min-MQ INT skip alignments with mapQ smaller than INT [0]
-Q, --min-BQ INT skip bases with baseQ/BAQ smaller than INT [13]
-r, --region REG region in which pileup is generated
-R, --ignore-RG ignore RG tags (one BAM = one sample)
--rf, --incl-flags STR|INT required flags: skip reads with mask bits unset []
--ff, --excl-flags STR|INT filter flags: skip reads with mask bits set
[UNMAP,SECONDARY,QCFAIL,DUP]
-x, --ignore-overlaps disable read-pair overlap detection
Output options:
-o, --output FILE write output to FILE [standard output]
-g, --BCF generate genotype likelihoods in BCF format
-v, --VCF generate genotype likelihoods in VCF format
Output options for mpileup format (without -g/-v):
-O, --output-BP output base positions on reads
-s, --output-MQ output mapping quality
Output options for genotype likelihoods (when -g/-v is used):
-t, --output-tags LIST optional tags to output:
DP,AD,ADF,ADR,SP,INFO/AD,INFO/ADF,INFO/ADR []
-u, --uncompressed generate uncompressed VCF/BCF output
SNP/INDEL genotype likelihoods options (effective with -g/-v):
-e, --ext-prob INT Phred-scaled gap extension seq error probability [20]
-F, --gap-frac FLOAT minimum fraction of gapped reads [0.002]
-h, --tandem-qual INT coefficient for homopolymer errors [100]
-I, --skip-indels do not perform indel calling
-L, --max-idepth INT maximum per-sample depth for INDEL calling [250]
-m, --min-ireads INT minimum number gapped reads for indel candidates [1]
-o, --open-prob INT Phred-scaled gap open seq error probability [40]
-p, --per-sample-mF apply -m and -F per-sample for increased sensitivity
-P, --platforms STR comma separated list of platforms for indels [all]
--input-fmt-option OPT[=VAL]
Specify a single input file format option in the form
of OPTION or OPTION=VALUE
--reference FILE
Reference sequence FASTA FILE [null]
Notes: Assuming diploid individuals.
对bam文件按基因组上位置排序或者按read名字排序,支持指定内存大小,内存越大,分块的字排序文件越大
samtools sort
Usage: samtools sort [options...] [in.bam]
Options:
-l INT Set compression level, from 0 (uncompressed) to 9 (best)
-m INT Set maximum memory per thread; suffix K/M/G recognized [768M]
-n Sort by read name
-o FILE Write final output to FILE rather than standard output
-T PREFIX Write temporary files to PREFIX.nnnn.bam
-@, --threads INT
Set number of sorting and compression threads [1]
--input-fmt-option OPT[=VAL]
Specify a single input file format option in the form
of OPTION or OPTION=VALUE
-O, --output-fmt FORMAT[,OPT[=VAL]]...
Specify output format (SAM, BAM, CRAM)
--output-fmt-option OPT[=VAL]
Specify a single output file format option in the form
of OPTION or OPTION=VALUE
--reference FILE
Reference sequence FASTA FILE [null]
根据read group分割bam文件
samtools split
Usage: samtools split [-u <unaccounted.bam>[:<unaccounted_header.sam>]]
[-f <format_string>] [-v] <merged.bam>
Options:
-f STRING output filename format string ["%*_%#.%."]
-u FILE1 put reads with no RG tag or an unrecognised RG tag in FILE1
-u FILE1:FILE2 ...and override the header with FILE2
-v verbose output
--input-fmt-option OPT[=VAL]
Specify a single input file format option in the form
of OPTION or OPTION=VALUE
--output-fmt FORMAT[,OPT[=VAL]]...
Specify output format (SAM, BAM, CRAM)
--output-fmt-option OPT[=VAL]
Specify a single output file format option in the form
of OPTION or OPTION=VALUE
--reference FILE
Reference sequence FASTA FILE [null]
Format string expansions:
%% %
%* basename
%# @RG index
%! @RG ID
%. filename extension for output format
检查 SAM/BAM/CRAM文件的完整性
samtools quickcheck
Usage: samtools quickcheck [options] <input> [...]
Options:
-v verbose output (repeat for more verbosity)
将bam转为fastq格式
samtools fastq
Usage: samtools fastq [options...] <in.bam>
Options:
-0 FILE write paired reads flagged both or neither READ1 and READ2 to FILE
-1 FILE write paired reads flagged READ1 to FILE
-2 FILE write paired reads flagged READ2 to FILE
-f INT only include reads with all bits set in INT set in FLAG [0]
-F INT only include reads with none of the bits set in INT set in FLAG [0]
-n don't append /1 and /2 to the read name
-O output quality in the OQ tag if present
-s FILE write singleton reads to FILE [assume single-end]
-t copy RG, BC and QT tags to the FASTQ header line
-v INT default quality score if not given in file [1]
--input-fmt-option OPT[=VAL]
Specify a single input file format option in the form
of OPTION or OPTION=VALUE
--reference FILE
Reference sequence FASTA FILE [null]
将bam转为fasta格式
samtools fasta
Usage: samtools fasta [options...] <in.bam>
Options:
-0 FILE write paired reads flagged both or neither READ1 and READ2 to FILE
-1 FILE write paired reads flagged READ1 to FILE
-2 FILE write paired reads flagged READ2 to FILE
-f INT only include reads with all bits set in INT set in FLAG [0]
-F INT only include reads with none of the bits set in INT set in FLAG [0]
-n don't append /1 and /2 to the read name
-O output quality in the OQ tag if present
-s FILE write singleton reads to FILE [assume single-end]
-t copy RG, BC and QT tags to the FASTQ header line
-v INT default quality score if not given in file [1]
--input-fmt-option OPT[=VAL]
Specify a single input file format option in the form
of OPTION or OPTION=VALUE
--reference FILE
Reference sequence FASTA FILE [null]
统计bed文件中每个区域的测序深度
samtools bedcov
Usage: samtools bedcov [options] <in.bed> <in1.bam> [...]
-Q INT Only count bases of at least INT quality [0]
--input-fmt-option OPT[=VAL]
Specify a single input file format option in the form
of OPTION or OPTION=VALUE
--reference FILE
Reference sequence FASTA FILE [null]
统计基因组每个位置的测序深度
samtools depth
Usage: samtools depth [options] in1.bam [in2.bam [...]]
Options:
-a output all positions (including zero depth)
-a -a (or -aa) output absolutely all positions, including unused ref. sequences
-b <bed> list of positions or regions
-f <list> list of input BAM filenames, one per line [null]
-l <int> read length threshold (ignore reads shorter than <int>)
-d/-m <int> maximum coverage depth [8000]
-q <int> base quality threshold
-Q <int> mapping quality threshold
-r <chr:from-to> region
--input-fmt-option OPT[=VAL]
Specify a single input file format option in the form
of OPTION or OPTION=VALUE
--reference FILE
Reference sequence FASTA FILE [null]
The output is a simple tab-separated table with three columns: reference name, position, and coverage depth. Note that positions with zero coverage may be omitted by default; see the -a option.
对bam文件做简单统计(reads数,比对率,双端比对率等)
samtools flagstat
Usage: samtools flagstat [--input-fmt-option OPT=VAL] <in.bam>
对bam索引文件做统计,输出结果格式为reference sequence name, sequence length, # mapped reads and # unmapped reads
samtools idxstats
Usage: samtools idxstats <in.bam>
对bam文件做详细统计,其统计结果可用mics/plot-bamstats作图
samtools stats
About: The program collects statistics from BAM files. The output can be visualized using plot-bamstats.
Usage: samtools stats [OPTIONS] file.bam
samtools stats [OPTIONS] file.bam chr:from-to
Options:
-c, --coverage <int>,<int>,<int> Coverage distribution min,max,step [1,1000,1]
-d, --remove-dups Exclude from statistics reads marked as duplicates
-f, --required-flag <str|int> Required flag, 0 for unset. See also `samtools flags` [0]
-F, --filtering-flag <str|int> Filtering flag, 0 for unset. See also `samtools flags` [0]
--GC-depth <float> the size of GC-depth bins (decreasing bin size increases memory requirement) [2e4]
-h, --help This help message
-i, --insert-size <int> Maximum insert size [8000]
-I, --id <string> Include only listed read group or sample name
-l, --read-length <int> Include in the statistics only reads with the given read length []
-m, --most-inserts <float> Report only the main part of inserts [0.99]
-P, --split-prefix <str> Path or string prefix for filepaths output by -S (default is input filename)
-q, --trim-quality <int> The BWA trimming parameter [0]
-r, --ref-seq <file> Reference sequence (required for GC-depth and mismatches-per-cycle calculation).
-s, --sam Ignored (input format is auto-detected).
-S, --split <tag> Also write statistics to separate files split by tagged field.
-t, --target-regions <file> Do stats in these regions only. Tab-delimited file chr,from,to, 1-based, inclusive.
-x, --sparse Suppress outputting IS rows where there are no insertions.
--input-fmt-option OPT[=VAL]
Specify a single input file format option in the form
of OPTION or OPTION=VALUE
--reference FILE
Reference sequence FASTA FILE [null]
bam文件QC,plot-bamstats依赖gnuplot
samtools stats -m -c 1,100,1 -F 4 -i 1000 -r hg19.fa outfile.sort.bam > outfile.bc ;\
perl misc/plot-bamstats -p outfile.bc outfile.bc
输出bam文件中第二列bitwise flag所代表的含义,如0x4表示没有比对上
samtools flags 0x4
0x4 4 UNMAP
命令行可视化显示区域的比对情况
samtools tview
Usage: samtools tview [options] <aln.bam> [ref.fasta]
Options:
-d display output as (H)tml or (C)urses or (T)ext
-p chr:pos go directly to this position
-s STR display only reads from this sample or group
--input-fmt-option OPT[=VAL]
Specify a single input file format option in the form
of OPTION or OPTION=VALUE
--reference FILE
Reference sequence FASTA FILE [null]
文件格式转换SAM<->BAM<->CRAM->->
samtools view
Usage: samtools view [options] <in.bam>|<in.sam>|<in.cram> [region ...]
Options:
-b output BAM
-C output CRAM (requires -T)
-1 use fast BAM compression (implies -b)
-u uncompressed BAM output (implies -b)
-h include header in SAM output
-H print SAM header only (no alignments)
-c print only the count of matching records
-o FILE output file name [stdout]
-U FILE output reads not selected by filters to FILE [null]
-t FILE FILE listing reference names and lengths (see long help) [null]
-L FILE only include reads overlapping this BED FILE [null]
-r STR only include reads in read group STR [null]
-R FILE only include reads with read group listed in FILE [null]
-q INT only include reads with mapping quality >= INT [0]
-l STR only include reads in library STR [null]
-m INT only include reads with number of CIGAR operations consuming
query sequence >= INT [0]
-f INT only include reads with all bits set in INT set in FLAG [0]
-F INT only include reads with none of the bits set in INT set in FLAG [0]
-x STR read tag to strip (repeatable) [null]
-B collapse the backward CIGAR operation
-s FLOAT integer part sets seed of random number generator [0];
rest sets fraction of templates to subsample [no subsampling]
-@, --threads INT
number of BAM/CRAM compression threads [0]
-? print long help, including note about region specification
-S ignored (input format is auto-detected)
--input-fmt-option OPT[=VAL]
Specify a single input file format option in the form
of OPTION or OPTION=VALUE
-O, --output-fmt FORMAT[,OPT[=VAL]]...
Specify output format (SAM, BAM, CRAM)
--output-fmt-option OPT[=VAL]
Specify a single output file format option in the form
of OPTION or OPTION=VALUE
-T, --reference FILE
Reference sequence FASTA FILE [null]
samtools view 5667a836534680001d0ce64c.sort.bam chr17:7564097-7565197 -o region.bam
按分位数比例抽样reads,整数位为随机数种子,小数位为抽取reads数的分位数比例
samtools view -b -s 1.1 input.bam -o input.0.1.bam
samtools reheader
Usage: samtools reheader [-P] in.header.sam in.bam > out.bam
or samtools reheader [-P] -i in.header.sam file.bam
Options:
-P, --no-PG Do not generate an @PG header line.
-i, --in-place Modify the bam/cram file directly.
(Defaults to outputting to stdout.)
去除bam文件中的重复
samtools rmdup
Usage: samtools rmdup [-sS] <input.srt.bam> <output.bam>
Option: -s rmdup for SE reads
-S treat PE reads as SE in rmdup (force -s)
--input-fmt-option OPT[=VAL]
Specify a single input file format option in the form
of OPTION or OPTION=VALUE
--output-fmt FORMAT[,OPT[=VAL]]...
Specify output format (SAM, BAM, CRAM)
--output-fmt-option OPT[=VAL]
Specify a single output file format option in the form
of OPTION or OPTION=VALUE
--reference FILE
Reference sequence FASTA FILE [null]
添加或替换read group
samtools addreplacerg
Usage: samtools addreplacerg [options] [-r <@RG line> | -R <existing id>] [-o <output.bam>] <input.bam>
Options:
-m MODE Set the mode of operation from one of overwrite_all, orphan_only [overwrite_all]
-o FILE Where to write output to [stdout]
-r STRING @RG line text
-R STRING ID of @RG line in existing header to use
--input-fmt FORMAT[,OPT[=VAL]]...
Specify input format (SAM, BAM, CRAM)
--input-fmt-option OPT[=VAL]
Specify a single input file format option in the form
of OPTION or OPTION=VALUE
-O, --output-fmt FORMAT[,OPT[=VAL]]...
Specify output format (SAM, BAM, CRAM)
--output-fmt-option OPT[=VAL]
Specify a single output file format option in the form
of OPTION or OPTION=VALUE
--reference FILE
Reference sequence FASTA FILE [null]
如果输入bam是一个按reads name排序的文件,则填充上mate alignment(read1的mate是read2)的坐标,插入片段大小等相关的flag
samtools fixmate
Usage: samtools fixmate <in.nameSrt.bam> <out.nameSrt.bam>
Options:
-r Remove unmapped reads and secondary alignments
-p Disable FR proper pair check
-c Add template cigar ct tag
--input-fmt-option OPT[=VAL]
Specify a single input file format option in the form
of OPTION or OPTION=VALUE
-O, --output-fmt FORMAT[,OPT[=VAL]]...
Specify output format (SAM, BAM, CRAM)
--output-fmt-option OPT[=VAL]
Specify a single output file format option in the form
of OPTION or OPTION=VALUE
--reference FILE
Reference sequence FASTA FILE [null]
As elsewhere in samtools, use '-' as the filename for stdin/stdout. The input file must be grouped by read name (e.g. sorted by name). Coordinated sorted input is not accepted.
计算MD tag(a optional field,记录了mismatch信息)
samtools calmd
Usage: samtools calmd [-eubrAES] <aln.bam> <ref.fasta>
Options:
-e change identical bases to '='
-u uncompressed BAM output (for piping)
-b compressed BAM output
-S ignored (input format is auto-detected)
-A modify the quality string
-r compute the BQ tag (without -A) or cap baseQ by BAQ (with -A)
-E extended BAQ for better sensitivity but lower specificity
--input-fmt-option OPT[=VAL]
Specify a single input file format option in the form
of OPTION or OPTION=VALUE
--output-fmt FORMAT[,OPT[=VAL]]...
Specify output format (SAM, BAM, CRAM)
--output-fmt-option OPT[=VAL]
Specify a single output file format option in the form
of OPTION or OPTION=VALUE
--reference FILE
Reference sequence FASTA FILE [null]
call杂合SNP,确定相位
samtools phase
Usage: samtools phase [options] <in.bam>
Options: -k INT block length [13]
-b STR prefix of BAMs to output [null]
-q INT min het phred-LOD [37]
-Q INT min base quality in het calling [13]
-D INT max read depth [256]
-F do not attempt to fix chimeras
-A drop reads with ambiguous phase
--input-fmt-option OPT[=VAL]
Specify a single input file format option in the form
of OPTION or OPTION=VALUE
--output-fmt FORMAT[,OPT[=VAL]]...
Specify output format (SAM, BAM, CRAM)
--output-fmt-option OPT[=VAL]
Specify a single output file format option in the form
of OPTION or OPTION=VALUE
--reference FILE
Reference sequence FASTA FILE [null]
以数据流方式工作,比对工具(bowtie/bwa/soap)比对后直接输出sorted.bam
bwa mem -M -t 8 -R '@RG\tID:TS11N_L1_R1.fastq.gz.sam\tPL:ILLUMINA\tSM:TS11N_L1_R1.fastq.gz.sam\tDS:ref=/data/xux_data/testData/hg19.fasta,pfx=/data/xux_data/testData/hg19.fasta' /data/xux_data/testData/hg19.fasta /data/xux_data/testData/TS11N_L1_R1.fastq.gz /data/xux_data/testData/TS11N_L1_R2.fastq.gz|samtools view -u -S - | samtools sort -m 4G - TS11N_L1.sort
Reference