GATK (Genome Analysis Toolkit): the Broad Institute's toolkit for variant discovery workflows, commonly used for germline variant calling, joint genotyping from GVCFs, and variant filtering.
Install (via Bioconda):
mamba install -c conda-forge -c bioconda gatk4
Single-sample GVCF calling:
gatk HaplotypeCaller \
-R reference.fa \
-I sample.sorted.bam \
-O sample.g.vcf.gz \
-ERC GVCF
Import multiple GVCFs into a GenomicsDB workspace:
gatk GenomicsDBImport \
--sample-name-map samples.map \
--genomicsdb-workspace-path cohort_db \
-L targets.interval_list
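The samples.map file passed to --sample-name-map is a tab-separated text file with one line per sample: the sample name, then the path to that sample's GVCF. A minimal sketch for generating it, assuming every GVCF in one directory is named <sample>.g.vcf.gz (the helper name make_sample_map and that naming scheme are illustrative assumptions, not part of GATK):

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a GenomicsDBImport sample map for every
# *.g.vcf.gz file in a directory (default: current directory).
make_sample_map() {
    local dir="${1:-.}" gvcf sample
    for gvcf in "$dir"/*.g.vcf.gz; do
        # Skip the unexpanded pattern when no file matches.
        [ -e "$gvcf" ] || continue
        # Assumption: the file name stem is the sample name you want.
        sample="$(basename "$gvcf" .g.vcf.gz)"
        # One line per sample: name, TAB, path to the GVCF.
        printf '%s\t%s\n' "$sample" "$gvcf"
    done
}
```

Usage, with gvcfs as a placeholder directory name: make_sample_map gvcfs > samples.map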
Joint genotyping:
gatk GenotypeGVCFs \
-R reference.fa \
-V gendb://cohort_db \
-O cohort.vcf.gz
Filter SNPs (matching records are flagged in the FILTER column, not removed):
gatk VariantFiltration \
-R reference.fa -V cohort.vcf.gz -O cohort.filtered.vcf.gz \
--filter-expression "QD < 2.0 || FS > 60.0 || MQ < 40.0" \
--filter-name "basic_snp_filter"
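A combined expression gives every failing record the same filter name, so the FILTER column does not show which threshold was hit. One possible variant, sketched here with the same thresholds, passes each condition as its own --filter-expression/--filter-name pair. The function name filter_snps_separately and the filter names QD2, FS60, and MQ40 are illustrative choices, and reference.fa is assumed to be the same reference used above:

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: apply each SNP threshold as a separately named filter.
# Usage: filter_snps_separately <input.vcf.gz> <output.vcf.gz>
filter_snps_separately() {
    local in_vcf="$1" out_vcf="$2"
    # Each --filter-expression is paired with the --filter-name that follows it.
    gatk VariantFiltration \
        -R reference.fa \
        -V "$in_vcf" \
        -O "$out_vcf" \
        --filter-expression "QD < 2.0" --filter-name "QD2" \
        --filter-expression "FS > 60.0" --filter-name "FS60" \
        --filter-expression "MQ < 40.0" --filter-name "MQ40"
}
```

Usage: filter_snps_separately cohort.vcf.gz cohort.filtered.vcf.gz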
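HaplotypeCaller: germline variant calling.
-ERC GVCF: emit a GVCF for later joint genotyping.
GenomicsDBImport: import GVCFs from multiple samples.
GenotypeGVCFs: joint genotyping; produces the cohort VCF.
-R: reference genome.
-L: restrict processing to the given intervals.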