-
BAM Alignment - Binary, BGZF-compressed representation of SAM for storing read alignments to a reference genome.
Commands: BWA, samtools, MACS3, mosdepth, deepTools, bedtools, featureCounts, StringTie, STAR, FreeBayes, GATK
-
BCF Variant - Binary representation of VCF for efficient storage and processing of large-scale variant data.
Commands: bcftools
-
BED Genomic intervals - Lightweight tab-delimited text format for genomic intervals, using 0-based, half-open coordinates.
Commands: MACS3, mosdepth, deepTools, bedtools
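The 0-based, half-open convention is easiest to see as arithmetic. This minimal sketch uses hypothetical helper names that do not come from any listed tool. It parses the first three BED columns and converts between BED coordinates and the 1-based, closed coordinates used by GFF/GTF and VCF. Because the interval is half-open, its length is simply end - start, with no +1 correction.

```python
# Illustrative helpers (hypothetical names), not part of any tool's API.

def parse_bed_line(line: str) -> tuple[str, int, int]:
    """Return (chrom, start, end) from the first three BED columns."""
    chrom, start, end = line.rstrip("\n").split("\t")[:3]
    return chrom, int(start), int(end)


def interval_length(start: int, end: int) -> int:
    """A half-open interval [start, end) contains exactly end - start bases."""
    return end - start


def bed_to_one_based(start: int, end: int) -> tuple[int, int]:
    """0-based half-open [start, end) -> 1-based closed [start + 1, end]."""
    return start + 1, end


def one_based_to_bed(start: int, end: int) -> tuple[int, int]:
    """1-based closed [start, end] -> 0-based half-open [start - 1, end)."""
    return start - 1, end


if __name__ == "__main__":
    chrom, start, end = parse_bed_line("chr1\t0\t100\tregion1")
    print(chrom, start, end, interval_length(start, end))  # chr1 0 100 100
    print(bed_to_one_based(start, end))                   # (1, 100)
```

Only the end coordinate is shared between the two systems. This is why an off-by-one error usually shows up at the start position.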
-
BEDPE Genomic intervals - BED extension for paired genomic intervals, often used for paired-end peak calling and for structural variants.
Commands: MACS3
-
bigWig Signal track - Indexed binary format for continuous genome signal tracks such as coverage and enrichment profiles, commonly loaded into genome browsers.
Commands: deepTools
-
broadPeak Peak - BED-derived broad peak format commonly used for broad enrichment signals such as histone modifications.
Commands: MACS3
-
CRAM Alignment - Highly compressed alignment format that usually requires the reference genome for decoding.
Commands: samtools, mosdepth, FreeBayes, GATK
-
FASTA Sequence - Basic text format for nucleotide or protein sequences, in which each record is a '>' header line followed by one or more sequence lines.
Commands: Bowtie2, BWA, minimap2, SPAdes, QUAST, BUSCO, Prokka, Kraken2, MAFFT, IQ-TREE 2, kallisto, Salmon, HISAT2, STAR, BLAST+, DIAMOND, MMseqs2, SeqKit, seqtk
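This minimal reader shows the header-plus-sequence-lines structure described above. It uses a hypothetical function name. Wrapped sequence lines are concatenated, blank lines are skipped, and any text before the first '>' is ignored. These behaviours are assumptions of this sketch, not rules that every tool follows.

```python
# Illustrative FASTA reader (hypothetical name), not part of any tool's API.
import io
from typing import Iterator, TextIO


def read_fasta(handle: TextIO) -> Iterator[tuple[str, str]]:
    """Yield (header, sequence) pairs; wrapped sequence lines are concatenated."""
    header = None
    chunks: list[str] = []
    for raw in handle:
        line = raw.strip()
        if not line:
            continue                      # ignore blank lines
        if line.startswith(">"):
            if header is not None:        # emit the previous record
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:          # text before the first '>' is ignored
            chunks.append(line)
    if header is not None:                # emit the last record
        yield header, "".join(chunks)


if __name__ == "__main__":
    demo = ">seq1 example record\nACGTAC\nGTA\n>seq2\nMKVL\n"
    for name, seq in read_fasta(io.StringIO(demo)):
        print(name, len(seq))
```

Using a generator keeps memory proportional to one record, which matters for genome-sized files.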
-
FASTQ Sequence reads - Standard text format for sequencing reads and their per-base quality scores.
Commands: Bowtie2, BWA, minimap2, SPAdes, SRA Toolkit, Cutadapt, fastp, Trim Galore, FastQC, Kraken2, kallisto, Salmon, HISAT2, STAR, SeqKit, seqtk
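Each FASTQ record has four lines: an '@' header, the sequence, a '+' separator, and a quality string of the same length. This minimal sketch uses hypothetical function names. It assumes unwrapped records and Phred+33 encoding, which is the common modern convention. Under that encoding each quality character c maps to Q = ord(c) - 33, and Q corresponds to an error probability of 10^(-Q/10).

```python
# Illustrative FASTQ helpers (hypothetical names), not part of any tool's API.
import io
from typing import Iterator, TextIO


def read_fastq(handle: TextIO) -> Iterator[tuple[str, str, list[int]]]:
    """Yield (read_id, sequence, phred_scores) for each 4-line FASTQ record.

    Assumes unwrapped records and Phred+33 quality encoding.
    """
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return                                  # end of file
        seq = handle.readline().rstrip("\n")
        plus = handle.readline().rstrip("\n")
        qual = handle.readline().rstrip("\n")
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed record starting with {header!r}")
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch in {header!r}")
        # Phred+33: quality character c encodes Q = ord(c) - 33
        yield header[1:], seq, [ord(c) - 33 for c in qual]


def error_probability(q: int) -> float:
    """Phred score Q corresponds to an error probability of 10^(-Q/10)."""
    return 10 ** (-q / 10)


if __name__ == "__main__":
    demo = "@read1\nACGT\n+\nII#5\n"
    for read_id, seq, quals in read_fastq(io.StringIO(demo)):
        print(read_id, seq, quals)  # read1 ACGT [40, 40, 2, 20]
```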
-
GenBank Annotation - NCBI flat-file format that stores sequences together with feature annotations and source metadata.
Commands: Prokka
-
GFF Annotation - General feature format describing genes, transcripts, exons, and other genomic features in nine tab-separated columns.
Commands: QUAST, Prokka, bedtools, StringTie
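The nine-column layout can be shown with a small parser. This sketch assumes GFF3 specifically; the entry above covers the GFF family in general. It also assumes the caller skips '#' comment and directive lines. The function name is hypothetical. In GFF3 the ninth column holds key=value pairs separated by ';', with reserved characters percent-encoded, so values are decoded with urllib.parse.unquote. Multi-valued attributes such as Parent=a,b are left as a single string.

```python
# Illustrative GFF3 line parser (hypothetical name); assumes GFF3, not GFF2/GTF.
from urllib.parse import unquote

GFF3_COLUMNS = ("seqid", "source", "type", "start", "end",
                "score", "strand", "phase", "attributes")


def parse_gff3_line(line: str) -> dict:
    """Parse one GFF3 feature line (callers should skip '#' lines)."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        raise ValueError(f"expected 9 tab-separated columns, got {len(fields)}")
    rec = dict(zip(GFF3_COLUMNS, fields))
    rec["start"], rec["end"] = int(rec["start"]), int(rec["end"])  # 1-based, inclusive
    attrs = {}
    for item in rec["attributes"].split(";"):
        if item and item != ".":          # '.' means no attributes
            key, _, value = item.partition("=")
            attrs[key] = unquote(value)   # GFF3 percent-encodes reserved characters
    rec["attributes"] = attrs
    return rec
```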
-
GTF Annotation - Gene annotation format widely used in RNA-seq workflows to describe genes, transcripts, and exons.
Commands: deepTools, bedtools, featureCounts, StringTie, STAR
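GTF uses the same nine columns as GFF. The main difference is the attribute column, which is written as `key "value";` pairs rather than key=value. The sketch below assumes every pair ends with ';' and that values contain no embedded quotes or semicolons. The function names are hypothetical. A repeated key (for example, several tag entries) keeps only its last value here.

```python
# Illustrative GTF helpers (hypothetical names), not part of any tool's API.
import re

# A GTF attribute looks like: key "value";  (some values, e.g. exon_number,
# may be unquoted). Assumes each pair is terminated by ';'.
_ATTR_RE = re.compile(r'(\S+)\s+"?([^";]*)"?\s*;')


def parse_gtf_attributes(column9: str) -> dict[str, str]:
    """Map attribute key -> value; a repeated key keeps its last value."""
    return {key: value for key, value in _ATTR_RE.findall(column9)}


def parse_gtf_line(line: str) -> dict:
    """Parse one GTF line (1-based, inclusive coordinates, as in GFF)."""
    seqname, source, feature, start, end, score, strand, frame, attrs = (
        line.rstrip("\n").split("\t")
    )
    return {
        "seqname": seqname,
        "source": source,
        "feature": feature,
        "start": int(start),
        "end": int(end),
        "strand": strand,
        "attributes": parse_gtf_attributes(attrs),
    }
```

Tools that count reads per gene typically group exon lines by their gene_id attribute, which is why that key matters most in practice.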
-
GVCF Variant - Reference-confidence extension of VCF that records both variant sites and non-variant blocks; commonly used for joint genotyping.
Commands: GATK
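A non-variant block spans many positions in a single record. This sketch assumes GATK-style GVCF records. In that style a reference block has `<NON_REF>` as its only ALT allele and stores its last position in the INFO key END. Other tools may use a different symbolic allele such as `<*>`. The function name is hypothetical. The sketch returns the interval each record covers, which makes it easy to tell blocks from real variant sites.

```python
# Illustrative GVCF helper (hypothetical name); assumes GATK-style <NON_REF> blocks.


def gvcf_interval(line: str) -> tuple[str, int, int, bool]:
    """Return (chrom, start, end, is_reference_block), 1-based inclusive.

    Assumes reference blocks have ALT '<NON_REF>' only and carry their last
    position in INFO 'END='.
    """
    f = line.rstrip("\n").split("\t")
    chrom, pos, ref, alts, info = f[0], int(f[1]), f[3], f[4].split(","), f[7]
    end = pos + len(ref) - 1          # default: positions covered by REF
    for item in info.split(";"):
        key, _, value = item.partition("=")
        if key == "END":
            end = int(value)
    return chrom, pos, end, alts == ["<NON_REF>"]
```

The number of reference-confident bases in a block is then end - start + 1.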
-
HTML Report - Web document format commonly used for visual QC and analysis reports.
Commands: QUAST, deepTools, fastp, Trim Galore, FastQC, MultiQC, SnpEff, Nextflow, Snakemake
-
JSON Structured data - Lightweight structured data format often used for tool reports, parameters, and result exchange.
Commands: BUSCO, fastp, MultiQC, Ensembl VEP, Nextflow, Snakemake
-
narrowPeak Peak - BED-derived narrow peak format commonly used for TF ChIP-seq and ATAC-seq peaks.
Commands: MACS3
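narrowPeak is usually described as BED6+4. It has the six BED columns followed by signalValue, pValue, qValue, and peak. The last column is the summit's offset from chromStart, or -1 when no summit was called. This sketch assumes that ENCODE-style column layout. The class and function names are hypothetical. It shows how the 0-based summit position follows from the start and the offset.

```python
# Illustrative narrowPeak parser (hypothetical names); assumes the BED6+4 layout.
from typing import NamedTuple, Optional


class NarrowPeak(NamedTuple):
    chrom: str
    start: int            # 0-based, as in BED
    end: int              # exclusive
    name: str
    score: int
    strand: str
    signal_value: float
    p_value: float        # -log10 scale, -1 if not computed
    q_value: float        # -log10 scale, -1 if not computed
    peak: int             # summit offset from start, -1 if not called

    @property
    def summit(self) -> Optional[int]:
        """0-based genomic position of the summit, or None if not called."""
        return None if self.peak == -1 else self.start + self.peak


def parse_narrowpeak(line: str) -> NarrowPeak:
    f = line.rstrip("\n").split("\t")
    if len(f) != 10:
        raise ValueError(f"narrowPeak expects 10 columns, got {len(f)}")
    return NarrowPeak(f[0], int(f[1]), int(f[2]), f[3], int(f[4]), f[5],
                      float(f[6]), float(f[7]), float(f[8]), int(f[9]))
```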
-
Newick Phylogeny - Lightweight text format that uses nested parentheses to encode phylogenetic tree topology and branch lengths.
Commands: IQ-TREE 2
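The nesting rules are compact enough for a short recursive-descent parser. In Newick, each parenthesized group is an internal node, an optional label follows the group or leaf, and ':length' gives the branch length. Internal labels are often used for support values. This minimal sketch uses hypothetical class and function names. It does not handle quoted labels or [comments], and it removes all whitespace before parsing.

```python
# Illustrative Newick parser (hypothetical names), not a full implementation.
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    name: str = ""
    length: Optional[float] = None
    children: list["Node"] = field(default_factory=list)


def parse_newick(text: str) -> Node:
    """Parse nesting, labels and ':length'; no quoted labels or [comments]."""
    s = "".join(text.split())          # drop all whitespace
    pos = 0

    def read_until(stops: str) -> str:
        nonlocal pos
        start = pos
        while pos < len(s) and s[pos] not in stops:
            pos += 1
        return s[start:pos]

    def parse_node() -> Node:
        nonlocal pos
        node = Node()
        if pos < len(s) and s[pos] == "(":
            pos += 1                                  # consume '('
            node.children.append(parse_node())
            while pos < len(s) and s[pos] == ",":
                pos += 1                              # consume ','
                node.children.append(parse_node())
            if pos >= len(s) or s[pos] != ")":
                raise ValueError(f"expected ')' at offset {pos}")
            pos += 1                                  # consume ')'
        node.name = read_until(":,();")               # leaf or internal label
        if pos < len(s) and s[pos] == ":":
            pos += 1                                  # consume ':'
            node.length = float(read_until(",();"))
        return node

    root = parse_node()
    if pos >= len(s) or s[pos] != ";":
        raise ValueError("a Newick tree must end with ';'")
    return root


def leaf_names(node: Node) -> list[str]:
    """Leaf labels in left-to-right order."""
    if not node.children:
        return [node.name]
    return [name for child in node.children for name in leaf_names(child)]
```

For example, in "((A:0.1,B:0.2)90:0.05,C:0.3);" the internal node grouping A and B carries the label "90" and a branch length of 0.05.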
-
NF Workflow - Nextflow pipeline script describing processes, channels, and workflow logic.
Commands: Nextflow
-
PAF Alignment - Lightweight pairwise mapping format and the default output of minimap2, suited to summarizing long-read and assembly alignments.
Commands: minimap2
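A PAF line has 12 mandatory tab-separated columns. These are the query name, length, start, and end; the strand; the target name, length, start, and end; the number of residue matches; the alignment block length; and the mapping quality. Coordinates are 0-based, and optional SAM-style TAG:TYPE:VALUE fields may follow. This sketch uses hypothetical names. It computes identity as column 10 divided by column 11, which is a simple and common summary.

```python
# Illustrative PAF parser (hypothetical names), not part of minimap2's API.
from typing import NamedTuple


class PafRecord(NamedTuple):
    qname: str
    qlen: int
    qstart: int        # 0-based
    qend: int
    strand: str        # '+' or '-'
    tname: str
    tlen: int
    tstart: int        # 0-based
    tend: int
    n_match: int       # number of residue matches
    aln_len: int       # alignment block length
    mapq: int
    tags: dict         # optional SAM-style TAG:TYPE:VALUE fields


def parse_paf_line(line: str) -> PafRecord:
    f = line.rstrip("\n").split("\t")
    tags = {}
    for tag in f[12:]:
        key, typ, value = tag.split(":", 2)
        tags[key] = int(value) if typ == "i" else float(value) if typ == "f" else value
    return PafRecord(f[0], int(f[1]), int(f[2]), int(f[3]), f[4], f[5], int(f[6]),
                     int(f[7]), int(f[8]), int(f[9]), int(f[10]), int(f[11]), tags)


def blast_identity(rec: PafRecord) -> float:
    """Residue matches divided by alignment block length (columns 10 / 11)."""
    return rec.n_match / rec.aln_len
```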
-
PGEN Genotype - PLINK 2 binary genotype format, usually paired with PVAR (variant metadata) and PSAM (sample metadata) files.
Commands: PLINK 2
-
SAF Annotation - Simplified annotation format supported by featureCounts that describes countable features in five columns: GeneID, Chr, Start, End, and Strand.
Commands: featureCounts
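A common task is turning BED intervals, such as called peaks, into SAF so they can be counted. This sketch assumes a tab-separated SAF with the header GeneID, Chr, Start, End, Strand and 1-based, inclusive Start/End. Under that assumption, each BED start needs +1 and each end stays the same. The function name is hypothetical.

```python
# Illustrative BED -> SAF conversion (hypothetical name); assumes 1-based,
# inclusive SAF coordinates and the five-column header below.
import csv
import io

SAF_HEADER = ["GeneID", "Chr", "Start", "End", "Strand"]


def bed_rows_to_saf(bed_rows, out) -> None:
    """bed_rows: iterable of (chrom, start0, end, name, strand) in BED coordinates."""
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(SAF_HEADER)
    for chrom, start0, end, name, strand in bed_rows:
        writer.writerow([name, chrom, start0 + 1, end, strand])  # 0-based -> 1-based start


if __name__ == "__main__":
    buf = io.StringIO()
    bed_rows_to_saf([("chr1", 99, 200, "peak1", "+")], buf)
    print(buf.getvalue(), end="")
```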
-
SAM Alignment - Standard text format for read alignments; the human-readable counterpart of BAM and CRAM.
Commands: Bowtie2, BWA, minimap2, samtools, HISAT2, STAR, DIAMOND
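Each SAM alignment line starts with 11 mandatory tab-separated fields: QNAME, FLAG, RNAME, POS, MAPQ, CIGAR, RNEXT, PNEXT, TLEN, SEQ, and QUAL. Optional tags follow these fields. POS is 1-based, and FLAG is a bit field. This sketch uses hypothetical helper names. It decodes a few well-known FLAG bits and computes how many reference bases a CIGAR string covers; only the M, D, N, = and X operations consume the reference.

```python
# Illustrative SAM helpers (hypothetical names), not part of samtools' API.
import re

SAM_FIELDS = ("qname", "flag", "rname", "pos", "mapq", "cigar",
              "rnext", "pnext", "tlen", "seq", "qual")

# A few FLAG bits from the SAM specification
FLAG_UNMAPPED = 0x4
FLAG_REVERSE = 0x10
FLAG_SECONDARY = 0x100
FLAG_SUPPLEMENTARY = 0x800

_CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def parse_sam_line(line: str) -> dict:
    """Split an alignment line into the 11 mandatory fields plus optional tags."""
    fields = line.rstrip("\n").split("\t")
    rec = dict(zip(SAM_FIELDS, fields[:11]))
    for key in ("flag", "pos", "mapq", "pnext", "tlen"):
        rec[key] = int(rec[key])
    rec["tags"] = fields[11:]
    return rec


def reference_span(cigar: str) -> int:
    """Reference bases covered; only M, D, N, = and X consume the reference."""
    return sum(int(n) for n, op in _CIGAR_RE.findall(cigar) if op in "MDN=X")


def is_primary_mapped(flag: int) -> bool:
    """True for a mapped, primary (non-secondary, non-supplementary) record."""
    return not flag & (FLAG_UNMAPPED | FLAG_SECONDARY | FLAG_SUPPLEMENTARY)
```

The last reference base covered by an alignment is pos + reference_span(cigar) - 1. For example, POS 100 with CIGAR 5M2D3M1I2M ends at 111.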
-
SRA Sequence archive - NCBI Sequence Read Archive container format for raw sequencing data, commonly converted to FASTQ with SRA Toolkit.
Commands: SRA Toolkit
-
TSV Table - Tab-separated text table format commonly used for statistics, matrices, and intermediate results.
Commands: QUAST, BUSCO, mosdepth, deepTools, Prokka, Kraken2, IQ-TREE 2, PLINK 2, MultiQC, featureCounts, kallisto, Salmon, StringTie, BLAST+, DIAMOND, MMseqs2, Ensembl VEP
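Most TSV outputs can be read with Python's standard csv module by setting a tab delimiter. This sketch uses a hypothetical function name. It skips lines starting with '#' as comments, which is an assumption: some tools begin their header line with '#', and in that case this would drop the header. It also disables quote handling, because bioinformatics TSVs are rarely quoted and a stray '"' should not change field boundaries. All values are returned as strings.

```python
# Illustrative TSV reader (hypothetical name); assumes '#' marks comment lines.
import csv
import io
from typing import Iterable


def read_tsv(lines: Iterable[str], comment: str = "#") -> list[dict[str, str]]:
    """Read a TSV whose first non-comment line is the header into a list of dicts."""
    body = (line for line in lines if not line.startswith(comment))
    # QUOTE_NONE: treat '"' as ordinary text rather than a field delimiter
    reader = csv.DictReader(body, delimiter="\t", quoting=csv.QUOTE_NONE)
    return list(reader)


if __name__ == "__main__":
    demo = "# produced by a hypothetical tool\ngene\tcount\nA\t10\nB\t3\n"
    for row in read_tsv(io.StringIO(demo)):
        print(row["gene"], int(row["count"]))
```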
-
VCF Variant - Standard text format for genetic variants, sample genotypes, and annotations.
Commands: bedtools, PLINK 2, Ensembl VEP, SnpEff, FreeBayes, GATK, bcftools
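A VCF data line has eight fixed columns: CHROM, POS, ID, REF, ALT, QUAL, FILTER, and INFO. When genotypes are present, a FORMAT column and one column per sample follow. This minimal sketch uses hypothetical function names. It splits INFO into key=value pairs and treats keys without '=' as flags. It also maps each sample's colon-separated values onto the FORMAT keys and counts non-reference alleles in a GT value, where '/' means unphased and '|' means phased. Values are kept as strings; type conversion would normally follow the header definitions.

```python
# Illustrative VCF helpers (hypothetical names), not part of bcftools or any library.
import re


def parse_vcf_line(line: str, samples: list[str]) -> dict:
    """Parse one VCF data line; `samples` are the names from the #CHROM header."""
    fields = line.rstrip("\n").split("\t")
    chrom, pos, vid, ref, alt, qual, flt, info = fields[:8]
    rec = {
        "chrom": chrom,
        "pos": int(pos),                                  # 1-based
        "id": vid,
        "ref": ref,
        "alt": alt.split(","),
        "qual": None if qual == "." else float(qual),
        "filter": flt,
        "info": {},
        "samples": {},
    }
    for item in info.split(";"):
        if item in (".", ""):
            continue
        key, sep, value = item.partition("=")
        rec["info"][key] = value if sep else True         # flags carry no '='
    if len(fields) > 8:
        keys = fields[8].split(":")                       # FORMAT column
        for name, column in zip(samples, fields[9:]):
            rec["samples"][name] = dict(zip(keys, column.split(":")))
    return rec


def alt_allele_count(gt: str) -> int:
    """Non-reference alleles in a GT value such as '0/1' or '1|2'; '.' is missing."""
    return sum(1 for allele in re.split(r"[/|]", gt) if allele not in ("0", "."))
```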
-
YAML Configuration - Human-readable structured format commonly used for workflow configurations, environment definitions, and parameter files.
Commands: Nextflow, Snakemake
-
ZIP Archive - General-purpose compressed archive format; tools such as FastQC package their machine-readable outputs as ZIP files.
Commands: FastQC
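Python's standard zipfile module can read ZIP outputs without unpacking them to disk. This sketch assumes a layout in which the FastQC archive contains a member ending in "/summary.txt". It also assumes that file has tab-separated status, module name, and input file name lines. Please verify this against the FastQC version you use. The function name is hypothetical. The demo builds a small in-memory archive that mimics the assumed layout so that it runs on its own.

```python
# Illustrative reader (hypothetical name) for an assumed FastQC ZIP layout.
import io
import zipfile


def read_fastqc_summary(zip_source) -> dict[str, str]:
    """Map module name -> status (PASS/WARN/FAIL) from summary.txt in a FastQC ZIP.

    Assumes a member ending in '/summary.txt' whose lines are tab-separated:
    status, module name, input file name.
    """
    with zipfile.ZipFile(zip_source) as zf:
        member = next(n for n in zf.namelist() if n.endswith("/summary.txt"))
        text = zf.read(member).decode("utf-8")
    statuses = {}
    for line in text.splitlines():
        if line.strip():
            status, module, _filename = line.split("\t")
            statuses[module] = status
    return statuses


if __name__ == "__main__":
    # Build a tiny in-memory archive that mimics the assumed layout
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("demo_fastqc/summary.txt",
                    "PASS\tBasic Statistics\tdemo.fastq\n"
                    "WARN\tPer base sequence content\tdemo.fastq\n")
    buf.seek(0)
    print(read_fastqc_summary(buf))
```

zipfile.ZipFile accepts either a path or a binary file object, so the same function works on a real archive path.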