Bio Commands: Bioinformatics Command Reference
    • BAM Alignment - Binary, compressed form of SAM that stores read alignments to a reference genome.

      Commands: BWA, samtools, MACS3, mosdepth, deepTools, bedtools, featureCounts, StringTie, STAR, FreeBayes, GATK

    • BCF Variant - Binary form of VCF, suited to efficient storage and processing of large-scale variant data.

      Commands: bcftools

    • BED Genomic intervals - Lightweight tab-delimited text format that describes genomic intervals in 0-based, half-open coordinates.

      Commands: MACS3, mosdepth, deepTools, bedtools
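
The 0-based, half-open convention is easy to get wrong when moving between BED and 1-based formats such as GFF or VCF. The following is a minimal sketch of the conversion; the function names are hypothetical and not taken from any tool listed here.

```python
from typing import Tuple


def bed_to_one_based(start: int, end: int) -> Tuple[int, int]:
    """Convert a 0-based half-open interval [start, end) to 1-based closed [start+1, end]."""
    if start < 0 or end < start:
        raise ValueError(f"invalid BED interval: start={start}, end={end}")
    return start + 1, end


def interval_length(start: int, end: int) -> int:
    """In half-open coordinates the length is simply end - start."""
    return end - start


if __name__ == "__main__":
    # The first 100 bases of a chromosome are 0..100 in BED and 1..100 in 1-based closed form.
    print(bed_to_one_based(0, 100))   # (1, 100)
    print(interval_length(0, 100))    # 100
```

Only the start coordinate shifts; the end value is numerically the same in both systems.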

    • BEDPE Genomic intervals - BED extension for paired genomic intervals, common in paired-end peak calling and structural variant analysis.

      Commands: MACS3

    • bigWig Signal track - Indexed binary format for continuous genome signal tracks, such as coverage and enrichment profiles, and for genome browser display.

      Commands: deepTools

    • broadPeak Peak - BED-derived broad peak format, commonly used for broad enrichment signals such as histone modifications.

      Commands: MACS3

    • CRAM Alignment - Highly compressed alignment format that usually needs the reference genome for decoding.

      Commands: samtools, mosdepth, FreeBayes, GATK

    • FASTA Sequence - Basic text format that represents nucleotide or protein sequences as a header line followed by sequence lines.

      Commands: Bowtie2, BWA, minimap2, SPAdes, QUAST, BUSCO, Prokka, Kraken2, MAFFT, IQ-TREE 2, kallisto, Salmon, HISAT2, STAR, BLAST+, DIAMOND, MMseqs2, SeqKit, seqtk

    • FASTQ Sequence reads - Standard text format that stores sequencing reads together with their per-base quality scores.

      Commands: Bowtie2, BWA, minimap2, SPAdes, SRA Toolkit, Cutadapt, fastp, Trim Galore, FastQC, Kraken2, kallisto, Salmon, HISAT2, STAR, SeqKit, seqtk
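
To make "per-base quality scores" concrete, here is a minimal sketch that parses one four-line FASTQ record and decodes its quality string. It assumes Phred+33 encoding (the common modern convention; older data may use a different offset) and a record that is not wrapped across extra lines. The function name is hypothetical.

```python
from typing import List, Tuple

PHRED_OFFSET = 33  # assumption: Phred+33 ("Sanger") quality encoding


def parse_fastq_record(lines: List[str]) -> Tuple[str, str, List[int]]:
    """Parse one 4-line FASTQ record into (read name, sequence, quality scores)."""
    if len(lines) != 4:
        raise ValueError("a FASTQ record must have exactly four lines")
    header, seq, plus, qual = (line.rstrip("\r\n") for line in lines)
    if not header.startswith("@") or not plus.startswith("+"):
        raise ValueError("malformed FASTQ record")
    if len(seq) != len(qual):
        raise ValueError("sequence and quality strings differ in length")
    # Each quality character encodes one integer score per base.
    scores = [ord(ch) - PHRED_OFFSET for ch in qual]
    return header[1:], seq, scores


if __name__ == "__main__":
    record = ["@read1\n", "ACGT\n", "+\n", "II5!\n"]
    print(parse_fastq_record(record))  # ('read1', 'ACGT', [40, 40, 20, 0])
```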

    • GenBank Annotation - NCBI GenBank flat-file format that holds a sequence together with its feature annotations and source metadata.

      Commands: Prokka

    • GFF Annotation - General feature annotation format that describes genes, transcripts, exons, and other genomic features in nine text columns.

      Commands: QUAST, Prokka, bedtools, StringTie
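
The nine columns can be read with a few lines of code. This sketch assumes GFF3-style `key=value;` attributes in the ninth column (GFF2 and GTF use a different attribute syntax) and does not handle URL-escaped values; `GffFeature` and `parse_gff3_line` are hypothetical names.

```python
from typing import Dict, NamedTuple


class GffFeature(NamedTuple):
    seqid: str
    source: str
    feature_type: str
    start: int          # 1-based, inclusive
    end: int            # 1-based, inclusive
    score: str          # "." when absent
    strand: str
    phase: str
    attributes: Dict[str, str]


def parse_gff3_line(line: str) -> GffFeature:
    """Split one tab-delimited GFF3 feature line into its nine columns."""
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) != 9:
        raise ValueError(f"expected 9 columns, found {len(fields)}")
    attributes: Dict[str, str] = {}
    for item in fields[8].split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            attributes[key.strip()] = value.strip()
    return GffFeature(fields[0], fields[1], fields[2], int(fields[3]),
                      int(fields[4]), fields[5], fields[6], fields[7], attributes)


if __name__ == "__main__":
    line = "chr1\texample\tgene\t100\t200\t.\t+\t.\tID=gene1;Name=abc"
    feature = parse_gff3_line(line)
    print(feature.feature_type, feature.start, feature.end, feature.attributes["ID"])
```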

    • GTF Annotation - Gene annotation format widely used in RNA-seq workflows to describe genes, transcripts, exons, and related features.

      Commands: deepTools, bedtools, featureCounts, StringTie, STAR

    • GVCF Variant - Reference-confidence extension of VCF that records non-variant blocks as well as variant sites; commonly used for joint genotyping.

      Commands: GATK

    • HTML Report - Web page format commonly used for visual QC and analysis reports.

      Commands: QUAST, deepTools, fastp, Trim Galore, FastQC, MultiQC, SnpEff, Nextflow, Snakemake

    • JSON Structured data - Lightweight data format often used for tool reports, parameters, and exchange of structured results.

      Commands: BUSCO, fastp, MultiQC, Ensembl VEP, Nextflow, Snakemake

    • narrowPeak Peak - BED-derived narrow peak format, commonly used for TF ChIP-seq and ATAC-seq peak calls.

      Commands: MACS3
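
As an illustration of how narrowPeak extends BED, the sketch below recovers the summit coordinate of a peak. It assumes the ENCODE BED6+4 column layout (chrom, chromStart, chromEnd, name, score, strand, signalValue, pValue, qValue, peak), where the tenth column is the summit offset from chromStart and -1 means no summit was called. The function name is hypothetical.

```python
from typing import Optional, Tuple


def narrowpeak_summit(line: str) -> Tuple[str, Optional[int]]:
    """Return (chrom, 0-based summit position) for one narrowPeak line.

    The summit is None when the peak column is -1 (no summit called).
    """
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) != 10:
        raise ValueError(f"expected 10 columns, found {len(fields)}")
    chrom = fields[0]
    chrom_start = int(fields[1])
    peak_offset = int(fields[9])
    if peak_offset < 0:
        return chrom, None
    return chrom, chrom_start + peak_offset


if __name__ == "__main__":
    line = "chr1\t100\t300\tpeak_1\t500\t.\t12.5\t8.1\t6.2\t75"
    print(narrowpeak_summit(line))  # ('chr1', 175)
```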

    • Newick Phylogeny - Lightweight text format that encodes phylogenetic tree topology and branch lengths as nested parentheses.

      Commands: IQ-TREE 2
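
The nested-parenthesis structure can be inspected without a full parser. This is a rough sketch, not a complete Newick reader: it assumes unquoted labels and a tree with at least one pair of parentheses, and both function names are hypothetical.

```python
import re
from typing import List

# A leaf label directly follows "(" or ","; internal node labels follow ")".
_LEAF_PATTERN = re.compile(r"[(,]\s*([^():,;\s]+)")


def leaf_names(newick: str) -> List[str]:
    """Return leaf labels in the order they appear (unquoted labels only)."""
    return _LEAF_PATTERN.findall(newick)


def max_nesting_depth(newick: str) -> int:
    """Return the deepest level of parenthesis nesting, checking balance."""
    depth = deepest = 0
    for ch in newick:
        if ch == "(":
            depth += 1
            deepest = max(deepest, depth)
        elif ch == ")":
            depth -= 1
            if depth < 0:
                raise ValueError("unbalanced parentheses")
    if depth != 0:
        raise ValueError("unbalanced parentheses")
    return deepest


if __name__ == "__main__":
    tree = "((A:0.1,B:0.2):0.3,C:0.4);"
    print(leaf_names(tree))         # ['A', 'B', 'C']
    print(max_nesting_depth(tree))  # 2
```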

    • NF Workflow - Nextflow pipeline script that defines processes, channels, and workflow logic.

      Commands: Nextflow

    • PAF Alignment - Lightweight pairwise mapping format and the default output of minimap2; well suited to summarizing long-read and assembly alignments.

      Commands: minimap2
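
A PAF line carries enough information to summarize a mapping without base-level detail. The sketch below assumes the 12 mandatory PAF columns (query name, length, start, end; strand; target name, length, start, end; number of matching residues; alignment block length; mapping quality) and ignores any optional tags after them. The ratio of columns 10 and 11 is only an approximate identity, and the names used here are hypothetical.

```python
from typing import NamedTuple


class PafHit(NamedTuple):
    query: str
    target: str
    strand: str
    matches: int       # column 10: number of matching residues
    block_length: int  # column 11: alignment block length
    mapq: int          # column 12: mapping quality


def parse_paf_line(line: str) -> PafHit:
    """Read the fields of interest from the 12 mandatory PAF columns."""
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) < 12:
        raise ValueError(f"expected at least 12 columns, found {len(fields)}")
    return PafHit(fields[0], fields[5], fields[4],
                  int(fields[9]), int(fields[10]), int(fields[11]))


def approximate_identity(hit: PafHit) -> float:
    """Matching residues divided by alignment block length."""
    if hit.block_length == 0:
        raise ValueError("alignment block length is zero")
    return hit.matches / hit.block_length


if __name__ == "__main__":
    line = "read1\t1000\t0\t900\t+\tchr1\t50000\t100\t1000\t850\t900\t60"
    hit = parse_paf_line(line)
    print(hit.query, hit.target, round(approximate_identity(hit), 3))
```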

    • PGEN Genotype - PLINK 2 binary genotype format, usually paired with PVAR (variant metadata) and PSAM (sample metadata) files.

      Commands: PLINK 2

    • SAF Annotation - Simplified annotation format supported by featureCounts that describes countable genomic features in a small number of columns.

      Commands: featureCounts

    • SAM Alignment - Standard text format for read alignments; the human-readable counterpart of BAM and CRAM.

      Commands: Bowtie2, BWA, minimap2, samtools, HISAT2, STAR, DIAMOND

    • SRA Sequence archive - Container format for raw sequencing data in the NCBI Sequence Read Archive, commonly converted to FASTQ with SRA Toolkit.

      Commands: SRA Toolkit

    • TSV Table - Tab-separated text table format, commonly used for statistics, matrices, and intermediate results.

      Commands: QUAST, BUSCO, mosdepth, deepTools, Prokka, Kraken2, IQ-TREE 2, PLINK 2, MultiQC, featureCounts, kallisto, Salmon, StringTie, BLAST+, DIAMOND, MMseqs2, Ensembl VEP

    • VCF Variant - Standard text format for genetic variants, sample genotypes, and annotations.

      Commands: bedtools, PLINK 2, Ensembl VEP, SnpEff, FreeBayes, GATK, bcftools

    • YAML Configuration - Human-readable structured format commonly used for workflow configuration, environment definitions, and parameter files.

      Commands: Nextflow, Snakemake

    • ZIP Archive - General-purpose compressed archive format; tools such as FastQC package their machine-readable results as ZIP files.

      Commands: FastQC

    Covers 40 frequently used bioinformatics commands and 27 file formats.