Samtools faidx

samtools faidx creates an index for a FASTA file and enables random-access extraction of subsequences by coordinate. Many downstream tools, such as GATK and bcftools, expect this index to sit alongside the reference genome before they can process it.

Details

| Property | Value |
| --- | --- |
| Type | Native tool |
| Binary | samtools |
| Subcommand | faidx |

Installation

Samtools must be installed and available in your system PATH.

Homebrew (macOS):

    brew install samtools

APT (Debian/Ubuntu):

    sudo apt install samtools

Conda (Bioconda channel):

    conda install -c bioconda samtools

Accepted inputs

| Extension | Description |
| --- | --- |
| .fasta, .fa, .fna | Uncompressed FASTA (required for indexing) |
| .fasta.gz, .fa.gz | Bgzip-compressed FASTA (requires bgzip, not regular gzip) |

gzip vs bgzip

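samtools faidx can index a compressed FASTA file only if it was compressed with bgzip, not regular gzip. bgzip writes the data as a series of independently compressed blocks (the BGZF format), which is what makes random access possible. The file keeps the .gz extension, so the two cannot be told apart by name. If indexing fails on a .gz file, decompress it with gunzip and index the uncompressed version, or recompress it with bgzip.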
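Because the extension does not distinguish the two formats, it can help to look at the file header before indexing. The following is a minimal sketch, not part of Liatir or samtools: it assumes the file starts with a standard gzip header and checks for the `BC` extra subfield that BGZF blocks carry. The function name `is_bgzf` is hypothetical.

```python
import struct


def is_bgzf(path):
    """Return True if the file begins with a BGZF block header (sketch)."""
    with open(path, "rb") as fh:
        header = fh.read(12)
        if len(header) < 12:
            return False
        # gzip magic bytes, deflate method, and the FEXTRA flag (bit 2)
        if header[0:3] != b"\x1f\x8b\x08" or not header[3] & 4:
            return False
        (xlen,) = struct.unpack("<H", header[10:12])
        extra = fh.read(xlen)

    # Walk the extra subfields looking for 'B', 'C' with a 2-byte payload.
    pos = 0
    while pos + 4 <= len(extra):
        si1, si2 = extra[pos], extra[pos + 1]
        (slen,) = struct.unpack("<H", extra[pos + 2:pos + 4])
        if (si1, si2, slen) == (ord("B"), ord("C"), 2):
            return True
        pos += 4 + slen
    return False
```

A file for which this returns False but which still carries the gzip magic bytes is most likely plain gzip and would need to be decompressed or recompressed with bgzip before indexing.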

Usage

Create index

  1. Navigate to Tools → Samtools faidx.
  2. Select a FASTA file from your Data library.
  3. Click Create index.

Liatir runs samtools faidx <file>, which creates <file>.fai in the same directory. The index file is automatically registered in the Data library.

The results panel shows a table of all sequences in the FASTA with their lengths.
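The same name and length table can be read straight from the .fai file, since those are its first two columns. This is an illustrative sketch rather than Liatir's own code; the function name `read_sequence_lengths` is hypothetical.

```python
def read_sequence_lengths(fai_path):
    """Return [(name, length), ...] in file order from a .fai index (sketch)."""
    rows = []
    with open(fai_path) as fh:
        for line in fh:
            if not line.strip():
                continue  # tolerate a trailing blank line
            fields = line.rstrip("\n").split("\t")
            rows.append((fields[0], int(fields[1])))
    return rows
```

For example, `for name, length in read_sequence_lengths("genome.fa.fai"): print(name, length)` prints one sequence per line, where `genome.fa.fai` stands in for your own index file.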

Extract subsequence

With a file selected and an existing .fai index, use the Extract subsequence form:

  1. Enter a region string in the format chr:start-end (1-based, inclusive).
  2. Click Extract.

The extracted FASTA sequence is displayed inline. Examples:

| Region string | Extracts |
| --- | --- |
| chr1 | Entire chromosome 1 |
| chr1:1-1000 | First 1000 bases of chr1 |
| chr17:7,668,421-7,687,538 | TP53 locus |
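To make the region syntax concrete, here is a small sketch of how such strings could be parsed. It is not the parser samtools uses. It assumes sequence names contain no colon, and it covers only the two forms shown in the table (a bare name, or name:start-end with optional thousands separators). The name `parse_region` is hypothetical.

```python
import re

# name, optionally followed by :start-end (digits may contain commas)
_REGION = re.compile(r"^([^:]+)(?::([\d,]+)-([\d,]+))?$")


def parse_region(region):
    """Parse 'chr' or 'chr:start-end' into (name, start, end).

    start and end are 1-based and inclusive; both are None for a bare name.
    """
    match = _REGION.match(region.strip())
    if match is None:
        raise ValueError(f"unrecognised region string: {region!r}")
    name, start, end = match.groups()
    if start is None:
        return name, None, None
    start = int(start.replace(",", ""))
    end = int(end.replace(",", ""))
    if start < 1 or start > end:
        raise ValueError(f"invalid coordinates in region: {region!r}")
    return name, start, end
```

Because both ends are inclusive, the number of bases extracted is `end - start + 1`, which is why `chr1:1-1000` yields exactly 1000 bases.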

The .fai index format

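The .fai file is a tab-separated text file with one line per sequence and no header row. The column names in the first line of the example below are shown for reference only:

NAME    LENGTH    OFFSET    BASES_PER_LINE    BYTES_PER_LINE
chr1    248956422    52    60    61
chr2    242193529    253404903    60    61

| Column | Description |
| --- | --- |
| NAME | Sequence name from the > header (up to the first whitespace) |
| LENGTH | Sequence length in bases |
| OFFSET | Byte offset of the sequence's first base within the FASTA file |
| BASES_PER_LINE | Number of bases on each full sequence line |
| BYTES_PER_LINE | Number of bytes on each full sequence line, including the line terminator |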
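To show where each column comes from, the sketch below derives the five values from an uncompressed FASTA file in plain Python. It is an illustration only, not a replacement for samtools faidx: it assumes a well-formed file in which every header has a name, and it does not check that all lines of a sequence have the same length. The name `build_fai_records` is hypothetical.

```python
def build_fai_records(fasta_path):
    """Return [(name, length, offset, bases_per_line, bytes_per_line), ...]."""
    records = []
    current = None
    position = 0  # byte position of the start of the line being read
    with open(fasta_path, "rb") as fh:
        for raw in fh:
            if raw.startswith(b">"):
                if current is not None:
                    records.append(tuple(current))
                # NAME: header text up to the first whitespace
                name = raw[1:].split()[0].decode()
                # OFFSET: first byte after the header line
                current = [name, 0, position + len(raw), 0, 0]
            elif current is not None:
                bases = len(raw.rstrip(b"\r\n"))
                if current[3] == 0 and bases:
                    current[3] = bases     # BASES_PER_LINE from the first line
                    current[4] = len(raw)  # BYTES_PER_LINE includes the newline
                current[1] += bases        # LENGTH accumulates over all lines
            position += len(raw)
    if current is not None:
        records.append(tuple(current))
    return records
```

Writing each tuple as a tab-separated line would give the layout described in the table above.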

The index allows tools to jump directly to any position in the file without reading from the start.
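The jump works because every full line has the same width, so the byte position of any base follows from the OFFSET, BASES_PER_LINE and BYTES_PER_LINE columns. The following sketch shows the arithmetic for an uncompressed FASTA file; it is a simplified illustration, and the names `byte_offset` and `fetch` are hypothetical.

```python
def byte_offset(offset, bases_per_line, bytes_per_line, pos0):
    """File position of the base at 0-based position pos0 in a sequence."""
    full_lines = pos0 // bases_per_line
    return offset + full_lines * bytes_per_line + pos0 % bases_per_line


def fetch(fasta_path, fai_record, start, end):
    """Return bases start..end (1-based, inclusive) using one .fai record.

    fai_record is (name, length, offset, bases_per_line, bytes_per_line).
    """
    _name, length, offset, bases_pl, bytes_pl = fai_record
    if not 1 <= start <= end <= length:
        raise ValueError("coordinates outside the sequence")
    first = byte_offset(offset, bases_pl, bytes_pl, start - 1)
    last = byte_offset(offset, bases_pl, bytes_pl, end - 1)
    with open(fasta_path, "rb") as fh:
        fh.seek(first)                     # jump straight to the first base
        chunk = fh.read(last - first + 1)  # includes any newlines in between
    return chunk.replace(b"\n", b"").replace(b"\r", b"").decode("ascii")
```

With the chr1 line from the example index (offset 52, 60 bases and 61 bytes per line), base 1000 sits at byte 52 + 16 × 61 + 39 = 1067, so a reader can seek there without scanning the preceding lines.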