Genomic Variant Effect
Genomic Variant Effect scores variants by comparing model embeddings for reference and alternate sequence windows.
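To make the idea of reference and alternate windows concrete, here is a minimal sketch of how two such windows could be cut around one variant. The function name `build_windows` and its `flank` parameter are hypothetical illustrations. The tool's own windowing logic is not documented here and may differ, for example in how it uses Window start or Max tokens.

```python
def build_windows(reference: str, pos: int, ref: str, alt: str, flank: int):
    """Return (ref_window, alt_window) around one variant.

    Assumptions (not taken from the tool itself):
    - `reference` is the full reference sequence as a string;
    - `pos` is the 1-based position of the first REF base, as in VCF;
    - `flank` bases are kept on each side, clipped at the sequence ends.
    """
    start = pos - 1                 # 0-based index of the first REF base
    end = start + len(ref)          # exclusive end of the REF allele
    left = reference[max(0, start - flank):start]
    right = reference[end:end + flank]
    ref_window = left + reference[start:end] + right
    alt_window = left + alt + right  # same flanks, allele swapped
    return ref_window, alt_window


ref_window, alt_window = build_windows("ACGTACGTAC", pos=5, ref="A", alt="G", flank=3)
print(ref_window)  # CGTACGT
print(alt_window)  # CGTGCGT
```

Both windows share identical flanks, so any difference between their embeddings comes from the allele swap alone.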
Use it for
- local variant prioritization experiments;
- quick checks on small VCF/VCF.GZ files;
- generating BED tracks from model-derived variant scores;
- building exploratory genomics pipelines.
Inputs
- Reference FASTA/FA/FNA file, or an inline reference sequence.
- VCF or VCF.GZ variant file.
- Reference name when needed.
- Window start.
- Flank size.
- Max variants.
- Max tokens.
- Nucleotide Transformer AI Model.
Compatible models
Outputs
- Variant scores CSV.
- BED genome track.
- JSON summary.
- Variant count.
- Top embedding delta.
- Warnings.
- Provenance.
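When reading the BED genome track next to the VCF, keep in mind that VCF positions are 1-based while BED intervals are 0-based and half-open. The sketch below shows that conversion for a single scored variant. The column layout, the `bed_line` helper, and the scaling of the score into BED's 0 to 1000 integer range are assumptions for illustration, not the tool's documented output format.

```python
def bed_line(chrom: str, pos: int, ref: str, alt: str, delta: float) -> str:
    """Format one variant as a 5-column BED line (hypothetical layout).

    `pos` is the 1-based VCF position; `delta` is the embedding delta.
    """
    start = pos - 1                  # BED start is 0-based
    end = start + len(ref)           # BED end is exclusive
    # BED's score column is an integer in 0..1000, so scale and clamp.
    score = max(0, min(1000, round(delta * 1000)))
    return f"{chrom}\t{start}\t{end}\t{ref}>{alt}\t{score}"


print(bed_line("chr1", 101, "A", "G", 0.0312))
```

For this input the interval is `100` to `101`, which covers exactly the single base at VCF position 101.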
How to read the result
The main score is based on how much the alternate sequence changes the model embedding compared with the reference sequence.
A higher score means the alternate allele moved the model's representation further from the reference. On its own, it does not mean the variant is pathogenic, causal, or clinically important.
Always check warnings. The most important warning is a REF mismatch: it means the VCF reference allele does not match the FASTA sequence Liatir used.
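A REF mismatch can be reproduced by hand. The sketch below compares the REF allele of a VCF record with the bases found at the same position in the reference sequence. The helper name `ref_matches` is hypothetical, and the case-insensitive comparison is an assumption made so that soft-masked (lowercase) FASTA bases do not trigger false mismatches.

```python
def ref_matches(reference: str, pos: int, ref: str) -> bool:
    """Return True if `ref` equals the reference bases starting at 1-based `pos`."""
    start = pos - 1
    observed = reference[start:start + len(ref)]
    return observed.upper() == ref.upper()


reference = "ACGTACGTAC"
print(ref_matches(reference, 5, "A"))   # True: base 5 is A
print(ref_matches(reference, 5, "T"))   # False: this record would be a REF mismatch
```

Frequent mismatches usually point to a VCF and FASTA built against different genome assemblies, or to a contig naming or coordinate offset problem.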
Good first pipeline
- Reference FASTA input.
- VCF input.
- Genomic Variant Effect with Nucleotide Transformer 50M.
- BED output into a genome viewer.
- Inspect top scores and provenance.
Technical details
Tool ID: ai-genomic-variant-effect
Scoring method: 1 - cosine_similarity(reference_embedding, alternate_embedding).
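The scoring formula can be written out in a few lines of plain Python. This is a sketch of the formula only: the function name `embedding_delta` is hypothetical, and how the tool pools the model's token embeddings into a single vector per window is not stated here.

```python
import math


def embedding_delta(reference_embedding, alternate_embedding) -> float:
    """Compute 1 - cosine_similarity for two equal-length vectors."""
    dot = sum(r * a for r, a in zip(reference_embedding, alternate_embedding))
    norm_ref = math.sqrt(sum(r * r for r in reference_embedding))
    norm_alt = math.sqrt(sum(a * a for a in alternate_embedding))
    if norm_ref == 0.0 or norm_alt == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return 1.0 - dot / (norm_ref * norm_alt)


print(embedding_delta([1.0, 0.0], [1.0, 0.0]))   # identical direction
print(embedding_delta([1.0, 0.0], [0.0, 1.0]))   # orthogonal
```

The value ranges from 0 (embeddings point in the same direction) to 2 (opposite directions). It ignores vector magnitude, so only the direction of the change counts.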
The current scorer scans VCF/VCF.GZ records sequentially and stops once Max variants is reached. Because the scan is sequential, no .tbi index is required.
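A sequential scan of this kind can be sketched with the standard library alone. The function `iter_variants` and its parameters are hypothetical and only illustrate the read-in-order, stop-at-limit behaviour. The sketch does not split multi-allelic ALT fields and is not the tool's actual parser.

```python
import gzip
from itertools import islice


def iter_variants(path: str, max_variants: int):
    """Yield (chrom, pos, ref, alt) for the first `max_variants` records."""
    # Python's gzip module also reads bgzip-compressed files,
    # since the BGZF format is gzip-compatible.
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        # Skip blank lines and header lines, which start with '#'.
        records = (line for line in handle
                   if line.strip() and not line.startswith("#"))
        for line in islice(records, max_variants):
            chrom, pos, _id, ref, alt = line.rstrip("\n").split("\t")[:5]
            yield chrom, int(pos), ref, alt
```

Since the file is read from the top, the records scored are always the first ones in file order. To score variants from a specific region, subset the VCF beforehand.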