Feb 2 - 8, 2026

Protein Language Models, Single-Cell Variant Analysis, and Cancer Neoantigens

February 14, 2026

Can protein language models predict kinase-substrate relationships without training data? How do we find regulatory variants that affect specific cell types? Can we predict which neoantigens a patient's immune system will recognize? This week's papers take up each of these questions, along with new tools for virus variant quantification and biomarker discovery.

Protein Language Model Benchmark: DARKIN

DARKIN is a new benchmark that evaluates how well protein language models associate phosphosites with dark kinases in a zero-shot setting. Published in Bioinformatics (Oxford), the study targets a persistent gap in kinase annotation: many kinases have no known substrates, so supervised methods have no examples to learn from for them. The benchmark provides a standardized way to assess whether pre-trained protein language models can identify kinase-substrate relationships for such kinases, which could accelerate discovery in signal transduction research.

DARKIN benchmark paper
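
To make the zero-shot idea concrete, here is a minimal sketch of one common framing: rank candidate kinases by the cosine similarity between a phosphosite embedding and each kinase embedding, so a kinase with no known substrates can still be scored. This is not necessarily the benchmark's own protocol. The three-dimensional vectors and the kinase names are made up for illustration; in practice the embeddings would come from a pre-trained protein language model.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_kinases(site_embedding, kinase_embeddings):
    """Return (kinase, score) pairs sorted from most to least similar."""
    scores = {
        name: cosine(site_embedding, emb)
        for name, emb in kinase_embeddings.items()
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical toy embeddings; real ones would have hundreds of dimensions.
kinase_embeddings = {
    "KINASE_A": [1.0, 0.0, 0.0],
    "KINASE_B": [0.0, 1.0, 0.0],
    "DARK_KINASE_C": [0.6, 0.8, 0.0],  # no known substrates, still scorable
}
site_embedding = [0.5, 0.9, 0.0]

ranking = rank_kinases(site_embedding, kinase_embeddings)
for name, score in ranking:
    print(f"{name}\t{score:.3f}")
```

Because scoring relies only on embeddings, no kinase-specific training labels are needed at prediction time, which is what makes the setting zero-shot.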

cellSTAAR: Single-Cell Functional Variant Analysis

cellSTAAR is a method that incorporates single-cell sequencing-based functional data to boost power in rare variant association testing for noncoding regions. Developed by researchers at multiple institutions including Harvard and MIT, it integrates functional genomics data with statistical genetics. The framework helps identify regulatory variants that influence disease risk in specific cell types, which may clarify the genetic basis of complex traits and inform the development of targeted therapies.

cellSTAAR methodology paper
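
The intuition behind weighting rare variants by cell-type-specific functional evidence can be sketched with a simple weighted burden score. This is a toy illustration of the general idea, not the cellSTAAR test statistic. The genotypes, case labels, and weights below are invented, and `weighted_burden` and `case_control_gap` are hypothetical helper names.

```python
def weighted_burden(genotypes, weights):
    """Per-individual burden: sum of minor-allele counts times variant weights."""
    return [sum(w * g for w, g in zip(weights, row)) for row in genotypes]


def case_control_gap(burdens, is_case):
    """Difference between mean burden in cases and mean burden in controls."""
    cases = [b for b, c in zip(burdens, is_case) if c]
    controls = [b for b, c in zip(burdens, is_case) if not c]
    return sum(cases) / len(cases) - sum(controls) / len(controls)


# Rows are individuals, columns are three rare noncoding variants (toy data).
genotypes = [
    [1, 0, 0],
    [1, 0, 0],
    [0, 1, 0],
    [0, 0, 1],
    [0, 0, 0],
    [0, 1, 0],
]
is_case = [True, True, False, False, False, True]

# Assumption: variant 0 lies in a regulatory element active in the relevant
# cell type, so it receives a high weight; the other two receive low weights.
cell_type_weights = [0.9, 0.1, 0.1]
flat_weights = [1.0, 1.0, 1.0]

weighted_gap = case_control_gap(weighted_burden(genotypes, cell_type_weights), is_case)
flat_gap = case_control_gap(weighted_burden(genotypes, flat_weights), is_case)
print(f"gap with cell-type weights: {weighted_gap:.3f}")
print(f"gap with flat weights:      {flat_gap:.3f}")
```

In this toy data the cell-type weights widen the case-control contrast because they down-weight variants with no functional support, which is the sense in which functional data can boost power.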

iPepGen: Cancer Neoantigen Discovery Pipeline

iPepGen is a modular immunopeptidogenomic analysis pipeline for discovering, verifying, and prioritizing cancer peptide neoantigen candidates. Published in Genome Biology, it integrates peptide prediction with HLA binding affinity analysis and T-cell receptor recognition modeling. The pipeline supports personalized cancer immunotherapy by speeding up the identification of patient-specific neoantigens that the immune system can target. Early validation shows promising results in identifying neoantigens that elicit strong T-cell responses.

iPepGen pipeline publication
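
A prioritization step of the kind described above might look like the following sketch. It is not iPepGen's actual code or interface. The `Candidate` class, the peptide sequences, and the affinity values are hypothetical, and treating mass-spectrometry evidence as the verification signal is an assumption. The 500 nM cutoff is a widely used convention for calling a peptide an HLA binder.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    peptide: str        # hypothetical sequence, for illustration only
    hla_allele: str
    ic50_nm: float      # predicted binding affinity; lower means stronger binding
    ms_verified: bool   # assumption: verified by mass-spectrometry evidence


def prioritize(candidates, max_ic50_nm=500.0):
    """Keep predicted binders, then rank verified peptides first, strongest binders first."""
    binders = [c for c in candidates if c.ic50_nm <= max_ic50_nm]
    return sorted(binders, key=lambda c: (not c.ms_verified, c.ic50_nm))


candidates = [
    Candidate("KLDETGNSL", "HLA-A*02:01", 45.0, False),
    Candidate("ALSDKTYRV", "HLA-A*02:01", 320.0, True),
    Candidate("GTRPLEQWA", "HLA-A*02:01", 2100.0, True),  # too weak, filtered out
]

ranked = prioritize(candidates)
for c in ranked:
    print(c.peptide, c.hla_allele, c.ic50_nm, c.ms_verified)
```

Sorting on verification before affinity reflects one possible design choice: direct evidence that a peptide is presented outweighs a stronger predicted affinity on its own.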

Orthanq: Virus Variant Quantification

Orthanq is an improved haplotype quantification tool that provides uncertainty-aware quantification of virus variants in mixed infections. The authors report better performance than existing approaches on SARS-CoV-2 and HIV-1 mixture datasets. The tool helps researchers and clinicians track viral evolution and characterize co-infections more accurately, which matters for public health monitoring and treatment strategies.

Orthanq virus variant tool
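
"Uncertainty-aware" quantification means reporting a distribution over variant fractions rather than a single number. The sketch below illustrates that idea on a deliberately simplified case and is not Orthanq's model: two variants, one distinguishing site, a binomial likelihood, and a uniform prior evaluated on a grid. The read counts and function names are hypothetical.

```python
import math


def log_likelihood(fraction, variant_reads, total_reads):
    """Binomial log-likelihood (up to a constant) of the reads given a variant fraction."""
    other_reads = total_reads - variant_reads
    if (fraction == 0.0 and variant_reads > 0) or (fraction == 1.0 and other_reads > 0):
        return float("-inf")
    ll = 0.0
    if variant_reads > 0:
        ll += variant_reads * math.log(fraction)
    if other_reads > 0:
        ll += other_reads * math.log(1.0 - fraction)
    return ll


def grid_posterior(variant_reads, total_reads, grid_size=101):
    """Posterior over the variant fraction on an evenly spaced grid, uniform prior."""
    grid = [i / (grid_size - 1) for i in range(grid_size)]
    log_liks = [log_likelihood(f, variant_reads, total_reads) for f in grid]
    peak = max(log_liks)
    unnormalized = [math.exp(ll - peak) for ll in log_liks]  # subtract peak for stability
    total = sum(unnormalized)
    return grid, [u / total for u in unnormalized]


def credible_interval(grid, posterior, mass=0.95):
    """Equal-tailed credible interval read off the cumulative posterior."""
    tail = (1.0 - mass) / 2.0
    cumulative = 0.0
    lower, upper = grid[0], grid[-1]
    found_lower = False
    for f, p in zip(grid, posterior):
        cumulative += p
        if not found_lower and cumulative >= tail:
            lower, found_lower = f, True
        if cumulative >= 1.0 - tail:
            upper = f
            break
    return lower, upper


# Hypothetical counts: 30 of 100 reads at a distinguishing site support variant X.
grid, posterior = grid_posterior(variant_reads=30, total_reads=100)
best = grid[posterior.index(max(posterior))]
low, high = credible_interval(grid, posterior)
print(f"most probable fraction: {best:.2f}, 95% interval: [{low:.2f}, {high:.2f}]")
```

The interval narrows as read depth grows, which is the practical payoff of carrying uncertainty through the estimate instead of reporting the point value alone.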

BioMark: Biomarker Analysis Platform

BioMark is a new web-based platform that streamlines biomarker discovery across diverse omics data types by combining statistical methods with machine learning algorithms. It lowers the barrier to advanced biomarker analytics with intuitive visualizations, automated reporting, and a feature-ranking strategy that consolidates results from multiple analytical methods. The platform lets researchers without advanced computational expertise uncover clinically relevant molecular signatures, which can speed up translational research.

BioMark tool publication
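
Consolidating feature rankings from several methods can be sketched with mean-rank aggregation, shown below. BioMark's actual strategy may differ. The method names, gene names, and the `consolidate_rankings` helper are hypothetical.

```python
def consolidate_rankings(rankings):
    """Combine best-first feature lists from several methods by mean rank.

    A feature missing from a method's list is given that list's length plus one.
    Returns (feature, mean_rank) pairs, best first; ties break alphabetically.
    """
    features = sorted({f for ranked in rankings.values() for f in ranked})
    mean_ranks = {}
    for feature in features:
        ranks = []
        for ranked in rankings.values():
            if feature in ranked:
                ranks.append(ranked.index(feature) + 1)
            else:
                ranks.append(len(ranked) + 1)
        mean_ranks[feature] = sum(ranks) / len(ranks)
    return sorted(mean_ranks.items(), key=lambda kv: (kv[1], kv[0]))


# Hypothetical outputs of three analytical methods on the same dataset.
rankings = {
    "t_test": ["GENE_A", "GENE_B", "GENE_C"],
    "random_forest": ["GENE_B", "GENE_A", "GENE_C"],
    "lasso": ["GENE_B", "GENE_C", "GENE_A"],
}

consensus = consolidate_rankings(rankings)
for feature, mean_rank in consensus:
    print(f"{feature}\t{mean_rank:.2f}")
```

Mean rank is only one option; it rewards features that score consistently well across methods rather than those that top a single list.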