Dana-Farber repository for machine learning in immunology

Guang Lan Zhang, Hong Huang Lin, Derin B. Keskin, Ellis L. Reinherz, Vladimir Brusic

Research output: Journal Publication › Article › peer-review

28 Citations (Scopus)


The immune system is characterized by high combinatorial complexity, which necessitates specialized computational tools for the analysis of immunological data. Machine learning (ML) algorithms are used in combination with classical experimentation to select vaccine targets, and in computational simulations that reduce the number of experiments required. Developing ML algorithms requires standardized data sets, consistent measurement methods, and uniform scales. To bridge the gap between the immunology and ML communities, we designed the Dana-Farber Repository for Machine Learning in Immunology (DFRMLI). The repository provides standardized data sets of HLA-binding peptides, with all binding affinities mapped onto a common scale. It also provides a list of experimentally validated, naturally processed T cell epitopes derived from tumor or virus antigens. The DFRMLI data were preprocessed to ensure consistency and comparability, are described in detail, and offer statistically meaningful sample sizes for peptides that bind to various HLA molecules. The repository is accessible at http://bio.dfci.harvard.edu/DFRMLI/.
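The abstract does not state how DFRMLI maps binding affinities onto a common scale. As a minimal sketch under stated assumptions, the Python below uses the logarithmic transform 1 - log(IC50)/log(50,000), common in the NetMHC line of work, to put IC50 values (in nM) on a [0, 1] scale. It also labels binders with the conventional 500 nM cutoff and keeps only HLA alleles that have enough peptides. The record layout, the function names, and the `min_size` threshold are hypothetical and are not taken from the repository.

```python
import math
from collections import defaultdict
from dataclasses import dataclass

# ASSUMPTION: 50,000 nM upper bound, as in the NetMHC-style transform;
# the common scale actually used by DFRMLI may differ.
MAX_IC50_NM = 50_000.0


@dataclass(frozen=True)
class BindingRecord:
    """One peptide-HLA binding measurement (hypothetical record layout)."""
    allele: str
    peptide: str
    ic50_nm: float


def normalize_ic50(ic50_nm: float, max_ic50: float = MAX_IC50_NM) -> float:
    """Map an IC50 (nM) onto [0, 1] via 1 - log(IC50) / log(max_ic50).

    Values are clipped to [1, max_ic50] first, so 1 nM or lower maps to 1.0
    (strongest binding) and anything at or above max_ic50 maps to 0.0.
    """
    if ic50_nm <= 0:
        raise ValueError("IC50 must be positive")
    clipped = min(max(ic50_nm, 1.0), max_ic50)
    return 1.0 - math.log(clipped) / math.log(max_ic50)


def is_binder(ic50_nm: float, cutoff_nm: float = 500.0) -> bool:
    """Label a peptide as a binder; the 500 nM cutoff is an assumed convention."""
    return ic50_nm < cutoff_nm


def alleles_with_enough_data(records, min_size: int):
    """Group records by HLA allele and keep alleles with >= min_size records.

    min_size is a hypothetical stand-in for a "statistically meaningful
    sample size"; the abstract does not give a number.
    """
    by_allele = defaultdict(list)
    for rec in records:
        by_allele[rec.allele].append(rec)
    return {a: recs for a, recs in by_allele.items() if len(recs) >= min_size}
```

For example, `normalize_ic50(500.0)` returns about 0.43, so the conventional 500 nM binder cutoff corresponds to roughly 0.43 on this scale. A log transform is used because measured IC50 values span several orders of magnitude, and a linear scale would compress all strong binders near zero.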

Original language: English
Pages (from-to): 18-25
Number of pages: 8
Journal: Journal of Immunological Methods
Issue number: 1-2
Publication status: Published - 30 Nov 2011
Externally published: Yes

Keywords

  • Data repository
  • HLA binding
  • Immune system
  • Mathematical model
  • Prediction
  • T cell epitope

ASJC Scopus subject areas

  • Immunology and Allergy
  • Immunology

