T cell recognition of a cognate peptide-MHC complex (pMHC) presented on the surface of infected or malignant cells is of utmost importance for mediating robust and long-term immune responses. Accurate prediction of cognate pMHC targets for T cell receptors (TCRs) would greatly facilitate the identification of vaccine targets for both pathogenic diseases and personalized cancer immunotherapies. Predicting immunogenic epitopes has therefore been at the centre of intensive research for the past decades, but has proven challenging. Although numerous models have been proposed, their performance has not been systematically evaluated, and their success rate in predicting epitopes for humans has not been measured and compared.

In this study, we curate and present a ‘standard-dataset’ of class I MHC epitopes for evaluating the performance of immunogenicity classifiers. Using this data, we present a systematic evaluation of the performance of several publicly available models that are commonly used to predict CD8+ T cell epitopes in the context of both pathogens and cancers. The benchmarking is performed in two different settings: a pan-HLA setting, in which all models are trained and evaluated on a pool of peptides from multiple HLA types, and an HLA-specific setting, in which models are trained and evaluated on peptides from HLA-A02:01. In the pan-HLA setting, we observe that the best performing model achieves 75.6% accuracy, suggesting considerable room for improvement. In the HLA-specific setting, we observe a substantial reduction in performance, with a mean change of -11.9%. Our work shows that existing models exhibit suboptimal performance for predicting immunogenic cancer neoantigens. We then further interrogate the underlying problems of model predictions and highlight HLA bias as a main source of variation, among other issues associated with the design of the models and/or their training data.

Competing Interest Statement: The authors have declared no competing interest.

Original publication




Journal article




Cold Spring Harbor Laboratory

Publication Date