Machine learning-guided deconvolution of plasma protein levels

Proteomic techniques now measure thousands of proteins circulating in blood at population scale, driving a surge in biomarker studies and biological clocks. However, their potential impact, generalisability, and biological relevance are hard to assess without understanding the origins and roles of the thousands of proteins implicated in these studies. Here, we provide a data-driven identification of the factors that explain variation in plasma levels of ~3,000 proteins among 43,240 participants of the UK Biobank. These factors explain the proteins' links to ageing and diseases and help guide protein biomarker and drug target discovery. We use machine learning to systematically identify a median of 20 factors (range: 1-37) out of >1,800 participant and sample characteristics that jointly explained an average of 19.4% (max. 100.0%) of the variance in plasma levels across protein targets. Proteins segregated into distinct clusters according to their explanatory factors, with modifiable characteristics explaining more variance than genetic variation (median: 10.0% vs 3.9%). We identify proteins for which the factors explaining varying levels in blood differed by sex (n=1,374 proteins) or across ancestries (n=74 proteins). We establish a knowledge graph that integrates our findings with genetic studies and drug characteristics to guide the identification of potential markers of drug target engagement. We demonstrate the value of our resource 1) by identifying disease-specific biomarkers, such as matrix metalloproteinase 12 for abdominal aortic aneurysm, and 2) by developing a framework for phenotype enrichment of protein signatures from independent studies to identify underlying sources of variation. All results are explorable via an interactive web portal (https://omicscience.org/apps/prot_foundation) and can be readily integrated into ongoing studies using an associated R package (https://github.com/comp-med/r-prodente).
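
For readers unfamiliar with the phrase "variance explained", the short Python sketch below illustrates the general idea on simulated data. It is not the pipeline used in the study and does not use the r-prodente package: the function `variance_explained`, the variables `bmi`, `alcohol`, and `genotype`, and all effect sizes are hypothetical, and an ordinary least-squares fit stands in for whatever machine learning model one might actually use.

```python
import numpy as np


def variance_explained(X, y):
    """Return the in-sample R^2 of an ordinary least-squares fit of y on X."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Add an intercept column so the fit is not forced through the origin.
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    residual = y - design @ coef
    ss_res = float(residual @ residual)           # unexplained sum of squares
    ss_tot = float(((y - y.mean()) ** 2).sum())   # total sum of squares
    return 1.0 - ss_res / ss_tot


# Simulated data only: every variable and effect size here is made up.
rng = np.random.default_rng(0)
n = 5000
bmi = rng.normal(size=n)                                # hypothetical modifiable factor
alcohol = rng.normal(size=n)                            # hypothetical modifiable factor
genotype = rng.binomial(2, 0.3, size=n).astype(float)   # hypothetical variant (0/1/2 alleles)
protein = 0.4 * bmi + 0.3 * alcohol + 0.2 * genotype + rng.normal(size=n)

r2_all = variance_explained(np.column_stack([bmi, alcohol, genotype]), protein)
r2_modifiable = variance_explained(np.column_stack([bmi, alcohol]), protein)
r2_genetic = variance_explained(genotype[:, None], protein)

print(f"all factors:        {r2_all:.3f}")
print(f"modifiable factors: {r2_modifiable:.3f}")
print(f"genetic factor:     {r2_genetic:.3f}")
```

Comparing the fit on all factors with fits on subsets gives a rough sense of how much each group of factors contributes for a single protein. Note that these are in-sample values, so a real analysis would typically evaluate on held-out data.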

Data access

By using these results in your research, you agree to cite our publication. Our legal notice, data protection statement and data usage agreement apply.

Protein Atlas:
A data visualization tool for protein variance analysis
Interactive Knowledge Graph
Download app data: data.zip