Speaker
Description
Over the last decades, biology and medicine have become data sciences. High-throughput ('omics') data at the level of gene expression, metabolic activity, epigenetic regulation and other processes now serve as a prominent source of systemic information. This makes these fields accessible to data-driven computational methods, in particular network science and machine learning.
Network science employs the formal view of graph theory to understand the design principles of complex systems. Abstracting cellular processes (gene regulation, metabolism, protein interactions) into networks has revolutionized the way we think about biological systems.
Machine learning is most prominent in biological and medical research through its successes in image analysis and in protein structure prediction with AlphaFold. Attempts to train machine learning models to interpret 'omics' data have been less successful so far.
Focusing on gene expression data as the most common example (beyond the genome) of 'omics' data, we discuss possible reasons for the limited success of machine learning in biology and medicine. We start with a (deceptively) simple biological situation, bacterial gene regulation, and then move to the analysis of medical data.