How machine learning is driving structural biology

Two molecular machines, a chromatin remodeler (pink and green on the left) and RNA polymerase II (grey, yellow and blue in the center), work together to read the genomic information stored in tightly packed DNA (white helix). Credit: Farnung lab

For Lucas Farnung, the central question is how a single fertilized egg develops into a fully functional human. As a structural biologist, he studies this process at the smallest scale: the trillions of atoms that must synchronize their work to make it happen.

“I don’t see any big difference between solving a 5,000-piece puzzle and the research we’re doing in my lab,” says Farnung, an assistant professor of cell biology at the Blavatnik Institute at Harvard Medical School. “We’re trying to understand what this process looks like visually, and from there we can generate ideas about how it works.”

Almost all cells in the human body contain the same genetic material, but the types of tissues those cells become during development (liver or skin, for example) are largely driven by gene expression, which dictates which genes are turned on and off. Gene expression is regulated by a process called transcription, the focus of Farnung’s work.

During transcription, molecular machines read the instructions contained in the genetic blueprint stored within DNA and create RNA, the molecule that carries out the instructions. Other molecular machines read RNA and use this information to make proteins that fuel almost all activities in the body.

Farnung studies the structure and function of the molecular machines responsible for transcription.

In a conversation with Harvard Medicine News, Farnung discussed his work and how machine learning is accelerating research in his field.

What is the main question your research seeks to answer?

I always say, we are interested in the smallest logistical problem there is. The human genome is present in almost every cell, and if you stretched out the DNA that makes up the genome, it would be roughly two meters, or six and a half feet long. But this two-meter-long molecule must fit inside the nucleus of a cell, which is only a few microns in size. That’s the equivalent of taking a fishing line that stretches from Boston to New Haven, Connecticut, or about 150 miles, and trying to squeeze it into a football.

To achieve this, our cells compact the DNA into a structure called chromatin, but then the molecular machines can no longer access the genomic information in the DNA. This creates a conflict: DNA must be compact enough to fit inside a cell’s nucleus, yet molecular machines must still be able to reach the genomic information it stores. We are particularly interested in visualizing how a molecular machine called RNA polymerase II gains access to genomic information and transcribes DNA into RNA.

What techniques do you use to visualize molecular machines?

Our general approach is to isolate molecular machines from cells and look at them using specific types of microscopes or X-rays. To do this, we insert the genetic material encoding a human molecular machine of interest into an insect or bacterial cell, so the cell makes a lot of that machine. Next, we use purification techniques to separate the machine from the cell so that we can study it in isolation.

However, it gets complicated because we are often interested in more than a single molecular machine, which we also refer to as a protein. Thousands of proteins interact with each other to regulate transcription, so we have to repeat this process thousands of times to understand these protein-protein interactions.

Artificial intelligence is beginning to penetrate many aspects of basic biology. Is the way you do structural biology research changing?

For the past 30 or 40 years, research in my field has been a tedious process. A Ph.D. student’s career would be devoted to learning a little about a single protein, and it would take thousands of student careers to learn how proteins interact in a cell. However, over the last two or three years, we have increasingly turned to computational approaches to predict protein interactions.

There was a breakthrough when Google DeepMind released AlphaFold, a machine learning model that can predict protein folding. Importantly, how proteins fold determines their function and interactions. We are now using artificial intelligence to predict tens of thousands of protein-protein interactions, many of which have never been described experimentally before. Not all of these predicted interactions actually occur inside cells, but we can test them with laboratory experiments.

This is super exciting because it really accelerates our science. When I look back on my Ph.D., the first three years were basically a failure: I wasn’t able to find any protein-protein interactions. Now, with these computational predictions, a Ph.D. student or postdoc in my lab can be fairly confident that a lab experiment to confirm a protein-protein interaction will work. I call it molecular biology on steroids, but legal, because now we can get to the actual question we want to answer much faster.

Besides efficiency and speed, how else is AI reshaping your field?

An exciting change is that we can now, in an unbiased way, test every protein in the human body against every other protein to see if they could potentially interact. Machine learning tools are disrupting our field much as personal computers disrupted society.
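
To give a sense of the scale of such an all-against-all screen, here is a minimal Python sketch. It is an illustration only, not the Farnung lab's pipeline: the protein labels are hypothetical placeholders, and the proteome size of roughly 20,000 proteins is an assumed round figure that does not come from the interview.

```python
from itertools import combinations
from math import comb


def count_pairs(n_proteins: int) -> int:
    """Number of unordered pairs of distinct proteins: n * (n - 1) / 2."""
    return comb(n_proteins, 2)


def all_pairs(proteins):
    """Yield every unordered pair of distinct proteins exactly once."""
    return combinations(proteins, 2)


# Toy example with hypothetical protein labels.
toy_proteins = ["protein_A", "protein_B", "protein_C", "protein_D"]
toy_pairs = list(all_pairs(toy_proteins))
print(len(toy_pairs))  # 6 pairs for 4 proteins

# Assumed proteome size (a round figure, not taken from the interview).
ASSUMED_PROTEOME_SIZE = 20_000
print(count_pairs(ASSUMED_PROTEOME_SIZE))  # 199990000 candidate pairs
```

Under that assumption there are close to 200 million candidate pairs, which helps explain why computational prediction comes first and laboratory experiments follow only for the most promising candidates.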

When I first became a researcher, people used X-ray crystallography to reveal the structure of individual proteins. It is a beautiful, high-resolution technique, but it can take many years. Then, during my Ph.D. and postdoc, cryo-electron microscopy, or cryo-EM, appeared: a technique that allows us to look at larger and more dynamic protein complexes at high resolution. Cryo-EM has enabled much progress in our understanding of biology over the past 10 years and has accelerated drug development.

I thought I was lucky to be part of the so-called resolution revolution brought about by cryo-EM. But now, it looks like machine learning for protein prediction is ushering in a second revolution, which is just amazing to me and makes me wonder how much more acceleration we’ll see.

In my estimation, we can probably now do research five to 10 times faster than we could 10 years ago. It will be interesting to see how machine learning transforms the way we do biological research in the next 10 years. Of course, we have to be careful how we manage these tools, but I find it exciting that I can make findings about problems I’ve thought about for a long time 10 times faster.

What are the applications of your work beyond the laboratory?

We are learning how biology works in the human body at a basic level, but there is always the promise that understanding basic biological mechanisms can help us develop effective treatments for various conditions. For example, it turns out that disruption of DNA-chromatin structure by molecular machines is one of the main drivers of many cancers. Once we understand the structure of these molecular machines, we can understand the effect of changing a few atoms to replicate the mutations that would lead to cancer, at which point we can begin to design drugs to target the proteins.

We just started a project in collaboration with the HMS Therapeutics Initiative that is looking at a chromatin remodeler, a protein that is highly mutated in prostate cancer. We recently obtained the structure of this protein and are performing virtual screens to see which chemical compounds bind to it. The hope is that we can create a compound that inhibits the protein and has the potential to be developed into a drug that slows the progression of prostate cancer.

We are also studying proteins involved in neurodevelopmental disorders such as autism. This is one place where machine learning can help us, because the tools we use to predict protein structures and protein-protein interactions can also predict how small-molecule compounds will bind to proteins.

Speaking of collaboration, how is working across research fields and disciplines relevant to your research?

Collaboration is very important to my research. The landscape of biology has become so complex, with so many different areas of research, that it is impossible to understand everything. Collaboration allows us to bring people with different expertise together to work on important biological problems, such as how molecular machines access the human genome.

We collaborate with other researchers at HMS on many different levels. Sometimes, we use our structural expertise to support the work of other laboratories. Other times, we have solved the structure of a particular protein, but need to collaborate to understand the role of that protein in the larger cellular context. We also collaborate with laboratories using other types of molecular biology approaches. Collaboration is indeed essential to drive progress and better understand biology.

Provided by Harvard Medical School

Citation: Q&A: How machine learning is propelling structural biology (2024, July 22) retrieved July 22, 2024 from https://phys.org/news/2024-07-qa-machine-propelling-biology.html

This document is subject to copyright. Apart from any fair dealing for the purpose of private study or research, no part may be reproduced without written permission. The content is provided for informational purposes only.
