Deep Folding

DeepMind AI is at it again, but this time it is no game. Rather, they are taking on something near and dear to our hearts – protein folding. This Ars Technica article discusses the latest attempts to predict a protein’s 3D fold based on the sequence of amino acids.

This is a wicked hard problem. Without going too deeply into the biochemistry, the function of a protein depends on its 3D structure. A typical protein is a chain of amino acids, built from 20 different types, and this linear chain folds into an often complex structure. Historically these structures were solved by an arduous process: purification of the protein, then crystallization, then X-ray diffraction. Some proteins resist any or all of these steps.

These days it is routine to sequence genomes, and a genome serves as the master code for the sequence of amino acids in each protein. So we can figure out what the linear chain is, but that often doesn’t provide obvious clues into how it folds. Traditional computational methods perform complex energy minimization calculations while exploring possible structures. Given the possible angles for each bond, the number of possible configurations is astronomically large.
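
To put a rough number on that, here is a back-of-the-envelope sketch in the spirit of Levinthal's classic estimate. Everything in it is an assumption for illustration, not real biophysics: it pretends each residue contributes two rotatable backbone angles, that each angle can sit in only three stable states, and that a hypothetical computer can check 10^13 conformations per second. The function names are made up for this example.

```python
import math

SECONDS_PER_YEAR = 3600 * 24 * 365


def conformation_count_log10(n_residues, states_per_angle=3, angles_per_residue=2):
    """Return log10 of the number of backbone conformations.

    Crude discrete model (an assumption): a chain of n residues has
    n - 1 peptide links, each contributing `angles_per_residue` angles,
    and every angle takes one of `states_per_angle` values.
    """
    n_angles = angles_per_residue * (n_residues - 1)
    return n_angles * math.log10(states_per_angle)


def brute_force_years_log10(n_residues, samples_per_second=1e13):
    """Return log10 of the years needed to visit every conformation once.

    `samples_per_second` is a made-up, generous sampling rate.
    """
    seconds_log10 = conformation_count_log10(n_residues) - math.log10(samples_per_second)
    return seconds_log10 - math.log10(SECONDS_PER_YEAR)


if __name__ == "__main__":
    for n in (50, 100, 300):
        print(
            f"{n} residues: ~10^{conformation_count_log10(n):.0f} conformations, "
            f"~10^{brute_force_years_log10(n):.0f} years to enumerate"
        )
```

Even with these toy numbers, a modest 100-residue chain comes out at around 10^94 conformations, which is why nobody tries to solve folding by brute-force enumeration.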

This is a great target for ML, as we know a lot of protein sequences and structures, so the algorithms can train on the known, then explore the unknown. Still a lot of challenges, particularly since many proteins are embedded in membranes (which changes things) and/or modified after synthesis. But we may have turned a corner.

Why do you need structures? Proteins often serve critical functions (structural, receptor/signaling, catalysis of chemical reactions, etc.) and can be drug targets. Plus they are just cool.
