The Critical Assessment of Structure Prediction (CASP): Over a quarter century tracking the state of the art of protein structure prediction
If you are looking to learn about AlphaFold 2 and its impact on biology, check out this other story too!
Introduction: what do you mean by “modeling protein structures”?
Structural biology attempts to explain biological systems at the atomic level. For this, the discipline depends critically on the availability of structures of the molecules involved, the most important of which are usually proteins. While many structures can be determined by experimental techniques such as X-ray or neutron diffraction, nuclear magnetic resonance, and now also electron cryo-microscopy, there is also the alternative of predicting, or “modeling”, these structures using computational methods.
Such predictions are essential for the many biological molecules that cannot be produced in the quantities and conditions that experiments require. But predicting structures can also be useful for molecules that are not difficult to produce and manipulate experimentally, but for which the information provided by the structure does not justify the cost and time of determining it. Indeed, if we could predict protein structures with sufficient confidence, we could reserve experiments for particularly difficult systems or for studying the effects of perturbations on the structure, such as a drug binding to the protein under investigation. At the extreme, if we could predict all the physics and chemistry of a given system at the atomic level, we could dispense with structure determination experiments altogether and focus efforts directly on understanding mechanisms and everything that follows from them: drug development, design of new functions, understanding evolution, etc.
Given the impact that protein structure predictions can have on biology, generations of researchers have been working on the problem for decades. Many methods have been developed, which can be classified into two main groups. On the one hand, there are methods that use already known structures to predict the structures of proteins of similar sequence, a procedure known as “homology modeling”. On the other hand, there are methods that attempt to “fold” sequences without any homology to proteins of known structure, for example by using simulations based on basic…