Scientific Activities 

Protein Folding and Aggregation


Minimalist Reduction


We present the results of sequence design on our off-lattice minimalist model, in which no specification of native-state tertiary contacts is needed. We start with a sequence that adopts a target topology and build upon it through sequence mutation to produce new sequences that comprise distinct members of a target fold class. In this work we use the α/β ubiquitin fold class and design two new sequences which, when characterized through folding simulations, reproduce the differences in folding mechanism seen experimentally for proteins L and G. These results indicate that a basic rule of patterning of hydrophobic and hydrophilic residues is the physical origin of the success of relative contact-order descriptions of folding. The patterning is tolerant to a small number of mutations, which would manifest as poorly conserved residues being found in the folding nucleus, and this tolerance is consistent with the robustness of fold topologies to mutation. We also suggest a possible criterion for performing sequence mappings from a 20-letter amino-acid code to a 3-letter reduced code for generalization to protein design.



Computational Methods, Algorithms, Models


Monte Carlo Algorithms
Probability flow through parameter space
Effective relaxation processes for difficult systems like proteins or spin glasses require special simulation techniques that permit barrier crossing to ensure ergodic sampling.  Numerous adaptations of the venerable Metropolis Monte Carlo (MMC) algorithm have been proposed to improve its sampling efficiency, including various hybrid Monte Carlo (HMC) schemes, and methods designed specifically for overcoming quasi-ergodicity problems such as Jump Walking (J-Walking), Smart Walking (S-Walking), Smart Darting, and Parallel Tempering. 

We present an alternative to these approaches that we call Cool Walking, or C-Walking.  In C-Walking two Markov chains are propagated in tandem, one at a high (ergodic) temperature and the other at a low temperature.  Non-local trial moves for the low temperature walker are generated by first sampling from the high-temperature distribution, then performing a statistical quenching process on the sampled configuration to generate a C-Walking jump move.  C-Walking needs only one high-temperature walker, satisfies detailed balance, and offers the important practical advantage that the high and low-temperature walkers can be run in tandem with minimal degradation of sampling due to the presence of correlations. 
To make the C-Walking approach more suitable to real problems we decrease the required number of cooling steps by attempting to jump at intermediate temperatures during cooling.  We further reduce the number of cooling steps by utilizing “windows” of states when jumping, which improves acceptance ratios and lowers the average number of cooling steps.  We present C-Walking results with comparisons to J-Walking, S-Walking, Smart Darting and Parallel Tempering on a one-dimensional rugged potential energy surface in which the exact normalized probability distribution is known. C-Walking shows superior sampling as judged by two ergodic measures.
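
As a point of reference for the comparison above, the following is a minimal sketch of Parallel Tempering, one of the methods C-Walking is compared against, on a one-dimensional rugged double-well potential. It is not an implementation of C-Walking. The potential, temperatures, step size, and swap interval are all hypothetical choices made for illustration and are not taken from the study.

```python
import math
import random


def potential(x):
    """Hypothetical rugged double-well potential (illustrative only)."""
    return 5.0 * (x * x - 1.0) ** 2 + 0.3 * math.cos(12.0 * x)


def metropolis_step(x, beta, step, rng):
    """Attempt one local Metropolis move at inverse temperature beta."""
    x_new = x + rng.uniform(-step, step)
    delta = potential(x_new) - potential(x)
    if delta <= 0.0 or rng.random() < math.exp(-beta * delta):
        return x_new
    return x


def parallel_tempering(n_sweeps, beta_low=5.0, beta_high=0.5,
                       step=0.2, swap_every=10, seed=0):
    """Return the positions visited by the low-temperature walker."""
    rng = random.Random(seed)
    x_low = x_high = -1.0  # both walkers start in the left-hand well
    samples = []
    for sweep in range(n_sweeps):
        x_low = metropolis_step(x_low, beta_low, step, rng)
        x_high = metropolis_step(x_high, beta_high, step, rng)
        if sweep % swap_every == 0:
            # Swap acceptance: min(1, exp[(beta_low - beta_high) * (U_low - U_high)])
            arg = (beta_low - beta_high) * (potential(x_low) - potential(x_high))
            if arg >= 0.0 or rng.random() < math.exp(arg):
                x_low, x_high = x_high, x_low
        samples.append(x_low)
    return samples


samples = parallel_tempering(50000)
fraction_right = sum(1 for x in samples if x > 0.0) / len(samples)
print(f"fraction of low-temperature samples in the right-hand well: {fraction_right:.2f}")
```

With local moves alone, the low-temperature walker in this toy setup would stay in the well where it started, because the barrier is many times the thermal energy. The configuration swaps with the high-temperature walker are what carry it across, which is the quasi-ergodicity problem that all of the methods named above try to address.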


Implicit Solvent Model
We have developed a solvation function that combines a Generalized Born model for polarization of protein charge by the high-dielectric solvent with a hydrophobic potential of mean force as a model for hydrophobic interaction, to aid in discriminating native structures from misfolded states in protein structure prediction.
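
For orientation, the widely used pairwise Generalized Born expression (the form introduced by Still and co-workers) is shown below. It is an assumption that the solvation function described above uses this particular form. The symbols are introduced here for illustration: q_i are atomic partial charges, r_ij interatomic distances, R_i effective Born radii, and ε_in and ε_solv the solute and solvent dielectric constants, in units where the Coulomb constant is 1. The hydrophobic potential of mean force term is not shown because its functional form is not given in this summary.

```latex
\Delta G_{\mathrm{pol}}
  = -\frac{1}{2}\left(\frac{1}{\varepsilon_{\mathrm{in}}}
      - \frac{1}{\varepsilon_{\mathrm{solv}}}\right)
    \sum_{i,j} \frac{q_i q_j}{f_{\mathrm{GB}}(r_{ij})},
\qquad
f_{\mathrm{GB}}(r_{ij})
  = \sqrt{\, r_{ij}^{2} + R_i R_j
      \exp\!\left(-\frac{r_{ij}^{2}}{4 R_i R_j}\right)}
```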

We find that our energy function outperforms other reported scoring functions in terms of correct native ranking for 91% of proteins and low Z-scores for a variety of decoy sets including the challenging Rosetta decoys. Decoys generated by thermal sampling around the native state basin reveal a potentially important role for side chain entropy in future development of even more accurate free energy surfaces.
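
The two measures mentioned above, native ranking and Z-score, can be sketched as follows. This is a minimal illustration using one common convention (Z-score of the native energy relative to the decoy energy distribution, so that more negative is better). The function names and the energies in the example are hypothetical, and the exact definitions used in the study may differ.

```python
import statistics


def native_z_score(native_energy, decoy_energies):
    """Z-score of the native energy relative to the decoy distribution."""
    mean = statistics.mean(decoy_energies)
    std = statistics.pstdev(decoy_energies)  # population standard deviation
    return (native_energy - mean) / std


def native_rank(native_energy, decoy_energies):
    """Rank of the native structure; 1 means no decoy has lower energy."""
    return 1 + sum(1 for e in decoy_energies if e < native_energy)


# Hypothetical energies in arbitrary units, for illustration only.
decoys = [-90.0, -95.0, -100.0, -105.0, -110.0]
native = -120.0
print(native_rank(native, decoys), round(native_z_score(native, decoys), 3))
```

A scoring function discriminates well on a decoy set when the native structure ranks first and its Z-score is strongly negative.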

We also demonstrate the performance of the new implicit solvent model on native protein loop prediction from a large set of loop decoys of 4- to 12-residue lengths. While our results for small loop decoy sets are comparable to those of existing energy functions, we find demonstrable superiority for loop lengths of 8 residues and greater, and find that the quality of our predictions is largely insensitive to the length of the target loop on a filtered set of decoys. Given that the current weakness in loop modeling is the ability to select the most native-like loop conformers from loop ensembles, this energy function provides a means for greater prediction accuracy in structure prediction of homologous and distantly related proteins, thereby aiding large-scale genomics efforts in comparative modeling. Together, this work shows that accounting for the stabilizing effect of hydrophobic exposure to aqueous solvent that defines the hydration physics is an apparent improvement over solvent-accessible surface area models that penalize hydrophobic exposure.
Water solvating protein core




Distributed Computing

The distributed computing (DC) paradigm in conjunction with the Folding@home (FH) client server has been used to study the folding kinetics of small peptides and proteins, giving excellent agreement with experimentally measured folding rates, although pathways sampled in these simulations are not always consistent with the folding mechanism. In this study, we use a coarse-grain model of protein L, whose two-state kinetics have been characterized in detail by using long-time equilibrium simulations, to rigorously test an FH protocol using approximately 10,000 short-time, uncoupled folding simulations starting from an extended state of the protein.

We show that the FH results give non-Poisson distributions and early folding events that are unphysical, whereas longer folding events experience a correct barrier to folding but are not representative of the equilibrium folding ensemble. Using short-time, uncoupled folding simulations started from an equilibrated denatured state ensemble (DSE), we also do not get agreement with the equilibrium two-state kinetics because of overrepresented folding events arising from higher energy subpopulations in the DSE. The DC approach using uncoupled short trajectories can make contact with traditionally measured experimental rates and folding mechanism when starting from an equilibrated DSE, when the simulation time is long enough to sample the lowest energy states of the unfolded basin and the simulated free-energy surface is correct. However, the DC paradigm, together with faster time-resolved and single-molecule experiments, can also reveal the breakdown in the two-state approximation due to observation of folding events from higher energy subpopulations in the DSE.
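
The idealized reasoning behind rate estimates from many short, uncoupled trajectories can be sketched as follows. For ideal two-state kinetics with rate k, folding times are exponentially distributed, so the fraction of trajectories that fold within a short window t_max is 1 - exp(-k t_max). The sketch below draws synthetic first-passage times under that assumption and inverts the relation. The rate, window, and function names are hypothetical, and the results above indicate that real short-trajectory data can deviate from this ideal picture.

```python
import math
import random


def simulate_first_passage_times(rate, n_traj, seed=0):
    """Draw folding times for an ideal two-state (single-exponential) process."""
    rng = random.Random(seed)
    return [rng.expovariate(rate) for _ in range(n_traj)]


def estimate_rate_from_short_runs(first_passage_times, t_max):
    """Estimate k from the fraction of trajectories that fold before t_max."""
    n_total = len(first_passage_times)
    n_folded = sum(1 for t in first_passage_times if t <= t_max)
    fraction = n_folded / n_total
    # Invert P(fold by t_max) = 1 - exp(-k * t_max); for k * t_max << 1
    # this reduces to k ~ n_folded / (n_total * t_max).
    return -math.log(1.0 - fraction) / t_max


# Hypothetical numbers: true rate 1.0 per unit time, 10,000 runs,
# each only 5% as long as the mean folding time.
times = simulate_first_passage_times(rate=1.0, n_traj=10000)
print(estimate_rate_from_short_runs(times, t_max=0.05))
```

The estimate is only meaningful when the underlying process really is single-exponential from the chosen starting ensemble. Early folding events from an extended state, or from higher energy subpopulations of the DSE, break that assumption.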


Coarse Grained Protein Models

We have recently developed a sequence-based α-carbon model that incorporates a mean field estimate of the orientation dependence of the polypeptide chain that gives rise to the specific hydrogen bond pairing that stabilizes α-helices and β-sheets. We illustrate the success of the new protein model in improving thermodynamic measures and folding mechanism for proteins L and G. The model shows greater folding cooperativity and improvements in designability of protein sequences, and predicts correct trends for kinetic rates and mechanism for proteins L and G. We believe the model is broadly applicable to other protein folding and protein-protein co-assembly processes, and does not require experimental input beyond the topology description of the native state. Even without tertiary topology information, it can also serve as a mid-resolution protein model for more exhaustive conformational search strategies that can bridge back down to atomic descriptions of the polypeptide chain.

We present a new general analytical solution for computing the screened electrostatic interaction between multiple macromolecules of arbitrarily complex charge distributions, assuming they are well described by spherical low dielectric cavities in a higher dielectric medium in the presence of a Debye-Hückel treatment of salt. The benefits to this approach are threefold. First, by exploiting multipole expansion theory for the screened Coulomb potential, we can describe direct charge-charge interactions and all significant higher-order cavity polarization effects between low dielectric spherical cavities containing their charges, while treating these higher order terms correctly at all separation distances. Second, our analytical solution is general to arbitrary numbers of macromolecules, is efficient to compute, and can therefore simultaneously provide on-the-fly updates to changes in charge distributions due to protein conformational changes. Third, we can change spatial resolutions of charge description as a function of separation distance without compromising the desired accuracy. While the current formulation describes solutions based on simple spherical geometries, it appears possible to reformulate these electrostatic expressions to smoothly increase spatial resolution back to greater molecular detail of the dielectric boundaries. 
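
As background for the Debye-Hückel treatment mentioned above, the sketch below evaluates the simplest case only: the screened Coulomb interaction between two point charges in a uniform dielectric with a 1:1 salt. It does not include the multipole expansion or the cavity polarization terms of the analytical solution described above. The function names and default parameters (water at 298.15 K with relative permittivity 78.5) are assumptions made for illustration.

```python
import math

E_CHARGE = 1.602176634e-19   # elementary charge, C
K_B = 1.380649e-23           # Boltzmann constant, J/K
EPS0 = 8.8541878128e-12      # vacuum permittivity, F/m
N_A = 6.02214076e23          # Avogadro constant, 1/mol


def debye_length(ionic_strength_molar, eps_r=78.5, temperature=298.15):
    """Debye screening length in meters for ionic strength given in mol/L."""
    ionic_strength = ionic_strength_molar * 1000.0  # convert to mol/m^3
    return math.sqrt(EPS0 * eps_r * K_B * temperature
                     / (2.0 * N_A * E_CHARGE ** 2 * ionic_strength))


def screened_coulomb_energy(q1, q2, r, ionic_strength_molar,
                            eps_r=78.5, temperature=298.15):
    """Screened Coulomb energy in joules; q1, q2 in units of e, r in meters."""
    kappa = 1.0 / debye_length(ionic_strength_molar, eps_r, temperature)
    return (q1 * q2 * E_CHARGE ** 2 * math.exp(-kappa * r)
            / (4.0 * math.pi * EPS0 * eps_r * r))


print(debye_length(0.1))  # a little under 1 nm at 0.1 M salt
print(screened_coulomb_energy(1, -1, 1.0e-9, 0.1))
```

Higher salt concentration shortens the screening length and weakens the interaction at a fixed separation.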




Bulk Water and Aqueous Hydration

Water structure controversy 

It has been suggested, based on x-ray absorption spectroscopy (XAS) experiments on liquid water (Wernet et al., Science 2004), that each water molecule, on average, has only one hydrogen bond donor and in turn accepts only one hydrogen bond. The larger implication of the XAS result is that the conventional view of water organizing as a four-fold, tetrahedrally coordinated random network is not true, and that water instead organizes as hydrogen-bonded chains or large rings embedded in a weakly hydrogen-bonded disordered network. This is a radical departure from what is known about liquid water, which is thought to belong to the class of tetrahedral liquids such as silica and germanium.

This alternative structural view potentially impacts previous interpretations of experimental and theoretical work on water, ice, tetrahedral and associated liquids, and educators who teach students about hydrogen-bonding of the world’s most important liquid and chemical bonding in general. Given the importance of water as a solvent, there are also broad implications for biological molecules, the design of novel materials, and experimental probes that yield fundamental signatures of water as evidence of life on other planets. Given the broader scientific and educational context, radically alternative structural interpretations of liquid water need to be challenged.

A vast array of experimental data on water provides a global view of the liquid that implicates its tetrahedral hydrogen-bonding network as the unifying molecular connection to its observed structural, thermodynamic, and dielectric property trends with temperature. Anyone who advocates an alternative structural picture for liquid water must consider these other, non-structural data. Although we firmly believed that chain networks could not be consistent with these known liquid water trends with temperature, there was no existing evidence to directly refute such a possibility. Therefore we decided to examine the consequences of chain networks using three different modified water models that exhibit a local hydrogen-bonding environment of two hydrogen bonds (2HB) and therefore networks of chains. Using these very differently parameterized models we evaluate their bulk densities, enthalpies of vaporization, heat capacities, isothermal compressibilities, thermal expansion coefficients, and dielectric constants over the temperature range of 235 K to 323 K. We also evaluate the entropy of the 2HB models at room temperature and whether such models nucleate ice Ih. All show poor agreement with experimentally measured thermodynamic and dielectric properties over the same temperature range, and behave similarly in most respects to normal liquids. This contrasts with many modern simulation models of water, which reproduce experimentally determined thermodynamic, dynamic, and dielectric trends with temperature. These models yield liquid structure that shows significant tetrahedral order and more hydrogen bonding than advocated by Wernet et al. Thus it appears that water structure based on hydrogen-bonded chains is inconsistent with liquid water as we know it through a multitude of experiments.
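
Two of the properties listed above, isothermal compressibility and constant-pressure heat capacity, are commonly estimated from fluctuations in a constant-pressure simulation. The sketch below uses the standard fluctuation formulas. It is an assumption that the study evaluated these properties this way, and the function names and toy numbers are hypothetical.

```python
import statistics

K_B = 1.380649e-23  # Boltzmann constant, J/K


def isothermal_compressibility(volumes, temperature):
    """kappa_T = Var(V) / (k_B * T * <V>); volumes in m^3, result in 1/Pa."""
    mean_v = statistics.mean(volumes)
    var_v = statistics.pvariance(volumes)
    return var_v / (K_B * temperature * mean_v)


def heat_capacity_cp(enthalpies, temperature):
    """C_p = Var(H) / (k_B * T^2); enthalpies in J, result in J/K."""
    return statistics.pvariance(enthalpies) / (K_B * temperature ** 2)


# Toy two-sample inputs, for illustration only; a real estimate needs
# a long, well-equilibrated time series of box volumes and enthalpies.
print(isothermal_compressibility([1.0e-26, 1.2e-26], 300.0))
print(heat_capacity_cp([0.0, 2.0e-21], 300.0))
```

Repeating such estimates at several temperatures gives the temperature trends against which a proposed water model can be compared with experiment.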

An alternative structure for liquid water based on chain networks should certainly have been anticipated to be controversial, but seemed to go directly at the goal of overturning conventional wisdom. Similar scientific excitement must have existed in the early days of the discovery of “polywater”, but eventually that alternative view proved to be false because the characterized water contained chemical impurities that ultimately explained polywater’s unusual, and un-water-like, properties. It would seem that the challenge of knowing whether a new structural view of water is correct is to first try to reconcile whether it “fits” into a larger experimental context of other structural, thermodynamic, dynamical, and dielectric data collected over decades by many able scientists.  Theory and simulation can more directly address “what is” the global view of the liquid and its phases, thereby providing a better reference state for investigating “what is not”.