Sturtevant S Hypothesis Statement


The validity of the thermodynamic hypothesis of protein folding was explored by simulating the evolution of protein sequences. Simple models of lattice proteins were allowed to evolve by random point mutations subject to the constraint that they fold into a predetermined native structure with a Monte Carlo folding algorithm. We employed a simple analytical approach to compute the probability of violation of the thermodynamic hypothesis as a function of the size of the protein, the fraction of the total number of possible conformations which are kinetically accessible, and the roughness of the free-energy landscape. It was found that even if the folding is under kinetic control, the sequence will evolve so that the native state is most often the state of minimum free energy.

Understanding how proteins fold not only is one of the most interesting theoretical problems in molecular biophysics but also has far-reaching medical and biotechnological consequences. Levinthal pointed out that it is impossible for an unfolded protein to find the native state by randomly searching through the entire space of possible conformations (1). This led him to postulate that a protein must follow a specific path that guides it to the native state, and therefore folding must be under kinetic control. According to him, “If the final folded state turned out to be the one of lowest configurational energy, it would be a consequence of biological evolution and not of physical chemistry” (2). In contrast, Anfinsen concluded from the results of his numerous denaturation–renaturation experiments that the native state of the protein is indeed the global minimum of free energy, a conjecture that he called the thermodynamic hypothesis of protein folding (3).

The debate between these two viewpoints has continued, with numerous experimentalists and theoreticians investigating whether proteins reach their global energy minimum in a pathway-independent manner under thermodynamic control, or whether they follow a specific pathway to a possibly local minimum under kinetic control. Experiments have suggested that some small monodomain proteins obey the thermodynamic hypothesis (4, 5). There are also examples of proteins where the active state is not the thermodynamically most stable state. For instance, the active conformation of plasminogen activator inhibitor 1 (PAI-1) is metastable, and the protein adopts a more stable inactive conformation within hours (6). Similar observations have been made with other members of the serpin family. Protein misfolding, as in many diseases such as Alzheimer’s disease, Creutzfeldt–Jakob disease, and bovine spongiform encephalopathy, has been attributed to kinetic traps or to folding into an alternate state of lower energy (7). Similarly, it has been possible to modify proteins so that they are no longer able to fold, although the stability of the native state is unaltered (8, 9).

On the theoretical side, Thirumalai used molecular dynamics simulations to support his hypothesis that proteins fold into a metastable state (10). Shakhnovich and co-workers questioned whether a protein could consistently find the same native state by using this mechanism (11), and they proposed that a sufficiently large “energy gap” separating the native state from others was a necessary and sufficient condition for rapid folding (12), consistent with the results of simple models based on spin-glass theory (13, 14). Such an energy gap would necessarily imply the thermodynamic hypothesis. In other work, Shakhnovich showed that lattice proteins under strong evolutionary pressure to fold as quickly as possible evolve so that the native state is a deep and global minimum (15). On the other hand, proteins have not evolved to fold at the maximum possible rate. In addition, in lattice models almost the entire conformation space is kinetically accessible, so the absence of other deeper minima is not surprising. Onuchic, Wolynes, and Dill and their co-workers postulated that the distinguishing characteristic of foldable proteins is the existence of a “folding funnel” that directs the folding protein into the native state without the need for a definite pathway (16–18). This approach leaves open the possibility of kinetically inaccessible lower-energy states outside of the folding funnel. More elaborate theoretical models have been developed that have further explored the relationship between the free-energy landscape and the folding kinetics, including the role of traps and intermediates in the folding process (19–21).

In this paper we attempt to address the challenge posed by Levinthal, and investigate whether the thermodynamic hypothesis could result through the process of protein evolution. During evolution, protein sequences undergo random mutations. If the mutation does not interfere with the folding and function of the protein, it is possible for this mutation to be accepted and fixed in the population. Mutations that destabilize the native state are likely to interfere with successful folding, and thus will have a low acceptance rate. Conversely, mutations that destabilize alternative deep minima are more likely to be accepted. With evolutionary time, energy minima representing nonnative states are likely to become higher in energy than the native state, so that the thermodynamic hypothesis becomes fulfilled. This can occur even if the folding is under kinetic control.

To test this hypothesis, we performed simulations of protein evolution by using simple lattice model proteins. We first designed protein sequences that fold under kinetic control to a native state different from the ground state of lowest energy. We then allowed the protein to evolve by random mutations. Mutations were accepted only if the protein still folded into the original native conformation. At each generation we calculated the energies of the native state, the initially designed ground state, and the current ground state of the sequence. As expected, the native state generally became the ground state, although there was some fraction of the time where this was not the case. We then developed a simple analytical model to estimate the probability that a protein under kinetic control would obey the thermodynamic hypothesis.


Lattice Model.

We used a simple model of 16-residue proteins confined to a two-dimensional lattice, as shown in Fig. 1. Each residue occupies one lattice site in a square lattice with an excluded volume and lattice parameter of unit length. All 802,075 compact and noncompact conformations were enumerated, enabling us to identify the global energy minimum by evaluating the energies of all possible conformations.
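The exhaustive enumeration can be illustrated with a minimal Python sketch. It assumes that a "conformation" is a self-avoiding walk on the square lattice counted once per rotation/reflection class; the function name `count_conformations` is our own. Symmetry is removed by fixing the first bond along +x and the first turn (if any) toward +y. Under this reading the 16-residue count is 1 + (c₁₅ − 4)/8, where c₁₅ = 6,416,596 is the number of 15-step walks, which gives 802,075.

```python
def count_conformations(n_residues: int) -> int:
    """Count 2D square-lattice conformations of an n-residue chain (n >= 1),
    one per rotation/reflection class, by depth-first enumeration of
    self-avoiding walks with the first bond fixed along +x and the first
    turn (if any) fixed toward +y."""
    if n_residues <= 2:
        return 1
    moves = ((1, 0), (-1, 0), (0, 1), (0, -1))
    path = [(0, 0), (1, 0)]          # residue 1 at origin, first bond along +x
    occupied = set(path)             # excluded volume: one residue per site

    def extend(turned: bool) -> int:
        if len(path) == n_residues:
            return 1
        x, y = path[-1]
        total = 0
        for dx, dy in moves:
            # Until the chain first turns, only a +y turn is allowed; this
            # removes the mirror image through the x axis.
            if not turned and dy == -1:
                continue
            nxt = (x + dx, y + dy)
            if nxt in occupied:
                continue
            path.append(nxt)
            occupied.add(nxt)
            total += extend(turned or dy != 0)
            path.pop()
            occupied.remove(nxt)
        return total

    return extend(False)
```

Small chains run instantly. Under the assumption above, `count_conformations(16)` should reproduce the 802,075 figure, although pure Python may take a while.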

Figure 1

Examples of the two-dimensional lattice model used in this study, representing the target native structure (a) and target ground state structure (b) chosen for the simulations. The dotted line in conformation b indicates the interaction between residues 13 and 16 that is designed into the initial sequence as an unfavorable contact.

We used a simple energy function of the form

$$E = \sum_{i<j} \gamma(\mathcal{A}_i\mathcal{A}_j)\,\Delta_{ij} \qquad (1)$$

where $\gamma(\mathcal{A}_i\mathcal{A}_j)$ is the contact potential between residue type $\mathcal{A}_i$ at position i and residue type $\mathcal{A}_j$ at position j, and $\Delta_{ij}$ is equal to one if residues i and j are not adjacent in sequence but are on adjacent lattice sites, and zero otherwise. We used the statistical potentials of interaction between two residues derived by Miyazawa and Jernigan, which implicitly include the effect of interaction of the residues with the solvent (table VI in ref. 22). The global energy minimum state for any sequence is the conformation of minimum energy among the 802,075 conformations. We used the index q to quantify the similarity between any two structures, equal to the fraction of the total number of contacts that are common between the structures (0 ≤ q ≤ 1), with pairs of identical structures having q = 1.
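The contact energy of Eq. 1 and the overlap q can be made concrete with the Python sketch below. The two-letter potential `TOY_GAMMA` is a hypothetical placeholder and is not the Miyazawa–Jernigan table. Normalizing q by the contact count of the reference structure is also an assumption, since the text does not say which structure's contacts form the denominator.

```python
from typing import Dict, List, Set, Tuple

Coord = Tuple[int, int]

def contacts(coords: List[Coord]) -> Set[Tuple[int, int]]:
    """Pairs (i, j), j > i + 1, of residues on adjacent lattice sites,
    i.e. the pairs with Delta_ij = 1 in Eq. 1."""
    index = {pos: k for k, pos in enumerate(coords)}
    pairs = set()
    for i, (x, y) in enumerate(coords):
        for nb in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            j = index.get(nb)
            if j is not None and j > i + 1:   # skip chain neighbours; count each pair once
                pairs.add((i, j))
    return pairs

def energy(seq: str, coords: List[Coord], gamma: Dict[Tuple[str, str], float]) -> float:
    """Eq. 1: sum of gamma(A_i, A_j) over non-bonded contacts.
    `gamma` is keyed by the alphabetically sorted residue pair."""
    return sum(gamma[tuple(sorted((seq[i], seq[j])))] for i, j in contacts(coords))

def overlap_q(reference: List[Coord], other: List[Coord]) -> float:
    """Fraction of the reference structure's contacts also present in `other`
    (assumed normalization by the reference contact count)."""
    ref = contacts(reference)
    return len(ref & contacts(other)) / len(ref) if ref else 1.0

# Hypothetical hydrophobic/polar potential, for illustration only.
TOY_GAMMA = {("H", "H"): -1.0, ("H", "P"): 0.0, ("P", "P"): 0.0}
```

With a real contact table, only the `gamma` dictionary would change. The contact set is computed once per conformation, so both E and q reuse the same neighbour search.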

Folding Kinetics.

For a given sequence of amino acids, folding simulations were carried out to determine whether the protein could successfully find the target native state. At each Monte Carlo time step a local conformational change was made [tail-wag, corner move, or crankshaft rotation (23)] and the resulting new conformation was accepted with a probability P based on the Metropolis algorithm:

$$P = \min\left\{1,\ \exp\!\left[-\frac{E_{\text{new}} - E_{\text{old}}}{T}\right]\right\} \qquad (2)$$

where $E_{\text{new}}$ is the energy of the resulting conformation after the Monte Carlo move, $E_{\text{old}}$ is the energy of the existing conformation, and the Boltzmann constant has been set equal to one. We chose a working temperature of T = 0.085, where the folding kinetics were relatively rapid yet the native state was sufficiently stable that the folded-state population would weakly dominate that of the unfolded states during the later part of the simulations. Each simulation was carried out for 10 million time steps.
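The acceptance rule of Eq. 2 is sketched below in Python. It is not the authors' code. The callables `propose` (standing in for the tail, corner, and crankshaft moves) and `energy_of` (standing in for Eq. 1) are hypothetical parameters supplied by the caller.

```python
import math
import random

def acceptance_probability(e_old: float, e_new: float, temperature: float) -> float:
    """Metropolis rule with k_B = 1: min(1, exp(-(E_new - E_old) / T))."""
    delta = e_new - e_old
    if delta <= 0.0:
        return 1.0                      # downhill or neutral moves always accepted
    return math.exp(-delta / temperature)

def metropolis_step(conf, e_old, propose, energy_of, temperature, rng=random):
    """One Monte Carlo step. `propose(conf, rng)` returns a trial conformation
    or None if no legal local move exists; `energy_of` evaluates Eq. 1."""
    trial = propose(conf, rng)
    if trial is None:
        return conf, e_old
    e_new = energy_of(trial)
    if rng.random() < acceptance_probability(e_old, e_new, temperature):
        return trial, e_new
    return conf, e_old
```

At T = 0.085, an uphill move that costs ΔE = 0.085 is accepted with probability e⁻¹ ≈ 0.37, while one that costs 0.5 is accepted with probability of only about 0.003.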

The protein was considered to successfully fold to the native state if the sequence adopted the target structure or a structure with sufficient similarity to it [q ≥ 0.88 (≈8/9)] for more than 50% of the time during the final 2 million time steps of the simulation in at least five of ten simulation runs, each run starting from a random initial conformation. We also performed the simulations with a stricter folding criterion, where the protein had to be in the target state or in a state with q ≥ 0.88 for 70% of the time in at least seven of ten different runs. As both folding criteria yielded qualitatively similar results, we focus on the runs made under the first criterion.
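The two folding criteria can be written as a short decision function. This is a Python sketch that assumes each run is summarized as a list of q values, one per time step. In practice, with a 2,000,000-step window, one would more likely tally native-like steps on the fly than store whole traces. The function names are illustrative.

```python
from typing import Sequence

def run_folded(q_trace: Sequence[float], window: int, q_min: float = 0.88,
               min_fraction: float = 0.5) -> bool:
    """True if q >= q_min for more than `min_fraction` of the last `window` steps."""
    if window <= 0:
        raise ValueError("window must be positive")
    tail = q_trace[-window:]
    native_like = sum(1 for q in tail if q >= q_min)
    return native_like > min_fraction * len(tail)

def sequence_folds(q_traces, window: int, q_min: float = 0.88,
                   min_fraction: float = 0.5, min_runs: int = 5) -> bool:
    """Generous criterion by default (>50% of the window in >= 5 of the runs).
    The stricter variant uses min_fraction=0.7, min_runs=7."""
    return sum(run_folded(t, window, q_min, min_fraction) for t in q_traces) >= min_runs
```

With 9 contacts in the native structure, q can only take values k/9, so the threshold q ≥ 0.88 admits exactly the structures sharing at least 8 of the 9 native contacts.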

Sequence Evolution.

We created an initial starting sequence that folded into a metastable native state that differed from a target ground state, violating the thermodynamic hypothesis. The target native state and ground state chosen in our study are shown in Fig. 1. The ground state was designed to be kinetically inaccessible by making the interaction between residues 13 and 16 unfavorable, preventing formation of the central nucleus. Starting from a random sequence, the initial sequence was designed by using a simple hill-climbing algorithm: amino acid residues were changed at random, and each change was accepted if it lowered the harmonic mean of the energy of the target native state (ENS) and the energy of the target ground state (EGS) while keeping EGS lower than ENS. The residues at positions 13 and 16 were fixed during this search. This optimization was performed until the target ground state became the global energy minimum. We generated three different sequences that successfully folded into the target native state in at least seven of ten runs, spending more than 85% of the time during those runs in the native state. We then made random mutations in the sequence and performed kinetic simulations of the resulting sequence to see whether the protein could still successfully fold to the same native state, before deciding whether or not to accept the mutation. Simulations of evolution based on the folding kinetics of the protein were carried out for two cases: one where the restrictions on the residues at positions 13 and 16 were maintained throughout the evolutionary runs, and the other where they were relaxed after the initial generation.
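The design rule and the mutation–selection loop described above are sketched in Python below. Several details are assumptions rather than the paper's specification. Each generation proposes exactly one random point mutation, and a rejected mutation leaves the sequence unchanged. `folds_to_native` is a hypothetical callable wrapping the kinetic folding test. Frozen positions use 0-based indices, so residues 13 and 16 become 12 and 15. The harmonic-mean rule assumes both energies are negative.

```python
import random
from typing import Callable, Iterable, List

def harmonic_mean(a: float, b: float) -> float:
    """Harmonic mean 2ab/(a+b); assumes a and b are both negative energies."""
    return 2.0 * a * b / (a + b)

def design_step_accepted(e_ns_old: float, e_gs_old: float,
                         e_ns_new: float, e_gs_new: float) -> bool:
    """Hill-climbing rule: accept if the harmonic mean of E_NS and E_GS decreases
    while the target ground state stays below the target native state."""
    return (harmonic_mean(e_ns_new, e_gs_new) < harmonic_mean(e_ns_old, e_gs_old)
            and e_gs_new < e_ns_new)

def point_mutant(seq: str, alphabet: str, frozen: Iterable[int], rng) -> str:
    """Replace one randomly chosen, non-frozen residue with a different residue."""
    frozen = set(frozen)
    pos = rng.choice([i for i in range(len(seq)) if i not in frozen])
    new_res = rng.choice([a for a in alphabet if a != seq[pos]])
    return seq[:pos] + new_res + seq[pos + 1:]

def evolve(seq: str, alphabet: str, folds_to_native: Callable[[str], bool],
           generations: int, frozen: Iterable[int] = (), rng=random) -> List[str]:
    """Keep a mutant only if it still folds to the original native state;
    return the sequence at every generation (history[0] is the start)."""
    history = [seq]
    for _ in range(generations):
        mutant = point_mutant(seq, alphabet, frozen, rng)
        if folds_to_native(mutant):
            seq = mutant
        history.append(seq)
    return history
```

For the restricted runs, the call would be `evolve(seq, "ACDEFGHIKLMNPQRSTVWY", folds_to_native, 250, frozen=(12, 15))`. Here 250 = 100 pre-equilibration + 150 generations, and `folds_to_native` is whatever wraps the criterion above.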


Five different evolutionary runs were carried out for each of the three initial sequences, with all residues allowed to mutate; each run consisted of 150 generations preceded by a 100-generation pre-equilibration. Fig. 2 shows the energies of the native state, the nonnative state with the lowest energy, and the initial ground state conformation in three of the evolutionary runs. The generations in which the nonnative state with the lowest energy lies below the native state correspond to violations of the thermodynamic hypothesis. After the pre-equilibration, the average percentage of time when the hypothesis held was 93.4% for sequence 1, 92.1% for sequence 2, and 87.0% for sequence 3. Fig. 2a represents a typical result, where the hypothesis holds about 90% of the time. Figs. 2b and 2c represent extreme cases, the former showing no violations of the thermodynamic hypothesis after pre-equilibration, and the latter violating this hypothesis for approximately one-third of the simulation. When the thermodynamic hypothesis failed, the nonnative ground state was in general not similar to the native state, with a q value between these states of 0.43, comparable to the average q value of 0.38 between any two random semicompact structures having at least 7 of a possible 9 interresidue contacts.
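The "percentage of time the hypothesis holds" is a simple tally over post-equilibration generations. Here is a Python sketch, assuming per-generation lists of the native-state energy and the lowest nonnative energy. Counting a tie as a violation is our assumption.

```python
from typing import Sequence

def fraction_hypothesis_holds(e_native: Sequence[float],
                              e_lowest_nonnative: Sequence[float],
                              burn_in: int = 100) -> float:
    """Fraction of generations after `burn_in` in which the native state is
    strictly lower in energy than every nonnative state (ties count as violations)."""
    pairs = list(zip(e_native, e_lowest_nonnative))[burn_in:]
    if not pairs:
        raise ValueError("no generations after burn-in")
    return sum(en < eo for en, eo in pairs) / len(pairs)
```

Averaging this quantity over the five runs of a sequence corresponds to the per-sequence percentages quoted above.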

Figure 2

The energies of the native state (solid line), nonnative state of lowest energy (dotted line), and initial ground state (dashed line) for various generations for three different runs. (a) Representative plot for a typical simulation. (b and c) The extreme cases for which the hypothesis is least (b) and most (c) violated.

Our criterion for successful folding was quite generous. As we would expect, a more stringent criterion decreased the fraction of the time when the thermodynamic hypothesis was violated. Defining successful folding as reaching a native-like state 70% of the time for 7/10 of the folding simulations increased the percentage of time when the hypothesis was fulfilled to 94%.

Another set of five runs, three with sequence 1 and one each with sequences 2 and 3, were carried out where the designed unfavorable contact between residues 13 and 16 in the initial sequences was maintained throughout the run. In this manner, not all of the conformations would be kinetically accessible. The thermodynamic hypothesis held for these runs 87% of the time.

As indicated by these simulations, there is a finite probability for any random structure to have a lower energy than the native structure. This structure will not interfere with the folding process as long as this state is kinetically inaccessible on the time-scale of folding. Because of the astronomically large number of conformations available to a protein, it is possible that a significant fraction of the conformation space is not kinetically accessible. There is therefore some probability that a conformation in this fraction will be lower in energy than the native state, and the thermodynamic hypothesis will be violated. To explore this issue in more detail we used the Random Energy Model (24) and estimated the probability that the thermodynamic hypothesis is violated by considering the probability of finding a state with a lower energy than that of the native state among the kinetically inaccessible conformations. In this model we assume the simplest possible criterion for foldability: that the native state is sufficiently stable with respect to all of the other accessible conformations. While this criterion can be justified on the basis of ideas borrowed from the physics of spin glasses (13, 14, 25) and has been supported by lattice simulations (11, 12), it is significantly simpler than the more sophisticated models that have been developed by other researchers (19–21, 26, 27).

The total number of conformational states for a protein of length N is given by $e^{sN}$, where s is the effective entropy per residue. Not all of these states can be explored during the folding time-scale, and some may be inaccessible because of the presence of kinetic barriers. Let us assume that a fraction ρ of all conformations are kinetically accessible. The number of states kinetically accessible to the protein chain is then $\rho e^{sN}$. Following the random energy model, the probability density of unfolded states with energy E, n(E), was represented as a Gaussian centered at E = 0

$$n(E) = \frac{1}{\sqrt{2\pi\Gamma^2}}\exp\!\left(-\frac{E^2}{2\Gamma^2}\right) \qquad (3)$$

so that the number of accessible states with energies between E and E + dE is $\rho e^{sN} n(E)\,dE$. Here Γ is the width of the distribution of energies of the various conformational states, which is related to the degree of ruggedness in the protein energy landscape (13). This model explicitly neglects any correlations between the energy levels of different conformations, including conformations that share structural similarities, an issue that has been explored by other investigators (21).

Assuming that the accessible states are at equilibrium, the probability that the protein is in the native state at any time, $P_{NS}$, is given by the Boltzmann expression

$$P_{NS} = \frac{e^{-E_{NS}/T}}{e^{-E_{NS}/T} + \rho e^{sN}\int_{-\infty}^{\infty} n(E)\,e^{-E/T}\,dE} \qquad (4)$$

in which we have neglected the possibility of nonnativelike outliers in n(E). As such outliers would greatly reduce $P_{NS}$, they can reasonably be expected to be absent among the kinetically accessible conformations in natural proteins. If we have an estimate of $P_{NS}$ and Γ, we can use this equation to solve for $E_{NS}$ as a function of ρ for any given N.
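If n(E) is read as the normalized Gaussian of Eq. 3 (an assumption of this derivation), the integral in Eq. 4 is a Gaussian moment-generating function. Eq. 4 can then be solved for the native-state energy in closed form:

$$
\begin{aligned}
\int_{-\infty}^{\infty} \frac{e^{-E^2/2\Gamma^2}}{\sqrt{2\pi\Gamma^2}}\,e^{-E/T}\,dE &= e^{\Gamma^2/2T^2}
\;\;\Longrightarrow\;\;
P_{NS} = \frac{1}{1 + \rho e^{sN}\exp\!\left(\dfrac{E_{NS}}{T} + \dfrac{\Gamma^2}{2T^2}\right)} \\[4pt]
\Longrightarrow\;\; E_{NS} &= T\ln\frac{1 - P_{NS}}{P_{NS}} \;-\; T\ln\rho \;-\; sNT \;-\; \frac{\Gamma^2}{2T}
\end{aligned}
$$

At fixed $P_{NS}$ and ρ, the $-sNT$ term makes $E_{NS}$ more negative as N grows. This is the "further decrease in ENS" with chain length that is discussed in connection with Fig. 3.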

Because the native state is the state of lowest energy among the accessible conformations, we can estimate the probability that some inaccessible state has an energy below $E_{NS}$, meaning that the thermodynamic hypothesis is violated. Let us assume that the distribution of energies of the inaccessible states depends mostly on the overall composition of the protein, and thus is not very different from the distribution of energies of the nonnative accessible states represented in Eq. 3. $P(\exists E < E_{NS})$, the probability of having a state with energy less than $E_{NS}$ among the $(1 - \rho)e^{sN}$ inaccessible states, is then equal to one minus the probability that all of the inaccessible states have energy higher than $E_{NS}$:

$$P(\exists E < E_{NS}) = 1 - \left[1 - \int_{-\infty}^{E_{NS}} n(E)\,dE\right]^{(1-\rho)e^{sN}} \qquad (5)$$

On the basis of hydrogen-exchange experiments, the ΔG for the global unfolding process for cytochrome c has been estimated to be −13.0 kcal/mol at 30°C (28). This provides an estimate of $P_{NS}$ and hence, through Eq. 4, of $E_{NS}$ as a function of ρ. Wolynes and co-workers have estimated the effective conformational entropy s to be approximately 0.6 per monomer unit (29). They also estimated $\Gamma^2/T_f^2$ to range between 22 and 36, consistent with values obtained from other experiments (29). Using these numbers, we calculated the probability that the thermodynamic hypothesis is violated for a 120-residue protein as a function of ρ, for different values of $\Gamma^2/T_f^2$, as shown in Fig. 3a. Fig. 3b shows the probability of violation as a function of ρ for various protein lengths, for $\Gamma^2/T_f^2$ equal to 36.
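The calculation can be reproduced approximately with the Python sketch below. It works in dimensionless units, with energies divided by T and k_B = 1. It also makes two simplifying assumptions that the text does not state. First, the folding temperature T_f is used for T, so Γ/T = √36 = 6. Second, the magnitude of the quoted ΔG sets the native-state odds, ln[(1 − P_NS)/P_NS] = −|ΔG|/RT. Eq. 5 is evaluated with `log1p`/`expm1`, because the per-state probability is tiny and the exponent $(1-\rho)e^{sN}$ is huge.

```python
import math

def native_energy_over_t(ln_odds_unfolded: float, rho: float, s: float,
                         n_res: int, gamma_over_t: float) -> float:
    """E_NS / T from Eq. 4 with a normalized Gaussian n(E):
    E_NS/T = ln[(1 - P_NS)/P_NS] - ln(rho) - s*N - (Gamma/T)^2 / 2.
    Requires 0 < rho <= 1."""
    return ln_odds_unfolded - math.log(rho) - s * n_res - 0.5 * gamma_over_t ** 2

def violation_probability(ln_odds_unfolded: float, rho: float, s: float,
                          n_res: int, gamma_over_t: float) -> float:
    """Eq. 5: 1 - [1 - Phi(E_NS / Gamma)]^((1 - rho) e^{sN})."""
    z = native_energy_over_t(ln_odds_unfolded, rho, s, n_res, gamma_over_t) / gamma_over_t
    p_below = 0.5 * math.erfc(-z / math.sqrt(2.0))      # Gaussian CDF at E_NS
    n_inaccessible = (1.0 - rho) * math.exp(s * n_res)  # overflows for s*N > ~709
    if n_inaccessible == 0.0:
        return 0.0
    if p_below >= 1.0:
        return 1.0
    # 1 - (1 - p)^M computed without catastrophic cancellation
    return -math.expm1(n_inaccessible * math.log1p(-p_below))

if __name__ == "__main__":
    R = 1.987e-3                         # gas constant, kcal/(mol K)
    ln_odds = -13.0 / (R * 303.15)       # assumes |dG| = 13 kcal/mol at 30 C
    for k in range(0, 41, 5):
        rho = 10.0 ** -k
        p = violation_probability(ln_odds, rho, 0.6, 120, 6.0)
        print(f"rho = 1e-{k}: P(violation) = {p:.3g}")
```

Because of these assumptions, the numbers should be read as a qualitative check on the trend in Fig. 3 rather than as a reproduction of it. As ρ shrinks, $E_{NS}$ rises toward the bulk of the distribution while the number of inaccessible states stays near $e^{sN}$, and the violation probability climbs steeply.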

Figure 3

Logarithm of the probability of violation of the hypothesis, $P(\exists E < E_{NS})$, as a function of ρ. (a) For various values of $\Gamma^2/T_f^2$, the ruggedness of the protein energy landscape, for a protein 120 residues long. (b) For various N, the length of the protein chain, for $\Gamma^2/T_f^2 = 36$.

As can be seen in these plots, the probability that the thermodynamic hypothesis is satisfied is quite large as long as more than a minuscule fraction of the conformational landscape is accessible. It might be expected that the longer the protein chain, the greater the possibility of finding an alternate state lower in energy than the native state. Contrary to this expectation, increasing the length of the protein at constant ρ increases the number of accessible states, requiring a further decrease in $E_{NS}$ to maintain the population of the native state and thereby decreasing the possibility of violation of the thermodynamic hypothesis. This effect might be decreased or reversed if ρ increases with increasing N.


The thermodynamic hypothesis is a statement concerning the nature of the native state: that the native state represents the ground state of lowest free energy. This hypothesis does not necessarily imply thermodynamic control of the folding process, where folding occurs to the native state because it is the conformation of lowest free energy. In contrast, the assumption of kinetic control is a statement about how the native state is determined by the folding process. If the thermodynamic hypothesis is violated, then the folding must be under kinetic control. The converse is not necessarily true: it is possible, as illustrated here, for the folding to be under kinetic control and for the thermodynamic hypothesis to be satisfied. As a consequence, there is no conflict between kinetic control and the thermodynamic hypothesis, and demonstrations of kinetic control do not necessarily demonstrate that the thermodynamic hypothesis is wrong.

There is a tendency to expect that the results of evolution necessarily represent adaptation. In actuality, evolution can result in many modifications that themselves do not give comparative advantage, as is the situation in our model. Even if folding is under kinetic control and there is no evolutionary advantage to a protein satisfying the thermodynamic hypothesis, the process of random mutation may still result in the native state becoming the state of lowest energy. Levinthal may have suggested a way in which Anfinsen’s conclusions can be justified.

The current model focuses on the selective pressure acting on the protein to form a stable structure, which is only one of the factors necessary for the protein to fulfill its specific functional role. Other investigators have studied the complementary problem of modeling protein evolution given strong constraints on preserving functionality (30). In particular, there may be cases where there is an evolutionary advantage to not fulfilling the thermodynamic hypothesis, or where metastability is a consequence of some functional need. The instability of the active conformation of plasminogen activator inhibitor 1 (PAI-1), for instance, is believed to confer a selective advantage (6). In this case, our model would not be applicable. Similarly, this model would not hold for proteins that have been modified in the laboratory and are thus not the product of natural evolution (8, 9).

The thermodynamic hypothesis has been the basis behind many approaches to predict protein structure, by looking for the conformation of lowest free energy. Some methods, such as genetic and threading algorithms and landscape smoothing, employ search strategies not available to the protein in its own search (31–35). Our results suggest that these methods may be appropriate, even if the native state of the protein is determined by kinetic considerations.


We thank Kurt Hillig for computational assistance and Nicolas Buchler for helpful discussions. Financial support was provided by the College of Literature, Science, and the Arts, the Horace H. Rackham School of Graduate Studies, National Institutes of Health Grant LM05770, and National Science Foundation Equipment Grant BIR9512955.


  • ‡ To whom reprint requests should be addressed. e-mail: richardg{at}

  • This paper was submitted directly (Track II) to the Proceedings Office.

  • Received October 20, 1997.
  • Copyright © 1998, The National Academy of Sciences


A hypothesis is a tentative statement about the relationship between two or more variables. It is a specific, testable prediction about what you expect to happen in a study. For example, a study designed to look at the relationship between sleep deprivation and test performance might have a hypothesis that states, "This study is designed to assess the hypothesis that sleep-deprived people will perform worse on a test than individuals who are not sleep deprived."

Let's take a closer look at how a hypothesis is used, formed, and tested in scientific research.

How Is a Hypothesis Used in the Scientific Method?

In the scientific method, whether it involves research in psychology, biology, or some other area, a hypothesis represents what the researchers think will happen in an experiment.

The scientific method involves the following steps:

  1. Forming a question
  2. Performing background research
  3. Creating a hypothesis
  4. Designing an experiment
  5. Collecting data
  6. Analyzing the results
  7. Drawing conclusions
  8. Communicating the results

The hypothesis is the researchers' prediction about the relationship between two or more variables, but it involves more than a guess. Most of the time, the hypothesis begins with a question, which is then explored through background research. Only at this point do researchers begin to develop a testable hypothesis.

In a study exploring the effects of a particular drug, the hypothesis might be that researchers expect the drug to have some type of effect on the symptoms of a specific illness.

In psychology, the hypothesis might focus on how a certain aspect of the environment might influence a particular behavior.

Unless you are creating a study that is exploratory in nature, your hypothesis should always explain what you expect to happen during the course of your experiment or research.

Remember, a hypothesis does not have to be correct. While the hypothesis predicts what the researchers expect to see, the goal of the research is to determine whether this guess is right or wrong. When conducting an experiment, researchers might explore a number of factors to determine which ones might contribute to the ultimate outcome.

In many cases, researchers may find that the results of an experiment do not support the original hypothesis. When writing up these results, the researchers might suggest other options that should be explored in future studies.

How Do Researchers Come Up With a Hypothesis?

There are many ways to come up with a hypothesis. In many cases, researchers might draw a hypothesis from a specific theory or build on previous research.

For example, prior research has shown that stress can affect the immune system. So a researcher might form a specific hypothesis such as: "People with high stress levels will be more likely to contract a common cold after being exposed to the virus than are people who have low stress levels."

In other instances, researchers might look at commonly held beliefs or folk wisdom.

"Birds of a feather flock together" is one example of folk wisdom that a psychologist might try to investigate.

The researcher might pose a specific hypothesis that "People tend to select romantic partners who are similar to them in interests and educational level."

Elements of a Good Hypothesis

When trying to come up with a good hypothesis for your own research or experiments, ask yourself the following questions:

  • Is your hypothesis based on your research of a topic?
  • Can your hypothesis be tested?
  • Does your hypothesis include independent and dependent variables?

Before you come up with a specific hypothesis, spend some time doing background research on your topic. Once you have completed a literature review, start thinking about potential questions you still have.

Pay attention to the discussion section in the journal articles you read. Many authors will suggest questions that still need to be explored.

How to Form a Hypothesis

The first step of a psychological investigation is to identify an area of interest and develop a hypothesis that can then be tested. While a hypothesis is often described as a hunch or a guess, it is actually much more specific. A hypothesis can be defined as an educated guess about the relationship between two or more variables.

For example, a researcher might be interested in the relationship between study habits and test anxiety.

They would then propose a hypothesis about how these two variables are related, such as "test anxiety decreases as a result of effective study habits."

In order to form a hypothesis, you should:

  • Start by collecting as many observations about something as you can
  • Next, it is important to evaluate these observations and look for possible causes of the problem
  • Create a list of possible explanations that you might want to explore
  • After you have developed some possible hypotheses, it is important to think of ways that you could confirm or disprove each hypothesis through experimentation.

In the scientific method, falsifiability is an important part of any valid hypothesis. This does not mean that the hypothesis is false; instead, it means that if the hypothesis were false, researchers could demonstrate this falsehood.

In order to test a claim scientifically, it must be possible that the claim could also be proven false. One of the hallmarks of a pseudoscience is that it makes claims that cannot be refuted or proven false.

Students sometimes confuse falsifiability with the idea that something is false, which is not the case. Falsifiability means that if something were false, it would be possible to demonstrate that it is false.

The Role of Operational Definitions

In the previous example, study habits and test anxiety are the two variables in this imaginary study. A variable is a factor or element that can be changed and manipulated in ways that are observable and measurable. However, the researcher must also define exactly what each variable is using what is known as operational definitions. These definitions explain how the variable will be manipulated and measured in the study.

In our previous example, a researcher might operationally define the variable "test anxiety" as the results on a self-report measure of anxiety experienced during an exam. The variable "study habits" might be defined by the amount of studying that actually occurs, as measured by time.

Why do psychologists and other researchers need to provide operational definitions for each variable? These precise descriptions of each variable are important because many things can be measured in a number of different ways. One of the basic principles of any type of scientific research is that the results must be replicable. By clearly detailing the specifics of how the variables were measured and manipulated, other researchers can better understand the results and repeat the study if needed.

Some variables are more difficult than others to define.

How would you operationally define a variable such as aggression? For obvious ethical reasons, researchers cannot create a situation in which a person behaves aggressively toward others. In order to measure this variable, the researcher must devise a measurement that assesses aggressive behavior without harming other people. In this situation, the researcher might utilize a simulated task to measure aggressiveness.


A hypothesis often follows a basic format of "If {this happens} then {this will happen}." One way to structure your hypothesis is to describe what will happen to the dependent variable if you make changes to the independent variable.

The basic format might be:

"If {these changes are made to a certain independent variable}, then we will observe {a change in a specific dependent variable}."

A few examples:

  • "Students who eat breakfast will perform better on a math exam than students who do not eat breakfast."
  • "Students who experience test anxiety prior to an English exam will get higher scores than students who do not experience test anxiety."
  • "Motorists who talk on the phone while driving will be more likely to make errors on a driving course than those who do not talk on the phone."

A Hypothesis Checklist

  • Does your hypothesis focus on something that you can actually test?
  • Does your hypothesis include both an independent and dependent variable?
  • Can you manipulate the variables?
  • Can your hypothesis be tested without violating ethical standards?

The Next Step: Collecting Data

Once a researcher has formed a testable hypothesis, the next step is to select a research design and start collecting data. The research method a researcher chooses depends largely on exactly what they are studying.

There are two basic types of research methods—descriptive research and experimental research.

Descriptive Research Methods

Descriptive research methods, such as case studies, naturalistic observations, and surveys, are often used when it would be impossible or difficult to conduct an experiment. These methods are best used to describe different aspects of a behavior or psychological phenomenon. Once a researcher has collected data using descriptive methods, a correlational study can then be used to look at how the variables are related. This type of research method might be used to investigate a hypothesis that is difficult to test experimentally.

Experimental Research Methods

Experimental methods are used to demonstrate causal relationships between variables. In an experiment, the researcher systematically manipulates a variable of interest (known as the independent variable) and measures the effect on another variable (known as the dependent variable). Unlike correlational studies, which can only determine whether there is a relationship between two variables, experimental methods can determine the actual nature of the relationship, that is, whether changes in one variable actually cause another to change.

A Word From Verywell

The hypothesis is a critical part of any scientific exploration. It represents what researchers expect to find in a study or experiment. In some cases, the original hypothesis will be supported and the researchers will find evidence supporting their expectations about the nature of the relationship between different variables. In other situations, the results of the study might fail to support the original hypothesis.

Even in situations where the hypothesis is unsupported by the research, this does not mean that the research is without value. Not only does such research help us better understand how different aspects of the natural world relate to one another, it also helps us develop new hypotheses that can then be tested in future research.


