How are mutations beneficial

Mutation Effects

Is this rat hairless?

Yes. Why? It is the result of a mutation, a change in the DNA sequence. The effects of mutations vary widely, from beneficial, to no effect at all, to lethal, with every possibility in between.

Effects of Mutations

The majority of mutations have neither negative nor positive effects on the organism in which they occur. These mutations are called neutral mutations. Examples include silent point mutations. They are neutral because they do not change the amino acids in the proteins they encode.

Many other mutations have no effect on the organism because they are repaired before protein synthesis occurs. Cells have multiple repair mechanisms to fix mutations in DNA. One way DNA can be repaired is illustrated in the figure below. If a cell’s DNA is permanently damaged and cannot be repaired, the cell is likely to be prevented from dividing.

DNA repair pathway

DNA Repair Pathway. This flow chart shows one way that damaged DNA is repaired in E. coli bacteria.

Beneficial Mutations

Some mutations have a positive effect on the organism in which they occur. They are called beneficial mutations. They lead to new versions of proteins that help organisms adapt to changes in their environment. Beneficial mutations are essential for evolution to occur. They increase an organism’s chances of surviving or reproducing, so they are likely to become more common over time. There are several well-known examples of beneficial mutations. Here are just two:

  1. Mutations in many bacteria that allow them to survive in the presence of antibiotic drugs. The mutations lead to antibiotic-resistant strains of bacteria.
  2. A unique mutation is found in people in a small town in Italy. The mutation protects them from developing atherosclerosis, which is the dangerous buildup of fatty materials in blood vessels. The individual in which the mutation first appeared has even been identified.

Harmful Mutations

Imagine making a random change in a complicated machine such as a car engine. The chance that the random change would improve the functioning of the car is very small. The change is far more likely to result in a car that does not run well or perhaps does not run at all. By the same token, any random change in a gene's DNA is likely to result in a protein that does not function normally or may not function at all. Such mutations are likely to be harmful. Harmful mutations may cause genetic disorders or cancer.

  • A genetic disorder is a disease caused by a mutation in one or a few genes. A human example is cystic fibrosis. A mutation in a single gene causes the body to produce thick, sticky mucus that clogs the lungs and blocks ducts in digestive organs.
  • Cancer is a disease in which cells grow out of control and form abnormal masses of cells. It is generally caused by mutations in genes that regulate the cell cycle. Because of the mutations, cells with damaged DNA are allowed to divide without limits. Cancer genes can be inherited.


  • Mutations are essential for evolution to occur because they increase genetic variation and the potential for individuals to differ.
  • The majority of mutations are neutral in their effects on the organisms in which they occur.
  • Beneficial mutations may become more common through natural selection.
  • Harmful mutations may cause genetic disorders or cancer.

Explore More

Explore More I

Use these resources to answer the questions that follow.

  1. Define genetic disorders.
  2. What are the two primary types of genetic aberrations?
  3. What is a carrier?

Explore More II

  1. What are the results of a mutation or defect in a single gene?
  2. Describe the causes and effects of cystic fibrosis, Huntington's Disease, and hemophilia.

Explore More III

  1. What is a chromosomal disorder?
  2. When and how do chromosomal errors occur?
  3. Describe an inversion and translocation.
  4. Describe the causes of Cri-du-chat Syndrome and Down Syndrome.


  1. Why are mutations essential for evolution to occur?
  2. What is a genetic disorder?
  3. What is cancer? What usually causes cancer?

So, how do mutations occur? The answer to this question is closely linked to the molecular details of how both DNA and the entire genome are organized. The smallest mutations are point mutations, in which only a single base pair is changed into another base pair. Yet another type of mutation is the nonsynonymous mutation, in which an amino acid sequence is changed. Such mutations lead to either the production of a different protein or the premature termination of a protein.

As opposed to nonsynonymous mutations, synonymous mutations do not change an amino acid sequence, although they occur, by definition, only in sequences that code for amino acids. Synonymous mutations exist because many amino acids are encoded by multiple codons. Base pairs can also have diverse regulatory properties if they are located in introns, intergenic regions, or even within the coding sequence of genes. For historical reasons, all of these groups are often subsumed with synonymous mutations under the label "silent" mutations. Depending on their function, such silent mutations can be anything from truly silent to extraordinarily important, the latter implying that working sequences are kept constant by purifying selection. This is the most likely explanation for the existence of ultraconserved noncoding elements that have survived for millions of years without substantial change, as found by comparing the genomes of several vertebrates (Sandelin et al.).
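
The redundancy of the genetic code described above can be made concrete with a short Python sketch. It builds the standard codon table, groups codons by the amino acid they encode, and checks whether a codon change is synonymous. The function names (`codons_by_amino_acid`, `is_synonymous`) are illustrative choices, not part of the original text, and `*` is used here to mark stop codons.

```python
from collections import defaultdict
from itertools import product

# Standard genetic code, written in the conventional U, C, A, G ordering.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def codons_by_amino_acid():
    """Group the 64 codons by the one-letter amino acid (or '*') they encode."""
    groups = defaultdict(list)
    for codon, amino_acid in CODON_TABLE.items():
        groups[amino_acid].append(codon)
    return dict(groups)


def is_synonymous(codon_before, codon_after):
    """True if both codons encode the same amino acid."""
    return CODON_TABLE[codon_before] == CODON_TABLE[codon_after]


groups = codons_by_amino_acid()
print(sorted(groups["L"]))          # all codons that encode leucine
print(is_synonymous("CUA", "CUG"))  # leucine -> leucine
print(is_synonymous("AUG", "AUA"))  # methionine -> isoleucine
```

Leucine is encoded by six codons while methionine has only one, which is why a change in the third position of a leucine codon is often synonymous but any change to AUG is not.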

Mutations may also take the form of insertions or deletions, which are together known as indels. Indels can have a wide variety of lengths. At the short end of the spectrum, indels of one or two base pairs within coding sequences have the greatest effect, because they will inevitably cause a frameshift (only the addition of one or more three-base-pair codons will keep a protein approximately intact). At the intermediate level, indels can affect parts of a gene or whole groups of genes. At the largest level, whole chromosomes or even whole copies of the genome can be affected by insertions or deletions, although such mutations are usually no longer subsumed under the label indel. At this high level, it is also possible to invert or translocate entire sections of a chromosome, and chromosomes can even fuse or break apart. If a large number of genes are lost as a result of one of these processes, then the consequences are usually very harmful. Of course, different genetic systems react differently to such events.
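
The point about one- or two-base indels versus whole-codon indels comes down to whether the indel length is a multiple of three. The following is a minimal Python sketch of that rule; the sequence `original` and the helper names are made up for illustration.

```python
def indel_effect(length_bp):
    """Classify an indel inside a coding sequence by its length in base pairs."""
    if not length_bp > 0:
        raise ValueError("indel length must be positive")
    return "in-frame" if length_bp % 3 == 0 else "frameshift"


def read_codons(seq):
    """Split a sequence into complete codons, dropping any trailing partial codon."""
    usable = len(seq) - len(seq) % 3
    return [seq[i:i + 3] for i in range(0, usable, 3)]


original = "AUGGCUGAAUUC"                     # hypothetical coding sequence
one_deleted = original[:4] + original[5:]     # delete a single base
three_deleted = original[:3] + original[6:]   # delete one whole codon

print(read_codons(original))       # ['AUG', 'GCU', 'GAA', 'UUC']
print(read_codons(one_deleted))    # ['AUG', 'GUG', 'AAU']
print(read_codons(three_deleted))  # ['AUG', 'GAA', 'UUC']
print(indel_effect(1), indel_effect(3))
```

After the single-base deletion every downstream codon is read differently, whereas the three-base deletion removes one codon and leaves the rest of the reading frame intact.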

Finally, still other sources of mutations are the many different types of transposable elements, which are small entities of DNA that possess a mechanism that permits them to move around within the genome. Some of these elements copy and paste themselves into new locations, while others use a cut-and-paste method. Such movements can disrupt existing gene functions (by insertion in the middle of another gene), activate dormant gene functions (by perfect excision from a gene that was switched off by an earlier insertion), or occasionally lead to the production of new genes (by pasting material from different genes together).
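
The two movement styles of transposable elements can be sketched with plain string operations. This is a toy model only: the genome is a short string in which `TE` stands for the element and lowercase letters stand for ordinary sequence, and both function names are hypothetical.

```python
def cut_and_paste(genome, start, end, target):
    """Excise genome[start:end] and reinsert it at index `target` of the remaining sequence."""
    element = genome[start:end]
    remaining = genome[:start] + genome[end:]
    return remaining[:target] + element + remaining[target:]


def copy_and_paste(genome, start, end, target):
    """Leave the element in place and insert a second copy at index `target`."""
    element = genome[start:end]
    return genome[:target] + element + genome[target:]


genome = "aaaaTEgggg"  # toy genome: gene 'aaaa', element 'TE', gene 'gggg'

print(cut_and_paste(genome, 4, 6, 6))   # aaaaggTEgg   (element moved into 'gggg')
print(copy_and_paste(genome, 4, 6, 8))  # aaaaTEggTEgg (original kept, new copy added)
```

In both outputs the element lands in the middle of the `gggg` gene, which is the kind of insertion that can disrupt an existing gene function.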

4 beneficial evolutionary mutations that humans are undergoing right now

Most random genetic changes caused by evolution are neutral, and some are harmful, but a few turn out to be positive improvements. These beneficial mutations are the raw material that may, in time, be taken up by natural selection and spread through the population. In this post, I'll list some examples of beneficial mutations that are known to exist in human beings.

Beneficial mutation #1: Apolipoprotein AI-Milano

Heart disease is one of the scourges of industrialized countries. It's the legacy of an evolutionary past which programmed us to crave energy-dense fats, once a rare and valuable source of calories, now a source of clogged arteries. But there's evidence that evolution has the potential to deal with it.

All humans have a gene for a protein called Apolipoprotein AI, which is part of the system that transports cholesterol through the bloodstream. Apo-AI is one of the HDLs, already known to be beneficial because they remove cholesterol from artery walls. But a small community in Italy is known to have a mutant version of this protein, named Apolipoprotein AI-Milano, or Apo-AIM for short. Apo-AIM is even more effective than Apo-AI at removing cholesterol from cells and dissolving arterial plaques, and additionally functions as an antioxidant, preventing some of the damage from inflammation that normally occurs in arteriosclerosis. People with the Apo-AIM gene have significantly lower levels of risk than the general population for heart attack and stroke, and pharmaceutical companies are looking into marketing an artificial version of the protein as a cardioprotective drug.

There are also drugs in the pipeline based on a different mutation, in a gene called PCSK9, which has a similar effect. People with this mutation have as much as an 88% lower risk of heart disease.

Beneficial mutation #2: Increased bone density

One of the genes that governs bone density in human beings is called low-density lipoprotein receptor-related protein 5, or LRP5 for short. Mutations which impair the function of LRP5 are known to cause osteoporosis. But a different kind of mutation can amplify its function, causing one of the most unusual human mutations known.

This mutation was first discovered fortuitously, when a young person from a Midwest family was in a serious car crash from which they walked away with no broken bones. X-rays found that they, as well as other members of the same family, had bones significantly stronger and denser than average. (One doctor who's studied the condition said, "None of those people, ranging in age from 3 to 93, had ever had a broken bone.") In fact, they seem resistant not just to injury, but to normal age-related skeletal degeneration. Some of them have benign bony growths on the roof of their mouths, but other than that, the condition has no side effects, although, as the article notes dryly, it does make it more difficult to float. As with Apo-AIM, some drug companies are researching how to use this as the basis for a therapy that could help people with osteoporosis and other skeletal diseases.

Beneficial mutation #3: Malaria resistance

The classic example of evolutionary change in humans is the hemoglobin mutation named HbS that makes red blood cells take on a curved, sickle-like shape. With one copy, it confers resistance to malaria, but with two copies, it causes the illness of sickle-cell anemia. This is not about that mutation.

Italian researchers studying the population of the African country of Burkina Faso found a protective effect associated with a different variant of hemoglobin, named HbC. People with just one copy of this gene are 29% less likely to get malaria, while people with two copies enjoy a 93% reduction in risk. And this gene variant causes, at worst, a mild anemia, nowhere near as debilitating as sickle-cell disease.

Beneficial mutation #4: Tetrachromatic vision

Most mammals have poor color vision because they have only two kinds of cones, the retinal cells that discriminate different colors of light. Humans, like other primates, have three kinds, the legacy of a past where good color vision for finding ripe, brightly colored fruit was a survival advantage.

The gene for one kind of cone, which responds most strongly to blue, is found on chromosome 7. The two other kinds, which are sensitive to red and green, are both on the X chromosome. Since men have only one X, a mutation which disables either the red or the green gene will produce red-green colorblindness, while women have a backup copy. This explains why this is almost exclusively a male condition.

But here's a question: What happens if a mutation to the red or the green gene, rather than disabling it, shifts the range of colors to which it responds? (The red and green genes arose in just this way, from duplication and divergence of a single ancestral cone gene.)

To a man, this would make no real difference. He'd still have three color receptors, just a different set than the rest of us. But if this happened to one of a woman's cone genes, she'd have the blue gene on chromosome 7, the usual red and green genes on one X chromosome, and a mutated variant on the other X, which means she'd have four different color receptors. She would be, like birds and turtles, a natural "tetrachromat", theoretically capable of discriminating shades of color the rest of us can't tell apart. (Does this mean she'd see brand-new colors the rest of us could never experience? That's an open question.)

And we have evidence that just this has happened on rare occasions. In one study of color discrimination, at least one woman showed exactly the results we would expect from a true tetrachromat.






A tulip flower exhibiting a partially yellow petal due to a mutation in its genes

In biology, a mutation is an alteration in the nucleotide sequence of the genome of an organism, virus, or extrachromosomal DNA.[1] Viral genomes contain either DNA or RNA. Mutations result from errors during DNA or viral replication, mitosis, or meiosis or other types of damage to DNA (such as pyrimidine dimers caused by exposure to ultraviolet radiation), which then may undergo error-prone repair (especially microhomology-mediated end joining[2]), cause an error during other forms of repair,[3][4] or cause an error during replication (translesion synthesis). Mutations may also result from insertion or deletion of segments of DNA due to mobile genetic elements.[5][6][7]

Mutations may or may not produce detectable changes in the observable characteristics (phenotype) of an organism. Mutations play a part in both normal and abnormal biological processes including: evolution, cancer, and the development of the immune system, including junctional diversity. Mutation is the ultimate source of all genetic variation, providing the raw material on which evolutionary forces such as natural selection can act.

Mutation can result in many different types of change in sequences. Mutations in genes can have no effect, alter the product of a gene, or prevent the gene from functioning properly or completely. Mutations can also occur in nongenic regions. A study on genetic variations between different species of Drosophila suggested that, if a mutation changes a protein produced by a gene, the result is likely to be harmful, with an estimated 70% of amino acid polymorphisms having damaging effects, and the remainder being either neutral or marginally beneficial.[8] Due to the damaging effects that mutations can have on genes, organisms have mechanisms such as DNA repair to prevent or correct mutations by reverting the mutated sequence back to its original state.[5]


Mutations can involve the duplication of large sections of DNA, usually through genetic recombination.[9] These duplications are a major source of raw material for evolving new genes, with tens to hundreds of genes duplicated in animal genomes every million years.[10] Most genes belong to larger gene families of shared ancestry, detectable by their sequence homology.[11] Novel genes are produced by several methods, commonly through the duplication and mutation of an ancestral gene, or by recombining parts of different genes to form new combinations with new functions.[12][13]

Here, protein domains act as modules, each with a particular and independent function, that can be mixed together to produce genes encoding new proteins with novel properties.[14] For example, the human eye uses four genes to make structures that sense light: three for cone cell or color vision and one for rod cell or night vision; all four arose from a single ancestral gene.[15] Another advantage of duplicating a gene (or even an entire genome) is that this increases engineering redundancy; this allows one gene in the pair to acquire a new function while the other copy performs the original function.[16][17] Other types of mutation occasionally create new genes from previously noncoding DNA.[18][19]

Changes in chromosome number may involve even larger mutations, where segments of the DNA within chromosomes break and then rearrange. For example, in the Homininae, two chromosomes fused to produce human chromosome 2; this fusion did not occur in the lineage of the other apes, and they retain these separate chromosomes.[20] In evolution, the most important role of such chromosomal rearrangements may be to accelerate the divergence of a population into new species by making populations less likely to interbreed, thereby preserving genetic differences between these populations.[21]

Sequences of DNA that can move about the genome, such as transposons, make up a major fraction of the genetic material of plants and animals, and may have been important in the evolution of genomes.[22] For example, more than a million copies of the Alu sequence are present in the human genome, and these sequences have now been recruited to perform functions such as regulating gene expression.[23] Another effect of these mobile DNA sequences is that when they move within a genome, they can mutate or delete existing genes and thereby produce genetic diversity.[6]

Nonlethal mutations accumulate within the gene pool and increase the amount of genetic variation.[24] The abundance of some genetic changes within the gene pool can be reduced by natural selection, while other "more favorable" mutations may accumulate and result in adaptive changes.

For example, a butterfly may produce offspring with new mutations. The majority of these mutations will have no effect; but one might change the color of one of the butterfly's offspring, making it harder (or easier) for predators to see. If this color change is advantageous, the chances of this butterfly's surviving and producing its own offspring are a little better, and over time the number of butterflies with this mutation may form a larger percentage of the population.
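
The butterfly example can be illustrated with a very small selection model. This is a sketch under stated assumptions, not something taken from the text: it assumes a haploid population with discrete generations, and the selection coefficient `s` (the relative survival advantage of the new color) and the starting frequency are hypothetical values.

```python
def next_frequency(p, s):
    """One generation of selection: carriers have relative fitness 1 + s, others 1."""
    mutant_fitness = 1.0 + s
    mean_fitness = p * mutant_fitness + (1.0 - p)
    return p * mutant_fitness / mean_fitness


def simulate_selection(p0, s, generations):
    """Return the mutation's frequency in each generation, starting from p0."""
    freqs = [p0]
    for _ in range(generations):
        freqs.append(next_frequency(freqs[-1], s))
    return freqs


# Hypothetical numbers: the mutation starts in 1% of butterflies
# and gives a 5% advantage per generation.
trajectory = simulate_selection(p0=0.01, s=0.05, generations=200)
for generation in (0, 50, 100, 150, 200):
    print(generation, round(trajectory[generation], 3))
```

Even a small advantage compounds: under these assumptions the mutation goes from rare to nearly universal within a couple of hundred generations, while with `s = 0` the frequency would not change at all in this deterministic model.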

Neutral mutations are defined as mutations whose effects do not influence the fitness of an individual. These can increase in frequency over time due to genetic drift. It is believed that the overwhelming majority of mutations have no significant effect on an organism's fitness.[25][26] Also, DNA repair mechanisms are able to mend most changes before they become permanent mutations, and many organisms have mechanisms for eliminating otherwise-permanently mutated somatic cells.
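
Genetic drift of a neutral mutation can be sketched with a simple Wright-Fisher style simulation, in which each generation is a random sample drawn from the previous one. The model, the population size, and the seed are illustrative assumptions and do not come from the text.

```python
import random


def wright_fisher(pop_size, p0, generations, seed=0):
    """Simulate the frequency of a neutral mutation under random sampling alone."""
    rng = random.Random(seed)  # seeded so that a run can be repeated exactly
    p = p0
    freqs = [p]
    for _ in range(generations):
        # Each individual in the next generation inherits the mutation with probability p.
        carriers = sum(1 for _ in range(pop_size) if p > rng.random())
        p = carriers / pop_size
        freqs.append(p)
    return freqs


run = wright_fisher(pop_size=100, p0=0.1, generations=50, seed=1)
print([round(f, 2) for f in run[::10]])
```

Because no fitness difference is involved, the frequency wanders up or down by chance; different seeds give different trajectories, and a neutral mutation can be lost or can spread to the whole population without any selection acting on it.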

Beneficial mutations can improve reproductive success.[27][28]



Four classes of mutations are (1) spontaneous mutations (molecular decay), (2) mutations due to error-prone replication bypass of naturally occurring DNA damage (also called error-prone translesion synthesis), (3) errors introduced during DNA repair, and (4) induced mutations caused by mutagens. Scientists may also deliberately introduce mutant sequences through DNA manipulation for the sake of scientific experimentation.

One study claimed that 66% of cancer-causing mutations are random, 29% are due to the environment (the studied population spanned 69 countries), and 5% are inherited.[29]

Humans on average pass 60 new mutations to their children, but fathers pass more mutations depending on their age, with every additional year of paternal age adding two new mutations to a child.[30]

Spontaneous mutation

Spontaneous mutations occur with non-zero probability even given a healthy, uncontaminated cell. Naturally occurring oxidative DNA damage is estimated to occur many thousands of times per cell per day in humans, and more frequently still in rats.[31] Spontaneous mutations can be characterized by the specific change involved.[32]

Error-prone replication bypass

There is increasing evidence that the majority of spontaneously arising mutations are due to error-prone replication (translesion synthesis) past DNA damage in the template strand. In mice, the majority of mutations are caused by translesion synthesis.[34] Likewise, in yeast, Kunz et al.[35] found that more than 60% of the spontaneous single base pair substitutions and deletions were caused by translesion synthesis.

Errors introduced during DNA repair

Although naturally occurring double-strand breaks occur at a relatively low frequency in DNA, their repair often causes mutation. Non-homologous end joining (NHEJ) is a major pathway for repairing double-strand breaks. NHEJ involves removal of a few nucleotides to allow somewhat inaccurate alignment of the two ends for rejoining followed by addition of nucleotides to fill in gaps. As a consequence, NHEJ often introduces mutations.[36]

Induced mutation

Induced mutations are alterations in the gene after it has come in contact with mutagens and environmental causes.

Induced mutations on the molecular level can be caused by:

  • Chemicals
    • Hydroxylamine
    • Base analogs (e.g., Bromodeoxyuridine (BrdU))
    • Alkylating agents (e.g., N-ethyl-N-nitrosourea (ENU)). These agents can mutate both replicating and non-replicating DNA. In contrast, a base analog can mutate the DNA only when the analog is incorporated during DNA replication. Each of these classes of chemical mutagens has certain effects that then lead to transitions, transversions, or deletions.
    • Agents that form DNA adducts (e.g., ochratoxin A)[38]
    • DNA intercalating agents (e.g., ethidium bromide)
    • DNA crosslinkers
    • Oxidative damage
    • Nitrous acid converts amine groups on A and C to diazo groups, altering their hydrogen bonding patterns, which leads to incorrect base pairing during replication.
  • Radiation

Whereas in former times mutations were assumed to occur by chance, or induced by mutagens, molecular mechanisms of mutation have been discovered in bacteria and across the tree of life. As S. Rosenberg states, "These mechanisms reveal a picture of highly regulated mutagenesis, up-regulated temporally by stress responses and activated when cells/organisms are maladapted to their environments—when stressed—potentially accelerating adaptation."[40] Since they are self-induced mutagenic mechanisms that increase the adaptation rate of organisms, they have some times been named as adaptive mutagenesis mechanisms, and include the SOS response in bacteria,[41] ectopic intrachromosomal recombination[42] and other chromosomal events such as duplications.[40]

Classification of types

By effect on structure

Five types of chromosomal mutations

The sequence of a gene can be altered in a number of ways.[44] Gene mutations have varying effects on health depending on where they occur and whether they alter the function of essential proteins. Mutations in the structure of genes can be classified into several types.

Large-scale mutations

Large-scale mutations in chromosomal structure include:

  • Amplifications (or gene duplications): repetition of a chromosomal segment, or the presence of an extra piece of a chromosome. A broken piece of a chromosome may become attached to a homologous or non-homologous chromosome, so that some genes are present in more than two doses. The result is multiple copies of the affected chromosomal regions, increasing the dosage of the genes located within them.
  • Deletions of large chromosomal regions, leading to loss of the genes within those regions.
  • Mutations whose effect is to juxtapose previously separate pieces of DNA, potentially bringing together separate genes to form functionally distinct fusion genes (e.g., bcr-abl).
  • Large scale changes to the structure of chromosomes called chromosomal rearrangement that can lead to a decrease of fitness but also to speciation in isolated, inbred populations. These include:
    • Chromosomal translocations: interchange of genetic parts from nonhomologous chromosomes.
    • Chromosomal inversions: reversing the orientation of a chromosomal segment.
    • Non-homologous chromosomal crossover.
    • Interstitial deletions: an intra-chromosomal deletion that removes a segment of DNA from a single chromosome, thereby apposing previously distant genes. For example, cells isolated from a human astrocytoma, a type of brain tumor, were found to have a chromosomal deletion removing sequences between the Fused in Glioblastoma (FIG) gene and the receptor tyrosine kinase (ROS), producing a fusion protein (FIG-ROS). The abnormal FIG-ROS fusion protein has constitutively active kinase activity that causes oncogenic transformation (a transformation from normal cells to cancer cells).
  • Loss of heterozygosity: loss of one allele, either by a deletion or a genetic recombination event, in an organism that previously had two different alleles.
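
The large-scale changes listed above can be sketched as operations on a toy chromosome in which each letter stands for one gene. This is a simplified illustration with hypothetical function names; it tracks gene order only and ignores strand orientation and sequence-level detail.

```python
def delete(chrom, start, end):
    """Remove the genes in chrom[start:end], bringing the flanking genes together."""
    return chrom[:start] + chrom[end:]


def duplicate(chrom, start, end):
    """Insert a second copy of chrom[start:end] directly after the original."""
    return chrom[:end] + chrom[start:end] + chrom[end:]


def invert(chrom, start, end):
    """Reverse the order of the genes in chrom[start:end]."""
    return chrom[:start] + chrom[start:end][::-1] + chrom[end:]


def translocate(chrom_a, chrom_b, break_a, break_b):
    """Exchange the ends of two nonhomologous chromosomes at the given breakpoints."""
    return chrom_a[:break_a] + chrom_b[break_b:], chrom_b[:break_b] + chrom_a[break_a:]


chrom1, chrom2 = "ABCDEF", "UVWXYZ"  # toy chromosomes, one letter per gene

print(delete(chrom1, 2, 4))               # ABEF      (B and E are now adjacent)
print(duplicate(chrom1, 2, 4))            # ABCDCDEF  (extra dose of C and D)
print(invert(chrom1, 2, 4))               # ABDCEF
print(translocate(chrom1, chrom2, 4, 3))  # ('ABCDXYZ', 'UVWEF')
```

The deletion output shows how an interstitial deletion places previously distant genes side by side, which is the situation that can produce a fusion gene.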

Small-scale mutations

Small-scale mutations affect a gene in one or a few nucleotides. (If only a single nucleotide is affected, they are called point mutations.) Small-scale mutations include:

  • Insertions add one or more extra nucleotides into the DNA. They are usually caused by transposable elements, or errors during replication of repeating elements. Insertions in the coding region of a gene may alter splicing of the mRNA (splice site mutation), or cause a shift in the reading frame (frameshift), both of which can significantly alter the gene product. Insertions can be reversed by excision of the transposable element.
  • Deletions remove one or more nucleotides from the DNA. Like insertions, these mutations can alter the reading frame of the gene. In general, they are irreversible: Though exactly the same sequence might, in theory, be restored by an insertion, transposable elements able to revert a very short deletion (say 1–2 bases) in any location either are highly unlikely to exist or do not exist at all.
  • Substitution mutations, often caused by chemicals or malfunction of DNA replication, exchange a single nucleotide for another.[45] These changes are classified as transitions or transversions.[46] Most common is the transition, which exchanges a purine for a purine (A ↔ G) or a pyrimidine for a pyrimidine (C ↔ T). A transition can be caused by nitrous acid, base mispairing, or mutagenic base analogs such as BrdU. Less common is a transversion, which exchanges a purine for a pyrimidine or a pyrimidine for a purine (C/T ↔ A/G). An example of a transversion is the conversion of adenine (A) into cytosine (C). Point mutations are modifications of a single base pair of DNA within a gene. A point mutation can be reversed by another point mutation, in which the nucleotide is changed back to its original state (true reversion), or by second-site reversion (a complementary mutation elsewhere that results in regained gene functionality). As discussed below, point mutations that occur within the protein coding region of a gene may be classified as synonymous or nonsynonymous substitutions, the latter of which in turn can be divided into missense or nonsense mutations.
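
The transition and transversion rule described above is easy to state in code. The following is a minimal Python sketch; the function name `classify_substitution` is an illustrative choice.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_substitution(ref, alt):
    """Label a single-nucleotide DNA substitution as a transition or a transversion."""
    ref, alt = ref.upper(), alt.upper()
    valid = PURINES | PYRIMIDINES
    if ref not in valid or alt not in valid:
        raise ValueError("expected DNA bases A, C, G or T")
    if ref == alt:
        raise ValueError("no substitution: both bases are the same")
    # Purine <-> purine or pyrimidine <-> pyrimidine is a transition.
    same_class = (ref in PURINES) == (alt in PURINES)
    return "transition" if same_class else "transversion"


print(classify_substitution("A", "G"))  # transition
print(classify_substitution("C", "T"))  # transition
print(classify_substitution("A", "C"))  # transversion

# Count the two kinds among all 12 possible single-base changes.
kinds = [classify_substitution(r, a) for r in "ACGT" for a in "ACGT" if r != a]
print(kinds.count("transition"), kinds.count("transversion"))  # 4 8
```

Only 4 of the 12 possible substitutions are transitions, so the fact that transitions are nonetheless the more common kind reflects how mutations arise rather than how many possibilities exist.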

By impact on protein sequence

The effect of a mutation on protein sequence depends in part on where in the genome it occurs, especially whether it is in a coding or non-coding region. Mutations in the non-coding regulatory sequences of a gene, such as promoters, enhancers, and silencers, can alter levels of gene expression, but are less likely to alter the protein sequence. Mutations within introns and in regions with no known biological function (e.g. pseudogenes, retrotransposons) are generally neutral, having no effect on phenotype – though intron mutations could alter the protein product if they affect mRNA splicing.

Mutations that occur in coding regions of the genome are more likely to alter the protein product, and can be categorized by their effect on amino acid sequence:

  • A frameshift mutation is caused by insertion or deletion of a number of nucleotides that is not evenly divisible by three from a DNA sequence. Due to the triplet nature of gene expression by codons, the insertion or deletion can disrupt the reading frame, or the grouping of the codons, resulting in a completely different translation from the original.[47] The earlier in the sequence the deletion or insertion occurs, the more altered the protein produced is. (For example, the code CCU GAC UAC CUA codes for the amino acids proline, aspartic acid, tyrosine, and leucine. If the U in CCU was deleted, the resulting sequence would be CCG ACU ACC UAx, which would instead code for proline, threonine, threonine, and part of another amino acid or perhaps a stop codon (where the x stands for the following nucleotide).) By contrast, any insertion or deletion that is evenly divisible by three is termed an in-frame mutation.
  • A point substitution mutation results in a change in a single nucleotide and can be either synonymous or nonsynonymous.
    • A synonymous substitution replaces a codon with another codon that codes for the same amino acid, so that the produced amino acid sequence is not modified. Synonymous mutations occur due to the degenerate nature of the genetic code. If this mutation does not result in any phenotypic effects, then it is called silent, but not all synonymous substitutions are silent. (There can also be silent mutations in nucleotides outside of the coding regions, such as the introns, because the exact nucleotide sequence is not as crucial as it is in the coding regions, but these are not considered synonymous substitutions.)
    • A nonsynonymous substitution replaces a codon with another codon that codes for a different amino acid, so that the produced amino acid sequence is modified. Nonsynonymous substitutions can be classified as nonsense or missense mutations:
      • A missense mutation changes a nucleotide to cause substitution of a different amino acid. This in turn can render the resulting protein nonfunctional. Such mutations are responsible for diseases such as Epidermolysis bullosa, sickle-cell disease, and SOD1-mediated ALS.[48] On the other hand, if a missense mutation occurs in an amino acid codon that results in the use of a different, but chemically similar, amino acid, then sometimes little or no change is rendered in the protein. For example, a change from AAA to AGA will encode arginine, a chemically similar molecule to the intended lysine. In this latter case the mutation will have little or no effect on phenotype and therefore be neutral.
      • A nonsense mutation is a point mutation in a sequence of DNA that results in a premature stop codon, or a nonsense codon in the transcribed mRNA, and possibly a truncated, and often nonfunctional protein product. This sort of mutation has been linked to different diseases, such as congenital adrenal hyperplasia. (See Stop codon.)
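
The categories above can be checked against the worked examples in the list with a short Python sketch. It uses the standard codon table with one-letter amino acid codes (`*` for stop); the function names are illustrative, and the `stop-loss` label for a stop codon that becomes an amino acid codon is an addition not covered by the list.

```python
from itertools import product

# Standard genetic code, written in the conventional U, C, A, G ordering.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(rna):
    """Translate every complete codon in `rna`; a trailing partial codon is ignored."""
    usable = len(rna) - len(rna) % 3
    return "".join(CODON_TABLE[rna[i:i + 3]] for i in range(0, usable, 3))


def classify_point_substitution(codon_before, codon_after):
    """Classify a single-nucleotide change within one codon."""
    if len(codon_before) != 3 or len(codon_after) != 3:
        raise ValueError("codons must be three nucleotides long")
    if sum(a != b for a, b in zip(codon_before, codon_after)) != 1:
        raise ValueError("expected exactly one changed nucleotide")
    before, after = CODON_TABLE[codon_before], CODON_TABLE[codon_after]
    if before == after:
        return "synonymous"
    if after == "*":
        return "nonsense"
    if before == "*":
        return "stop-loss"  # outside the categories listed in the text
    return "missense"


# Frameshift example from the list: delete the U of the first codon.
print(translate("CCUGACUACCUA"))  # PDYL (Pro, Asp, Tyr, Leu)
print(translate("CCGACUACCUA"))   # PTT  (Pro, Thr, Thr, then a partial codon)

print(classify_point_substitution("AAA", "AGA"))  # missense (Lys -> Arg)
print(classify_point_substitution("UAC", "UAA"))  # nonsense (Tyr -> stop)
print(classify_point_substitution("CCU", "CCG"))  # synonymous (Pro -> Pro)
```

Note that the classifier says nothing about how severe a missense change is; the lysine to arginine case is labelled missense even though, as described above, it may have little or no effect on the protein.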

By effect on function

  • Loss-of-function mutations, also called inactivating mutations, result in the gene product having less or no function (being partially or wholly inactivated). When the allele has a complete loss of function (null allele), it is often called an amorph or amorphic mutation in the Muller's morphs schema. Phenotypes associated with such mutations are most often recessive. Exceptions are when the organism is haploid, or when the reduced dosage of a normal gene product is not enough for a normal phenotype (this is called haploinsufficiency).
  • Gain-of-function mutations, also called activating mutations, change the gene product such that its effect gets stronger (enhanced activation) or is even superseded by a different and abnormal function. When the new allele is created, a heterozygote containing the newly created allele as well as the original will express the new allele; genetically, this defines such mutations as dominant. Several of Muller's morphs correspond to gain of function, including hypermorph (increased gene expression) and neomorph (novel function). The U.S. government has lifted a temporary ban on federal funding for any new "gain-of-function" experiments that enhance pathogens "such as Avian influenza, SARS and the Middle East Respiratory Syndrome or MERS viruses."[49][50]
  • Dominant negative mutations (also called antimorphic mutations) have an altered gene product that acts antagonistically to the wild-type allele. These mutations usually result in an altered molecular function (often inactive) and are characterized by a dominant or semi-dominant phenotype. In humans, dominant negative mutations have been implicated in cancer (e.g., mutations in genes p53,[51] ATM,[52] CEBPA[53] and PPARgamma[54]). Marfan syndrome is caused by mutations in the FBN1 gene, located on chromosome 15, which encodes fibrillin-1, a glycoprotein component of the extracellular matrix.[55] Marfan syndrome is also an example of dominant negative mutation and haploinsufficiency.[56][57]
  • Hypomorphs, after Mullerian classification, are characterized by altered gene products that act with decreased gene expression compared to the wild-type allele. Usually, hypomorphic mutations are recessive, but haploinsufficiency causes some alleles to be dominant.
  • Neomorphs are characterized by the control of new protein product synthesis.
  • Lethal mutations are mutations that lead to the death of the organisms that carry the mutations.
  • A back mutation or reversion is a point mutation that restores the original sequence and hence the original phenotype.[58]

By effect on fitness (harmful, beneficial, neutral mutations)

See also: Fitness (biology)

In genetics, it is sometimes useful to classify mutations as either harmful or beneficial (or neutral):

  • A harmful, or deleterious, mutation decreases the fitness of the organism. Many, but not all, mutations in essential genes are harmful (if a mutation does not change the amino acid sequence in an essential protein, it is harmless in most cases).
  • A beneficial, or advantageous mutation increases the fitness of the organism. Examples are mutations that lead to antibiotic resistance in bacteria (which are beneficial for bacteria but usually not for humans).
  • A neutral mutation has no harmful or beneficial effect on the organism. Such mutations occur at a steady rate, forming the basis for the molecular clock. In the neutral theory of molecular evolution, neutral mutations provide genetic drift as the basis for most variation at the molecular level. In animals or plants, most mutations are neutral, given that the vast majority of their genomes is either non-coding or consists of repetitive sequences that have no obvious function ("junk DNA").[59]
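The molecular-clock idea mentioned above can be illustrated with a short calculation. The sketch below assumes the usual simplification that neutral substitutions accumulate at a constant rate per site per year along both diverging lineages, so the divergence K between two sequences grows as K = 2 × rate × T. The function name and the numbers in the example are hypothetical and are not taken from the studies cited here.

```python
def divergence_time(substitutions_per_site: float, rate_per_site_per_year: float) -> float:
    """Estimate the time since two lineages split under a strict molecular clock.

    Both lineages accumulate substitutions independently, hence the factor of 2.
    """
    if rate_per_site_per_year <= 0:
        raise ValueError("the substitution rate must be positive")
    return substitutions_per_site / (2.0 * rate_per_site_per_year)


# Hypothetical inputs: 2% sequence divergence, 1e-9 substitutions per site per year.
years = divergence_time(0.02, 1e-9)
print(f"Estimated divergence time: {years:.3g} years")
```

In practice the rate varies between lineages and genomic regions, so a constant-rate estimate like this one is only a first approximation.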

Large-scale quantitative mutagenesis screens, in which thousands of millions of mutations are tested, invariably find that a larger fraction of mutations has harmful effects, but they always return a number of beneficial mutations as well. For instance, in a screen of all gene deletions in E. coli, 80% of mutations were negative, but 20% were positive, even though many had a very small effect on growth (depending on condition).[60] Note that gene deletions involve removal of whole genes, so point mutations almost always have a much smaller effect. In a similar screen in Streptococcus pneumoniae, but this time with transposon insertions, 76% of insertion mutants were classified as neutral, 16% had a significantly reduced fitness, but 6% were advantageous.[61]

This classification is obviously relative and somewhat artificial: a harmful mutation can quickly turn into a beneficial mutation when conditions change. For example, the mutations that led to lighter skin in Caucasians are beneficial in regions that are less exposed to sunshine but harmful in regions near the equator. Also, there is a gradient from harmful/beneficial to neutral, as many mutations may have small and mostly negligible effects but under certain conditions will become relevant. Also, many traits are determined by hundreds of genes (or loci), so that each locus has only a minor effect. For instance, human height is determined by hundreds of genetic variants ("mutations"), but each of them has a very minor effect on height,[62] apart from the impact of nutrition. Height (or size) itself may be more or less beneficial, as the huge range of sizes in animal or plant groups shows.

Distribution of fitness effects (DFE)

Attempts have been made to infer the distribution of fitness effects (DFE) using mutagenesis experiments and theoretical models applied to molecular sequence data. DFE, as used to determine the relative abundance of different types of mutations (i.e., strongly deleterious, nearly neutral or advantageous), is relevant to many evolutionary questions, such as the maintenance of genetic variation,[63] the rate of genomic decay,[64] the maintenance of outcrossing sexual reproduction as opposed to inbreeding[65] and the evolution of sex and genetic recombination.[66] DFE can also be tracked by tracking the skewness of the distribution of mutations with putatively severe effects as compared to the distribution of mutations with putatively mild or absent effect.[67] In summary, the DFE plays an important role in predicting evolutionary dynamics.[68][69] A variety of approaches have been used to study the DFE, including theoretical, experimental and analytical methods.

  • Mutagenesis experiment: The direct method to investigate the DFE is to induce mutations and then measure the mutational fitness effects, which has already been done in viruses, bacteria, yeast, and Drosophila. For example, most studies of the DFE in viruses used site-directed mutagenesis to create point mutations and measure relative fitness of each mutant.[70][71][72][73] In Escherichia coli, one study used transposon mutagenesis to directly measure the fitness of a random insertion of a derivative of a Tn transposon.[74] In yeast, a combined mutagenesis and deep sequencing approach has been developed to generate high-quality systematic mutant libraries and measure fitness in high throughput.[75] However, many mutations have effects too small to be detected,[76] and mutagenesis experiments can detect only mutations of moderately large effect; DNA sequence data analysis can provide valuable information about these small-effect mutations.
The distribution of fitness effects (DFE) of mutations in vesicular stomatitis virus. In this experiment, random mutations were introduced into the virus by site-directed mutagenesis, and the fitness of each mutant was compared with the ancestral type. A fitness of zero, less than one, one, and more than one indicates that mutations are lethal, deleterious, neutral, and advantageous, respectively.[70]
  • Molecular sequence analysis: With rapid development of DNA sequencing technology, an enormous amount of DNA sequence data is available and even more is forthcoming in the future. Various methods have been developed to infer the DFE from DNA sequence data.[77][78][79][80] By examining DNA sequence differences within and between species, we are able to infer various characteristics of the DFE for neutral, deleterious and advantageous mutations.[24] To be specific, the DNA sequence analysis approach allows us to estimate the effects of mutations with very small effects, which are hardly detectable through mutagenesis experiments.
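The fitness categories used in the vesicular stomatitis virus figure caption (zero, below one, one, above one) map directly onto a small classifier. The sketch below is only illustrative: the function names are hypothetical, and the `tolerance` parameter is an added assumption, since real fitness measurements carry noise and rarely equal exactly one.

```python
from collections import Counter

CATEGORIES = ("lethal", "deleterious", "neutral", "advantageous")


def classify_fitness(relative_fitness: float, tolerance: float = 0.0) -> str:
    """Label a mutant by its fitness relative to the ancestral type (ancestor = 1)."""
    if relative_fitness < 0:
        raise ValueError("relative fitness cannot be negative")
    if relative_fitness == 0:
        return "lethal"
    if relative_fitness < 1 - tolerance:
        return "deleterious"
    if relative_fitness > 1 + tolerance:
        return "advantageous"
    return "neutral"


def summarize_dfe(fitness_values, tolerance: float = 0.0) -> dict:
    """Return the fraction of mutants in each category: a coarse, discrete DFE."""
    values = list(fitness_values)
    if not values:
        raise ValueError("at least one fitness value is required")
    counts = Counter(classify_fitness(w, tolerance) for w in values)
    return {label: counts[label] / len(values) for label in CATEGORIES}


# Hypothetical measurements, not data from the cited experiment.
print(summarize_dfe([0.0, 0.5, 1.0, 1.0]))
```

A full DFE is a continuous distribution of fitness effects; binning into four classes, as here, only reproduces the coarse summary used in the caption.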

One of the earliest theoretical studies of the distribution of fitness effects was done by Motoo Kimura, an influential theoretical population geneticist. His neutral theory of molecular evolution proposes that most novel mutations will be highly deleterious, with a small fraction being neutral.[81][25] Hiroshi Akashi more recently proposed a bimodal model for the DFE, with modes centered around highly deleterious and neutral mutations.[82] Both theories agree that the vast majority of novel mutations are neutral or deleterious and that advantageous mutations are rare, which has been supported by experimental results. One example is a study done on the DFE of random mutations in vesicular stomatitis virus.[70] In that study, most mutations were lethal, non-lethal deleterious, or neutral. Another example comes from a high-throughput mutagenesis experiment with yeast.[75] In this experiment it was shown that the overall DFE is bimodal, with a cluster of neutral mutations and a broad distribution of deleterious mutations.

Though relatively few mutations are advantageous, those that are play an important role in evolutionary changes.[83] Like neutral mutations, weakly selected advantageous mutations can be lost due to random genetic drift, but strongly selected advantageous mutations are more likely to be fixed. Knowing the DFE of advantageous mutations may lead to increased ability to predict the evolutionary dynamics. Theoretical work on the DFE for advantageous mutations has been done by John H. Gillespie[84] and H. Allen Orr.[85] They proposed that the distribution for advantageous mutations should be exponential under a wide range of conditions, which, in general, has been supported by experimental studies, at least for strongly selected advantageous mutations.[86][87][88]
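To make the contrast between weakly and strongly selected advantageous mutations concrete, the sketch below uses Kimura's diffusion approximation for the fixation probability of a new mutation. This formula is a standard population-genetics result brought in for illustration, not something stated in the text above. It assumes a diploid population of constant size N whose effective size equals N, a single new copy (initial frequency 1/(2N)), and a selection coefficient s ≥ 0; the function name is hypothetical.

```python
import math


def fixation_probability(s: float, population_size: int) -> float:
    """Kimura's approximation: P = (1 - exp(-2s)) / (1 - exp(-4Ns)).

    s is the selective advantage of the new allele, population_size is N.
    """
    if s < 0:
        raise ValueError("this sketch only covers neutral or advantageous mutations")
    if population_size <= 0:
        raise ValueError("population size must be positive")
    if s == 0:
        # Neutral limit: the new allele fixes with probability equal to its
        # initial frequency.
        return 1.0 / (2 * population_size)
    # expm1 keeps precision when s is very small.
    return math.expm1(-2.0 * s) / math.expm1(-4.0 * population_size * s)


for s in (0.0, 0.001, 0.01, 0.1):
    print(f"s = {s}: P(fixation) = {fixation_probability(s, 1000):.4g}")
```

Even a clearly advantageous mutation is lost most of the time under this model, which is the sense in which drift can remove weakly selected beneficial variants.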

In general, it is accepted that the majority of mutations are neutral or deleterious, with advantageous mutations being rare; however, the proportion of types of mutations varies between species. This indicates two important points: first, the proportion of effectively neutral mutations is likely to vary between species, resulting from dependence on effective population size; second, the average effect of deleterious mutations varies dramatically between species.[24] In addition, the DFE also differs between coding regions and noncoding regions, with the DFE of noncoding DNA containing more weakly selected mutations.[24]

By inheritance

A mutation has caused this moss rose plant to produce flowers of different colors. This is a somatic mutation that may also be passed on in the germline.

In multicellular organisms with dedicated reproductive cells, mutations can be subdivided into germline mutations, which can be passed on to descendants through their reproductive cells, and somatic mutations (also called acquired mutations),[89] which involve cells outside the dedicated reproductive group and which are not usually transmitted to descendants.

Diploid organisms (e.g., humans) contain two copies of each gene—a paternal and a maternal allele. Based on the occurrence of mutation on each chromosome, we may classify mutations into three types. A wild type or homozygous non-mutated organism is one in which neither allele is mutated.

  • A heterozygous mutation is a mutation of only one allele.
  • A homozygous mutation is an identical mutation of both the paternal and maternal alleles.
  • A compound heterozygous mutation, or genetic compound, consists of two different mutations in the paternal and maternal alleles.[90]
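The genotype classes above reduce to a comparison of the two alleles against the wild-type sequence. A minimal sketch, assuming each allele is given as a plain sequence string and that any difference from the wild type counts as a mutation (the function name and sequences are hypothetical):

```python
def classify_genotype(wild_type: str, paternal: str, maternal: str) -> str:
    """Classify a diploid genotype at one gene by comparing both alleles to the wild type."""
    paternal_mutated = paternal != wild_type
    maternal_mutated = maternal != wild_type
    if not paternal_mutated and not maternal_mutated:
        return "wild type"
    if paternal_mutated and maternal_mutated:
        # Same change on both alleles versus two different changes.
        return "homozygous" if paternal == maternal else "compound heterozygous"
    return "heterozygous"


# Hypothetical short sequences used only to exercise each branch.
print(classify_genotype("GATTACA", "GATTACA", "GATTACA"))  # wild type
print(classify_genotype("GATTACA", "GTTTACA", "GATTACA"))  # heterozygous
print(classify_genotype("GATTACA", "GTTTACA", "GTTTACA"))  # homozygous
print(classify_genotype("GATTACA", "GTTTACA", "GATTAGA"))  # compound heterozygous
```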

Germline mutation

Further information: Germline mutation

A germline mutation in the reproductive cells of an individual gives rise to a constitutional mutation in the offspring, that is, a mutation that is present in every cell. A constitutional mutation can also occur very soon after fertilisation, or continue from a previous constitutional mutation in a parent.[91] A germline mutation can be passed down through subsequent generations of organisms.

The distinction between germline and somatic mutations is important in animals that have a dedicated germline to produce reproductive cells. However, it is of little value in understanding the effects of mutations in plants, which lack a dedicated germline. The distinction is also blurred in those animals that reproduce asexually through mechanisms such as budding, because the cells that give rise to the daughter organisms also give rise to that organism's germline.

A new germline mutation not inherited from either parent is called a de novo mutation.

Somatic mutation

Main article: Somatic mutation

See also: Carcinogenesis and Loss of heterozygosity

A change in the genetic structure that is not inherited from a parent, and also not passed to offspring, is called a somatic mutation.[89] Somatic mutations are not inherited by an organism's offspring because they do not affect the germline. However, they are passed down to all the progeny of a mutated cell within the same organism during mitosis. A major section of an organism therefore might carry the same mutation. These types of mutations are usually prompted by environmental causes, such as ultraviolet radiation or any exposure to certain harmful chemicals, and can cause diseases including cancer.[92]

With plants, some somatic mutations can be propagated without the need for seed production, for example, by grafting and stem cuttings. These types of mutations have led to new types of fruits, such as the "Delicious" apple and the "Washington" navel orange.[93]

Human and mouse somatic cells have a mutation rate more than ten times higher than the germline mutation rate for both species; mice have a higher rate of both somatic and germline mutations per cell division than humans. The disparity in mutation rate between the germline and somatic tissues likely reflects the greater importance of genome maintenance in the germline than in the soma.[94]

Special classes

  • Conditional mutation is a mutation that has wild-type (or less severe) phenotype under certain "permissive" environmental conditions and a mutant phenotype under certain "restrictive" conditions. For example, a temperature-sensitive mutation can cause cell death at high temperature (restrictive condition), but might have no deleterious consequences at a lower temperature (permissive condition).[95] These mutations are non-autonomous, as their manifestation depends upon the presence of certain conditions, as opposed to other mutations, which appear autonomously.[96] The permissive conditions may be temperature,[97] certain chemicals,[98] light[98] or mutations in other parts of the genome.[96] In vivo mechanisms like transcriptional switches can create conditional mutations. For instance, association of a steroid-binding domain can create a transcriptional switch that can change the expression of a gene based on the presence of a steroid ligand.[99] Conditional mutations have applications in research, as they allow control over gene expression. This is especially useful for studying diseases in adults by allowing expression after a certain period of growth, thus eliminating the deleterious effect of gene expression seen during stages of development in model organisms.[98] DNA recombinase systems like Cre-Lox recombination, used in association with promoters that are activated under certain conditions, can generate conditional mutations. Dual recombinase technology can be used to induce multiple conditional mutations to study diseases that manifest as a result of simultaneous mutations in multiple genes.[98] Certain inteins have been identified that splice only at certain permissive temperatures, leading to improper protein synthesis and thus loss-of-function mutations at other temperatures. Conditional mutations may also be used in genetic studies associated with ageing, as the expression can be changed after a certain time period in the organism's lifespan.[97]
  • Replication timing quantitative trait loci affect DNA replication.


To categorize a mutation as such, the "normal" sequence must first be obtained from the DNA of a "normal" or "healthy" organism (as opposed to a "mutant" or "sick" one), then identified and reported. Ideally, it should be made publicly available for a straightforward nucleotide-by-nucleotide comparison and agreed upon by the scientific community or by a group of expert geneticists and biologists, who have the responsibility of establishing the standard or so-called "consensus" sequence. This step requires a tremendous scientific effort. Once the consensus sequence is known, the mutations in a genome can be pinpointed, described, and classified. The committee of the Human Genome Variation Society (HGVS) has developed the standard human sequence variant nomenclature, which should be used by researchers and DNA diagnostic centers to generate unambiguous mutation descriptions. In principle, this nomenclature can also be used to describe mutations in other organisms. The nomenclature specifies the type of mutation and base or amino acid changes.

  • Nucleotide substitution (e.g., 76A>T) – The number is the position of the nucleotide from the 5' end; the first letter represents the wild-type nucleotide, and the second letter represents the nucleotide that replaced the wild type. In the given example, the adenine at the 76th position was replaced by a thymine.
    • If it becomes necessary to differentiate between mutations in genomic DNA, mitochondrial DNA, and RNA, a simple convention is used: the description is prefixed with g. for genomic DNA, m. for mitochondrial DNA, or r. for RNA. For example, a G-to-C change at a given position is written with G>C after the position number and the g. or m. prefix, and with g>c after the r. prefix. Note that, for mutations in RNA, the nucleotide code is written in lower case.
  • Amino acid substitution (e.g., D followed by the position number and E) – The first letter is the one-letter code of the wild-type amino acid, the number is the position of the amino acid from the N-terminus, and the second letter is the one-letter code of the amino acid present in the mutation. Nonsense mutations are represented with an X for the second amino acid (e.g., an X in place of the E).
  • Amino acid deletion (e.g., ΔF followed by the position number) – The Greek letter Δ (delta) indicates a deletion. The letter refers to the amino acid present in the wild type, and the number is the position from the N-terminus of the amino acid were it to be present as in the wild type.
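As an illustration of how the nucleotide-substitution notation can be read by a program, the sketch below parses the bare form used in the 76A>T example and applies it to a sequence. It is not an HGVS parser: it handles only single-nucleotide DNA substitutions without prefixes, and the class and function names are hypothetical.

```python
import re
from typing import NamedTuple


class Substitution(NamedTuple):
    position: int   # 1-based position counted from the 5' end
    reference: str  # wild-type nucleotide
    variant: str    # nucleotide that replaced it


_PATTERN = re.compile(r"^(\d+)([ACGT])>([ACGT])$")


def parse_substitution(text: str) -> Substitution:
    """Parse a description such as '76A>T' into its three parts."""
    match = _PATTERN.match(text.strip())
    if match is None:
        raise ValueError(f"not a simple nucleotide substitution: {text!r}")
    position, reference, variant = match.groups()
    if reference == variant:
        raise ValueError("reference and variant nucleotides are identical")
    return Substitution(int(position), reference, variant)


def apply_substitution(sequence: str, sub: Substitution) -> str:
    """Return the mutated sequence, checking that the reference base matches."""
    if not 1 <= sub.position <= len(sequence):
        raise ValueError("position lies outside the sequence")
    index = sub.position - 1
    if sequence[index] != sub.reference:
        raise ValueError("sequence does not carry the stated wild-type nucleotide")
    return sequence[:index] + sub.variant + sequence[index + 1:]


print(parse_substitution("76A>T"))
print(apply_substitution("GATTACA", parse_substitution("2A>T")))
```

Checking the reference base before substituting catches the common mistake of describing a variant against the wrong consensus sequence.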

Mutation rates

Further information: Mutation rate

Mutation rates vary substantially across species, and the evolutionary forces that generally determine mutation are the subject of ongoing investigation.

In humans, the mutation rate is measured in de novo mutations per genome per generation, that is, novel mutations in a person that were not present in his or her parents. This number has been established by sequencing thousands of human trios, that is, two parents and at least one child.

The genomes of RNA viruses are based on RNA rather than DNA. The RNA viral genome can be double-stranded (as in DNA) or single-stranded. In some of these viruses (such as the single-stranded human immunodeficiency virus), replication occurs quickly, and there are no mechanisms to check the genome for accuracy. This error-prone process often results in mutations.

Disease causation

Changes in DNA caused by mutation in a coding region of DNA can cause errors in protein sequence that may result in partially or completely non-functional proteins. Each cell, in order to function correctly, depends on thousands of proteins to function in the right places at the right times. When a mutation alters a protein that plays a critical role in the body, a medical condition can result. One study on the comparison of genes between different species of Drosophila suggests that if a mutation does change a protein, the mutation will most likely be harmful, with an estimated 70 percent of amino acid polymorphisms having damaging effects, and the remainder being either neutral or weakly beneficial.[8] Some mutations alter a gene's DNA base sequence but do not change the protein made by the gene. Studies have shown that only 7% of point mutations in noncoding DNA of yeast are deleterious and 12% in coding DNA are deleterious. The rest of the mutations are either neutral or slightly beneficial.

Inherited disorders

See also: Genetic disorder

If a mutation is present in a germ cell, it can give rise to offspring that carries the mutation in all of its cells. This is the case in hereditary diseases. In particular, if there is a mutation in a DNA repair gene within a germ cell, humans carrying such germline mutations may have an increased risk of cancer. A list of 34 such germline mutations is given in the article DNA repair-deficiency disorder. An example is albinism, caused by a mutation in the OCA1 or OCA2 gene. Individuals with this disorder are more prone to many types of cancer and other disorders, and have impaired vision.

DNA damage can cause an error when the DNA is replicated, and this error of replication can cause a gene mutation that, in turn, could cause a genetic disorder. DNA damages are repaired by the DNA repair system of the cell. Each cell has a number of pathways through which enzymes recognize and repair damages in DNA. Because DNA can be damaged in many ways, the process of DNA repair is an important way in which the body protects itself from disease. Once DNA damage has given rise to a mutation, the mutation cannot be repaired.

Role in carcinogenesis

See also: Carcinogenesis

On the other hand, a mutation may occur in a somatic cell of an organism. Such mutations will be present in all descendants of this cell within the same organism. The accumulation of certain mutations over generations of somatic cells is part of the cause of malignant transformation, from normal cell to cancer cell.

Cells with heterozygous loss-of-function mutations (one good copy of a gene and one mutated copy) may function normally with the unmutated copy until the good copy has been spontaneously somatically mutated. This kind of mutation happens often in living organisms, but it is difficult to measure the rate. Measuring this rate is important in predicting the rate at which people may develop cancer.

Point mutations may arise from spontaneous mutations that occur during DNA replication. The rate of mutation may be increased by mutagens. Mutagens can be physical, such as radiation from UV rays, X-rays or extreme heat, or chemical (molecules that misplace base pairs or disrupt the helical shape of DNA). Mutagens associated with cancers are often studied to learn about cancer and its prevention.

Prion mutations

Prions are proteins and do not contain genetic material. However, prion replication has been shown to be subject to mutation and natural selection just like other forms of replication. The human gene PRNP codes for the major prion protein, PrP, and is subject to mutations that can give rise to disease-causing prions.

Beneficial mutations

Although mutations that cause changes in protein sequences can be harmful to an organism, on occasions the effect may be positive in a given environment. In this case, the mutation may enable the mutant organism to withstand particular environmental stresses better than wild-type organisms, or reproduce more quickly. In these cases a mutation will tend to become more common in a population through natural selection. Examples include the following:

HIV resistance: a specific 32 base pair deletion in human CCR5 (CCR5-Δ32) confers HIV resistance to homozygotes and delays AIDS onset in heterozygotes. One possible explanation for the relatively high frequency of CCR5-Δ32 in the European population is that it conferred resistance to the bubonic plague during past epidemics in Europe. People with this mutation were more likely to survive infection; thus its frequency in the population increased. This theory could explain why this mutation is not found in Southern Africa, which remained untouched by bubonic plague. A newer theory suggests that the selective pressure on the CCR5-Δ32 mutation was caused by smallpox instead of the bubonic plague.

Malaria resistance: An example of a harmful mutation is sickle-cell disease, a blood disorder in which the body produces an abnormal type of the oxygen-carrying substance hemoglobin in the red blood cells. One-third of all indigenous inhabitants of Sub-Saharan Africa carry the allele, because, in areas where malaria is common, there is a survival value in carrying only a single sickle-cell allele (sickle cell trait). Those with only one sickle-cell allele are more resistant to malaria, since the infestation of the malaria Plasmodium is halted by the sickling of the cells that it infests.

Antibiotic resistance: Practically all bacteria develop antibiotic resistance when exposed to antibiotics. In fact, bacterial populations already carry such mutations, which are then selected under antibiotic exposure. Obviously, such mutations are only beneficial for the bacteria but not for those infected.

Lactase persistence: A mutation allowed humans to express the enzyme lactase after they are naturally weaned from breast milk, allowing adults to digest lactose; this is likely one of the most beneficial mutations in recent human evolution.


Main article: Mutationism

Mutationism is one of several alternatives to Darwinian evolution that have existed both before and after the publication of Charles Darwin's book, On the Origin of Species. In the theory, mutation was the source of novelty, creating new forms and new species, potentially instantaneously, in a sudden jump. This was envisaged as driving evolution, which was limited by the supply of mutations.

Before Darwin, biologists commonly believed in saltationism, the possibility of large evolutionary jumps, including immediate speciation. For example, Étienne Geoffroy Saint-Hilaire argued that species could be formed by sudden transformations, or what would later be called macromutation. Darwin opposed saltation, insisting on gradualism in evolution as in geology. Albert von Kölliker later revived Geoffroy's theory. The geneticist Hugo de Vries gave the name "mutation" to seemingly new forms that suddenly arose in his experiments on the evening primrose Oenothera lamarckiana, and in the first decade of the 20th century, mutationism, or as de Vries named it, mutationstheorie, became a rival to Darwinism, supported for a while by geneticists including William Bateson, Thomas Hunt Morgan, and Reginald Punnett.

Understanding of mutationism is clouded by the later portrayal of the early mutationists by supporters of the modern synthesis as opponents of Darwinian evolution and rivals of the biometrics school, who argued that selection operated on continuous variation. In this portrayal, mutationism was defeated by a synthesis of genetics and natural selection that supposedly started later, with work by the mathematician Ronald Fisher. However, the alignment of Mendelian genetics and natural selection began earlier, with a paper by Udny Yule, and built up with theoretical and experimental work in Europe and America. Despite the controversy, the early mutationists had already accepted natural selection and explained continuous variation as the result of multiple genes acting on the same characteristic, such as height.

Mutationism, along with other alternatives to Darwinism like Lamarckism and orthogenesis, was discarded by most biologists as they came to see that Mendelian genetics and natural selection could readily work together; mutation took its place as a source of the genetic variation essential for natural selection to work on. However, mutationism did not entirely vanish. Richard Goldschmidt again argued for single-step speciation by macromutation, describing the organisms thus produced as "hopeful monsters", earning widespread ridicule. Masatoshi Nei argued controversially that evolution was often mutation-limited. Modern biologists such as Douglas J. Futuyma conclude that essentially all claims of evolution driven by large mutations can be explained by Darwinian evolution.

See also


  1. ^ "mutation | Learn Science at Scitable". Nature. Nature Education. Retrieved 24 September.
  2. ^ Sharma S, Javadekar SM, Pandey M, Srivastava M, Kumari R, Raghavan SC (March). "Homology and enzymatic requirements of microhomology-dependent alternative end joining". Cell Death & Disease. 6 (3).
  3. ^ Chen J, Miller BF, Furano AV (April). "Repair of naturally occurring mismatches can induce mutations in flanking DNA". eLife. 3.
  4. ^ Rodgers K, McVey M (January). "Error-Prone Repair of DNA Double-Strand Breaks". Journal of Cellular Physiology. (1).
  5. ^ a b Bertram JS (December). "The molecular biology of cancer". Molecular Aspects of Medicine. 21 (6).
  6. ^ a b Aminetzach YT, Macpherson JM, Petrov DA (July). "Pesticide resistance via transposition-mediated adaptive gene truncation in Drosophila". Science.
  7. ^ Burrus V, Waldor MK (June). "Shaping bacterial genomes with integrative and conjugative elements". Research in Microbiology. (5).
  8. ^ a b Sawyer SA, Parsch J, Zhang Z, Hartl DL (April). "Prevalence of positive selection among nearly neutral amino acid replacements in Drosophila". Proceedings of the National Academy of Sciences of the United States of America. (16).
  9. ^ Hastings PJ, Lupski JR, Rosenberg SM, Ira G (August). "Mechanisms of change in gene copy number". Nature Reviews. Genetics. 10 (8).
  10. ^ Carroll SB, Grenier JK, Weatherbee SD. From DNA to Diversity: Molecular Genetics and the Evolution of Animal Design (2nd ed.). Malden, MA: Blackwell Publishing.
  11. ^ Harrison PM, Gerstein M (May). "Studying genomes through the aeons: protein families, pseudogenes and proteome evolution". Journal of Molecular Biology. (5).
  12. ^ Orengo CA, Thornton JM (July). "Protein families and their evolution-a structural perspective". Annual Review of Biochemistry. 74.
  13. ^ Long M, Betrán E, Thornton K, Wang W (November). "The origin of new genes: glimpses from the young and old". Nature Reviews. Genetics. 4 (11).
  14. ^ Wang M, Caetano-Anollés G (January). "The evolutionary mechanics of domain organization in proteomes and the rise of modularity in the protein world". Structure. 17 (1).
  15. ^ Bowmaker JK (May). "Evolution of colour vision in vertebrates". Eye. 12 (Pt 3b).
  16. ^ Gregory TR, Hebert PD (April). "The modulation of DNA content: proximate causes and ultimate consequences". Genome Research. 9 (4).
  17. ^ Hurles M (July). "Gene duplication: the genomic trade in spare parts". PLOS Biology. 2 (7).
  18. ^ Liu N, Okamura K, Tyler DM, Phillips MD, Chung WJ, Lai EC (October). "The evolution and functional diversification of animal microRNA genes". Cell Research. 18 (10).
  19. ^ Siepel A (October). "Darwinian alchemy: Human genes from noncoding DNA". Genome Research. 19 (10).
  20. ^ Zhang J, Wang X, Podlaha O (May). "Testing the chromosomal speciation hypothesis for humans and chimpanzees". Genome Research. 14 (5).
  21. ^ Ayala FJ, Coluzzi M (May). "Chromosome speciation: humans, Drosophila, and mosquitoes". Proceedings of the National Academy of Sciences of the United States of America. Suppl 1.
  22. ^ Hurst GD, Werren JH (August). "The role of selfish genetic elements in eukaryotic evolution". Nature Reviews Genetics. 2 (8).
  23. ^ Häsler J, Strub K (November). "Alu elements as regulators of gene expression". Nucleic Acids Research. 34 (19).
  24. ^ a b c d Eyre-Walker A, Keightley PD (August). "The distribution of fitness effects of new mutations" (PDF). Nature Reviews Genetics. 8 (8). Archived from the original (PDF) on 4 March. Retrieved 6 September.
  25. ^ a b Kimura M. The Neutral Theory of Molecular Evolution. Cambridge, UK; New York: Cambridge University Press.
  26. ^ Bohidar HB (January). Fundamentals of Polymer Physics and Molecular Biophysics. Cambridge University Press.
  27. ^ Dover GA, Darwin C. Dear Mr. Darwin: Letters on the Evolution of Life and Human Nature. University of California Press.
  28. ^ Tibayrenc M (12 January). Genetics and Evolution of Infectious Diseases. Elsevier.
  29. ^ "Cancer Is Partly Caused By Bad Luck, Study Finds". Archived from the original on 13 July.
  30. ^ Jha A (22 August). "Older fathers pass on more genetic mutations, study shows". The Guardian.
  31. ^ Ames BN, Shigenaga MK, Hagen TM (September). "Oxidants, antioxidants, and the degenerative diseases of aging". Proceedings of the National Academy of Sciences of the United States of America. 90 (17).
  32. ^ Montelone BA. "Mutation, Mutagens, and DNA Repair". Archived from the original on 26 September. Retrieved 2 October.
  33. ^ Slocombe L, Al-Khalili JS, Sacchi M (February). "Quantum and classical effects in DNA point mutations: Watson-Crick tautomerism in AT and GC base pairs". Physical Chemistry Chemical Physics. 23 (7).
  34. ^ Stuart GR, Oda Y, de Boer JG, Glickman BW (March). "Mutation frequency and specificity with age in liver, bladder and brain of lacI transgenic mice". Genetics. (3).
  35. ^Kunz BA, Ramachandran K, Vonarx EJ (April ). "DNA sequence analysis of spontaneous mutagenesis in Saccharomyces cerevisiae". Genetics. (4): – doi/genetics/ PMC&#; PMID&#;
  36. ^Lieber MR (July ). "The mechanism of double-strand DNA break repair by the nonhomologous DNA end-joining pathway". Annual Review of Biochemistry. 79: – doi/annurev.biochem PMC&#; PMID&#;
  37. ^Created from PDB 1JDGArchived 31 December at the Wayback Machine
  38. ^Pfohl-Leszkowicz A, Manderville RA (January ). "Ochratoxin A: An overview on toxicity and carcinogenicity in animals and humans". Molecular Nutrition & Food Research. 51 (1): 61– doi/mnfr PMID&#;
  39. ^Kozmin S, Slezak G, Reynaud-Angelin A, Elie C, de Rycke Y, Boiteux S, Sage E (September ). "UVA radiation is highly mutagenic in cells that are unable to repair 7,8-dihydrooxoguanine in Saccharomyces cerevisiae". Proceedings of the National Academy of Sciences of the United States of America. (38): – BibcodePNASK. doi/pnas PMC&#; PMID&#;
  40. ^ abFitzgerald DM, Rosenberg SM (April ). "What is mutation? A chapter in the series: How microbes "jeopardize" the modern synthesis". PLOS Genetics. 15 (4): e doi/journal.pgen PMC&#; PMID&#;
  41. ^Galhardo RS, Hastings PJ, Rosenberg SM (1 January ). "Mutation as a stress response and the regulation of evolvability". Critical Reviews in Biochemistry and Molecular Biology. 42 (5): – doi/ PMC&#; PMID&#;

Beneficial Mutation–Selection Balance and the Effect of Linkage on Positive Selection

Michael M. Desai¹ and Daniel S. Fisher²

Department of Physics, Department of Molecular and Cell Biology, and Division of Engineering and Applied Sciences, Harvard University, Cambridge, Massachusetts

¹Corresponding author: Lewis-Sigler Institute for Integrative Genomics, Carl Icahn Laboratory, Princeton University, Princeton, NJ. E-mail: [email protected]

²Present address: Department of Applied Physics, Stanford University, Stanford, CA.

Communicating editor: M. W. Feldman

Received Nov 1; Accepted Apr

Copyright © by the Genetics Society of America


When beneficial mutations are rare, they accumulate by a series of selective sweeps. But when they are common, many beneficial mutations will occur before any can fix, so there will be many different mutant lineages in the population concurrently. In an asexual population, these different mutant lineages interfere and not all can fix simultaneously. In addition, further beneficial mutations can accumulate in mutant lineages while these are still a minority of the population. In this article, we analyze the dynamics of such multiple mutations and the interplay between multiple mutations and interference between clones. These result in substantial variation in fitness accumulating within a single asexual population. The amount of variation is determined by a balance between selection, which destroys variation, and beneficial mutations, which create more. The behavior depends in a subtle way on the population parameters: the population size, the beneficial mutation rate, and the distribution of the fitness increments of the potential beneficial mutations. The mutation–selection balance leads to a continually evolving population with a steady-state fitness variation. This variation increases logarithmically with both population size and mutation rate and sets the rate at which the population accumulates beneficial mutations, which thus also grows only logarithmically with population size and mutation rate. These results imply that mutator phenotypes are less effective in larger asexual populations. They also have consequences for the advantages (or disadvantages) of sex via the Fisher–Muller effect; these are discussed briefly.

The vast majority of mutations are neutral or deleterious. Extensive study of such mutations has explained the genetic diversity in many populations and has been useful for inferring population parameters and histories from data. Yet beneficial mutations, despite their rarity, are what cause long-term adaptation and can also dramatically alter the genetic diversity at linked sites. Unfortunately, our understanding of their dynamics remains poor by comparison.

When beneficial mutations are rare and selection is strong, positive selection results in a succession of selective sweeps. A mutation occurs, spreads through the population due to selection, and soon fixes. Some time later, another such event may occur. This situation is sometimes called the strong-selection weak-mutation regime. To make its character clear, we refer to it as the successional-mutations regime: between sweeps, there is a single ruling population. In this regime, the effect of positive selection on patterns of genetic variation is reasonably well understood. A selective sweep reduces the genetic variation in regions of the genome linked, over the timescale of the sweep, to the site at which a beneficial mutation occurs: other mutations in these regions hitchhike to fixation.

Successional-mutations behavior typically occurs in small- to moderate-sized populations in which beneficial mutations are sufficiently rare. However, a different regime occurs in larger populations, in which beneficial mutations occur frequently. When beneficial mutations are common enough that many mutant lineages can be simultaneously present in the population, selective sweeps will overlap and interfere with one another (i.e., different beneficial mutations will grow in the population concurrently). If, in addition, selection is strong enough that it is not dominated by random drift (except while mutants are very rare), we have a strong-selection strong-mutation regime. For clarity, we refer to this as the concurrent-mutations regime. The effects of concurrent mutations in asexual populations are the focus of this article. As we will see, the concurrent-beneficial-mutations regime is not an unusual special case: many viral, bacterial, and simple eukaryotic populations likely experience evolution via multiple concurrent mutations.

In populations that contain many different beneficial mutants, there will be substantial variation in fitness within the population. This variation will be acted on by selection. But in the absence of new mutations, the variation will soon disappear. Thus the traditional approach to evolution of quantitative traits, which is to assume that genetic variation always exists (as for traits not subject to selection), fails badly. New mutations are crucially needed to maintain the variation on which further selection can act. Thus to understand adaptation when multiple mutations are involved, it is essential to analyze the interplay between selection and new beneficial mutations, especially how the latter maintains the variation acted on by the former. Understanding this beneficial mutation–selection balance and the resulting dynamics is the primary goal of this article.

Both the successional- and the concurrent-mutations regimes require that selection dominates drift except while mutants are very rare. A qualitatively different regime occurs with weakly beneficial mutations: these do not sweep in the traditional sense because drift dominates their dynamics. This weakly beneficial regime most readily occurs in small populations, where selective forces cannot overcome drift, or when considering mutations of very small effect, such as those that affect synonymous codon usage (Li; Comeron et al.; Przeworski et al.; McVean and Charlesworth). In this article we are interested in beneficial mutations in moderate to large populations, so we focus exclusively on the strong-selection regimes for which drift is important for beneficial mutant lineages only while they are a tiny minority of the population.

The essential difference between the successional-mutations and concurrent-mutations regimes is presented in Figure 1, which depicts beneficial mutations in an asexual population. In a small enough population, or one whose beneficial mutation rate ($U_b$) is low, beneficial mutations occur rarely enough that they are well separated in time and one can sweep before another arises (Figure 1a). This is the successional-mutations regime, in which the beneficial mutations all behave independently. However, in a larger population or at higher $U_b$, multiple mutant populations exist concurrently and they are no longer independent (Figure 1b). Mutations that occur in different lineages cannot both fix in the absence of recombination: at least one of them must be wasted.

Figure 1.

For beneficial mutations to be acquired by a population, they must both arise and fix. (a) A small asexual population in the successional-mutations (or strong-selection weak-mutation) regime. Mutation A arises early on. Provided it survives drift, it fixes quickly, before another beneficial mutation occurs. Some time later, a second mutation B occurs and fixes. Evolution continues by this sequential fixation process. (b) A larger population in the concurrent-mutations (strong-selection strong-mutation) regime. A mutation A occurs, but before it can fix another mutation B occurs and the two interfere. Here a second mutation, C, occurs in an individual that already has the first mutation A and these two begin fixing together, driving the single mutants to extinction. These dynamics continue with further mutations, such as E and F, occurring in the already-double-mutant population. The key process is how quickly mutations arise in individuals that already have other mutations. This picture has elements of both clonal interference and multiple mutations, illustrated separately in c and d. (c) The clonal interference effect in large populations: a weak-effect beneficial mutation A occurs and begins to sweep, but is outcompeted by a later but more-fit mutation B, which in turn is outcompeted by mutation C. C fixes before any larger mutations can occur; the process can then begin again. Multiple mutations are ignored here. (d) The multiple-mutation effect: several mutations, A, B, and C, of identical effect occur and begin to spread. Mutant lineage B happens to get a second beneficial mutation D, which helps it sweep, outcompeting A and C. Eventually this lineage gets a third beneficial mutation E. Mutations that occur in less-fit lineages, or those that do not happen to get additional mutations soon enough (such as BDF), are driven extinct.

In the concurrent-mutations regime, two important effects occur. The first is when a moderately beneficial mutation occurs and begins to sweep, only to be outcompeted by a later, more strongly beneficial mutation that occurs in a wild-type individual. The first mutation is then wasted, as it is eliminated along with the then-majority type by the sweep of the stronger mutation. This effect is referred to as clonal interference; it is illustrated in Figure 1c. Note that despite earlier broader definitions we use the term clonal interference to refer to only this first effect, consistent with the focus of recent work on the subject (Gerrish and Lenski ). The second effect is when multiple mutations occur in the same lineage before the first beneficial mutation fixes. For example, a second beneficial mutation can occur in an individual that already has one beneficial mutation. The double mutant can then benefit from the combined effect of the two mutations and outcompete the single mutant as well as some other stronger single mutants that arise in the majority population. This process is illustrated in Figure 1d.

The dynamics of evolution in the concurrent-mutations regime are important to understand. At the very least, this is essential for forming sensible null expectations about experimental, observational, and genomic data from large populations. Knowing how the effects of beneficial mutations depend on mutation rate and population size is crucial for making meaningful comparisons between different populations. Most important, in our view, is developing an intuition for how large populations evolve. The simple picture of successive selective sweeps in the successional-mutations regime is a valuable guide to thinking about positive selection. Yet we have little intuitive guidance when the successional-mutations approximation does not apply. This is a serious shortcoming in our understanding of the evolution of a wide array of populations, including viruses and most unicellular organisms.

Although it is not as well understood as the successional-mutations regime, the concurrent-mutations regime has been the subject of substantial interest since the s. Fisher () and Muller () first noted the potential importance of interference between beneficial mutations (Muller drew diagrams very similar to our Figure 1). They proposed what has come to be known as the Fisher–Muller hypothesis for the advantage of sex: sexual populations can recombine beneficial mutations in competing lineages into the same individual. This prevents mutational events from being wasted, as they often are in asexual populations.

Much subsequent work on positive selection in the concurrent-mutations regime has focused on the implications for the evolution of sex. Crow and Kimura (), Bodmer (), and Maynard Smith () attempted to quantify the Fisher–Muller effect in the late s and the early s. However, their analysis was incomplete: it did not fully account for stochastic behavior, ignored triple and higher mutations, and did not correctly account for the effects of sex. Contemporaneously, Hill and Robertson () looked at this problem from the perspective of the linkage disequilibrium generated by multiple linked beneficial mutations segregating simultaneously. This has become known as the Hill–Robertson effect. It is essentially equivalent to the Fisher–Muller effect (see Felsenstein for a detailed discussion). In recent years, Barton (), Otto and Barton (, ), and Barton and Otto () have analyzed the Fisher–Muller effect from the Hill–Robertson perspective. Their work focuses on the buildup of linkage disequilibrium due to mutations and selection and the average effect of recombination on the variance in fitness and the destruction of disequilibrium. This provides useful insight into the effects of sex, but does not explain the full evolutionary dynamics or population genetic structure created by this type of positive selection.

In this article, we step back from the long tradition of studying the implications of concurrent mutations for the evolution of sex and focus instead on the basic dynamics shown schematically in Figure 1b. We show how an asexual population in the concurrent-mutations regime accumulates many beneficial mutations, what the fitness distribution looks like, how it develops, and how quickly selected substitutions occur via collective sweeps. We develop a framework for thinking more generally about positive selection and its effects that is applicable to large populations of asexuals or any other case where linkage between mutations is important.

We do not analyze the questions about sex or patterns of diversity in this article. However, these questions should be informed by our results; some can be studied within the framework we present in this article. For example, when recombination is rare, the average effects of sex may be irrelevant; instead all that matters is whether or not it creates rare individuals that are much more fit than the majority of the population. To study this, we must first understand the full distribution of genetic diversity within the population. Similarly, before analyzing the patterns of genetic variation exhibited by populations in which multiple linked beneficial mutations have occurred (or are occurring), one must understand the rate of beneficial substitutions and typical interference patterns between these within the linked regions.

To understand the concurrent-mutations dynamics in detail, it is essential to start with a specific model that focuses on some subset of the important effects. Features can then be added after enough understanding has been gleaned to enable predictions of which effects are model specific and which are more general. Positive selection can involve various complications, including epistasis (interactions between effects of mutations), conditionally beneficial mutations, frequency-dependent benefits, and changing environments, among others. Many different scenarios are possible. At present we have little understanding of which, if any, of these situations are biologically typical and which ones are unusual. In this article, we do not attempt to catalog all possible complications; this is an impossibly broad subject. Instead we look at the simplest possible situation involving positive selection of concurrent mutations. We suppose that a variety of beneficial mutations are available to a population and ask how the population acquires them. We assume these mutations interact in a simple multiplicative way (additive for the growth rates) with no epistasis, frequency dependence, or changing environment of any kind. In short, we ask how the population climbs a single smoothly sloped hill in fitness space.

This simple scenario is probably common. Populations often find themselves in an environment where they can accumulate quite a few different beneficial mutations that each roughly independently help them adapt. Even when this simple hill-climbing scenario does not apply, it is an important null model. Some more complex forms of positive selection may also prove tractable within the framework we describe, while others will not; these leave open many avenues for future work.

Various other authors have studied the dynamics of multiple concurrent beneficial mutations under the simple assumptions outlined above. Gerrish and Lenski () analyzed clonal interference between mutations of different strengths; this has since been extended by various authors (Orr; Gerrish; Johnson and Barton; Kim and Stephan; Campos and De Oliveira; Wilke). This work focuses on the interference between mutations of different strengths that arise in different lineages, while neglecting further beneficial mutations that occur in the same lineage (in particular, multiple mutants). Yet we show below that if population parameters are such that clonal interference is important, the effects of multiple mutants are usually at least of comparable importance. Thus there is some inconsistency in focusing on clonal interference alone. Our analysis in this article starts instead with the other concurrent-mutation effect, multiple mutants, initially in a model in which clonal interference is absent. In any real situation, the two effects will both occur. We thus discuss the interplay between clonal interference and multiple mutations in a later section. Kim and Orr () have also recently analyzed a model that combines some aspects of clonal interference and multiple mutations.

To focus on the effects of multiple mutants without clonal interference, two additional simplifying approximations are useful. For most of this article, we study a model in which each beneficial mutation has the same effect, s, on fitness (i.e., each step uphill is of the same size). Furthermore, to focus on the effects of positive selection, we neglect deleterious mutations in the primary analysis. Even though neither assumption will typically be true, these turn out to be reasonable approximations in many circumstances. Situations in which they are not appropriate are more complicated scenarios for positive selection, some of which, especially the effects of a distribution of fitness increments, we discuss briefly.

Remarkably, even the simplest possible model with many equal-strength beneficial mutations available is only partially understood. Kessler et al. () and Ridgway et al. () analyzed a similar simple model, but their initial work did not handle random drift correctly. More recently, they have developed a sophisticated although somewhat unwieldy moment-based approach (D. Kessler and H. Levine, unpublished results) from which it is unfortunately hard to understand the essential aspects of the dynamics. Rouzine et al. () also studied a model similar in its essential aspects to our simplest model (although also including deleterious mutations of the same magnitude). They were concerned with viral evolution, and their results are primarily valid for very large mutation rates appropriate for many viruses; we focus instead on regimes primarily applicable to single-celled organisms (and some viruses). Nevertheless, if worked out more fully from Rouzine et al.'s analysis, several results can be obtained that are closely related to ours. But our analysis involves a less mathematically formal approach; we believe it is both clearer and a better basis for further development (some of which is included herein). We discuss the relationship between our analysis and that of Rouzine et al. () in more detail below.

The outline of this article is as follows. We begin by describing in the next section a heuristic approach to the dynamics. This analysis gets the behavior roughly correct and illustrates the ideas underlying our approach. We then describe the simplest model more precisely and analyze it in the following section. We next discuss transient behavior before the population has reached its steady-state fitness distribution and address the effects of deleterious mutations. In the next section, we make comparisons between our analytic results and simulations. We then relax our assumption that all mutations have the same effect and discuss the relationship between our theory and clonal interference analysis. Finally, we summarize our results and discuss future directions.


In the simplest situation with multiple concurrent beneficial mutations, there are three important parameters: the population size, N, the beneficial mutation rate per individual per generation, $U_b$, and the fitness increase provided by each mutation, s. We refer to the basic exponential growth rate, r, of a population as its fitness (rather than its growth factor per generation $w = e^r \approx 1 + r$). That is, we use fitness to mean what is sometimes called log fitness. Thus in the absence of epistasis, which we generally assume, two mutations of magnitude $s_1$ and $s_2$ increase fitness by $s_1 + s_2$. We call the rate of increase, $d\langle r \rangle/dt$, of the average fitness of a population the speed of evolution and denote it v.
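The log-fitness convention can be made concrete with a short numerical sketch. The effect sizes below are illustrative assumptions, not values from the article, and `growth_factor` is a hypothetical helper name. The sketch shows that adding growth rates is the same as multiplying per-generation growth factors, and that $e^r$ stays close to $1 + r$ when r is small.

```python
import math

def growth_factor(r: float) -> float:
    """Per-generation growth factor w = e^r for exponential growth rate r."""
    return math.exp(r)

s1, s2 = 0.01, 0.02          # illustrative effect sizes (assumed values)
r_double = s1 + s2           # log fitnesses add when there is no epistasis

# Growth factors multiply, which is the same statement as log fitnesses adding.
w_double = growth_factor(s1) * growth_factor(s2)
assert math.isclose(w_double, growth_factor(r_double))

# For small r, w = e^r is close to 1 + r; the error is of order r^2 / 2.
approx_error = growth_factor(r_double) - (1 + r_double)
print(f"w = {growth_factor(r_double):.6f}, 1 + r = {1 + r_double:.6f}, "
      f"error = {approx_error:.2e}")
```

The approximation degrades as effects grow, which is one reason to state results in terms of r rather than w.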

To focus on the effects of multiple mutants in a situation in which clonal interference does not occur, we initially restrict consideration to the approximation that all beneficial mutations have the same effect. A k-tuple mutant thus has fitness ks greater than the original wild type. The speed of evolution is then simply $v = s\, d\langle k \rangle/dt$.

We begin by reviewing the successional-mutations regime where beneficial mutations are sufficiently separated in time for them to sweep independently, as in Figure 1a. Although this is exactly solvable and well known, it is instructive to consider it from a heuristic perspective. We then turn to a heuristic analysis of the more complex concurrent-mutations dynamics illustrated in Figure 1d.

Successional-mutations regime and the establishment of mutants:

Small asexual populations evolve by accumulating beneficial mutations sequentially. Beneficial mutations occur in the population at a total rate $N U_b$. The probability that a particular mutant will survive random drift is proportional to its selective advantage s (provided $s \ll 1$). The constant of proportionality depends on the specific model for the stochastic dynamics; for our model it is 1, and we discuss in the simplest-model section below the minor modifications of our results that are needed for other stochastic dynamics. We call the process by which the lineage of a beneficial mutant that survives drift becomes large enough for the population of its descendants to grow deterministically the establishment of the mutant clone. As we show below in the section on the fate of a single mutant, a mutant population becomes established when its size reaches of order 1/s individuals. Roughly speaking, this is because a mutant lineage of size n takes n generations to change by of order n individuals due to random drift. Since selection adds on average ns individuals to the lineage per generation, in this time selection has an average effect of adding $n^2 s$ individuals. So selection dominates drift provided $n^2 s > n$, or $n > 1/s$. Thus the mutant lineage must reach a size $n \sim 1/s$ before it becomes safe from extinction and begins to grow mostly deterministically.
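Both the survival probability of roughly s and the size scale 1/s can be checked in a minimal sketch. It assumes a birth-death process with per-capita birth rate 1 + s and death rate 1; this model is chosen here for illustration and is not necessarily the one analyzed in later sections. In such a process every event is a birth with odds (1 + s) : 1 regardless of lineage size, so the standard gambler's-ruin formula applies. The helper `prob_reach` is hypothetical.

```python
def prob_reach(size: int, s: float) -> float:
    """Probability that a lineage starting from one individual reaches `size`
    before going extinct, in a birth-death process with per-capita birth rate
    1 + s and death rate 1 (an assumed model; gambler's-ruin formula)."""
    ratio = 1.0 / (1.0 + s)          # death-to-birth odds per event
    return (1.0 - ratio) / (1.0 - ratio ** size)

s = 0.01
threshold = round(1 / s)             # establishment size of order 1/s

for size in (threshold // 10, threshold, 10 * threshold, 100 * threshold):
    print(f"P(reach {size:>6}) = {prob_reach(size, s):.5f}")

# Far above 1/s the probability saturates at s / (1 + s), which is close to s.
p_survive = prob_reach(100 * threshold, s)
```

Under these assumptions, the probability of reaching a given size falls steeply for sizes below 1/s and is nearly flat above it: lineages that get well past 1/s are very unlikely to be lost to drift afterwards.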

We show in the section on the fate of a single mutant that if a mutant is destined to become established, it will reach this size 1/s very quickly. Thus new beneficial mutations are established at a rate roughly $N U_b s$ per generation (other mutant populations die out due to random drift), so a new mutation will become established about once every $1/(N U_b s)$ generations. Once established on reaching size of order 1/s, the mutant lineage grows roughly exponentially at rate s and hence takes of order $(1/s)\ln(Ns)$ generations to fix (we loosely call fixed a mutant lineage that has grown to represent a large fraction of the population; the conventional definition corresponds to fully fixed, which takes about twice as long).

When the population size or mutation rate is small enough, fixation will happen more quickly than establishment. This occurs when

$$\frac{1}{s}\ln(Ns) \ll \frac{1}{N U_b s}, \tag{1}$$

which corresponds to $N U_b \ll 1/\ln(Ns)$. When this condition holds, we are in the successional-mutations regime, in which the establishment rate is limiting: a mutation A that arises and fixes will do so long before the next mutation destined to survive drift, B, is established. Thus mutation B occurs in a population that has already fixed A, yielding AB, and B fixes well before mutation C is established. Beneficial mutations continue to accumulate in this simple way. New mutations arise and fix at average rate $N U_b s$, each one increasing the fitness by s. Thus fitness increases at a speed

$$v \approx N U_b s^2,$$

linear in the product $N U_b$. This linear mutation-limited behavior characterizes the successional-mutations regime of successional selective sweeps.
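The successional-regime speed follows directly from the two statements above: mutations establish at a rate of roughly N times the mutation rate times s, and each adds s to fitness. A minimal sketch follows; the parameter values are illustrative assumptions rather than estimates from the article, and `successional_speed` is a hypothetical helper name.

```python
def successional_speed(N: float, Ub: float, s: float) -> float:
    """Speed of evolution v = N * Ub * s**2 in the successional-mutations
    regime: mutations establish at rate N*Ub*s and each adds s to fitness."""
    establishment_rate = N * Ub * s
    return establishment_rate * s

# Illustrative values (assumptions, not taken from the article).
N, Ub, s = 1e4, 1e-7, 0.02
v = successional_speed(N, Ub, s)
print(f"v = {v:.2e} per generation")

# Doubling N (or Ub) doubles v: the behavior is linear in the product N*Ub.
v_doubled = successional_speed(2 * N, Ub, s)
print(f"v at 2N = {v_doubled:.2e} per generation")
```

This linear scaling is only meaningful while the successional condition holds; it should not be extrapolated to populations large enough for mutations to overlap.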

Concurrent-mutations regime:

In larger populations, the behavior is more complex, as illustrated by Figure 1b. In this case, the establishment times of new mutants are shorter than their fixation times, corresponding to

$$N U_b \gg \frac{1}{\ln(Ns)}.$$


Thus new beneficial mutations arise and become established before earlier ones can sweep, causing them to interfere with one another.
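The comparison of the two timescales can be sketched as a simple classifier. It uses the heuristic estimates from the text, an establishment interval of 1/(N U_b s) and a fixation time of ln(Ns)/s. The factor-of-10 margin standing in for the strong inequalities is an arbitrary assumption, as are the parameter values and the function names. The sketch assumes Ns > 1 so that the logarithm is positive.

```python
import math

def timescales(N: float, Ub: float, s: float) -> tuple[float, float]:
    """Return (establishment interval, fixation time) in generations,
    using the heuristic estimates 1/(N*Ub*s) and ln(N*s)/s."""
    t_establish = 1.0 / (N * Ub * s)
    t_fix = math.log(N * s) / s
    return t_establish, t_fix

def regime(N: float, Ub: float, s: float) -> str:
    """Classify by comparing the two timescales. The conditions in the text
    are strong inequalities, so parameters near the boundary are not cleanly
    in either regime; the factor-of-10 margin is an arbitrary choice."""
    t_establish, t_fix = timescales(N, Ub, s)
    if t_fix * 10 < t_establish:
        return "successional"
    if t_establish * 10 < t_fix:
        return "concurrent"
    return "crossover"

for N in (1e4, 1e6, 1e9):
    t_est, t_fix = timescales(N, Ub=1e-7, s=0.02)
    print(f"N = {N:.0e}: establish every {t_est:.3g} gen, "
          f"fix in {t_fix:.3g} gen -> {regime(N, 1e-7, 0.02)}")
```

With these assumed values, increasing the population size alone moves the population from the successional regime, through a crossover, into the concurrent regime. The fixation time grows only logarithmically with N, while the establishment interval shrinks as 1/N.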

As noted in the Introduction, two types of interference are important. First, competition occurs when two mutations that have different strengths occur independently in individuals with similar initial fitness (clonal interference). We focus in the bulk of this article on the other type of interference: a mutation that arises in a fitter background (e.g., one with an earlier beneficial mutation) will outcompete another mutation of similar effect that occurs in a less fit background. In the constant-s model clonal interference is explicitly absent, and we thus initially focus exclusively on this latter effect. In this constant-s approximation, two different mutants that occur among those with the same fitness (in particular members of the same clone) will compete equally and sweep together, each becoming only partially fixed. Unless we are interested in the neutral genetic variability of the population, all subpopulations with the same fitness can be considered as a single subpopulation: we do this except in the discussion at the end of this article. Also, we postpone discussion of the interplay between clonal interference and multiple mutants (i.e., going beyond the constant-s model) to a later section below.

First consider starting from a monoclonal population. Mutations initially give rise to a subpopulation with fitness increased by s (Figure 2a). The size of this mutant subpopulation drifts stochastically, but eventually becomes large enough, 1/s individuals, to become deterministic. This takes a (stochastic) establishment time, $\tau_1$. After its establishment but before its fixation, mutations can occur in the still-small mutant subpopulation to create double mutants with fitness 2s (Figure 2b). This typically happens well before the single mutants have fixed (else we are by Equation 1 in the successional-mutations regime). We assume the double mutants never arise before the single-mutant subpopulation has established; as we discuss below and in appendix g, this will be true unless mutation rates are extremely high or selection is very weak. A double-mutant population thereby becomes established a time $\tau_2$ after the establishment of the single-mutant population. Triple mutants then begin to arise and become established after an additional time $\tau_3$. This interval is typically shorter than $\tau_2$, primarily because double mutants grow faster than single mutants and hence generate more mutations and, in addition, because the triple mutants are more fit than double mutants and hence survive drift more easily (with probability 3s rather than 2s).

Figure 2.

Schematic of the evolution of large asexual populations. Shown are fitness distributions within a population, on a logarithmic scale. (a) The population is initially clonal. Beneficial mutations of effect s create a subpopulation at fitness s, which drifts randomly until after time $\tau_1$ it reaches a size of order 1/s, after which it behaves deterministically. (b) This subpopulation generates mutations at fitness 2s. Meanwhile, the mean fitness of the population increases, so the initial clone begins to decline. (c) A steady state is established. In the time it takes for new mutations to arise, the less-fit clones die out and the population moves rightward while maintaining an approximately constant lead from peak to nose, qs (here q = 5). The inset shows the leading nose of the population.

This process continues, accelerating at each step. Eventually, however, enough time passes that the single-mutant subpopulation (or one of the multiple-mutant subpopulations) becomes larger than the original wild type. This near fixation of the single mutants increases the mean fitness by s, which balances the accelerating front and creates a moving fitness distribution that will attain a (roughly) steady-state width with the mean fitness increasing with a steady-state average speed, v. This is a form of mutation-selection balance: as each new beneficial mutation becomes established, the mean fitness increases by s and the fitness distribution moves to higher fitness while maintaining the same shape.

It is useful to consider this process in more general terms. The key to the behavior is the balance between mutation, which increases the variation in fitness within the population, and selection, which decreases the variation by eliminating all but the fittest individuals. If we were discussing deleterious mutations, mutation would also oppose the tendency of selection to increase the mean fitness, leading to a steady-state distribution of fitness (ignoring Muller's ratchet, which for large populations only matters on extremely long timescales). This deleterious mutation-selection balance, which is independent of population size for large N, has long been understood (Gillespie). In our case, the dynamics are more subtle because the important mutations are beneficial. The basic idea of mutation-selection balance, however, is unchanged. Mutations broaden the fitness distribution while selection narrows it, creating a steady-state variance around an increasing mean fitness. But unlike the deleterious case, the dynamics of the rare individuals near the most-fit tail of the fitness distribution (the "nose") control the behavior. We show below that selection moves the distribution toward higher fitness at a rate very close to the steady-state variance in fitness, which is the classic result in the absence of mutations (the "fundamental theorem of natural selection") (Fisher). But new beneficial mutations at the nose are essential to maintain this variance: in their absence the fitness distribution would collapse to a narrow peak near the most-fit individual and evolution would grind to a halt.

The crucial dependence on new mutations in the nose makes the analysis of the beneficial mutation-selection balance more complex than in the deleterious case. It is now essential to account properly for random drift in the small populations near the nose. In the case of deleterious mutation-selection balance, rare new mutants are less fit than the rest of the population. They will die out soon anyway, so failing to account properly for the stochastic dynamics by which they do so has no serious consequences. Random drift is important with solely deleterious mutations only if Muller's ratchet is operating, i.e., if the most-fit individuals are rare enough that they can die out due to random drift. The beneficial mutation-selection balance is quite analogous to this Muller's ratchet case. Here too the subpopulations that are more fit than average control the long-term behavior of the population, and these are small enough that correct stochastic treatment is essential. As is the case with Muller's ratchet, infinite-N deterministic approximations are not even qualitatively correct. Indeed, with a large supply of beneficial mutations, deterministic analysis incorrectly predicts a rapid acceleration of the nose toward an infinite speed of evolution. This nonsensical result arises because the deterministic approximation creates (what are effectively) fractional numbers of new, much fitter mutants that then grow exponentially, unhampered by drift, and soon dominate the behavior (we describe this in more detail in appendix a).

There are two factors that determine the dependence of the speed of evolution on the population size. The first is the dynamics of already established subpopulations, which is dominated by selection. The second is the new mutations that occur in the fittest subpopulation. We define the lead of the fitness distribution, Q, as the difference between the fitness of the most-fit individual and the mean fitness of the population (more precisely, $Q - s$ is the difference between the mean fitness and that of the most-fit established mutant class). We define q by $Q = qs$, so that if the lead is Q, the most-fit individuals have q more beneficial mutations than the average individual: they have a lead Q in the race to higher fitness. Once it is established, this fittest population grows exponentially. In the time this population took to become established, in steady state the mean fitness must have increased by s, so the newly established population will initially grow exponentially at rate $(q-1)s$ and later more slowly as the mean continues to advance. Growing from its establishment upon reaching size 1/qs until it reaches a large fraction of N will thus take time $\ln(Nqs)/[(q-1)s/2]$, since $(q-1)s/2$ is its average growth rate during the period between establishment and fixation. In this time the mean fitness will increase by $(q-1)s$. Therefore $v \approx [(q-1)s]^2/[2\ln(Nqs)]$. One can show that this v is equal to the variance in fitness, as expected if mutation is indeed negligible compared to selection in the bulk (i.e., away from the nose) of the distribution, so that the fundamental theorem of natural selection applies.

The other factor is the dynamics of the nose, where mutations are essential. A more-fit mutant that moves the nose forward by s will be established some time $\tau_q$ after the previous most-fit mutant. Thus the nose advances at a speed $v = s/\langle\tau_q\rangle$, where $\langle\tau_q\rangle$ is the average $\tau_q$. After it is established, the fittest established population $n_{q-1}$ will grow exponentially at rate $(q-1)s$ and produce mutants at a rate $U_b n_{q-1} \approx U_b e^{(q-1)st}/qs$. Many new mutants will establish soon after the time $\tau_q$ at which the expected cumulative number of mutants destined to establish, $\int_0^{\tau_q} qs\,U_b\,n_{q-1}(t)\,dt$, becomes equal to one, so the time it takes a new mutant to establish is $\langle\tau_q\rangle \approx \ln(s/U_b)/[(q-1)s]$. This means the nose advances at rate $v \approx s/\langle\tau_q\rangle \approx (q-1)s^2/\ln(s/U_b)$. Significantly, the behavior of the nose depends only on mutations from the most-fit subpopulation; it is almost independent of the less-fit populations and thus can depend on N only via the lead, qs. As far as the nose is concerned, the majority of the population, destined to die out shortly, is important only to ease the competition for the fittest few. Yet we argued above that the bulk of the population fixes the speed of the mean via the selection pressure: $v \approx [(q-1)s]^2/[2\ln(Nqs)]$. In steady state, the speed of the mean must equal the speed of the nose; this is the mutation-selection balance. This implies that

equation M17



equation M18


These results are very close to the more careful calculations below. All the basic qualitative behavior follows from this intuitive reasoning.
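The balance argument above can also be explored numerically. The following is a minimal sketch, not the authors' calculation: it takes the two heuristic speeds as read from the intuitive argument, a bulk speed $[(q-1)s]^2/[2\ln(Nqs)]$ and a nose speed $(q-1)s^2/\ln(s/U_b)$, and finds the q at which they are equal by fixed-point iteration. The function name, the starting guess, and the example parameter values are illustrative assumptions.

```python
import math


def heuristic_lead_and_speed(N, s, Ub, iterations=200):
    """Solve the heuristic balance (bulk speed = nose speed) for q and v.

    Assumed forms (from the intuitive argument, not the careful analysis):
        bulk: v = ((q - 1) * s)**2 / (2 * ln(N * q * s))
        nose: v = (q - 1) * s**2 / ln(s / Ub)
    Equating them gives q = 1 + 2 * ln(N * q * s) / ln(s / Ub).
    """
    if not (N * s > 1 and s > Ub > 0):
        raise ValueError("sketch assumes N*s > 1 and s > Ub > 0")
    log_ratio = math.log(s / Ub)
    q = 2.0  # arbitrary starting guess
    for _ in range(iterations):
        q = 1.0 + 2.0 * math.log(N * q * s) / log_ratio
    v_nose = (q - 1.0) * s**2 / log_ratio
    v_bulk = ((q - 1.0) * s) ** 2 / (2.0 * math.log(N * q * s))
    return q, v_nose, v_bulk


if __name__ == "__main__":
    q, v_nose, v_bulk = heuristic_lead_and_speed(N=1e9, s=0.01, Ub=1e-5)
    print(f"q = {q:.3f}, v_nose = {v_nose:.3e}, v_bulk = {v_bulk:.3e}")
```

Because q enters only through logarithms, the iteration settles within a few steps for parameters in the large-population regime. Increasing N by orders of magnitude changes q and v only modestly, which is the logarithmic dependence discussed in the text.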

For large NUb, we have found that v depends logarithmically on N and Ub, much slower than the linear dependence on NUb that holds for smaller populations. This reduction occurs because at large NUb, almost all beneficial mutations occur in individuals far from the nose of the fitness distribution (i.e., in a bad genetic background) and are therefore wasted, since these subpopulations are doomed to extinction. Thus increasing N does not directly increase the supply of important mutations, as these occur in the relatively few individuals at the nose. Rather, the effect of increasing N is to increase the time required for selection to move the mean fitness, which increases the lead, which makes individuals at the nose more fit relative to the mean fitness, which speeds the establishments at the nose. Similarly, increasing Ub does not directly affect the dynamics of most of the fitness distribution. Rather, it decreases the time for new mutations to occur at the nose, which means that more mutations can occur before the mean moves, which increases the lead and speeds the evolution.

This also explains why v is not a function of NUb: N directly affects only selection timescales, while Ub directly affects only the mutation supply rate, so v depends on N and Ub separately. It is not a function of the commonly used parameter $\theta = 2NU_b$. Instead, it is a function of the parameters Ns (which describes selective forces) and $s/U_b$ (which describes the strength of selection relative to mutation), and it is valid in the regime where both are large. The expression for q above is of order the basic selective timescale, $\ln(Ns)/s$, divided by the basic mutation timescale, $\ln(s/U_b)/s$, which makes sense since the lead is set by the balance between these two forces. More generally, the two factors that determine the timescales of the multiple mutation dynamics are

$L \equiv \ln(Ns)$ and $\ell \equiv \ln(s/U_b)$.

Although these are both logarithmic in the population parameters and thus never huge, they can be large enough to be considered as large parameters. Many of our more detailed results are valid in the limit that both $L$ and $\ell$ are large, with corrections (some of which we include) smaller by powers of $1/\ell$ or $1/L$.

We show below that our result for v is consistent with the fundamental theorem of natural selection. Viewed in this light, our result for the speed of evolution is not in itself novel: the speed is just the variance in fitness, as usual. What our analysis does is to obtain what this variance is. In many aspects of quantitative genetics, the variance of a quantitative trait (such as fitness here) is taken as some external parameter. When the variance has accumulated during a period when it was neutral and is only starting to be selected on, this may be appropriate. But beyond that, it is surely not. Our analysis deals with the case when variance is accumulating while being selected on. That is, when variance in fitness is increasing due to mutations while at the same time it is being acted on by selection, then, even if the adaptation speed is only indirectly related to new mutations, it is essentially dependent on them: without mutations the variance will rapidly collapse to zero.

However, neither our heuristic analysis above nor our more careful work described below ever explicitly involves the fitness variance. Rather, the natural measure of the width of the fitness distribution is the lead. It is the lead, not the variance or the standard deviation, that can be most productively thought of as a balance between mutation and selection. It is true, of course, that the variance is also increased by mutation and decreased by selection. However, this is not the clearest way to understand the behavior. The increase in the variance from mutations is delayed and indirect. The new mutations that occur at the nose will only increase the variance after they have grown enough, and by then the important new mutations that will keep the variance high later are happening further out in the nose. This is not to say that a variance (and higher-moment)-based approach is impossible, but it is unwieldy and prone to hard-to-understand errors when any approximations are made. We discuss such moment-based approaches in appendix a.


We now turn away from crude (though powerful) intuitive arguments towards more rigorous analysis. We begin in this section by defining the simplest model more precisely. We consider mutation, selection, and drift within a purely asexual population of constant size N. We assume that a large number of beneficial mutations, each of which increases fitness by s, are available and define Ub to be the total mutation rate to these mutations. We consider the situation where the number of beneficial mutations fixed is small compared to the total number available so that Ub does not change appreciably over the course of the evolution (we relax this assumption in appendix c). We neglect deleterious mutations and other-strength beneficial mutations (see later sections below for a discussion of the consequences of these assumptions). These simplifications are not essential and do not change the basic behavior in many situations. Indeed, we argue that these assumptions can all be good approximations even when the situation is more complex, in particular when N or Ub are not constant, or in the presence of deleterious mutations or variable s, as we discuss in detail in subsequent sections. But, more importantly, these simplest approximations make the analysis clearer.

In addition to the more innocuous simplifications, we make two essential biological assumptions: that there is no frequency-dependent selection and that there is no epistasis, so that the fitness of an individual with k mutations is $(k - \ell)s$ greater than the fitness of an individual with $\ell$ mutations. When either of these conditions fails, the evolutionary dynamics can be very different from our predictions.

Key approximations:

There are two primary difficulties in analyzing the multiple subpopulations that occur even in the simplest model. The first is the stochastic aspects: when a subpopulation with a given fitness is rare, stochastic drift plays a crucial role and must be handled correctly. The second is the interactions between the subpopulations: the constraint of fixed total population size means that there is effectively a frequency dependence to the growth of a subpopulation, albeit a simple one.

To model the stochastic effects, we assume that the basic process of birth and death is a continuous-time branching process. All individuals have the same constant death rate 1, which means that the average lifetime of an individual is 1 (i.e., the units of time are generations) and that the lifetimes are exponentially distributed. Each individual in the population has some number, y, of beneficial mutations. We define $\bar{y}$ to be the average value of y across the population (i.e., the average number of beneficial mutations per individual). An individual with y beneficial mutations has a birth rate $1 + (y - \bar{y})s$. This ensures that the average birth rate in the population is 1, so the population stays at a constant size N. We assume all individuals give rise to mutant offspring at rate Ub, independent of their birth rate (i.e., mutants arise at a constant rate per unit time). If mutations instead occur at a constant rate per birth event, our assumption underestimates the mutation rate for the most-fit individuals. However, we always assume $(y - \bar{y})s \ll 1$ for all individuals (i.e., the lead, $Q \ll 1$), so that the two definitions are almost equivalent.
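The rate assignment described above can be sketched in a few lines. This is an illustration only: it assumes the birth rate of an individual with y mutations is $1 + (y - \bar{y})s$ with a death rate of 1 for everyone, and the function names and the dictionary representation of the population are hypothetical choices made for this example.

```python
def birth_rates(class_sizes, s):
    """Return (ybar, {y: birth_rate}) for a population grouped by mutation count.

    class_sizes maps y (number of beneficial mutations) to the number of
    individuals carrying y mutations. The birth rate is 1 + (y - ybar) * s;
    the death rate is 1 for every individual.
    """
    total = sum(class_sizes.values())
    if total <= 0:
        raise ValueError("population must be non-empty")
    ybar = sum(y * n for y, n in class_sizes.items()) / total
    rates = {y: 1.0 + (y - ybar) * s for y in class_sizes}
    return ybar, rates


def mean_birth_rate(class_sizes, s):
    """Population-average birth rate; equals 1 up to rounding by construction."""
    _, rates = birth_rates(class_sizes, s)
    total = sum(class_sizes.values())
    return sum(rates[y] * n for y, n in class_sizes.items()) / total
```

Because the rates are defined relative to the current mean $\bar{y}$, the average birth rate stays equal to the death rate however the classes are populated, which is what keeps the total size constant on average.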

The branching process model allows one to calculate simple analytic expressions for a number of important quantities that are not readily available in diffusion approximations of the standard Wright-Fisher model. However, branching process models cannot easily deal with the nonlinear saturation effects required to maintain a constant population size. By saturation effects, we refer to when a mutant subpopulation has become large enough to influence the mean fitness of the population and hence begins to compete with itself, slowing its growth: this is the essential effect of the fixed total population size. To handle the saturation effects, we make use of a simple observation: stochastic effects are important only when a subpopulation is rare, while saturation is important only when a subpopulation is common. Thus we use the stochastic branching process model, ignoring saturation effects, to describe the dynamics of a subpopulation while it is small. Conversely, when it is large, we ignore random drift and treat it with the correctly saturating deterministic equations. Our use of both deterministic and stochastic analyses requires an appropriate way of linking the two together. In this article, we describe a method for doing so. This method accounts for all of the important aspects of genetic drift and is simple and intuitive. It should be of broad applicability to related evolutionary problems.

This approach works as long as the stochastic regime and the saturation regime are different. That is, a subpopulation must become large enough to neglect random drift before it is too large to ignore saturation. We can treat a subpopulation of size n deterministically so long as $n \gg 1/s$. On the other hand, saturation can be ignored when $n \ll N$. Thus to separate the stochastic and the saturating phases of growth of a subpopulation, we require $N \gg 1/s$. Throughout this article, we assume this condition holds. Unless s is extremely small ($s \lesssim U_b$), a population small enough that $N \lesssim 1/s$ will usually be too small for clonal interference or multiple mutation effects to matter, so this is not a serious limitation.

A situation in which there are multiple subpopulations of varying sizes is illustrated in Figure 3: this shows the logarithm of a typical fitness distribution within a steadily evolving population. Where the subpopulations are small, at the front of the distribution, stochastic analysis is necessary but nonlinearities can be ignored. When a subpopulation represents a substantial fraction of the total, nonlinear saturation is important but stochasticity is not. As long as $Ns \gg 1$, there is an intermediate regime where neither matters. We can thus use a nonlinear deterministic analysis in the bulk of the distribution and a linear stochastic analysis near the front and match the two in the intermediate regime in which both are valid. These approximations are fully controlled and any corrections to our results will be small for $Ns \gg 1$.

Figure 3.

Schematic of a typical fitness distribution on a logarithmic scale. The total population size is large: $Ns \gg 1$. At the front of the distribution (the nose), where only a few individuals are present, stochastic effects are strong but nonlinear saturation is not. The reverse is true in the bulk of the distribution. Stochasticity is strong only when a subpopulation size n is small, $n \lesssim 1/s$, and saturation is strong only when a subpopulation size is large, $n \sim N$. Thus there is a wide intermediate regime where neither one matters. We can therefore use a nonlinear deterministic model in the bulk of the distribution, use a linear stochastic model at the front, and match the two in the intermediate regime where both are valid. The bulk of the distribution is dominated by selection, which gives rise to a steady-state Gaussian shape except near the nose.

Relationship of our model to the Wright-Fisher model:

The deterministic limit of our model is identical to that of the Wright-Fisher model. However, the stochastic dynamics are slightly different. In the Wright-Fisher model, all individuals have a lifetime of exactly one generation, while in our model individuals have a random exponentially distributed lifetime with mean one generation. In the Wright-Fisher model, the distribution of the number of offspring per individual is approximately Poisson, while in our model the number of offspring is geometrically distributed. Both the mean lifetime and the mean number of offspring per individual are identical in the two models (hence identical deterministic dynamics), but the different distributions do lead to slight differences. In particular, although the probability a beneficial mutation of size s ($s \ll 1$) will become established is proportional to s in both models, it is $\approx cs$ with the coefficient c = 2 in the Wright-Fisher model and c = 1 in ours. Since it is likely that the population dynamics in any real population are not well represented by either of these models, there is no one "correct" model [e.g., for populations dividing by binary fission, as in many experimental studies of evolution, the establishment probability is closer to s (Johnson and Gerrish)]. Fortunately, in our analysis of the behavior of large populations, these differences cause only negligible corrections in the arguments of logarithms [e.g., replacing ln(Ns) with ln(cNs) when $Ns \gg 1$]. For smaller populations, however, the speed of evolution is proportional to the probability of establishment and thus does depend on more details of the model: in particular, the successional-mutation result for the speed is $v \approx cNU_bs^2$.
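The difference between c = 2 and c = 1 can be illustrated with standard branching-process theory, in which the extinction probability is the smallest root of z = f(z) for the offspring generating function f. The sketch below is an illustration under that textbook setup rather than part of the original analysis; the function names and the iteration count are assumptions chosen for this example.

```python
import math


def extinction_probability(pgf, iterations=5000):
    """Smallest fixed point of z = pgf(z), found by iterating from z = 0."""
    z = 0.0
    for _ in range(iterations):
        z = pgf(z)
    return z


def establishment_poisson(s):
    """Offspring number ~ Poisson(1 + s), as in the Wright-Fisher limit."""
    m = 1.0 + s
    return 1.0 - extinction_probability(lambda z: math.exp(m * (z - 1.0)))


def establishment_geometric(s):
    """Offspring number geometric with mean 1 + s (birth rate 1+s, death rate 1)."""
    p_death = 1.0 / (2.0 + s)
    return 1.0 - extinction_probability(
        lambda z: p_death / (1.0 - (1.0 - p_death) * z)
    )


if __name__ == "__main__":
    s = 0.02
    print(establishment_poisson(s) / s, establishment_geometric(s) / s)
```

For small s the first ratio approaches 2 and the second approaches 1, consistent with the coefficients quoted in the text. The geometric case has the closed form s/(1 + s).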

It would in principle be possible to use a diffusion approximation to the Wright-Fisher model instead of our branching process model. This would have the advantage of being able to handle saturation and drift at the same time and thus cases where the condition $Ns \gg 1$ does not hold. Such a model could in principle treat all the different subpopulations stochastically, including all mutations between these populations. However, this would lead to a complex and difficult to analyze infinite-dimensional diffusion process. There is, however, a controlled approximation (valid for large Ns) to the full diffusion process that is exactly equivalent to ours; as it would add little, we do not discuss this explicitly here.


This section contains the primary analysis presented in this article: the accumulation of beneficial mutations in the simple model described above. We begin by looking at what happens to a single mutant individual. We then ask what happens to a mutant population that is being fed constantly by new mutations. We next couple this analysis to the behavior of the rest of the population to gain an understanding of the evolution of large asexual populations and obtain our primary results. Finally, we connect this behavior to the small-population regime.

The fate of a single mutant individual:

We begin by considering the fate of a single mutant individual. We assume that in a large clonal population of size N, at time t = 0 there is a single mutant individual with a beneficial mutation conferring fitness advantage s. We denote the size of the subpopulation carrying this beneficial mutation at time t as n(t) [by assumption, n(0) = 1]. We study the effects of selection and drift on this population by calculating the probability distribution of future n(t), assuming that no further mutations occur. This provides an essential building block for all the subsequent analysis and also illustrates our basic approach in a simple context.

Throughout this analysis, we assume that the number of individuals with the beneficial mutation is small relative to the total population size, $n \ll N$. Thus the mutants do not interfere with one another. Naturally, if the mutant becomes established it will supplant the wild-type population and this condition will cease to be true. By this time, however, the mutant subpopulation will be large enough that we can switch from the stochastic analysis described here to a correctly saturating deterministic analysis.

Because the mutant subpopulation is too small to affect the mean fitness, mutant individuals have a birth rate 1 + s and death rate 1. We define g(n, n0, t) to be the probability of having n descendants at time t, starting from n0 descendants at t = 0. We are interested in calculating g(n, 1, t). The probability of a birth or a death event in a unit of time dt is (2 + s)dt, and this event is a birth with probability $(1+s)/(2+s)$ and a death with probability $1/(2+s)$. This means that

equation M42


where $\delta_{n,0} = 1$ if n = 0 and is 0 otherwise. This is a standard birth-death process (Allen). Assuming that individual lineages are independent and defining the generating function

equation M43


we can rewrite Equation 7 as a differential equation for G(z, t), which we solve to find

equation M44
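As a sketch of how this step can be carried out, assuming only the rates stated above (birth rate 1 + s, death rate 1) and independent lineages: condition on the first event in a short interval dt and write $G(z,t) = \sum_n g(n,1,t)\,z^n$. The form written here is one consistent choice and may differ in notation from the published equations.

```latex
\begin{align}
g(n,1,t+dt) &= \bigl[1-(2+s)\,dt\bigr]\,g(n,1,t) + (1+s)\,dt\;g(n,2,t) + dt\;\delta_{n,0},\\
\frac{\partial G}{\partial t} &= (1+s)\,G^{2} - (2+s)\,G + 1
  = (G-1)\bigl[(1+s)\,G-1\bigr], \qquad G(z,0)=z,\\
G(z,t) &= 1 + \frac{s\,(z-1)}{\bigl[s+(1+s)(z-1)\bigr]\,e^{-st} - (1+s)(z-1)} .
\end{align}
```

The second line uses the fact that two independent lineages have generating function $G^2$. Setting z = 0 gives the probability of extinction by time t, $G(0,t) = 1 - s/(1+s-e^{-st})$, which tends to $1/(1+s)$ at long times, and differentiating at z = 1 gives a mean of $e^{st}$.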


We can now determine g(n, 1, t) from G(z, t). A standard inversion yields

equation M46


valid for n > 0, and

equation M47


We are interested primarily in understanding the distribution of n given that the mutant population is not destined to go extinct. This is given approximately by

equation M48


Here we have approximated the geometric factor by a simpler exponential in n that is valid for equation M49, the regime of primary interest. Note, however, that although the crucial features are more apparent in the approximate expression, all the results below follow from the exact equations.

At this stage, the above results merely reproduce classical analysis, but it is useful to pause to compare them with various intuitive predictions. We first compute the average number of mutant individuals at time t,

equation M50


which confirms our understanding of what it means to have a beneficial mutation with advantage s. However, most of the time the mutation will die out. Conditional on not going extinct,

equation M51


which is larger at long times by a factor of 1/s. At short times, $t \ll 1/s$, this is $\langle n \mid \text{not extinct}\rangle \approx 1 + t$. At long times, $t \gg 1/s$, the extinction probability becomes $1/(1+s) \approx 1 - s$, and $\langle n \mid \text{not extinct}\rangle \approx e^{st}/s$. Note that short times correspond to $n \ll 1/s$, while long times mean $n \gg 1/s$. (Note also that none of these expressions saturate as n approaches N; they are valid for $n \ll N$, as discussed above.)
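These limits can be checked with a short script. The sketch below is illustrative rather than the authors' code: the closed forms follow from the birth-death process with birth rate 1 + s and death rate 1, and the Monte Carlo routine, its threshold, and its seed are assumptions chosen for this example.

```python
import math
import random


def survival_probability(s, t):
    """P(lineage from one mutant is not extinct at time t); birth 1+s, death 1."""
    return s / (1.0 + s - math.exp(-s * t))


def mean_size_given_survival(s, t):
    """<n(t) | not extinct> = <n(t)> / P(not extinct), with <n(t)> = exp(s t)."""
    return math.exp(s * t) / survival_probability(s, t)


def simulate_establishment(s, threshold, trials, seed=0):
    """Fraction of single-mutant lineages that reach `threshold` before dying.

    Only the order of births and deaths matters for this question, so the
    sketch follows the embedded jump chain: each event is a birth with
    probability (1+s)/(2+s) and a death otherwise.
    """
    rng = random.Random(seed)
    p_birth = (1.0 + s) / (2.0 + s)
    established = 0
    for _ in range(trials):
        n = 1
        while 0 < n < threshold:
            n += 1 if rng.random() < p_birth else -1
        established += n >= threshold
    return established / trials


if __name__ == "__main__":
    s = 0.1
    print(simulate_establishment(s, threshold=100, trials=5000), s / (1 + s))
    print(mean_size_given_survival(0.01, 0.5))  # close to 1 + t at short times
```

The simulated establishment fraction should lie close to s/(1 + s), and the conditional mean interpolates between 1 + t at short times and roughly $e^{st}/s$ at long times.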

It is useful to ignore mutations that are destined to go extinct due to drift and focus only on those that are destined to become established. We do this for the remainder of this section; all results are thus implicitly conditional on nonextinction. However, some care is required. If a mutation occurs at time t = 0 and survives drift to become established, it may seem that on average it will grow as $n(t) = e^{st}$, because it started from one individual at t = 0 and grows on average exponentially. However, this is incorrect. Given that it survived drift, it is likely to have grown faster than $e^{st}$ in the early stochastic phase of its growth during which drift is faster than selection (Otto and Barton; Barton). This is apparent from the expressions above: for $t \ll 1/s$, $\langle n \mid \text{not extinct}\rangle \approx 1 + t$, which is much faster than $\langle n \rangle = e^{st} \approx 1 + st$. Once the population is large and stochastic effects can be neglected, it naturally grows as $e^{st}$. However, because it grew faster than this in the early stochastic phase, it will on average be larger than if it had grown this fast through its entire history. As is clear from the expression for the average n at long times, $\langle n \mid \text{not extinct}\rangle \approx e^{st}/s$, the behavior can be crudely approximated by assuming that it started at size 1/s (rather than size 1) at t = 0 and then grew exponentially as $e^{st}$ thereafter. This approximation is of course not valid during the early phase of growth. Note that the above also implies that, given that a mutation is not destined to go extinct due to drift, it will fix in a time of order $\ln(Ns)/s$, not $\ln(N)/s$, as is sometimes seen in the literature. For s , this is a difference of generations. To be more precise, the fixation time is a random variable with a distribution of width 1/s and mean close to $\ln(Ns)/s$, rather than the naive $\ln(N)/s$.
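The size of this correction is easy to evaluate. The following sketch assumes that the two timescales being contrasted are ln(Ns)/s (exponential growth from an effective starting size 1/s) and ln(N)/s (exponential growth from a single individual), which is what the effective starting size implies; it ignores the slowdown from saturation near fixation, and the function names and parameter values are arbitrary choices for illustration.

```python
import math


def sweep_time_from_effective_start(N, s):
    """Time for (1/s) * exp(s t) to reach N: effective starting size 1/s."""
    return math.log(N * s) / s


def sweep_time_naive(N, s):
    """Time for exp(s t) to reach N: growth assumed to start from size 1."""
    return math.log(N) / s


def naive_overestimate(s):
    """Difference between the two estimates, ln(1/s)/s; independent of N."""
    return math.log(1.0 / s) / s


if __name__ == "__main__":
    N, s = 1e6, 0.01
    print(sweep_time_naive(N, s), sweep_time_from_effective_start(N, s))
    print(naive_overestimate(s))
```

Under these assumptions the naive estimate is too long by ln(1/s)/s generations, which is several hundred generations for a selective advantage of one percent.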

For much of the subsequent analysis, we are concerned with the size of a subpopulation only after it is big enough to be essentially deterministic. Yet as the above discussion makes clear, the stochastic phase of growth affects the later deterministic dynamics. Thus we are interested in summing up the stochastic effects in terms of their impact on later deterministic growth.

Focusing only on the effects of stochasticity on later deterministic dynamics allows us to make a key simplification. Once the subpopulation is large enough to grow deterministically, but still small enough that saturation can be ignored (i.e., $1/s \ll n \ll N$), its dynamics can be described by $n = \nu e^{st}$. The value of $\nu$ is a random variable that depends on how fast the population grew in its stochastic phase. However, the only effect of this stochasticity on the later deterministic growth is to create random variation in $\nu$. As almost all this stochasticity accumulates at short times, at large t (after the population has become deterministic) we can describe the overall effects of stochasticity in terms of a probability distribution of $\nu$. This is a big simplification, because the full probability distribution conditioned on nonextinction, A(n, t), depends on both n and t, while for large t the distribution of $\nu$ is independent of t, as we show below. This simplification is possible because at large t the only time dependence is the deterministic exponential growth.

We can justify the above heuristic argument rigorously. The definition of $\nu$ is just a transformation of n, $\nu \equiv n e^{-st}$. This is valid in the early stochastic phase of growth as well as in the later deterministic phase. However, in the stochastic phase we do not expect that $\nu$ will be independent of t. As we have the probability distribution A(n, t), it is straightforward to transform this to the distribution of $\nu$ at time t. When we take the large-t limit of this distribution, it becomes independent of t. This justifies our expectation that at large t, $\nu$ has a fixed distribution, independent of time.

Rather than using the probability distribution of $\nu$, it will prove useful to define a related variable $\tau$ by

equation M72


The random variable $\tau$ is simply related to $\nu$: $\nu = e^{-s\tau}/s$. Since $\tau$ is a simple transformation of n, we can immediately calculate its probability density (we treat $\tau$ as a continuous variable) from A(n, t). We find

equation M76


As with $\nu$, this describes the distribution of n both in the deterministic and in the stochastic phase. Since n depends on t, so does the distribution of $\tau$. However, as expected from the previous discussion, the distribution of $\tau$ becomes independent of t for large t. We define $\tau_{\text{est}}$ as $\tau(t \to \infty)$ and find

equation M77


The average value (as well as higher moments) of $\tau_{\text{est}}$ can be easily computed from this distribution. We have

equation M78


where $\gamma$ is Euler's constant, $\gamma \approx 0.5772$.
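For orientation, the large-t results can be sketched as follows, to leading order in small s and using only the definitions given in the text. This is a reconstruction under those approximations rather than a transcription of the published equations, and the symbol $P(\nu)$ is a placeholder introduced here. Conditional on nonextinction, n is geometrically distributed with mean close to $e^{st}/s$, so $\nu = n e^{-st}$ is approximately exponential with mean 1/s. Changing variables with $\nu = e^{-s\tau}/s$ then gives:

```latex
\begin{align}
P(\nu)\,d\nu &\approx s\,e^{-s\nu}\,d\nu ,\\
B(\tau_{\text{est}})\,d\tau_{\text{est}} &\approx
  s\,\exp\!\bigl(-s\,\tau_{\text{est}} - e^{-s\tau_{\text{est}}}\bigr)\,d\tau_{\text{est}} ,\\
\langle \tau_{\text{est}} \rangle &\approx \frac{\gamma}{s} .
\end{align}
```

The last line follows because $s\nu = e^{-s\tau_{\text{est}}}$ is a unit-mean exponential variable x, and $\langle -\ln x \rangle = \gamma$. The distribution of $\tau_{\text{est}}$ has width of order 1/s and puts weight $e^{-1}$ on negative values.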

We see from Equation 16 that the large-t condition required for the distribution of $\tau$ to become independent of t is $t \gg 1/s$. This is the time at which $\langle n \mid \text{not extinct}\rangle \sim 1/s$. This indicates that our choice of 1/s as the size at which a population becomes established is appropriate. After a time of order 1/s, when the population on average reaches this size provided it has not gone extinct, the probability distribution of $\tau$ begins to become independent of t, indicating that the behavior of the population crosses over from mostly stochastic to mostly deterministic.

The variable $\tau_{\text{est}}$ has an intuitive interpretation: $\tau_{\text{est}}$ is the time at which n would have reached size 1/s had it always grown deterministically, as calculated by looking at n(t) at large t and extrapolating backward. This is illustrated in Figure 4a. We can therefore approximate the destined-to-be-established subpopulation as drifting randomly for a time $\tau_{\text{est}}$, at which time it reaches size 1/s and then grows deterministically thereafter. With this simplification, the only important stochasticity is the duration of the drift period. This is the key simplification that allows us to smoothly connect the branching process with the nonlinear dynamics once the subpopulation is no longer rare. It jibes with our intuitive expectation that the subpopulation is dominated by drift when rarer than 1/s and then behaves deterministically once it exceeds this size. Note, however, that in addition to telling us nothing about n(t) before time $\tau_{\text{est}}$, it also gives a slightly inaccurate picture immediately after $\tau_{\text{est}}$ when n(t) is of order 1/s. The time $\tau_{\text{est}}$ is not in fact the time at which the subpopulation reaches size 1/s (see Figure 4a). Rather, it is the time at which n(t) would have reached size 1/s if we assumed that it always behaved deterministically, but it gets the large-t behavior right. In fact, some small drift does take place after reaching size 1/s; our approximation does not ignore this drift, but rather adds up all the drift that takes place through all the time and rolls it into a change in $\tau_{\text{est}}$. This can thus be thought of as the time at which the mutation establishes. In asking how quickly beneficial mutations accumulate, this is the most natural variable.

Figure 4.

(a) The definition of the establishment time τ_est. A single mutant individual is assumed to exist at t = 0. It drifts stochastically until it either goes extinct or eventually gets large enough that it grows exponentially and its behavior becomes roughly deterministic. We define τ_est to be the inferred time at which the population would have reached size equation M90 if one extrapolated backward from the long-time deterministic behavior. Note that τ_est is not the time the population actually reached size equation M91 (indeed, τ_est can be negative). (b) The definition of τ_q: the time between successive establishments of the lead population with fitness qs more than the mean. Mutations occur with a rate that grows exponentially with time. Here, τ_q is the time the new lead population would have reached size equation M92, extrapolating backward from its long-time deterministic behavior. This includes both the time to generate a mutant destined to establish and the time for it to drift to substantial frequency.

The caveats above illustrate why it is perfectly consistent to have τ_est < 0; the distribution of τ_est above shows that this is not even particularly improbable. This reflects the fact that, given that a mutant subpopulation is not going to go extinct, it is reasonably likely to grow remarkably fast in the early stochastic phase. A τ_est < 0 simply indicates that the mutant subpopulation grew so fast when rare that if we look at the subpopulation size much later and assume it always grew exponentially at rate s, the subpopulation would already have had a size equation M93 by t = 0.

We note that equation M94, while equation M95 for large t (as always, conditional on nonextinction). This may naively seem inconsistent, since equation M96 for large t. However, it merely reflects the fact that ⟨e^X⟩ ≠ e^{⟨X⟩}. The difference between these two averages is in fact the essential reason that τ_est will prove to be such a useful variable to focus on. This is because the value of ⟨n(t)⟩ depends much more sensitively on the tails of equation M97 than does ⟨τ_est⟩.
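A toy calculation shows how far apart the average of an exponential and the exponential of an average can be. The two-point distribution below is purely hypothetical and is not drawn from the model; it is only meant to show that the first average is dominated by the upper tail.

```python
import math

# Hypothetical two-point distribution for X: equal odds of 0 or 10.
xs = [0.0, 10.0]

mean_of_exp = sum(math.exp(x) for x in xs) / len(xs)   # <e^X>
exp_of_mean = math.exp(sum(xs) / len(xs))              # e^<X>

ratio = mean_of_exp / exp_of_mean
```

Here `mean_of_exp` is set almost entirely by the rare large value of X, while `exp_of_mean` is not, so `ratio` is far greater than 1.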

Mutants generated by a changing population:

The above analysis of the population size of a clone founded by a single mutant individual is an important building block. However, it does not address the full problem. We must now ask how the mutants arise in the first place. In the simplest case, we might imagine a wild-type population of size N, starting with 0 mutants at time t = 0. This population generates mutants at rate NU_b. Each mutant follows the dynamics given in the above section, beginning at the time it was created, but now we have multiple such initial mutants that are created at random times.

Generally, the relevant process is even more complex. Starting from a wild-type population, a single-mutant subpopulation is generated, experiences a stochastic period, and then begins to grow deterministically. Then double mutants are created by mutation within the single-mutant population while it is still growing (i.e., before it fixes). The rate at which these double mutants are generated increases with time because the single-mutant subpopulation is growing. Later, the double mutants may themselves generate mutants before they fix (and possibly before the single mutants fix), and so on.

We therefore must tackle a more general problem: the distribution of the population size n(t) of a mutant subpopulation that starts with 0 individuals and is fed by mutants from a less-fit subpopulation of (growing) size f(t). If this less-fit clone is small enough that its growth is stochastic, calculating the probability distribution of the mutant subpopulation is extremely complex. Fortunately, most nonviral organisms live in parameter regimes where a clone will never generate mutants destined to establish while it is still so small that it must be treated stochastically. As we discuss in appendix g, this parameter regime is equation M98, which we will generally assume. Thus we take f(t) to be some deterministic function describing the growth of the clone from which mutants arise. Later we set the origin of time in f(t) stochastically, to reflect the stochasticity in the establishment of this feeding population.

Note that we no longer need to condition on the mutant subpopulation not being destined to go extinct. Since this subpopulation is being continuously fed with new mutations, eventually one of these mutations will survive drift. Thus at long times the mutant subpopulation will never be extinct.

Unlike in the previous section, the growth rate of the stochastic mutant population n(t) is not necessarily 1 + s. Rather, the growth rate is equation M99, where ys is the fitness of the subpopulation n(t) and equation M is the mean fitness of the population. For convenience, we write this as 1 + rs. The death rate of this population is still 1. Since the mean fitness increases continuously, r is time dependent. Despite this, we approximate r as a constant. This is justified because we use the stochastic description of n(t) only during the brief period during which it is rare, and in this time r does not change significantly. We discuss this approximation in appendix h.

We define η(t − t_k) to be the number of descendants at time t of a single mutant that occurred at time t_k. That is, given that a mutation occurs from the feeding population at time t_k, η(t − t_k) is the number of descendants of this mutation at a later time t. Note that η is the random variable whose generating function is given by G(z, t − t_k) from Equation 9 above, but with s replaced by rs. We have

\[ n(t) = \sum_{k=1}^{M} \eta(t - T_k) \tag{19} \]


where M is the random number of individual mutations that have occurred and Tk are the random times at which they occurred.

The number of mutations and their timings are an inhomogeneous Poisson process, fed by the population f(t). We therefore have

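\[ P(M = m) = \frac{1}{m!} \left[ U_b \int_{-\infty}^{t} f(t')\,dt' \right]^{m} \exp\left[ -U_b \int_{-\infty}^{t} f(t')\,dt' \right] \tag{20} \]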


Note that the lower limit of integration here represents the earliest time that mutations are allowed to occur; we have chosen this to be infinitely early. We discuss this choice of cutoff more generally in appendix e. The timings of the mutations T_k, conditional on M = m, are the order statistics of m independent identically distributed samples drawn from the distribution

\[ p(t_k) = \frac{f(t_k)}{\int_{-\infty}^{t} f(t')\,dt'}, \qquad t_k \le t \tag{21} \]


This means that the joint distribution of the Tk conditional on m is given by

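\[ p(t_1, \dots, t_m \mid m) = m! \prod_{k=1}^{m} \frac{f(t_k)}{\int_{-\infty}^{t} f(t')\,dt'}, \qquad t_1 < t_2 < \dots < t_m \le t \tag{22} \]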
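The mutation times described above (a Poisson number of mutations whose times have density proportional to the feeding population size) can be sampled with the standard time-change method for inhomogeneous Poisson processes. The sketch below assumes an exponentially growing feeding population f(t) = amp * exp(rate * t); the function name and all parameter values are hypothetical.

```python
import math
import random

def sample_mutation_times(ub, amp, rate, t_end, rng):
    """Return ordered mutation times on (-inf, t_end].

    ASSUMPTION: the feeding population is f(t) = amp * exp(rate * t), so
    the cumulative mutation intensity up to time u is
    (ub * amp / rate) * exp(rate * u).
    """
    lam_end = ub * amp / rate * math.exp(rate * t_end)
    times = []
    arrival = rng.expovariate(1.0)        # first arrival of a unit-rate process
    while arrival <= lam_end:
        # Invert the cumulative intensity to map the arrival to real time.
        times.append(math.log(arrival * rate / (ub * amp)) / rate)
        arrival += rng.expovariate(1.0)
    return times

rng = random.Random(0)
times = sample_mutation_times(1e-5, 100.0, 0.05, 200.0, rng)
```

Because the unit-rate arrivals are generated in increasing order, the returned times are already ordered, and most of them fall shortly before `t_end`, where the feeding population is largest.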


The generating function for the distribution of the number of mutant individuals, n(t), is given by H(z, t) = ⟨z^{n(t)}⟩. Note that equation M. Conditioning on the distributions of M and the T_k given above, and using the fact that equation M, we find

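\[ H(z, t) = \sum_{m=0}^{\infty} P(M = m) \int dt_1 \cdots dt_m \; p(t_1, \dots, t_m \mid m) \prod_{k=1}^{m} G(z, t - t_k) \tag{23} \]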


where the integral is over all ordered configurations of the tk. Substituting the distributions of M and the Tk above, we find

\[ H(z, t) = \exp\left[ -U_b \int_{-\infty}^{t} f(t') \left[ 1 - G(z, t - t') \right] dt' \right] \tag{24} \]


To understand the full probability distribution of n(t), we simply have to plug in the appropriate form f(t) and then invert this generating function.

An exponentially growing population feeding another:

In large populations, there will typically be various multiple mutants present, as illustrated in Figure 2. We can now apply the results of the previous section to this situation. As before, we define the most-fit subpopulation that is large enough to treat deterministically to have fitness (q − 1)s above the mean fitness (note that q is not necessarily an integer). This subpopulation, n_{q−1}, grows exponentially at rate (q − 1)s. We define the origin of time such that n_{q−1}(t) is given by

equation M


Note that, analogous to the previous section, we are approximating q as constant; we discuss this further below. The reason for defining the origin of time such that equation M at t = 0 will become clear below. We now want to understand the stochastic dynamics of the subpopulation with fitness qs above the mean (we denote this population size by n_q(t)). The subpopulation n_{q−1} feeds mutations to n_q; we therefore have f(t) = n_{q−1}(t) in the notation of the previous section.

This problem involves one exponentially growing population, n_{q−1}, feeding another, n_q. In analyzing it, we first step back from our specific situation to study the general case of an exponentially growing population with size equation M feeding mutants at rate U_b to a stochastic population N_2 that on average grows exponentially with rate R_2. We later will substitute equation M, R_1 = (q − 1)s, and R_2 = qs. We begin by plugging equation M into Equation 24, using the obvious generalization of G(z, t) to a population that grows at rate R_2. This gives us H(z, t), the generating function of the probability distribution of N_2. It is convenient at this point to pass from generating functions to Laplace transforms by defining the transform variable ζ = 1 − z. For our purposes we can assume that ζ is small: this introduces errors into equation M when N_2 is of order 1, but we will never use equation M in this regime. We find

equation M


Substituting equation M, we find

equation M


Assuming that ζ is small, the integral in this expression is independent of ζ and is given by equation M. We find

equation M


We can now substitute our values of ν_1, R_1, and R_2 to find that in our case

equation M


This is the standard form for the Laplace transform of a one-sided Levy distribution, a well-studied special function. An integral representation of this is the inverse Laplace transform of H,

equation M


where the integral is over the imaginary axis. For large n_q this can be integrated to give equation M. Note that this distribution has infinite ⟨n_q⟩, an unimportant and unbiological artifact of our choice of cutoff in the integral for H(z, t); this is discussed in appendix e.

To understand this distribution P(n_q, t), we define a variable τ_q similar to that described in the section above on the fate of a single mutant. We first define

equation M


As before, τ is time dependent, but for t → ∞ the distribution of τ is independent of t. We define τ_q ≡ τ(t → ∞). As before, τ_q is the time at which the subpopulation n_q(t) would have reached size equation M had it always grown deterministically at rate qs, as calculated by looking at the size n_q(t) at large t and extrapolating backward. Unlike τ_est, the value of τ_q includes the time for the mutation (or mutations) to arise in the first place as well as time for their initial stochastic growth. This is illustrated in Figure 4b.

As in the section on the fate of a single mutant, we can think of the mutant subpopulation as drifting randomly for a time τ_q, at which point it reaches size equation M and thereafter grows deterministically. We therefore sometimes refer to τ_q as the establishment time. As before, this is somewhat inaccurate in describing the dynamics right around τ_q (or before) when the population is around or below a size of equation M. Again τ_q is not actually the time the population reaches size equation M. This is because both future random drift and future feeding mutations, after the population reaches size equation M, are included in the estimate of τ_q. However, for the purposes of understanding the dynamics of the mutant population once it becomes large compared to equation M, it is valid to think of τ_q as the time it takes the population to reach size equation M.

We often wish to use moments of τ_q. These are straightforward to calculate in principle, but somewhat tricky in practice. We first note that because of the definition in Equation 31, we have

equation M


We can therefore calculate ⟨τ⟩ by computing ⟨ln n_q⟩ and plugging into this expression. Higher moments of τ are easily computed by similar expressions; these depend also on higher moments of ln n_q. We can calculate ⟨ln^m n_q(t)⟩ by noting that equation M. Using the integral representation of P(n_q, t), we have

equation M


where the ζ-integral is over the imaginary axis and we have defined

equation M


We integrate this to find

equation M


where Γ(x) is the Gamma function.

We can now calculate derivatives of this with respect to μ to get ⟨ln^m n_q⟩ and hence the moments of τ. For large t, as expected, τ becomes independent of t. For the mean of τ_q, we find

equation M


where γ is Euler's constant. The variance of τ_q is given by

equation M


Higher moments are also simple to compute if desired. They demonstrate that there is substantial skew in the distribution of τ_q: values of τ_q substantially smaller than ⟨τ_q⟩ can occasionally occur, while values substantially larger than this almost never do. This is important in understanding the fluctuations in the rate of adaptation around its steady-state value and is discussed in appendix d.

This calculation of ⟨τ_q⟩ is somewhat involved because of the need to use the integral representation of P(n_q, t). We can get rough estimates (often useful in other contexts) via a simpler method. Namely, we define a typical population size equation M, where ζ_1 is defined by H(ζ_1, t) = e^{−1}. As is apparent from the definition of a Laplace transform, equation M, for well-behaved distributions this typical value equation M is roughly like the median of n_q. We can then get a typical value equation M from this by using the relationship between ln n_q and τ_q. Doing this leads to a value equation M that is very close to the ⟨τ_q⟩ calculated above.

Note that the careful result for ⟨τ_q⟩ is similar to the crude result in the heuristic analysis section above, which approximated the time required for a new mutation to arise at the nose as equation M, roughly the typical time at which the first mutant destined to establish arises. This crude expression is only weakly dependent on the lower cutoff to the integral, which is good since n_{q−1}(t) is not given accurately by the deterministic approximation in this regime. This weak dependence appears for the same reasons in the careful calculation of τ_q and is discussed in more detail in appendix e. The crude and careful results do differ, however. The careful result accounts properly for the randomness in the timing of a new mutation and the fluctuations during its early drift phase. It also accounts for the fact that not only the first mutant destined to establish at the nose contributes. Rather, as we see later, of order q different mutations contribute significantly to the establishment of a new most-fit subpopulation at the nose.

The rate of evolution and maintenance of variation at large N:

We are now in a position to calculate the rate of evolution and amount of variation maintained in large populations. In the above calculations, we set t = 0 to be the time at which the population n_{q−1} reached size equation M. This corresponds to the establishment time of this population. After a (stochastic) time τ_q, the next more-fit subpopulation, n_q, establishes. For the later deterministic dynamics of n_q, we can think of this as the time when n_q reached size equation M. At this point, we have reached the identical situation where we started, but with the nose of the population fitness distribution moved forward by s. In the steady state, the mean fitness of the population must also have moved forward by s in the average establishment time ⟨τ_q⟩. Thus the population at n_q now has fitness only (q − 1)s ahead of the mean. It has size equation M, but thereafter grows exponentially only at rate (q − 1)s, giving a population size equation M.

The process now repeats itself: we can take this establishment time of the new population (n_q, above) as the new t = 0, and after that this population grows as we had described for the original population n_{q−1}. In fact, it now is the population n_{q−1}, since the mean fitness has increased by s. Thus we can see that the mean fitness of the population and the position of the nose move forward by s in a time ⟨τ_q⟩. Thus the average rate of increase in fitness in the population is

\[ v = \frac{s}{\langle \tau_q \rangle} \tag{38} \]


Note that this discussion makes clear why, for consistency, we defined the establishment time for n_{q−1} to be when this population reached size equation M, not equation M. We also note that the population that we had originally called n_{q−1} is now n_{q−2} and its size is given by equation M.

This change in the growth rate of the population we had originally called n_{q−1} raises an important point. We defined equation M and used this expression in calculating P(n_q, t), particularly for large t. Yet at this large t, our expression for f(t) is not accurate, because the mean has shifted and the population with original (relative) fitness (q − 1)s is no longer growing exponentially at rate (q − 1)s. Fortunately, the mutations that occur after the establishment of n_q, when the expression f(t) becomes inaccurate, do not greatly affect its later population size, n_q(t). In other words, the mutations that dominate the population n_q happen early, while n_{q−1} is still accurately given by f(t). Yet one must also ask whether these mutations happen too early, when f(t) is also not a good approximation for n_{q−1}(t) because the definition of τ_q, which we used to define f(t), includes mutations and stochastic behavior that happen later. Fortunately, the mutations that matter from n_{q−1} to n_q do occur late enough that n_{q−1} is accurately described by f(t). This can be checked by studying the behavior of τ(t); we discuss this and related subtle issues in appendix g.

When q is too small, the approximations above are no longer justified. Whenever q < 2, the growth rate of the subpopulation n_{q−1} slows substantially during the period while the important mutations to n_q are occurring. That is, n_{q−1} saturates while n_q is becoming established. Thus our analysis in this section is valid only for q ≥ 2. As we will see, this corresponds to large N. We discuss the q < 2 case in the next section. However, it is the large-N, q ≥ 2 result that we are most interested in: this is where there are typically many multiple mutations at once, and the behavior differs dramatically from the successional-mutations regime.

Throughout this section, we have asserted a steady state in which the mean fitness increases at the same rate as new mutations are established and have defined the lead in steady state to be qs. Yet we have not discussed the balance between mutation and selection that sets this steady state. We now turn to this question. Roughly speaking, we expect that in larger populations, elimination of less-fit clones takes longer, and more mutations can arise in this time, so the steady state q should rise.

The relationship between q and N can be obtained from τ_q. As we have seen, immediately after the subpopulation at q becomes established, its size is equation M. The subpopulation at q − 1 has size equation M, the subpopulation at q − 2 has size equation M, and so on. All of the subpopulations must add up to size N; in practice the total is dominated by one or a few (compared to q) subpopulations so that we can equate N to the size of the largest subpopulation, the one whose fitness is closest to the mean fitness. Imposing this condition and assuming that all the τ_q are on average ⟨τ_q⟩, we find

equation M


This is a transcendental equation for q, but because of the logarithmic dependence on q on the right-hand side it is easily solved by iteration. For most purposes, even the zeroth approximation,

equation M


is sufficiently accurate. To get higher accuracy one can plug this into the right-hand side of Equation 39.
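The iteration described here is ordinary fixed-point iteration. The sketch below shows the procedure only: because the equation for q is not recoverable from this copy, `standin_rhs` is a hypothetical stand-in that merely shares the stated property of depending on q logarithmically. It is not the equation from the text, and the parameter values are likewise illustrative.

```python
import math

def solve_by_iteration(rhs, q0, tol=1e-10, max_iter=100):
    """Iterate q <- rhs(q) from the starting guess q0 until it stops changing."""
    q = q0
    for _ in range(max_iter):
        q_next = rhs(q)
        if abs(q_next - q) < tol:
            return q_next
        q = q_next
    raise RuntimeError("iteration did not converge")

# Hypothetical parameter values.
N, s, Ub = 1e6, 0.01, 1e-5
numerator = 2.0 * math.log(N * s)

def standin_rhs(q):
    # HYPOTHETICAL stand-in with a weak (logarithmic) dependence on q.
    return numerator / math.log((s / Ub) * q)

q_zeroth = numerator / math.log(s / Ub)   # drop the q dependence entirely
q_solved = solve_by_iteration(standin_rhs, q_zeroth)
```

Because the right-hand side changes only logarithmically with q, each pass shrinks the error by a large factor, so a handful of iterations is enough in this example.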

As expected, the value of q increases with N and also increases with Ub because when mutations happen more quickly there are more of them in the population at once. The dependence on s is more complicated, because increasing s both decreases the fixation time (leaving less time for additional mutations to occur) and increases the rate of mutations that establish (because it increases the establishment probability).

With the value of q determined self-consistently above (Equation 39), the mean fitness shifts by s in exactly the time ⟨τ_q⟩. Thus the corresponding distribution of the subpopulations is indeed a steady state (see appendix d for a discussion of fluctuations around this steady state and its stability). By plugging Equation 39 into the expression for ⟨τ_q⟩ and substituting this into equation M, we can obtain the speed of evolution. Doing this using the lowest-order result in the iterative expansion for q (Equation 40), we find that the speed of evolution is roughly

equation M


valid provided q is reasonably large (basically, when 2 ln Ns exceeds equation M, which will tend to be true when equation M). If a more accurate result is needed, we can simply carry the iterative expansion for q to higher order.

The calculations above confirm the intuitive picture and results described in the heuristic analysis section above. The speed of evolution is determined by two mostly independent factors. One factor is the dynamics of the nose: the feeding process from n_{q−1} to n_q that sets τ_q. This process depends directly only on U_b and s; the only impact of N here is via its effect on the lead qs. The other factor is the dynamics of the already established populations. This is dominated by selection and hence depends directly only on N and s; the only role of mutation here is its role in setting q.

Our result is consistent with the fundamental theorem of natural selection, which states that the speed of evolution is equal to the variance of fitness in the population. To see this, we first note that the bulk of the fitness distribution is Gaussian. This is because a population with ℓ more (or fewer) mutations than the mean grows (or shrinks) as e^{ℓst}, and the mean shifts by 1 during every time interval τ_q. This means that at the end of an interval, the number of individuals with ℓ mutations more or fewer than the mean is determined by its cumulative growth or decline over all these time intervals: equation M, a Gaussian distribution. We call the variance of this fitness distribution σ^2. The number of individuals that differ from the mean by ks is then roughly proportional to N exp[−(ks)^2/2σ^2], and the fittest established population, with k ≈ q, will have of order equation M individuals. We therefore expect equation M. This means that if the fundamental theorem of natural selection holds, we expect equation M. And indeed, some algebra verifies that this yields the expression for v in Equation 41.
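The cumulative-growth argument can be checked numerically. The sketch below follows one clone after it establishes at the nose and treats its lead as constant within each interval, dropping by one class per interval, which is the piecewise picture described in the text. The values of `q`, `s`, and `tau` are hypothetical, and the function name is illustrative.

```python
def log_sizes_after_establishment(q, s, tau, n_intervals):
    """Log size, relative to the size at establishment, after each interval.

    During interval k = 1, 2, ... the clone sits (q - k) classes from the
    mean, so it grows (or shrinks) at rate (q - k) * s for a time tau.
    """
    log_sizes = [0.0]
    for k in range(1, n_intervals + 1):
        log_sizes.append(log_sizes[-1] + (q - k) * s * tau)
    return log_sizes

q, s, tau = 6, 0.01, 300.0          # hypothetical values
log_sizes = log_sizes_after_establishment(q, s, tau, 2 * q)

# A Gaussian profile has constant curvature in log size per fitness class.
curvatures = [log_sizes[k + 1] - 2.0 * log_sizes[k] + log_sizes[k - 1]
              for k in range(1, len(log_sizes) - 1)]

# For exp(-(l*s)^2 / (2*sigma2)) the curvature per class is -s^2 / sigma2.
sigma2 = -s**2 / curvatures[0]
speed = s / tau                      # fitness gained per generation
```

The log sizes trace out a parabola, so the profile is Gaussian, and the implied fitness variance `sigma2` equals `speed`, the rate at which the mean advances in this sketch.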

The fundamental theorem of natural selection should apply whenever mutation can be neglected compared to selection. Since this is true in the bulk (i.e., away from the nose) of the fitness distribution, the correspondence between our result and the theorem is reassuring. The speed of evolution is equal to the variance in fitness, as usual. Thus our calculations can be viewed as an analysis of how much variance in fitness a population can maintain while at the same time this variation is being selected on. Yet nowhere did our analysis depend on the variance in fitness. Rather, the lead proved to be a more useful measure of the width of the fitness distribution, because it is the lead that is directly affected by new mutations at the nose. The variance is of course also increased by mutations, but only as a consequence of the dynamics of the lead and only after the new mutant populations have grown to substantial numbers. The key fact that the distribution is close to Gaussian out almost to the nose, which is many standard deviations above the mean, is indicative of the small significance of the region near the mean that controls the variance.

Evolution at moderate N:

In addition to the evolution at large N, we want to understand the crossover between small-N and large-N behavior. In this subsection, we explore this crossover.

For very small N, the successional-mutations regime obtains. In the heuristic analysis section, we noted that mutations take equation M generations to establish in this regime and then fix in a much shorter time. Thus evolution is mutation limited, and we have v ≈ NU_b s^2. It is instructive to redo this calculation using the machinery we developed for the large-N case. To do this, we must replace the exponential form for f(t). As before, we take the establishment time of the mutation at (q − 1)s to be t = 0. Of course, here q = 1 so (q − 1)s = 0. In this regime, each mutant fixes soon after becoming established. For the purposes of the next establishment, we can therefore approximate the population at (q − 1)s by

\[ f(t) = N\,\theta(t) \tag{42} \]


where θ(t) = 1 for t ≥ 0 and 0 otherwise. We substitute this form of f(t) into H and integrate and take the inverse Laplace transform of the result to obtain

equation M


This gives equation M, so the velocity v ≈ NU_b s^2, as expected.

We now turn to the intermediate regime. For NU_b comparable to equation M, the fixation time is not short compared to the establishment time. Thus we cannot use f(t) = Nθ(t). At the same time, the establishment time is not so short compared to the fixation time that saturation in the feeding population is unimportant (the large-N case we have focused on thus far). We therefore need to consider the case of a growing and saturating population feeding another. We assume that the single-mutant population always fixes before the triple-mutant population establishes, so that we have to consider only two deterministic clones and one stochastic clone in the population (i.e., q between 1 and 2). The dynamics of the single-mutant population a time t after it establishes are given by

equation M


Note that f(t) initially grows as e^{(q−1)st}, with q = 2, but later slows to e^{(q′−1)st} with q′ = 1 (i.e., it becomes approximately constant). This slowing occurs over a time interval of order 1/s, which is much smaller than the establishment times and is thus effectively a sharp transition. The behavior of the feeding population is thus roughly equivalent to having q between 1 and 2. The stochastic population that it feeds initially grows at rate qs with q = 2. The establishment of this stochastic population occurs at a time τ_2 when, roughly,

equation M


with c of order unity. This yields

equation M


A more careful analysis (analogous to the earlier calculations of τ_q) that takes into account the distribution of τ_2 yields a result that is the same as the above simple argument but with a factor of order unity inside ln Ns, which is a small correction over the whole range of validity. While in general c will depend on the detailed birth and death processes, and the speed of evolution in the successional-mutations regime will be proportional to c, for the dynamics we have analyzed throughout, c = 1. We use this below. For equation M, we obtain

equation M


which crosses over smoothly (and simply!) from the successional-mutations behavior for equation M but equation M to equation M, which is just the result we obtain for q = 2. When NU_b becomes of order unity, from the above expression we have equation M. For equation M the behavior is well into the multiple-mutations regime we analyzed earlier, and the results obtained for general noninteger q ≥ 2 apply. The two sets of results match together for Ns ~ s/U_b, up to order-unity factors inside logarithms of Ns and of s/U_b. An example of the crossover between the two regimes is shown in Figure 5a.

Figure 5.

Comparisons between simulations and our theoretical predictions for the mean speed of adaptation v (measured in increase in fitness per generation, × 10^5). (a) Speed of adaptation v vs. log10 N for U_b = 10^{−5} and fixed s. Both the large-N (Equation 41) and the moderate-N (Equation 47) theoretical results are shown in their regimes of validity, which are above and below N ~ 1/U_b, respectively (the crossover between the two regimes is indicated). (b) v vs. log10 U_b for N = 10^6 and fixed s. (c) v vs. log10 s for N = 10^6 and U_b = 10^{−5}. Each simulation result shown is the mean v over a late window of generations of the simulation, averaged over 30 independent runs. Starting the average only after an initial period ensures that in all cases the evolution has reached the steady-state mutation-selection balance (as verified from the time-dependent simulation data, not shown).


So far, our analysis has assumed that the mutation-selection balance has already been reached. If a population starts with an arbitrary distribution of fitnesses, it will gradually approach the steady-state distribution. A full analysis of this is beyond the scope of this article, but in this section we provide an outline of the important effects and briefly describe a method for analyzing this transient behavior. We focus on the case where the population is initially monoclonal. Other starting fitness distributions can be analyzed using similar methods. We consider the large-N concurrent-mutations regime (in the successional-mutations regime the monoclonal population is already essentially in steady state).

Starting from a monoclonal population, we can calculate the dynamics of the single-mutant subpopulation that arises by using the small-N results above, since here too the feeding population is f(t) = Nθ(t). It would now be tempting to assume that this single-mutant population just grows exponentially at rate s after first becoming established. We could then immediately import our previous results for the establishment time of the double-mutant population, τ_2, triple-mutant population, τ_3, and so on. We could then assume that all these populations establish in order until the qth population, at which point the steady state would be reached.

Unfortunately, this is wrong, for two reasons. First, the single-mutant population grows faster than exponentially at rate s because it is receiving mutations from the still-large wild-type population. Because of this, the double-mutant population establishes more quickly than the steady-state τ_2 and then itself grows faster than exponentially with rate 2s because it is receiving more mutants from the fast-growing single-mutant population. This then affects the triple mutants, and so on. The second complication is that the mean fitness does not stay at the wild-type value until the qth mutation has established, so it takes more than q establishments to reach steady state.

Rather than attempt to find a closed-form analytical result, we discuss here an algorithmic solution to the transient dynamics. We proceed in steps. First, we calculate the lead from the current fitness distribution. On the basis of this, we calculate the next establishment time (interpolating if the lead changes during this period because of an increase in the mean fitness). We then calculate the new fitness distribution and the new lead and repeat the process.

When calculating the establishment times, we must remember that the feeding populations are not necessarily growing as simple exponentials. Earlier we used the establishment time τ_p to approximate the population size of n_p as equation M. We noted that this is inaccurate while n_p equation M, because it includes both future mutations from n_{p−1} to n_p and future stochasticity. Since we have used this form of n_p(t) to calculate the establishment time of the next more-fit subpopulation, this approximation for n_p(t) must be accurate by the time the mutations that lead to the subsequent establishment occur. In the steady-state case, this holds, as shown in appendix g. However, for the transient dynamics it is not always correct.

This problem is most serious for the single-mutant population, which we consider now. The wild-type population has roughly constant size N during the period when the single-mutant population is rare. This means that the single-mutant population grows on average as

equation M


This reaches size equation M after a time of order equation M generations. However, the inferred establishment time (by extrapolating backward) is equation M generations. This is substantially negative because mutations that occur well after the population reaches size equation M contribute significantly to n_1. The approximation we used before would be to take equation M in calculating the establishment time of the double-mutant population τ_2. But using the correct form of n_1, we find that the first double mutants occur roughly at time equation M. Thus when equation M, double mutants do not occur until our usual approximation for n_1 becomes reasonable. We can therefore use our previous calculation of the establishment time τ_2 from the steady-state analysis above. All future establishment times (i.e., τ_3 for the triple mutants, etc.) can similarly be imported directly from the steady-state calculations. However, when equation M, we must use the correct form of n_1 to calculate τ_2 and n_2. In this case, n_2 will also grow faster than our usual approximation equation M would predict. We must therefore repeat this procedure to consider whether it is reasonable to calculate τ_3 on the basis of our usual approximation or whether we need to use the more complex form for n_2. However, this effect is much weaker than for n_1; it matters only if NU_b is much larger than in the previous condition. If it does matter, we must again ask if the more complex form for n_3 will be important in calculating τ_4; this will matter only if NU_b is larger yet. In practice, in comparing with previous experiments we have found that considering the complex form of n_1 in calculating τ_2 is sometimes necessary, but all future establishments can be calculated using the steady-state large-N results (Desai et al.), because in these experimental situations q is never much larger than 4.

A second subtlety in the above algorithmic approach is the way in which the mean fitness changes; it does not increase in evenly spaced steps of size s as it would in steady state. For example, the double-mutant subpopulation can become established soon after the single-mutant subpopulation does. Then, as it grows twice as fast, it will outcompete the single-mutant subpopulation while both are still rare. We call such an event a jump, since it will lead to a jump in the mean fitness by 2s when the double mutants become the dominant subpopulation. Of course, it is also possible that the triple mutants will jump past the double mutants or that the double mutants will jump the singles, and then the quadruple mutants will jump the triples, etc. These effects can lead to complex dynamics of the mean fitness before the steady state is established. However, given the establishment times of the various populations, the time dependence of the mean fitness is straightforward to calculate from the deterministic dynamics of the competing subpopulations that are growing exponentially.
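As a rough illustration of the last point, the sketch below computes the mean fitness from a given list of establishment times by letting the subpopulations compete as deterministic exponentials. The function names, the seed size `n0` at which each class is taken to become established, and the requirement that the establishment times be in nondecreasing order are assumptions made for this example; they are not taken from the analysis above.

```python
import math


def class_frequencies(t, est_times, s, N, n0):
    """Frequencies of classes k = 0, 1, 2, ... at time t.

    est_times[k-1] is the (assumed known, nondecreasing) establishment time
    of the class with k mutations, which grows at rate k*s. Each class is
    assumed to start at frequency n0/N when it becomes established.
    """
    f = n0 / N
    coeffs = [1.0]  # wild type: weight 1, growth rate 0
    for k, t_k in enumerate(est_times, start=1):
        # Total unnormalized weight of the classes already present at t_k.
        present = sum(c * math.exp(j * s * t_k) for j, c in enumerate(coeffs))
        w_k = present * f / (1.0 - f)  # gives frequency f at time t_k
        coeffs.append(w_k * math.exp(-k * s * t_k))
    starts = [float("-inf")] + list(est_times)
    weights = [c * math.exp(k * s * t) if t >= starts[k] else 0.0
               for k, c in enumerate(coeffs)]
    total = sum(weights)
    return [w / total for w in weights]


def mean_fitness(t, est_times, s, N, n0):
    """Mean fitness (relative to the wild type) at time t."""
    freqs = class_frequencies(t, est_times, s, N, n0)
    return s * sum(k * x for k, x in enumerate(freqs))


# A "jump": the double mutants establish shortly after the single mutants
# and overtake them while both are still rare.
for t in (0, 50, 100):
    print(t, mean_fitness(t, [0.0, 5.0], s=0.1, N=1e6, n0=10))
```

With these illustrative parameters the single-mutant class never becomes common, and the mean fitness rises from near 0 toward 2s without pausing near s.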

Putting all these effects together, we can construct an algorithmic solution for the transient dynamics. We calculate the first establishment time and note at what time this new subpopulation will change the mean fitness. We then calculate the next establishment time and again the implied future effects on mean fitness (modifying previous such results if jumping events will occur). We continue to repeat this process. When the mean fitness changes, we note how this changes the lead and adjust the establishment times appropriately. We iterate this process until the steady-state lead, qs, is reached. Even after that there can be some lingering effects of the transient, as the rest of the fitness distribution may not yet have reached the steady-state Gaussian profile. Yet soon thereafter the steady-state behavior is indeed reached.

Rather than using this algorithmic approach, it is also possible to use a deterministic approximation for the transient behavior. Starting from a monoclonal population, the timing of the first few establishments is given accurately by a deterministic approximation. However, this typically cannot give us the full transient dynamics, because stochastic effects at the nose become important once the fitness distribution grows to a substantial width, which usually occurs before the transient regime is over. This deterministic approach is also less versatile, as it is valid only for some starting distributions.

The transient behavior can be quite important. During the transient phase, the accumulation of beneficial mutations proceeds more slowly than in the steady state, because after the first few establishments, but before the steady state is reached, the lead will be ps with establishment interval τp > τq (since p < q). Thus a clonal population will accumulate beneficial mutations slowly at first, before the rate of accumulation gradually increases to its steady-state rate. This slower transient phase lasts a substantial time, longer than it takes to accumulate q mutations once the steady state has been established, again because τp > τq for p < q (and, as noted above, in fact it can take more than q establishments to reach the steady state). While this section provides a rough sketch of the behavior, a detailed analysis of these transient effects remains an important topic for future work.


Our simplest model neglects deleterious mutations. But deleterious mutations can alter the dependence of v on the mutation rate (and on N), because increasing Ub typically comes at the cost of also increasing the deleterious mutation rate. This has proved an important consideration in clonal interference analyses (Orr; Johnson and Barton). In this section, we consider qualitatively and semiquantitatively various effects of deleterious mutations in the simple model in which all the beneficial mutations have the same s. The effects of deleterious mutations of size s in this model have been studied by Rouzine et al. Here we discuss briefly the effects of deleterious mutations of various sizes, but leave detailed analysis for future work.

It is useful to separate the effects of deleterious mutations into their impact on the dynamics of the bulk of the distribution (and hence the mean fitness) and their effects on the establishment of new most-fit clones at the nose. In the bulk of the distribution, deleterious mutations come to a deterministic mutation-selection balance that alters the shape of the fitness distribution and reduces the mean fitness. This effect actually speeds up the evolution: if the deleterious mutations had no effect at the nose, their impact in reducing the mean fitness would increase the lead and thus make new establishments at the front occur faster. But deleterious mutations at the nose have the opposite effect: they slow down the growth of the most-fit populations and decrease the fitness of some of these individuals, reducing the rate at which new more-fit individuals establish.

In understanding these effects, it is useful to consider large-effect and small-effect deleterious mutations separately. First we consider deleterious mutations whose cost sd > s. When a deleterious mutation with equation M occurs at the nose, that individual is no longer at the nose. Thus the deleterious mutations reduce the effective growth rate just at the nose. If equation M is the mutation rate to deleterious mutations with equation M, then the growth rates of subpopulations at the nose are simply reduced by equation M. The effect of deleterious mutations on the mean fitness is also simple, because the mean fitness of the population is dominated by the largest subpopulation (which is exponentially larger than all others). Thus in considering the effect of the deleterious mutations on the mean fitness, we can focus on their impact in this subpopulation. This remains the largest subpopulation for equation M generations, which for sd > s is larger than equation M. Thus it comes to a deleterious mutation-selection balance while it is largest, since this balance is obtained in equation M generations. This means that the deleterious mutations reduce the mean fitness by equation M (up to small corrections due to the dynamics and the other subpopulations). This reduction in the mean fitness effectively increases the lead by equation M, which increases the growth rates at the nose by the same amount. This cancels the effect of the deleterious mutations at the nose. Thus deleterious mutations with equation M have very little net effect on v: they do not change the rate of new establishments at the nose, up to the small corrections noted above. This is not surprising: the deleterious mutants are all doomed, so roughly speaking their effect is simply to reduce the effective fitness of all individuals equally, which has no net effect on v. But they do increase the lead qs, which changes the shape of the fitness distribution.

For weakly deleterious mutations with equation M, which occur at mutation rate equation M, the effects are more complicated. In this case, the fact that an individual at the nose has a deleterious mutation does not make it substantially less likely to be the source of a new nose-extending mutation. Thus the effective growth rates at the nose are unaffected by deleterious mutations. However, some nose-extending mutations will occur in individuals with one or more deleterious mutations and hence will not necessarily extend the nose by s. Instead, they will sometimes have an effect s − sd, or s − 2sd, or less. We can estimate the strength of this effect by using a deterministic approximation for the deleterious mutation accumulation at the nose. When equation M (or, roughly, when equation M), we find that on average, nose-extending mutations are burdened by a deleterious load of equation M. Thus the effect of the deleterious mutations at the nose is to reduce the effective s by the amount equation M, which is small compared to s. This will tend to slow the evolution. An analogous calculation applies when equation M; here the deleterious mutations have a larger effect, but still produce an average fitness cost only at most of order sd. The effect of the deleterious mutations on the bulk of the distribution is again to reduce the mean fitness of the population. The amount of this reduction, however, does not depend only on the most-fit subpopulation as before, because equation M. Rather, these small-effect deleterious mutations accumulate throughout the collective-sweep time, qs/v ~ ln(s/Ub)/s, in which a subpopulation grows from being the lead population to the dominant population. We expect this effect to be largest relative to the effects of these deleterious mutations on the dominant subpopulations when 1/sd is of order the collective-sweep time. This effect reduces the mean fitness by an amount at most of order equation M. This again speeds the evolution and partially cancels the slowing effect at the nose. Thus deleterious mutations with equation M affect v by increasing the effective lead by of order equation M and reducing the effective s by equation M (when equation M) or by of order sd (when equation M is larger). These effects are all small.

To analyze in more detail the quantitative effects of deleterious mutations (even in the simplest single-beneficial-s model) is beyond the scope of this article. Note in particular that the analysis in this section is invalid when the deleterious mutation rate is large enough that the deterministic approximation for their behavior at the nose becomes incorrect. In this regime, on the border between Muller's ratchet and adaptive evolution, a more careful analysis is needed. We leave this discussion, which is essential for understanding the dependence of the rate of evolution on the mutation rate when mutation rates become large, for future work.


Our analysis involves a number of approximations. While we have analyzed their validity above and in the appendixes, we also used computer simulations to test our results. In this section, we describe these simulations and the comparisons to our results.

We started our computer simulations with a clonal population with a birth and death rate of 1 and a mutation rate of Ub. We arbitrarily defined this population to have fitness 0. We divided time into small increments. At each increment, we first calculated the average fitness equation M and then produced births, deaths, and mutations with the appropriate probabilities. The birth rate of individuals at fitness y was set to be equation M (with equation M always small compared to unity), their death rate 1, and the mutation rate Ub. We then repeated this process to simulate the population dynamics, providing a full stochastic simulation of the simplest constant-s, beneficial-only model analyzed above. We recorded the mean fitness and lead as a function of time and, for each set of parameters, measured the average v and q once past the initial transient regime.
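A minimal sketch of a simulation of this kind is given below. It is not the code used for the results reported here: the function names, the default parameter values, the one-event-per-individual time stepping, and in particular the `N / total` factor that holds the population size near N are all assumptions made for illustration. The birth rate is taken to be 1 plus the fitness relative to the current mean, which is one reading of the description above.

```python
import random


def simulate(N=200, Ub=1e-2, s=0.05, dt=0.1, steps=2000, seed=0):
    """Return a list of (time, mean_fitness, lead) tuples, one per time step."""
    rng = random.Random(seed)
    counts = {0: N}  # counts[k]: individuals carrying k beneficial mutations
    history = []
    for step in range(1, steps + 1):
        total = sum(counts.values())
        mean_k = sum(k * n for k, n in counts.items()) / total
        new_counts = dict(counts)
        for k, n in counts.items():
            # Birth rate 1 + (y - mean y) with y = k*s. The N/total factor is
            # an assumed regulation term that keeps the population near N.
            p_birth = max(0.0, 1.0 + s * (k - mean_k)) * (N / total) * dt
            p_death = dt     # death rate 1
            p_mut = Ub * dt  # beneficial mutation rate Ub
            for _ in range(n):
                u = rng.random()  # at most one event per individual per step
                if u < p_death:
                    new_counts[k] -= 1
                elif u < p_death + p_birth:
                    new_counts[k] += 1
                elif u < p_death + p_birth + p_mut:
                    new_counts[k] -= 1
                    new_counts[k + 1] = new_counts.get(k + 1, 0) + 1
        counts = {k: n for k, n in new_counts.items() if n > 0}
        if not counts:
            break  # extinction; not expected for reasonable parameters
        total = sum(counts.values())
        mean_k = sum(k * n for k, n in counts.items()) / total
        history.append((step * dt, s * mean_k, s * (max(counts) - mean_k)))
    return history


def average_speed(history):
    """Crude estimate of v: slope of the mean fitness over the second half."""
    if len(history) < 3:
        return 0.0
    mid = len(history) // 2
    t0, m0, _ = history[mid]
    t1, m1, _ = history[-1]
    return (m1 - m0) / (t1 - t0)


if __name__ == "__main__":
    run = simulate()
    print("estimated v:", average_speed(run))
    print("final lead in units of s:", run[-1][2] / 0.05)
```

Discarding the first half of the run is only a stand-in for "once past the initial transient regime"; a careful comparison would check that the lead has stopped growing before averaging.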

We carried out these simulations at a variety of different parameter values. The match between simulations and our theoretical results was good, provided the conditions for the validity of the concurrent-mutations regime obtained. Examples of these comparisons are shown in Figures 5 and 6. In Figure 5, we show the theoretical predictions for the average speed of adaptation (using the lowest-order iterative result for v presented in Equation 41) compared to simulation results as a function of N, Ub, and s. In Figure 6, we show similar comparisons for the average lead q (again using the lowest-order iterative result for our theoretical predictions). The agreement is good in both cases, although our theory slightly underestimates both v and q. This may be due to the effects of fluctuations in τq (described in Appendix D) slightly increasing the mean v and q because of their nonlinear effects or to other factors arising from ln(s/Ub) not being sufficiently large for the asymptotic results to obtain to this accuracy.

Figure 6.

Comparisons between simulations and our theoretical predictions for the mean q. (a) q vs. log10 N for Ub = 10^-5 and s. (b) q vs. log10 Ub for N = 10^6 and s. (c) q vs. log10 s for N = 10^6 and Ub = 10^-5. All the simulation results are averages of 30 independent simulations, after the steady state has been established, as in Figure 5.


The simple model we have analyzed assumes that all beneficial mutations confer the same advantage s. But in most natural situations different beneficial mutations will have different fitness effects. This does not change the basic dynamics of adaptation in large asexual populations: many beneficial mutations still occur before earlier ones have fixed and these can help or interfere with each other's fixation (Figure 1b). And the successful mutant lineages are likely to have had multiple beneficial mutations before they fix, while many other mutations will be wasted when other lineages outcompete them.

Thus far we have focused on how beneficial mutations are wasted because they occur in individuals who are not very fit (i.e., away from the nose) and are therefore handicapped by their poor genetic background. But when beneficial mutations have a variety of different effects, there is another way they can be wasted: small-effect mutations can be outcompeted by larger mutations that occur in the same or a similar genetic background. We refer to this latter process as clonal interference. As before, we use the term clonal interference to refer to this latter effect only (despite some broader definitions in the literature), consistent with the focus of recent work on the subject. This can occur only when not all mutations have the same fitness increment and is thus absent in the simple constant-s model.

Recent work by Gerrish and Lenski and others (Orr; Gerrish; Johnson and Barton; Kim and Stephan; Campos and De Oliveira; Wilke; Kim and Orr) has taken the opposite approach to the multiple constant-s mutations approximation and focused instead on the effects of clonal interference, while ignoring multiple mutations. In this section, we first summarize the conclusions of such analyses, which assume all mutations occur on the same genetic background. We then consider the effects of including both clonal interference and multiple mutations. As we will argue, whenever the former plays a significant role, so does the latter.

The now-conventional clonal interference analysis considers how small-effect mutations can be outcompeted by larger mutations. Specifically, if a mutation A with fitness sA becomes established, one considers the probability that another mutation B, with effect sB > sA, will also become established before mutation A has fixed. If this happens, mutation B drives A to extinction and mutation A is thus wasted. Of course, it is also possible that mutation B is subsequently outcompeted by a still fitter mutation C, and so on. The key approximation is that the largest mutation that occurs and is not outcompeted by a still larger one fixes, becomes the new wild type, and the process then repeats. Additional mutations that might occur in a lineage that already has mutation A, B, or C are ignored. For any fixed population size, there is some selective advantage, sci, such that sufficiently large mutations, those with s > sci, are rare enough that they are unlikely to occur before some less-fit mutation arises and fixes. In the conventional clonal interference analysis, it is assumed that a mutation of size around sci will thereby fix before any others, and the process will then repeat. This is equivalent to successional-mutation behavior with a set of mutations each with the same strength, sci. Since sci increases with the population size, more mutations are wasted in larger populations, implying that v increases less than linearly with NUb.

Before discussing the problems with the basic successional-fixation assumption, we consider how the characteristic sci depends on N and on the distribution of selective advantages, ρ(s)ds. Because only beneficial mutations with substantial s matter for large N, the total Ub itself is not important. It is more convenient to use the mutation rate per generation for mutations in a range ds about s:

$$\mu(s)\,ds = U_b\,\rho(s)\,ds$$


We assume that large-effect beneficial mutations are typically much less common than small-effect ones, so that μ(s) is small and decreases rapidly with s. Since μ(s) = Ubρ(s) is dimensionless, it is convenient to define (s) by

equation M


Note that equation M increases with s. Mutations with effect of order s occur at an overall rate of order s¼(s) se&#x; (s), so (s) roughly plays the role that ln(s/Ub) does in the single-s case.

The basic clonal interference analysis is simple: in the time that a mutation of size sA will take to fix, equation M, some mutation of larger size s will have time to occur and become established as long as the total establishment rate for mutations larger than sA is sufficiently large:

equation M


This will no longer be true above some critical sci, where equation M. We can estimate this sci by noting that since μ(s) decreases rapidly with s, equation M. We find

equation M


Using the definition of , we see that sci(N) is the value of s at which in the whole population there is of order one mutation per generation. Further, because μ(s) = Ubρ(s), we see that sci depends only on the product NUb, with the functional form determined by ρ(s). In the successional clonal interference analysis approximation, the speed of evolution is assumed to be the size of these mutations, sci, times the rate at which mutations of order this effect occur, sci μ(sci), times the probability that they become established, sci. This yields

equation M


where C is a factor of order unity that is not really obtainable from clonal interference analysis, as it depends on the details of further approximations. (Note that the details of how we define fixation do not make much difference in the clonal interference result. We have also ignored other factors inside logarithms, since equation M.) At this point we should note that various potential improvements are possible. In particular, it is not at all clear why the establishment time rather than the fixation time should be used to obtain the accumulation rate of the sci mutations. As we shall see below, if the latter rather ad hoc assumption is made instead, the clonal interference analysis gives closer to the correct results for certain distributions: those with long tails in ρ(s). But with or without such improvements, some of the predictions of clonal interference analysis are qualitatively wrong, in particular the prediction that as the overall beneficial mutation rate increases, the typical size of the mutations that fix (predicted to be sci) also increases. As we shall see, the opposite is true.

The above clonal interference analysis makes a crucial approximation that is essentially never valid: that double mutants can be ignored even when mutations are common enough that they often interfere. This is manifest in the assumption that the important mutations occur only in the majority (wild-type) population. The basic problem is that even if a more-fit mutation B occurs before an earlier but less-fit mutation A fixes, A may still survive. An individual with A can get another mutation D such that the A-D double mutant is fitter than B. If this happens, mutation A (along with D) can fix after all. Indeed, such events should be expected: any population large enough for clonal interference to matter is also large enough for double mutants to routinely appear even for s sci. This is because clonal interference can affect the fixation of a mutation of size s only when the establishment rate of mutations stronger than s, which is at least equation M, is large compared to the rate at which the mutation of size s fixes, equation M. But when this occurs, we have equation M. Thus, from our analysis of the single-s model, whenever clonal interference occurs, multiple mutations also play a role.

The single-s model, in contrast, is unrealistic because it explicitly excludes competition between mutations of different effects. Thus the conclusions from this model and the clonal interference analysis are each only part of the story. In the remainder of this section, we outline the behavior for more general distributions of beneficial mutations, taking into account both clonal interference and multiple mutations. Fortunately, as we shall see, for many forms of μ(s), the single-s



Created by: CK/Adapted by Christine Miller

Mutant Cosplay

You probably recognize these costumed comic fans as two of the four Teenage Mutant Ninja Turtles. Can a mutation really turn a reptile into an anthropomorphic superhero? Of course not, but mutations can often result in other drastic (but more realistic) changes in living things.

Mutations are random changes in the sequence of bases in DNA or RNA. The word mutation may make you think of the Ninja Turtles, but that's a misrepresentation of how most mutations work. First of all, everyone has mutations. In fact, most people have dozens (or even hundreds!) of mutations in their DNA. Secondly, from an evolutionary perspective, mutations are essential. They are needed for evolution to occur because they are the ultimate source of all new genetic variation in any species.

Mutations have many possible causes. Some mutations seem to happen spontaneously, without any outside influence. They occur when errors are made during DNA replication or during the transcription phase of protein synthesis. Other mutations are caused by environmental factors. Anything in the environment that can cause a mutation is known as a mutagen. Examples of mutagens are shown in the figure below.

Examples of radiation, chemicals, and infectious agents: an image of a sun icon and a hand x-ray for UV and x-ray radiation; a picture of hands holding a cigarette and a vape, three smokies on a grill (nitrates/nitrites and mutagenic BBQ chemicals), and a stylized image of a woman in a green acne face mask with benzoyl peroxide to represent chemicals. To represent infectious agents: an orange spherical virus as human papillomavirus (HPV) and a purple spirilla bacterium with flagella for Helicobacter pylori, a bacterium spread through contaminated food.

Types of Mutations

Mutations come in a variety of types. Two major categories of mutations are germline mutations and somatic mutations.

  • Germline mutations occur in gametes (the sex cells), such as eggs and sperm. These mutations are especially significant because they can be transmitted to offspring, causing every cell in the offspring to carry those mutations.
  • Somatic mutations occur in other cells of the body. These mutations may have little effect on the organism, because they are confined to just one cell and its daughter cells. Somatic mutations cannot be passed on to offspring.

Mutations also differ in the way that the genetic material is changed. Mutations may change an entire chromosome, or they may alter just one or a few nucleotides.

Chromosomal Alterations

Chromosomal alterations are mutations that change chromosome structure. They occur when a section of a chromosome breaks off and rejoins incorrectly, or does not rejoin at all. Possible ways in which these mutations can occur are illustrated in the figure below. Chromosomal alterations are very serious. They often result in the death of the organism in which they occur. If the organism survives, it may be affected in multiple ways. An example of a human disease caused by a chromosomal duplication is Charcot-Marie-Tooth disease type 1 (CMT1). It is characterized by muscle weakness, as well as loss of muscle tissue and sensation. The most common cause of CMT1 is a duplication of part of chromosome


A point mutation is a change in a single nucleotide in DNA. This type of mutation is usually less serious than a chromosomal alteration. An example of a point mutation is a mutation that changes the codon UUU to the codon UCU. Point mutations can be silent, missense, or nonsense mutations, as described in the table below. The effects of point mutations depend on how they change the genetic code.

Type       Description                                        Example                             Effect
Silent     mutated codon codes for the same amino acid        CAA (glutamine) → CAG (glutamine)   none
Missense   mutated codon codes for a different amino acid     CAA (glutamine) → CCA (proline)     variable
Nonsense   mutated codon is a premature stop codon            CAA (glutamine) → UAA (stop)        usually serious
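The three categories can be checked mechanically. The short sketch below builds the standard genetic code for RNA codons and classifies a codon change as silent, missense, or nonsense. The function name and the one-letter amino acid codes (with `*` for a stop codon) are choices made for this illustration.

```python
BASES = "UCAG"
# Standard genetic code, listed with the first, second, and third base each
# cycling through U, C, A, G. "*" marks a stop codon.
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def classify_point_mutation(original, mutated):
    """Classify a change from one RNA codon to another."""
    before, after = CODON_TABLE[original], CODON_TABLE[mutated]
    if before == after:
        return "silent"    # same amino acid
    if after == "*":
        return "nonsense"  # premature stop codon
    return "missense"      # different amino acid


for mutated in ("CAG", "CCA", "UAA"):
    print("CAA ->", mutated, classify_point_mutation("CAA", mutated))
```

Run on the examples from the table, this prints silent, missense, and nonsense in that order. The UUU to UCU change mentioned earlier comes out as missense, because UUU codes for phenylalanine and UCU for serine.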

Frameshift Mutations

A frameshift mutation is a deletion or insertion of one or more nucleotides, changing the reading frame of the base sequence. Deletions remove nucleotides, and insertions add nucleotides. Consider the following sequence of bases in RNA:

AUG-AAU-ACG-GCU = start-asparagine-threonine-alanine

Now, assume that an insertion occurs in this sequence. Let’s say an A nucleotide is inserted after the start codon AUG. The sequence of bases becomes:

AUG-AAA-UAC-GGC-U = start-lysine-tyrosine-glycine

Even though the rest of the sequence is unchanged, this insertion changes the reading frame and, therefore, all of the codons that follow it. As this example shows, a frameshift mutation can dramatically change how the codons in mRNA are read. This can have a drastic effect on the protein product.
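The frameshift example above can be reproduced with a few lines of code. This sketch reads an RNA sequence three bases at a time using the standard genetic code; the function name and the one-letter amino acid codes (M = start/methionine, N = asparagine, T = threonine, A = alanine, K = lysine, Y = tyrosine, G = glycine) are choices made for this illustration.

```python
BASES = "UCAG"
# Standard genetic code, listed with the first, second, and third base each
# cycling through U, C, A, G. "*" marks a stop codon.
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(rna):
    """Translate complete codons from the start of the sequence.

    Leftover bases at the end (fewer than three) are ignored.
    """
    codons = [rna[i:i + 3] for i in range(0, len(rna) - len(rna) % 3, 3)]
    return "".join(CODON_TABLE[codon] for codon in codons)


original = "AUGAAUACGGCU"
# Insert an A immediately after the start codon AUG.
mutated = original[:3] + "A" + original[3:]

print(translate(original))  # MNTA
print(translate(mutated))   # MKYG
```

Every codon after the insertion point is read differently, even though only one base was added.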

The majority of mutations have neither negative nor positive effects on the organism in which they occur. These mutations are called neutral mutations. Examples include silent point mutations, which are neutral because they do not change the amino acids in the proteins they encode.

Many other mutations have no effects on the organism because they are repaired before protein synthesis occurs. Cells have multiple repair mechanisms to fix mutations in DNA.

Beneficial Mutations

Some mutations — known as beneficial mutations — have a positive effect on the organism in which they occur. They generally code for new versions of proteins that help organisms adapt to their environment. If they increase an organism’s chances of surviving or reproducing, the mutations are likely to become more common over time. There are several well-known examples of beneficial mutations. Here are two such examples:

  1. Mutations have occurred in bacteria that allow the bacteria to survive in the presence of antibiotic drugs, leading to the evolution of antibiotic-resistant strains of bacteria.
  2. A unique mutation is found in people in Limone, a small town in Italy. The mutation protects them from developing atherosclerosis, the dangerous buildup of fatty materials in blood vessels, despite a high-fat diet. The individual in whom this mutation first appeared has even been identified, and many of his descendants carry it.

Harmful Mutations

Imagine making a random change in a complicated machine, such as a car engine. There is a chance that the random change would result in a car that does not run well, or perhaps does not run at all. By the same token, a random change in a gene's DNA may result in the production of a protein that does not function normally, or may not function at all. Such mutations are likely to be harmful. Harmful mutations may cause genetic disorders or cancer.

  • A genetic disorder is a disease, syndrome, or other abnormal condition caused by a mutation in one or more genes, or by a chromosomal alteration. An example of a genetic disorder is cystic fibrosis. A mutation in a single gene causes the body to produce thick, sticky mucus that clogs the lungs and blocks ducts in digestive organs.
  • Cancer is a disease in which cells grow out of control and form abnormal masses of cells (called tumors). It is generally caused by mutations in genes that regulate the cell cycle. Because of the mutations, cells with damaged DNA are allowed to divide without restriction.

Inherited mutations are thought to play a role in roughly five to ten per cent of all cancers. Specific mutations that cause many of the known hereditary cancers have been identified. Most of the mutations occur in genes that control the growth of cells or the repair of damaged DNA.

Genetic testing can be done to determine whether individuals have inherited specific cancer-causing mutations. Some of the most common inherited cancers for which genetic testing is available include hereditary breast and ovarian cancer, caused by mutations in genes called BRCA1 and BRCA2. Besides breast and ovarian cancers, mutations in these genes may also cause pancreatic and prostate cancers. Genetic testing is generally done on a small sample of body fluid or tissue, such as blood, saliva, or skin cells. The sample is analyzed by a lab that specializes in genetic testing, and it usually takes at least a few weeks to get the test results.

Should you get genetic testing to find out whether you have inherited a cancer-causing mutation? Such testing is not done routinely just to screen patients for risk of cancer. Instead, the tests are generally done only when the following three criteria are met:

  1. The test can determine definitively whether a specific gene mutation is present. This is the case with the BRCA1 and BRCA2 gene mutations, for example.
  2. The test results would be useful to help guide future medical care. For example, if you found out you had a mutation in the BRCA1 or BRCA2 gene, you might get more frequent breast and ovarian cancer screenings than are generally recommended.
  3. You have a personal or family history that suggests you are at risk of an inherited cancer.

Criterion number 3 is based, in turn, on such factors as:

  • Diagnosis of cancer at an unusually young age.
  • Several different cancers occurring independently in the same individual.
  • Several close genetic relatives having the same type of cancer (such as a maternal grandmother, mother, and sister all having breast cancer).
  • Cancer occurring in both organs in a set of paired organs (such as both kidneys or both breasts).

If you meet the criteria for genetic testing and are advised to undergo it, genetic counseling is highly recommended. A genetic counselor can help you understand what the results mean and how to make use of them to reduce your risk of developing cancer. For example, a positive test result that shows the presence of a mutation may not necessarily mean that you will develop cancer. It may depend on whether the gene is located on an autosome or sex chromosome, and whether the mutation is dominant or recessive. Lifestyle factors may also play a role in cancer risk even for hereditary cancers. Early detection can often be life saving if cancer does develop. Genetic counseling can also help you assess the chances that any children you may have will inherit the mutation.

  • are random changes in the sequence of bases in or . Most people have multiple mutations in their DNA without ill effects. Mutations are the ultimate source of all new genetic variation in any species.
  • Mutations may happen spontaneously during or . Other mutations are caused by environmental factors called . Mutagens include radiation, certain chemicals, and some infectious agents.
  • occur in gametes and may be passed onto offspring. Every cell in the offspring will then have the mutation. occur in cells other than gametes and are confined to just one cell and its daughter cells. These mutations cannot be passed on to offspring.
  • are mutations that change chromosome structure and usually affect the organism in multiple ways. Charcot-Marie-Tooth disease type 1 is an example of a chromosomal alteration in humans.
  • are changes in a single nucleotide. The effects of point mutations depend on how they change the genetic code and may range from no effects to very serious effects.
  • change the reading frame of the genetic code and are likely to have a drastic effect on the encoded protein.
  • Many mutations are neutral and have no effect on the organism in which they occur. Some mutations are beneficial and improve fitness. An example is a mutation that confers antibiotic resistance in bacteria. Other mutations are harmful and decrease fitness, such as the mutations that cause genetic disorders or cancer.
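The difference between point mutations and frameshift mutations summarized above can be made concrete with a short translation sketch. The DNA sequence below is invented for illustration, and the codon table is deliberately partial: it contains only the codons this example needs, taken from the standard genetic code. The sequence is treated as the coding strand and read three bases at a time.

```python
# Partial codon table (standard genetic code), limited to the codons used here.
CODON_TABLE = {
    "ATG": "Met", "GCT": "Ala", "GCC": "Ala", "GAA": "Glu",
    "GTA": "Val", "TGG": "Trp", "CTG": "Leu", "AAT": "Asn",
    "GGT": "Gly", "TAA": "Stop", "TGA": "Stop",
}


def translate(dna):
    """Translate a coding-strand DNA string codon by codon until a stop codon.

    A trailing incomplete codon (one or two leftover bases) is ignored.
    """
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "Stop":
            break
        protein.append(amino_acid)
    return protein


# Hypothetical reference sequence: ATG GCT GAA TGG TAA
reference = "ATGGCTGAATGGTAA"
silent = "ATGGCCGAATGGTAA"      # GCT -> GCC: still Ala, protein unchanged
missense = "ATGGCTGTATGGTAA"    # GAA -> GTA: Glu replaced by Val
nonsense = "ATGGCTGAATGATAA"    # TGG -> TGA: premature stop codon
frameshift = reference[:3] + reference[4:]  # one base deleted after ATG

for name, seq in [("reference", reference), ("silent", silent),
                  ("missense", missense), ("nonsense", nonsense),
                  ("frameshift", frameshift)]:
    print(f"{name:10s} {'-'.join(translate(seq))}")
```

In this example the three point mutations each alter at most one codon, while the single-base deletion shifts every codon after it, so every amino acid after the first is changed.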
  1. Define mutation.
  2. Identify causes of mutation.
  3. Compare and contrast germline and somatic mutations.
  4. Describe chromosomal alterations, point mutations, and frameshift mutations. Identify the potential effects of each type of mutation.
  5. Why do many mutations have neutral effects?
  6. Give one example of a beneficial mutation and one example of a harmful mutation.
  7. Why do you think that exposure to mutagens (such as cigarette smoke) can cause cancer?
  8. Explain why the insertion or deletion of a single nucleotide can cause a frameshift mutation.
  9. Compare and contrast missense and nonsense mutations.
  10. Explain why mutations are important to evolution.












Do all gene variants affect health and development?

No; only a small percentage of variants cause genetic disorders—most have no impact on health or development. For example, some variants alter a gene's DNA sequence but do not change the function of the protein made from the gene.

Often, gene variants that could cause a genetic disorder are repaired by certain enzymes before the gene is expressed and an altered protein is produced. Each cell has a number of pathways through which enzymes recognize and repair errors in DNA. Because DNA can be changed or damaged in many ways, DNA repair is an important process by which the body protects itself from disease.

A very small percentage of all variants actually have a positive effect. These variants lead to new versions of proteins that help an individual better adapt to changes in his or her environment. For example, a beneficial variant could result in a protein that protects an individual and future generations from a new strain of bacteria.

Because a person's genetic code can have many variants with no effect on health, diagnosing genetic disorders can be difficult.

When determining whether a gene variant is associated with a genetic disorder, the variant is evaluated against the scientific research to date. This includes information on how the variant affects the function or production of the protein made from the gene, as well as previous variant classification data. The variant is then classified on a spectrum based on how likely it is to lead to the disorder.

Gene variants, as they relate to genetic disorders, are classified into one of five groups:

  • Pathogenic: The variant is responsible for causing disease. There is ample scientific research to support an association between the disease and the gene variant. These variants are often referred to as mutations.
  • Likely pathogenic: The variant is probably responsible for causing disease, but there is not enough scientific research to be certain.
  • Variant of uncertain significance (VUS or VOUS): The variant cannot be confirmed to play a role in the development of disease. There may not be enough scientific research to confirm or refute a disease association or the research may be conflicting.
  • Likely benign: The variant is probably not responsible for causing disease, but there is not enough scientific research to be certain.
  • Benign: The variant is not responsible for causing disease. There is ample scientific research to disprove an association between the disease and the gene variant.

Each variant needs to be evaluated individually. The fact that a gene is associated with a disease does not mean that all variants in that gene are pathogenic. Additionally, a variant needs to be evaluated for every disease with which it is thought to be associated. A variant that is pathogenic for one disease is not necessarily pathogenic for a different disease. It is important to re-evaluate variants periodically; the classification of a variant can change over time as more information about its effects becomes known through additional scientific research.
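The points above (five ordered groups, one classification per variant and disease pair, and periodic re-evaluation) can be sketched as a small data structure. This is only an illustration: the class names, the variant label "variant_A", and the disease labels "disease_X" and "disease_Y" are hypothetical, and the sketch does not implement any real classification criteria.

```python
from enum import IntEnum


class Classification(IntEnum):
    """The five groups, ordered from least to most likely to cause disease."""
    BENIGN = 1
    LIKELY_BENIGN = 2
    UNCERTAIN_SIGNIFICANCE = 3  # VUS
    LIKELY_PATHOGENIC = 4
    PATHOGENIC = 5


class VariantRecords:
    """Stores classifications per (variant, disease) pair, oldest first."""

    def __init__(self):
        self._history = {}

    def classify(self, variant, disease, classification):
        # A re-evaluation appends to the history instead of overwriting it.
        self._history.setdefault((variant, disease), []).append(classification)

    def current(self, variant, disease):
        """Return the latest classification, or None if never evaluated."""
        history = self._history.get((variant, disease))
        return history[-1] if history else None


records = VariantRecords()
# The same variant is evaluated separately for each disease.
records.classify("variant_A", "disease_X", Classification.UNCERTAIN_SIGNIFICANCE)
records.classify("variant_A", "disease_Y", Classification.BENIGN)
# Later research changes the classification for one disease only.
records.classify("variant_A", "disease_X", Classification.LIKELY_PATHOGENIC)

print(records.current("variant_A", "disease_X").name)  # LIKELY_PATHOGENIC
print(records.current("variant_A", "disease_Y").name)  # BENIGN
```

Keying the records on the pair rather than on the variant alone reflects the statement that a variant pathogenic for one disease is not necessarily pathogenic for another.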

Scientific journal article for further reading

Richards S, Aziz N, Bale S, Bick D, Das S, Gastier-Foster J, Grody WW, Hegde M, Lyon E, Spector E, Voelkerding K, Rehm HL; ACMG Laboratory Quality Assurance Committee. Standards and guidelines for the interpretation of sequence variants: a joint consensus recommendation of the American College of Medical Genetics and Genomics and the Association for Molecular Pathology. Genet Med. May;17(5).

