

An Introduction to the History and Basic Scientific Concepts in Evolution

Alec MacAndrew



This is a short, basic introduction to the history and the scientific concepts around the Theory of Evolution.  Those who are entirely new to the topic can read from beginning to end and gain a general understanding of the subject.  I hope that it can also be used as a reference by those wishing to brush up on particular aspects of the subject.  I have attempted to link into other sources of information, either on this site or elsewhere that cover the subject in far greater detail.

I start with a bit of history, which also builds up an understanding of the various concepts involved in evolution; then I cover the fundamentals of the science of genetics and elementary molecular biology; link that into the way we understand evolution to progress, and finish with some of the topics scientists are currently wrestling with and some of the more recent thinking in this area which might be of some interest to those for whom the rest is outrageously elementary.

The topics covered are:

Darwinism - The Origin of Species:  An introduction to Natural Selection as the mechanism by which evolution occurs

Genetics and Gregor Mendel:  Genetics - the rules of inheritance

The Modern Synthesis: Bringing genetics and the mechanism of Natural Selection together

The Structure of DNA:  Unravelling the genetic code

How does the Genetic Code work?: The mechanism by which the code controls the synthesis of protein in the cells

How is the DNA stored in the cells?: The way the DNA is stored in the cells; chromosomes and cell division; DNA in reproduction

The genetic code is a recipe not a blueprint: Complexity in the code means very few one to one relationships between genes and functions

Mutation, variation and creation of new genetic information:  How  variation arises for Natural Selection to work on

Speciation:  The mechanisms by which new species arise

The major steps leading from bacteria to Man: The fossil record reveals the major steps in the evolutionary process

Modern Issues and Ideas:  The direction that evolutionary theory is headed now.


Darwinism - The Origin of Species

In 1859 Darwin published The Origin of Species, the book that explained in full detail the hypothesis that he and Alfred Wallace had separately conceived. Darwin and Wallace had published a paper together the previous year outlining the hypothesis, but Darwin's much more thorough development of the reasoning behind the ideas was one reason his name is now more famously associated with the modern theory than is Wallace's. The text of Darwin's The Origin of Species is available on-line here and here.

So what was Darwin's theory? The idea that species are not unchangeable has a very long history and certainly pre-dates Darwin and Wallace. Indeed, Charles Darwin's own grandfather, Erasmus Darwin toyed with the idea. What was it that Darwin introduced to our thinking, that was so compelling that, even today, the modern theory of evolution bears his name?

Although it was recognised that evolution occurred, at least after some fashion, what was missing was a credible mechanism by which change was directed. What Darwin realised, and explained in the most eloquent way in The Origin of Species, were the following points:

  1. The inconstancy of species. Darwin recognised through his incomparably careful work as a naturalist aboard the Beagle, and his observation of such things as the breeding of domestic animals, that offspring are not identical copies of their parents, and that species therefore change gradually over time
  2. Common descent. He realised that the nested classification of species into more and less closely resembling groups (the taxonomy first introduced by Carl von Linne) could be explained by the concept of common descent; that closely related species radiate from a common ancestor, that less closely related groups do the same but with a common ancestor further in the past, and that ultimately all living things are descended from one or a very few ancestors.
  3. Gradualism. He was very clear on the point that very large changes of morphology can be explained by gradual change generation by generation over immense tracts of time (although no-one realised at the time just how long the earth has been in existence). This understanding could only have occurred at or about the time of Darwin, as the age of the earth up until 1800 was considered to be thousands of years. That view had been shattered by the findings of late Georgian and early Victorian geologists of whom his friend, colleague and collaborator, Charles Lyell, was a prominent example
  4. Diversity. Darwin realised that the process of change over time would result in new species and that its inevitable consequence would be a proliferation of different species - a world rich in different life-forms
  5. Natural selection. This was perhaps the most important element of Darwin's ideas - that over many generations, changes which result in competitive advantage for the organism survive, whereas those that do not offer an advantage are selected out. The members of the population that are better fitted to survive leave more offspring and so their traits become more common in the population, whereas traits that contribute less to fitness or traits which are actually deleterious become less common or die out altogether. Evolution, therefore, acts to transform species in the direction of better fitness for the environment in which those organisms find themselves
  6. Sexual selection. Darwin's 'other' selection idea, forgotten for many years but in the last twenty years making a comeback. Darwin pointed out that the preference of sexual partners for certain traits could drive evolution even though those traits confer no objective competitive advantage. The plumage of peacocks and birds of paradise and the noses of proboscis monkeys can be explained in no other way.

So, Darwin postulated a mechanism in which more or less random variation from generation to generation is selected by Natural Selection, that over time steers the change in the direction of greater fitness and that over immense time results in the creation of new species and higher taxa. This ultimately is the mechanism by which the wide diversity of species that we see today arose.

So where did Darwin's original theory have shortcomings? Well, Darwin never figured out the process that caused the variation from one generation to the next. He needed a mechanism that created variation in offspring on which Natural Selection could act and he never found it. He postulated a hypothesis in which the traits of mother and father were blended but never came to terms with evidence that pointed definitively away from the blending hypothesis:

  1. The blending hypothesis would always lead to suppression of variation over many generations, and would actually result in a process which acted against the diversity of species rather than promoting it
  2. The most obvious trait of offspring is that they are either male or female, never a blend: blending is not the process by which traits are passed on from parents to offspring


Genetics and Gregor Mendel

While Darwin was formulating his theory, the monk Gregor Mendel was doing his work with garden peas, which showed that the inheritance of traits was not due to blending but to the inheritance and sorting of specific particulate or quantised characteristics (now known to be the genes of the genetic code). Mendel's work received little publicity and it is unlikely that Darwin actually realised the significance of any of Mendel's research (although he did have some writing by Mendel in his library).

In any case, the significance of Mendel's work was not recognised in his lifetime by anyone. He died in 1884, and it took a further 16 years, until the turn of the century, for his work to be fully disseminated. Mendel put the inheritance of certain traits (those that we now understand are influenced by single genes) on a mathematically predictable footing.

Go here for much more detailed information about Gregor Mendel and his work.

The Modern Synthesis

Between 1900 and about 1925, an argument raged that is almost unimaginable today. Darwin's theory was regarded as outmoded and replaced by Mendelian genetics - the two approaches were regarded as incompatible rivals. Some of the comments about the Origin of Species that appeared at the time are absolutely astonishing in their dismissal of Natural Selection and in relegating the concept to the waste bin of history.

The eventual integration of the two approaches was carried out by a glittering array of scientists - Theodosius Dobzhansky, Ernst Mayr, Ronald Fisher, JBS Haldane, George Gaylord Simpson and Sewall Wright amongst others. Together, these scientists integrated the thinking of Mendel and Darwin into what is now called the Modern Synthesis. There are a number of key points to this new theory:

  1. The inheritance of acquired traits, as suggested by Lamarck, is utterly rejected
  2. The idea that inheritance of traits is determined by the inheritance of individual particulate factors (called genes). Each gene has two copies in each individual, one inherited from the father and one from the mother. An organism passes on to each of its offspring one of its two copies of each gene, chosen at random, and the offspring makes up the pair with a copy from its other parent
  3. The concept of a species as a group of individual organisms sharing a single gene pool: the idea that genes cannot pass across species barriers and still successfully remain in the germ line
  4. The concept of population genetics and the mathematics associated with the distribution of alleles (different versions of the same gene) within a population
  5. The idea that the source of variation in a population is random mutation in the genes; that the mutations themselves do not drive the direction of evolution but that natural selection of different traits defined by the genes and by mutations of those genes does drive the direction of evolution

At this stage, the new integrated theory, now called neo-Darwinism, could be characterised by the following two simple statements: that evolution is the change in allele frequency in a population over time, and that evolution is caused by the combination of random mutations (which cause variation from one generation to the next) and natural selection (which results in improvements in fitness of organisms over generations).
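To make "change in allele frequency over time" a little more concrete, here is a minimal sketch of selection acting on one gene with two alleles. It is only an illustration: the function names and the fitness values are my own assumptions, not measured data, and the model assumes random mating and a population large enough for chance to be ignored.

```python
def next_allele_frequency(p, w_AA, w_Aa, w_aa):
    """One generation of selection at a single gene with two alleles, A and a.

    p is the current frequency of allele A. The w_* arguments are the relative
    fitnesses of the three genotypes (illustrative values, not real data).
    """
    q = 1.0 - p
    # Genotype frequencies before selection, assuming random mating.
    f_AA, f_Aa, f_aa = p * p, 2 * p * q, q * q
    mean_fitness = f_AA * w_AA + f_Aa * w_Aa + f_aa * w_aa
    # Frequency of A among survivors: every allele in AA, half of those in Aa.
    return (f_AA * w_AA + 0.5 * f_Aa * w_Aa) / mean_fitness


def simulate(p0, generations, w_AA=1.0, w_Aa=1.0, w_aa=0.9):
    """Return the frequency of allele A in each generation, starting at p0."""
    history = [p0]
    for _ in range(generations):
        history.append(next_allele_frequency(history[-1], w_AA, w_Aa, w_aa))
    return history


trajectory = simulate(p0=0.01, generations=200)
for generation in (0, 50, 100, 200):
    print(generation, round(trajectory[generation], 3))
```

With these assumed fitness values the rare allele A starts at 1% and steadily becomes more common, which is all that "evolution by natural selection" means at this level of description.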

The structure of DNA

Soon after 1900 and the full understanding of Mendel's genetics, it became clear that the chromosomes, which could be seen in cells under the microscope, were the structures that carried the genetic code.

Chromosomes contain both DNA (deoxyribonucleic acid) and protein. Until the late 1940s, no one knew whether the genetic information was coded in the protein or the DNA. Most scientists believed that the code resided in the protein, which was considered to be complex enough to encode the complexity of everything from oak trees to man. In the late 40s and early 50s, the hypothesis that DNA was responsible for carrying the genetic code was ridiculed at first, as DNA was considered to be far too simple a molecule to encode life. After all, although it is a big molecule, its constituents are simple, comprising a limited set of different nitrogen-containing bases (four different sorts as it happens) each attached to a sugar and a phosphate group. The backbone of DNA is a polymer of alternating sugar and phosphate groups. However, in 1952, a very elegant radio-labelling experiment, conducted by Alfred Hershey and Martha Chase, (go here for more information) proved definitively that the genetic code was carried in the DNA and not the protein. The race was on to discover the structure of DNA on which so much depended. Very famous scientists, including Linus Pauling, became involved in the search for the structure of DNA. The reason that finding the structure was and is so important is that the structure holds the key to many things in the way the code works and is replicated from cell to cell and generation to generation. In fact, the key to the very code itself resides in the structure.

In 1953, Jim Watson, an American biologist, and Francis Crick, an English physicist, (in a wonderful example of interdisciplinary collaboration) arrived at the answer. They both worked at the Cavendish Laboratory - the Cavendish is the physics laboratory at Cambridge University. They did have the help of a wee bit of information about the X-ray diffraction pattern of DNA that gave them a critical clue, and that had been all but purloined from Rosalind Franklin, who worked at King's College, London. Rosalind Franklin is only now, many years after her death, getting the recognition she deserves. She has become known as 'The Dark Lady of DNA'.

Anyway, in this, the 50th anniversary year of the discovery of the structure of DNA, it is amazing how much progress has been made in understanding the genetic code and in mapping the genome of many organisms including humans. (The legend goes that Crick and Watson used to have lunch at the Eagle pub in Cambridge every day and that one day they turned up and announced to the assembled locals that they had discovered the secret of life. Francis Crick has become a bit of a recluse, and he stayed at home during this year's celebrations, but Jim Watson was in town and the locals say that he visited the pub on a nostalgic whim. Word got out that he was in the pub and the crowd soon overflowed into the street and stopped the traffic; a far cry from the early 1950s when the discovery was largely ignored. It probably wasn't until the late 1950s when the true significance of the decoding of the structure of DNA was fully understood.)

The most famous understatement in the history of science reporting appears in Crick and Watson's seminal paper published in Nature in April 1953:

'It has not escaped our notice that the specific pairing we have postulated immediately suggests a possible copying mechanism for the genetic material'.

Go here for a facsimile of the original paper and here for an annotated version

The structure that they came up with, that also explained the deliciously elegant copying mechanism (the copying mechanism was eventually confirmed experimentally by Meselson and Stahl in 1958), was this: a pair of helices wound around a common axis, each with a polymer backbone of phosphate-sugar. The helices are joined together by hydrogen bonds between pairs of bases, with each member of the pair on one of the two helices. There are four possible bases to select from: two purines, adenine (A) and guanine (G), and two pyrimidines, cytosine (C) and thymine (T).

If there is a purine on one helix it bonds only with a pyrimidine in the opposing position on the other helix; in fact, adenine bonds only with thymine and cytosine with guanine. However the base sequence on any one helix is not constrained and can follow any sequence and the sequence can be arbitrarily long. If the helices are separated, each carries the entire information and by assembling a new helix from free bases the entire double helix can be replicated.

A, C, G, T are then the four letters of the genetic code. Everything that is required to pass the information from parent to create the offspring is written in this language of four letters.
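The pairing rule is simple enough to write down in a few lines. This is a sketch only: the names are mine, and it ignores the fact that the two strands run in opposite chemical directions, which does not matter for the point being made here.

```python
# Base-pairing rule: adenine pairs with thymine, cytosine with guanine.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complementary_strand(strand):
    """Build the partner strand, base by base, from the pairing rule."""
    return "".join(COMPLEMENT[base] for base in strand)


def replicate(strand_one, strand_two):
    """Separate the two strands and assemble a new partner for each.

    Returns two double helices, each made of one old and one new strand.
    """
    first_copy = (strand_one, complementary_strand(strand_one))
    second_copy = (complementary_strand(strand_two), strand_two)
    return first_copy, second_copy


one = "ACCGTA"
two = complementary_strand(one)
print(two)                  # TGGCAT
print(replicate(one, two))  # two identical double helices
```

Because each strand fully determines its partner, separating the helices and rebuilding the missing half of each gives two copies of the original.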

How does the genetic code work?

Now that we understand the structure of DNA and the fact it is a very big molecule with a very simple but elegant underlying structure that contains a code, how does the code work and how is information passed on from generation to generation?

Living things are made of proteins - proteins are vastly complicated structures that fold and twist into clumps and chains and loops and other functional structures. The primary structure of proteins is, however, a sequence of amino acids (also called residues). Amino acids, in nature, come in twenty different varieties, and a protein can be anything from a few tens of amino acids long to thousands long. (There is a vast and vigorous field of current biological research into understanding and predicting how the primary sequence structure of proteins is folded into secondary and tertiary structures).

Now then, in order to make a living thing we need to specify the specific proteins from which it is made, and the other proteins that promote and mediate the building of structures and organs in more complicated organisms (since most living things consist of more than a ball of protein).

All the immensely, unimaginably complicated structures and processes that go to make up living things are coded with just the four letters of the genetic code: A, C, G and T. It works like this: the letters on the DNA make three-letter words. The words are always three letters long. These words are called codons. Any combination of three letters is allowed. Each codon specifies one amino acid in a protein. Those of you of a mathematical bent will realise that there are 64 different possible ways of arranging three letters chosen from four possibilities: 4 x 4 x 4 = 64. There are however only 20 amino acids. Three of the codons actually code the instruction 'stop' that tells the mechanism that translates between DNA and protein where the end of the protein sequence is. However there is still considerable 'redundancy' in the code. Generally speaking more than one codon represents each amino acid (some amino acids are coded by up to six different codons).
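For those who like to check the arithmetic, the counting can be done in a few lines. The sketch below builds the standard codon table from its usual compact form, in which one-letter amino acid codes are listed in the order of the codons and '*' marks a stop; the variable names are my own.

```python
from collections import Counter
from itertools import product

BASES = "TCAG"
# Standard genetic code, one letter per codon in the order TTT, TTC, TTA, TTG, TCT, ...
# '*' marks the three stop codons.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

codons = ["".join(letters) for letters in product(BASES, repeat=3)]
CODON_TABLE = dict(zip(codons, AMINO_ACIDS))

codons_per_amino_acid = Counter(CODON_TABLE.values())
stop_codons = codons_per_amino_acid.pop("*")

print(len(codons))                          # 64 possible three-letter words
print(stop_codons)                          # 3 of them mean 'stop'
print(len(codons_per_amino_acid))           # 20 amino acids share the other 61
print(max(codons_per_amino_acid.values()))  # the most redundant have 6 codons
```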

The process of making proteins from DNA uses another polymer chemical, like DNA but somewhat different, called RNA (ribonucleic acid). RNA comes in single strands and contains the same sequences of bases as in DNA (except that thymine is replaced by a different base, uracil - so the four letters of RNA code are A, C, G and U). First, an enzyme called RNA polymerase separates the two strands of the double helix. One strand of the DNA is used as a template to synthesise a strand of RNA called nuclear RNA (nRNA) that contains the information of one gene. In a process called excision and splicing, non-coding portions of the gene are removed and the coding portions spliced together into a single continuous sequence, exactly one gene long, called messenger RNA (mRNA). Each gene codes for one protein (there are some exceptions to this - a few genes code for the creation of other cellular material, such as tRNA - see below). This process is called transcription and is considerably more complicated than I have described here. The mRNA is transported out of the cell nucleus to the location where the protein will be made.

At this location, a cellular mechanism called a ribosome moves along the mRNA strand. When the ribosome detects a particular codon (AUG) it takes that as a signal to start a process whereby molecules of transfer RNA (tRNA) are aligned with the matching codons on the mRNA. Each tRNA carries a three-base sequence, called an anticodon, that is complementary to one codon, and brings with it the amino acid associated with that codon. As the ribosome moves along the mRNA, it builds up a sequence of amino acids that bind to one another and eventually form the protein. The whole process comes to an end at a stop codon which has no tRNA associated with it.
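The two steps, transcription and translation, can be sketched in code. This is a deliberately stripped-down illustration under my own assumptions: it skips excision and splicing altogether, ignores the chemical direction of the strands, and uses an invented 16-base template rather than a real gene. The function names are mine.

```python
from itertools import product

# Standard genetic code written for RNA (U in place of T); '*' marks a stop codon.
RNA_BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
RNA_CODON_TABLE = dict(
    zip(("".join(c) for c in product(RNA_BASES, repeat=3)), AMINO_ACIDS)
)

# Pairing between a DNA template base and the RNA base built against it.
DNA_TO_RNA = {"A": "U", "T": "A", "C": "G", "G": "C"}


def transcribe(template_dna):
    """Build the RNA strand that pairs with a DNA template strand."""
    return "".join(DNA_TO_RNA[base] for base in template_dna)


def translate(mrna):
    """Read codons from the first AUG until a stop codon; return the protein."""
    start = mrna.find("AUG")
    if start == -1:
        return ""  # no start codon, so nothing is made
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = RNA_CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break  # stop codon: the protein is finished
        protein.append(amino_acid)
    return "".join(protein)


template = "CCTACCGAACCATTCG"  # invented example, not a real gene
mrna = transcribe(template)
print(mrna)             # GGAUGGCUUGGUAAGC
print(translate(mrna))  # MAW (methionine, alanine, tryptophan)
```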

So now we know how the proteins are made.  Go here for more detailed information.

How is the genetic code stored in cells?

Each cell in an organism contains the entire DNA code for making the entire organism. Humans consist of billions of cells and the entire code is stored in each cell. The DNA is stored in structures called chromosomes. If codons are words and genes are paragraphs, then chromosomes are chapters. The DNA material and the chromosomes themselves are generally not visible in cells, except when the cell is going through a process of division, at which point the chromosomes condense and become visible.

Bacteria generally have only one chromosome that is usually a circle made of double strand DNA that contains most of the genetic code for making the bacterium. Bacteria reproduce by fission, by creating a clone of themselves.

More complex creatures like plants and animals that have cells with nuclei and that reproduce sexually have a very different arrangement. Most of the DNA in such organisms resides in the cell nucleus (some DNA also resides in small separate organelles within the cell called mitochondria and chloroplasts, which are present by the hundreds and thousands in each cell). The DNA in the nucleus, which carries the vast bulk of the information for building the organism, is arranged in a number of chromosomes that come in pairs.

The reason that they come in pairs is that one member of each pair of chromosomes comes from the mother and one comes from the father. Let's take people: we have 23 pairs of chromosomes. Each chromosome on average contains an astonishing 130 million base pairs and about 1300 genes (the human genetic code is about 3 billion base pairs long and we have about 30,000 genes). The chromosomes that come from your father and those that come from your mother are almost, but not quite, identical. Within each species' gene pool there are a number of different versions of each gene that are ever so slightly different from one another, and each version is called an allele. The chromosomes that you inherit from each parent will be a little different as they will contain different alleles of some of the genes that reside on that chromosome.

Each person originally arose from the fusion of a sperm cell with an egg. These germ cells, unlike others in the body, have only one copy of each chromosome and when they fuse they create a cell with two copies of each chromosome. But in the germ cells, egg and sperm, that fused to start the process of making you, as in all germ cells, it was pot luck in the case of each chromosome, say from your mother, whether you got the version from your maternal grandfather or from your maternal grandmother. The same goes for the chromosomes you inherit from your father; some of the chromosomes you inherit from your father will have come from your paternal grandfather and some will have come from your paternal grandmother, on a random basis. To mix things up even more, when the germ cells are being replicated (it's called meiosis) there is an exchange of genetic material between the chromosomes that come from the parent's mother and those that come from the parent's father (this 'mixing' process is called crossing over or genetic recombination). No wonder each higher organism is unique (except for identical twins!).
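The 'pot luck' can be put into numbers. The sketch below counts only the shuffling of whole chromosomes and leaves crossing over out entirely, so it understates the real variety; the function names and the labels are my own.

```python
import random

CHROMOSOME_PAIRS = 23


def gamete_combinations(pairs=CHROMOSOME_PAIRS):
    """Number of different chromosome sets one parent can put into a germ cell."""
    return 2 ** pairs


def make_gamete(rng, pairs=CHROMOSOME_PAIRS):
    """For each pair, pick at random the copy that came from the parent's
    mother or the copy that came from the parent's father."""
    return [rng.choice(("from grandmother", "from grandfather")) for _ in range(pairs)]


print(gamete_combinations())       # 8388608 possible eggs or sperm per parent
print(gamete_combinations() ** 2)  # 70368744177664 possible combinations for a child

rng = random.Random(2003)  # fixed seed so the example is repeatable
print(make_gamete(rng)[:5])
```

Even before crossing over is taken into account, two parents can produce more than 70 million million different combinations of chromosomes.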

The original fertilised egg, now with two copies of each chromosome, divides in a process called mitosis to create two and then four cells and so on. Division continues with cells eventually differentiating into all the organs and structures of the body. But each body cell (somatic cell) contains the entire genetic code for the individual in two copies of each of the 23 chromosomes (actually there is one pair of chromosomes which are not very similar to one another, the X and Y chromosomes. Female mammals have two X-chromosomes which are a normal size of chromosome and contain a substantial quantity of genetic material. Male mammals have one X-chromosome and one Y-chromosome. The Y-chromosome is a tiny little degraded thing that carries very little genetic information other than the sex-determining genes).

The genetic code is a recipe not a blueprint

Don't run away with the idea that the genetic code specifies a one to one relationship between a single gene and an organ (or even a function). There is no such thing as the gene for a finger, an eye or a lung, or a gene for blood or bile. In working with his peas, Mendel was very, very lucky to choose traits that were controlled by a single gene. Very few traits are under the control of a single gene, and most are controlled by the extremely complex interaction of several, often many, different genes, each of which appears in several alleles.

Take human eye colour for example. It's under the control of at least three different genes that reside on different chromosomes. And these are not genes 'for' eye colour. Although they, working together, do control eye colour, they control and influence many other things as well.

Biochemical pathways are breathtakingly complex, and as we unravel the way they are controlled genetically, it's clear that it is not a simple picture. Similarly with the genes that control the anatomical development of the organism - entire books are written about the development of eyes or limbs in mammals, and books' worth of knowledge is added every year.

Mutation, variation and creation of new genetic information

So, how does variation occur from generation to generation? Variation is the raw material on which Natural (and Sexual) Selection acts.

Well, the copying of chromosomes, either in the process of creating new somatic cells (mitosis) or in the process of creating germ cells (meiosis), is extremely accurate. There are fantastically complex biochemical processes around DNA replication that act to proof read and correct errors. But don't forget that in the process of replicating a single cell, 6 billion bases have to be copied; and some mistakes are inevitable.

Furthermore, certain environmental factors, such as high energy radiation, can cause random damage to the DNA molecule. The processes that repair such damage are very good but not perfect and sometimes the repair has errors.

Of course, once a change is made to the DNA sequence in a germ cell that is subsequently fertilised and grows into a fertile adult organism, the change is then locked in and is passed on from generation to generation. Each of us has, on average two substitutions (see below) in our protein coding genes compared with our parents' sequences.

There is a range of different kinds of mutation that can occur:

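For example, take the sequence ACC GTA CTA CTA GGG ATG; suppose the first C is deleted; we now have ACG TAC TAC TAG GGA TG, and every codon from the point of the deletion onwards has changed.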
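The effect of deleting a single base from the sequence above can be checked with a short sketch (the function names are my own). Because the code is read three letters at a time, losing one letter shifts every word that follows.

```python
def codons(sequence):
    """Split a DNA sequence into three-letter words, ignoring any spaces."""
    bases = sequence.replace(" ", "")
    return [bases[i:i + 3] for i in range(0, len(bases), 3)]


def delete_base(sequence, index):
    """Return the sequence with the base at position `index` removed."""
    bases = sequence.replace(" ", "")
    return bases[:index] + bases[index + 1:]


original = "ACC GTA CTA CTA GGG ATG"
first_c = original.replace(" ", "").index("C")  # position of the first C
mutated = delete_base(original, first_c)

print(codons(original))  # ['ACC', 'GTA', 'CTA', 'CTA', 'GGG', 'ATG']
print(codons(mutated))   # ['ACG', 'TAC', 'TAC', 'TAG', 'GGA', 'TG']
```

Note that one of the shifted words, TAG, is one of the three stop codons, so in this example a single deletion would also cut the protein short.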


Over time, organisms have become more complex and new functions have been added. How does that happen? The mechanism that currently has most support is gene duplication and mutation. The idea is that a gene is duplicated so that two copies exist in the genome. Before duplication, a gene cannot change much because it is needed and constrained by the functions it performs. After duplication, one of the two copies of the gene acquires the ability to change significantly as it is no longer constrained by functionality; or the two copies can share the functionality originally carried out by just one thus allowing each to become more specialised and to diverge in sequence and function over time.


Speciation

The theory of evolution calls for species to diverge, for new species to arise and for most old species to become extinct. How do new species arise? Well, a species of advanced organisms is defined, in the biological species concept, as that set of individuals that would choose to mate in the wild and would produce viable fertile offspring.

There are three basic hypotheses of speciation. They are:

The major steps leading from Bacteria to Man

Two major lines of evidence exist for the occasionally tentative hypotheses about the steps that life has taken from its beginning to today. The first line of evidence is found in the fossil record and this has been known about for over 200 years. The second line is in the genetic code itself where evolutionary events leave clues and signs behind in the genetic code. Our understanding of evolution has been pushed forward by leaps and bounds in the last 10 years.

Here are the major steps, focusing on animals:

Modern issues and ideas

So what are the issues and ideas that scientists are wrestling with today? Any one of these could provide the material for several books so we can just scratch the surface here.

Randomness of mutations

The normal characterisation of neo-Darwinism is change in allele frequency over time caused by random mutations and Natural Selection. The whole concept of randomness of mutations is currently under scrutiny and there are some senses in which mutations are absolutely not random. They are, however, held by neo-Darwinists to be random with regard to the direction of evolution. Some work on the evolution of bacterial digestion of different foods has challenged this idea in the last ten years.

Neutral Theory

Kimura suggested that much of the direction of evolution over time comes not from beneficial mutations being selected but from the pressure of more or less random genetic drift caused by neutral mutations which occur more frequently than do beneficial ones.
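Genetic drift is easy to see in a toy simulation. The sketch below is a simple random-sampling model of my own choosing, not Kimura's mathematics: a neutral allele confers no advantage at all, and each generation's gene copies are drawn at random from the previous generation's pool. The population size, starting frequency and seed are arbitrary illustrative values.

```python
import random


def drift(p0, population_size, generations, seed=0):
    """Follow the frequency of a neutral allele under pure chance.

    Each individual carries two gene copies, and every copy in the next
    generation is drawn at random from the current pool. No selection acts.
    """
    rng = random.Random(seed)
    copies = 2 * population_size
    p = p0
    history = [p]
    for _ in range(generations):
        count = sum(1 for _ in range(copies) if rng.random() < p)
        p = count / copies
        history.append(p)
    return history


# The same starting point gives different outcomes with different seeds.
for seed in range(3):
    path = drift(p0=0.5, population_size=50, generations=200, seed=seed)
    print(seed, [round(path[g], 2) for g in (0, 50, 100, 200)])
```

The frequency wanders with no selection acting on it, and once it reaches 0 or 1 it stays there: the allele has been lost or fixed by chance alone. The smaller the population, the faster this tends to happen.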

The importance of pseudogenes and other epigenetic material

A great deal of DNA in mammals appears to be non-functional; in fact in humans, the 30,000 genes that code for protein occupy only 1.5% of the total genome. The other 98.5% was thought to consist of junk, garbage and leftovers from the past history of the genome going all the way back to the origin of eukaryotes.

But in the comparison between man and mouse genomes that was published last year a surprising result emerged: 5% of the genome appeared to be under selection - in other words 5% was sufficiently affecting the organism that it could not change willy-nilly like truly non-functional DNA would be able to. If this is so, there is more than three times as much functional genetic material as we thought (of course 95% of the genome was confirmed as not being under selection and is therefore truly non-functional). Functional DNA material that lies outside genes and does not code for protein is part of the so-called epigenetic programme, and we are just beginning to study it. People were shocked when they discovered that humans only have 30,000 genes - but it seems that there is another layer of complexity that we've barely begun to study. Recently, the first pseudogene (a gene fossil that does not create a protein, because it is broken in some way) found to have a function (in mediating its analogous active protein coding gene partner) was reported.

The importance of gene promotion and suppression in determining the way the organism develops is also becoming clearer. The amino acid sequence making up a protein is coded by the gene and is very important, but also critical are the time during development and the place in the body of the organism at which the protein is expressed. Promoter regions, both up-stream and down-stream of the gene, are much more complex in eukaryotes than in prokaryotes, and proteins such as transcription factors control the expression of particular genes both temporally and spatially.

Genes, organisms and species

What is the unit of evolution - is it the gene, the organism or the species? Richard Dawkins pointed out that it is the purpose of DNA genes to replicate themselves. Whereas we originally viewed DNA as being a servant of and providing a function for organisms, another way to consider it, is that organisms act on behalf of the genes. Ten billion cells in the form of a human are more effective at creating another 20 billion cells with the same genes, generation after generation than 10 billion individual cells. Or at least, so the hypothesis goes.

Mayr, of course, points out that Natural Selection does not act on genes but on phenotypes and so he would contend that the unit of evolution is the organism. And Gould and Eldredge would argue for the notion that evolution acts at the level of species and lineages of species, promoting the successful ones and extinguishing the line of the less successful.

Punctuated equilibrium and catastrophism

The mention of Gould and Eldredge brings us to the concept of Punctuated Equilibrium. They noticed that the fossil record could be better interpreted by long periods of species stasis with shorter periods of species change. Dawkins, who is a thoroughgoing gradualist, and Gould, fought tooth and nail over this for years, until Gould's untimely death last year allowed Dawkins to have an unopposed last word.

In one sense, it's a storm in a teacup, as Gould's punctuations still take a long time by our reckoning.

In another sense, the idea that environmental catastrophes have occurred in the past, such as the huge bolide event that led to the extinction at the end of the Cretaceous and the vulcanism that led to the even greater extinction at the close of the Permian, is making a comeback after having been suppressed for decades by strict Darwinian gradualism.

Evolutionary Development

Evolutionary development (Evo-devo) is the study of how evolution affects the development of organisms and is affected by it. It is an immensely rich field at the moment. One idea is that evolution is not just mutation and selection, but includes mutational changes in the epigenetic programme in development and selection.

Studies of things like hox genes, which control the basic body plan of animals at a very high level, and of how they interact with other genes, are continuing apace.  Also being studied are the important families of transcription factors such as the forkhead binding or FOX family of genes that control the development of organisms at an anatomical and biochemical pathway level.

Comparative genomics

This is the comparison of the genome of one creature with another. It's turning out to be an immensely powerful way to understand evolutionary history. The recent comparison of mouse and human genomes shed a tremendous amount of light on the evolution of humans.

With the imminent publication of the full human genome (so far we have had only a draft) and the draft chimpanzee genome, the stage is set for further revelations as we compare the two.


Go here for a list of books that contain further detail on many of these subjects
