Mouse repeats

2. Repeat sequences in the mouse and human genomes

Alec MacAndrew

The draft mouse genome was published on 6th December 2002: Waterston et al., Nature 420, 520-562.

Note that this is a 43-page paper (Nature averages 2-3 pages per paper) with around 200 authors and 330 references. This is all new to science, and if one includes the references the volume of material exceeds that of a very fat textbook. The details are published not in a single paper but in about six related papers, which occupy more than half of the unusually fat 6th December issue of Nature.

Repeat sequences

Repeat sequences are very common in mammalian genomes, and comparing the repeat content of the mouse and human genomes is very illuminating. Repeat sequences are mainly transposable elements, either active or inactive, such as transposons and retroviruses. Once a repeat becomes inactive, it changes at the neutral mutation rate, that is, the rate at which mutations that do not affect the organism become fixed in the genome. Understanding how far, and in what way, these sequences have changed from their original form therefore tells us a great deal about the evolutionary history of the human and mouse genomes.
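As a rough illustration of measuring "how far a repeat has changed from its original form", here is a minimal Python sketch that compares an aligned repeat copy with its family consensus. It assumes the two sequences are already aligned (gaps written as '-'), counts mismatches only where both have an unambiguous base, and applies the Jukes-Cantor correction for multiple hits at the same site. The function names are hypothetical, and the original analysis may well have used a different substitution model.

```python
import math


def p_distance(copy: str, consensus: str) -> float:
    """Fraction of mismatched sites between two pre-aligned sequences.

    Columns where either sequence has a gap or an ambiguous base are skipped.
    """
    if len(copy) != len(consensus):
        raise ValueError("sequences must be aligned to the same length")
    compared = mismatches = 0
    for a, b in zip(copy.upper(), consensus.upper()):
        if a in "ACGT" and b in "ACGT":
            compared += 1
            if a != b:
                mismatches += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return mismatches / compared


def jukes_cantor(p: float) -> float:
    """Correct an observed p-distance for multiple substitutions at one site."""
    if p >= 0.75:
        raise ValueError("too diverged: Jukes-Cantor distance is undefined")
    return -0.75 * math.log(1 - 4 * p / 3)


# Hypothetical toy alignment: one mismatch in ten comparable sites.
consensus = "ACGTACGTAC"
copy = "ACGTTCGTAC"
p = p_distance(copy, consensus)
print(f"observed p = {p:.3f}, corrected d = {jukes_cantor(p):.4f}")
```

For young copies the correction changes little; for older copies it grows quickly, because the observed difference increasingly understates the substitutions that have actually occurred.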

46% of the human genome consists of recognisable interspersed repeats, compared with only about 2% that is protein-coding sequence. There may be more repeats, but elements older than about 200 million years have mutated so much that they are no longer recognisable. About 38% of the mouse genome consists of recognisable interspersed repeats; this percentage is probably lower than in human because the higher substitution rate in mouse (see below) has obscured older repeats.

There are two sorts of repeat sequence:

1) ancestral repeats, which were inserted into the genome before the mouse and human lineages diverged. These can be recognised by their syntenic position: a repeat sequence that occupies the same relative position in both the mouse and the human genome must have been inserted before the two lineages diverged.

2) lineage-specific repeats, which have been added to the mouse or the human genome since the two lineages diverged from their common ancestor. These are recognised by being present in only one of the two genomes, or by not occupying a syntenic position in the other.
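The ancestral versus lineage-specific distinction can be sketched in code. This minimal Python version assumes a hypothetical pairwise alignment given as gap-free blocks mapping genome A (say, human) onto genome B (mouse), plus repeat annotations for both; the class names, the 50% overlap threshold and the toy coordinates are illustrative assumptions, not the consortium's pipeline. A repeat counts as ancestral only if it projects into the other genome and lands on a copy of the same family.

```python
from typing import List, NamedTuple, Optional, Tuple


class Repeat(NamedTuple):
    start: int   # half-open interval [start, end)
    end: int
    family: str


class Block(NamedTuple):
    """Gap-free aligned block: A[a_start:a_end] aligns to B[b_start:b_start + (a_end - a_start)]."""
    a_start: int
    a_end: int
    b_start: int


def project(start: int, end: int, blocks: List[Block]) -> Optional[Tuple[int, int]]:
    """Map [start, end) from genome A to genome B, or None if it is not inside one block."""
    for blk in blocks:
        if blk.a_start <= start and end <= blk.a_end:
            shift = blk.b_start - blk.a_start
            return start + shift, end + shift
    return None


def classify(rep: Repeat, blocks: List[Block], repeats_b: List[Repeat],
             min_overlap: float = 0.5) -> str:
    """Return 'ancestral' if a same-family copy sits at the orthologous position in B."""
    mapped = project(rep.start, rep.end, blocks)
    if mapped is None:
        # No orthologous sequence in B: typical of an insertion after divergence.
        return "lineage-specific"
    b_start, b_end = mapped
    for other in repeats_b:
        if other.family != rep.family:
            continue
        overlap = min(b_end, other.end) - max(b_start, other.start)
        if overlap >= min_overlap * (b_end - b_start):
            return "ancestral"
    return "lineage-specific"


# Toy data (hypothetical coordinates).
blocks = [Block(0, 1_000, 5_000)]
mouse_repeats = [Repeat(5_100, 5_400, "MIR")]
print(classify(Repeat(100, 400, "MIR"), blocks, mouse_repeats))      # ancestral
print(classify(Repeat(2_000, 2_300, "Alu"), blocks, mouse_repeats))  # lineage-specific
```

The sketch requires a repeat to fall inside a single gap-free block, so an ancestral copy interrupted by a later indel would be misclassified as lineage-specific; a real comparison has to tolerate such breaks.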

32% of the mouse genome consists of lineage-specific repeats, compared with only 24% in human. More transposon-derived DNA has therefore been added to the mouse genome than to the human genome since divergence.

22% of the entire human genome consists of ancestral repeats, whereas only 5% of the mouse genome has recognisable ancestral repeats. The reason seems to be that the mouse genome mutates faster, so the ancestral repeats that can still be recognised in the human genome reach further back in time than those that can be recognised in the mouse genome.
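These percentages can be cross-checked with a little arithmetic. The sketch assumes genome sizes of about 3,000 Mb for human (the 3 billion nucleotides quoted below) and about 2,500 Mb for mouse (an assumed round figure, not stated in this article). It checks that ancestral plus lineage-specific repeats roughly add up to the interspersed-repeat totals given earlier, and converts each class to megabases.

```python
# Percentages quoted in the text; genome sizes in Mb are assumptions (see lead-in).
GENOMES = {
    "human": {"size_mb": 3_000, "total": 46, "ancestral": 22, "lineage_specific": 24},
    "mouse": {"size_mb": 2_500, "total": 38, "ancestral": 5, "lineage_specific": 32},
}


def repeat_megabases(info: dict) -> dict:
    """Convert each repeat class from percent of genome to megabases."""
    return {k: info[k] * info["size_mb"] / 100
            for k in ("ancestral", "lineage_specific")}


for name, info in GENOMES.items():
    mb = repeat_megabases(info)
    summed = info["ancestral"] + info["lineage_specific"]
    print(f"{name}: {summed}% classified vs {info['total']}% total; "
          f"ancestral {mb['ancestral']:.0f} Mb, "
          f"lineage-specific {mb['lineage_specific']:.0f} Mb")
```

Under these assumptions the two classes sum to 46% for human and 37% for mouse, matching the totals given earlier to within rounding. The mouse has gained roughly 800 Mb of lineage-specific repeats against roughly 720 Mb in human, so the mouse excess holds in absolute terms as well as in percentage terms.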

Since the two copies of an ancestral repeat have, by definition, been diverging for exactly the same length of time in the two lineages, ancestral repeats are a powerful way to measure mutation rate. The data indicate that, on average, mouse mutates twice as fast as human on a time rather than a generational basis (i.e. twice as fast measured in years, as opposed to generations). The least diverged ancestral repeats (i.e. those inserted into the genome just before divergence) show 0.17 substitutions per site in human and 0.34 substitutions per site in mouse. Taking 75 million years as the date of divergence gives a neutral rate of 2.2 × 10^-9 substitutions per site per year for human and 4.5 × 10^-9 for mouse. Note that these are averages over 75 million years. Data from more recent divergences (mouse/rat and human/gibbon) indicate a much higher current rate for mouse, about five times that of human.

The human rate corresponds to about 4 × 10^-8 per generation, or about 120 new substitutions per individual across the 3 billion nucleotides of the human genome. That means that each of us carries, on average, about two mutations in active protein-coding genes that were not present in our parents, and that there are therefore several billion mutations in active genes across the world population. Contrary to creationist claims, there are more than enough mutations occurring, and fixing, in genomes to explain the observed rate of evolution and the differences between species.
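The arithmetic above can be reproduced in a few lines. The substitutions per site, the 75-million-year divergence, the 3 billion nucleotides and the roughly 2% coding fraction come from the text; the 20-year human generation time is an assumption introduced here, since the article does not state the figure it used.

```python
SUBS_PER_SITE = {"human": 0.17, "mouse": 0.34}  # least-diverged ancestral repeats
DIVERGENCE_YEARS = 75e6
HUMAN_GENOME_SITES = 3e9        # one haploid copy of the human genome
CODING_FRACTION = 0.02          # "about 2%" protein-coding sequence
GENERATION_YEARS = 20           # assumption: not given in the text


def rate_per_year(subs_per_site: float, years: float = DIVERGENCE_YEARS) -> float:
    """Neutral substitution rate per site per year along one lineage."""
    return subs_per_site / years


human = rate_per_year(SUBS_PER_SITE["human"])
mouse = rate_per_year(SUBS_PER_SITE["mouse"])
per_generation = human * GENERATION_YEARS
new_per_genome = per_generation * HUMAN_GENOME_SITES
new_in_coding = new_per_genome * CODING_FRACTION

print(f"human {human:.2e}/yr, mouse {mouse:.2e}/yr, ratio {mouse / human:.1f}")
print(f"human per generation {per_generation:.1e}")
print(f"new substitutions per genome ~{new_per_genome:.0f}, "
      f"in coding sequence ~{new_in_coding:.1f}")
```

With a 20-year generation this gives about 4.5 × 10^-8 per generation and roughly 136 new substitutions per 3 billion nucleotides, the same order as the rounded 4 × 10^-8 and 120 quoted above; a slightly shorter generation time brings the two closer. Either way, the coding-sequence figure comes out at two to three per individual.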

By dating the insertion of different types of repeat (from how far each copy has diverged from the consensus, or original, sequence of its family), we can reconstruct the rate at which transposons have inserted into each genome. The rate has remained fairly constant in mouse, but in human it rose to a peak about 40 million years ago and has since dropped to a very low level.
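The dating method described above can be sketched as follows. It assumes the divergence of each copy from its family consensus has already been measured and corrected for multiple hits, and converts it to an insertion age using the human neutral rate of 2.2 × 10^-9 per year quoted above, assumed constant over time. Because a copy only starts diverging from the consensus when it is inserted, and only along its own lineage, the age is simply divergence divided by rate. The divergence values and the 10-million-year bin width are hypothetical, and counting copies rather than base pairs is a simplification.

```python
from collections import Counter
from typing import Dict, Iterable

HUMAN_RATE_PER_YEAR = 2.2e-9  # neutral substitutions per site per year (from the text)


def insertion_age_my(divergence: float, rate: float = HUMAN_RATE_PER_YEAR) -> float:
    """Age in millions of years, assuming a constant neutral rate since insertion."""
    return divergence / rate / 1e6


def age_profile(divergences: Iterable[float], bin_my: int = 10,
                rate: float = HUMAN_RATE_PER_YEAR) -> Dict[int, int]:
    """Count repeat copies per age bin; keys are bin start ages in millions of years."""
    counts: Counter = Counter()
    for d in divergences:
        bin_start = int(insertion_age_my(d, rate) // bin_my) * bin_my
        counts[bin_start] += 1
    return dict(sorted(counts.items()))


# Hypothetical divergences of four copies from their family consensus.
print(age_profile([0.011, 0.033, 0.035, 0.09]))  # {0: 1, 10: 2, 40: 1}
```

A profile of this kind, built per repeat family and per genome with each genome's own rate, is the sort of plot in which a peak of insertion activity such as the human one around 40 million years ago shows up.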

Another interesting question is where in the genome the repeats appear. SINEs (short interspersed nuclear elements) show a very interesting phenomenon. The SINE families active in mouse and human are different: Alu in human, and B1, B2, ID and B4 in mouse. However, they accumulate in the same places in the human and mouse genomes, probably indicating that the two genomes tolerate SINE insertions in the same places.
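One simple way to test whether two unrelated SINE families "accumulate in the same places" is to divide the genome into orthologous windows and correlate the SINE density in each human window with that in its mouse counterpart. The minimal sketch below computes a Pearson correlation; the window densities are made-up numbers, and the pairing of orthologous windows is assumed to come from a whole-genome alignment not shown here. A rank correlation (Spearman) would be more robust to the skewed densities typical of real data.

```python
import math
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least two values")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sx * sy)


# Hypothetical SINE coverage (fraction of bases) in five orthologous windows.
alu_human = [0.05, 0.12, 0.20, 0.08, 0.15]
b1_mouse = [0.03, 0.07, 0.11, 0.05, 0.09]
print(f"r = {pearson(alu_human, b1_mouse):.2f}")
```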

The four homeobox (Hox) clusters are by far the most repeat-poor regions of both the mouse and the human genome, supporting the idea that the fundamental developmental function of these clusters cannot tolerate much change. Other repeat-poor regions 100 kb or longer in human are also repeat-poor in mouse, which could help us identify areas of the genome that contain fundamental developmental or regulatory code. Other repeat-poor regions also coincide between mouse and human, and many of these correspond to single genes. Repeats are scarce in these functionally critical regions because, although repeat sequences have been inserting into the genome over millions of years, selection does not tolerate disruption of these parts of the genome.
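Finding repeat-poor regions like these amounts to scanning the genome in fixed windows and flagging those with little repeat coverage. This minimal sketch assumes repeat annotations given as sorted, non-overlapping half-open intervals on one chromosome; the 100 kb window follows the text, while the 5% threshold and the toy coordinates are hypothetical. Windows flagged in both species at orthologous positions would be the shared repeat-poor candidates discussed above.

```python
from typing import List, Tuple

WINDOW = 100_000  # 100 kb windows, as in the text


def repeat_fraction_per_window(repeats: List[Tuple[int, int]], chrom_len: int,
                               window: int = WINDOW) -> List[float]:
    """Fraction of each window covered by repeats (intervals must not overlap)."""
    n_windows = (chrom_len + window - 1) // window
    covered = [0] * n_windows
    for start, end in repeats:
        pos = start
        while pos < end:  # split an interval that crosses window boundaries
            w = pos // window
            chunk_end = min(end, (w + 1) * window)
            covered[w] += chunk_end - pos
            pos = chunk_end
    # The last window may be shorter than `window`.
    return [covered[w] / min(window, chrom_len - w * window) for w in range(n_windows)]


def repeat_poor_windows(fractions: List[float], window: int = WINDOW,
                        max_fraction: float = 0.05) -> List[Tuple[int, float]]:
    """Return (window start, repeat fraction) for windows at or below the threshold."""
    return [(i * window, f) for i, f in enumerate(fractions) if f <= max_fraction]


# Hypothetical repeat intervals on a 300 kb chromosome.
repeats = [(0, 50_000), (90_000, 110_000), (250_000, 252_000)]
fractions = repeat_fraction_per_window(repeats, 300_000)
print(repeat_poor_windows(fractions))  # [(200000, 0.02)]
```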

LINEs (long interspersed nuclear elements) accumulate more densely on the sex chromosomes in both mouse and human. The Y chromosome of both species carries LINEs at about twice the density found on the autosomes. Since the Y chromosome is so gene-poor, it can tolerate more insertions, and because most of it does not recombine, it cannot purge deleterious mutations by recombination.

There are four times more simple sequence repeats (caused by slippage during DNA replication) in mouse than in human. The reason is not understood, but in my view it can be explained by the greater number of generations in the mouse lineage since divergence, which gives more opportunities for replication errors in the germ line to accumulate.
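Simple sequence repeats are easy to detect computationally. This sketch uses a regular expression to find perfect runs of a 1- to 6-base unit repeated at least five times; both cut-offs are illustrative assumptions, and dedicated tools such as Tandem Repeats Finder use more sophisticated, mismatch-tolerant scoring. Counting such runs per megabase in each genome is one way a genome-wide comparison like the fourfold difference could be made.

```python
import re
from typing import List, Tuple


def find_ssrs(seq: str, max_unit: int = 6, min_copies: int = 5) -> List[Tuple[int, str, int]]:
    """Return (start, unit, copies) for perfect tandem repeats in seq.

    The lazy {1,max_unit}? makes the regex try the shortest unit first, so a
    run such as AAAAAAAAAA is reported with unit 'A' rather than 'AA'.
    """
    pattern = re.compile(r"(([ACGT]{1,%d}?)\2{%d,})" % (max_unit, min_copies - 1))
    hits = []
    for m in pattern.finditer(seq.upper()):
        unit = m.group(2)
        hits.append((m.start(), unit, len(m.group(1)) // len(unit)))
    return hits


print(find_ssrs("GGACACACACACTTAAAAAAG"))  # [(2, 'AC', 5), (14, 'A', 6)]
```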