Tuesday, 27 November 2012

Genome sequencing: the next steps.

The majority of people with access to any kind of mass media will have heard of the Human Genome Project and be aware of genome sequencing, even if they don't know the ins and outs of it. A lot of those people will know about the 1000 Genomes Project, the results of which were published in Nature last month, and plenty will know that the genomes of all of our main model organisms have been sequenced, including Arabidopsis thaliana, Mus musculus, Danio rerio, Drosophila melanogaster, Saccharomyces cerevisiae and (of course, the laboratory workhorse) Escherichia coli (that's thale cress, mice, zebrafish, fruit flies, brewer's yeast and E. coli, respectively). Advances in genome sequencing, particularly in pyrosequencing, mean that sequencing the genome of a whole organism is no longer a major issue. The time consumed by the process, as well as the cost, is coming down rapidly in some kind of biological version of Moore's Law. So now that we're edging ever closer to the ability to personalise human medicine based on our own individual DNA sequence, and we can be sure that the big commercial sequencing companies will keep chipping away at both the cost and time issues, which direction will basic research be taking from now on?

One avenue being pursued is that of 'metagenomics', or the sequencing of genetic material isolated from whole environments or ecosystems. The main interest in metagenomics stems from the fact that 'massively parallel pyrosequencing', a technique based on sequencing between one and one hundred million short DNA sequences in parallel, allows an unprecedented snapshot of the diversity of bacteria present in a given environment.

The process involves the extraction of DNA from environmental samples before cloning into a bacterially derived artificial chromosome capable of accommodating up to 350 kilobases of DNA. The DNA is then amplified via the polymerase chain reaction (PCR) and sequenced. In the past, this would have meant the Sanger chain-termination method of sequencing, which was quite low throughput. Now, pyrosequencing is used, which involves building a strand of DNA based on an immobilised template strand. Each letter of the genetic code (A, T, G and C) is added sequentially to the reaction. As one of the letters is incorporated into the growing strand, a flash of light is emitted. Because only one letter is present in the reaction mixture at a given point in time, it's easy to figure out which letter is being added when the light signal appears.

This gives you a heck of a lot of sequence data but leaves you with a big, big problem. You could be working with approximately 10,000 different species and dealing with an impossibly large number of sequence reads, most of which will be stretches of sequence that have been read several times in the same experiment, so how do you even begin to make sense of this information overload? In short, the answer is 'with great difficulty'. Bioinformaticians have developed programs which should, in principle, assemble the sequences into genomes accurately. However, most of these programs are optimised for single-organism assemblies, not for metagenomic studies. The use of a 'reference' sequence improves accuracy immensely, but there are relatively few bacterial genomes available outside of the main species used in the laboratory, which makes it quite clear that sequencing the genomes of single organisms is far from flogging a dead horse.
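For anyone who thinks better in code, here is a very rough Python sketch of the one-letter-at-a-time idea behind pyrosequencing described above. It is a toy model and not how any real instrument or software works. The function names (simulate_flowgram, call_bases) and the T, A, C, G dispensation order are assumptions made up for illustration, and the signal is idealised as exactly proportional to the number of identical letters added in a row.

```python
# Toy model of pyrosequencing: one nucleotide is offered per "flow" and the
# light signal is idealised as the number of bases incorporated in that flow.
# FLOW_ORDER and both function names are hypothetical, chosen for illustration.

FLOW_ORDER = "TACG"
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def simulate_flowgram(template, flow_order=FLOW_ORDER):
    """Return a list of (base, signal) pairs, one per nucleotide flow.

    `template` is the immobilised strand, written in the order in which
    it is copied. The growing strand is its complement.
    """
    if any(base not in COMPLEMENT for base in template):
        raise ValueError("template may only contain A, T, G and C")

    # The strand being built pairs A with T and G with C.
    target = "".join(COMPLEMENT[base] for base in template)

    flows = []
    pos = 0
    while pos < len(target):
        for base in flow_order:
            # Count how many consecutive positions accept the offered base.
            run = 0
            while pos < len(target) and target[pos] == base:
                run += 1
                pos += 1
            flows.append((base, run))  # a run of 0 means no light this flow
            if pos == len(target):
                break
    return flows


def call_bases(flows):
    """Rebuild the synthesised sequence from the per-flow signals."""
    return "".join(base * signal for base, signal in flows)


if __name__ == "__main__":
    flows = simulate_flowgram("ATGGC")
    print(flows)
    print(call_bases(flows))
```

Because only one letter is on offer in each flow, the position of a signal in the flow order tells you which letter went in, and in this idealised version the size of the signal tells you how many. Real signals are noisy, which is one reason long runs of the same letter are harder to call.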

So, what's the point of all this? Well, it's a pretty big deal. One of the major metagenomics projects is the study of the human microbiome, particularly the gastro-intestinal tract microbiome. Human-associated bacterial cells are estimated to outnumber your own body cells ten to one and species diversity exceeds 10,000, so we simply have to accept that the influence they have over us is enormous. There's even one school of thought, albeit a hotly contested one, that the unit of natural selection in evolutionary terms is not the gene, or the organism, but the organism and all of the associations it forms with microbes. The idea states that an organism is capable of utilising the genome of the microbes it hosts (humans, as an example, use gut bacteria to aid food metabolism) and that the microbial genome evolves at a faster rate than the host genome. This gives us what is called a 'hologenome', and the hologenome's propensity for rapid evolution allows a far greater level of adaptive potential than would be possible when considering the host genome alone.

Quite simply: an understanding of the microbial communities we host will give us a better picture of who we are and where we came from, as well as opening the door to a new generation of medicine, acting in concert with the personalised medicine stemming from the sequencing of individual human genomes.