Rapid DNA sequencing: Technology addressing new problems, solving them, and handing back entire new visions

Ernest Retzel

DNA sequencing is the process of determining the content and order of the G, A, T and C “bases” in the genome of an organism, sometimes called “the code.”  So, what’s the big deal about DNA sequencing?  They have known about DNA since the ’50s, and they have been doing DNA sequencing since the mid-’70s.  It’s even in textbooks!

Indeed, we have been doing sequencing at one level or another since I was a graduate student.  There are over 300 billion bases of DNA sequence data in GenBank, the U.S. national archive of sequence information, representing species from bacteria and viruses to humans, from arachnids to zebra.  Researchers thought that, if we just mined that data, we would probably have ten years of work ahead of us.

In the last two years, however, there have been developments in the technology of DNA sequencing that have changed everything.  At one point in my graduate career, generating 140 bases of sequence data took six months with all the bench work that accompanied the preparation.  DNA sequencing evolved with what seemed like amazing speed over the following 25 years of applying it to biological problems.  Then everything changed – in the past two years, truly new technologies for performing sequencing have been developed.  We can now generate 2.5 billion bases of DNA sequence data in less than a week, and a majority of that time is spent on computer analysis. 
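To put that jump in perspective, here is a back-of-the-envelope sketch using only the two figures quoted above. Treating six months as 26 weeks and "less than a week" as one full week are my simplifying assumptions, and the variable names are purely illustrative.

```python
# Figures quoted in the text above; the names are illustrative only.
BASES_THEN = 140      # bases generated in one graduate-school project
WEEKS_THEN = 26       # "six months", approximated as 26 weeks (assumption)
BASES_NOW = 2.5e9     # bases generated by one modern sequencing effort
WEEKS_NOW = 1         # "less than a week", so one week is an upper bound


def bases_per_week(bases, weeks):
    """Average sequencing throughput in bases per week."""
    return bases / weeks


then_rate = bases_per_week(BASES_THEN, WEEKS_THEN)
now_rate = bases_per_week(BASES_NOW, WEEKS_NOW)
speedup = now_rate / then_rate

print(f"Then: about {then_rate:.1f} bases per week")
print(f"Now:  at least {now_rate:.2e} bases per week")
print(f"Speedup: at least {speedup:.1e}-fold")
```

Under those assumptions the ratio comes out in the hundreds of millions, and it is a lower bound, since the modern run takes less than the full week assumed here.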

There are many ideas in biology that “Changed Everything.”  This can be said a lot less frequently of technologies.  DNA sequencing changed everything early on – genes and genomes became accessible.  A technology known as the polymerase chain reaction (PCR) changed everything, allowing minuscule amounts of DNA to be amplified.  TV shows like CSI demonstrated to the public how powerful it could be.  But high-throughput DNA sequencing changed everything in a way that I have not seen in my career.  Suddenly, a full genome is accessible to almost every researcher in not very much time and for a relatively small amount of money.  The first human genome took 13 years to complete, required hundreds (if not thousands) of people, and cost hundreds of millions of dollars.  The work was beyond arduous; the complexity of the project was almost unimaginable.

By contrast, there is now a major program being pushed forward to develop the technology to deliver the sequence of a human genome for $1,000.  Without waiting for that, though certainly hoping it will happen, the “1000 Genomes Project” is seeking to completely sequence a thousand individuals worldwide to understand everything from the evolution of humankind to the differences between each of us.

That is exciting, even mind-boggling, when you understand not only the possibilities, but also the scale of that data and the scope of the analysis.  Our biology problems are suddenly looking like astrophysics problems in terms of scale.  A sequencing run starts with a terabyte (TB) of raw data (1 TB = 1,000 gigabytes), which is reduced to a few gigs of sequence data; the analysis then generates about 300 gigs of information.  It doesn’t fit well in a spreadsheet.  Each machine we have in the lab generates that much data in two days.  And we have six machines just at NCGR.
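For a rough sense of what those numbers add up to, here is a minimal sketch of the arithmetic. It assumes every machine runs back to back all year, counts only the raw data and the analysis output of each run (the few gigs of sequence data are ignored), and uses names I made up for illustration; real usage will differ.

```python
# Figures from the text; the names and the non-stop schedule are assumptions.
RAW_TB_PER_RUN = 1.0        # raw data per sequencing run, in terabytes
ANALYSIS_GB_PER_RUN = 300   # analysis output per run, in gigabytes
GB_PER_TB = 1000            # 1 TB = 1,000 GB, as defined in the text
DAYS_PER_RUN = 2            # each machine finishes a run in two days
MACHINES = 6                # machines at NCGR


def runs_per_year(machines=MACHINES, days_per_run=DAYS_PER_RUN, days=365):
    """Number of complete runs across all machines in one year."""
    return machines * (days // days_per_run)


def yearly_volume_tb(machines=MACHINES, days_per_run=DAYS_PER_RUN, days=365):
    """Total raw plus analysis data per year, in terabytes."""
    per_run_tb = RAW_TB_PER_RUN + ANALYSIS_GB_PER_RUN / GB_PER_TB
    return runs_per_year(machines, days_per_run, days) * per_run_tb


print(f"Runs per year: {runs_per_year()}")
print(f"Data per year: about {yearly_volume_tb():.0f} TB")
```

Under these assumptions the total comes to well over a petabyte a year, which is why the comparison with astrophysics is not an exaggeration.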

“But wait, there’s more!!” 

The idea of a “personal genome” is now within reach, and there is in fact the Personal Genome Project.  As cool as that is, there is so much more we can do now.  There are only 20,000-30,000 genes in most animals and plants, and a lot of the genome (generally 90% or more) is not accounted for in genes.  We used to refer to this as “junk DNA.”  In the last couple of years, because of the deep sequencing we can obtain with the new technologies, we have found that over 95% of the human genome is biologically important and useful.  We don’t yet know what all of it does, but we know it is doing something.  Whole new classes of RNA molecules have been defined.  It has been shown that there is a dynamic process occurring between these newly discovered classes of molecules and the RNA molecules that code for proteins, and that this process defines how things are controlled in a cell.

On a whole different topic, we can now take a tissue that is infected with a virus or a bacterium, and see what happens in the process of the infection – what host genes are turned on and when, what viral genes are turned on, and in what order.  And we look at them ALL at the same time, in the same sequence-based snapshot of an infection.  We have taken plants that have been studied for years, whose genome sequence has been explored in detail, and we have discovered areas that are only expressed in certain tissues at a very specific time in plant replication.  In some areas, plants make excellent models even for humans.  You might not think that plants have a lot in common with humans, but the replication process is similar in many respects, and we can study mutations made in plant genes without going to jail!

With this depth of potential understanding of the genome, I have noticed that my colleagues have begun talking not just about gene insertion or breeding but about engineering plants for extremely complex characteristics.  Most recently, this has arisen from plant studies related to bioenergy and biofuels, where we talk about how to increase the levels of certain traits (sugars for fermentation and oils for biodiesel) while modifying the structural characteristics that sequester those products (reducing lignin in trees, for example).

Beyond this, there is an entirely new science of metagenomics.  A bit of background: first, over 99.9% of the microbial life on the planet remains completely unidentified, largely because we are not able to grow these organisms in the laboratory – we have not identified their nutritional requirements well enough to mimic their growth environment.  Second, these organisms frequently create what you might call a meta-organism – many organisms living in balance within an environment.  That environment might be a soil sample, or an intestinal tract or mouth, a hot spring or an ocean.  Small changes in those environments cause shifts in the population; for example, shifting the temperature or the carbon dioxide level over a plot of earth can cause a shift in the representation of organisms that are present in the soil.  The sensitivity and immense output of even our current sequencing technologies let us take a sample of those environments, and even though we can’t culture those organisms, we can explore the families they likely belong to by sequencing their metagenome, the aggregate DNA from the pool of organisms.

Everything has changed.  The possibilities and questions are endless.  There are important questions about the ethics and privacy of genomic information and about the genetic engineering of plants and animals that need to be resolved.  Beyond those questions, though, is a goldmine of understanding of the natural world.

 
