“Find and Replace” Across An Entire Genome

It couldn’t be easier to make sweeping edits on a computer document. If I were so inclined, I could find every instance of the word “genome” in this article and replace it with the word “cake”.

Now, a team of scientists from Yale and Harvard Medical School have done a similar trick for DNA. Geneticists have long been able to edit individual genes, but this group has developed a way of rewriting DNA en masse. And they’ve used it to recode the entire genome of a bacterium.

Their success was possible because the same genetic code underlies all life. It’s written in the four letters (nucleotides) that chain together to form DNA: A, C, G and T. Every set of three letters (or ‘codon’) corresponds to a particular amino acid, one of the building blocks of proteins. For example, GCA codes for alanine; TGT means cysteine. The chain of letters is translated into a chain of amino acids until you get to a stop codon. These special triplets act as full stops that indicate when a protein is finished.

This code is virtually the same in every gene on the planet. In every human, tree and bacterium, the same codons correspond to the same amino acids, with only minor variations. The code also includes a lot of redundancy. Four DNA letters can be arranged into 64 possible triplets, which are assigned to only 20 amino acids or a stop codon. So for example, GCT, GCA, GCC and GCG all code for alanine. And these surplus codons provide some wiggle room for geneticists to play around with.
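To make the arithmetic concrete, here is a small Python sketch of that redundancy. It only illustrates the counting: the table is a short excerpt of the standard genetic code rather than the whole thing, and the names `PARTIAL_CODE` and `synonyms` are made up for this example.

```python
from itertools import product

BASES = "ACGT"

# Every ordered triplet of the four DNA letters: 4 * 4 * 4 = 64 codons.
codons = ["".join(triplet) for triplet in product(BASES, repeat=3)]

# A short excerpt of the standard genetic code (deliberately not the full table).
PARTIAL_CODE = {
    "GCT": "alanine", "GCA": "alanine", "GCC": "alanine", "GCG": "alanine",
    "TGT": "cysteine", "TGC": "cysteine",
    "TAA": "stop", "TAG": "stop", "TGA": "stop",
}

def synonyms(meaning):
    """Return every codon in the excerpt that carries the given meaning."""
    return sorted(c for c, m in PARTIAL_CODE.items() if m == meaning)

print(len(codons))          # 64
print(synonyms("alanine"))  # ['GCA', 'GCC', 'GCG', 'GCT']
print(synonyms("stop"))     # ['TAA', 'TAG', 'TGA']
```

Three different triplets all mean "stop", which is the spare capacity the rest of this story depends on.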

The Harvard and Yale team, led by Farren Isaacs and George Church, started by replacing TAG with TAA throughout the entire genome of the common gut bacterium Escherichia coli. Both are stop codons, so there’s no noticeable difference to the bacterium – it’s like replacing every full stop in this post with… a slightly different full stop. But to the team, the genome-wide swap freed up the TAG codon, so that they could reassign it to other amino acids, beyond the usual 20. And that opens up many possible applications for what they’re calling a “genomically recoded organism” or GRO.
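The word-processor analogy has one catch worth spelling out: a codon only means something when it is read in frame. The toy Python sketch below assumes that only the in-frame stop codon at the end of a gene is the target. It is not the team's method, just an illustration of why a blind text replace would be the wrong tool. The gene sequence and the function names are invented for the example.

```python
def split_codons(gene):
    """Split a coding sequence into in-frame triplets, starting at position 0."""
    usable = len(gene) - len(gene) % 3
    return [gene[i:i + 3] for i in range(0, usable, 3)]

def recode_stop(gene, old="TAG", new="TAA"):
    """Swap only the final in-frame stop codon, leaving every other triplet alone."""
    codons = split_codons(gene)
    if codons and codons[-1] == old:
        codons[-1] = new
    return "".join(codons)

# Made-up gene: ATG GCT AGC TGT TAG
# The letters T-A-G also straddle the GCT|AGC boundary, out of frame.
toy_gene = "ATGGCTAGCTGTTAG"

naive = toy_gene.replace("TAG", "TAA")  # blind find-and-replace
careful = recode_stop(toy_gene)         # in-frame stop codon only

print(naive)    # ATGGCTAACTGTTAA  (AGC, serine, has become AAC, asparagine)
print(careful)  # ATGGCTAGCTGTTAA  (only the stop codon changed)
```

The blind replace quietly alters an amino acid in the middle of the protein, while the frame-aware version touches nothing but the full stop.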

Why?

The team is pursuing three applications. First, by assigning codons to new amino acids, they can create a wider range of proteins than the ones living things currently use. These, in turn, could produce new types of drugs or substances, from polymers that can deliver drugs to specific parts of the body to coated surfaces that can prevent the growth of microbes. The idea is that new amino acids will provide chemists and engineers with more options for achieving these goals, just like adding new colours to an artist’s palette changes the range of things they can paint. “You can imagine converting these recoded organisms into factories for producing materials with new and exciting properties,” says Isaacs.

Second, the team could use the tweaked genetic codes to make living things resistant to viruses. Viruses make copies of themselves by hijacking the protein-making factories of their hosts. They depend on the fact that their proteins are encoded by the same triplets as those of their hosts. If their hosts stray from this universal genetic code, their factories will mangle the virus’s instructions, creating distorted and useless proteins. That would be useful for industry as well as medicine. The biotechnology company Genzyme had to shut down a manufacturing plant for several months after it was hit by a contaminating virus. Millions of dollars were lost.

And sure enough, the team’s recoded microbes were less susceptible to at least one type of phage, a virus that kills bacteria. They weren’t invincible by any means, but the colonies did take longer to die. The effect was small, but not unexpectedly so. The TAG codon is rare (which is why the team started with it) and only found at the end of genes. Reassigning it shouldn’t have done that much to hamper a virus. But it did, which suggests that bigger changes might even lead to complete protection.

Third, and for similar reasons, the altered codes could be used to contain genetically modified organisms, preventing them from breeding with wild populations. It’s the geneticist’s version of the Tower of Babel story – modified creatures would be imprisoned by their own genetic tweaks, unable to productively exchange genes with natural counterparts.

How?

The recoding relied on two complementary technologies, invented by the team – MAGE, which substitutes TAA for TAG in separate pieces of bacterial DNA, and CAGE, which knits the pieces together into a whole genome.

MAGE, the older of the two techniques, made its debut two years ago. It stands for “multiplex automated genome engineering”, a fancy way of saying that it can easily change a genome many times over. It was originally used to create millions of small variants of bacterial genomes, producing a multitude of strains that can be tested for new abilities. As Jo Marchant puts it in her excellent feature, it’s an “evolution machine”. In its debut, within a matter of days, it had evolved a strain of E.coli that would produce large amounts of lycopene, a pigment that makes tomatoes red.

MAGE is a versatile editor. Not only can it create many diverse changes in a group of cells, it can also create many specific changes in a single cell. That’s what the team have now done. TAG appears in 321 places throughout the E.coli genome. For each one, the team created a small stretch of DNA that had TAA instead of TAG, surrounded by exactly the same letters. They fed these edited fragments into bacteria, which used them to build new copies of their own DNA. The result: daughter bacteria with edited genomes.
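As a rough picture of what one of those edited fragments looks like, here is a Python sketch that builds a single one from a made-up stretch of DNA. The six-letter flanks, the toy sequence and the name `make_edit_fragment` are assumptions chosen for readability, not details of the team's actual design.

```python
def make_edit_fragment(genome, site, flank=6, old="TAG", new="TAA"):
    """Return a short fragment carrying `new` in place of `old`, with unchanged flanks."""
    if genome[site:site + 3] != old:
        raise ValueError("no target codon at this position")
    left = genome[max(0, site - flank):site]   # letters just before the codon
    right = genome[site + 3:site + 3 + flank]  # letters just after the codon
    return left + new + right

# Made-up stretch of DNA with a single TAG at index 9.
toy_genome = "CCGATTGCATAGGCTTACCGA"
site = toy_genome.index("TAG")

fragment = make_edit_fragment(toy_genome, site)
print(site)      # 9
print(fragment)  # ATTGCATAAGCTTAC
```

The matching flanks tell the cell where the fragment belongs, and the single changed letter in the middle is the edit itself.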

In this way, they created 32 strains of E.coli that, between them, had every possible substitution of TAG to TAA. This might seem overly complicated, but replacing every TAG with TAA in a single step would be inefficient, slow, and error-prone. A single mistake could be lethal for the microbes. By taking things slowly, and spreading the substitutions among 32 strains, the team could better troubleshoot any tricky snags.

To combine the 32 strains into one, the team developed CAGE (or “conjugative assembly genome engineering”). The technique relies on the bacterial equivalent of sex – a process called conjugation where two cells sidle up, form a physical link between one another, and swap DNA.

The team matched their 32 strains up in pairs, in a league that looked like a knock-out sports tournament. One strain of each pair would deliver its edited genes into its partner, and the incoming genes were designed to merge with those of the recipient in specific ways. Thirty-two strains with 10 edits each became sixteen strains with 20 edits each. Sixteen turned into eight and eight into four.
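The knock-out structure is easy to simulate. The Python sketch below deals 321 edit sites across 32 pretend strains and merges them in pairs. It models only the bookkeeping, not the biology of conjugation, and it assumes the number of strains is a power of two so that every round pairs up evenly. The function name `cage_tournament` is hypothetical.

```python
def cage_tournament(strains):
    """Merge strains pairwise, round by round, until one strain holds every edit."""
    rounds = 0
    while len(strains) > 1:
        # Each donor hands its edits to its partner, so every pair becomes one strain.
        strains = [strains[i] | strains[i + 1] for i in range(0, len(strains), 2)]
        rounds += 1
        smallest = min(len(s) for s in strains)
        largest = max(len(s) for s in strains)
        print(f"round {rounds}: {len(strains)} strains, {smallest}-{largest} edits each")
    return strains[0], rounds

# 321 edit sites dealt out across 32 starting strains (about 10 each).
sites = range(321)
starting = [set(sites[i::32]) for i in range(32)]

final, rounds = cage_tournament(starting)
print(rounds)      # 5
print(len(final))  # 321
```

Five rounds are enough because each round halves the field: 32, 16, 8, 4, 2, 1.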

When I first wrote about this in 2011, the team had reached this “semi-final” stage. They had four strains of E.coli, each with a quarter of its genome stripped of TAG codons. Now, they’ve gone all the way, producing a single strain where every TAG is now a TAA. They also managed to get rid of release factor 1 (RF1), a protein that recognises TAG as a stop signal and halts the production of whatever protein’s being made.

The recoded microbe picked up 355 mutations along the way, but it seemed outwardly normal and reproduced at a healthy pace. With TAG free from its duties as a punctuation mark, the team could reassign it to new amino acids, just as they planned. “In a plug and play manner, you can start to pop in new amino acids with new chemistries,” says Isaacs.

And as the team hoped, the new strain was more resistant to viruses than normal ones… but not completely resistant. To realise the ultimate goal of making virus-proof or genetically-contained organisms, they’ll have to do much more than replace one stop codon.

What next?

Next, the team need to start recoding the “sense codons”—the ones that actually correspond to amino acids.  And that is a lot harder. If you alter these sequences, you could screw up how genes are switched on or off, how efficiently or accurately they’re used to make proteins, how well those proteins work once they’re made, and more. And since bacterial genes overlap a lot, if you change a single instance of a single codon, you could be messing up three different genes at once. “There are a lot of things that can go wrong, and that’s not even an exhaustive list,” says Marc Lajoie, the lead author of the new research. “It’s just the stuff we know about.”

Also, sense codons are far more common than stop codons. E.coli has 321 instances of TAG in its genome. Add the next rarest codons—AGA and AGG—and you have upwards of 5,000 changes to make. If you want to recode just the 13 rarest ones (which the team calls the “forbidden codons”), you’d have to make 155,000 changes. Things get difficult fast.

To start with, Lajoie and Sriram Kosuri tried to recode the forbidden codons, swapping them completely for replacements that code for the same amino acid. And rather than doing it across the entire E.coli genome, they focused on recoding just 42 essential genes, one at a time. That makes for a manageable total of 405 changes rather than 155,000. Still, this is the sort of experiment where you imagine scientists interlacing their fingers, stretching their arms out to crack all of their knuckles, and then getting down to it.

“Changing TAG throughout the entire genome was a way of getting our feet wet. That project was intended to succeed,” says Lajoie. “In this one, we were actually looking to fail.” They wanted to see what would work and what wouldn’t.

They found that 26 of the 42 recoded genes were successful: that is, bacteria that carried them survived and, on average, grew just 20 percent slower than their normal kin. And perhaps more importantly, every single one of the 405 forbidden codons could be recoded either individually or in small groups. None of them in itself was a deal-breaker. All of them could be replaced to an extent.

“That was a surprise and very encouraging to us,” says Lajoie. It means that all of these are “amenable to genome-wide removal”. The circumstances that determine success or failure will lie in the quirks of each specific gene, and can potentially be dealt with.

“Through this tour de force of genome engineering, they’ve essentially shown that there are no large fundamental barriers to codon reassignment,” says Chang Liu, a biomedical engineer from the University of California, Irvine. “Rather, it is an exercise in overcoming an array of small hurdles, each of which we already have the technology to address.”

The team is now building on this pilot, and starting to replace sense codons across the entire E.coli genome. That will allow them to take their technique from the world of impressive demos into actual applications. But more than that, it will help them to probe the very nature of our genetic code. How did it evolve? Why is it structured the way it is, with three letters to a codon? And how malleable is it? “Only now do we have the ability to start making fundamental changes to the code and seeing the consequences,” says Isaacs.

Lajoie adds, “We’re only starting to see all of the tangled constraints that determine how genomes work. Nobody understands the full complexity – that’s why it’s so difficult.”

Reference: Lajoie, Rovner, Goodman, Aerni, Haimovich, Kuznetsov, Mercer, Wang, Carr, Mosberg, Rohland, Schultz, Jacobson, Rinehart, Church & Isaacs. 2013. Genomically Recoded Organisms Expand Biological Functions. Science http://dx.doi.org/10.1126/science.1241459

Lajoie, Kosuri, Mosberg, Gregg, Zhang & Church. 2013. Probing the Limits of Genetic Recoding in Essential Genes. Science http://dx.doi.org/10.1126/science.1241460

Note: This post builds upon an earlier one published in 2011. A bit like science, then.