Flu Wiki 2 | Forum / Influenza Genome Project

21 June 2006

anon_22 – at 05:35

There are many questions that remain unanswered about H5N1 and pandemic influenza, many of them because so much remains unanswered about influenza viruses in general. For example, we don’t really know what mechanisms drive a particular avian virus to become one capable of infecting humans. A lot of ‘conventional wisdom’ about flu viruses is based on evidence that is not especially solid and certainly not extensive. To answer such questions, we’ll have to answer some fundamental questions about how flu viruses change in general.

The following is a study by JKT’s team, “Large-scale sequencing of human influenza reveals the dynamic nature of viral genome evolution” (Ghedin et al.), of a collection of samples from New York state.

Abstract: “Scientists now have the ability to sequence large numbers of complete genomes (rather than segments) of flu viruses. Influenza viruses are remarkably adept at surviving in the human population over a long timescale. The human influenza A virus continues to thrive even among populations with widespread access to vaccines, and continues to be a major cause of morbidity and mortality. The virus mutates from year to year, making the existing vaccines ineffective on a regular basis, and requiring that new strains be chosen for a new vaccine. Less-frequent major changes, known as antigenic shift, create new strains against which the human population has little protective immunity, thereby causing worldwide pandemics.

<snip>

Motivated by the need for a better understanding of influenza evolution, we have developed flexible protocols that make it possible to apply large-scale sequencing techniques to the highly variable influenza genome. Here we report the results of sequencing 209 complete genomes of the human influenza A virus, encompassing a total of 2,821,103 nucleotides.

In addition to increasing markedly the number of publicly available, complete influenza virus genomes, we have discovered several anomalies in these first 209 genomes that demonstrate the dynamic nature of influenza transmission and evolution. This new, large-scale sequencing effort promises to provide a more comprehensive picture of the evolution of influenza viruses and of their pattern of transmission through human and animal populations. All data from this project are being deposited, without delay, in public archives.”

anon_22 – at 05:39

This very colorful chart is a very good visual aid to understanding some possible patterns of how viruses can change.

207 gene sequences

“Figure 1. Sites with genetic changes across the ten main proteins in 207 influenza A viruses. Each row represents a single amino acid position in one protein. Amino acids (single-letter abbreviations are used) are colourcoded as shown in the key, so that mutations can be seen as changes in colour when scanning from left to right along a row. For simplicity, only amino acids that showed changes in at least three isolates are shown. Each column represents a single isolate, and columns are only a few pixels wide in order to display all 207 H3N2 isolates in this figure. Isolates are ordered along the columns chronologically according to the date of collection; boundaries between influenza seasons are indicated by gaps between columns.”

anon_22 – at 06:09

OK, that took longer than I expected. I have to go and pick up the new pup. (Yay! She must be BIG by now!! Can’t wait)

I’ll do the comments later (much later, I suspect) today.

<anon goes off with a big grin>

lugon – at 06:19

So if time runs along the horizontal axis (each row represents the evolution of one amino acid over time), then there’s a vertical stripe in 2003 (at about 60–65% of the row’s length) where many amino acids change more or less simultaneously. Maybe because the change in one amino acid elicits some instability that facilitates changes in others? Looks like the recipe for some nice little chaos (scientific term) here!

If edge of chaos situations, and therefore pandemics, are only natural, why don’t we just acknowledge it and prepare, so that we become more resilient in the face of change? Because preparing is darn difficult to think about: facts and imagined scenarios and emotions and other people’s opinions (and our opinions about their opinions) get all mixed up, and then we do what we do.

This is not meant to be off-topic. I really want to see this thread evolve in its own right. It’s just to point out, briefly, that variability, even sudden variability, is a natural thing to happen.

Please look at the details and see if you can predict anything or convince others that “change happens” (that’s the case, isn’t it?). But I personally take this as one more reason to focus on what I can do, such as creating a few wikipages and alerting my people locally and elsewhere. :) We’re listening! Thank you!

anonymous – at 06:22

That is only for normal flu and from October 2005, and the picture is too low resolution. You had better do these things by computer; human visualization is just too slow. The goal is to write a program which reasonably simulates virus evolution. I assume these programs already exist, but the properties of the virus depend on its 3D structure, and this is hard to predict from the sequences. It might be possible to identify a pandemic virus in the near future and verify its pandemic potential. This virus could be used for bioweapons, so all of it would be kept secret. But suppose we have that target virus: we can use such simulations to predict how likely it is to be reached in nature. Of course, we would probably also produce vaccine immediately once we find such a target virus.

anonymous – at 06:24

many of the H3N2-changes are done to escape host-immunity. H5N1 doesn’t have this problem yet

anon_22 – at 08:53

I posted this not for the purpose of human visualization :-), but just to demonstrate the range of possibilities. This is only the beginning of a multi-centre international collaborative effort to build a huge database of complete genome sequences from randomly selected, unbiased samples. Scientists are saying that many of them have such samples sitting around in fridges. These often don't have much importance on their own, but if we gather enough over time, from many different places and different species, the information should tell us a lot.

anon_22 – at 08:57

anon “many of the H3N2-changes are done to escape host-immunity. H5N1 doesn’t have this problem yet”

Just to use your statements as an example, there are a whole lot of assumptions in them that we are actually not completely sure of. Exactly how, and which, changes are made to escape host immunity is a very important question to answer. The ability to track virus sequences over time would help a lot in our understanding.

Monotreme – at 10:31

anon_22, I think Dr. Niman has identified an anomaly on the accuracy of the flu polymerase thread at 08:00. I can confirm the anomaly. If you get a chance check out my comment at 10:16 on that thread. I’d really like to know how Dr. Taubenberger and the groups responsible for the relevant sequencing explain that one.

anon_22 – at 18:42

Monotreme, thanks for pointing it out. However, I’d much rather this thread stay EXACTLY on this Influenza Genome Project and the implications, which now that the pup has gone to bed, I might be able to make a few points before I fall asleep. So let’s leave Niman’s anomaly as you call it in the other thread for now while I try to make some comments on the Project before this thread goes way off topic!

anon_22 – at 19:42

Now to give some examples of how to use such whole genome surveillance data.

If you look at the HA segments of the different samples over time, you can see that from around mid 2003, there was a sudden emergence of a strain with an entirely different HA than before. In the middle of that, in the Nov 2003 column, you can see a column where the HA is the same as the dominant strain, but the other segments are entirely different. From this, one can conclude that a reassortment event occurred sometime early in the 2003 season creating an antigenically distinct HA, and the samples shown in the Nov 2003 bands are the likely donor of the HA segment for the reassortment. This donor minor clade continued to co-circulate, as shown by its being isolated in Nov 2003 well into the flu season.

This example shows how a minor clade can both co-circulate and contribute to significant genetic changes to the dominant clade, resulting in an ‘antigenically novel’ strain.

“It is worth emphasizing that our sequence-based sampling approach—in contrast to traditional serologically based sampling—will reveal co-circulating strains even before they become antigenically novel.”
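
The reasoning above (same HA, different other segments) can be sketched as a per-segment comparison. This is only a minimal illustration, assuming each genome is available as aligned sequences keyed by segment name; the function names, the 0.9 threshold, and the toy sequences are all made up for the example and are not taken from the paper.

```python
def identity(a: str, b: str) -> float:
    """Fraction of aligned positions at which two equal-length sequences match."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def segment_profile(isolate: dict, reference: dict) -> dict:
    """Per-segment identity of an isolate against a reference genome."""
    return {seg: identity(isolate[seg], reference[seg]) for seg in reference}


def looks_reassortant(profile: dict, threshold: float = 0.9) -> bool:
    """True when some segments are close to the reference and others are not.

    The threshold is an arbitrary illustrative value, not a published cutoff.
    """
    close = [seg for seg, value in profile.items() if value >= threshold]
    return 0 < len(close) < len(profile)


# Toy data (invented): three short "segments" per genome.
dominant = {"HA": "ATGGCTAAGC", "NA": "ATGCCCGGTA", "PB2": "ATGTTTACCG"}
minor = {"HA": "ATGGCTAAGC", "NA": "ATGAAAGCTT", "PB2": "ATGCGCTGGA"}

profile = segment_profile(minor, dominant)
for seg, value in profile.items():
    print(seg, round(value, 2))
print("mixed segment origins:", looks_reassortant(profile))
```

A real analysis would use phylogenetic trees per segment instead of a single identity threshold; the sketch only shows why whole-genome data is needed to see the pattern at all.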

anon_22 – at 20:03

The paper then goes on to discuss various mutations which I’m not going to go through here, but there is an interesting mention of the issue of ‘correlated mutations’ - certain sets of 2 or more mutations that tend to happen together. This is something that is much more easily seen if you have a very large number of whole genome sequences, and then line them up as the diagram illustrates, so that you begin to see that mutations X and Y tend to happen together even if they are in different segments.
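
One simple way to look for such pairs, once many sequences are lined up, is to record which isolates differ from the consensus at each position and then compare those sets between positions. The sketch below is an assumption about how one might do this, not the method used in the paper; the Jaccard cutoff of 0.8 and the toy alignment are invented, and positions are counted from 0.

```python
from collections import Counter
from itertools import combinations


def consensus(column: str) -> str:
    """Most common residue in one alignment column."""
    return Counter(column).most_common(1)[0][0]


def mutated_sets(alignment: list[str]) -> dict[int, set[int]]:
    """Map each variable position to the isolates that differ from consensus."""
    result = {}
    for pos in range(len(alignment[0])):
        column = "".join(seq[pos] for seq in alignment)
        ref = consensus(column)
        changed = {i for i, residue in enumerate(column) if residue != ref}
        if changed:
            result[pos] = changed
    return result


def correlated_pairs(alignment: list[str], min_jaccard: float = 0.8):
    """Return (pos_a, pos_b, jaccard) for positions that change in the same isolates."""
    sets = mutated_sets(alignment)
    pairs = []
    for p, q in combinations(sorted(sets), 2):
        jaccard = len(sets[p] & sets[q]) / len(sets[p] | sets[q])
        if jaccard >= min_jaccard:
            pairs.append((p, q, jaccard))
    return pairs


# Toy protein alignment (invented): one row per isolate.
toy = ["MKTAL", "MKTAL", "MRTGL", "MRTGL", "MKTAL", "MKSAL"]
print(correlated_pairs(toy))
```

To compare positions in different segments, the per-segment alignments could be concatenated per isolate before calling `correlated_pairs`. With only a handful of isolates, perfect co-occurrence can easily arise by chance or by shared ancestry, which is why a very large number of genomes matters.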

One of the major questions about the emergence of a pandemic virus is what mutation or collection of mutations is required for the virus to jump species. Are the necessary mutations independently or singly acquired, or are there particular combinations of mutations that are so significant that they need to happen simultaneously for the change to be conserved? And if that’s the case, what is the mechanism for initiating or triggering such co-mutations?

anon_22 – at 20:05

lugon,

“briefly, that variability, even sudden variability, is a natural thing to happen.”

You are absolutely right, that variability is a natural thing. But until we can see enough data, we may be missing a whole lot of variability that may not be random at all!

anon_22 – at 20:05

I’m going to pause here and let someone pick up the thoughts.

22 June 2006

NS1 – at 02:05

I’m giving this thread a lift kit to clear all the obstacles.

Hurricane Alley RN – at 02:50

bump

anon_22 – at 23:38

OK, I thought there’s a whole bunch of sequence-nuts on this forum who would be interested in this topic. You know, those who’ve been saying ‘we gotta know the sequence’ every day.

What happened? You guys changed your mind?

Or is it a bit boring cos there’s no conspiracy? :-)

23 June 2006

Sasher – at 01:17

Bump…

laura in pa – at 01:40

bumping for bill

anonymous – at 02:14

I get an impressive 905 H3N2 and 164 H5N1 full genome sequences. But they need some manual work before they are easily computer-readable. The format is not very friendly and there are errors which have to be eliminated by hand. I’ll upload the data once I have it computer-readable. This can probably also be used to estimate the amount of bird flu in China by considering the mutations at fixed locations. Well, in birds there is probably already antigenic adaptation in H5N1, as with humans in H3N2, so forget what I wrote at 06:24.

anonymous – at 04:36

Is there someone who can help out with some computer time, running clustalw in the background for some hours or days?

So we can get the sequences in computer-readable form and analyze them and maybe extract some statistics about the mutations to help predict the pandemic.

You can contribute a bit here to actively support the research and you will be credited!
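
As a rough idea of what "computer-readable" and "statistics about the mutations" could mean in practice, here is a minimal sketch. It assumes the sequences have already been aligned and saved as FASTA text; the record names and sequences are invented, and nothing here depends on clustalw itself.

```python
from collections import Counter


def read_fasta(text: str) -> dict[str, str]:
    """Parse FASTA-formatted text into {header: sequence}."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:]
            records[name] = []
        elif name is not None:
            records[name].append(line.upper())
    return {n: "".join(parts) for n, parts in records.items()}


def variable_sites(aligned: dict[str, str]) -> dict[int, Counter]:
    """Residue counts at every alignment position where the isolates disagree."""
    seqs = list(aligned.values())
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences are not aligned to equal length")
    sites = {}
    for pos in range(length):
        counts = Counter(s[pos] for s in seqs)
        if len(counts) > 1:
            sites[pos] = counts
    return sites


# Invented example alignment; positions are counted from 0.
example = """>isolate_A
ATGGCA
TTC
>isolate_B
ATGGTA
TTC
>isolate_C
ATGGCA
TAC
"""

records = read_fasta(example)
sites = variable_sites(records)
for pos, counts in sites.items():
    print(pos, dict(counts))
```

The same per-position counts, collected season by season, would be the raw material for the kind of chart posted at the top of the thread.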

FrenchieGirl – at 04:57

If clustalw does not need administrative rights to run on my work computer (‘coz I don’t have them), then I can run in background about 8–9 hours a day. At home it can run 24/24 and I have admin rights.

anonymous – at 06:09

OK, thanks. I sent email. I’m currently aligning the 164 H5N1 genomes. Segments 1 and 2 done. Took 1–2 hours each. But the 905 H3N2 sequences will take much longer. Isn’t there a faster program than clustalw.exe?

beehiver – at 09:28

Hello Anon_22 at 23:38 - this article and color chart are so packed full of information, on my end it’s been a matter of having time to read and absorb the material. To be able to view the correlated changes is almost downright scary, lol…for instance look at the correlation in Jan & Feb 2002 between the HA49 and the NA56; or in winter 99–2000 between NA267 and PB2’s 569. So thanks for posting about this project. As time permits I’d like to compare what they’ve found with these sequences against some changes in H3N2 from other research articles that may be related to infectivity. If some other people in this thread are attempting a similar effort with H5N1, that would be fabulous.

anonymous – at 10:01

Did you notice that there are 905 H3N2 genomes available by now? So the article with 270 is already outdated. Time for an update. I wrote to Steven Salzberg, but he’s on vacation until Sunday.

FrenchieGirl – at 10:12

Bump for anonymous at 04:36 and 09:28. Anyone else with some computer time to devote to this project?

FrenchieGirl – at 10:21

What a coincidence posting at the same time!

anonymous – at 12:15

http://msc.tigr.org/infl_a_virus/infl_a_virus.shtml

anonymous – at 12:17

http://www.niaid.nih.gov/dmid/genomes/mscs/influenza.htm

http://www.ncbi.nlm.nih.gov/genomes/FLU/overview.html

anon_22 – at 17:53

anonymous, “did you notice, that there are 905 H3N2 genoms available meanwhile ? So the article with 270 is already outdated”

The reason why I thought this was worth looking at was NOT whether it is current, but because it illustrates what happens when you can look at data in this particular way. I guess this is the kind of thing where there is not much I want to say to you about the chart specifically, but more about training yourself to think about what you are looking at. I purposefully did not put down everything that might be seen so you have a good exercise in asking questions in your own mind. Notice that asking questions even if you don’t necessarily find the answers is an extremely useful thing that IMHO everyone should practise daily.

NS1 – at 18:59

GS and Niman,

Have you looked at the open source software that Salzberg and his team have developed over the years?

Talk about introspection driving extrospection; I think you’ll each find some excellent ideas.

His background is artificial intelligence and machine learning . . . Yale and Harvard, oh, and, of course, an undergrad in English.

NS1 – at 19:36

Anon22,

In my experience architecting a major portion of the world’s largest, massively-parallel database systems, I’ve found simply that the best answers come from analyzing the most data.

The brilliant mind may immediately notice a trend even with only a small amount of data.

That same mind will make dozens of refining discoveries in a single day given a larger set of data that is designed for rapid enquiry.

If more questions may be asked of the data, due to flexibility of design and appropriateness of platform, then the refinements (tangentials) in an unstructured search may become much more important than the original question.

I’ve tracked tertiary ideas for a few days that saved one company $800 million a year.

Because the data was available in its most granular form and was designed well for rapid access by a wide group of people with varying backgrounds.

Think of what we could do if we could get the Influenza data organised?

Gather and Solve

24 June 2006

anonymous – at 01:23

NS1, thanks for the link to Salzberg’s page:
http://cbcb.umd.edu/~salzberg
He also has a letter there calling for the release of genome data. I’ll try MUMmer. Clustalw is too slow.

laura in pa – at 01:28

bumping for bill

27 June 2006

anonymous – at 00:32

I have a question about the NS nucleotide sequences at GenBank. They are apparently not aligned so that every 3 nucleotides make one amino acid. There is a point where some extra nucleotides are included and then the 3-alignment changes. Why is that? Where exactly is that place? It’s inconvenient for computer analysis and programming.
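
A likely explanation, offered here as an assumption and not something confirmed in this thread, is that the NS segment encodes two proteins: NS1 from the unspliced message and NS2 (NEP) from a spliced message, so the codons after the splice junction are read in a different frame. The sketch below shows how a program can handle that by joining exon ranges before translating. The toy sequence and exon coordinates are invented and are not real NS coordinates; GenBank records normally list the real ones in the CDS feature of each entry.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(seq: str, frame: int = 0) -> str:
    """Translate a DNA sequence starting at offset `frame` (0, 1 or 2)."""
    seq = seq.upper()
    codons = (seq[i:i + 3] for i in range(frame, len(seq) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def translate_joined(seq: str, exons: list[tuple[int, int]]) -> str:
    """Join exon ranges (0-based, end-exclusive) and translate the result."""
    spliced = "".join(seq[start:end] for start, end in exons)
    return translate(spliced)


# Invented toy gene: exon (8 nt) + intron (5 nt) + exon (7 nt).
toy = "ATGGATCC" + "GGGGG" + "AACTTAA"

unspliced = translate(toy)                          # reads straight through
spliced = translate_joined(toy, [(0, 8), (13, 20)])  # skips the intron
print(unspliced)
print(spliced)
```

Because the removed stretch is not a multiple of three, everything after the junction shifts frame, which matches the description of a point where "the 3-alignment changes".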

bumping for bill – at 01:17

bumping for bill – at 01:31

18 August 2006

Closed - Bronco Bill – at 12:48