From Flu Wiki 2

Forum: H 5 N 1 Recombination 2 Continued

19 March 2006

NS1 – at 04:30

This forum is a continuation of the excellent discussion begun by Dr. Henry Niman of Recombinomics on H 5 N 1 Recombination.

Please use the link format shown just below the comment box to effectively avoid sidescroll and provide a more readable format.

GS-

Please consider running your program against the following datasets:

NS1 – at 05:42

gs-

Also consider

NS1 – at 05:48

Niman-

Have you researched potential donors for

Segment 8, NS1 B92E / Asp92Glu?

How about potential donors for a reversal at residue 92?

Do you believe that this polymorphism is essential to cytokinic dysregulation?

Is this position on your list of concerns?

anonymous – at 06:30

in the HA-gene with different Hs ? But
1.different Hs give quite different sequences
2.there are so many sequences with the H-gene, that takes me several weeks


looking at PB2 instead,there are 244,66,1010,68,384,89,44,4,206,20,15,9,9,0,5,3 sequences in the database for H1,H2,..,H16 for the PB2-gene.


I put H1,H5,H7,H9 (PB2) in one big file now - this will take some days, and this is only for PB2.
I also updated http://magictour.free.fr/recom.exe
you can play with the parameters, using the files
http://magictour.free.fr/pb2_h5
http://magictour.free.fr/pb1_h5
http://magictour.free.fr/pa_h5
http://magictour.free.fr/ha_h5
tell me when there are questions,suggestions,problems: sterten(at}aol.com

NS1 – at 06:47

guenter-

I don’t know how niman does it, but somehow he does these comparisons? I know that its a huge amount of data. Do what you can.

anonymous – at 10:03

Niman said, he has no own database, but uses the programs and features of the existing database. But then he can’t easily use own programs to examine the data for special properties or doing statistics ?! Running clustalw on PB2:H1+H5+H7+H9 now, takes an estimated 30h. Then converting, editing, preparing, running recom.exe, filtering.. give me 36 hours in total if everything works well. Your HA-thing would take an estimated 5days with 2749 sequences, many pairs have a slow score of 57 or even 28, so alignment could be poor.

Allquietonthewesternfront – at 11:23

NS1 - I’m embarrassed to report that I failed to find the reference to the animal die-off you requested. I’ve searched over at Curevents where I read it but to no avail. It was in Costa Rica. It was a report of a 50% die off of monkeys and toucans that prompted them to close a forest to tourists this winter. Anyone better at research than me want to take a crack?

Katerina – at 11:49

Allquiet — Found a NYT article about it: http://tinyurl.com/o73uo

Sounds like it wasn’t a virus, but instead starvation due to weather that caused the deaths:

“Dr. Carrillo and his colleagues, as well as government officials, worried they might have a mini-epidemic on their hands. But tissue samples from Corcovado spider monkeys — Costa Rica’s most endangered species of monkey — sent to a laboratory at the University of Texas for analysis showed no evidence of a virus or other pathogen.

The story of what really happened in Corcovado, or at least the prevailing theory, is less worrisome in the short term than a disease outbreak, but it has the potential to be deadly serious.

Costa Rican researchers think the affected animals starved to death because of a lack of available food sources and an inability to forage for food during several months of extreme rain and cold.”

Allquietonthewesternfront – at 12:03

Katerina - thank you for finding that. I read a discussion about it where someone from Costa Rica scoffed at the starvation theory, saying the rains weren’t that bad and that no one there really believed those animals could starve under those conditions. They believe that story was put out so tourists wouldn’t be scared away, but of course that is just speculation. Still, it does suggest events in India this past year.

Katerina – at 12:24

I think some of the unfortunate fallout from governments’ efforts to protect themselves (understandably) from economic consequences and (perhaps foolishly) from “panicking” their populace is that they are losing their citizens’ trust so that when they give explanations of events or try to assure the populace that all is well, they are not believed. I hope TPTB realize that having a more mature and realistic dialogue with citizens is the only hope of avoiding disastrous panic when and if the TSHTF.

gs – at 13:09

NS1,others: shall we put this over to the fluwiki2 forum ? Seems that only we two and occasionally niman are interested in this thread, it’s becoming rather special.


I have a question about the 3rd-position mutations, which occur much more often than other mutations and usually don’t change the properties of the virus. Can we expect that these are random or are some amino-acids or positions still preferred for mutating ? What are they good for, why has mother nature invented this 3rd position ? It’s not just that we can identify recombinations easier ? ;-)

Racter – at 13:19

? Seems that only we two and occasionally niman are interested in this thread

Don’t be too sure about that. You can’t know who may be following along even though there aren’t a lot of responses.

Katerina – at 13:39

I agree with Racter. I can follow only a little of the specialized conversation in this thread, but I think it is very important and am avidly reading all entries, trying to understand what I can. The wiki is valuable in many ways. One is to allow collaboration like the one you have been creating, and another is to allow others to “listen” in on the exchange of ideas and maybe occasionally throw in an idea of one’s own or just a question that then becomes a launching point for another idea or hypothesis.

Name – at 13:55

This looks interesting, but what is it that you’re trying to do? I’m afraid I missed the background.

20 March 2006

gs – at 11:06

NS1,I have the results with your combined H1,H5,H7,H9 sequences - PB2-gene. Best recombination candidates are here:
http://tinyurl.com/odqz7
probably only the first 5–10 are relevant - I’ll check against a randomized run later. There seem to be no mixed recombinants e.g. H1+H5, maybe mixed coinfection is rare or when it occurs, then the viruses don’t like to recombine ? This is not very significant with so few recombinations.

NS1 – at 17:42

gs-

Please interpret your output for us here. Can you provide a detailed specification of each column of your report along with headings?

NS1 – at 17:43

gs-

You must have these jobs running around the clock. When do you sleep and play chess?

I suppose that’s true multi-tasking.

Good work.

22 March 2006

NS1 – at 17:25

Reposting of GS latest results from the earlier thread.

GS wrote:

I finished examining the H5-viruses for recombination-candidates. 11 very clear recombinants, the candidate which has most possible partners gets a “*”.

PB2:Tree sparrow/Henan/2/2004(H5N1) *Tree sparrow/Henan/4/2004(H5N1)
PB2:Ck/Korea/ES/03(H5N1) *SCk/HK/YU100/2002(H5N1)
PB2:goose/Guangxi/914/2004(H5N1) *Dk/Guangxi/668/2004(H5N1)
PB2:*WildDk/Guangdong/314/2004(H5N1) Ck/Henan/13/2004(H5N1)
PB1:Ck/Yamaguchi/7/2004(H5N1) *Ck/Hebei/718/2001(H5N1)
PB1:Gf/HK/38/2002(H5N1) *Ck/HK/37.4/2002(H5N1)
PA: Ck/HK/31.4/02(H5N1) *Ck/HK/37.4/2002(H5N1)
PA: *SCk/HK/YU100/2002(H5N1) Ck/HK/31.2/2002(H5N1)
NP: Gf/HK/38/2002(H5N1) *Ck/HK/37.4/2002(H5N1)
NA: Ck/Hong Kong/258/97(H5N1) *Ck/Hebei/718/2001(H5N1)
NA: *tree sparrow/Henan/3/2004(H5N1) tree sparrow/Henan/4/2004(H5N1)

JoeW – at 17:39

I agree with Katerina and Racter, please do not move the thread. Some of us are ignorant and polite enough to simply listen.

NS1 – at 17:50

JoeW and folks-

We’ll definitely keep the main thread here.

Guenter was being polite in offering to host the more esoteric portions of the conversation on the other site. Primary discussions will remain here. Some tangential pursuits may be studied elsewhere and reported here.

We’ll likely keep everything here so all may contribute.

Gather and Solve.

sue – at 17:58

http://tinyurl.com/ko888 (Influenza virus receptors in the human airway)

NS1, Niman just posted this over at CE. Has Niman been here lately?

NS1 – at 18:32

You’ll likely see him on this thread or starting another on his latest commentary on S227N.

He and the Reveres have already discussed everything that is coming in that next Nature article at much greater depth, so he may not be posting much on it here.

This latest article is not much in the way of news.

Niman’s latest commentary is more informative.

sue – at 18:40

Thank you NS1…I’ve been away and find it hard to catch up by reading all the threads. lol..not that I really knew what you guys were talking about anyway but it certainly sounds like information that I should try to understand! In the news lately is a mixed message….all the news services seem to have jumped in to warn us of the coming danger and at the same time we hear news about how hard this is to go human to human due to “the cough factor”. What’s one to believe or think? Anyway, I feel better when I see you guys talking about this…lol..don’t know why, just do.

23 March 2006

gs – at 03:56

it seems that recombination prefers other genes than HA in H5-viruses. I didn’t find any clear example of recombination in H5-HA. OK, I mean “this” sort of recombination, Niman would identify other sorts of recombination which even result in single nucleotide-changes.

But, as the two recent papers about binding of H5N1 to human respiratory-tract-cells show, recombination of H5N1 might not be required and a series of single-nucleotide-changes could do the trick.

I measured the “clustering” of mutations of the genes (this might indicate recombination):
PB2  223:496,  7497:15, 353:620,, 120:599,   979: 7,  637:1131
PB1   75:428,   483:17, 266:590,,  81:492,   385: 5,  620: 996
PA   192:383,   820:17, 340:651,,  77:484,   448: 5, 1119:1387
HA   385:526,   859:47, 489:660,, 156:499,   360:15, 1072:1224
NP   310:413,   584:15, 260:663,, 163:298,   281: 4,  876:1253
NA   169:527, 10414:24, 244:673,, 176:369,  2341: 2,  830:1242
M    357:473,  2104: 5, 224:755,, 106:127, 15054: 1,  735:1793
NS   204:510, 42870:16,  39:764,,  27:343, 34789: 3,  602:1966

the exact meaning of the numbers is hard to explain, but the 2nd value after “:” is always from a randomized run. The 2nd part after “,,” considers only 3rd position mutations (in codon), these often don’t change the properties of the virus. The NS gene shows a high tendency for clustering the mutations but without showing the clearly identifiable candidates for recombination as in the 11 examples above, so there could be other reasons for the clustering - the head-part could be more susceptible to mutations than the tail-part or vice versa. Maybe I should restrict to mutations which don’t change an amino-acid ? These should occur more “randomly”.

NS1 – at 16:38

gs-

Please provide specifications on the information in each column. Could you describe how you arrived at the list and the rankings? If you are able to produce a narrative or some specifications on the type of measurements that you are taking to get these outputs, then perhaps we can help interpret.

I’d love to give you some input on what ‘this’ type of recombination is, compared to the ‘other’ types that Niman is describing? But I don’t have any idea what ‘this’ type means in your output? We all need to come to a common definition, so that our work can proceed rapidly.

Everyone is looking to you for this detailed analysis . . . do you have time to tell us more?

NS1 – at 16:46

gs-

The cleavage areas on practically all of the HPAI Hemagglutinin (HA) H5N1 isolates clearly show recombination if you go back and read Niman’s early commentaries.

it seems that recombination prefers other genes than HA in H5-viruses.

Let’s be certain of our terms and definitions before we make sweeping interpretations of our calculations.

Niman’s concept is that single or groups of nucleotides are constantly recycled in sets smaller than an entire gene segment according to a set of observed tendencies. We expect that he will be publishing some of these observed tendencies or rules as time passes. Until then we can clearly follow his commentaries and see that large and small sub-segment portions of the Influenza genome are being recycled / recombined to form new strains.

Niman’s now given us dozens of straightforward examples that defy all explanations of random mutation due to persistence of identity across multiple isolates?

gs – at 20:33

the numbers are from the http://magictour.free.fr/recom.exe program. Source attached to the executable or at http://magictour.free.fr/recom.c

The 12 numbers in PB2:223:496, 7497:15,353:620,,120:599, 979: 7, 637:1131 are obtained by:

recom pb2_h5 l5 b100 i020 (average p) recom pb2_h5 l5 b100 i020 r66 (average p)

recom pb2_h5 l5 b100 i020 (number of pairs) recom pb2_h5 l5 b100 i020 r66 (number of pairs)

recom pb2_h5 b100 i020 (average p) recom pb2_h5 b100 i020 r66 (average p)

recom pb2_h5 l5 b100 i020 3 (average p) recom pb2_h5 l5 b100 i020 r66 3 (average p)

recom pb2_h5 l5 b100 i020 3 (number of pairs) recom pb2_h5 l5 b100 i020 r66 3 (number of pairs)

recom pb2_h5 b100 i020 3 (average p) recom pb2_h5 b100 i020 r66 3 (average p)

the program tries to cut the nucleotide-chain at a cut-point into two parts and examines the distribution of the differences (mutations) of two sequences into these two parts. It computes the probability p that the numbers of differences in the parts would have been obtained by randomly putting differences into the sequence.

b100: no cutpoint in the first or last 100 nucleotides. i020: tries cutpoints b,b+20,b+40,… l5: ignores sequence-pairs with p>10^−5 r66: permutes the differences of any two sequences randomly,

   this is the reference to show what would have been expected
   if all mutations had been random. The “66″ is just a seed for
   initializing the random number generator, any other value is good too.

3: interprets 3 nucleotides as codon and reduces the sequences

   to 1/3 length by only considering the 3rd nucleotide in a codon
   This is assuming that the mutations are typically concentrated
   there and are more randomly distributed there

“this” type of recombination only refers to homologous recombination where two sequences AB and CD with length(A)=length(C) and length(B)=length(D) become AD or CB.
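To make the AB + CD → AD or CB definition concrete, here is a minimal sketch. The function name `recombine` and the single `cut` index are assumptions for illustration only; they are not taken from recom.c.

```python
def recombine(parent1, parent2, cut):
    """Swap the tails of two equal-length sequences at one cut point.

    parent1 = A + B and parent2 = C + D with len(A) == len(C) == cut,
    so the two products are A + D and C + B.
    """
    if len(parent1) != len(parent2):
        raise ValueError("sequences must be aligned to the same length")
    if not 0 < cut < len(parent1):
        raise ValueError("cut point must lie strictly inside the sequence")
    ad = parent1[:cut] + parent2[cut:]
    cb = parent2[:cut] + parent1[cut:]
    return ad, cb


if __name__ == "__main__":
    # Toy sequences; real gene segments are roughly 900 to 2300 nucleotides long.
    print(recombine("AAAAAA", "CCCCCC", 2))
```

A recombinant built this way matches one parent before the cut and the other parent after it, which is the clustering of differences that the cut-point search looks for.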

I’d appreciate if you have a simple explanation and an example of some other recombination, e.g. with the cleavage site. I assume that the frequencies of occurrence of “this” and “that” recombination are (cor)related

gs – at 20:40

so far I’m assuming that all mutations are single-nucleotide changes or “this” recombinations AB+CD→AD+CB. Reassortments may or may not occur, since I’m fixing to one specified gene ,ignoring the others, this doesn’t matter.
Are there statistics on how often mutations change a single nucleotide without changing the amino-acid ? Are these completely random or do they depend on the 3d-structure ?
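As a rough baseline for that question, one can count how many single-nucleotide changes leave the amino acid unchanged in the standard genetic code. This is only a sketch under a strong assumption: every codon and every substitution is treated as equally likely, which real influenza sequences do not follow. The names `CODON_TABLE` and `synonymous_counts` are made up for this example.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def synonymous_counts():
    """Count amino-acid-preserving single-base changes per codon position.

    Each of the 64 codons has 3 possible changes per position,
    so every position has 64 * 3 = 192 changes in total.
    """
    counts = [0, 0, 0]
    for codon, amino in CODON_TABLE.items():
        for pos in range(3):
            for base in BASES:
                if base == codon[pos]:
                    continue
                mutant = codon[:pos] + base + codon[pos + 1:]
                if CODON_TABLE[mutant] == amino:
                    counts[pos] += 1
    return counts


if __name__ == "__main__":
    for pos, count in enumerate(synonymous_counts(), start=1):
        print(f"position {pos}: {count} of 192 changes are synonymous")
```

Under these assumptions 128 of the 192 third-position changes are synonymous, against 8 at the first position and 2 at the second (the latter only between stop codons). It says nothing about dependence on the 3d-structure.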

gs – at 20:45

sorry, I forgot to format the message above with the end-of-line double-backslashes.So here again some lines:


The 12 numbers in
PB2:223:496, 7497:15,353:620,,120:599, 979: 7, 637:1131
are obtained by:


recom pb2_h5 l5 b100 i020 (average p)
recom pb2_h5 l5 b100 i020 r66 (average p)

recom pb2_h5 l5 b100 i020 (number of pairs)
recom pb2_h5 l5 b100 i020 r66 (number of pairs)

recom pb2_h5 b100 i020 (average p)
recom pb2_h5 b100 i020 r66 (average p)

recom pb2_h5 l5 b100 i020 3 (average p)
recom pb2_h5 l5 b100 i020 r66 3 (average p)

recom pb2_h5 l5 b100 i020 3 (number of pairs)
recom pb2_h5 l5 b100 i020 r66 3 (number of pairs)

recom pb2_h5 b100 i020 3 (average p)
recom pb2_h5 b100 i020 r66 3 (average p)


b100: no cutpoint in the first or last 100 nucleotides.
i020: tries cutpoints b,b+20,b+40,…
l5: ignores sequence-pairs with p>10^−5
r66: permutes the differences of any two sequences randomly,

   this is the reference to show what would have been expected
   if all mutations had been random. The “66″ is just a seed for
   initializing the random number generator, any other value is good too.

3: interprets 3 nucleotides as codon and reduces the sequences
   to 1/3 length by only considering the 3rd nucleotide in a codon.
   This is assuming that the mutations are typically concentrated
   there and are more randomly distributed there
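The "3" and "r66" options can be illustrated with a short sketch. It assumes the aligned sequence starts in reading frame at its first nucleotide, and it uses Python's own random generator, so it will not reproduce the exact permutations of recom.exe. The function names are hypothetical.

```python
import random


def third_positions(seq):
    """Keep only the 3rd nucleotide of each codon (1/3 of the length).

    Assumes the reading frame starts at index 0 of the sequence.
    """
    return seq[2::3]


def permuted_differences(seq_a, seq_b, seed=66):
    """Shuffle the difference pattern of two aligned sequences.

    The number of differences stays the same and only their positions
    are randomized, giving a reference for what purely random
    placement of the mutations would look like.
    """
    diffs = [1 if x != y else 0 for x, y in zip(seq_a, seq_b)]
    rng = random.Random(seed)  # the seed only makes the run repeatable
    rng.shuffle(diffs)
    return diffs


if __name__ == "__main__":
    print(third_positions("ATGGCCTAA"))
    print(permuted_differences("AAAAAAAAAA", "CCCAAAAAAA"))
```

Comparing a statistic computed on the real difference pattern with the same statistic on the shuffled pattern is what the paired "value:randomized value" entries report.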

24 March 2006

NS1 – at 03:23

gs-

Now we are on the way to understanding! You’ve done a great deal of work on the probability of AB and CD swapping partial gene segments of contiguous nucleotides to become AD or CB.

Please take your same example of the PB2 gene segment for all H5 Influenza strains on deposit at GenBank and tell us a little bit more about the 12 numbers. If you’d like, choose the simplest of the 12 numbers and give us a pseudocode explanation of just that one number with all of the steps of calculation and the assumptions that went into each calculation.

I know that it is agonizing for you to have to explain this in such detail to us, but please start with something simple that the average non-mathematician can understand and once we understand your first number, we’ll move to the other 11 numbers. Thank you for your patience. When we get the first one clearly understood, the others will be easier to teach.

Racter, Dubina, anyone?

gs – at 03:52

NS1, I don’t think anyone but you wants to know, so we could also do it in email, but OK. Given 2 sequences, aligned with clustalw of same length, first generate a binary difference sequence D(1..n) of their differences. Typically n=2000 and 100 of the values are 1, the others 0. Now you cut the sequence in two at position x, you have l1 of the about 100 differences at positions before x and l2 at positions after x. Now you can ask: if I throw l1+l2 ones randomly onto a string of n zeros, what’s the probability that l1 or fewer ones land on positions before x ? Or that l2 or fewer land at positions after x ? Now take the minimum over all these probabilities for all x. That’s the value printed by recom.exe for that pair of sequences. The pair with the smallest such value is the best candidate for recombination.
recom pb2_h5 b100 i020
just tests not all x, but only 100,120,140,…,n-100 because that’s faster.
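The cut-point test described above can be sketched as follows. This is not the recom.c source. It assumes the "random throw" places each difference on a distinct position, so the tail probability is hypergeometric, and the names `tail_at_most` and `best_cut` are made up for this example.

```python
from math import comb


def tail_at_most(n, x, k, m):
    """P(at most m of k differences fall in a region of x positions),
    when the k differences are placed on distinct positions out of n."""
    ways = sum(comb(x, j) * comb(n - x, k - j) for j in range(m + 1))
    return ways / comb(n, k)


def best_cut(seq_a, seq_b, border=100, step=20):
    """Return (p, x): the smallest tail probability over all tested
    cut points x, and the cut point where it occurs."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    diffs = [1 if a != b else 0 for a, b in zip(seq_a, seq_b)]
    n, k = len(diffs), sum(diffs)
    best_p, best_x = 1.0, None
    # Like b100 / i020: skip both borders, test every step-th position.
    for x in range(border, n - border + 1, step):
        left = sum(diffs[:x])   # differences before the cut
        right = k - left        # differences after the cut
        p = min(tail_at_most(n, x, k, left),
                tail_at_most(n, n - x, k, right))
        if p < best_p:
            best_p, best_x = p, x
    return best_p, best_x


if __name__ == "__main__":
    # Toy pair: all 10 differences sit in the first 10 of 40 positions.
    a = "A" * 40
    b = "C" * 10 + "A" * 30
    print(best_cut(a, b, border=5, step=5))
```

A very small p means one side of the cut has far fewer differences than random placement would give, so the pair looks like a parent and a recombinant sharing that side. Running this over every pair of sequences and sorting by p gives the kind of ranked candidate list posted earlier.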


BTW., I’m reading this now: http://tinyurl.com/g6pnu
Do universal codon-usage patterns minimize the effects of mutation and translation error?

NS1 – at 04:03

gs-

Off the top of my head, please advise…

Can we do something that just looks for the pattern matches:

do until maxaccession
   do until maxposition
      if HX.ISOLATEA.GeneSegGS.PositionP==HX.ISOLATEA+1.GeneSegGS.PositionP
         Store all matches between two isolates at the same positions for later interp.

Please?

gs – at 04:59

we first get all the sequences in question and align them with clustalw

NS1 – at 06:21

gs-

Start with some of Niman’s reported sequences (10 or fewer) as the base and compare them to the entire H3+H5+H7+H9 series, byte by byte, recording each match in a table. Sort Niman’s sequences in descending order based on the number of other isolates that have matches, then show the matching positions for each.

Should make a very interesting start to verification.

Is it possible?

NS1 – at 06:38

gs-

We should only have to align once and then insert them into an aligned database by position, nucleotide by nucleotide. After that we can extract them at our leisure.

gs – at 07:10

the aligned H5-sequences are already uploaded, see above. I don’t know, what you mean with “Niman’s reported sequences” ? Example ? You mean: given 2 sequences, see where they are in the p-ordered list of all pairs for that gene ? Yes, that’s possible but I’m doing this by hand, because sequence-names are not unified (yet)

NS1 – at 07:20

gs-

Pattern matched search.

For your base, use up to 10 of the isolates that Niman has reported in any of his commentaries. You choose the ones that you’ve found most interesting.

For each one of those recombinated isolates, search the GenBank for any matches, position by position, to other isolates. Count the number of positions matched to any searched isolate. Rank then sort your matchcounts in descending order and then report Niman’s recombinant followed by the ranked list of each matching isolate showing the matched nucleotides. 1 matched isolate showing the nucleotides per line.
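The search described here could look roughly like this once the sequences are aligned. It is only a sketch: it assumes all sequences are already aligned to the same length with "-" as the gap character and held in a dictionary keyed by isolate name, and the names `count_matches` and `rank_matches` are invented for the example.

```python
def count_matches(base, other, gap="-"):
    """Count aligned positions where both sequences carry the same nucleotide."""
    return sum(1 for x, y in zip(base, other) if x == y and x != gap)


def rank_matches(base_name, aligned):
    """Rank every other isolate by its number of positions matching the base.

    Returns (match_count, isolate_name) tuples, highest count first,
    with ties broken by name.
    """
    base = aligned[base_name]
    scores = [
        (count_matches(base, seq), name)
        for name, seq in aligned.items()
        if name != base_name
    ]
    return sorted(scores, key=lambda item: (-item[0], item[1]))


if __name__ == "__main__":
    # Toy alignment; the isolate names are placeholders.
    aligned = {
        "base": "ACGT-A",
        "iso1": "ACGT-A",
        "iso2": "ACCTTA",
        "iso3": "TTTTTT",
    }
    for count, name in rank_matches("base", aligned):
        print(name, count)
```

Because closely related isolates match at most positions anyway, raw match counts mostly reflect overall similarity. They do not show where along the gene the matches cluster.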

gs – at 08:17

how is this different/better from just searching my output-file with all pairs evaluated ?

26 March 2006

gs – at 08:01

I finished the calculation for the best recombination candidates for H1 using 244,272,240,796,351,383,299,238 sequences for 8 segments:


PB2:Swine/Illinois/100085A/01(H1N2) Swine/Ontario/53518/03(H1N1)
PB1:Swine/Ontario/57561/03(H1N1) Mallard/Alberta/130/2003(H1N1)
PA :WSN/33(H1N1) Swine/Italy/1513–1/98(H1N1)
PA :Swine/Ontario/57561/03(H1N1) Human/Denver/57(H1N1)
PA :Swine/Tennessee/26/77(H1N1) Swine/Ontario/53518/03(H1N1)
NA :Human/USA/1995(H1N1) Human/Slovakia/2000(H1N1)
HA :Swine/Japan/1992(H1N1) Human/USA/1991(H1N1)
NS :Swine/Korea/S10/2004(H1N1) Swine/Korea/S175/2004(H1N1)


NP :0005250 Swine/Canada/2003(H1N1) Swine/USA/2001(H1N2)
NA :0016467 Swine/Canada/2003(H1N1) Human/USA/1994(H1N1)
PA :0406947 Swine/Alberta/56626/03(H1N1) Swine/USA/1931(H1N1)
PB2:2895191 Swine/Tennessee/24/77(H1N1) Swine/Ontario/55383/04(H1N2)


format:
gene :
(score, the smaller the better the candidate)
first candidate, the one with the more partners
second candidate, the best partner for the first candidate

26 May 2006

BroncoBill – at 01:28

Old thread closed to speed Forum access

Check dates

Retrieved from http://www.fluwikie2.com/index.php?n=Forum.H5N1Recombination2Continued
Page last modified on May 26, 2006, at 01:28 AM