Can we distinguish between noise and information? Is Allan Variance that tool?

- 2001-01-02 Mathew Hendry: ". . . the recent thread "compression of DNA sequences" in news:comp.compression might be of interest."
- 2001-01-05 John Feth: "I looked at about 700,000 bases . . . ." "An Allan deviation analysis shows that the order looks like noise in strings of up to about 1,000 bases but carries information in strings from 1,000 to 10,000 bases long."
- 2001-01-06 Terry Ritter: "Although I am somewhat familiar with "Allan variance" . . . I am confused about the implication that it can be relied upon to distinguish between noise and information."
- 2001-01-08 Mok-Kong Shen: "Could you or someone else kindly give a good reference of Allan variance or a tiny summary of it?"
- 2001-01-08 Douglas A. Gwyn: "It's essentially a 2-point variance, used for oscillators."
- 2001-01-09 Terry Ritter: "Common (or "classic") variance is based on the mean, the arithmetic average of sampled values . . . ." "In contrast, Allan variance is based on the value of the previous sample . . . ." "The more interesting role of Allan variance is to assist in the analysis of residual noise."

Subject: Re: Genomes
Date: Tue, 02 Jan 2001 14:34:23 +0000
From: Mathew Hendry <scampi@dial.pipex.com>
Message-ID: <jjo35tkfrmsj6osukllvguekoqo3i7ace1@4ax.com>
References: <3A5069FC.DB84B303@t-online.de>
Newsgroups: sci.crypt
Lines: 13

On Mon, 01 Jan 2001 12:29:00 +0100, Mok-Kong Shen
<mok-kong.shen@t-online.de> wrote:

>Does anyone happen to know the statistical properties of the
>genome sequences in general? Are they sufficiently 'random'?

They're not random or they almost certainly wouldn't work. :)

But the recent thread "compression of DNA sequences" in
news:comp.compression might be of interest. The thread starts at
Message-ID <Pine.GSO.4.21.0010130158120.29055-100000@chopin.ifp.uiuc.edu>

-- Mat.

Subject: Re: Genomes
Date: 5 Jan 2001 18:54:46 GMT
From: "John Feth" <John.Feth@honeywell.com>
Message-ID: <01c0774a$05965f40$2104ef81@cdc11q71.cas.honeywell.com>
References: <3A5069FC.DB84B303@t-online.de>
Newsgroups: sci.crypt
Lines: 30

I looked at about 700,000 bases (base here relates to a chemical
constitution of the components A, C, G, T, not numerical) on a gene and
found the A:C:G:T ratios to be very close to 3:2:2:3. An Allan deviation
analysis shows that the order looks like noise in strings of up to about
1,000 bases but carries information in strings from 1,000 to 10,000 bases
long. I believe an A always occurs with a T so steganography in DNA might
be a little different than in photos or music.

John Feth

Mok-Kong Shen <mok-kong.shen@t-online.de> wrote in article
<3A5069FC.DB84B303@t-online.de>...
>
> Does anyone happen to know the statistical properties of the
> genome sequences in general? Are they sufficiently 'random'?
>
> BTW, since the code is base 4, one can use the same to readily
> transcribe any given binary sequences. This could have some
> steganographical benefit, I suppose. For a paragraph that is
> gibberish easily gives rise to suspicion of crypto, while the
> same in the alphabet AGCT is presumably difficult to
> distinguish from the result of a genetic research, if
> appropriately enveloped. One could perhaps also hide information
> in genuine genome sequences through modifications analogous to
> what is done with graphical files.
>
> M. K. Shen
>
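The base-4 transcription Shen mentions in the quoted text can be sketched in a few lines of Python. This is a minimal illustration and not something posted in the thread: the function names and the bit-pair mapping (A=00, C=01, G=10, T=11) are assumptions chosen for the example.

```python
# Assumed mapping for illustration only: A=00, C=01, G=10, T=11.
BASES = "ACGT"


def bytes_to_bases(data: bytes) -> str:
    """Transcribe each byte as four base letters, most significant bit pair first."""
    letters = []
    for byte in data:
        for shift in (6, 4, 2, 0):
            letters.append(BASES[(byte >> shift) & 0b11])
    return "".join(letters)


def bases_to_bytes(seq: str) -> bytes:
    """Invert bytes_to_bases; the sequence length must be a multiple of four."""
    if len(seq) % 4 != 0:
        raise ValueError("sequence length must be a multiple of 4")
    out = bytearray()
    for i in range(0, len(seq), 4):
        byte = 0
        for letter in seq[i:i + 4]:
            # BASES.index raises ValueError for any letter outside ACGT.
            byte = (byte << 2) | BASES.index(letter)
        out.append(byte)
    return bytes(out)


if __name__ == "__main__":
    encoded = bytes_to_bases(b"Hi")
    print(encoded)
    print(bases_to_bytes(encoded))
```

A direct encoding like this gives roughly equal letter frequencies when the input bits are uniformly random, which would differ from the 3:2:2:3 ratios Feth reports for his sample.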

Subject: Re: Genomes
Date: Sat, 06 Jan 2001 03:06:00 GMT
From: ritter@io.com (Terry Ritter)
Message-ID: <3a568193.841457@news.io.com>
References: <01c0774a$05965f40$2104ef81@cdc11q71.cas.honeywell.com>
Newsgroups: sci.crypt
Lines: 23

On 5 Jan 2001 18:54:46 GMT, in
<01c0774a$05965f40$2104ef81@cdc11q71.cas.honeywell.com>, in sci.crypt
"John Feth" <John.Feth@honeywell.com> wrote:

>I looked at about 700,000 bases (base here relates to a chemical
>constitution of the components A, C, G, T, not numerical) on a gene and
>found the A:C:G:T ratios to be very close to 3:2:2:3. An Allan deviation
>analysis shows that the order looks like noise in strings of up to about
>1,000 bases but carries information in strings from 1,000 to 10,000 bases
>long. I believe an A always occurs with a T so steganography in DNA might
>be a little different than in photos or music.

Although I am somewhat familiar with "Allan variance," and continue to
read the many papers available on the web, I am confused about the
implication that it can be relied upon to distinguish between noise
and information.

Perhaps you would care to describe your experiments in detail.

---
Terry Ritter ritter@io.com http://www.io.com/~ritter/
Crypto Glossary http://www.io.com/~ritter/GLOSSARY.HTM

Subject: Re: Genomes
Date: Mon, 08 Jan 2001 18:32:41 +0100
From: Mok-Kong Shen <mok-kong.shen@t-online.de>
Message-ID: <3A59F9B9.D6EF5FF2@t-online.de>
References: <3a568193.841457@news.io.com>
Newsgroups: sci.crypt
Lines: 19

Terry Ritter wrote:
> [snip]
> Although I am somewhat familiar with "Allan variance," and continue to
> read the many papers available on the web, I am confused about the
> implication that it can be relied upon to distinguish between noise
> and information.
[snip]

Could you or someone else kindly give a good reference of
Allan variance or a tiny summary of it? I failed to find
pointers from a couple of well-known and very comprehensive
reference materials of statistical sciences in the library.

Thanks.

M. K. Shen

Subject: Re: Genomes
Date: Mon, 8 Jan 2001 20:08:04 GMT
From: "Douglas A. Gwyn" <gwyn@arl.army.mil>
Message-ID: <3A5A1E24.CE8847FA@arl.army.mil>
References: <3A59F9B9.D6EF5FF2@t-online.de>
Newsgroups: sci.crypt
Lines: 7

Mok-Kong Shen wrote:
> Could you or someone else kindly give a good reference of
> Allan variance or a tiny summary of it?

http://www.allanstime.com/AllanVariance/
It's essentially a 2-point variance, used for oscillators.

Subject: Re: Genomes
Date: Tue, 09 Jan 2001 06:29:52 GMT
From: ritter@io.com (Terry Ritter)
Message-ID: <3a5aafc8.19532804@news.io.com>
References: <3A59F9B9.D6EF5FF2@t-online.de>
Newsgroups: sci.crypt
Lines: 115

On Mon, 08 Jan 2001 18:32:41 +0100, in
<3A59F9B9.D6EF5FF2@t-online.de>, in sci.crypt Mok-Kong Shen
<mok-kong.shen@t-online.de> wrote:

>[...]
>Could you or someone else kindly give a good reference of
>Allan variance or a tiny summary of it? I failed to find
>pointers from a couple of well-known and very comprehensive
>reference materials of statistical sciences in the library.

VARIANCE

We recall from descriptive statistics that a "variance" statistic
attempts to capture (or "model") -- in one value -- the extent to
which data vary from some basis. The square root of variance is
"deviation," which is the expected difference each sample has from the
base value.

Common (or "classic") variance is based on the mean, the arithmetic
average of sampled values (here please pardon my pseudocode):

| mean := SUM(x[i]) / n;
| var := SUM( SQR(x[i] - mean) ) / (n - 1);
| sdev := SQRT( var );

for an array of n sample values x[].

In contrast, Allan variance is based on the value of the previous
sample:

| allanvar := SUM( SQR(x[i] - x[i-1]) ) / (2*(n-1));
| allandev := SQRT( allanvar );

The value "2" in the denominator is apparently intended to produce the
same result as classical variance over white noise. Note that an
n-element array implies only n-1 difference values.

There is also a "mean deviation" or "absolute deviation" which uses
the absolute value of the difference, which thus avoids the squaring
operation and is also supposedly "more robust":

| adev := SUM( ABS(x[i] - mean) ) / n;

Other types of variance include "Hadamard variance," a related form
called "SIGMA-Z," and probably many other types as well. Each of
these presumably provides a unique view of the differences in sampled
data, and none is likely to be ideal for every application.

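The pseudocode above translates fairly directly into Python. The following is a minimal sketch rather than code from the thread; the function names, the seeded test data, and the choice of a Gaussian source are assumptions made for illustration.

```python
import math
import random


def classic_variance(x):
    """Sample variance about the mean, using the (n - 1) divisor."""
    n = len(x)
    mean = sum(x) / n
    return sum((v - mean) ** 2 for v in x) / (n - 1)


def allan_variance(x):
    """Two-point variance: half the mean squared difference of adjacent samples."""
    n = len(x)
    return sum((x[i] - x[i - 1]) ** 2 for i in range(1, n)) / (2 * (n - 1))


def mean_abs_deviation(x):
    """Mean absolute deviation about the mean."""
    n = len(x)
    mean = sum(x) / n
    return sum(abs(v - mean) for v in x) / n


if __name__ == "__main__":
    rng = random.Random(1)  # fixed seed so repeated runs give the same numbers
    white = [rng.gauss(0.0, 1.0) for _ in range(10000)]

    # A random walk is the running sum of the white-noise steps.
    walk, total = [], 0.0
    for step in white:
        total += step
        walk.append(total)

    for name, data in (("white noise", white), ("random walk", walk)):
        sdev = math.sqrt(classic_variance(data))
        adev = math.sqrt(allan_variance(data))
        print(f"{name}: classic sdev = {sdev:.3f}, Allan dev = {adev:.3f}")
```

On the white-noise data the two deviations should come out close to each other, consistent with the stated purpose of the "2" in the denominator. On the random walk the classic deviation depends on how far the walk happens to wander, while the Allan deviation depends only on the size of the individual steps.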
ADVANTAGE

The first role of Allan variance is fairly conventional: to provide a
measure of variation. In frequency measurement work, measured
frequency may be sampled at some rate. The resulting Allan deviation
over the sample values is a general measure of frequency stability at
the sampling rate. And by averaging each m adjacent samples, we can
get an Allan variance for (synthetic) slower sampling rates. It is
also possible to measure time differences between two sources and then
compute the Allan deviation from a slightly more complex formula.

The more interesting role of Allan variance is to assist in the
analysis of residual noise. In frequency measurement work, five
different types of noise are defined: white noise phase modulation,
flicker noise phase modulation, white noise frequency modulation,
flicker noise frequency modulation, and random walk frequency
modulation. A log-log plot of Allan variance versus sample period
produces approximate straight line values of different slopes in four
of the five possible cases. A different (more complex) form called
"modified Allan deviation" can distinguish between the remaining two
cases. The result is a powerful basis for identifying problems and
engineering improved designs.

SOURCES

If you go to www.google.com and type in "Allan variance" or "Allan
deviation" you should get several pages of links to web pages. Some
of those are just a use in a particular project, some purport to be
definitions and are confusing, but overall one can develop an
understanding of the concept. There is a lot of mention of Allan
variance in the literature surrounding precision frequency
measurement, e.g., in the yearly proceedings of the annual IEEE
International Frequency Control Symposium.

EXAMPLE REFERENCES

Allan, D. and J. Barnes. 1981. A Modified "Allan Variance" with
Increased Oscillator Characterization Ability. Proceedings of the
35th Annual Frequency Control Symposium. 470-475.

Greenhall, C. 1992. A Shortcut for Computing the Modified Allan
Variance. 1992 IEEE Frequency Control Symposium. 262-264.

Ferre-Pikal, E., et al. 1997. Draft Revision of IEEE Std 1139-1988
Standard Definitions of Physical Quantities for Fundamental Frequency
and Time Metrology -- Random Instabilities. 1997 IEEE International
Frequency Control Symposium. 338-357.

Respero. 1999. Allan variance: variations and application to
metrology gauge data. http://huey.jpl.nasa.gov/~respero/allan-var/

Riley, W. 2001. The Calculation of Time Domain Frequency Stability.
http://www.ieee-uffc.org/freqcontrol/paper1ht.html

---
Terry Ritter ritter@io.com http://www.io.com/~ritter/
Crypto Glossary http://www.io.com/~ritter/GLOSSARY.HTM
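The idea of averaging each m adjacent samples and then reading a slope off a log-log plot can be sketched as below. This is an illustrative Python sketch under stated assumptions, not the procedure used by anyone in the thread: it uses simple non-overlapping averages, treats the data as frequency samples, and the helper names (block_averages, allan_curve, loglog_slope) are made up for the example.

```python
import math
import random


def allan_variance(x):
    """Two-point variance: half the mean squared difference of adjacent samples."""
    n = len(x)
    return sum((x[i] - x[i - 1]) ** 2 for i in range(1, n)) / (2 * (n - 1))


def block_averages(x, m):
    """Average each run of m adjacent samples; a leftover partial run is dropped."""
    return [sum(x[i:i + m]) / m for i in range(0, len(x) - m + 1, m)]


def allan_curve(x, factors):
    """Return (m, Allan variance of the m-sample averages) for each factor m."""
    return [(m, allan_variance(block_averages(x, m))) for m in factors]


def loglog_slope(points):
    """Least-squares slope of log(variance) against log(m)."""
    xs = [math.log(m) for m, _ in points]
    ys = [math.log(v) for _, v in points]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    num = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
    den = sum((a - mean_x) ** 2 for a in xs)
    return num / den


if __name__ == "__main__":
    rng = random.Random(7)  # fixed seed so repeated runs give the same numbers
    white = [rng.gauss(0.0, 1.0) for _ in range(1 << 15)]

    # A random walk is the running sum of the white-noise steps.
    walk, total = [], 0.0
    for step in white:
        total += step
        walk.append(total)

    factors = [1, 2, 4, 8, 16, 32]
    for name, data in (("white noise", white), ("random walk", walk)):
        slope = loglog_slope(allan_curve(data, factors))
        print(f"{name}: log-log slope of Allan variance = {slope:.2f}")
```

For white noise the fitted slope should be near -1, since averaging m independent samples cuts the variance by about a factor of m. For a random walk the slope should be positive and approach +1 as m grows. A slope read this way identifies a noise type only within the assumed family of noise models.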

*Terry Ritter, his current address, and his top page.*

*Last updated:* 2001-06-27