Copyright 1995 to 2007 Terry Ritter. All Rights Reserved.
For a basic introduction to cryptography, see "Learning About Cryptography" @: http://www.ciphersbyritter.com/LEARNING.HTM. Please feel free to send comments and suggestions for improvement to: ritter@ciphersbyritter.com (you may need to copy and paste the address into a web email reader). You may wish to help support this work by patronizing "Ritter's Crypto Bookshop" at: http://www.ciphersbyritter.com/BOOKSHOP.HTM.
Or use the browser facility "Edit / Find on this page" to search for particular terms.
This Glossary started as a way to explain the terms on my cryptography web pages describing my:
The value of a definition is insight. But:
Consider the idea that cryptography is used to keep secrets: We expect a cipher to win each and every contest brought by anyone who wishes to expose secrets. We call those people opponents, but who are they really, and what can they do? In practice, we cannot know. Opponents operate in secret: We do not know their names, nor how many they are, nor where they work. We do not know what they know, nor their level of experience or resources, nor anything else about them. Because we do not know our opponents, we also do not know what they can do, including whether they can break our ciphers. Unless we know these things that cannot be known, we cannot tell whether a particular cipher design will prevail in battle. We cannot expect to know when our cipher has failed.
Even though the entire reason for using cryptography is to protect secret information, it is by definition impossible to know whether a cipher can do that. Nobody can know whether a cipher is strong enough, no matter how well educated they are, or how experienced, or how well connected, because they would have to know the opponents best of all. The definition of cryptography implies a contest between a cipher design and unknown opponents, and that means a successful outcome cannot be guaranteed by anyone.
Consider the cryptographer who says: "My cipher is strong," and the cryptanalyst who says: "I think your cipher is weak." Here we have two competing claims with different sets of possibilities: First, the cryptographer has the great disadvantage of not being able to prove cipher strength, nor to even list every possible attack so they can be checked. In contrast, the cryptanalyst might be able to actually demonstrate weakness, but only by dint of massive effort which may not succeed, and will not be compensated even if it does. Consequently, most criticisms will be extrapolations, possibly based on experience, and also possibly wrong.
The situation is inherently unbalanced, with a bias against the cryptographer's detailed and thought-out claims, and for mere handwave first-thoughts from anyone who deigns to comment. This is the ultimate conservative bias against anything new, and for the status quo. Supposedly the bias exists because if the cryptographer's claim is wrong user secrets might be exposed. But the old status-quo ciphers are in that same position. Nothing about an old cipher makes it necessarily strong.
Unfortunately, for users to benefit from cryptography they have to accept some strength argument. Even more unfortunately:
In modern society we purchase things to help us in some way. We go to the store, buy things, and they work. Or we notice the things do not work, and take them back. We know to take things back because we can see the results. Manufactured things work specifically because design and production groups can test which designs work better or worse or not at all. In contrast, if the goal of cryptography is to keep secrets, we generally cannot expect to know whether our cipher has succeeded or failed. Cryptography cannot test the fundamental property of interest: whether or not a secret has been kept.
The inability to test for the property we need is an extraordinary situation; perhaps no other manufactured thing is like that. Because the situation is unique, few understand the consequences. Cryptography is not like other manufactured things: nobody can trust it because nobody can test it. Nobody, anywhere, no matter how well educated or experienced, can test the ability of an unbroken cipher to keep a secret in practice. Thus we see how mere definitions allow us to deduce fundamental limitations on cryptography and cryptanalysis by simple reasoning from a few basic facts.
The desire to expose relationships between ideas meant expanding the Glossary beyond cryptography per se to cover terms from related areas like electronics, math, statistics, logic and argumentation. Logic and argumentation are especially important in cryptography, where measures are few and math proofs may not apply in practice.
This Crypto Glossary is directed toward anyone who wants a better understanding of what cryptography can and cannot do. It is intended to address basic cryptographic principles in ways that allow them to be related, argued, and deeply understood. It is particularly concerned with fundamental limits on cryptography, and contradictions between rational thought and the current cryptographic wisdom. Some of these results may be controversial.
The Glossary is intended to build the fundamental understandings which lie at the base of all cryptographic reasoning, from novice to professional and beyond. It is particularly intended for users who wish to avoid being taken in by attacker propaganda. (Propaganda is an expected part of cryptography, since it can cause users to take actions which make things vastly easier for opponents.) The Glossary is also for academics who wish to see and avoid the logic errors so casually accepted by previous generations. One goal of the Glossary is to clarify the usual casual claims that confuse both novices and professionals. Another is to provide some of the historical technical background developed before the modern mathematical approach.
The way we understand reality is to follow logical arguments. All of us can do this, not just professors or math experts. Even new learners can follow a cryptographic argument, provided it is presented clearly. So, in this Glossary, one is occasionally expected to actually follow an argument and come to a personal conclusion. That can be scary when the result contradicts the conventional wisdom; then one then starts to question both the argument and the reasoning, as I very well know. But that scary feeling is just an expected consequence of a field which has allowed various unsupported claims and unquestioned beliefs to wrongly persist (see old wives' tales).
Unfortunately, real cryptography is not well-modeled by current math (for example, see proof and cryptanalysis). It is normally expected that the link between theory and reality is provided by the assumptions the math requires. (Obviously, proof conclusions only apply in practice when every assumed quality actually occurs in practice.) In math, each of these assumptions has equal value (since the lack of any one will void the conclusion), but in practice some assumptions are more equal than others. Certain assumptions conceivably can be guaranteed by the user, but other assumptions may be impossible to guarantee. When a model requires assumptions that cannot be verified in practice, that model cannot predict reality.
Current mathematical models almost never allow situations where the user can control every necessary assumption, making most proof results meaningless in practice. In my view, mathematical cryptography needs practical models. Of course, one might expect more realistic models to be less able to support the current plethora of mathematical results. Due to the use of more realistic models, some results in the Crypto Glossary do contradict well-known math results.
By carrying the arguments of conventional cryptographic wisdom to their extremes, it is possible to see two opposing groups, which some might call theoretical versus practical. While this simplistic model is far too coarse to take very seriously, it does have some basis in reality.
The Crypto Theorists supposedly argue that no cryptosystem can be trusted unless it has a mathematical proof, since anything less is mere wishes and hope. Unfortunately, there is no such cryptosystem. No cipher can be guaranteed strong in practice, and that is the real meaning of the one time pad. As long as even one unbreakable system existed, there was at least a possibility of others, but now there is no reason for such hope. The OTP is secure only in simplistic theory, and strength cannot be guaranteed in practice for users. This group seems most irritating when they imply that math proofs are most important, even when in practice those proofs provide no benefits to the user.
The Crypto Practitioners supposedly argue that systems should be designed to oppose the most likely reasonable threats, as in physical threat model analysis. In the physical world it is possible to make statements about limitations of opponents and attacks; unfortunately, few such statements can be made in cryptography. In cryptography, we know neither the opponents nor their attacks nor what they can do in combination. Successful attack programs can be reproduced and then applied by the most naive user, who up to that time had posed only the most laughable threat.
Both groups are wrong: There will be no proof in practice, and speculating on the abilities of the opponents is both delusional and hopeless. Moreover, no correct compromise seems possible. Taking a little proof from one side and some threat analysis from the other simply is not a valid recipe for making secure ciphers.
There is a valid recipe for security and that is a growing, competitive industry of cipher development. Society needs more than just a few people developing a handful of ciphers, but actual design groups who continually innovate, design, develop, measure, attack and improve new ciphers in a continuing flow. That is expensive work, as the NSA budget clearly shows. Open society will get such results only if open society will pay for them. Since payment is the issue, it is clear that "free" ciphers act to oppose exactly the sort of open cryptographic development society needs.
Absent an industry of cipher design, perhaps the best we can do is to design systems in ways such that a cipher actually can fail, while the overall system retains security. That is redundancy, and is a major part of engineering most forms of life-critical systems (e.g., airliners), except for cryptography. The obvious start is multiple encryption.
The practical worth of all this should be a serious regard for cryptographic risk. The possibility of cryptographic failure exists despite all claims and proofs to the contrary. Users who have something to protect must understand that cryptography has risks, and there is a real possibility of failure. If a possibility of information exposure is acceptable, one might well question the use of cryptography in the first place.
Even if users only want their information probably to be secure, they still have a problem: Only our opponents know our cipher failures, because they occur in secret. Our opponents do not expose our failures because they want those ciphers to continue in use. Few if any users will know when there is a problem, so we cannot count how many ciphers fail, and so cannot know that probability. Since there can be no expertise about what unknown opponents do, looking for an "expert opinion" on cipher failure probabilities or strength is just nonsense.
Conventional cryptographic expertise is based on the open literature. Unfortunately, unknown attacks can exist, and even the best informed cannot predict strength against them. While defending against known attacks may seem better than nothing, that actually may be nothing to opponents who have another approach. In the end, cipher and cryptosystem designers vigorously defend against attacks from academics who will not be their opponents.
On the other hand, even opponents read the open literature, and may make academic attacks their own. But surprisingly few academic attacks actually recover key or plaintext and so can be said to be real, practical threats. Much of the academic literature is based on strength assumptions which cannot be guaranteed or vulnerability assumptions which need not exist, making the literature less valuable in practice than it may appear.
Math cannot prove that a cipher is strong in practice, so we are forced to accept that any cipher may fail. We do not, and probably can not know the likelihood of that. But we do know that a single cipher is a single point of failure which just begs disaster. (Also see standard cipher.)
It is possible to design in ways which reduce risk. Systems can be designed with redundancy to eliminate the single point of failure (see multiple encryption). This is often the done in safety-critical fields, but rarely in cryptography. Why? Presumably, people have been far too credulous in accepting math proofs which rarely apply in practice. Thus we see the background for my emphasis on basics, reasoning, proof, and realistic math models.
To protect against fire, flood or other disaster, most software developers should store their current work off-site. The obvious solution is to first encrypt the files and then upload an archive to a web site. The straightforward use of cryptography to protect archives is an example of the pristine technical situation often seen as normal. Then we think of cipher strength and key protection, which seem to be all there is. But most cryptography is not that simple.
Climate of Secrecy. For any sort of cryptography to work, those who use it must not give away the secrets. Most times keeping secrets is as easy, or as hard, as just not talking or writing about them. Issues like minimizing paper output and controlling and destroying copies seem fairly obvious, although hardly business as usual. But secrets are almost always composed in plaintext, and the computers doing that may have plaintext secrets saved in various hidden operating system files. And opponents may introduce programs to compromise computers which handle secrets. It is thus necessary to control all forms of access to equipment which holds secrets, despite that being awkward and difficult. It is especially difficult to control access on the net.
Network Security. Computers only can do what they are told to do. When network designers decide to include features which allow attacks, that decision is as much a part of the problem as an attack itself. It seems a bit much to complain about insecurity when insecurity is part of the design. Design decisions have made the web insecure. Until web systems only implement features which maintain security, there can be none.
It is possible to design computing systems more secure than the ones we have now. If we provide no internal support for external attack, no attacks can prevail. The entire system must be designed to limit and control external web access and prevent surprises that slip by unnoticed. We can decompose the system into relatively small modules, and then test those modules in a much stronger way than trying to test a complex program. A possible improvement might be some form of restricted intermediate or quarantine store between the OS and the net. Better security design may mean that some things now supported insecurely no longer can be supported at all.
Current practice identifies two environments: The local computer, which is "fully" trusted, and the Internet, which is not trusted. This verges on a misuse of the concept of trust, which requires substantial consequences for misuse or betrayal. Absent consequences, trust is mere unsupported belief and provides no basis for reasoning. We do not trust a machine per se, since it only does what the designer made it do. And when there are no consequences for bad design, there really is no reason to trust the designer either.
A better approach would be fine OS control over individual programs, including individual scripts, providing validation and detailed limits on what each program can do, on a per-program basis. This would expand the firewall concept from just net access to every resource, including processor time, memory, all forms of I/O, plus the ability to invoke, or be invoked by, other programs. For example, most programs do not need, and so would not be allowed, net access, even if invoked by a program or running under a process which has such access. Programs received from the net would by default start out in quarantine, not have access to normal store, and could run only under strong limitations. A human would have to explicitly elevate them to a selected higher status, with the change logged. Program operation exceeding limitations would be prevented, logged, and accumulated in a control which supported validation, fine tuning, selective responses and serious quarantine.
Security is Off-The-Net. The best way to avoid web insecurity has nothing to do with cryptography. The way to avoid web insecurity is to not connect to the web, ever. Use a separate computer for secrets, and do not connect it to the net, or even a LAN, since computers on the LAN probably will be on the net. Carefully move information to and from the secrets computer with a USB flash drive. Protect access to that equipment.
For most users, the Crypto Glossary will have many underlined (or perhaps colored) words. Usually, those are hypertext "links" to other text in the Glossary; just click on the desired link.
Links to my other pages generally offer a choice between a "local" link or a full web link. The user working from a downloaded copy of the Glossary only would normally use the full web links. The user working from a CD or disk-based copy of all my pages would normally use the local links.
Links to my other pages also generally open and use another window. (Hopefully that will avoid the need to reload the Glossary after a reference to another article.) Similarly, links from my other pages to terms in the Glossary also generally open a window specifically for the Glossary. (In many cases, that will avoid reloading the Glossary for every new term encountered on those pages.)
In cryptography, as in much of language in general, the exact same word or phrase often is used to describe two or more distinct ideas. Naturally, this leads to confused, irreconcilable argumentation until the distinction is exposed (and often thereafter). Usually I handle this in the Crypto Glossary by having multiple numbered definitions, with the most common usage (not necessarily the best usage) being number 1.
The worth of this Glossary goes beyond mere definitions. Much of the worth is the relationships between ideas: Hopefully, looking up one term leads to other ideas which are similar or opposed or which support the first. The Glossary is a big file, but breaking it into many small files would ruin much of the advantage of related ideas, because then most related terms would be in some other part. And although the Glossary could be compressed, that would generally not reduce download time, because most modems automatically compress data during transmission anyway. Dial-up users typically should download the Glossary onto local storage, then use it locally, updating periodically.
I have obviously spent a lot of personal time constructing this Crypto Glossary, with the hope that it would be more than just background to my work. Hopefully, the Glossary and the associated introduction: "Learning About Cryptography" (see locally, or @: http://www.ciphersbyritter.com/LEARNING.HTM) will be of some wider benefit to the crypto community. So, if you have used this Glossary lately, why not drop me a short email and tell me so? Feel free to tell me how much it helped or even how it failed you; perhaps I can make it better for the next guy. If you use web email, just copy and paste my email address: ritter@ciphersbyritter.com
Resistor excess noise is a
The especially large amount of
In a single-crystal
semiconductor,
Generally used for power distribution because the changing
current supports the use of
transformers. Utilities can thus
transport power at high
voltage and low
current, which minimize
"ohmic"
or I2R losses. The high voltages are then reduced
at power substations and again by pole transformers for delivery
to the consumer.
One example is byte addition modulo 256, which simply adds two byte values, each in the range 0..255, and produces the remainder after division by 256, again a value in the byte range of 0..255. The modulo is automatic in an addition of two bytes which produces a single byte result. Subtraction is also an "additive" combiner.
Another example is bit-level exclusive-OR which is addition mod 2. A byte-level exclusive-OR is a polynomial addition.
Additive combiners are linear, in contrast to nonlinear combiners such as:
Knuth, D. 1981. The Art of Computer Programming, Vol. 2, Seminumerical Algorithms. 2nd ed. 26-31. Addison-Wesley: Reading, Massachusetts.
Marsaglia, G. and L. Tsay. 1985. Matrices and the Structure of Random Number Sequences. Linear Algebra and its Applications.67:147-156.
Advantages include:
In addition, a vast multiplicity of independent cycles has the potential of confusing even a quantum computer, should such a thing become realistic.
For Degree-n Primitive, and Bit Width w Total States: 2nw Non-Init States: 2n(w-1) Number of Cycles: 2(n-1)(w-1) Length Each Cycle: (2n-1)2(w-1) Period of lsb: 2n-1
The binary addition of two bits with no carry input is just XOR, so the lsb of an Additive RNG has the usual maximal length period.
A degree-127 Additive RNG using 127 elements of 32 bits each has 24064 unique states. Of these, 23937 are disallowed by initialization (the lsb's are all "0") but this is just one unusable state out of 2127. There are still 23906 cycles which each have almost 2158 steps. (The Cloak2 stream cipher uses an Additive RNG with 9689 elements of 32 bits, and so has 2310048 unique states. These are mainly distributed among 2300328 different cycles with almost 29720 steps each.)
Like any other
LFSR, and like any other
RNG, and like any other
FSM, an Additive RNG is very weak when
standing alone.
But when steps are taken to hide the sequence (such as using a
jitterizer nonlinear filter and
Dynamic Substitution
combining), the resulting cipher can have significant strength.
The mechanics of AES are widely available elsewhere. Here I note how one particular issue common to modern block ciphers is reflected in the realized AES design. That issue is the size of the implemented keyspace compared to the size of the potential keyspace for blocks of a given size.
A common academic model for conventional block ciphers is a "family of permutations." The "permutation" part of this means that every plaintext block value is found as ciphertext, but generally in a different position. The "family" part of this can mean every possible permutation. However, modern block ciphers key-select only an infinitesimal fraction of those possibilities.
Suppose we have a block which may take on any of n
different values.
How many ways can those n block values be rearranged as
in a block cipher?
Well, the first value can be placed in any of the n possible
positions, but that fills one position so the second value has only
A
A
For
The obvious conclusion is that almost none of the keyspace implicit in the theoretical model of a conventional block cipher is actually implemented in AES, and that is consistent with other modern designs. Is that important? Apparently not, but nobody really knows. It does seem to imply that just a few known plaintext blocks should be sufficient to identify the correct key from a set of possibilities, which might make known plaintext more of an issue than normally claimed. Does it lead to a known break? No, or at least not yet. But having only a tiny set of keyed permutations should lead to questions about patterns and relationships within the selected set.
The real issue here is not the exposure of a particular weakness in AES, since no such weakness is shown. Instead, the issue is that conventional cryptographic wisdom does not force models to correspond to reality, and poor models lead to errors in reasoning. The distinction between theory and practice is pronounced in cryptography. For other examples of failure in the current cryptographic wisdom, see one time pad, BB&S, DES, and, of course, old wives' tale.
AES is said to be certified for SECRET and TOP SECRET classified material. That might have us believe that AES is trusted by NSA, but it may mean less than it seems.
No cipher, by itself, can guarantee security.
Any cryptographic system will have to be certified by NSA before
protecting classified information.
In practice, cryptosystems will be provided by NSA to contractors,
those systems may or may not use AES, and they may not use AES in
the expected form.
That does not imply that AES is bad, it just means that we cannot
really know what NSA will allow, despite general claims.
Technically, function f : G -> G of the form:
f(x) = ax + bwith non-zero constant "b".
anxn + an-1xn-1 + ... + a1x1 + a0where the operations are mod 2: addition is Exclusive-OR, and multiplication is AND.
Note that all of the variables xi are to the first power only, and each coefficient ai simply enables or disables its associated variable. The result is a single Boolean value, but the constant term a0 can produce either possible output polarity.
Here are all possible 3-variable affine Boolean functions (each of which may be inverted by complementing the constant term):
affine truth table
c 0 0 0 0 0 0 0 0
x0 0 1 0 1 0 1 0 1
x1 0 0 1 1 0 0 1 1
x1+x0 0 1 1 0 0 1 1 0
x2 0 0 0 0 1 1 1 1
x2+ x0 0 1 0 1 1 0 1 0
x2+x1 0 0 1 1 1 1 0 0
x2+x1+x0 0 1 1 0 1 0 0 1
See also:
Boolean function
nonlinearity.
F(x) = ax + b (mod n)
where the non-zero
term makes the
equation
affine.
Most of the classic hand-ciphers can be seen as
simple substitution
stream ciphers. Each
plaintext letter selects an entry in the
substitution table (for that
cipher), and the contents of that entry becomes the
ciphertext letter.
The affine equation thus represents one way to set
up the table, as a particular simple
permutation of the letters in the table.
(Of course, by using the equation we need no explicit table, but
we also constrain ourselves to the simplicity of the equation.)
To assure that we have a permutation, we require that a and n
be
relatively prime, that is, the
In modern terms, the
strength of the classic substitution
ciphers is essentially nil. In modern
cryptanalysis, we generally assume
that the
opponent has a substantial amount of
known plaintext.
Since the table does not change, every known-plaintext character
has the potential to fill in another entry in the table.
Very soon the table is almost completely exposed, which ends all
strength.
These simple substitution ciphers with small, fixed tables (or even
just equations for such tables) are also extremely vulnerable to
attacks using
ciphertext only.
"The first combining operation is called the product operation and corresponds to enciphering the message with the first secrecy system R and enciphering the resulting cryptogram with the second system S, the keys for R and S being chosen independently."
"The second combining operation is 'weighted addition.'
It corresponds to making a preliminary choice as to whether system R or S is to be used with probabilities p and q, respectively. When this is done or R or S is used as originally defined." [p.658]S = pR + qS p + q = 1.
More specifically (and with a change of notation):
"If we have two secrecy systems T and R we can often combine them in various ways to form a new secrecy system S. If T and R have the same domain (message space) we may form a kind of 'weighted sum,'
S = pT + qRwherep + q = 1. This operation consists of first making a preliminary choice with probabilities p and q determining which of T and R is used. This choice is part of the key of S. After this is determined T or R is used as originally defined. The total key of S must specify which of T and R is used, and which key of T (or R) is used.""More generally we can form the sum of a number of systems.
S = p1T + p2R + . . . + pmU Sum( pi ) = 1We note that any system T can be written as a sum of fixed operationsT = p1T1 + p2T2 + . . . + pmTmTi being a definite enciphering operation of T corresponding to key choice i, which has probability p.""A second way of combining two secrecy systems is taking the 'product,' . . . . Suppose T and R are two systems and the domain (language space) of R can be identified with the range (cryptogram space) of T. Then we can apply first T to our language and then R to the result of this enciphering process. This gives a resultant operation S which we write as a product
S = RTThe key for S consists of both keys of T and R which are assumed chosen according to their original probabilities and independently. Thus if the m keys of T are chosen with probabilitiesp1p2 . . . pmand the n keys of R have probabilitiesp'1p'2 . . . p'n ,then S has at most mn keys with probabilities pi p'j. In many cases some of the product transformations Ri Tj will be the same and can be grouped together, adding their probabilities."Product encipherment is often used; for example, one follows a substitution by a transposition or a transposition by a Vigenere, or applies a code to the text and enciphers the result by substitution, transposition, fractionation, etc."
"It should be emphasized that these combining operations of addition and multiplication apply to secrecy systems as a whole. The product of two systems TR should not be confused with the product of the transformations in secrecy systems Ti Rj . . . ."
-- Shannon, C. E. 1949. Communication Theory of Secrecy Systems. Bell System Technical Journal.28:656-715.
It is easy to dismiss this as being of historical interest only, but there are advantages here which are well beyond our current usage.
For the keyed selection among ciphers, there would be some sort of simple protocol (i.e., not cryptographic per se), for communicating cipher selections to the deciphering end. (Perhaps there would be some sort of simple handshake for email use.) The result would be to have (potentially) a new selection from a set of ciphers on a message-by-message basis.
With respect to multiple encryption or ciphering "stacks" (as in "protocol stacks"), there are various security advantages:
Also see:
Perfect Secrecy and
Ideal Secrecy.
An algorithm intended to execute reliably as a computer program necessarily must handle, or in some way at least deal with, absolutely every error condition which can possibly occur in operation. (We do assume functional hardware, and thus avoid programming around the possibility of actual hardware faults, such as memory or CPU failure.) These "error conditions" normally include Operating System errors (e.g., bad parameters passed to an OS operation, resource not available, various I/O failures, etc.), and arithmetic issues (e.g., division by zero, overflow, etc.) which may halt execution when they occur.
Other possibilities include errors the OS will not know about, including the misuse of programmer-defined data structures, such as buffer overrun.
A practical algorithm must recognize various things which
validly may occur, even if such things are exceedingly rare.
One example might be in assuming that two floating-point variables
which represent the same value will be equal.
Another example might be to assume that a floating-point variable
will "never" have some particular value (which might lead to a
divide-by-zero fault).
Yet another example would be to assume that an arbitrary selection
of x will lead to a sufficiently long cycle in
BB&S, even if the alternative is very,
very unlikely.
In particular, my Cloak2 and Penknife ciphers implemented encrypted alias files of text lines of arbitrary length, each of which included name, start date, and key. New keys were made available only as secure ciphertext, but the alias files were arranged so they could consist of multiple ciphertext files simply concatenated as ciphertext. Thus, new keys could be added to the start of the alias file just using a simple and secure file copy operation. When searching for a particular alias, the date was also checked, and that key used only when the correct date had arrived. This allowed an entire office of users to change to a new key automatically, at the same time, without even knowing they were using a different key. Appropriate functions allowed access to old keys so that email traffic could be archived in ciphertext form.
Obviously, an alias file must be encrypted. The single key or keyphrase decrypting an alias file thus provides access to all the keys in the file. But each alias file contains only a subset of the keys in use within an organization, and even those are only valid over a subset of time. An organization security officer could archive old alias files, strip out the old keys and add new ones, then encipher the new alias file under a new pass phrase. In this way, the contents of old encrypted email would not be hidden from the authorizing organization. Alias file maintenance could be either as complex or as simple as one might like.
See, for example,
Allan Variance is useful in analysis of residual noise in precision frequency measurement. Five different types of noise are defined: white noise phase modulation, flicker noise phase modulation, white noise frequency modulation, flicker noise frequency modulation, and random walk frequency modulation. A log-log plot of Allan variance versus sample period produces approximate straight line values of different slopes in four of the five possible cases. A different (more complex) form called "modified Allan deviation" can distinguish between the remaining two cases.
Also see
"Definition. A transformation f mapping a message sequence m1,m2,...,ms into a pseudo-message sequence m1',m2',...,ms' is said to be an all-or-nothing transform if:
- The transform f is reversible: given the pseudo-message sequence, one can obtain the original message sequence.
- Both the transformation f and its inverse are efficiently computable (that is, computable in polynomial time).
- It is computationally infeasible to compute any function of any message block if any one of the pseudo-message blocks is unknown."
-- Rivest, R. 1997. All or nothing encryption and the package transform. Fast Software Encryption 1997. 210-218.
When used with a conventional block cipher, an AONT appears to increase the cost of a brute-force attack by a factor which is the number of blocks in the message. Rivest also notes that the large effective block size can avoid ciphertext expanding chaining modes by using ECB mode on the large block. Also see huge block cipher advantages.
The
Balanced Block Mixing (BBM)
which I introduced to cryptography in my article:
"Keyed Balanced Size-Preserving Block Mixing Transforms" (
locally, or @:
http://www.ciphersbyritter.com/NEWS/94031301.HTM)
in early 1994 (three years before the Rivest publication), and then
developed in a series of subsequent articles, apparently can be an
especially fine example of an all-or-nothing transform.
The alternative hypothesis H1 is also called the
research hypothesis,
and is logically identical to
"NOT-H0" or "H0
is not true."
Transistors are analog amplifiers which are basically linear over a reasonable range and so require DC power. In contrast, relays are classically mechanical devices with direct metal-to-metal moving connections, and so can handle generally higher power and AC current. The classic analog amplifier is an operational amplifier.
Unexpected oscillation can be indicated by:
Oscillation occurs when:
To stop undesired oscillation:
To Increase Isolation
To decrease gain:
To change phase:
When two things are related by appropriate similarity in
structure or function, we can infer that what is known about one
thing also may apply to the other.
Such an inference may or may not be true, but it can be examined
and tested.
In On Sophistical Refutations (350 B.C.E.), Aristotle (384-322 B.C.E.) lists five goals for countering arguments:
Refutation can occur in various ways. Disputing the evidence being used to support a claim can be considered a new claim and different evidence presented. However, disputing the reasoning itself requires only logic and typically no further evidence at all. See extraordinary claims.
Like cryptography, argumentation is war, and tricks abound when winning is the ultimate goal. But arguing to win is fundamentally unscientific, since learning occurs mainly when an error is found and recognized.
The first requirement of successful argumentation is to have a stated topic or thesis. Without a stated topic, an unscrupulous opponent can lead the argument to some apparently similar but more vulnerable issue, and few in the audience will notice. That is especially true when a topic is introduced casually, and then changed by the opponent in the very first response. Another approach for the opponent is to indignantly bring up and discuss in detail some supposed error on an irrelevant but apparently related topic. A clever topic change also may cause awkward repetition and babbling in the attempt to expose the change and reverse it. The correct response is to be aware enough to recognize the topic change immediately, and return to the original topic; to argue that the comments are off-topic is to introduce a new topic.
There is no way to make an opponent stay on-topic, and if they know they will lose on-topic, that actually may be impossible. Moreover, the opponent may pose various questions (on some new topic), and claim you are not being responsive, the discussion of that claim itself being a new topic. But if you want to take your topic to conclusion, you cannot follow an opponent who wants anything but that. (Also see spin.)
The second requirement of successful argumentation is to force the discussion to remain on the material content. If the original argument might be successful, an unscrupulous opponent may seek to divert the discussion to the appropriateness of, or bias in, the symbols or names used for the concept. Or the opponent may find and protest premises stated without mathematical precision. But a conventional argument need be neither mathematically complete nor mathematically precise to be valid. (This is the fallacy of accident.) The correct response is to point out that the comments are irrelevant and return to the material issue; to argue that the comments are wrong is to argue a changed topic.
The goal of scientific argumentation is to improve knowledge and insight, not to anoint a "superior" contestant. Sadly, those willing to "win" with dishonesty generally do find an easily mislead audience.
Almost all on-line arguments are technically informal in the sense of depending upon context and definitions. The need for particular context generally leaves ample room to confuse the issue, even for someone who knows almost nothing about the topic.
If the proposed argument is basically unsound, that case can be won on its merits.
If the proposed argument is basically sound, but based on analogy, we need to realize that there are few really good analogies. Examine the analogy in detail and try various cases until one is found that is good in the analogy but bad in the proposed argument.
If the proposed argument is basically sound, one can win anyway by changing the topic and doing so in a smooth way the audience will not notice.
Most responses carry a least a thin patina of respectability.
However, many times a response is actually just the first sad shot
in a verbal combat that seeks defeat and winning by deception.
Unfortunately, it may be difficult to distinguish between mere
ignorance and actual attack.
It is thus important to actually examine the logic of any response.
DEC HEX CTRL CMD DEC HEX CHAR DEC HEX CHAR DEC HEX CHAR
0 00 ^@ NUL 32 20 SPC 64 40 @ 96 60 '
1 01 ^A SOH 33 21 ! 65 41 A 97 61 a
2 02 ^B STX 34 22 " 66 42 B 98 62 b
3 03 ^C ETX 35 23 # 67 43 C 99 63 c
4 04 ^D EOT 36 24 $ 68 44 D 100 64 d
5 05 ^E ENQ 37 25 % 69 45 E 101 65 e
6 06 ^F ACK 38 26 & 70 46 F 102 66 f
7 07 ^G BEL 39 27 ' 71 47 G 103 67 g
8 08 ^H BS 40 28 ( 72 48 H 104 68 h
9 09 ^I HT 41 29 ) 73 49 I 105 69 i
10 0a ^J LF 42 2a * 74 4a J 106 6a j
11 0b ^K VT 43 2b + 75 4b K 107 6b k
12 0c ^L FF 44 2c , 76 4c L 108 6c l
13 0d ^M CR 45 2d - 77 4d M 109 6d m
14 0e ^N SO 46 2e . 78 4e N 110 6e n
15 0f ^O SI 47 2f / 79 4f O 111 6f o
16 10 ^P DLE 48 30 0 80 50 P 112 70 p
17 11 ^Q DC1 49 31 1 81 51 Q 113 71 q
18 12 ^R DC2 50 32 2 82 52 R 114 72 r
19 13 ^S DC3 51 33 3 83 53 S 115 73 s
20 14 ^T DC4 52 34 4 84 54 T 116 74 t
21 15 ^U NAK 53 35 5 85 55 U 117 75 u
22 16 ^V SYN 54 36 6 86 56 V 118 76 v
23 17 ^W ETB 55 37 7 87 57 W 119 77 w
24 18 ^X CAN 56 38 8 88 58 X 120 78 x
25 19 ^Y EM 57 39 9 89 59 Y 121 79 y
26 1a ^Z SUB 58 3a : 90 5a Z 122 7a z
27 1b ^[ ESC 59 3b ; 91 5b [ 123 7b {
28 1c ^\ FS 60 3c < 92 5c \ 124 7c |
29 1d ^] GS 61 3d = 93 5d ] 125 7d }
30 1e ^^ RS 62 3e > 94 5e ^ 126 7e
31 1f ^_ US 63 3f ? 95 5f _ 127 7f DEL
a + (b + c) = (a + b) + c a * (b * c) = (a * b) * c
Also see:
commutative and
distributive.
In a mathematical proof, each and every assumption must be true for the proof result to be true. If the truth of any assumption is unknown, the proof is formally incomplete and the result has no meaning.
In practice, proofs have meaning only to the extent that each
and every required assumption can be assured, including
assumptions which may not be immediately apparent.
In practical cryptography, while some assumptions possibly could
be assured by the user, others could only be assured by the cipher
designer, who must then be
trusted, along with his company, the entire
distribution path and so on.
Even worse, still other assumptions may be impossible to
assure in practice by any means at all, which makes any such
proof useless for practical cryptography.
Also RS-232 and similar "serial port" signals, in which
byte or character values are transferred
bit-by-bit in bit-serial format.
Since digital signals require both proper
logic levels and proper timing to sense
those levels, timing is established by the leading edge of a
"start bit" sent at the start of each data byte. See
asynchronous transmission.
Transmit: The line rests "high." When a character is to be sent, a start bit or "low" level is sent for one bit-time. Then each data bit is sent, for one bit-time each, as are one or two stop or "high" level bit-times. Then, if no more data are ready for sending, the line just rests "high."
Receive: The line is normally "high." The instant the line goes "low" is the beginning of a start bit, and that establishes an origin for bit timing. Exactly 1.5 bit-times later, hopefully in the middle of the first data-bit time, the line level is sampled to record the first incoming bit. The second bit is recorded one bit-time later, and so on. When all bits have been recorded, the receiver sends the resulting character, all bits simultaneously, to a local register or FIFO queue for pickup. Note that all this implies that we know the format of the character with respect to bit time and number of bits.
Timing Accuracy: Everything depends upon both transmit
and receive ends having approximately the same bit timing.
The leading edge of the start bit temporarily
synchronizes the receiver, even though
the transmit and receive clock rates may be somewhat different.
With 8-bit characters, the last data bit is sampled exactly 8.5
bit-times from the detected leading edge of the start bit.
If the receive timing varies as much as +/- 0.5 bit in 8.5, the
last bit will be sampled outside the correct bit time.
So the total timing accuracy must be within +/- 5.8 percent, for
all sources transmit and receive clock variation, including
sampling delay in detecting the start bit.
Nowadays this is easily achieved with cheap
crystal oscillator
clock modules
and digital count logic.
In normal cryptanalysis we start out knowing plaintext, ciphertext, and cipher construction. The only thing left unknown is the key. A practical attack must recover the key. (Or perhaps we just know the ciphertext and the cipher, in which case a practical attack would recover plaintext.) Simply finding a distinguisher (showing that the cipher differs from the chosen model) is not, in itself, an attack. If an attack does not recover the key (or perhaps the particular key-selected internal state used in ciphering), it is not a real attack.
In cryptography, when someone says they have "an attack," the implication is that they have a successful attack (a break) and not just another failed attempt. It is obviously much easier to simply claim to have an attack than to actually analyze, innovate, build and test a working attack, which makes it necessary to back up such claims with evidence. Arrogant claims, with "proof left as an exercise for the student" or "read the literature" responses, deserve jeers instead of the cowed respect they often get.
A claim to have an attack can be justified by:
It is not sufficient to say: "My interpretation of the theory is that there must be a break, so the cipher is broken"; it is instead necessary to actually devise a process which recovers key or plaintext. Furthermore, there are many attacks which work against scaled-down tiny ciphers, but which do not scale up as valid attacks against the original large cipher: Just because we can solve newspaper-amusement ciphers (tiny versions of conventional block ciphers) does not imply that any real-size block ciphers are "broken." The process used to solve newspaper ciphers is not "an attack" on block ciphers in general.
Classically, attacks were neither named nor classified; there was just: "here is a cipher, and here is 'the' attack." (Many different attacks may be possible, but even one practical attack is sufficient to cause us to avoid that cipher.) And while this gradually developed into named attacks, there is no overall attack taxonomy. Currently, attacks are often classified by the information available to the attacker or constraints on the attack, and then by strategies which use the available information. Not only ciphers, but also cryptographic hash functions can be attacked, generally with very different strategies.
We are to attack a cipher which enciphers plaintext into ciphertext or deciphers the opposite way, under control of a key. The available information necessarily constrains our attack strategies.
The goal of an attack is to reveal some unknown plaintext, or the key (which will reveal the plaintext). An attack which succeeds with less effort than a brute-force search we call a break. An "academic" ("theoretical," "certificational") break may involve impractically large amounts of data or resources, yet still be called a "break" if the attack would be easier than brute force. (It is thus possible for a "broken" cipher to be much stronger than a cipher with a short key.) Sometimes the attack strategy is thought to be obvious, given a particular informational constraint, and is not further classified.
Many attacks try to isolate unknown small components or aspects
so they can be solved separately, a process known as
divide and conquer. Also see:
security.
A network in the form of a tree is used, with goals represented as nodes. Various possible ways to achieve a particular goal are represented as branches, which then can be taken as goals with their own branch nodes.
In cryptographic analysis, the idea is that the root node will represent the ultimate security we seek. Each path to the root then represents the accumulated effort needed to break that security. The problem is that it is typically impossible to assure that every alternative attack has been considered. And if some unconsidered approach is cheaper than any other, that becomes the true limit on security, despite not being present in the analysis.
Attack tree analysis does not tend to expose unconsidered attacks. Yet those are exactly the issues which carry the greatest cryptographic risk, because we can at least generally quantify the risk from known attacks. Since an attack tree cannot do what most needs to be done, it would seem to be a strange choice for cryptographic risk analysis. One could even argue that an attack tree is most useful as a formal aid in deluding naive executives and users.
Threat models basically concern what is to be protected, from whom, and for how long. But with ciphers, we seek to protect all our data, from everyone, forever. The extreme nature of these expectations is only part of what makes a conventional threat model unhelpful in understanding ciphering risks.
Cipher failure and exploitation happens in secret, so we cannot know how often it occurs and cannot develop a probability for it. Absent a probability of cipher failure, any attempt to understand ciphering risk is necessarily limited.
A more effective approach to system security is to build with understandable components. In component design, we can define exactly what each component permits. In component analysis, we can consider the security effects and expose the precise range of things each component allows. If none of the allowed things can cause a security problem, we will have no security problems. Components essentially become a custom language of system design which has no way of expressing security faults.
A component-based security design is far more restrictive and so is far more demanding than the conventional mode of hacking through a design and implementation. However, this design process provides a road map for real security, as opposed to belief in results from flawed analytical tools (like attack trees) and ad hoc analysis that simply cannot deliver the assurances we need.
The ability to analyze security must be designed into a system;
it cannot be just added on to finished systems.
For a known population, the number of repetitions expected at each level has long been understood to be a binomial expression. But if we are sampling in an attempt to establish the effective size of an unknown population, we have two problems:
Fortunately, there is an unexpected and apparently previously unknown combinatoric relationship between the population and the number of combinations of occurrences of repeated values. This allows us to convert any number of triples and higher n-reps to the number of 2-reps which have the same probability. So if we have a double, and then get another of the same value, we have a triple, which we can convert into three 2-reps. The total number of 2-reps from all repetitions (the augmented 2-reps value) is then used to predict population.
We can relate the number of samples s to the population N through the expected number of augmented doubles Ead:
Ead(N,s) = s(s-1) / 2N .
This equation is exact, provided we interpret all
the exact n-reps in terms of 2-reps. For example, a triple is
interpreted as three doubles; the augmentation from 3-reps to 2-reps
is (3 C 2) or 3. The augmented result is the sum of the
contributions from all higher repetition levels:
n i
ad = SUM ( ) r[i] .
i=2 2
where ad is the number of augmented doubles, and r[i]
is the exact repetition count at the i-th level.
And this leads to an equation for predicting population:
Nad(s,ad) = s(s-1) / 2 ad .
This predicts the population Nad as based on a mean value
of augmented doubles ad. (For an example and comparison to
various other methods, see the
conversation:
However, since the trials should have approximately a simple Poisson distribution (which has only a single parameter), we could be a bit more clever and fit the results to the expected distribution, thus perhaps developing a bit more accuracy. Also see population estimation, birthday attack, birthday paradox and entropy.
Also see
It is possible to authenticate individual blocks, provided they are large enough to minimize the impact of adding extra authentication data in each block (see block code). One advantage lies in avoiding the alternative of buffering an entire message before it can be authenticated. That can be especially important for real-time (e.g., voice) communications.
Other forms of cryptographic authentication include key authentication for public keys, and source or user authentication, for the authorization to send the message in the first place.
Also see
certification and
certification authority.
Science does not recognize mere authority as sufficient basis for a conclusion, but instead requires that facts and reasoning be exposed for review. The simple use of a name does not automatically create an ad verecundiam fallacy ("Appeal to Awe"). A name can identify a body of work giving the needed facts and the reasoning supporting a scientific conclusion.
Authority tends to hide the basis for drawing conclusions. Authority tends to avoid addressing complaints of false reasoning. Authority tends to hide reasoning and insists that a statement is correct simply because of who made it. A person repeating a conclusion from an authority often has no idea of the reasoning behind it, or what it really means with respect to limits or context.
In contrast, scientific thought exposes the factual basis and
the reasoning, which tells us what the conclusion really means.
Scientific thought is democratic and informs, and ideally gives
everyone the same materials from which to draw factual conclusions,
some of which may be new, strange and disconcerting, but nevertheless
correct.
"As the input moves through successive layers the pattern of 1's generated is amplified and results in an unpredictable avalanche. In the end the final output will have, on average, half 0's and half 1's . . . ." [p.22]-- Feistel, H. 1973. Cryptography and Computer Privacy. Scientific American.228(5):15-23.
Also see mixing, diffusion, overall diffusion, strict avalanche criterion, complete, S-box. Also see the bit changes section of the "Binomial and Poisson Statistics Functions in JavaScript," locally, or @: http://www.ciphersbyritter.com/JAVASCRP/BINOMPOI.HTM#BitChanges.
"For a given transformation to exhibit the avalanche effect, an average of one half of the output bits should change whenever a single input bit is complemented." [p.523]-- Webster, A. and S. Tavares. 1985. On the Design of S-Boxes. Advances in Cryptology-- CRYPTO '85.523-534.
Also see the bit changes section of the "Binomial and Poisson Statistics Functions in JavaScript" page (locally, or @: http://www.ciphersbyritter.com/JAVASCRP/BINOMPOI.HTM#BitChanges).
In normal junctions, the space-charge region (depletion region) between P and N materials is fairly broad, so the extreme fields found in Zener breakdown do not occur. However, a combination of applied voltage, temperature, and random motion may cause a covalent bond to break anyway, in a manner similar to normal diode leakage. When a breakdown does occur, the charge carrier is attracted by the opposing potential and drops through the space-charge region, periodically interacting with covalent bonds there. When the field is sufficiently high, a falling charge carrier may build up enough energy to break another carrier free when it hits. Then both the original and resulting carriers continue to accelerate through the space-charge region, each possibly hitting and breaking many other bonds. The result is a growing avalanche of carriers produced by each single breakdown. The avalanche effect can be seen as a form of amplification and can be huge, for example, 10**8.
In a series of almost forgotten semiconductor physics research papers from the 1950's and 1960's, avalanching breakdown was shown to consist of a multitude of "microplasma" events of perhaps 20uA each. These events are not completely independent, but instead interact, but also have some apparently random component, probably thermal. At least some of the microplasma events seem to have negative dynamic resistance and function like tiny like neon bulbs (and may even emit light). One implication if this is an ability of some avalanching "zener" diodes to directly support small, unsuspected oscillations. A series of very extensive discussions on sci.electronics.design in 1997 (search: "zener oscillation") give experimental details. Both LC tank oscillation and RC relaxation oscillation were demonstrated in practice. Thus, avalanche multiplication, often assumed to be unquestionably "quantum random," actually may have a disturbing amount of predictable structure. True Zener breakdown does not appear to have the same problems, nor does thermal noise, as far as we know. Unfortunately, these "purer" sources may be much smaller than noise from avalanche multiplication.
In contrast to Zener breakdown, which has a negative
temperature
coefficient, avalanche multiplication has a positive temperature
coefficient, like most resistances or
conductors.
Presumably this is due to heat causing increased activity in the
crystal lattice, thus preventing electrons from falling as far
before interacting, thus reducing the probability of breaking
another bond, and reducing the amplification.
In junctions that break down at about 6 volts the temperature
effects tend to cancel.
Also see: "Random Electrical Noise: A Literature Survey"
(locally, or @:
http://www.ciphersbyritter.com/RES/NOISE.HTM).
"A function is balanced if, when all input vectors are equally likely, then all output vectors are equally likely."-- Lloyd, S. 1990. Properties of binary functions. Advances in Cryptology-- EUROCRYPT '90.124-139.
There is some desire to generalize this definition to describe multiple-input functions. (Is a dyadic function "balanced" if, for one value on the first input, all output values can be produced, but for another value on the first input, only some output values are possible?) Presumably a two-input balanced function would be balanced for either input fixed at any value, which would essentially be a Latin square or a Latin square combiner. Also see Balanced Block Mixing. As opposed to bias. Also see Ideal Secrecy and Perfect Secrecy.
Balance is a pervasive requirement in many areas of cryptography; for example:
A mechanism for mixing large block values like those used in block ciphers. A BBM is balanced to avoid leaking information, and is effective in just a single pass, thus avoiding the need for repeated rounds and added hardware. A BBM has no data expansion. A BBM supports the construction of scalable ciphers with large blocks, and can be more efficient, more flexible, and more useful than conventional fixed and smaller designs. (See: ideal mixing, Mixing Cipher, Mixing Cipher design strategy and also the BBM articles, locally, or @: http://www.ciphersbyritter.com/index.html#BBMTech).
Technically, a Balanced Block Mixer is an m-input-port m-output-port mechanism with various properties:
The inverse mixing behaves similarly. Say, for example, we are mixing 64 bytes of message into 64 bytes of result: If we know 63 of the result bytes, we can step through the values of the 64th byte, and get 256 different messages, each of which will produce the 63 bytes we know (a homophonic sort of situation). If the actual messages are random-like and evenly distributed, it will be difficult to know which particular message is implied. The amount of uncertainty we have in the result is reflected in the amount of uncertainty we have about the message.
The basic Balanced Block Mixer is a pair of orthogonal Latin squares. The two input ports affect the rows and columns of both squares, with the selected result in each square being the two output ports. For example, here is a tiny nonlinear "2-bit" or "order 4" BBM:
3 1 2 0 0 3 2 1 30 13 22 01 0 2 1 3 2 1 0 3 = 02 21 10 33 1 3 0 2 1 2 3 0 11 32 03 20 2 0 3 1 3 0 1 2 23 00 31 12
Suppose we wish to mix (1,3); 1 selects the second row up in both squares, and 3 selects the rightmost column, thus selecting (2,0) as the output. Since there is only one occurrence of (2,0) among all entry pairs, this discrete mixing function is reversible, as well as being balanced on both inputs.
In practice, we would probably want to use at least order 16, which can be efficiently stored as an ordinary 256-byte "8-bit" substitution table, one with a particular oLs structure in the data.
One way to use the BBM mixing concept is to develop linear equations for oLs mixing for scaling to various sizes (see my article: Fencing and Mixing Ciphers from 1996 Jan 16). We can do that in the finite field of mod-2 polynomials with an irreducible modulus. So we can easily have similar mixers of 16, 32, 64, 128 and 256 bit port widths, and so on. By using multiple mixers of different size in various connections, we can easily mix blocks of size compatible to existing ciphers, and much larger.
A usually better way to use the BBM mixing concept is to develop small, nonlinear and keyed oLs's for use in FFT-like patterns with 2n ports. It is easy to construct keyed nonlinear orthogonal pairs of Latin squares of arbitrary 4n order as I describe in my articles:
In any FFT-style structure, there is exactly one "path" from any input to any output, and "cancellation" cannot occur. Thus, we can guarantee that any change to any one input must "affect" each and every output. Similarly, each input is equally represented in each output, which is ideal mixing. The resulting wide ideal mixing structure, using small BBM tables as each butterfly operation, is itself a BBM, and is dynamically scalable to virtually arbitrary size.
Mixing has long been a problem in block ciphers. The difficulty of mixing wide block values is one reason most conventional block ciphers are small. But having a small block means that there is not much room to add features like:
Large blocks also have room to hold sufficient uniqueness to support electronic codebook mode, which is not normally appropriate for block ciphers. Large blocks in ECB mode can support secure ciphering without ciphertext expansion, a goal which is very hard to reach in other ways.
When a BBM is implemented in software, the exact same unchanged routine can handle both wide mixing for real operation and narrow "toy" mixing for thorough experimental testing. This supports both scalable operation, and exhaustive testing of the exact code used in actual operation.
In hardware, BBM block throughput or block rate can be independent of block size. Wide blocks can be mixed in the same time as narrow blocks by pipelining each sub-layer of the mixing. That of course makes large blocks far faster per byte than small ones.
Also see All or Nothing Transform, Mixing Cipher, Dynamic Substitution Combiner, and Variable Size Block Cipher.
Also see some of the development sequence:
In a statically-balanced combiner, any particular result
value can be produced by any value on one input, simply by
selecting some appropriate value for the other input. In this way,
knowledge of only the output value provides no information
The common examples of cryptographic combiner, including byte exclusive-OR (mod 2 polynomial addition), byte addition (integer addition mod 256), or other "additive" combining, are perfectly balanced. Unfortunately, these simple combiners are also very weak, being inherently linear and without internal state.
A Latin square combiner
is an example of a statically-balanced
reversible nonlinear combiner with massive internal state.
A Dynamic Substitution
Combiner is an example of a dynamically or
statistically-balanced reversible nonlinear combiner with
substantial internal state.
Each conductor of a balanced line systems should have similar driver output impedances (ideally low), similar wire effects, and similar receiver termination impedances (ideally high). At audio frequencies cables are not transmission lines, so "cable impedance" is not an issue, and the differential receiver need not match either the cable or the driver. When each wire has a similar impedance to ground, external magnetic and electrostatic fields should act on them similarly, producing a common effect on each wire which can "cancel out."
A transformer winding makes a good balanced line driver. In contrast, operational amplifier circuits with direct outputs probably will have only roughly-similar output impedances. Output resistors (e.g., 100 ohms) typically isolate each op amp output from the cable, and any difference will represent driver imbalance to external noise. After being transported, the differential mode signal is taken between the two conductors, thus ignoring common mode noise. A transformer winding makes a good differential receiver and also provides ground loop isolation. Operational amplifier receivers need a common-mode-rejection null adjustment for best performance.
At audio frequencies, the main advantage of balanced line is rejection of AC hum and related power noises. This can be achieved by driving only one line with the desired audio signal, provided both lines are terminated similarly both in the driver and receiver.
At radio frequencies, balanced line also minimizes undesired
signal radiation.
When the current changes in each wire are equal but opposite, they
radiate "out of phase," resulting in cancellation.
This is especially useful in
TEMPEST, but does require that both lines
be actively driven.
0 1 2 3 4 5 6 7 8 9 a b c d e f
0 A B C D E F G H I J K L M N O P
1 Q R S T U V W X Y Z a b c d e f
2 g h i j k l m n o p q r s t u v
3 w x y z 0 1 2 3 4 5 6 7 8 9 + /
use "=" for padding
A bipolar transistor is made by diffusing impurities into
a thin slice of extremely pure single-crystal
semiconductor, such as silicon.
Typically, the collector contact is made at the top surface, and
the emitter contact is made on the bottom.
The base element is essentially a thin film situated between the
collector and emitter plates.
The base current must flow on the film, which is naturally more
resistive than the other thicker elements.
Blum, L., M. Blum and M. Shub. 1983. Comparison of Two Pseudo-Random Number Generators. Advances in Cryptology: CRYPTO '82 Proceedings. Plenum Press: New York.61-78. Blum, L., M. Blum and M. Shub. 1986. A Simple Unpredictable Pseudo-Random Number Generator. SIAM Journal on Computing.
15:364-383.
The BB&S RNG is basically a simple squaring of the current seed (x) modulo a composite (N) composed of two primes (P,Q) of public key size. Primes P and Q must both be congruent to 3 mod 4, but the BB&S articles say that P and Q also must be special primes. The special primes construction apparently has the advantage of controlling the cycle structure of the system, and is part of the BB&S design in the original articles. Unfortunately, the special primes construction generally is not presented in current texts. Instead the texts deceptively describe a simplified version which they nevertheless call BB&S. Readers who do not study referenced articles will assume they know what BB&S said, but they are only partly correct.
Unlike more common RNG's, the BB&S construction is not maximal length, but instead defines systems with multiple cycles, including degenerate, short and long cycles. With large integer factors, state values on short cycles are very rare, but do exist. Short cycles are dangerous with any RNG, because when a RNG sequence begins to repeat, it has just become predictable, despite any theorems to the contrary. Consequently, if we key BB&S by choosing x[0] at random, we may unknowingly select a weak short cycle (a weak key), which would make the sequence predictable as soon as the cycle starts to repeat.
The original BB&S articles lay out the technology to compute the exact length of a long-enough cycle in the BB&S system. Since it can be much easier to verify cycle length than to actually traverse the cycle, this is a practical way to verify that x[0] selects a long-enough cycle. Values of x[0] can be chosen and checked until a long cycle is selected. Modern cryptography insists, to the point of strident intimidation, that such verification is unnecessary. However, the original authors apparently thought it was important enough to include in their work.
The real issue here is not the exposure of a particular weakness in BB&S, since choosing x[0] on a short cycle is very unlikely. But "unlikely" is not the same as "impossible." And if the design goal is to eliminate every known weakness, even extensive math which concludes "that particular weakness is too unlikely to worry about" is beside the point: "unlikely" does not satisfy the goal. Mathematics does not get to impose goals on designers or users.
BB&S is said to be "proven secure" in the sense that if
factoring is hard, then the sequence
is unpredictable. And many people do think that factoring large
composites of public key size is hard.
Yet when a short cycle is selected and used, BB&S is obviously
insecure, and that is a direct contradiction for anyone who imagines
that "proven secure" applies to them.
Just knowing the length of a cycle (by finding sequence repetition)
should be enough to expose the factors.
This is also
evidence that the
assumption
that factoring is hard is not universally true.
Of course, we already know that factoring is not hard
The advantage of the special primes construction apparently is that all "short" (but not degenerate) cycles are "long enough" for use. Thus, we can simply choose x[0] at random, and then easily test that it is not on a degenerate cycle. (Just get some x[0], step x[0] to x[1], save x[1], step x[1] to x[2], then compare x[2] to x[1] and if they are the same, start over.) The result is a guarantee that the selected cycle is "long enough" for use. See the sci.crypt discussion:
It is sometimes said that the special primes construction adds nothing to BB&S, but that really depends more on the goals of the cipher designer than the math. Since BB&S is very slow in comparison to other RNG's, someone selecting BB&S clearly has decided to pay a heavy toll with the expectation of getting an RNG which is "proven secure" in practice. (That actually misrepresents the BB&S proof, which apparently allows weakness to exist provided it is not an easy way to factor N.) The obvious goal is to get a practical RNG which has no known weakness at all.
No mere proof can protect us when we ourselves choose and use a weak key, even if doing that is shown to be statistically very unlikely. And if we do use a weak key, the "proven secure" RNG is clearly insecure, which surely contradicts the motive for using BB&S in the first place. In contrast, simply by using the special primes construction and checking for degenerate cycles, weak keys can be eliminated, at modest expense. Eliminating a known possibility of weakness, even if that possibility is very small, seems entirely consistent with the goal of achieving a practical RNG with no known weakness, even if the result is not an RNG proven to have absolutely no weakness at all.
Some would say that even the special primes construction is
overkill, but without it the so-called "proof of strength" becomes
a mere wish or hope that a short cycle is not being used, and
I see that as a contradiction.
It also might be a cautionary tale as to what mathematical
cryptography currently accepts as
proof, and as to what such "proof" means
in practical use.
For other examples of failure in the current cryptographic wisdom,
see
one time pad, and
AES (as an example of the size of the
permutation family in real conventional
block ciphers), and, of course,
old wives' tale.
Also see
algorithm.
Ordinarily we distinguish mere belief from proven truth, belief thus implying something less than conclusive evidence. In this sense, to believe is to be willing to accept unproven or even unprovable assumptions, such as having faith, or trusting in some machine or property. One issue is whether such assumptions or trust is reasonable in the real world.
Limiting what one can or should believe seems intertwined with freedom of speech and individual rights: Surely, anyone can believe what they want. However, to the extent that we have real responsibilities to others and society at large, unfounded belief can not uphold those obligations. In a seminal essay called "The Ethics of Belief" (circa 1877 and reprinted on the web), William Clifford shows how unfounded belief is insufficient support for decisions of life and death and reputation. Many of us would extend that to business planning (the recent Waltzing with Bears by DeMarco and Lister (see risk management) reprints the first section of "The Ethics of Belief" as an appendix), as well as scientific discussions and claims.
In that point of view, claiming something is true, when one has not investigated the topic and does not know, is ethically wrong, even if the claim it turns out (by pure dumb luck) to be correct. It is not enough to claim something and hope it works out; it is instead necessary to know that the claim is correct before making the claim. The ethical requirement is to have performed an investigation sufficient to expect to know one way or another, and come to a rationally supportable conclusion. While not rising to the level of known fact, belief is something on which reputation rests. Being wrong thus has consequences to reputation, provided the error is in the essence and not mere correctable detail.
This idea of requiring substantial investigation to come to a belief may seem to conflict with the scientific method, in that a scientist seemingly makes a mere claim, which generally stands until shown false. But in reality we expect that claim to be something beyond "mere." We demand that a scientific investigator have put sufficient professional effort into a conclusion before using a scientific podium to spout off. The investigation is what provides an ethical basis for belief, which still may be wrong or (more likely) incomplete.
For example, scientific publication does not mean that all of science supports the described conclusions, which are still just claims made by particular scientists. Showing (not necessarily proving) a claim to be wrong is part of the process of science, not unwarranted intrusion. Showing someone wrong in this context naturally affects reputation, but rarely results in absolute ruin.
The process of experimentation involves making "claims," often to be disproven, but those are clearly labeled hypotheses for experiment, not conclusions for use by others.
In contrast, when we have conclusive evidence of truth
we have knowledge and fact instead of belief.
Facts do not require belief, nor do they respond to voting
or authority.
Clearly, science depends upon knowledge and fact, not personal
beliefs, and it is crucial to know the difference. Also see:
scientific method,
extraordinary claims and
rhetoric.
We can do FWT's in "the bottom panel" at the end of my: "Active Boolean Function Nonlinearity Measurement in JavaScript" page, locally, or @: http://www.ciphersbyritter.com/JAVASCRP/NONLMEAS.HTM.
Here is every bent sequence of length 4, first in {0,1} notation, then in {1,-1} notation, with their FWT results:
bent {0,1} FWT bent {1,-1} FWT
0 0 0 1 1 -1 -1 1 1 1 1 -1 2 2 2 -2
0 0 1 0 1 1 -1 -1 1 1 -1 1 2 -2 2 2
0 1 0 0 1 -1 1 -1 1 -1 1 1 2 2 -2 2
1 0 0 0 1 1 1 1 -1 1 1 1 2 -2 -2 -2
1 1 1 0 3 1 1 -1 -1 -1 -1 1 -2 -2 -2 2
1 1 0 1 3 -1 1 1 -1 -1 1 -1 -2 2 -2 2
1 0 1 1 3 1 -1 1 -1 1 -1 -1 -2 -2 2 -2
0 1 1 1 3 -1 -1 -1 1 -1 -1 -1 -2 2 2 2
These sequences, like all true bent sequences, are not
balanced.
Literature references on this point include:
"Let Qn = {1,-1}n. The defining property of a bent sequence x in Qn is that the Hadamard transform of x has constant magnitude."
"Let y be a bent sequence over {0,-1}n.. . . "The Hamming weight of y is 22k-1 (+ or -) 2k-1."
-- Adams, C. and S. Tavares. 1990. Generating and counting binary bent sequences. IEEE Transactions on Information Theory.IT-36(5):1170-1173.
"Bent functions, except for the fact that they are never balanced, exhibit ideal cryptographic properties."
-- Chee, S., S. Lee, K. Kim. 1994. Semi-bent functions. Advances in Cryptology -- ASIACRYPT '94107-118.
". . . it has often to be considered as a defect from a cryptographic point of view that bent functions are necessarily non-balanced."
-- Dobbertin, H. 1994. Construction of Bent Functions and Balanced Boolean Functions with High Nonlinearity. K.U. Leuven Workshop on Cryptographic Algorithms (Fast Software Encryption).61-74.
"Example 23 (Bent Functions Are Not Balanced). . . ."
-- Seberry, J. and X. Zhang. "Hadamard Matrices, Bent Functions and Cryptography." The University of Wollongong. November 23, 1995.
The zeroth element of the {0,1} FWT is the number of 1's in the sequence.
Here are some bent sequences of length 16:
bent {0,1} 0 1 0 0 0 1 0 0 1 1 0 1 0 0 1 0
FWT 6,-2,2,-2,2,-2,2,2,-2,-2,2,-2,-2,2,-2,-2
bent {1,-1} 1 -1 1 1 1 -1 1 1 -1 -1 1 -1 1 1 -1 1
FWT 4,4,-4,4,-4,4,-4,-4,4,4,-4,4,4,-4,4,4
bent {0,1} 0 0 1 0 0 1 0 0 1 0 0 0 1 1 1 0
FWT 6,2,2,-2,-2,2,-2,2,-2,-2,-2,-2,2,2,-2,-2
bent {1,-1} 1 1 -1 1 1 -1 1 1 -1 1 1 1 -1 -1 -1 1
FWT 4,-4,-4,4,4,-4,4,-4,4,4,4,4,-4,-4,4,4
Bent sequences are said to have the highest possible uniform nonlinearity. But, to put this in perspective, recall that we expect a random sequence of 16 bits to have 8 bits different from any particular sequence, linear or otherwise. That is also the maximum possible nonlinearity, and here we actually get a nonlinearity of 6.
There are various more or less complex constructions for these
sequences. In most cryptographic uses, bent sequences are modified
slightly to achieve balance.
Massey, J. 1969. Shift-Register Synthesis and BCH Decoding. IEEE Transactions on Information Theory.IT-15(1):122-127.
Bernoulli trials have a
Binomial distribution.
Transistor biasing is trickier than it might seem from knowing the simple purpose of keeping the device "partly on":
One common biasing approach is to place a particular DC voltage
on the base or gate, and a resistor in the emitter or source lead.
Transistor action then tends to increase current until the emitter
or source has a voltage related to the base or gate, a form of
negative
feedback.
This sets the output bias current, which with a particular pull-up
resistor sets a desired output voltage.
One difficulty with this approach is that it demands an input signal
with lower impedance than the biasing, so that the AC signal will
dominate.
Another issue is that the emitter or source resistor will use some
of the available voltage simply to establish bias, voltage which
then is not available across the device for AC signals. (Also see
transistor self-bias.)
ifWith a bijection an inverse always exists. (Contrast with: involution.)f(x) = y thenf-1(y) = x .
A bijection on bit-strings,
Making random data decompress into language text (necessarily also random in some way) would seem to be difficult. Different classes of plaintext, such as language, database files, program code, or whatever, probably require different compressors or at least different compression models. With respect to language text, such a compressor should decompress random strings into spaced correct words or "word salad." That should complicate attempts to automatically distinguish the original message or block from among other possibilities.
Should bijective compression actually be possible and practical,
the significance would be massive.
Computerized
attacks can succeed only if a correct
deciphering can be recognized automatically.
When incorrect decipherings have structure which is close to
plaintext, a computer may not be able to distinguish them from
success.
If humans skill is needed to read and judge the result of thousands
or millions of brute-force attempts, traversing a keyspace may take
tens of millions of times longer than simple computer scanning.
Making an attack millions of times harder than it was before could
be the difference between complete practical security and almost
no security at all.
Holding an attack loop down to human reading speeds could produce
a massive increase in practical
strength.
n k n-k
P(k,n,p) = ( ) p (1-p)
k
This ideal distribution is produced by evaluating the probability function for all possible k, from 0 to n.
If we have an experiment which we think should produce a binomial distribution, and then repeatedly and systematically find very improbable test values, we may choose to reject the null hypothesis that the experimental distribution is in fact binomial.
Also see the binomial section of my JavaScript page:
"Binomial and Poisson Statistics Functions in JavaScript,"
locally, or @:
http://www.ciphersbyritter.com/JAVASCRP/BINOMPOI.HTM#Binomial,
and my early message on randomness testing
(locally, or @:
http://www.ciphersbyritter.com/NEWS2/94080601.HTM).
The "paradox" is resolved by noting that we have a 1/365 chance
of success for each possible pairing of students, and there
are 253 possible pairs or
combinations of 23 things taken 2 at
a time. (To count the number of pairs, we can choose any of the 23
students as part of the pair, then any of the 22 remaining students
as the other part. But this counts each pair twice, so we have
This problem seems to beg confusion between probability and expected counts, since the correct expectation is often fractional. We can relate the probability of finding a "double" of some birthday (Pd) to the expected number of doubles (Ed) as approximately (equations (5.4) and (5.5) from my article):
Pd = 1 - e-Ed , so Ed = -Ln( 1 - Pd ) .For a success probability of 0.5, the expected doubles are
Ed = -Ln( 1 - 0.5 ) = 0.693147 .
One way to model the overall probability of success is from the
probability of failure
A different model addresses the probability of success for each sample, instead of each pair. For population (N) and samples (s) (equation (1.2) from my article):
Pd(N,s) = 1 - (1-1/N)(1-2/N)..(1-(s-1)/N) ,which gives a success probability for 23 samples of 0.5073.
Sometimes the problem is to find the number of samples (s) needed for a given probability of success in finding doubles (Pd) from a given population (N). Starting with equation (2.5) from my article and substituting (5.5), we get:
s(N,p) = (1 + SQRT(1 - 8N Ln( 1 - Pd ))) / 2 .For the birthday case the number of samples needed from a population of 365 for an even chance of success is:
s(365,0.5) = (1 + SQRT(1 - (8 * 365 * -0.693))) / 2
= (1 + SQRT( 2024.56 )) / 2
= 45.995 / 2
= 22.997 .
This result means that 23 samples should meet with success just a
little more often than the 1 time in 2 demanded by
Also see:
birthday attack,
population estimation,
augmented repetitions,
my Cryptologia "birthday" article: "Estimating Population
from Repetitions in Accumulated Random Samples,"
locally, or @:
http://www.ciphersbyritter.com/ARTS/BIRTHDAY.HTM,
and an example and comparison to various other methods in
the conversation "Birthday Attack Calculations,"
locally, or @:
http://www.ciphersbyritter.com/NEWS4/BIRTHDAY.HTM.
In
digital
electronics,
bits generally are represented by
voltage levels on connected
wires, at a given time.
When the bit-value on a wire changes, some time will elapse
until the wire reaches the new voltage level.
Until that happens, the wire voltage is not a valid digital
level and should not be interpreted as having a particular bit
value. Also see:
logic level.
There are various ways this might be achieved:
"Exact bit-balance can be achieved by accumulating data to a block byte-by-byte, only as long as the block can be balanced by adding appropriate bits at the end."
"We will always add at least one byte of 'balance data' at the end of the data, a byte which will contain both 1's and 0's. Subsequent balance bytes will be either all-1's or all-0's, except for trailing 'padding' bytes, of some balanced particular value. We can thus transparently remove the balance data by stepping from the end of the block, past any padding bytes, past any all-1's or all-0's bytes, and past the first byte containing both 1's and 0's. Padding is needed both to allow balance in special cases, and when the last of the data does not completely fill the last block."
"This method has a minimum expansion of one byte per block, given perfectly balanced binary data. ASCII text may expand by as much as 1/3, which could be greatly reduced with a pre-processing data compression step."
(My article "A Keyed Shuffling System for Block Cipher Cryptography," illustrates key hashing, a nonlinearized RNG, and byte shuffling. We would do a similar thing for bit-permutation, but with a larger and wider RNG and shuffling bits instead of bytes. See either locally, or @: http://www.ciphersbyritter.com/KEYSHUF.HTM).
Ciphering by bit-transposition has unusual resistance to known plaintext attack because many, many different bit-permutations of the plaintext data will each produce exactly the same ciphertext result. Consequently, even knowing both the plaintext and the associated ciphertext does not reveal the shuffling sequence. Bit-permutation thus joins double-shuffling in hiding the shuffling sequence, which is important when we cannot guarantee the strength of that sequence (as we generally cannot).
A transposition cipher is "dynamic" when it "never" permutes two blocks in the same way. Dynamic bit-permutation ciphers can be a very competitive practical alternative to both stream ciphers and conventional block ciphers. Although bit-shuffling may be slower, it has a clearer and more believable source of strength than the other alternatives.
Also see my article: "Dynamic Transposition Revisited Again" (40K)
(locally, or @:
http://www.ciphersbyritter.com/ARTS/DYNTRAGN.HTM),
the Dynamic Transposition Ciphering conversation (730K)
(locally, or @:
http://www.ciphersbyritter.com/NEWS5/REDYNTRN.HTM).
Digital logic IC's are wildly successful examples of hardware black box components. Externally, they perform useful digital functions, and in most cases, digital designers need not think about the internal construction. Internally, however, the "digital" devices use analog transistors to effect digital operation.
An example of black box software design is a subroutine or Structured Programming module, where all interaction with the caller is in the form of parameters. The module uses the given resources, does what it needs, completes, and returns to the caller. As long as the module does what we want, there is no need to know how the module works, so we can avoid dealing with internal complexity at the lower level. And when the module does not work, it can be debugged in a minimal environment which avoids most of the complexity of the larger system, thus making debugging far easier.
In a discussion of block cipher concepts, cryptography implicitly uses definition (2), because it is the accumulation of multiple characters (and the resulting larger ciphering alphabet) which is characteristic of conventional block ciphers. A one-element "block" simply cannot exhibit the various block issues (such as mixing, diffusion, padding and expansion) that we see in a real block cipher, and so fails to model both the innovation and the resulting problems. Similar effects occur when any scalable model is simplified beyond reason. (See: scientific method.) It is also possible to cipher blocks of dynamically selectable size, or even fine-grained variable size.
All real block ciphers are in fact streamed to handle more than one block of data. The actual ciphering might be seen as a stream meta-ciphering using a block cipher transformation. The point of this is not to provide a convenient academic way to contradict any possible response to a question of "stream or block," but instead to identify the origin of various ciphering properties and problems (see: a cipher taxonomy).
It is not possible to block-cipher just a single bit or byte of a block. (When that is possible, we may be dealing with a stream cipher.) If individual bytes really must be block-ciphered, it will be necessary to fill out each block with padding in some way that allows the padding to be distinguished from the actual plaintext data after deciphering.
Partitioning an arbitrary stream of into fixed-size blocks generally means the ciphertext handling must support data expansion, if only by one block. But handling even minimal data expansion may be difficult in some systems.
The distinction between "block" and "stream" corresponds to the common distinction between "block" and "character" device drivers in operating systems. This is the need to accumulate multiple elements and/or pad to a full block before a single operation, versus the ability to operate without delay but requiring multiple operations. This is a common, practical distinction in data processing and data communications.
A competing interpretation of block versus stream operation
seems to be based on transformation "re-use":
In that interpretation, block ciphering is about having a complex
transformation, which thus directly supports re-use (providing each
plaintext block "never" re-occurs).
In that same interpretation, stream ciphering is about supporting
transformation re-use by changing the transformation itself.
These effects do of course exist (although in my view they are not
the most fundamental issues for analysis or design).
But that interpretation also allows both qualities to exist
simultaneously at the same level of design, and so does not provide
the full
analytical benefits of a true
logical
dichotomy.
There is some background:
A conventional block cipher is a transformation between all possible plaintext block values and all possible ciphertext block values, and is thus an emulated simple substitution on huge block-wide values. Within a particular block size, both plaintext and ciphertext have the same set of possible values, and when the ciphertext values have the same ordering as the plaintext, ciphering is obviously ineffective. So effective ciphering depends upon re-arranging the ciphertext values from the plaintext ordering, and this is a permutation of the plaintext values. A conventional block cipher is keyed by constructing a particular permutation of ciphertext values for each key.
The mathematical model of a conventional block cipher is bijection, and the set of all possible block values is the alphabet. In cryptography, the bijection model corresponds to an invertible table having a storage element associated with each possible alphabet value. Since each different table represents a different permutation of the alphabet, the number of possible tables is the factorial of the alphabet size.
In particular, a conventional block cipher with a
In an ideal conventional block cipher, changing even a single bit of the input block will change all bits of the ciphertext result, each with independent probability 0.5. This means that about half of the bits in the output will change for any different input block, even for differences of just one bit. This is overall diffusion and is present in a block cipher, but usually not in a stream cipher. Data diffusion is a simple consequence of the keyed invertible simple substitution nature of the ideal block cipher.
Improper diffusion of data throughout a block cipher can have serious strength implications. One of the functions of data diffusion is to hide the different effects of different internal components. If these effects are not in fact hidden, it may be possible to attack each component separately, and break the whole cipher fairly easily.
A large message can be ciphered by partitioning the plaintext into blocks of a size which can be ciphered. This essentially creates a stream meta-cipher which repeatedly uses the same block cipher transformation. Of course, it is also possible to re-key the block cipher for each and every block ciphered, but this is usually expensive in terms of computation and normally unnecessary.
A message of arbitrary size can always be partitioned into some number of whole blocks, with possibly some space remaining in the final block. Since partial blocks cannot be ciphered, some random padding can be introduced to fill out the last block, and this naturally expands the ciphertext. In this case it may also be necessary to introduce some sort of structure which will indicate the number of valid bytes in the last block.
Proposals for using a block cipher supposedly without data expansion may involve creating a tiny stream cipher for the last block. One scheme is to re-encipher the ciphertext of the preceding block, and use the result as the confusion sequence. Of course, the cipher designer still needs to address the situation of files which are so short that they have no preceding block. Because the one-block version is in fact a stream cipher, we must be very careful to never re-use a confusion sequence. But when we only have one block, there is no prior block to change as a result of the data. In this case, ciphering several very short files could expose those files quickly. Furthermore, it is dangerous to encipher a CRC value in such a block, because exclusive-OR enciphering is transparent to the field of mod 2 polynomials in which the CRC operates. Doing this could allow an opponent to adjust the message CRC in a known way, thus avoiding authentication exposure.
Another proposal for eliminating data expansion consists of ciphering blocks until the last short block, then re-positioning the ciphering window to end at the last of the data, thus re-ciphering part of the prior block. This is a form of chaining and establishes a sequentiality requirement which requires that the last block be deciphered before the next-to-the-last block. Or we can make enciphering inconvenient and deciphering easy, but one way will be a problem. And this approach cannot handle very short messages: its minimum size is one block. Yet any general-purpose ciphering routine will encounter short messages. Even worse, if we have a short message, we still need to somehow indicate the correct length of the message, and this must expand the message, as we saw before. Thus, overall, this seems a somewhat dubious technique.
On the other hand, it does show a way to chain blocks for authentication in a large-block cipher: We start out by enciphering the data in the first block. Then we position the next ciphering to start inside the ciphertext of the previous block. Of course this would mean that we would have to decipher the message in reverse order, but it would also propagate any ciphertext changes through the end of the message. So if we add an authentication field at the end of the message (a keyed value known on both ends), and that value is recovered upon deciphering (this will be the first block deciphered) we can authenticate the whole message. But we still need to handle the last block padding problem and possibly also the short message problem.
Ciphering raw plaintext data can be dangerous when the cipher has a relatively small block size. Language plaintext has a strong, biased distribution of symbols and ciphering raw plaintext would effectively reduce the number of possible plaintext blocks. Worse, some plaintexts would be vastly more probable than others, and if some known plaintext were available, the most-frequent blocks might already be known. In this way, small blocks can be vulnerable to classic codebook attacks which build up the ciphertext equivalents for many of the plaintext phrases. This sort of attack confronts a particular block size, and for these attacks Triple-DES is no stronger than simple DES, because they both have the same block size.
The usual way of avoiding these problems is to randomize the plaintext block with an operating mode such as CBC. This can ensure that the plaintext data which is actually ciphered is evenly distributed across all possible block values. However, this also requires an IV which thus expands the ciphertext.
Worse, a block scrambling or randomization function like CBC is public, not private. It is easily reversed to check overall language statistics and thus distinguish the tiny fraction of brute force results which produce potentially valid plaintext blocks. This directly supports brute force attack, as well as any attack in which brute force is a final part. One alternative is to use a preliminary cipher to randomize the data instead of an exposed function. Pre-ciphering prevents easy plaintext discrimination; this is multiple ciphering, leading in the direction Shannon's Ideal Secrecy.
Another approach (to using the full block data space) is to apply data compression to the plaintext before enciphering. If this is to be used instead of plaintext randomization, the designer must be very careful that the data compression does not contain regular features which could be exploited by the opponents.
An alternate approach is to use blocks of sufficient size
for them to be expected to have a substantial amount of uniqueness or
entropy.
If we expect plaintext to have about one bit of
entropy per byte of text, we might want a block size of at
least 64 bytes before we stop worrying about an uneven
distribution of plaintext blocks. This is now a practical
block size.
It may be helpful to recall a range of published distinctions between "stream cipher" and "block cipher" (and if anyone has any earlier references, please send them along). Note that open discussion was notably muted during the Cold War, especially during the 50's, 60's and 70's. I see the earlier definitions as attempts at describing an existing codification of knowledge, which was at the time tightly held but nevertheless still well-developed.
The intent of classification is understanding and use. Accordingly, it is up to the analyst or student to "see" a cipher in the appropriate context, and it is often useful to consider a cipher to be a hierarchy of ciphering techniques. For example, it is extremely rare for a block cipher to encipher exactly one block. But when that same cipher is re-used again that seems a lot like repeated substitution, which is the basis for stream ciphering. (Of course repeatedly using the same small substitution would be ineffective, but if we attempt to classify ciphers by their effectiveness, we start out assuming what we are trying to understand or prove.) So an alternate way to "see" the re-use of a block cipher is as a higher-level stream "meta-cipher" which uses a block cipher component. But that is exactly what we call "block ciphering."
Some academics insist upon distinguishing stream versus block ciphering by saying that block ciphers have no retained state between blocks, while stream ciphers do. Simply saying that, however, does not make it true, and only one example is needed to expose the distinction as false and misleading. A good example for that is my own Dynamic Transposition cipher, which is a block cipher in that it requires a full block of data before processing can begin, yet also retains state between blocks. So if DT is not a block cipher, what is it? We would hope to define only two categories, not four or more. Note that Lempel (1979, above) explicitly says that transposition is a block cipher. Again, see: a cipher taxonomy to see one approach on how ciphers relate.
Another issue is that stream ciphers can be implemented in ways that accumulate a block of data before ciphering. Internally, such systems generally have a streaming system which traverse the block element-by-element, perhaps multiple times. It is important to see beyond an apparent block requirement stemming from data manipulation only, which thus contributes no strength, to the internal ciphers which (hopefully) do provide strength.
It is also possible to have multiple stream ciphers work on the
same "block," and then we do have a legitimate "block cipher"
(or perhaps a "block meta-cipher") formed by
multiple encryption of stream
ciphers.
(Although multiple ciphering with
additive stream ciphers is
usually unhelpful, most conventional block ciphers are in fact
multiple encryptions internally, so internal multiple ciphering is
hardly a crazy approach.)
But if we want to understand strength, we still need to consider the
fundamental ciphering operations which, here, are streams.
Simply making something work like a block cipher does not give it
the same
model as a conventional
block cipher, and so does not provide for analysis at that level.
In the end, we might see such a construction as a block meta-cipher
composed of internal stream ciphers.
Currently, there are three main block cipher models:
The common academic model of a block cipher is the mathematical bijection, which cryptography calls simple substitution. In practice, such a cipher requires a table far too large to instantiate, and so the actual cipher only emulates a huge, keyed table.
One advantage of the bijection model is that specific, measurable mathematical things can be said about a bijection. Of course exactly the same things also can be said about simple substitution, and the field of ciphering is cryptography, not mathematics.
One problem with the bijection model is that it does not attempt to establish a dichotomy. In the bijection model, "block cipher" just another label in a presumably endless sequence of such labels, each representing a distinct ciphering approach. Consequently, the bijection model makes a poor contribution toward an overall cipher taxonomy useful in the analysis of arbitrary cipher designs.
Another problem with the bijection model is that it establishes yet another term of art: The word block is well known, understood, and rarely disputed. The word cipher is also widely agreed upon. The phrase "block cipher" obviously includes nothing about bijections. So to define "block cipher" in terms of bijections is to take the phrase far beyond the simple meaning of the terms. We could scarcely describe this as anything other than misleading.
Yet another problem with the bijection model, is that, since it presumes to define "block cipher" as a particular type of cipher, what are we to do with ciphers which operate on blocks and yet do not function as bijections (e.g., transposition cipher)? No longer are ciphers related by their proper description. This is even more misleading.
Ultimately, the problem with the bijection model is not the model itself: The model is what it is because substitution is what it is. The problem is the insistence by some academics that this is the only valid model for a "block cipher." A much better choice for the bijection model is the phrase: "conventional block cipher."
The static state model puts forth the proposition that stream ciphers dynamically change their internal state, whereas block ciphers do not. Typically, there is also an understanding that the bijective block cipher model applies.
One problem with the static state definition is again in the name itself: The phrase "block cipher" does not include the word "state." To use the phrase "block cipher" for a property of state is to create yet another term of art, preempting the obvious meaning of the phrase "block cipher," and preventing related block-like ciphers from having similar descriptions, thus misleading both instructor and student.
Another problem with the static state model is that we can build stream-like ciphers which do not change their internal state (in fact, I claim we stream a substitution table when we repeatedly use it across a message, just like we stream DES). Similarly, we can build block-like ciphers which do change their internal state (I usually offer my Dynamic Transposition cipher as an example, but so is a block cipher built from multiple internal stream operations). So if we accept the static state model, what do we call those ciphers which function on blocks, and yet do change state? Why preempt the well-known terms "block" or "stream" for the fundamentally different properties of internal state?
Ultimately, what insight does state classification provide that warrants usurping the obvious descriptive phrases "block cipher" and "stream cipher" instead of thinking up something appropriate?
The original mechanized ciphers were stream ciphers, starting with the Vernam cipher of 1919. The term "block cipher" may have been introduced in the secret world of government security to draw a practical distinction between the well-known stream concept, and the newer designs that operated on a block. (That would have been in the 50's, 60's or even 70's; hopefully, someone will either confirm this or correct it.) In the multiple-element model, a block cipher requires the accumulation of more than one data element before ciphering can begin.
One advantage of the multiple-element definition is that it forms an easy dichotomy with the definition of a stream cipher as a cipher which does not require such accumulation. Also note that this is no mere semantic issue, but is instead just one representation of a broader concept of "one versus many" which rises repeatedly in computing practice, including:
The various consequences of the single-element versus multiple-element dichotomy are well known: When blocks are accumulated from individual elements, storage is required for that accumulation, and time is required as well, which can imply latency. In contrast, when elements need not be accumulated, there need be neither storage nor latency, but the total overhead may be greater. While latency probably is not much of an issue for email ciphering, latency can be significant for real-time streams like music or video, or interactive handshake protocols. Overhead is, of course, a significant issue in system design.
To see how the multiple-element block cipher definition works, consider the following:
Strangely, a degenerate block is exactly the same as a
degenerate sequence: just one element.
In neither case does that element teach about the larger object: a
one-element block does not have diffusion between elements, and a
one-element stream does not have correlation between elements.
(Similarly, is a single electronic wire with a fixed voltage
one-wire "parallel" or one-value "serial"?)
From this we conclude that the most important aspects of
cryptographic (and electronic) design and analysis simply
do not exist as a single element, so it is inappropriate
to either use or judge a model at that level.
A block size of n bits typically means that 2n different codewords or "block values" can occur. An (n,k) block code uses those 2n codewords to represent the equal or smaller count of 2k different messages. Thus, a 64-bit block cipher normally encodes 64 plaintext bits into 64 ciphertext bits as a simple (64,64) code. But if 16 input bits are reserved for other use, the coding expands 48 plaintext bits into 64 ciphertext bits, so we have a (64,48) code.
The normal use for extra codewords is to implement some form of error detection and/or error correction. This overhead is not normally called "inefficient coding," but is instead a simple cost of providing improved quality. In cryptography, the extra code words may be used to add security or improve performance by implementing:
Also see
8b10b and
huge block cipher advantages.
NOT as complementation, indicated by ' (single quote) 0' = 1 1' = 0 OR as addition, denoted "+" 0 + 0 = 0 0 + 1 = 1 1 + 0 = 1 1 + 1 = 1 AND as multiplication, denoted "*" as usual 0 * 0 = 0 0 * 1 = 0 1 * 0 = 0 1 * 1 = 1 XOR a useful but not essential operation 0 XOR 0 = 0 0 XOR 1 = 1 1 XOR 0 = 1 1 XOR 1 = 0
1. Addition is Commutative: x + y = y + x 2. Addition is Associative: x + (y + z) = (x + y) + z 3. The Additive Identity: 0 + x = x 4. The Additive Inverse: x + x' = 1 5. Multiplication is Associative: x(yz) = (xy)z 6. Multiplication is Distributive: x(y + z) = xy + xz 7. Multiplication is Commutative: xy = yx 8. The Multiplicative Identity: 1 * x = x Other: 1 + x = x x + x = x x * x = x (x')' = x DeMorgan's Laws: (x + y)' = x'y' (xy)' = x' + y' XOR: x XOR y = xy' + x'y (x XOR y)' = xy + x'y'
Typically computed as the fast Walsh-Hadamard transform (FWT) of the function being measured. For more details, see the topic unexpected distance and the "Active Boolean Function Nonlinearity Measurement in JavaScript" page (locally, or @: http://www.ciphersbyritter.com/JAVASCRP/NONLMEAS.HTM).
Note that the FWT computation is done for efficiency only. It is wholly practical to compute the nonlinearity of short sequences by hand. It is only necessary to manually compare each bit of the measured sequence to each bit of an affine Boolean function. That gives us the distance from that particular function, and we repeat that process for every possible affine Boolean function of the measurement length.
Especially useful in S-box analysis, where the nonlinearity for the table is often taken to be the minimum of the nonlinearity values computed for each output bit.
Also see my articles:
The original definition is for linear mapping theta,
"The minimum total Hamming weight [wh] of (a,theta(a)) is a measure for the minimum amount of diffusion that is realized by a linear mapping."Definition 6.10 The branch number B of a linear mapping theta is given by
B(theta) = min(wh(a),wh(theta(a))), for a<>0
-- Daemen, J. 1995. Cipher and Hash Function Design, Strategies Based on Linear and Differential Cryptanalysis. Thesis. Section 6.8.1.
Branch number specifically applies only to a linear mixing. Actually, even that is not quite right: the real problem is keying, not nonlinearity (although in practice, keying may imply nonlinearity). To the extent that we can experimentally traverse the input block, a branch number certainly can be developed for a nonlinear mixer.
But while any particular mixer can have a branch number, a keyed mixer will have a branch number for every possible key. Moreover, we would expect the minimum over all those nonlinear mixings to be very low, just like the minimum strength of any cipher over all possible keys (the opponent trying just one key) is also very low. Yet we do not attempt to characterize ciphers by their minimum strength over all possible keys.
No keyed structure can be properly characterized by the extrema over all keys. When we have random variables such as keying, we should be thinking of the distribution of values, and the probability of encountering extreme values. And that is not branch number.
More insight is available in the description of the SQUARE cipher:
"It is intuitively clear that both linear and differential trails would beneft from a multiplication polynomial that could limit the number of nonzero terms in input and output difference (and selection) polynomials. This is exactly what we want to avoid by choosing a polynomial with a high diffusion power, expressed by the so-called branch number.
Let wh(a) denote the Hamming weight of a vector, i.e., the number of nonzero components in that vector." [Normally, Hamming weight applies to bits, but here it is being used for bytes./tfr] "Applied to a state a, a difference pattern a' or a selection pattern u, this corresponds to the number of non-zero bytes. In [2] the branch number B of an invertible linear mapping was introduced as
B(theta) = for a<>0, min wh(a)+ wh(a)This implies that the sum of the Hamming weights of a pair of input and output difference patterns (or selection patterns) to theta is at least B. It can easily be shown that B is a lower bound for the number of active S-boxes in two consecutive rounds of a linear or differential trail.""In [15] it was shown how a linear mapping over GF(2m)n with optimal B
(B = n + 1) can be constructed from a maximum distance separable code."
-- Daemen, J., L. Knudsen and V. Rijmen." 1997. "The Block Cipher {SQUARE}." Fast Software Encryption, Lecture Notes in Computer ScienceVol. 1267:149-165. Section 4.
In the wide trail strategy, branch number applies to a particular unkeyed and linear diffusion mechanism. In the SQUARE design, branch number also applies to a particular unkeyed and linear polynomial multiplication. So branch number might also describe the simple linear form of Balanced Block Mixing used in Mixing Ciphers. But linear BBM's apparently do not have an optimal branch number (over all possible input data changes), although in most cases they do have a good branch, and are dynamically scalable to both tiny and huge blocks on a block-by-block basis.
Instead of linear diffusion, it should be "intuitively obvious" that nonlinear diffusion would be a better choice for a cipher, if such could be obtained with good quality at reasonable cost. Nonlinear Balanced Block Mixing occurs when the butterfly functions are keyed. Keying is easily accomplished by constructing appropriate orthogonal Latin squares using the fast checkerboard construction. But "branch number" does not apply to these keyed nonlinear constructions.
The "optimal" branch value for the MDS codes in the SQUARE
design is given as
From the Handbook of Applied Cryptography:
"1.23 Definition. An encryption scheme is said to be breakable if a third party, without prior knowledge of the key pair (e,d), can systematically recover plaintext from corresponding ciphertext in some appropriate time frame." [p.14]
"Breaking an information security service (which often involves more than simply encryption) implies defeating the objective of the intended service." [p.15]
The term "break" seems to be a term of art in academic cryptanalysis, where it apparently means a successful attack which takes less effort than brute force (or the cipher design strength, if that is less), even if the effort required is impractical, and even if the attack is easily prevented at the cipher system level. This meaning of the term "break" can be seriously misleading because, in English, "break" means "to render unusable" or "to destroy," and not just "to make a little more dubious."
The academic meaning of "break" is also controversial, as it can be used as a slander to demean both cipher and designer without a clear analysis of whether the attack really succeeds. And even if the attack does succeed, the question is whether it actually reveals data or key material, thus making the cipher dangerous for use in practice.
Everyone understands that a cipher is "broken" when the information in a message can be extracted without the key, or when the key itself can be recovered, with less effort than the design strength. And a break is particularly significant when the work involved need not be repeated on every message. But when the amount of work involved is impractical, the situation is best described as a theoretical or academic break. The concept of an "academic break" is especially an issue for ciphers with a very large keyspace, in which case it is perfectly possible for a cipher with an academic break to be more secure than ciphers with lesser goals which have no "break." It is also at least conceivable that an attack can be surprising and insightful and, thus, "successful" even if it takes more effort than the design strength, which would be no form of "break" at all.
In my view, a documented flaw in a cipher, such as some statistic
which
distinguishes a practical cipher from
some
model, but without an
attack which recovers data or key, at most
should be described as a "theoretical" or "certificational"
weakness.
Unfortunately, even a problem which has no impact on security
is often promoted (improperly, in my view) to the term
academic break or even "break"
itself.
Even when the key length of a cipher is sufficient to prevent brute force attack, that key will be far too small to produce every possible plaintext from a given ciphertext (see Shannon's Perfect Secrecy). Combined with the fact that language is redundant, this means that very few of the decipherings will be words in proper form. So most wrong keys could be identified immediately.
On the other hand, recognizing plaintext may not be easy.
If the plaintext itself
Brute force is the obvious way to attack a cipher, and the way
most ciphers can be attacked, so ciphers are designed to have a
large enough
keyspace to make this much too expensive
to succeed in practice.
Normally, the design
strength of a cipher is based on the
cost of a brute-force attack.
In most FFT diagrams, the input elements are shown in a vertical
column at the left, and the result elements in a vertical column on
the right.
Lines represent signal flow from left to right.
There are two computations, and each requires input from each of
the two selected elements.
In an "in place" FFT, the results conveniently go back into the
same positions as the input elements.
So we have two horizontal lines between the same elements, and
two diagonal lines going to each "other" element, which cross.
This is the "hourglass" shape or "butterfly wings" on edge.
A source of stable power is the most important requirement for any electronic device. In particular, digital logic functions can only be trusted to produce correct results if their power is kept within specified limits. It is up to the designer to provide sufficient correct power and guarantee that it remain within limits despite whatever else is going on.
Most digital logic families use "totem pole" outputs, which means they have a transistor from Vcc or Vdd (power) to the output pin, and another transistor from the output pin to Vss (ground). Normally, only one transistor is ON, but as the output signal passes from one state to another, transiently, both transistors can be ON, leading to short, high-current pulses on both the Vcc and ground rails. These current pulses are essentially RF energy, and can and do produce ringing on power lines and a general increase in system noise. The pulses are also strong enough to potentially change both the Vcc and ground voltage levels in the power distribution system near the device, which can affect nearby logic and operation. Typically this occurs at some random moment when the worst conditions coincide to cause a logic fault. To avoid that, we want to bypass the current pulse away from the power system in general, so other devices are not affected.
For many years, a typical rule of thumb was to use a 0.1uF ceramic disc for each supply at each bipolar chip, plus a 1uF tantalum for every 8 chips. That may still be a good formula for slower analog chips and older digital logic like LSTTL. But as chip speed has increased, bypassing has become more complex.
Ideally, a bypass capacitor will be connected from every supply pin to the ground pin right at each chip. Ideally, there will be no lead left on either end of the capacitor: not 1/4 inch, not 1/8 inch, which is one reason why surface-mount capacitors are desirable. Ideally, any necessary lead will be wide, flat copper. But the ideal system is a goal, not reality.
One of the effects of higher system speeds is that normal system operation now covers the resonant frequency of the bypass capacitors. Unfortunately, this resonance is not a fixed constant, even for a particular type of part. Bypass resonance is instead a circuit condition, involving the reactance of the closest bypass capacitor, plus the inductance in power connections, and reactance in other bypass capacitors. Although it is virtually impossible to remove inductance from PC-board traces, it is possible to use whole copper layers as "power planes" for power distribution.
Resonance means that an impulse causes "ringing," in which energy is propagated back and forth between inductance and capacitance until it finally dissipates in circuit resistance or is radiated away, but the resulting signal from many devices may appear as increased system noise.
Resonance would actually seem to be the ideal bypass situation, in that a resonant bypass presents the minimum impedance to ground. But it does that only at one frequency; lower and higher frequencies are less rejected. It seems quite impractical to tune for resonance with the "random" pulses occurring in complex logic. And, above resonance, inductance dominates and then higher-frequency noise and pulses are more able to affect the rest of the system.
Another approach has been to use various bypass capacitors, typically 0.01uF and 0.1uF in parallel, "sprinkled around" the PC layout. The idea was that self-resonance in any one bypass capacitor would be hidden by the other capacitor of different value and, thus, different resonant frequency. Alone, either a 0.01uF or a 0.1uF cap may do an effective job. However, recent modeling indicated, and experimentation has confirmed, that using both together can be substantially worse than using either value alone.
The inherent limitation in bypassing is that the normal bypass process is not "lossy" or dissipative. Pulse energy can be stored in the inductance of short leads or PC-board traces, and then "ring" in resonance with the usual ceramic bypass capacitors. Having many bypass caps often leads to complex RF filter-like structures which just pass the ringing energy around. An alternative is the wide use of tantalum bypass capacitors, since tantalum becomes increasingly lossy at higher frequencies and will dissipate pulse energy.
Several approaches seem reasonable:
If we know the capacitance C in Farads and the frequency f in Hertz, the capacitive reactance XC in Ohms is:
XC = 1 / (2 Pi f C) Pi = 3.14159...Capacitors in parallel are additive. Two capacitors in series have a total capacitance which is the product of the capacitances divided by their sum.
A capacitor is typically two conductive "plates" or metal foils separated by a thin insulator or dielectric, such as air, paper, or ceramic. An electron charge on one plate attracts the opposite charge on the other plate, thus "storing" charge. A capacitor can be used to collect a small current over long time, and then release a high current in a short pulse, as used in a camera strobe or "flash."
The simple physical model of a component which is a simple capacitance and nothing else works well at low frequencies and moderate impedances. But at RF frequencies and modern digital rates, there is no "pure" capacitance. Instead, each capacitance has a series inductance that often does affect the larger circuit. See bypass.
Also see
inductor and
resistor.
The earliest definition of "cascade cipher" I know (1983) does not mention key independence:
"A Cascade Cipher (CC) is defined as a concatenation of block cipher systems, thereafter referred to as its stages; the plaintext of the CC plays the role of the plaintext of the first stage, the ciphertext of the i-th stage is the plaintext of the (i+1)-st stage and the ciphertext of the last stage is the ciphertext of the CC.A modern academic definition is:"We assume that the plaintext and ciphertext of each stage consists of m bits, the key of each stage consists of k bits and there are t stages in the cascade."
[Note the lack of the term "independent."/tfr]
-- Even, S. and O. Goldreich. 1983. "On the power of cascade ciphers." Advances in Cryptology: Proceedings of Crypto '83.43-50.
"[The] Product of several ciphers is also a product cipher, such a design is sometimes called a cascade cipher."[Note the lack of anything like "with independent keys."/tfr]
-- Biryukov, Alex. (Faculty of Mathematics and Computer Science, The Weizmann Institute of Science.) 2000. Methods of Cryptanalysis. "Lecture 1. Introduction to Cryptanalysis."
Similarly, the term
"product encipherment" is defined in
Shannon 1949 (and is quoted here under
Algebra of Secrecy Systems)
as the use of one cipher, then another with independent keys.
Thus, the independent key terminology was defined in cryptography
over half a century ago, and probably 34 years before
"cascade ciphering" was defined for the same idea without the
key independence requirement.
Both terms are commonly and legitimately confused in use.
Anyone using the terms "cascade ciphering" or "product ciphering"
would be well advised to explicitly state what the term is supposed
to mean, or to not complain when someone takes it to mean something
else.
Particularly inappropriate as a description of
multiple encryption, because a
physical "chain" is is only as strong as the weakest link,
while a sequence of ciphers is as strong as the strongest link.
Chance
Chaos
In physics, the state of an analog physical system cannot be fully measured, which always leaves some remaining uncertainty to be magnified on subsequent steps. And, in many cases, a physical system may be slightly affected by thermal noise and thus continue to accumulate new information into its state.
In a
computer, the state of the
digital
system is explicit and
complete, and there is no uncertainty. No noise is accumulated.
All operations are completely
deterministic. This means that, in a
computer, even a "chaotic" computation is completely predictable
and repeatable.
One way to construct a larger square is to take some Latin square and replace each of the symbols or elements with a full Latin square. By giving the replacement squares different symbol sets, we can arrange for symbols to be unique in each row and column, and so produce a Latin square of larger size.
If we consider squares with numeric symbols, we can give each replacement square an offset value, which is itself determined by a Latin square. We can obtain offset values by multiplying the elements of a square by its order:
0 1 2 3 0 4 8 12 1 2 3 0 * 4 = 4 8 12 0 2 3 0 1 8 12 0 4 3 0 1 2 12 0 4 8To simplify the example, we can use the same original square for all of the replacement squares:
0+ 0 1 2 3 4+ 0 1 2 3 8+ 0 1 2 3 12+ 0 1 2 3
1 2 3 0 1 2 3 0 1 2 3 0 1 2 3 0
2 3 0 1 2 3 0 1 2 3 0 1 2 3 0 1
3 0 1 2 3 0 1 2 3 0 1 2 3 0 1 2
4+ 0 1 2 3 8+ 0 1 2 3 12+ 0 1 2 3 0+ 0 1 2 3
1 2 3 0 1 2 3 0 1 2 3 0 1 2 3 0
2 3 0 1 2 3 0 1 2 3 0 1 2 3 0 1
3 0 1 2 3 0 1 2 3 0 1 2 3 0 1 2
8+ 0 1 2 3 12+ 0 1 2 3 0+ 0 1 2 3 4+ 0 1 2 3
1 2 3 0 1 2 3 0 1 2 3 0 1 2 3 0
2 3 0 1 2 3 0 1 2 3 0 1 2 3 0 1
3 0 1 2 3 0 1 2 3 0 1 2 3 0 1 2
12+ 0 1 2 3 0+ 0 1 2 3 4+ 0 1 2 3 8+ 0 1 2 3
1 2 3 0 1 2 3 0 1 2 3 0 1 2 3 0
2 3 0 1 2 3 0 1 2 3 0 1 2 3 0 1
3 0 1 2 3 0 1 2 3 0 1 2 3 0 1 2
which produces the order-16 square:
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 1 2 3 0 5 6 7 4 9 10 11 8 13 14 15 12 2 3 0 1 6 7 4 5 10 11 8 9 14 15 12 13 3 0 1 2 7 4 5 6 11 8 9 10 15 12 13 14 4 5 6 7 8 9 10 11 12 13 14 15 0 1 2 3 5 6 7 4 9 10 11 8 13 14 15 12 1 2 3 0 6 7 4 5 10 11 8 9 14 15 12 13 2 3 0 1 7 4 5 6 11 8 9 10 15 12 13 14 3 0 1 2 8 9 10 11 12 13 14 15 0 1 2 3 4 5 6 7 9 10 11 8 13 14 15 12 1 2 3 0 5 6 7 4 10 11 8 9 14 15 12 13 2 3 0 1 6 7 4 5 11 8 9 10 15 12 13 14 3 0 1 2 7 4 5 6 12 13 14 15 0 1 2 3 4 5 6 7 8 9 10 11 13 14 15 12 1 2 3 0 5 6 7 4 9 10 11 8 14 15 12 13 2 3 0 1 6 7 4 5 10 11 8 9 15 12 13 14 3 0 1 2 7 4 5 6 11 8 9 10
Clearly, this Latin square exhibits massive structure at all
levels, but this is just a simple example.
In practice we would create and use a different
There are 576 Latin squares of order 4, any one of which can be
used as any of the 16 replacement squares.
The offset square is another
The construction is also applicable to orthogonal Latin squares. See my articles:
Improved checksums (e.g., Fletcher's checksums) include both data values and data positions and may perform within a factor of 2 of CRC. One advantage of a true summation checksum is a minimal computation overhead in software (in hardware, a CRC is almost always smaller and faster). Another advantage is that when header values are changed in transit, a summation checksum is easily updated, whereas a CRC update is more complex and many implementations will simply re-scan the full data to get the new CRC.
The term "checksum" is sometimes applied to any form of error
detection, including more sophisticated codes like CRC.
In the usual case, "many" random samples are counted by category or separated into value-range "bins." The reference distribution gives us the the number of values to expect in each bin. Then we compute a X2 test statistic related to the difference between the distributions:
X2 = SUM( SQR(Observed[i] - Expected[i]) / Expected[i] )
("SQR" is the squaring function, and we require that each expectation not be zero.) Then we use a tabulation of chi-square statistic values to look up the probability that a particular X2 value or lower (in the c.d.f.) would occur by random sampling if both distributions were the same. The statistic also depends upon the "degrees of freedom," which is almost always one less than the final number of bins. See the chi-square section of the "Normal, Chi-Square and Kolmogorov-Smirnov Statistics Functions in JavaScript" page (locally, or @: http://www.ciphersbyritter.com/JAVASCRP/NORMCHIK.HTM#ChiSquare).
The c.d.f. percentage for a particular chi-square value is the area of the statistic distribution to the left of the statistic value; this is the probability of obtaining that statistic value or less by random selection when testing two distributions which are exactly the same. Repeated trials which randomly sample two identical distributions should produce about the same number of X2 values in each quarter of the distribution (0% to 25%, 25% to 50%, 50% to 75%, and 75% to 100%). So if we repeatedly find only very high percentage values, we can assume that we are probing different distributions. And even a single very high percentage value would be a matter of some interest.
Any statistic probability can be expressed either as the proportion of the area to the left of the statistic value (this is the "cumulative distribution function" or c.d.f.), or as the area to the right of the value (this is the "upper tail"). Using the upper tail representation for the X2 distribution can make sense because the usual chi-squared test is a "one tail" test where the decision is always made on the upper tail. But the "upper tail" has an opposite "sense" to the c.d.f., where higher statistic values always produce higher percentage values. Personally, I find it helpful to describe all statistics by their c.d.f., thus avoiding the use of a wrong "polarity" when interpreting any particular statistic. While it is easy enough to convert from the c.d.f. to the complement or vise versa (just subtract from 1.0), we can base our arguments on either form, since the statistical implications are the same.
It is often unnecessary to use a statistical test if we just want to know whether a function is producing something like the expected distribution: We can look at the binned values and generally get a good idea about whether the distributions change in similar ways at similar places. A good rule-of-thumb is to expect chi-square totals similar to the number of bins, but distinctly different distributions often produce huge totals far beyond the values in any table, and computing an exact probability for such cases is simply irrelevant. On the other hand, it can be very useful to perform 20 to 40 independent experiments to look for a reasonable statistic distribution, rather than simply making a "yes / no" decision on the basis of what might turn out to be a rather unusual result.
Since we are accumulating discrete bin-counts, any fractional expectation will always differ from any actual count. For example, suppose we expect an even distribution, but have many bins and so only accumulate enough samples to observe about 1 count for every 2 bins. In this situation, the absolute best sample we could hope to see would be something like (0,1,0,1,0,1,...), which would represent an even, balanced distribution over the range. But even in this best possible case we would still be off by half a count in each and every bin, so the chi-square result would not properly characterize this best possible sequence. Accordingly, we need to accumulate enough samples so that the quantization which occurs in binning does not appreciably affect the accuracy of the result. Normally I try to expect at least 10 counts in each bin.
But when we have a reference distribution that trails off toward
zero, inevitably there will be some bins with few counts.
Taking more samples will just expand the range of bins, some of which
will be lightly filled in any case. We can avoid quantization error
by summing both the observations and expectations from multiple bins,
until we get a reasonable expectation value (again, I like to see 10
counts or more).
This allows the "tails" of the distribution to be more properly
(and legitimately) characterized.
(The technique of merging adjacent bins is sometimes called
"collapsing.")
Also see:
cryptography,
block cipher,
stream cipher,
substitution,
permutation,
A good cipher can transform secret information into a multitude of different intermediate forms, each of which represents the original information. Any of these intermediate forms or ciphertexts can be produced by ciphering the information under some key value. The intent is that the original information only be exposed by one of the many possible keyed interpretations of that ciphertext. Yet the correct interpretation is available merely by deciphering under the appropriate key.
A cipher appears to reduce the protection of secret information to enciphering under some key, and then keeping that key secret. This is a great reduction of effort and potential exposure, and is much like keeping your valuables in your house, and then locking the door when you leave. But there are also similar limitations and potential problems.
With a good cipher, the resulting ciphertext can be stored or transmitted otherwise exposed without also exposing the secret information hidden inside. This means that ciphertext can be stored in, or transmitted through, systems which have no secrecy protection. For transmitted information, this also means that the cipher itself must be distributed in multiple places, so in general the cipher cannot be assumed to be secret. With a good cipher, only the deciphering key need be kept secret. (See: Kerckhoffs' requirements, but also security through obscurity.)
Note that a cipher does not, in general, hide the length of a plaintext message, nor the fact that the message exists, nor when it was sent, nor, usually, the addressing to whom and from whom. Thus, even the theoretical one time pad (often said to be "proven unbreakable") does expose some information about the plaintext message. If message length is a significant risk, random amounts of padding can be added to confuse that, although padding can of course only increase message size, and is an overhead to the desired communications or storage. This typically would be handled at a level outside the cipher design proper, see cipher system.
It is important to understand that ciphers are unlike any other modern product design, in that we cannot know when a cipher "works." For example:
In CBC mode the ciphertext value of the preceding block is exclusive-OR combined with the plaintext value for the current block. This randomization has the effect of distributing the resulting block values evenly among all possible block values, and so tends to prevent codebook attacks. But ciphering the first block generally requires an IV or initial value to start the process. And the IV necessarily expands the ciphertext by the size of the IV.
[There are various possibilities other than CBC for avoiding plaintext block statistics in ciphers. One alternative is to pre-cipher, presumably with a different cipher and key, thus producing randomized plaintext blocks (see multiple encryption). Another alternative is to use a block at least 64 bytes wide, which, if it contains language text, can be expected to contain sufficient unknowable randomness to avoid codebook attacks (see huge block cipher advantages).]
Note that the exposed nature of the CBC randomizer (the previous block ciphertext) does not hide plaintext or plaintext statistics. When simple deciphering exposes plaintext, the vast majority of possible plaintexts can be rejected automatically, based on their lack of bit-level and character and word structure. Normal CBC does not improve this situation much at all.
In CBC mode, each randomizing value is the ciphertext from each previous block. Clearly, all the ciphertext is exposed to the opponent, so there would seem to be little benefit associated with hiding the IV, which, after all, is just the first of these randomizing values. Clearly, in the usual case, if the opponent makes changes to a ciphertext block in transit, that will hopelessly garble two blocks (or perhaps just one) of the recovered plaintext. As a result, it is very unlikely that an opponent could make systematic changes in the plaintext simply by changing the ciphertext.
But the IV is a special case: if the IV is not enciphered,
and if the opponents can intercept and change the IV in transit,
they can change the first-block plaintext
Despite howls of protest to the contrary, it is easy to see that the CBC first-block problem is a confidentiality problem, not an authentication problem. To see this, we simply note that all that is necessary to avoid the problem is to keep the IV secret. When the IV is protected, the opponent cannot know which changes to make to reach a desired plaintext. And, since the problem can be fixed without any authentication at all, it is clear that the problem was not a lack of authentication in the first place. Instead, the problem was caused by exposing the IV, and solving that is the appropriate province of the CBC and block level, instead of a MAC at the cipher system and message level.
To fix the CBC first-block problem it is not necessary to check the plaintext for changes by using a MAC. Nor is a MAC necessarily the only way to authenticate a message. But if we are going to use a MAC anyway, that is one way to solve the problem. That works because a MAC can detect the systematic changes which a lack of confidentiality may have allowed to occur. But if a MAC is not otherwise desired, introducing a MAC to solve the CBC first-block problem is probably overkill, because only the block-wide IV needs to be protected, and not the entire message.
The reason we might not want to use a MAC is that a MAC carries some inherent negative consequences. One of those is a processing latency, in that we cannot validate the recovered plaintext until we get to the end and check the digest. Latency can be a serious problem with streaming data like audio and video, and with interactive protocols. But even with an email message we have to buffer the whole message as decrypted and wait for the incoming data to finish before we can do anything with it (or we can make encryption hard and decryption easy, but one side will be a problem). Or we can set up some sort of packet structure with localized integrity checks and ciphertext expansion in each packet. But that seems like a lot of trouble when an alternative is just to encipher the IV.
Even when a MAC is used at a higher level anyway, it may be important for Software Engineering and modular code construction to handle at the CBC level as many of the problems which CBC creates as possible. This avoids forcing the problem on, and depending upon a correct response from, some unknown programmer at the higher level, who may have other things on the mind. Handling security problems where they occur and not passing them on to a higher layer is an appropriate strategy for security programming.
As the problems compound themselves, it seems legitimate to point
out that the CBC first-block problem is a CBC-level security issue
caused by CBC and by transporting the IV in the open.
The CBC first-block problem is easily prevented simply by
transporting the IV securely, by encrypting the IV before including
it with the ciphertext.
Also see "The IV in Block Cipher CBC Mode" conversation
(locally, or @:
http://www.ciphersbyritter.com/NEWS6/CBCIV.HTM).
Also see
traffic analysis,
Software Engineering,
Structured Programming, and
comments in the "Cipher Review Service" document
(locally, or @:
http://www.ciphersbyritter.com/CIPHREVU.HTM).
For the analysis of cipher operation it is useful to collect ciphers into groups based on their functioning (or intended functioning). The goal is to group ciphers which are essentially similar, so that as we gain an understanding of one cipher, we can apply that understanding to others in the same group. We thus classify not by the components which make up the cipher, but instead on the "black-box" operation of the cipher itself.
We seek to hide distinctions of size, because operation is independent of size, and because size effects are usually straightforward. We thus classify conventional block ciphers as keyed simple substitution, just like newspaper amusement ciphers, despite their obvious differences in strength and construction. This allows us to compare the results from an ideal tiny cipher to those from a large cipher construction; the grouping thus can provide benchmark characteristics for measuring large cipher constructions.
We could of course treat each cipher as an entity unto itself, or relate ciphers by their dates of discovery, the tree of developments which produced them, or by known strength. But each of these criteria is more or less limited to telling us "this cipher is what it is." We already know that. What we want to know is what other ciphers function in a similar way, and then whatever is known about those ciphers. In this way, every cipher need not be an island unto itself, but instead can be judged and compared in a related community of similar techniques.
Our primary distinction is between ciphers which handle all the data at once (block ciphers), and those which handle some, then some more, then some more (stream ciphers). We thus see the usual repeated use of a block cipher as a stream meta-cipher which has the block cipher as a component. It is also possible for a stream cipher to be re-keyed or re-originate frequently, and so appear to operate on "blocks." Such a cipher, however, would not have the overall diffusion we normally associate with a block cipher, and so might usefully be regarded as a stream meta-cipher with a stream cipher component.
The goal is not to give each cipher a label, but instead to seek insight. Each cipher in a particular general class carries with it the consequences of that class. And because these groupings ignore size, we are free to generalize from the small to the large and so predict effects which may be unnoticed in full-size ciphers.
In general, absent special coding for transmission (such as converting full binary into base-64 for email) ciphertext should be "random-like." Accordingly, we can run all sorts of tests to try to find any sort of structure or correlation in the ciphertext, or between plaintext, key, and ciphertext. The many available statistical randomness tests should provide ample opportunity for virtually unlimited testing.
The usual or conventional block cipher is intended to emulate a huge, keyed, substitution table. Mathematically, such a function is a bijection, and the symbols in the table are a permutation. These structures might be measured, at least in theory. But very few conventional block ciphers are scalable to tiny size, and the vast size of a real block cipher allows only statistical sampling.
One obvious issue in block cipher construction is diffusion. If the resulting emulated table really is a permutation, if we change the input value in any way, we expect the number of bits which change in the output to occur in a binomial distribution. In addition, we expect each output bit to have a 50 percent probability of changing. We can measure these things.
Typically, we pick some random input value and cipher to get the result; then we change some bit of the input and get the new result and note which and how many bits changed. One advantage of the binomial distribution is that, as block size increases, the distribution becomes increaingly narrow (for any reasonable probability). Thus, we can hope to peer into tremendously small probabilities, which may be about as much error as we can expect to find.
We also can develop a mean value for each output bit, or analyze a particular bit more closely, looking for correlations between input and output, or between key and output, or between the key and some aspect of the transformation between input and output. We might look at correlations between each key bit and each output bit, or between any combination of key bits versus any combination of output bits and so on. With increasingly large experiments, we can perform increasingly fine statistical analyses.
An issue of at least potential concern is that conventional block cipher designs do not implement a completely keyed transformation , but instead implement only a tiny, tiny fraction of all possible tables of the block size. This opens the possibility of weakness in some form of correlation resulting from a tiny subset of implemented permutations. The issue then becomes one of trying to measure possible structural correlations between the set of implemented permutations and the key, including individual bits, or even arbitrary functions of arbitrary multiple bits. At real cipher size, such measurements will be difficult. Or perhaps knowledge of some subset of the transformation could lead to filling out the rest of the transformation; at real cipher size, this may be very difficult to see.
Cipher designs which are scalable can be tested at real size when that is useful, or as tiny "toy" versions, when that is useful. Naturally, the tiny versions are not intended to be as strong as the real-size versions, nor even to be a useful cipher at that size. One purpose is to support exhaustive correlation testing to reveal structural problems which should be easier to discern in the smaller construction. The goal would be to find fault at the tiny size, and then use that to develop insight leading to a scalable attack. That same insight also should help improve the cipher design.
One advantage of scalability is to support attacks on the same cipher at different sizes. Once we find an attack on a toy-size version, we can measure how hard that approach really is by actually doing it. Then we can scale up the cipher slightly and measure how much the difficulty has increased. That can provide true evidence which can be used to extrapolate the strength of the real-size cipher, under the given attack. I see this as vastly more believable information than we have for current ciphers.
Another thing we might do is to measure Boolean function nonlinearity values. This measure at least has the advantage of directly addressing one form of strength: the linear predictability of each key-selected permutation.
Yet another thing we might investigate is the number of keys that are actually different. That is, do any keys produce the same emulated table, and if not, how close are those tables? Can we find any two keys that produce the same ciphertext from the same plaintext? (See population estimation and multiple encryption.)
The conventional stream cipher consists of a keyed RNG or confusion generator and some sort of data and confusion combiner, usually exclusive-OR. Since exclusive-OR has absolutely no strength of its own, the strength of the classic stream cipher depends solely on the RNG. Such testing is a common activity in cryptography, using various available statistical randomness tests. (But recall that many strengthless statistical RNG's do well on such tests.) I particularly recommend runs up/down, because we can develop a useful non-flat distribution of results and then compare that to the theoretical expectation. We can do similar things with birthday tests, which are also useful in confirming the coding efficiency or entropy of really random generators.
Modern stream ciphers with
nonlinear combiners (see, for example:
Dynamic Substitution)
seem harder to test.
Presumably we can test the ciphertext for
randomness, as usual, yet that would not
distinguish between the combiner and the RNG.
Possibly we could test the combiner with RNG, and then the RNG
separately, and compare distributions.
However, it is not clear what sort of tests would provide useful
insight to this construction.
Alternate suggestions are welcomed.
Ciphertext contains the same information as the original plaintext, hopefully in a form which cannot be easily understood. Cryptography hides information by transforming a plaintext message into any one of a vast multitude of different ciphertexts, as selected by a key. Ciphertext thus can be seen as a code, in which the exact same ciphertext has a vast number of different plaintext interpretations. As a goal, it should be impractical to know which interpretation represents the original plaintext without knowing the key.
Normally, ciphertext will appear
random; the values in the ciphertext
should occur in a generally
balanced way.
Normally, we do not expect ciphertext to
compress to a smaller size; that
implies efficient
coding (also see
entropy), but only for the
It also may happen that the ciphertext can be
encoded inefficiently (perhaps as
Ciphertext expansion is the general situation: Stream ciphers need a message key, and block ciphers with a small block need some form of plaintext randomization, which generally needs an IV to protect the first block. Only block ciphers with a large size block generally can avoid ciphertext expansion, and then only if each block can be expected to hold sufficient uniqueness or entropy to prevent a codebook attack.
It is certainly true that in most situations of new construction
a few extra bytes are not going to be a problem. However, in some
situations, and especially when a cipher is to be installed into
an existing system, the ability to encipher data without
requiring additional storage can be a big advantage. Ciphering
data without expansion supports the ciphering of data structures
which have been defined and fixed by the rest of the system,
provided only that one can place the cipher at the interface
"between" two parts of the system. This is also especially
efficient, as it avoids the process of acquiring a different,
larger, amount of store for each ciphering. Such an installation
also can apply to the entire system, and not require the
re-engineering of all applications to support cryptography in
each one.
CFB is closely related to OFB, and is intended to provide some of the characteristics of a stream cipher from a block cipher. CFB generally forms an autokey stream cipher. CFB is a way of using a block cipher to form a random number generator. The resulting pseudorandom confusion sequence can be combined with data as in the usual stream cipher.
CFB assumes a shift register of the block cipher block size. An IV or initial value first fills the register, and then is ciphered. Part of the result, often just a single byte, is used to cipher data, and the resulting ciphertext is also shifted into the register. The new register value is ciphered, producing another confusion value for use in stream ciphering.
One disadvantage of this, of course, is the need for a full
block-wide ciphering operation, typically for each data byte
ciphered. The advantage is the ability to cipher individual
characters, instead of requiring accumulation into a block
before processing.
In a sense, the idea of a ciphertext-only attack is inherently incomplete. By themselves, symbols and code values have no meaning. So we can have all the ciphertext we want, but unless we can find some sort of structure or relationship to plaintext, we have nothing at all. The extra information necessary to identify a break could be the bit structure in the ASCII code, the character structure of language, or any other known relation. But the ciphertext is never enough if we know absolutely nothing about the plaintext. It is our knowledge or insight about the plaintext, the statistical structure, or even just the known use of one plaintext concept, that allows us to know when deciphering is correct.
In practice, ciphertext-only attacks typically depend on some
error or weakness in the
encryption design which somehow relates
some aspect of
plaintext in the ciphertext. For example,
codes that always encrypt the same words in
the same way naturally leak information about how often those words
are used, which should be enough to identify the plaintext.
And the more words identified, the easier it is to fill in the gaps
in sentences, and, thus, identify still more words. Modern
ciphers are less likely to fall into that
particular trap, making ciphertext-only attacks generally more
academic than realistic (also see
break).
See the documentation:
In a digital system we create a delay or measure time by simply
counting pulses from a stable
oscillator.
Since counting operations are digital, noise effects are virtually
eliminated, and we can easily create accurate delays which are as
long as the count in any counter we can build.
Code values can easily represent not only symbols or characters, but also words, names, phrases, and entire sentences (also see nomenclator). In contrast, a cipher operates only on individual characters or bits. Classically, the meaning of each code value was collected in a codebook. Codes may be open (public) or secret.
Coding is a very basic part of modern computation and generally implies no secrecy or information hiding. In modern usage, a code is often simply a correspondence between information (such as character symbols) and values (such as the ASCII code or Base-64). Because a code can represent entire phrases with a single number, one early application for a public code was to decrease the cost of telegraph messages.
In general, secret codes are weaker than ciphers, because a typical code will simply substitute or transform each different word or letter into a corresponding value. Thus, the most-used plaintext words or letters also become the most-used code or ciphertext values and the statistical structure of the plaintext remains exposed. Then the opponent easily can find the most-used ciphertext values and realize that they represent the most-used plaintext words. Accordingly, it is common to superencipher a coded message in an attempt to hide the codebook values.
A meaningful code is more than just data, being also the interpretation of that data. The main concept of modern cryptography is the use of a key to select one interpretation from among vast numbers of different interpretations, so that meaning is hidden from those who do not have both the appropriate decryption program and key. Each particular ciphertext is interpreted by the decryption system to produce the desired plaintext. The pairing of value plus interpretation to produce or do something occurs in various places:
In real life, many useful things do require a particular thing
to use them.
For example, gasoline provides energy for cars, but only because
cars have the appropriate engine to perform the desired conversion.
Similarly, bullets require guns, radio broadcasting stations
require radios and so on.
But that probably reaches beyond the idea of a code, which is
basically limited to information- or symbol-oriented transformations.
The usual ciphertext only approach depends upon the plaintext having strong statistical biases which make some values far more probable than others, and also more probable in the context of particular preceding known values. While this is not known plaintext, it is a form of known structure in the plaintext. Such attacks can be defeated if the plaintext data are randomized and thus evenly and independently distributed among the possible values (see balance).
When a codebook attack is possible on a block cipher, the complexity of the attack is controlled by the size of the block (that is, the number of elements in the codebook) and not the strength of the cipher. This means that a codebook attack would be equally effective against either DES or Triple-DES.
One way a block cipher can avoid a codebook attack is by having
a large
block size which will contain an unsearchable
amount of plaintext "uniqueness" or
entropy. Another approach is to randomize the
plaintext block, by using an
operating mode such as
CBC, or
multiple encryption.
Yet another approach is to change the
key frequently, which is one role of the
message key introduced at the
cipher system level.
Codebreaking is what we normally think of when hearing the WWII crypto stories, especially the Battle of Midway, because many secrecy systems of the time were codes. According to the story, the Japanese are preparing an attack on Midway island, and have given Midway the coded designation "AF." American cryptanalysts have exposed the designator "AF," but not what it represents. Assuming the "AF" to be Midway, American codebreakers have Midway falsely report the failure of their fresh-water plant in open traffic. Then, two days later, intercepted Japanese traffic states that "AF" is short of fresh water. Thus, "AF" is confirmed as Midway.
Note that there had to be a way to identify the actual target
(plaintext) with the code value
(ciphertext) before the meaning was
exposed.
Simply having the ciphertext itself, without finding structure in
the ciphertext or some relationship to plaintext, is almost never
enough, see
ciphertext-only attack.
The classic example is of a cult who believed the Earth was going to end at a particular time. Supposedly, many members gave up their houses and jobs and so on, but the Earth did not end. As a consequence, less-involved members generally accepted that their belief was false. But more-involved members instead insisted that the actions of the cult showed their faith, which was then rewarded by the Earth not ending.
Obviously it is difficult to use
logic to address issues of faith, but
science is not a faith and does not require
belief.
Therefore, when we find that current scientific positions are wrong,
they can be changed with only minor discomfort and anguish.
Supposedly. (Also see
mere semantics and
old wives' tale.)
n
( ) = C(n,k) = n! / (k! (n-k)!)
k
Also,
n n n
( ) = ( ) = 1 ( ) = n
0 n 1
See the combinations section of the "Base Conversion, Logs,
Powers, Factorials, Permutations and Combinations in JavaScript" page
(locally, or @:
http://www.ciphersbyritter.com/JAVASCRP/PERMCOMB.HTM#Combinations).
Also see
permutation.
Consider a conventional block cipher: For any given size block, there is some fixed number of possible messages. Since every enciphering must be reversible (deciphering must work), we have a 1:1 mapping between plaintext and ciphertext blocks. The set of all plaintext values and the set of all ciphertext values is the same set; particular values just have different meanings in each set.
Keying gives us no more ciphertext values, it only re-uses the values which are available. Thus, keying a block cipher consists of selecting a particular arrangement or permutation of the possible block values. Permutations are a combinatoric topic. Using combinatorics we can talk about the number of possible permutations or keys in a block cipher, or in cipher components like substitution tables.
Permutations can be thought of as the number of unique
arrangements of a given length on a particular set. Other
combinatoric concepts include
binomials
and
combinations
(the number of unique given-length subsets of a given set).
Reversible combiners are pretty much required to encipher plaintext into ciphertext in a stream cipher. The ciphertext is then deciphered into plaintext using a related inverse or extractor mechanism. The classic examples are the stateless and strengthless linear additive combiners, such as addition, exclusive-OR, etc.
Reversible and nonlinear keyable combiners with state are a result of the apparently revolutionary idea that not all stream cipher security need reside in the keying sequence. Examples include:
Irreversible or non-invertible combiners are often proposed to mix multiple RNG's into a single confusion sequence, also for use in stream cipher designs. But that is harder than it looks. For example, see:
Also see
balanced combiner,
complete, and also
"The Story of Combiner Correlation: A Literature Survey,"
locally or @:
http://www.ciphersbyritter.com/RES/COMBCORR.HTM.
Also see:
associative and
distributive.
Completeness does not require that an input bit change an output bit for every input value (which would not make sense anyway, since every output bit must be changed at some point, and if they all had to change at every point, we would have all the output bits changing, instead of the desired half). The inverse of a complete function is not necessarily also complete.
As originally defined in Kam and Davida:
"For every possible key value, every output bit ci of the SP network depends upon all input bits p1,...,pn and not just a proper subset of the input bits." [p.748]-- Kam, J. and G. Davida. 1979. Structured Design of Substitution-Permutation Encryption Networks. IEEE Transactions on Computers.C-28(10):747-753.
To build an appropriate algebra and make complex numbers a
field,
the rectangular representation is written as (x+iy) [or (x+jy)],
where i [or j] has the value SQRT(-1).
The symbol i is called "imaginary," but we might just consider it a
way for the algebra to relate the values in the ordered pair.
Clearly,
With appropriate rules like:
addition: (a+bi) + (c+di) = (a+c) + (b+d)i
multiplication: (a+bi) * (c+di) = (ac-bd) + (bc+ad)i
c+di ac+bd ad-bc
division: ---- = ----- + (-----)i
a+bi aa+bb aa+bb
we get complex algebra, and can perform most operations and
even evaluate trignometric and other complex functions like we do
with reals.
In cryptography, perhaps the most common use of complex numbers occurs in the FFT, which typically transforms values in rectangular form. Sometimes we want to know the magnitude or length of the implied vector, which we can get by converting the rectangular (x,y) representation into the (mag,ang) representation:
magnitude: mag(z) = SQRT( x*x + y*y ) angle: ang(z) = arctan( y / x) Note: Computer arctan(x) functions are generally unable to place the angle in the proper quadrant, but arctan2(x,y) routines -- with two input parameters -- may be available to do so.
The most successful components are extremely general and can be used in many different ways. Even as a brick is independent of the infinite variety of brick buildings, a flip-flop is independent of the infinite variety of logic machines which use flip-flops.
The source of the ability to design and build a wide variety of different electronic logic machines is the ability to interconnect and use a few very basic but very general parts.
Electronic components include
The use of individual components to produce a working complex system in production requires: first, a comprehensive specification for each part; and next, full testing to guarantee that each part actually meets the specification (see: quality management).
Digital logic is normally specified to operate correctly over a range of supply voltage, temperature, loading, clock rates, and other appropriate parameters. Specified limits (minimum's or maximum's) guarantee that a working part will operate correctly even with the worst case of all parameters simultaneously. This process allows large, complex systems to operate properly in practice, provided the designer makes sure that none of the parameters can exceed their correct range.
Cryptographic system components include:
A logic machine with:
The general model of mechanical computation is the finite state machine, which is absolutely deterministic and, thus, predictable.
Also see:
source code,
object code,
software,
system design,
Software Engineering and
Structured Programming.
As a rule of thumb, a cubic centimeter (cc) of a solid has about 1024 or 1E24 atoms. In a metal, usually each atom contributes one or two electrons, so a metal has about 1024 (1E24) free electrons per cc. This massive number of free electrons has a tiny resistance to current flow of something like 10-6 ohms across a cubic centimeter of copper, or about one microhm per cm3. Apparently the International Annealed Copper Standard (IACS) says that annealed copper with a cross sectional area of a square centimeter should have a resistance of about 1.7241 microhms/cm (at 20 degrees Celsius), which is satisfactorily close.
A cube with one millimeter sides has 1/100 the cross sectional area of a centimeter cube (and is about like AWG 17 wire), and so would have 100x the resistance per cm., but also is only 1/10 the length, for about 17 microhms per millimeter copper cube. A meter of AWG 17 wire would have 1000 millimeter-size cubes at 17 microhms each, so we would expect it to have about 17 milliohms total resistance. As a check, separate wire tables give the resistance of AWG 17 at 5.064 ohms per 1000ft (304.8m), which is 0.017 ohms (17 milliohms) per meter.
RESISTANCE RELATIVE TO COPPER (cm cube = 1.7241 microhms)
Resist Temp Coef Thermal Cond Melts (deg C)
Silver (Ag) 0.95 0.0038 4.19 960.5
Copper (Cu) 1.00 0.00393 3.88 1083
Gold (Au) 1.416 0.0034 2.96 1063
Aluminum (Al) 1.64 0.0039 2.03 660
Bronze (Cu+Sn) 2.1 --- --- 1280
Tungsten (W) 3.25 0.0045 1.6 3370
Zinc (Zn) 3.4 0.0037 1.12 419
Brass (Cu+Zn) 3.9 0.002 1.2 920
Nickel (Ni) 5.05 0.0047 0.6 1455
Iron (Fe) 5.6 ~0.005 0.67 1535
Tin (Sn) 6.7 0.0042 0.64 231.9
Chromium (Cr) 7.6 --- --- 2170
Steel (Fe+C) ~10 --- 0.59 1480
Lead (Pb) 12.78 0.0039 0.344 327
Titanium (Ti) 47.8 --- 0.41 1800
Stainless (-->) 52.8 --- 0.163 1410 (Fe+Cr+Ni+C)
Mercury (Hg) 55.6 0.00089 0.063 -38.87
Nichrome (Ni+Cr) 65 0.00017 0.132 1350
Graphite (C) 590 --- --- 3800
Carbon (C) 2900 -0.0005 --- 3500
In a conspiracy, multiple individuals can each contribute a
minor action to accumulate a large effect.
One obvious approach is to use gossip to give the impression that
all right-thinking people are against some one or some thing.
A conspiracy can be difficult to oppose, because a major effect
can be achieved with minor actions that individually do not call
for a major response.
In number theory we say than integer a (exactly) divides
integer b (denoted
In number theory we say that integer a is congruent to
integer b
modulo m, denoted
Used in the analysis of signal processing to develop the response
of a processing system to a complicated real-valued input signal.
The input signal is first separated into some number of discrete
impulses. Then the system response to an impulse
It is apparently possible to compute the convolution of two
sequences by taking the
FFT of each, multiplying these results
term-by-term, then taking the inverse FFT. While there is an
analogous relationship in the
FWT, in this case the "delays" between the
sequences represent
mod 2 distance differences, which may or may
not be useful.
Copyright protects a particular expression, but not the underlying idea, process or function it may perform, which is the province of patent protection. Copyright protects form, not content: Copyright can protect particular text and diagrams, but not the described concept. In general, copyright comes into existence simply by creating a picture or manuscript or making a selection; theoretically, no notice or registration is required. (See the Library of Congress circular "Copyright Basics": http://www.loc.gov/copyright/circs/circ1.html#cr). However, formal registration is required before a lawsuit can be filed, and registration within 3 months of publication supports recovery of statutory damages and attorney fees; otherwise, apparently only actual damages can be recovered. Similarly, no copyright notice is required, but having one like this:
Copyright 1991 Terry Ritter. All Rights Reserved.may avoid an "innocent infringement" defense. Protection currently lasts 70 years beyond the death of the author, or 95 years from date of publication for works for hire. Copyright is not handled by the PTO but instead by the United States Copyright Office (http://lcweb.loc.gov/copyright/) in the Library of Congress.
One way to evaluate a common correlation of two real-valued
sequences is to multiply them together term-by-term and sum all
results.
If we do this for all possible "delays" between the two sequences,
we get a "vector" or
"The correlation coefficient associated with a pair of Boolean functions f(a) and g(a) is denoted byC(f,g) and is given byC(f,g) = 2 * prob(f(a) = g(a)) - 1 ." -- Daemen, J., R. Govaerts and J. Vanderwalle. 1994. Correlation Matrices. Fast Software Encryption. 276. Springer-Verlag.
There are two classes: A local counterexample refutes a
lemma, but not necessarily the
main conjecture.
A global counterexample refutes the main conjecture.
Note that integer counting produces perhaps the best possible
signal for investigating block cipher deficiencies in the rightmost
bits. Accordingly, incrementing by some large random constant, or
using some sort of
LFSR or other polynomial counter which changes
about half its bits on each step may be more appropriate.
CRC error-checking is widely used in practice to check the data recovered from magnetic storage. When data are written to disk, a CRC of the original data is computed and stored along with the data itself. When data are recovered from disk, a new CRC is computed from the recovered data and that result compared to the recovered CRC. If the CRC's do not match, we have a "CRC error."
Computer disk-read operations always have some chance of a "soft error" which does not re-occur when the same sector is re-read, so the usual hardware response is to try again, some number of times. If that does not solve the problem, the error may be reported to the user and could indicate the start of serious disk problems.
A CRC operation is essentially a remainder over the huge numeric
value which is the data; the mod 2 polynomials make this "division"
both faster and simpler than one might expect.
Related techniques like integer or floating point division can have
similar power, but are unlikely to be as simple.
In general, "division" techniques only miss errors which are some
product of the divisor, and so
The CRC result is an excellent (but linear) hash value corresponding to the data. Compared with other hash alternatives, CRC's are simple and straightforward. They are well-understood. They have a strong and complete basis in mathematics, so there can be no surprises. CRC error-detection is mathematically tractable and provable without recourse to unproven assumptions. And CRC hashes do not need padding. None of this is true for most cryptographic hash constructions.
For error-detection, the CRC register is first initialized to some fixed value known at both ends, nowadays typically "all 1's." Then each data element is processed, each of which changes the CRC value. When all of the data have been processed, the CRC result is sent or stored at the end of the data. Frequently the CRC result first will be complemented, so that a CRC of the data and the complemented result will produce a fixed "magic number." This allows efficient hardware error-checking, even when the hardware does not know how large the data block will be in advance. (Typically, the end of transmission, after the CRC, is indicated by a hardware "done" signal.)
Nowadays, CRC's are often computed in software which is generally more efficient with larger data quantities. Thus we see 8-bit, 16-bit or 32-bit data elements being processed. However, CRC's can be computed on individual data bits, and on records of arbitrary bit length, including zero bits, one bit, or any uneven or dynamic number of bits. As a consequence, no padding is ever needed for CRC hashing.
Here is a code snippet for a single-bit left-shift CRC operation:
if (msb(crc) == databit)
crc = crc << 1;
else
crc = (crc << 1) ^ poly;
This fragment needs to execute 8 times to compute the CRC for a
full data byte.
However, a better way to process a byte in software is to
pre-compute a 256-element table representing every possible CRC
change corresponding to a single byte.
The table value is selected by a data byte XORed with the current
top byte of the CRC register (in a left-shift implementation).
In the late 60's and early 70's, the first CRC's were initialized as "all-0's." Then it was noticed that extra or missing 0-bits at the start of the data would not be detected, so it became virtually universal to init the CRC as "all-1's." In this case, extra or missing zeros at the start are detected, and extra or missing ones at the start are detected as well.
It is possible for multiple errors to occur and the CRC result to end up the same as if there were no error. But unless the errors are introduced intentionally, this is very unlikely. Various common errors are detected absolutely, such as:
If we have enough information, it is relatively easy to compute error patterns which will take a CRC value to any desired CRC value. Because of this, data can be changed in ways which will produce the original CRC result. Consequently, no CRC has any appreciable cryptographic strength, but some applications in cryptography need no strength:
On the other hand, a CRC, like most computer hashing operations, is normally used so that we do not have "enough information." When substantially more information is hashed than the CRC can represent, any particular CRC result will be produced by a vast number of different input strings. In this way, even a linear CRC can be considered an irreversible "one way" or "information reducing" transformation. Of course, when a string shorter than the CRC polynomial is hashed, it should not be too difficult to find the one string that could produce any particular CRC result.
The CRC polynomial need not be particularly special. Unlike the generator polynomials used in LFSR's, a CRC poly need not be primitive nor even irreducible. Indeed, the early 16-bit CRC polys were composite with a factor of "11" which is equivalent to the information produced by a parity bit. (Since parity was the main method of error-detection at the time, the "11" factor supported the argument that CRC was better.) However, modern CRC polys generally are primitive, which allows the error detection guarantees to apply over larger amounts of data. It also allows the CRC operation to function as an RNG. But the option exists to use secret random polynomials to detect errors without being as predictable as a standard CRC. Polynomial division does not require mathematical structure (such as an irreducible or primitive), beyond the basic mod 2 operations.
Different CRC implementations can shift left or right, take data lsb or msb first, and be initialized as zeros or ones, each option naturally producing different results. Various CRC standards specify different options. Obviously, both ends must do things the same way, but it is not necessary to conform to a standard to have quality error-detection for a private or new design. Variations in internal handling can make a CRC with one set of options produce the same result as a CRC with other options.
When the logical complement of a CRC result is appended to the
data and processed msb first, the CRC across that data and the result
produces a "magic" value which is a constant for a particular poly and
set of options.
In general, the sequence reverse of a good poly is also a good poly, and
there is some advantage to having CRC polys which are about half 1's.
In some notations we omit the msb which is always 1 (as is the lsb).
For notational convenience, we can write
Name Hex Set Bits
CRC16 8005 16,15,2,0
CCITT 1021 16,12,5,0
CRC24a 800063
CRC24b 800055
CRC24c 861863
CRC32 04c11db7 32,26,23,22,16,
12,11,10,8,7,5,4,2,1,0
SWISS-PROT 64,4,3,1,0
SWISS-PROT Impr 64,63,61,59,58,56,55,52,49,48,
(D. Jones) 47,46,44,41,37,36,34,32,
31,28,26,23,22,19,16,
13,12,10,9,6,4,3,0
Also see: reverse CRC, and
In normal cryptanalysis we start out knowing plaintext, ciphertext, and cipher construction. The only thing left unknown is the key. A practical attack must recover the key. (Or perhaps we just know the ciphertext and the cipher, in which case a real attack would recover plaintext.) Simply finding a distinguisher (showing that the cipher differs from the chosen model) is not, in itself, an attack or break.
Because no theory guarantees strength for any conventional cipher (see, for example, the one time pad and proof), ciphers traditionally have been considered "strong" when they have been used for a long time with "nobody" knowing how to break them easily.
Expecting cipher strength because a cipher is not known to have been broken is the logic fallacy of ad ignorantium: a belief which is claimed to be true because it has not been proven false.
Cryptanalysis seeks to extend this admittedly-flawed process by applying known attack strategies to new ciphers (see heuristic), and by actively seeking new attacks. Unfortunately, real attacks are directed at particular ciphers, and there is no end to different ciphers. Even a successful break is just one more trick from a virtually infinite collection of unknown knowledge.
In cryptanalysis it is normal to assume that at least known-plaintext is available; often, defined-plaintext is assumed. The result is typically some value for the amount of work which will achieve a break (even if that value is impractical); this is the strength of the cipher under a given attack. Different attacks on the same cipher may thus imply different amounts of strength. While cryptanalysis can demonstrate "weakness" for a given level of effort, cryptanalysis cannot prove that there is no simpler attack (see, for example, attack tree and threat model):
Indeed, when ciphers are used for real, the opponents can hardly be expected to advertise a successful break, but will instead work hard to reassure users that their ciphers are still secure. The fact that apparently "nobody" knows how to break a cipher is somewhat less reassuring from this viewpoint. (Also see the discussion: "The Value of Cryptanalysis," locally, or @: http://www.ciphersbyritter.com/NEWS3/MEMO.HTM). For this reason, using a wide variety of different ciphers can make good sense: That reduces the value of the information protected by any particular cipher, which thus reduces the rewards from even a successful attack. Having numerous ciphers also requires the opponents to field far greater resources to identify, analyze, and automate breaking (when possible) of each different cipher. Also see: Shannon's Algebra of Secrecy Systems.
In general, a cipher can be seen as a key-selected set of transformations from plaintext to ciphertext (and vise versa for deciphering). A conventional block cipher takes a block of data or a block value and transforms that into some probably different value. The result is to emulate a huge, keyed substitution table, so, for enciphering, we can write this very simple equation:
E[K][PT] = CT, or E[K,PT] = CT
where PT = plaintext block value, K = Key, and
CT = ciphertext block value.
The brackets "[ ]" mean the operation of indexing: the selection of
a particular position in an array and returning that element value.
Here,
D[K][CT] = PT, or D[K,PT] = CT
where D[K] represents an
inverse or decryption table.
But an attacker does not know and thus must somehow develop the
decryption table.
We assume that an opponent has collected quite a lot of information, including lots of plaintext and the associated ciphertext (a condition we call known plaintext). The opponent also has a copy of the cipher and can easily compute every enciphering or deciphering transformation. What the opponent does not have, and what he is presumably looking for, is the key. The key would expose the myriad of other ciphertext block values for which the opponent has no associated plaintext.
We might imagine the opponent attacking a cipher with a deciphering machine having a huge "channel-selector" dial to select a key value. As one turns the key-selector, each different key produces a different deciphering result on the display. So all the opponent really has to do is to turn the key dial until the plaintext message appears. Given this extraordinarily simple attack (known as brute force), how can any cipher be considered secure?
In a real cipher, we make the key dial very, very, very big! The keyspace of a real cipher is much too big, in fact, for anyone to try each key in a reasonable amount of time, even with massively-parallel custom hardware. That leaves the opponent with a problem: brute force does not work.
Nevertheless, the cipher equation seems exceedingly simple. There is one particular huge emulated table as selected by the key, and the opponent has a sizable set of positions and values from that table. Moreover, all the known and unknown entries are created by exactly the same mechanism and key. So, if the opponent can in some way relate the known entries to the rest of the table, thus predicting unknown entries, the cipher may be broken. Or if the opponent can somehow relate known plaintexts to the key value, thus predicting the key, the key may be exposed. And with the key, ciphertext for which there is no corresponding plaintext can be exposed, thus breaking the cipher. Finding these relationships is where the cleverness of the individual comes in. In a real sense, a cipher is a puzzle, and we currently cannot guarantee that there is no particular "easy" way for a smart team to solve it.
One peculiarity of conventional block ciphers is that they cannot emulate all possible tables, but instead only a tiny, tiny fraction thereof (see block cipher). Even what we consider a huge key simply cannot select from among all possible tables because there are far too many. Now, the "tiny fraction" of tables actually emulated is still too many to traverse (this is a "large enough" keyspace), but, clearly, some special selection is happening which might be exploited. Having even one particular value at one known table position is sufficiently special that we expect that only one key would produce that particular relationship in a conventional cipher. So, in practice, just one known-plaintext pair generally should be sufficient to identify the correct key, if only we could find some way to do it.
In academic cryptanalysis we normally assume that we do not know the key, but do know the cipher and everything about it. We also assume essentially unlimited amounts of known plaintext to use in an attack to find the key. In practice things are considerably different.
In practical cryptanalysis we may not know which cipher has been used. The cipher may not ever have been published, or may have been modified from the base version in various ways. Even a cipher we basically know may have been used in a way which will disguise it from us, for example:
Selecting among different ciphers is part of Shannon's 1949 Algebra of Secrecy Systems. In a modern computer implementation, we could select ciphers dynamically. The number of selectable transformations increases exponentially when several ciphers are used in sequence (multiple encryption). Considering Shannon's academic work in this area, the use of well-known standardized designs is, ironically and rather sadly, current cryptographic orthodoxy (see risk analysis).
The general mathematical model of ciphering is that of a keyed transformation (a mapping or function). Numerically, we can make the general model work for a system of multiple ciphers by allowing some "key" bits to select the cipher, with the rest of the key bits going to key that cipher. But in the adapted model, different parts of the key will have vastly different difficulties for the opponent. Finding the correct key within a cipher may be hard, yet could be much, much easier than finding the exact cipher actually being used. Differences in the difficulty of finding different key bits are simply glossed over in the adapted general model.
Somehow obtaining and breaking every cipher which possibly could have been used is a vastly larger problem than the relatively small increase in keyspace indicated by the number of possible ciphers. For example, if we think we have found the key and want to check it, on a known cipher that has essentially no cost and may take a microsecond. But if we want to check the key on an unknown cipher, we first have to obtain that cipher. That may require the massive ongoing cost of maintaining an intelligence field service to obtain copies of secret ciphers. Once the needed cipher is obtained, finding a practical break may take experts weeks or months, if a break is even found. Taken together, this is a vast increase in difficulty for the opponent per cipher choice compared to the difficulty per key choice within a single cipher.
Just as it may be impossible to try every 128-bit key at even a nanosecond apiece, it also may be impossible to keep up with a far smaller but continuing flow of new secret ciphers which take hundreds of billions of times longer to handle. This advantage seems to be exploited by NSA in keeping cipher designs secret (also see security through obscurity). Given the stark contrast of yet another real example which contradicts the current cryptographic wisdom, crypto academics continue to insist that standardizing and exposing the cipher design makes sense. Surely, exposing a cipher does support gratuitous analysis and help to expose some cipher weakness, but does not, in the end, give us a proven strong cipher. In the end, exposing the cipher may turn out to benefit opponents far more than users.
In practice, an individual attacker mainly must hope that the cipher to be broken is flawed. An attacker can collect ciphertext statistics and hope for some irregularity, some imbalance or statistical bias that will identify the cipher class, or maybe even a well-known design. An attacker can make plaintext assumptions and see if some key will produce those words. But enciphering guessed plaintext seems an unlikely path to success when every possible cipher, and every possible modification of that cipher, is the potential encryption source. All this is a very difficult problem, and far different than the normal academic analysis.
Many academic attacks are essentially theoretical, involving huge amounts of data and computation. But even when a direct technical attack is practical, that may be the most difficult, expensive and time-consuming way to obtain the desired information. Other methods include making a paper copy, stealing a copy, bribery, coercion, and electromagnetic monitoring. No cipher can keep secret something which has been otherwise revealed. Information security thus involves far more than just cryptography, and a cryptographic system is more than just a cipher (see: cipher system). Even finding that information has been revealed does not mean that a cipher has been broken, although good security virtually requires that assumption. (Of course, when we can use only one cipher, we cannot change ciphers anyway.)
Unfortunately, we have no way to know how strong a cipher appears to our opponents. Even though the entire reason for using cryptography is a belief that our cipher has sufficient strength, science provides no basis for such belief. At most, cryptanalysis can give us only an upper limit to the strength of a cipher, which is not particularly helpful, and can only do that when a cipher actually can be broken. But when a cipher is not broken, cryptanalysis has told us nothing about the strength of a cipher, and unbroken ciphers are the only ones we use.
The ultimate goal of cryptanalysis is not to break every possible cipher (that would be the end of an industry and also the end of new PhD's in the field). Instead, the obvious goal is understanding why some ciphers are weak, and why other ciphers seem strong. It is not much of a leap from that to expect cryptanalysts to work with, or at least interact with, cipher designers, with a common goal of producing better ciphers.
Unfortunately, cryptanalysis is ultimately limited by what can
be done: there are no ciphering techniques which guarantee strength,
and there is no test which tells us how weak an arbitrary cipher
really is.
Accordingly, exposing a particular weakness in a particular cipher
may be about as much as cryptanalysis can offer, even if that means
a deafening silence about similar designs, ciphers which
have been repaired, or significant cipher designs which remain
both unbroken and undiscussed.
Apparent agreement among academics does not imply a lack of academic controversy, since many will side with the conventional wisdom, while others step back to consider the arguments. Since reality is not subject to majority rule, even universal academic agreement would not constitute a scientific argument, which instead requires facts and exposed logical reasoning. Controversy may even imply that academic cryptographers are unaware of the issue, or have not really considered it in a deep way. For if clear, understandable and believable explanations already existed, there would be little room for debate. Controversy arises when the given explanations are false, or obscure, or unsatisfactory.
Scientific controversy is less about conflict than exposing Truth. That happens by doing research, then taking a stand and supporting it with facts and scientific argument. Many of these issues should have indisputable answers or expose previously ignored consequences. Wishy-washy statements like "some people think this, some think that," not only fail to inform, but also fail to frame a discussion to expose the real answer.
One aspect of science is the creation of quantitative models which describe or predict reality. Since poor models lead to errors in reasoning, a science reacts to poor predictions by improving the models. In contrast, cryptography reacts by making excuses about why the model really is right after all, or does not apply, or does not matter. Examples include:
In my view, cryptography has presided over a fundamental breakdown in logic, perhaps created by awe of supposedly superior mathematical theory. Upon detailed examination, however, theoretical math often turns out to be inapplicable to the case at hand. Practical results which conflict with theory are ignored or dismissed, even though confronting reality is how science improves models. Demanding belief in conventional cryptographic wisdom requires people to think in ways which accept logical falsehood as truth, and then they apply that lesson. Reasoning errors have become widespread, accepted, and prototypes for future thought. Disputes in cryptography are commonly argued with logic fallacies, and may be "won" with arguments that have no force at all. Since experimental results are rare in cryptography, we cannot afford to lose reason, because that is almost all we have.
A major logical flaw in conventional cryptography is the belief that one good cipher is enough.
But since cipher strength occurs only in the context of our opponents, how could we ever know that we have a "good" cipher, or how "good" it is?
In particular:
So if we cannot measure "good," and cannot prove "good," then exactly how do we know our ciphers are "good?" The answer, of course, is that we do not and can not know any such thing. In fact, nobody on our side can know that our ciphers are "good," no matter how well educated, experienced or smart they may be, because that is determined by our opponents in secret. Anyone who feels otherwise should try to put together what they see as a scientific argument to prove their point.
When conventional cryptography accepts a U.S. Government standard cipher as "good" enough, there is no real need:
Conventional cryptography encourages a belief in known cipher strength, thus ignoring both logic and the lessons of the past. That places us all at risk of cipher failure, which probably would give no indication to the user and so could be happening right now. That we have no indication of any such thing is not particularly comforting, since that is exactly what our opponents would want to portray, even as they expose our information.
When an ordinary person makes a claim, they can be honestly wrong. But when a trained expert in the field makes a claim that we know cannot be supported, and continues to make such claims, we are pretty well forced into seeing that as either professional incompetence or deliberate deceit. Encouraging people to use only one cipher by claiming they need nothing else is exactly what one would expect from an opponent who knows how to break the cipher. Maybe that is just coincidence.
Cryptographic controversies include:
In my view, cryptography often does not understand or attempt to address controversial issues in a scientific way. In areas where cryptography cannot distinguish between truth and falsehood, it cannot advance.
If anyone has any other suggestions for this list, please let
me know.
Uses might include:
Requirements for such an RNG will vary depending upon use, but might include:
We normally assume that the opponents have a substantial amount of known plaintext to use in their work (see cryptanalysis). So the situation for the opponents involves taking what is known and trying to extrapolate or predict what is not known. That is similar to building a scientific model intended to predict larger reality on the basis of many fewer experiments. Since the whole idea is to make prediction difficult for the opponent, unpredictability can be called the essence of cryptography.
Cryptography is a part of cryptology, and is further divided into secret codes versus ciphers. As opposed to steganography, which seeks to hide the existence of a message, cryptography seeks to render a message unintelligible even when the message is completely exposed.
Cryptography includes at least:
In practice, cryptography should be seen as a system or game which includes both users and opponents: True scientific measures of strength do not exist when a cipher has not been broken, so users can only hope for their cipher systems to protect their messages. But opponents may benefit greatly if users can be convinced to adopt ciphers which opponents can break. Opponents are thus strongly motivated to get users to believe in the strength of a few weak ciphers. Because of this, deception, misdirection, propaganda and conspiracy are inherent in the practice of cryptography. (Also see trust and risk analysis.)
And then, of course, we have the natural response to these negative possibilities, including individual paranoia and cynicism. We see the consequences of not being able to test cipher security in the arrogance and aggression of some newbies. Even healthy users can become frustrated and fatalistic when they understand cryptographic reality. Cryptography contains a full sea of unhealthy psychological states.
For some reason (such as the lack of direct academic statements on the issue), some networking people who use and depend upon cryptography every day seem to have a slightly skewed idea about what cryptography can do. While they seem willing to believe that ciphers might be broken, they assume such a thing could only happen at some great effort. Apparently they believe the situation has been somehow assured by academic testing. But that belief is false.
Ciphers are like puzzles, and while some ways to solve the puzzle may be hard, other ways may be easy. Moreover, once an easy way is found, that can be put into a program and copied to every "script kiddie" around. The hope that every attacker would have to invest major effort to find their own fast break is just wishful thinking. And even as their messages are being exposed, the users probably will think everything is fine, just like we think right now. Cipher failure could be happening to us right now, because there will be no indication when failure occurs.
What are the chances of cipher failure? We cannot know! Ciphers are in that way different from nearly every other constructed object. Normally, when we design and build something, we measure it to see that it works, and how well. But with ciphers, we cannot measure how well our ciphers resist the efforts of our opponents. Since we have no way to judge effectiveness, we also cannot judge risk. Thus, we simply have no way to compare whether the cipher design is more likely to be weak than the user, or the environment, or something else. As sad as this situation may seem, it is what we have.
When compared to the alternative of blissful ignorance, it should be a great advantage to know that ciphers cannot be depended upon. First, design steps could be taken to improve things (although that would seem to require a widespread new understanding of the situation that has always existed). Next, we note that ciphers can at most reveal only what they try to protect: When protected information is not disturbing, or dangerous, or complete, or perhaps not even true, exposure becomes much less of an issue.
Modern cryptography generally depends upon translating a message into one of an astronomical number of different intermediate representations, or ciphertexts, as selected by a key. If all possible intermediate representations have similar appearance, it may be necessary to try all possible keys (a brute force attack) to find the key which deciphers the message. By creating mechanisms with an astronomical number of keys, we can make this approach impractical.
Keying is the essence of modern cryptography. It is not possible to have a strong cipher without keys, because it is the uncertainty about the key which creates the "needle in a haystack" situation which is conventional strength. (A different approach to strength is to make every message equally possible, see: Ideal Secrecy.)
Nor is it possible to choose a key and then reasonably expect to use that same key forever. In cryptanalysis, it is normal to talk about hundreds of years of computation and vast effort spent attacking a cipher, but similar effort may be applied to obtaining the key. Even one forgetful moment is sufficient to expose a key to such effort. And when there is only one key, exposing that key also exposes all the messages that key has protected in the past, and all messages it will protect in the future. Only the selection and use of a new key terminates insecurity due to key exposure. Only the frequent use of new keys makes it possible to expose a key and not also lose all the information ever protected.
Cryptography is not an engineering science: It is not possible to know when cryptography is "working," nor how close to not-working it may be:
Cryptography may also be seen as a zero-sum game, where a
cryptographer competes against a
cryptanalyst. We might call
this the cryptography war.
Note that the successful cryptanalyst must keep good attacks secret, or the opposing cryptographer will just produce a stronger cipher. This means that the cryptographer is in the odd position of never knowing whether his or her best cipher designs are successful, or which side is winning.
Cryptographers are often scientists who are trained to ignore unsubstantiated claims. But the field of cryptography often turns the scientific method on its head, because almost never is there a complete proof of cryptographic strength in practice. In cryptography, scientists accept the failure to break a cipher as an indication of strength (that is the ad ignorantium fallacy), and then demand substantiation for claims of weakness. But there will be no substantiation when a cipher system is attacked and broken for real, while continued use will endanger all messages so "protected." Evidently, the conventional scientific approach of requiring substantiation for claims is not particularly helpful for users of cryptography.
Since the scientific approach does not provide the assurance of cryptographic strength that users want and need, alternative measures become appropriate:
Sometimes also said to include:
It is especially important to consider the effect the underlying
equipment has on the design.
Even apparently innocuous operating system functions, such
as the multitasking "swap file," can capture supposedly secure
information, and make that available for the asking.
Since ordinary disk operations generally do not even attempt to
overwrite data on disk, but instead simply make that storage
free for use, supposedly deleted data is, again, free for the
asking.
A modern cryptosystem will at least try to address such issues.
Quartz is a piezoelectric material, so a voltage across the terminals forces the quartz wafer to bend slightly, thus storing mechanical energy in physical tension and compression of the solid quartz. The physical mass and elasticity of quartz cause the wafer to mechanically resonate at a natural frequency depending on the size and shape of the quartz blank. The crystal will thus "ring" when the electrical force is released. The ringing will create a small sine wave voltage across electrical contacts touching the crystal, a voltage which can be amplified and fed back into the crystal, to keep the ringing going as oscillation.
Crystals are typically used to make exceptionally stable electronic oscillators (such as the clock oscillators widely used in digital electronics) and the relatively narrow frequency filters often used in radio.
It is normally necessary to physically grind a crystal blank to the desired frequency. While this can be automated, the accuracy of the resulting frequency depends upon the effort spent in exact grinding, so "more accurate" is generally "more expensive."
Frequency stability over temperature depends upon slicing the
original crystal at precisely the right angle.
Temperature-compensated crystal oscillators (TCXO's) improve
temperature stability by using other components which vary with
temperature to correct for crystal changes.
More stability is available in oven-controlled crystal oscillators
(OCXO's), which heat the crystal and so keep it at a precise
temperature despite ambient temperature changes.
Sometimes suggested in cryptography as the basis for a TRNG, typically based on phase noise or frequency variations. But a crystal oscillator is deliberately designed for high frequency stability; it is thus the worst possible type of oscillator from which to obtain and exploit frequency variations. And crystal oscillator phase noise (which we see as edge jitter) is typically tiny and must be detected on a cycle-by-cycle basis, because it does not accumulate. Detecting a variation of, say, a few picoseconds in each 100nSec period of a typical 10 MHz oscillator is not something we do on an ordinary computer.
Another common approach to a crystal oscillator TRNG is to
XOR
many such oscillators, thus getting a complex high-speed waveform.
(The resulting
digital
signal rate increases as the sum of all the oscillators.)
Unfortunately, the high-speed and asynchronous nature of the wave
means that
setup and
hold
times cannot be guaranteed to latch that data for subsequent use.
(Latching is inherent in, say, reading a value from a computer
input port.)
That leads to statistical bias and possible
metastable
operation.
Futher, the construction is essentially
linear
and may power up similarly each time it is turned on.
Current is analogous to the amount of water flow, as opposed
to pressure or
voltage.
A flowing electrical current will create a
magnetic field around the
conductor.
A changing electrical current may create an
electromagnetic field.
In some
RNG constructions, (e.g.,
BB&S and the
Additive RNG)
the system consists of multiple independent cycles, possibly of
differing lengths.
Since having a cycle of a guaranteed length is one of the main
requirements for an RNG, the possibility that a short cycle may
exist and be selected for use can be disturbing.
Sometimes people claim that they have a method to compress a file, and that they can compress it again and again, until it is only a byte long. Unfortunately, it is impossible to compress all possible files down to a single byte each, because a byte can only select 256 different results. And while each byte value might represent a whole file of data, only 256 such files could be selected or indicated.
Normally, compression is measured as the percentage size reduction; 60 percent is a good compression for ordinary text.
In general, compression occurs by representing the most-common data values or sequences as short code values, leaving longer code values for less-common sequences. Understanding which values or sequences are more common is the "model" of the source data. When the model is wrong, and supposedly less-common values actually occur more often, that same compression may actually expand the data.
Data compression is either "lossy," in which some of the information is lost, or "lossless" in which all of the original information can be completely recovered. Lossy data compression can achieve far greater compression, and is often satisfactory for audio or video information (which are both large and may not need exact reproduction). Lossless data compression must be used for binary data such as computer programs, and probably is required for most cryptographic uses.
Compressing plaintext data has the advantage of reducing the size of the plaintext, and, thus, the ciphertext as well. Further, data compression tends to remove known characteristics from the plaintext, leaving a compressed result which is more random. Data compression can simultaneously expand the unicity distance and reduce the amount of ciphertext available which must exceed that distance to support attack. Unfortunately, that advantage may be most useful with fairly short messages. Also see: Ideal Secrecy.
One goal of cryptographic data compression would seem to be minimize the statistical structure of the plaintext. Since such structure is a major part of cryptanalysis, that would seem to be a major advantage. However, we also assume that our opponents are familiar with our cryptosystem, and they can use the same decompression we use. So the opponents get to see the structure of the original plaintext simply by decompressing any trial decryption they have. And if the decompressor cannot handle every possible input value, it could actually assist the opponent by identifying wrong decryptions.
When using data compression with encryption, one pitfall is that many compression schemes add recognizable data to the compressed result. Then, when that compressed result is encrypted, the "recognizable data" represents known plaintext, even when only the ciphertext is available. Having some guaranteed known plaintext for every message could be a very significant advantage for opponents, and unwise cryptosystem design.
It is normally impossible to compress random-like ciphertext. However, some cipher designs do produce ciphertext with a restricted alphabet which can of course be compressed. Also see entropy.
Another possibility is to have a data decompressor that can take any random value to some sort of grammatical source text. That may be what is sometimes referred to as bijective compression. Typically, a random value would decompress into a sort of nonsensical "word salad" source text. However, the statistics of the resulting "word salad" could be very similar to the statistics of a correct message. That could make it difficult to computationally distinguish between the "word salad" and the correct message. If "bijective compression" imposes an attack requirement for human intervention to select the correct choice, that might complicate attacks by many orders of magnitude. The problem, of course, is the need to devise a compression scheme that decompresses random values into something grammatically similar to the expected plaintext. That typically requires a very extensive statistical model, and of course at best only applies to a particular class of plaintext message.
An extension of the "bijective" approach would be to add random
data to compressed text.
Obviously, there would have to be some way to delimit or otherwise
distinguish the plaintext from the added data, but that may be
part of the compression scheme anyway.
More importantly, the random data probably would have to be added
between the compressed text in some sort of keyed way, so that it
could not easily be identified and extracted.
The keying requirement would make this a form of encryption.
The result would be a
homophonic encryption, in that the
original plaintext would have many different compressed
representations, as selected by the added random data.
Having many different but equivalent representations allows the same
message to be sent multiple times, each time producing a different
encrypted result.
But it is also potentially dangerous, in that the compressed message
expands by the amount of the random data, which then may represent
a hidden channel.
Since, for encryption purposes, any random data value is as good
as another, that data could convey information about the key and
the user would never know.
Of course, the same risk occurs in
message keys or, indeed, almost any
nonce.
Most
electronic devices require DC
Contrary to naive expectations, a complex system almost never performs as desired when first realized. Both hardware and software system design environments generally deal with systems which are not working. When a system really works, the design and development process is generally over.
Debugging involves identifying problems, analyzing the source of those problems, then changing the construction to fix the problem. (Hopefully, the fix will not itself create new problems.) This form of interactive analysis can be especially difficult because the realized design may not actually be what is described in the schematics, flow-charts, or other working documents: To some extent the real system is unknown.
The most important part of debugging is to understand in great detail exactly what the system is supposed to do. In hardware debugging, it is common to repeatedly reset the system, and start a known sequence of events which causes a failure. Then, if one really does know the system, one can probe at various points and times and eventually track down the earliest point where the implemented system diverges from the design intent. The thing that causes the divergence is the bug. Actually doing this generally is harder than it sounds.
Software debugging is greatly aided by a design and implementation process that decomposes complex tasks into small, testable procedures or modules, and then actually testing those procedures. Of course, sometimes the larger system fails anyway, in which case the procedure tests were insufficient, but they can be changed and the fixed procedure re-tested. Sometimes the hardest part of the debugging is to find some set of conditions that cause the problem. Once we have that, we can repeatedly run through the code until we find the place where the expected things do not occur.
Debugging real-time software is compounded by the difficulty of knowing the actual sequence of operations. This frequently differs from our ideal model of multiple completely independent software processes.
Two of the better books on the debug process include:
Agans, D. 2002. Debugging. The 9 Indispensable Rules for Finding Even the Most Elusive Software and Hardware Problems. AMACOM.
Telles, M. and Y. Hsieh. 2001. The Science of Debugging. Coriolis.
Poster graphics with the 9 rules can be found at: http://www.debuggingrules.com/, but the shorthand form may seem somewhat arcane unless one already knows the intent. The best thing is to read the book. Failing that, however, I offer my interpretation, plus a few insights:
Each stage in a typical radio circuit might use 100 ohm resistors from power, and then a bypass capacitor of perhaps 0.01uF to circuit ground. The RF signals in one stage are attenuated at the common power bus by the frequency-sensitive voltage divider or filter consisting of the resistor and the power supply capacitors. Any RF signals on the power supply bus are attenuated again by the resistor to the next stage and that bypass capacitor.
Since a stage current of 20mA could cause a 2V drop across the
100 ohm resistor, smaller resistors, or even chokes using ferrite
beads, might be used instead.
When voltages or currents are measured, power changes as the square of these values, so a decibel is twenty times the base-10 logarithm of the ratio of two voltages or currents.
For example, suppose we have an active filter with a voltage gain of 1 at 425Hz and a gain of 6 at 2550Hz:
amplitude change in dB = 20 log10 (e2/e1) = 20 log10 6 = 15.56 dB
Supposedly the name of the ancient Roman practice of lining up
soldiers who had run from battle, and selecting every tenth one
to be killed by those not selected.
A defined plaintext attack typically needs to send a particular plaintext value to the internal cipher (thus "knowing" that value), and get the resulting ciphertext. Typically, a large amount of plaintext is needed under a single key. A cipher system which prevents any one of the necessary conditions also stops the corresponding attacks.
Many defined plaintext attacks are interactive, and so require the ability to choose subsequent plaintext based on previous results, all under one key. It is relatively easy to prevent interactive attacks by having a message key facility, changing message keys on each message, and by handling only complete messages instead of a continuing flow or stream that an opponent can modify interactively.
Known plaintext (and, thus, defined plaintext) attacks can be opposed by:
Most modern cipher systems will use a message key (or some
similar facility), making defined-plaintext attacks generally more
academic than realistic (also see
break).
If we choose two values completely independently, we have a DF of 2. But if we must choose two values such that the second is twice the first, we can choose only the first value independently. Imposing a relationship on one of the sampled value means that we will have a DF of one less than the number of samples, even though we may end up with apparently similar sample values.
In a typical
goodness of fit test such as
chi-square, the reference
distribution (the expected counts) is
normalized to give the same number of counts as the experiment. This
is a constraint, so if we have N bins, we will have a DF of
NOT( A OR B) = NOT(A) AND NOT(B) NOT( A AND B ) = NOT(A) OR NOT(B)so
A OR B = NOT( NOT(A) AND NOT(B) ) A AND B = NOT( NOT(A) OR NOT(B) )Also see: Logic Function.
Normally, an N-type semiconductor has no net electrical charge, but does have an excess of electrons which are not in a stable bond. Similarly, on its own, P-type semiconductor also has no net charge, but has a surplus of holes or bond-positions which have no electrons. When the two materials occur in the same crystal lattice, there are opposing forces:
For diode conduction to occur, sufficient potential must be
applied so that the depletion field effect is overwhelmed.
The depletion field (or "junction voltage" or "barrier voltage")
is typically about 0.6V in silicon.
DES is a
The mechanics of DES are widely available elsewhere. Here I note how one particular issue common to modern block ciphers is reflected in DES. A common academic model for conventional block ciphers is a "family of permutations." The issue is the size of the implemented keyspace compared to the size of the potential keyspace for blocks of a given size.
For
The obvious conclusion is that almost none of the keyspace
implicit in the model is actually implemented in DES, and that is
consistent with other modern block cipher designs.
While that does not make modern ciphers weak, it is a little disturbing.
See more detailed comments under
AES.
Conventional block ciphers are completely deterministic, unless they include a random homophonic feature. Computational random number generators also are deterministic (again, see: finite state machine). Some specially-designed physical devices produce nondeterministic sequences (see: really random).
The theoretical concepts of deterministic finite automata (DFA) are usually discussed as an introduction to lexical analysis and language grammars, as used in compiler construction. In that model, computation occurs in a finite state machine, and the computation sequence is modeled as a network, where each node represents a particular state value and there is exactly one arc from each node to one other node. Normal program execution steps from the initial node to another node, then another, step by step, until the terminal node is reached. In general, the model corresponds well both to software sequential instruction execution, and to synchronous digital hardware operations controlled by hardware clock cycles.
Within this theory, however, at least two different definitions for "nondeterministic" are used:
In cryptography, we are usually interested in "nondeterministic" behavior to the extent that it is unpredictable. One example of cryptographic nondeterministic behavior would be a really random sequence generator. Programs, on the other hand, including most random number generators, are almost always deterministic. Even programs which use random values to select execution paths generally get those values from deterministic statistical RNG's, making the overall computation cryptographically predictable.
One issue of cryptographic determinism is the question of whether user input or mechanical latencies can make a program "nondeterministic." To the extent that user input occurs at a completely arbitrary time, that should represent some amount of uncertainty or entropy. In reality, though, it may be that certain delays are more likely than others, thus making the uncertainty less than it might seem. Subsequent program steps based on the user input cannot increase the uncertainty, being simply the expected result of a particular input.
If hardware device values or timing occur in a completely unpredictable manner, that should produce some amount of uncertainty. But computing hardware generally is either completely exposed or strongly predictable. For example, disk drives can appear to have some access-time uncertainty based on the prior position of the read arm and rotational angle of the disk itself. But if the arm and disk position information is known, there is relatively little uncertainty about when the desired track will be reached and the desired sector read.
If disk position state was actually unknowable, we could have a source of cryptographic uncertainty. But disk position is not unknowable, and indeed is fairly well defined after just a single read request. Subsequent operations might be largely predictable. Normally we do not consider disk position state, or other computer hardware state, to be sufficiently protected to be the source of our security. In the best possible case, disk position state might be hidden to most opponents, but that hope is probably not enough for us to assign cryptographic uncertainty to those values.
Similar arguments pertain to most automatic sources of supposed
computer uncertainty. Also see
really random.
More generally, the idea that results unimaginable to any
normal person or impossible for any normal design are actually
from a god inside and not the machine itself.
A true dichotomy adds tremendous power to
analysis by identifying particular
effects with particular categories.
Finding one such effect thus identifies a category, which then
predicts the rest of the effects of that category and the lack of
the effects of the opposing category.
And when only two categories exist, even seeing the lack
of the effects from one category necessarily implies the other.
The basic idea of Differential Cryptanalysis is to first cipher some plaintext, then make particular changes in that plaintext and cipher it again. Particular ciphertext differences occur more frequently with some key values than others, so when those differences occur, particular keys are (weakly) indicated. With huge numbers of tests, false indications will be distributed randomly, but true indications always point at the same key values and so will eventually rise above the noise to indicate some part of the key.
The technique typically depends on exploiting various assumptions:
Typically, differential weakness can be eliminated at cipher
design time by the use of
keyed
Typically, differential attacks can be prevented at the cipher system level, by:
Also see
Normally we speak of data diffusion, in which changing a tiny part of the plaintext data may affect the whole ciphertext. But we can also speak of key diffusion, in which changing even a tiny part of the key should change each bit in the ciphertext with probability 0.5.
Perhaps the best diffusing component is substitution, but this diffuses only within a single substituted value. Substitution-permutation ciphers extend diffusion beyond a single value by moving the bits of each substituted element to other elements, substituting again, and repeating. But this only provides guaranteed diffusion if particular substitution tables are constructed.
Another alternative for extending diffusion to other elements
is to use some sort of
Balanced Block Mixing.
BBM constructions inherently have guaranteed diffusion which I call
ideal mixing.
Still another alternative is a
Variable Size Block Cipher
construction. Also see
overall diffusion.
In practice, real diodes do allow some "leakage" current in the reverse direction, and leakage approximately doubles for every 10degC increase in temperature.
Semiconductor junctions also have a "forward" voltage (typically 0.6V in silicon, or 0.3V in germanium and for Schottky devices in silicon) which must be exceeded for conduction to occur. This bias voltage is basically due to the semiconductor depletion region. The forward voltage has a negative temperature coefficient of about -2.5mV/degC, in either silicon or germanium.
Semiconductor junctions also have a dynamic resistance (for small signals) that varies inversely with current:
r = 25.86 / I where: r = dynamic resistance in ohms I = junction current (mA)
As a result of the forward voltage and internal resistance, power is dissipated as heat when current flows. Diodes are made in a range from tiny and fast signal devices through large and slow high power devices packaged to remove internal heat. There is a wide range of constructions for particular properties, including photosensitive and light-emitting diodes (LED's).
All real diodes have a reverse "breakdown" voltage, above which massive reverse conduction can occur. (See avalanche multiplication and Zener breakdown.) Real devices also have current, power, and temperature limitations which can easily be exceeded, a common result being a wisp of smoke and a short-circuit connection where we once had a diode. However, if the current is otherwise limited, diode breakdown can be exploited as a way to "regulate" voltage (although IC regulator designs generally use internal "bandgap" references for better performance).
All diodes break down, but those specifically designed to do so
at particular low voltages are called Zener diodes (even if
they mainly use avalanche multiplication).
A semiconductor junction in reverse voltage breakdown typically
generates good-quality
noise which can be
amplified and exploited for use
(or which must be filtered out).
A bipolar
transistor base-emitter junction can be
used instead of a Zener or other diode.
Another noise generation alternative is to use an IC which has a
documented and useful noise spectrum, such as the "IC Zener" LM336
(see some noise circuits
locally, or @:
http://www.ciphersbyritter.com/NOISE/NOISRC.HTM).
Also see
avalanche multiplication and
Zener breakdown.
A distinguisher makes a contribution to cryptanalysis by showing that a model does not work for the particular cipher. The problem comes in properly understanding what it means to not model the reality under test. In many cases, a successful distinguisher is presented as a successful attack, with the stated or implied result being that the cipher is broken. But that goes beyond what is known. When a scientific model is shown to not apply to reality, that does not make reality "wrong." It just means that the tested model is not useful. Maybe the best implication is that a new model is needed.
Distinguishers generally come under the heading of "computational indistinguishability," and much of this activity occurs in the area of conventional block ciphers. One of the problems of that area is that many cryptographers interpret "block cipher" as an emulated, huge simple substitution (that is, a key-selected pseudorandom permutation). But it is entirely possible for ciphers to work on blocks yet not fit that model. Clearly, if a cipher does not really function in that way, it may be possible to find a distinguisher to prove it.
The real issue in cipher design is strength. The problem is that we have no general measure to give us the strength of an abstract cipher. But a distinguisher provides testimony about conformance to model, not strength. A distinguisher simply does not testify about weakness.
In normal
cryptanalysis we start out knowing
plaintext,
ciphertext, and
cipher construction.
The only thing left unknown is the
key. A practical
attack must recover the key.
If it does not, it is not a real attack after all.
If we have a discrete distribution, with a finite number of possible result values, we can speak of "frequency" and "probability" distributions: The "frequency distribution" is the expected number of occurrences for each possible value, in a particular sample size. The "probability distribution" is the probability of getting each value, normalized to a probability of 1.0 over the sum of all possible values.
Here is a graph of a typical "discrete probability distribution" or "discrete probability density function," which displays the probability of getting a particular statistic value for the case "nothing unusual found":
0.1| ***
| * * Y = Probability of X
Y | ** ** y = P(x)
| **** ****
0.0 -------------------
X
Unfortunately, it is not really possible to think in the same way about continuous distributions: Since continuous distributions have an infinite number of possible values, the probability of getting any particular value is zero. For continuous distributions, we instead talk about the probability of getting a value in some subrange of the overall distribution. We are often concerned with the probability of getting a particular value or below, or the probability of a particular value or above.
Here is a graph of the related "cumulative probability distribution" or "cumulative distribution function" (c.d.f.) for the case "nothing unusual found":
1.0| ******
| ** Y = Probability (0.0 to 1.0) of finding
Y | * a value which is x or less
| **
0.0 -******------------
X
The c.d.f. is just the sum of all probabilities for a given value
or less. This is the usual sort of function used to interpret a
statistic: Given some result, we can
look up the probability of a lesser value (normally called
p) or a greater value (called
Usually, a test statistic is designed so that extreme values are not likely to occur by chance in the case "nothing unusual found" which is the null hypothesis. So if we do find extreme values, we have a strong argument that the results were not due simply to random sampling or other random effects, and may choose to reject the null hypothesis and thus accept the alternative hypothesis.
Usually the ideal distribution in cryptography is "flat" or uniform. Common discrete distributions include:
A common continuous distribution is: Also see the Ciphers By Ritter / JavaScript computation pages (locally, or @: http://www.ciphersbyritter.com/index.html#JavaScript).Also see:
associative and
commutative.
This is a particular danger in cryptosystems, since most ciphers
are built from less-complex parts. Indeed, a major role of
cryptographic design is to combine small
component parts into a larger complex
system which cannot be split apart.
Typically, shuffling permutes the contents of a substitution table or block as a result of a sequence from a keyed random number generator (RNG). Essentially, shuffling allows us to efficiently key-select an arbitrary permutation from among all possible permutations, which is the ideal sort of balanced selection.
If shuffling is implemented so the shuffling sequence is used as efficiently as possible, simply knowing the resulting permutation should suffice to reconstruct the shuffling sequence, which is the first step toward attacking the RNG. While common shuffle implementations do discard some of the sequence, we can guarantee to use at least twice as much information as the table or block can represent simply by shuffling twice. Double-shuffling will not produce any more permutations, but it should prevent the mere contents of a permuted table or block from being sufficient to reconstruct the original shuffling sequence.
In a sense, double-shuffling is a sort of one-way information valve which produces a key-selected permutation, and also hides the shuffling sequence which made the selection.
Also see
Dynamic Transposition.
One way to have a dynamic key in a block cipher is to include the key value along with the plaintext data (also see block code). But this is normally practical only with blocks of huge size, or variable size blocks. (See huge block cipher advantages.)
Another way to have a dynamic key in a block cipher is to add a
confusion
layer which mixes the key value with the
block. For example, exclusive-OR could be used to mix a 64-bit
key with a 64-bit data block.
The main goal of Dynamic Substitution is to provide a stream cipher combiner with reduced vulnerability to known-plaintext attack. In contrast, a stream cipher using a conventional additive combiner will immediately and completely expose the confusion sequence under known-plaintext attack. That gives the opponents the chance to attack the stream cipher internal RNG, which is the common stream cipher attack. But a Dynamic Substitution combiner hides the confusion sequence, thus complicating the usual attack on the stream cipher RNG.
Dynamic Substitution is a substitution table in which the arrangement of the entries changes during operation. This is particularly useful as a strong replacement for the strengthless exclusive-OR combiner in stream ciphers.
In the usual case, an invertible substitution table is keyed by shuffling under the control of a random number generator. One combiner input is used to select a value from within the table to be the result or output; that is normal substitution. But the other combiner input is used to select an entry at random, then the values of the two selected entries are exchanged. So as soon as a plaintext mapping is used for output, it is immediately reset to any possibility, and the more often any plaintext value occurs, the more often that particular transformation changes.
The arrangement of a keyed substitution table starts out unknown to an opponent. From the opponent's point of view, each table entry could be any possible value with uniform probability. But after the first value is mapped through that table, the just-used table entry or transformation is at least potentially exposed, and no longer can be considered unknown. Dynamic Substitution acts to make the used transformation again completely unknown and unbiased, by allowing it to again take on any possible value. Thus, the amount of information leaked about table contents is replaced by information used to re-define each just-used entry.
Also see
Cloak2,
Penknife,
Dynamic Transposition,
Balanced Block Mixing,
Mixing Cipher,
Mixing Cipher design strategy and
Variable Size Block Cipher.
One goal of Dynamic Transposition is to leverage the concept of Shannon Perfect Secrecy on a block-by-block basis. In contrast to the usual ad hoc strength claims, Perfect Secrecy has a fundamental basis for understanding and believing cipher strength. That basis occurs when the ciphering operation can produce any possible transformation between plaintext and ciphertext. As a result, even brute force no longer works, because running through all possible keys just produces all possible block values. And, in contrast to conventional block ciphers, which actually implement only an infinitesimal part of their theoretical model, each and every Dynamic Transposition permutation can be made practically available.
One interesting aspect of Dynamic Transposition is a fundamental hiding of each particular ciphering operation. Clearly, each block is ciphered by a particular permutation. If the opponent knew which permutation occurred, that would be useful information. But the opponent only has the plaintext and ciphertext of each block to expose the ciphering permutation, and a vast plethora of different permutations each take the exact same plaintext to the exact same ciphertext. (This is because wherever a '1' occurs in the ciphertext, any plaintext '1' would fit.) As a consequence, even known-plaintext attack does not expose the ciphering permutation, which is information an opponent would apparently need to know. The result is an unusual block cipher with an unusual fundamental basis in strength.
Also see
Also see Dynamic Substitution Combiner, Balanced Block Mixing, Mixing Cipher, Mixing Cipher design strategy and Variable Size Block Cipher.
In the simple model of bipolar transistor operation, base-emitter current is multiplied by transistor hFE or beta (B), thus producing amplified collector-emitter current:
Ic = hFE * Ib Ic = collector current hFE = the Forward common-Emitter h-parameter, or beta Ib = base currentAnd while that is a reasonable rule of thumb, the simple model is not very accurate.
A far more accurate computational model is Ebers-Moll:
Ic = Is[ e**(Vbe / Vt) - 1 ] Vt = kT / q Ic = collector current Is = saturation current (reverse leakage) e = the base of natural logs Vbe = voltage between base and emitter k = Boltzmann's constant (1.380662 * 10**-23 joule/deg K) T = temperature (deg. Kelvin) q = electron charge (1.602189 * 10**-19 coulombs)In Ebers-Moll the collector current is a function of base-emitter voltage, not current. Unfortunately, base-emitter voltage Vbe itself varies both as a function of collector current (delta Vbe = 60mV per power of ten collector current), and temperature (delta Vbe = -2.1mV per deg C). Vbe also varies slightly with collector voltage Vce (delta Vbe ~ -0.0001 * delta Vce), which is known as the Early effect.
In cryptography two aspects of efficiency concern keys. Keys are often generated from unknowable bits obtained from really-random generators. When unknowable bits are a limited resource there is motive both to decrease the amount used in each key, and to increase the amount being generated. However, both of these approaches have the potential to weaken the system. In particular, insisting on high efficiency in really-random post-processing can lead to reversible processing which is the exact opposite of the goal of unknowable bits.
Extra unknowable bits may cover unknown, undetected problems both
in key use (in the cipher), and in key generation (in the randomness
generator).
Because there is so much we do not know in cryptography, it is
difficult to judge how close we are to the edge of insecurity.
We need to question the worth of efficiency if it ends up helping
opponents to break a cipher.
A better approach might be to generate high quality unknowability,
and use as much of it as we can.
It is important to distinguish between a long-distance propagating electromagnetic field and simpler and more range-limited independent electric and magnetic fields.
It is unnecessary to consider how a field has been generated. Exactly the same sort of magnetic field is produced either by solid magnets or by passing DC current through a coil of wire making an electromagnet. A field from an electromagnet is not necessarily an electromagnetic field in the sense of a propagating wave; it is just another magnetic field.
Changing magnetic fields can be produced by forcing magnets to rotate (as in an alternator) or changing the current through an electromagnet. Typical dynamic magnetic field sources might include AC motor clocks, mixing motors, fans, or even AC power lines. It would be extremely difficult for low-frequency changes or physical movement to generate a propagating electromagnetic field.
Radio frequency voltage is the basis of most radio transmission. Radio antenna designs convert RF power into synchronized electric and magnetic fields producing a true electromagnetic field which can be radiated into space.
It is important to distinguish between the expanding or "radiating" property of an electromagnetic field, as opposed to the damaging ionizing radiation produced by a radioactive source.
As far as we know
Reducing emitted EMI and dealing with encountered EMI
is an issue in most modern electronic design. Also see
TEMPEST and
shielding.
To prevent ESD, we "simply" prevent the accumulation of static electricity, or prevent discharge through a sensitive device. Some approaches include the use of high-resistance ESD surfaces to keep equipment at a known potential (typically "ground"), conductive straps to connect people to the equipment (or "ground") before they touch it, and ample humidity to improve static discharge through air. Other measures include the use of "ESD shoes" to ground individuals automatically, the use of metalized insulated bags, and improved ESD protection in the devices themselves.
Unless grounded, two people are rarely at the same electrical
potential or voltage, so handing a sensitive board or device from
one person to another can complete a
circuit
for the discharge of static potential.
That could be prevented by shaking hands before giving a board to
another person, or by placing the board on an ESD surface to be
picked up.
ECB is the naive method of applying a block cipher, in that the plaintext is simply partitioned into appropriate size blocks, and each block is enciphered separately and independently. When we have a small block size, ECB is generally unwise, because language text has biased statistics which will result in some block values being re-used frequently, and this repetition will show up in the raw ciphertext. This is the basis for a successful codebook attack.
On the other hand, if we have a large block (at least, say, 64 bytes), we may expect it to contain enough unknowable uniqueness or "entropy" (at least, say, 64 bits) to prevent a codebook attack. In that case, ECB mode has the advantage of supporting independent ciphering of each block. That, in turn, supports various things, like ciphering blocks in arbitrary order, or the use of multiple ciphering hardware operating in parallel for higher speeds.
Modern packet-switching network technologies often deliver raw packets out of order. The packets will be re-ordered eventually, but having out-of-sequence packets can be a problem for low-level ciphering if the blocks are not ciphered independently.
Also see
Balanced Block Mixing,
All or Nothing Transform
and the "Random Access to Encrypted Data" conversation
(locally, or @:
http://www.ciphersbyritter.com/NEWS4/ECBMODE.HTM).
H(X) = -SUM( pi log2 pi )H is in bits per symbol when the log is taken to base 2. Also called "communications entropy."
Note that all meanings can be casually described as our uncertainty as to the value of a random variable. But in cryptography, mere statistical uncertainty is not the same as the unpredictability required for cryptographic strength. (Also see Old Wives' Tale.)
In the original literature, and even thereafter, we do not find what I would accept as a precise word-definition of information-theoretic entropy. Instead, we find the development of a specific numerical computation which Shannon names "entropy."
Apparently the term "entropy" was taken from the physics because the form of the computation in information theory was seen to be similar to the form of "entropy" computations used in physics. The "entropy" part of this is thus the formal similarity of the computation, instead of a common underlying idea, as is often supposed.
The meaning of Shannon entropy is both implicit in and limited by the specific computation. Fortunately for us, the computation is relatively simple (as these things go), and it does not take a lot of secondary, "expert" interpretation to describe what it does or means. We can take a few simple, extreme distributions and easily calculate entropy values to give us a feel for how the measure works.
Basically, Shannon entropy is a measure of coding efficiency in terms of information bits per communicated bit. It gives us a measure of optimal coding, and the advantage is that we can quantify how much we would gain or lose with a different coding. But no part of the computation addresses the context required for "uncertainty" about what we could or could not predict. Exactly the same values occur, giving the same entropy result, whether we can predict a sequence or not.
"Suppose we have a set of possible events whose probabilities of occurrence are p1, p2, ..., pn. These probabilities are known but that is all we know concerning which event will occur. Can we find a measure of how much 'choice' is involved in the selection of the event or of how uncertain we are of the outcome?"-- Shannon, C. E. 1948. A Mathematical Theory of Communication. Bell System Technical Journal.27:379-423.
"In a previous paper the entropy and redundancy of a language have been defined. The entropy is a statistical parameter which measures, in a certain sense, how much information is produced on the average for each letter of a text in the language. If the language is translated into binary digits (0 or 1) in the most efficient way, the entropy H is the average number of binary digits required per letter of the original language."-- Shannon, C. E. 1951. Prediction and Entropy of Printed English. Bell System Technical Journal.30:50-64.
"[Even if we tried] all forms of encoding we could think of, we would still not be sure we had found the best form of encoding, for the best form might be one which had not occurred to us." "Is there not, in principle at least, some statistical measurement we can make on the messages produced by the source, a measure which will tell us the minimum average number of binary digits per symbol which will serve to encode the messages produced by the source?"-- Pierce, J. R. 1961. Symbols, Signals and Noise. Harper & Row.
"If we want to understand this entropy of communication theory, it is best first to clear our minds of any ideas associated with the entropy of physics." ". . . the literature indicates that some workers have never recovered from the confusion engendered by an early admixture of ideas concerning the entropies of physics and communication theory."-- Pierce, J. R. 1961. Symbols, Signals and Noise. Harper & Row.
When we have a sequence of values from a random variable, that sequence may or may not be predictable. Unfortunately, there are virtually endless ways to predict a sequence from past values, and since we cannot test them all, we generally cannot know if a sequence is predictable or not (see randomness testing). So until we can predict a sequence, we are "uncertain" about each new value, and from that point of view, we might think the "uncertainty" of the sequence is high. That is how all RNG sequences look at first. It is not until we actually can predict those values that we think the "uncertainty" is low, again from our individual point of view. That is what happens when the inner state of a statistical RNG is revealed. So if we expect to interpret entropy by what we can predict, the result necessarily must be both contextual and dynamic. Can we seriously expect a simple, fixed computation to automatically reflect our own changing knowledge?
In practice, the entropy computations use actual sequence values, and will produce the same result whether we can predict those values or not. The entropy computation uses simple frequency-counts reflected as probabilities, and that is all. No part of the computation is left over for values that have anything to do with prediction or human uncertainty. Entropy simply does not discriminate between information on the basis of whether we can predict it or not. Entropy does not measure how unpredictable the information is. In reality, entropy is a mere statistical measure of information rate or coding efficiency. Like other statistical measures, entropy simply ignores the puzzles of the war between cryptographers and their opponents the cryptanalysts.
The entropy(1) computation can be seen as one of many possible measures of randomness, with the ideal being a flat or uniform distribution of values. (Obviously, anything other than a flat distribution will be, to some extent, predictable.) Also see: linear complexity and Kolmogorov-Chaitin complexity.
Entropy(1) is useful in
coding theory and
data compression, but requires a
knowledge of the probabilities of each value which we usually know by
sampling.
Consequently, in practice, we may not really know the "true"
probabilities, and the probabilities may change through time.
Furthermore, calculated
The limit to the unpredictability in any
deterministic
random number generator
is just the number of bits in the
state of that generator, plus knowledge
of the particular generator design.
Nevertheless, most RNG's will have a very high calculated
On the other hand, it is very possible to consider pairs of symbols, or triples, etc. The main problem is practicality, because then we need to collect exponentially more data, which can be effectively impossible. Nevertheless, if we have a trivial toy RNG such as a LFSR with a long single cycle, and which outputs its entire state on each step, it may be possible to detect problems using entropy(1). Although a random generator should produce any possible next value from any state, we will find that each state leads into the next without choice or variation. But we do not need entropy(1) to tell us this, because we know it from the design. On the other hand, cryptographic RNG's with substantial internal state which output only a small subset of their state are far too large to measure with entropy(1). And why would we even want to, when we already know the design, and thus the amount of internal state, and thus the maximum entropy(3) they can have?
A value similar to entropy(1) can be calculated by population estimation methods (see augmented repetitions). Also see the experimental correspondence in entropy values from noise generator characterization tests: "Experimental Characterization of Recorded Noise" (locally, or @: http://www.ciphersbyritter.com/NOISE/NOISCHAR.HTM).
Some fairly new and probably useful formulations have been given
which include the term "entropy," and so at first seem to be other
kinds of entropy.
However, the name "entropy" came from a formal similarity to
the computation in physics.
To the extent that new computations are less similar to the
physics, they confuse by including the term "entropy."
"[The] set of a posteriori probabilities describes how the cryptanalyst's knowledge of the message and key gradually becomes more precise as enciphered material is obtained. This description, however, is much too involved and difficult for our purposes. What is desired is a simplified description of that approach to uniqueness of the possible solutions."". . . a natural mathematical measure of this uncertainty is the conditional entropy of the transmitted signal when the received signal is known. This conditional entropy was called, for convenience, the equivocation."
". . . it is natural to use the equivocation as a theoretical security index. It may be noted that there are two significant equivocations, that of the key and that of the message. These will be denoted by
HE(K) andHE(M) respectively. They are given by:HE(K) = SumE,K[ P(E,K) log PE(K) ] HE(M) = SumE,M[ P(E,M) log PE(K) ]in which E, M and K are the cryptogram, message and key and
- P(E,K) is the probability of key K and cryptogram E
- PE(K) is the a posteriori probability of key K if cryptogram E is intercepted
- P(E,M) and PE(M) are the similar probabilities for message instead of key.
"The summation in
HE(K) is over all possible cryptograms of a certain length (say N letters) and over all keys. ForHE(M) the summation is over all messages and cryptograms of length N. ThusHE(K) andHE(M) are both functions of N, the number of intercepted letters."
-- Shannon, C. E. 1949. Communication Theory of Secrecy Systems. Bell System Technical Journal.28: 656-715.
Also see:
unicity distance,
Perfect Secrecy,
Ideal Secrecy and
pure cipher.
Here we have all three possible sequences from a non-ergodic process: across we have the average of symbols through time (the "temporal average"), and down we have the average of symbols in a particular position over all possible sequences (the "ensemble average"):
A B A B A B ... p(A) = 0.5, p(B) = 0.5, p(E) = 0.0 B A B A B A ... p(A) = 0.5, p(B) = 0.5, p(E) = 0.0 E E E E E E ... p(A) = 0.0, p(B) = 0.0, p(E) = 1.0 ^ ^ | ... | | +---- p(A) = 0.3, p(B) = 0.3, p(E) = 0.3 | ... +-------------- p(A) = 0.3, p(B) = 0.3, p(E) = 0.3 (Pierce, J. 1961. Symbols, Signals and Noise.)When a process is ergodic, every possible ensemble average is equal to the time average. As increasingly long sequences are examined, we get increasingly accurate probability estimates. But when a process is non-ergodic, the measurements we take over time from one or a few sequences may not represent all possible sequences. And measuring longer sequences may not help. Also see entropy.
Although most hard sciences can depend upon experimental
measurement to answer basic questions,
cryptography is different.
Often, issues must be
argued on lesser evidence, but science
rarely addresses kinds of evidence, or how conclusions might be
drawn.
It is easy to sympathize with the quote, but it is also easily misused: For example, what, exactly, is being "claimed"? Is that stated, or do we have argument by innuendo? And just who determines what is "extraordinary"? And what sort of evidence could possibly be sufficiently "extraordinary" to convince someone who has a contrary bias?
In a
scientific discussion or
argument, a
distinction must be made between what is actually known
as fact and what has been merely assumed and accepted
for lo, these many years.
Often, what is needed is not so much
evidence, as the
reasoning to expose
conclusions which have drawn far beyond
the evidence we have.
Even if well-known "experts" have chosen to
believe overdrawn conclusions, that
does not make those conclusions correct, and also does not require
new evidence, let alone anything "extraordinary."
See the factorials section of the "Base Conversion, Logs, Powers,
Factorials, Permutations and Combinations in JavaScript" page
(locally, or @:
http://www.ciphersbyritter.com/JAVASCRP/PERMCOMB.HTM#Factorials).
The system under analysis is considered to be a set of components or black box elements, each of which may fail. By considering each component in turn, the consequences of a failure in each particular component can be extrapolated, and the resulting costs or dangers listed. For each failure it is generally possible to consider alternatives to minimize either the probability or the effect of such failure. Things that might be done include:
Also see
risk,
risk management and
fault tree analysis.
If two Boolean functions are not correlated, we expect them to agree half the time, which we might call the "expected distance." When two Boolean functions are correlated, they will have a distance greater or less than the expected distance, and we might call this difference the unexpected distance or UD. The UD can be positive or negative, representing distance to a particular affine function or its complement.
It is easy to do a fast Walsh transform by hand. (Well, I say
"easy," then always struggle when I actually do it.)
Let's do the FWT of function
(a',b') = (a+b, a-b)
So for the values (1,0), we get (1+0, 1-0) which is just (1,1). We start out pairing adjacent elements, then every other element, then every 4th element, and so on until the correct pairing is impossible, as shown:
original 1 0 0 1 1 1 0 0
^---^ ^---^ ^---^ ^---^
first 1 1 1 -1 2 0 0 0
^-------^ ^-------^
^-------^ ^-------^
second 2 0 0 2 2 0 2 0
^---------------^
^---------------^
^---------------^
^---------------^
final 4 0 2 2 0 0 -2 2
The result is the unexpected distance to each affine Boolean function. The higher the absolute value, the greater the "linearity"; if we want the nonlinearity, we must subtract the absolute value of each unexpected distance from the expected value, which is half the number of bits in the function. Note that the range of possible values increases by a factor of 2 (in both positive and negative directions) in each sublayer mixing; this is information expansion, which we often try to avoid in cryptography.
Also see: "Walsh-Hadamard Transforms: A Literature Survey," locally, or @: http://www.ciphersbyritter.com/RES/WALHAD.HTM. and the "Active Boolean Function Nonlinearity Measurement in JavaScript" page, locally, or @: http://www.ciphersbyritter.com/JAVASCRP/NONLMEAS.HTM.
The FWT provides a strong mathematical basis for
block cipher mixing such that all input
values will have an equal chance to affect all output values.
Cryptographic mixing then occurs in butterfly operations based on
balanced block mixing structures
which replace the simple add / subtract butterfly in the FWT and
confine the value ranges so information expansion does not occur.
A related concept is the well-known
FFT, which can use exactly the same mixing
patterns as the FWT.
Faults can be
Fault tolerance is achieved by eliminating each and every
single point of failure in the system. Improving the
reliability can benefit the
overall system, but is not the same as fault tolerance.
First, the various undesired outcomes are identified. Then each sequence of events which could cause such an outcome is also identified.
If it is possible to associate an independent probability with each event, it may be possible to compute an overall probability of occurrence of the undesirable outcome. Then one can identify the events which make the most significant contributions to the overall result. By minimizing the probability of those events or reducing their effect, the overall probability of the negative outcome may be reduced.
Various system aspects can be investigated, such as unreliability (including the effect of added redundancy), system failure (e.g., the causes of a plane crash), and customer dissatisfaction. A potential advantage is efficiency when multiple faults seem to converge in a particular node, since it may be possible to modify that one node to eliminate the effect of many faults at once. However, that also would be contrary to a "defense in depth" policy of protecting all levels, wherever a fault might occur or propagate.
Also see
risk,
risk management,
failure modes and effects analysis and
attack tree.
Normally, in a Feistel construction, the input block is split into two parts, one of which drives a transformation whose result is exclusive-OR combined into the other block. Then the "other block" value feeds the same transformation, whose result is exclusive-OR combined into the first block. This constitutes 2 of perhaps 16 "rounds."
L R
| |
|--> F --> + round 1
| |
+ <-- F <--| round 2
| |
v v
L' R'
One advantage of the Feistel construction is that the transformation does not need to be invertible. To reverse any particular layer, it is only necessary to apply the same transformation again, which will undo the changes of the original exclusive-OR.
A disadvantage of the Feistel construction is that
diffusion
depends upon the internal transformation. There is no guarantee of
overall diffusion, and the number
of rounds required is often found by experiment.
Also see the Fenced DES section of the main page
locally, or @:
http://www.ciphersbyritter.com/index.html#FencedTech.
Also see: "A Keyed Shuffling System for Block Cipher Cryptography,"
locally, or @:
http://www.ciphersbyritter.com/KEYSHUF.HTM.
Fencing layers are also used in
other types of cipher.
While exceedingly valuable, the FFT tends to run into practical problems in use which can require a deep understanding of the process:
The FFT provides a strong mathematical basis for
block cipher mixing in that all input
values will have an equal chance to affect all output values.
But an ordinary FFT expands the range of each sample by a factor of
two for each mixing sub-layer, which does not produce a conventional
block cipher.
A good alternative is
Balanced Block Mixing, which has
the same general structure as an FFT, but uses
balanced
butterfly operations based on
orthogonal Latin squares.
These replace the simple add / subtract butterfly in the ordinary
FFT, yet confine the value ranges so information expansion does not
occur. Another concept related to the FFT is the
fast Walsh-Hadamard transform
(FWT), which can use exactly the same mixing patterns as the FFT.
In general, a field supports the four basic operations (addition, subtraction, multiplication and division), and satisfies the normal rules of arithmetic. An operation on any two elements in a field is a result which is also an element in the field. The real numbers and complex numbers are examples of infinite fields.
Fields of finite order include rings of integers modulo some prime. Here are multiplication tables under mod 2, mod 3 and mod 4:
0 1 0 1 2 0 1 2 3
0 0 0 0 0 0 0 0 0 0 0 0
1 0 1 1 0 1 2 1 0 1 2 3
2 0 2 1 2 0 2 0 2
3 0 3 2 1
In a field, each element must have an inverse, and the product of
an element and its inverse is 1. This means that every non-zero
row and column of the multiplication table for a field must
contain a 1. Since row 2 of the mod 4 table does not contain a 1,
the set of integers mod 4 is not a field. This is because 4
is not a prime.
The
order of a field is the number of elements
in that field. The integers mod prime p
(Z/p) form a
finite field of order p.
Similarly,
mod 2 polynomials will form a field
with respect to an
irreducible
polynomial,
and will have order 2n, which is a very useful size.
For example, suppose we have an active filter with a voltage gain of 1 at 425Hz and a gain of 6 at 2550Hz:
amplitude change in dB = 20 log10 (e2/e1) = 20 log10 6 = 15.56 dB octaves = log2 (f2/f1) = log2 (2550/425) = 2.58 octaves decades = log10 (f2/f1) = log10 (2550/425) = 0.778 decades dB/octave = 15.56 / 2.58 = 6dB/octave dB/decade = 15.56 / 0.778 = 20dB/decadeThe value
Also denoted Fq for a finite field of
order q. Also see:
characteristic.
In particular, a finite collection of states S, an input sequence with alphabet A, an output sequence with alphabet B, an output function u(s,a), and a next state function d(s,a).
Other than
really random generators for
nonce or
message key values, all the computations of
cryptography are finite state machines
and so are completely
deterministic.
Much of cryptography thus rests on the widespread but unproven
belief that the internal state of a
cryptographic machine (itself a FSM) cannot be deduced from a
substantial amount of known output, even when the machine design
is completely defined.
Ultimately, flow control is one of the most important aspects of a
communications network.
Inside the network, however, protocols may simply send and re-send
data until those particular data are acknowledged.
f(x) = A0 + SUM (An cos nx + Bn sin nx)Alternately, over the interval [a, a+2c]:
f(x) = a0 + SUM ( an cos(n PI x/c) + bn sin(n PI x/c) ) an = 1/c INTEGRAL[a,a+2c]( f(x) cos(n PI x/c) dx ) bn = 1/c INTEGRAL[a,a+2c]( f(x) sin(n PI x/c) dx )
The use of sine and cosine functions is particularly interesting,
since each
term (or pair of terms) represents a single
frequency oscillation with a particular
amplitude and
phase.
So to the extent that we can represent an amplitude waveform as a
series of sine and cosine functions, we thus describe the frequency
spectrum associated with that waveform.
This frequency spectrum describes the frequencies which must be
handled by a
circuit to reproduce the original waveform.
This illuminating computation is called a
Fourier transform.
In a cryptographic context, one of the interesting ideas of the Fourier transform is that it represents a thorough mixing of each input value to every output value in an efficient way. On the other hand, using the actual FFT itself is probably impractical for several reasons:
The basic idea of efficiently combining each value with every
other value is generalized in cryptography as
Balanced Block Mixing.
BBM structures can be applied in FFT-like patterns, and can
support a wide range of
keyed,
non-expanding,
nonlinear, and yet reversible
transformations.
Typically, audio frequencies range from 20Hz to 20kHz, although many designs try to be flat out to 100kHz. Video baseband frequencies range from DC up to something like 3MHz or 5 MHz. Common names for radio frequency (RF) ranges with generally similar properties are:
ELF 3Hz..300Hz naval strategic commands VF 300Hz..3kHz telephone voice VLF 3kHz..30kHz radionavigation, military LF 30kHz..300kHz time/freq std (WWVB 60kHz) MF 300kHz..3MHz AM broadcast band HF 3MHz..30MHz shortwave, ham bands, CB VHF 30MHz..300MHz FM broadcast band, TV UHF 300MHz..3GHz TV, cell phones, satellite SHF 3GHz..30GHz satellite EHF 30GHz..300GHz
Geffe, P. 1973. How to protect data with ciphers that are really hard to break. Electronics. January 4.99-101.
For conventional stream ciphers, which are basically just an RNG and exclusive-OR, the name of the game is to find a strong RNG. Most RNG designs are essentially linear and easily broken with just a small amount of the produced sequence. The Geffe combiner was an attempt to combine two RNG's and produce a stronger sequence than each. But that was not to be.
Also see: "The Story of Combiner Correlation: A Literature Survey"
(locally, or @:
http://www.ciphersbyritter.com/RES/COMBCORR.HTM#Geffe73)
and "MacLaren-Marsaglia Again"
(locally, or @:
http://www.ciphersbyritter.com/NEWS5/MACLAR.HTM).
Typically a cylindrical tube with an outer conductive shell (the cathode), a wire (the anode) in the center, and filled with a gas like argon at low pressure. Depending on the tube involved, a positive bias of perhaps 500 or 600 volts above the cathode is applied to the anode through a resistance of perhaps 1 to 10 Megohms. We would expect both tube temperature and applied voltage to affect detection sensitivity to some extent.
When an ionizing event like a gamma ray interacts with an atom of the internal gas, a fast electron may be ejected from the shell of the atom. As the ejected electron encounters other atoms, it may cause other electrons to be ejected, producing a cascade or "avalanche" of gas ions along their paths. If the full distance between cathode and anode becomes ionized, a strong current pulse or arc will occur. Presumably, many weaker or wrongly positioned events occur which do not form an arc and are not sensed. However, even unsensed events may have short-term and localized effects on sensitivity that may be hard to quantify.
After the initial pulse (which discharges the interelectrode capacitance), current for the arc flows through the anode resistance, which should cause the applied voltage to drop below the level which would sustain an arc. After the arc ends, an internal trace gas, such as an alcohol, may help to "quench" the ionization which could cause the same arc to reoccur. During the avalanche and quench period, the tube cannot detect new events. Meanwhile, the anode voltage climbs back toward the operational level (charging the interelectrode capacitance) until another sufficiently-strong and properly-placed ionizing event occurs.
Often cited in
cryptography as the ultimate
really random generator.
Typically we have mod 2 polynomials with results reduced "modulo" an irreducible or generator polynomial of degree n. This is analogous to creating a field from the integers modulo some prime p. Unfortunately, a block size of n bits would imply an order of 2n, which is not prime. But we can get the block size we want using mod 2 polynomials.
For example, consider GF(24) with the generator polynomial x4 + x + 1, or 10011, which is a degree-4 irreducible. First we multiply two elements as usual:
1 0 1 1
* 1 1 0 0
----------
0
0
1 0 1 1
1 0 1 1
---------------
1 1 1 0 1 0 0
Then we "reduce" the result modulo the generator polynomial:
1 1 0
----------------
1 0 0 1 1 ) 1 1 1 0 1 0 0
1 0 0 1 1
---------
1 1 1 0 0
1 0 0 1 1
---------
1 1 1 1 0
1 0 0 1 1
---------
1 1 0 1
=========
So, if I did the arithmetic right, the result is the remainder, 1101. I refer to this as arithmetic "mod 2, mod p".
An irreducible is sufficient
to form a finite field. However, some special irreducibles are
also primitive, and these
create "maximal length" sequences in LFSR's.
Goodness-of-fit tests can at best tell us whether one
distribution is or is not the same as the other,
and they say even that only with some probability. It is
important to be very careful about experiment design, so that,
almost always, "nothing unusual found" is the goal we seek. When
we can match distributions, we are obviously able to state exactly
what the experimental distribution should be and is. But there
are many ways in which distributions can differ, and simply
finding a difference is not evidence of a specific effect.
(See null hypothesis.)
Dec Binary Gray
0 000 000
1 001 001
2 010 011
3 011 010
4 100 110
5 101 111
6 110 101
7 111 100
<FONT FACE = "Symbol">A</FONT> displays A and<FONT FACE = "Symbol">a</FONT> displays a ;<FONT FACE = "Symbol">G</FONT> displays G and<FONT FACE = "Symbol">g</FONT> displays g .
Alpha A A a a Beta B B b b Gamma G G g g Delta D D d d Epsilon E E e e Zeta Z Z z z Eta H H h h Theta Q Q q q Iota I I i i Kappa K K k k Lamda L L l l Mu M M m m Nu N N n n Xi X X x x Omicron O O o o Pi P P p p Rho R R r r Sigma S S s s Tau T T t t Upsilon U U u u Phi J J j j Chi C C c c Psi Y Y y y Omega W W w w
Perhaps the earliest common use of a ground was in the electric telegraph, which came into use around the time of the U.S. Civil War. A battery voltage was switched onto a common wire using a telegraph key, and electromagnets up and down the wire responded by making a click. When the key was released, the electromagnets would release and make a clack, the time between click and clack being the dot or dash of Morse code. But for current to flow in the circuit, there had to be a return path. One way to do that was to string two wires for each circuit. However, it was found that a metal surface in the earth, such as a rod driven into the ground, can contact, within a few ohms of resistance, the same reference as used by everybody else. So, especially for small signals, the return path can be through the actual dirt itself, thus saving a lot of copper and making a system economically more viable.
The original concept of radio was to launch and collect signals from the air, as referenced to the common ground. What actually happens is the propagation of an electromagnetic wave, which can be detected without a common reference. But a ground can play an important role in an antenna system, especially at lower RF frequencies.
In the past, the usual ground reference was the copper cold water pipe which extended in the earth from the home to the city water main. In many homes, this reference was carried throughout the home on substantial copper pipe with soldered connections. Unfortunately, the introduction of nonconductive plastic water pipe, while convenient and cheap, also has eliminated an easy ground reference.
Power distribution, with a massive appetite for copper, is a natural application for one-wire connections, but in this case there are surprising and dangerous complexities. Nowadays, at the AC socket, we have both a protective ground wire which connects directly to some ground, and also a return power path, which is connected to ground at some point.
Ideally, the metal chassis or case of anything connected to the power lines should connect to the protective ground. Ideally, if protected equipment shorts out and connects live power to the case, that will blow the equipment fuse or even a power box circuit breaker, instead of electrocuting the operator. Even more ideally, a ground fault interrupter (GFI) can detect even a small amount of protective current flow and open an internal breaker. However, the protective ground system itself is generally tested at most once (upon installation), and if it goes bad under load, we will not know until bad things happen. While GFI's do have a "test" button, most ordinary equipment does not.
As different amounts of AC current flow about the home or
building, wire resistance causes the voltage between the two AC
socket grounds to vary, which is the origin of a
ground loop.
But ground loops are not limited to power circuits, and can
present serious problems in instrumentation and audio systems.
The simple ground model would have us believe that there is no resistance in the ground, which is of course false. Even sending a small signal from one amplifier to another on an unbalanced line implies that some current will flow on that line, and that same current also flows through the ground connections (often, a "shield" conductor). Thus, the voltage across different parts of ground will vary dynamically depending on the ground resistance, which can cause a cross-coupling between independent unbalanced channels. Even if that coupling is tiny, when working with tiny signals, it may matter anyway, especially in a TEMPEST context, or when working with signals in a receiver.
The ground loop problem is inherent in unbalanced signal lines. The same effects occur inside circuitry, but then the problem is under the control of a single designer or manufacturer; most problems occur when interconnecting different units. In general, ground loop cross-coupling effects are minimized by reducing ground resistance and increasing input or load resistance. Alternately, broadcast audio systems use balanced line interconnections that do not need ground as a part of the signal path. Balanced lines also tend to "cancel out" common-mode noise picked up by cables on long runs.
Consumer equipment generally uses unbalanced lines where the signal is referenced to some ground, typically on "RCA connectors." But because of ground loops, different equipment can have different references, thus introducing power line hum into the signal path.
Various responses are possible, but the one which is not possible is to open or disconnect the safety ground. The safety ground is there to protect life and should never be subverted. 3-prong to 2-prong AC adapters should never be used when equipment has 3-wire plugs. Because sound systems are interconnected, a system isolated from safety ground allows a failure on even one remote piece of equipment to electrify the entire system, and that is breathtakingly dangerous. Better alternatives always exist.
The pervasive nature of ground loops is a good reason to use
isolated balanced lines.
It is also a reason to use optical digital interconnections, which
inherently isolate the ground references in different pieces of
equipment.
In a group consisting of set G and closed operation * :
The integers under addition (Z,+) form a group, as do the reals (R,+). A set with a closed operation which is just associative is a semigroup. A set with a closed operation which is both associative and has an identity is a monoid. A ring has a second dyadic operation which is distributive over the first operation. A field is a ring where the second operation forms an abelian group.
Used throughout cryptography, but particularly related to Pure Cipher.
Fundamental distinctions exist between hardware and software:
For error detection, a hash of message data will produce a particular hash value, which then can be included in the message before it is sent (or stored or enciphered). When the data are received (or read or deciphered), the message is hashed again, and the result should match the included value. If the hash is different, something has changed, and the usual solution is to request the data be sent again. But the hash value is typically much smaller than the data, so there must be "many" different data sets which will produce that same value, which is called hash "collision." Because of this, "error detection" inherently cannot detect all possible errors, and this is independent of any linearity in the hash computation.
An excellent example of a hash function is a CRC operation. CRC is a linear function without cryptographic strength, but does have a strong mathematical basis which is lacking in ad hoc methods. Strength is not needed when keys are processed into the state or seed used in a random number generator, because if either the key or the state becomes known, the keyed cipher has been broken already. Strength is also not needed when a hash is used to accumulate uncertainty in data from a really random generator, since the hash construction cannot expose unknowable randomness anyway.
In contrast, a cryptographic hash function such as that used for authentication must be "strong." That is, it must be "computationally infeasible" to find two input values which produce the same hash result. Otherwise, an opponent could produce a different message which hashes to the correct authentication value. In general, this means that a cryptographic hash function should be nonlinear overall and the hash state or result should be 256 bits or more in size (to prevent birthday attacks).
Sometimes a cryptographic hash function is described in the literature as being "collision free," which is a misnomer. A collision occurs when two different texts produce exactly the same hash result. Given enough texts, collisions will of course occur, precisely because any fixed-size result has only so many possible code values. The intent for a cryptographic hash is that collisions be hard to find (which implies a large internal state), and that particular hash values be impossible to create at will (which implies some sort of nonlinear construction).
A special cryptographic hash is not needed to assure that hash results do not expose the original data: When the amount of information hashed is substantially larger than the internal state or the amount of state ultimately exposed, many different data sequences will all produce the exact same hash result (again, "collision"). The inability to distinguish between the data sequences and so select "the" original is what makes a hash one way. This applies to all "reasonable" hash constructions independent of whether they are "cryptographic" or not. In fact, we can better guarantee the collision distributions when we have a relatively simple linear hash than if we must somehow analyze a complex ad hoc cryptographic hash.
On the other hand, when less information is hashed than the amount of revealed state, the hashing may be reversible, even if the hash is "cryptographic." And, again, that is independent of the strength of the hash transformation.
Currently, almost all of cryptography is based on complex but deterministic (and, thus, at least potentially solvable) operations like ciphers and hashes. Because of the occasionally disastrous effectiveness of cryptanalysis, every cipher system has need of at least a few absolutely unpredictable values, which can be described as really random. Really random values have various uses, including message keys and protocol nonces. Generally, such values are obtained by attempting to detect or sample some molecular or atomic process, such as electrical noise.
For most cryptographic use, values should occur in a uniform distribution, so that no value will be predictable (by an opponent) any more than any other value. Unfortunately, few measurable molecular or atomic processes have a uniform distribution. As a consequence, some deterministic processing must be applied to somehow "flatten" the non-uniform distribution.
In statistics, and with real number values, it is common to simply compute an inverse and multiply. Unfortunately, that depends upon knowing the original distribution very well, but in practice the sampled distributions from quantum levels are not ideal and do vary.
No simple, fixed, integer value transformation can compensate for a distribution bias where some values appear more often than they should. Bias is a property of a set of values, not individual items, so treating individual values similarly seems unlikely to correct the problem. On the other hand, a transformation of multiple values, like block ciphering, can go a long way. Because a cipher block generally holds 64 bits or more worth of sample values, we might never see two identical plaintext blocks, and thus never produce a bias in the ciphertext. With a block cipher, any bias in the sample values tends to be hidden by the multiple values in a block, although at substantial expense.
Perhaps the most common way to flatten a distribution is to hash multiple sample values into a result for use. Using a CRC hash as an example, we can model a CRC operation as something like a large, fast modulo. Now, when the CRC is initialized to a fixed value, a particular input sequence always produces the same result, just like any other deterministic operation, including a cryptographic hash. So when inputs repeat, results repeat, and that carries the bias from the input to the output, even if only a subset of result bits are used. The worst possible situation would be to "hash" each sample value independently into a smaller result, since then the most frequent sample values would transfer the bias directly into the results. Normally, though, if we hash enough sample values at the same time, we expect the input sequence to "never" repeat, so the results should be almost completely corrected. Thus, the issue is not just having more input than output, but also having enough input so that any particular input string will "never" recur.
An improvement is to initialize the CRC to a random starting state before each hash operation. Because of the random initialization, any remaining bias (as in particularly frequent or infrequent values) will be distributed among all possible output values. When using a CRC for hashing, a separate random value is not required, since a random value is already in the CRC state as a result of the previous hash. Thus, what is required is simply to not initialize the CRC to a fixed value before each hash operation. For other hashes, the previous result could be hashed before new sample values.
To assure that the hash is not reversible, the hash operation
must be overloaded; that is, at least twice as much information
must be hashed as the size of the hash result or the amount exposed.
And when bias must be corrected, a factor of 2.5 or more may be
a better minimum.
Reasonable choices might include a 16-bit CRC with 40 bits of
normally-distributed input data, or a 32-bit CRC with 80 bits
of input.
In most sciences, the main point of a mathematical model is to predict reality, and experimentation is how we know reality. Experimentation thus sets the values that the mathematical model must reproduce. When there is a real difference between experiment and model (both being competently evaluated), the model is wrong.
In general, experimentation cannot know every possible parameter, or try every possible value, and so cannot assure us that something never happens, or that every possibility has been checked. That sort of thing typically requires a proof, but such proof is always based on the assumption that the mathematical model is sufficient and correct. Because experimentation often collects all measurable data, it is generally better than proof at finding unexpected happenings or relationships.
In cryptography, ciphers are basically approved for use by experiments which find that various attacks do not succeed. Absent various assumptions (such as: no other attacks are possible, and every approach has been fully investigated) that does not even begin to approach what we would consider actual proof of strength. Nevertheless, those results apparently are sufficient for the field of cryptography to place real users and real data at risk.
Since experimentation is the basis for all real use of
cryptography, it does seem odd that experimentation is often scorned in
mathematical cryptography.
Each hex value represents exactly four
bits, which can be particularly convenient.
Also see:
binary,
octal, and
decimal.
For group G with operation #, and group H with operation %; for mapping @ from group G to group H; given a, b in G: The result of the group G operation on a and b, when mapped into group H, must be the same as first mapping a and b into H, and then performing the group H operation:
@(a # b) = @a % @bA homomorphic mapping (the map @ from group G into group H) need not be one-to-one.
Given a homomorphism from group G into group H and mapping @ from G to H:
Also see
automorphism and
isomorphism.
A form of
homophonic substitution is available
in a large
block cipher, where a homophonic
selection field is enciphered along with the plaintext (also see
block coding,
balanced block mixing and
huge block cipher
advantages).
Any of the possible values for that field naturally will produce
a unique ciphertext. After deciphering any of those
ciphertexts, the homophonic selection field could be deleted,
and the exact same plaintext recovered.
When the homophonic selection is
really random, it adds a
non-deterministic aspect to the cipher.
Note that the ability to produce a multitude of different
encipherings for exactly the same data is related to the
concept of a
key, especially
dynamic keying, and the use of a
salt in
hashing. Also see
the "The Homophonic Block Cipher Construction" conversation
(locally, or @:
http://www.ciphersbyritter.com/NEWS3/HOMOPHON.HTM).
1xx Information 4xx Browser request error
100 Continue 400 Bad request
101 Ack protocol change 401 Unauthorized
2xx Success 402 Payment required
200 OK 403 Forbidden
201 Resource created 404 Not found
202 Request accepted 405 Method not allowed
203 OK, non-authoritative 406 Not acceptable
204 Empty 407 Proxy authentication required
205 Content reset 408 Request timed out
206 Partial content 409 Conflict with state of resource
3xx Redirection 410 Resource no longer available
300 Multiple locations 411 Length value required
301 Moved permanently 412 Precondition failed
302 Found in <location> 413 Requested file too large
303 Use <location> 414 Requested address too long
304 Not modified 415 Requested data on unsupported media
305 Must use <proxy> 416 Requested range not satisfiable
306 Unused 417 Expectation failed
307 Temp <location> 5xx Server error
500 Internal error
501 Not implemented
502 Bad gateway
503 Service unavailable
504 Gateway timeout
505 HTTP version not supported
Various advantages can accrue from huge blocks (although not all simultaneously):
When we have a huge block of plaintext, we may expect it to contain enough (at least, say, 64 bits) uniqueness or entropy to prevent a codebook attack, which is the ECB weakness. In that case, ECB mode has the advantage of supporting the independent ciphering of each block. This, in turn, supports various things, such as the use of multiple ciphering hardware operating in parallel for higher speeds.
As another example, modern packet-switching network technologies often deliver raw packets out of order. The packets will be re-ordered eventually, but we cannot start deciphering until we have a full block in the correct order. But we might avoid delay if blocks are ciphered independently.
A similar issue can make per-block authentication very useful. Typically, authentication requires a scan of plaintext, and then some structure to transport the authentication value with the ciphertext, much like a common error detecting code. The problem is that all the data which are to be authenticated in one shot, which often means the whole deciphered file, must be buffered until an authentication result is reached. We certainly cannot use the data until it is authenticated. So we have to buffer all that data as we get it, but cannot use any of it until we get it all and find that it checks out. We can avoid that overhead and latency by using per-block authentication.
To implement per-block authentication, we use a keyed cryptographic RNG which produces a keyed sequence of values. Both ends produce the same keyed sequence by using the same key. We place a different random value in each block sent, and then compare that to the result as each block is received. This is very much like a per-block version of a Message Key.
Normally, in software, the more computation there is, the longer it takes, but that is not necessarily true in hardware, if we are willing to build or buy more hardware. In particular, we can pipeline hardware computations so that, once we fill the pipeline (a modest latency), we get a full block result on every computation period (e.g., on every clock pulse). Thus, huge blocks can be much faster in hardware than software: the larger the block, the larger the data rate, for any given hardware implementation technology.
Also see All or Nothing Transform and Balanced Block Mixing.
Also see:
In both logic and science, some hypotheses are better than others. In logic, hypotheses are the same if they have the same formal structure. But in science, hypotheses with the exact same structure may or may not be appropriate in particular contexts. A scientific hypothesis must be:
Hypotheses structured so that we can only develop evidence for the question by investigating an essentially unlimited number of possibilities are untestable. Hypotheses in which no experiment of any kind can disprove the question are unfalsifiable, and are best seen as mere beliefs. Hypotheses which apply, say, to all matter, are unprovable absent testing on all matter, but any one (repeatable) experiment could disprove the question. Most scientific hypotheses are structured so that they can be proven false or confirmed by experiment, but cannot ever be proven true.
Many scientific questions are formally unproven in the sense that they address the operation of all matter across time, only a tiny subset of which can be sampled experimentally. But most scientific issues also admit quantifiable experimentation which makes it possible to bound the interpretation of reality. Quantifiable experimentation makes it possible to compare different trials of the same experiment and see if things work about the same each time, with each material, and in each place. Of course, getting precisely similar values in each case may just be a coincidence, which is why it is not proof. But each trial does add to a growing mass of evidence for an overall similarity which, while not proof, does provide both statistical and factual support.
The usual logic of scientific experimentation is not available in cryptography, in that cryptography has no general, quantifiable test of strength. We only know the strength of a cipher when an actual attack is found (and then only know the strength under that particular attack); until then we know nothing at all about strength. Until an attack is found, there is no experimental strength value, so cryptanalytic experiments cannot develop bounds on strength. Nor are factual and comparable values developed. As a result, cryptographic proof appears to require what Science knows cannot be done: a testing of every possibility simply to show that none of them work.
For example, in cryptography, we may wish to assert that a certain cipher is strong, or unbreakable by any means. (We can insert "practical" with little effect.) A cipher is unbreakable when no possible attack can break it. So to prove that, we apparently first must identify every possible attack, and then check each to see if any succeed on the cipher under test. But not only do we not know every possible attack, it seems unlikely that we could know every possible attack, or even how many there are. Thus, the assertion of strength seems unprovable.
Even if we could know every possible attack, we still have problems: Since attacks are classified as approaches (rather than algorithms), it seems necessary to phrase each in ways guaranteed to cover every possible use. Yet it seems unlikely that we could be guaranteed to know every possible ramification of even one approach. Without comprehensive algorithms, it is hard to see how we could provably know that any particular approach could not work. So again the hypothesis of cipher strength seems unprovable and unhelpful.
On the other hand, for each well-defined attack algorithm, we probably can decide whether that would break any particular cipher. So it is at least conceivable that we can have proof of strength against particular explicit attack algorithms. Unfortunately, in most cases, attack approaches must be modified for each individual cipher before an appropriate algorithm is available, and failure might just indicate a poorly-adapted approach. So algorithmic tests are not particularly helpful.
Similar issues occur with respect to random bit sequences: A sequence is random if no possible technique can extrapolate future bits from all past ones, succeeding other than half the time. Again, we do not and probably can not know every possible extrapolation technique, and as the set of "past" bits grows, so do the number of possible techniques. Not knowing each possible technique, we certainly cannot check each one, making the hypothesis of sequence randomness seemingly unprovable.
Fortunately, we do have some defined algorithms for statistical randomness tests. Thus, what we can say is that a particular test has found no pattern, and that test can be repeated by various workers on various parts of the sequence for confirmation. What that does not do is build evidence for results from other tests, and the number of such tests is probably unbounded. So, again, the hypothesis of randomness seems unprovable.
Even if we could develop cryptographic proof to the same level as natural law, that probably would not be useful. People already believe ciphers are strong, even without supporting evidence. Belief with supporting evidence would, of course, be far preferable. But what is really needed is not more belief, but actual affirmative proof of strength. And that is beyond what science can provide even for ordinary natural law.
Why is lack of absolute proof acceptable in science and not in cryptography? In science, the issue is not normally whether an effect exists, but instead the exact nature of that effect. When an apple falls, we see gravity in action, we know it exists, and then the argument is how it works in detail. Even when science pursues an unknown effect, it does so numerically in the context of experiments which measure the property in question.
When a cipher operates, all we see is the operation of a computing mechanism; we cannot see secrecy or security to know if they even exist, let alone know how much we have. Security cannot be measured because the appropriate context is our opponents, and they are not talking. While we may know that we could not penetrate security, that is completely irrelevant unless we know that the same applies to our opponents.
Does all this mean cryptography is hopeless? Well, absolute proof of absolute security seems unlikely. But we can seek to manage the risk of failure, particularly because we cannot know how large that risk actually is. For example, airplanes are designed with layers of redundancy specifically to avoid catastrophic results from single subsystem failures (see single point of failure). In cryptography, we could seek to not allow the failure of any single cipher to breach security. It would seem that we could approach that by multiple encryption and by dynamically selecting ciphers, which are parts of the Shannon Algebra of Secrecy Systems.
The disturbing aspect of the IDEA design is the extensive use
of almost
linear operations, and no nonlinear tables
at all. While technically nonlinear, the internal operations
seem like they might well be linear enough to be attacked.
"With a finite key size, the equivocation of key and message generally approaches zero, but not necessarily so. In fact, it is possible for . . .HE(K) andHE(M) to not approach zero as N approaches infinity.""An example is a simple substitution on an artificial language in which all letters are equiprobable and successive letters independently chosen."
"To approximate the ideal equivocation, one may first operate on the message with a transducer which removes all redundancies. After this, almost any simple ciphering system
-- substitution, transposition, Vigenere, etc., is satisfactory."
-- Shannon, C. E. 1949. Communication Theory of Secrecy Systems. Bell System Technical Journal.28:656-715.
There are various examples:
Also see:
pure cipher,
Perfect Secrecy,
unicity distance and
balance.
Consider what it would mean to make the load match the impedance of a generator: As we decrease the load impedance toward the generator impedance, more current will flow into the load and more power will be transferred. But as we deliver more power, more power is also dissipated in the generator, and thus simply lost to heat. We deliver the maximum possible power to a load when the load has the same impedance as the generator, but then we lose as much power in the generator as we manage to transfer. An efficiency of 50 percent is generally a bad idea for any signal, low-level, high-level or power.
As the load impedance is decreased below the generator impedance, now less power is delivered to the load, and even more power is dissipated as heat in the generator. The limit is what happens when a power amplifier output is shorted. In practice, adding speakers in parallel to an amplifier output can so reduce the load impedance to cause the amplifier to overheat or self-protect or fail.
In most audio work, there is little desire to "match" impedances. Normally, signals are produced by low-impedance sources (e.g., preamplifier outputs) for connection to high-impedance inputs (e.g., amplifier inputs). When transformers are used, they often create balanced lines, receive balanced signals, and provide ground loop isolation.
Impedance matching tends to be of the most concern for electro-mechanical devices. The fidelity of sensitive mechanical sensors like phonograph cartridges and professional microphones can be affected by their loads. For best performance, it is important to present the correct load impedance for each device, which is a serious impedance matching requirement. However, nowadays this is often accomplished trivially with an appropriate load resistor across a high-impedance input to an amplifier or preamp. In contrast, loudspeakers, which are also electro-mechanical, are almost universally "voltage driven" devices. Speakers are specifically designed to be driven from an source having an impedance much lower than their own.
One old application somewhat like matching is the classic "input transformer": These devices take a low-level low-impedance signal to a larger signal (typically ten times the original, or more), but then necessarily at a much, much higher impedance. When used with an amplifier that has a high input impedance anyway, an input transformer can deliver a greater signal, without the noise of a low-level amplification stage. However, bipolar transistor input stages generally want to see a low impedance source for best noise performance, so the advantage seems limited to somewhat noisier FET and tube input circuits.
Good input transformers, with a wide and flat frequency response, are very expensive and can be surprisingly sensitive to nearby AC magnetic fields. Although a thin metal shield is sufficient to protect against RF fields, sheet metal steel will not diminish low-frequency magnetic fields much at all. Mu metal shields may help, although distance is the usual solution.
In power transformers, losing as much power to heat as we deliver would be absolutely ridiculous. We may transform AC power to get the voltage we need, but we do not "match" the equipment load to the impedance of the AC line.
In radio frequency (RF) work, impedance matching is needed
to properly use coaxial cables.
Using source and load impedances appropriate for the coax minimizes
"standing waves" on that coax.
Standing waves increase current at voltage minima and thus increase
signal loss, even for low-level signals.
Standing waves also can cause voltage breakdown at voltage maxima,
which may include transmitter tuning capacitors or output transistors.
And almost any passive filter will require a known load impedance.
If we know the inductance L in Henrys and the frequency f in Hertz, the inductive reactance XL in Ohms is:
XL = 2 Pi f L Pi = 3.14159...Separate inductors in series are additive. However, turns on the same core increase inductance as the square of the total turns. Two separate inductors in parallel have a total inductance which is the product of the inductances divided by their sum.
An inductor is typically a coil or multiple turns of conductor wound on a magnetic or ferrous core, or even just a few turns of wire in air. Even a short, straight wire has inductance. Current in the conductor creates a magnetic field, thus "storing" charge. When power is removed, the magnetic field collapses to maintain the current flow; this can produce high voltages, as in automobile spark coils.
An inductor is one "winding" of a transformer.
The simple physical model of a component which is a simple inductance and nothing else works well at low frequencies and moderate impedances. But at RF frequencies and modern digital rates, there is no "pure" inductance. Instead, each inductor has a series resistance and parallel capacitance that may well affect the larger circuit.
Also see
capacitor and
resistor.
Many proofs are informal. Because they cannot be mechanically
verified, these proofs may have unseen problems, and often do
develop and change over time. See:
Method of Proof and Refutations.
As a
rule of thumb, a cubic centimeter (cc)
of a solid has about 1024 or 1E24 atoms.
A good insulator like
quartz has only about
10 free electrons per cc., which implies that only
about one atom in 1023 (1E23) has a broken bond
(at room temperature and modest
voltage).
This gives a massive
resistance to
current flow of about
1018 (1E18) ohms across a centimeter cube.
The integers are
closed under addition and
multiplication, but not under division.
(Z,+,*) is thus an infinite
ring, but not a
field.
However, a
finite field, denoted
Z/p, can be created by doing operations
modulo a
prime.
In the United States, the basis of intellectual property law is the Constitution, in Article 1, Section 8 (Powers of Congress):
"Congress shall have power . . . To promote the Progress of Science and useful Arts, by securing for limited Times to Authors and Inventors the exclusive Right to their respective Writings and Discoveries."
Intellectual property generally includes:
In the U.S., there are three types of patent:
In some realizations, an intermediate
block might be wired connections between layer
hardware. In the context of a general
purpose
computer, an intermediate block might
represent the movement of data between operations, or perhaps
transient storage in the original block.
+----------+ +----------+ | | INTO | Y | | X | | +----+ | | | f | |f(X)| | | | ---> | +----+ | +----------+ +----------+
g( f(x) ) = x = f-1( f(x) ).Only functions which are one-to-one can have an inverse.
(Contrast with: bijection.)f( f(x) ) = x.
A cipher which takes
plaintext to
ciphertext, and
ciphertext back to plaintext, using the exact same operation.
A polynomial form of the ever-popular "Sieve of Eratosthenes" can be used to build table of irreducibles through degree 16. That table can then be used to check any potential irreducible through degree 32. While slow, this can be a simple, clear validation of other techniques.
Also see
primitive polynomial.
Usually the main purpose of an IV is to randomize or whiten the plaintext data. As a result, plaintext (and, thus, ciphertext) repetition is made less likely, thus greatly reducing exposure to codebook attack.
Generally, an IV must be accompany the ciphertext, and so always expands the ciphertext by the size of the IV.
While it is often said that IV values need only be random-like or unpredictable, and need not be confidential, in the case of CBC mode, that advice can lead to man-in-the-middle attacks on the first plaintext block. If a MITM opponent knows the usual content of the first block, they can change the IV to manipulate that block (and only that block) to deliver a different address, or different dollar amounts, or different commands, or whatever. And while the conventional advice is to use a MAC at a higher level to detect changed plaintext, that is not always desirable or properly executed. But the CBC first-block problem is easily solved at the CBC level simply by enciphering the IV and otherwise keeping it confidential, and that can be reasonable even when a MAC will be used later.
Sometimes, iterative or repeated ciphering under different IV values can provide sufficient added keying to perform the message key function (e.g., the "iterative stream cipher" in a cipher taxonomy).
Often discussed with respect to oscillator signals. Oscillator jitter is commonly due to the small amounts of analog noise inherent in the physics of electronic circuitry, which thus affects the analog-to-digital conversion which indicates the start of each new period. This is unpredictable variation, but generally very tiny, bipolar around some mean frequency, and varies on a cycle-by-cycle basis. It cannot be accumulated over many cycles for easier sensing.
A different form of jitter occurs when a digital system uses
two or more independent oscillators or
clocks
which are not
synchronized.
In this case, one signal may slide
early or late with respect to the other, until an entire
cycle is lost or skipped.
But all this will be largely
deterministic,
based on the frequencies and phases of the different clocks.
To a large extent, this is something like two brass gears
rolling together, with a particular tooth on the smaller gear
appearing a predictable number of teeth later on the larger
gear.
The name "jitterizer" was established in section 5.5 of my 1991
Cryptologia article: "The Efficient Generation of
Cryptographic Confusion Sequences"
(locally, or @:
http://www.ciphersbyritter.com/ARTS/CRNG2ART.HTM#Sect5.5)
and is taken from the use of
an oscilloscope on digital circuits, where a signal which is not
"in sync" is said to
jitter.
Mechanisms designed to restore
synchronization are called
"synchronizers," so mechanisms designed to cause jitter
can legitimately be called "jitterizers."
Kerckhoffs, Auguste. 1883. La cryptographie militaire. Journal des sciences militaires.Various texts and papers are rather casual about what Kerchkhoffs supposedly wrote. Fortunately, the original article (in the original French) is available on-line for comparison:IX(1):5–38, IX(2):161–191.
Systems which select among an ever-increasing number of ciphers can even make it difficult for an opponent to know the full set of possible ciphers. For the opponents, being forced to find, obtain and analyze a continuing flow of new secret ciphers is vastly more expensive than simply trying another key value in a known cipher. Forcing the opponent to pay (in effort) to acquire each of many cipher designs is not a bad idea. While having many possible ciphers does not guarantee strength, it should increase the cost of attacks and thus potentially change the balance of power between user and attacker.
Kerckhoffs second requirement is also understood to discount secret ciphers, as in security through obscurity. We of course want to use only ciphers we can continue to use securely even when the cipher has been fully exposed. But we certainly can use ciphers that start out secret, even if we understand that eventually they will become exposed.
Note that the issue of secret ciphers is not stated directly by Kerckhoffs, but is instead extrapolated from what he wrote. What Kerckhoffs really says is that cipher exposure should not cause "inconvenience." But to the extent that "inconvenience" is an issue, various other ramifications appear that the crypto texts studiously ignore:
In cryptography we have various kinds of keys, including a User Key (the key which a user actually remembers), which may be the same as an Alias Key (the key for an alias file which relates correspondent names with their individual keys). We may also have an Individual Key (the key actually used for a particular correspondent); a Message Key (normally a random value which differs for each and every message); a Running Key (the confusion sequence in a stream cipher, normally produced by a random number generator); and perhaps other forms of key as well (also see key management).
In general, the value of a cryptographic key is used to
initialize the
state of a
cryptographic mechanism
such as some form of
RNG.
The RNG then may be used to create a sequence which eventually
becomes a
running key for a
stream cipher, or perhaps for a
Dynamic Transposition
block cipher.
Alternately, the RNG may be used to
shuffle and, thus, key,
Ideally, a key will be an arbitrary equiprobable selection among a huge number of possibilities (also see balance). This is the fundamental strength of cryptography, the "needle in a haystack" of false possibilities. But if a key is in some way not a random selection from a uniform distribution, but is instead biased, the most-likely keys can be examined first, thus reducing the complexity of the search and the effective keyspace.
In most cases, a key will exhibit diffusion across the message; that is, changing even one bit of a key should change every bit in the message with probability 0.5. A key with lesser diffusion may succumb to some sort of divide and conquer attack.
For practical security, it is not sufficient to simply have a large keyspace, it is also necessary to use that keyspace. Because changing keys can be difficult, there is often great temptation to assign a single key and then use that key forever. But if that key is exposed, not only are the current messages revealed, but also all other messages both past and future, and this is true for both public-key and secret-key ciphers. Using only one key makes that key as valuable as all the information it protects, and it is probably impossible to secure any key that well, especially if it is frequently used. Humans make mistakes, people change jobs and loyalties, and employees can be intimidated, tempted or blackmailed.
It is important to change keys periodically, thus decisively ending any previous exposure. Secret-key systems can make this fairly invisible by keeping an encrypted alias file and automatically translating a name or channel identifier into the current key. This supports the easy use of many secret keys, and the invisible update of those keys. Then only the keyphrase for the alias file itself need be rememebered, and that keyphrase could be changed at will. Old keys could be removed from the alias file periodically to reduce the amount of information exposed by that file, and so minimize the consequences of exposure. See: key reuse.
Even with support from the cipher system, key file maintenance is always serious and potentially complex. The deciphered alias file itself should never appear either on-screen or as a printout; it should be edited automatically or indirectly. Thus, some security officer probably needs to be in charge of updating and archiving the alias files.
For some unknown reason, some authors claim that secret key ciphers are essentially impractical. That of course flies in the face of many decades of extensive actual use of secret-key ciphers in the military. The claim seems to be that secret-key ciphers require vastly more keys to be set up and managed than public key ciphers. There is an argument there, as we shall see, but it is not a good one.
Public-key ciphering generally requires an entity beyond the ciphering parties simply to function. This is a public key infrastructure (PKI), of which the main element is a certification authority (CA). The CA distributes authenticated keys for use, and so must be set up and supported and protected as long as new keys or even mere key authentication is needed. But even with a CA, public-key misuse can lead to undetectable man-in-the-middle attacks (MITM), where the opponent reads the messages without having to break any cipher at all. With a secret key cipher at least the opponent has to actually break a cipher, which is thought to be hard.
Users can always give secret information to others. It is not the cryptography which allows exposure, since either end can give a copy to someone else no matter how the original was sent. The role of cryptography is limited to providing protected communication (or storage), and cannot prevent exposure by either user. Sharing secret information with someone else inherently implies a certain degree of trust.
In a secret-key cipher, a user at each end has exactly the same key. If only two users have a key, and one user receives a message which that key deciphers, the message can only have been sent by the other user who has that key. But there are various issues:
The remaining issue seems to be that, if everybody has to talk privately and independently of everyone else, then everybody needs a different key for everybody else. Public-key systems seem to make that easier by allowing the senders to share the public key to a single user. Ideally, fewer keys need be created. But actually getting public keys is only "easy" after a CA is established, funded, operating, and even then only if we can live with trusting a CA. Similar structures could be built to easily distribute secret keys, and may be particularly appropriate in a distributed, hierarchical business situation.
In practice, we do not need to talk to everybody else, just a small subset. And many interactions are with representatives from a common group, each of whom has access to the same secret information anyway. The group might need a different key for each "client" user, but everyone in the group could use the key for that client. Business groups might have to handle millions of keys, but public-key technology does not solve the problem, because somebody has to authenticate all those keys. If we hide that function in the CA, then we have to fund and trust the CA.
Suppose we need to communicate, privately and independently, with n people:
One other possibility is a "web of trust." In this structure, people attest to trusting someone who has a key for someone else. But even if we assume that could ever work to cryptographic levels of certainty, validating a key is only half the issue. The other part is whether we can reasonably hope to trust our secret information to someone we do not know. Public-key ciphers do not solve that problem.
An odd characteristic of public-key cryptography is that, normally, if we encipher a message to a particular user, we cannot then decipher that same message. In a business context that may be an auditing problem, since the business can only read and archive incoming messages. Absent a special design, public-key cryptography may make it impossible to document what offer was actually sent.
For those who would never consider using the short keys needed by secret-key ciphers, note that public keys must be much longer than private keys of the same strength, because a valid public key must have a very restricted form that most key values will not have. In practice, a public-key cipher almost certainly will just set up the keys for an internal secret-key cipher which actually protects the data, so the final key size will be small anyway.
Public-key technology is a tool that offers certain advantages, but those are not nearly as one-sided as people used to believe. Secret key cipher systems were functioning well in practice long before the invention of public-key technology.
Also see:
Keys should be changed periodically. But, in a corporate setting it is likely that the corporation will want to be able to review old messages which were encrypted under old keys. That either requires archiving a plaintext version of each message, or archiving the encrypted version, plus the key to decrypt it (see key storage).
Obviously, key archives could be a sensitive, high-value target,
so that keeping keys and messages on different machines may be a
reasonable precaution.
In secret key ciphers, key authentication is inherent in secure key distribution.
In public key ciphers, public keys can be exposed and delivered in unencrypted form. But someone who uses the wrong key may unknowingly have "secure" communications with an opponent, as in a man-in-the-middle attack. It is thus absolutely crucial that public keys be authenticated or certified as a separate process. Normally this implies the need for a Certification Authority (CA).
Also see:
message authentication and
user authentication.
A corporation may seek to limit the ability for users to create
their own new keys, so that corporate authorities can monitor all
business communications.
That of course implies that the corporation takes on the role of
creating and
distributing new keys, and
probably also maintains a
key archive as well as a message archive (see
key storage).
Although this problem is supposedly "solved" by the advent of the public key cipher, in fact, the necessary public key validation is almost as difficult as the original problem. Although public keys can be exposed, they must represent who they claim to represent, or a "spoofer" or man-in-the-middle can operate undetected.
Nor does it make sense to give each individual a separate
secret key, when a related group of people would have access
to the same files anyway. Typically, a particular group has the
same secret key, which will of course be changed when any member
leaves. Typically, each individual would have a secret key for
each group with whom he or she associates.
For public key ciphers, the key to be loaded will be in plaintext form and need not be deciphered. Similarly, a public key database may be unencrypted, since all the public keys are exposed anyway. So adding a public key can be just as simple as adding any other data.
For secret key ciphers, the key to be loaded will have been transported encrypted under some other key. And the user key database will be encrypted under the keyphrase for that particular user. Accordingly, a new key must first be decrypted and then encrypted under the user keyphrase. One problem here is that we want to minimize the amount of time any secret key exists in plaintext form. Of course keys will be in plaintext form during use, in which case we decipher the key only in program memory, and then zero that storage as soon as possible.
It seems desirable to avoid deciphering the entire key database
simply to insert a single new key.
One workable possibility is to add simple structure to the cipher
itself so that cipherings can be concatenated as ciphertext
(as long as they are ciphered under the same key).
Then the new key can be enciphered on its own, and simply
concatenated onto the key storage file.
The more valuable the messages, the more serious the risk from loss of the associated keys. Such loss might occur by equipment failure, accident, or even deliberate user action.
In the business case, key loss can be mitigated by maintaining
corporate
key archives, and distributing key
files to users.
Without such archives (or some alternate way to recover), a single
user equipment failure could result in the loss of critical keys
and business documents, which would virtually guarantee the end of
encryption in that environment.
Key management facilities for users typically include:
Additional key management facilities may be made available only to corporate security officers:
Even absolutely perfect keys cannot solve all problems, nor can they guarantee privacy. Indeed, when cryptography is used for communications, generally at least two people know what is being communicated. So either party could reveal a secret:
When it is substantially less costly to acquire the secret by
means other then a technical
attack on the
cipher, cryptography has
pretty much succeeded in doing what it can do.
Unfortunately, once an attack has been found and implemented as
a computer program, the incremental cost of applying that attack
may be quite small.
Obviously, we may send many messages to a particular recipient. For security reasons we do not want to use the same key for any two messages, and we also do not want to manually establish that many keys. One approach is the two-stage message key, where a random value is used to encrypt the message, and only that random value is encrypted by the main key. In hybrid ciphers, a public key component transports the random message key. Thus, the stored keys are used only to encrypt a relatively small random value or nonce, and so are very well hidden.
One way to create the random nonce would be to
bit-permute a vector of
half-1's and half-0's, by
shuffling twice. That way, even
if the cipher failed and the nonce was exposed, the keyed generator
creating the message keys would be protected (see
Dynamic Transposition) so
that the opponent will have to repeat the previous break, which
may demand both time and luck.
Under my alias file implementation, selecting the right key for use is done by entering the alias nickname for the desired person, contract, project or group. That could be the email address for the appropriate channel; those with multiple email addresses could have the same key listed under multiple aliases. When the email address is used as the alias, the desired email address can be automatically found in the message header, and the correct key automatically used.
Similarly, stored keys should have a start date, and multiple keys for the same channel will be distinguished by that date. By checking the date of the message to be decrypted, the key which was correct as of that date could be automatically selected and used, again making most key selection automatic, even for archived messages.
In practice, it is common to select the wrong key, and then the
message cannot be read. But if the message is just being accumulated
somewhere for later use, we may mistakenly discard the original
ciphertext before making sure we have the deciphered plaintext.
Accordingly, a required feature of a
cipher system is that using the wrong
key be detected and announced.
Presumably this will be done with some sort of
error detecting code such as a
CRC of the plaintext, although some cases may
demand a keyed
MAC.
For public keys, the key database can be unencrypted, and possibly even part of a larger database system.
For secret keys, the database must be encrypted, and should be as simple as possible for security reasons. One possible form is what I call an alias file. Each key is given a textual alias, which then becomes an efficient way to identify and use a particular long random key. Typically, an alias would be a nickname for a person, project, or work group, or an email address. By allowing the user to specify a short name instead of a key, long and random keys can be selected and used efficiently.
The "database" part of this could be as simple as an encrypted list of entries, with each entry having a few simple textual fields such as alias id, key value, and start date. By ordering the entries by start date, the system could search from the front of the list for the first entry matching the alias field and having a start date before the current date.
Although often honored in the breach, periodic key changes are a security requirement. We do of course assume that the cipher system will encrypt the data for each message with a random key in any case. But if everything always starts with the same key, then anyone getting that key will have access to everything, which makes that key an increasingly valuable target. To compartmentalize, and limit that risk, we must change keys periodically, even public and private keys.
A start date field supports periodic key change in a way largely invisible to the user. A routine corporate key file update would add new keys at the start of the file, with future start dates, thus not affecting the use of the current keys at all. Then, when the new date arrives, the new keys would be selected and used automatically, making the key update process largely invisible to the user and far more practical than usually thought possible.
In contrast, an end date seems much less useful, not the least because it involves a prediction of the future as to when the key may become a problem. Moreover, presumably the intended response to such a date is to stop key use, but if a new key has not been distributed, that also stops business operation, which is just not smart. So a start date is needed, but an end date is not.
Corporate key policies would produce new key files for users from time to time, with future keys added and unused keys stripped out. That also would be an appropriate time for the user to implement a new passphrase.
Also see
message archives.
Public key transport may at first seem fairly easy. Public keys do not need to be encrypted for transport, and anyone may see them. However, they absolutely must be certified to be the exact same key the sender sent. For if someone replaces the sent key with another, subsequent messages can be exposed without breaking any cipher (see man in the middle attack). The usual solution suggested is a large, complex, and expensive certification infrastructure that is often ignored. (See PKI.)
Secret key transport first involves encrypting the secret key
under a one-time keyphrase or random
nonce.
The resulting message is then hand-carried (on a floppy or CDR) or
otherwise sent (perhaps by overnight mail or package courier) to
the other end.
Then the keyphrase is sent by a different channel (perhaps by phone
or fax) to decrypt the key.
Of course, if the encrypted message is intercepted and copied, and
then the second channel intercepted as well, the secret key would
be exposed, which is why hand-delivery is best.
Fortunately, most people who are working together do meet occasionally
and then keys can be exchanged.
As soon as the transported secret key is decrypted it should
immediately be re-encrypted for secure storage.
That can and should be done without ever exposing the key itself.
A substitution table is keyed by creating a particular
ordering from each different key. This can be accomplished by
shuffling the table under the control of a
random number generator
which is initialized from the key.
Cryptography is based on the idea that if we have a huge number of keys, and select one at random, the opponents generally must search about half of the possible keys to find the correct one; this is a brute force attack.
Although brute force is not the only possible attack, it is the one attack which will always exist (except for ciphers with Perfect Secrecy). Therefore, the ability to resist a brute force attack is normally the design strength of a cipher. All other attacks should be made even more expensive. To make a brute force attack expensive, a cipher simply needs a keyspace large enough to resist such an attack. Of course, a brute force attack may use new computational technologies such as DNA or "molecular computation." Currently, 120 bits is large enough to prevent even unimaginably large uses of such new technology.
It is probably just as easy to build efficient ciphers which use
huge keys as it is to build ciphers which use small keys, and the
cost of storing huge keys is probably trivial. Thus, large keys
may be useful when this leads to a better cipher design, perhaps
with less key processing. Such keys, however, cannot be considered
better at resisting a brute force attack than a 120-bit key, since
120 bits is already sufficient.
On a PC-style computer, a processor internal to the keyboard maintains the state of all keys (up or down). The keyboard processor also continuously scans through each possible key (perhaps every 2 msec) and reports to the PC any key which has just gone up or down. Keys are reported by position or "scan code;" the keyboard processor does not use ASCII.
Measuring keystroke timings is a common way of collecting
supposedly unknowable information for a
really random generator.
However, even though a PC computer can measure events very closely,
the keyboard scan process inherently quantizes keystrokes at a
far more coarse resolution.
And, when measuring software-detected events, various PC system
things like hardware interrupts and OS task changes can provide
substantial variable latency which is nevertheless
deterministic.
One model of a cipher is a key-selected mathematical function or transformation between plaintext and ciphertext. To an opponent this function is unknown, and one of the best ways to address an unknown function is to look at both the input and output. More than that, even though an opponent has ciphertext, something must be known about the plaintext or an opponent has no way to measure attack success.
Public key ciphers allow opponents to create known-plaintext. Thus, public key ciphers force us to assume they will resist known-plaintext attacks, even though that may or may not be correct. However, most so-called "public key" ciphers do not protect actual data with a public key system, but are in fact hybrid ciphers, where the public key system is used only to transfer a key for a conventional secret key cipher.
The cryptanalytic literature on secret-key ciphers is rife with attacks which depend upon known-plaintext, and secret-key ciphers are still used for almost all data ciphering. Virtually all secret-key ciphers are best attacked with known-plaintext to the point that describing cipher weakness almost universally means some number of cipherings and some amount of known-plaintext. For example, Linear Cryptanalysis normally requires known-plaintext, while Differential Cryptanalysis generally requires the even more restrictive defined plaintext condition.
If modern cipher designers do not talk much about known-plaintext, that may be because designers think that:
On the other hand, some aspect of the plaintext must be known, or it will be impossible to know when success has been achieved. Consequently, it is hard to imagine a situation in which actual known-plaintext would not benefit cryptanalysis. Since huge amounts of known-plaintext are needed for current attacks, that much exposure may be preventable at the cipher system level. And, since attacks only get better over time it would seem only prudent to hide as much known-plaintext as possible.
It is surprisingly reasonable that an opponent might have a modest amount of known plaintext and the related ciphertext: That might be the return address on a letter, a known report or newspaper account, or even just some suspected words. Sometimes a cryptosystem will carry unauthorized messages such as birthday greetings which are then discarded in ordinary trash, due to their apparently innocuous content, thus potentially providing a small known-plaintext example. (It is harder to see how really huge amounts of known-plaintext might escape, but one possibility is described in security through obscurity.)
Unless the opponents know something about the plaintext, they will be unable to distinguish the correct deciphering even when it occurs. Hiding all structure in the plaintext thus has the potential to protect messages against even brute force attack; this is essentially a form of Shannon-style Ideal Secrecy.
One approach to making plaintext "unknown" would be to
pre-cipher the plaintext, thus hopefully producing an unstructured
ciphertext which would prevent success when attacking the second
cipher.
In fact, each cipher would protect the other.
Successful attacks would then have to step through both
ciphering keys instead of just one, which should be exponentially
more difficult.
This is one reason for using
multiple encryption.
Also see the known plaintext discussion
"Known-Plaintext and Compression"
(locally, or @:
http://www.ciphersbyritter.com/NEWS6/KNOWNPLN.HTM).
A known plaintext attack typically needs both the plaintext value sent to the internal cipher and the resulting ciphertext. Typically, a large amount of plaintext is needed under a single key. A cipher system which prevents any one of the necessary conditions also stops the corresponding attacks.
Known plaintext attacks can be opposed by:
Although known plaintext per se is not always needed to attack a cipher, some aspect of knowledge about the plaintext is absolutely required to know when any attack has succeeded (see ciphertext only attack). Also see defined plaintext attack.
A known plaintext attack is especially dangerous to the
conventional
stream cipher with an
additive combiner, because the
known plaintext can be "subtracted" from the ciphertext, thus
completely exposing the
confusion sequence.
This is the sequence produced by the cryptographic
random number generator,
and can be used to attack that generator.
Such attacks can be complicated or defeated by ciphers using a
nonlinear
combiner, like a
keyed
Latin square, or a
Dynamic Substitution
Combiner instead of the usual additive combiner.
Unfortunately, different abilities to sense and use deep structure or correlations in a sequence can make major differences in the complexity value. In general, we cannot know if we have found the smallest program.
One of many possible measures of sequence complexity; also see:
linear complexity and
entropy.
There are actually at least three different K-S statistics, and two different distributions:
The one-sided statistics are:
Dn+ = MAX( S(x[j]) - F(x[j]) )
= MAX( ((j+1)/n) - F(x[j]) )
Dn- = MAX( F(x[j]) - S(x[j]) )
= MAX( F(x[j]) - (j/n) )
where "MAX" is computed over all j from 0 to n-1.
Both of these statistics have the same distribution.
The two-sided K-S statistic is:
Dn* = MAX( ABS( S(x[j]) - F(x[j]) ) )
= MAX( Dn+, Dn- )
and this has a somewhat different distribution.
This "side" terminology is standard but unfortunate, because every distribution has two sides which we call "tails." And every distribution can be interpreted in one-tailed or two-tailed ways.
Here, the "side" terminology refers to different statistic computations: the "highest" difference, the "lowest" difference, and the "greatest" difference between two distributions. Thus, "side" refers not to the tails of a distribution, but to values being either above or below the reference. If we knew that only one direction was news, we could use the appropriate "one-sided" statistic, and still interpret the results in a "two-tailed" way.
For example, if we used the "highest" test, we could detect large positive differences from the reference distribution (if the p-value was near 1.0) and also detect a distribution which was unusually close to the reference (if the p-value was near 0.0). Obviously, the "highest" test would hide negative differences.
On the other hand, if we compute both the "highest" and "lowest" statistics, we cover all the information (and slightly more) that we could get from the "two-sided" or "greatest" statistic.
We can base a hypothesis test of any statistic results on critical values on either or both tails of the null distribution, depending on our concerns. One problem with a "two-tailed" interpretation is that we accumulate critical region on both ends of the distribution, even though the ends are not equally important.
Normally, the p-value we get from a statistic comparing distributions is the probability of that value occurring when both distributions are the same. Finding p-values near zero or one is odd. Repeatedly finding p-values too close to zero shows that the distributions are unreasonably similar. Repeatedly finding p-values too close to one shows that the distributions are different. And we do not need critical-value trip-points to highlight this.
Knuth II multiplies Dn+, Dn- and Dn* by SQRT(n) and calls them Kn+, Kn- and Kn*, so we might say that there are at least six different K-S statistics.
The one-sided K-S distribution is easier to compute precisely, especially for small n and across a wide range (such as "quartile" 25, 50, and 75 percent values), and may be preferred on that basis. There is a modern evaluation for the two-sided K-S distribution which should be better than the old versions, but usually we do not need to accept its limitations. Often the experimenter can choose to use tests which are more easily evaluated, and for K-S, that would be the "one-sided" tests.
See the Kolmogorov-Smirnov section of "Normal, Chi-Square and Kolmogorov-Smirnov Statistics Functions in JavaScript" page (locally, or @: http://www.ciphersbyritter.com/JAVASCRP/NORMCHIK.HTM#KolSmir).
Consider three cold slices of pizza: We might put all three on
a plate and heat them in a microwave in 3 minutes. That would be
3 minutes from thought to mouth, a
Or consider a mixing cipher, which typically needs log n mixing sub-layers to mix n elements (i.e., n log n operations). The latency is the delay from the time we start computing until we get the result. So if we double the number of elements, we also double the necessary computation, plus another sub-layer. Thus, in software, when we double the block size, the latency increases somewhat, and the data rate decreases somewhat (but see huge block cipher advantages).
In hardware, things are much different: Even though the larger computation is still needed, that can be provided in separate on-chip hardware for each sub-layer. Typically, each sub-layer may take a single clock cycle to perform the computation. So if we double the block size, we need another sub-layer, and do gain one more clock cycle of latency before a particular block pops out. But the data rate is still a full block per cycle, and stays that, no matter how wide the block may be. In hardware, when we double the block size, we double the data rate, giving large blocks a serious advantage.
In the past, hardware operation delay has largely been dominated by the time taken for gate switching transistors to turn on and off. Currently, operation delay is more often dominated by the time it takes to transport the electrical signals to and from gates on long, thin conductors.
The effect of latency on throughput can often be reduced by
pipelining or partitioning the main
operation into many small sub-operations, and running each of
those in parallel, or at the same time.
As each operation finishes, that result is
latched and saved temporarily, pending the availability of the
next sub-operation hardware.
Thus, throughput is limited only by the longest sub-operation
instead of the overall computation.
2 0 1 3 1 3 0 2 0 2 3 1 3 1 2 0
Since each row contains the same symbols, every possible row
can be created by re-arranging or
permuting the n symbols into
n! possible rows.
At order 4 there are 24 possible rows.
A naive way to build a Latin square would be to choose each row of
the square from among the possible rows, and that way we can build
The following square is cyclic, in the sense that each row below the top is a rotated version of the row above it:
0 1 2 3 1 2 3 0 2 3 0 1 3 0 1 2This is a common way to produce Latin squares, but is generally undesirable for cryptography, since the resulting squares are few and predictable.
It is at least as easy to make a more general Latin square as it is to construct an algebraic group: The operation table of any finite group is a Latin square, as is the addition table of a finite field. Conversely, while some Latin squares do represent associative operations and can form a group, most Latin squares do not. At order 4 there are 576 Latin squares, but only 16 are associative (about 2.8 percent). So non-associative (and thus non-group) squares dominate heavily, and may be somewhat more desirable for cryptography anyway.
A Latin square is said to be reduced or normalized or in standard form when the symbols are in lexicographic order across the top row and down the leftmost column. Any Latin square of any order reduces through a single rows and columns re-arrangement into exactly one standard square.
A Latin square is reduced to standard form in two steps: First, the columns are re-arranged so the top row is in order. Since that places the first element of the leftmost column in standard position, only the rows below the top row need be re-arranged to put the leftmost column in order. (Alternately, the rows can be re-arranged first and then the columns to the right of the leftmost column; the result is the same standard square.)
The 576 unique Latin squares of order 4 include exactly 4 squares in standard form:
0 1 2 3 0 1 2 3 0 1 2 3 0 1 2 3 1 2 3 0 1 3 0 2 1 0 3 2 1 0 3 2 2 3 0 1 2 0 3 1 2 3 1 0 2 3 0 1 3 0 1 2 3 2 1 0 3 2 0 1 3 2 1 0
A Latin square of order n can be
shuffled or expanded into, and so
can represent,
At order 4, by permuting 4 rows and 3 columns, each standard
square expands to
Also see Latin square combiner and orthogonal Latin squares.
And see:
A Latin square combiner produces a balanced and nonlinear, yet reversible, combining of two values. The advantage is the ability to have a huge number of different, yet essentially equivalent, combining functions, thus preventing opponents from knowing which combining function is in use. To exploit this advantage, we must both create and use keyed Ls's. One efficient way to do that is what I call the checkerboard construction, which I also describe in my article: "Practical Latin Square Combiners" (locally or @: http://www.ciphersbyritter.com/ARTS/PRACTLAT.HTM). Also see Latin square and orthogonal Latin squares for other articles.
A Latin square combiner can be seen as the generalization of the exclusive-OR mixing concept from exactly two values (a bit of either 0 or 1) to any number of different values (e.g., bytes). A Latin square combiner is inherently balanced, because for any particular value of one input, the other input can produce any possible output value. A Latin square can be treated as an array of substitution tables, each of which is invertible, and so can be reversed for use in a suitable extractor. As usual with cryptographic combiners (including XOR), if we know the output and a specific one of the inputs, we can extract the value of the other input.
For example, a tiny Latin square combiner might combine two 2-bit values each having the range zero to three (0..3). That Latin square would contain four different symbols (here 0, 1, 2, and 3), and thus be a square of order 4:
2 0 1 3 1 3 0 2 0 2 3 1 3 1 2 0
With this square we can combine the values 0 and 2 by selecting the top row (row 0) and the third column (column 2) and returning the value 1.
When extracting, we will know a specific one (but only one) of the two input values, and the result value. Suppose we know that row 0 was selected during combining, and that the output was 1: We can check for the value 1 in each column at row 0 and find column 2, but this involves searching through all columns. We can avoid this overhead by creating the row-inverse of the original Latin square (the inverse of each row), in the well-known way we would create the inverse of any invertible substitution. For example, in row 0 of the original square, selection 0 is the value 2, so, in the row-inverse square, selection 2 should be the value 0, and so on:
1 2 0 3 2 0 3 1 0 3 1 2 3 1 2 0
Then, knowing we are in row 0, the value 1 is used to select the second column, returning the unknown original value of 2.
A practical Latin square combiner might combine two bytes,
and thus be a square of order 256, with 65,536 byte entries. In
such a square, each 256-element column and each 256-element row
would contain each of the values from 0 through 255 exactly
once.
Layers can be
confusion layers (which simply change
the block value),
diffusion layers (which propagate
changes across the block in at least one direction) or both.
In some cases it is useful to do multiple operations as a
single layer to avoid the need for internal temporary storage
blocks.
Kullback,S. 1938. Statistical Methods in Cryptanalysis. (Reprinted by Aegean Park Press.)but the following table is original for the Crypto Glossary.
At the top of the table are letters in a general rank ordering, most-common at the left and least-common at the right. Some of the variation possible in different sets of messages or text is shown by the different ranks a given letter may have. From this we might conclude that N, R, O, A, I should be treated as a group, while S may (or may not) be unique enough to identify specifically from the usage rank in a message.
E T N R O A I S D L H C P F U M Y G W V B X Q K J Z
1 1 | | | | | | | | | | | | | | | | | | | | | | | | |
2 .. 2 | | | | | | | | | | | | | | | | | | | | | | | |
3 .... 3 3 | | | | | | | | | | | | | | | | | | | | | |
4 .... 4 4 4 4 | | | | | | | | | | | | | | | | | | | |
5 .....5 5 5 5 5 | | | | | | | | | | | | | | | | | | |
6 ...... 6 6 6 6 | | | | | | | | | | | | | | | | | | |
7 .......7 7 7 7 | | | | | | | | | | | | | | | | | | |
8 .............. 8 | | | | | | | | | | | | | | | | | |
9 ................ 9 | 9 | | | | | | | | | | | | | | |
10 ................. 10 0 | | | | | | | | | | | | | | |
11 ................. 11 1 1 | | | | | | | | | | | | | |
12 ................... 12 2 2 | | | | | | | | | | | | |
13 ..................... 13 3 3 3 | | | | | | | | | | |
14 ..................... 14 4 4 4 4 | | | | | | | | | |
15 ..................... 15 5 5 5 5 | | | | | | | | | |
16 ..........................16. 16 | | | | | | | | | |
17 ............................... 17 7 | | | | | | | |
18 ............................... 18 8 8 8 | | | | | |
19 ............................... 19 9 9 9 | | | | | |
20 ............................... 20 0 0 0 0 | | | | |
21 ..................................... 21 1 | | | | |
22 ......................................... 22 2 2 | |
23 ......................................... 23 3 3 3 |
24 ......................................... 24 4 4 4 |
25 ........................................... 25 5 5 5
26 ........................................... 26 ... 6
In scientific argumentation honesty is demanded. Lies cannot further the cause of scientific insight and conclusion. However, lies can waste the time of everyone involved, not just for the time of the discussion, but potentially years of effort by many people, based on false assumptions.
Personally, I take an accusation of lying very seriously. Absent a public apology, my response is to end my interaction with that person. That is what I do, and that is what I think everyone should do.
Some people think that failing to defend against even the most heinous assertion is a sign of weakness or even an admission of guilt. But when a person is accused of lying, the accused immediately knows whether the accusation is correct or not. And if not, the accuser has just shown themselves as a liar or a fool. There is no need to separate these possibilities. In either case there is no reason to further dignify whatever points they wish to present.
A participant in a discussion has no responsibility to respond,
no matter what
claim an opponent may make.
Instead, is the responsibility of the claimant to present logical
arguments or
proof, as opposed to a mere possibility or
belief.
Simply making a claim then demanding that it be accepted if it cannot
be proven false is the
ad ignorantium fallacy.
Suppose we have a cipher to analyze which has some unknown internal function: If that function is random, with no known pattern between input and output or between values, we may have to somehow traverse every possible input value before we can understand the full function.
But if there is a simple pattern in the function values, then we may only need a few values to predict the entire function. And a linear function is just about the simplest possible pattern, which makes it almost the weakest possible function.
There are at least two different exploitable characteristics of cryptographic linearity:
One math definition of linearity is: function f : G -> G is linear in field G if:
f(a*x + b*y) = a*f(x) + b*f(y)for any a, b, x, y in G. To be linear, function f is thus usually limited to the restrictive form:
f(x) = axthat is, "multiplication" only, with no "additive" term. Functions which do an additive term; that is, in the form:
f(x) = ax + bare thus technically distinguished and called affine. Affine functions are virtually as weak as linear functions, and it is very common to casually call them "linear."
Another definition of linearity is:
1) f(0) = 0 2) f(ax) = a * f(x) 3) f(a + b) = f(a) + f(b)where (1) apparently distinguishes linear from affine.
It is also possible to talk about linearity with respect to an "additive group." A function f: G -> H is linear in groups G and H (which have addition as the group operation) if:
f(x + y) = f(x) + f(y)for any x,y in G.
There are multiple ways a relationship can be linear: One way is to consider a, x, and b as integers. But the exact same bit-for-bit identical values also can be considered polynomial elements of GF(2n). Integer addition and multiplication is linear in the integers, but when seen as mod 2 operations, the exact same computation producing the exact same results is not linear. In this sense, linearity is contextual.
Moreover, in cryptography, the issue may not be as much one of strict mathematical linearity as it is the "distance" between a function and some linear approximation (see rule of thumb and Boolean function nonlinearity). Even if a function is not technically linear, it may well be "close enough" to linear to be very weak in practice. So even a mathematical proof that a function could not be considered linear under any possible field would not really address the problem of linear weakness. A function can be very weak even if technically nonlinear.
True linear functions are used in ciphers because they are easy
and fast to compute, but they are also exceedingly weak.
Of course
XOR is linear and trivial, yet is used all the
time in arguably
strong ciphers;
linearity only implies weakness when an
attack can exploit that linearity.
Clearly, a conventional
block cipher design using linear
components must have nonlinear
components to provide strength, but linearity, when part of a
larger system, does not necessarily imply weakness.
In particular, see
Dynamic Transposition,
which ciphers by
permutation.
In general, there is a linear algebra of permutations, but that
seems to be not particularly useful when a different permutation
is used on every block, and when the particular permutation used
cannot be identified externally.
One of many possible complexity measures; also see Kolmogorov-Chaitin complexity and entropy.
Also see my article: "Linear Complexity: A Literature Survey,"
locally, or @:
http://www.ciphersbyritter.com/RES/LINCOMPL.HTM.
When the LC approximation equations include both plaintext and
ciphertext bits, they obviously require at least
known plaintext for evaluation,
with sufficient data to exploit the usually tiny bias.
LC typically also requires knowing the contents of the internal
Accordingly, most LC attacks are prevented by the simple use of
keyed
LC attacks also can be addressed at the cipher system level, by:
Also see: "Linear Cryptanalysis: A Literature Survey,"
locally, or @:
http://www.ciphersbyritter.com/RES/LINANA.HTM.
y = a1x1 + a2x2 + . . . + anxn
A first
degree
equation, such as: In an n-element shift register (SR), if the last element is connected to the first element, a set of n values can circulate around the SR in n steps. But if the values in two of the elements are combined by exclusive-OR and that result connected to the first element, it is possible to get an almost-perfect maximal length sequence of 2n-1 steps. (The all-zeros state will produce another all-zeros state, and so the system will "lock up" in a degenerate cycle.) Because there are only 2n different states of n binary values, every state value but one must occur exactly once, which is a statistically-satisfying result. Moreover, the values so produced are a perfect permutation of the counting numbers (1..2n-1).
A Linear Feedback Shift Register
+----+ +----+ +----+ +----+ +----+ "a0"
+-<-| a5 |<---| a4 |<-*-| a3 |<---| a2 |<---| a1 |<--+
| +----+ +----+ | +----+ +----+ +----+ |
| v |
+------------------> (+) ----------------------------+
1 0 1 0 0 1
In the figure we have a LFSR of
degree 5, consisting of 5 storage
elements
a[1][t+1] = a[5][t] + a[3][t]; a[2][t+1] = a[1][t]; a[3][t+1] = a[2][t]; a[4][t+1] = a[3][t]; a[5][t+1] = a[4][t];
Normally the time distinction is ignored, and we can write more generally, for some feedback polynomial C and state polynomial A of degree n:
n
a[0] = SUM c[i]*a[i]
i=1
The feedback polynomial shown here is 101001, a degree-5 poly
running from
LFSR's are often used to generate the
confusion sequence for
stream ciphers, but this is very
dangerous: LFSR's are inherently
linear and thus weak. Knowledge of the
feedback polynomial and only
n element values (from
known plaintext) is sufficient
to run the sequence backward or forward.
And knowledge of only 2n elements is sufficient to develop an
unknown feedback polynomial (see:
Berlekamp-Massey).
This means that LFSR's should not be used as stream ciphers without
in some way isolating the sequence from analysis. Also see
jitterizer and
additive RNG.
log2(x) = log10(x) / log10(2) log2(x) = ln(x) / ln(2)
Since math is based on logic, it is not supposed to be possible for math to support fuzzy or incorrect reasoning. However, math does exactly that in practical cryptography. (See some examples at proof and old wives' tale.) The problem seems to be a false assumption that math is cryptography, so whatever math proves must apply in practice. But math only is cryptography in theoretical systems for theoretical data; in real systems, the necessary assumptions can almost never be guaranteed, which means the conclusions are no longer proven. It is logically invalid (and extremely dangerous) to imagine that unproven conclusions can provide confidence in a real system.
The science of logic is intended to force reasoning into patterns
which always produce a correct conclusion from initial assumptions.
Of course, even an invalid
argument can sometimes produce a correct
conclusion, which can deceive us into thinking the argument is valid.
A valid argument must always produce a correct
conclusion.
See:
subjective,
objective,
contextual,
absolute,
inductive reasoning,
deductive reasoning,
and especially
fallacy.
Also see:
argumentation,
premise and
conclusion;
scientific model,
hypothesis and
null hypothesis.
INPUT NOT
0 1
1 0
INPUT AND OR XOR
0 0 0 0 0
0 1 0 1 1
1 0 0 1 1
1 1 1 1 0
These Boolean values can be stored as a
bit,
and can be associated with 0 or 1, FALSE or TRUE, NO or YES,
etc. Also see:
gate and
DeMorgan's Laws.
In TTL-compatible devices, a logic zero input must be 0.8 volts or lower, and a logic one input must be 2.0 volts or higher. That leaves the range between 0.8 and 2.0 volts as invalid. When a logic device has an invalid voltage on some input, it is not guaranteed to perform the expected digital function. In particular, attempts to latch invalid voltage levels can lead to metastability problems.
Note that different logic families have different valid signal
ranges:
74 L H S LS AS ALS F HC HCT AC ACT
VOH 2.4 2.4 2.4 2.7 2.7 2.5 2.5 2.5 3.8 4.3 4.4 4.4
VIH 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 3.2 2.0 3.2 2.0
VIL 0.8 0.8 0.8 0.8 0.8 0.8 0.8 0.8 0.9 0.8 1.4 0.8
VOL 0.4 0.3 0.4 0.5 0.4 0.5 0.4 0.5 0.3 0.3 0.1 0.1
Clk 35 3 50 100 50 125 35 150 24 25 155 160
Output High, Input High, Input Low, Output Low; Clk = max. FF toggle MHz
NOTE: different texts and vendors list different values
These are
component
specifications
that characterize the situation of a logic output pin connected
to logic input pins.
Assuming that the supply voltage, ambient temperature, loading
and time delays are all within specified limits, the output
voltage is guaranteed to be either
higher than VOH (for a one),
or lower than VOL (for a zero).
These output values are beyond the levels needed for inputs
to sense a particular logic level (either VIH or
VIL).
As a result, a system can have 300 or 400 mV of noise (from the
power supply, ground loops, inductive and capacitive pickup) yet
still sense the correct logic value.
In this way, large digital systems can be built to perform
reliably. (Also see:
system design.)
By far the most serious man-in-the-middle problems are public key issues. In this sort of attack, the opponent arranges for the user to encipher with the key to the opponent, instead of the key to the far end. All long, random keys look remarkably alike, but using a key from the opponent allows the opponent to decipher the message, which completely exposes the plaintext. The opponent then re-enciphers that plaintext under the key to the far end and sends along the resulting ciphertext so neither end will know anything is wrong.
The public-key MITM attack targets the idea that many people will send their public keys on the network. The bad part of this is a lack of public-key certification. Unless public keys are properly authenticated by the user, the MITM can send a key just as easily, and pretend to be the other end. Then, if one uses that key, one has secure communication with the opponent, instead of the far end. So a message to the desired party goes through the opponent, where the message is deciphered, read, and re-enciphered with the correct key for the far end. In this way, the opponent quickly reads the exact conversation with minimal effort and without breaking the cipher per se, or cryptanalysis of any sort, yet neither end sees anything suspicious.
All of this depends on the opponent being able to intercept and change the message in transit. The original cryptographic model related to radio transmission and assumed that an opponent could listen to the ciphertext traffic, and perhaps even interfere with it, but not that messages could be intercepted and completely hidden. Unfortunately, message interception and substitution is a far more realistic possibility in a store-and-forward computer network than the radio-wave model would imply. Routing is not secure on the Internet, and it is at least conceivable that messages between two people are being routed through connections on the other side of the world. This property might well be exploited to make such messages flow through a particular computer for special processing. Again, neither end would see anything suspicious.
Perhaps the worst part of this is that a successful MITM attack does not involve any attack on the actual ciphering. Even a mathematical proof of the security of a particular cipher would be irrelevant in a system which allows MITM attacks.
A related (but very limited) version of an MITM attack can occur in the first block of CBC block cipher operating mode. In CBC, the IV is exclusive-ORed with the plaintext of the first block during deciphering. So if the opponents can somehow change the IV in transit, they can also change the resulting first block plaintext. And if the opponents know what plaintext was sent (perhaps a logo, or a date, or a name, or a command, or even a fixed dollar figure), they can change it to anything they want (in the first block).
Since the problem exploited in public-key MITM attacks is a lack of authentication, one might jump to the conclusion that all MITM attacks are authentication problems. But the authentication needed for public-key MITM attacks is not the authentication of an IV, nor even the authentication of plaintext, but instead the authentication of the public key itself. Key authentication is a fundamentally different issue than the CBC IV and first block problem.
Unlike public-key MITM attacks, the problem with the CBC first block is not a lack of authentication, but rather a lack of confidentiality for the IV. It is the lack of confidentiality which allows the IV value to be usefully manipulated (see CBC). That makes CBC first-block MITM a cipher-level problem, something appropriately solved below the cipher system level. It is often said that an IV can simply be transmitted in the open, but it is exactly that exposure which enables the first-block CBC MITM problem.
The way to avoid CBC first-block MITM problems is to encipher the IV instead of exposing it. Alternately, a higher-level MAC could be used to detect any changes in the plaintext.
The way to avoid public-key MITM attacks is to certify the keys, but this is inconvenient and time-consuming. So, unless the cipher system actually requires keys to be certified, this is rarely done. The worst part is that a successful MITM attack consumes few resources, need not "break" the cipher proper, and may provide just the kind of white-collar desktop intelligence a bureaucracy would love.
It is interesting to note that, regardless of how inconvenient
it may be to share keys for a
secret-key cipher, this is an
inherent authentication which prevents the horribly complete exposure
of public-key MITM attacks.
f: X -> Y ,
the mapping or
function or transformation or
rule
f takes any value in the
domain X into some value in the
range, which is contained in Y.
For each element x in X, a mapping associates a single
element y in Y.
Element f(x) in Y is the image of element
x in X.
If no two values of x in X produce the same result f(x), f is one-to-one or injective.
f-1(f(x)) = x.
A mathematical model consisting of some number of states, with
transitions between states occurring at random but with some
associated probability, and a symbol associated with each state
which is output when that state is entered. One specific type
is the
ergodic process.
One example is a
random walk.
Also see the non-random
finite state machine,
stationary process and
hidden Markov model.
Mathematics is based on the correct logic of argumentation which is studied in philosophy. As such, it should be impossible (or at least exceedingly embarrassing) for math to support invalid conclusions. But I claim that happens all the time in cryptography. The problem seems to be a failure to recognize the distinction between theory and practice. In particular, many common proof assumptions are simply impossible to guarantee in practice, which means the proof results cannot be relied upon.
Despite claims to the contrary, cryptography is not mathematics! Instead, math is a general modeling tool. Cryptography is an applications area which applied math can model. But, in any field, the utility of results always depends upon the extent to which some model corresponds to reality. There is nothing new about this: The distinction between theory and practice is pervasive in science. Learning the meaning of modeling, how to apply models, that models have limits, and what to do outside a model, are fundamental parts of a scientific education. Clearly, cryptographic models must first correctly model reality before they can be said to apply to reality. The need for model validation is often unmentioned or forgotten or just considered irrelevant in the rush to glorify a new crypto math proof.
Cryptography is simply different from most fields that use
math models.
In practice, the worth of a real cryptosystem often depends upon
point of view:
The various ciphering
In mathematics it is common to say things like: "Let us assume that we have situation x; now let's prove what that would mean for other things." In this way, absolute logical requirements are easily handwaved into mathematical existence. But actually achieving those requirements in practice is another story. Normally, math requirements are not prescriptive; that is, they do not describe how a property is to be provably obtained, only that it somehow be obtained. Math normally provides no assurance that a handwaved property even can be achieved in practice. But if the needed property cannot be achieved, then, obviously, all attempts to achieve it are, and have been, a useless waste of time. That can be somewhat irritating to those who seek the practical use of math results in real systems.
If, for example, math assumes the existence of an unpredictable random number generator, cryptographic Perfect Secrecy can be proven. But in practice, we can guarantee no such thing. Oh, we can build generators that seem pretty unpredictable (see really random), but finding an absolute guarantee that the actual machines are unpredictable at the time they are used seems beyond our reach.
Without an absolute guarantee that each and every assumption has been identified and is simultaneously achieved in practice, a supposed "proof" is not even a complete logical argument. Such a "proof" is formally incomplete in practice, and so technically concludes nothing at all. In cryptography, theoretical proofs thus tend to create unfounded belief, something both science and mathematics should be working hard to avoid and debunk.
Almost no theoretical math security proofs apply in practice, yet most math-oriented crypto texts seem to say they do (see, for example, the one time pad proofs). In my view, that means mathematical cryptography has not yet been forced to address the distinction between theory and practice. If mathematical cryptography is to apply in reality, it must be as an applied discipline, especially those parts which are now largely mathematical illusion.
Alas, mathematical cryptography seems not very concerned about the situation, since things could be done differently but are not. One approach might be to use only properties which provably can be achieved in practice. That would, of course, greatly restrict mathematical "progress," but only if we take "progress" to include useless results. Until the math guys really do get concerned about reality, they necessarily leave it up to the individual practitioner to identify those theoretical results that do not apply in practice. In this way, ordinary workers in the field are being required to reason better than the math guys themselves. Yet sloppy reasoning is how some of the most commonly-held views in cryptography can be simply false (see old wives' tales).
Difficulties also exist in taking mathematical experience and applying that to cryptography:
On the other hand, mathematics is irreplaceable in providing the tools to pick out and describe structure in apparently strong cipher designs. (See, for example, Boolean function nonlinearity and my comments on experimental S-box nonlinearity measurement.) Mathematics can identify specific strength problems, and evaluate potential fixes. But there appears to be no real hope of evaluating strength with respect to every possible attack, even using mathematics.
Although mathematical cryptography has held out the promise
of providing provable
security, in over 50 years of work,
no practical cipher has been generally accepted as having
proven
strength. See, for example:
one time pad and
proof.
Originally a
linear feedback shift register
(LFSR) sequence of 2n-1 steps
produced by an n-bit-wide
shift register.
This means that every
binary value the register can hold, except
zero, will occur on some step, and then not occur again until all
other values have been produced. A maximal-length LFSR can be
considered a binary counter in which the count values have been
shuffled or
enciphered.
The sequence from a normal binary counter is perfectly
balanced and
the sequence from a maximal-length LFSR is almost perfectly
balanced. Also see
M-sequence.
MDS codes are supposed to be useful in the
wide trail strategy for
conventional
block cipher design.
Apparently MDS codes can be used to make some minimum number of
However, MDS codes are not applicable to all designs, nor are
such codes needed for optimal ciphering.
The obvious counterexample is my
mixing cipher designs, which use small
Balanced Block Mixing
operations (actually,
orthogonal Latin squares).
The
oLs's are arranged into
scalable FFT-like structures to
mix every input byte into every output byte, and to diffuse even a
small input change across each and every output.
Similar computations include the
geometric mean
(the nth root of the product of n values, used for average rate
of return) and the
harmonic mean
A mechanism can be seen as a process or an implementation for
performing that process (such as
electronic
hardware,
computer
software, hybrids, or the like).
Although perhaps looked down upon by those of the mathematical
cryptography persuasion, mechanistic cryptography certainly does
use mathematics to design and predict performance. But rather
than being restricted to arithmetic operations, mechanistic
cryptography tends to use a wide variety of mechanically-simple
components which may not have concise
mathematical descriptions. Rather than simply implementing a
system of math expressions, complexity is
constructed from the various efficient components available to
digital computation.
However, words are how we discuss facts, and semantics is the meaning of those words, making semantics somewhat more important than "mere." Finding that a discussion is not using the expected meaning of a term, or that multiple different meanings are being used simultaneously, shows that the discussion itself is in trouble. (See argumentation, logic and fallacy.)
For example:
Mersenne Primes:
2 107 9689 216091
3 127 9941 756839
5 521 11213 859433
7 607 19937 1257787
13 1279 21701 1398269
17 2203 23209 2976221
19 2281 44497 3021377
31 3217 86243 6972593
61 4253 110503 13466917
89 4423 132049
A message can be seen as sequence of values. Consequently, it may seem that the only cryptographic manipulations possible would be either to change values (substitution) or change position (transposition). However, it is also possible to include meaningless values in or between messages (as in nulls). A variation is to intermix several different messages as in a braid or grille. Another possibility is to collect groups of symbols and change those values as a unit, which is technically a code.
Note that message length is rarely hidden by
ciphers.
One way of hiding length is by continuous transmission with
nulls between messages.
Naturally, it is then necessary to identify the start and end
of each message, which may involve various
synchronization techniques.
Another way of hiding length is to use nulls to expand the message
by a random amount.
Fortunately, if we have all the keys ever used for some channel, a start date for each is sufficient to define the period of activity for a particular key. So if we know the message date, we can select the key which was active on that date. With email, we might parse the header for the message date, and so automatically select the correct old key.
One possibility for local user archives is to decrypt all messages when received, and accumulate the plaintext files. A modified version of that is to re-encrypt each message under the user's own keyphrase, although that could be a problem when it comes time to change that keyphrase, or if the user leaves. Yet another alternative might be to simply archive each message as received, even encrypted, and then the select old keys needed to access old encrypted messages.
All these user-centric approaches have in common the problem that the user's archives are assumed to be intact, that the computer has not crashed or been deliberately erased. That is an assumption a corporation may not be prepared to accept.
In contrast, corporate security policies may want to archive all messages, from and to all users. When the corporation is the source of all keys (see key creation), it would have all the alias files and all the old keys for each user. To select the correct key, we need to identify the user, the email address or name (the "alias") for the far end, and the message date. Again, email message headers can be parsed for this information, especially if users follow reasonable alias protocols. (Note that alias protocols may the least part of dealing with sensitive information and encryption.) If each message is kept in a different file, that file can be given the appropriate date for that message which then provides another source of the date for key selection.
Since any decent
cipher system should report the use
of the wrong key, all new encrypted messages could be automatically
checked for correct decryption.
Action could then be taken quickly if the appropriate keys were
not being used.
To a large extent, message authentication depends upon the use of particular keys to which opponents should not have access. Simply receiving an intelligible message thus indicates authentication. But automatic authentication may require added message redundancy in the form of a hash value that can be checked upon receipt, similar to message integrity.
Another approach to message authentication is to use an authenticating block cipher; this is often a block cipher which has a large block, with some "extra data" inserted in an "authentication field" as part of the plaintext before enciphering each block. The "extra data" can be some transformation of the key, the plaintext, and/or a sequence number. This essentially creates a homophonic block cipher: If we know the key, many different ciphertexts will produce the same plaintext field, but only one of those will have the correct authentication field.
The usual approach to authentication in a
public key cipher is to encipher
with the private key. The resulting ciphertext can then be
deciphered by the public key, which anyone can know. Since even
the wrong key will produce a "deciphered" result, it is also
necessary to identify the resulting plaintext as a valid message;
in general this will also require redundancy in the form of a hash
value in the plaintext. The process provides no
secrecy, but only a person with access to
the private key could have enciphered the message. Also see:
key authentication and
user authentication.
Although widely touted and used, a MAC is hardly the only possible
form of authentication.
A MAC normally functions across a whole message, and thus requires
that the entire message exist before authentication can operate.
One alternate form of authentication is a per-block authentication
field (see
homophonic substitution and
block code).
This allows each block to be authenticated, and possibly could
even replace standard Internet Protocol error-detection, thus
reducing system overhead.
Presumably, other forms of authentication are also possible.
Note that a CRC is a fast, linear hash. Messages with particular CRC result values can be constructed rather easily. However, if the CRC is hidden behind strong ciphering, an opponent is unlikely to be able to change the CRC value systematically or effectively. In particular, this means that the CRC value will need more protection than a simple exclusive-OR in an additive stream cipher or the exclusive-OR approach to handling short last blocks in a block cipher.
A similar approach to message integrity uses a nonlinear cryptographic hash function or MAC. These also add a computed redundancy to the message, but generally require more computation than a CRC. While cryptographic hashes generally purport to have significant security properties, those are rarely if ever proven to the same extent as the lesser properties of a simple CRC. It is thought to be exceedingly difficult to construct messages with a particular cryptographic hash result, so the hash result perhaps need not be hidden by encryption. Of course doing that is just tempting fate.
Another approach to message integrity is to use an
authenticating block cipher;
this is often a
block cipher which has a large
block, with some "extra data" inserted in
an "authentication field" as part of the plaintext before
enciphering each block.
The "extra data" can be some transformation of the key, the
plaintext, and/or a sequence number. This essentially creates a
homophonic block cipher: If we know
the key, many different ciphertexts will produce the same plaintext
field, but only one of those will have the correct authentication
field.
Normally, the message key is a large really random value or nonce, which becomes the key for ciphering the data in a single message (see cipher system). Normally, the message key itself is enciphered under the User Key or other key for that link (see alias file and key management). The receiving end first deciphers the message key, then uses that value as the key for deciphering the message data. Alternately, the random value itself may be sent unenciphered, but is then enciphered or hashed (under a keyed cryptographic hash) to produce a value used as the data ciphering key.
Message keys have very substantial advantages:
It is important that message key construction be made clear and straightforward in design and implementation. Like most nonces, a message key is "extra" data, the value of which is not important. That value thus could be subverted to become a hidden side-channel for disclosing secure information.
In a sense, a message key is the higher-level concept of an
IV, which is necessarily distinct for each
particular design.
Some form of message key is the usual way to implement a
hybrid or
public key cipher.
Metastability is typically caused by violation of the set-up and hold time requirements of a flip-flop, which can cause an intermediate voltage level to be latched. Note that intermediate voltage levels always occur when a digital signal changes state; in a well-designed system they are ignored by clock signals which provide time for the transient condition to pass. Metastability occurs when the "amplified" level in a flip-flop happens to be exactly the same as the input level at the time the clock connects these points. The condition can endure indefinitely, until internal noise causes a collapse one way or the other.
Metastability cannot be prevented in designs which use a digital flip-flop or latch to capture raw analog data or unsynchronized digital data such as digitized noise. Metastability can be reduced by minimizing the time the signal spends in the invalid region, for example by using faster logic and/or Schmitt trigger devices. Metastability can be greatly reduced by using more than one stage of clocked latch. Metastability is eliminated in logic design by assuring valid logic levels and timing, such that setup and hold times are never violated.
In the simple case, where a single input line is sampled,
metastability may only cause an occasional unexpected delay and
an uncontrolled
non-random
bias.
But if multiple lines are sampled and metastability occurs on even
one of those, a completely different value can be produced.
In some systems, latching a wrong value could lead to entering
unexpected or prohibited states with undefined results.
". . . a valid
[formal] proof is one in which
no matter how one interprets the descriptive terms,
one never produces a counterexample
"For any [informal] proposition there is always some sufficiently narrow interpretation of its terms, such that it turns out true, and some sufficiently wide interpretation, that it turns out false." [p.99]
". . . informal, quasi-empirical, mathematics does not grow through a monotonous increase of the number of indubitably established theorems, but through the incessant improvement of guesses by speculation and criticism, by the logic of proofs and refutations." [p.5]
"Refutations, inconsistencies, criticism in general are very important, but only if they lead to improvement. A mere refutation is no victory. If mere criticism, even though correct, had authority, Berkeley would have stopped the development of mathematics and Dirac could not have found an editor for his papers." [p.112]
-- Lakatos, I. 1976. Proofs and Refutations: The Logic of Mathematical Discovery. Cambridge University Press.
Any such claim is flawed in multiple ways:
Below, we have a toy 32-bit-block Mixing Cipher. Plaintext at the top is transformed into ciphertext at the bottom. Each "S" is an 8-bit substitution table, and each table (and now each mixing operation also) is individually keyed.
Horizontal lines connect elements which are to be mixed
together: Each
A 32-Bit Mixing Cipher
| | | | <- Input Block (Plaintext)
S S S S <- Fencing
| | | |
*---* *---* <- 2 BBM Mixings
| | | |
*-------* | <- 1 BBM Mixing
| *-------* <- 1 BBM Mixing
| | | |
S S S S <- Fencing
| | | |
*-------* |
| *-------*
| | | |
*---* *---*
| | | |
S S S S <- Fencing
| | | | <- Output Block (Ciphertext)
By mixing each element with another, and then each pair with
another pair and so on, every element is eventually mixed with every
other element. Each BBM mixing is
dyadic, so each "sub-level" is a mixing of
twice as many elements as the sublevel before it. A block of n
elements is thus fully mixed in
The pattern of these mixings is exactly like some implementations
of the FFT, and thus the term "FFT-style." See
Balanced Block Mixing and
Mixing Cipher Design
Strategy. Also see the info and articles in the
"Mixing Ciphers" section of the main page,
locally, or @:
http://www.ciphersbyritter.com/index.html#MixTech.
The usual goals for a conventional block cipher design are strength and speed. But to those goals, I add scalability, huge blocks, massive keyed internal state and clarity:
Perhaps the main issue in the design of block ciphers with huge blocks has always been the ability to efficiently mix information across the whole block. First recall substitution-permutation ciphering, where we build a conventional block cipher using only fencing layers (substitution tables) and wiring. By wiring substitution table outputs to two different following tables, changing the input to the first table may change both of the secondary tables, and that is diffusion. The problem is that both tables may not change. If some input change causes no change on the wires to some table, that table and possibly its subsequent tables will not be involved, which will give the opponent a simpler ciphering transformation to attack. We want to eliminate that possibility. Accordingly, what I call ideal mixing will cause any input change to be conducted to every following table, thus forcing all tables to actively participate in the result.
We now know that ideal mixing can be accomplished with FFT-like networks using relatively small BBM operations. However, in the U.S., ciphers which do not use my Balanced Block Mixing technology must use mixing operations which are inherently unbalanced. Since I view balance as the single most important concept in cryptography, avoiding that when we can get it would seem to be a very serious decision.
Building a scalable balanced block mixing process is fairly easy using ideal mixing BBM technology. Basically, each mixing operation must have a pair of orthogonal Latin squares, and those can be linear, nonlinear, or key-created, or combinations thereof. My bias is to use key-created oLs's in tables. (It is easy to construct keyed nonlinear orthogonal pairs of Latin squares of arbitrary 4n order as I describe in my articles:
I prefer to use "many" keyed substitution tables, because these
hold a large amount of unknown internal state which an
opponent must somehow reconstruct.
These
One of the more subtle problems with scalable Mixing Ciphers is that some limited quantity (maybe 16, maybe 64, maybe more) of keyed tables (and keyed oLs mixing operations as well) may have to support a block of essentially unlimited size. Thus, tables must be re-used; tables can be selected from an array of such tables based on some keyed function of position. It is important that this be keyed and sufficiently complex so that knowing something about a table in one position does not immediately allow assigning that knowlege to other table positions.
One way to handle table selection by cipher position is to
have a maximum block size, and then have a table of that size
for each layer, indicating which
We do something right, then move on. We do not have to do
things over and over until they finally work.
In the past I have used two linear mixing layers, which implies
three fencing layers, and I still might use that structure with
keyed nonlinear mixing.
However, having become somewhat more conservative, I might now
use three linear mixing layers (instead of two) along with four
fencing layers.
With a layered design, where it is easy to add or remove layers,
there is always a desire to reduce computation and increase
speed, and it is easy to go too far.
0 + 0 = 0
0 + 1 = 1
1 + 0 = 1
1 + 1 = 0
1 + 1 + 1 = 1
0 * 0 = 0
0 * 1 = 0
1 * 0 = 0
1 * 1 = 1
Subtraction mod 2 is the same as addition mod 2.
The operations + and * can also be considered the
logic functions
XOR and
AND respectively.
Addition and Subtraction:
1 0 1 1
+ 0 1 0 1
+ 1 1 0 0
---------
0 0 1 0
Multiplication:
1 0 1 1
* 1 1 0 0
----------
0
0
1 0 1 1
1 0 1 1
---------------
1 1 1 0 1 0 0
Polynomial multiplication is not the same as repeated
polynomial addition. But there is a fast approach to squaring
mod 2 polynomials:
a b c d
a b c d
------------
ad bd cd dd
ac bc cc dc
ab bb cb db
aa ba ca da
----------------------
a 0 b 0 c 0 d
To square a mod 2 polynomial, all we have to do is "insert" a zero
between every column. Note that aa = a for a = 0 or a = 1,
and ab = ba, so either 0 + 0 = 0 or 1 + 1 = 0.
Division:
1 0 1 1
----------------
1 1 0 0 ) 1 1 1 0 1 0 0
1 1 0 0
---------
1 0 1 0
1 1 0 0
---------
1 1 0 0
1 1 0 0
---------
0
The decision about whether the divisor "goes into" the dividend is based exclusively on the most-significant (leftmost) digit. This makes polynomial division far easier than integer division.
Mod 2 polynomials behave much like integers in that one polynomial may or may not divide another without remainder. This means that we can expect to find analogies to integer "primes," which we call irreducible polynomials.
Since division is not
closed, mod 2 polynomials do not constitute a
field. However, a
finite field of polynomials can be
created by choosing an
irreducible modulus polynomial, thus
producing a Galois field
GF(2n).
In a monoid consisting of set M and closed operation * :
A set with a closed operation which is just associative is a
semigroup.
A set with a closed operation which is associative, with an
identity element and inverses is a
group.
Schnorr, C. and S. Vaudenay. 1994. Parallel FFT-Hashing. Fast Software Encryption.149-156.
This definition would seem to include
orthogonal Latin squares,
a more-desirable
balanced form well-known in mathematics
for hundreds of years.
Their paper, on non-reversible hashing, may have been presented at
the "Cambridge Security Workshop," Cambridge, December 9-11, 1993.
My work on
Balanced Block Mixing was
published to the net on March 12, 1994, which was before the earlier
paper was available in published proceedings.
Even accepting the earlier paper, however, my work was the first
to demonstrate FFT-like reversible ciphering using
butterfly
functions that turn out to be
orthogonal Latin squares.
The attack depends upon the idea that messages of the same
length will be permuted in the same way, and probably will not
apply to modern transposition ciphers.
The point of multiple encryption is to reduce the damage if our main cipher is being broken without our knowledge. We thus compare the single-cipher case to the multiple-cipher case. But some people just do not like the idea of multiple encryption. Complaints against multiple encryption include:
When we have one cipher, could adding a second cipher weaken the result? Well, it is possible, but it also seems extremely unlikely. For example, weakening could happen if the second cipher was the same as the first, and in decipher mode (or an involution), and using the same key. But, except for the keying, exactly that situation is deliberately constructed in "EDE (encipher-decipher-encipher) Triple DES," about which there are no anxieties at all.
Remember that it is always possible for any cipher to be weak in practice, no matter how strong it is in general, or whether it has another cipher after it or not: All the opponent has to do is pick the right key. So, when we think about potential problems in any form of encryption, we also need to think about the likelihood of those causes actually happening in practice. Constructing a case of weakness is not particularly helpful if that does not apply generally. In any practical analysis, it is not very useful to find a counterexample which does not represent the whole; that is the classic logic fallacy known as accident.
Despite the "EDE Triple DES" example, the most obvious possibility of weakness would be for the same cipher to appear twice, in adjacent slots. Obviously, we can prevent that! Could a different cipher expose what a first cipher has just enciphered? Perhaps, but if so, they are not really different ciphers after all, and that would be something we could check experimentally. Can a single cipher have several fundamentally different constructions? That would seem to be difficult: Normally, even a small change in the ciphering process has far-reaching effects. But, again, we could check for that.
The idea that a completely unrelated cipher could decipher (and thus expose) what the first cipher had protected may seem reasonable to those with little or no experience in the practice of ciphering or who have never actually tried to break ciphertext. But if that approach was reasonable, it would be a major issue in the analysis of any new cipher, and we do not see that.
Ciphers transform plaintext into key-selected ciphertext, and we can describe the number of possible cipherings mathematically. Even restricting ourselves to small but serious block ciphers like DES or AES, the number of possible transformations is BIG, BIG, BIG (see AES)! Out of the plethora of possible ciphers, and every key for each of those ciphers, we expect only one cipher and one key to expose our information.
If many different ciphers and keys could expose an enciphered message, finding the limits of such vulnerability would be a major part of every cipher analysis, and that is not done because it is just not an issue. If just adding a second cipher would, in general, make the first cipher weaker, that would be a useful academic attack, and we see no serious proposals for such attacks on serious ciphers.
Much of the confusion about the potential risks of multiple encryption seems due to one confusing article with a particularly unfortunate title:
Maurer, U. and J. Massey. 1993. Cascade Ciphers: The Importance of Being First. Journal of Cryptology.Apparently the main goal of the article is to present a contradiction for:6(1):55-61.
"Folk Theorem. A cascade of ciphers is at least as difficult to break as any of its component ciphers." [p.3]In the end, the main result from the article seems to be:
"It is proved, for very general notions of breaking a cipher and of problem difficulty, that a cascade is at least as difficult to break as the first component cipher." [Abstract]But while that may sound insightful, it just means that if the first cipher is weak, the cascade may be strong anyway, which is no different than the "Folk Theorem." And if the first cipher is strong, the result tells us nothing at all. (What really would be significant would be: "at least as weak as," but that is not what the article gives us.)
The implication seems to be that we should simply discard a useful rule of thumb that is almost always right, for a result claimed to be right which gives us nothing at all. But we would not do that even in theoretical mathematics! (See Method of Proof and Refutations.) Instead, we would seek the special-case conditions that make our statement false, and then integrate those into assumptions that support valid proof.
The example "ciphers" used in the "proof" are constructed so that each of two possible keys do produce different ciphertexts for two plaintexts, but not for two others. Then we assume that only two original plaintexts actually occur. In this case, for one cipher being "first," that key has no effect, and the resulting ciphertext is also not affected by the key in the second cipher. Yet if the ciphers are used in reverse order, both keys are effective.
However, knowing which cipher is "first" is only "important" when the first cipher results support attacks on the second cipher and not vise versa. But if we are allowed to create arbitrary weakness in examples, we probably can construct some that are mutually weak, in which case the question of which is "first" is clearly a non-issue, despite both the "result" and article title.
Both of the example ciphers are seriously flawed in that, for fully half of their plaintexts, changing the key does not change the ciphertext. Thus, for half their plaintexts, the example ciphers are essentially unkeyed. Since the example ciphers start out weak, I do not accept that either ordering has reduced the strength of the other cipher, and that is the main fear in using multiple encryption. Moreover, all we need to get a strong cascade is one more cipher, if it is strong. And that, of course, is the point of the "Folk Theorem."
The article has been used by some to sow "FUD" (fear, uncertainty and doubt) about multiple encryption. But multiple encryption in the form of product ciphering (in rounds and with related keys!) is a central part of most current block cipher designs. So before believing rumors of multiple encryption weakness, we might first ask why the block ciphers which use this technology seem to be trusted so well.
The first advantage of multiple encryption is to address the risk of the single point failure created by the use of a single cipher. Unfortunately, "risk of overall failure" seems to be a significantly different issue than the "keyspace size" or "needed known-plaintext" measures used in most related analysis. Unfortunately, it is in the nature of cryptography that the risk of cipher failure cannot be known by the cryptanalyst, designer, or user. The inability to know the risk is also the inability to quantify that risk, which leaves the analyst without an appropriate measure.
What we most seek from multiple encryption is redundancy to reduce risk of overall failure, a concept almost completely missing from academic analysis. The main issue is not whether multiple ciphers produce a stronger result when they all work. Instead the issue is overall security when one cipher is weak. For the single-cipher case a broken cipher means complete failure and loss of secrecy even if we are not informed. For the multi-cipher case to be a better choice, all the remaining ciphers need do is have any strength at all. One would expect, of course, that the remaining ciphers would be as strong as they ever were. And even if we assume that all the ciphers are weak, it is possible that their composition could still be stronger than the abject failure of a completely exposed single cipher.
Some examples of the literature:
"Here, in addition to formalizing the problem of chosenciphertext security for multiple encryption, we give simple, efficient, and generic constructions of multiple encryption schemes secure against chosen ciphertext attacks (based on any component schemes secure against such attacks) in the standard model."
--Dodis, Y. and J. Katz. 2005. Chosen-Ciphertext Security of Multiple Encryption. Second Theory of Cryptography Conference Proceedings. 188-209."We prove cascade of encryption schemes provide tolerance for indistinguishability under chosen ciphertext attacks, including a 'weak adaptive' variant."
"Most cryptographic functions do not have an unconditional proof of security. The classical method to establish security is by cryptanalysis i.e. accumulated evidence of failure of experts to find weaknesses in the function. However, cryptanalysis is an expensive, time-consuming and fallible process. In particular, since a seemingly-minor change in a cryptographic function may allow an attack which was previously impossible, cryptanalysis allows only validation of specific functions and development of engineering principles and attack methodologies and tools, but does not provide a solid theory for designing cryptographic functions. Indeed, it is impossible to predict the rate or impact of future cryptanalysis efforts; a mechanism which was attacked unsuccessfully for years may abruptly be broken by a new attack. Hence, it is desirable to design systems to be tolerant of cryptanalysis and vulnerabilities (including known trapdoors)."
"Maurer and Massey claimed that the proof in [EG85] 'holds only under the uninterestingly restrictive assumption that the enemy cannot exploit information about the plaintext statistics', but we disagree. We extend the proof of [EG85] and show that, as expected intuitively and in [EG85], keyed cascading provides tolerance to many confidentiality specifications, not only of block ciphers but also of other schemes such as public key and shared key cryptosystems. Our proof uses a strong notion of security under indistinguishability test--under plaintext only and non-adaptive chosen ciphertext attack (CCA1), as well as weak version of adaptive chosen ciphertext attack (wCCA2). On the other hand, we note that cascading does not provide tolerance for adaptive chosen ciphertext attack (CCA2), or if the length of the output is not a fixed function of the length of the input."
--Herzberg, A. 2004. On Tolerant Cryptographic Constructions. Presented in Cryptographer's Track, RSA Conference 2005."In a practical system, a message is often encrypted more than once by different encryptions, here called multiple encryption, to enhance its security." "Intuitively, a multiple encryption should remain 'secure', whenever there is one component cipher unbreakable in it. In NESSIE’s latest Portfolio of recommended cryptographic primitives (Feb. 2003), it is suggested to use multiple encryption with component ciphers based on different assumptions to acquire long term security. However, in this paper we show this needs careful discussion." "We give the first formal model regarding public key multiple encryption."
--Zhang, R., G. Honaoka, J. Shikata and H. Imai. 2004. On the Security of Multiple Encryption or CCA-security+CCA-security=CCA-security? 2004 International Workshop on Practice and Theory in Public Key Cryptography."We obtain the first proof that composition actually increases the security in some meaningful way."
--Aiello, W., M. Bellare, G. Di Crescenzo, R. Venkatesan. 1998. Security Amplification by Composition: The case of Doubly-Iterated, Ideal Ciphers. Advances in Cryptology--Crypto 98. 390-407. Springer-Verlag."We conjecture that operation modes should be designed around an underlying cryptosystem without any attempt to use intermediate data as feedback, or to mix the feedback into an interemediate round."
--Biham, E. 1994. Cryptanalysis of Multiple Modes of Operation. Journal of Cryptology. 11(1):45-58."Double encryption has been suggested to strengthen the Federal Data Encryption Standard (DES). A recent proposal suggests that using two 56-bit keys but enciphering 3 times (encrypt with a first key, decrypt with a second key, then encrypt with the first key again) increases security over simple double encryption. This paper shows that although either technique significantly improves security over single encryption, the new technique does not significantly increase security over simple double encryption.
--Merkle, R. and M. Hellman. 1981. On the Security of Multiple Encryption. Communications of the ACM. 24(7):465-467.
Multiple encryption can increase keyspace (as seen in Triple DES). But modern ciphers generally have enough keyspace, so adding more is not usually the looked-for advantage in using multiple encryption.
Multiple encryption reduces the consequences in the case that our favorite cipher is already broken and is continuously exposing our data without our knowledge. (See the comments on the John Walker spy ring in: security through obscurity.) When a cipher is broken (something we will not know), the use of other ciphers may represent the only security in the system. Since we cannot scientifically prove that any particular cipher is strong, the question is not whether subsequent ciphers are strong, but instead, what would make us believe that any particular cipher is so strong as to need no added protection.
Multiple encryption also protects each of the component ciphers from known plaintext attack. Since known plaintext completely exposes the ciphering transformation, it enables a wide range of attacks, and is likely to make almost any attack easier. Preventing known plaintext attacks has at least the potential to even make weak ciphers strong in practice.
With multiple encryption, the later ciphers work on the randomized "plaintext" produced as ciphertext by the earlier cipher. It can be extremely difficult to attack a cipher which only protects apparently random "plaintext," because it is necessary to at least find some structure in the plaintext to know that one has solved the cipher. See: Ideal Secrecy and unicity distance.
Most of the protocols used in modern communications are standardized. As a consequence, most people do not question the need for standardization in ciphers. But in this way ciphers are once again very different than the things we know so well: The inherent purpose of ciphers is to prevent interconnection to almost everyone (unless they have the right key).
Obviously, it is necessary to describe a cipher clearly and completely if it is to be properly implemented by different people. But standardizing on a single cipher seems more likely to help the opponent than the user (see NSA). With a single cipher, an opponent can concentrate resources on one target, and that target also has the most value since it protects most data. The result is vastly increased user risk.
The alternative to having a standard cipher is to have a standard cipher interface, and then select a desired cipher by textual name from a continually-increasing list of ciphers. In whatever way we now transfer keys, we could also transfer the name of the desired cipher, or even the actual cipher itself.
Multiple encryption can be dangerous if a single cipher is used with the same key each time. Some ciphers are involutions which both encipher and decipher with the same process; these ciphers will decipher a message if it is enciphered a second time under the same key. This is typical of classic additive synchronous stream ciphers, as it avoids the need to have separate encipher and decipher operations. But it also can occur with block ciphers operated in stream-cipher-like modes such as OFB, for exactly the same reason.
It is true that multiple encryption cannot be proven to improve security over having just a single cipher. That seems hardly surprising, however, since no single cipher can be proven to improve security over having no cipher at all. Indeed, using a broken cipher is far worse than no cipher, because then users will be mislead into not taking even ordinary precautions. And in real cryptography, users will not know when their cipher has been broken.
For more on this topic, see: superencryption; and the large sci.crypt discussions:
The many exhibits include:
Located about halfway between Washington, D.C. and Baltimore, MD, just off Rt. 295 (the Baltimore-Washington Parkway), the Museum is about a half-hour out of Washington: For example, take I-95 N., to Rt. 32 E. Then, just past (by under 1/10th of a mile) the cloverleaf intersection with Rt. 295, take the next exit (Canine Rd.), and follow the signs. Or take the Baltimore-Washington Parkway N., and exit just past the I-95 cloverleaf. The Museum is clearly marked on the Maryland page of the Mapsco or Mapquest Road Atlas 2005.
CAUTION: Due to new construction, the wooded terrain, and the
near proximity of Rt. 295, the sign indicating the Museum exit on
Rt. 32 is much too close to the turn-off.
So, after the Rt. 295 cloverleaf, end up in the right lane prepared
for a turn-off coming under a tenth of a mile later.
What does occur in reality is negative incremental or differential or dynamic resistance, where an increase in the voltage across an active device produces a decrease in current the device allows. That is the reverse of the normal resistance effect, and so is a "negative-like" region in the nevertheless overall positive effective resistance of the device. Some things which may have some amount of negative dynamic resistance include:
The analog electrical noise produced in semiconductor devices is typically classified as having three sources:
In cryptography, noise is often deliberately produced as a source of really random values. Such noise is normally the result of a collected multitude of independent tiny pulses, which is a white noise. We say that white noise contains all frequencies equally, which actually stretches both the meaning of frequency as a correlation over time, and the noise signal as a stationary source. In practice, the presence of low frequency components implies a time correlation which we would prefer to avoid, but which may be inherent in noise.
I have attacked noise correlation in two ways:
See my articles:
In general, a nonce is at least potentially dangerous, in that
it may represent a hidden channel.
In most nonce use, any random data value is as good as another,
and, indeed, that is usually the point.
However, by selecting particular values, nonce data could be
subverted and used to convey information about the key or plaintext.
Since any value should be as good as any other, the user and
equipment would never know about the subversion.
Of course, the same risk occurs in
message keys, and that does not
mean we do not use message keys or other nonces.
"The" normal distribution is in fact a family of distributions,
as parameterized by
mean and
standard deviation values. By computing
the sample mean and standard deviation, we can "normalize" the
whole family into a single curve. A value from any normal-like
distribution can be normalized by subtracting the mean then
dividing by the standard deviation; the result can be used to
look up probabilities in standard normal tables. All of which
of course assumes that the underlying distribution is in fact
normal, which may or may not be the case.
The NSA is the frequent topic of cryptographic speculation in the sense that they represent the opponent, bureaucratized. They have huge resources, massive experience, internal research and motivated teams of attackers. But since NSA is a secret organization, most of us cannot know what NSA can do, and there is little fact beyond mere speculation. But it is curious that various convenient conditions do exist, seemingly by coincidence, which would aid real cryptanalysis.
One situation convenient for NSA is that some particular cipher designs have been standardized. (This has occurred through NIST, supposedly with the help of NSA.) Although cipher standardization can be a legal requirement only for government use, in practice the standards are adopted by society at large. Cipher standardization is interesting because an organization which attacks ciphers presumably is aided by having few ciphers to attack, since that allows attack efforts to be concentrated on few targets.
When information is at risk, there is nothing odd about having an approved cipher. Normally, managers look at the options and make a decision. But NSA has secret ciphers for use by government departments and the military. They also change those ciphers far more frequently than the standardized designs.
Would a government agency risk tarnishing its reputation by knowingly approving a flawed cipher? Well, it is NIST, not NSA, that approves standard public ciphers. And if NSA neither designed nor approved those ciphers, exactly how could a flaw be considered a risk to them? Indeed, finding a flaw in a public design could expose the backwardness of academic development compared to the abilities of an organization which normally cannot discuss what it can do. That is not only not a risk, it could be the desired outcome.
Another situation which is convenient for NSA is that users are frequently encouraged to believe that their cipher has been proven strong by government acceptance. That is a reason to do nothing more, since what has been done is already good enough. Can we seriously imagine that NSA has a duty to tell us if they know that our standard cipher is weak? (That would expose their capabilities.)
Clearly, when only one cipher is used, and that cipher fails, all secrecy is lost. Thus, any single cipher is at risk of being a single point of failure. But, since risk analysis is a well known tool in other fields, it does seem odd that cryptography users are continually using a single cipher with no redundancy at all. The multiple encryption alternative simply is not used. The current situation is incredibly risky for users, yet oddly convenient for NSA.
The OTP or one time pad is commonly held up as the one example of an unbreakable cipher. Yet NSA has clearly described breaking the VENONA cipher, which used an OTP, during the Cold War. It is argued that VENONA was "poorly used," but if a user has no way to guarantee a cipher being "well used," there is no reason for a user to consider an OTP strong at all.
It does seem convenient for NSA that a potentially breakable
cipher continues to be described by crypto authorities as
absolutely "unbreakable."
Sometimes null characters are used to assure serial-line synchronization between data blocks or packets (see the ASCII character "NUL"). Sometimes null characters are used to provide a synchronized real-time delay when a transmitter has no data to send; this is sometimes called an "idle sequence." Similarly, block padding characters are sometimes considered "nulls."
A more aggressive use of nulls in
ciphering is to interleave nulls with
plaintext or
ciphertext data, in some way that the
nulls later can be removed.
When nulls are distinguished by position, they can have
random or even cleverly-selected values,
and thus improve plaintext or ciphertext
statistics, when desired.
And if the nulls can be removed only if one has a correct
key, nulls can constitute another
layer of ciphering beyond an existing
cipher.
Of course, adding nulls does
expand ciphertext.
(Also see
multiple encryption,
transposition and
braid.)
The
p-value computation for a particular
statistic typically tells us the probability of getting any
particular statistic value or less (or more) in the null
distribution.
Then, if we repeatedly find very unusual statistic values,
we can conclude either that our sampling has been very lucky, or
that the statistic is not reproducing the null distribution.
That would mean that we were not sampling innocuous data, and
so could reject the
null hypothesis.
This is the "something unusual found" situation.
Normally, the null hypothesis is just the statistical conclusion drawn when the pattern being tested for is not found. Normally, the null hypothesis cannot be proven or established by experiment, but can only be disproven, and statistics can only do that with some probability of error which is called the significance.
A statistical experiment typically uses random sampling or random values to probe a universe under test. Those samples are then processed by a test statistic and accumulated into a distribution. Good statistical tests are intended to produce extreme statistic values upon finding the tested-for patterns.
Sometimes it is thought that extreme statistic values are an indication that the tested-for pattern is present. Alas, reality is not that simple. Random sampling generally can produce any possible statistic value even when no pattern is present. There are no statistic values which only occur when the tested-for pattern is detected. However, some statistic values are extreme and only occur rarely when there is no underlying pattern. To distinguish patterns from non-patterns, it is necessary to know how often a particular statistic result value would occur with unpatterned data.
The collection of statistic values we find or expect from data
having no pattern is the
null distribution.
Typically this distribution will have the shape of a hill or bell,
showing that intermediate statistic values are frequent while
extreme statistic values are rare.
To know just how rare the extreme values are, other statistic
computations "flatten" the distribution by converting statistic
values into probabilities or
The probability that an extreme statistic value will occur when no pattern is present is also the probability of a Type I error which is usually a "false positive." Type I errors are a consequence of the randomness required by sampling, or a consequence of random values, and cannot be eliminated. Normally, random values are expected to produce sequences without pattern. Again, reality is not like that. Instead, over huge numbers of sequences, random values must produce every possible sequence, including every possible "pattern." Usually the statistical test is looking for a particular class of pattern which may not correspond to what we expect.
[When every sequence is possible, the probability of finding a "pattern" depends strongly on what we interpret as a pattern. It might be possible to get some quantification of pattern-ness with a measure like Kolmogorov-Chaitin complexity (the length of the shortest program to produce the sequence). But K-C complexity testing may have its own bias, and in any case there is no algorithm to find the shortest program.]
Any particular statistic value can be used to separate "probably found" from "probably not found." Typically, scientists will use a "significance" of 95 percent or 99 percent, but that is not the probability that the hoped-for "something unusual" signal has been found. Instead, the complement of the significance (usually 5 percent or 1 percent) generally is the probability that a statistic value so high or greater occurring from null data having no patterns. With a 95 percent significance, null data will produce results that falsely reject the null hypothesis in 5 trials out of 100. By increasing the significance, the probably that null data will produce an extreme result value is decreased, but never zero. When multiple trials show that the statistic measurements do not follow the null distribution, "something unusual" has been found, and the null hypothesis is rejected.
Randomness testing is a special case because there we hope to find the null distribution. Success in randomness testing generally means being forced to accept the null hypothesis, which is opposite to most statistical experiment discussions. Statistical test programs in the classic mold seem to say that some statistic extremes mean "a pattern is probably found," which would be "bad" for a random generator. Again, that is not how reality works. In most cases, a random number generator (RNG) is expected to produce the null distribution. Extreme statistic values are not only expected, they are absolutely required. An RNG which produces only "good" statistical results is bad.
If we check random data which has no detectable pattern, we
might expect the null hypothesis to be always rejected, but that
is not what happens.
In tests at a
Normally, the alternative hypothesis or research hypothesis H1 includes the particular signal we test for in the randomly-sampled data, but also includes any result other than that specified by the null hypothesis. It is, therefore, more like "something unusual found" than evidence of the particular result we seek. When the tested-for pattern seems to have been found, the null hypothesis is rejected, although the result could be due to a flawed experiment, or even mere chance (see significance). This range of things that may cause rejection is a motive for also running control trials which do not have the looked-for signal. Also see: randomness testing and scientific method.
A common approach is to formulate the null hypothesis to expect no effect, as in: "this drug has no effect." Then, finding something unexpected causes the null hypothesis to be rejected, with the intended meaning being that the drug "has some effect." However, many statistical tests (such as goodness-of-fit tests) can only indicate whether a distribution matches what we expect, or not. When the expectation is the known null distribution, then what we expect is nothing, which makes the "unusual" stand out. But in that case, even a poorly-conducted or fundamentally flawed experiment could produce a "something unusual found" result. Simply finding something unusual in a statistical distribution does not imply the presence of a particular quality. Instead of being able to confirm a model in quantitative detail, this formulation may react to testing error as a detectable signal.
Even in the best possible situation, random sampling will produce a range or distribution of test statistic values. Often, even the worst possible statistic value can be produced by an unlucky sampling of the best possible data. It is thus important to compare the distribution of the statistic values, instead of relying on a particular result. It is also important to know the null distribution so we can make the comparison. If we find a different distribution of statistic values, that will be evidence supporting the alternative or research hypothesis H1.
When testing data which has no underlying pattern, if we collect enough statistic values, we should see them occur in the null distribution for that particular statistic. So if we call the upper 5 percent of the distribution "failure" (this is a common scientific significance level) we not only expect but in fact require such "failure" to occur about 1 time in 20. If it does not, we will in fact have detected something unusual in a larger sense, something which might even indicate problems in the experimental design.
If we have only a small number of samples, and do not run repeated trials, a relatively few chance events can produce an improbable statistic value even in the absence of a real pattern That might cause us to reject a valid null hypothesis, and so commit a Type I error.
When we see "success" in a very common distribution, we can expect that success will be very common. A system does not have to be all that complex to produce results which just seem to have no pattern, and when no pattern is detected, we seem to have the null distribution. Finding the null distribution is not evidence of a lack of pattern, but merely the failure to find a pattern. And since that pattern may exist only in part, even the best of tests may give only weak indications which may be masked by sampling, thus leading to a Type II error. To avoid that we can run many trials, of which only a few should mask any particular indication. Of course, a weak indication may be difficult to distinguish from sampling variations anyway, unless larger trials are used. But there would seem to be no limit to the size of trials one might use.
Bellare and Rogaway, 1994. Optimal Asymmetric Encryption. Advances in CryptologyAn encoding for RSA.-- Eurocrypt '94. 92-111.
Note that the simplest model is not necessarily right: Many simple models are eventually replaced by more complex models. Nor does Science expect practitioners to defer to a particular model just because it has been published: The issue is the quality of the argument in the publication, and not the simple fact of publication itself.
A recommendation for scientists might be:
"When you have multiple theories which predict exactly the same
known facts, assume the simpler theory until it clearly does not
apply."
That tells us to test the simple model first, and to choose a
more complex model if the simple one is insufficient. Also see:
scientific method.
Somewhat easier to learn than
hexadecimal, since no new numeric
symbols are needed, but octal can only represent three
bits at a
time. This generally means that the leading digit will not take all
values, and that means that the representation of the top part of
two concatenated values will differ from its representation alone,
which can be confusing. Also see:
binary and
decimal.
E = IR, orI = E/R, orR = E/I .
Stories are almost universal in human society. Most stories are about someone doing something, and how that is going, or how it turned out. To a large extent, stories and gossip are how we learn to interact with the world around us. Because story-listening is so common, humans may be genetically oriented toward stories. Perhaps, before the invention of writing, valuable past experiences lived on in stories, and those who listened tended to live longer and/or better, and reproduce more.
But even if evolution has put gossip in our genes, it does not seem to have worried very much about the distinction between fantasy and reality. I suppose that would be a lot to ask of mere evolution. But, even in modern technology, many plausible-sounding stories are just not right, yet people accept them anyway.
Normally, Science handles this by testing the gossip-model against reality. But cryptography has precious little reality to test against. Accordingly, we see the accumulation of "old wives' tales" which are at best rules of thumb and at worst flat-out wrong. Yet these stories apparently are so ingrained in the myth of the field that mere rationality is insufficient to stop their progression (see cognitive dissonance).
Examples include:
Ideally, "one-sided tests" are statistic computations sensitive to variations on only one side of the reference. The two "sides" are not the two ends of a statistic distribution, but instead are the two directions that sampled values may differ from the reference (i.e., above and below).
When comparing distributions, low p-values almost always mean that the sampled distribution is unusually close to the reference.
On the other hand, the meaning of high p-values depends
on the test.
Some "one-sided" tests may concentrate on sampled values
above the reference distribution, whereas different
"one sided" tests may be concerned with sampled values
below the reference.
If we want to expose deviations both above and below
the reference, we can use two appropriate "one-sided"
tests, or a
two-sided test intended to expose
both differences.
Fundamentally a way to interpret a statistical result. Any statistic can be evaluated at both tails of its distribution, because no distribution has just one tail. The question is not whether a test distribution has two "tails," but instead what the two tails mean.
When comparing distributions, finding repeated p-values near 0.0 generally mean that the distributions seem too similar, which could indicate some sort of problem with the experiment.
On the other hand, the meaning of repeated p-values near 1.0 depends on the test. Some one-sided tests may concentrate on sampled values above the reference distribution, whereas different "one sided" tests may be concerned with sampled values below the reference. If we want to expose deviations both above and below the reference, we can use two appropriate "one-sided" tests, or a two-sided test intended to expose differences in both directions.
Some texts argue that one-tailed tests are almost always inappropriate, because they start out assuming something that statistics can check, namely that the statistic exposes the only important quality. If that assumption is wrong, the results cannot be trusted.
There is also an issue that the significance level is confusingly different (about twice the size) in one-tailed tests than it is in two-tailed tests, since two-tailed tests accumulate rejection from both ends of the null distribution.
However, sometimes one-tailed tests seem clearly more appropriate than the altermative, for example:
Also see my comments from various OTP discussions locally, or @: http://www.ciphersbyritter.com/NEWS2/OTPCMTS.HTM
Despite the "one-time" name, the most important OTP requirement is not that the keying sequence be used only once, but that the keying sequence be unpredictable. Clearly, if the keying sequence can be predicted, the OTP is broken, independent of whether the sequence was re-used or not. Sequence re-use is thus just one of the many forms of predictability. Indeed, we would imagine that the extent of the inability to predict the keying sequence is the amount of strength in the OTP. And the OTP name is just another misleading cryptographic term of art.
The one time pad sometimes seems to have yet another level of strength above the usual stream cipher, the ever-increasing amount of unpredictability or entropy in the confusion sequence, leading to an unbounded unicity distance and perhaps, ultimately, Shannon Perfect Secrecy. Clearly, if the confusion sequence is in fact an arbitrary selection among all possible and equally-probable strings of that length, the system would be Perfectly Secret to the extent of hiding which message of the given length was intended (though not the length itself). But that assumes a quality of sequence generation which we cannot prove but can only assert. So that is a just another scientific model which does not sufficiently correspond to reality to predict the real outcome.
In a realized one time pad, the confusion sequence itself must be random for, if not, it will be somewhat predictable. And, although we have a great many statistical randomness tests, there is no test which can certify a sequence as either random or unpredictable. Indeed, a random selection among all possible strings of a given length must include even the worst possible patterns that we could hope to find (e.g., "all zeros"). So a sequence which passes our tests and which we thus assume to be random may not in fact be the unpredictable sequence we need, and we can never know for sure. (That could be considered an argument for using a combiner with strength, such as a Latin square, Dynamic Substitution or Dynamic Transposition.) In practice, the much touted "mathematically proven unbreakability" of the one time pad depends upon an assumption of randomness and unpredictability which we can neither test nor prove.
In a realized one time pad, the confusion sequence must be transported to the far end and held at both locations in absolute secrecy like any other secret key. But where a normal secret key might range perhaps from 16 bytes to 160 bytes, there must be as much OTP sequence as there will be data (which might well be megabytes or even gigabytes). And whereas a normal secret key could itself be sent under a key (as in a message key or under a public key), an OTP sequence cannot be sent under a key, since that would make the OTP as weak as the key, in which case we might as well use a normal cipher. All this implies very significant inconveniences, costs, and risks, well beyond what one would at first expect, so even the realized one time pad is generally considered impractical, except in very special situations.
There are some cases in which an OTP can make sense, at least when compared to using nothing at all. One advantage of any cipher is the ability to distribute key material instead of plaintext. Whereas plaintext lost in transport could mean exposure, key material lost in transport would not affect security. That allows key material to be securely transported at an advantageous time and accumulated for later use. Of course it also requires that key material transport be successfully completed before use. And the existence of a key material repository allows the repository to be targeted for attack immediately, before secure message transport is even needed.
A realized one time pad requires a confusion sequence which is as long as the data. However, since this amount of keying material can be awkward to transfer and keep, we often see "pseudo" one-time pad designs which attempt to correct this deficiency. Normally, the intent is to achieve the theoretical advantages of a one-time pad without the costs, but unfortunately, the OTP theory of strength no longer applies. Actual random number generators typically produce their sequence from values held in a fixed amount of internal state. But when the generated sequence exceeds that internal state, only a subset of all possible sequences can be produced. RNG sequences are thus not random in the sense of being an arbitrary selection among all possible and equally-probable strings, no matter how statistically random the individual values may appear. Of course it is also possible for unsuspected and exploitable correlations to occur in the sequence from a really random generator whose values also seem statistically quite random. Accordingly, generator ciphers are best seen as classic stream cipher designs.
Nor does even a theoretical one time pad imply unconditional security: Consider A sending the same message to B and C, using, of course, two different pads. Now, suppose the opponents can acquire plaintext from B and intercept the ciphertext to C. If the system is using the usual additive combiner, the opponents can reconstruct the pad between A and C. Now they can send C any message they want, and encipher it under the correct pad. And C will never question such a message, since everyone knows that a one time pad provides "absolute" security as long as the pad is kept secure. Note that both A and C have done this, and they are the only ones who had that pad.
Even the theoretical one time pad fails to hide message length, and so does leak some information about the message.
In real life, theory and practice often differ. The main problem in applying theoretical proof to practice is the requirement to guarantee that each and every assumption in the proof absolutely does exist in the target reality. The main requirement of the OTP is that the pad sequence be unpredictable. Unfortunately, unpredictability is not a measurable quantity. Nobody can know that an OTP sequence is unpredictable. Users cannot test a claim of unpredictability on the sequences they have. The OTP thus requires the user to trust the pad manufacturers to deliver unpredictability when even manufacturers cannot measure or guarantee that. Any mathematical proof which requires things that cannot be guaranteed in practice is not going to be very helpful to a real user. (Also see the longer discussion at proof.)
The inability to guarantee unpredictability in practice should be a lesson in the practical worth of mathematical cryptography. Theoretical math feels free to assume a property for use in proof, even if that property clearly cannot be guaranteed in practice. In this respect, theoretical math proofs often deceive more than they inform, and that is not a proud role for math.
At least two professional, fielded systems which include OTP ciphering have been broken in practice by the NSA. The most famous is VENONA, which has its own pages at http://www.nsa.gov/docs/venona/. VENONA traffic occurred between the Russian KGB or GRU and their agents in the United States from 1939 to 1946. A different OTP system break apparently was described in: "The American Solution of a German One-Time-Pad Cryptographic System," Cryptologia XXIV(4): 324-332. These were real, life-and-death OTP systems, and one consequence of the security failure caused by VENONA was the death by execution of Julius and Ethel Rosenberg. Stronger testimony can scarcely exist about the potential weakness of OTP systems. And these two systems are just the ones NSA has told us about.
Apparently VENONA was exposed by predictable patterns in the key and by key re-use. At this point, OTP defenders typically respond by saying: "Then it wasn't an OTP!" But that is the logic fallacy of circular reasoning and tells us nothing new: What we want is to know whether or not a cipher is secure before we find out that it was broken by our opponents (especially since we may never find out)! Simply assuming security is what cryptography always does, and then we may be surprised when we find there was no security after all, but we expect much more from a security proof! We expect a proof to provide a guarantee which has no possibility of a different outcome; we demand that there be zero possibility of surprise weakness from a system which is mathematically proven secure in practice. Surely the VENONA OTP looked like an OTP to the agents involved, and what can "proven secure" possibly mean if the user can reasonably wonder whether or not the "proven" system really is secure?
Various companies offer one time pad programs, and sometimes also the keying or "pad" material. But random values sent on the Internet (as plaintext or even as ciphertext) are of course unsuitable for OTP use, since we would hope it would be easier for an opponent to expose those values than to attack the OTP.
Typically, the "pad" is one of a matched pair of small booklets of thin paper sheets holding random decimal digits, where each digit is to be used for encryption at most once. When done, that sheet is destroyed. The intent is that only the copy in the one remaining booklet (presumably in a safe place) could possibly decrypt the message.
In hand usage, a codebook is used to convert message plaintext to decimal numbers. Then each code digit is added without carry to the next random digit from the booklet and the result is numerical ciphertext. In the past, a public code would have been used to convert the resulting values into letters for cheaper telegraph transmission.
Based on
theoretical results, the
practical one time pad is widely thought
to be "unbreakable," a claim which is false, or at least only
conditionally true (see the NSA VENONA practical
successes above).
For other examples of failure in the current cryptographic wisdom,
see
AES,
BB&S,
DES,
proof and, of course,
old wives' tale.
Many academic sources say that a "one way" hash must make it difficult or impossible to create a particular result value. That would be an important property for authentication, since, when an opponent can easily create a particular hash value, an invalid message can be made to masquerade as real. But it is not clear that we can guarantee that property any more than we can guarantee cipher strength. (Also see cryptographic hash and MAC.)
In contrast, many other uses of hash functions in cryptography do not need the academic "one way" property, including:
+----------+ +----------+ | | ONTO | | | X | | Y = f(X) | | | f | | | | ---> | | +----------+ +----------+
It can be argued that block cipher operating modes are
stream "meta-ciphers" in which the
basic transformation is a full block, instead of
the usual
bit or
byte.
The schematic symbol for an op amp is a triangle pointing right, with the two inputs at the left and the output on the right. Power connections come out the top and bottom, but are often simply assumed. Power is always required, but is not particularly innovative, and can obscure the crucial feedback path from OUT to -IN.
+PWR
| \ |
| \
-IN --| - \
| >-- OUT
+IN --| + /
| /
| / |
-PWR
Op amps were originally used to compute mathematical functions in analog computers, where each amplifier was an "operation."
In the usual voltage-mode idealization, each input is imagined to have an infinite impedance and the output has zero impedance (here the ideal output is a voltage source, unaffected by loading). In the the rarer current-mode form, each input is imagined to have zero impedance and the output has infinite impedance (that is, the output is a current source). Some current-feedback op amps for RF use have a low-impedance voltage output and low-impedance current inputs. Of course, no real device has anything like infinite gain, although op amp gain can be extremely high at DC.
One important feature of an op amp is stability (as in lack of spurious oscillation; see discussion in amplifier). Op amp transistors have substantial gain at RF frequencies, and unexpected coupling between input and output can produce RF oscillations. Unfortunately, these may be beyond the frequency range that a modest oscilloscope can detect, with the main indication being that the device gets unreasonably hot.
In the early days of IC op amps, the designer was expected to
produce a feedback network for each circuit that included
stability compensation to prevent oscillation.
Nowadays, the IC manufacturer generally buys stability by rolling
off the frequency response at the usual
RC rate of
The usual cures for instability include first isolating the power supply, since that will go everywhere:
One of the main advantages of op amps is an ability to precisely set gain with resistors and negative feedback. In an environment where the available devices have wildly different gain values, the ability to set gain precisely over all production devices is a luxury. If the feedback is purely resistive, and thus relatively insensitive to different frequencies, an op amp can be given a wide, flat frequency response even though the open-loop response typically droops by 6dB/octave (20dB/decade). By using reactive components (typically capacitors) in the feedback loop, the frequency response can be tailored as desired. Moreover, in general, whatever gain is available beyond that specifically programmed acts to minimize distortion. For example, if we want a gain of 20 decibels (20dB or 10x) at 20kHz, we probably want an op amp to have 40dB (100x) or more gain at 20kHz, so that 20dB remains to reduce amplifier distortion.
Operational amplifiers typically roll off high frequency gain at around 20dB/decade for stability. With that roll-off slope, the numerical product of gain and frequency is approximately constant in the roll-off region. A good way to describe this might have been "gain-frequency product," but the phrase actually used is "gain-bandwidth product" or GBW. The GBW is the frequency at which gain = 1, which is way beyond the useful region, since op amps are supposed to have "infinite" gain. (In practice, GBW varies with supply voltage, load, and measurement frequency, not to mention faster than expected rolloff in different designs, so the relation is approximate). To get 40dB (100x) of open-loop gain at 20kHz, we will need a minimum GBW of about 100x 20kHz or 2MHz. Exactly the same computation is used in bipolar transistors, where GBW is known as the "transition frequency," or fT.
In most cases, the positive op amp input is not part of the feedback system, and so has the normal high impedance expected of an op amp input. However, the negative op amp input almost always is part of the feedback system, which changes the apparent input impedance. High amounts of negative feedback act to keep the negative input at almost the same voltage as the positive input. If the positive input is essentially ground, the negative input is forced by feedback also to be essentially ground, often described as a virtual ground.
In most cases, external signals will see the negative input as a low-impedance ground, and this happens because of feedback, not op amp input impedance. Circuits which require inversion and so use the (-IN) input may:
Most op amp circuits show bipolar (that is, both positive and negative) power supplies referenced to a center ground. But few if any op amps have a ground pin, so they see only a single power circuit across the device whether bipolar supplies are used or not. The problem is that op amps have to be biased just like transistors: their output needs to rest between supply and ground or it will not be possible to represent both positive and negative signals. Even op amps with rail-to-rail input and output ranges cannot reproduce a negative voltage when operating on a single positive supply. Conventional op amps with a limited input voltage range may demand that the bias level be near half the supply. Often we need a low-noise, low-hum and sometimes even high-power voltage reference, typically at about half the supply voltage.
The usual way to get an intermediate voltage in a single-supply system is to use two similar resistors in series from power to ground. Unfortunately, this means that noise on the power lines will just be divided by two and then used on the input side of which could be a high-gain circuit. And the resistors will add thermal noise, and possibly resistor excess noise. We can reduce supply hum and noise by splitting the upper resistance into two and adding a serious capacitor to ground there. Another capacitor from the lower resistor to ground will act to filter out high frequency signals from Johnson white noise. Since most noise power is in high frequencies, removing those frequencies can reduce the effective noise level.
In contrast to Johnson noise,
resistor excess noise is a
1/f noise and is highest at low
frequencies, and power filtering may have little effect.
Non-homogenous resistors will generate
Another bias alternative for single-supply operation is to generate some fixed moderate voltage level above ground instead dividing Vcc. In contrast to the usual voltage reference, a bias typically does not have to be an exact voltage, and can vary somewhat with time and temperature. For example, if powe