Monday, June 3, 2019

Statistical techniques for cryptanalysis

Introduction

Cryptography is the art of writing messages in code or cipher in order to disguise, and thereby secure, the content of a particular stream of text. Once encrypted, a plaintext message can be revealed only through the use of the key that was used to encode it. Cryptography does not mask the existence of the message, but it does disguise its content [1]. In contrast, cryptanalysis is the art of recovering the plaintext of a message without access to the key. Successful cryptanalysis may recover the plaintext or the key for a specific ciphertext [2].

There are five general types of cryptanalytic attacks:

1. Ciphertext-only attack: The cryptanalyst has a series of ciphertexts encrypted using the same encryption algorithm, and tries to deduce the plaintext of each ciphertext or to identify the key used to encrypt them.

2. Known-plaintext attack: The cryptanalyst has a series of ciphertexts and their corresponding plaintexts, encrypted using a specific key, and tries to deduce the key by finding a relationship between the ciphertext and plaintext entries.

3. Chosen-plaintext attack: The cryptanalyst not only has access to the ciphertext and associated plaintext for several messages, but also chooses the plaintext that gets encrypted. The job is to deduce the key used to encrypt the messages, or an algorithm to decrypt any new messages encrypted with the same key.

4. Frequency analysis: This is the study of the frequency of letters or groups of letters in a ciphertext. The method is used as an aid to breaking classical ciphers. It is based on the fact that, in any given stretch of written language, certain letters and combinations of letters occur with varying frequencies.

5. Rubber-hose cryptanalysis: The cryptanalyst threatens, tortures or blackmails the person who has the key until they give it up.

Among the many cryptanalytic techniques, frequency analysis (or frequency counting) is the most basic one applied to break substitution ciphers. Its basic use is to first count the frequency of ciphertext letters and then associate guessed plaintext letters with them. More complex statistics can also be used, such as counts of pairs of letters (digrams), triples of letters (trigrams), and so on, which give the cryptanalyst more information. The technique exploits the weakness of a substitution cipher that a given plaintext letter is always encrypted to the same ciphertext letter. Frequency analysis was used to break ciphers based on the traditional cryptographic algorithms, but it does not work well against modern cryptographic algorithms based on block ciphers.

Statistical properties of English

Frequency analysis relies on the fact that natural language is not random and that single-alphabet substitution does not hide the statistical properties of the language. When a message is encrypted with a monoalphabetic substitution, a useful first step in deciphering it is to get a frequency count of all the letters. The most frequent ciphertext letter may represent E, the most common letter in English, followed by T, A, O and I, whereas the least frequent letters are Q, Z and X [7].

Statistical patterns in a language can be detected by tracing the redundancy of text in that language. Various universal regularities have been found to characterize text from different domains and languages. The best known is Zipf's law on the distribution of word frequencies [5], according to which the frequency of a term in a collection is inversely proportional to its rank.
Zipf's law has been found to apply to collections of written documents in virtually all languages [5]. English characters have a very high redundancy rate when used for cryptographic substitutions. If a message encrypted with a substitution cipher needs to be cracked, we can use frequency analysis. In other words, if the sender has used an encryption scheme that replaces each English letter with another English letter, we can still recognize the original plaintext, because the frequency characteristics of the plaintext are passed on to the ciphertext characters [4].

To apply frequency analysis, we need to know the frequency of every letter in the English alphabet, or the frequency characteristics of whatever language the sender used. For example, the letter E accounts for 12.7% of all letters in English, whereas Z accounts for about 0.1%.

[Table and plot of average English letter frequencies: not reproduced here.]

For example, consider the sentence "We study Cryptography as part of our course". Using a simple substitution cipher with the mapping a->c, b->d, c->e, ..., w->y, x->z, y->a, z->b, the ciphertext becomes "yg uvwfa etarvqitcrja cu rctv qh qwt eqwtug". A simple frequency analysis can then be carried out on the ciphertext.

[Table of letter frequencies in the ciphertext: not reproduced here.]

A cryptanalyst can use such counts to identify the key or the plaintext, by applying trial substitutions to the ciphertext until a suitable plaintext is found.

Apart from single-letter frequency analysis, cryptanalysts also look at the frequency of pairs of letters, known as digram frequencies, and of three-letter sequences, known as trigram frequencies.
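The worked example above can be reproduced with a short Python sketch. It is only an illustration: the helper names `shift_encrypt`, `letter_frequencies` and `ngram_counts` are hypothetical, introduced here rather than taken from the original post, and the sketch assumes plain ASCII letters.

```python
from collections import Counter
import string

ALPHABET = string.ascii_lowercase

def shift_encrypt(plaintext, shift):
    """Shift every letter by `shift` places (a->c, b->d, ... for shift=2)."""
    out = []
    for ch in plaintext.lower():
        if ch in ALPHABET:
            out.append(ALPHABET[(ALPHABET.index(ch) + shift) % 26])
        else:
            out.append(ch)  # leave spaces and punctuation untouched
    return "".join(out)

def letter_frequencies(text):
    """Relative frequency of each letter, most frequent first."""
    letters = [ch for ch in text.lower() if ch in ALPHABET]
    counts = Counter(letters)
    return {ch: n / len(letters) for ch, n in counts.most_common()}

def ngram_counts(text, n):
    """Count n-letter sequences inside words (digrams for n=2, trigrams for n=3)."""
    counts = Counter()
    for word in text.lower().split():
        word = "".join(ch for ch in word if ch in ALPHABET)
        for i in range(len(word) - n + 1):
            counts[word[i:i + n]] += 1
    return counts

ciphertext = shift_encrypt("We study Cryptography as part of our course", 2)
print(ciphertext)  # yg uvwfa etarvqitcrja cu rctv qh qwt eqwtug

freqs = letter_frequencies(ciphertext)
for letter, share in list(freqs.items())[:5]:
    print(letter, round(share, 3))

print(ngram_counts(ciphertext, 2).most_common(3))
```

In this 36-letter sample the most frequent ciphertext letter is t, which stands for plaintext r rather than e. On very short texts the counts are therefore only a starting point for guessing, not a direct readout of the key.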
Digram and trigram frequencies help the cryptanalyst exploit the redundant features of the English language to break a cipher.

Table 1: Digram and Trigram Frequencies [6]

The most common digrams (in order): th, he, in, en, nt, re, er, an, ti, es, on, at, se, nd, or, ar, al, te, co, de, to, ra, et, ed, it, sa, em, ro.

The most common trigrams (in order): the, and, tha, ent, ing, ion, tio, for, nde, has, nce, edt, tis, oft, sth, men.

These lists help in identifying the most commonly used terms in English. The digram frequencies are used to break two-letter words such as an, to and of, and the trigram frequencies are used to break three-letter words such as the, are and for. After a significant number of two-letter and three-letter words have been broken, it is practically easy to identify the key by matching the recovered plaintext letters against the corresponding ciphertext letters. This redundancy in English is what allows ciphertexts to be broken when they were produced by simple algorithms that operate on the English alphabet.

In practice, frequency analysis consists of first counting the frequency of ciphertext letters and then assigning guessed plaintext letters to them. Many letters occur with roughly the same frequency, so a given ciphertext letter X may map onto R, but could equally map onto G or M. Some letters, however, occur much more frequently than others in every alphabetic language: if there are more Xs in the ciphertext than anything else, it is a good guess for English plaintext that X is a substitution for E. But T and A are also very common in English text, so X might be either of those instead [4]. The cryptanalyst may therefore need to try several combinations of mappings between ciphertext and plaintext letters. Once the common single-letter frequencies have been resolved, paired patterns and other patterns are solved. Finally, when sufficient characters have been cracked, the rest of the text can be recovered by simple substitution.
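The "count, then guess" step described above can be sketched as follows. This is a minimal illustration, not a full solver: `first_guess_mapping` and `apply_mapping` are hypothetical helper names, and the ordering in `ENGLISH_BY_FREQUENCY` is one commonly quoted ranking of English letters, assumed here for the example.

```python
from collections import Counter
import string

# Assumption: one commonly quoted ranking of English letters, most frequent first.
ENGLISH_BY_FREQUENCY = "ETAOINSHRDLCUMWFGYPBVKJXQZ"

def first_guess_mapping(ciphertext):
    """Pair ciphertext letters, ranked by count, with English letters ranked by frequency."""
    counts = Counter(ch for ch in ciphertext.upper() if ch in string.ascii_uppercase)
    ranked = [letter for letter, _ in counts.most_common()]
    return dict(zip(ranked, ENGLISH_BY_FREQUENCY))

def apply_mapping(ciphertext, mapping):
    """Substitute guessed plaintext letters; anything unmapped is left as it is."""
    return "".join(mapping.get(ch, ch) for ch in ciphertext.upper())

guess = first_guess_mapping("XXXX QQQ ZZ")
print(guess)                                # {'X': 'E', 'Q': 'T', 'Z': 'A'}
print(apply_mapping("XXXX QQQ ZZ", guess))  # EEEE TTT AA
```

The result is only a first guess. As the text notes, letters with similar frequencies are easily confused, so a real attack would swap pairs in the mapping and check the output against digram and trigram patterns.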
Frequency analysis is highly effective against the simpler substitution ciphers and will break astonishingly short ciphertexts with ease.

Attacks on Traditional algorithms

Traditional encryption algorithms have been defenseless against cryptanalytic attacks because they encrypt character by character, which can easily be broken using frequency analysis.

1. Caesar Cipher

The Caesar cipher, one of the oldest ciphers, replaces each letter of the plaintext with another letter to produce the ciphertext, and any particular plaintext letter always turns into the same ciphertext letter at every occurrence. For instance, all Bs will turn into Fs. Frequency analysis is based on the fact that certain letters, and combinations of letters, appear with characteristic frequency in essentially all texts in a particular language [9]. In English, for instance, E is very common, while X is not. Likewise, ST, NG, TH and QU are common combinations, while XT, NZ and QJ are very uncommon or do not occur at all. This shows how the Caesar cipher can be broken with ease just by identifying the frequency of each letter in the ciphertext. A message encrypted using the Caesar cipher is also extremely insecure because an exhaustive search over the keys easily breaks the code.

2. Substitution Ciphers

The Caesar cipher is one member of the larger family of substitution ciphers. In a general substitution cipher, the key is a permutation of all twenty-six characters of the English alphabet. Rather than choosing a particular key for all encryption processes, a different key is used for successive encryption processes. This increases the number of possible keys to 26!, which is about 4 × 10^26, and rules out an exhaustive search of the keyspace [7].
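The contrast between the two keyspaces can be made concrete with a small sketch. It assumes a plain alphabetic shift for the Caesar cipher and reuses the example ciphertext from earlier in the post; `shift_decrypt` and `caesar_candidates` are hypothetical helper names.

```python
import math
import string

ALPHABET = string.ascii_lowercase

def shift_decrypt(ciphertext, shift):
    """Undo a Caesar shift of `shift` places; non-letters pass through unchanged."""
    out = []
    for ch in ciphertext.lower():
        if ch in ALPHABET:
            out.append(ALPHABET[(ALPHABET.index(ch) - shift) % 26])
        else:
            out.append(ch)
    return "".join(out)

def caesar_candidates(ciphertext):
    """Exhaustive search: every possible shift and the text it would produce."""
    return {shift: shift_decrypt(ciphertext, shift) for shift in range(26)}

candidates = caesar_candidates("yg uvwfa etarvqitcrja cu rctv qh qwt eqwtug")
for shift, text in candidates.items():
    print(shift, text)  # the line for shift 2 is readable English

# A general substitution cipher has 26! possible keys instead of 26 shifts.
substitution_keys = math.factorial(26)
print(f"{substitution_keys:.2e}")  # 4.03e+26
```

Scanning 26 candidate lines by eye is trivial, whereas enumerating 26! permutations is not, which is why the general substitution cipher has to be attacked statistically rather than by brute force.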
To decrypt a substitution cipher, the statistical frequency distribution of single letters in English is analyzed first. Then the digram and trigram frequencies of standard English are compared with the frequencies of the digrams and trigrams in the ciphertext, in order to reconstruct the key and in turn decipher the text. This is an efficient way to break the substitution cipher because each plaintext letter is represented by the same ciphertext letter throughout the message, so all the statistical properties of the plaintext are carried over to the ciphertext.

3. Vigenere Cipher

A Vigenere cipher offers greater security because a given plaintext letter is not always represented by the same ciphertext letter. This is achieved by using a sequence of n different substitution ciphers to encrypt a message, which increases the number of possible keys from 26 to 26^n. Although the cipher was once considered unbreakable, Kasiski's method of attacking it successfully decrypts messages.

In this method, the first step is to find the key length n. Identical segments of plaintext are encrypted to the same ciphertext when they are b positions apart, where b ≡ 0 (mod n). According to Kasiski, the next step is to find all the identical ciphertext segments of length three or more and record the distances between them [7]. These distances can then be used to predict the key length n. Once the length is known, the key is found by an exhaustive search over the possible keys of that length: the ciphertext is split into substrings according to the candidate value of n, and the plaintext can then be identified by substituting each candidate key back into the cipher [7]. This can be repeated for all plausible values of n until the actual key is reached, which reveals the plaintext that was encrypted.
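The first stage of Kasiski's method, finding repeated segments and the distances between them, can be sketched as below. The sample sentence and the key KING are illustrative choices that do not come from the original post, and `vigenere_encrypt` and `repeat_distances` are hypothetical helper names. The sketch assumes uppercase A-Z input with no spaces.

```python
from functools import reduce
from math import gcd

def vigenere_encrypt(plaintext, key):
    """Encrypt uppercase A-Z text; key letter (i mod key length) shifts plaintext letter i."""
    out = []
    for i, ch in enumerate(plaintext):
        shift = ord(key[i % len(key)]) - ord("A")
        out.append(chr((ord(ch) - ord("A") + shift) % 26 + ord("A")))
    return "".join(out)

def repeat_distances(ciphertext, length=3):
    """Distances between successive occurrences of each repeated segment."""
    last_seen = {}
    distances = []
    for i in range(len(ciphertext) - length + 1):
        segment = ciphertext[i:i + length]
        if segment in last_seen:
            distances.append(i - last_seen[segment])
        last_seen[segment] = i
    return distances

ciphertext = vigenere_encrypt("THESUNANDTHEMANINTHEMOON", "KING")
distances = repeat_distances(ciphertext)
print(ciphertext)              # DPRYEVNTNBUKWIAOXBUKWWBT
print(distances)               # [8, 8]
print(reduce(gcd, distances))  # 8
```

Here the repeated segments BUK and UKW are 8 positions apart, so the key length should divide 8; the true length, 4, is one of the candidates. On longer ciphertexts accidental repeats can appear, so in practice the most common factor among the distances is usually a safer guess than the overall greatest common divisor.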
Kasiski's method can take a long time to recover the key and the plaintext when the key is very long, because the keyspace grows with the key length.

Defeating frequency based attacks

Frequency based attacks have been used for a long time to break traditional encryption algorithms. They rely on the fact that traditional encryption algorithms do not eliminate the statistical properties of the language.

The first way to defeat frequency based attacks is to encrypt blocks of characters at a time rather than single letters [7]. This ensures that the same text in the plaintext is not always encrypted to the same text in the ciphertext. For example, under a Caesar cipher with a shift of two, the word ADDITIONAL is encrypted to CFFKVKQPCN. The letters A, D and I are each repeated, and at every occurrence the scheme encrypts A to C, D to F and I to K. Frequency analysis can use this repetition to map the ciphertext characters back to the original plaintext characters. With a block encryption scheme this does not happen: the whole plaintext is broken into chunks, or blocks, of data that are fed as input to the encryption algorithm. The algorithm reads each input block along with the key and encrypts the complete block rather than individual characters, so there is a much smaller chance that two blocks will produce the same chunk of ciphertext.

The second way of defeating frequency analysis is to make use of synonyms [7], rather than repeating the same word over and over again in a sentence. Many English words have more than one synonym, which provides a set of words to choose from as suits the context.
To help in selecting a synonym, grammar checking would have to be used to ensure that the meaning of the sentence is not altered by changing the words. An attacker could respond by compiling a list of the most likely synonyms, but this would be of little help, because a different word could be used each time the same meaning needs to be expressed. This technique of using alternative words to represent common words is referred to as homophones [7] in cryptography.

A third technique that can effectively defeat frequency analysis is polyalphabetic substitution, that is, the use of several alphabets to encrypt the message [3], rather than the same substitution again and again. The Vigenere cipher is a form of polyalphabetic cipher. It ensures that a given plaintext character is not always encrypted to the same ciphertext character within a message, so direct frequency analysis of the ciphertext does not retrieve the original message. However, if the key length can be identified by other techniques, frequency analysis can then be applied to recover the original plaintext.

Finally, a possible technique for defeating frequency analysis is to encrypt a single character of plaintext with two ciphertext characters [3], and to use different ciphertext characters when the same plaintext character is encountered again. This can be achieved by using a key twice the size of the plaintext message, encrypting each plaintext character with two values from the key, and storing the two results together for that character. Repeated plaintext characters then no longer produce repeated ciphertext characters, which defeats the frequency analysis method of breaking the cipher.
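One way to picture the last idea, where a plaintext character is written as two ciphertext characters and repeats are encoded differently, is a small homophonic substitution table. The sketch below is an assumption-laden illustration rather than the scheme from reference [3]: every letter gets the same number of two-digit codes (real homophonic ciphers usually give frequent letters more codes), and the names `build_homophone_table`, `encrypt` and `decrypt` are hypothetical.

```python
import random
import string

def build_homophone_table(rng, codes_per_letter=3):
    """Give every letter its own set of distinct two-digit codes.

    With 3 codes per letter the codes run from 00 to 77, so each is two digits.
    """
    codes = [f"{n:02d}" for n in range(26 * codes_per_letter)]
    rng.shuffle(codes)
    return {
        letter: codes[i * codes_per_letter:(i + 1) * codes_per_letter]
        for i, letter in enumerate(string.ascii_uppercase)
    }

def encrypt(plaintext, table, rng):
    """Replace each letter with one of its codes, picked at random each time."""
    return [rng.choice(table[ch]) for ch in plaintext.upper() if ch in table]

def decrypt(codes, table):
    """Each code belongs to exactly one letter, so decryption is a reverse lookup."""
    reverse = {code: letter for letter, options in table.items() for code in options}
    return "".join(reverse[code] for code in codes)

rng = random.Random(2019)  # fixed seed so the illustration is repeatable
table = build_homophone_table(rng)
codes = encrypt("ADDITIONAL", table, rng)
print(" ".join(codes))
print(decrypt(codes, table))  # ADDITIONAL
```

Because each occurrence of A, D or I may be written with a different code, counting code frequencies no longer mirrors the letter frequencies of the plaintext as directly as it does for a Caesar cipher.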
Modern encryption algorithms and cryptanalysis

Modern cryptographic algorithms take a better approach to defeating frequency analysis based attacks. They use block encryption rather than encrypting characters one at a time, which eliminates the repetition of ciphertext characters for repeated plaintext characters.

Block ciphers are the central tool in the design of protocols for shared-key cryptography. A block cipher is a function E: {0,1}^k × {0,1}^n → {0,1}^n. This notation means that E takes two inputs, one a k-bit string and the other an n-bit string, and returns an n-bit string [2]. The first input is the key, which is used to encrypt the secret message. The second input is called the plaintext, and the output is called the ciphertext. The key length k and the block length n are parameters of a specific block cipher. They vary from one block cipher to another and depend on the design of the algorithm itself. Some of the most trusted symmetric ciphers include AES, Triple-DES, Blowfish, CAST and IDEA. In public-key cryptography, the most commonly used cryptosystems are RSA and the Diffie-Hellman systems, for which no practical break is publicly known when adequately large keys are used.

Preferably, the block cipher E is a publicly specified algorithm. In typical usage, a random key K is chosen and kept secret between a pair of users. The sender uses the function E_K to encrypt the message before sending it to the intended receiver, who decrypts it using the same key [2]. Security relies on the secrecy of the key. At first, then, one might think of the cryptanalyst's goal as recovering the key K from some ciphertext intercepted during transmission. The block cipher should be designed to make this task computationally difficult.
To achieve this, the algorithms used to encrypt the message must be designed with a high degree of mathematical complexity, so that they cannot be reversed to obtain the plaintext from a known ciphertext without the key. The length of the key used during encryption plays an important role in the effectiveness of an algorithm. Key length is conventionally measured in bits, and most of the well-known strong ciphers have key lengths between 128 and 256 bits. A cipher is considered strong if, after years of attempts to find a weakness in the algorithm, there is no known effective cryptanalytic attack against it. In that case the most efficient way of breaking an encrypted message without knowing the key is brute force, that is, trying all possible keys. The effort required is determined by the number of possible keys, known as the keyspace. Given the speed of the computer used, it is easy to calculate how long it would take to search the keyspace of a particular cipher [2].

For example, consider a cipher that uses 128-bit keys. Each bit can be either 0 or 1, so there are 2^128, or approximately 3.4 × 10^38, keys. Suppose that about ten billion computers are assigned the task of breaking the code, each capable of testing ten billion keys per second. Running through the entire keyspace would then take around 3.4 × 10^18 seconds, which is about 100 billion years. On average it would be necessary to run through only half the keyspace to hit upon the correct key, which would take around 50 billion years. This is longer than the estimated age of the universe according to modern cosmology, which is about 14 billion years [2]. It is therefore practically infeasible to crack modern cryptographic algorithms using brute force attacks.
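The arithmetic behind this estimate can be checked with a few lines of Python. The machine count and testing rate are the hypothetical figures from the paragraph above, and `brute_force_years` is a helper name introduced only for this sketch.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600  # about 3.16e7 seconds

def brute_force_years(key_bits, machines, keys_per_second):
    """Years needed to try every key of the given length at the given total rate."""
    keyspace = 2 ** key_bits
    seconds = keyspace / (machines * keys_per_second)
    return seconds / SECONDS_PER_YEAR

# Ten billion machines, each testing ten billion keys per second.
full_search = brute_force_years(128, 10**10, 10**10)

print(f"{2**128:.1e} keys")                             # 3.4e+38 keys
print(f"{full_search:.2e} years for the whole keyspace")
print(f"{full_search / 2:.2e} years on average")
```

The full search comes out at a little over 10^11 years, in line with the figure of about 100 billion years quoted above. Each additional key bit doubles this number.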
Modern cryptographic algorithms are therefore highly resistant to cryptanalytic attacks of this kind.

Conclusions

Cryptography has progressed in recent years, and modern cryptographic algorithms have proved successful in defending against most forms of cryptanalytic attack. Frequency analysis based attacks exploit the weaknesses in traditional encryption algorithms to reveal the plaintext message that was encrypted with them. The natural language in which messages are written is not random, and frequency counting attacks exploit this. Based on the frequency of the letters that occur in the ciphertext, one can guess the plaintext characters from their redundancy rate and from the specific combinations of letters in a word. This weakness can be countered by using stream ciphers, which do not carry the redundancy of the plaintext over to the ciphertext. Modern block ciphers encrypt a chunk of plaintext into ciphertext and vice versa, eliminating the redundancy of the language used. Although the algorithm plays an important part, it is the key length used in block ciphers that helps in repelling exhaustive key search. Modern ciphers use key lengths starting from 128 bits, which makes a brute force attack on the message infeasible. The longer the key, the more time it takes to break these ciphers. These advantages have made modern cryptographic algorithms popular in the security community. No practical weaknesses that would allow one to identify the plaintext message have so far been found in these algorithms.

Bibliography

[1] Stallings, W., Cryptography and Network Security, Chapter 1, Third Edition, Prentice Hall, 2003.
[2] Schneier, B., Applied Cryptography, Chapter 1, Second Edition, John Wiley & Sons, New York City, New York, USA, 1996.
[3] Hart, G.W., To Decode Short Cryptograms, Communications of the ACM 37(9), 1994, pp. 102-108.
[4] Lee, K.W., Teh, C.E., Tan, Y.L., Decrypting English Text Using Enhanced Frequency Analysis, National Seminar on Science, Technology and Social Sciences (STSS 2006), Kuantan, Pahang, Malaysia.
[5] Zipf, G.K., Human Behaviour and the Principle of Least Effort, Addison-Wesley, Cambridge, 1949.
[6] Lewand, R.E., Cryptological Mathematics, The Mathematical Association of America, 2000, pp. 345-346.
[7] Stamp, M. and Low, R.M., Applied Cryptanalysis, Chapters 1 and 2, John Wiley & Sons, New York City, New York, USA, 2007.
[8] http://www.simonsingh.net, online frequency analysis tools.
[9] http://www.textalyser.net, online text analysis and frequency analysis information.
