Privacy Tech-Know Blog: Secret Agent 101— Basics of Cryptography

Individuals and organizations have long had a need to protect secrets from prying eyes.  One way in which we protect those secrets is through the use of cryptography, from the Greek kryptós, meaning “hidden” or “secret”, and graphein, meaning “writing”.  Early forms of cryptography were used by the ancient Egyptians, Greeks and Romans.

While cryptography was long the exclusive preserve of governments, it has become far more widespread, and many facets of the modern digital era are underpinned and protected by it.  It’s used to protect e-mail and other forms of communication (e.g., text messaging); financial transactions (e.g., online banking or shopping, “tap and go” payments); web browsing; and much more.  It can be used to protect data “at rest”, such as information stored on computers and storage devices (e.g., laptops or backup drives), as well as to protect data “in transit” (e.g., data being transferred via the Internet, smartphones, and so on).

Traditionally, cryptography has been used to keep information hidden or secret, but cryptography provides other capabilities that are arguably just as important, namely:

  • message or data integrity (ensuring that data cannot be modified in an unauthorized or undetected manner);
  • sender/receiver identity authentication (ensuring that you know who you are dealing with); and
  • non-repudiation (ensuring that parties to a transaction cannot later deny being part of it).

Hail Caesar!

The Caesar Cipher, used during the time of Julius Caesar (as its name suggests), is a relatively simple scheme that can be used to illustrate many of the basic concepts in cryptography.

This cipher is based on shifting alphabet letters by a predefined number of spaces and using the resulting shifted letters to transform a message.  For example, using a shift of 3 spaces, A becomes D, B becomes E, and so on.  The message “MEET ME AT NOON” becomes “PHHW PH DW QRRQ”.  If the shift goes beyond the letter Z, it starts again from the letter A (i.e., the alphabet is treated as a ring or loop, with Z joined back to A), so with a shift of 3, X becomes A.  This type of cipher is known as a substitution cipher.
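To make the shifting concrete, here is a minimal Python sketch of the encryption step.  The function name caesar_encrypt and the choice to leave spaces and punctuation untouched are our own assumptions for illustration, not part of any standard.

```python
import string

ALPHABET = string.ascii_uppercase  # the 26 letters A to Z


def caesar_encrypt(plaintext: str, shift: int) -> str:
    """Shift each letter forward by `shift` places, wrapping from Z back to A."""
    result = []
    for ch in plaintext.upper():
        if ch in ALPHABET:
            # Letter position (0 to 25), shifted and wrapped around the ring.
            result.append(ALPHABET[(ALPHABET.index(ch) + shift) % 26])
        else:
            # Assumption: spaces and punctuation pass through unchanged.
            result.append(ch)
    return "".join(result)


print(caesar_encrypt("MEET ME AT NOON", 3))  # PHHW PH DW QRRQ
```

The modulo operation (% 26) is what turns the alphabet into a ring.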

The original message (“MEET ME AT NOON”) is also known as the plaintext, while the transformed message (“PHHW PH DW QRRQ”) is called the ciphertext or the encrypted message.  The process of creating the ciphertext from the plaintext is encryption, while deriving the plaintext from the ciphertext is decryption.

The shifting of letters by a predefined amount is the encryption algorithm (the opposite shift, to recover the plaintext, is the decryption algorithm).  In this case, the encryption and decryption algorithms are the same operation, run in opposite directions.  The size of the shift, in this case 3, is the key for this algorithm, and given that the key is the same for both encryption and decryption, it is called a symmetric key system.  There are also asymmetric key systems (ones where the keys are different) that are in use today, but we’ll leave that discussion for a separate post.
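As a rough illustration of the symmetric key idea, the sketch below uses one function for both directions: shifting by the key encrypts, and shifting by the negative of the same key decrypts.  The names caesar_shift and KEY are hypothetical, chosen only for this example.

```python
def caesar_shift(text: str, shift: int) -> str:
    """Shift letters A to Z by `shift` places (negative values shift backwards)."""
    out = []
    for ch in text.upper():
        if "A" <= ch <= "Z":
            out.append(chr((ord(ch) - ord("A") + shift) % 26 + ord("A")))
        else:
            out.append(ch)  # assumption: non-letters are left as they are
    return "".join(out)


KEY = 3  # the shared secret known to both sender and recipient

ciphertext = caesar_shift("MEET ME AT NOON", KEY)   # encryption
recovered = caesar_shift(ciphertext, -KEY)          # decryption: the opposite shift

print(ciphertext)  # PHHW PH DW QRRQ
print(recovered)   # MEET ME AT NOON
```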

The idea is that a message or other information will be transformed in such a way that only authorized parties can access it.  An authorized recipient would be provided with the appropriate key and would use it to decrypt the message.  The assumption is that without the key, unauthorized users would not be able to decrypt the message.

Caesar’s Downfall

However, there are only 26 possible keys in the Caesar Cipher: the shift distances of 0, 1, 2, etc. up to 25.  A shift of 0 leaves the message unchanged, so a key equal to 0 is not going to keep many secrets. Shift distances, or keys, greater than 25 merely repeat the cycle (e.g., a key of 26 is equivalent to a shift of 0, and a key of 30 is equivalent to a shift of 4).

If an unauthorized individual is able to intercept an encrypted message, and if that individual suspects the nature of the algorithm used, it’s easy to try each of the 25 keys (leaving out 0) to see if any meaningful message results.  This method of code-breaking (more formally, cryptanalysis) is known as exhaustive search or brute force. This method of analysis may have been less feasible at the time of the Romans, but today it’s very easy using modern computing—in this case, the search will be very short, and the cipher provides a very limited amount of protection.
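A brute-force search on the Caesar Cipher fits in a few lines of Python.  This is only a sketch: the helper names are ours, and it simply lists all 25 candidate plaintexts and leaves it to a human reader to spot the one that makes sense.

```python
def caesar_shift(text: str, shift: int) -> str:
    """Shift letters A to Z by `shift` places, wrapping around the alphabet."""
    out = []
    for ch in text.upper():
        if "A" <= ch <= "Z":
            out.append(chr((ord(ch) - ord("A") + shift) % 26 + ord("A")))
        else:
            out.append(ch)
    return "".join(out)


def brute_force(ciphertext: str) -> dict[int, str]:
    """Try every key from 1 to 25 and return the candidate plaintext for each."""
    return {key: caesar_shift(ciphertext, -key) for key in range(1, 26)}


candidates = brute_force("PHHW PH DW QRRQ")
for key, guess in candidates.items():
    print(f"{key:2d}: {guess}")
# The line for key 3 reads "MEET ME AT NOON"; the other 24 are gibberish.
```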

In a simple substitution cipher, each letter of the plaintext is replaced with another, and any particular letter in the plaintext will always be transformed into the same letter in the ciphertext.  In the English language, the letters E, T, A and O are the most common, while Z, Q, and X are rare.  Similarly, certain combinations of letters, such as TH, ER, ON, and AN, are also quite common.  An alternative mode of code-breaking, known as frequency analysis, starts by counting the frequency with which ciphertext letters and combinations occur and then associating those with guessed plaintext letters and combinations.  For example, a high frequency of the letter X in the ciphertext could mean that X corresponds to the letter E in the plaintext.  By making several such associations, the individual may start to reveal words or parts of words, which in turn may help them decrypt more of the message.  Furthermore, the same approach can be applied to whole words and their lengths, assuming spaces are not changed from plaintext to ciphertext.  For example, a high frequency of the word WKH in the ciphertext could mean that it corresponds to the word THE in the plaintext.
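Here is a small sketch of the counting step in Python.  The ciphertext is a made-up message of our own (encrypted with a shift of 3), and the final guess assumes that the most common ciphertext letter stands for E.  On a message this short that guess will often be wrong, so treat it as a starting point rather than an answer.

```python
from collections import Counter

# Hypothetical intercepted message (our own example, not from a real source).
ciphertext = "PHHW PH DW WKH WUHH QHDU WKH JDWH"

# Count how often each letter and each whole word appears.
letter_counts = Counter(ch for ch in ciphertext if ch.isalpha())
word_counts = Counter(ciphertext.split())

top_letter, _ = letter_counts.most_common(1)[0]
top_word, _ = word_counts.most_common(1)[0]

# Guess: the most frequent ciphertext letter corresponds to plaintext E.
guessed_shift = (ord(top_letter) - ord("E")) % 26

print(top_letter)     # H
print(top_word)       # WKH
print(guessed_shift)  # 3
```

With the guessed shift of 3, WKH decrypts to THE, which supports the guess.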

Variations on a Theme

There are ways to make ciphers more resistant to brute-force searches or frequency analysis, making them more secure.  For instance, instead of using a fixed shift (of 3) for the key as described above, the shift can be based upon a previously agreed text (e.g., the soliloquy from Shakespeare’s Hamlet).  In this case, the pre-agreed text (“to be or not to be”) becomes the key, and each letter of the plaintext message is shifted by an amount determined by the corresponding letter in the key (A means a shift of 0, B a shift of 1, and so on).  Using the same example as before, the plaintext message (“MEET ME AT NOON”), with the key (“TOBE OR NO TTOB”), becomes the ciphertext (“FSFX AV NH GHCO”).  While this adds a certain amount of randomness to the ciphertext, making cryptanalysis somewhat more difficult, if the pre-agreed text becomes compromised in some way (e.g., an adversary guesses it based on how well known it might be), then the plaintext messages can be retrieved fairly easily.
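The keyed shift can be sketched in Python as follows.  We assume that A stands for a shift of 0, B for 1, and so on up to Z for 25, which is the convention that reproduces the example above.  The function name running_key_encrypt is our own label.

```python
def running_key_encrypt(plaintext: str, key_text: str) -> str:
    """Shift each plaintext letter by the amount given by the matching key letter."""
    # Keep only the letters of the key; its spaces are just for readability.
    key_letters = [ch for ch in key_text.upper() if "A" <= ch <= "Z"]
    needed = sum(1 for ch in plaintext.upper() if "A" <= ch <= "Z")
    if len(key_letters) < needed:
        raise ValueError("key text is shorter than the message")

    out = []
    i = 0  # index of the next unused key letter
    for ch in plaintext.upper():
        if "A" <= ch <= "Z":
            shift = ord(key_letters[i]) - ord("A")  # assumption: A=0, B=1, ..., Z=25
            out.append(chr((ord(ch) - ord("A") + shift) % 26 + ord("A")))
            i += 1
        else:
            out.append(ch)  # spaces are copied through unchanged
    return "".join(out)


print(running_key_encrypt("MEET ME AT NOON", "to be or not to be"))
# FSFX AV NH GHCO
```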

There is yet another variation on the Caesar Cipher, known as the One-Time Pad or OTP, which is considered to be unbreakable.  OTPs require a random sequence of letters that is the same size as, or longer than, the message that’s being sent to act as the pre-shared key.  Similar to the previous example, we convert the letters in the plaintext and the OTP (the key) to their numerical position in the alphabet (counting from A = 0 to Z = 25).  We then apply the Caesar cipher (i.e., add the number values together, wrapping around past Z) to each plaintext / key letter pair to create the ciphertext.  For example, the plaintext message (“MEET ME AT NOON”), with OTP (“VAII QE LC NCXC”), becomes the ciphertext (“HEMB CI LV AQLP”).  The approach can also be extended to include spaces (to avoid frequency attacks on the length of words) as well as numbers and symbols (to avoid giving away measures such as distance or quantities).
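The sketch below shows the mechanics of an OTP round trip in Python.  It assumes the A = 0 to Z = 25 numbering and uses Python's secrets module, which draws on the operating system's cryptographically strong random source.  That is a stand-in for the truly random pad the scheme calls for, not a guarantee of one.  The function names are hypothetical.

```python
import secrets
import string

ALPHABET = string.ascii_uppercase


def generate_pad(length: int) -> str:
    """Return `length` random letters to serve as a one-time pad."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def otp_apply(text: str, pad: str, direction: int) -> str:
    """Add (direction=+1) or subtract (direction=-1) the pad, letter by letter."""
    text = text.upper()
    if len(pad) < sum(1 for ch in text if ch in ALPHABET):
        raise ValueError("pad must be at least as long as the message")
    out = []
    i = 0  # index of the next unused pad letter
    for ch in text:
        if ch in ALPHABET:
            shift = direction * ALPHABET.index(pad[i])
            out.append(ALPHABET[(ALPHABET.index(ch) + shift) % 26])
            i += 1
        else:
            out.append(ch)  # this sketch leaves spaces unencrypted
    return "".join(out)


message = "MEET ME AT NOON"
pad = generate_pad(12)                      # one pad letter per message letter
ciphertext = otp_apply(message, pad, +1)    # sender encrypts
recovered = otp_apply(ciphertext, pad, -1)  # recipient decrypts with the same pad

print(pad, ciphertext, recovered)
```

Because the pad is random, the ciphertext will differ on every run, but the recovered text should always match the original message.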

However, for OTPs to be unbreakable in practice, several conditions must all be met:

  • Only two copies of the OTP should exist (one for the sender, one for the recipient).  If you need to communicate with a large number of people, this quickly becomes a cumbersome system to manage;
  • The OTP must be kept secret, which means having a secure way to share and store it (cue secret agents exchanging messages in dark alleys);
  • The OTP (i.e., the key) should consist of truly random characters (which is not as easy to do as it might seem);
  • The OTP should be as long as, or longer than, the plaintext.  This may be difficult to achieve if you are trying to encrypt a large amount of free-form text;
  • The OTP should be used once and only once (re-use of the OTP could give an adversary sufficient information to break the code); and
  • Both copies of the OTP must be destroyed immediately after use (to prevent an adversary from decrypting the message(s)).
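To see why re-use is so dangerous, consider what happens when two messages are encrypted with the same pad.  In the Python sketch below (the second message is a hypothetical one of ours, and spaces are dropped for simplicity), subtracting one ciphertext from the other cancels the pad entirely and leaves the difference between the two plaintexts.

```python
def to_numbers(text: str) -> list[int]:
    """Convert letters to numbers, assuming A=0, B=1, ..., Z=25."""
    return [ord(ch) - ord("A") for ch in text.upper() if "A" <= ch <= "Z"]


def add_pad(plaintext: str, pad: str) -> list[int]:
    """Encrypt by adding pad values to plaintext values, modulo 26."""
    return [(p + k) % 26 for p, k in zip(to_numbers(plaintext), to_numbers(pad))]


pad = "VAIIQELCNCXC"     # the same pad, wrongly used twice
first = "MEETMEATNOON"
second = "SENDMOREGOLD"  # hypothetical second message

c1 = add_pad(first, pad)
c2 = add_pad(second, pad)

# An eavesdropper who holds only c1 and c2 can compute this difference...
diff_cipher = [(a - b) % 26 for a, b in zip(c1, c2)]
# ...and it equals the difference of the plaintexts: the pad has dropped out.
diff_plain = [(a - b) % 26 for a, b in zip(to_numbers(first), to_numbers(second))]

print(diff_cipher == diff_plain)  # True
```

An adversary who can guess part of one message (a common greeting, say) can then work out the matching part of the other.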

Although OTPs might be great for spies, they’re not very practical for modern day-to-day communications.  Instead, the security of modern communications rests largely on protocols such as Transport Layer Security (TLS), the successor to Secure Sockets Layer (SSL), which use asymmetric cryptography to set up a secure connection.  But that’s a story for a different blog post.
