How RSA Encryption works
If you hate math, you're gonna have a bad time: I'll let you know that right now. You might not have noticed but you've used RSA encryption before. It gets it's name from the MIT scientists Rivest, Shamir, and Adelman who invented the algorithm. Let's take a look at how RSA encryption works.
Prior to RSA, cryptography used symmetric algorithms. This essentially meant that if 2 people wanted to share encrypted data, they'd need to share identical keys first. If Hank wanted to send an encrypted message to Peggy, Hank would encrypt his message with his key, send the message and Peggy would decrypt the message using her copy of an identical key. For a physical analogy, imagine Hank wrote a note to Peggy, put it in a locked box and sent it to Peggy. Upon receipt, Peggy can use her copy of the key to unlock the locked box. But what if Hank wanted to send Dale an encrypted message? He'd need to either hand-deliver a secret key that only Hank and Dale or exchange keys using a Diffle-Hymie key exchange (something for a future post). Now Hank has two keys, one for communicating with Peggy and one for communicating with Dale.
In the late 60s, British mathematician James Ellis was working on a non-secret method of encryption. He didn't have all of the specifics ironed out but he realized that the actions of lock and unlock were inverse operations. Essentially, encrypting something and decrypting it were inverse operations to each other.
RSA is an asymmetric algorithm and uses a public/private key pair to encrypt/decrypt data. You can use a private key to encrypt data and a public key to decrypt it, but you can also use the public key to encrypt data and the private key to decrypt it. The key pair perform inverse operations each other. The general idea was that instead of having 2 identical secret keys (such as with symmetric algorithms), one key could be public and given out freely while one person had a secret private key. This would mean that Hank could have a private key and send the corresponding public key to Peggy, Dale, Bobby, Luanne, Cotton, Boomhauer, et al. Now, Hank only has a single key to manage but can still exchange encrypted messages. This idea is based on splitting a key into two parts: An encryption key and a decryption key.
Looking at it with a different analogy, what if Hank wanted to send Peggy a specific color without anybody in between them knowing what the color was? In order to explain this, you'll first need to know what a complimentary color is. A complimentary color is a color that when added to another color makes white, effectively cancelling out the first color. Mixing a color and it's complimentary color isn't difficult. However, figuring out what color and corresponding complimentary color was used from white is very difficult. This is an example of a one-way function.
Continuing with the color analogy, Hank generated his private key, which is the color red. His computer figures out that red's complimentary color is cyan. Hank sends the color cyan to Peggy. Peggy wants to send a specific yellow to Hank so she takes yellow to the cyan Hank sent her and the result is green. Peggy sends the resulting green to Hank, which when applied to Hank's private red color, results in the yellow Peggy wanted to send. Meanwhile, if someone was intercepting those communications, all they saw were cyan and green, which don't equal the yellow Peggy sent to Hank.
In reality, computers don't speak in color so there needed to be a mathematical way of accomplishing Ellis' idea. The solution was found in 1973 by Clifford Cocks, also a British mathematician and cryptographer. He needed to create what's referred to as a trap-door one-way function. As we talked about above a one way function is something that's relatively simple to do but very difficult to do in reverse. What Cocks was trying to do was create a mathematical function that was relatively easy to perform but hard to reverse unless you had special information, called the trap-door.
This can be done using modular exponentiation. Without getting too deep into number theory, modular exponentiation is a number, raised to a power and then divided by a modulus (another number) and output the remainder. Let's look at an example.
I'm taking 5, multiplying by itself 4 times and then dividing by 9. The solution to this is the remainder, which is 4.
This idea can be used to encrypt a message. Imagine Hank has a message, contained in the secret variable M. M is multiplied by itself e times (where e is a public exponent), which is then divided by a number N and the remainder is output to the variable c. This equation would look like
The reason this is important is because since M is secret, but e, N, and c are public, calculating c with the above equation is relatively easy. However, using e, N, and c to figure out M is difficult. This is our one-way function which we can apply to M. This formula represents our public key. Variables e and N are what create the public key.
The private key is the trap-door described above. You'll recall that a trap-door one-way function is easy to do but hard to reverse without the special information. The private key is our trap-door that makes reversing this simple. In order to reverse the equation, we'll need to raise c to the power of d and divide by N to get our original M.
How we decide what d is though? In order to figure out d, we'll need to use a second one-way function, which brings us to a mathematical discovery known as the Euclidean algorithm. Euclid was a Greek mathematician around 300 B.C. He discovered that any number has exactly 1 prime factorization. Prime factorization involves breaking a number down into it's lowest prime factors. For example, let's do some simple multiplication below:
If we were to find the prime factorization of 96, we'd find that 49 is a square of 7 so the prime factorization of 96 is 7*7*2. We can't break that down any further since 7 and 2 only share factors of 1.
For a computer, multiplying 2 numbers is very easy. The larger the numbers the longer it takes but we're talking about times of under a second for relatively huge numbers. Conversely, trying to undo that multiplication or factoring a number is harder to do. The larger the number, the longer the factorization will take. If the number is long enough, it could take a massive computer hundreds or even thousands of years to factor.
The variable N is found by generating 2 prime numbers and multiplying them together. These are often very long numbers, but for the sake of examples, I'm going to be demonstrating with small numbers. The prime factorization of N is something we're going to hide for now. Cocks needed a function that depends on knowing the prime factorization of N.
In order to further obfuscate the variable N, Cocks turned to Euler's Phi function. Also known as Euler's totient function, which counts how many integers are less than or equal to a number that does not share any common factor (other than 1).with that same number. For example, if we wanted to find the Phi of 8, we'd look at the numbers from 1 up to 8. We'd find that 2, 4, 6 and 8 are common factors of 8 while 1, 3, 5 and 7 are prime. Therefore Phi of 8, equals 4. If you look at a prime number, calculating the phi function, Phi is always the number -1. Let's look at an example below:
Since 11 is prime, it's only factor is 1. Therefore, the number of integers leading up to 11 are 10. Also worth noting is that the Phi function is multiplicative. For example, if you had the Phi of (7*11), than that would be equal to Phi(7)*Phi(11), which would be (7-1)(11-1), which breaks down to (6)(10), which equals 60.
If we know that the variable N is the product of 2 prime numbers, then the Phi of our variable N is just the value of Phi for each of our primes multiplied together. This is a trap-door for solving Phi of N. Since we hid the 2 primes that we used to create the variable N, finding the phi of N is easy.
If you're still following along, you might be wondering how this connects back to the modular exponentiation we discussed earlier. This is where another work of Euler comes in. It's Euler's Theorem, which states that a variable m raised to the power of phi(n) is equal to 1 mod n. To demonstrate this, I'll pick 2 numbers, m=5 and n=8.
Since the Phi of 8 is 4, then 5 raised to the 4th power would simplify to
We just need to modify this equation using 2 basic principles. The first is that 1 raised to the power of any number (which we'll call k), still equals 1 (1*1=1, 1*1*1=1, etc.). 1^k=1. The second is that 1 multiplied by m, it still equals m.
Remember that d variable we needed to undo the private key variables above? To be the trap-door of our one-way function? We find that d is calculated as
The private key forumula is as shown below.
Let's look at an example of this. Let's say Hank wants to exchange encrypted messages with Peggy. Hank will start by finding 2 similarly sized prime numbers. He finds 53 and 59. We'll call these variable p and q, respectively. Variables p and q multiplied together equal 3127, represented by variable n. Phi(n) is (53-1)(59-1) which equals 3016. He'll choose a small exponent like 3 which will be represented by variable e. Let's see what we've got so far.
p=53
q=59
n=3127
Phi(n)=3016
e=3
Hank then finds d, which is found by the formula (2*(3016)+1)/3. This equals 2011
d=2011
Hank then hides everything except for n and e, which is the public key. Hank can send this to Peggy to lock her message with. Peggy calculates her message as the number 89. Peggy will calculate
If you'll recall, our the formula for our public key was
For Peggy's message, variable c equals 1394
When 1394 is sent back to Hank, Hank then takes 1394 and raises it to the power of d, or 2011. When divided by n, the remainder is m. So,
which was Peggy's original message.
If Bobby, Dale or anybody else was listening to this, the only variables they would have are n, c and e, which is not enough to calculate m, unless they had d. They could only find d if they could calculate Phi(n), which means they'd need to know the prime factorization of N.
Now, this example was performed with small, numbers. In reality, most of the internet is using 2048 bit encryption, which means that the numbers they're dealing with are 617 digits. Cocks' work was considered classified information but was independently rediscovered by MIT students Rivest, Shamir, and Adelman in 1978, hence the name.
The Problem
Prior to RSA, cryptography used symmetric algorithms. This essentially meant that if 2 people wanted to share encrypted data, they'd need to share identical keys first. If Hank wanted to send an encrypted message to Peggy, Hank would encrypt his message with his key, send the message and Peggy would decrypt the message using her copy of an identical key. For a physical analogy, imagine Hank wrote a note to Peggy, put it in a locked box and sent it to Peggy. Upon receipt, Peggy can use her copy of the key to unlock the locked box. But what if Hank wanted to send Dale an encrypted message? He'd need to either hand-deliver a secret key that only Hank and Dale or exchange keys using a Diffle-Hymie key exchange (something for a future post). Now Hank has two keys, one for communicating with Peggy and one for communicating with Dale.
The History
In the late 60s, British mathematician James Ellis was working on a non-secret method of encryption. He didn't have all of the specifics ironed out but he realized that the actions of lock and unlock were inverse operations. Essentially, encrypting something and decrypting it were inverse operations to each other.
RSA is an asymmetric algorithm and uses a public/private key pair to encrypt/decrypt data. You can use a private key to encrypt data and a public key to decrypt it, but you can also use the public key to encrypt data and the private key to decrypt it. The key pair perform inverse operations each other. The general idea was that instead of having 2 identical secret keys (such as with symmetric algorithms), one key could be public and given out freely while one person had a secret private key. This would mean that Hank could have a private key and send the corresponding public key to Peggy, Dale, Bobby, Luanne, Cotton, Boomhauer, et al. Now, Hank only has a single key to manage but can still exchange encrypted messages. This idea is based on splitting a key into two parts: An encryption key and a decryption key.
A Non-Mathematical Example
Looking at it with a different analogy, what if Hank wanted to send Peggy a specific color without anybody in between them knowing what the color was? In order to explain this, you'll first need to know what a complimentary color is. A complimentary color is a color that when added to another color makes white, effectively cancelling out the first color. Mixing a color and it's complimentary color isn't difficult. However, figuring out what color and corresponding complimentary color was used from white is very difficult. This is an example of a one-way function.
Continuing with the color analogy, Hank generated his private key, which is the color red. His computer figures out that red's complimentary color is cyan. Hank sends the color cyan to Peggy. Peggy wants to send a specific yellow to Hank so she takes yellow to the cyan Hank sent her and the result is green. Peggy sends the resulting green to Hank, which when applied to Hank's private red color, results in the yellow Peggy wanted to send. Meanwhile, if someone was intercepting those communications, all they saw were cyan and green, which don't equal the yellow Peggy sent to Hank.
The Math Behind The Public Key
In reality, computers don't speak in color so there needed to be a mathematical way of accomplishing Ellis' idea. The solution was found in 1973 by Clifford Cocks, also a British mathematician and cryptographer. He needed to create what's referred to as a trap-door one-way function. As we talked about above a one way function is something that's relatively simple to do but very difficult to do in reverse. What Cocks was trying to do was create a mathematical function that was relatively easy to perform but hard to reverse unless you had special information, called the trap-door.
This can be done using modular exponentiation. Without getting too deep into number theory, modular exponentiation is a number, raised to a power and then divided by a modulus (another number) and output the remainder. Let's look at an example.
I'm taking 5, multiplying by itself 4 times and then dividing by 9. The solution to this is the remainder, which is 4.
This idea can be used to encrypt a message. Imagine Hank has a message, contained in the secret variable M. M is multiplied by itself e times (where e is a public exponent), which is then divided by a number N and the remainder is output to the variable c. This equation would look like
The reason this is important is because since M is secret, but e, N, and c are public, calculating c with the above equation is relatively easy. However, using e, N, and c to figure out M is difficult. This is our one-way function which we can apply to M. This formula represents our public key. Variables e and N are what create the public key.
The Math Behind The Private Key
The private key is the trap-door described above. You'll recall that a trap-door one-way function is easy to do but hard to reverse without the special information. The private key is our trap-door that makes reversing this simple. In order to reverse the equation, we'll need to raise c to the power of d and divide by N to get our original M.
How we decide what d is though? In order to figure out d, we'll need to use a second one-way function, which brings us to a mathematical discovery known as the Euclidean algorithm. Euclid was a Greek mathematician around 300 B.C. He discovered that any number has exactly 1 prime factorization. Prime factorization involves breaking a number down into it's lowest prime factors. For example, let's do some simple multiplication below:
If we were to find the prime factorization of 96, we'd find that 49 is a square of 7 so the prime factorization of 96 is 7*7*2. We can't break that down any further since 7 and 2 only share factors of 1.
For a computer, multiplying 2 numbers is very easy. The larger the numbers the longer it takes but we're talking about times of under a second for relatively huge numbers. Conversely, trying to undo that multiplication or factoring a number is harder to do. The larger the number, the longer the factorization will take. If the number is long enough, it could take a massive computer hundreds or even thousands of years to factor.
The variable N is found by generating 2 prime numbers and multiplying them together. These are often very long numbers, but for the sake of examples, I'm going to be demonstrating with small numbers. The prime factorization of N is something we're going to hide for now. Cocks needed a function that depends on knowing the prime factorization of N.
In order to further obfuscate the variable N, Cocks turned to Euler's Phi function. Also known as Euler's totient function, which counts how many integers are less than or equal to a number that does not share any common factor (other than 1).with that same number. For example, if we wanted to find the Phi of 8, we'd look at the numbers from 1 up to 8. We'd find that 2, 4, 6 and 8 are common factors of 8 while 1, 3, 5 and 7 are prime. Therefore Phi of 8, equals 4. If you look at a prime number, calculating the phi function, Phi is always the number -1. Let's look at an example below:
Since 11 is prime, it's only factor is 1. Therefore, the number of integers leading up to 11 are 10. Also worth noting is that the Phi function is multiplicative. For example, if you had the Phi of (7*11), than that would be equal to Phi(7)*Phi(11), which would be (7-1)(11-1), which breaks down to (6)(10), which equals 60.
If we know that the variable N is the product of 2 prime numbers, then the Phi of our variable N is just the value of Phi for each of our primes multiplied together. This is a trap-door for solving Phi of N. Since we hid the 2 primes that we used to create the variable N, finding the phi of N is easy.
If you're still following along, you might be wondering how this connects back to the modular exponentiation we discussed earlier. This is where another work of Euler comes in. It's Euler's Theorem, which states that a variable m raised to the power of phi(n) is equal to 1 mod n. To demonstrate this, I'll pick 2 numbers, m=5 and n=8.
Since the Phi of 8 is 4, then 5 raised to the 4th power would simplify to
Remember that d variable we needed to undo the private key variables above? To be the trap-door of our one-way function? We find that d is calculated as
The private key forumula is as shown below.
An Example
Let's look at an example of this. Let's say Hank wants to exchange encrypted messages with Peggy. Hank will start by finding 2 similarly sized prime numbers. He finds 53 and 59. We'll call these variable p and q, respectively. Variables p and q multiplied together equal 3127, represented by variable n. Phi(n) is (53-1)(59-1) which equals 3016. He'll choose a small exponent like 3 which will be represented by variable e. Let's see what we've got so far.
p=53
q=59
n=3127
Phi(n)=3016
e=3
Hank then finds d, which is found by the formula (2*(3016)+1)/3. This equals 2011
d=2011
Hank then hides everything except for n and e, which is the public key. Hank can send this to Peggy to lock her message with. Peggy calculates her message as the number 89. Peggy will calculate
If you'll recall, our the formula for our public key was
For Peggy's message, variable c equals 1394
When 1394 is sent back to Hank, Hank then takes 1394 and raises it to the power of d, or 2011. When divided by n, the remainder is m. So,
which was Peggy's original message.
If Bobby, Dale or anybody else was listening to this, the only variables they would have are n, c and e, which is not enough to calculate m, unless they had d. They could only find d if they could calculate Phi(n), which means they'd need to know the prime factorization of N.
Now, this example was performed with small, numbers. In reality, most of the internet is using 2048 bit encryption, which means that the numbers they're dealing with are 617 digits. Cocks' work was considered classified information but was independently rediscovered by MIT students Rivest, Shamir, and Adelman in 1978, hence the name.
Comments
Post a Comment