Introduction

If you want to communicate safely with a website, you are probably using the RSA protocol. In this Notebook, we will implement this algorithm and try it on a simple example.

Principle

Let's assume you are Amazon. You have several customer and you need to ask them secrets informations that only you should be able to decrypt. To do so, the RSA works with 2 keys :

  • 1 Private key allowing you to decrypt the code
  • 1 Public key to share will all users to encode the information

When you ask the consumer for a sensitive information, they encrypt this information using the Public key and send this to the network. If someone else, receive it, they won't we able to decrypt it as they have only the public key. However, with your private key, you (and only you) can decrypt this information. That's why the private key must remain secret.

If you are a hacker and you want to break the private key to decrypt the information you have stolen in the network, you will need to do a Prime Factorization which is a very complex problem.

Let see how those keys are created

In [5]:
import random
from math import gcd

def coprimes(number):
    while True:
        e = int(random.random() * (number - 2)) + 2 
        if gcd(e, number) == 1: 
            break
    return e

def egcd(number, modulo):
    if number == 0:
        return modulo, 0, 1
    else:
        g, y, x = egcd(modulo % number, number)
        return g, x - (modulo // number) * y, y


def modinv(number, modulo):
    g, x, y = egcd(number, modulo)
    if g != 1:
        raise Exception('modular inverse does not exist')
    else:
        return x % modulo

Algorithm

The keys are created in 5 steps:

Step 1 : Find 2 primes number (p and q)

They must be as big as possible because this is giving the safety of the algorithm (it's often a ~200 digit primes number). However, here, we will use smaller one just for the example.

In [74]:
p = 3
q = 11

Step 2 : Compute 2 numbers (N and Phi(N))

This step is quite obvious, there is only 2 formulas to remember $$ N = p \times q $$ $$ \phi(N) = (p-1) \times (q-1) $$

In [75]:
n = p * q
phi = (p-1)*(q-1)

Step 3 : Take a random number coprime (e) with $\phi(N)$

At this step we need to find randomly a number being coprime with $\phi(N)$. This means that e and $\phi(N)$ should have only 1 a common divisor. We can use the GCD algorithm to get it

In [76]:
e = coprimes(phi)

Step 4 : Find d where $ d \times e \equiv 1 \pmod{ \phi(N) } $

This last import number is more tricky to get. It requires the computation of the Modular multiplicative inverse which is also based on the Extended Greatest Common Divisor. I used an implementation found on the internet.

In [77]:
d = modinv(e, phi)

(Bonus) Offset d and e

This step is not really required but it allows to hide a bit the $ \phi(N) $ when you use it on samll value like I'm doing now. In term of safety, I don't think it helps a lot...

In [78]:
e += int(random.random() * n) * phi
In [79]:
d += int(random.random() * n) * phi

Return the 2 keys

The 2 keys are now easy to create :

- One key is the pair (e, n)
- One key is the pair (d, n)

You select the one you want as private but after you keep it secret. The second one can be shared to everybody. Let's now summary the example above

In [81]:
print("p: {} , q: {}".format(p, q))
print("N: {}".format(n))
print("phi: {}".format(phi))
print("e: {} , d: {}".format(e, d))
print("{}*{} % {} = {}".format(d, e, phi, (d*e) % phi))
print("public key : {}".format((d, n)))
print("private key : {}".format((e, n)))
p: 3 , q: 11
N: 33
phi: 20
e: 173 , d: 57
57*173 % 20 = 1
public key : (57, 33)
private key : (173, 33)

Security

Just as a background, as we know how to create the key, we can understand why it's dangerous to use small p and q. The key contains N = p * q. If the hacker is able to find p and q from N. The hacker will have $ \phi(N) $. With that he has e (or d based on the key creation) and $ \phi(N) $. It's after very easy to find e (or d) and having the private key.

Let's now have a look at how to encode / decode

Encoding

This step is quite simple. The message you want to send must be converted to numbers. You can do the one you want but in this example, we will use the ascii code of every letter of the message.

To encode the message you have to use the public key (let say the pair (d, n)). For every number (k) you have to pass it through the formula below $$ code = k^d\mod{ n } $$

Decoding

The decoding step is exactly the invert with you private key (e, n): $$ k = code^e\mod{ n } $$

After you can passe from ascii to letter easily.

Let's do a real example

NB : For this computation with large number, do not use the normal power function and modulo. You have to use instead the modpow algorithm. This is because a big number to a big power won't fit in memory. This algorithm is a lot faster. It's implemented by default in python build-in pow function
In [87]:
def create_keys(p, q):
    
    def coprimes(number):
        while True:
            e = int(random.random() * (number - 2)) + 2 
            if gcd(e, number) == 1: 
                break
        return e

    def egcd(number, modulo):
        if number == 0:
            return modulo, 0, 1
        else:
            g, y, x = egcd(modulo % number, number)
            return g, x - (modulo // number) * y, y


    def modinv(number, modulo):
        g, x, y = egcd(number, modulo)
        if g != 1:
            raise Exception('modular inverse does not exist')
        else:
            return x % modulo
    
    n = p * q
    phi = (p-1)*(q-1)
    e = coprimes(phi)
    d = modinv(e, phi)
    e += int(random.random() * n) * phi
    d += int(random.random() * n) * phi
    return (e, n), (d, n)

def encode(word, public_key):
    return [pow(ord(letter), public_key[0], public_key[1]) for letter in word]
        
def decode(code, private_key):
    decoded = [pow(value, private_key[0], private_key[1]) for value in code]
    return "".join([chr(ascii) for ascii in decoded])

Creation of keys

In [83]:
public_key, private_key = create_keys(7741, 7753)
In [84]:
print("public key : {}".format(public_key))
print("private key : {}".format(private_key))
public key : (2050183293675059, 60015973)
private key : (2325870019225499, 60015973)

Encoding (customer side)

By using the public key, we can encode the message

In [88]:
code = encode("Hello World", public_key)
In [90]:
print("The code is : {}".format(code))
The code is : [5956990, 47214715, 21647324, 21647324, 43082101, 40738165, 15059835, 43082101, 54412267, 21647324, 47783031]

Decoding (your side)

This code is sent to the network and now you have to decrypt it using your private key

In [92]:
decoded = decode(code, private_key)
In [93]:
print("The message is : {}".format(decoded))
The message is : Hello World

Here it is, it worked ! However, keep in mind that this example uses very small primes and is highly unsecure. In addition, if the value $ p \times q $ is smaller than the value you want to encode, you won't be able to decrypt it. Here we uses the value of every letter in ascii so there is no risk but in some cases, we could use bigger chunk (for example a phone number complete)

Going further

As we saw it here, we can see limitation of this algorithm because everybody have the public key and only the one having the private key can decrypt your message. This means that there is no way for you (amazon) to send encoded message to the user (customer) as everybody have the public key and will be able to decrypt it.

However, there is alternative to have a complete crypted communication in few steps:

  • The customer generates also in his side a pair (public/private) key.
  • With the Public key he has from you, he encode his own public key and send it to you
  • With your private key, you can decyphter the public key he send you (and you keep your private key)
  • When you want to contact the customer, you use his public key to crypt the message
  • When you want to read the answer from the customer, you use your private key
  • On the customer side, it's reversed.

That's all. You only used a public key to send in a secure way a second pair of key for the other direction

Conclusion

In this Notebook, we saw how the most common algorithm is working to secure information on the internet. There is multiple other ones existing and I'll start to learn them.