Maybe you've already felt this coming... The subject of this blog post is a little different compared to the previous ones.
This time, I'm here to write about entropy.
To start with, I revisited the basic definition in thermodynamics. I started with the second law, which states that the entropy of the entire universe, as an isolated system, will always increase over time.
Nevertheless, I wanted to keep up with the IT world.
So I steered my interest towards information theory. After all, the term entropy is also used there. While doing my research in this area, I turned the steering wheel a little more and ended up with the Linux kernel entropy, which we generally come across while dealing with CSPRNGs (Cryptographically Secure Pseudorandom Number Generators).
We think about CSPRNGs when getting random numbers from /dev/random and /dev/urandom on Linux operating systems. So, when we take a little deep dive, we see a thing called the entropy pool there. Anyway, I think you get the idea. These are all connected.
Actually these are very deep and huge subjects, but I will keep this as compact as possible.
I don't find long blog posts very practical :)
In this post, we will take a look at information theory, make an introduction to entropy and, lastly, check the Linux side of this subject.
Information theory started with Claude Shannon, a mathematician, electrical engineer and cryptographer. The first article in this area was "A Mathematical Theory of Communication", written by Shannon in 1948.
The goal of this discipline was to transmit a message from a sender to a recipient efficiently and reliably. As we all know, these messages are composed of bits, either 1 or 0. This topic focuses on the relationship between the bits used in the messages and the bits representing the information itself. Some kind of optimization, actually.
That is, when we communicate, we want useful information to get through. The more useful information, the better.
So according to information theory, when we send a message with useful information, we actually reduce the recipient's uncertainty about the related subject.
According to Shannon, when we send one bit of useful information, we reduce the recipient's uncertainty by a factor of 2 (provided that we work with logarithm base 2).
The term entropy is used as a measure of how uncertain things are.
Well, let's do a thought experiment to understand it better.
Suppose we decide to flip a fair coin.
At t0, we don't know the outcome. We don't have any information about it: heads or tails?
(We may only know the previous throw's outcome.) So, 50% chance for heads and 50% chance for tails. Just suppose heads is 1 and tails is 0, so we are using one bit for each, and our message length is actually 1 bit here.
At t1, we flip the coin and, once we see the outcome, we know whether it is tails or heads. 1 or 0. So now we know.
At t0, there were 2 equally likely options and now, at t1, there is 1. So our uncertainty is reduced by a factor of 2. The information entropy here is actually equal to 1, as it is based on the base-2 logarithm of the number of possible outcomes. The base-2 logarithm of 2 is 1. So, the result of the logarithm (1) tells us how uncertain this event is.
Or if we stick to the formula: S = -Σ p(x) log2 p(x), summed over all possible outcomes x. For our fair coin, that is S = -(0.5 * log2(0.5) + 0.5 * log2(0.5)), and since log2(0.5) is -1, the result is 1.
So again, the entropy (S) is 1.
Of course, I have shortened the calculation, but here is what happens in detail (using the formula):
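To make that concrete, here is a tiny Python sketch (my own illustration; the shannon_entropy helper is just a name I picked) that applies the formula to the fair coin:

```python
import math

def shannon_entropy(probabilities):
    # S = -sum(p * log2(p)) over all outcomes; outcomes with p = 0 contribute nothing.
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

# A fair coin: two equally likely outcomes, 0.5 each.
print(shannon_entropy([0.5, 0.5]))  # 1.0 -> exactly 1 bit of entropy
```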
One more thing about t1: at t1, we received a single bit of useful information. We actually received 1 bit in this example (as we said, 1 or 0), but we could have received 5 bytes as well.
I mean, suppose we received the outcome as the string "Tails", a 5-byte (40-bit) message, rather than a single bit (1 or 0). Even in that case, we would still receive only 1 bit of useful information.
Of course, if we throw a fair die, the entropy is different from the one in our coin flipping example.
In this case we have 6 equally likely outcomes. The uncertainty reduction factor is 6 here. So, the base-2 logarithm of 6 gives us the entropy. It is 2.5849625007.
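A quick Python check of that number (again, just an illustration):

```python
import math

# A fair six-sided die: 6 equally likely outcomes.
print(math.log2(6))  # 2.584962500721156

# The same value via the general formula, with p = 1/6 for each face.
print(-sum((1 / 6) * math.log2(1 / 6) for _ in range(6)))
```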
Extra note: Notice the die in the picture above. The corners are sharp, not rounded. This is good for fairness; our die should have that type of corners, as we want it to move in a chaotic manner when we throw it.
Things get interesting when the outcomes are not equally likely.
Actually, the same formula applies even in this case; the one that I mentioned earlier.
Remember the sum in that formula, the symbol Σ (sigma). When the outcomes are not equally likely, we weight the logarithm of each outcome's probability by that probability and add the terms up.
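For instance, here is a small sketch (my own example, not from any particular source) showing what that weighted sum gives for an unfair coin that lands tails 90% of the time:

```python
import math

def shannon_entropy(probabilities):
    # S = -sum(p * log2(p)); outcomes with p = 0 are skipped.
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

# Unfair coin: 90% tails, 10% heads -> far less uncertainty than a fair coin.
print(shannon_entropy([0.9, 0.1]))  # ~0.469 bits

# Completely unfair coin: always tails -> zero entropy, fully predictable.
print(shannon_entropy([1.0, 0.0]))  # 0.0 bits
```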
Cross-entropy is also an interesting topic, and it is widely used in machine learning as a cost function. However, I will not go into those details in this blog post.
These topics may be the motivation for another blog post in the future.
Well, let's continue with the CSPRNG and Linux kernel perspectives.
In computer security and in the Linux world, random numbers are important. The generators that generate those random numbers are also very important: no one should be able to predict the next number that the generator will produce.
In Linux, as in all other operating systems and security implementations, we want them to be unpredictable.
As you may guess, this unpredictability is also quantified in a measure called entropy.
Remember our coin flipping example? It had 1 bit of entropy; there was no predictability in the coin's output. If the coin were unfair (let's suppose it always lands tails), the outcome of flipping that coin would have less entropy (zero entropy, actually), so it could be guessed with certainty.
In short, in Linux we want that uncertainty in those random numbers, in those random bytes.
This is because a lot of security protocols and systems depend on those random bytes.
For instance, encryption keys: they are based on those random bytes and they should be really hard to predict.
SSL/TLS and SSH keys are other examples that rely on this randomness.
We have a challenge here. Well, we are using computers to generate those random bytes, right?
A computer is a deterministic machine. So what will a computer use to supply this randomness? How can an OS (Linux in this case) generate unpredictable random bytes?
The answer is short: our physical world. In our world, events do not always happen in the same way. So by feeding itself with these random events, an OS like Linux can have a source for creating this randomness.
Linux does this by using events, the interrupts. A mouse move, a keystroke, a driver interaction or an I/O operation has attributes like the mouse position or the I/O timing. They differ, right?
They are random. Consider the I/O attributes: an I/O that takes 2 seconds to finish on one system may take 5 seconds on another. The kernel is able to collect this information from those interrupts, so Linux uses these attributes to build a pool, called the entropy pool, and uses it as a source while generating unpredictable random numbers.
Of course, having the information about those interrupts is not enough for generating the good random bytes that we need. For instance, the elapsed time of an I/O may differ; it may differ according to the disk speed, the position of the requested block and so on. But how different can it be?
So having those interrupts provides a source of some unpredictability. It has some entropy, but actually we need more: they can't directly satisfy our random byte needs, and they are not uniform either.
We need to increase the unpredictability, and at this point we have the CSPRNGs as the solution.
CSPRNG stands for Cryptographically Secure Pseudo Random Number Generator.
This tool takes an input and generates random and uniform bytes. The generated bytes depend on the input only.
The values of those random events that I just mentioned are the input of the CSPRNG. There are several types of CSPRNGs, but I will concentrate on hash-based CSPRNGs, the ones based on hash functions, to keep this subject easy and clear.
Remember, the output of a cryptographic hash function looks uniform. It is practically impossible to reverse a hash function (you can't know the input by just looking at the output). A hash function takes an arbitrary amount of input and generates a fixed amount of output.
So here is what happens:
We have a pool and suppose it is filled with zeros at the start.
When an event happens, it is serialized and hashed with the things that are already in the pool.
When a new event happens, we repeat: the new event is serialized and hashed together with whatever is in the pool at that time (this time the pool no longer contains just zeros).
This serialization and hashing is repeated every time a new event happens.
The thing that mixes the events into the pool is called the steering function. Needless to say, this function should be very fast.
As you may guess, the contents of the pool are mixed again and again as new events happen. The entropy increases and the predictability decreases. An entropy pool comes into being! :)
The predictability is pretty low. Think about it: in order to predict the next random number, or the current contents of the entropy pool, you need information about all the events that have ever been mixed into it. So the value generated from this pool is unpredictable.
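To make the idea more tangible, here is a toy sketch in Python (purely illustrative; this is not the kernel's actual code, and the class and method names are ones I made up). It mixes serialized "events" into a pool by hashing and extracts output by hashing the pool again:

```python
import hashlib
import time

class ToyEntropyPool:
    """A toy, hash-based entropy pool. Not the real kernel implementation."""

    def __init__(self, size=32):
        self.pool = bytes(size)  # the pool starts as all zeros

    def mix(self, event_bytes):
        # Hash the current pool together with the serialized event,
        # so every past event influences the new pool state.
        self.pool = hashlib.sha256(self.pool + event_bytes).digest()

    def extract(self, n):
        # Produce n output bytes by hashing the pool with a counter,
        # then feed part of the output back into the pool.
        out = b""
        counter = 0
        while len(out) < n:
            out += hashlib.sha256(self.pool + counter.to_bytes(4, "big")).digest()
            counter += 1
        self.mix(out[:n])
        return out[:n]

pool = ToyEntropyPool()
# Pretend these are interrupt timings (nanosecond timestamps of events).
for _ in range(10):
    pool.mix(time.time_ns().to_bytes(8, "big"))
print(pool.extract(16).hex())  # 16 random-looking bytes
```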
Well, let's look at this subject from the OS perspective.
We have the CSPRNG in the Linux kernel :) It maintains the entropy pools and runs the related mechanism for generating the random bytes.
The kernel has access to all those events/interrupts and it runs the CSPRNG when we need random bytes. Having this kind of mechanism inside the kernel also provides centralization and security (much better protection for the pool than user space applications could provide).
In Linux, we have /dev/urandom and /dev/random for this. These are character devices and they look like files. We read them like we read files, and when we read, say, 100 bytes from them, they actually run the CSPRNG on the entropy pool and give us the random bytes we need.
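For example, reading the devices from Python looks just like reading ordinary files (a small sketch; os.urandom is the standard-library shortcut for the same kind of read):

```python
import os

# Read 100 random bytes straight from the character device (Linux only).
with open("/dev/urandom", "rb") as f:
    data = f.read(100)
print(len(data), data[:8].hex())

# The standard-library shortcut serves the same purpose.
print(os.urandom(16).hex())
```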
These devices provide us uniform random bytes whenever we need them. Moreover, the source they are fed from is populated by unpredictable events.
But we have two devices, right? /dev/random and /dev/urandom. So which one should be used in which case? This is definitely a question one may ask.
Well, let's first describe the difference between these devices, so that maybe we can make a decision based on those differences.
The main difference between /dev/random and /dev/urandom is that /dev/random tracks the entropy that we have in the entropy pool and blocks when the entropy is low (remember the entropy that I mentioned in the first part of this blog post). It is basically implemented in a way that blocks itself when it thinks the unpredictability is low.
We can also monitor this entropy by using the entropy_avail file located in the proc filesystem: /proc/sys/kernel/random/entropy_avail (the maximum is 4096).
As you see in the following screenshot, /dev/random blocks us when entropy_avail decreases. This is sometimes unacceptable.
In the following example, we were blocked when entropy_avail was around 20. We got some bytes when it increased to 23 and then we were blocked again. Note that when we are blocked, our general performance decreases, and our users and apps may face downtime.
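Here is a small sketch of how one might watch this from Python (illustrative only; the numbers depend entirely on your system, and on these older kernels the read from /dev/random may block until enough entropy is available):

```python
# Check the kernel's entropy estimate, then ask /dev/random for a few bytes (Linux only).
with open("/proc/sys/kernel/random/entropy_avail") as f:
    print("entropy_avail:", f.read().strip())  # the maximum is 4096

with open("/dev/random", "rb") as f:
    data = f.read(16)  # may block on older kernels if the estimate is low
print("got", len(data), "bytes:", data.hex())
```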
Note that this is from the man page on Oracle Linux 6.10, kernel 4.1.12-61.1.28.el6uek.x86_64.
But the same sentences are there on Oracle Linux Server release 7.8, kernel 4.1.12-124.38.1.el7uek.x86_64.
Basically it says that /dev/random is more secure, as it blocks when the entropy that it calculates for the entropy pool is low. Calculating the entropy of the pool, estimating the entropy that a new event brings in, and deciding how much the entropy should decrease when output is given must be hard, actually :) I also find it difficult to build such an accurate mechanism. However, by doing that it is considered more secure than /dev/urandom.
The man page also says the following in this manner:
If there is not sufficient entropy in the entropy pool, the returned values are theoretically vulnerable to a cryptographic attack on the algorithms used by the driver. Knowledge of how to do this is not available in the current non-classified literature, but it is theoretically possible that such an attack may exist.
Okay, noted. It is probably talking about information-theoretic security. When we consider computational security, /dev/random and /dev/urandom are both secure, in my opinion.
(Except in the early boot phase, when there have not been enough events to feed the entropy pool. By the way, this problem can be mitigated by the distributions by saving the entropy pool across reboots, etc.)
So, it is better to mention /dev/urandom at this point. Well, /dev/urandom never stops generating random bytes for us; we keep getting our random bytes even if entropy_avail is low.
It should also be noted that no one learns anything about the pool by getting an output from /dev/urandom or /dev/random. The kernel uses SHA-1: when we request some random bytes, the kernel runs SHA-1 on the entropy pool, gives us the output, and also feeds that output back into the entropy pool. Note that /dev/random and /dev/urandom do the exact same thing here!
With this in mind, decreasing the entropy estimate of the pool and blocking random byte generation seems a little unnecessary to me. (I am not an expert on this subject; this is only my opinion.)
In the early boot phase, yes, I agree completely with what the man page states. However, what I also think is that once we get enough entropy in the pool, we are done; we should not be blocked after that point.
This change clears my doubts :) The thought that comes out here is: once the pool becomes unpredictable, it stays unpredictable.
That's enough for today. Please don't forget to follow my blog. Add me on LinkedIn and Twitter to increase the interaction. More will come.
Twitter : https://twitter.com/EAOracleAce