Hashing is central to blockchain technology. It is the mathematical process used to write new transactions into a blockchain. The process is carried out by a hash function, which implements a hashing algorithm.
A hash function takes an input of any length and returns an output of a fixed length. This output is known as the hash. The size of the input does not matter: whether it is 2 characters or 200, the output always has the same length.
For example, the SHA-256 algorithm always produces an output that is 256 bits long.
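The fixed output length can be checked with a short sketch. It uses Python's standard `hashlib` module; the helper name `sha256_hex` and the sample inputs are chosen only for illustration.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of `data` as a hexadecimal string."""
    return hashlib.sha256(data).hexdigest()


short_digest = sha256_hex(b"2")        # 1-byte input
long_digest = sha256_hex(b"x" * 200)   # 200-byte input

# Each hex character encodes 4 bits, so 64 characters are 256 bits.
print(len(short_digest) * 4, len(long_digest) * 4)
```

Both digests come out at 64 hexadecimal characters, whatever the input size.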
For a hash function to be considered secure, it must have the following properties.
- The hash function should be deterministic: the same input must produce the same hash every time it is hashed. If the hash differed on each run, it could not be used to keep track of the input value.
- The hash function should compute the hash as quickly as possible.
- For every hash H(A), it should be infeasible to find the input ‘A’ from H(A). This property is known as pre-image resistance.
Suppose we are using a 128-bit hash and the brute-force method is the only way to find the original input. In the brute-force method, a random input is selected, hashed, and compared with the target hash. This is repeated until a match is found. With 2^128 possible hash values to search through, this is not a practical method.
- A small change in the input should produce a large change in the hash.
- The hash values should be unique in practice, i.e., it should be infeasible to find two different inputs A and B such that H(A) = H(B).
- For every output ‘Y’, it should be infeasible to find an input ‘X’ such that
H(k|X) = Y
where k is a random value chosen from a distribution with high min-entropy,
and (k|X) is the concatenation of k and X.
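Some of the properties above can be observed directly. The following is a minimal sketch using SHA-256 from Python's `hashlib`. The helper names and sample strings are illustrative assumptions, and the brute-force search is deliberately weakened to match only the first 16 bits of a digest so that it finishes quickly.

```python
import hashlib
from itertools import count


def sha256_int(text: str) -> int:
    """Hash `text` with SHA-256 and return the digest as an integer."""
    return int.from_bytes(hashlib.sha256(text.encode("utf-8")).digest(), "big")


def differing_bits(a: str, b: str) -> int:
    """Count the bit positions in which the digests of `a` and `b` differ."""
    return bin(sha256_int(a) ^ sha256_int(b)).count("1")


def brute_force_prefix(target_prefix: int, bits: int) -> str:
    """Try "0", "1", "2", ... until the top `bits` bits of the digest match."""
    for i in count():
        candidate = str(i)
        if sha256_int(candidate) >> (256 - bits) == target_prefix:
            return candidate


# Deterministic: hashing the same input twice gives the same value.
print(sha256_int("blockchain") == sha256_int("blockchain"))

# Small change, large effect: one character differs between the inputs.
print(differing_bits("blockchain", "blockchaim"), "of 256 bits differ")

# Brute force against a toy 16-bit target (an assumption for speed).
target_prefix = sha256_int("secret input") >> (256 - 16)
found = brute_force_prefix(target_prefix, 16)
print("input matching the first 16 bits:", found)
```

Matching 16 bits takes about 2^16 attempts on average, and every additional bit doubles the expected work, which is why the same search against a full 128-bit or 256-bit hash is considered infeasible.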
A number of hash functions are available. Of these, the following are the most widely used in blockchain technology.
SHA-256: Used in Bitcoin
Keccak-256: Used in Ethereum
Data is stored in blockchains, so a blockchain needs a data structure. Two concepts matter most here: pointers and linked lists. Pointers are variables that store the address of other variables. In a blockchain, a pointer stores not only the address of the previous block but also its hash value.
At the data-structure level, a blockchain is a linked list in which each node stores a hash pointer and a data header. The data header stores the data of that block, while the hash pointer holds the address of the preceding block together with its hash value.
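The linked-list view can be sketched in a few lines. This is a toy model, not the layout of any real blockchain: the `Block` class, its fields, the all-zero hash used for the first block, and the sample payloads are assumptions made for illustration.

```python
import hashlib
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Block:
    data: str                 # the block's own data
    prev: Optional["Block"]   # "address" of the preceding block
    prev_hash: str            # hash value of the preceding block

    def hash(self) -> str:
        """Hash this block's data together with the previous block's hash."""
        payload = (self.prev_hash + self.data).encode("utf-8")
        return hashlib.sha256(payload).hexdigest()


def append_block(chain: List[Block], data: str) -> Block:
    """Create a block that points at the current tail and append it."""
    prev = chain[-1] if chain else None
    prev_hash = prev.hash() if prev else "0" * 64  # placeholder for the first block
    block = Block(data, prev, prev_hash)
    chain.append(block)
    return block


def is_valid(chain: List[Block]) -> bool:
    """Check that every stored hash still matches the preceding block."""
    return all(b.prev_hash == p.hash() for p, b in zip(chain, chain[1:]))


chain: List[Block] = []
for payload in ("genesis", "alice pays bob 1", "bob pays carol 2"):
    append_block(chain, payload)

print(is_valid(chain))                 # True
chain[1].data = "alice pays bob 100"   # tamper with an earlier block
print(is_valid(chain))                 # False
```

Because each block stores the hash of its predecessor, changing the data in one block breaks the stored hash in the block that follows it.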
Mining is the process that adds new blocks to the blockchain. Miners verify each transaction that has been pushed to the blockchain and add it if it is valid.
Each blockchain network has a target time for creating a block (in Bitcoin, it is currently about 10 minutes). If blocks are created faster, more hashes are generated in a short time, which may result in hash collisions. Also, when blocks are created too quickly, several blocks can be added to the chain at almost the same time, so some of them cannot become part of the main chain. To avoid these problems, a difficulty level is set for the hash.
Let us look at the example of Bitcoin.
In Bitcoin, when a new block arrives, the hash function hashes all the contents of the block.
The hashed output is then concatenated with a nonce (a random string).
The entire concatenated string is hashed again, and the result is compared with the difficulty level.
If the new output is less than the difficulty level, the block is added to the chain. Otherwise, the nonce is changed and the process is repeated until the output meets the criterion.
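The loop described above can be sketched as follows. This is a simplified illustration of the steps as stated here, not Bitcoin's actual implementation, which applies SHA-256 twice to a fixed 80-byte block header that contains the nonce. The function name `mine`, the counting nonce, the sample block contents, and the easy 16-bit target are all assumptions.

```python
import hashlib
from itertools import count
from typing import Tuple


def mine(block_contents: str, target: int) -> Tuple[int, str]:
    """Search for a nonce k such that H(k|X) is below `target`."""
    # Step 1: hash all the contents of the block (this is X).
    block_hash = hashlib.sha256(block_contents.encode("utf-8")).hexdigest()
    for nonce in count():
        # Step 2: concatenate the nonce with the block hash (k|X).
        candidate_input = f"{nonce}{block_hash}".encode("utf-8")
        # Step 3: hash the concatenated string again.
        digest = hashlib.sha256(candidate_input).hexdigest()
        # Step 4: accept only if the result is below the target.
        if int(digest, 16) < target:
            return nonce, digest


# Toy difficulty: the digest must start with 16 zero bits.
TARGET = 1 << (256 - 16)
nonce, digest = mine("alice pays bob 1", TARGET)
print(nonce, digest)
```

Lowering `TARGET` makes a valid nonce rarer, so more attempts are needed on average. This is how the difficulty level controls how quickly blocks can be created.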
So in the case of Bitcoin, for every output ‘Y’, it is infeasible to find an input ‘X’ such that
H(k|X) = Y
where k is the nonce
and X is the hash of the block.