Merkle Trees


In today’s blog post I am going to discuss the Merkle trees, how they work and where they are used.

What is a Merkle tree ?

The leaf nodes in this tree represent data nodes where as the the nodes above them represent the cryptographic hash of the child nodes. It looks like this :

Let me explain a little bit about the anatomy of the above the above tree. The lowest nodes, marked as D1, D2, D3 and D4 are data nodes and can contain any data of your choice. Now that data is hashed and every leaf node is stored in a node representing the hash of data. e.g. in this case, Node Hash 0-0 contains hash of data D1, Node hash 0-1 contains hash of data of node D2, etc. you get the idea. Now this pattern continues till the top. Value of each node in the tree ,except the leaf nodes, is actually the hash of values of it’s children nodes.

In case the trees are being used to deal with security concerns for the data, cryptographic hash functions are used to hash the data. Cryptographic hash functions are those which are practically impossible to invert. e.g. to gain the input by looking at the output of hash function. Simpler version : They are one way only. Ideal crypto functions are quick to compute, make it impossible to find same output with two different inputs.  otherwise it is perfectly fine to use simple checksums on the data.


Use of the above trees :

  1. IPFS and ZFS file systems
  2.  BitTorrent protocol
  3. Git distributed revision control system
  4. Bitcoin peer-to-peer network
  5. the Certificate Transparency framework

and many more. The inspiration to write about merkle trees originally came from this paper about certificate transparency which I was reading a while back.

Let us examine a practical use of this kind of tree and for that we will look into how how zfs uses it. Since, folks who edited the wikipedia article have done a good job in explaining what I am trying to imply, let me directly quote

For ZFS, data integrity is achieved by using a Fletcher-based checksum or a SHA-256 hash throughout the file system tree. Each block of data is checksummed and the checksum value is then saved in the pointer to that block—rather than at the actual block itself. Next, the block pointer is checksummed, with the value being saved at its pointer. This checksumming continues all the way up the file system’s data hierarchy to the root node, which is also checksummed, thus creating a Merkle tree.

I came across this data structure recently and I found it a very interesting use of a very simple binary tree structure. If you find it interesting, Try implementing it yourself 🙂

It is said that money grows on tree, quite true in case of bitcoins 🙂

Merkle Trees