128bit File node IDs

2020-10-19

This is a very technical update on the Boomla Platform, aimed at curious developers.

Old File node IDs

Until today, Boomla has been using 32bit incremental File node IDs, for example i102FD. This is just a special formatting of the number 66301. It starts with an i to avoid ambiguity, followed by the number expressed in hexadecimal form because it is somewhat shorter.
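To make the formatting concrete, here is a minimal Python sketch. It is not Boomla's actual code; the function names format_fnode_id and parse_fnode_id are made up for illustration, and it assumes the hex part is written uppercase without zero padding, as in the example above.

```python
def format_fnode_id(n: int) -> str:
    """Format a file node ID number as 'i' followed by uppercase hex.

    Hypothetical helper: assumes no zero padding, as in 'i102FD'.
    """
    if n < 0:
        raise ValueError("file node IDs are non-negative")
    return "i" + format(n, "X")


def parse_fnode_id(s: str) -> int:
    """Inverse of format_fnode_id: strip the 'i' and read the rest as hex."""
    if not s.startswith("i"):
        raise ValueError("a file node ID must start with 'i'")
    return int(s[1:], 16)


print(format_fnode_id(66301))    # i102FD
print(parse_fnode_id("i102FD"))  # 66301
```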

I intentionally said File node ID not File ID, because the two are different. File node IDs identify file nodes within volumes while File IDs identify files in filesystems. A filesystem may contain nested volumes so a File ID may contain one or more nested File node IDs.

An example (old) file ID could be f102FD.158A9. It starts with an f, followed by file node IDs in hex format, separated by dots.
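A File ID can be thought of as a path of file node IDs, one per nested volume. The sketch below shows that idea in Python. The helper names are hypothetical and the representation as a list of integers is my assumption, not necessarily how Boomla stores it.

```python
def format_file_id(fnode_numbers: list[int]) -> str:
    """Join file node ID numbers into a File ID: 'f' + hex parts separated by dots."""
    if not fnode_numbers:
        raise ValueError("a File ID needs at least one file node ID")
    return "f" + ".".join(format(n, "X") for n in fnode_numbers)


def parse_file_id(s: str) -> list[int]:
    """Split a File ID back into its file node ID numbers."""
    if not s.startswith("f"):
        raise ValueError("a File ID must start with 'f'")
    return [int(part, 16) for part in s[1:].split(".")]


print(parse_file_id("f102FD.158A9"))    # [66301, 88233]
print(format_file_id([66301, 88233]))   # f102FD.158A9
```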

Merging branches

This worked super well but Boomla has evolved to a point where it became necessary to merge branches. The problem is, with incremental file node IDs, merging is almost guaranteed to fail.

The reason is that File IDs are super important in Boomla, so they must be tracked throughout the version history. (For example, they are used for automatically redirecting visitors after a page was renamed.) But with incremental file node IDs, one will use the same IDs on every branch.

Let's look at an example. Say you have the website example.com with the biggest file node ID of 20000. You want to work on a new feature so you create the branch beta.example.com. You keep working on both branches, you create new files on each. The problem is, the first new file will get the ID 20001 on both branches! Because of this, merging will fail.
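The scenario above can be simulated in a few lines of Python. The Volume class here is a hypothetical stand-in that only tracks the biggest ID handed out so far; it is a toy model of incremental allocation, not Boomla's implementation.

```python
class Volume:
    """Toy volume that hands out incremental file node IDs."""

    def __init__(self, max_id: int):
        self.max_id = max_id

    def branch(self) -> "Volume":
        # A branch starts as a copy, so it inherits the same ID counter.
        return Volume(self.max_id)

    def new_file(self) -> int:
        self.max_id += 1
        return self.max_id


main = Volume(max_id=20000)  # example.com
beta = main.branch()         # beta.example.com

# Create three new files on each branch, independently.
main_ids = {main.new_file() for _ in range(3)}
beta_ids = {beta.new_file() for _ in range(3)}

# Every new ID exists on both branches, pointing at different files.
conflicts = sorted(main_ids & beta_ids)
print(conflicts)  # [20001, 20002, 20003]
```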

We need to guarantee that file nodes will get unique IDs on each branch, thereby avoiding merge conflicts.

Random File node IDs

To avoid this, we are going to use random file node IDs. Unfortunately, at this point 32bit file node IDs become too tight. At 32 bits, we can allocate ~4 billion IDs. However, by the birthday paradox, random IDs start colliding with significant probability once their count approaches the square root of that, which is only 65536 files. That's clearly too tight, even considering that this applies to a single volume only, so an entire website with a tree of volumes could hold far more than that.
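For the curious, the square root rule of thumb can be checked with the standard birthday approximation, p ≈ 1 - exp(-n(n-1) / 2N), where n is the number of random IDs and N is the size of the ID space. The function name below is made up for this sketch, and it assumes IDs are drawn uniformly at random.

```python
import math


def collision_probability(n_ids: int, bits: int) -> float:
    """Approximate chance that at least two of n_ids uniformly random IDs collide."""
    space = 2 ** bits
    # expm1 keeps precision when the probability is very small.
    return -math.expm1(-n_ids * (n_ids - 1) / (2 * space))


p_sqrt = collision_probability(65536, 32)
p_million = collision_probability(1_000_000, 32)

print(round(p_sqrt, 2))     # 0.39
print(round(p_million, 2))  # 1.0
```

So at 65536 files in a single volume, there is already a roughly 39% chance that some pair of files shares an ID.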

Because of this, the size of File node IDs has been expanded to 128 bits. That's an insanely large number: even its square root is 18446744073709551616. Thereby we have practically eliminated the chance of any file node ID conflicts.
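As a quick sanity check of these numbers in Python: the square root of 2^128 is 2^64, and even a volume with a billion files (an arbitrary number picked for illustration) has a vanishingly small chance of a conflict under the birthday approximation, assuming uniformly random IDs.

```python
import math

space = 2 ** 128
root = math.isqrt(space)
print(root)  # 18446744073709551616

# Birthday approximation for one billion random 128-bit IDs.
n = 1_000_000_000
p = -math.expm1(-n * (n - 1) / (2 * space))
print(p < 1e-20)  # True
```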

An example File node ID (FnodeId) now looks like this:

iF78B0BDEC6305840069A4D27D48D2A7B

The equivalent File ID looks like this (note the f at the start):

fF78B0BDEC6305840069A4D27D48D2A7B

And an example nested File ID now looks like this:

fF78B0BDEC6305840069A4D27D48D2A7B.EC64D53884F1744DBB11A782271E990
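Generating IDs of this shape is straightforward. Here is a minimal Python sketch, assuming a cryptographically secure random source and unpadded uppercase hex; both are my assumptions for illustration, and the function names are hypothetical rather than Boomla's API.

```python
import secrets


def new_fnode_number() -> int:
    """Pick a random 128-bit number (assumed random source: the OS CSPRNG)."""
    return secrets.randbits(128)


def format_fnode_id(n: int) -> str:
    """'i' followed by uppercase hex, without zero padding."""
    return "i" + format(n, "X")


def format_file_id(fnode_numbers: list[int]) -> str:
    """'f' followed by the hex parts of each nested file node ID, joined by dots."""
    return "f" + ".".join(format(n, "X") for n in fnode_numbers)


outer = new_fnode_number()
inner = new_fnode_number()

print(format_fnode_id(outer))          # a file node ID
print(format_file_id([outer]))         # the equivalent File ID
print(format_file_id([outer, inner]))  # a nested File ID
```

Without padding, a part can come out shorter than 32 hex digits whenever the random number happens to have leading zeros.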

Rolling out

We wanted to keep all existing file node IDs, so the roll-out did not require any migration. New file nodes get these large, random IDs while existing files keep their old ones.

Merging is not publicly available yet, but it will be at some point in the future.

Cheers,

Tibor Halter
