LUT-based 3D Hilbert curves

As referenced in my earlier post about Hilbert curves, it’s possible to map between $$d$$-dimensional Euclidean coordinates and the offset along the Hilbert curve in $$O(n)$$ time by direct application of the transformation group at each recursion level of the curve. The following applies this to the 3D case:

static const uint8_t mortonToHilbertTable[] = {
	48, 33, 27, 34, 47, 78, 28, 77,
	66, 29, 51, 52, 65, 30, 72, 63,
	76, 95, 75, 24, 53, 54, 82, 81,
	18,  3, 17, 80, 61,  4, 62, 15,
	 0, 59, 71, 60, 49, 50, 86, 85,
	84, 83,  5, 90, 79, 56,  6, 89,
	32, 23,  1, 94, 11, 12,  2, 93,
	42, 41, 13, 14, 35, 88, 36, 31,
	92, 37, 87, 38, 91, 74,  8, 73,
	46, 45,  9, 10,  7, 20, 64, 19,
	70, 25, 39, 16, 69, 26, 44, 43,
	22, 55, 21, 68, 57, 40, 58, 67,
};

static const uint8_t hilbertToMortonTable[] = {
	48, 33, 35, 26, 30, 79, 77, 44,
	78, 68, 64, 50, 51, 25, 29, 63,
	27, 87, 86, 74, 72, 52, 53, 89,
	83, 18, 16,  1,  5, 60, 62, 15,
	 0, 52, 53, 57, 59, 87, 86, 66,
	61, 95, 91, 81, 80,  2,  6, 76,
	32,  2,  6, 12, 13, 95, 91, 17,
	93, 41, 40, 36, 38, 10, 11, 31,
	14, 79, 77, 92, 88, 33, 35, 82,
	70, 10, 11, 23, 21, 41, 40,  4,
	19, 25, 29, 47, 46, 68, 64, 34,
	45, 60, 62, 71, 67, 18, 16, 49,
};

uint32_t transformCurve(uint32_t in, uint32_t bits, const uint8_t* lookupTable)
{
	uint32_t transform = 0;
	uint32_t out = 0;

	for (int32_t i = 3 * (bits - 1); i >= 0; i -= 3) {
		transform = lookupTable[transform | ((in >> i) & 7)];
		out = (out << 3) | (transform & 7);
		transform &= ~7;
	}

	return out;
}

uint32_t mortonToHilbert3D(uint32_t mortonIndex, uint32_t bits)
{
	return transformCurve(mortonIndex, bits, mortonToHilbertTable);
}

uint32_t hilbertToMorton3D(uint32_t hilbertIndex, uint32_t bits)
{
	return transformCurve(hilbertIndex, bits, hilbertToMortonTable);
}

The 3D Hilbert curve transforms under the group A4, which has 12 elements. However as there are only 8 transformations that can appear on the left-hand side of each multiplications, only an 8×12 subset of the multiplication table is required. By folding the the index <=> octant mapping function into this table and additionally storing each index respectively octant into the lower 3 bits of each entry, we end up with a compact 96 byte LUT for each function.

These functions can be used as drop-in replacement for the algorithm provided by Fast 2D and 3D Hilbert curves and Morton codes, but are faster by a factor of roughly 2.5 on my machine.