Module datafusion_functions_aggregate::hyperloglog

source ·
Expand description

§HyperLogLog

hyperloglog is a module that contains a modified version of redis’s implementation with some modification based on strong assumption of usage within datafusion, so that function can be efficiently implemented.

Specifically, like Redis’s version, this HLL structure uses 214 = 16384 registers, which means the standard error is 1.04/(163840.5) = 0.8125%. Unlike Redis, the register takes up full u8 size instead of a raw int* and thus saves some tricky bit shifting techniques used in the original version. This results in a memory usage increase from 12Kib to 16Kib. Also only the dense version is adopted, so there’s no automatic conversion, largely to simplify the code.

This module also borrows some code structure from pdatastructs.rs.