Why xxHash64
Benny uses xxHash64 for all hashing. It is non-cryptographic (it is not designed to resist preimage or collision attacks) but provides fast, well-distributed hashes, and it is implemented in pure JavaScript with BigInt arithmetic and zero native dependencies.
The 64-bit output width was chosen over the 128-bit output of MurmurHash3 x64: the lower collision probability of 128 bits is not worth the output-format mismatch, because a 128-bit digest needs two 64-bit values and no longer fits the 16-character hex representation. An xxHash64 digest fits in a single 16-character lowercase hex string, which stores cleanly in JSON and relational databases.
The algorithm is deterministic: the same input bytes produce the same output across all platforms, V8 versions, and page loads. This is the fundamental requirement for fingerprint stability.
// Public hashing API
/**
* Hash a string to a 16-char hex string using xxHash64.
* Encodes the input to UTF-8 via TextEncoder (falls back to a manual
* UTF-8 encoder when TextEncoder is unavailable). Returns
* '0000000000000000' on any error.
*/
declare function hash64(input: string): string;
/**
* Hash a pre-encoded byte array to a 16-char hex string.
* Returns '0000000000000000' on any error.
*/
declare function hash64Bytes(input: Uint8Array): string;
// Composite-fingerprint composition (conceptual sketch):
//
// subHashes = signalOrder.map(name => signals[name]?.hash ?? ABSENT_SENTINEL);
// fingerprint = hash64(subHashes.join(SUB_HASH_DELIMITER));
//
// The ordered signal-name array, the absent-sentinel string, and the sub-hash
// delimiter character are internal constants and not part of the public
// contract. Per-signal hashing serialises the collector's value to a string
// (JSON.stringify is used for non-string values) and runs hash64 over it.
hash64 and hash64Bytes are exported from the library's public API. The ordered signal-name array, the absent-sentinel string, and the sub-hash delimiter character are internal constants and may change between releases.
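The conceptual sketch above can be made runnable as follows. This is an illustration under stated assumptions, not Benny's implementation: the sentinel value, the delimiter, the signal names, and the helper names composeFingerprint and standInHash are all hypothetical, and the stand-in hasher is 64-bit FNV-1a rather than xxHash64, used only so the block runs on its own.

```typescript
// Hypothetical values: the real sentinel and delimiter are internal to the library.
const ABSENT_SENTINEL = "absent";
const SUB_HASH_DELIMITER = "|";

type SignalResult = { hash: string };
type Hasher = (input: string) => string;

// Stand-in hasher (64-bit FNV-1a over UTF-16 code units). It is NOT xxHash64;
// in real use, pass the library's hash64 instead.
function standInHash(input: string): string {
  let h = 0xcbf29ce484222325n;
  for (let i = 0; i < input.length; i++) {
    h ^= BigInt(input.charCodeAt(i));
    h = (h * 0x100000001b3n) & 0xFFFFFFFFFFFFFFFFn;
  }
  return h.toString(16).padStart(16, "0");
}

// Joins per-signal hashes in a fixed order, substituting the sentinel for
// missing signals, then hashes the joined string once more.
function composeFingerprint(
  signalOrder: readonly string[],
  signals: Readonly<Record<string, SignalResult | undefined>>,
  hash: Hasher,
): string {
  const subHashes = signalOrder.map((name) => signals[name]?.hash ?? ABSENT_SENTINEL);
  return hash(subHashes.join(SUB_HASH_DELIMITER));
}

const order = ["canvas", "fonts", "audio"]; // hypothetical signal names
const full = composeFingerprint(
  order,
  {
    canvas: { hash: "00000000000000aa" },
    fonts: { hash: "00000000000000bb" },
    audio: { hash: "00000000000000cc" },
  },
  standInHash,
);
// Same signals, but "audio" is absent and contributes the sentinel instead.
const partial = composeFingerprint(
  order,
  { canvas: { hash: "00000000000000aa" }, fonts: { hash: "00000000000000bb" } },
  standInHash,
);
```

Because the signal order is fixed and an absent signal still occupies its slot, a missing signal changes the fingerprint without shifting the positions of the others.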
Algorithm overview
xxHash64 uses BigInt arithmetic throughout, masking every intermediate result to 64 bits with MASK64 = 0xFFFFFFFFFFFFFFFFn to simulate unsigned 64-bit overflow.
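The masking described above can be sketched with three small helpers. The helper names add64, mul64, and rotl64 are illustrative and not taken from the library; only MASK64 appears in the source.

```typescript
const MASK64 = 0xFFFFFFFFFFFFFFFFn;

// BigInt has arbitrary precision, so each result is masked back to 64 bits
// to reproduce the wrap-around of an unsigned 64-bit register.
const add64 = (a: bigint, b: bigint): bigint => (a + b) & MASK64;
const mul64 = (a: bigint, b: bigint): bigint => (a * b) & MASK64;

// Rotate left by r bits. Assumes 0 <= x <= MASK64 and 0 < r < 64.
const rotl64 = (x: bigint, r: bigint): bigint =>
  ((x << r) | (x >> (64n - r))) & MASK64;

const wrappedSum = add64(MASK64, 1n);                 // overflows to 0n
const wrappedProduct = mul64(MASK64, 2n);             // 2^65 - 2 mod 2^64 = MASK64 - 1n
const rotatedTopBit = rotl64(0x8000000000000000n, 1n); // top bit wraps around to bit 0
```

Without the mask, intermediate values would keep growing past 64 bits and the result would diverge from a native 64-bit implementation.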
For inputs of 32 bytes or more, four parallel accumulators are initialised from the seed, then each 32-byte block feeds one 8-byte lane to each accumulator through xxhRound: the lane is multiplied by PRIME64_2 and added to the accumulator, which is then rotated left 31 bits and multiplied by PRIME64_1. The four accumulators are combined with mergeAccumulator. For inputs shorter than 32 bytes, the short-input path skips the accumulators and initialises hash = seed + PRIME64_5 directly.
Both paths then add the input length to the hash and process the remaining bytes in 8-byte, 4-byte, and 1-byte sub-blocks before a final avalanche mix (three XOR-shift steps interleaved with multiplications by PRIME64_2 and PRIME64_3). The output is left-padded to 16 hex digits with '0' characters.
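A minimal sketch of the four-accumulator path follows. It assumes that xxhRound and mergeAccumulator behave as the corresponding steps of the reference xxHash64 algorithm do; the bodies below are written from that reference, not copied from Benny, and accumulateStripes is a hypothetical helper name. It returns the merged state before the length is added and the remainder is processed.

```typescript
const MASK64 = 0xFFFFFFFFFFFFFFFFn;
const PRIME64_1 = 0x9E3779B185EBCA87n;
const PRIME64_2 = 0xC2B2AE3D27D4EB4Fn;
const PRIME64_4 = 0x85EBCA77C2B2AE63n;

const rotl64 = (x: bigint, r: bigint): bigint =>
  ((x << r) | (x >> (64n - r))) & MASK64;

// One round: fold an 8-byte lane into an accumulator.
function xxhRound(acc: bigint, lane: bigint): bigint {
  acc = (acc + lane * PRIME64_2) & MASK64;
  acc = rotl64(acc, 31n);
  return (acc * PRIME64_1) & MASK64;
}

// Fold one finished accumulator into the running hash.
function mergeAccumulator(hash: bigint, acc: bigint): bigint {
  hash ^= xxhRound(0n, acc);
  return (hash * PRIME64_1 + PRIME64_4) & MASK64;
}

// Hypothetical helper: consumes every complete 32-byte block of the input.
function accumulateStripes(bytes: Uint8Array, seed: bigint = 0n): bigint {
  if (bytes.length < 32) throw new RangeError("needs at least 32 bytes");
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  let v1 = (seed + PRIME64_1 + PRIME64_2) & MASK64;
  let v2 = (seed + PRIME64_2) & MASK64;
  let v3 = seed & MASK64;
  let v4 = (seed - PRIME64_1) & MASK64; // masking also wraps negative values
  for (let off = 0; off + 32 <= bytes.length; off += 32) {
    // Lanes are read as little-endian unsigned 64-bit integers.
    v1 = xxhRound(v1, view.getBigUint64(off, true));
    v2 = xxhRound(v2, view.getBigUint64(off + 8, true));
    v3 = xxhRound(v3, view.getBigUint64(off + 16, true));
    v4 = xxhRound(v4, view.getBigUint64(off + 24, true));
  }
  let hash =
    (rotl64(v1, 1n) + rotl64(v2, 7n) + rotl64(v3, 12n) + rotl64(v4, 18n)) & MASK64;
  hash = mergeAccumulator(hash, v1);
  hash = mergeAccumulator(hash, v2);
  hash = mergeAccumulator(hash, v3);
  hash = mergeAccumulator(hash, v4);
  return hash;
}
```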
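The short-input path, remainder processing, avalanche, and hex padding can be sketched together. This follows the reference xxHash64 algorithm (including the step that adds the input length to the hash before the remainder is processed); hashShort, avalanche, and toHex16 are hypothetical names, and the library's own code may be organised differently.

```typescript
const MASK64 = 0xFFFFFFFFFFFFFFFFn;
const PRIME64_1 = 0x9E3779B185EBCA87n;
const PRIME64_2 = 0xC2B2AE3D27D4EB4Fn;
const PRIME64_3 = 0x165667B19E3779F9n;
const PRIME64_4 = 0x85EBCA77C2B2AE63n;
const PRIME64_5 = 0x27D4EB2F165667C5n;

const rotl64 = (x: bigint, r: bigint): bigint =>
  ((x << r) | (x >> (64n - r))) & MASK64;

function xxhRound(acc: bigint, lane: bigint): bigint {
  acc = (acc + lane * PRIME64_2) & MASK64;
  return (rotl64(acc, 31n) * PRIME64_1) & MASK64;
}

// Final mix: three XOR-shifts interleaved with two prime multiplications.
function avalanche(h: bigint): bigint {
  h ^= h >> 33n;
  h = (h * PRIME64_2) & MASK64;
  h ^= h >> 29n;
  h = (h * PRIME64_3) & MASK64;
  h ^= h >> 32n;
  return h;
}

// Left-pad to 16 lowercase hex digits so small values keep a fixed width.
const toHex16 = (h: bigint): string => h.toString(16).padStart(16, "0");

// Hypothetical helper covering inputs shorter than 32 bytes only.
function hashShort(bytes: Uint8Array, seed: bigint = 0n): string {
  if (bytes.length >= 32) throw new RangeError("short path only");
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  let h = (seed + PRIME64_5 + BigInt(bytes.length)) & MASK64;
  let off = 0;
  for (; off + 8 <= bytes.length; off += 8) {          // 8-byte sub-blocks
    h ^= xxhRound(0n, view.getBigUint64(off, true));
    h = (rotl64(h, 27n) * PRIME64_1 + PRIME64_4) & MASK64;
  }
  if (off + 4 <= bytes.length) {                        // at most one 4-byte sub-block
    h ^= (BigInt(view.getUint32(off, true)) * PRIME64_1) & MASK64;
    h = (rotl64(h, 23n) * PRIME64_2 + PRIME64_3) & MASK64;
    off += 4;
  }
  for (; off < bytes.length; off++) {                   // trailing single bytes
    h ^= (BigInt(view.getUint8(off)) * PRIME64_5) & MASK64;
    h = (rotl64(h, 11n) * PRIME64_1) & MASK64;
  }
  return toHex16(avalanche(h));
}

const emptyDigest = hashShort(new Uint8Array(0));
```

For the empty input with seed 0, the sketch reduces to avalanche(PRIME64_5), which should reproduce the widely published xxHash64 empty-input digest ef46db3751d8e999.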
Key constants
| Constant | Value / Location | Purpose |
|---|---|---|
| Absent-sentinel hash | Internal constant | A fixed sentinel string contributed by absent, timed-out, or throwing signals to the hash input. Not a real hash. |
| Sub-hash delimiter | Internal constant | Single-character separator joining sub-hashes before the composite fingerprint hash call. |
| PRIME64_1 to PRIME64_5 | Standard xxHash64 primes | Mix constants for the four-accumulator and remainder processing paths. |
| MASK64 | 0xFFFFFFFFFFFFFFFFn | Bitmask applied after every arithmetic operation to simulate 64-bit overflow in BigInt. |
Per-signal hash semantics
Each signal's hash field is set by wrapCollector. The collector function returns a raw value (object, array, number, or string). wrapCollector serialises it: if it is already a string, it is used directly; otherwise JSON.stringify is called. The serialised string is then passed to hash64.
This means the hash is sensitive to serialisation order for object values. Signal implementations must produce deterministic JSON: they must not return a Set or Map directly (JSON.stringify serialises both as {}, discarding their contents) or rely on unordered property enumeration, and should convert such values to sorted arrays first. Any change to the serialised representation, even whitespace, changes the signal's hash and therefore the composite fingerprint.
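One way a collector could guarantee deterministic JSON is to normalise its value before returning it. The sketch below is an assumption about how that might be done, not part of the library: serialiseSignalValue, stableStringify, and normalise are hypothetical names, and the sketch handles only plain data (objects, arrays, Set, Map, and primitives).

```typescript
// Sort items by their JSON text so the order no longer depends on insertion.
function sortBySerialisedForm(items: unknown[]): unknown[] {
  return items.map(normalise).sort((a, b) => {
    const sa = JSON.stringify(a);
    const sb = JSON.stringify(b);
    return sa < sb ? -1 : sa > sb ? 1 : 0;
  });
}

function normalise(value: unknown): unknown {
  // Set and Map would otherwise serialise as "{}"; turn them into sorted arrays.
  if (value instanceof Set) return sortBySerialisedForm(Array.from(value as Set<unknown>));
  if (value instanceof Map) return sortBySerialisedForm(Array.from(value as Map<unknown, unknown>));
  // Array order is treated as meaningful and preserved.
  if (Array.isArray(value)) return value.map((item) => normalise(item));
  if (value !== null && typeof value === "object") {
    const src = value as Record<string, unknown>;
    const out: Record<string, unknown> = {};
    // Rebuild the object with keys inserted in sorted order.
    for (const key of Object.keys(src).sort()) out[key] = normalise(src[key]);
    return out;
  }
  return value;
}

const stableStringify = (value: unknown): string => JSON.stringify(normalise(value));

// Mirrors the rule in the text: strings pass through, everything else is JSON.
const serialiseSignalValue = (value: unknown): string =>
  typeof value === "string" ? value : stableStringify(value);

const a = serialiseSignalValue({ b: 1, a: 2 });
const b = serialiseSignalValue({ a: 2, b: 1 });
const fromSet = serialiseSignalValue(new Set([3, 1, 2]));
```

Here a and b are the same string even though the properties were written in a different order, and the Set is serialised as a sorted array rather than as {}.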
When a signal is absent (threw, timed out, or API not supported), its hash field is set to a fixed absent-sentinel hash string. That sentinel is a short non-hex string, so it is distinguishable from any real 16-character hex hash by shape alone.
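The shape check can be sketched as a single regular expression. The function name looksLikeRealHash and the sentinel value "absent" are hypothetical; the real sentinel is an internal constant.

```typescript
// A real sub-hash is exactly 16 lowercase hex characters.
const HEX16 = /^[0-9a-f]{16}$/;

function looksLikeRealHash(hash: string): boolean {
  return HEX16.test(hash);
}

const real = looksLikeRealHash("00000000000000ff");       // true
const sentinel = looksLikeRealHash("absent");             // false: wrong length, non-hex characters
const errorValue = looksLikeRealHash("0000000000000000"); // true
```

Note that the all-zero error value passes this check, so a shape test separates absent signals from hashed ones but cannot detect a hashing-layer failure.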
Things worth knowing
- hash64 returns '0000000000000000' on any internal error. This is a valid hash input and will produce a deterministic composite fingerprint, but it is indistinguishable from a real signal with that hash value. Errors in the hashing layer are extremely unlikely given that all internal helpers also catch and return 0n.
- UTF-8 encoding uses TextEncoder when available (all modern browsers). The manual fallback encoder handles the four UTF-8 code-point ranges (1-, 2-, 3-, and 4-byte sequences) and is exercised only in environments that predate TextEncoder (some older WebViews).
- The composite fingerprint is a hash of hashes, not a hash of raw values. Individual signal hashes are computed independently and then composed with a fixed delimiter before the final hash64 call; the delimiter character is part of the internal composition recipe and not part of the public contract.
- The composition is unambiguous by construction: the delimiter cannot appear in a 16-char hex string or in the absent-sentinel hash, so the composed input parses to exactly one signal-hash sequence.
- Both hash64 and hash64Bytes are exported from the implementation and available to consumers via the library's public API, but direct use is rarely needed; wrapCollector and fuseSignals handle all internal hashing.
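For the UTF-8 fallback mentioned in the list above, a manual encoder covering the four code-point ranges might look like the following. This is a sketch, not the library's encoder: encodeUtf8 is a hypothetical name, and replacing lone surrogates with U+FFFD is an assumption chosen to match what TextEncoder does.

```typescript
function encodeUtf8(input: string): Uint8Array {
  const bytes: number[] = [];
  for (let i = 0; i < input.length; i++) {
    let cp = input.charCodeAt(i);
    // Combine a valid surrogate pair into one code point above U+FFFF.
    if (cp >= 0xd800 && cp <= 0xdbff && i + 1 < input.length) {
      const lo = input.charCodeAt(i + 1);
      if (lo >= 0xdc00 && lo <= 0xdfff) {
        cp = 0x10000 + ((cp - 0xd800) << 10) + (lo - 0xdc00);
        i++;
      }
    }
    // Anything still in the surrogate range is unpaired: use U+FFFD.
    if (cp >= 0xd800 && cp <= 0xdfff) cp = 0xfffd;

    if (cp < 0x80) {
      bytes.push(cp);                                   // 1 byte: U+0000..U+007F
    } else if (cp < 0x800) {
      bytes.push(0xc0 | (cp >> 6), 0x80 | (cp & 0x3f)); // 2 bytes: U+0080..U+07FF
    } else if (cp < 0x10000) {
      bytes.push(                                       // 3 bytes: U+0800..U+FFFF
        0xe0 | (cp >> 12),
        0x80 | ((cp >> 6) & 0x3f),
        0x80 | (cp & 0x3f),
      );
    } else {
      bytes.push(                                       // 4 bytes: U+10000..U+10FFFF
        0xf0 | (cp >> 18),
        0x80 | ((cp >> 12) & 0x3f),
        0x80 | ((cp >> 6) & 0x3f),
        0x80 | (cp & 0x3f),
      );
    }
  }
  return Uint8Array.from(bytes);
}

const euroBytes = encodeUtf8("\u20ac"); // U+20AC encodes as E2 82 AC
```

The fallback has to agree with TextEncoder byte for byte; otherwise the same string would hash differently depending on the environment and the fingerprint would not be stable.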
Last reviewed 2026-06-04

