sumid

Sequential Universally-Unique & Mergeable ID

sumid logo featuring a dark teal Greek letter sigma (summation symbol) on the left and an orange hourglass on the right

What is a sumid?

A sumid is a 256-bit, base-50-encoded, sequential UUID built from nanosecond-precision timestamps. With 160 bits of randomness per time slot, the probability of a collision is negligible.

The name stands for Sequential Universally-Unique & Mergeable ID.

The canonical reference implementation is written in Python.

Key Properties

The Alphabet

The sumid uses a curated, sorted 50-character alphabet. Sorting order of the characters matches ASCII order, which means lexicographic string comparison preserves chronological order.

In formal language theory, a finite alphabet is traditionally denoted by the Greek letter Σ (Sigma) — hence the Σ in the sumid logo. Where a binary system uses Σ = {0, 1}, a sumid uses an alphabet optimised for human readability and technical safety.

467ACDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnpqrstuvwxyz

This gives a base of 50. A 256-bit value encodes into at most 46 characters.
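As a rough illustration of the encoding described above, here is a minimal sketch of base-50 conversion. The names `encode`, `decode`, and `WIDTH` are hypothetical and not taken from the reference implementation. Left-padding every string to a fixed 46 characters is an assumption made here so that string comparison agrees with numeric order.

```python
ALPHABET = "467ACDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnpqrstuvwxyz"
BASE = len(ALPHABET)  # 50
WIDTH = 46            # smallest n with 50**n >= 2**256 (assumed fixed width)


def encode(value: int) -> str:
    """Encode a 256-bit integer as a fixed-width base-50 string."""
    if not 0 <= value < 1 << 256:
        raise ValueError("value must fit in 256 bits")
    digits = []
    for _ in range(WIDTH):
        value, rem = divmod(value, BASE)
        digits.append(ALPHABET[rem])
    # Digits were produced least-significant first, so reverse them.
    return "".join(reversed(digits))


def decode(text: str) -> int:
    """Decode a canonical base-50 string back to an integer."""
    value = 0
    for ch in text:
        # str.index raises ValueError for characters outside the alphabet.
        value = value * BASE + ALPHABET.index(ch)
    return value
```

Because the alphabet is listed in ASCII order and every string has the same length, `encode(a) < encode(b)` holds whenever `a < b`.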

Tolerant Decoding

When decoding, visually ambiguous substitutes are silently accepted and mapped to their canonical character. Encoding always produces the canonical form.

If you type    Decoded as    Why
5              S             Confused in handwriting and small print
2              Z             Confused in handwriting and small print
9              g             Confused in handwriting
3              E             Confused in handwriting and pixelated displays
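The substitution step could be sketched as a lookup applied before decoding. The name `canonicalize` and the choice to raise `ValueError` on other unknown characters are assumptions for illustration. The mapping itself comes from the table above.

```python
ALPHABET = "467ACDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnpqrstuvwxyz"

# Accepted look-alikes and the canonical character each one maps to.
SUBSTITUTES = {"5": "S", "2": "Z", "9": "g", "3": "E"}


def canonicalize(text: str) -> str:
    """Map tolerated substitutes to canonical characters; reject anything else."""
    out = []
    for ch in text:
        ch = SUBSTITUTES.get(ch, ch)
        if ch not in ALPHABET:
            raise ValueError(f"invalid character: {ch!r}")
        out.append(ch)
    return "".join(out)
```

Running `canonicalize` on an already canonical string returns it unchanged, so it is safe to apply to every input.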

Excluded Characters & Rationale

Every printable ASCII character not in the alphabet was excluded for a specific reason. The exclusions fall into two categories.

1. Visual Ambiguity

These characters were removed because all members of the group are too easily confused with each other — no single canonical pick is reliable. (For pairs where one character is clearly more recognizable, see Tolerant Decoding instead.)

Removed    Confused With    Reason
0 O o      Each other       Zero, uppercase O, and lowercase o are often indistinguishable.
1 I l      Each other       One, uppercase I, and lowercase l look identical in many sans-serif fonts.
B 8        Each other       Can blur together in poor print quality.

2. Technical Constraints

These characters have special meanings in shells, URLs, or file systems.

Shell & Command Line

Character      Role
$              Variable expansion
#              Comment marker
!              History expansion (Bash)
~              Home directory expansion
& | ;          Control operators
( ) { } < >    Grouping & redirection
* ? [ ]        Globbing wildcards
`              Command substitution
-              Option/flag prefix (--foo)

URL & Web Standards

Character      Role
%              URL escape character
+              Space in query strings
@              Auth separator
=              Key-value separator
&              Parameter separator

File Systems & Paths

Character      Role
/              Unix directory separator
\              Windows separator / escape character
:              Drive separator (Windows)
.              Hidden file prefix / directory traversal

Quoting & Syntax

Character      Role
" '            String delimiters
,              CSV / JSON delimiter
Space          Argument separator / requires URL encoding
_              Regex word boundary (\b) / breaks double-click selection

Bit Structure (256 bits total)

Field        Width
Time (ns)    96 bits
Entropy      160 bits

Time Component (96 bits)

The most significant 96 bits store the nanoseconds since the Unix Epoch (UTC). This is large enough to represent roughly 2.5 trillion years — enough to track the entire history of the universe roughly 180 times over.

2⁹⁶ / (365.25 × 24 × 3600 × 10⁹) ≈ 2.51 × 10¹² years
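The range estimate can be checked with exact integer arithmetic. This is only a sanity check. The 13.8-billion-year figure for the age of the universe is an approximate value assumed here for the comparison.

```python
# Nanoseconds in a 365.25-day year.
NS_PER_YEAR = int(365.25 * 24 * 3600) * 10**9

# Number of whole years representable in 96 bits of nanoseconds.
years = 2**96 // NS_PER_YEAR

# Assumed approximate age of the universe, in years.
UNIVERSE_AGE_YEARS = 13.8e9

print(f"{years:,} years")                      # about 2.51 trillion
print(f"{years / UNIVERSE_AGE_YEARS:.0f} times the age of the universe")
```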

Entropy Component (160 bits)

The remaining 160 bits are filled with cryptographic-quality random data, making collisions negligible even at extreme generation rates.
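Putting the two fields together, a fresh 256-bit value might be assembled as follows. This is a sketch of the layout described above, not the reference implementation. The function name `new_sumid_int` is hypothetical, and `secrets.randbits` is one reasonable source of cryptographic-quality randomness.

```python
import secrets
import time

TIME_BITS = 96
ENTROPY_BITS = 160


def new_sumid_int() -> int:
    """Return a 256-bit integer: 96 bits of time above 160 bits of entropy."""
    timestamp_ns = time.time_ns()              # nanoseconds since the Unix epoch
    entropy = secrets.randbits(ENTROPY_BITS)   # 160 random bits from the OS CSPRNG
    return (timestamp_ns << ENTROPY_BITS) | entropy
```

Because the timestamp occupies the most significant bits, values generated later compare greater than values generated earlier, as long as the clock does not step backwards.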

Merging

When two sumids are merged, the result sorts between them. Its position within that range encodes when the merge happened using logarithmic compression: merges soon after creation get far more resolution than merges in the distant future, matching the real-world likelihood of when merges occur.

18 overflow bits are borrowed from the most-significant end of the entropy section to provide sub-slot precision, so repeated merges of the same pair sort chronologically. The remaining 142 random bits still exceed UUIDv4's 122. The approximate merge time can be recovered given both parents and the merged result.
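The merge algorithm itself is not reproduced here, but the field layout it relies on can be sketched. The helper below only splits the 160-bit entropy section into its top 18 overflow bits and the remaining 142 bits. The name `split_entropy` is hypothetical.

```python
ENTROPY_BITS = 160
OVERFLOW_BITS = 18
RANDOM_BITS = ENTROPY_BITS - OVERFLOW_BITS  # 142


def split_entropy(value: int) -> tuple[int, int]:
    """Return (overflow, random_part) from the low 160 bits of a 256-bit value."""
    entropy = value & ((1 << ENTROPY_BITS) - 1)
    overflow = entropy >> RANDOM_BITS                 # top 18 bits of the entropy section
    random_part = entropy & ((1 << RANDOM_BITS) - 1)  # low 142 bits
    return overflow, random_part
```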

FAQ

When would I actually use merging?

Merging lets you insert an item between two existing items without renumbering anything. If you maintain a sorted list — tasks in a to-do app, pages in a document, rows in a spreadsheet — and a user drags something between two neighbors, you merge those neighbors' IDs to get a new ID that sorts right where you need it. No index shuffling, no fractional positions, no coordination with a server.

It also works for offline collaboration: two users can independently insert between the same pair of items while disconnected. When they sync, the embedded merge timestamps give each insertion a distinct sort position automatically. Your application still needs to handle semantic conflicts, but ordering is taken care of.

Beyond lists, merging applies to any ordered dimension. A design tool can place a layer between two existing z-indices by merging their IDs; repeated merges of the same pair stay chronologically ordered thanks to the overflow bits.

How does this compare to UUIDv7?

UUIDv7 gives you 48 bits of millisecond time and 74 bits of randomness in 128 bits total. A sumid gives you 96 bits of nanosecond time and 160 bits of randomness in 256 bits. That means better ordering resolution, fewer collisions, and the ability to merge — at the cost of twice the bits (46 characters vs. 36).

Is 46 characters too long?

It depends on your constraints. If you're storing IDs in a database column or passing them in URLs, 46 characters is comparable to a base64-encoded 256-bit hash. If your system is tight on space, a shorter ID format may be a better fit.

Can I extract the creation time from a sumid?

Yes — right-shift the integer value by 160 and you have the timestamp in nanoseconds. The Validate & Decode tool in the sidebar does this automatically.
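In Python the shift might look like the sketch below. The function name `creation_time` is hypothetical. Since `datetime` only carries microsecond precision, the raw nanosecond count is returned alongside a whole-second UTC `datetime`.

```python
from datetime import datetime, timezone

ENTROPY_BITS = 160


def creation_time(value: int) -> tuple[int, datetime]:
    """Return (nanoseconds since epoch, UTC datetime truncated to whole seconds)."""
    timestamp_ns = value >> ENTROPY_BITS
    seconds = timestamp_ns // 1_000_000_000
    return timestamp_ns, datetime.fromtimestamp(seconds, tz=timezone.utc)
```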

What happens if two machines generate IDs at the same nanosecond?

They share the same time prefix but differ in the random portion. A collision requires matching all 160 random bits, a probability of 1 in 2¹⁶⁰ (roughly 1.5 × 10⁴⁸).
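When many IDs land in the same nanosecond, the chance that any two of them collide can be estimated with the standard pairwise (birthday) bound. This is a back-of-the-envelope sketch. The function name and the example count of one billion IDs are assumptions for illustration.

```python
from fractions import Fraction


def collision_probability(n: int, bits: int = 160) -> float:
    """Upper bound on the chance that any two of n IDs share all random bits.

    There are n*(n-1)/2 pairs, and each pair collides with probability 2**-bits.
    """
    return float(Fraction(n * (n - 1), 2 * 2**bits))


# Even a billion IDs in a single nanosecond stay far below 1e-30.
print(collision_probability(10**9))
```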

Why base 50 instead of base 62 or base 64?

Honestly, in most cases it wouldn't matter — IDs are copied and pasted, not read aloud or typed by hand. A larger base would shave off about three characters. But when an ID does end up in a screenshot, a log file, or a support ticket, having no ambiguous glyphs and no shell-unsafe characters removes an entire class of "wait, is that a zero or an O?" moments. The cost is small (46 characters instead of ~43); the payoff is that every character is safe everywhere without escaping or squinting.
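The length figures can be reproduced with exact integer arithmetic: the encoded length is the smallest number of digits whose capacity covers every 256-bit value. The helper name `encoded_length` is hypothetical.

```python
def encoded_length(bits: int, base: int) -> int:
    """Smallest n such that base**n >= 2**bits."""
    n, capacity = 0, 1
    while capacity < 1 << bits:
        capacity *= base
        n += 1
    return n


for base in (50, 62, 64):
    print(base, encoded_length(256, base))
```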

Do sumids work as database primary keys?

Yes. They're sequential, so B-tree inserts stay roughly append-only — you avoid the random-write penalty that plagues UUIDv4. Store them as a fixed-length string column or as a 256-bit integer, depending on your database.

Is the nanosecond timestamp actually nanosecond-precise?

It depends on the platform. Most OSes top out at microsecond precision; browsers at milliseconds. The 96-bit field has room for true nanoseconds, but actual precision is whatever your runtime provides.
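One way to see what a given runtime provides is to ask Python for the reported clock resolution and to measure the smallest step actually observed between readings. This is a rough probe, not a benchmark. The function name `smallest_observed_step` is hypothetical, and results vary by OS and hardware.

```python
import time
from typing import Optional

# Resolution the runtime reports for the clock behind time.time_ns(), in seconds.
info = time.get_clock_info("time")


def smallest_observed_step(samples: int = 100_000) -> Optional[int]:
    """Smallest non-zero gap (ns) between consecutive time_ns() readings, if any."""
    smallest = None
    previous = time.time_ns()
    for _ in range(samples):
        current = time.time_ns()
        gap = current - previous
        if gap > 0 and (smallest is None or gap < smallest):
            smallest = gap
        previous = current
    return smallest


if __name__ == "__main__":
    print(f"reported resolution: {info.resolution} s")
    print(f"smallest observed step: {smallest_observed_step()} ns")
```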