What is a sumid?
A sumid is a 256-bit, base-50-encoded, sequential UUID built from nanosecond-precision timestamps. With 160 bits of randomness per time slot, collisions are negligible.
The name stands for Sequential Universally-Unique & Mergeable ID.
The canonical reference implementation is written in Python.
Key Properties
- Sequential: IDs generated later sort after earlier ones (monotonic ordering).
- Unique: 160 bits of entropy make accidental collisions negligible.
- Mergeable: Two IDs can be merged to produce a new ID that sorts between them — useful for ordered insertion.
- Human-Readable: The 50-character alphabet avoids visually ambiguous glyphs, and tolerant decoding accepts common lookalikes.
- Technically Safe: Every character is safe for URLs, file paths, and unquoted shell arguments.
The Alphabet
The sumid uses a curated, sorted 50-character alphabet. The characters appear in ASCII order, so plain lexicographic comparison of equal-length encoded strings matches the numeric order of the underlying values, and therefore their chronological order.
In formal language theory, a finite alphabet is traditionally denoted by the Greek letter Σ (Sigma) — hence the Σ in the sumid logo. Where a binary system uses Σ = {0, 1}, a sumid uses an alphabet optimised for human readability and technical safety.
467ACDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnpqrstuvwxyz
This gives a base of 50. A 256-bit value encodes into at most 46 characters.
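As a minimal sketch of the encoding, the snippet below converts a 256-bit integer to base 50 and back. It assumes fixed-width output, left-padded with the lowest alphabet character so that string order matches numeric order; the helper names `encode` and `decode` are illustrative and are not taken from the reference implementation.

```python
ALPHABET = "467ACDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnpqrstuvwxyz"
BASE = len(ALPHABET)  # 50
WIDTH = 46            # smallest n with 50**n >= 2**256


def encode(value: int) -> str:
    """Encode a 256-bit integer as a fixed-width base-50 string (padding is an assumption)."""
    if not 0 <= value < 1 << 256:
        raise ValueError("value must fit in 256 bits")
    digits = []
    for _ in range(WIDTH):
        value, remainder = divmod(value, BASE)
        digits.append(ALPHABET[remainder])
    return "".join(reversed(digits))  # most significant digit first


def decode(text: str) -> int:
    """Decode a canonical base-50 string back to an integer."""
    value = 0
    for ch in text:
        value = value * BASE + ALPHABET.index(ch)  # ValueError on unknown characters
    return value


small, large = 12345, 2**256 - 1
assert decode(encode(small)) == small
assert (small < large) == (encode(small) < encode(large))  # string order follows numeric order
```

The fixed width is what makes plain string comparison safe: without padding, a shorter string could sort after a longer one even though it encodes a smaller value.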
Tolerant Decoding
When decoding, visually ambiguous substitutes are silently accepted and mapped to their canonical character. Encoding always produces the canonical form.
| If you type | Decoded as | Why |
|---|---|---|
| `5` | `S` | Confused in handwriting and small print |
| `2` | `Z` | Confused in handwriting and small print |
| `9` | `g` | Confused in handwriting |
| `3` | `E` | Confused in handwriting and pixelated displays |
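The substitutions in the table above can be sketched as a normalization step that runs before decoding. The function name `normalize` is hypothetical, and rejecting every other out-of-alphabet character is an assumption about how strict the decoder is.

```python
ALPHABET = "467ACDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnpqrstuvwxyz"

# Lookalike substitutions, taken from the table above.
LOOKALIKES = str.maketrans({"5": "S", "2": "Z", "9": "g", "3": "E"})


def normalize(text: str) -> str:
    """Map accepted lookalikes to their canonical character, then validate."""
    canonical = text.translate(LOOKALIKES)
    unknown = set(canonical) - set(ALPHABET)
    if unknown:
        raise ValueError(f"characters not in the sumid alphabet: {sorted(unknown)}")
    return canonical


assert normalize("5293") == "SZgE"      # every lookalike is rewritten
assert normalize("4Acz") == "4Acz"      # canonical input passes through unchanged
```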
Excluded Characters & Rationale
Every printable ASCII character not in the alphabet was excluded for a specific reason. The exclusions fall into two categories.
1. Visual Ambiguity
These characters were removed because all members of the group are too easily confused with each other — no single canonical pick is reliable. (For pairs where one character is clearly more recognizable, see Tolerant Decoding instead.)
| Removed | Confused With | Reason |
|---|---|---|
| `0` `O` `o` | Each other | Zero, uppercase O, and lowercase o are often indistinguishable. |
| `1` `I` `l` | Each other | One, uppercase I, and lowercase l look identical in many sans-serif fonts. |
| `B` `8` | Each other | Can blur together in poor print quality. |
2. Technical Constraints
These characters have special meanings in shells, URLs, or file systems.
Shell & Command Line
| Character | Role |
|---|---|
| `$` | Variable expansion |
| `#` | Comment marker |
| `!` | History expansion (Bash) |
| `~` | Home directory expansion |
| `& \| ;` | Control operators |
| `( ) { } < >` | Grouping & redirection |
| `* ? [ ]` | Globbing wildcards |
| `` ` `` | Command substitution |
| `-` | Option/flag prefix (`--foo`) |
URL & Web Standards
| Character | Role |
|---|---|
| `%` | URL escape character |
| `+` | Space in query strings |
| `@` | Auth separator |
| `=` | Key-value separator |
| `&` | Parameter separator |
File Systems & Paths
| Character | Role |
|---|---|
| `/` | Unix directory separator |
| `\` | Windows separator / escape character |
| `:` | Drive separator (Windows) |
| `.` | Hidden file prefix / directory traversal |
Quoting & Syntax
| Character | Role |
|---|---|
| `" '` | String delimiters |
| `,` | CSV / JSON delimiter |
| Space | Argument separator / requires URL encoding |
| `_` | Regex word boundary (`\b`) / breaks double-click selection |
Bit Structure (256 bits total)
Time Component (96 bits)
The most significant 96 bits store the nanoseconds since the Unix Epoch (UTC). This is large enough to represent roughly 2.5 trillion years — enough to track the entire history of the universe roughly 180 times over.
2^96 / (365.25 × 24 × 3600 × 10^9) ≈ 2.51 × 10^12 years
Entropy Component (160 bits)
The remaining 160 bits are filled with cryptographic-quality random data, making collisions negligible even at extreme generation rates.
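The layout described above can be sketched as a shift and an OR. This is an illustration under stated assumptions rather than the reference implementation: the function name `new_sumid_int` and its optional `timestamp_ns` parameter are hypothetical, and the sketch makes no attempt to order two IDs that share the same clock reading.

```python
import secrets
import time
from typing import Optional

TIME_BITS = 96
ENTROPY_BITS = 160


def new_sumid_int(timestamp_ns: Optional[int] = None) -> int:
    """Build a 256-bit integer: 96-bit nanosecond timestamp, then 160 random bits."""
    if timestamp_ns is None:
        timestamp_ns = time.time_ns()  # nanoseconds since the Unix epoch
    if not 0 <= timestamp_ns < 1 << TIME_BITS:
        raise ValueError("timestamp does not fit in 96 bits")
    entropy = secrets.randbits(ENTROPY_BITS)  # drawn from the OS CSPRNG
    return (timestamp_ns << ENTROPY_BITS) | entropy


# A later timestamp always yields a larger integer, whatever the random bits are.
earlier = new_sumid_int(1_000)
later = new_sumid_int(1_001)
assert earlier < later
assert new_sumid_int().bit_length() <= TIME_BITS + ENTROPY_BITS
```

Because the timestamp occupies the most significant bits, comparing two integers compares their timestamps first and only falls back to the random bits when the timestamps are equal.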
Merging
When two sumids are merged, the result sorts between them. Its position within that range encodes when the merge happened using logarithmic compression: merges soon after creation get far more resolution than merges in the distant future, matching the real-world likelihood of when merges occur.
18 overflow bits are borrowed from the most-significant end of the entropy section to provide sub-slot precision, so repeated merges of the same pair sort chronologically. The remaining 142 random bits still exceed UUID v4's 122. The approximate merge time can be recovered given both parents and the merged result.
FAQ
When would I actually use merging?
Merging lets you insert an item between two existing items without renumbering anything. If you maintain a sorted list — tasks in a to-do app, pages in a document, rows in a spreadsheet — and a user drags something between two neighbors, you merge those neighbors' IDs to get a new ID that sorts right where you need it. No index shuffling, no fractional positions, no coordination with a server.
It also works for offline collaboration: two users can independently insert between the same pair of items while disconnected. When they sync, the embedded merge timestamps give each insertion a distinct sort position automatically. Your application still needs to handle semantic conflicts, but ordering is taken care of.
Beyond lists, merging applies to any ordered dimension. A design tool can place a layer between two existing z-indices by merging their IDs; repeated merges of the same pair stay chronologically ordered thanks to the overflow bits.
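To make the "insert between two neighbors" workflow concrete, here is a deliberately simplified sketch. It uses a plain integer midpoint as a stand-in for the merge operation; it does not implement the logarithmic time compression or the overflow bits described under Merging, and `merge_midpoint` is a hypothetical name, not part of any sumid API.

```python
import bisect


def merge_midpoint(a: int, b: int) -> int:
    """Stand-in for a real merge: the integer midpoint of two IDs. NOT the sumid algorithm."""
    low, high = sorted((a, b))
    if high - low < 2:
        raise ValueError("no room between the two IDs")
    return low + (high - low) // 2


# Hypothetical IDs with small timestamps in the top 96 bits.
ids = [100 << 160, 200 << 160, 300 << 160]

# The user drops a new item between the first and second entries.
new_id = merge_midpoint(ids[0], ids[1])
bisect.insort(ids, new_id)

assert ids.index(new_id) == 1  # the new item lands between its neighbors
assert ids == sorted(ids)      # no existing ID had to change
```

A midpoint gives the same result every time the same pair is merged, which is exactly the case the real scheme avoids by encoding the merge time into the result.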
How does this compare to UUIDv7?
UUIDv7 gives you 48 bits of millisecond time and 74 bits of randomness in 128 bits total. A sumid gives you 96 bits of nanosecond time and 160 bits of randomness in 256 bits. That means better ordering resolution, fewer collisions, and the ability to merge — at the cost of twice the bits (46 characters vs. 36).
Is 46 characters too long?
It depends on your constraints. If you're storing IDs in a database column or passing them in URLs, 46 characters is comparable to a base64-encoded 256-bit hash. If your system is tight on space, a shorter ID format may be a better fit.
Can I extract the creation time from a sumid?
Yes — right-shift the integer value by 160 and you have the timestamp in nanoseconds. The Validate & Decode tool in the sidebar does this automatically.
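A minimal sketch of that shift in Python, assuming the ID has already been decoded to an integer. The function name `creation_time` is illustrative; note that Python's `datetime` only holds microseconds, so the last three digits of the nanosecond value are dropped in the conversion.

```python
from datetime import datetime, timezone

ENTROPY_BITS = 160


def creation_time(value: int) -> datetime:
    """Recover the embedded timestamp as a UTC datetime (microsecond precision)."""
    timestamp_ns = value >> ENTROPY_BITS  # discard the 160 low bits
    seconds, nanos = divmod(timestamp_ns, 1_000_000_000)
    moment = datetime.fromtimestamp(seconds, tz=timezone.utc)
    return moment.replace(microsecond=nanos // 1_000)


# Hypothetical ID: a known timestamp in the top bits, arbitrary low bits.
stamp_ns = 1_700_000_000_123_456_789
value = (stamp_ns << ENTROPY_BITS) | 42

assert value >> ENTROPY_BITS == stamp_ns
print(creation_time(value).isoformat())
```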
What happens if two machines generate IDs at the same nanosecond?
They share the same time prefix but differ in the random portion. A collision requires matching all 160 random bits: a probability of 1 in 2^160 (roughly 10^48).
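When many IDs share one timestamp, the chance that any two of them collide follows the birthday bound rather than the pairwise figure. The sketch below uses the standard approximation p ≈ n(n − 1) / 2^(bits + 1), which is only valid while the result is small; the function name is illustrative.

```python
def collision_probability(ids_per_slot: int, entropy_bits: int = 160) -> float:
    """Birthday-bound approximation for at least one collision among n IDs in one time slot."""
    n = ids_per_slot
    return n * (n - 1) / 2 ** (entropy_bits + 1)


# Two IDs in the same nanosecond: exactly the 1-in-2^160 case.
assert collision_probability(2) == 1 / 2**160

# Even a billion IDs sharing a single timestamp leave the probability vanishingly small.
print(f"random space: {2**160:.3e}")
print(f"p(collision) for 10^9 IDs: {collision_probability(10**9):.3e}")
```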
Why base 50 instead of base 62 or base 64?
Honestly, in most cases it wouldn't matter — IDs are copied and pasted, not read aloud or typed by hand. A larger base would shave off about three characters. But when an ID does end up in a screenshot, a log file, or a support ticket, having no ambiguous glyphs and no shell-unsafe characters removes an entire class of "wait, is that a zero or an O?" moments. The cost is small (46 characters instead of ~43); the payoff is that every character is safe everywhere without escaping or squinting.
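The length figures can be checked with exact integer arithmetic: the encoded length is the smallest number of digits whose capacity covers every 256-bit value. The helper name `encoded_length` is illustrative.

```python
def encoded_length(bits: int, base: int) -> int:
    """Smallest n such that base**n >= 2**bits, computed without floating point."""
    digits, capacity = 0, 1
    while capacity < 1 << bits:
        capacity *= base
        digits += 1
    return digits


assert encoded_length(256, 50) == 46
assert encoded_length(256, 62) == 43
assert encoded_length(256, 64) == 43

for base in (50, 62, 64):
    print(base, encoded_length(256, base))
```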
Do sumids work as database primary keys?
Yes. They're sequential, so B-tree inserts stay roughly append-only — you avoid the random-write penalty that plagues UUIDv4. Store them as a fixed-length string column or as a 256-bit integer, depending on your database.
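For databases without a native 256-bit integer type, one option is a 32-byte big-endian binary key, since byte-wise comparison of big-endian values matches numeric order. The sketch below demonstrates this with SQLite, which compares BLOBs byte by byte; the table layout and the `to_key` helper are assumptions made for illustration.

```python
import sqlite3


def to_key(value: int) -> bytes:
    """Serialize a 256-bit ID as 32 big-endian bytes so byte order matches numeric order."""
    return value.to_bytes(32, "big")


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id BLOB PRIMARY KEY, name TEXT)")

# Hypothetical IDs: small timestamps in the top 96 bits, arbitrary low bits.
rows = [((3 << 160) | 7, "third"), ((1 << 160) | 9, "first"), ((2 << 160) | 1, "second")]
conn.executemany("INSERT INTO items VALUES (?, ?)", [(to_key(v), n) for v, n in rows])

# Ordering by the key returns rows in timestamp order, regardless of insertion order.
names = [name for (name,) in conn.execute("SELECT name FROM items ORDER BY id")]
assert names == ["first", "second", "third"]
```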
Is the nanosecond timestamp actually nanosecond-precise?
It depends on the platform. Most OSes top out at microsecond precision; browsers at milliseconds. The 96-bit field has room for true nanoseconds, but actual precision is whatever your runtime provides.
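In Python, the clock's advertised resolution can be inspected with `time.get_clock_info`, and the actual step size can be estimated by sampling `time.time_ns()`. The sampling helper below is an illustrative sketch, and its result varies by operating system and Python version.

```python
import time
from typing import Optional

info = time.get_clock_info("time")  # the clock behind time.time() and time.time_ns()
print(f"implementation: {info.implementation}, reported resolution: {info.resolution} s")


def smallest_step_ns(samples: int = 100_000) -> Optional[int]:
    """Smallest non-zero difference seen between consecutive clock readings, if any."""
    smallest = None
    previous = time.time_ns()
    for _ in range(samples):
        current = time.time_ns()
        delta = current - previous
        if delta > 0 and (smallest is None or delta < smallest):
            smallest = delta
        previous = current
    return smallest  # None if the clock never advanced during sampling


print(f"smallest observed step: {smallest_step_ns()} ns")
```

A step of 1,000 ns or more means the low digits of the timestamp carry no real information on that platform, even though the field has room for them.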