Human-friendly 32-bit encoding

Rust 100%

Find a file

Mike Dilger 29ac57009d binaries		2025-07-17 08:40:24 +12:00
src	binaries	2025-07-17 08:40:24 +12:00
.gitignore	Initial version of hum32	2025-07-15 15:13:00 +12:00
Cargo.lock	Changed alphabet, included code that helped us pick the alphabet	2025-07-16 15:21:39 +12:00
Cargo.toml	Changed alphabet, included code that helped us pick the alphabet	2025-07-16 15:21:39 +12:00
README.md	Changed alphabet, included code that helped us pick the alphabet	2025-07-16 15:21:39 +12:00

README.md

hum32

hum32 is a 5-bit (32 character) encoding scheme that attempts to:

Output human writable characters that are hard to mistake for other characters
Correct some mistakes automatically
Detect errors via a checksum

In that vein it is similar to base32, zbase32, and bech32. We compare these as follows:

Encoding	Chars Avoided	Padding	Checksum
base32	0,1,8,9 uppercase	yes	no
zbase32	0,l,v,2, uppercase	no	no
bech32	1,b,i,o, uppercase	no	custom, poor
bech32m	1,b,i,o, uppercase	no	custom
hum32	g,i,o,s, samecase	no	xxHash

Alphabet Choice

Note that hum32 uses a mixture of uppercase and lowercase characters, but never allows both the uppercase and lowercase of the same character. We choose the case representation with the least visual ambiguity, for example we use uppercase 'L' because lower case 'l' looks like a '1' (and an 'i' or 'I').

The full alphabet is: 123456789aBCdEFHJkLMNOPQRtUvWXYz

We wrote code to determine this alphabet, it is available at src/alphabet_choice.rs and can be run with cargo test choose_character_set -- --nocapture We just swapped 'O' for '0'.

Checksum

Prior to hum32, only bech32 and it's fixed bech32m provide a checksum.

We provide a 32-bit checksum using xxHash's xxh32 function which is performed on and appended to the data prior to encoding. This may not be optimal for the kinds of errors humans make, but it is easy and very effective.

Automatic Correction

Unlike any of the prior algorithms, we detect and correct bad input. This usually only happens when the case of the character is wrong. But other out-of-alphabet characters have substitutions too such as 'G' probably was a '6', etc.

Padding

We do not pad.

Prefix support

Like bech32 we support prefixes, separated in our case with a '0'.

The crate is #[no_std]

API

pub fn encode(plain: &[u8], prefix: Option<&str>) -> Result<String, Error>
pub fn prefix(coded: &[u8]) -> Option<&[u8]> {
pub fn decode(coded: &str, strict: bool) -> Result<Vec<u8>, Error>

Example

input = [246, 11, 226, 142, 73, 141, 43, 201, 119, 153, 142, 112, 11, 216, 255, 247,
         149, 36, 188, 231, 3, 176, 115, 77, 88, 172, 174, 148, 25, 78, 190, 236]
output = TeHTHd9D65h39m3p6pecwR1jTTg9DL11c2e14ygeh9wDk4g2w7R2y6Q235

However if you mistype some of those characters like this:

output = TeHTHd9D6Sh39m3P6peCwRIjTTg9DLllc2e1Aygeh9wDk4g2w7R2y6Q235

You still get back the correct original input.

Our character set is layed out as follows:

       0  1  2  3  4  5  6  7

 +0    c  Q  8  d  E  H  4  1
 +8    k  D  g  q  F  N  2  L
 +16   X  6  9  y  5  h  R  w
 +24   e  p  G  7  3  m  T  j