Say you built an app and wanted to show a user’s initial in an avatar. To do so, you KISS:
const avatar = displayName[0]
const displayName = "Nadav"
const displayName = "🇨🇦 Mike"
const displayName = "👍🏽 Dave"
const displayName = "👨👩👧👦 The Smiths"
const displayName = "🤷 idk"
*slack bloop* avatars are broken, pls fix.
const avatar = [...displayName][0]
Closer, but notice how the flag split into a letter, the thumbs up lost its skin tone, and the family is just a man now? Hold that thought.
const seg = new Intl.Segmenter('en', { granularity: 'grapheme' })
const avatar = [...seg.segment(displayName)].map(x => x.segment)[0]
ok but why
str[0] doesn’t give you a character, it gives you a code unit. A chunk of UTF-16, which is how JavaScript stores strings internally. For a, 1, or $, one code unit is the whole character and you never notice. But most emoji live outside the Basic Multilingual Plane (they’re like on a different dimension maaaan) and need two code units to fit (aka surrogate pairs). str[0] hands you the first one, half an emoji. That’s the garbage in those avatars.
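If you want to see the halves yourself, here's a minimal sketch. The name shrugName is made up for illustration, and the emoji is written as an escape so the code point is explicit; everything else is a standard string method.

```javascript
// Hypothetical sample: "\u{1F937}" is the shrug emoji from the example above.
const shrugName = "\u{1F937} idk";

// One emoji, two UTF-16 code units.
console.log("\u{1F937}".length); // 2

// Indexing hands back a single code unit: the high surrogate on its own.
const firstUnit = shrugName[0];
console.log(firstUnit.charCodeAt(0).toString(16)); // "d83e"

// The other half sits at index 1: the low surrogate.
const secondUnit = shrugName[1];
console.log(secondUnit.charCodeAt(0).toString(16)); // "dd37"

// codePointAt(0) reads both halves and returns the real code point.
console.log(shrugName.codePointAt(0).toString(16)); // "1f937"
```

A lone surrogate isn't a valid character by itself, which is why it renders as garbage.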
Spread helps. [...str] splits on codepoints instead of code units. Closer, but emoji like 👨👩👧👦 are actually four emoji joined by zero-width joiners. Spreading rips the family apart (rough I know). Skin tone is a separate modifier codepoint, gone. Flags are actually two regional indicator letters that render as a flag when paired, also gone.
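Here's a rough sketch of what spread actually sees. The variable names are made up for illustration, and the emoji are spelled out as escapes so every code point is visible.

```javascript
// Hypothetical samples, written as escapes so nothing is hidden.
const flag = "\u{1F1E8}\u{1F1E6}"; // regional indicator C + regional indicator A
const thumbs = "\u{1F44D}\u{1F3FD}"; // thumbs up + medium skin tone modifier
// man, ZWJ, woman, ZWJ, girl, ZWJ, boy
const family = "\u{1F468}\u200D\u{1F469}\u200D\u{1F467}\u200D\u{1F466}";

// Spread iterates by code point, so each one becomes its own array entry.
console.log([...flag].length); // 2
console.log([...thumbs].length); // 2
console.log([...family].length); // 7 (four emoji + three joiners)

// Taking [0] keeps only the first code point of each.
console.log([...flag][0] === "\u{1F1E8}"); // true: just the letter C
console.log([...thumbs][0] === "\u{1F44D}"); // true: skin tone dropped
console.log([...family][0] === "\u{1F468}"); // true: just the man
```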
Intl.Segmenter splits on graphemes (grapheme clusters, to be precise): one visual character each, regardless of how many codepoints it takes to make one.
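If you'd rather not spread the whole string just to grab one character, here's one way to wrap it. firstGrapheme is a hypothetical helper, not something from a library, and it assumes a runtime that ships Intl.Segmenter.

```javascript
// Build the segmenter once and reuse it across calls.
const graphemes = new Intl.Segmenter("en", { granularity: "grapheme" });

// Hypothetical helper: returns the first grapheme, or "" for an empty string.
function firstGrapheme(text) {
  // segment() returns an iterable, so we can stop after the first cluster.
  for (const { segment } of graphemes.segment(text)) {
    return segment;
  }
  return "";
}

// Escapes again, so the code points are explicit.
const canadaFlag = "\u{1F1E8}\u{1F1E6}";
const smiths = "\u{1F468}\u200D\u{1F469}\u200D\u{1F467}\u200D\u{1F466}";

console.log(firstGrapheme("Nadav")); // "N"
console.log(firstGrapheme(canadaFlag + " Mike") === canadaFlag); // true
console.log(firstGrapheme(smiths + " The Smiths") === smiths); // true
console.log(firstGrapheme("") === ""); // true
```

The empty-string case matters for avatars: a blank display name gives you "" here instead of undefined.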
That’s it. If a user still sees tofu, their OS is older than the emoji. Tell them to update or eat it (the tofu I mean…).