Reading the following tutorial regarding a VueJS component that displays the character count for a textarea got me thinking.
You see, the problem is that when JavaScript was first created it didn't have proper support for Unicode characters beyond 16 bits. JavaScript strings are sequences of 16-bit code units, and their internal encoding is described as UCS-2 or UTF-16 depending on which article you find on the internet (actually, there's an awesome article from 2012 that explains this in detail).
What does that mean, you say? Well, it's rather straightforward: if you read the length property of a string that contains characters needing 4 bytes in UTF-8 (code points above U+FFFF, which translate into UTF-16 surrogate pairs), length will count 2 for each of those characters. Characters that need 3 bytes in UTF-8 still fit in a single UTF-16 code unit, so they count as 1.
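To make that concrete, here is a small sketch you can paste into a browser console or Node. The sample characters are my own picks for illustration, not taken from the tutorial.

```javascript
// String.prototype.length counts UTF-16 code units, not characters.
const ascii = "a"; // U+0061, 1 byte in UTF-8
const euro = "€";  // U+20AC, 3 bytes in UTF-8, still a single UTF-16 code unit
const cat = "😹";  // U+1F639, 4 bytes in UTF-8, a surrogate pair in UTF-16

console.log(ascii.length); // 1
console.log(euro.length);  // 1
console.log(cat.length);   // 2

// The two halves of the surrogate pair, as hex code units.
console.log(cat.charCodeAt(0).toString(16)); // d83d
console.log(cat.charCodeAt(1).toString(16)); // de39
```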
This usually isn't an issue, but it becomes a big one if you have a password policy of at least 8 characters that can be satisfied by just 4 emoji: "😹🐶😹🐶" (OK, not the best example, but everybody likes cats and dogs).
Now the fix with modern JavaScript is rather easy, because since ES2015 iterating over a string walks it by code point and keeps surrogate pairs together. Spreading the string into an array and taking the array's length makes for a quick and easy one-liner.
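A minimal sketch of that one-liner could look like this. The helper name `countCodePoints` is made up for this post, and the password is the cats-and-dogs example from above.

```javascript
// Hypothetical helper: counts Unicode code points instead of UTF-16 code units.
// Spreading a string uses the string iterator, which keeps surrogate pairs together.
const countCodePoints = (str) => [...str].length;

const password = "😹🐶😹🐶";

console.log(password.length);             // 8 (UTF-16 code units)
console.log(countCodePoints(password));   // 4 (code points)
console.log(Array.from(password).length); // 4 (same idea without spread syntax)
```

Keep in mind this counts code points, not what a user perceives as one character: an emoji built from several code points glued together with zero-width joiners will still count as more than 1.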
I'd be interested to know if you've had any weird or interesting experiences with UTF-8.
PS: Use this link for a nice simple-ish explanation of Unicode encodings