A Byte of JS

A conversation with a co-worker…

So, to answer your question: how do you represent the character J…
Open up dev tools and type 'J'.charCodeAt(0)
It will return 74.

Why does 74 represent J?
According to the docs for charCodeAt, the first 128 codes match up with the ASCII table. So it takes 7 bits (2^7 = 128) to represent everything on the ASCII table: letters, numbers, common symbols, and other important ‘characters’ such as space, tab, line-feed and carriage return.
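
To make the ASCII claim concrete, here is a small sketch you can paste into the console. The sample characters are my own picks, not anything special; String.fromCharCode is the standard way to go from a code back to a character.

```javascript
// Look up the code for a few characters, including the "invisible" ones.
const samples = ['J', 'j', '7', ' ', '\t', '\n', '\r'];
const codes = samples.map((ch) => ch.charCodeAt(0));
console.log(codes); // [74, 106, 55, 32, 9, 10, 13]

// String.fromCharCode goes the other way: code -> character.
const letter = String.fromCharCode(74);
console.log(letter); // "J"

// Every one of these codes is below 2^7 = 128, so it fits in 7 bits.
const allFitInSevenBits = codes.every((code) => code < 2 ** 7);
console.log(allFitInSevenBits); // true
```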

7 bits, or 7 0’s and 1’s….

What does 74 translate to in bits in RAM or on disk?
Or, how do you go from decimal to binary?

What’s the largest power of 2 that is no bigger than 74? 64, or 2^6.
Take the remainder (74 – 64 = 10). What’s the largest power of 2 that is no bigger than 10? 8, or 2^3.
Take the remainder (10 – 8 = 2), which is 2^1.

So 74 = 2^6 + 2^3 + 2^1 = 64 + 8 + 2
Since the right-most digit of a binary number corresponds to 2^0,
the 2nd from the right corresponds to 2^1,
4th from the right corresponds to 2^3, and
7th from the right corresponds to 2^6.
(Think of it as a zero-indexed array starting from the right).
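
The subtract-the-biggest-power steps above can be sketched as a loop. The function name powersOfTwo is made up for this example, and it assumes a positive whole number as input.

```javascript
// Hypothetical helper: list the exponents of the powers of 2 that add up to n.
// Assumes n is a positive integer.
function powersOfTwo(n) {
  const exponents = [];
  let remainder = n;
  while (remainder > 0) {
    // Find the largest exponent e such that 2^e is no bigger than the remainder.
    let e = 0;
    while (2 ** (e + 1) <= remainder) {
      e++;
    }
    exponents.push(e);
    remainder -= 2 ** e;
  }
  return exponents;
}

console.log(powersOfTwo(74)); // [6, 3, 1], i.e. 64 + 8 + 2
```

Each exponent in the result is the zero-indexed position, counting from the right, of a 1 in the binary number.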

2^7  2^6  2^5  2^4  2^3  2^2  2^1  2^0
128  64   32   16   8    4    2    1
0    1    0    0    1    0    1    0

So the 7 bits used to represent the letter J are 1001010, or 01001010 once you pad it with a leading zero to fill a whole byte.
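
You don’t have to do this by hand. A quick sketch using built-ins: Number.prototype.toString takes a radix, padStart fills in the leading zero, and parseInt with radix 2 converts back.

```javascript
const code = 'J'.charCodeAt(0);      // 74
const bits = code.toString(2);       // "1001010" (7 bits)
const byte = bits.padStart(8, '0');  // "01001010" (padded to 8 bits)
const back = parseInt(byte, 2);      // 74 again

console.log(bits, byte, back);
```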

All this talk about bits, what’s a byte?
A byte is 8 bits. So one byte of computer space can represent up to 2^8 (256) characters. In JavaScript, that’s the first 128 on the ASCII table, then another 128 of other special characters.

If J can be represented by one byte, then 1 KB = 1024 bytes, or 1024 of the letter J.
Why 1024? Because computer scientists think in terms of binary, and 2^10 = 1024.
Then what’s a MB? 2^20, or 1024^2, or 1,048,576 instances of the letter J.
But this is all assuming characters are 1 byte….
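
The arithmetic is easy to check in the console. This sketch keeps the same one-byte-per-character assumption as the text; the constant names are just for illustration.

```javascript
const KB = 2 ** 10; // 1024 bytes
const MB = 2 ** 20; // 1048576 bytes

console.log(KB, MB);
console.log(MB === KB * KB); // true: a MB is 1024 KB

// Under the one-byte-per-character assumption, 1 KB holds this many J's.
const oneKbOfJ = 'J'.repeat(KB);
console.log(oneKbOfJ.length); // 1024
```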

Are all JS characters 1 byte?
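No. How many bytes a character takes depends on the encoding: in UTF-8, common on disk and over the network, it is 1 to 4 bytes, while in memory JS strings are UTF-16, where a character takes 2 or 4 bytes. Special characters take up more room than plain ASCII, and this is where you can get nasty bugs with JS if you’re not aware.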
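
One way to see the byte counts for yourself is to encode a string as UTF-8 with TextEncoder, which is built into browsers and recent versions of Node. The helper name utf8Bytes and the sample characters are my own choices, and the counts below are for UTF-8 only.

```javascript
// TextEncoder always encodes to UTF-8.
const encoder = new TextEncoder();

// Hypothetical helper: number of bytes a string takes when encoded as UTF-8.
function utf8Bytes(str) {
  return encoder.encode(str).length;
}

console.log(utf8Bytes('J'));         // 1 byte  (ASCII)
console.log(utf8Bytes('\u00e9'));    // 2 bytes (é)
console.log(utf8Bytes('\u20ac'));    // 3 bytes (€)
console.log(utf8Bytes('\u{10437}')); // 4 bytes (𐐷)
```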

The docs for charCodeAt say that it returns values of 65535 or less. That’s 2^16 – 1, the largest value that fits in 2 bytes.

So what about characters that are 4 bytes long?
They’re represented by surrogate pairs: two char codes joined together to represent one character.

Try: '𐐷'.charCodeAt(0) and then '𐐷'.charCodeAt(1)

You’ll get two separate char codes, 55297 and 56375!
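
Here is a sketch of how the two halves fit together. The variable names are illustrative; codePointAt is the built-in that returns the full code, and the formula is the standard UTF-16 way of combining a surrogate pair.

```javascript
const ch = '\u{10437}'; // the character 𐐷

const high = ch.charCodeAt(0); // 55297 (0xD801), the high surrogate
const low = ch.charCodeAt(1);  // 56375 (0xDC37), the low surrogate

// codePointAt reads both halves and returns the real code for the character.
const codePoint = ch.codePointAt(0); // 66615 (0x10437)

// The same value computed by hand from the surrogate pair.
const combined = (high - 0xd800) * 0x400 + (low - 0xdc00) + 0x10000;

console.log(high, low, codePoint, combined === codePoint);
```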

Who cares? How does this lead to bugs?
You can’t always rely on String.prototype.length to return the number of characters in a string. Suppose you are validating the length of a string: how should you handle special characters? Because:

'𐐷'.length

Will return a value of 2.
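
If you need a count of characters rather than char codes, one option is to iterate the string, since string iteration walks it one code point at a time. This is a minimal sketch: isWithinLimit is a made-up validator name, and it counts code points, so things like emoji built from several code points will still count as more than one.

```javascript
const str = 'J\u{10437}'; // "J" followed by 𐐷

const codeUnits = str.length;       // 3: one for J, two for 𐐷
const codePoints = [...str].length; // 2: spread iterates by code point

console.log(codeUnits, codePoints);

// Hypothetical validator that counts code points instead of code units.
function isWithinLimit(s, max) {
  return Array.from(s).length <= max;
}

console.log(isWithinLimit(str, 2)); // true
console.log(str.length <= 2);       // false
```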

This is boring, can you end with a joke?
Sure. There are 10 types of people in this world: those who understand binary, and those who don’t.