*A conversation with a co-worker…*

So, to answer your question: how do you represent the character J…

Open up dev tools and type `'J'.charCodeAt(0)`

It will return 74.

*Why does 74 represent J?*

According to the docs for `charCodeAt`, the first 128 codes match the ASCII table. It takes 7 bits (2^{7} = 128) to represent everything on the ASCII table: letters, numbers, common symbols, and other important ‘characters’ such as space, tab, line feed and carriage return.
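To check a few of these codes yourself, here is a small sketch you can paste into the console. The `samples` and `codes` names are just for illustration.

```javascript
// Look up the char code of a few ASCII characters, including whitespace.
const samples = ['J', 'j', '0', ' ', '\t', '\n', '\r'];
const codes = samples.map((ch) => ch.charCodeAt(0));
console.log(codes); // [74, 106, 48, 32, 9, 10, 13]

// String.fromCharCode goes the other way, from code to character.
console.log(String.fromCharCode(74)); // "J"
```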

7 bits, or 7 0’s and 1’s….

*What does 74 translate to in bits in RAM or on disk?*

Or, how do you go from decimal to binary?

What’s the largest power of 2 that is less than or equal to 74? 64, or 2^{6}.

Take the remainder (74 – 64 = 10). What’s the largest power of 2 that is less than or equal to 10? 8, or 2^{3}.

Take the remainder (10 – 8 = 2), which is exactly 2^{1}, so we’re done.

So 74 = 2^{6} + 2^{3} + 2^{1} = 64 + 8 + 2

Since the right-most digit of a binary number corresponds to 2^{0},

the 2nd from the right corresponds to 2^{1},

4th from the right corresponds to 2^{3}, and

7th from the right corresponds to 2^{6}.

(Think of it as a zero-indexed array starting from the right).
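The steps above can be sketched as two small functions. `powersOfTwo` and `toBits` are hypothetical helper names, not built-ins, and the sketch assumes a non-negative integer that fits in the requested width.

```javascript
// Hypothetical helper: list the exponents of the powers of 2 that sum to n.
function powersOfTwo(n) {
  const exponents = [];
  let remainder = n;
  while (remainder > 0) {
    // Find the largest exponent e such that 2^e <= remainder.
    let e = 0;
    while (2 ** (e + 1) <= remainder) e += 1;
    exponents.push(e);
    remainder -= 2 ** e;
  }
  return exponents;
}

// Hypothetical helper: put a 1 at each exponent's position,
// counting from the right like a zero-indexed array.
// Assumes n < 2^width.
function toBits(n, width = 8) {
  const bits = new Array(width).fill('0');
  for (const e of powersOfTwo(n)) {
    bits[width - 1 - e] = '1';
  }
  return bits.join('');
}

console.log(powersOfTwo(74)); // [6, 3, 1]
console.log(toBits(74));      // "01001010"
```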

| 2^{7} | 2^{6} | 2^{5} | 2^{4} | 2^{3} | 2^{2} | 2^{1} | 2^{0} |
|---|---|---|---|---|---|---|---|
| 128 | 64 | 32 | 16 | 8 | 4 | 2 | 1 |
| 0 | 1 | 0 | 0 | 1 | 0 | 1 | 0 |

So the 7 *bits* used to represent the letter J are `1001010`, or `01001010` once padded with a leading zero to fill 8 bits.
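You don’t have to do the conversion by hand. As a quick check, the built-in `Number.prototype.toString(2)` and `parseInt(..., 2)` convert in both directions; the variable names below are only for illustration.

```javascript
const code = 'J'.charCodeAt(0);             // 74
const sevenBits = code.toString(2);         // "1001010"
const oneByte = sevenBits.padStart(8, '0'); // "01001010"

// And back from binary to decimal.
const back = parseInt(oneByte, 2);          // 74

console.log(sevenBits, oneByte, back);
```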

*All this talk about bits, what’s a byte?*

A *byte* is 8 bits. So one byte of computer space can represent up to 2^{8} (256) different values. In JavaScript char codes, that’s the first 128 on the ASCII table, then another 128 (codes 128 to 255) of control codes and other special characters such as é and ñ.

If J can be represented by one byte, then 1 KB = 1024 bytes, or 1024 of the letter J.

Why 1024? Because computer scientists think in terms of binary, and 2^{10} = 1024.

Then what’s a MB? 2^{20}, or 1024^{2}, or 1,048,576 instances of the letter J.
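The arithmetic is easy to check in the console. This sketch assumes the text is saved as UTF-8, where J really is one byte; `KB`, `MB` and `kilobyteOfJ` are illustrative names.

```javascript
const KB = 2 ** 10; // 1024
const MB = 2 ** 20; // 1048576

console.log(MB === KB * KB); // true

// 1024 J's encoded as UTF-8 take up exactly 1024 bytes.
const kilobyteOfJ = new TextEncoder().encode('J'.repeat(KB));
console.log(kilobyteOfJ.length); // 1024
```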

But this is all assuming characters are 1 byte….

*Are all JS characters 1 byte?*

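No. A JS string is a sequence of 16-bit (2-byte) UTF-16 code units, so a character takes one or two code units (2 or 4 bytes), and anywhere from 1 to 4 bytes when encoded as UTF-8 on disk. Special characters take up more space than plain ASCII, and this is where you can get nasty bugs with JS if you’re not aware.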
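One way to see the sizes is to encode a few characters as UTF-8 with `TextEncoder` (available in browsers and Node) and count the bytes. `utf8Bytes` is a hypothetical helper name, and the characters are written as escapes so they survive copy and paste.

```javascript
// TextEncoder always encodes to UTF-8.
const encoder = new TextEncoder();

// Hypothetical helper: number of bytes a string takes as UTF-8.
const utf8Bytes = (s) => encoder.encode(s).length;

console.log(utf8Bytes('J'));         // 1
console.log(utf8Bytes('\u00e9'));    // 2  (é)
console.log(utf8Bytes('\u20ac'));    // 3  (€)
console.log(utf8Bytes('\u{10437}')); // 4  (𐐷)
```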

The docs for `charCodeAt` say that it returns values of 65535 or less. That’s 2^{16} – 1, the largest number that fits in 2 bytes.

*So what about characters that are 4-bytes long?*

They’re represented by surrogate pairs: two char codes joined together to represent one character.

Try: `'𐐷'.charCodeAt(0)`

`'𐐷'.charCodeAt(1)`

You’ll get two separate char codes!
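Here is a sketch of how the two halves fit back together, using the standard UTF-16 formula and the built-in `codePointAt` as a cross-check. The variable names are illustrative, and `'\u{10437}'` is the same 𐐷 character written as an escape.

```javascript
const ch = '\u{10437}'; // 𐐷

const high = ch.charCodeAt(0); // 55297 (0xD801), the high surrogate
const low = ch.charCodeAt(1);  // 56375 (0xDC37), the low surrogate

// Each surrogate carries 10 bits of the real code point.
const codePoint = (high - 0xD800) * 0x400 + (low - 0xDC00) + 0x10000;

console.log(codePoint.toString(16));          // "10437"
console.log(codePoint === ch.codePointAt(0)); // true
```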

*Who cares? How does this lead to bugs?*

You can’t always rely on `String.prototype.length` to return the number of characters in a string. Suppose you are validating the length of a string: how should you handle special characters? Because:

`'𐐷'.length`

Will return a value of 2.
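One possible way around this, sketched below, is to count code points instead of 16-bit code units by spreading the string into an array. `countCodePoints` and `isWithinLimit` are hypothetical helper names, not built-ins.

```javascript
// Hypothetical helper: count code points instead of 16-bit code units.
// Spreading a string walks it one code point at a time.
function countCodePoints(str) {
  return [...str].length;
}

// Hypothetical validator built on top of it.
function isWithinLimit(str, maxChars) {
  return countCodePoints(str) <= maxChars;
}

const sample = 'J\u{10437}'; // "J" followed by 𐐷

console.log(sample.length);            // 3
console.log(countCodePoints(sample));  // 2
console.log(isWithinLimit(sample, 2)); // true
```

This still isn’t perfect: accented letters built from combining marks and some emoji span several code points, so treat it as a starting point rather than a complete fix.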

*This is boring, can you end with a joke?*

Sure. There are 10 types of people in this world: those who understand binary, and those who don’t.