stmpjmpr

stmpjmpr

Programming Ruby 3.3 (5th Edition, p 122): Unicode Character Definition Incorrect

@noelrappin

On page 122, in the discussion of “Strings and Encodings”, it is stated: “A Unicode character is two bytes long, which makes a Unicode string twice as long internally as a Latin-1 string.” This is incorrect. Unicode represents a 21-bit code space; two bytes are not enough to cover it.

The Unicode Standard says, "Unicode characters are represented in one of three encoding forms: a 32-bit form (UTF-32), a 16-bit form (UTF-16), and an 8-bit form (UTF-8). The 8-bit, byte-oriented form, UTF-8, has been designed for ease of use with existing ASCII-based systems. "

I’d further argue that this is not just being pedantic: the idea that Unicode can be captured completely in two bytes is one of the more dangerous lingering misconceptions. It leads people to think that UTF-16, for instance, will always be 2-byte units (it is, like UTF-8, a variable-length encoding).

Where Next?

Popular Pragmatic Bookshelf topics Top

johnp
Running the examples in chapter 5 c under pytest 5.4.1 causes an AttributeError: ‘module’ object has no attribute ‘config’. In particula...
New
jesse050717
Title: Web Development with Clojure, Third Edition, pg 116 Hi - I just started chapter 5 and I am stuck on page 116 while trying to star...
New
Alexandr
Hi everyone! There is an error on the page 71 in the book “Programming machine learning from coding to depp learning” P. Perrotta. You c...
New
raul
Hi Travis! Thank you for the cool book! :slight_smile: I made a list of issues and thought I could post them chapter by chapter. I’m rev...
New
jskubick
I’m under the impression that when the reader gets to page 136 (“View Data with the Database Inspector”), the code SHOULD be able to buil...
New
brunogirin
When I run the coverage example to report on missing lines, I get: pytest --cov=cards --report=term-missing ch7 ERROR: usage: pytest [op...
New
creminology
Skimming ahead, much of the following is explained in Chapter 3, but new readers (like me!) will hit a roadblock in Chapter 2 with their ...
New
dtonhofer
@parrt In the context of Chapter 4.3, the grammar Java.g4, meant to parse Java 6 compilation units, no longer passes ANTLR (currently 4....
New
mcpierce
@mfazio23 I’ve applied the changes from Chapter 5 of the book and everything builds correctly and runs. But, when I try to start a game,...
New
New

Other popular topics Top

AstonJ
Or looking forward to? :nerd_face:
498 13326 269
New
Exadra37
I am thinking in building or buy a desktop computer for programing, both professionally and on my free time, and my choice of OS is Linux...
New
New
Rainer
My first contact with Erlang was about 2 years ago when I used RabbitMQ, which is written in Erlang, for my job. This made me curious and...
New
AstonJ
Was just curious to see if any were around, found this one: I got 51/100: Not sure if it was meant to buy I am sure at times the b...
New
New
New
PragmaticBookshelf
Develop, deploy, and debug BEAM applications using BEAMOps: a new paradigm that focuses on scalability, fault tolerance, and owning each ...
New
AstonJ
This is cool! DEEPSEEK-V3 ON M4 MAC: BLAZING FAST INFERENCE ON APPLE SILICON We just witnessed something incredible: the largest open-s...
New
NewsBot
Node.js v22.14.0 has been released. Link: Release 2025-02-11, Version 22.14.0 'Jod' (LTS), @aduh95 · nodejs/node · GitHub
New

Sub Categories: