stmpjmpr

stmpjmpr

Programming Ruby 3.3 (5th Edition, p 122): Unicode Character Definition Incorrect

@noelrappin

On page 122, in the discussion of “Strings and Encodings”, it is stated: “A Unicode character is two bytes long, which makes a Unicode string twice as long internally as a Latin-1 string.” This is incorrect. Unicode represents a 21-bit code space; two bytes are not enough to cover it.

The Unicode Standard says, "Unicode characters are represented in one of three encoding forms: a 32-bit form (UTF-32), a 16-bit form (UTF-16), and an 8-bit form (UTF-8). The 8-bit, byte-oriented form, UTF-8, has been designed for ease of use with existing ASCII-based systems. "

I’d further argue that this is not just being pedantic: the idea that Unicode can be captured completely in two bytes is one of the more dangerous lingering misconceptions. It leads people to think that UTF-16, for instance, will always be 2-byte units (it is, like UTF-8, a variable-length encoding).

Where Next?

Popular Pragmatic Bookshelf topics Top

jesse050717
Title: Web Development with Clojure, Third Edition, pg 116 Hi - I just started chapter 5 and I am stuck on page 116 while trying to star...
New
jamis
The following is cross-posted from the original Ray Tracer Challenge forum, from a post by garfieldnate. I’m cross-posting it so that the...
New
sdmoralesma
Title: Web Development with Clojure, Third Edition - migrations/create not working: p159 When I execute the command: user=> (create-...
New
edruder
I thought that there might be interest in using the book with Rails 6.1 and Ruby 2.7.2. I’ll note what I needed to do differently here. ...
New
rmurray10127
Title: Intuitive Python: docker run… denied error (page 2) Attempted to run the docker command in both CLI and Powershell PS C:\Users\r...
New
adamwoolhether
When trying to generate the protobuf .go file, I receive this error: Unknown flag: --go_opt libprotoc 3.12.3 MacOS 11.3.1 Googling ...
New
jgchristopher
“The ProductLive.Index template calls a helper function, live_component/3, that in turn calls on the modal component. ” Excerpt From: Br...
New
brunogirin
When I run the coverage example to report on missing lines, I get: pytest --cov=cards --report=term-missing ch7 ERROR: usage: pytest [op...
New
taguniversalmachine
It seems the second code snippet is missing the code to set the current_user: current_user: Accounts.get_user_by_session_token(session["...
New
a.zampa
@mfazio23 I’m following the indications of the book and arriver ad chapter 10, but the app cannot be compiled due to an error in the Bas...
New

Other popular topics Top

AstonJ
What chair do you have while working… and why? Is there a ‘best’ type of chair or working position for developers?
New
AstonJ
Curious to know which languages and frameworks you’re all thinking about learning next :upside_down_face: Perhaps if there’s enough peop...
New
Rainer
My first contact with Erlang was about 2 years ago when I used RabbitMQ, which is written in Erlang, for my job. This made me curious and...
New
AstonJ
There’s a whole world of custom keycaps out there that I didn’t know existed! Check out all of our Keycaps threads here: https://forum....
New
DevotionGeo
The V Programming Language Simple language for building maintainable programs V is already mentioned couple of times in the forum, but I...
New
AstonJ
We’ve talked about his book briefly here but it is quickly becoming obsolete - so he’s decided to create a series of 7 podcasts, the firs...
New
AstonJ
Was just curious to see if any were around, found this one: I got 51/100: Not sure if it was meant to buy I am sure at times the b...
New
PragmaticBookshelf
Author Spotlight Mike Riley @mriley This month, we turn the spotlight on Mike Riley, author of Portable Python Projects. Mike’s book ...
New
PragmaticBookshelf
Author Spotlight: VM Brasseur @vmbrasseur We have a treat for you today! We turn the spotlight onto Open Source as we sit down with V...
New
PragmaticBookshelf
Explore the power of Ash Framework by modeling and building the domain for a real-world web application. Rebecca Le @sevenseacat and ...
New

Sub Categories: