stmpjmpr

stmpjmpr

Programming Ruby 3.3 (5th Edition, p 122): Unicode Character Definition Incorrect

@noelrappin

On page 122, in the discussion of “Strings and Encodings”, it is stated: “A Unicode character is two bytes long, which makes a Unicode string twice as long internally as a Latin-1 string.” This is incorrect. Unicode represents a 21-bit code space; two bytes are not enough to cover it.

The Unicode Standard says, "Unicode characters are represented in one of three encoding forms: a 32-bit form (UTF-32), a 16-bit form (UTF-16), and an 8-bit form (UTF-8). The 8-bit, byte-oriented form, UTF-8, has been designed for ease of use with existing ASCII-based systems. "

I’d further argue that this is not just being pedantic: the idea that Unicode can be captured completely in two bytes is one of the more dangerous lingering misconceptions. It leads people to think that UTF-16, for instance, will always be 2-byte units (it is, like UTF-8, a variable-length encoding).

Where Next?

Popular Pragmatic Bookshelf topics Top

telemachus
Python Testing With Pytest - Chapter 2, warnings for “unregistered custom marks” While running the smoke tests in Chapter 2, I get these...
New
herminiotorres
Hi! I know not the intentions behind this narrative when called, on page XI: mount() |> handle_event() |> render() but the correc...
New
digitalbias
Title: Build a Weather Station with Elixir and Nerves: Problem connecting to Postgres with Grafana on (page 64) If you follow the defau...
New
Charles
In general, the book isn’t yet updated for Phoenix version 1.6. On page 18 of the book, the authors indicate that an auto generated of ro...
New
New
jonmac
The allprojects block listed on page 245 produces the following error when syncing gradle: “org.gradle.api.GradleScriptException: A prob...
New
s2k
Hi all, currently I wonder how the Tailwind colours work (or don’t work). For example, in app/views/layouts/application.html.erb I have...
New
dtonhofer
@parrt In the context of Chapter 4.3, the grammar Java.g4, meant to parse Java 6 compilation units, no longer passes ANTLR (currently 4....
New
dachristenson
@mfazio23 Android Studio will not accept anything I do when trying to use the Transformations class, as described on pp. 140-141. Googl...
New
dachristenson
I’ve got to the end of Ch. 11, and the app runs, with all tabs displaying what they should – at first. After switching around between St...
New

Other popular topics Top

PragmaticBookshelf
Ruby, Io, Prolog, Scala, Erlang, Clojure, Haskell. With Seven Languages in Seven Weeks, by Bruce A. Tate, you’ll go beyond the syntax—and...
New
New
PragmaticBookshelf
Create efficient, elegant software tests in pytest, Python's most powerful testing framework. Brian Okken @brianokken Edited by Kat...
New
AstonJ
Saw this on TikTok of all places! :lol: Anyone heard of them before? Lite:
New
Maartz
Hi folks, I don’t know if I saw this here but, here’s a new programming language, called Roc Reminds me a bit of Elm and thus Haskell. ...
New
foxtrottwist
A few weeks ago I started using Warp a terminal written in rust. Though in it’s current state of development there are a few caveats (tab...
New
AstonJ
We’ve talked about his book briefly here but it is quickly becoming obsolete - so he’s decided to create a series of 7 podcasts, the firs...
New
PragmaticBookshelf
Author Spotlight: VM Brasseur @vmbrasseur We have a treat for you today! We turn the spotlight onto Open Source as we sit down with V...
New
New
AstonJ
Curious what kind of results others are getting, I think actually prefer the 7B model to the 32B model, not only is it faster but the qua...
New

Sub Categories: