stmpjmpr

stmpjmpr

Programming Ruby 3.3 (5th Edition, p 122): Unicode Character Definition Incorrect

@noelrappin

On page 122, in the discussion of “Strings and Encodings”, it is stated: “A Unicode character is two bytes long, which makes a Unicode string twice as long internally as a Latin-1 string.” This is incorrect. Unicode represents a 21-bit code space; two bytes are not enough to cover it.

The Unicode Standard says, "Unicode characters are represented in one of three encoding forms: a 32-bit form (UTF-32), a 16-bit form (UTF-16), and an 8-bit form (UTF-8). The 8-bit, byte-oriented form, UTF-8, has been designed for ease of use with existing ASCII-based systems. "

I’d further argue that this is not just being pedantic: the idea that Unicode can be captured completely in two bytes is one of the more dangerous lingering misconceptions. It leads people to think that UTF-16, for instance, will always be 2-byte units (it is, like UTF-8, a variable-length encoding).

Where Next?

Popular Pragmatic Bookshelf topics Top

jon
Some minor things in the paper edition that says “3 2020” on the title page verso, not mentioned in the book’s errata online: p. 186 But...
New
Alexandr
Hi everyone! There is an error on the page 71 in the book “Programming machine learning from coding to depp learning” P. Perrotta. You c...
New
simonpeter
When I try the command to create a pair of migration files I get an error. user=> (create-migration "guestbook") Execution error (Ill...
New
jdufour
Hello! On page xix of the preface, it says there is a community forum "… for help if your’re stuck on one of the exercises in this book… ...
New
raul
Hi Travis! Thank you for the cool book! :slight_smile: I made a list of issues and thought I could post them chapter by chapter. I’m rev...
New
alanq
This isn’t directly about the book contents so maybe not the right forum…but in some of the code apps (e.g. turbo/06) it sends a TURBO_ST...
New
Chrichton
Dear Sophie. I tried to do the “Authorization” exercise and have two questions: When trying to plug in an email-service, I found the ...
New
brunogirin
When running tox for the first time, I got the following error: ERROR: InterpreterNotFound: python3.10 I realised that I was running ...
New
s2k
Hi all, currently I wonder how the Tailwind colours work (or don’t work). For example, in app/views/layouts/application.html.erb I have...
New
rainforest
Hi, I’ve got a question about the implementation of PubSub when using a Phoenix.Socket.Transport behaviour rather than channels. Before ...
New

Other popular topics Top

PragmaticBookshelf
Machine learning can be intimidating, with its reliance on math and algorithms that most programmers don't encounter in their regular wor...
New
PragmaticBookshelf
Write Elixir tests that you can be proud of. Dive into Elixir’s test philosophy and gain mastery over the terminology and concepts that u...
New
PragmaticBookshelf
Design and develop sophisticated 2D games that are as much fun to make as they are to play. From particle effects and pathfinding to soci...
New
AstonJ
Continuing the discussion from Thinking about learning Crystal, let’s discuss - I was wondering which languages don’t GC - maybe we can c...
New
rustkas
Intensively researching Erlang books and additional resources on it, I have found that the topic of using Regular Expressions is either c...
New
New
PragmaticBookshelf
Leverage Elixir and the Nx ecosystem to build intelligent applications that solve real-world problems in computer vision, natural languag...
New
First poster: AstonJ
Jan | Rethink the Computer. Jan turns your computer into an AI machine by running LLMs locally on your computer. It’s a privacy-focus, l...
New
PragmaticBookshelf
Fight complexity and reclaim the original spirit of agility by learning to simplify how you develop software. The result: a more humane a...
New
PragmaticBookshelf
A concise guide to MySQL 9 database administration, covering fundamental concepts, techniques, and best practices. Neil Smyth MySQL...
New

Sub Categories: