ManningBooks


Devtalk Sponsor

Evaluation and Alignment: The Seminal Papers (Manning)

Evaluation and Alignment: The Seminal Papers teaches you to think of evaluation as a design constraint. You’ll employ a “working backwards” methodology that begins with what your system must get right, which directs you to the appropriate evaluation approach. As you internalize the define > evaluate > analysis > align cycle, you’ll start making more informed tradeoffs and expertly balancing helpfulness, safety, and brand voice in your models.

Hanchung Lee

Evaluation and Alignment: The Seminal Papers brings together a set of influential research papers and connects them to day-to-day engineering work. The focus isn’t just on metrics in isolation, but on how evaluation shapes the system you end up building.

The book traces how evaluation has evolved. It starts with straightforward approaches like text matching, moves through semantic similarity, and reaches more recent methods where models are used to judge other models. Seeing that progression helps explain why certain techniques break down and where newer ones fit.
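To make the "why text matching breaks down" point concrete, here is a minimal sketch. It is not taken from the book; it assumes a SQuAD-style normalized exact match and token-level F1 as stand-ins for "straightforward text matching," and the example sentences are made up for illustration. A correct paraphrase scores lower than a factually wrong answer that happens to reuse more of the reference's words.

```python
import re
from collections import Counter


def normalize(text: str) -> list[str]:
    # Lowercase, drop punctuation, and split on whitespace.
    return re.sub(r"[^\w\s]", "", text.lower()).split()


def exact_match(prediction: str, reference: str) -> bool:
    # True only if the normalized token sequences are identical.
    return normalize(prediction) == normalize(reference)


def token_f1(prediction: str, reference: str) -> float:
    # Harmonic mean of token precision and recall (bag-of-words overlap).
    pred, ref = normalize(prediction), normalize(reference)
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


# Hypothetical example data.
reference = "The meeting was moved to Friday."
paraphrase = "They rescheduled the meeting for Friday."  # correct, reworded
wrong = "The meeting was moved to Monday."  # incorrect, high word overlap

for label, pred in [("paraphrase", paraphrase), ("wrong", wrong)]:
    print(label, exact_match(pred, reference), round(token_f1(pred, reference), 3))
# paraphrase False 0.5
# wrong False 0.833
```

Both surface metrics reward lexical overlap rather than meaning, which is the gap that semantic-similarity and model-as-judge approaches try to close.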

One idea that runs throughout the book is treating evaluation as a design constraint. Instead of measuring quality after the fact, you begin by defining what the system must get right. That choice influences everything else—what data you collect, which metrics you use, and how you interpret results.

There’s also a strong emphasis on closing the loop. Evaluation feeds analysis, which leads to changes in prompts, data, or architecture. Those changes get tested again. Over time, this cycle becomes part of how you build and maintain AI systems, not something you bolt on at the end.

Some of the topics covered along the way:

  • choosing evaluation methods that match the behavior you care about

  • spotting failure modes that simple metrics tend to miss

  • working with subjective qualities like helpfulness, safety, and tone

  • using evaluation results to guide alignment decisions

If you’ve worked on LLM-based systems, you’ve probably run into the gap between a model that “looks good in a demo” and one that holds up in production. This book is aimed squarely at that gap.


Don’t forget you can get 45% off with your Devtalk discount! Just use the coupon code “devtalk.com” at checkout.
