ManningBooks

Devtalk Sponsor

Evaluation and Alignment, The Seminal Papers (Manning)

Evaluation and Alignment: The Seminal Papers teaches you to think of evaluation as a design constraint. You’ll employ a “working backwards” methodology that begins with what your system must get right, which directs you to the appropriate evaluation approach. As you internalize the define > evaluate > analyze > align cycle, you’ll start making more informed tradeoffs and expertly balancing helpfulness, safety, and brand voice in your models.

Hanchung Lee

Evaluation and Alignment: The Seminal Papers brings together a set of influential research papers and connects them to day-to-day engineering work. The focus isn’t just on metrics in isolation, but on how evaluation shapes the system you end up building.

The book traces how evaluation has evolved. It starts with straightforward approaches like text matching, moves through semantic similarity, and reaches more recent methods where models are used to judge other models. Seeing that progression helps explain why certain techniques break down and where newer ones fit.
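To make the "why certain techniques break down" point concrete, here is a minimal sketch of the text-matching stage, not taken from the book. The function names `exact_match` and `token_f1` and the example sentences are assumptions chosen for illustration. It shows a correct paraphrase scoring lower than a wrong answer that happens to share more words with the reference, which is roughly the kind of failure that motivates semantic and model-based judging.

```python
import re
from collections import Counter


def exact_match(prediction: str, reference: str) -> bool:
    """Strict text matching: True only if the normalized strings are identical."""
    return prediction.strip().lower() == reference.strip().lower()


def token_f1(prediction: str, reference: str) -> float:
    """Token-overlap F1: a softer text-matching score based on shared words."""
    pred_tokens = re.findall(r"\w+", prediction.lower())
    ref_tokens = re.findall(r"\w+", reference.lower())
    if not pred_tokens or not ref_tokens:
        return 0.0
    # Multiset intersection counts each shared word at most as often as it
    # appears in both strings.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


# Hypothetical examples: one correct paraphrase, one wrong answer.
reference = "The capital of France is Paris."
paraphrase = "Paris is France's capital city."
wrong = "The capital of France is Lyon."

print(exact_match(paraphrase, reference))         # False
print(round(token_f1(paraphrase, reference), 2))  # 0.67
print(round(token_f1(wrong, reference), 2))       # 0.83
```

The wrong answer wins on word overlap because it copies the reference's phrasing, while the paraphrase is penalized for rewording a correct fact.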

One idea that runs throughout the book is treating evaluation as a design constraint. Instead of measuring quality after the fact, you begin by defining what the system must get right. That choice influences everything else—what data you collect, which metrics you use, and how you interpret results.

There’s also a strong emphasis on closing the loop. Evaluation feeds analysis, which leads to changes in prompts, data, or architecture. Those changes get tested again. Over time, this cycle becomes part of how you build and maintain AI systems, not something you bolt on at the end.

Some of the topics covered along the way:

  • choosing evaluation methods that match the behavior you care about

  • spotting failure modes that simple metrics tend to miss

  • working with subjective qualities like helpfulness, safety, and tone

  • using evaluation results to guide alignment decisions

If you’ve worked on LLM-based systems, you’ve probably run into the gap between a model that “looks good in a demo” and one that holds up in production. This book is aimed squarely at that gap.


Don’t forget you can get 45% off with your Devtalk discount! Just use the coupon code “devtalk.com” at checkout :+1:
