ManningBooks

ManningBooks

Devtalk Sponsor

Evaluation and Alignment, The Seminal Papers (Manning)

Erlang and OTP in Action teaches you the concepts of concurrent programming and the use of Evaluation and Alignment: The Seminal Papers teaches you to think of evaluation as a design constraint. You’ll employ a “working backwards” methodology that begins with what your system must get right, which directs you to the appropriate evaluation approach. As you internalize the define > evaluate > analysis > align cycle, you’ll start making more informed tradeoffs, and expertly balancing helpfulness, safety, and brand voice in your models.

Hanchung Lee

Evaluation and Alignment: The Seminal Papers brings together a set of influential research papers and connects them to day-to-day engineering work. The focus isn’t just on metrics in isolation, but on how evaluation shapes the system you end up building.

The book traces how evaluation has evolved. It starts with straightforward approaches like text matching, moves through semantic similarity, and reaches more recent methods where models are used to judge other models. Seeing that progression helps explain why certain techniques break down and where newer ones fit.

One idea that runs throughout the book is treating evaluation as a design constraint. Instead of measuring quality after the fact, you begin by defining what the system must get right. That choice influences everything else—what data you collect, which metrics you use, and how you interpret results.

There’s also a strong emphasis on closing the loop. Evaluation feeds analysis, which leads to changes in prompts, data, or architecture. Those changes get tested again. Over time, this cycle becomes part of how you build and maintain AI systems, not something you bolt on at the end.

Some of the topics covered along the way:

  • choosing evaluation methods that match the behavior you care about

  • spotting failure modes that simple metrics tend to miss

  • working with subjective qualities like helpfulness, safety, and tone

  • using evaluation results to guide alignment decisions

If you’ve worked on LLM-based systems, you’ve probably run into the gap between a model that “looks good in a demo” and one that holds up in production. This book is aimed squarely at that gap.


Don’t forget you can get 45% off with your Devtalk discount! Just use the coupon code “devtalk.com” at checkout :+1:

Where Next?

Popular Ai topics Top

New
ManningBooks
After ChatGPT used RLHF to become production-ready, this foundational technique exploded in popularity. In The RLHF Book, AI expert Natha...
New
pragdave
Build a prototype in a weekend or a full product in a month or two. Untangle legacy systems, improve tests and documentation, and tackle ...
New
ManningBooks
Dr Luca Belli, co-founder and former research lead for Twitter’s Machine Learning Ethics, Transparency and Accountability team, has been ...
New
ManningBooks
Rearchitecting LLMs: Structural techniques for efficient models turns research from the latest AI papers into production-ready practices ...
New
ManningBooks
CUDA for Deep Learning shows you how to work within the CUDA ecosystem, from your first kernel to implementing advanced LLM features like...
New
ManningBooks
Build AI-Enhanced Web Apps guides you through AI development using only JavaScript and other common web dev skills–no Python or Machine L...
New
ManningBooks
AI applications need much more than a connection to a model. To work well in the real world, they need memory, access to company knowledg...
New
ManningBooks
AI tools like ChatGPT, Claude Code, and OpenClaw produce impressive results that can be shockingly human-like. But are they really thinki...
New
ManningBooks
Build Applications with Local AI Models on a Mac shows you exactly how to build and run a ChatGPT-style assistant entirely on your own Ma...
New

Other popular topics Top

PragmaticBookshelf
Machine learning can be intimidating, with its reliance on math and algorithms that most programmers don't encounter in their regular wor...
New
DevotionGeo
I know that these benchmarks might not be the exact picture of real-world scenario, but still I expect a Rust web framework performing a ...
New
AstonJ
poll poll Be sure to check out @Dusty’s article posted here: An Introduction to Alternative Keyboard Layouts It’s one of the best write-...
New
AstonJ
I’ve been hearing quite a lot of comments relating to the sound of a keyboard, with one of the most desirable of these called ‘thock’, he...
New
AstonJ
We’ve talked about his book briefly here but it is quickly becoming obsolete - so he’s decided to create a series of 7 podcasts, the firs...
New
New
AstonJ
Was just curious to see if any were around, found this one: I got 51/100: Not sure if it was meant to buy I am sure at times the b...
New
husaindevelop
Inside our android webview app, we are trying to paste the copied content from another app eg (notes) using navigator.clipboard.readtext ...
New
AstonJ
This is a very quick guide, you just need to: Download LM Studio: https://lmstudio.ai/ Click on search Type DeepSeek, then select the o...
New
PragmaticBookshelf
As digital systems increasingly run the world, mastery of the recurring patterns of software development risk is the key to fast and effe...
New