Introduction
The power of AI agents and AI evaluations
1. Introducing AI Agents and Evaluations
Demo of fully functional human and auto-evaluator systems
What are AI agents?
Why many AI agents fail
Understanding the "moat" in AI agents
Evaluating the moat and backbone of your AI agents
Challenges in setting up proprietary AI evaluations
2. Foundation Models and Benchmarks in AI
Introduction to AI foundation models
Essential requirements for model evaluations
Defining requirements for model evaluations
Understanding and leveraging benchmarks
Hands-on lab: Choosing the right model with benchmark analysis
3. Manual Evaluation Strategies and AI Component-Level Testing
Decomposing AI agents into evaluable components
Identifying high-risk or hard-to-evaluate components
Manual evaluation with criteria
Defining evaluation criteria from MVP to GA
Hands-on lab: Vibe-coding automated evaluations using Cursor
Hands-on lab: Automating AI evaluation using an LLM as a judge
4. Automated Evaluation Techniques and Metrics Deep Dive
Deep dive into evaluation metrics for AI agents
Hands-on lab: Building an automated evaluator
Red teaming: Scaling automated evaluations without ground truth
Continuous evaluation with real-time monitoring and alerts
Conclusion
What's next