June 28th, 2023
Speaker1: Swyx - smol developer
Speaker2: Alex - Agent Eval
- Agent failures are hard to debug:
- failure token
- Three evaluation methods
- Pulled down a bunch of datasets
- Alex: https://twitter.com/alexreibman
- Gurkaran: https://twitter.com/aigsingh
- dare: https://twitter.com/dariusemrani
- Jesse Hu: https://twitter.com/huyouare
- What is the most affordable (free, local?) LLM for specific Agent Executor / Agent tasks like decision making, tool selection…?
- In my experience, the OpenAI functions work really well at deciding which tool(s) to use, even in multi-step scenarios. Do you think a chain-of-thought process such as ReAct or MRKL is used behind the scenes? And how useful are those approaches now?
- Consider trying few-shot prompting.
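
The few-shot suggestion above could look roughly like the sketch below for tool selection. It only builds the prompt text. The tool names, the example questions, and the `build_tool_selection_prompt` helper are all hypothetical and were not shown at the meetup. Sending the prompt to a model is left out.

```python
# Hypothetical examples pairing a question with the tool that should handle it.
FEW_SHOT_EXAMPLES = [
    ("What is 17 * 23?", "calculator"),
    ("What is the weather in Tokyo today?", "search"),
]


def build_tool_selection_prompt(question, tools, examples=FEW_SHOT_EXAMPLES):
    """Return a few-shot prompt that asks a model to name exactly one tool."""
    lines = [
        "Pick exactly one tool for the question.",
        "Available tools: " + ", ".join(tools),
        "",
    ]
    # Each worked example shows the model the expected answer format.
    for example_question, tool in examples:
        lines += [f"Question: {example_question}", f"Tool: {tool}", ""]
    # End with the real question and an open "Tool:" slot for the model to fill.
    lines += [f"Question: {question}", "Tool:"]
    return "\n".join(lines)


prompt = build_tool_selection_prompt(
    "Convert 5 miles to km", ["calculator", "search"]
)
print(prompt)
```

The same prompt string can be passed to any local or hosted model, which makes it a cheap way to compare candidates on tool selection.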