Recent posts
[Agent] MLE-STAR: Machine Learning Engineering Agent via Search and Targeted Refinement
[Agent] CANVAS: A Benchmark for Vision-Language Models on Tool-Based UI Design
[Agent] PPTArena: A Benchmark for Agentic PowerPoint Editing