Agent workloads scale with tasks, users, and repeated tool calls.
Research Statement
Agentic Small Language Models
Agentic scaffolding for small language models: why limited memorization capacity matters, and how tool integration, distillation, and context optimization can help.

Small models are the deployment path for everyday agents.
My industry experience repeatedly pointed to the same constraint: useful agents must often run close to users and products.
Private memories and user data often should not leave controlled environments.
Games, medical AI, and productivity applications need responsive agents that work within real deployment limits.
Why are small models weak?
Small models are bottlenecked by parametric memory.
They are not simply unable to reason. Many tasks require them to produce missing facts or exact calculations internally, which turns reasoning into a memorization-heavy problem.
KARD shows that external knowledge reduces what a small model must memorize internally.
T1 shows that tools turn memorization-heavy verification into a problem of tool use and interpretation.
The goal is to convert memory-heavy problems into agentic problems.
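To make the shift from memorization to tool use concrete, here is a minimal Python sketch under stated assumptions. It assumes a hypothetical setup in which a small model emits a candidate answer together with the arithmetic expression it relied on. Instead of asking the model to re-derive the result internally, a verifier delegates the calculation to an exact calculator tool, and the model only has to interpret the comparison. `safe_eval` and `verify_with_tool` are illustrative names, not the T1 implementation.

```python
import ast
import operator

# Hypothetical calculator tool: exact evaluation of +, -, *, /, ** and unary minus
# on numeric literals, without calling eval().
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.Pow: operator.pow,
    ast.USub: operator.neg,
}

def safe_eval(expr: str) -> float:
    """Evaluate a plain arithmetic expression; reject anything else."""
    def _eval(node):
        if isinstance(node, ast.Expression):
            return _eval(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](_eval(node.operand))
        raise ValueError(f"unsupported expression: {expr!r}")
    return _eval(ast.parse(expr, mode="eval"))

def verify_with_tool(expression: str, claimed: float, tol: float = 1e-9) -> bool:
    """Hypothetical verifier: the tool computes, the model only interprets the result."""
    return abs(safe_eval(expression) - claimed) <= tol

# A small model might "remember" 48 * 37 incorrectly; the tool does not.
print(verify_with_tool("48 * 37", 1766))  # False (48 * 37 = 1776)
print(verify_with_tool("48 * 37", 1776))  # True
```

The design point is that exactness comes from the tool rather than from parametric memory, so the model's burden shrinks to deciding when to call the tool and how to read its output.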
Small models need tools more, but use tools worse.
Small models need external memory, computation, and verification to overcome limited capacity.
Without training, they are worse than large models at retrieving, computing, checking, and recovering from errors.
My research trains small models to use agentic scaffolding reliably enough that external memory and computation compensate for limited parametric capacity.
Completed Work
My work builds components for agentic small language models.
KARD: Knowledge-Augmented Reasoning
Retrieve knowledge for rationale generation.
NeurIPS 2023
T1: Tool-integrated Self-verification
Use tools for calculation and fact-checking.
ICLR 2026
Distilling LLM Agent into Small Models
Distill full task-solving behavior with retrieval and code tools.
NeurIPS 2025 Spotlight
ACON: Optimizing Context Compression
Compress context for long-horizon productive work.
ICML 2026
Future Vision
Agentic scaffolding can expand what small models can do locally.
Move routine work from cloud inference to small models running on edge devices.
Use large models selectively when small agents cannot solve the task.
Use research feedback to improve data, objectives, and tool-use behavior.
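The local-first policy above can be sketched as a simple cascade, under stated assumptions: the small agent returns an answer together with a flag saying whether its own tool-based checks passed, and a task that fails verification after a few local tries is escalated to a large model. `Attempt`, `solve_local_first`, and the toy agents are hypothetical illustrations, not a description of an existing system.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple

@dataclass
class Attempt:
    answer: Optional[str]
    verified: bool  # whether the small agent's own checks (e.g., tool-based) passed

def solve_local_first(
    task: str,
    small_agent: Callable[[str], Attempt],
    large_model: Callable[[str], str],
    max_local_tries: int = 2,
) -> Tuple[str, str]:
    """Try the local small agent first; escalate only if no attempt verifies.

    Returns (answer, route), where route is "local" or "cloud".
    """
    for _ in range(max_local_tries):
        attempt = small_agent(task)
        if attempt.verified and attempt.answer is not None:
            return attempt.answer, "local"
    # Only tasks the small agent could not verify reach the large model.
    return large_model(task), "cloud"

# Hypothetical stand-ins for a local small agent and a cloud large model.
def toy_small_agent(task: str) -> Attempt:
    if task.startswith("add:"):
        a, b = map(int, task[4:].split(","))
        return Attempt(answer=str(a + b), verified=True)
    return Attempt(answer=None, verified=False)

def toy_large_model(task: str) -> str:
    return f"[large-model answer to {task!r}]"

print(solve_local_first("add:2,3", toy_small_agent, toy_large_model))  # ('5', 'local')
print(solve_local_first("prove a theorem", toy_small_agent, toy_large_model))
```

Escalating on a failed self-check, rather than on raw model confidence, is one design choice among several; it ties routing to the kind of tool-based self-verification that T1 studies.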
Takeaway: self-improving agentic small language models (sLMs) are one path toward efficient systems in which most tasks run locally and only genuinely difficult cases call large models.