FLEX: Continuous Agent Evolution via Forward Learning from Experience

Letting AI "learn from setbacks" like humans: the FLEX framework boosts Agent performance by up to 23%

Current large-model Agents, while powerful, all share a kind of “amnesia”: once training is complete, they become static tools. Whether they succeed or fail on a task, they cannot learn from it, and when they encounter the same problem again, they still make the same mistakes. This is a far cry from the human way of learning, where people keep growing through experience.

Paper title: FLEX: Continuous Agent Evolution via Forward Learning from Experience

arXiv URL: http://arxiv.org/abs/2511.06449v1

Now, researchers from ByteDance, Tsinghua University, and other institutions have proposed a brand-new learning paradigm — FLEX (Forward Learning from EXperience).

It completely overturns the traditional learning approach, enabling AI Agents to accumulate experience by reflecting on success and failure during interaction with the environment, and thus achieve continuous evolution.

Most importantly, this process requires no gradient backpropagation at all. It is extremely low-cost, yet it delivers performance gains of up to 23% on difficult problems across mathematics, chemistry, biology, and more!

Figure illustration

The dilemma: why can’t today’s Agents “learn”?

We all know that AI models rely on Gradient Descent for learning, using backpropagation to fine-tune billions of parameters.

But for the continuous evolution of Agents, this process faces three major obstacles:

  1. High cost: Backpropagation on large models requires enormous computing resources.

  2. Catastrophic forgetting: When learning new knowledge, the model can easily forget old capabilities.

  3. Closed models: Top-tier models like GPT-4 are closed-source, so we simply cannot access their parameters for fine-tuning.

Although some studies have tried to “evolve” models by modifying prompts or workflows, these methods are often task-specific, do not generalize across domains, and are hard to keep improving as experience accumulates.

FLEX: a new paradigm of forward learning

FLEX proposes a completely different idea: the focus of learning should not be on changing model parameters, but on building and using an evolvable “experience library.”

Traditional learning includes two processes: “forward propagation” and “backpropagation.”

FLEX simplifies learning into a purely forward process, divided into three stages:

  1. Forward exploration: Boldly try to solve the problem and collect plenty of successful and failed experiences.

  2. Experience evolution: An "updater" Agent reflects on the collected experiences, distills them into general lessons, and updates the experience library.

  3. Forward guidance: When solving new problems, retrieve relevant knowledge from the experience library to guide the reasoning process.
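The three stages above can be sketched as a toy loop. This is a minimal illustration, not the paper's implementation: in the real system each function would be an LLM agent, and all names here (`explore`, `evolve`, `guide`) are assumptions for clarity.

```python
# Toy sketch of FLEX's three forward stages; no gradients are ever computed.
# All function names are illustrative, not the paper's API.

def explore(task, solve, verify, attempts=3):
    """Stage 1 (forward exploration): attempt the task several times and
    record both successful and failed trials."""
    trials = []
    for _ in range(attempts):
        answer = solve(task)
        trials.append({"task": task, "answer": answer,
                       "success": verify(task, answer)})
    return trials

def evolve(library, trials):
    """Stage 2 (experience evolution): distill trials into short lessons
    and keep only novel ones (crude de-duplication stands in for the
    paper's "updater" Agent)."""
    for t in trials:
        tag = "works" if t["success"] else "fails"
        lesson = f"{tag}: answer {t['answer']} on task {t['task']}"
        if lesson not in library:
            library.append(lesson)
    return library

def guide(library, task):
    """Stage 3 (forward guidance): retrieve lessons relevant to a new
    task (here: naive substring match instead of semantic retrieval)."""
    return [lesson for lesson in library if task in lesson]

# toy usage: solve arithmetic via eval, verify against the known answer
trials = explore("2+2", lambda t: eval(t), lambda t, a: a == 4)
library = evolve([], trials)
print(guide(library, "2+2"))  # lessons retrieved for the same task
```

The key property the sketch preserves is that learning happens only through the library passed between stages; the `solve` callable itself is never modified.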

This paradigm brings three profound advantages, each answering one of the obstacles above:

  1. Low cost: with no backpropagation, there is no gradient computation over billions of parameters.

  2. No catastrophic forgetting: the model's parameters are never touched; new knowledge lives in an external experience library.

  3. Compatible with closed models: only forward (API) access is required, so even closed-source models like GPT-4 can evolve.

Core mechanism: a two-level MDP and an experience library

To realize this elegant idea, FLEX designs a sophisticated mechanism that can be understood as a clearly divided “strategic command center.” The study formalizes it as a hierarchical Markov decision process (Meta-MDP).

Figure illustration

Lower-level MDP: frontline exploration and experience distillation

This can be seen as the "frontline combat unit." It consists of an Actor, which attempts to solve the task, and a Critic, which evaluates each attempt and returns feedback for the next one.

This “try-feedback-correct” loop is the process by which experience is generated. Both successful paths and lessons from failure are recorded.
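The "try-feedback-correct" loop can be sketched as follows. This is an illustrative stand-in, assuming the Actor and Critic are callables (in the paper they are LLM agents); the function name `refine` and the tuple layout are my own.

```python
# Illustrative sketch of the lower-level try-feedback-correct loop.
# actor(task, feedback) proposes an answer; critic(task, answer)
# returns (ok, feedback). Both would be LLM agents in the real system.

def refine(task, actor, critic, max_rounds=3):
    """Run the loop until the Critic accepts or rounds run out.
    The full history (successes AND failures) is returned, because
    both kinds of trajectory become experience in FLEX."""
    feedback, history = None, []
    for _ in range(max_rounds):
        answer = actor(task, feedback)        # try (conditioned on feedback)
        ok, feedback = critic(task, answer)   # feedback
        history.append((answer, ok, feedback))
        if ok:                                # correct: stop early
            break
    return history
```

Note that the loop deliberately returns failed rounds too; in FLEX a failure with its critique is just as distillable as a success.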

Upper-level MDP: global scheduling and experience evolution

This is equivalent to the “headquarters.” It is responsible for maintaining a global, hierarchical experience library.

When the lower-level exploration unit submits new experience, the “updater” Agent decides whether to add, merge, or discard it, ensuring the library remains efficient and free of redundancy.

When solving new problems, the Agent can retrieve contextually relevant information from this library just like consulting reference materials, thereby gaining powerful external knowledge support.
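The add/merge/discard decision and the retrieval step can be sketched with a small class. This is a hedged approximation: the paper's updater is itself an LLM agent, whereas here a string-similarity heuristic (and the class name `ExperienceLibrary`) stand in purely for illustration.

```python
from difflib import SequenceMatcher

class ExperienceLibrary:
    """Toy upper-level experience library. A similarity heuristic plays
    the role of the paper's LLM "updater" Agent."""

    def __init__(self, merge_threshold=0.8):
        self.entries = []                    # one distilled lesson per entry
        self.merge_threshold = merge_threshold

    def update(self, lesson):
        """ADD if novel, MERGE near-duplicates (keep the longer variant),
        DISCARD exact repeats — keeping the library free of redundancy."""
        for i, old in enumerate(self.entries):
            sim = SequenceMatcher(None, old, lesson).ratio()
            if sim == 1.0:
                return "discard"
            if sim >= self.merge_threshold:
                self.entries[i] = max(old, lesson, key=len)
                return "merge"
        self.entries.append(lesson)
        return "add"

    def retrieve(self, query, k=3):
        """Return the k lessons most similar to the query task —
        the "consulting reference materials" step."""
        ranked = sorted(self.entries,
                        key=lambda e: SequenceMatcher(None, e, query).ratio(),
                        reverse=True)
        return ranked[:k]
```

A real system would replace the similarity heuristic with embedding-based retrieval and let an LLM judge novelty, but the three-way add/merge/discard control flow is the part that matters.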

Practical results: significant gains across domains

No matter how good the theory is, the proof is in the results. FLEX achieved remarkable outcomes on challenging tasks in mathematics, chemistry, and biology.

Model               Benchmark    Vanilla   ICL     Agent   FLEX (this paper)
Claude-Sonnet-4     AIME25       40.0      -       42.0    63.3 (+23.3)
GPT-4               GSM8k        93.4      92.5    95.3    95.9 (+0.6)
Claude-Sonnet-4.5   USPTO50k     20.0      20.0    20.0    30.0 (+10.0)
GenAI-FLEX-128k     ProteinGym   0.460     -       0.460   0.602 (+0.142)

In addition, the study found a clear Scaling Law of Experience between Agent performance and the size of the experience library, and showed that the experience library can be inherited across different models (Experience Inheritance), enabling rapid knowledge transfer.

Conclusion

The FLEX paradigm paints an exciting future: AI Agents are no longer static tools frozen at the factory, but dynamic partners that can keep learning, reflecting, and growing through practice, just like intelligent beings.

By shifting the core of learning from parameter adjustment to experience building, FLEX opens up an efficient, low-cost, and interpretable path for Agent evolution, marking a solid step toward scalable, inheritable, continuously evolving intelligent Agents.