1 result for tag "openclaw-rl-training"
A fully asynchronous reinforcement-learning framework that turns live multi-turn conversations into training signal for personalized agents. It wraps a self-hosted model as an OpenAI-compatible API via OpenClaw and continuously optimizes the policy in the background, running four independent loops (serving, rollout collection, PRM/judge evaluation, and GRPO/OPD training via slime or Tinker) that never block one another.
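The four-loop, never-blocking architecture can be sketched with decoupled coroutines joined by bounded queues. Everything below is a hypothetical illustration, not the framework's real API: `serve`, `rollout_loop`, `judge_loop`, `train_loop`, the sentinel protocol, and the placeholder reward are all assumptions made for the sketch.

```python
import asyncio
import random

async def serve(prompt: str) -> str:
    # Stand-in for the OpenClaw-wrapped, OpenAI-compatible endpoint.
    await asyncio.sleep(0)
    return prompt[::-1]

async def rollout_loop(rollouts: asyncio.Queue, n: int) -> None:
    # Collect multi-turn conversations against the live endpoint.
    for i in range(n):
        reply = await serve(f"turn-{i}")
        await rollouts.put({"id": i, "reply": reply,
                            "turns": random.randint(1, 5)})
    await rollouts.put(None)  # sentinel: collection finished

async def judge_loop(rollouts: asyncio.Queue, scored: asyncio.Queue) -> None:
    # Score finished rollouts as they arrive (placeholder for PRM/judge).
    while True:
        r = await rollouts.get()
        if r is None:
            await scored.put(None)
            return
        r["reward"] = r["turns"] / 5.0  # dummy reward for the sketch
        await scored.put(r)

async def train_loop(scored: asyncio.Queue, updates: list, group: int = 4) -> None:
    # Consume scored rollouts in groups; each full group is a stand-in
    # for one policy-update step. Never blocks collection or judging.
    batch = []
    while True:
        s = await scored.get()
        if s is None:
            if batch:
                updates.append(len(batch))
            return
        batch.append(s)
        if len(batch) == group:
            updates.append(len(batch))
            batch = []

async def main(n: int = 10) -> list:
    rollouts, scored = asyncio.Queue(8), asyncio.Queue(8)
    updates: list = []
    # All loops run concurrently; bounded queues provide backpressure
    # instead of blocking any loop outright.
    await asyncio.gather(
        rollout_loop(rollouts, n),
        judge_loop(rollouts, scored),
        train_loop(scored, updates),
    )
    return updates

if __name__ == "__main__":
    print(asyncio.run(main()))
```

In the real system each loop would run in its own process with the policy weights periodically refreshed in the serving layer; queues here approximate that decoupling in a single process.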