Ouroboros: An Autonomous Self-Improving AI Agent
Ouroboros is an autonomous AI agent that works on tasks provided to it while continuously improving itself.
Named after the ancient symbol of a serpent eating its own tail - representing infinity and cyclic renewal - Ouroboros implements a continuous loop of Do → Learn → Improve → Retry.
Unlike traditional AI assistants that wait for commands and forget context between sessions, Ouroboros:
- Runs indefinitely without human intervention
- Maintains persistent memory of everything it has done
- Reflects on its performance regularly
- Modifies its own code to improve over time
- Can incorporate human feedback when provided
Previously I wrote about GlobaLLM, an AI agent that autonomously contributes to open source projects.
While GlobaLLM's primary objective is to do project and task prioritization at scale, Ouroboros focuses on task implementation and self-improvement.
Ouroboros is thus a component of GlobaLLM's solution.
Ouroboros follows a structured nine-step cycle that repeats continuously:
- Read goals – Fetches tasks from agent/goals/active.md
- Select goal – Picks one to work on (or defaults to self-improvement)
- Plan – Uses an LLM to create a step-by-step plan
- Execute – Carries out the plan using available tools
- Journal – Writes results to a daily log
- Reflect – Analyzes what happened and identifies improvements (both task-related and self-related)
- Self-modify – Edits its own source code if improvements are found
- Journal again – Records reflection and modification results
- Repeat – Starts the cycle anew
This separation between execution and self-modification is crucial.
The agent won't modify its code while working on a task - reflections and improvements happen only during dedicated reflection cycles.
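The sketch below shows one way such a loop could be wired together in Python; every helper name and body here is an illustrative stand-in, not the agent's actual code.

```python
# A minimal, hypothetical sketch of the Do -> Learn -> Improve -> Retry loop.
# All helper names and bodies are illustrative stand-ins, not the real agent.
from pathlib import Path

GOALS_FILE = Path("agent/goals/active.md")

def read_goals() -> list[str]:
    """One goal per non-empty line; empty list if the file is missing."""
    if not GOALS_FILE.exists():
        return []
    return [line.strip() for line in GOALS_FILE.read_text().splitlines() if line.strip()]

def make_plan(goal: str) -> list[str]:       # in reality: an LLM call
    return [f"work on: {goal}"]

def execute(plan: list[str]) -> str:         # in reality: run tools (run_cmd, write_file, ...)
    return "; ".join(plan)

def journal(kind: str, text: str) -> None:   # in reality: append to agent/journal/YYYY/MM/DD/
    print(f"[{kind}] {text}")

def reflect(result: str) -> list[str]:       # in reality: an LLM call over the results
    return []

def self_modify(improvements: list[str]) -> None:
    pass                                     # in reality: edit source and commit to git

def run_cycle() -> None:
    goals = read_goals()
    goal = goals[0] if goals else "improve yourself"   # default to self-improvement
    result = execute(make_plan(goal))
    journal("notes", result)
    improvements = reflect(result)
    if improvements:
        self_modify(improvements)            # self-modification only happens in this phase
    journal("reflections", "; ".join(improvements) or "no changes")
```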
┌─────────────────────────────────────────────────────────┐
│ Agent Core │
│ (coordinates the loop, handles signals, manages state) │
└─────────────────────────────────────────────────────────┘
│ │ │
▼ ▼ ▼
┌──────────────┐ ┌──────────────┐ ┌──────────────┐
│ Memory │ │ LLM Layer │ │ Tools │
│ │ │ │ │ │
│ • Working │ │ • Anthropic │ │ • run_cmd │
│ • Journal │───▶│ Claude │───▶│ • read_file │
│ • Goals │ │ • Token │ │ • write_file │
│ • Feedback │ │ tracking │ │ • search_* │
└──────────────┘ └──────────────┘ └──────────────┘
Ouroboros uses a three-tiered memory architecture:
| Tier | Description | Location |
|---|---|---|
| Working memory | Current goals, immediate context | In-process |
| Short-term | Daily journals (notes, reflections, feedback) | agent/journal/YYYY/MM/DD/ |
| Long-term | Git history with descriptive commits | Git repository |
Everything is logged in human-readable markdown, making it easy to inspect what the agent has been up to.
The agent comes with built-in tools for common operations:
- run_command – Execute shell commands
- read_file – Read file contents
- write_file – Write to files
- search_files – Find files by pattern
- search_content – Search within files
Crucially, Ouroboros can create, register, and use new tools that it writes itself.
Tools are implemented as subcommands of the ouroboros CLI that the agent can invoke during execution.
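One plausible shape for this, using Python's argparse (the subcommand names mirror the built-in tools above, but the wiring is an assumption, not the real ouroboros CLI):

```python
# Hypothetical sketch: exposing tools as subcommands of a CLI the agent can call.
# The argument names and behavior are assumptions, not the real ouroboros CLI.
import argparse
import subprocess
from pathlib import Path

def main() -> None:
    parser = argparse.ArgumentParser(prog="ouroboros")
    sub = parser.add_subparsers(dest="tool", required=True)

    run_cmd = sub.add_parser("run_command")
    run_cmd.add_argument("cmd")

    read_file = sub.add_parser("read_file")
    read_file.add_argument("path")

    args = parser.parse_args()
    if args.tool == "run_command":
        result = subprocess.run(args.cmd, shell=True, capture_output=True, text=True)
        print(result.stdout)
    elif args.tool == "read_file":
        print(Path(args.path).read_text())

if __name__ == "__main__":
    main()
```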
The execution/reflection separation prevents runaway self-modification.
The agent can only change code during a dedicated reflection phase, and all changes are committed to git with descriptive messages explaining the "why" behind each change.
Every action is logged. Want to know what the agent did? Check the daily journal:
- agent/journal/YYYY/MM/DD/notes.md – What it did
- agent/journal/YYYY/MM/DD/reflections.md – What it learned
- agent/journal/YYYY/MM/DD/user-feedback.md – Human input received
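For example, today's journal directory follows directly from the date (a small sketch; the entry text is made up):

```python
# Sketch: resolving today's journal directory from the layout above.
from datetime import date
from pathlib import Path

today = date.today()
day_dir = Path("agent/journal") / f"{today:%Y}" / f"{today:%m}" / f"{today:%d}"
day_dir.mkdir(parents=True, exist_ok=True)

with (day_dir / "notes.md").open("a") as notes:
    notes.write("- Example entry written by the agent\n")
```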
Ouroboros needs no human intervention, but welcomes it.
It will happily incorporate feedback, adjust course based on user suggestions, and explain its reasoning when asked.
- True self-improvement – The agent can and does modify its own implementation based on reflection
- Persistent memory – Git commits serve as a permanent, queryable history of everything tried
- Graceful degradation – Failed modifications can be reverted; the agent learns and tries again
- Tool extensibility – New tools can be created dynamically as needs arise
- Idle improvement – When no goals are active, it works on making itself better
Ouroboros represents an experiment in autonomous AI agents.
Can an agent truly improve itself over time with little or no human intervention?
By maintaining a detailed journal, reflecting on its actions, and having the freedom to modify its own code, Ouroboros aims to answer this question.
The name is fitting - the serpent eating its tail represents the continuous cycle of doing, learning, and improving that drives the agent forward.
Each reflection builds on the last; each modification makes the agent slightly more capable.
Ouroboros is open source.
Check out the repository to see the code, contribute, or run your own self-improving agent.
GlobaLLM: Automated Open Source Contribution at Scale
Consider the following dilemma: you have unlimited access to state-of-the-art LLMs, but finite compute resources.
How do you maximize positive impact on the software ecosystem?
GlobaLLM is an experiment in autonomous open source contribution which attempts to address this question.
It's a system that discovers repositories, analyzes their health, prioritizes issues, and automatically generates pull requests - all while coordinating with other instances to avoid redundant work.
The core insight isn't just that LLMs can write code; it's that strategic prioritization combined with distributed execution can multiply that capability into something genuinely impactful.
This article explains how GlobaLLM works, diving into the architecture that lets it scale from fixing a single bug to coordinating across thousands of repositories.
GlobaLLM follows a five-stage pipeline:
Discover -> Analyze -> Prioritize -> Fix -> Contribute
The system begins by finding repositories worth targeting.
Using GitHub's search API, it filters by domain, language, stars, and other criteria.
Current methodology uses domain-based discovery with predefined domains (ai_ml, web_dev, data_science, cloud_devops, mobile, security, games), each with custom search queries combining relevant keywords.
The system then applies multi-stage filtering:
- Language filtering: Excludes non-programming languages (Markdown, HTML, CSS, Shell, etc.)
- Library filtering: Uses heuristics to identify libraries vs applications (checks for package files like pyproject.toml, package.json, Cargo.toml; filters out "awesome" lists and doc repos; analyzes descriptions and topics)
- Quality filtering: Language-specific queries include testing indicators (pytest, jest, testing)
- Health filtering: Applies health scores to filter out unmaintained projects
- Dependent enrichment: Uses libraries.io API to fetch package dependency counts for impact scoring
Results are cached locally (24hr TTL) to avoid redundant API calls and respect rate limits.
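A minimal sketch of such a caching layer (the cache file and its layout are assumptions; the endpoint is GitHub's public repository search API):

```python
# Sketch: caching GitHub search results locally with a 24-hour TTL.
# The cache file name and layout are assumptions made for illustration.
import json
import time
import urllib.parse
import urllib.request
from pathlib import Path

CACHE = Path("search_cache.json")
TTL_SECONDS = 24 * 3600

def search_repositories(query: str) -> dict:
    cache = json.loads(CACHE.read_text()) if CACHE.exists() else {}
    entry = cache.get(query)
    if entry and time.time() - entry["fetched_at"] < TTL_SECONDS:
        return entry["result"]                       # still fresh: skip the API call
    url = "https://api.github.com/search/repositories?q=" + urllib.parse.quote(query)
    with urllib.request.urlopen(url) as resp:
        result = json.load(resp)
    cache[query] = {"fetched_at": time.time(), "result": result}
    CACHE.write_text(json.dumps(cache))
    return result
```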
The goal isn't to find every repository - it's to find libraries where a contribution would matter.
Once a repository is identified, GlobaLLM performs deep analysis to determine whether contributing is worthwhile.
This gate prevents wasting resources on abandoned projects, hostile communities, or repositories where contributions won't have impact.
It calculates a HealthScore based on multiple signals:
- Commit velocity: Is the project actively maintained?
- Issue resolution rate: Are bugs getting fixed?
- CI status: Does the project have passing tests?
- Contributor diversity: Is there a healthy community?
It also computes an impact score - how many users would benefit from a fix, based on stars, forks, and dependency analysis using NetworkX.
Repositories with low health scores or minimal impact are deprioritized or skipped entirely.
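A simplified sketch of how such a score could be combined (the weights and normalizations below are illustrative, not GlobaLLM's actual values):

```python
# Illustrative health score: the weights and caps are assumptions.
def health_score(commits_last_90d: int, issues_closed: int, issues_opened: int,
                 ci_passing: bool, contributors: int) -> float:
    velocity = min(commits_last_90d / 100, 1.0)           # commit velocity
    resolution = min(issues_closed / max(issues_opened, 1), 1.0)  # issue resolution rate
    ci = 1.0 if ci_passing else 0.0                        # CI status
    diversity = min(contributors / 10, 1.0)                # contributor diversity
    return 0.3 * velocity + 0.3 * resolution + 0.2 * ci + 0.2 * diversity
```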
The system fetches open issues from approved repositories and ranks them using a sophisticated multi-factor algorithm.
Each issue is analyzed by an LLM to determine:
- Category: bug, feature, documentation, performance, security, etc.
- Complexity: 1-10 scale (how difficult to solve)
- Solvability: 0-1 score (likelihood of automated fix success)
- Requirements: affected files, breaking change risk, test needs
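The analyzer's structured output might look roughly like this (a sketch; the field names follow the description above, but the exact schema is an assumption):

```python
# Sketch of the analyzer's structured output; field names follow the
# description above, but the exact schema is an assumption.
from dataclasses import dataclass

@dataclass
class IssueAnalysis:
    category: str          # bug, feature, documentation, performance, security, ...
    complexity: int        # 1-10: how difficult to solve
    solvability: float     # 0-1: likelihood of automated fix success
    breaking_change: bool  # risk of breaking downstream users
    test_required: bool    # whether the fix needs new or updated tests
    affected_files: list[str]
```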
The prioritization then combines four dimensions:
Health (weight: 1.0): Repository health adjusted for complexity.
A healthy repository with simple issues scores higher than an unhealthy repository with complex ones.
Impact (weight: 2.0): Based on stars, dependents, and watchers.
Uses log-scale normalization (stars / 50,000, dependents / 5,000).
Solvability (weight: 1.5): LLM-assessed likelihood of successful resolution.
Documentation and style issues score high (~0.9), while critical security issues score low (~0.3) because they are much harder to fix automatically.
Urgency (weight: 0.5): Category multiplier × age × engagement.
Critical security bugs get 10× multiplier, documentation gets 1×.
The final formula:
priority = (health × 1.0) + (impact × 2.0) + (solvability × 1.5) + (urgency × 0.5)
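In code, the scoring might look like this (the weights and normalization caps come from the description above; the exact log-scale form is an assumption):

```python
# Sketch of the priority score; the log-scale normalization details are an
# assumption beyond the documented caps (stars / 50,000, dependents / 5,000).
import math

def priority(health: float, stars: int, dependents: int,
             solvability: float, urgency: float) -> float:
    impact = 0.5 * min(math.log1p(stars) / math.log1p(50_000), 1.0) \
           + 0.5 * min(math.log1p(dependents) / math.log1p(5_000), 1.0)
    return (health * 1.0) + (impact * 2.0) + (solvability * 1.5) + (urgency * 0.5)
```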
Budget constraints filter the ranked list:
- Per-repository token limit (default: 100k)
- Per-language issue limit (default: 50)
- Weekly token budget (default: 5M)
Results are saved to the issue store with full breakdowns for transparency.
GlobaLLM claims the highest-priority unassigned issue and generates a solution.
This is where LLMs do the heavy lifting.
The CodeGenerator class sends a structured prompt to Claude or ChatGPT with:
- The issue title and description
- Repository context (code style, testing framework)
- Language-specific conventions
- Category-specific requirements (bug vs feature vs docs)
The LLM responds with a complete solution:
- Explanation: Step-by-step reasoning
- File patches: Original and new content for each modified file
- Tests: New or modified test files
The system tracks tokens used at every step for budget management.
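A sketch of what that solution structure could look like (illustrative names, not the actual CodeGenerator classes):

```python
# Sketch of the generated solution described above; names are illustrative.
from dataclasses import dataclass, field

@dataclass
class FilePatch:
    path: str
    original_content: str
    new_content: str

@dataclass
class Solution:
    explanation: str                            # step-by-step reasoning
    patches: list[FilePatch] = field(default_factory=list)
    tests: list[FilePatch] = field(default_factory=list)
    tokens_used: int = 0                        # tracked for budget management
```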
The final stage uses PRAutomation to create a well-structured pull request with context, tests, and documentation.
For trivial changes (typos, version bumps), it can even auto-merge.
LLMs are the engine that powers GlobaLLM, but they're used strategically rather than indiscriminately.
Stage 3 - Prioritize: The IssueAnalyzer calls an LLM to categorize each issue.
Input: title, body, labels, comments, reactions.
Output: category, complexity (1-10), solvability (0-1), breaking_change, test_required.
This costs ~500 tokens per issue and feeds directly into the priority scoring.
Stage 4 - Fix: The CodeGenerator uses an LLM to generate complete solutions.
Input: issue details, repository context, language style guidelines.
Output: explanation, file patches (original + new content), test files.
This costs 1k-10k tokens depending on complexity.
The key insight: LLMs are only used for tasks requiring intelligence.
Discovery, health scoring, impact calculation, and PR automation use deterministic algorithms.
The real power of GlobaLLM emerges when you run multiple instances in parallel.
Each GlobaLLM instance has a unique AgentIdentity.
When it's ready to work, it calls:
globallm assign claim
This atomically reserves the highest-priority unassigned issue.
The assignment is stored in PostgreSQL with a heartbeat timestamp.
To prevent multiple agents from working on the same issue:
- Issues are marked assigned with an agent ID and timestamp
- Heartbeats update every 5 minutes
- If a heartbeat expires (30 minutes), the issue is reassigned
This allows crash recovery: if an agent crashes mid-work, another will pick up the issue.
The heartbeat system is elegant in its simplicity:
```python
# Agent side
while working:
    update_heartbeat(issue_id, agent_id)
    do_work()

# Recovery side
expired = get_issues_with_expired_heartbeats()
for issue in expired:
    reassign(issue)
```
No distributed consensus needed - PostgreSQL's row-level locking handles contention.
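A sketch of what an atomic claim could look like with psycopg (table and column names are assumptions; FOR UPDATE SKIP LOCKED is a standard PostgreSQL pattern for letting many workers contend without blocking each other):

```python
# Sketch: claiming the highest-priority unassigned issue atomically.
# Table and column names are assumptions, not GlobaLLM's actual schema.
import psycopg

CLAIM_SQL = """
UPDATE issues
SET assigned_to = %(agent_id)s, heartbeat_at = now()
WHERE id = (
    SELECT id FROM issues
    WHERE assigned_to IS NULL
    ORDER BY priority DESC
    LIMIT 1
    FOR UPDATE SKIP LOCKED
)
RETURNING id;
"""

def claim_issue(conn: psycopg.Connection, agent_id: str) -> int | None:
    with conn.cursor() as cur:
        cur.execute(CLAIM_SQL, {"agent_id": agent_id})
        row = cur.fetchone()
    conn.commit()
    return row[0] if row else None
```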
PostgreSQL is the central state store:
- Connection pooling: 2-10 connections per process (psycopg pool)
- JSONB columns: Flexible schema for repository/issue metadata
- Indexes: On frequently queried fields (stars, health_score, assigned status)
- Migrations: Versioned schema
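An illustrative slice of such a schema, created through psycopg (table and column names are assumptions, not GlobaLLM's actual migrations):

```python
# Sketch of a repositories table with JSONB metadata and indexes on hot
# fields; table and column names are assumptions, not the real schema.
import psycopg

STATEMENTS = (
    """
    CREATE TABLE IF NOT EXISTS repositories (
        id            BIGSERIAL PRIMARY KEY,
        full_name     TEXT UNIQUE NOT NULL,
        stars         INTEGER NOT NULL DEFAULT 0,
        health_score  REAL,
        metadata      JSONB
    )
    """,
    "CREATE INDEX IF NOT EXISTS idx_repos_stars ON repositories (stars)",
    "CREATE INDEX IF NOT EXISTS idx_repos_health ON repositories (health_score)",
)

def init_schema(dsn: str) -> None:
    with psycopg.connect(dsn) as conn:  # commits on clean exit
        for stmt in STATEMENTS:
            conn.execute(stmt)
```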
GlobaLLM is an evolving experiment.
Developers face many challenges on a daily basis that a system such as GlobaLLM will also encounter and must address to become more effective:
- Parallelizing work across multiple agents without conflicts or redundant effort.
- Producing "mental models" of repositories to better understand their goals, architecture, dependencies and trajectories.
- Building a higher-level decision-making system that can reason about which repositories to focus on based on broader trends in the open source ecosystem.
- Making decisions such as which programming languages to focus on.
- Working with closed-source repositories, which may lack the signals open source ones provide (e.g., forks, stars, dependency count).
GlobaLLM is an experiment in what's possible when you combine LLM code generation with principled decision-making and distributed execution.
The goal isn't to replace human contributors - it's to handle the long tail of maintenance work that no one has time for, freeing up humans to focus on the interesting problems.
The system is actively developed and evolving.
Current work focuses on better prioritization heuristics, more sophisticated validation, and integration with additional LLM providers.
If you're interested in contributing or just want to run it yourself, the code is available on GitHub.
This system is far from perfect, but it's a step toward harnessing AI to make open source software healthier and more sustainable at scale.
It's also a way to explore what it takes to make decisions at the scale of millions of repositories and billions of issues.
Dr. Aris Thorne stared at the shimmering temporal rift, its edges flickering like a corrupted display. He wasn't just looking at a gateway through time – he was looking at a living, breathing Git repository of reality itself.
"Status check," he muttered, fingers dancing across the holographic interface. "Current branch: timeline-main. Last commit: 'Catastrophe at Point Zero' by User:Humanity."
Three days ago, humanity had triggered the Cascade Event – a chain reaction of temporal paradoxes that threatened to unravel existence. Now Aris, the last Temporal Archivist, was attempting something never before conceived: git revert on reality itself.
"Creating new branch: 'fix-attempt-1'," he announced to the empty lab. The temporal rift stabilized, showing a parallel timeline branching off from moments before the disaster.
Aris stepped through, materializing in the control room of the Chronos Facility, right as the ill-fated experiment was about to begin. He knew the command sequence by heart – the one that would prevent the Cascade.
But as he approached the console, he froze. His younger self was there, looking determined but naive. If Aris intervened, would he create a merge conflict with his own existence?
"Branching again," he decided, retreating to the safety of the temporal nexus. "Creating 'fix-attempt-2' from an earlier commit."
This time he arrived hours earlier, when the facility was still empty. He carefully modified the experiment parameters, ensuring the Cascade could never occur. Satisfied, he returned to his present.
The lab was unchanged. The rift still showed the corrupted timeline.
"Failed merge," Aris realized with dawning horror. "Reality rejected the patch."
Days turned into weeks as Aris created dozens of branches, each attempting to fix the timeline. He tried git cherry-pick of successful moments from history, git rebase of civilization's achievements, even git bisect to isolate the exact commit that had broken everything.
Nothing worked. Each attempt was rejected by the cosmic repository, leaving him with countless abandoned branches floating in temporal limbo.
Exhausted, Aris collapsed before the interface. "Git log --oneline --graph," he whispered, watching the tree of failed attempts bloom across the display. It was beautiful in its complexity – a constellation of what-ifs and could-have-beens.
That's when it hit him. He'd been trying to fix the timeline, to restore a previous commit. But what if the solution wasn't to revert, but to evolve?
"Creating new branch: 'transcendence'," he declared with renewed energy. "Not from any previous commit, but from the current corrupted state."
He stepped through into the fractured timeline, where temporal paradoxes manifested as impossible architecture and shifting landscapes. Instead of fighting the chaos, he embraced it. He worked with the anomalies, finding patterns in the madness.
Aris discovered that the Cascade wasn't an error – it was evolution. Humanity had outgrown its linear timeline, and reality was attempting to branch into a multidimensional existence.
"Merge request," he transmitted to the temporal repository. "Not to fix, but to complete the transformation."
The rift stabilized, its chaotic energy resolving into something new and coherent. Aris watched as all his abandoned branches began to merge into this new reality, each failed attempt contributing something essential to the final design.
When he returned to his lab, everything was different yet familiar. The temporal rift was gone, replaced by a window showing infinite timelines coexisting harmoniously.
Aris smiled at the new interface displaying the transformed reality. "Current branch: timeline-multiverse. Last commit: 'Embrace the Chaos' by User:Humanity."
He had learned the ultimate lesson of temporal manipulation: sometimes the best commit isn't a fix, but a feature.
The lab called it the Repository.
Time was not a river. It was a commit graph.
Mara’s console showed history as hashes and arrows. Every moment a node. Every decision a branch. The past was immutable. The future was a working tree full of untracked files.
She did not travel backward. She checked out.
git checkout -b stop-war a1f3c9e
The world recompiled around her. Same initial state. Same variables. Different branch name. In this timeline Hitler was still a student. Mara changed one small file. A rejected application. She committed and returned.
git checkout main
Nothing changed. Of course not. Main was untouched. The war still existed. The scars still matched their hashes.
People kept asking why she could not fix history. She explained patiently. You never fix history. You fork it.
Each jump created divergence. Entropy grew like unmerged branches. The Repository ballooned. Infinite timelines. Infinite storage. Garbage collection impossible. Nothing was truly unused.
One day she found a branch tagged by someone else.
origin/hope
No author. No timestamp. The diff was small. Fewer deaths. Slower weapons. More pauses between commits.
Mara did not merge it. Merges caused conflicts. Merges caused paradoxes.
She rebased.
She replayed the present onto hope. One commit at a time. Carefully resolving conflicts. Choosing better defaults.
When she finished, the graph looked cleaner. Still complex. Still branching. But survivable.
She pushed.
Somewhere, someone typed their first commit message.
“Initial commit.”
Dr. Sarah Chen stared at her terminal, the familiar green text glowing in the darkened lab.
$ git log --all --graph --oneline
* a3f9b2c (HEAD -> main) Fix climate models
| * 7c8d1e4 (origin/2157) Prevent asteroid impact
|/
* 2b4a8f3 Initial timeline
"Three timelines," she muttered. "And they're diverging."
The Temporal Version Control System had seemed like humanity's salvation. Jump to any point in history, create a branch, make changes, then merge back. Fix mistakes. Optimize outcomes. What could go wrong?
Everything, apparently.
Sarah's colleague Marcus rushed in. "We've got a problem. The 2157 branch where we prevented the asteroid? It created a merge conflict with main."
"Show me."
$ git merge origin/2157
Auto-merging timeline.dat
CONFLICT (content): Merge conflict in timeline.dat
Automatic merge failed; fix conflicts and then commit the result.
Sarah pulled up the diff:
<<<<<<< HEAD
2089: Global climate stabilized, population 9.2B
2157: Thriving lunar colonies established
=======
2089: Asteroid prevention tech drives new space race
2157: Mars terraforming 40% complete, population 12.7B
>>>>>>> origin/2157
"They're both real," Marcus whispered. "Both timelines exist simultaneously until we resolve the conflict."
Sarah nodded slowly. Quantum superposition at a temporal scale. The universe itself refusing to compile until they chose which future to keep, and which to discard.
Her fingers hovered over the keyboard. One timeline solved climate change through sacrifice and discipline. The other achieved it through desperate innovation sparked by near-extinction.
"What if," she said, "we don't choose?"
"You can't leave a merge conflict unresolved. The timeline will remain in an unstable state-"
"Or we git rebase everything onto a new branch. Cherry-pick the best commits from each timeline."
Marcus's eyes widened. "You want to rewrite history itself."
"We already are. We've just been doing it badly." Sarah started typing:
$ git checkout -b unified
$ git cherry-pick 7c8d1e4 # Asteroid prevention tech
$ git cherry-pick a3f9b2c # Climate stability
The lab hummed. Reality flickered.
When the command completed, Sarah checked the log:
* e9f2a1b (HEAD -> unified) Climate models + prevention tech
* 2b4a8f3 Initial timeline
Clean. Linear. Optimal.
"Git push --force?" Marcus asked nervously.
Sarah smiled. "Git push --force."
She hit enter.
The universe accepted the merge.