# Knowledge Base

OWL includes a RAG (Retrieval-Augmented Generation) knowledge base powered by ChromaDB for semantic search over your documents.
## Overview

The knowledge base allows you to:

- Add text documents (markdown, code, plain text)
- Search semantically (not just keywords)
- Get relevant context in conversations

Everything runs locally using Ollama embeddings.
## Adding Documents

### `/learn` command

```
/learn README.md
/learn docs/architecture.md
/learn ~/notes/project-spec.txt
```

Output:

```
Learning from /home/user/project/README.md...
Learned from README.md (12 chunks)
```
### What Happens

- Document is read and parsed
- Content is split into ~500 token chunks
- Each chunk is embedded using `nomic-embed-text`
- Embeddings are stored in ChromaDB
## Searching

### `/knowledge search` command

```
/knowledge search authentication flow
```

Output:

```
Search Results (3)

README.md (0.85)
  Authentication is handled by the auth middleware...

architecture.md (0.72)
  The auth flow starts when a user submits credentials...

spec.txt (0.68)
  Users must authenticate before accessing protected...
```
## Automatic Search

During conversations, OWL automatically searches the knowledge base when relevant:

```
you: How does authentication work in this project?

[OWL searches knowledge base]
[Finds relevant chunks]
[Includes them in context]

owl: Based on the project documentation, authentication works as follows...
```
## Managing Knowledge

### View Stats

```
/knowledge
```

Output:

```
Knowledge Base

Total chunks: 42
Sources: 3

Sources:
- README.md (12 chunks)
- architecture.md (20 chunks)
- spec.txt (10 chunks)
```

### Remove Documents

```
/unlearn README.md
```

Output:

```
Removed: README.md
```
## Supported Formats
| Format | Extension | Notes |
|---|---|---|
| Markdown | `.md` | Full support |
| Plain text | `.txt` | Full support |
| Code files | `.py`, `.js`, `.ts`, etc. | Treated as text |
| Config files | `.yaml`, `.json`, `.toml` | Treated as text |

Note: Binary formats like PDF are not currently supported. Convert to text first.
## How It Works

### Chunking

Documents are split into chunks by paragraph:

- Chunk size: ~500 tokens
- Splits on blank lines (paragraph boundaries)
- Keeps related content together
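The paragraph-packing described above can be sketched in a few lines of Python. This is a minimal sketch, not OWL's actual code: word count stands in as a rough token proxy, and an oversized single paragraph simply becomes its own chunk.

```python
def chunk_text(text: str, max_tokens: int = 500) -> list[str]:
    """Split text on blank lines, packing paragraphs into ~max_tokens chunks."""
    chunks: list[str] = []
    current: list[str] = []  # paragraphs accumulated for the current chunk
    count = 0                # approximate token count (words as a proxy)
    for para in text.split("\n\n"):
        words = len(para.split())
        # Start a new chunk when adding this paragraph would exceed the budget.
        if current and count + words > max_tokens:
            chunks.append("\n\n".join(current))
            current, count = [], 0
        current.append(para)
        count += words
    if current:
        chunks.append("\n\n".join(current))
    return chunks
```

Because splitting only happens at blank lines, a paragraph is never cut in half, which is what keeps related content together in one chunk.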
### Embedding

Chunks are embedded using Ollama's `nomic-embed-text` model:

- 768-dimensional vectors
- Semantic meaning preserved
- Similar content = similar vectors
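"Similar content = similar vectors" is usually quantified with cosine similarity. A toy illustration with made-up 3-dimensional vectors standing in for real 768-dimensional embeddings:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy 3-d stand-ins for 768-d embeddings (values invented for illustration):
auth_doc = [0.9, 0.1, 0.0]
login_doc = [0.8, 0.2, 0.1]   # semantically close to auth_doc
recipe_doc = [0.0, 0.1, 0.9]  # unrelated content

assert cosine_similarity(auth_doc, login_doc) > cosine_similarity(auth_doc, recipe_doc)
```

The scores shown in `/knowledge search` results (e.g. `README.md (0.85)`) are similarity scores of this kind.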
### Storage

ChromaDB stores:

- Chunk text
- Embedding vector
- Metadata (source file, project, timestamp)

Location: `~/.owl/knowledge/chroma/`
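Conceptually, each stored chunk is a record like the following (field names and the ID scheme are illustrative, not ChromaDB's exact schema):

```python
import time

record = {
    "id": "README.md:3",  # source file plus chunk index (illustrative ID scheme)
    "document": "Authentication is handled by the auth middleware...",
    "embedding": [0.012, -0.087, 0.153],  # 768 floats in practice; truncated here
    "metadata": {
        "source": "README.md",
        "project": "/home/user/project",
        "timestamp": time.time(),
    },
}
```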
### Retrieval

When searching:

- Query is embedded
- ChromaDB finds nearest neighbors
- Top 3 chunks returned
- Included in LLM context
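The nearest-neighbour lookup can be sketched as a brute-force cosine top-k search. ChromaDB itself uses an approximate index, so this is only a model of the behaviour, not its implementation:

```python
import math

def top_k(query_vec: list[float], records: list[dict], k: int = 3) -> list[tuple]:
    """Return the k stored chunks whose vectors are closest to the query (cosine)."""
    def cos(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

    scored = [(cos(query_vec, r["embedding"]), r) for r in records]
    scored.sort(key=lambda pair: pair[0], reverse=True)  # best match first
    return scored[:k]
```

The text of the top 3 chunks is what gets prepended to the LLM context during a conversation.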
## Project Scoping

Knowledge searches are scoped by project:

```
# In project A
/learn docs/api.md        # Added to project A

# Switch to project B
/project ~/project-b
/knowledge search api     # Won't find project A docs
```
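Under the hood, scoping amounts to filtering stored chunks on their project metadata, along these lines (ChromaDB expresses this as a metadata filter on the query; the field names below are illustrative):

```python
def scoped(records: list[dict], project: str) -> list[dict]:
    """Keep only chunks whose metadata matches the active project."""
    return [r for r in records if r["metadata"]["project"] == project]

records = [
    {"text": "api docs", "metadata": {"project": "/home/user/project-a"}},
    {"text": "deploy notes", "metadata": {"project": "/home/user/project-b"}},
]

# Searching from project B never sees project A's chunks:
assert [r["text"] for r in scoped(records, "/home/user/project-b")] == ["deploy notes"]
```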
## Best Practices

### What to Add

Good candidates:

- Project documentation
- Architecture decisions
- API specifications
- Team guidelines
- Complex code explanations
### What NOT to Add

Avoid:

- Frequently changing files
- Generated documentation
- Entire codebases (use tools instead)
- Sensitive information
### Keeping Knowledge Fresh

When documents change:

```
# Re-learn to update
/learn docs/api.md   # Replaces old version
```
### Organization

Keep related documents together:

```
/learn docs/architecture.md
/learn docs/api.md
/learn docs/deployment.md
```
## Troubleshooting

### "No embedding model"

Ensure you have the embedding model:

```
ollama pull nomic-embed-text
```
### Slow Learning

Large documents take time to embed. For very large files:

- Split into smaller documents
- Learn incrementally
### No Results

If searches return nothing:

- Check the document was learned: `/knowledge`
- Try different query terms
- Ensure you're in the right project
## Technical Details

### ChromaDB Collection

Each project gets a collection:

- Name: `owl_knowledge`
- Distance: cosine similarity
- Persistence: `~/.owl/knowledge/chroma/`
### Embedding Model

Default: `nomic-embed-text`

- 768 dimensions
- Good semantic understanding
- Runs locally via Ollama
### Query Parameters

- Top K: 3 chunks returned
- Similarity threshold: none (the top K are returned regardless of score)
- Project filter: applied when a project is set