Cost Efficiency

As your project grows, so does your knowledge base. Without smart context selection, you'd dump everything into every prompt — burning tokens on irrelevant information. Engrams solves this with intelligent, relevance-based selection that keeps costs down while keeping your AI informed.

The "$1 per MB" Problem

The concern is real: at typical LLM pricing, every megabyte of context you send costs on the order of a dollar per request.

The Math

  • Input tokens: ~$0.003 per 1K tokens (varies by model)
  • 1 MB of text: ~200K tokens
  • Cost per MB: ~$0.60 per request
  • Daily cost: 10 requests × $0.60 = $6/day = $180/month
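The arithmetic above can be sketched in a few lines, using the assumed rates from the list (real pricing varies by model):

```python
# Assumed rates from the list above: $0.003 per 1K input tokens,
# ~200K tokens per MB of text. Real pricing varies by model.
PRICE_PER_1K_TOKENS = 0.003
TOKENS_PER_MB = 200_000

def cost_per_request(context_mb: float) -> float:
    """Input-token cost of sending `context_mb` of context in one request."""
    return context_mb * TOKENS_PER_MB / 1000 * PRICE_PER_1K_TOKENS

per_request = cost_per_request(1.0)   # 1 MB of context per request
daily = 10 * per_request              # 10 requests per day
monthly = 30 * daily

print(f"${per_request:.2f}/request, ${daily:.2f}/day, ${monthly:.2f}/month")
# → $0.60/request, $6.00/day, $180.00/month
```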

If you're working on a large project with 50+ decisions, 20+ patterns, and extensive glossaries, dumping all of it into every prompt gets expensive fast.

The Traditional Approach (Expensive)

You: "Add a new API endpoint"

AI: [Loads entire Engrams database]
    - 50 decisions (all of them)
    - 20 patterns (all of them)
    - 100+ glossary terms (all of them)
    - Full knowledge graph (all relationships)

    Total: ~500KB of context
    Cost: ~$0.30 per request

    But only 5-10 items are actually relevant to the task!

The Smart Approach (Efficient)

You: "Add a new API endpoint"

AI: [Engrams relevance selection activates]
    1. Semantic search: Find items related to "API endpoint"
    2. Relevance scoring: Rank by relevance to the task
    3. Load only what matters:
       - Decision #7: JWT authentication
       - Decision #14: Rate limiting
       - Pattern #3: Error handling
       - Pattern #5: API response format
       - 2-3 glossary terms

    Total: ~50KB of context
    Cost: ~$0.03 per request

    90% cost reduction, same quality output!
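The selection flow above can be illustrated as a toy ranking loop. Engrams uses vector embeddings for this; the sketch below substitutes a simple bag-of-words cosine similarity so it stays self-contained, and the items are made up for the example:

```python
# Toy version of relevance-based selection: rank knowledge items against the
# task, keep only the top few. Engrams uses vector embeddings; this sketch
# substitutes bag-of-words cosine similarity to stay self-contained.
import math
from collections import Counter

def bag(text: str) -> Counter:
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) \
         * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

items = [
    "Decision 7: use JWT authentication for API endpoints",
    "Decision 14: rate limiting on public API endpoints",
    "Decision 22: use PostgreSQL for the primary database",
    "Pattern 5: standard API response format",
]
task = "Add a new API endpoint"

ranked = sorted(items, key=lambda it: cosine(bag(task), bag(it)), reverse=True)
top = ranked[:3]   # load only the most relevant items, not the whole database
```

The unrelated database decision lands at the bottom of the ranking and is never sent, which is exactly where the cost savings come from.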

Scoring & Selection Algorithm

Engrams uses a multi-factor scoring system to rank items by relevance:

Relevance Factors

  • Semantic Similarity (40%): Vector embedding similarity to the current task. "API endpoint" matches decisions about REST, HTTP, and routing.
  • Tag Matching (25%): Exact tag matches. If you're working on "authentication", items tagged "auth" score higher.
  • Code Bindings (20%): Items bound to files you're editing. If you open src/auth/middleware.py, auth-related items surface automatically.
  • Recency (10%): Recently modified items score slightly higher (you're probably still thinking about them).
  • Governance Scope (5%): Team-level decisions are always included; individual decisions only if relevant.
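A weighted sum over these factors might look like the following sketch. The weights come from the table above; the `relevance_score` helper and its inputs are hypothetical, not Engrams' actual API:

```python
# Weighted-sum sketch of the scoring table above. The weights are from the
# table; `relevance_score` and its inputs are hypothetical, not Engrams' API.
WEIGHTS = {
    "semantic": 0.40,    # vector similarity to the task
    "tags": 0.25,        # exact tag matches
    "bindings": 0.20,    # bound to files being edited
    "recency": 0.10,     # recently modified
    "governance": 0.05,  # team-level scope
}

def relevance_score(factors: dict[str, float]) -> float:
    """Each factor is normalized to [0, 1]; missing factors count as 0."""
    return sum(WEIGHTS[name] * factors.get(name, 0.0) for name in WEIGHTS)

# An item with a strong semantic match, an exact tag match, and recent edits:
score = relevance_score({"semantic": 0.9, "tags": 1.0, "recency": 0.8})
# 0.40*0.9 + 0.25*1.0 + 0.10*0.8 = 0.69
```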

Real-World Cost Comparison

Let's compare costs for a realistic project over one month:

Scenario: Building a Task Management API

  • 50 architectural decisions
  • 20 system patterns
  • 100+ glossary terms
  • 30 active tasks
  • 10 development sessions per day
  • 5 requests per session (50 requests/day)

Cost Without Smart Selection

Approach: Dump entire database into every prompt

Per request (even with compact, summarized items):
  - 50 decisions: ~2,000 tokens
  - 20 patterns: ~1,500 tokens
  - 100+ glossary terms: ~1,000 tokens
  - Knowledge graph: ~500 tokens
  - Total: ~5,000 tokens per request
  - Cost per request: 5,000 × $0.003/1K = $0.015

Daily cost: 50 requests × $0.015 = $0.75
Monthly cost: $0.75 × 30 = $22.50

Cost With Engrams Smart Selection

Approach: Relevance-ranked context selection

Per request:
  - Semantic search finds relevant items
  - Scoring ranks by relevance
  - Only top items included
  - Average context: ~2,000 tokens
  - Cost per request: 2,000 × $0.003/1K = $0.006

Daily cost: 50 requests × $0.006 = $0.30
Monthly cost: $0.30 × 30 = $9.00

Savings: $22.50 - $9.00 = $13.50/month (60% reduction)
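The comparison works out as follows, using the same assumed rate of $0.003 per 1K input tokens:

```python
# The monthly comparison above, reproduced. Assumptions: $0.003 per 1K input
# tokens, 50 requests/day, 30 days/month.
PRICE_PER_TOKEN = 0.003 / 1000
REQUESTS_PER_DAY, DAYS = 50, 30

def monthly_cost(tokens_per_request: int) -> float:
    return tokens_per_request * PRICE_PER_TOKEN * REQUESTS_PER_DAY * DAYS

dump_all = monthly_cost(5_000)   # entire database in every prompt: $22.50
smart = monthly_cost(2_000)      # relevance-ranked selection: $9.00
reduction = (dump_all - smart) / dump_all   # ≈ 0.60, i.e. a 60% reduction
```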

Scaling Benefits

The larger your project, the bigger the savings:

  • Small (10 decisions): $5/month without → $4/month with smart selection (20% savings)
  • Medium (50 decisions): $22.50/month without → $9/month with smart selection (60% savings)
  • Large (200+ decisions): $90/month without → $15/month with smart selection (83% savings)
  • Enterprise (500+ decisions): $225/month without → $25/month with smart selection (89% savings)

Key insight: As your project grows, smart selection saves more money because irrelevant items are filtered out.

Optimization Strategies

Here are practical ways to get the most relevant context at the lowest cost:

1. Use Code Bindings

Bind decisions and patterns to specific code paths. When you edit a file, only relevant context loads:

engrams bind --decision 7 --pattern "src/auth/**/*.py"

Now when you edit src/auth/middleware.py:

  • Decision #7 (JWT auth) loads automatically
  • Unrelated decisions (database, caching) don't load
  • Context is smaller, and cost is lower
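A binding lookup could be approximated like this. `BINDINGS` and `items_for` are illustrative names, not Engrams' actual API; note that `fnmatch`'s `*` also crosses `/`, so these patterns behave roughly like the recursive glob in the command above:

```python
# Sketch of a binding lookup; `BINDINGS` and `items_for` are illustrative
# names, not Engrams' actual API. fnmatch's "*" also crosses "/", so these
# patterns behave roughly like recursive globs.
from fnmatch import fnmatch

BINDINGS = {
    "src/auth/*.py": ["decision-7"],    # JWT authentication
    "src/db/*.py": ["decision-22"],     # database choice
}

def items_for(path: str) -> list[str]:
    """Knowledge items whose binding pattern matches the edited file."""
    return [item
            for pattern, items in BINDINGS.items()
            if fnmatch(path, pattern)
            for item in items]

items_for("src/auth/middleware.py")   # → ["decision-7"]
```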

2. Tag Strategically

Use consistent tags so tag matching and search work better:

Decision: "Use PostgreSQL for primary database"
Tags: ["database", "architecture", "persistence"]

Decision: "Use Redis for caching"
Tags: ["caching", "performance", "infrastructure"]

Now searching for "persistence" finds the PostgreSQL decision.
Searching for "performance" finds the Redis decision.
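Exact-tag lookup over the two decisions above is simple to sketch (illustrative data and function names, not a real Engrams interface):

```python
# Exact-tag lookup over the two example decisions (illustrative data and
# function names, not a real Engrams interface).
decisions = {
    "Use PostgreSQL for primary database": {"database", "architecture", "persistence"},
    "Use Redis for caching": {"caching", "performance", "infrastructure"},
}

def find_by_tag(query_tags: set[str]) -> list[str]:
    """Decisions sharing at least one tag with the query, most overlap first."""
    hits = [(len(tags & query_tags), title)
            for title, tags in decisions.items() if tags & query_tags]
    return [title for _, title in sorted(hits, reverse=True)]

find_by_tag({"performance"})   # → ["Use Redis for caching"]
find_by_tag({"persistence"})   # → ["Use PostgreSQL for primary database"]
```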

3. Use Glossaries Efficiently

Glossary terms are cheap (low token count) but valuable. Use them for:

  • Domain-specific terminology
  • API schemas and data structures
  • Common abbreviations and acronyms
  • Team conventions

Comparison: Engrams vs. Alternatives

  • Manual Copy-Paste: no setup; ~$22.50/month (50 decisions); poor scalability (gets worse as the project grows); high flexibility (you control what's included)
  • Dump Everything: simple setup; ~$22.50/month (50 decisions); poor scalability (costs grow with the project); low flexibility (all or nothing)
  • Engrams Smart Selection: ~5 minutes of setup; ~$9.00/month (50 decisions); excellent scalability (costs stay flat); high flexibility (semantic search + bindings + tags)
  • Custom MCP Server: days or weeks of setup; ~$9.00/month (if you build it); scalability depends on implementation; very high flexibility (but requires coding)

FAQ: Cost Questions

Q: Does smart context selection reduce the quality of AI responses?

A: No. The scoring algorithm prioritizes relevance, so you get the most important context. In fact, less noise often leads to better responses because the AI isn't distracted by irrelevant information.

Q: Can I see what context was actually loaded for a request?

A: Yes. Engrams logs which items were selected and why. You can review this in the dashboard or export the logs.

Q: Does semantic search cost extra?

A: No. Embeddings are generated locally (using Ollama) and cached. There's no per-request cost.

Q: How is Engrams different from a RAG system?

A: Engrams is a project memory and governance platform, not a document retrieval pipeline. While it uses semantic search under the hood, what it actually provides is structured, linked project knowledge — decisions, patterns, progress, governance rules — that grows with your team and enforces standards. General RAG pipelines don't have governance, bindings, or the MCP-native tooling that makes Engrams work seamlessly inside your AI coding assistant.

Summary

Smart context selection is Engrams' answer to the "$1 per MB" problem:

  • Smart selection: Only relevant items are included
  • Cost reduction: 60-89% savings depending on project size
  • Scaling benefits: Larger projects save more money
  • No quality loss: Better focus, better responses
  • Flexible optimization: Code bindings, strategic tagging, and lean glossaries

With Engrams, your AI stays informed without burning tokens on irrelevant context.