Cost Efficiency
As your project grows, so does your knowledge base. Without smart context selection, you'd dump everything into every prompt — burning tokens on irrelevant information. Engrams solves this with intelligent, relevance-based selection that keeps costs down while keeping your AI informed.
The "$1 per MB" Problem
The concern is real: at typical LLM pricing, sending large amounts of context is expensive.
The Math
- Input tokens: ~$0.003 per 1K tokens (varies by model)
- 1 MB of text: ~200K tokens
- Cost per MB: ~$0.60 per request
- Daily cost: 10 requests × $0.60 = $6/day = $180/month
If you're working on a large project with 50+ decisions, 20+ patterns, and extensive glossaries, dumping all of it into every prompt gets expensive fast.
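The arithmetic above can be sketched as a tiny cost model. The price and tokens-per-MB figures are the assumed round numbers from the list (both vary by model and tokenizer):

```python
# Rough cost model behind the "$1 per MB" concern.
# Assumed figures match the text: ~$0.003 per 1K input tokens,
# ~200K tokens per MB of text (both vary by model and tokenizer).
PRICE_PER_1K_TOKENS = 0.003
TOKENS_PER_MB = 200_000

def cost_per_request(megabytes: float) -> float:
    """Approximate input-token cost of sending `megabytes` of context."""
    return megabytes * TOKENS_PER_MB * PRICE_PER_1K_TOKENS / 1000

daily = 10 * cost_per_request(1.0)          # 10 requests/day at 1 MB each
print(round(cost_per_request(1.0), 2))      # 0.6  -> ~$0.60 per MB
print(round(daily, 2), round(daily * 30, 2))  # 6.0 180.0 -> $6/day, $180/month
```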
The Traditional Approach (Expensive)
You: "Add a new API endpoint"
AI: [Loads entire Engrams database]
- 50 decisions (all of them)
- 20 patterns (all of them)
- 100+ glossary terms (all of them)
- Full knowledge graph (all relationships)
Total: ~500KB of context
Cost: ~$0.30 per request
But only 5-10 items are actually relevant to the task!
The Smart Approach (Efficient)
You: "Add a new API endpoint"
AI: [Engrams relevance selection activates]
1. Semantic search: Find items related to "API endpoint"
2. Relevance scoring: Rank by relevance to the task
3. Load only what matters:
- Decision #7: JWT authentication
- Decision #14: Rate limiting
- Pattern #3: Error handling
- Pattern #5: API response format
- 2-3 glossary terms
Total: ~50KB of context
Cost: ~$0.03 per request
90% cost reduction, same quality output!
Scoring & Selection Algorithm
Engrams uses a multi-factor scoring system to rank items by relevance:
Relevance Factors
| Factor | Weight | How It Works |
|---|---|---|
| Semantic Similarity | 40% | Vector embedding similarity to the current task. "API endpoint" matches decisions about REST, HTTP, routing. |
| Tag Matching | 25% | Exact tag matches. If you're working on "authentication", items tagged "auth" score higher. |
| Code Bindings | 20% | Items bound to files you're editing. If you open src/auth/middleware.py, auth-related items surface automatically. |
| Recency | 10% | Recently modified items score slightly higher (you're probably still thinking about them). |
| Governance Scope | 5% | Team-level decisions always included. Individual decisions only if relevant. |
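The weighted blend in the table can be sketched as a single scoring function. This is illustrative only: the per-factor scores are stand-in inputs in [0, 1], not Engrams' actual factor implementations:

```python
# Illustrative multi-factor relevance score using the weights from the table.
# Per-factor scores (each in [0, 1]) are hypothetical stand-ins; the real
# Engrams factor implementations are not shown here.
WEIGHTS = {
    "semantic": 0.40,    # embedding similarity to the current task
    "tags": 0.25,        # exact tag overlap
    "bindings": 0.20,    # bound to files currently being edited
    "recency": 0.10,     # recently modified items score slightly higher
    "governance": 0.05,  # team-level scope bonus
}

def relevance(factors: dict) -> float:
    """Blend per-factor scores into one ranking score; missing factors count as 0."""
    return sum(WEIGHTS[name] * factors.get(name, 0.0) for name in WEIGHTS)

# Example: strong semantic and tag matches, moderately recent, no binding overlap.
score = relevance({"semantic": 0.9, "tags": 1.0, "recency": 0.5})
print(round(score, 3))  # 0.66
```

Items are then ranked by this score and only the top few make it into the prompt.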
Real-World Cost Comparison
Let's compare costs for a realistic project over one month:
Scenario: Building a Task Management API
- 50 architectural decisions
- 20 system patterns
- 100+ glossary terms
- 30 active tasks
- 10 development sessions per day
- 5 requests per session (50 requests/day)
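For this scenario, the per-request arithmetic in both cost breakdowns reduces to one function of tokens per request (the $0.003/1K price is the same assumed figure used throughout):

```python
# Reproduces the per-request arithmetic in the two cost breakdowns.
PRICE_PER_1K = 0.003     # $ per 1K input tokens (assumed; varies by model)
REQUESTS_PER_DAY = 50    # 10 sessions x 5 requests

def monthly_cost(tokens_per_request: int, days: int = 30) -> float:
    """Monthly input-token cost at a fixed context size per request."""
    per_request = tokens_per_request * PRICE_PER_1K / 1000
    return per_request * REQUESTS_PER_DAY * days

print(round(monthly_cost(5000), 2))  # 22.5 -> dump everything: ~5,000 tokens/request
print(round(monthly_cost(2000), 2))  # 9.0  -> smart selection: ~2,000 tokens/request
```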
Cost Without Smart Selection
Approach: Dump entire database into every prompt
Per request:
- 50 decisions (full text): ~2,000 tokens
- 20 patterns (full text): ~1,500 tokens
- 100+ glossary terms: ~1,000 tokens
- Knowledge graph: ~500 tokens
- Total: ~5,000 tokens per request
- Cost per request: 5,000 × $0.003/1K = $0.015
Daily cost: 50 requests × $0.015 = $0.75
Monthly cost: $0.75 × 30 = $22.50
Cost With Engrams Smart Selection
Approach: Relevance-ranked context selection
Per request:
- Semantic search finds relevant items
- Scoring ranks by relevance
- Only top items included
- Average context: ~2,000 tokens
- Cost per request: 2,000 × $0.003/1K = $0.006
Daily cost: 50 requests × $0.006 = $0.30
Monthly cost: $0.30 × 30 = $9.00
Savings: $22.50 - $9.00 = $13.50/month (60% reduction)
Scaling Benefits
The larger your project, the bigger the savings:
| Project Size | Without Smart Selection | With Smart Selection | Savings |
|---|---|---|---|
| Small (10 decisions) | $5/month | $4/month | 20% |
| Medium (50 decisions) | $22.50/month | $9/month | 60% |
| Large (200+ decisions) | $90/month | $15/month | 83% |
| Enterprise (500+ decisions) | $225/month | $25/month | 89% |
Key insight: As your project grows, smart selection saves more money because irrelevant items are filtered out.
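The table's trend can be reproduced with a rough model (an assumption for illustration, not Engrams' actual accounting): dump-everything cost grows linearly at about $0.45 per decision per month (derived from the medium scenario, $22.50 / 50 decisions), while smart-selection cost stays nearly flat:

```python
# Rough model behind the scaling table: dump-everything cost grows linearly
# with decision count; smart-selection cost stays nearly flat. The per-decision
# rate is derived from the medium scenario; smart costs are taken from the table.
DUMP_COST_PER_DECISION = 22.50 / 50  # $/decision/month (~$0.45)

def savings_pct(decisions: int, smart_cost: float) -> float:
    """Percentage saved vs. dumping every decision into every prompt."""
    dump_cost = decisions * DUMP_COST_PER_DECISION
    return 100 * (1 - smart_cost / dump_cost)

for n, smart in [(50, 9.00), (200, 15.00), (500, 25.00)]:
    print(n, round(savings_pct(n, smart)))  # 50 60 / 200 83 / 500 89
```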
Optimization Strategies
Here are practical ways to get the most relevant context at the lowest cost:
1. Use Code Bindings
Bind decisions and patterns to specific code paths. When you edit a file, only relevant context loads:
engrams bind --decision 7 --pattern "src/auth/**/*.py"
Now when you edit src/auth/middleware.py:
- Decision #7 (JWT auth) loads automatically
- Unrelated decisions (database, caching) don't load
- Context is smaller, cost is lower
2. Tag Strategically
Use consistent tags so semantic search works better:
Decision: "Use PostgreSQL for primary database"
Tags: ["database", "architecture", "persistence"]
Decision: "Use Redis for caching"
Tags: ["caching", "performance", "infrastructure"]
Now searching for "persistence" finds the PostgreSQL decision.
Searching for "performance" finds the Redis decision.
3. Use Glossaries Efficiently
Glossary terms are cheap (low token count) but valuable. Use them for:
- Domain-specific terminology
- API schemas and data structures
- Common abbreviations and acronyms
- Team conventions
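The strategies above can be combined into a simple candidate pre-filter that runs before relevance scoring. The sketch below is a plain illustration: the glob-matching and tag-overlap rules are assumptions, not Engrams' actual selection logic:

```python
from fnmatch import fnmatch

# Illustrative pre-filter combining the strategies above: an item becomes a
# candidate if it is bound to a file being edited or shares a tag with the
# task. Glob and tag rules here are assumptions, not Engrams' actual logic.
def is_candidate(item: dict, open_files: list, task_tags: set) -> bool:
    bound = any(
        fnmatch(path, pattern)
        for pattern in item.get("bindings", [])
        for path in open_files
    )
    tagged = bool(task_tags & set(item.get("tags", [])))
    return bound or tagged

jwt_decision = {"bindings": ["src/auth/**"], "tags": ["auth", "security"]}
db_decision = {"bindings": ["src/db/**"], "tags": ["database", "persistence"]}

print(is_candidate(jwt_decision, ["src/auth/middleware.py"], {"auth"}))  # True
print(is_candidate(db_decision, ["src/auth/middleware.py"], {"auth"}))   # False
```

Only the surviving candidates would then be scored and ranked, keeping both the scoring work and the final context small.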
Comparison: Engrams vs. Alternatives
| Approach | Setup | Cost/Month | Scalability | Flexibility |
|---|---|---|---|---|
| Manual Copy-Paste | None | $22.50 (50 decisions) | Poor (gets worse as project grows) | High (you control what's included) |
| Dump Everything | Simple | $22.50 (50 decisions) | Poor (costs grow with project) | Low (all or nothing) |
| Engrams Smart Selection | 5 minutes | $9.00 (50 decisions) | Excellent (costs stay flat) | High (semantic + bindings + tags) |
| Custom MCP Server | Days/weeks | $9.00 (if you build it) | Depends on implementation | Very high (but requires coding) |
FAQ: Cost Questions
Q: Does smart context selection reduce the quality of AI responses?
A: No. The scoring algorithm prioritizes relevance, so you get the most important context. In fact, less noise often leads to better responses because the AI isn't distracted by irrelevant information.
Q: Can I see what context was actually loaded for a request?
A: Yes. Engrams logs which items were selected and why. You can review this in the dashboard or export the logs.
Q: Does semantic search cost extra?
A: No. Embeddings are generated locally (using Ollama) and cached. There's no per-request cost.
Q: How is Engrams different from a RAG system?
A: Engrams is a project memory and governance platform, not a document retrieval pipeline. While it uses semantic search under the hood, what it actually provides is structured, linked project knowledge — decisions, patterns, progress, governance rules — that grows with your team and enforces standards. General RAG pipelines don't have governance, bindings, or the MCP-native tooling that makes Engrams work seamlessly inside your AI coding assistant.
Summary
Smart context selection is Engrams' answer to the "$1 per MB" problem:
- Smart selection: Only relevant items are included
- Cost reduction: 60-89% savings depending on project size
- Scaling benefits: Larger projects save more money
- No quality loss: Better focus, better responses
- Flexible optimization: Code bindings, strategic tagging, and efficient glossary use
With Engrams, your AI stays informed without burning tokens on irrelevant context.