What counts as a Boundev task?
If it touches LLMs, embeddings, agents, retrieval, evals, or the cost of running them in production — we ship it. Eight recurring task types, each shippable inside a single subscription cycle.
Eight task types we ship every week.
Each card is a real, recurring engagement type with an example outcome from production. Pick what matches your roadmap — or bring something close and we'll tell you on the call.
Production retrieval-augmented generation over your docs, knowledge bases, or product data. Chunking, hybrid search, eval harness, drift monitoring. Example: 12K-page legal-tech support archive — 40% faster support response time.
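To make "chunking" and "hybrid search" concrete: a minimal, dependency-free sketch of two pieces every RAG pipeline needs. Window sizes, the overlap, and the RRF constant `k=60` are illustrative defaults, not a prescription.

```python
def chunk(text: str, size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into overlapping character windows ready for embedding."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]


def rrf_merge(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Reciprocal rank fusion: merge keyword and vector result lists into one ranking."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking):
            # Documents ranked high in either list accumulate the most score.
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.__getitem__, reverse=True)
```

Overlapping windows keep context intact across chunk boundaries; reciprocal rank fusion is a common way to combine BM25-style keyword hits with vector-similarity hits without tuning weights.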
Multi-step autonomous agents that own a workflow end-to-end. LangGraph, CrewAI, or pure code. Example: daily competitor research agent that produces a Slack briefing for a Series B SaaS.
Production-grade Model Context Protocol servers exposing your SaaS data to Claude, Cursor, and other MCP clients. Example: in-product Claude assistant for a DevTools customer.
Model and provider integrations: OpenAI, Anthropic, Gemini, OpenRouter, or open-source models, production-grade with cost controls and eval-driven swaps. Example: GPT-4 → Claude 3.7 migration with eval suite. 60% cost cut, no quality regression.
Automating internal ops with LLMs. Lead qualification, content moderation, classification at scale. Example: 200 inbound leads/day scored automatically with human-in-the-loop checkpoints.
Audit and reduce LLM/inference spend without quality loss. Prompt caching, model routing, batching, eval-driven model swaps. Example: $48K/mo → $19K/mo for a Series A SaaS.
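Model routing, the highest-leverage lever on that list, fits in a few lines. The prices and the routing rule below are hypothetical placeholders; real rates vary by provider and date.

```python
# Hypothetical per-1M-token prices for illustration only.
PRICES = {"small": 0.15, "large": 5.00}


def route(prompt: str, needs_deep_reasoning: bool) -> str:
    """Send only hard requests to the expensive model; default to the cheap one."""
    if needs_deep_reasoning or len(prompt) > 8000:
        return "large"
    return "small"


def est_cost(model: str, tokens: int) -> float:
    """Rough spend estimate in dollars for a given token count."""
    return PRICES[model] * tokens / 1_000_000
```

In practice the routing rule is driven by eval results per task type, not a length heuristic, but the shape is the same: classify the request, pick the cheapest model that passes the eval bar.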
Unit tests for LLMs. Catch RAG regressions and prompt drift in CI before they ship. Example: RAG eval suite running on every PR for a vertical SaaS team.
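A sketch of what "eval suite on every PR" means mechanically, assuming a hypothetical golden question set and a `retrieve` function from your stack: score retrieval against the golden set and fail the build when the score drops below a recorded baseline.

```python
def hit_rate(golden: list[tuple[str, str]], retrieve) -> float:
    """Fraction of golden questions whose expected doc appears in the retrieved results."""
    hits = sum(1 for question, expected in golden if expected in retrieve(question))
    return hits / len(golden)


def check_no_regression(rate: float, baseline: float = 0.90) -> None:
    """Fail CI when retrieval quality drops below the recorded baseline."""
    assert rate >= baseline, f"RAG regression: hit rate {rate:.2f} < baseline {baseline:.2f}"
```

Wired into a pytest job, this catches a bad chunking change or prompt edit before it ships, instead of in a customer escalation.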
Vector database setup: Pinecone, Weaviate, Qdrant, or pgvector, plus chunking strategy and hybrid search. Example: Pinecone → self-hosted Weaviate migration for 70% lower vector infra spend.
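Under every one of those vector stores sits the same primitive: nearest-neighbor search by cosine similarity over embeddings. A minimal sketch of the scoring function the database is optimizing:

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors: 1.0 = identical direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)
```

The vendor choice is mostly about how this search is indexed and hosted at scale, which is why a Pinecone-to-Weaviate migration can cut spend without changing retrieval quality.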
What counts as one task?
Roughly: anything a senior AI engineer can ship in 5–7 days with AI-augmented tooling. If your task is bigger than that, we scope it as Enterprise — never as a surprise charge.
1. Build a RAG pipeline over one document source with eval harness and production deploy.
2. Ship an MCP server exposing 5–10 tools from your existing API.
3. Add semantic search to your existing app (embeddings + vector DB + UI integration).
4. Build a customer-support triage agent with Slack escalation and human-in-the-loop.
5. Optimize your AWS Bedrock + OpenSearch spend with documented before/after.
6. Build a daily research agent that produces a Slack briefing on competitors or industry signal.
7. Wire up a production eval pipeline for an existing RAG or agent stack.
8. Migrate from one LLM provider to another, eval-first, with no quality regression.
What we don't take on (yet).
We're honest about scope so the engagement doesn't sour mid-flight. We say no to roughly 30% of inbound tasks for one of these reasons.
1. Mobile app builds — we ship the AI feature, not the iOS shell.
2. Pure data engineering — if there's no LLM, agent, or retrieval, it's not us.
3. Compliance work without a partner — we ship inside your SOC 2 / HIPAA boundary, but we don't run the audit.
4. Anything heavier than ~30 hours per task — that's an Enterprise engagement with a custom SOW.
5. Tasks where the spec is genuinely undefined — we scope first, then build.
Have a task that doesn't fit any of these?
We'll tell you in 20 minutes whether it's a one-week task, an Enterprise engagement, or something we won't take on. Either way, you leave with a written scope.
