NovaQuantiX is an AI engineering practice founded by Julien Compain. We build production-grade artifacts — not prototypes, not slides — and we transfer full ownership at the end of every engagement. Each of the four services below is scoped, priced, and delivered the same way: fixed-price phases, signed builds, replayable evaluations, and a documented handover.
1 · Custom MCP servers
What it is. A custom Model Context Protocol server is the bridge between your internal systems (APIs, databases, document stores, business logic) and AI agents such as Claude Desktop, Cursor, Windsurf, Cline, or any headless orchestrator. We design, build, harden, and operate that bridge.
What we deliver
- A standalone repository (TypeScript or Python) implementing the MCP server, with stateful tools, streaming I/O, structured logging, and OpenTelemetry-compatible traces.
- Schema-first tool definitions: each tool exposes a Zod (TS) or Pydantic (Python) input schema, an explicit error contract, and permission scopes.
- Authentication layer: OAuth 2.1 / mTLS / API keys depending on your internal stack, with refresh and rotation built in.
- Observability: per-tool metrics, latency histograms, error budget tracking, and replayable session logs.
- Production hardening: rate limiting, circuit breakers, retry with backoff, input validation, secret scoping, and supply-chain attestation.
- Ed25519-signed releases plus a Merkle-chained build log so any deployment can be verified post-hoc.
- Runbook: deployment procedure, on-call playbook, version-bump checklist, and integration documentation for downstream agents.
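To illustrate the build-log deliverable, the sketch below shows one simple way to make a log tamper-evident: each entry stores the SHA-256 hash of the previous entry, so editing any record invalidates every later link. This is a plain hash chain written with the Python standard library, not the NovaQuantiX implementation. The field names (`commit`, `artifact_sha256`), the `GENESIS` sentinel, and the helper names are assumptions made for the example, and Ed25519 signing of the latest hash is left out because it needs a third-party library.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed sentinel used as the "previous hash" of the first entry


def entry_hash(prev_hash: str, payload: dict) -> str:
    """Hash the previous digest together with a canonical JSON payload."""
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append_entry(log: list, payload: dict) -> None:
    """Append a build record whose hash commits to the whole history."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    log.append({
        "prev": prev_hash,
        "payload": payload,
        "hash": entry_hash(prev_hash, payload),
    })


def verify_log(log: list) -> bool:
    """Recompute every link; an edited or reordered entry breaks the chain."""
    prev_hash = GENESIS
    for entry in log:
        if entry["prev"] != prev_hash:
            return False
        if entry["hash"] != entry_hash(prev_hash, entry["payload"]):
            return False
        prev_hash = entry["hash"]
    return True


# Hypothetical build records, for illustration only.
build_log = []
append_entry(build_log, {"commit": "a1b2c3d",
                         "artifact_sha256": hashlib.sha256(b"build 1").hexdigest()})
append_entry(build_log, {"commit": "e4f5a6b",
                         "artifact_sha256": hashlib.sha256(b"build 2").hexdigest()})
print(verify_log(build_log))  # True

# Editing an old record after the fact is detected.
tampered = json.loads(json.dumps(build_log))
tampered[0]["payload"]["commit"] = "ffffff0"
print(verify_log(tampered))  # False
```

In a setup like this, signing only the most recent hash is enough to vouch for the full history, since that hash depends on every earlier entry.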
Typical use cases
- Exposing an internal CRM, ticketing system, or data warehouse to Claude for sales / support / analytics workflows.
- Wrapping a domain-specific search engine, a vector database, or a legal document store as MCP tools.
- Building safe write-side tools (issue creation, file modification, calendar updates) with approval gates and audit trails.
Timeline
2 to 6 weeks depending on the number of tools, the depth of integration, and the security posture required. The first deliverable (working skeleton) lands at the end of week 1.
2 · Open-weight fine-tuning
What it is. We take a recent open-weight model (DeepSeek V4, Kimi K2.6, GLM 5.1, Qwen 3.7, Gemma 4, Llama 4) and adapt it to your domain — your documents, your tone, your tool patterns, your guardrails — under reproducible runs that you can re-execute any time without us.
Methods we use
- QLoRA / LoRA for cost-efficient domain adaptation with limited GPU budget.
- Full fine-tuning when the domain shift is large or when latency-critical paths benefit from a smaller, distilled model.
- GRPO (Group Relative Policy Optimization) for reasoning-heavy tasks and verifiable-reward training, popularised by DeepSeek and now natively supported in Unsloth Studio and torchtune.
- DPO & KTO for preference alignment without a separate reward model.
- Distillation of a frontier model (e.g. Claude Opus 4.8 outputs) into a smaller, deployable open-weight model.
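For readers unfamiliar with GRPO, its central idea fits in a few lines: several answers are sampled for the same prompt, each is scored, and every answer's advantage is its reward normalised against the rest of its group, so no separate value model is needed. The sketch below shows only that normalisation step, under stated assumptions: the function name, the `eps` constant, and the use of the population standard deviation are choices made for the example, and real training code also handles the policy-gradient update, clipping, and any KL penalty.

```python
from statistics import fmean, pstdev


def group_relative_advantages(rewards: list, eps: float = 1e-6) -> list:
    """Normalise each reward against the other samples for the same prompt."""
    mean = fmean(rewards)
    std = pstdev(rewards)  # population std; some implementations use sample std
    # eps avoids division by zero when every sample gets the same reward.
    return [(r - mean) / (std + eps) for r in rewards]


# Four sampled answers to one prompt, scored 1.0 when a verifier accepts them.
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
print(advantages)
```

Answers that beat their group average receive a positive advantage and the others a negative one. When every answer in a group gets the same reward, all advantages are zero and that prompt contributes no learning signal.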
What we deliver
- A dataset pipeline: collection, deduplication, decontamination against public benchmarks, formatting, and versioning.
- A training repository: dataset loader, training script, hyper-parameter configuration, multi-seed runs, and W&B / MLflow tracking.
- An evaluation suite based on Inspect-AI with pinned datasets and deterministic seeds — your team can re-run the exact same eval on day 365 and get the same numbers.
- Quantised inference artifacts (GGUF, AWQ, MLX as applicable) and a vLLM or SGLang serving config.
- A signed model card documenting training data, evaluation results, known limitations, and intended use.
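The decontamination step in the dataset pipeline can be pictured as an overlap filter: any training document that shares a long enough word sequence with a public benchmark item is dropped before training. The sketch below is a minimal version of that idea. Whitespace tokenisation, the default window of 8 words, and the function names are assumptions for the example; production pipelines usually add text normalisation and fuzzy matching.

```python
def ngrams(text: str, n: int) -> set:
    """Return the set of lowercase word n-grams in a text."""
    tokens = text.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def decontaminate(train_docs: list, benchmark_items: list, n: int = 8) -> list:
    """Keep only training documents that share no n-gram with the benchmark."""
    benchmark_grams = set()
    for item in benchmark_items:
        benchmark_grams |= ngrams(item, n)
    return [doc for doc in train_docs
            if ngrams(doc, n).isdisjoint(benchmark_grams)]


# Toy data with a short window (n=3) so the overlap is easy to see.
train_docs = [
    "the quick brown fox jumps over the lazy dog",
    "gradient descent updates the weights each step",
]
benchmark_items = ["A quick brown fox jumps over fences"]
clean = decontaminate(train_docs, benchmark_items, n=3)
print(clean)  # ['gradient descent updates the weights each step']
```

A smaller window removes more documents, including some false positives, while a larger one only catches near-verbatim leakage, so the window size is worth recording alongside the dataset version.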
Timeline
3 to 8 weeks depending on dataset complexity, model size, and the evaluation surface. Initial baseline eval is delivered within the first 10 days so we can measure every subsequent run against a fixed point.
3 · Agent architecture & audit
What it is. A written architecture review of your existing or planned agentic system — single agent, multi-agent, retrieval pipeline, RL-routed orchestration — covering correctness, safety, latency, cost, and security.
What we deliver
- A written Architecture Decision Record (ADR) describing the recommended topology, model routing strategy, retrieval design, and tool surface.
- A token-and-cost budget per critical path with sensitivity analysis at 1×, 10×, 100× volume.
- A threat model identifying prompt injection vectors, data exfiltration paths, tool-misuse cases, and mitigation strategies.
- A latency budget specifying p50, p95, p99 targets per tool and end-to-end.
- A red-team report with documented attack scenarios, observed behaviour, and remediation.
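As a rough illustration of the cost and latency budgets listed above, the sketch below computes a daily cost for one critical path at 1×, 10×, and 100× volume, and derives p50, p95, and p99 from latency samples. Everything numeric here is hypothetical: the token counts, the per-million-token prices, the request volume, and the latency samples are placeholders, and the cost model is deliberately linear (no caching, batching discounts, or rate tiers).

```python
from dataclasses import dataclass
from statistics import quantiles


@dataclass(frozen=True)
class PathBudget:
    """Token usage for one critical path (all values are assumptions)."""
    name: str
    input_tokens: int
    output_tokens: int
    requests_per_day: int


def daily_cost_eur(path: PathBudget, eur_per_m_input: float,
                   eur_per_m_output: float, volume_multiplier: int = 1) -> float:
    """Daily cost of a path, given prices per million input and output tokens."""
    per_request = (path.input_tokens * eur_per_m_input
                   + path.output_tokens * eur_per_m_output) / 1_000_000
    return per_request * path.requests_per_day * volume_multiplier


def latency_percentiles(samples_ms: list) -> dict:
    """p50, p95, and p99 from observed latencies in milliseconds."""
    cuts = quantiles(samples_ms, n=100, method="inclusive")  # 99 cut points
    return {"p50": cuts[49], "p95": cuts[94], "p99": cuts[98]}


# Hypothetical path and prices, for illustration only.
triage = PathBudget("support-triage", input_tokens=2000,
                    output_tokens=500, requests_per_day=1000)
for multiplier in (1, 10, 100):
    cost = daily_cost_eur(triage, 3.0, 15.0, multiplier)
    print(f"{multiplier:>3}x volume: EUR {cost:,.2f} per day")

# Synthetic latencies of 1..101 ms, used only to exercise the function.
print(latency_percentiles(list(range(1, 102))))
```

With a linear model the three volume levels differ only by the multiplier; a real sensitivity analysis becomes useful precisely where that assumption stops holding, for example when higher volume changes cache hit rates or pricing tiers.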
Typical use cases
- Pre-launch review of a customer-facing agent before going to production.
- Validation of a multi-agent orchestration design (e.g. router + specialists, RL conductor patterns).
- Cost reduction audit on an existing deployment burning more tokens than necessary.
Timeline
2 to 4 weeks. Fixed-scope, fixed-price.
4 · Autonomous agent swarms
What it is. Production agent pipelines — from a single tool-using agent to coordinated swarms of up to 300 sub-agents — built for reliability, replayability, and human-in-the-loop control.
What we deliver
- A deterministic orchestrator with structured state, explicit transitions, and full event sourcing — every run is replayable byte-for-byte.
- Sub-agent fan-out with per-agent budgets (token, time, tool call count), automatic kill-switches, and quorum-based aggregation.
- Human-in-the-loop checkpoints: typed approval gates you can wire into your existing review surfaces (Slack, dashboards, email).
- Resumable execution: any run can be paused, inspected, edited, forked, and resumed. Failures are first-class state, not exceptions.
- Production telemetry: per-run cost, latency, success rate, and drift-from-baseline alerts.
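The replay and fork properties described above follow from event sourcing: the run is stored as an ordered list of events, and the current state is always rebuilt by applying those events through a pure transition function. The sketch below is a toy version of that pattern, not the delivered orchestrator. The event types, the `RunState` fields, and the agent names are assumptions for the example, and it presumes that model and tool outputs are recorded in the event log so that a replay never calls a model again.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, replace


@dataclass(frozen=True)
class RunState:
    status: str = "pending"
    completed: tuple = ()
    failed: tuple = ()
    tokens_spent: int = 0


def apply_event(state: RunState, event: dict) -> RunState:
    """Pure transition: the same state and event always give the same result."""
    kind = event["type"]
    if kind in ("run_started", "run_resumed"):
        return replace(state, status="running")
    if kind == "run_paused":
        return replace(state, status="paused")
    if kind == "agent_completed":
        return replace(state,
                       completed=state.completed + (event["agent"],),
                       tokens_spent=state.tokens_spent + event["tokens"])
    if kind == "agent_failed":
        # The failure is recorded as state instead of being raised.
        return replace(state, failed=state.failed + (event["agent"],))
    raise ValueError(f"unknown event type: {kind!r}")


def replay(events: list) -> RunState:
    """Rebuild the state of a run from its event log."""
    state = RunState()
    for event in events:
        state = apply_event(state, event)
    return state


def state_digest(state: RunState) -> str:
    """Stable fingerprint used to check that two replays agree."""
    body = json.dumps(asdict(state), sort_keys=True)
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


events = [
    {"type": "run_started"},
    {"type": "agent_completed", "agent": "researcher", "tokens": 1200},
    {"type": "agent_failed", "agent": "reviewer"},
    {"type": "run_paused"},
]
final = replay(events)
print(final.status, final.failed, final.tokens_spent)  # paused ('reviewer',) 1200

# Fork: keep the first two events, then take a different path.
fork = replay(events[:2] + [
    {"type": "agent_completed", "agent": "reviewer", "tokens": 800},
])
print(fork.status, fork.tokens_spent)  # running 2000
```

Because the state is derived and never edited in place, pausing, inspecting, and forking a run all reduce to choosing which prefix of the log to replay and which events to append.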
Typical use cases
- Document research and synthesis at scale (legal, financial, biomedical).
- Code modernisation pipelines (audit, plan, refactor, review).
- Customer support triage with deterministic escalation.
- Compliance review of internal artifacts before publication.
Timeline
4 to 12 weeks depending on the number of agents, the integration surface, and the regulatory environment.
Engagement model
Every engagement follows the same three-phase, fixed-price structure. No hourly billing, no scope creep, no rent-seeking after handover.
- Discovery & architecture (week 1). We map your data flows, agents, and risks. You receive a written Architecture Decision Record, a fixed budget, and a delivery schedule before any code is written.
- Reproducible engineering (weeks 2–6). Each commit triggers an immutable build log, an Ed25519-signed artifact, and an Inspect-AI evaluation run. CI/CD gates are explicit and verified.
- Delivery & ownership transfer (week 6+). We deploy, document, train your team, and transfer full ownership. Source code, signing keys, repositories, deployment credentials, and evaluation baselines are yours.
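The evaluation gate in the second phase can be pictured as a comparison between the metrics produced by an evaluation run and the thresholds agreed in advance. The sketch below shows that comparison only. It does not use the Inspect-AI API; the metric names, the threshold values, and the `evaluate_gate` helper are hypothetical, and it assumes every metric is "higher is better".

```python
def evaluate_gate(metrics: dict, thresholds: dict) -> list:
    """Return a list of human-readable failures; an empty list means the gate passes."""
    failures = []
    for name, minimum in sorted(thresholds.items()):
        score = metrics.get(name)
        if score is None:
            # A metric that was never reported counts as a failure, not a pass.
            failures.append(f"{name}: missing metric")
        elif score < minimum:
            failures.append(f"{name}: {score:.3f} < {minimum:.3f}")
    return failures


# Hypothetical thresholds and results, for illustration only.
thresholds = {"exact_match": 0.80, "refusal_accuracy": 0.95}
run_metrics = {"exact_match": 0.84, "refusal_accuracy": 0.91}

failures = evaluate_gate(run_metrics, thresholds)
if failures:
    print("gate failed:", "; ".join(failures))
else:
    print("gate passed")
```

Treating a missing metric as a failure keeps the gate explicit: a run cannot pass simply because part of the evaluation suite did not execute.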
Pricing & deliverables
We bill in fixed-price phases. Every quote is delivered as a one-page document covering scope, budget, schedule, risks, and acceptance criteria. You receive it within 48 hours of the discovery call, once we are aligned on scope.
NovaQuantiX is an independent practice: Julien Compain delivers every engagement directly, with a trusted network of senior collaborators brought in only when the scope explicitly requires it. Our base day rate is €800 – €1 200 per engineering day, aligned with the 2026 European market for senior AI freelancers. The ranges below are indicative and based on typical effort (days × day rate). Final pricing depends on scope, depth, regulatory environment, and acceptance criteria.
Custom MCP server
- Focused — 3–5 tools, standard authentication, baseline observability: €8 000 – €15 000 (2–3 weeks, ~10–15 days).
- Standard — 5–15 tools, OAuth 2.1 / mTLS, full observability, hardening: €15 000 – €30 000 (3–5 weeks, ~15–25 days).
- Enterprise — 15+ tools, multi-tenant, compliance requirements, multi-region: €30 000 – €55 000 (5–8 weeks, ~25–40 days).
Open-weight fine-tuning
- Domain adaptation — QLoRA on existing curated dataset, single-objective eval: €14 000 – €28 000 (3–5 weeks, ~15–25 days).
- Reasoning & alignment — GRPO / DPO / KTO on a mid-size open-weight model, full eval suite: €26 000 – €50 000 (5–8 weeks, ~25–40 days).
- Distillation & production — frontier model distilled to a deployable open-weight, quantised inference, serving stack: €44 000 – €75 000 (8–12 weeks, ~40–60 days).
GPU compute (training runs, hyperparameter search, evaluation grids) is passed through at cost on supplier invoices — typically €2 000 – €25 000 depending on model size and number of runs. Data preparation typically accounts for 30–50% of a fine-tuning budget; we scope it explicitly in the proposal.
Agent architecture & audit
- Focused audit — one critical path, written ADR + cost-and-latency budget: €9 000 – €17 000 (2–3 weeks, ~10–15 days).
- Full architecture & red-team — multi-agent topology, threat model, cost budget, documented red-team report: €17 000 – €32 000 (3–5 weeks, ~15–25 days).
Autonomous agent swarms
- Single-flow pipeline — 1–5 specialised agents, 2–4 integrations, HITL on write paths: €18 000 – €40 000 (4–7 weeks, ~20–35 days).
- Multi-agent orchestration — 10–50 sub-agents, RL routing, replay, drift monitoring: €42 000 – €78 000 (8–12 weeks, ~40–60 days).
- Enterprise swarm (extended team) — 50–300 sub-agents, multi-tenant, regulated workloads: €80 000 – €180 000+ (12+ weeks, with two to three senior collaborators added to the engagement under a single contractual envelope).
Compliance adders
Regulated workloads (SOC 2, ISO 27001, ISO 42001, HIPAA, EU AI Act high-risk classification) add structured deliverables on top of any engagement: documented controls, evidence collection, third-party audit support. Expect +€15 000 to +€80 000 depending on the framework, scoped explicitly in the proposal.
Standard deliverables
Every engagement ships with: source repository, Ed25519-signed releases, Merkle-chained build log, runbook, Inspect-AI evaluation suite, model card (when applicable), SBOM (CycloneDX), and a final ownership-transfer document.
For long-running programmes we recommend a two-tier model: a focused build led by NovaQuantiX (3–6 months), after which your internal team takes over operations. This is typically 30–45% lower in TCO than running a single mid-market vendor end-to-end.
Frequently asked questions
Do you work with US / non-EU clients?
Yes. We are based in the EU and we have engineers across multiple time zones. Contracts can be drafted under your preferred jurisdiction with prior agreement.
Do you sign NDAs?
Yes. A mutual NDA is signed before any technical discussion. We use a standard MNDA template; your own template is also acceptable.
Do you use customer data to train your own models?
No. Customer data is processed strictly under the contract scope, never used for general model training, and deleted on request at the end of the engagement (see Privacy).
Can we self-host everything you deliver?
Yes — that's the default. Every artifact runs on your infrastructure (cloud, dedicated, or on-premise). We hold no production keys after handover.
What if a fine-tuned model underperforms your baseline?
Acceptance criteria are written into the SOW with quantitative thresholds on the Inspect-AI suite. If a run misses the threshold, we iterate within the agreed budget, or we refund the milestone — at your option.
Do you offer ongoing support after handover?
On request, we offer a structured support agreement (monthly retainer, response-time SLA, quarterly review). It is opt-in and never bundled into the initial engagement.
Contact
Email contact@novaquantix.tech or book a 30-minute discovery call at cal.com/julien-compain. We respond within one business day with a one-page proposal — scope, budget, and risks.