Fitronics · CoursePro

CoursePro Retrieval Benchmark Analysis Report

Comparing investigation efficiency and accuracy across retrieval modes.

Date: April 22, 2026
Status: Final analysis draft
Prepared for: Fitronics Engineering and Product Teams
At a glance

  • 33.1x maximum speed gain (Synapse Only vs Baseline, T3)
  • 70% peak accuracy (Combined and Docs Only)
  • 2.1s Synapse orientation (average time to first useful lead, T1)
  • -77% investigation time for Combined vs Baseline

1. Executive Summary

High-Level Findings

  • Synapse was the primary speed driver. It reduced time to first useful lead (T1) by more than 90% compared with baseline repository-only investigation.
  • Docs Only and Combined produced the highest answer-key alignment. Both modes achieved 70% weighted accuracy against the hidden benchmark answer key.
  • Combined mode was the strongest overall operating mode. It matched the best accuracy score while reducing average time to usable final answer (T3) from 123.0s in Docs Only mode to 59.0s.
  • Documentation acted as a stabiliser. While Docs Only was slower than Synapse, documentation helped anchor the agent in intended behaviour and reduced the likelihood of runtime-only misreads on mixed or ambiguous tasks.

Practical Conclusion

The benchmark indicates that advanced retrieval materially improves investigation speed in CoursePro. Synapse provides the biggest acceleration effect, while the combined documentation-plus-semantic-retrieval mode offers the best balance of speed and answer quality.

2. Overview of Retrieval Modes

Mode 1

Baseline

Plain codebase access only. File search, grep, git, and standard repository navigation tools.

Mode 2

Docs Only

Adds the coursepro-docs MCP server for structured documentation lookup on top of baseline tooling.

Mode 3

Synapse Only

HTTP semantic code-intelligence layer — symbols, definitions, callers, routes, and pattern lookups, in addition to baseline tools.

Mode 4

Combined

Both documentation and semantic-retrieval layers active simultaneously — the richest available setup.
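One way to picture the four modes is as layered tool sets. The sketch below is illustrative only: the tool names follow this report, but the configuration shape is an assumption, not the harness's actual config.

```python
# Hypothetical layered tool configuration for the four benchmark modes.
# "coursepro-docs" (MCP docs server) and "synapse" (semantic code intelligence)
# are the two retrieval layers added on top of baseline repository tooling.
BASELINE_TOOLS = {"file_search", "grep", "git"}

MODES = {
    "Baseline":     BASELINE_TOOLS,
    "Docs Only":    BASELINE_TOOLS | {"coursepro-docs"},
    "Synapse Only": BASELINE_TOOLS | {"synapse"},
    "Combined":     BASELINE_TOOLS | {"coursepro-docs", "synapse"},
}

# Combined is the richest setup: every other mode's tool set is a subset of it.
assert all(tools <= MODES["Combined"] for tools in MODES.values())
```

The subset relationship is the point: Combined never removes a tool, it only adds, so any difference in its results comes from the extra layers rather than a different baseline.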

3. Definition of Timing Metrics

T1

Time to first useful lead

The first point at which the agent identifies the correct subsystem, module, or file cluster.

T2

Time to first correct primary artifact

The main implementation file, route handler, or core runtime location relevant to the task.

T3

Time to usable final answer

The point at which the agent has enough information to produce a developer-usable answer.
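As an illustration of how these three milestones might be read off a run, consider a hypothetical event log; the milestone names below are invented for the sketch and are not actual benchmark telemetry.

```python
# Hypothetical event log: (elapsed_seconds, milestone) pairs emitted during an
# investigation run. Milestone names are illustrative placeholders.
events = [
    (0.0, "start"),
    (2.1, "first_useful_lead"),       # T1: correct subsystem identified
    (4.7, "primary_artifact_found"),  # T2: main implementation file located
    (7.7, "usable_final_answer"),     # T3: developer-usable answer possible
]

def milestone_time(events, name):
    """Elapsed seconds at the first occurrence of a milestone, else None."""
    return next((t for t, m in events if m == name), None)

t1 = milestone_time(events, "first_useful_lead")
t2 = milestone_time(events, "primary_artifact_found")
t3 = milestone_time(events, "usable_final_answer")
```

Taking the first occurrence matters: an agent may revisit the same subsystem several times, but T1 is defined by the first correct identification.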

4. Performance Data

Weighted Accuracy by Mode

% alignment with hidden benchmark answer key. Higher is better.

[Bar chart: Baseline 60%, Docs Only 70%, Synapse Only 65%, Combined 70%]

Investigation Timings

Seconds to T1 / T2 / T3. Lower is better.

[Grouped bar chart of T1, T2, and T3 per mode; Synapse Only reaches T1 in 2.1s and T3 in 7.7s. Full figures in the Aggregated Metrics table below.]

Aggregated Metrics

Metric               Baseline   Docs Only   Synapse Only   Combined
Weighted Accuracy    60%        70%         65%            70%
Avg T1               28.0s      28.0s       2.1s           5.9s
Avg T2               106.5s     64.0s       4.7s           22.0s
Avg T3               255.0s     123.0s      7.7s           59.0s
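The speed multiples quoted in the rankings follow directly from these averages; a minimal sketch of the calculation:

```python
# Average timings (seconds) from the Aggregated Metrics table.
avg_t1 = {"Baseline": 28.0, "Docs Only": 28.0, "Synapse Only": 2.1, "Combined": 5.9}
avg_t3 = {"Baseline": 255.0, "Docs Only": 123.0, "Synapse Only": 7.7, "Combined": 59.0}

def speedup_vs_baseline(timings: dict) -> dict:
    """Speed multiple of each mode relative to Baseline (higher = faster)."""
    baseline = timings["Baseline"]
    return {mode: round(baseline / t, 1) for mode, t in timings.items()}

t1_speedup = speedup_vs_baseline(avg_t1)  # Synapse Only: 13.3, Combined: 4.7
t3_speedup = speedup_vs_baseline(avg_t3)  # Synapse Only: 33.1, Combined: 4.3
```

Dividing Baseline's average by each mode's average reproduces the 13.3x T1 and 33.1x T3 figures for Synapse quoted throughout the report.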

T1 Ranking

  • Synapse — 2.1s (13.3x faster)
  • Combined — 5.9s (4.7x faster)
  • Docs Only — 28.0s (no change vs Baseline)
  • Baseline — 28.0s

T3 Ranking

  • Synapse — 7.7s (33.1x faster)
  • Combined — 59.0s (4.3x faster)
  • Docs Only — 123.0s (2.1x faster)
  • Baseline — 255.0s

Accuracy Ranking

  • Combined — 70% (high)
  • Docs Only — 70% (high)
  • Synapse — 65% (moderate-strong)
  • Baseline — 60% (lower)

5. Benchmark Scenarios

  1. HomePortal Consent Tracing: Storage and update mechanisms for member photo/video consent.
  2. Waiting List Send Flow: Batch send logic and send eligibility gates.
  3. Payment Provider Checkout Flow: DNA checkout support integration in Omnipay.
  4. Allocation Cancellation Rule: Logic for next-cycle allocations in cancelled session calculations.
  5. Irish Postcode Validation: Eircode validation logic and test coverage.
  6. Trigger Operator Feature: Age-range operators in the trigger system.
  7. Member and Payment Selection Page: Frontend implementation and backend contract dependencies.
  8. Route to Handler to Tests: Tracing a REST route from definition to handler and nearby tests.
  9. HomePortal Form Mapping: Mapping new-member form fields to backend creation payloads.
  10. Docs Versus Runtime Truth: Identifying mismatches between documented behaviour and actual runtime code.

6. Mode-by-Mode Analysis

Baseline

Strengths
  • Reliable raw repo investigation.
  • Performs well when commit history and filenames are explicit.
  • Lowest token overhead.
Weaknesses
  • By far the slowest mode.
  • Most dependent on broad scanning.
  • Weaker on ambiguous or business-language tasks.
Verdict

Useful as a control, but operationally the least attractive mode.

Docs Only

Strengths
  • Tied highest accuracy (70%).
  • Strong for module orientation and intended-behaviour understanding.
  • Reduced search depth where docs coverage aligned.
Weaknesses
  • Required code confirmation for runtime truth.
  • Less advantage where docs were sparse or terminology diverged.
Verdict

Strong for stable, grounded answers — especially for intended-behaviour understanding. Slower than Combined, but high quality.

Synapse Only

Strengths
  • Fastest mode by a large margin.
  • Extremely effective at reaching T1 and T2.
  • Strong on symbols, routes, semantic entrypoints, and runtime code.
Weaknesses
  • Not the highest accuracy.
  • Required local code confirmation.
  • More likely to converge on a nearby plausible answer than a broader architectural answer.
Verdict

The best pure acceleration mode — ideal for fast technical orientation, and strongest when paired with code confirmation.

Combined

Recommended
Strengths
  • Tied highest accuracy (70%).
  • Far faster than Docs Only.
  • Best balance of speed and reliability.
Weaknesses
  • Docs didn't materially help every scenario.
  • Affected by benchmark ambiguity in a few cases.
Verdict

The most robust and defensible operating mode. Recommended default for real-world CoursePro investigation.

7. Key Case Study — Documentation Versus Runtime Truth

The benchmark repeatedly showed the importance of distinguishing documented behaviour from runtime behaviour. In HomePortal member flows, Swagger and supporting documentation implied broader support, whereas runtime applied stricter conditions — specifically booking-strategy and bridge capability checks referenced by files such as PhotoVideoConsent.php and related handlers.

Retrieval quality is therefore not only about finding code quickly — it is also about knowing whether the code confirms or contradicts the documented model.

Recommendation

Where docs imply unconditional support for behaviour that is actually conditional at runtime, the docs should be updated to make those conditions explicit.

8. Detailed Scenario Commentary

1. HomePortal Consent Tracing

Synapse and Combined located PhotoVideoConsent.php and associated handlers rapidly via symbol and route lookup. Baseline and Docs Only required broader filename and grep-style scanning to reach the same endpoint cluster. Accuracy was comparable across modes once code was confirmed; speed was the dominant differentiator.

2. Waiting List Send Flow

Docs Only and Combined had a meaningful edge in describing the intended send behaviour. Synapse was fastest at identifying the runtime dispatch path, but missed one architectural nuance captured by the documentation — a clear example of documentation acting as a stabiliser.

3. Payment Provider Checkout Flow

Synapse excelled at tracing the checkout handler chain and provider adapters. Combined mode reached the same answer and additionally aligned the explanation to the documented provider contracts. Baseline took the longest, repeatedly revisiting unrelated payment helpers.

4. Allocation Cancellation Rule

Rule discovery favoured Combined mode — docs supplied business context, while Synapse pinpointed the conditional branches in the allocation service. Docs Only reached the correct answer but needed extra code verification steps. Baseline struggled to bridge business terminology to code identifiers.

5. Irish Postcode Validation

A tightly scoped regex and validator task. All modes converged on the correct validator, but Synapse was dramatically faster at locating it. Docs were thin for this scenario, which reduced any Combined-mode docs advantage.
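For context, a simplified shape check of the kind such a validator performs might look like the following. This is an illustrative sketch, not the actual CoursePro validator, and it is deliberately looser than the full Eircode routing-key rules.

```python
import re

# Simplified Eircode shape check: a routing key (letter + two digits, plus the
# D6W special case) followed by an optional space and a four-character unique
# identifier. Illustrative only; real routing keys are a restricted set.
EIRCODE = re.compile(r"^(?:D6W|[A-Z]\d{2}) ?[0-9A-Z]{4}$")

def is_plausible_eircode(value: str) -> bool:
    """Return True if the input has the broad shape of an Eircode."""
    return EIRCODE.fullmatch(value.strip().upper()) is not None
```

Because the task is this tightly scoped, test coverage mostly reduces to boundary strings (with and without the space, lowercase input, the D6W routing key), which is why all modes converged on the correct answer and speed dominated.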

6. Trigger Operator Feature

The IS_BETWEEN operator required understanding both the configured operator set and the runtime evaluator. Combined mode gave the clearest end-to-end explanation. Synapse alone missed the documented semantic wrapper; Docs Only alone missed the runtime evaluator edge cases.
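A plausible reading of the runtime side of such an operator is sketched below; the function name and signature are hypothetical, and the actual evaluator lives in the CoursePro trigger system.

```python
def evaluate_is_between(value: int, low: int, high: int) -> bool:
    """Inclusive age-range check, one plausible reading of IS_BETWEEN.

    The edge cases worth pinning down in the real evaluator are boundary
    inclusivity and reversed bounds. This sketch treats both bounds as
    inclusive and rejects reversed ranges rather than silently swapping them.
    """
    if low > high:
        raise ValueError("lower bound exceeds upper bound")
    return low <= value <= high
```

These are exactly the edge cases the benchmark surfaced: Docs Only missed the runtime evaluator's boundary behaviour, while Synapse alone missed the documented semantic wrapper around it.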

7. Member and Payment Selection Page

A UI-to-backend tracing task. Synapse and Combined both traced the selection flow quickly via route and symbol lookups. Docs Only was accurate but slower, and Baseline needed the most scanning to connect front-end components to backend selection services.

8. Route to Handler to Tests

This scenario was more open-ended than intended and benefits from a tighter definition in future rounds. Synapse was strongest on the route-to-handler hop; Combined added value at the handler-to-tests hop where documentation referenced test fixtures.

9. HomePortal Form Mapping

Form-to-model mapping was handled well by Combined mode, which paired documented form field contracts with Synapse-located mappers. Baseline and Docs Only arrived at the answer more slowly and with more dead ends on misleadingly named helper files.

10. Docs Versus Runtime Truth

The clearest demonstration of the documentation-versus-runtime case study above. The documented behaviour implied a simpler model than the runtime enforced. Only Combined mode reliably surfaced and reconciled the difference. Like Scenario 8, this scenario is open-ended and would benefit from a more prescriptive answer key.

9. Limitations

The benchmark is directionally strong but not a perfect instrumented evaluation. Key limitations include:

  • Timing and token metrics were estimated where exact telemetry was unavailable.
  • Scenarios 8 and 10 were more open-ended than ideal and should be tightened before the next round.
  • Some answer-key scoring reflects hidden-key specificity that slightly penalises broader correct answers.
  • Broad-scan and token estimates are operational proxies rather than precise measurements.

10. Final Conclusion

Baseline is workable but slow. Docs Only materially improves quality. Synapse is the dominant speed accelerator. Combined is the strongest overall profile.

For fastest technical orientation

Synapse

For best balance of speed, reliability, explainability

Combined

11. Recommended Next Steps

  1. Adopt Combined mode as the default CoursePro investigation setup.
  2. Continue using Docs Only where architectural, workflow, or support context is more valuable than raw speed.
  3. Use Synapse as the primary acceleration layer for runtime tracing, route tracing, and symbol discovery.
  4. Tighten Scenarios 8 and 10 before the next benchmark round.
  5. Capture exact telemetry next round: prompt tokens, output tokens, tool-call counts, and exact wall-clock time.
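A minimal record shape for that telemetry could look like the following; the field names are a suggestion, not an existing schema.

```python
from dataclasses import dataclass

@dataclass
class RunTelemetry:
    """Per-scenario, per-mode measurements to capture exactly next round."""
    scenario: str
    mode: str            # "Baseline", "Docs Only", "Synapse Only", "Combined"
    prompt_tokens: int
    output_tokens: int
    tool_calls: int
    wall_clock_s: float  # exact wall-clock time, replacing estimated timings

    @property
    def total_tokens(self) -> int:
        return self.prompt_tokens + self.output_tokens
```

Recording one such row per scenario-mode pair would replace the estimated timing and token figures in this report with measured values.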
Operational Verdict

While Synapse offers the fastest orientation time, the Combined mode provides the best balance of speed and precision.

It reduces Baseline investigation time by 77% while maintaining peak accuracy.
