How it works · AI-pair coding assessment

A 60-minute session, beat by beat.

A technical screen in six phases, built around real work with an AI pair. Transitions between phases are calm, and the candidate always knows where they are and how much time is left.

~3 min

Invitation & identity

The candidate opens the invite, confirms their identity, and reads the consent notice, which states that Lyra, the AI pair, is fallible and that reviewing Lyra's work is part of the assessment.

consent · real product

~1 min

Workspace provisioning

We spin up a real container with a real editor and a real codebase. The clock doesn't start until everything is ready, so the candidate is never penalized for our infrastructure.

provisioning · real product

~45 min

The work, with a pair

The candidate works through a realistic full-stack task with Lyra available. Lyra sometimes drafts, sometimes reviews, and sometimes asks questions to understand the candidate's thinking.

working session · real product

mid-session

A curveball

Once during the session, the spec changes, and the change is introduced calmly. Lyra flags it, the brief panel updates with strikethroughs and additions, and the candidate acknowledges the change and continues.

brief updated · real product

~10 min

The closing defense

A structured conversation: Lyra asks the candidate to walk through their key choices and surfaces anything they missed, constructively and with room to respond.

closing defense · real product

immediate

The report — to the reviewer

Within minutes, the hiring team sees a layered report: the overall band, the four families, and every dimension with its cited evidence. The candidate receives a confirmation but no score (in v1).

console · result · real product

The instrumentation

What we record, and why each piece is there.

Editor events

Keystroke timings, file changes, pastes, and every accept or reject decision on Lyra's suggestions. Used to score AI direction.

Lyra transcript

The full conversation between candidate and Lyra. Used to score reasoning and communication.

Test runs

Pass/fail status across all runs, including ones that uncovered planted errors. Used to score execution.

Defense answers

Recorded and scored separately. How clearly the candidate articulates trade-offs does most of the work in scoring judgment.

Planted-error outcomes

The caught or uncaught state of each planted error. This is surfaced only during the closing defense.

Reviewer decision

The human's call is logged with the report and fed back as a calibration signal.