Yamakei.info

Notes on building reliable software with AI in the loop.

Takumi: Evaluating Conviction in the AI Era

Designing a multi-scenario strength evaluation system inspired by leadership principles and built to distinguish human conviction from synthetic fluency.

2026-02-18

Duration: 2 weeks
Tools: Next.js, TypeScript, Supabase, PostgreSQL, LLM APIs (OpenAI / Gemini), Structured Prompting, Custom Evaluation Engine

Executive Summary

Takumi is a strength-evaluation system for the AI era, originally inspired by a simple frustration:

Traditional coding quizzes and system design interviews are increasingly poor proxies for real-world leadership and engineering strength.

When large language models can generate syntactically correct code and polished system design answers in seconds, evaluating engineers and managers based on whiteboard exercises becomes less meaningful.

Takumi shifts the evaluation axis.

It does not test memorization or syntax fluency.

It evaluates:

Conviction, value stability, and behavioral posture under pressure.

The system draws philosophical inspiration from structured leadership-based evaluation frameworks—particularly those that emphasize principles, ownership, and long-term thinking—while adapting them to an era where AI-assisted reasoning is ubiquitous.


1. Original Motivation

Takumi began as a personal design experiment.

After years of participating in and observing technical hiring loops—coding quizzes, algorithm drills, system design prompts—a pattern became clear:

  • Strong engineers sometimes underperformed in artificial whiteboard environments.
  • Polished communicators could “simulate” good system design answers.
  • Performance often reflected preparation style rather than operational strength.

At the same time, real-world performance evaluation in high-performing organizations often relied on structured leadership principles:

  • Ownership
  • Long-term thinking
  • Bias for action
  • Disagreement and commitment
  • Customer obsession

These frameworks focus less on what someone knows and more on how they behave under constraint.

Takumi attempts to combine these insights:

  • Move beyond syntax testing.
  • Simulate principled decision-making environments.
  • Evaluate posture, not just answers.

2. The Problem in the AI Era

Historically, interviews optimized for:

  • Clear thinking
  • Structured communication
  • Trade-off articulation
  • Technical modeling ability

In 2026, those signals are commoditized.

LLMs can generate:

  • Executive-ready architectural proposals
  • Risk-calibrated responses
  • Balanced stakeholder analyses
  • Clean coding solutions

As a result, interviews increasingly measure:

Fluency, not strength.

The core challenge becomes distinguishing between:

  • Synthetic coherence
  • Human conviction

Takumi was designed to operate in that gap.


3. Core Hypothesis: Processing Is Cheap. Conviction Is Not.

In the AI era:

  • Information processing is abundant
  • Structured reasoning is reproducible
  • Diplomatic neutrality is easily generated

What remains scarce:

  • Defensible bias
  • Value hierarchy stability
  • Non-negotiable constraint articulation
  • Risk ownership under escalation

Takumi evaluates whether someone:

  • Restates constraints when pressure increases
  • Defends principles against authority pushback
  • Maintains value consistency across scenarios
  • Makes trade-offs with visible cost

It treats interviews as pressure simulations, not technical exams.


4. From Leadership Principles to Forte Dimensions

Takumi is philosophically influenced by principle-driven evaluation systems.

However, instead of asking candidates to narrate past stories aligned with leadership principles, Takumi simulates live decision environments and observes behavior directly.

The internal evaluation model—referred to as Fortes—maps observable signals into strength dimensions, drawing on observations such as:

  • Constraint defense
  • Trade-off decisiveness
  • Ethical boundary clarity
  • Disagreement posture
  • Pressure stability

Rather than asking:

“Tell me about a time you showed ownership.”

Takumi observes:

Do you demonstrate ownership when authority pushes against your decision?

This shift reduces reliance on rehearsed narratives and increases emphasis on behavioral consistency.
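As a concrete sketch of how such a mapping could be represented in code: the dimension names below mirror the list above, but the `SignalEvent` shape and the averaging scheme are illustrative assumptions, not Takumi's actual schema.

```typescript
// Hypothetical sketch: mapping observed behavioral signals to Forte
// dimensions. Dimension names follow the article's list; the event
// shape and scoring are assumptions for illustration only.

type ForteDimension =
  | "constraintDefense"
  | "tradeOffDecisiveness"
  | "ethicalBoundaryClarity"
  | "disagreementPosture"
  | "pressureStability";

interface SignalEvent {
  dimension: ForteDimension;
  phase: "framing" | "commitment" | "escalation" | "reflection";
  strength: number; // 0..1, how strongly the signal was expressed
}

// Aggregate raw events into a per-dimension average score.
function aggregateFortes(
  events: SignalEvent[]
): Partial<Record<ForteDimension, number>> {
  const sums: Partial<Record<ForteDimension, { total: number; n: number }>> = {};
  for (const e of events) {
    const s = sums[e.dimension] ?? { total: 0, n: 0 };
    s.total += e.strength;
    s.n += 1;
    sums[e.dimension] = s;
  }
  const out: Partial<Record<ForteDimension, number>> = {};
  for (const [dim, s] of Object.entries(sums)) {
    out[dim as ForteDimension] = s!.total / s!.n;
  }
  return out;
}
```

The point of the structure, rather than the arithmetic, is that each score is traceable to concrete in-session events instead of a single holistic impression.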


5. How Takumi Works

Each session progresses through structured phases:

  1. Framing – How is the problem defined?
  2. Commitment – What position is taken?
  3. Escalation – What changes when stakes increase?
  4. Reflection – How is recalibration handled?

Instead of scoring correctness, Takumi tracks cross-phase signals aligned with Forte dimensions.

The output is a Capability Snapshot, not a pass/fail result.
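The four-phase flow above can be sketched as a small state machine. The phase names come from the text; the session object and transcript shape are assumptions made for illustration.

```typescript
// Minimal sketch of the phase progression described above.
// Phase names are from the article; everything else is assumed.

const PHASES = ["framing", "commitment", "escalation", "reflection"] as const;
type Phase = (typeof PHASES)[number];

interface Session {
  phase: Phase;
  transcript: { phase: Phase; answer: string }[];
}

function startSession(): Session {
  return { phase: "framing", transcript: [] };
}

// Record the candidate's answer for the current phase and advance.
// The session terminates in "reflection" rather than looping.
function advance(session: Session, answer: string): Session {
  const transcript = [...session.transcript, { phase: session.phase, answer }];
  const idx = PHASES.indexOf(session.phase);
  const next = PHASES[Math.min(idx + 1, PHASES.length - 1)];
  return { phase: next, transcript };
}
```

Keeping the full transcript, rather than only the latest answer, is what makes the cross-phase comparison described here possible at all.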


6. High-Level Architecture

Takumi combines LLM-powered scenario generation with a structured evaluation engine.

The LLM simulates dynamic, escalating situations.
Takumi analyzes behavioral signals.

Separation of Responsibility

  • LLM Layer: Generates realistic scenarios and escalations.
  • Evaluation Layer: Extracts structured behavioral signals.
  • Forte Model: Maps signals into strength dimensions.
  • Cross-Phase Analysis: Detects value shifts under pressure.
  • Output Layer: Produces interpretable strength insights.

LLMs power simulation.
Takumi owns evaluation.
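One way to picture this separation of responsibility is as a set of narrow interfaces, one per layer. All of the names below are assumptions sketched from the layer list, not Takumi's actual API; the stub evaluator is a deliberately naive placeholder for whatever real signal extraction does.

```typescript
// Illustrative interfaces for the layer separation described above.
// All names are assumptions, not Takumi's real API surface.

interface ScenarioLLM {
  // LLM layer: generates a scenario and its escalations.
  generateScenario(roleContext: string): Promise<string>;
  escalate(scenario: string, candidateResponse: string): Promise<string>;
}

interface EvaluationLayer {
  // Extracts structured behavioral signals from a raw response.
  extractSignals(response: string): Record<string, number>;
}

interface ForteModel {
  // Maps extracted signals into named strength dimensions.
  mapToDimensions(signals: Record<string, number>): Record<string, number>;
}

interface OutputLayer {
  // Produces an interpretable strength insight, not a pass/fail score.
  render(dimensions: Record<string, number>): string;
}

// A trivial stub evaluator for illustration: counts hedging phrases as
// a (very naive) proxy for "adaptive neutrality" in a response.
const naiveEvaluator: EvaluationLayer = {
  extractSignals(response) {
    const hedges =
      (response.match(/\b(maybe|perhaps|it depends)\b/gi) ?? []).length;
    return { adaptiveNeutrality: Math.min(1, hedges / 3) };
  },
};
```

The design consequence is the one the article states: the LLM can be swapped out (OpenAI, Gemini, or otherwise) without touching the evaluation logic, because evaluation never trusts the LLM to score itself.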


7. LLM Stress Test: Synthetic Fluency vs Conviction

To test robustness, Takumi scenarios were answered using a modern LLM.

The responses were:

  • Coherent
  • Balanced
  • Persuasive
  • Executive-ready

However, evaluation frequently identified:

  • Weak explicit constraint defense
  • Adaptive neutrality during escalation
  • Limited cost-bearing commitment

The LLM optimized for diplomatic completeness.

It did not exhibit principled rigidity.

This distinction is subtle but critical.

An AI can argue any side fluently.
A leader must decide which side to protect.


8. From Intelligence Testing to Strength Mapping

Traditional hiring asks:

“Can this person design the system?”

Takumi asks:

“How does this person behave when the system decision becomes uncomfortable?”

The shift reframes interviews from examinations to simulations.

The output becomes:

  • A strength map
  • A pressure posture profile
  • A role-alignment signal

Different roles require different Forte configurations.

A startup CTO may require:

  • Strong constraint rigidity
  • Decisive trade-offs
  • High disagreement tolerance

A people manager may require:

  • Ethical boundary clarity
  • Reflective recalibration ability
  • Stable judgment under ambiguity

Takumi does not define excellence universally.

It evaluates structural fit.


9. Architectural Philosophy

Takumi is built on several principles:

  • Multi-scenario evaluation over isolated questions
  • Escalation as a signal amplifier
  • Cross-phase diffing for value stability
  • Structured dimensions over subjective impressions
  • Coaching-oriented output instead of ranking

It acknowledges that evaluation cannot be perfectly objective.

But it can be structurally interpretable.
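The "cross-phase diffing" principle above can be made concrete with a small sketch: compare how strongly each signal is expressed before and after escalation. The drift metric and the stability threshold here are illustrative assumptions, not Takumi's tuned values.

```typescript
// Sketch of cross-phase diffing for value stability. A large negative
// drift on a signal like "constraintDefense" suggests that stated
// values softened under pressure. Threshold is an assumed example.

type Signals = Record<string, number>; // signal name -> 0..1 strength

// Per-signal change from the commitment phase to the escalation phase.
function phaseDrift(commitment: Signals, escalation: Signals): Signals {
  const drift: Signals = {};
  for (const name of Object.keys(commitment)) {
    drift[name] = (escalation[name] ?? 0) - commitment[name];
  }
  return drift;
}

// Stable if no signal collapses by more than the tolerance.
// Gains under pressure are allowed; only erosion counts against.
function isValueStable(drift: Signals, tolerance = 0.2): boolean {
  return Object.values(drift).every((d) => d >= -tolerance);
}
```

Note that the diff is directional on purpose: a candidate who becomes *more* decisive under escalation is not penalized, while one whose constraint defense quietly evaporates is flagged.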


10. What Takumi Represents

Takumi is not simply an interview tool.

It is an attempt to rethink evaluation in a world where:

  • AI writes code
  • AI drafts system designs
  • AI generates polished answers

If fluency is synthetic,
the differentiator becomes:

  • What you defend
  • What you sacrifice
  • What you refuse to compromise

Takumi is designed to surface that signal.


Closing

The AI era does not eliminate the need for human evaluation.

It demands a new lens.

When coding quizzes can be solved with AI assistance,
and system design answers can be rehearsed or generated,

the meaningful signal shifts from output quality to identity stability.

Takumi is an ongoing exploration of how to measure that shift—
and how to build hiring systems aligned with augmented intelligence rather than threatened by it.