Research and product systems
Evaluation Specialist
This role focuses on the hardest part of AI measurement: deciding what good looks like and making that standard repeatable. You will design human evaluation protocols, build review workflows, and help the team distinguish reliable signals from plausible noise in AI-generated content.
Role summary
Own the quality standards for how Chatobserver evaluates AI answers, citations, and visibility signals — and build the human review layer that keeps machine output honest.
Why this role exists
As prompt volume scales, the gap between raw output and trustworthy insight grows. We need someone who treats evaluation quality as a discipline, not a checkbox.
What you'll work on
- Design and maintain evaluation rubrics for answer quality, citation accuracy, and positioning signals.
- Run structured human review workflows to label and audit machine-generated analysis outputs.
- Identify systematic error patterns in the current evaluation pipeline and propose remediation.
- Collaborate with research and product to translate evaluation findings into product improvements.
What a strong match looks like
- Deep experience designing annotation guidelines, evaluation rubrics, or quality review workflows.
- Strong analytical instincts for identifying bias, inconsistency, and labeling noise in structured datasets.
- Comfort working with LLM outputs and an understanding of where they tend to fail in practice.
- Clear writing and the ability to articulate why a quality standard is the right one.
What would motivate you here
- Defining what high quality actually means for a product category that lacks established benchmarks.
- Building evaluation infrastructure that improves the entire product's trustworthiness.
- Working at the interface between human judgment and automated analysis.
First 90 days
1. Audit the current evaluation rubrics and identify the top gaps in coverage or consistency.
2. Design a structured review workflow for at least one core analysis type.
3. Ship a measurable improvement to inter-rater reliability on a key evaluation task.
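As a concrete illustration of the third milestone: inter-rater reliability is commonly summarized with Cohen's kappa, which measures how often two reviewers agree beyond what chance alone would produce. A minimal sketch, assuming two hypothetical reviewers assign quality labels to the same set of answers (reviewer names and labels are illustrative, not part of any existing pipeline):

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa: observed agreement between two raters, corrected for chance."""
    assert len(labels_a) == len(labels_b) and labels_a
    n = len(labels_a)
    # Proportion of items where the two raters agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement, from each rater's marginal label distribution.
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    expected = sum(
        (counts_a[label] / n) * (counts_b[label] / n)
        for label in counts_a.keys() | counts_b.keys()
    )
    return (observed - expected) / (1 - expected)

# Two hypothetical reviewers labelling the same ten AI answers.
rater_1 = ["good", "good", "bad", "good", "bad", "good", "bad", "bad", "good", "good"]
rater_2 = ["good", "bad", "bad", "good", "bad", "good", "good", "bad", "good", "good"]
print(round(cohens_kappa(rater_1, rater_2), 2))  # → 0.58
```

A kappa near 0.58 is typically read as "moderate" agreement; tightening annotation guidelines and re-measuring kappa on the same task is one way to make an improvement in reliability measurable.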
Hiring process
The process is deliberately short, direct, and anchored in the actual work.
1. Application: Send us your background, relevant work, and why this role makes sense for you.
2. Initial conversation: A focused discussion of your work, your judgment, and the role itself.
3. Role-specific deep dive: A conversation or exercise that resembles the actual work more closely than a generic interview round.
4. Conversation with the founder: A final conversation about standards, ambition, and what success would look like here.
5. Decision: We close clearly and move fast once the conviction is there.
Need more context before applying? [email protected]
Applications remain closed until the matching Dover posting is active. In the meantime, you can write to [email protected].