Research and product systems

Evaluation Specialist

This role focuses on the hardest part of AI measurement: deciding what good looks like and making that standard repeatable. You will design human evaluation protocols, build review workflows, and help the team distinguish reliable signals from plausible noise in AI-generated content.

Applications opening soon

Role summary

Own the quality standards for how Chatobserver evaluates AI answers, citations, and visibility signals — and build the human review layer that keeps machine output honest.

Why this role exists

As prompt volume scales, the gap between raw output and trustworthy insight grows. We need someone who treats evaluation quality as a discipline, not a checkbox.

What you'll work on

  • Design and maintain evaluation rubrics for answer quality, citation accuracy, and positioning signals.
  • Run structured human review workflows to label and audit machine-generated analysis outputs.
  • Identify systematic error patterns in the current evaluation pipeline and propose remediation.
  • Collaborate with research and product to translate evaluation findings into product improvements.

What a strong fit looks like

  • Deep experience designing annotation guidelines, evaluation rubrics, or quality review workflows.
  • Strong analytical instincts for identifying bias, inconsistency, and labeling noise in structured datasets.
  • Comfort working with LLM outputs and an understanding of where they tend to fail in practice.
  • Clear writing and the ability to articulate why a quality standard is the right one.

What will excite you here

  • Defining what high quality actually means for a product category that lacks established benchmarks.
  • Building evaluation infrastructure that improves the entire product's trustworthiness.
  • Working at the interface between human judgment and automated analysis.

First 90 days

  1. Audit the current evaluation rubrics and identify the top gaps in coverage or consistency.
  2. Design a structured review workflow for at least one core analysis type.
  3. Ship a measurable improvement to inter-rater reliability on a key evaluation task.

Hiring process

The process is intentionally short, direct, and anchored in real work.

  1. Application

    Send your background, relevant work, and why this role makes sense for you.

  2. Initial conversation

    A conversation focused on your work, your judgment, and the role.

  3. Role-specific deep dive

    A conversation or exercise that looks more like the real work than a generic interview loop.

  4. Founder conversation

    A final conversation about standards, ambition, and what success would look like here.

  5. Decision

    We close the loop with clarity and move fast when there is conviction.

Need context before applying? [email protected]

The role is already visible on the site, but applications remain closed until the corresponding Dover listing goes live. In the meantime, you can write to [email protected].