Skip to main content
Docent Query Language (DQL) is a read-only SQL subset that supports ad-hoc exploration in Docent. Queries can only run over a single collection by design (if you need multi-collection support, please reach out to us!)
For the Python SDK methods that execute DQL queries (execute_dql, dql_result_to_dicts), see the SDK DQL Reference. For schema introspection (get_dql_schema), see DQL Schema. For filtering agent runs (select_agent_run_ids), see Query Agent Runs.

Available Tables and Columns

TableDescription
agent_runsInformation about each agent run in a collection.
transcriptsIndividual transcripts tied to an agent run; stores serialized messages and per-transcript metadata.
transcript_groupsHierarchical groupings of transcripts for runs.
judge_resultsScored rubric outputs keyed by agent run and rubric version.
labelsLabels applied to agent runs.
tagsTags applied to agent runs.
rubric_centroidsClustered rubric score categories.
judge_result_centroidsJunction table for judge result clustering.
resultsResults from result sets.

agent_runs

ColumnDescription
idAgent run identifier (UUID).
collection_idCollection that owns the run
nameOptional user-provided display name.
descriptionOptional description supplied at ingest time.
metadata_jsonUser supplied metadata, stored as JSON.
created_atWhen the run was recorded in Docent.

transcripts

ColumnDescription
idTranscript identifier (UUID).
collection_idCollection that owns the transcript.
agent_run_idParent run identifier; joins back to agent_runs.id.
nameOptional transcript title.
descriptionOptional description.
transcript_group_idOptional grouping identifier.
messagesBinary-encoded JSON payload of message turns.
metadata_jsonBinary-encoded metadata describing the transcript.
dict_keyDictionary key for transcript identification.
created_atTimestamp recorded during ingest.

transcript_groups

ColumnDescription
idTranscript group identifier.
collection_idCollection that owns the group.
agent_run_idParent run identifier; joins back to agent_runs.id.
nameOptional name for the group.
descriptionOptional descriptive text.
parent_transcript_group_idIdentifier of the parent group (for hierarchical groupings).
metadata_jsonJSONB metadata payload for the group.
created_atTimestamp recorded during ingest.

judge_results

ColumnDescription
idJudge result identifier.
agent_run_idRun scored by the rubric.
rubric_idRubric identifier.
rubric_versionVersion of the rubric used when scoring.
outputJSON representation of rubric outputs.
valueDeprecated: use output instead.
result_metadataOptional JSON metadata attached to the result.
result_typeEnum describing the rubric output type.

JSON Metadata Paths

Docent stores user-supplied metadata as JSON, and can be accessed in Postgres style. Here are some examples:
JSON operators work on agent_runs.metadata_json, transcript_groups.metadata_json, judge_results.output, and judge_results.result_metadata (stored as JSONB). The transcripts.metadata_json column is stored as binary and does not support direct JSON operators in queries.

Access Patterns

-- Filter agent runs by a metadata attribute
SELECT id, name
FROM agent_runs
WHERE metadata_json->>'environment' = 'staging';
-- Cast numeric metadata for aggregation
SELECT
  AVG(CAST(metadata_json->>'latency_ms' AS DOUBLE PRECISION)) AS avg_latency_ms
FROM agent_runs
WHERE metadata_json ? 'latency_ms';
When querying JSON fields, comparisons default to string semantics. Cast values when you need numeric ordering or aggregation.

Allowed Syntax

DQL supported keywords:
Feature
SELECT, DISTINCT, FROM, WHERE, subqueries
JOIN, LEFT JOIN, RIGHT JOIN, FULL JOIN, CROSS JOIN
WITH (CTEs)
UNION [ALL], INTERSECT, EXCEPT
GROUP BY, HAVING
Aggregations (COUNT, AVG, MIN, MAX, SUM, STDDEV_POP, STDDEV_SAMP, ARRAY_AGG, STRING_AGG, JSON_AGG, JSONB_AGG, JSON_OBJECT_AGG, PERCENTILE_CONT, PERCENTILE_DISC (WITHIN GROUP))
Window functions (ROW_NUMBER, RANK, DENSE_RANK, NTILE, LAG, LEAD, FIRST_VALUE, LAST_VALUE, NTH_VALUE, PERCENT_RANK, CUME_DIST)
ORDER BY, LIMIT, OFFSET
Conditional & null helpers (CASE, COALESCE, NULLIF)
Boolean logic (AND, OR, NOT)
Comparison operators (=, !=, <, <=, >, >=, IS, IS NOT, IS DISTINCT FROM, IN, BETWEEN, LIKE, ILIKE, EXISTS, SIMILAR TO, ~, ~*, !~, !~*)
Arithmetic & math (+, -, *, /, %, POWER, ABS, SIGN, SQRT, LN, LOG, EXP, GREATEST, LEAST, FLOOR, CEIL, ROUND, RANDOM)
String helpers (SUBSTRING, LEFT, RIGHT, LENGTH, UPPER, LOWER, INITCAP, TRIM, REPLACE, SPLIT_PART, POSITION, CONCAT, CONCAT_WS, STRING_AGG)
JSON operators & functions (->, ->>, #>, #>>, @>, ?, `?, ?&, json_agg, jsonb_agg, json_object_agg`)
Date/time basics (CURRENT_DATE, CURRENT_TIME, CURRENT_TIMESTAMP, NOW(), EXTRACT, DATE_TRUNC, AT TIME ZONE)
Interval arithmetic (timestamp +/- INTERVAL, INTERVAL literals)
Construction & conversion (MAKE_DATE, MAKE_TIME, MAKE_TIMESTAMP, TO_CHAR)
Array helpers (array_cat, unnest)
Type helpers (CAST, ::)
Unsupported constructs include wildcard * in SELECT clauses (e.g., SELECT *, COUNT(*)), user-defined functions, and any DDL or DML commands.

Common Patterns

Quick-reference snippets for frequent tasks.

Filter by Metadata Field

SELECT id, name FROM agent_runs
WHERE metadata_json->>'environment' = 'prod'

Filter by Date Range

SELECT id, name, created_at FROM agent_runs
WHERE created_at >= NOW() - INTERVAL '7 days'
ORDER BY created_at DESC

Count by Category

SELECT
  metadata_json->>'model' AS model,
  COUNT() AS run_count
FROM agent_runs
GROUP BY metadata_json->>'model'
ORDER BY run_count DESC

Check if Metadata Field Exists

SELECT id, name FROM agent_runs
WHERE metadata_json ? 'custom_field'

Numeric Comparison on JSON Field

SELECT id, name FROM agent_runs
WHERE CAST(metadata_json->>'score' AS DOUBLE PRECISION) > 0.8

Get Transcripts with Their Agent Runs

SELECT
  ar.name AS run_name,
  t.name AS transcript_name
FROM agent_runs ar
JOIN transcripts t ON t.agent_run_id = ar.id

Examples

Basic Queries

Recent Runs

SELECT
  id,
  name,
  metadata_json->'model'->>'name' AS model_name,
  created_at
FROM agent_runs
WHERE metadata_json->>'status' = 'completed'
ORDER BY created_at DESC
LIMIT 10;

Aggregations

Completion Rate by Environment

Aggregates per-environment success rates by normalizing metadata into a CTE.
WITH normalized_runs AS (
  SELECT
    metadata_json->>'environment' AS environment,
    metadata_json->>'status' AS status
  FROM agent_runs
  WHERE metadata_json ? 'environment'
)
SELECT
  environment,
  COUNT() AS total_runs,
  SUM(CASE WHEN status = 'completed' THEN 1 ELSE 0 END) AS completed_runs,
  CAST(SUM(CASE WHEN status = 'completed' THEN 1 ELSE 0 END) AS DOUBLE PRECISION)
    / NULLIF(COUNT(), 0) AS completion_rate
FROM normalized_runs
GROUP BY environment
ORDER BY total_runs DESC;

Joins and CTEs

Transcript Counts per Group

SELECT
  tg.id AS group_id,
  tg.name AS group_name,
  COUNT(t.id) AS transcript_count
FROM transcript_groups tg
JOIN transcripts t ON t.transcript_group_id = tg.id
GROUP BY tg.id, tg.name
HAVING COUNT(t.id) > 1
ORDER BY transcript_count DESC;

Transcript Coverage Audit

Finds transcript groups that are marked as must_have but have no associated transcripts.
SELECT
  tg.id AS group_id,
  tg.name AS group_name,
  COUNT(t.id) AS transcript_count
FROM transcript_groups tg
LEFT JOIN transcripts t
  ON t.transcript_group_id = tg.id
  AND t.collection_id = tg.collection_id
WHERE tg.metadata_json->>'priority' = 'must_have'
GROUP BY tg.id, tg.name
HAVING COUNT(t.id) = 0
ORDER BY group_name;

Advanced Patterns

Flagged Judge Results

SELECT
  jr.agent_run_id,
  jr.rubric_id,
  jr.result_metadata->>'label' AS label,
  jr.output->>'score' AS score
FROM judge_results jr
WHERE jr.result_metadata->>'severity' = 'high'
  AND EXISTS (
    SELECT 1
    FROM agent_runs ar
    WHERE ar.id = jr.agent_run_id
      AND ar.metadata_json->>'environment' = 'prod'
  )
ORDER BY score DESC
LIMIT 25;

Latest Rubric Scores by Model

Pulls the most recent rubric result per run, then joins runs to surface the model responsible for the score.
WITH latest_scores AS (
  SELECT
    agent_run_id,
    MAX(rubric_version) AS rubric_version
  FROM judge_results
  WHERE rubric_id = 'helpful_response_v1'
  GROUP BY agent_run_id
)
SELECT
  ar.id,
  ar.metadata_json->'model'->>'name' AS model_name,
  jr.output->>'score' AS score,
  jr.result_metadata->>'label' AS label
FROM latest_scores ls
JOIN judge_results jr
  ON jr.agent_run_id = ls.agent_run_id
  AND jr.rubric_version = ls.rubric_version
  AND jr.rubric_id = 'helpful_response_v1'
JOIN agent_runs ar ON ar.id = jr.agent_run_id
WHERE ar.metadata_json->>'environment' = 'prod'
ORDER BY CAST(jr.output->>'score' AS DOUBLE PRECISION) DESC
LIMIT 15;

Troubleshooting

”column X does not exist”

  • DQL requires explicit column selection. Wildcards (*) are not supported.
  • Check the schema using client.get_dql_schema(collection_id) to see available columns.

Numeric comparisons not working as expected

JSON fields are strings by default. Cast them for numeric operations:
-- Wrong: string comparison
WHERE metadata_json->>'score' > '0.5'

-- Correct: numeric comparison
WHERE CAST(metadata_json->>'score' AS DOUBLE PRECISION) > 0.5

Query returns no results but data exists

  • Check that you’re querying the correct collection
  • Verify metadata field names are exact matches (case-sensitive)
  • Use ? operator to check if a field exists before filtering on it

Results truncated unexpectedly

DQL caps results at 10,000 rows. Use LIMIT and OFFSET for pagination:
-- First page
SELECT id, name FROM agent_runs LIMIT 1000 OFFSET 0
-- Second page
SELECT id, name FROM agent_runs LIMIT 1000 OFFSET 1000

“syntax error” on valid-looking SQL

Some SQL features aren’t supported in DQL:
  • No * wildcard in SELECT
  • No INSERT, UPDATE, DELETE
  • No user-defined functions

Restrictions and Best Practices

  • Read-only: Only SELECT-style queries are permitted. Use bulk exports or ingestion utilities to modify data outside of DQL.
  • Single statement: Batches or multiple statements are rejected to avoid mixed workloads.
  • Explicit projection: Wildcard projections (*) are disallowed. List the columns you need so downstream tooling (schema builders, type generation) stays predictable.
  • Collection scoping: A single query can only access data within a single collection.
  • Limit enforcement: Every query is capped at 10,000 rows by the server. If you omit LIMIT or request more, Docent automatically applies the cap—use pagination (OFFSET/LIMIT) or offline exports for larger result sets.
  • JSON performance: Metadata fields are stored as JSON; heavy traversal across large collections can be slower than filtering on indexed scalar columns. Prefer top-level fields when available.
  • Type awareness: JSON metadata paths are exposed in the schema with a generic json type. Since the underlying values may be strings, numbers, or other JSON types, cast values explicitly when precision matters (e.g., CAST(metadata_json->>'duration_ms' AS BIGINT)).