cifr.yml Contract Reference

Complete reference for every field in the agent contract file.

The cifr.yml file is the contract between your code and CIFR. It lives at the root of your repository and tells CIFR what your agent does, what it expects as input, and what it produces. Without it, your submission is a one-shot experiment. With it, your code becomes a registered, callable, citable agent.

Top-level structure

Every `cifr.yml` file starts with a single top-level `agent:` key:

agent:
  name: ...
  version: ...
  description: ...
  # ... everything else goes here

Extra keys at any level are rejected. Typos fail loudly rather than being silently ignored.

Field reference

Core fields

Field	Required	Type	Description
`name`	yes	string	Kebab-case identifier, 3-80 characters. Lowercase letters, digits, and hyphens. Must start and end with a letter or digit.
`version`	yes	string	Semver `MAJOR.MINOR.PATCH` (e.g. `1.0.0`). Immutable once registered -- bump to publish an update.
`description`	yes	string	One or two sentences explaining what the agent does. Up to 2000 characters.
`invoke`	conditional	string	The shell command CIFR runs inside the container. Required for single-function agents. Mutually exclusive with `functions:`.
`inputs`	no	list	Declared input fields. Each has `name`, `format`, and optional `description`.
`outputs`	conditional	list	Declared output fields. At least one required for single-function agents. Same shape as inputs.
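The constraints on `name`, `version`, and `description` can be checked locally before submitting. The following is a minimal sketch, not CIFR's own validator: the regular expressions are assumptions derived from the prose rules in the table above, and `check_core_fields` is a hypothetical helper that takes the already-parsed contents of the `agent:` block as a dictionary.

```python
import re

# Assumed patterns derived from the table above; the exact expressions
# that CIFR applies are not given in this reference.
NAME_RE = re.compile(r"[a-z0-9](?:[a-z0-9-]*[a-z0-9])?")
VERSION_RE = re.compile(r"(0|[1-9][0-9]*)\.(0|[1-9][0-9]*)\.(0|[1-9][0-9]*)")


def check_core_fields(agent: dict) -> list[str]:
    """Return a list of problems; an empty list means the fields look valid."""
    problems = []

    name = agent.get("name")
    if (
        not isinstance(name, str)
        or not 3 <= len(name) <= 80
        or not NAME_RE.fullmatch(name)
    ):
        problems.append("name: expected 3-80 lowercase letters, digits, or hyphens")

    version = agent.get("version")
    if not isinstance(version, str) or not VERSION_RE.fullmatch(version):
        problems.append("version: expected MAJOR.MINOR.PATCH")

    description = agent.get("description")
    if not isinstance(description, str) or not 1 <= len(description) <= 2000:
        problems.append("description: expected 1-2000 characters")

    return problems
```

Passing the `agent:` mapping from Example 1 below would return an empty list, while a name such as `-bad` or a version such as `1.0` would each add one entry.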

Identity and provenance

Field	Required	Type	Description
`rai`	conditional	string	Research Agent Identifier. Format: `RAI-YYYY-author-slug`. Required when `paper:` is present. See RAI docs.
`provenance_type`	no	string	How the agent's code relates to the paper. Defaults to `author_original`. See Provenance Types for all six types.

Paper metadata

The `paper:` block attaches publication metadata to your agent. When it is present, `rai:` becomes required.

Field	Required	Type	Description
`paper.title`	yes	string	Full title of the publication.
`paper.doi`	no	string	DOI starting with `10.` (e.g. `10.1109/TSG.2016.2561303`).
`paper.year`	no	integer	Publication year (1900-2100).
`paper.venue`	no	string	Journal, conference, or preprint server name.
`paper.abstract`	no	string	Paper abstract, up to 8000 characters.
`paper.keywords`	no	list of strings	Up to 32 keywords, each 1-100 characters.
`paper.authors`	no	list	Author records (see below).
`paper.preprint_url`	no	string	Link to a preprint (arXiv, SSRN, etc.).
`paper.related_rais`	no	list of strings	RAIs of related agents.
`paper.bibtex_key`	no	string	Preferred BibTeX citation key.

Each author in `paper.authors` has:

Field	Required	Description
`name`	yes	Full name.
`orcid`	no	ORCID in `0000-0000-0000-000X` format.
`affiliation`	no	Institution name.
`email`	no	Contact email.

Input and output fields

Each entry in `inputs:` or `outputs:` has:

Field	Required	Description
`name`	yes	Snake_case identifier. Lowercase letters, digits, underscores. Must start with a letter.
`format`	yes	MIME type (e.g. `application/json`, `text/csv`, `image/png`).
`description`	no	Human-readable explanation of the field.
`from_agent`	no	Composition binding (inputs only). See Composition.

Composition

Field	Required	Type	Description
`depends_on`	no	list of strings	RAIs of upstream agents this agent calls at runtime.

An input field's `from_agent:` binding tells CIFR to fill that input by calling an upstream agent instead of expecting it from the user:

from_agent:
  rai: RAI-2016-chanda-resiliency-pds
  output: result          # which upstream output to read
  version: 1.0.0          # optional: pin to a specific version
  inputs_from:            # map upstream inputs to your user-supplied fields
    topology: topology_a

Every RAI referenced in a `from_agent:` binding must also appear in `depends_on:`.
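This rule is easy to check before submitting. The sketch below assumes that `depends_on` sits directly under `agent:` next to `inputs`, and that the contract has already been parsed into a dictionary; `undeclared_upstreams` is a hypothetical helper, not part of CIFR. It inspects only top-level inputs.

```python
def undeclared_upstreams(agent: dict) -> set[str]:
    """Return RAIs used in from_agent bindings but missing from depends_on."""
    declared = set(agent.get("depends_on") or [])
    referenced = set()
    for field in agent.get("inputs") or []:
        binding = field.get("from_agent")
        # Only inputs that carry a from_agent binding reference an upstream RAI.
        if isinstance(binding, dict) and "rai" in binding:
            referenced.add(binding["rai"])
    return referenced - declared
```

An empty result means every binding is covered; any RAI in the returned set needs to be added to `depends_on`.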

Multi-function agents

Use `functions:` instead of top-level `invoke`/`inputs`/`outputs` when one agent exposes multiple operations:

Field	Required	Description
`functions[].name`	yes	Kebab-case function name, unique within the agent.
`functions[].description`	yes	What this function does.
`functions[].invoke`	yes	Shell command for this function.
`functions[].inputs`	no	Input fields for this function.
`functions[].outputs`	yes	Output fields (at least one).

You cannot mix top-level `invoke`/`outputs` with a `functions:` block. Choose one style or the other.
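The two styles and their required fields can be expressed as a small check. This is a sketch under stated assumptions: `check_style` is a hypothetical helper that works on the parsed `agent:` mapping, and it tests only the `invoke`/`outputs` conflict named above, because this reference does not say whether top-level `inputs` may coexist with `functions:`.

```python
def check_style(agent: dict) -> list[str]:
    """Return problems with the single-function vs multi-function layout."""
    problems = []
    if "functions" in agent:
        # Multi-function style: top-level invoke/outputs must be absent.
        for key in ("invoke", "outputs"):
            if key in agent:
                problems.append(f"top-level {key} cannot be combined with functions")
        names = [function.get("name") for function in agent["functions"]]
        if len(names) != len(set(names)):
            problems.append("function names must be unique within the agent")
        for function in agent["functions"]:
            if not function.get("invoke"):
                problems.append(f"function {function.get('name')}: invoke is required")
            if not function.get("outputs"):
                problems.append(f"function {function.get('name')}: needs at least one output")
    else:
        # Single-function style: invoke and at least one output are required.
        if not agent.get("invoke"):
            problems.append("invoke is required for single-function agents")
        if not agent.get("outputs"):
            problems.append("at least one output is required for single-function agents")
    return problems
```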

Benchmarks

Declare performance claims that CIFR will verify automatically:

Field	Required	Description
`benchmarks[].dataset`	yes	Identifier of the benchmark dataset (e.g. `redd-house-1`, `imagenet-val-2012`).
`benchmarks[].metric`	yes	Evaluation metric name, lowercase with underscores (e.g. `f1_score`, `accuracy`, `rmse`).
`benchmarks[].value`	yes	The claimed metric value (e.g. `0.973`).
`benchmarks[].description`	no	Human-readable description of the benchmark.

Complete examples

Example 1: Simple single-function agent

A wavelet-based event detector for power system waveforms. No paper, no RAI -- just a utility agent.

agent:
  name: wavelet-event-detector
  version: 1.0.0
  description: Detect transient events in power system waveforms using discrete wavelet transform decomposition.
  provenance_type: original_unpublished
  invoke: python detect.py
  inputs:
    - name: waveform
      format: application/json
      description: Time-series array of voltage or current samples at a fixed sampling rate.
    - name: config
      format: application/json
      description: Detection parameters (wavelet family, threshold, minimum event duration).
  outputs:
    - name: events
      format: application/json
      description: Array of detected events with start time, end time, magnitude, and classification.
  benchmarks:
    - dataset: redd-house-1
      metric: f1_score
      value: 0.973
      description: Event detection F1 on REDD House 1 dataset.

Example 2: Paper-backed agent with full metadata

The resiliency index from Chanda 2016, with a complete publication record and an RAI.

agent:
  name: resiliency-pds
  rai: RAI-2016-chanda-resiliency-pds
  version: 1.0.0
  description: Topological resiliency index for power distribution systems with multiple microgrids using analytic hierarchy process weights.
  provenance_type: author_original
  paper:
    title: Defining and Enabling Resiliency of Electric Distribution Systems with Multiple Microgrids
    doi: 10.1109/TSG.2016.2561303
    year: 2016
    venue: IEEE Transactions on Smart Grid
    abstract: This paper proposes a comprehensive resiliency metric for distribution systems...
    authors:
      - name: Sayonsom Chanda
        orcid: 0000-0003-4178-9482
      - name: Anurag K. Srivastava
    keywords:
      - resilience
      - microgrid
      - distribution network
      - analytic hierarchy process
  invoke: python -m resiliency
  inputs:
    - name: topology
      format: application/json
      description: Network topology as a node-edge adjacency structure with component attributes.
  outputs:
    - name: result
      format: application/json
      description: Resiliency index score and per-component breakdown.
  benchmarks:
    - dataset: ieee-33bus
      metric: r_index
      value: 0.847
      description: Resiliency index on the IEEE 33-bus test system.

Example 3: Data wrapper agent

Experimental measurement data exposed as a queryable agent. Researchers can invoke it to get specific subsets of measurements instead of downloading the entire dataset.

agent:
  name: redd-house-measurements
  rai: RAI-2011-kolter-redd-house1
  version: 1.0.0
  description: Query interface for REDD House 1 energy disaggregation measurements. Returns time-windowed appliance-level and aggregate power readings.
  provenance_type: data_wrapper
  paper:
    title: "REDD: A Public Data Set for Energy Disaggregation Research"
    doi: 10.1007/978-3-642-25999-5_12
    year: 2011
    venue: Workshop on Data Mining Applications in Sustainability
    authors:
      - name: J. Zico Kolter
      - name: Matthew J. Johnson
    keywords:
      - energy disaggregation
      - NILM
      - smart meter
  invoke: python query.py
  inputs:
    - name: query
      format: application/json
      description: "Query parameters: start_time, end_time, appliances (list), resolution."
  outputs:
    - name: measurements
      format: application/json
      description: Time-series power readings for the requested appliances and time window.

How invocations work

When someone calls your agent via `POST /api/agents/{id}/invoke_json`, CIFR:

1. Validates the request body against your declared inputs.
2. Writes each input to `/inputs/{field_name}{ext}` inside a fresh container.
3. Starts the container from your pinned image digest with `--network none`, the configured memory limit, and the configured timeout.
4. Runs your `invoke` command.
5. Captures everything written to `/outputs/`.
6. Computes a provenance hash over `(image_digest, inputs_sha256, outputs_sha256)` and returns it.
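To illustrate the final step, here is one way such a hash could be built. This reference does not specify how CIFR serializes or combines the three values, so everything in this sketch is an assumption: the sorted, length-prefixed file hashing in `sha256_of_files`, the newline-joined combination in `provenance_hash`, and both function names. The result will not necessarily match the hash CIFR returns.

```python
import hashlib


def sha256_of_files(files: dict[str, bytes]) -> str:
    """Hash a set of named files in a stable, sorted-by-name order."""
    digest = hashlib.sha256()
    for name in sorted(files):
        # Length-prefix each part so that different name/content splits
        # cannot produce the same byte stream.
        for part in (name.encode("utf-8"), files[name]):
            digest.update(len(part).to_bytes(8, "big"))
            digest.update(part)
    return digest.hexdigest()


def provenance_hash(image_digest: str, inputs_sha256: str, outputs_sha256: str) -> str:
    """Combine the three values into one hex digest (assumed construction)."""
    joined = "\n".join((image_digest, inputs_sha256, outputs_sha256))
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()
```

The property that matters is determinism: the same image, inputs, and outputs always produce the same hash, and changing any one of them changes it.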

Your code reads from `/inputs/` and writes to `/outputs/`. That is the entire contract. The provenance hash makes every invocation independently verifiable.
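From the agent's side, the contract might look like the sketch below, modeled on the `waveform`, `config`, and `events` fields of Example 1. It rests on several assumptions: that `application/json` fields arrive with a `.json` extension, that the output file should be named after the declared output field, and that a simple threshold test can stand in for real detection logic. The directories are parameters only so the function can be exercised outside a container.

```python
import json
from pathlib import Path


def detect_events(waveform: list[float], threshold: float) -> list[dict]:
    """Placeholder logic: flag samples whose magnitude exceeds the threshold."""
    return [
        {"index": index, "magnitude": abs(sample)}
        for index, sample in enumerate(waveform)
        if abs(sample) > threshold
    ]


def run(inputs_dir: Path = Path("/inputs"), outputs_dir: Path = Path("/outputs")) -> None:
    """Read the declared inputs, compute, and write the declared output."""
    # Assumption: a field declared as application/json arrives as <name>.json.
    waveform = json.loads((inputs_dir / "waveform.json").read_text())
    config = json.loads((inputs_dir / "config.json").read_text())

    events = detect_events(waveform, config.get("threshold", 1.0))

    # Assumption: the output file is named after the declared output field.
    outputs_dir.mkdir(parents=True, exist_ok=True)
    (outputs_dir / "events.json").write_text(json.dumps(events))
```

A script such as `detect.py` would call `run()` with the defaults as its last statement. Because the container starts with `--network none`, everything the code needs has to be in the image or under `/inputs/`.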

Versioning rules

Versions are immutable. Submitting the same (name, version) twice produces a clear error. Bump the version number to publish an update:

Patch (1.0.0 to 1.0.1) -- bug fixes that do not change the input/output schema.
Minor (1.0.0 to 1.1.0) -- new optional inputs, new outputs, additive features.
Major (1.0.0 to 2.0.0) -- breaking changes to inputs, outputs, or semantics.
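The three bump levels can be told apart mechanically by comparing version components. The helper names `parse_version` and `bump_kind` below are illustrative, not part of CIFR, and the sketch only classifies the numeric change; whether a change is actually breaking remains a judgment about the schema and semantics.

```python
def parse_version(version: str) -> tuple[int, int, int]:
    """Split MAJOR.MINOR.PATCH into a tuple of integers."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def bump_kind(old: str, new: str) -> str:
    """Classify the step from old to new as 'major', 'minor', or 'patch'."""
    old_v, new_v = parse_version(old), parse_version(new)
    if new_v <= old_v:
        # Versions are immutable, so an update must move strictly forward.
        raise ValueError(f"{new} is not newer than {old}")
    if new_v[0] != old_v[0]:
        return "major"
    if new_v[1] != old_v[1]:
        return "minor"
    return "patch"
```

For example, `bump_kind("1.0.0", "1.0.1")` returns `"patch"` and `bump_kind("1.0.0", "2.0.0")` returns `"major"`.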

External callers typically pin to a specific minor version in their `depends_on` so patches flow through automatically, while major changes require an explicit upgrade.