Provenance Types

The six provenance types describe how an agent's code relates to its paper -- from the author's own code to AI-generated reimplementations.

Every CIFR agent declares a provenance_type in its cifr.yml that describes how the code came into existence. This is not a quality judgment -- it is a factual statement about the relationship between the code and the paper it implements. Provenance type affects which trust tier thresholds apply and helps consumers of an agent understand what they are invoking.

The six types

author_original

The paper's authors wrote this code. It is the canonical implementation of the methodology described in the paper.

agent:
  name: resiliency-pds
  provenance_type: author_original
  paper:
    title: "Defining and Enabling Resiliency of Electric Distribution Systems..."
    doi: 10.1109/TSG.2016.2561303
    authors:
      - name: Sayonsom Chanda

When to use: You wrote the paper and you wrote this code. This is the most common type. If you are the first author (or a coauthor) and this is the code that produced the results in your paper, use author_original.

Trust implications: Standard trust thresholds. Benchmarks are encouraged but not mandatory for Silver.

community_reimplementation

Someone other than the original authors implemented the paper's methodology. The code follows the paper's described method but was written independently.

agent:
  name: chanda-resiliency-reimpl
  provenance_type: community_reimplementation
  paper:
    title: "Defining and Enabling Resiliency of Electric Distribution Systems..."
    doi: 10.1109/TSG.2016.2561303
    authors:
      - name: Sayonsom Chanda
      - name: Anurag K. Srivastava

When to use: You read the paper and implemented the method yourself. You are not one of the paper's authors. This is common in machine learning (reproductions of published models) and in engineering (implementing a published algorithm for a different platform or language).

Trust implications: Standard trust thresholds. Benchmarks verify that the reimplementation matches the original results.

ai_reimplementation

The code was generated by an AI system (a large language model, a code generation tool, etc.) from the paper's methodology. A human may have guided the process, but the implementation logic came from AI.

agent:
  name: chanda-resiliency-ai
  provenance_type: ai_reimplementation
  paper:
    title: "Defining and Enabling Resiliency of Electric Distribution Systems..."
    doi: 10.1109/TSG.2016.2561303

When to use: You fed the paper (or its methodology section) to an AI and the AI generated the code. Even if you made manual corrections afterward, the core implementation originated from AI.

Trust implications: Stricter requirements. AI reimplementations cannot reach Silver without at least one declared and verified benchmark. The reasoning: without benchmark verification, there is no evidence that the AI correctly interpreted the paper's methodology. An AI can produce code that runs without error but computes the wrong thing. Benchmarks are the only automated check that the output is actually correct.
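A declared benchmark might look like the following sketch. Note that the benchmarks block and its field names (name, input, expected) are hypothetical illustrations of the idea, not the documented cifr.yml schema:

```yaml
agent:
  name: chanda-resiliency-ai
  provenance_type: ai_reimplementation
  paper:
    title: "Defining and Enabling Resiliency of Electric Distribution Systems..."
    doi: 10.1109/TSG.2016.2561303
  # Hypothetical benchmark declaration -- field names are illustrative only.
  # The intent: pin a known input to a known expected output so verification
  # can confirm the AI-generated code reproduces the paper's results.
  benchmarks:
    - name: ieee-33-bus-baseline
      input: tests/ieee33_input.json
      expected: tests/ieee33_expected.json
```

The point of the declaration is that verification runs the agent on the pinned input and compares against the expected output, which is exactly the evidence an AI reimplementation lacks by default.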

data_wrapper

The agent does not implement a computational method. It wraps a dataset -- experimental measurements, survey responses, simulation outputs -- with a query interface so other agents can consume the data programmatically.

agent:
  name: redd-house-measurements
  provenance_type: data_wrapper
  paper:
    title: "REDD: A Public Data Set for Energy Disaggregation Research"
    doi: 10.1007/978-3-642-25999-5_12

When to use: Your agent's primary value is providing access to data, not performing computation. The invoke command queries or filters the dataset based on input parameters and returns a subset.

Trust implications: Standard trust thresholds. Benchmarks (if declared) verify that the data access layer returns correct results. Data wrappers are often composed into computational agents via depends_on.
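As a hedged sketch of such a composition: a computational agent could list this data wrapper in its depends_on. The depends_on field is mentioned above, but its exact placement and any version-pinning syntax shown here are assumptions:

```yaml
agent:
  name: disaggregation-eval
  provenance_type: community_reimplementation
  invoke: python evaluate.py
  # Assumed syntax: a simple list of agent names. The real schema may
  # support version pins or richer dependency objects.
  depends_on:
    - redd-house-measurements
```

This keeps the data access logic in one agent and lets any number of computational agents consume it without re-wrapping the dataset.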

reference_implementation

A canonical implementation of a well-known algorithm that is not tied to any specific paper. The algorithm is textbook material -- FFT, Kalman filter, Newton-Raphson, Dijkstra's algorithm, etc.

agent:
  name: newton-raphson-power-flow
  provenance_type: reference_implementation
  description: Newton-Raphson power flow solver for radial distribution networks.
  invoke: python power_flow.py

When to use: You are implementing a standard algorithm that has no single paper to cite. The paper: block is not required (and therefore neither is an RAI). This type is for utility agents that serve as building blocks.

Trust implications: Standard trust thresholds. Benchmarks verify correctness against known test cases.

original_unpublished

Novel methodology that has not been published yet. The researcher is using CIFR as a pre-publication testbed -- running experiments, collecting benchmark results, and building trust before submitting to a journal.

agent:
  name: novel-disaggregation
  provenance_type: original_unpublished
  description: A new approach to energy disaggregation using transformer attention on low-frequency smart meter data.
  invoke: python disaggregate.py

When to use: You have developed new methodology that you plan to publish, but the paper does not exist yet. Once the paper is published, update the agent with a paper: block and an RAI, and change the provenance type to author_original.

Trust implications: Standard trust thresholds. Benchmarks are strongly encouraged -- they build the evidence base you will need for the paper.

Decision tree: which type should I use?

Start here:

  1. Did you write the paper?

    • Yes: Is this the code that produced the paper's results?
      • Yes --> author_original
      • No, I rewrote it for CIFR --> still author_original (you are the author)
    • No: proceed to question 2.
  2. Is this based on someone else's paper?

    • Yes: Did you write the code yourself?
      • Yes --> community_reimplementation
      • No, AI generated it --> ai_reimplementation
    • No: proceed to question 3.
  3. Is this a dataset, not a computation?

    • Yes --> data_wrapper
    • No: proceed to question 4.
  4. Is this a standard algorithm (FFT, Kalman, etc.) with no single paper?

    • Yes --> reference_implementation
    • No: proceed to question 5.
  5. Is this new methodology you have not published yet?

    • Yes --> original_unpublished

Summary table

Type                        Paper required?   RAI required?      Benchmarks for Silver   Typical use case
author_original             Usually yes       If paper is set    Encouraged              Your own paper's code
community_reimplementation  Yes               If paper is set    Encouraged              Reproducing someone's published method
ai_reimplementation         Yes               If paper is set    Mandatory               LLM-generated implementation of a paper
data_wrapper                Usually yes       If paper is set    If declared             Queryable dataset from a publication
reference_implementation    No                No                 Encouraged              Standard algorithms (FFT, Newton-Raphson)
original_unpublished        No                No                 Strongly encouraged     Pre-publication research

Changing provenance type

Provenance type is part of the immutable contract for a given version. To change it, bump the version number and re-submit. This is intentional: the provenance type is a factual claim about a specific version of the code, and it should not change retroactively.

The most common transition is original_unpublished to author_original -- which happens when you publish your paper and update the agent with publication metadata and an RAI.
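That transition might look like the following sketch of a before-and-after cifr.yml. The version placement and format are assumptions, and the title and DOI are placeholders for your real publication metadata:

```yaml
# Before publication -- no paper: block yet.
agent:
  name: novel-disaggregation
  provenance_type: original_unpublished
  version: 1.0.0   # assumed field; version format is an assumption
  invoke: python disaggregate.py
---
# After publication -- version bumped, provenance updated, paper metadata added.
agent:
  name: novel-disaggregation
  provenance_type: author_original
  version: 2.0.0
  invoke: python disaggregate.py
  paper:
    title: "..."   # your published title
    doi: "..."     # your published DOI
```

Because provenance type is part of the immutable contract, the old version remains on record as original_unpublished while the new version carries the publication claim.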