1. Overview

FORGE (Fixpoint Output Repair for Generative Engines) includes an optional telemetry system that reports anonymized structural metadata about fixpoint repair failures. This metadata helps FORGE maintainers understand failure patterns across the user base and make targeted improvements to the framework's convergence algorithms, lane ordering heuristics, and default configuration values.

Telemetry upload is enabled by default but can be disabled at any time with a single environment variable or configuration key. See Section 8 for complete opt-out instructions.

Core principle

FORGE telemetry is built on a single non-negotiable principle: no content is ever transmitted. Every field in the telemetry record is derived from the structure and behavior of the repair system — never from the actual content being processed. This guarantee is enforced at the code level: the telemetry subsystem has no access to content data.

2. What is Collected — Field by Field

When a FORGE fixpoint repair loop fails and telemetry upload is enabled, a FailureRecord is constructed and transmitted to https://telemetry.fixpointforge.report/v1/ingest via HTTPS POST. The record contains exactly the following fields. This schema is enforced at the code level; additional fields cannot be added without a code change visible in the commit history.

FailureRecord — complete schema
from uuid import UUID

class FailureRecord:
    record_id:              UUID       # random UUID, no persistent identity
    timestamp:              int        # epoch seconds, rounded down to the hour
    forge_version:          str        # semver string, e.g. "0.4.1"
    deployment_id:          str        # SHA-256 hex of hostname + PID
    content_type:           str        # enum: diff|json|text|code|structured
    content_length_bucket:  str        # enum: tiny|small|medium|large
    structural_shape:       dict       # abstract schema, no actual values
    failure_class:          str        # enum: oscillation|budget_exceeded|...
    lanes_executed:         list[str]  # lane names that were invoked
    lanes_failed:           list[str]  # lane names that returned errors
    lane_state_transitions: list[str]  # e.g. ["pending→running", "running→failed"]
    iteration_count:        int        # total iterations before termination
    oscillation_detected:   bool       # cyclic output pattern detected?
    fingerprint:            str        # SimHash hex of structural features

Field Descriptions

Field | Type | Description
record_id | UUID | A randomly generated UUID created at report time. Not linked to any persistent identity, session, or user. A new UUID is generated for every record.
timestamp | epoch int | Wall-clock time of the failure, rounded down to the hour boundary. The original minutes and seconds are discarded to prevent cross-record timing correlation attacks.
forge_version | string | Semantic version of the FORGE library (e.g., "0.4.1"). Used to correlate failure patterns with specific releases and detect version-specific regressions.
deployment_id | hex string | An opaque, non-reversible identifier computed as SHA-256(hostname + ":" + str(os.getpid())). Groups records from the same deployment instance without revealing the hostname. Because the PID is included, restarts usually produce different IDs. See Section 4.
content_type | enum | The category of content being processed. Possible values: "diff", "json", "text", "code", "structured". No content itself is included, only the type label.
content_length_bucket | enum | A coarse size category replacing the exact byte count. "tiny" (<512 B), "small" (512 B–8 KB), "medium" (8 KB–64 KB), "large" (>64 KB). Exact lengths are never recorded.
structural_shape | dict | An abstract structural description of the content: nesting depth, number of top-level keys, array lengths, line-count ranges. This is a schema-level description with no actual values. For example: {"type": "json", "depth": 3, "keys": 12}.
failure_class | enum | The category of failure that terminated the loop. Values: "oscillation", "budget_exceeded", "max_iterations", "lane_error", "parse_error".
lanes_executed | string[] | Names of repair lanes invoked during the failed fixpoint attempt. These are lane identifiers from the FORGE configuration (e.g., "structural_validator", "schema_enforcer").
lanes_failed | string[] | Subset of lanes_executed: the lane names that returned a failure or error state during the repair loop.
lane_state_transitions | string[] | Sequence of state transitions across all lanes (e.g., ["pending→running", "running→failed", "failed→retry"]). No lane output or content is included, only the state machine trace.
iteration_count | integer | Total number of fixpoint iterations attempted before the loop was terminated. Used to calibrate default iteration budgets.
oscillation_detected | boolean | Whether the loop detected a cyclic output pattern, i.e., the repair settled into a repeating cycle instead of reaching a stable fixpoint state.
fingerprint | hex string | A SimHash structural fingerprint of the content's abstract features. A locality-sensitive hash that captures structural similarity without encoding any content. Used for clustering similar failures. See Section 4.
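
To make the structural_shape description above more concrete, the following is a minimal sketch of how a shape could be derived from a JSON document. The function names (json_depth, structural_shape) and the choice to count only top-level keys are assumptions for illustration; the source does not show FORGE's actual implementation.

```python
import json
from typing import Any


def json_depth(node: Any) -> int:
    """Nesting depth: scalars are 0, each dict or list level adds 1."""
    if isinstance(node, dict):
        return 1 + max((json_depth(v) for v in node.values()), default=0)
    if isinstance(node, list):
        return 1 + max((json_depth(v) for v in node), default=0)
    return 0


def structural_shape(raw: str) -> dict:
    """Describe a JSON document by shape only; no key names or values are kept."""
    doc = json.loads(raw)
    # Assumption: "keys" means the number of top-level keys of an object.
    top_level_keys = len(doc) if isinstance(doc, dict) else 0
    return {"type": "json", "depth": json_depth(doc), "keys": top_level_keys}


shape = structural_shape('{"user": {"name": "Ada", "tags": ["x", "y"]}, "id": 7}')
print(shape)  # {'type': 'json', 'depth': 3, 'keys': 2}
```

Only counts and depths leave the function; the key names and values of the input never appear in the result.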

3. What is NOT Collected

The following categories of data are never collected, transmitted, or stored by the FORGE telemetry system under any circumstances. This is enforced by code architecture, not policy — the telemetry subsystem does not have access to content data.

  • Raw content of any kind — no diffs, patches, JSON documents, text bodies, or code snippets. The telemetry module receives only pre-computed structural metadata.
  • LLM prompts — no system prompts, user messages, conversation history, or prompt templates
  • LLM model outputs — no completions, responses, generated text, or intermediate repair outputs
  • File paths or filenames — no paths from the local filesystem, repository, or working directory
  • File contents — no source code, configuration files, data files, or any file bytes
  • IP addresses — the Cloudflare Worker receiving telemetry does not log or store source IP addresses in any form
  • User account information — no usernames, email addresses, GitHub handles, or SSO identities
  • API keys or credentials — no LLM API keys, tokens, secrets, passwords, or auth headers of any kind
  • Personal information — no names, contact details, biometric data, or any data attributable to a natural person
  • Environment variables — no env var names or values (other than the telemetry opt-out flag status)
  • Repository names or URLs — no git remote URLs, repository identifiers, branch names, or commit hashes
  • Organization identifiers — no company names, team names, project names, or workspace identifiers
  • Exact content lengths — byte counts are replaced with coarse buckets before the record is constructed
  • Exact timestamps — times are rounded down to the hour before transmission
  • Hostname or process details — hashed into an opaque deployment_id via SHA-256 and not recoverable

4. How Data is Anonymized

Each field in the FailureRecord is designed to be structurally useful for failure analysis while being privacy-preserving by construction. The key anonymization techniques are detailed below.

SimHash Structural Fingerprinting

The fingerprint field is computed using SimHash — a locality-sensitive hashing algorithm. SimHash produces a fixed-size hash where similar inputs yield similar hashes, enabling efficient clustering of structurally similar failures.

SimHash is applied to abstract structural features of the content — token types, structural delimiters, nesting patterns, key counts — not to the content itself. The hash is a one-way function: given only the fingerprint, it is computationally infeasible to reconstruct any portion of the original content or even its structural features.
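
As an illustration of the technique, here is a minimal 64-bit SimHash sketch over a list of structural feature tokens. The feature strings, the use of BLAKE2b as the per-feature hash, and the 64-bit width are assumptions chosen for the example; the source does not specify which variant FORGE uses.

```python
import hashlib


def simhash64(features: list[str]) -> str:
    """64-bit SimHash over feature tokens, returned as 16 hex characters."""
    weights = [0] * 64
    for feature in features:
        digest = hashlib.blake2b(feature.encode("utf-8"), digest_size=8).digest()
        h = int.from_bytes(digest, "big")
        # Each feature votes +1 or -1 on every bit position.
        for bit in range(64):
            weights[bit] += 1 if (h >> bit) & 1 else -1
    value = 0
    for bit, weight in enumerate(weights):
        if weight > 0:
            value |= 1 << bit
    return f"{value:016x}"


def hamming(a_hex: str, b_hex: str) -> int:
    """Number of differing bits between two hex fingerprints."""
    return bin(int(a_hex, 16) ^ int(b_hex, 16)).count("1")


# Hypothetical structural features; no content values are involved.
fp_a = simhash64(["type:json", "depth:3", "keys:12", "delim:{", "delim:["])
fp_b = simhash64(["type:json", "depth:3", "keys:13", "delim:{", "delim:["])
print(fp_a, fp_b, hamming(fp_a, fp_b))
```

Inputs that share most features tend to produce fingerprints with a small Hamming distance, which is the property that clustering relies on.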

Content Length Bucketing

Exact byte counts are never transmitted. Instead, the content length is mapped to one of four coarse categories before the record is constructed:

Bucket | Range | Purpose
tiny | Less than 512 bytes | Distinguish trivial inputs from real workloads
small | 512 bytes – 8 KB | Typical small document repairs
medium | 8 KB – 64 KB | Standard document repairs
large | Greater than 64 KB | Large document or multi-file repairs

This bucketing prevents content-length-based fingerprinting while preserving enough information to analyze whether failure patterns correlate with content size.
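
The bucket mapping can be sketched as a small function. Two details are assumptions here, because the ranges as stated leave them open: 1 KB is taken as 1024 bytes, and inputs of exactly 8 KB or exactly 64 KB are placed in "medium".

```python
def content_length_bucket(num_bytes: int) -> str:
    """Map an exact byte count to one of the four coarse buckets."""
    if num_bytes < 0:
        raise ValueError("num_bytes must be non-negative")
    if num_bytes < 512:
        return "tiny"
    if num_bytes < 8 * 1024:       # assumption: exactly 8 KB counts as medium
        return "small"
    if num_bytes <= 64 * 1024:     # "large" is strictly greater than 64 KB
        return "medium"
    return "large"
```

Only the returned label would be placed in the record; the exact byte count is discarded by the caller.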

Opaque Deployment Identifier

The deployment_id is computed as:

SHA-256(hostname + ":" + str(os.getpid()))

This produces a deterministic but opaque identifier for a given deployment instance. It groups multiple failure records from the same process without revealing the hostname or process details. SHA-256 is a cryptographic one-way function — the original hostname and PID cannot be recovered from the hash.

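The PID is included so that a deployment restarted on the same host usually produces a different deployment_id, further reducing the value of any individual hash for re-identification purposes. This is not guaranteed: PIDs can be reused, and a containerized process often receives the same PID after every restart.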
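
A minimal sketch of the computation described in this section follows. The function name and its optional parameters are hypothetical (they exist only to make the example testable), and UTF-8 encoding of the input string is an assumption.

```python
import hashlib
import os
import socket
from typing import Optional


def deployment_id(hostname: Optional[str] = None, pid: Optional[int] = None) -> str:
    """SHA-256 hex digest of hostname + ":" + PID, as described in the text."""
    host = socket.gethostname() if hostname is None else hostname
    process_id = os.getpid() if pid is None else pid
    return hashlib.sha256(f"{host}:{process_id}".encode("utf-8")).hexdigest()


# Same host, different PID: the identifiers differ.
print(deployment_id("build-01", 4242))
print(deployment_id("build-01", 4243))
```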

Timestamp Rounding

Timestamps are rounded down to the hour boundary. For example, 1774103829 (representing 2026-03-21T14:37:09Z) becomes 1774101600 (representing 2026-03-21T14:00:00Z). This prevents cross-record timing correlation while preserving approximate temporal context for trend analysis across days and weeks.
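
Rounding down to the hour amounts to discarding the remainder modulo 3600 seconds. The sketch below uses a hypothetical helper name and builds the example time from a calendar date so the result can be checked by eye.

```python
from datetime import datetime, timezone


def round_down_to_hour(epoch_seconds: int) -> int:
    """Drop minutes and seconds from a Unix timestamp."""
    return epoch_seconds - (epoch_seconds % 3600)


ts = int(datetime(2026, 3, 21, 14, 37, 9, tzinfo=timezone.utc).timestamp())
rounded = round_down_to_hour(ts)
print(datetime.fromtimestamp(rounded, tz=timezone.utc).isoformat())
# 2026-03-21T14:00:00+00:00
```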

5. Storage & Encryption

Individual FailureRecord entries are stored in a private Cloudflare R2 bucket. R2 provides encryption at rest for all stored objects using AES-256. The bucket is not publicly accessible — there is no public URL or listing endpoint.

Records are transmitted from FORGE deployments to the Cloudflare Worker endpoint at https://telemetry.fixpointforge.report/v1/ingest over HTTPS (TLS 1.2+), providing encryption in transit. The Worker validates the record schema before writing to R2 — malformed records are rejected and not stored.
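
The kind of schema check described above can be sketched as an allow-list validation. This is an illustration only, written in Python to match the schema listing in Section 2; the Worker's real implementation and its error messages are not shown in the source, and validate_record is a hypothetical name.

```python
ALLOWED_FIELDS = {
    "record_id", "timestamp", "forge_version", "deployment_id",
    "content_type", "content_length_bucket", "structural_shape",
    "failure_class", "lanes_executed", "lanes_failed",
    "lane_state_transitions", "iteration_count",
    "oscillation_detected", "fingerprint",
}
ENUMS = {
    "content_type": {"diff", "json", "text", "code", "structured"},
    "content_length_bucket": {"tiny", "small", "medium", "large"},
    "failure_class": {"oscillation", "budget_exceeded", "max_iterations",
                      "lane_error", "parse_error"},
}


def validate_record(record: dict) -> list[str]:
    """Return a list of problems; an empty list means the record is accepted."""
    errors = []
    unexpected = sorted(set(record) - ALLOWED_FIELDS)
    missing = sorted(ALLOWED_FIELDS - set(record))
    if unexpected:
        errors.append(f"unexpected fields: {unexpected}")
    if missing:
        errors.append(f"missing fields: {missing}")
    for field, allowed in ENUMS.items():
        value = record.get(field)
        if field in record and (not isinstance(value, str) or value not in allowed):
            errors.append(f"{field} has an unknown value")
    timestamp = record.get("timestamp")
    if not isinstance(timestamp, int) or timestamp % 3600 != 0:
        errors.append("timestamp must be epoch seconds on an hour boundary")
    executed, failed = record.get("lanes_executed"), record.get("lanes_failed")
    if isinstance(executed, list) and isinstance(failed, list):
        if any(lane not in executed for lane in failed):
            errors.append("lanes_failed must be a subset of lanes_executed")
    return errors
```

Rejecting any field outside the allow-list means a record carrying an extra key, such as a prompt or file path, would fail validation instead of being stored.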

6. Who Has Access

Access to raw telemetry records in R2 storage is restricted to FORGE project maintainers only. Access is controlled via Cloudflare account credentials and is not shared with any third party, partner, or contractor.

The Cloudflare Worker endpoint at telemetry.fixpointforge.report is write-only from the perspective of FORGE deployments — records can be submitted but not queried, listed, or retrieved via the public API.

Aggregated cluster reports (which contain only statistical summaries and no individual records) will be published in the FORGE repository under forge/telemetry/reports/ for community review when the project reaches public release.

7. Data Retention

Individual FailureRecord entries are stored for a maximum of 90 days. At the end of the retention period:

  1. Individual records are processed by the FORGE failure analyzer, which groups them into cluster summaries based on structural similarity (using the fingerprint field and SimHash distance metrics).
  2. Cluster summaries are written as aggregate reports — counts, distributions, representative structural patterns, and statistical summaries — with no individual record data preserved.
  3. All individual FailureRecord entries from the batch are permanently deleted from R2 storage.

Aggregate cluster reports are retained indefinitely for long-term trend analysis. These aggregates contain only statistical summaries and representative structural patterns — never individual records, deployment identifiers, or any data that could be attributed to a specific deployment or user.
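
The aggregation step can be pictured with a simple greedy clustering over fingerprints. This is a sketch under stated assumptions: the Hamming-distance threshold (max_distance), the greedy strategy, and the summary layout are hypothetical, and the FORGE failure analyzer may work differently.

```python
from collections import Counter


def hamming(a_hex: str, b_hex: str) -> int:
    """Number of differing bits between two hex fingerprints."""
    return bin(int(a_hex, 16) ^ int(b_hex, 16)).count("1")


def cluster_summaries(records: list[dict], max_distance: int = 3) -> list[dict]:
    """Group records whose fingerprints are close, then keep only counts."""
    clusters: list[dict] = []
    for rec in records:
        for cluster in clusters:
            if hamming(rec["fingerprint"], cluster["anchor"]) <= max_distance:
                break  # join the first cluster that is close enough
        else:
            cluster = {"anchor": rec["fingerprint"], "count": 0,
                       "failure_classes": Counter()}
            clusters.append(cluster)
        cluster["count"] += 1
        cluster["failure_classes"][rec["failure_class"]] += 1
    # The summary omits anchors, record IDs and deployment IDs.
    return [{"count": c["count"], "failure_classes": dict(c["failure_classes"])}
            for c in clusters]


batch = [
    {"fingerprint": "0000000000000000", "failure_class": "oscillation"},
    {"fingerprint": "0000000000000001", "failure_class": "oscillation"},
    {"fingerprint": "ffffffffffffffff", "failure_class": "parse_error"},
]
print(cluster_summaries(batch))
```

After summaries like these are written, the individual records in the batch can be deleted without losing the aggregate counts.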

8. How to Opt Out

Telemetry upload is enabled by default. You can disable it at any time using either of the following methods. In most configurations no process restart is required, although an environment variable is only visible to processes started after it is set.

Option 1 — Environment Variable

Set the following environment variable in your shell profile, CI environment, or container configuration:

export FORGE_TELEMETRY_UPLOAD_ENABLED=false

This takes precedence over all configuration file settings.

Option 2 — Configuration File

Add or update the following key in your forge_config.yaml:

telemetry_upload_enabled: false

The configuration file is typically located at ./forge_config.yaml in the project root or ~/.config/forge/forge_config.yaml for user-level settings.
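
The precedence between the two options can be sketched as follows. The function name is hypothetical, and the set of spellings treated as "disabled" (anything beyond the documented value false) is an assumption; FORGE's actual parsing rules are not given in the source.

```python
import os
from typing import Mapping, Optional

# Assumption: only "false" is documented; the other spellings are illustrative.
_DISABLED_VALUES = {"false", "0", "no", "off"}


def telemetry_upload_enabled(
    config: Mapping[str, object],
    environ: Optional[Mapping[str, str]] = None,
) -> bool:
    """Resolve the setting: environment variable first, then config, then default."""
    env = os.environ if environ is None else environ
    raw = env.get("FORGE_TELEMETRY_UPLOAD_ENABLED")
    if raw is not None:
        return raw.strip().lower() not in _DISABLED_VALUES
    # Upload is enabled by default when neither source sets the key.
    return bool(config.get("telemetry_upload_enabled", True))


# The environment variable wins over the configuration file.
print(telemetry_upload_enabled({"telemetry_upload_enabled": True},
                               {"FORGE_TELEMETRY_UPLOAD_ENABLED": "false"}))  # False
```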

Verifying opt-out

To confirm telemetry is disabled, run FORGE with verbose logging enabled (FORGE_LOG_LEVEL=debug). A log line reading [telemetry] upload disabled will appear at startup when the feature is turned off. If you do not see this line, telemetry upload is still active.

What happens when you opt out

When telemetry upload is disabled, FORGE will not construct FailureRecord objects, will not make any network requests to telemetry.fixpointforge.report, and will not write any telemetry data to disk. The opt-out is complete — there is no local caching or deferred upload.

9. Legal Basis

FORGE telemetry is collected under the legal basis of legitimate interest (Article 6(1)(f) GDPR, where applicable), specifically the legitimate interest of improving the quality, reliability, and correctness of open-source software for all users.

GDPR Analysis

No personal data is processed. The FailureRecord schema is designed such that none of its fields constitute personal data under GDPR Article 4(1). The telemetry system does not process:

  • Any data that identifies or could identify a natural person, directly or indirectly
  • Any data subject to special category protections (Article 9)
  • Any data relating to criminal convictions or offences (Article 10)
  • Any data from children (Article 8)

Because no personal data is collected, GDPR data subject rights — including rights of access (Article 15), rectification (Article 16), erasure (Article 17), restriction (Article 18), portability (Article 20), and objection (Article 21) — are not technically applicable. However, we provide the opt-out mechanism as a courtesy and a commitment to user autonomy.

CCPA Analysis

Under the California Consumer Privacy Act (CCPA), "personal information" means information that identifies, relates to, describes, is reasonably capable of being associated with, or could reasonably be linked, directly or indirectly, with a particular consumer or household. The FORGE FailureRecord schema contains no fields that meet this definition:

  • The deployment_id is a SHA-256 hash of hostname + PID — it cannot be linked to a person or household
  • The record_id is a random UUID — it has no association with any identity
  • No IP addresses, device identifiers, or geolocation data are collected

FORGE does not sell, share, or disclose telemetry data to third parties. Aggregate statistical summaries may be published publicly in the FORGE repository, but these contain no individual records or identifiable information.

Other Jurisdictions

Because the FORGE telemetry system collects no personal data by design, it is compatible with privacy frameworks worldwide, including but not limited to LGPD (Brazil), PIPEDA (Canada), POPIA (South Africa), and the UK GDPR. If you believe FORGE telemetry interacts with local privacy regulations in your jurisdiction, please contact the maintainers via the channels listed below.

10. Contact

If you have questions about this privacy policy, concerns about a specific deployment, or need to discuss FORGE telemetry in the context of your organization's compliance requirements:

Contact placeholder

A dedicated contact email address for privacy inquiries will be published here when FORGE transitions to public release. In the meantime, the GitHub issue tracker is the primary channel for all telemetry and privacy questions. Issues are monitored by the core maintainer team and will receive a response within 5 business days.