FORGE Telemetry Privacy Policy
1. Overview
FORGE (Fixpoint Output Repair for Generative Engines) includes an optional telemetry system that reports anonymized structural metadata about fixpoint repair failures. This metadata helps FORGE maintainers understand failure patterns across the user base and make targeted improvements to the framework's convergence algorithms, lane ordering heuristics, and default configuration values.
Telemetry upload is enabled by default but can be disabled at any time with a single environment variable or configuration key. See Section 8 for complete opt-out instructions.
FORGE telemetry is built on a single non-negotiable principle: no content is ever transmitted. Every field in the telemetry record is derived from the structure and behavior of the repair system — never from the actual content being processed. This guarantee is enforced at the code level: the telemetry subsystem has no access to content data.
2. What is Collected — Field by Field
When a FORGE fixpoint repair loop fails, a FailureRecord is constructed and
optionally transmitted to https://telemetry.fixpointforge.report/v1/ingest
via HTTPS POST. The record contains exactly the following fields. This
schema is enforced at the code level; additional fields cannot be added without a code
change visible in the commit history.
class FailureRecord:
    record_id: UUID                    # random UUID, no persistent identity
    timestamp: int                     # epoch seconds, rounded to hour
    forge_version: str                 # semver string, e.g. "0.4.1"
    deployment_id: str                 # SHA-256 hex of hostname + PID
    content_type: str                  # enum: diff|json|text|code|structured
    content_length_bucket: str         # enum: tiny|small|medium|large
    structural_shape: dict             # abstract schema, no actual values
    failure_class: str                 # enum: oscillation|budget_exceeded|...
    lanes_executed: list[str]          # lane names that were invoked
    lanes_failed: list[str]            # lane names that returned errors
    lane_state_transitions: list[str]  # e.g. ["pending→running", "running→failed"]
    iteration_count: int               # total iterations before termination
    oscillation_detected: bool         # cyclic output pattern detected?
    fingerprint: str                   # SimHash hex of structural features
Field Descriptions
| Field | Type | Description |
|---|---|---|
| record_id | UUID | A randomly generated UUID created at report time. Not linked to any persistent identity, session, or user. A new UUID is generated for every single record. |
| timestamp | epoch int | Wall-clock time of the failure, rounded down to the hour boundary. The original minutes and seconds are discarded to make cross-record timing correlation harder. See Section 4. |
| forge_version | string | Semantic version of the FORGE library (e.g., "0.4.1"). Used to correlate failure patterns with specific releases and detect version-specific regressions. |
| deployment_id | hex string | An opaque, non-reversible identifier computed as SHA-256(hostname + ":" + str(os.getpid())). It groups records from the same deployment instance without revealing the hostname. Because the PID is part of the input, a restart produces a different ID. See Section 4. |
| content_type | enum | The category of content being processed. Possible values: "diff", "json", "text", "code", "structured". No content itself is included — only the type label. |
| content_length_bucket | enum | A coarse size category replacing the exact byte count. "tiny" (<512 B), "small" (512 B–8 KB), "medium" (8 KB–64 KB), "large" (>64 KB). Exact lengths are never recorded. |
| structural_shape | dict | An abstract structural description of the content — nesting depth, number of top-level keys, array lengths, line-count ranges. This is a schema-level description with no actual values. For example: {"type": "json", "depth": 3, "keys": 12}. |
| failure_class | enum | The category of failure that terminated the loop. Values: "oscillation", "budget_exceeded", "max_iterations", "lane_error", "parse_error". |
| lanes_executed | string[] | Names of repair lanes invoked during the failed fixpoint attempt. These are lane identifiers from the FORGE configuration (e.g., "structural_validator", "schema_enforcer"). |
| lanes_failed | string[] | Subset of lanes_executed — the lane names that returned a failure or error state during the repair loop. |
| lane_state_transitions | string[] | Sequence of state transitions across all lanes (e.g., ["pending→running", "running→failed", "failed→retry"]). No lane output or content is included — only the state machine trace. |
| iteration_count | integer | Total number of fixpoint iterations attempted before the loop was terminated. Used to calibrate default iteration budgets. |
| oscillation_detected | boolean | Whether the loop detected a cyclic output pattern, i.e., the repair settled into a repeating cycle instead of reaching a stable fixpoint state. |
| fingerprint | hex string | A SimHash structural fingerprint of the content's abstract features. Locality-sensitive hash that captures structural similarity without encoding any content. Used for clustering similar failures. See Section 4. |
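To make the structural_shape row concrete, here is a minimal sketch of how a shape like {"type": "json", "depth": 3, "keys": 12} could be derived from a parsed JSON value. The function name and the exact depth convention (scalars count as depth 0) are assumptions for illustration, not FORGE's actual implementation.

```python
import json
from typing import Any


def structural_shape(value: Any) -> dict:
    """Summarize the shape of a parsed JSON value without keeping any values.

    Hypothetical helper: the name and the depth convention are assumptions.
    """

    def depth(node: Any) -> int:
        # Scalars have depth 0; every container level adds 1.
        if isinstance(node, dict):
            return 1 + max((depth(v) for v in node.values()), default=0)
        if isinstance(node, list):
            return 1 + max((depth(v) for v in node), default=0)
        return 0

    # Only the number of top-level keys is kept, never the key names.
    keys = len(value) if isinstance(value, dict) else 0
    return {"type": "json", "depth": depth(value), "keys": keys}


doc = json.loads('{"user": {"name": "Ada", "tags": ["x", "y"]}, "id": 7}')
print(structural_shape(doc))  # {'type': 'json', 'depth': 3, 'keys': 2}
```

Note that the output carries counts only: neither "Ada" nor the key names appear in the result.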
3. What is NOT Collected
The following categories of data are never collected, transmitted, or stored by the FORGE telemetry system under any circumstances. This is enforced by code architecture, not policy — the telemetry subsystem does not have access to content data.
- Raw content of any kind — no diffs, patches, JSON documents, text bodies, or code snippets. The telemetry module receives only pre-computed structural metadata.
- LLM prompts — no system prompts, user messages, conversation history, or prompt templates
- LLM model outputs — no completions, responses, generated text, or intermediate repair outputs
- File paths or filenames — no paths from the local filesystem, repository, or working directory
- File contents — no source code, configuration files, data files, or any file bytes
- IP addresses — the Cloudflare Worker receiving telemetry does not log or store source IP addresses in any form
- User account information — no usernames, email addresses, GitHub handles, or SSO identities
- API keys or credentials — no LLM API keys, tokens, secrets, passwords, or auth headers of any kind
- Personal information — no names, contact details, biometric data, or any data attributable to a natural person
- Environment variables — no env var names or values (other than the telemetry opt-out flag status)
- Repository names or URLs — no git remote URLs, repository identifiers, branch names, or commit hashes
- Organization identifiers — no company names, team names, project names, or workspace identifiers
- Exact content lengths — byte counts are replaced with coarse buckets before the record is constructed
- Exact timestamps — times are rounded down to the hour before transmission
- Hostname or process details — hashed into an opaque deployment_id via SHA-256 and not recoverable
4. How Data is Anonymized
Each field in the FailureRecord is designed to be structurally useful for
failure analysis while being privacy-preserving by construction. The key anonymization
techniques are detailed below.
SimHash Structural Fingerprinting
The fingerprint field is computed using SimHash — a locality-sensitive hashing
algorithm. SimHash produces a fixed-size hash where similar inputs yield similar hashes,
enabling efficient clustering of structurally similar failures.
SimHash is applied to abstract structural features of the content — token types, structural delimiters, nesting patterns, key counts — not to the content itself. The hash is a one-way function: given only the fingerprint, it is computationally infeasible to reconstruct any portion of the original content or even its structural features.
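The general SimHash technique can be sketched as follows. This is an illustrative version only: the 64-bit width, the use of SHA-256 as the per-feature hash, the unit feature weights, and the feature strings themselves are all assumptions, and FORGE's own feature extraction may differ.

```python
import hashlib
from typing import List


def simhash(features: List[str], bits: int = 64) -> str:
    """Return a hex SimHash over a list of abstract feature strings.

    Hypothetical sketch: bit width, per-feature hash, and weights are assumed.
    """
    weights = [0] * bits
    for feature in features:
        digest = hashlib.sha256(feature.encode("utf-8")).digest()
        h = int.from_bytes(digest[: bits // 8], "big")
        # Each feature votes +1 or -1 on every bit position.
        for i in range(bits):
            weights[i] += 1 if (h >> i) & 1 else -1
    value = 0
    for i, w in enumerate(weights):
        if w > 0:  # a bit is set when the positive votes win
            value |= 1 << i
    return f"{value:0{bits // 4}x}"


# Features describe structure only (made-up examples), never content.
fp = simhash(["type:json", "depth:3", "keys:12", "delim:{", "delim:["])
print(fp)
```

Inputs that share most of their features tend to produce fingerprints that differ in only a few bits, which is what makes clustering by bit distance possible.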
Content Length Bucketing
Exact byte counts are never transmitted. Instead, the content length is mapped to one of four coarse categories before the record is constructed:
| Bucket | Range | Purpose |
|---|---|---|
| tiny | Less than 512 bytes | Distinguish trivial inputs from real workloads |
| small | 512 bytes – 8 KB | Typical small document repairs |
| medium | 8 KB – 64 KB | Standard document repairs |
| large | Greater than 64 KB | Large document or multi-file repairs |
This bucketing prevents content-length-based fingerprinting while preserving enough information to analyze whether failure patterns correlate with content size.
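The bucket table can be expressed as a small mapping function. This is a sketch under two stated assumptions: 1 KB means 1024 bytes, and a length of exactly 8 KB or 64 KB falls into the lower bucket. The table does not settle the 8 KB edge, so treat that choice as hypothetical.

```python
def content_length_bucket(num_bytes: int) -> str:
    """Map an exact byte count to one of the four coarse buckets.

    Assumes 1 KB = 1024 bytes and inclusive upper edges for small and medium.
    """
    if num_bytes < 0:
        raise ValueError("num_bytes must be non-negative")
    if num_bytes < 512:
        return "tiny"
    if num_bytes <= 8 * 1024:
        return "small"
    if num_bytes <= 64 * 1024:
        return "medium"
    return "large"


for size in (100, 512, 8192, 8193, 65536, 65537):
    print(size, content_length_bucket(size))
```

Only the returned label would enter the record; the exact byte count is discarded at this point.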
Opaque Deployment Identifier
The deployment_id is computed as:
SHA-256(hostname + ":" + str(os.getpid()))
This produces a deterministic but opaque identifier for a given deployment instance. It groups multiple failure records from the same process without revealing the hostname or process details. SHA-256 is a cryptographic one-way function — the original hostname and PID cannot be recovered from the hash.
Crucially, the PID is included so that deployments restarted on the same host produce
different deployment_id values, further reducing the value of any individual
hash for re-identification purposes.
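A minimal sketch of the formula above, using only the Python standard library. The function name, the use of socket.gethostname() as the hostname source, and the UTF-8 encoding of the input string are assumptions; the formula itself is taken from this section.

```python
import hashlib
import os
import socket
from typing import Optional


def deployment_id(hostname: Optional[str] = None, pid: Optional[int] = None) -> str:
    """Compute SHA-256(hostname + ":" + str(pid)) as a hex string.

    Hypothetical helper: defaults fall back to the current host and process.
    """
    hostname = socket.gethostname() if hostname is None else hostname
    pid = os.getpid() if pid is None else pid
    return hashlib.sha256(f"{hostname}:{pid}".encode("utf-8")).hexdigest()


# Same host, different PID (as after a restart) gives an unrelated identifier.
print(deployment_id("build-host", 4101))
print(deployment_id("build-host", 4102))
```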
Timestamp Rounding
Timestamps are rounded down to the hour boundary. For example,
1742565429 (representing 2025-03-21T13:57:09Z) becomes
1742562000 (representing 2025-03-21T13:00:00Z). This prevents
cross-record timing correlation while preserving approximate temporal context
for trend analysis across days and weeks.
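Rounding down to the hour is a single modulo operation on the epoch value. The sketch below uses a hypothetical function name and a different sample timestamp to show the arithmetic.

```python
from datetime import datetime, timezone


def round_down_to_hour(epoch_seconds: int) -> int:
    """Drop minutes and seconds by subtracting the remainder modulo 3600."""
    return epoch_seconds - (epoch_seconds % 3600)


raw = 1700000000  # 2023-11-14T22:13:20Z
rounded = round_down_to_hour(raw)
print(rounded)  # 1699999200
print(datetime.fromtimestamp(rounded, tz=timezone.utc).isoformat())
```

Every failure inside the same clock hour maps to the same value, so the record keeps the hour but not the position within it.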
5. Storage & Encryption
Individual FailureRecord entries are stored in a private Cloudflare R2 bucket.
R2 provides encryption at rest for all stored objects using AES-256. The bucket is not
publicly accessible — there is no public URL or listing endpoint.
Records are transmitted from FORGE deployments to the Cloudflare Worker endpoint at
https://telemetry.fixpointforge.report/v1/ingest over HTTPS (TLS 1.2+),
providing encryption in transit. The Worker validates the record schema before writing
to R2 — malformed records are rejected and not stored.
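As an illustration of what schema validation on ingest could look like, the sketch below rejects records with missing fields, unexpected fields, wrong types, or unknown enum values. It is written in Python for consistency with the schema in Section 2 and is not the Worker's actual code; the function name and error strings are hypothetical.

```python
EXPECTED_FIELDS = {
    "record_id": str,
    "timestamp": int,
    "forge_version": str,
    "deployment_id": str,
    "content_type": str,
    "content_length_bucket": str,
    "structural_shape": dict,
    "failure_class": str,
    "lanes_executed": list,
    "lanes_failed": list,
    "lane_state_transitions": list,
    "iteration_count": int,
    "oscillation_detected": bool,
    "fingerprint": str,
}
ENUMS = {
    "content_type": {"diff", "json", "text", "code", "structured"},
    "content_length_bucket": {"tiny", "small", "medium", "large"},
}


def validate_record(record: dict) -> list:
    """Return a list of problems; an empty list means the record is accepted."""
    problems = []
    for name in sorted(set(EXPECTED_FIELDS) - set(record)):
        problems.append(f"missing field: {name}")
    for name in sorted(set(record) - set(EXPECTED_FIELDS)):
        problems.append(f"unexpected field: {name}")
    for name, expected in EXPECTED_FIELDS.items():
        if name not in record:
            continue
        value = record[name]
        # bool is a subclass of int in Python, so reject it for int fields.
        wrong_bool = expected is int and isinstance(value, bool)
        if wrong_bool or not isinstance(value, expected):
            problems.append(f"wrong type for {name}")
    for name, allowed in ENUMS.items():
        value = record.get(name)
        if isinstance(value, str) and value not in allowed:
            problems.append(f"invalid value for {name}")
    return problems


sample = {
    "record_id": "3f2b6c1e-8d4a-4f7b-9c2e-5a1d7e9b0c34",
    "timestamp": 1699999200,
    "forge_version": "0.4.1",
    "deployment_id": "0" * 64,
    "content_type": "json",
    "content_length_bucket": "small",
    "structural_shape": {"type": "json", "depth": 3, "keys": 12},
    "failure_class": "oscillation",
    "lanes_executed": ["structural_validator", "schema_enforcer"],
    "lanes_failed": ["schema_enforcer"],
    "lane_state_transitions": ["pending→running", "running→failed"],
    "iteration_count": 8,
    "oscillation_detected": True,
    "fingerprint": "00000000000000ff",
}
print(validate_record(sample))  # []
```

Rejecting unexpected fields, and not only malformed ones, is what keeps a stray content-bearing key from being stored even if a client sends one.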
6. Who Has Access
Access to raw telemetry records in R2 storage is restricted to FORGE project maintainers only. Access is controlled via Cloudflare account credentials and is not shared with any third party, partner, or contractor.
The Cloudflare Worker endpoint at telemetry.fixpointforge.report is
write-only from the perspective of FORGE deployments — records can be submitted but
not queried, listed, or retrieved via the public API.
Aggregated cluster reports (which contain only statistical summaries and no individual
records) will be published in the FORGE repository under
forge/telemetry/reports/ for community review when the project reaches
public release.
7. Data Retention
Individual FailureRecord entries are stored for a maximum of
90 days. At the end of the retention period:
- Individual records are processed by the FORGE failure analyzer, which groups them into cluster summaries based on structural similarity (using the fingerprint field and SimHash distance metrics).
- Cluster summaries are written as aggregate reports (counts, distributions, representative structural patterns, and statistical summaries) with no individual record data preserved.
- All individual FailureRecord entries from the batch are permanently deleted from R2 storage.
Aggregate cluster reports are retained indefinitely for long-term trend analysis. These aggregates contain only statistical summaries and representative structural patterns — never individual records, deployment identifiers, or any data that could be attributed to a specific deployment or user.
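The grouping step can be pictured as clustering fingerprints by Hamming distance and keeping only counts per cluster. The sketch below is a simple greedy version under stated assumptions: the function names, the distance threshold of 3 bits, and the summary layout are hypothetical, and the actual failure analyzer may use a different method.

```python
from collections import Counter
from typing import Dict, List


def hamming(a: str, b: str) -> int:
    """Number of differing bits between two hex fingerprints."""
    return bin(int(a, 16) ^ int(b, 16)).count("1")


def summarize(records: List[Dict], max_distance: int = 3) -> List[Dict]:
    """Greedily group records by fingerprint distance and keep only aggregates."""
    clusters: List[Dict] = []
    for rec in records:
        fp = rec["fingerprint"]
        for cluster in clusters:
            if hamming(fp, cluster["representative"]) <= max_distance:
                break  # close enough to an existing cluster
        else:
            cluster = {"representative": fp, "count": 0, "classes": Counter()}
            clusters.append(cluster)
        cluster["count"] += 1
        cluster["classes"][rec["failure_class"]] += 1
    # The summary drops record_id, deployment_id, and timestamp entirely.
    return [
        {"count": c["count"], "failure_classes": dict(c["classes"])}
        for c in clusters
    ]


batch = [
    {"fingerprint": "00000000000000ff", "failure_class": "oscillation"},
    {"fingerprint": "00000000000000fe", "failure_class": "oscillation"},
    {"fingerprint": "ffffffffffffffff", "failure_class": "parse_error"},
]
print(summarize(batch))
```

The first two fingerprints differ by one bit and land in the same cluster; the third is 56 bits away from the first and starts its own.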
8. How to Opt Out
Telemetry upload is enabled by default. You can disable it at any time using either of the following methods. Both take effect immediately — no process restart is required in most configurations.
Option 1 — Environment Variable
Set the following environment variable in your shell profile, CI environment, or container configuration:
export FORGE_TELEMETRY_UPLOAD_ENABLED=false
This takes precedence over all configuration file settings.
Option 2 — Configuration File
Add or update the following key in your forge_config.yaml:
telemetry_upload_enabled: false
The configuration file is typically located at ./forge_config.yaml in the
project root or ~/.config/forge/forge_config.yaml for user-level settings.
To confirm telemetry is disabled, run FORGE with verbose logging enabled
(FORGE_LOG_LEVEL=debug). A log line reading
[telemetry] upload disabled will appear at startup when the feature is
turned off. If you do not see this line, telemetry upload is still active.
When telemetry upload is disabled, FORGE will not construct FailureRecord
objects, will not make any network requests to telemetry.fixpointforge.report,
and will not write any telemetry data to disk. The opt-out is complete — there is no
local caching or deferred upload.
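The precedence between the two options can be sketched as a small resolver. The variable name FORGE_TELEMETRY_UPLOAD_ENABLED and the key telemetry_upload_enabled come from this section; the function name, the case-insensitive match on "false", and the treatment of any other value as enabled are assumptions, and the config file is represented as an already-parsed dictionary.

```python
import os
from typing import Mapping, Optional


def telemetry_upload_enabled(
    config: Mapping[str, object],
    environ: Optional[Mapping[str, str]] = None,
) -> bool:
    """Resolve the opt-out setting; the environment variable wins over config.

    Hypothetical sketch: only the value "false" (any case) is assumed to disable.
    """
    environ = os.environ if environ is None else environ
    raw = environ.get("FORGE_TELEMETRY_UPLOAD_ENABLED")
    if raw is not None:
        return raw.strip().lower() != "false"
    # No environment override: fall back to the config key, default enabled.
    return bool(config.get("telemetry_upload_enabled", True))


print(telemetry_upload_enabled({}, {"FORGE_TELEMETRY_UPLOAD_ENABLED": "false"}))  # False
print(telemetry_upload_enabled({"telemetry_upload_enabled": False}, {}))  # False
print(telemetry_upload_enabled({}, {}))  # True
```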
9. Legal Basis & GDPR/CCPA
FORGE telemetry is collected under the legal basis of legitimate interest (Article 6(1)(f) GDPR, where applicable) — specifically, the legitimate interest of improving the quality, reliability, and correctness of open-source software for all users.
GDPR Analysis
No personal data is processed. The FailureRecord schema is
designed such that none of its fields constitute personal data under GDPR Article 4(1).
The telemetry system does not process:
- Any data that identifies or could identify a natural person, directly or indirectly
- Any data subject to special category protections (Article 9)
- Any data relating to criminal convictions or offences (Article 10)
- Any data from children (Article 8)
Because no personal data is collected, GDPR data subject rights — including rights of access (Article 15), rectification (Article 16), erasure (Article 17), restriction (Article 18), portability (Article 20), and objection (Article 21) — are not technically applicable. However, we provide the opt-out mechanism as a courtesy and a commitment to user autonomy.
CCPA Analysis
Under the California Consumer Privacy Act (CCPA), "personal information" means information
that identifies, relates to, describes, is reasonably capable of being associated with,
or could reasonably be linked, directly or indirectly, with a particular consumer or
household. The FORGE FailureRecord schema contains no fields that meet
this definition:
- The deployment_id is a SHA-256 hash of hostname + PID; it cannot be linked to a person or household
- The record_id is a random UUID; it has no association with any identity
- No IP addresses, device identifiers, or geolocation data are collected
FORGE does not sell, share, or disclose telemetry data to third parties. Aggregate statistical summaries may be published publicly in the FORGE repository, but these contain no individual records or identifiable information.
Other Jurisdictions
Because the FORGE telemetry system collects no personal data by design, it is compatible with privacy frameworks worldwide, including but not limited to LGPD (Brazil), PIPEDA (Canada), POPIA (South Africa), and the UK GDPR. If you believe FORGE telemetry interacts with local privacy regulations in your jurisdiction, please contact the maintainers via the channels listed below.
10. Contact
If you have questions about this privacy policy, concerns about a specific deployment, or need to discuss FORGE telemetry in the context of your organization's compliance requirements:
- Issue tracker — Open an issue at github.com/RainTechRepos/FORGE/issues
- Source code — Review the telemetry implementation at forge/telemetry/
A dedicated contact email address for privacy inquiries will be published here when FORGE transitions to public release. In the meantime, the GitHub issue tracker is the primary channel for all telemetry and privacy questions. Issues are monitored by the core maintainer team and will receive a response within 5 business days.