
Transparency Report

Complete visibility into the FORGE telemetry pipeline — from the moment a failure record is constructed in memory to the publication of aggregated cluster reports. All data shown here is derived from anonymized structural metadata. No content or personal data is ever included at any stage.

Reports not yet available

Aggregated cluster reports will be published in the section below once sufficient telemetry data has been collected and analyzed. This page will be updated with live aggregates when the FORGE project reaches public release. Individual records are never published — only statistical summaries derived from aggregated clusters.

1. Data Flow Architecture

The following diagram shows the complete path a telemetry record takes from a FORGE deployment to a published cluster report. Each stage is described in detail below the diagram.

Step 1 — 🖥️ Deployment: FORGE repair loop fails; FailureRecord constructed in memory from structural metadata only
Step 2 — ☁️ Worker: HTTPS POST to Cloudflare Worker; record validated against schema; no IP logged
Step 3 — 🗄️ R2 Storage: records stored privately in Cloudflare R2; encrypted at rest; maintainer access only; 90-day TTL
Step 4 — 🔬 Analyzer: SimHash clustering; frequency analysis; repro specs generated; individual records deleted
Step 5 — 📄 Public Report: aggregate cluster summaries published here and committed to the FORGE repository

Stage Details

Step 1 — Deployment

When a repair loop fails, the engine passes structural metadata to the telemetry subsystem, which constructs a FailureRecord. Anonymization is applied immediately: timestamp rounding, content length bucketing, opaque deployment ID hashing, and SimHash fingerprinting from abstract structural features. The telemetry module has no access to content data.
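As a rough illustration of the anonymization steps named above (timestamp rounding, length bucketing, opaque ID hashing), here is a minimal sketch. The function names, bucket boundaries, and salt are assumptions for illustration only; the authoritative implementation is forge/telemetry/anonymize.py.

```python
import hashlib

def round_timestamp(epoch_seconds: int, granularity: int = 3600) -> int:
    """Round a timestamp down (here: to the hour) to strip precision."""
    return epoch_seconds - (epoch_seconds % granularity)

def bucket_length(n_chars: int) -> str:
    """Map an exact content length to a coarse bucket; thresholds are illustrative."""
    if n_chars < 1_000:
        return "small"
    if n_chars < 50_000:
        return "medium"
    return "large"

def hash_deployment_id(raw_id: str, salt: str = "forge-telemetry") -> str:
    """Derive a short, opaque, non-reversible deployment identifier."""
    return hashlib.sha256((salt + raw_id).encode()).hexdigest()[:16]
```

The common thread is one-way coarsening: each step discards information, so the original timestamp, length, or ID cannot be recovered from the record.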

Step 2 — Worker

The record is serialized to JSON and transmitted via HTTPS POST to https://telemetry.fixpointforge.report/v1/ingest. The Cloudflare Worker validates the schema and rejects malformed requests. No IP addresses or request headers are logged.
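A client along these lines could perform the serialization and POST described above. This is a hedged sketch, not the actual forge/telemetry/upload.py: only the ingest URL comes from this page, and the function names are illustrative.

```python
import json
import urllib.request

INGEST_URL = "https://telemetry.fixpointforge.report/v1/ingest"

def serialize_record(record: dict) -> bytes:
    """Serialize a FailureRecord dict to compact, key-sorted JSON bytes."""
    return json.dumps(record, sort_keys=True, separators=(",", ":")).encode()

def upload(record: dict, timeout: float = 5.0) -> int:
    """POST the record to the ingest endpoint; returns the HTTP status code."""
    req = urllib.request.Request(
        INGEST_URL,
        data=serialize_record(record),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.status
```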

Step 3 — R2 Storage

Valid records are written to a private Cloudflare R2 bucket with AES-256 encryption at rest. No public URL exists. Access is restricted to FORGE maintainers via Cloudflare account credentials. Records carry a 90-day TTL.

Step 4 — Analyzer

The failure analyzer groups records by SimHash distance into clusters, computes statistical distributions, and generates representative structural patterns. After aggregation, all individual records from the batch are permanently deleted from R2.

Step 5 — Public Report

Aggregate summaries are published on this page and committed to the FORGE repository under forge/telemetry/reports/. No individual records, deployment identifiers, or attributable data are included.

2. What Happens with the Data

The telemetry data serves three concrete purposes, all oriented toward improving FORGE's reliability and effectiveness:

Failure Clustering

SimHash fingerprints group structurally similar failures. Clustering uses Hamming distance — records within a configurable threshold form a cluster with a unique ID and a representative structural pattern derived from the centroid.
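The Hamming-distance grouping described above can be sketched as a simple greedy pass. The real analyzer in forge/telemetry/analyze.py may use a different strategy; the threshold value and single-seed comparison here are assumptions for illustration.

```python
def hamming(a: int, b: int) -> int:
    """Number of differing bits between two SimHash fingerprints."""
    return bin(a ^ b).count("1")

def cluster(fingerprints: list[int], threshold: int = 3) -> list[list[int]]:
    """Greedy single-pass clustering: each fingerprint joins the first
    cluster whose seed is within the Hamming threshold, else it starts
    a new cluster."""
    clusters: list[list[int]] = []
    for fp in fingerprints:
        for c in clusters:
            if hamming(fp, c[0]) <= threshold:
                c.append(fp)
                break
        else:
            clusters.append([fp])
    return clusters
```

Because SimHash places structurally similar inputs at small Hamming distances, near-duplicate failure shapes collapse into the same cluster even when their fingerprints are not bit-identical.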

Failure Ranking

Clusters are ranked by frequency and severity. A cluster with 500 oscillation failures across dozens of deployments is prioritized over 3 parse errors from a single deployment. This ranking directly drives the maintainer team's development priorities.
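One plausible scoring rule matching the example above (many failures across many deployments outrank a few from one source) might look like this. The weights and formula are entirely illustrative assumptions, not FORGE's actual ranking.

```python
# Illustrative severity weights; FORGE's real weighting may differ.
SEVERITY = {
    "oscillation": 3.0,
    "max_iterations": 2.0,
    "budget_exceeded": 1.5,
    "parse_error": 1.0,
}

def priority(record_count: int, unique_deployments: int, failure_class: str) -> float:
    """Score a cluster: volume times severity, boosted by how many
    distinct deployments are affected (spread matters, not just volume)."""
    spread = (1 + unique_deployments) ** 0.5
    return SEVERITY.get(failure_class, 1.0) * record_count * spread
```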

Reproducibility Specs

For high-priority clusters, the analyzer generates abstract reproducibility specifications — content type, approximate size, structural shape parameters, lane configuration, and iteration behavior. Maintainers use these to construct synthetic test cases without any real user data.
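A synthetic test case can be generated from such a spec without touching user data, for example for the diff-shaped cluster in the report below. This sketch assumes a spec dict with a hunk_count_range field (as in the example report); the generator itself is hypothetical.

```python
import random

def synth_diff(spec: dict, seed: int = 0) -> str:
    """Generate a synthetic unified-diff-like input whose structural shape
    (hunk count) matches a repro spec, using placeholder content only."""
    rng = random.Random(seed)  # seeded for reproducible test cases
    lo, hi = spec["hunk_count_range"]
    hunks = []
    for i in range(rng.randint(lo, hi)):
        hunks.append(f"@@ -{i + 1},1 +{i + 1},1 @@\n-old line {i}\n+new line {i}")
    return "\n".join(hunks)
```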

3. How Improvements Are Made

The telemetry-to-improvement pipeline follows a systematic process:

  1. Cluster analysis — Identify the most frequent and severe failure patterns
  2. Root cause investigation — Examine cluster characteristics to hypothesize why the repair loop fails for that pattern
  3. Synthetic test construction — Build synthetic inputs from structural parameters (no user data used)
  4. Fix implementation — Lane ordering, convergence heuristics, budget adjustments, or parser improvements
  5. Verification — Verify against synthetic tests and the full regression suite
  6. Release — Ship the fix; update the cluster report to track whether the pattern recedes
Feedback loop

This process creates a closed feedback loop: telemetry identifies problems, the team builds fixes, and subsequent telemetry confirms whether the fix was effective. Over time, this drives measurable improvements in FORGE's repair success rate across all deployment scenarios.

4. What Gets Published

Cluster reports are published in JSON format. Each report covers a defined time period and contains only aggregate statistical data. Here is an example of the report format:

Example — cluster_report_2026Q1.json Illustrative
{
  "report_period": "2026-Q1",
  "generated_at": "2026-03-31T00:00:00Z",
  "total_records_analyzed": 1842,
  "total_records_deleted": 1842,
  "forge_versions": ["0.3.0", "0.3.1", "0.4.0", "0.4.1"],
  "clusters": [
    {
      "cluster_id": "c1a2b3",
      "record_count": 412,
      "unique_deployments": 87,
      "failure_class_distribution": {
        "oscillation": 0.61,
        "max_iterations": 0.31,
        "budget_exceeded": 0.08
      },
      "content_type_distribution": {
        "diff": 0.74,
        "json": 0.18,
        "text": 0.08
      },
      "content_length_bucket_distribution": {
        "small": 0.42,
        "medium": 0.51,
        "large": 0.07
      },
      "median_iteration_count": 8,
      "oscillation_detected_rate": 0.61,
      "top_lanes_failed": ["structural_validator", "schema_enforcer"],
      "representative_shape": {
        "type": "diff",
        "hunk_count_range": [3, 12],
        "max_nesting_depth": 2
      }
    }
  ]
}
What cluster reports contain

Each cluster entry contains only statistical distributions, representative structural patterns, and aggregate counts. No individual records, no deployment identifiers, no content, and no data that could identify any specific user, organization, or deployment instance. The unique_deployments count is included to show how widespread a failure pattern is, but individual deployment IDs are not listed.

What is NOT Published

  • Individual FailureRecords — raw records are never published; only statistical summaries
  • Deployment identifiers — opaque deployment_id hashes are not included in reports
  • Individual record IDs — no record_id values appear in published reports
  • Timestamps of individual records — only aggregate time ranges (e.g., "Q1 2026")
  • Lane configurations — only the names of failed lanes, not full pipeline configurations
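A simple audit of the exclusions listed above could scan a published report for forbidden field names before release. This checker is a hypothetical sketch; the key names are taken from the list above.

```python
# Field names that must never appear in a published report,
# per the "What is NOT Published" list.
FORBIDDEN_KEYS = {"record_id", "deployment_id", "timestamp", "content"}

def check_report(obj) -> list[str]:
    """Recursively collect any forbidden field names found anywhere
    in a report's JSON structure; an empty result means the report
    contains only aggregate fields."""
    found = []
    if isinstance(obj, dict):
        for key, value in obj.items():
            if key in FORBIDDEN_KEYS:
                found.append(key)
            found.extend(check_report(value))
    elif isinstance(obj, list):
        for item in obj:
            found.extend(check_report(item))
    return found
```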

5. Audit Trail

Every component of the FORGE telemetry system is open source. You can independently verify every claim made on this page and in the Privacy Policy by reading the implementation.

Component | Repository Path | What to Verify
FailureRecord schema | forge/telemetry/record.py | Exactly which fields are included; no content fields exist
SimHash fingerprinting | forge/telemetry/fingerprint.py | Fingerprint is computed from structural features, not content
Upload client | forge/telemetry/upload.py | Only FailureRecord JSON is transmitted; opt-out check is first
Opt-out handling | forge/telemetry/config.py | Env var and config file opt-out fully disables all telemetry
Anonymization | forge/telemetry/anonymize.py | Timestamp rounding, length bucketing, deployment ID hashing
Analysis pipeline | forge/telemetry/analyze.py | Clustering algorithm, report generation, individual record deletion
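An opt-out check of the kind forge/telemetry/config.py implements might look roughly like this. The environment variable name and accepted values here are assumptions for illustration, not FORGE's documented configuration.

```python
import os

# Hypothetical env var name; see forge/telemetry/config.py for the real one.
OPT_OUT_ENV = "FORGE_TELEMETRY_OPT_OUT"

def telemetry_enabled(env=None) -> bool:
    """Telemetry is disabled when the opt-out variable is set to a truthy
    value. This check must run before any record is constructed or sent."""
    env = os.environ if env is None else env
    value = env.get(OPT_OUT_ENV, "").strip().lower()
    return value not in {"1", "true", "yes", "on"}
```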
Verification

Every commit to the forge/telemetry/ directory is visible in the Git history. Any change to the telemetry schema, collection behavior, or data handling would appear as a diff that anyone can review. We encourage security researchers and privacy advocates to audit the telemetry codebase.

6. Live Stats

The following statistics will be populated with real data once the telemetry system has accumulated sufficient records for meaningful aggregation. All values shown are aggregate counts — no individual records or identifiers are exposed.

Records collected
Failure clusters
Unique deployments (24h)
Algorithm version
Data freshness

Stats are updated in real time via the /v1/stats endpoint. All values are aggregate — individual records cannot be reconstructed from the statistics shown here.

Live Telemetry Feed

The following data is fetched live from the FORGE telemetry API. Subscribe to the Atom feed to receive updates on new failure clusters.


API Endpoints

Endpoint | Description
/v1/stats | Live global statistics, content-type and model breakdowns
/v1/feed | Failure cluster feed with reproduction specs and model patterns
/v1/feed/trends.json | Time-series of failure rates
/v1/feed/changelog.json | Algorithm change history
/v1/feed/atom.xml | Atom feed for new clusters (subscribe in any reader)
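A consumer of /v1/stats might parse the response along these lines. The exact field names in the payload are assumptions inferred from the Live Stats labels above, so treat this as a sketch rather than the documented response schema.

```python
import json

def parse_stats(payload: str) -> dict:
    """Extract only the expected aggregate counters from a /v1/stats
    response, ignoring any fields this client does not recognize.
    Key names are assumed, not taken from a published schema."""
    data = json.loads(payload)
    wanted = ("records_collected", "failure_clusters", "unique_deployments_24h")
    return {k: int(data[k]) for k in wanted if k in data}
```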

7. Publication Roadmap

Reports will be published according to the following schedule once data collection reaches meaningful volume:

  • Telemetry infrastructure deployed

    Cloudflare Worker and R2 storage bucket are operational. The FailureRecord schema is finalized. The upload client is integrated into the FORGE library. Records are being collected from opted-in deployments.

  • First cluster analysis run

    When sufficient records have accumulated (target: 1,000+ records from 50+ unique deployments), the first clustering analysis will be performed and validated by the maintainer team.

  • Reports published on this page

    Aggregate cluster reports will appear in the Live Stats section above and as downloadable JSON. Individual records will be permanently deleted at this point — only aggregates remain.

  • Reports committed to FORGE repository

    When FORGE goes public, reports will be committed to the repo under forge/telemetry/reports/.

8. Source Code

The entire telemetry system — data collection, record construction, worker endpoint, and analysis tooling — is open source. Review the implementation to independently verify every claim made in this transparency report and the Privacy Policy.

Questions or feedback

Open an issue at github.com/RainTechRepos/FORGE/issues or review the telemetry source code at forge/telemetry/. A dedicated contact channel will be established when FORGE reaches public release.