---
title: "Mercor Production Database: Anatomy of a Data Breach"
url: https://share.jotbird.com/restless-steady-riverbend
updated_at: 2026-04-05T18:01:54.320836+00:00
---

# Anatomy of Mercor's Data Breach
### A technical analysis of a complete operational data loss (production database, user and customer data)

---

> **Disclaimer:** All personally identifiable information (PII) in this document has been obfuscated. Names are partially masked (e.g., `T** O****`), emails redacted (e.g., `e****a1@gmail.com`), phone numbers truncated (e.g., `+4479571****`), bank details masked (e.g., `000**-***`), financial identifiers hidden (e.g., `acct_1Rc*****`), IP addresses truncated (e.g., `71.194.*.*`), and MAC addresses partially redacted (e.g., `1C:93:7C:**:**:**`). This analysis is conducted for educational and security research purposes.

> **Note on source material:** This entire analysis is based on **two small sample files** made publicly available by Lapsus$ — a database schema sample and a database export containing table structures with example rows, plus partial Airtable workspace exports. These files were shared **after Mercor allegedly paid a ransom** to have the data removed from the group's leak site — a fact confirmed to us directly by Lapsus$. Despite receiving payment, the group continues to share samples and is actively engaged in selling the full dataset to private bidders. Together these two files represent a fraction of a percent of the claimed 211GB production database. **We did not have access to the full database, the 939GB of source code, the 3TB of cloud storage, the Slack exports, or the Tailscale VPN data.** Everything documented in this report — every bank routing number, every Apple Foundation Model output, every Persona KYC session token, every desktop screenshot URL — was found in these two small files alone. The full breach is orders of magnitude larger. What follows is the tip of the iceberg.

---

## Table of Contents

1. [Executive Summary](#executive-summary)
2. [Why This Breach Is Serious](#why-this-breach-is-serious)
   - [Why AI Training Data Is Worth Billions](#why-ai-training-data-is-worth-billions)
   - [The Extent - What Data Was Exposed](#the-extent---what-data-was-exposed)
   - [The Scope - Who Is Affected](#the-scope---who-is-affected)
   - [The Scale - Mercor Client Ecosystem](#the-scale---mercor-client-ecosystem)
   - [The Airtable Export - 84 Workspaces, 1055 Files](#the-airtable-export---84-workspaces-1055-files)
   - [Customer and Third-Party Platform URLs](#customer-and-third-party-platform-urls-found-in-the-dump)
   - [The Screenshot Problem](#the-screenshot-problem)
3. [Platform Overview](#platform-overview)
4. [Evidence - The Database Layer by Layer](#evidence---the-database-layer-by-layer)
   - [Part I - User and Identity Layer](#part-i---user-and-identity-layer)
   - [Part II - Identity Verification and Fraud Detection](#part-ii---identity-verification-and-fraud-detection)
   - [Part III - The Hiring Pipeline](#part-iii---the-hiring-pipeline)
   - [Part IV - Interviews and Assessments](#part-iv---interviews-and-assessments)
   - [Part V - Work Trials and Onboarding](#part-v---work-trials-and-onboarding)
   - [Part VI - Projects and AI Task Management](#part-vi---projects-and-ai-task-management)
   - [Part VII - Time Tracking and Productivity Surveillance](#part-vii---time-tracking-and-productivity-surveillance)
   - [Part VIII - Payments and Financial Infrastructure](#part-viii---payments-and-financial-infrastructure)
   - [Part IX - Communications and Outreach](#part-ix---communications-and-outreach)
5. [Reverse Engineering - Architecture and Infrastructure](#reverse-engineering---architecture-and-infrastructure)
   - [Part X - Infrastructure and DevOps](#part-x---infrastructure-and-devops)
   - [Part XI - Analytics and ML Layer](#part-xi---analytics-and-ml-layer)
   - [Part XII - Reference Data Layer](#part-xii---reference-data-layer)
6. [Exposed Surface Area Summary](#exposed-surface-area-summary)
7. [Technical Architecture Reverse-Engineered](#technical-architecture-reverse-engineered)
8. [Grounds for Legal Action](#grounds-for-legal-action)
   - [I. Client Company Claims - Loss of Proprietary AI Training Data and Trade Secrets](#i-client-company-claims---loss-of-proprietary-ai-training-data-and-trade-secrets)
   - [II. Contractor Class Claims](#ii-contractor-class-claims)
   - [III. Statutory Violations](#iii-statutory-violations)
   - [IV. Negligence](#iv-negligence---security-failures-evidenced-in-the-data)
   - [V. Third-Party Claims](#v-third-party-claims)
9. [Conclusion - What Happens Now](#conclusion---what-happens-now)
   - [The Data Is Still in Circulation](#the-data-is-still-in-circulation)
   - [The Ongoing Threat](#the-ongoing-threat)
   - [The Case for Radical Transparency](#the-case-for-radical-transparency)
   - [A Structural Critique](#a-structural-critique---youth-velocity-and-the-cost-of-immaturity)
10. [Appendix A - Complete Table Inventory](#appendix-a---complete-table-inventory)

---

## Executive Summary

This document presents a systematic technical analysis of a **small sample** from a database export from **Mercor**, an AI-powered talent marketplace that connects software engineers, AI data labelers, and knowledge workers with companies seeking contract labor. As [reported by the Wall Street Journal](https://www.wsj.com/tech/ai/ai-training-data-mercor-offers-ed37d2a1), Mercor has rapidly become one of the key intermediaries in the AI industry — placing contractors inside organizations like **Meta, OpenAI, Google DeepMind, Anthropic, Apple, and Amazon** to perform AI training, data labeling, software engineering, and other knowledge work.

What we analyzed is **two small sample files** shared by Lapsus$ after Mercor allegedly paid a ransom to have the breach data removed. Despite that payment, the group continues to distribute samples and is actively selling the full dataset to private bidders. Together these files represent a tiny sliver of the claimed 211GB production database. Yet even these small samples contain **over 250 table schemas with sample data rows** exported from Mercor's Aurora MySQL production environment, plus **Airtable workspace exports** containing actual AI training data and model evaluation records. The samples cover every operational dimension of the platform — from contractor signup through identity verification, AI-conducted interviews, job placement, real-time work surveillance, and payment disbursement.

If these samples — containing just one or two rows per table — already expose full bank routing numbers, government ID verification tokens, desktop screenshot URLs, signed legal documents, and proprietary AI model outputs from Apple and Amazon, the full 211GB database contains the same data for **every contractor and every transaction** Mercor has ever processed.

### Scope of This Article and the Full Scale of the Breach

> **Important:** This article analyzes **only two small sample files from the production database**, shared by Lapsus$ after Mercor allegedly paid a ransom. The full production database is 211GB, which is itself a fraction of the claimed **4-terabyte breach**. Every finding documented below was derived from these small samples alone. The full database would contain the complete records for every contractor, every transaction, every screenshot, and every payment Mercor has ever processed.

#### The Breach at a Glance

Mercor's official account attributes the breach to a supply-chain attack on the open-source Python package **LiteLLM** — a widely used AI proxy library estimated to be present in [36% of cloud environments](https://www.securityweek.com/mercor-hit-by-litellm-supply-chain-attack/). On March 27, 2026, using a maintainer's compromised credentials, the **TeamPCP** hacking group published two malicious PyPI package versions (`1.82.7` and `1.82.8`) that were available for download for approximately 40 minutes. The reported attack chain: the poisoned dependency landed in Mercor's development environment, swept the machine for SSH keys, AWS tokens, Kubernetes secrets, and `.env` files, deployed privileged containers across Mercor's Kubernetes clusters, and used the stolen credentials to begin exfiltrating data through Mercor's **Tailscale VPN**.
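
As a defensive illustration, the two poisoned releases are trivial to screen for in a dependency audit. A minimal sketch follows; the helper function and the pin list are hypothetical, and only the two version numbers come from the public reporting above:

```python
# Minimal dependency-audit sketch: flag the two PyPI versions of litellm
# reported as malicious. Helper names and the sample pin list are
# illustrative; the version numbers come from public reporting.
COMPROMISED_LITELLM_VERSIONS = {"1.82.7", "1.82.8"}

def is_compromised(package: str, version: str) -> bool:
    """Return True if this package/version pair matches a known-bad release."""
    return package == "litellm" and version in COMPROMISED_LITELLM_VERSIONS

# Example: scan a requirements-style list of pins for the poisoned releases.
pins = [("litellm", "1.82.6"), ("litellm", "1.82.7"), ("requests", "2.31.0")]
flagged = [(p, v) for p, v in pins if is_compromised(p, v)]
print(flagged)  # [('litellm', '1.82.7')]
```

A 40-minute exposure window only matters if an automated build pulled the latest version during that window; pinned, hash-verified dependencies would not have resolved to either release.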

However, **there are reasons to question whether LiteLLM was the sole or even primary attack vector.** Exfiltrating 4 terabytes of data — production databases, 939GB of source code repositories, 3TB of cloud storage including video recordings and screenshots, plus Slack, Airtable, and Tailscale exports — is not a fast operation. At typical egress speeds, this would have taken **days to weeks** of sustained data transfer. A 40-minute window of malicious package availability seems insufficient to establish the deep, persistent access required to systematically exfiltrate this volume of data across this many distinct systems (Aurora MySQL, S3 buckets, GitHub repositories, Airtable, Slack, Tailscale). It is entirely possible that Mercor was already compromised through other means — whether through prior credential exposure, an insider threat, or a separate vulnerability — and that the LiteLLM incident was coincidental or merely one of multiple entry points. Mercor's characterization of itself as "one of thousands of companies" affected by LiteLLM may be an attempt to deflect from deeper, more embarrassing security failures.
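
The "days to weeks" estimate is simple arithmetic. A back-of-envelope sketch, where the egress rates are illustrative assumptions rather than measured values:

```python
# Back-of-envelope: how long does 4 TB take to exfiltrate at a given
# sustained egress rate? The rates below are assumptions for illustration.
def transfer_hours(size_tb: float, rate_mbps: float) -> float:
    bits = size_tb * 1e12 * 8           # decimal terabytes -> bits
    seconds = bits / (rate_mbps * 1e6)  # megabits/s -> bits/s
    return seconds / 3600

for rate in (100, 500, 1000):  # Mbps
    print(f"{rate:>5} Mbps: {transfer_hours(4, rate):6.1f} hours")
```

Even at a sustained gigabit per second, 4 TB takes roughly nine hours of continuous egress; at a more realistic 100 Mbps it approaches four days. Either way, the transfer window far exceeds 40 minutes and should have been visible to egress monitoring.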

**Lapsus$ group** subsequently [claimed responsibility](https://cybernews.com/security/mercor-data-breach-litelllm-supply-chain-attack/) for the breach, posting samples of the allegedly stolen data. Lapsus$ confirmed to us directly that ransom negotiations with Mercor took place and that Mercor paid. Despite that payment, the group continues to distribute samples and is actively selling the full dataset to private bidders.

Mercor [confirmed the security incident](https://techcrunch.com/2026/03/31/mercor-says-it-was-hit-by-cyberattack-tied-to-compromise-of-open-source-litellm-project/) but characterized itself as "one of thousands of companies" affected by the LiteLLM compromise. The company declined to answer whether any customer or contractor data had been accessed, exfiltrated, or misused.

Security researcher [Archie Sengupta noted](https://x.com/archiexzzz/status/2038828829813493834) it was a "very big breach." Y Combinator president [Garry Tan was more direct](https://x.com/garrytan/status/2039554406501531725): *"Incredible amount of SOTA training data now just available to China thanks to @mercor_ai leak. Every major lab. Billions and billions of value and a major national security issue."*

#### What Was Taken - The Full 4TB

The attackers claim to have exfiltrated the following assets. This article analyzes only the first item — the production database. The remaining categories are **not covered** in this analysis but are described here to convey the full scale of exposure.

| Asset | Size | Contents |
|-------|------|----------|
| **Production Database** | **211 GB** | The subject of this article. 250+ Aurora MySQL tables containing candidate profiles (resumes, work history, skills, education), PII (names, emails, phones, addresses, dates of birth, possibly SSNs and government ID documents), interview recordings/transcripts and AI assessment scores, employer/client data (companies, contracts, pricing), and internal user accounts and credentials. |
| **Source Code** | **939 GB** | The complete contents of Mercor's GitHub organization — including the `mercor-monorepo` and all associated repositories. This exposes proprietary AI/ML models for candidate matching and evaluation, the full platform backend and frontend code, API keys, secrets, and internal service credentials embedded in repositories, and all infrastructure-as-code (Terraform/Terragrunt deployment configs, CI/CD pipelines, cloud architecture). |
| **Cloud Storage Buckets** | **~3 TB** | The actual files referenced by the S3 URLs found in the database. Organized into three categories: **Video** — AI interview recordings of candidates (the `ai-interviewer-recordings` and `dailyco-recordings` S3 buckets), containing face and voice biometric data; **GCF-Source** — Google Cloud Function source code, representing additional serverless application logic beyond the main repositories; **FME Review & Verification** — Identity verification documents including passports, driver's licenses, and facial recognition/biometric data used in the Persona KYC flow (the `mercor-background-check-photos`, `certn-api-s3-certn-images`, `certn-api-s3-one-id-images`, and `certn-api-s3-certn-rcmp-documents` buckets). Also included: **every Insightful desktop screenshot** ever captured from contractor machines (the `mercor-insightful-screenshots-production` bucket), and signed legal documents (offer letters, CIIAs, NDAs). |
| **Tailscale VPN Data** | Included | Internal network topology and routing configurations, device certificates and authentication keys, access paths to internal services, dashboards, and admin tools. This is effectively a **map of Mercor's internal network**. |
| **Slack Export** | Included | A full export of Mercor's enterprise Slack workspace (`mercor.enterprise.slack.com`) and potentially client-specific workspaces like `project-mega.slack.com` and `glowstone-mli-rubrics.slack.com`. Slack exports include every message, file upload, DM, and channel history — candid internal discussions, client communications, incident response threads, and operational decisions. |
| **Airtable Export** | Included | Complete exports of all Airtable workspaces used for annotation and project management (6+ distinct workspace IDs found in the database). This exposes task definitions, contractor submissions, quality review data, and client project configurations — effectively the **work product** of Mercor's annotation pipeline. |
| **Google Workspace** | Unknown | It is unclear whether the attackers obtained a full export of Mercor's Google Workspace. Even the small sample analyzed here contains 30+ Google Doc URLs, 10+ shared Drive folder URLs, Google Sheets, and Google Forms. The full database would contain vastly more. If the Workspace was also exfiltrated, it would include all internal documents, email (Gmail), calendar entries, and shared drives. |

#### Why This Matters Beyond Mercor

The database analyzed in this report is merely the **index** — the structured metadata that describes, catalogs, and points to the stolen assets. Think of it as the card catalog for an entire stolen library:

- The **source code** reveals *how the system works* — every algorithm, every API endpoint, every security mechanism, and every hardcoded credential
- The **Slack export** reveals *what was said about it internally* — incident responses, client negotiations, and operational discussions
- The **cloud storage** contains the *actual files* — the screenshots of contractor screens showing client systems, the video interviews showing candidates' faces and voices, the passport scans and government IDs submitted for verification
- The **Airtable export** contains the *work product itself* — the annotation data, task submissions, and quality reviews that Mercor's clients (including frontier AI labs) paid for
- The **Tailscale VPN data** provides *a map to anything that was missed* — the internal network topology that could enable further unauthorized access if credentials haven't been fully rotated

As Garry Tan noted, the AI training data alone — the prompts, responses, evaluations, and RLHF annotations produced by Mercor's contractors for organizations like OpenAI, Meta, and Google DeepMind — represents potentially billions of dollars in value. If this data reaches competitors — whether domestic rivals or labs in other countries — it would allow them to shortcut years of investment. The source code for Mercor's proprietary ranking algorithms (MercorScore, the Bradley-Terry tournament system, the Bayesian fraud model) adds further competitive intelligence value.
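
For context on one of the named algorithms: Bradley-Terry is a published pairwise-comparison model, and a generic sketch (not Mercor's proprietary implementation) shows why leaked tournament scores are directly interpretable by anyone holding the data:

```python
import math

# Generic Bradley-Terry sketch: the probability that candidate i beats
# candidate j given latent skill scores s_i and s_j. This illustrates the
# published model family, not Mercor's proprietary variant.
def win_probability(s_i: float, s_j: float) -> float:
    return math.exp(s_i) / (math.exp(s_i) + math.exp(s_j))

print(round(win_probability(1.0, 1.0), 2))  # 0.5: equal skill, even odds
print(round(win_probability(2.0, 0.0), 2))  # 0.88: stronger candidate favored
```

With the leaked scores, any third party can reproduce the same pairwise rankings over Mercor's contractor pool without re-running a single comparison.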

Together, this represents one of the most comprehensive corporate breaches in recent memory: not a single database table or a handful of credentials, but the **complete digital footprint** — code, data, communications, files, network maps, and work product — of an organization entrusted with some of the most sensitive work in the AI industry.

---

## Why This Breach Is Serious

### Why AI Training Data Is Worth Billions

To understand why this breach is significant and not just another corporate data leak, it helps to understand what AI training data is and why companies like OpenAI, Anthropic, Apple, Amazon, Meta, and Google pay enormous sums to produce it.

Modern AI models like GPT-4, Claude, and Gemini are not programmed — they are trained. The raw intelligence comes from pre-training on internet text, but the ability to follow instructions, reason carefully, and refuse harmful requests comes from a second phase that depends entirely on **human-generated data**. This is the data Mercor's contractors produce. It falls into several categories, all of which are present in the breach:

**Supervised Fine-Tuning (SFT) data** — Humans write high-quality responses to prompts, demonstrating how the model should behave. The `TASKS` and `TASK_VERSIONS` tables across Mercor's 84 Airtable workspaces contain these prompt-response pairs, organized by domain (legal, medicine, finance, coding, etc.). A single SFT dataset covering a specialized domain can cost millions of dollars to produce because it requires experts — lawyers, doctors, engineers — writing at $95/hour for months.

**Reinforcement Learning (RL) preference data** — Humans compare two model outputs and judge which is better. This is the core of RLHF (Reinforcement Learning from Human Feedback), the technique that transformed GPT-3 into ChatGPT. The `API_PREFERENCE` workspaces, `PHASE_1_TASKS` (Amazon), and the `GPT-4 vs Claude Evaluation` project all contain this data — complete with the prompts, both model responses, and the human preference judgment. This data teaches models *what humans actually want*, which is the hardest and most expensive part of AI development.
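
The structure of such a preference record is simple to illustrate. A hypothetical sketch, with field names of our own choosing rather than the leaked schema:

```python
from dataclasses import dataclass

# Hypothetical shape of a single RLHF preference record. Field names are
# illustrative, not taken from the leaked Airtable tables.
@dataclass
class PreferencePair:
    prompt: str
    response_a: str   # output from model A
    response_b: str   # output from model B
    preferred: str    # "a" or "b": the human judgment

pair = PreferencePair(
    prompt="Explain recursion to a beginner.",
    response_a="Recursion is when a function calls itself on a smaller input...",
    response_b="See: recursion.",
    preferred="a",
)
print(pair.preferred)  # a
```

Each leaked row of this kind is one unit of the "what humans actually want" signal; at scale, the rows are the reward model's training set.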

**RL rubrics and evaluation criteria** — Before humans can judge model outputs, someone must define *what good looks like*. The `CRITERIA`, `RUBRIC_VERSIONS`, `QA_SPECS`, and `LLM_CALL_CONFIGURATION` tables across 60+ Airtable workspaces contain these rubrics. They encode the evaluation methodology itself — the scoring frameworks, the edge cases, the quality thresholds. This is proprietary intellectual property that defines how each AI lab measures progress. A competitor with access to these rubrics doesn't just get the training data — they get the *recipe*.

**RL environments and Chain-of-Thought data** — The `AMAZON_LLM_COT_EVALUATION` workspace contains full Chain-of-Thought traces — the step-by-step reasoning that models produce before giving a final answer. The `ACADEMIC_REASONING_SFT` workspace contains a `COT` table explicitly for reasoning supervision. The `Panacea — Consulting RL Envs` project built reinforcement learning environments. This data teaches models *how to think*, not just what to say.

**Benchmark evaluation data** — The `ATHENA_HLE` workspaces (likely Humanity's Last Exam) and `AIME_RUBRICS` (AIME math competition) contain evaluation data for some of the most important AI benchmarks. The `MODEL_RESPONSES` and `AWAITING_REVIEW_METRICS` tables contain graded model outputs against these benchmarks. If this data is used to train future models, it contaminates the benchmarks — the models will appear to perform better than they actually do, undermining the entire AI evaluation ecosystem.

**Pre-release model outputs** — The `APPLE_ENDPOINT_SANDBOX` workspace contains actual outputs from Apple's unreleased Foundation Models (`afm-text-083`, `afm-model-086`). These responses reveal the model's capabilities, limitations, safety alignment, and failure modes before Apple has publicly launched them. For a competitor, this is the equivalent of obtaining a rival's product prototype.

**Why this data is so expensive to reproduce:**

Each data point requires a skilled human — often a domain expert — spending minutes to hours crafting, evaluating, or comparing model outputs. At Mercor's reported average rate of $95/hour across 30,000+ contractors, the annual cost of data production runs into hundreds of millions of dollars. OpenAI, Anthropic, and the other labs have each spent years and billions of dollars building these datasets incrementally, refining their rubrics, and developing their evaluation methodologies.

The breach doesn't just expose *data*. It exposes the **methodology** — the rubrics, the evaluation criteria, the domain taxonomies, the quality control processes, and the scoring frameworks that each lab has spent years developing. Any competitor with access to this material — domestic or foreign — could replicate years of alignment research in months, at a fraction of the cost, by simply adopting the proven evaluation frameworks and training on the stolen preference data.

This is why Garry Tan called it "billions and billions of value." The data in these Airtable workspaces is not supplementary. It is the core competitive advantage of the AI labs that produced it — and it is now for sale.

### The Extent - What Data Was Exposed

The breadth of personally identifiable information (PII) in this breach is staggering. The following inventory documents every category of sensitive data present in the database dump, with specific column names, source tables, and — where available — the format of the exposed data as observed in sample records. This inventory is intended to serve as a factual reference for affected individuals, regulators, and legal counsel.

#### 1. Personal Identity Information

| Data Element | Database Column(s) | Source Table(s) | Format Observed in Sample |
|-------------|-------------------|-----------------|--------------------------|
| **Full legal name** | `name`, `first_name`, `last_name` | `MercorUsers_New`, `MercorUserFinancials` (embedded in Stripe JSON) | `T** O****`, `H****i A****a` (full plaintext names) |
| **Personal email address** | `email` | `MercorUsers_New`, `Candidates`, `LinkedinWarmIntros`, `UserReferences`, `MLExperimentsJobPerformanceReviews` | `e****a1@gmail.com`, `a*****y@gmail.com`, `a*****s@gmail.com` (full plaintext) |
| **Phone number with country code** | `phone` | `MercorUsers_New` | `+4479571****` (full international format) |
| **Date of birth** | `birthday` | `UserMetadata`, `Candidates`, `WorkAuthorization_Audit` | Date field — exact DOB for each contractor |
| **Physical home address** | `physicalLocation`, `residenceCity`, `residenceState`, `residenceZipCode` | `UserMetadata`, `UserLocation`, `Candidates` | City, state, zip code, and country of residence |
| **Profile photograph** | `profilePic` | `MercorUsers_New` | URL to stored profile image |
| **Country of residence** | `residenceCountry`, `countryOfResidence` | `UserLocation`, `UserMetadata`, `Candidates` | `USA`, `United Kingdom` |
| **LinkedIn profile URL** | `linkedinUrl`, `url` | `Candidates`, `LinkedinWarmIntros`, `LinkedinUsers` | `https://www.linkedin.com/in/s**-s**-s******-d*****` (full URL with real name) |

#### 2. Government Identity Documents and Biometrics

| Data Element | Database Column(s) | Source Table(s) | Format Observed in Sample |
|-------------|-------------------|-----------------|--------------------------|
| **Government ID verification outcome** | `governmentIdStatus` | `IDVerificationChecks` | `not_applicable`, `passed`, `failed` |
| **Liveness detection result** | `livenessStatus` | `IDVerificationChecks` | Binary pass/fail — confirms a live facial scan was performed |
| **Facial comparison thumbnail** | `thumbnail_key` (in `providerResponse` JSON) | `IDVerificationChecks` | `intr_AAABnNOWs0wnj7Tmg0hBQpL5_thumbnail.jpg` — a stored facial image key |
| **Persona KYC session token** | `sessionId`, `sessionToken` | `IDVerificationChecks` | `face_baseline_intr_AAABnNOWs0wnj7Tmg0hBQpL5` — replayable session ID |
| **Persona account identifier** | `persona_account_id` (in `providerResponse` JSON) | `IDVerificationChecks` | `act_QMTuQh33A4QU23J8ECPSd32BBKb4` |
| **Address verification status** | `addressStatus` | `IDVerificationChecks` | Confirms whether home address was verified against government records |
| **Verification attempt count** | `attemptNumber`, `maxAttempts` | `IDVerificationChecks` | Tracks repeated identity verification attempts |

*Note: The cloud storage buckets (`mercor-background-check-photos`, `certn-api-s3-one-id-images`, `certn-api-s3-certn-rcmp-documents`) reportedly contain the actual document images — passports, driver's licenses, and RCMP criminal record documents — referenced by these database records.*

#### 3. Financial and Banking Data

| Data Element | Database Column(s) | Source Table(s) | Format Observed in Sample |
|-------------|-------------------|-----------------|--------------------------|
| **Bank name** | `bank_name` (in `accountDetails` JSON) | `MercorUserFinancials` | `BANK OF M*******` (plaintext) |
| **Bank routing number** | `routing_number` (in `accountDetails` JSON) | `MercorUserFinancials` | `000**-***` (full routing number in plaintext) |
| **Bank account last 4 digits** | `last4` (in `accountDetails` JSON) | `MercorUserFinancials` | `07**` |
| **Bank account holder name** | `account_holder_name` (in `accountDetails` JSON) | `MercorUserFinancials` | `H****i A****a` (full legal name on bank account) |
| **Stripe Express account ID** | `providerMethodId`, `stripeAccountId` | `UserPaymentMethods`, `MercorUsers_New` | `acct_1Rc*****` |
| **Full Stripe account JSON** | `accountDetails` | `MercorUserFinancials` | Complete Stripe API response including all fields above plus `charges_enabled`, `payouts_enabled`, `default_currency`, TOS acceptance timestamp, and external account details |
| **Wise transfer & quote IDs** | `wiseTransferId`, `wiseQuoteId` | `WiseDisbursements` | Transfer identifiers for international payments |
| **Payment amounts** | `totalPayableAmount`, `totalBillableAmount`, `totalAmount` | `PaymentLineItems`, `MoneyOut_Audit`, `WiseDisbursements` | Amounts in cents (e.g., `250000` = $2,500.00) |
| **Pay rates** | `payableRate`, `billableRate` | `Jobs`, `Jobs_Audit` | Exact hourly/monthly compensation — both what contractor earns and what client pays |
| **Tax form status** | `tax_form` | `Jobs` | Tax filing status per contractor |
| **Stripe subscription ID** | `stripeSubscriptionId` | `Jobs` | Billing subscription identifier |
| **Payout schedule and currency** | `schedule.interval`, `default_currency` (in JSON) | `MercorUserFinancials` | `daily` payout with `7` day delay, currency `cad` |
| **Payment failure reasons** | `dispatchFailureReason`, `failureReason` | `PaymentLineItems`, `MoneyOut_Audit`, `WiseDisbursements` | Structured failure codes revealing payment issues |

*The `MercorUserFinancials.accountDetails` field is particularly egregious — it stores the **complete Stripe Connect API response** as a JSON blob, which includes the contractor's full legal name, personal email, bank name, routing number, last four digits of the account, account holder name, country, currency, and TOS acceptance details. This is not a reference or a token — it is the raw financial identity of each contractor stored in a single database column.*
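
To make the design failure concrete, a hedged sketch contrasting what the dump shows being persisted with a tokenized alternative. All field values below are fabricated placeholders:

```python
# Illustration of the storage anti-pattern described above. Values are
# fabricated; the point is what each approach persists locally.

# What the dump shows Mercor storing: the raw Stripe Connect API response.
stored_blob = {
    "id": "acct_XXXXXXXX",
    "external_accounts": {"data": [{
        "bank_name": "EXAMPLE BANK",
        "routing_number": "000000000",   # full routing number, plaintext
        "last4": "0000",
        "account_holder_name": "Jane Example",
    }]},
    "payouts_enabled": True,
}

# What a safer design persists: only the opaque account reference. The
# sensitive fields stay with the payment processor and are fetched on
# demand over an authenticated API call.
stored_reference = {"stripe_account_id": stored_blob["id"]}

print("routing_number" in str(stored_blob))       # True: a breach exposes it
print("routing_number" in str(stored_reference))  # False: a breach exposes a token
```

Under the tokenized design, this same database dump would have leaked revocable account IDs instead of raw banking details.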

#### 4. Employment and Performance Records

| Data Element | Database Column(s) | Source Table(s) | Format Observed in Sample |
|-------------|-------------------|-----------------|--------------------------|
| **Employment contract terms** | `payableRate`, `billableRate`, `commitment`, `expected_hours`, `startDate`, `expiresAt` | `Jobs`, `Jobs_Audit` | Full contract terms including pay rate, hours, and duration |
| **Signed offer letters** | `offerLetter` | `Jobs`, `WorkTrial_Audit` | S3 key or base64 encoded signed legal document |
| **Digital signatures** | `signature` | `Jobs`, `WorkTrial_Audit`, `WorkAuthorization_Audit` | Contractor's digital signature on legal agreements |
| **CIIA/NDA agreements** | `ciiaa_direct`, `ciiaaPassthrough` | `Jobs`, `WorkTrial_Audit` | Confidentiality and IP assignment agreements |
| **Terms of work** | `tow` | `Jobs`, `WorkTrial_Audit` | Full terms of engagement |
| **Safety waiver** | `safety_waiver` | `Jobs` | Safety waiver acceptance |
| **Dismissal date and reason** | `dismissalDate`, `dismissalReason`, `dismissalFlag` | `Jobs`, `JobPerformanceReviews_New` | Date of termination and categorized reason |
| **Offboarding reason** | `Offboarding Reason` | `MLExperimentsJobPerformanceReviews` | Plaintext offboarding justification |
| **Performance scores** | `score`, `Quality of Work`, `Engagement`, `performanceScore` | `JobPerformanceReviews_New`, `MLExperimentsJobPerformanceReviews`, `ContractorPerformance_New` | Numeric ratings with text justifications |
| **Performance review text** | `reviewNotes`, `Justification for rating`, `performanceSummary`, `jobPerformanceSummary` | `JobPerformanceReviews_New`, `MLExperimentsJobPerformanceReviews` | Free-text evaluations of individual contractors |
| **Reviewer identity** | `reviewedBy`, `Reviewer` | `JobPerformanceReviews_New`, `MLExperimentsJobPerformanceReviews` | Named Mercor staff who wrote the review (e.g., `A*** K*****`) |
| **Client project name** | `Account`, `Project`, `projectName` | `MLExperimentsJobPerformanceReviews`, `JobPerformanceReviews_New` | `OpenAI`, `Apertus - Elephant` — links contractor performance to specific client |

*The `MLExperimentsJobPerformanceReviews` table is especially damaging: it contains the contractor's **full name**, **email**, **client company name** (e.g., `OpenAI`), **project name**, **reviewer's name**, **quality score**, **engagement score**, **offboarding reason**, and a **free-text justification** — all in a single row. Sample: `A***** D****`, `a*****s@gmail.com`, `OpenAI`, `Apertus - Elephant`, reviewed by `A*** K*****`, rated `4 - Redefines Expectations`.*

#### 5. Criminal Background and Adverse Media Checks

| Data Element | Database Column(s) | Source Table(s) | Format Observed in Sample |
|-------------|-------------------|-----------------|--------------------------|
| **Criminal background check status** | `status` | `BackgroundCheck`, `BackgroundCheck_New` | `clear` / `consider` (whether criminal history was flagged) |
| **Adverse media check status** | `adverseMediaCheckStatus` | `BackgroundCheck` | Whether negative news/media was found about the individual |
| **Background check package** | `package` | `BackgroundCheck` | e.g., `tasker_pro` — defines which checks were run |
| **RCMP criminal record documents** | Referenced via S3 bucket | `certn-api-s3-certn-rcmp-documents-ca-central-1-production` | Royal Canadian Mounted Police criminal record check documents |
| **External candidate ID at Checkr/Certn** | `externalCandidateId`, `backgroundCheckId`, `reportId` | `BackgroundCheck` | Cross-references to external background check providers |
| **Work location for check** | `workLocation` | `BackgroundCheck` | Country/jurisdiction of background check |

#### 6. Work Authorization and Immigration Status

| Data Element | Database Column(s) | Source Table(s) | Format Observed in Sample |
|-------------|-------------------|-----------------|--------------------------|
| **Work authorization status** | `workAuthorizationStatus` | `UserMetadata`, `Candidates`, `WorkAuthorization_Audit` | Whether individual is authorized to work in a given country |
| **Physical country vs. residence country** | `physicalCountry` vs. `residenceCountry` | `UserLocation`, `WorkAuthorization_Audit` | Mismatch between these fields is flagged as fraud — revealing who may be working from an unauthorized location |
| **Location attestation with signature** | `agreedToLocation`, `signature`, `attestedAt` | `WorkAuthorization_Audit` | Signed attestation of physical work location |

*Work authorization status is classified as sensitive personal data under GDPR and many state privacy laws. Its exposure, combined with physical location data and location mismatch fraud flags, could be used to identify individuals working from countries where they lack authorization — creating potential immigration enforcement risk.*
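To make the mismatch flag concrete, here is a minimal sketch of the kind of check the `WorkAuthorization_Audit` columns imply. The field names (`physicalCountry`, `residenceCountry`) mirror the leaked schema; the scoring logic and thresholds are our assumptions, not Mercor's actual code.

```python
def location_mismatch_score(physical_country: str, residence_country: str) -> float:
    """Return 1.0 on a hard country mismatch, 0.0 on a match.

    Missing data is treated as uncertain (0.5) -- an invented convention
    for this sketch, not a leaked rule.
    """
    if not physical_country or not residence_country:
        return 0.5
    return 0.0 if physical_country.upper() == residence_country.upper() else 1.0


def flag_record(record: dict) -> dict:
    """Annotate a user-location record with a mismatch score and fraud flag."""
    score = location_mismatch_score(record.get("physicalCountry", ""),
                                    record.get("residenceCountry", ""))
    return {**record, "locationMismatchScore": score, "flagged": score >= 1.0}


print(flag_record({"physicalCountry": "IN", "residenceCountry": "US"}))
```

Even this trivial comparison is enough to reproduce the "maximum location mismatch score of 1.0" language seen in the leaked fraud reasoning, which is what makes the exposed fields so directly actionable.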

#### 7. Device Fingerprints, Network Identifiers and Surveillance Data

| Data Element | Database Column(s) | Source Table(s) | Format Observed in Sample |
|-------------|-------------------|-----------------|--------------------------|
| **IP address** | `ip` | `InsightfulScreenshots` | `71.194.*.*` (full IPv4 address, geolocatable) |
| **MAC address** | `gateways` | `InsightfulScreenshots` | `["1C:93:7C:**:**:**"]` (unique hardware identifier) |
| **Hardware fingerprint (HWID)** | `hwid` | `InsightfulScreenshots` | `8f9f16f0-1fb7-47e4-a2a1-209838aa5c5e` (persistent device ID) |
| **Computer hostname** | `computer` | `InsightfulScreenshots` | `desktop-ue2kgro` |
| **Operating system & version** | `os`, `osVersion` | `InsightfulScreenshots` | `win32`, `10.0.19045` |
| **Application file path** | `appFilePath` | `InsightfulScreenshots` | `C:\Program Files\Google\Chrome\Application\chrome.exe` |
| **Active window title** | `windowTitle` | `InsightfulScreenshots` | Full window title revealing document/conversation content |
| **Browser URL visited** | `browserUrl` | `InsightfulScreenshots` | Full URL being viewed at time of screenshot |
| **Desktop screenshot image** | `storageUrl` | `InsightfulScreenshots` | Direct S3 URL to actual screenshot image file |
| **Productivity score** | `externalProductivityScore` | `InsightfulScreenshots` | Numeric productivity rating per screenshot interval |
| **Timezone** | `timezone` | `InsightfulScreenshots`, `Timelog` | `America/Chicago` — reveals approximate geographic location |
| **Session duration** | `duration`, `timeStart`, `timeEnd` | `Timelog` | Exact milliseconds worked per session |
| **Pay deduction reason** | `reasonForDeduction`, `appName` | `Deductions` | Why money was subtracted from pay, linked to specific application |

*The combination of IP address + MAC address + HWID creates a **triple device fingerprint** that uniquely identifies not just the person but the specific physical machine they used. Under GDPR, device fingerprints are explicitly classified as personal data. Under CCPA, unique device identifiers constitute personal information.*
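The re-identification risk is easy to demonstrate. The sketch below derives a stable device key from the three leaked identifiers; the column names match `InsightfulScreenshots`, but the hashing scheme is purely illustrative, i.e. anyone holding the dump could do this in a few lines.

```python
import hashlib


def device_fingerprint(ip: str, mac: str, hwid: str) -> str:
    """Derive a stable key for a physical machine from the triple fingerprint.

    The three inputs correspond to the ip, gateways (MAC), and hwid columns
    of InsightfulScreenshots; the SHA-256 scheme is an illustration, not
    anything found in the leak.
    """
    canonical = "|".join(s.strip().lower() for s in (ip, mac, hwid))
    return hashlib.sha256(canonical.encode()).hexdigest()[:16]


# The same machine always maps to the same key, so records from this breach
# can be joined against any other dataset containing the same identifiers.
fp1 = device_fingerprint("71.194.0.1", "1C:93:7C:AA:BB:CC",
                         "8f9f16f0-1fb7-47e4-a2a1-209838aa5c5e")
fp2 = device_fingerprint("71.194.0.1", "1C:93:7C:AA:BB:CC",
                         "8f9f16f0-1fb7-47e4-a2a1-209838aa5c5e")
assert fp1 == fp2
```

The point is not the hash itself but the determinism: HWID and MAC persist across reinstalls and network changes, so the triple survives as a cross-dataset join key long after the IP address rotates.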

#### 8. Fraud Profiling and Algorithmic Decision-Making

| Data Element | Database Column(s) | Source Table(s) | Format Observed in Sample |
|-------------|-------------------|-----------------|--------------------------|
| **Fraud probability score** | `posteriorProbability`, `modelScore` | `FraudEvents`, `FraudSignalAuditLog` | Bayesian probability (0.0–1.0) that individual is fraudulent |
| **Fraud decision** | `currentDecision`, `status` | `FraudStates`, `FraudCheck` | `APPROVE` / `ESCALATE` / `REJECT` — algorithmic verdict on individual |
| **LLM-generated fraud reasoning** | `currentReasoning`, `manual_review_rational` | `FraudStates`, `FraudCheck` | AI-written paragraph explaining why individual was flagged: *"The primary concern is a maximum location mismatch score of 1.0, indicating the user's IP address is entirely inconsistent with their stated profile location..."* |
| **Fraud signal inventory** | `currentKeySignals`, `flag_reasons` | `FraudStates`, `FraudCheck` | `["location_mismatch: 1.0", "email_diff: 0.125", "email_is_pwned: False"]` |
| **HaveIBeenPwned result** | `email_is_pwned` (in signals) | `FraudStates` | Whether contractor's email was found in known data breaches |
| **VPN/Tor detection** | Referenced in fraud signals | `FraudStates`, `FraudSignalAuditLog` | Whether VPN or Tor usage was detected |
| **Cheating detection** | `isCheating`, `cheatingProbability`, `signs` | `CheatingDetection` | Whether individual was flagged for cheating during interviews |
| **Duplicate account detection** | `userIdList` | `DuplicateGroups` | Groups of accounts believed to belong to the same person |

*Automated fraud decisions directly impacted individuals' ability to earn income through the platform. Under GDPR Article 22, individuals have the right not to be subject to decisions based solely on automated processing that produce legal or similarly significant effects. The exposure of the complete fraud reasoning — including the LLM-generated explanations — reveals the inner workings of an automated decision-making system that determined whether people could work and earn money.*
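The leaked columns (`posteriorProbability`, `currentDecision`, `currentKeySignals`) suggest a signal-weighting pipeline ending in a three-way verdict. The sketch below shows one common way such a system is built, via log-odds combination of signals. The signal names come from the leaked sample; the weights, prior, and thresholds are entirely invented for illustration.

```python
import math

# Invented weights and base rate -- NOT Mercor's actual model parameters.
SIGNAL_WEIGHTS = {"location_mismatch": 4.0, "email_diff": 1.5, "email_is_pwned": -0.5}
PRIOR_LOG_ODDS = math.log(0.02 / 0.98)  # assume a 2% base rate of fraud


def posterior_probability(signals: dict) -> float:
    """Combine weighted signals with the prior in log-odds space."""
    log_odds = PRIOR_LOG_ODDS + sum(
        SIGNAL_WEIGHTS.get(name, 0.0) * float(value)
        for name, value in signals.items())
    return 1.0 / (1.0 + math.exp(-log_odds))


def decide(p: float) -> str:
    """Map a posterior onto the APPROVE / ESCALATE / REJECT verdicts
    seen in FraudStates. Thresholds are assumptions."""
    if p < 0.10:
        return "APPROVE"
    return "ESCALATE" if p < 0.60 else "REJECT"


# The signal vector from the leaked sample row:
signals = {"location_mismatch": 1.0, "email_diff": 0.125, "email_is_pwned": 0.0}
p = posterior_probability(signals)
print(decide(p), round(p, 3))
```

Whatever the real model looks like, the breach exposes both halves of it at once: the per-user signal vectors and the final verdicts, which is enough for an adversary to reverse-engineer effective evasion strategies.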

#### 9. Communications and Third-Party PII

| Data Element | Database Column(s) | Source Table(s) | Format Observed in Sample |
|-------------|-------------------|-----------------|--------------------------|
| **In-platform message content** | `content` | `Comms`, `CommsSent` | Full text of messages between contractors, recruiters, and clients |
| **Outreach email content** | `subject`, `content`, `messageTemplate` | `EmailTemplates`, `OffPlatformCampaignSteps` | Full email templates with subject lines |
| **Phone call logs** | Call metadata | `AircallComms` | Aircall VoIP call records |
| **Professional reference PII** | `name`, `email`, `company`, `relationship` | `UserReferences` | Third parties' names, emails, and employers — people who did not sign up for Mercor |
| **LinkedIn profiles of non-users** | `linkedinUrl`, `email` | `LinkedinWarmIntros` | Full LinkedIn URLs and email addresses of people contacted for warm intros |
| **Voucher/endorser PII** | `voucherUserId`, `candidateEmail`, `candidateName`, `candidateLinkedinId` | `CandidateVouches` | Names, emails, and LinkedIn IDs of both vouchers and vouched-for candidates |
| **Recruiter notes** | `noteBody`, `notesForCandidate` | `ListingNotes`, `Candidates` | Candid internal commentary about individuals |

*The exposure of **third-party PII** is particularly significant for legal liability. `UserReferences` contains the names, email addresses, employers, and relationships of professional references — individuals who never created Mercor accounts and never consented to having their data stored in Mercor's production database. `LinkedinWarmIntros` contains LinkedIn URLs and emails of people contacted for recruitment outreach. These third parties had no contractual relationship with Mercor and no opportunity to consent to or opt out of data collection.*

#### 10. PostHog Behavioral Analytics De-Anonymized

| Data Element | Database Column(s) | Source Table(s) | Format Observed in Sample |
|-------------|-------------------|-----------------|--------------------------|
| **User email linked to analytics session** | `userEmail` | `PosthogAnalytics` | Personally identified analytics sessions (defeating anonymization) |
| **Company context** | `company` | `PosthogAnalytics` | Which company the user was associated with during the session |
| **Session timing** | `startTimeUtc`, `endTimeUtc` | `PosthogAnalytics` | Exact session start/end times |
| **Active/inactive time** | `activetime`, `inactivetime` | `PosthogAnalytics` | How long the user was actively engaged vs. idle |
| **Entry URL** | `startUrl` | `PosthogAnalytics` | The URL the user was on when the session started |

*PostHog sessions are typically anonymous or pseudonymous. The `PosthogAnalytics` table explicitly links `userEmail` to session data — effectively de-anonymizing behavioral analytics and creating a personally identifiable record of how each contractor and company user navigated the platform.*
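A minimal reconstruction makes the problem obvious: because `userEmail` sits next to each session row, a single `GROUP BY` turns "anonymous" behavioral analytics into a named activity log. The schema below is a simplified stand-in for the leaked table and the sample rows are invented.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("""CREATE TABLE PosthogAnalytics (
    userEmail TEXT, company TEXT, startTimeUtc TEXT,
    activetime INTEGER, startUrl TEXT)""")
db.executemany(
    "INSERT INTO PosthogAnalytics VALUES (?, ?, ?, ?, ?)",
    [("e****a1@gmail.com", "OpenAI", "2025-07-01T09:00:00Z", 3600, "/tasks"),
     ("e****a1@gmail.com", "OpenAI", "2025-07-02T09:00:00Z", 1800, "/tasks")])

# Per-person behavioral profile -- exactly what pseudonymous analytics
# is supposed to prevent.
profile = db.execute("""SELECT userEmail, COUNT(*), SUM(activetime)
                        FROM PosthogAnalytics
                        GROUP BY userEmail""").fetchone()
print(profile)
```

No joins against external data are needed; the de-anonymization was done at write time, so every row in the dump is already a personally identified record.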

#### Legal Significance of This PII Inventory

Any single category above would trigger breach notification obligations under most privacy laws. The combination creates exposure across multiple overlapping regulatory regimes:

| Regulation | Applicable Data | Key Provisions |
|-----------|----------------|----------------|
| **GDPR** (EU/UK) | All categories — Mercor processes data of EU/UK contractors (sample shows `United Kingdom, Harrow` residence) | Articles 5, 6, 9 (special categories), 13-14 (transparency), 22 (automated decisions), 33-34 (breach notification within 72 hours) |
| **CCPA/CPRA** (California) | Personal identity, financial, employment, device identifiers, behavioral analytics | Right to know, right to delete, right to opt-out of sale/sharing, private right of action for data breaches resulting from failure to maintain reasonable security |
| **Illinois BIPA** | Facial geometry scans from Persona liveness detection, facial comparison thumbnails stored as image keys | $1,000–$5,000 per violation statutory damages, private right of action, no harm requirement |
| **FCRA** (Federal) | Background check results, adverse media checks, fraud decisions used for employment decisions | Requires permissible purpose, adverse action notices, accuracy obligations, private right of action |
| **ECPA / Wiretap Act** | Desktop screenshots capturing communications, browser URLs, window titles | Consent requirements for interception of electronic communications |
| **State Data Breach Notification Laws** (all 50 US states) | Name + financial account number, name + SSN, name + government ID | Mandatory notification to affected individuals, typically within 30-60 days |
| **PIPEDA** (Canada) | All categories — sample shows Canadian contractor (`country: CA`, `BANK OF M*******`, `routing_number: 000**-***`) | Breach notification to Privacy Commissioner and affected individuals |
| **PCI-DSS / SOX** | Bank routing numbers, financial account data, payment card data if present | PCI-DSS contractual obligations for cardholder data handling; SOX internal-control implications where public-company clients' financial data is involved |

The exposed data supports claims for:

- **Negligence** — Failure to implement reasonable security measures for highly sensitive personal data
- **Breach of contract** — Violation of privacy commitments made to contractors in terms of service and privacy policies
- **Breach of fiduciary duty** — Mishandling of financial and identity data entrusted to Mercor as an employment intermediary
- **Violations of specific statutes** — BIPA (facial geometry scans from Persona KYC liveness detection), FCRA (background check data used in employment), CCPA (failure to maintain reasonable security), GDPR (multiple articles)
- **Unjust enrichment** — Mercor profited from collecting and processing this data without adequately protecting it
- **Third-party claims** — Professional references, LinkedIn contacts, and vouching parties whose data was collected without direct consent

### The Scope - Who Is Affected

The breach affects multiple distinct populations, each with different legal standing:

1. **Contractors (Primary Class)** — Every person who signed up, completed an interview, or performed work through Mercor has their full PII exposed: full legal name, personal email, phone number, date of birth, home address, government ID verification status, **bank name and routing number**, employment terms with exact pay rates, performance reviews with dismissal reasons, and in many cases desktop screenshots of their computer screens while working. The `MercorUserFinancials` table alone contains sufficient information for **bank account fraud** — the bank name, routing number, last four digits of account number, account holder name, and country are all stored in plaintext JSON.

2. **Client Companies** — Companies that hired through Mercor have their project names (including `OpenAI`, `Apertus - Elephant`), internal tooling references, billing details, hiring criteria, candidate evaluation notes, Slack workspace URLs, Okta SSO group configurations, and annotation platform URLs exposed. These include some of the most valuable and secretive AI organizations on the planet.

3. **Mercor Employees** — Internal staff are identifiable through the `IacDeploymentRuns` table (GitHub usernames as `actor` fields), `CatfishAuditLog` (Slack user IDs and real names), `DATABASECHANGELOG` (migration author names), `MLExperimentsJobPerformanceReviews` (reviewer names like `A*** K*****`), and the `IAM` table (users with `ghost` role assignments within client companies).

4. **Third Parties Who Never Consented** — Professional references (`UserReferences`) provided their name, email, employer, and relationship to the contractor. LinkedIn contacts (`LinkedinWarmIntros`) had their profile URLs and email addresses stored. Vouching parties (`CandidateVouches`) provided detailed relationship information. These individuals had **no direct contractual relationship** with Mercor, likely received no privacy notice, and had no opportunity to consent to or opt out of data collection. Their data was collected incidentally through the contractors they were associated with.

### The Scale - Mercor Client Ecosystem

What elevates this breach from a typical startup data leak to an industry-wide crisis is *who Mercor's clients are*.

Meta, OpenAI, and Google DeepMind are among Mercor's publicly known clients — as [reported by the Wall Street Journal](https://www.wsj.com/tech/ai/ai-training-data-mercor-offers-ed37d2a1) — but even our small sample reveals direct evidence of engagements with at least **six major technology companies**, plus numerous additional clients identifiable through project codenames and Airtable workspace names.

#### Confirmed Client Engagements Found in the Sample

The sample file contains not just the production database tables but also an `./EXPORTS/` directory with **full Airtable workspace dumps** — organized by client name. These exports contain the actual work product: prompts, model responses, evaluation rubrics, and contractor submissions. The client names appear directly in the directory structure:

| Client | Evidence in Sample | What Was Exposed |
|--------|-------------------|------------------|
| **Apple** | Airtable workspace: `AIRTABLE_APPLE_ENDPOINT_SANDBOX_APP3PG4U42BALES9K` containing tables: `TEXT`, `DEEP_L`, `TEXT_ORCHESTRATOR`, `RUBRIC_AUTO_GEN` | **Apple's proprietary AI model outputs.** The `TEXT` table contains prompt-response pairs from Apple Foundation Models (`afm-text-083`, `afm-model-085`, `afm-model-086`) — Apple Intelligence's internal language models. Sample: model `afm-text-083` responding to user prompts with temperature=0.7, top_p=0.9. The `DEEP_L` table shows translation evaluation (text→Spanish). The `TEXT_ORCHESTRATOR` table shows orchestrator model (`afm-model-086`) being tested. **This is pre-release Apple Intelligence evaluation data.** |
| **Amazon** | Airtable workspace: `AIRTABLE_AMAZON_LLM_COT_EVALUATION___UPDATED_APP0JM1SJ4XOHMAQC` containing tables: `DOMAINS`, `PHASE_1_TASKS`, `PHASE_1_REVIEWS`, `TALENT` | **Amazon's LLM Chain-of-Thought evaluation data.** The `DOMAINS` table shows evaluation categories (`math`, `stem`). The `PHASE_1_TASKS` table contains full model A vs. model B comparison data with complete Chain-of-Thought reasoning traces, final responses, and preference judgments. Tasks are claimed by named Mercor staff (e.g., `n****k@mercor.com`). **This exposes Amazon's internal model evaluation methodology and scoring rubrics.** |
| **OpenAI** | Performance review record: `Account: OpenAI`, `Project: Apertus - Elephant`, reviewed by named staff. Feather platform URL: `feather.openai.com/campaigns/998855ab-...`. Project codename in `Projects_Audit`. | Named contractor (`A***** D****`, `a*****@gmail.com`) rated `4 - Redefines Expectations` on OpenAI project work. Direct URL to OpenAI's internal Feather annotation platform with campaign UUID. |
| **Anthropic** | Airtable workspace: `AIRTABLE_API_PREFERENCE` containing `PROMPTS`, `RESPONSES`, `ROLES`, `DOMAINS` tables. Project: `GPT-4 vs Claude Evaluation` comparing GPT-4 and Claude 3.5 Sonnet. `AgentSandboxes` table shows `agentType: claude`. | LLM preference evaluation data comparing Anthropic's Claude 3.5 Sonnet against GPT-4 across use cases. AI coding agent sandbox sessions running Claude. **Exposes model comparison methodology and evaluation criteria.** |
| **Meta** | Publicly confirmed client per WSJ. Project references in `Projects_Audit` and `ProjectIntegrations`. | Contractor work product, project configurations, Slack workspace integrations. |
| **Google DeepMind** | Publicly confirmed client per WSJ. | Contractor work product and project data in the full database. |

#### Airtable Workspace Inventory

The sample file itself references **25+ distinct Airtable workspaces** exported as part of the breach (a separate directory listing, covered below, shows the full export spans 84). Each workspace name follows a pattern that often includes the client name or project identifier. Beyond the named clients above, the Airtable exports include:

| Airtable Workspace | Domain | Notable Tables |
|-------------------|--------|----------------|
| `APEX_LEGAL` | APEX benchmark - Legal | `TASKS`, `CRITERIA`, `TALENT`, `LLM_CALL_CONFIGURATION` |
| `APEX_INSURANCE` | APEX benchmark - Insurance | `TASKS`, `CRITERIA`, `TALENT`, `IMPORTED_TABLE` |
| `APEX_DATA_SCIENCE` | APEX benchmark - Data science | `TASKS`, `CRITERIA`, `TALENT`, `LLM_CALL_CONFIGURATION` |
| `APEX_MECHANICAL_ENGINEERING` | APEX benchmark - Engineering | `TASKS`, `HELPER`, `FAILURE_ANALYSIS`, `TALENT` |
| `APEX_DIY` | APEX benchmark - DIY/consumer | `TASKS`, `CRITERIA`, `TALENT` |
| `ATHENA_HLE___RUBRICS` | Athena HLE (Humanity's Last Exam) rubrics | `TASKS`, `MODEL_RESPONSES`, `AWAITING_REVIEW_METRICS` |
| `ATHENA_HLE__STEM_` | Athena HLE STEM evaluation | `ATHENA_STEM_V_1`, `QA_SPECS` |
| `BEAR_MEDICINE` | Medical domain tasks | `DISCIPLINES`, `REVIEWER_ASSESSMENT`, `WRITER_DAILY_ACTIVITY`, `BONUS_PAYOUTS`, `PODS` |
| `AIME_RUBRICS` | AIME (math competition) rubrics | `TEAMS`, `TASKS`, `USERS` |
| `ARXIV_Q_A` (multiple versions) | Academic paper Q&A generation | `WORK_QUEUE`, `DOUBLE_BLIND`, `LEAD_AUDIT_QA`, `TESTING_ARXIV_LINKS` |
| `AUTO_REVIEWER` | Automated review system | `SUBMISSIONS`, `LLM_CALL_CONFIGURATIONS`, `PROJECTS` |
| `09_29_CAND_MODEL_EVAL` | Candidate model evaluation (IB1, IB2, CML) | `IB_1`, `IB_2`, `CML`, `CML_DEPRECATED_` |
| `API_PREFERENCE` | API preference evaluation | `PROMPTS`, `RESPONSES`, `ROLES`, `DOMAINS`, `PROMPT_TEMPLATES` |
| `APEX_EXPANSION_WEBSITE_TASKS` | Website-related expansion | `CRITERION`, `FILE`, `TASK` |
| `APEX_EVALS` | General evaluation framework | `EVALUATION_RESULTS` |
| `APEX_V1_REVISION` | Apex V1 revision | `EXPERT`, `RUBRIC`, `CRITERION`, `ROLE` |

The `ATHENA_HLE` workspaces are particularly significant — "HLE" likely refers to **Humanity's Last Exam**, a high-profile AI benchmark designed to test frontier model capabilities. The `MODEL_RESPONSES` table in the rubrics workspace suggests Mercor contractors were grading AI model outputs against this benchmark, and the `AWAITING_REVIEW_METRICS` table indicates an active review pipeline. If this data reached adversarial actors, it could be used to game or contaminate one of the most important AI evaluation benchmarks.

The `BEAR_MEDICINE` workspace reveals medical domain annotation work with `DISCIPLINES`, `REVIEWER_ASSESSMENT`, and `WRITER_DAILY_ACTIVITY` tables — indicating Mercor contractors were creating or evaluating medical AI training data, adding healthcare data to the breach's sensitivity profile.

#### Evidence from Named Projects in the Database

Beyond the Airtable exports, the production database tables contain additional project references:

| Project Codename | Domain | Evidence Source |
|------------------|--------|-----------------|
| **Apertus — Elephant** | AI model evaluation (OpenAI-linked) | `MLExperimentsJobPerformanceReviews`: `Account: OpenAI` |
| **Project Mega** | Large-scale annotation (dedicated Slack workspace: `project-mega.slack.com`) | `ProjectIntegrations`, `ActionsQueue` |
| **Panacea — Consulting RL Envs** | Reinforcement learning environments | `Projects_Audit`, 400+ billable hours |
| **Agentic Code Final QC Audit** | AI code generation quality control (GitHub issue solving) | `TaskDefinitions` |
| **GPT-4 vs Claude Evaluation** | LLM preference ranking (GPT-4 vs Claude 3.5 Sonnet) | Airtable export: `AIRTABLE_AIRTABLE_AI_AGENT_DEMO` |
| **Creative Writing Evals** | Creative content evaluation | `Projects_Audit` |
| **arXiv Q&A** | Academic paper Q&A generation (multiple Airtable versions incl. Snowflake integration) | Airtable exports (3+ copies with dates) |
| **Queensland (litigation)** | Legal domain | `Projects_Audit` |
| **FP&A / Corporate Finance** | Finance domain | `Projects_Audit` |
| **Obsidian** | Human data client (`billingModel: "invoice"`, tagged `humandataclient`) | `Company` |

#### The Magnificent Seven, Frontier AI Labs, and the Competitive Fallout

Mercor is not a niche startup. According to [Big Think](https://bigthink.com/business/inside-the-meteoric-rise-of-mercor/) and [TechCrunch](https://techcrunch.com/2025/10/27/mercor-quintuples-valuation-to-10b-with-350m-series-c/), Mercor has signed deals with **six of the seven "Magnificent Seven"** tech giants — Apple, Microsoft, Alphabet, Amazon, Meta, and Nvidia — plus frontier model developers **OpenAI** and **Anthropic**. The company employs over 30,000 contractors, pays an average rate of $95/hour, and reached a $500 million annual revenue run rate within 17 months of launch. It is valued at $10 billion.

This means the stolen data — the 211GB database, the 939GB of source code, the 3TB of cloud storage, and the 84 Airtable workspaces documented above — contains the operational records, AI training data, and work product for engagements touching **nearly every major AI program in the Western world**.

The small sample analyzed in this report already confirms direct evidence of work for Apple (Foundation Model outputs), Amazon (LLM Chain-of-Thought evaluation), OpenAI (Feather platform, Apertus project), Anthropic (Claude evaluation), and Meta (multimedia annotation templates). The full 211GB database — which we have not seen — would contain the complete records for all six Magnificent Seven clients plus the frontier labs.

The competitive implications are severe:

1. **The training data itself is the prize.** The leaked RLHF annotations, model evaluation data, and preference rankings produced by Mercor's contractors represent billions of dollars in training data investment. This data — now in the hands of Lapsus$ and available to any buyer — could be used by any competitor to accelerate their own model development without incurring the cost of generating it. As Y Combinator president Garry Tan [noted](https://x.com/garrytan/status/2039554406501531725): *"Incredible amount of SOTA training data now just available to China thanks to @mercor_ai leak. Every major lab. Billions and billions of value."*

2. **Apple Foundation Model outputs are in the dump.** The `AIRTABLE_APPLE_ENDPOINT_SANDBOX` workspace contains actual `afm-text-083` and `afm-model-086` model responses — pre-release Apple Intelligence outputs. These provide direct insight into Apple's model capabilities, safety alignment approach, and weaknesses before public release. Any competitor — whether a Silicon Valley rival or a lab in Beijing, London, or Tel Aviv — now has access to Apple's unreleased model behavior.

3. **Amazon's Chain-of-Thought evaluation methodology is exposed.** The `AIRTABLE_AMAZON_LLM_COT_EVALUATION` workspace reveals how Amazon evaluates LLM reasoning quality, including the full prompts, complete Chain-of-Thought traces, and preference rubrics. The methodology itself is as valuable as the data — it reveals what Amazon considers "good reasoning" and how they measure it.

4. **The Anthropic/Claude evaluation data could inform adversarial attacks.** The preference evaluation data comparing Claude 3.5 Sonnet against GPT-4 — including the exact prompts, response pairs, and preference reasoning — could be used to identify weaknesses in Claude's alignment or to train models that specifically exploit those weaknesses.

5. **Mercor's global contractor base spans dozens of jurisdictions.** With 30,000+ contractors across many countries, Mercor's database contains work authorization records, physical location data, and IP-based geolocation. The platform's fraud detection system flags contractors whose physical IP doesn't match their declared residence — meaning the database contains a map of which contractors may be working from undisclosed locations.

Beyond the companies confirmed in the data, multiple sources — including former Mercor employees — claim that Mercor also maintains engagements with **Chinese AI laboratories**, including companies developing frontier models that compete directly with the labs whose training data is now in the breach. If true, this means Mercor was a single point of compromise connecting competing labs on opposite sides of the global AI race, with training data, evaluation methodologies, model outputs, and contractor talent pools for all of them sitting in the same breached infrastructure.

Even setting aside the question of direct Chinese client relationships, the stolen data — RLHF annotations, preference rankings, model evaluation rubrics, and Chain-of-Thought traces produced for OpenAI, Anthropic, Apple, Amazon, Meta, and Google — is now available on the black market. Given that Lapsus$ is actively auctioning the data, this material will reach whoever is willing to pay for it.

The `TaskDefinitions` table also references autograder configurations using `openai/gpt-4.1` and `openai/gpt-5` as scoring models, and task rubrics include constraints like *"LLMs other than ChatGPT are prohibited"* — rules that only make sense when the work product is destined for a specific model vendor's training pipeline.

The scope of client engagements extends far beyond AI companies. The Airtable workspaces alone span legal, insurance, data science, mechanical engineering, medicine, academic research, and mathematics — suggesting Mercor's contractor workforce touches data and systems across a wide range of industries. Any attacker with access to the full dump could enumerate every active client engagement by cross-referencing the `Company`, `Projects_Audit`, `ProjectIntegrations`, and `Listings_New` tables together with the complete Airtable export directory.
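The cross-reference described above takes only a few lines once the dump is loaded into any SQL engine. The sketch below uses the table names from the sample; the column names and the Obsidian project pairing are assumptions for illustration (only the OpenAI / `Apertus - Elephant` link is confirmed in the sample).

```python
import sqlite3

# Toy reconstruction: two of the real table names, invented columns.
db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE Company (id TEXT, name TEXT);
CREATE TABLE Projects_Audit (companyId TEXT, projectName TEXT);
INSERT INTO Company VALUES ('c1', 'OpenAI'), ('c2', 'Obsidian');
INSERT INTO Projects_Audit VALUES ('c1', 'Apertus - Elephant'),
                                  ('c2', 'example-project');
""")

# One join is enough to list every client engagement in the dump.
engagements = db.execute("""
    SELECT c.name, p.projectName
    FROM Company c JOIN Projects_Audit p ON p.companyId = c.id
    ORDER BY c.name""").fetchall()
print(engagements)
```

Scaled to the full 211GB database, the same query pattern yields a complete, current map of who is buying AI training data from whom.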

### The Airtable Export - 84 Workspaces, 1055 Files

A separate directory tree from the breach (`EXPORTS/`) reveals the full structure of the exfiltrated Airtable data. The export contains **84 unique Airtable workspaces** totaling **1,055 JSONL files** — each file containing the complete contents of one Airtable table. This is not a sample. It is the **complete export** of every Airtable base connected to Mercor's Fivetran data pipeline.
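The 84-workspace / 1,055-file inventory can be reproduced mechanically from the directory tree, assuming the layout described here (one directory per Airtable workspace, one JSONL file per table). This is a generic sketch, not a script recovered from the breach.

```python
from collections import Counter
from pathlib import Path


def inventory(exports_root: str) -> Counter:
    """Count JSONL table files per top-level workspace directory."""
    counts = Counter()
    root = Path(exports_root)
    for path in root.rglob("*.jsonl"):
        workspace = path.relative_to(root).parts[0]
        counts[workspace] += 1
    return counts


# Usage against a local copy of the tree:
#   counts = inventory("EXPORTS")
#   print(len(counts), sum(counts.values()))  # workspaces, total table files
```

The same walk is presumably how the exfiltration tooling enumerated its targets: every base connected to the Fivetran pipeline appears as one directory, so completeness of the theft can be verified with a file count.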

The directory structure reveals how Airtable sits at the center of Mercor's operation. It is used as:

1. **The annotation task management system** — Every domain-specific project has its own Airtable base with a standardized schema: `TASKS`, `TASK_VERSIONS`, `CRITERIA`, `DOMAIN`, `SUBDOMAIN`, `TALENT`, `QA_SPECS`, `WORKFLOW`, `LLM_CALL_CONFIGURATION`, `CONTROL_PANEL`, and `FILES`. This is a fully industrialized annotation pipeline.

2. **The work product repository** — Tables like `PHASE_1_TASKS` (Amazon), `TEXT` (Apple), `PROMPTS`/`RESPONSES` (API Preference), and `MODEL_RESPONSES` (Athena HLE) contain the **actual task inputs and outputs** — the prompts sent to AI models, the model responses, and the human evaluations. This is the training data itself.

3. **The talent and compensation ledger** — `TALENT` tables appear in nearly every workspace, tracking which contractors worked on which tasks. `CALCULATED_BONUSES`, `BONUS_PAYOUTS`, `TIMELOG`, and `CLAIMS` tables track compensation. `WRITER_STATS`, `REVIEWER_STATS`, and `WRITER_DAILY_ACTIVITY` tables (in `BEAR_MEDICINE`) track individual productivity.

4. **The QA and audit system** — `QA_SPECS`, `LEAD_AUDIT_QA`, `DOUBLE_BLIND`, and `REVIEWER_ASSESSMENT` tables track quality control processes.

The named workspaces can be organized into categories that reveal the full breadth of Mercor's operations:

**Client-Named Workspaces (Direct Client Evidence):**

| Workspace | Client | Content |
|-----------|--------|---------|
| `APPLE_ENDPOINT_SANDBOX` | **Apple** | Apple Foundation Model outputs (`afm-text-083`, `afm-model-086`), translation testing (`DEEP_L`), orchestrator testing (`TEXT_ORCHESTRATOR`), rubric auto-generation |
| `AMAZON_LLM_COT_EVALUATION` (2 versions) | **Amazon** | LLM Chain-of-Thought evaluation: `DOMAINS`, `PHASE_1_TASKS`, `PHASE_1_REVIEWS`, `MODEL_A_STRENGTHS` |
| `AAIE___META_MULTIMEDIA_TEMPLATE_COMMAND_CENTER` | **Meta** | Meta multimedia annotation template with `OVERALL_META`, `PROJECTS`, `FORMS`, and `TEMPLATE` tables. Workspace name explicitly says "META" and "USE META_X_MULTIMEDIA_SPL_AIRTABLE_TEMPLATE" |
| `API_PREFERENCE` / `API_PREFERENCE_V2` / `API_PREFERENCE__COPY__FOR_BRENDAN` / `API_PREF___KANIX` | **Anthropic/Multi-vendor** | LLM API preference evaluation: `PROMPTS`, `RESPONSES`, `ROLES`, `DOMAINS`, `PROMPT_TEMPLATES`, `QA`. Multiple versions and personal copies for named staff |

**APEX - Mercor's AI Benchmark Suite (Compromised):**

The `APEX_` prefix identifies **Mercor's proprietary suite of AI benchmarks** — domain-specific evaluation frameworks used to measure AI model performance across verticals. Each APEX benchmark has its own Airtable workspace with a standardized schema: `TASKS`, `TASK_VERSIONS`, `CRITERIA`, `DOMAIN`, `SUBDOMAIN`, `QA_SPECS`, `WORKFLOW`, `LLM_CALL_CONFIGURATION`, and `CONTROL_PANEL`. The complete APEX suite spans 15+ domains:

| Workspace | Benchmark Domain | Notable Tables |
|-----------|-----------------|----------------|
| `APEX_LEGAL` | Legal reasoning | Standard APEX schema |
| `APEX_INSURANCE` | Insurance domain | Standard APEX + `IMPORTED_TABLE` |
| `APEX_FINANCE` | Financial services | Standard APEX + `HELPER` |
| `APEX_ACCOUNTING` | Accounting | Standard APEX |
| `APEX_CONSULTING` | Management consulting | Standard APEX + `TEST_HEX_TABLE` |
| `APEX_DATA_SCIENCE` | Data science | Standard APEX |
| `APEX_MECHANICAL_ENGINEERING` | Engineering | Standard APEX + `FAILURE_ANALYSIS`, `HELPER` |
| `APEX_MEDICINE` | Medical/healthcare | Standard APEX |
| `APEX_FOOD` | Food industry | Standard APEX + `DELIVERIES` |
| `APEX_GAMING` | Gaming | Standard APEX |
| `APEX_RETAIL___E_COMMERCE` | Retail & e-commerce | Standard APEX + `DOMAIN_QC` |
| `APEX_SALES___MARKETING` | Sales & marketing | Standard APEX |
| `APEX_SHOPPING_STYLISTS` | Personal shopping | Standard APEX |
| `APEX_DIY` (2 versions) | DIY/consumer | Standard APEX |
| `APEX_WEBSITE_TASKS` / `APEX_EXPANSION_WEBSITE_TASKS` | Web content | `CRITERION`, `FILE`, `TASK` |

The exposure of the complete APEX benchmark suite — including all tasks, criteria, scoring rubrics, and `LLM_CALL_CONFIGURATION` — **renders these benchmarks untrustworthy**. Any AI model trained on the leaked APEX data will appear to perform well on these benchmarks without genuinely possessing the evaluated capabilities. This is benchmark contamination at scale. Unless Mercor rebuilds the entire APEX suite from scratch with new tasks, new criteria, and new evaluation data, every APEX benchmark result produced after this breach is suspect. The `EVALS` workspace — which contains `APEX_RESULTS`, `BOREALIS_RESULTS`, and `LUCIUS_RESULTS` — further confirms that APEX was actively used to evaluate and compare models, making the contamination risk concrete and immediate.

**Other AI Benchmark and Evaluation Workspaces:**

| Workspace | Purpose | Notable Tables |
|-----------|---------|----------------|
| `ATHENA_HLE___RUBRICS` | Humanity's Last Exam rubric grading | `MODEL_RESPONSES`, `AWAITING_REVIEW_METRICS`, `CLAIMS` |
| `ATHENA_HLE__STEM_` (4 versions incl. July 3, 2025 dated copy) | HLE STEM vertical evaluation | `ATHENA_STEM_V_1` |
| `APEX_HLE_BASED_RUBRICS` | HLE-derived rubric system | `CRITERIA`, `LLM_CALL_CONFIGURATION` |
| `APHRODITE__SEARCH_HLE` | Search-based HLE evaluation | HLE search variant |
| `ACADEMIC_REASONING_SFT` | Supervised fine-tuning for academic reasoning | `COT` (Chain-of-Thought), `ROLES`, `TALENTS` |
| `AIME_RUBRICS` | AIME math competition rubrics | `TEAMS`, `USERS`, `TASKS` |
| `EVALS` / `EVALS__COPY_` | General evaluation framework | `APEX_RESULTS`, `BOREALIS_RESULTS`, `LUCIUS_RESULTS`, `_09_04_HLE_RUBRICS` |
| `09_29_CAND_MODEL_EVAL` (5 versions) | Candidate model evaluation (IB1, IB2, CML) | Iterative model comparison datasets |

**Medical Domain Workspaces:**

| Workspace | Purpose | Notable Tables |
|-----------|---------|----------------|
| `BEAR_MEDICINE` | Medical annotation | `DISCIPLINES`, `REVIEWER_ASSESSMENT`, `ASSESSMENT`, `WRITER_DAILY_ACTIVITY`, `REVIEWER_STATS`, `WRITER_STATS`, `ALL_TIME_TOP_5`, `BONUS_PAYOUTS`, `CLAIM_LOCK`, `AHT_STATS`, `ASSESSMENT_ANALYSIS`, `PODS` |
| `BEAR_RADIOLOGISTS` | Radiology-specific annotation | Radiologist-specific tasks |
| `BANKERS` | Financial/banking domain | Banking-specific tasks |

**Aircall Integration (complete phone system export):**

The export also includes a full **Aircall** directory — Mercor's VoIP phone system — containing **27 tables**: `CALL`, `CALL_TRANSCRIPTION`, `CALL_TRANSCRIPTION_CONTENT_UTTERANCE`, `CALL_SENTIMENT`, `CALL_SENTIMENT_PARTICIPANT`, `CALL_SUMMARY`, `CALL_ACTION_ITEM`, `CALL_TAG`, `CALL_TOPIC`, `CONTACT`, `CONTACT_EMAIL`, `CONTACT_NUMBER`, `USERS`, `USER_AVAILABILITY`, and more. This represents the **complete call history** including full transcriptions, sentiment analysis, AI-generated summaries, and contact information for every recruiter phone call.

**What the Airtable Export Means:**

The Airtable export transforms this breach from a database leak into a **complete AI training data theft**. The database tables documented in the rest of this article provide the metadata — who worked on what, when, and how much they were paid. The Airtable export contains the **actual work product**: every prompt, every model response, every human evaluation, every rubric score, every Chain-of-Thought trace, and every preference judgment that Mercor's contractors produced for Apple, Amazon, OpenAI, Anthropic, Meta, and dozens of other clients.

The iterative versioning visible in the workspace names (e.g., `APEX_RUBRICS` with 12+ dated copies from August 7, 2025 through January 23, 2026) reveals that this export captured the **complete historical evolution** of Mercor's benchmark and evaluation pipeline — not just a snapshot, but the full development history of rubrics, task definitions, and evaluation criteria across months of refinement. For the APEX benchmarks specifically, this means every iteration of every benchmark task is now public — an attacker can study how the benchmarks evolved and craft model training data that targets the final versions.

### Customer and Third-Party Platform URLs Found in the Dump

Beyond project codenames, the dump contains **direct URLs to customer platforms, internal tools, and third-party services** — embedded in configuration fields, JSON blobs, onboarding documents, and metadata columns across dozens of tables. An exhaustive search of the files reveals **1,800+ unique URLs**. The most sensitive are catalogued below.

#### Client Annotation and Work Platforms

These are URLs to the actual platforms where Mercor contractors perform work for clients. Each one identifies a specific client engagement and, in many cases, a specific campaign or task within that client's systems:

| URL / Domain | What It Reveals | Source Table |
|-------------|-----------------|--------------|
| `feather.openai.com/campaigns/998855ab-60e7-4aed-9f08-5fccd56fe53e` | **OpenAI's internal Feather annotation platform** — a specific campaign UUID, confirming Mercor contractors work directly inside OpenAI's tooling | `Projects_Audit` (annotationPlatform) |
| `alabaster-studio.com/project/abacus/conversation/7c9facb4-...` | A client project management / collaboration platform — captured as the live browser URL during a monitored work session | `InsightfulScreenshots` (browserUrl) |
| `glowstone-mli-rubrics.slack.com` (channels: `C0994P7BH2N`, `D09969QHV62`) | A **client-specific Slack workspace** for MLI rubric development — likely a client or partner organization's dedicated workspace | `ProjectIntegrations`, `ActionsQueue` |
| `project-mega.slack.com` | A **dedicated Slack workspace** for a single large-scale annotation project | `ProjectIntegrations` |
| 6 distinct **Airtable workspace IDs** (`appX7l7xADlyFD3nL`, `appEzeshKTIKSrvBV`, `app9DBchZKUj2auMZ`, `appCZwMqiIUkP7KIQ`, `appLmn3266lQsaUXK`, `appYFQOZicXUoO2yz`) | Airtable used as an annotation and project management platform — each app ID is a distinct workspace, likely per-client or per-project | `Projects_Audit`, `OnboardingDocument` |
| `ta-01km6j8ztpd4vttvzb7ctgqteh-8080-ms3c95f46vnxcii7cwsi84ago.w.modal.host` | A **Modal.com** serverless deployment — indicating Mercor or a client runs ML model inference on Modal | `AgentSandboxes` or service configuration |

#### Mercor Internal Infrastructure URLs

These URLs expose Mercor's own internal architecture, allowing an attacker to map the entire operational surface:

| URL / Domain | What It Reveals | Source Table |
|-------------|-----------------|--------------|
| `work.mercor.com` | Primary contractor work portal (100+ URLs with job IDs like `/create/job_AAABm...`) | `Comms`, `ActionsQueue` |
| `team.mercor.com` | Company-facing team portal | `Comms`, `EmailTemplates` |
| `talent.docs.mercor.com/how-to/okta-access` | Internal documentation portal — includes onboarding guides for Okta and Insightful setup | `ActionsQueue` |
| `api.mercor.com` | API gateway endpoint | Configuration fields |
| `dev.coil.mercor.com` | Development webhook endpoint for the `coil` microservice | `ProjectIntegrations` |
| `coil.mercor.com` | Production `coil` service endpoint | `ProjectIntegrations` |
| `c-mercor.okta.com` | **Okta SSO instance** — the identity provider for all contractor and staff authentication | `ActionsQueue`, `UserMetadata` |
| `linear.app/mercor` | Mercor's **Linear issue tracker** — exposes internal engineering project management | Configuration metadata |
| `pic-gen.r2.mercor.com` | **Cloudflare R2** image generation service | Asset URLs |
| `ddcd-2601-642-4c01-5a8d-...ngrok-free.app` | An **ngrok development tunnel** — a temporary public URL exposing a local dev server, including the developer's IPv6 address embedded in the subdomain | Webhook configurations |

#### AWS S3 Buckets

Each S3 bucket below contains files that are directly addressable via URL if the bucket permissions are misconfigured. The bucket names alone reveal the categories of stored data:

| S3 Bucket | Contents |
|-----------|----------|
| `mercor-insightful-screenshots-production` | **Every screenshot** captured from contractor desktops during monitored work |
| `mercor-background-check-photos` | Background check identity documents and photographs |
| `ai-interviewer-recordings` | Audio/video recordings of AI-conducted interviews |
| `dailyco-recordings` | Daily.co video call recordings |
| `production-pdx-5557735*****-web-recordings` | Production call recordings (AWS account ID `5557735*****` is embedded in the bucket name) |
| `kite-uhn-brain-injury.s3.ca-central-1.amazonaws.com` | **Medical documents** — bucket name references brain injury records at UHN (University Health Network), a major Canadian hospital system |
| `certn-api-s3-certn-images-ca-central-1-production` | Certn identity verification images |
| `certn-api-s3-certn-rcmp-documents-ca-central-1-production` | **RCMP (Royal Canadian Mounted Police) criminal record check documents** |
| `certn-api-s3-one-id-images-ca-central-1-production` | OneID government identity verification images |

The S3 bucket `kite-uhn-brain-injury` is particularly alarming — it suggests that Mercor or one of its client projects handled protected medical records, and the bucket name alone leaks the nature of the data and the institution involved.

#### Google Workspace Documents

The dump contains direct URLs to **30+ Google Docs**, **2+ Google Sheets**, **2+ Google Forms**, and **10+ shared Google Drive folders** used for project onboarding, task instructions, rubric definitions, and team coordination:

- `docs.google.com/document/d/1111XpiZ9eZvH8X_...` — Onboarding materials
- `docs.google.com/document/d/1770ZnTy0_Yt-U-U7W...` — Project documentation
- `docs.google.com/spreadsheets/d/10LWCzAD1e-J8W7v...` — Tracking spreadsheets
- `docs.google.com/forms/d/e/1FAIpQLSdLnOJ9DZoq...` — Assessment/intake forms
- `drive.google.com/drive/folders/14eFptQgb2FjWoFh...` — 10+ shared project folders

Many of these Google Docs likely remain live and accessible if the sharing permissions are set to "anyone with the link" — a common practice for contractor onboarding materials.

#### Communication and Collaboration Evidence

| Platform | Evidence | Count |
|----------|----------|-------|
| **Slack** | 4 distinct workspaces: `mercor.enterprise.slack.com`, `project-mega.slack.com`, `glowstone-mli-rubrics.slack.com`, `6385b64336a9545.slack.com` | 4 workspaces, 5+ named channels |
| **Google Meet** | Meeting room codes: `deo-ixih-ivt`, `cae-eois-jwn`, `hhr-erjm-svp`, `pmi-ogrs-aap`, `szd-qvcr-hfp`, `zoz-shgt-epy` | 6+ meeting rooms |
| **LinkedIn** | Contractor profile URLs with full names | Multiple profiles |
| **Aircall** | Call recordings via `media-web.aircall.io` and `assets.aircall.io` | Recruiter phone call audio |
| **Ashby HQ** | Job postings at `jobs.ashbyhq.com` and `app.ashbyhq.com` | Hiring platform |
| **Certn** | Background check portals: `mercor.certn.co/hr/applications/{uuid}/`, enrollment at `certn.trustmatic.ws/web-enrolment/` | Identity verification flows |

#### What This URL Inventory Means

An attacker with this data does not need to guess who Mercor's clients are or what systems contractors access. The URLs are **already in the database**. Specifically:

1. **OpenAI's Feather platform URL** with a campaign UUID gives an attacker a direct entry point to probe OpenAI's annotation infrastructure
2. **S3 bucket names** allow targeted enumeration attacks — checking whether buckets are publicly accessible or brute-forcing object keys based on the naming patterns visible in the dump
3. **Google Docs and Drive folders** may still be live and accessible if shared via link — giving an attacker access to project rubrics, onboarding materials, and task instructions
4. **Slack workspace identifiers** enable social engineering against teams working on specific projects
5. **The ngrok tunnel URL** embeds a developer's IPv6 address, adding another vector for targeting Mercor engineering staff
6. **The AWS account ID** (`5557735*****`) embedded in the S3 bucket name enables targeted cloud reconnaissance

### The Screenshot Problem

The most dangerous element of this breach is the **Insightful time-tracking screenshot system** — and the danger compounds with every client Mercor serves, because every platform URL catalogued above and every screenshot in the S3 bucket can be systematically correlated.

Mercor requires contractors to install the **Insightful** (formerly Workpuls) monitoring agent on their computers. This agent captures a screenshot of the contractor's desktop **every few minutes** while they are clocked in. Each screenshot is uploaded to `mercor-insightful-screenshots-production.s3.amazonaws.com` and indexed in the `InsightfulScreenshots` table with rich metadata:

- The **full screenshot image** (stored at a direct, addressable S3 URL — e.g., `https://mercor-insightful-screenshots-production.s3.amazonaws.com/screenshots/[employeeId]/[timestamp]_[uuid].png`)
- The **application open** at the time (`appName`, `appFileName`, `appFilePath`)
- The **window title** (which often contains document names, code file paths, or chat conversations)
- The **browser URL** being visited (which can include `feather.openai.com`, client Airtable workspaces, or any of the platform URLs catalogued above)
- The contractor's **IP address**, **MAC address** (via `gateways`), and **hardware fingerprint** (`hwid`)
- The contractor's **timezone**, **OS version**, and **Insightful agent version**

A sample screenshot record from the dump shows a contractor working in `Google Chrome` on `alabaster-studio.com/project/abacus/conversation/...` — with their IP (`71.194.*.*`), MAC address (`1C:93:7C:64:**:**`), hardware ID, and full filesystem path to Chrome all recorded.

**Here is why this is catastrophic in context:**

The database contains all the ingredients for a systematic visual intelligence operation. An attacker can **join tables** to correlate screenshots with client projects and platform URLs:

1. **Which client project** a contractor was assigned to (from `ProjectIAM` and `Jobs`)
2. **Which annotation platform** that project uses (from `Projects_Audit.annotationPlatform` — e.g., `feather.openai.com`, specific Airtable workspace IDs)
3. **Every screenshot** taken while the contractor worked on that project (from `InsightfulScreenshots` filtered by `contractorId` and `projectId`)
4. **The exact URLs, window titles, and application contents** visible in those screenshots — cross-referenced against the known client platform URLs to confirm which client's systems are shown

This means an attacker doesn't just get a list of Mercor's clients — they get a **visual archive of what contractors saw inside those clients' systems**. If the project was for OpenAI, the screenshots show OpenAI's Feather annotation interface, the prompts being graded, and the evaluation criteria. If the project was for Meta, the screenshots show Meta's internal tooling. If the project involved reinforcement learning environments, the screenshots show the RL training data and reward models.
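The four-step correlation above reduces to a single SQL join. The following is a minimal sketch using an in-memory SQLite stand-in: the table and column names mirror those cited in this analysis (`ProjectIAM`, `Projects_Audit`, `InsightfulScreenshots`), but the exact schema is an assumption and every row is an invented placeholder.

```python
# Sketch of the screenshot-to-client correlation described above.
# Schema is inferred from column names cited in this report; all data is fake.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ProjectIAM (userId TEXT, projectId TEXT);
CREATE TABLE Projects_Audit (projectId TEXT, name TEXT, annotationPlatform TEXT);
CREATE TABLE InsightfulScreenshots (
    contractorId TEXT, projectId TEXT, browserUrl TEXT, s3Key TEXT
);
INSERT INTO ProjectIAM VALUES ('user-0001', 'proj-feather');
INSERT INTO Projects_Audit VALUES
    ('proj-feather', 'EXAMPLE_PROJECT', 'feather.openai.com');
INSERT INTO InsightfulScreenshots VALUES
    ('user-0001', 'proj-feather', 'feather.openai.com/campaigns/...',
     'screenshots/user-0001/0000_example.png');
""")

# One join yields, per contractor: the client platform they were assigned to
# and every screenshot captured while they worked inside it.
rows = conn.execute("""
    SELECT iam.userId, p.annotationPlatform, s.browserUrl, s.s3Key
    FROM ProjectIAM AS iam
    JOIN Projects_Audit AS p ON p.projectId = iam.projectId
    JOIN InsightfulScreenshots AS s
         ON s.contractorId = iam.userId AND s.projectId = iam.projectId
""").fetchall()

for user_id, platform, url, key in rows:
    print(user_id, platform, key)
```

No special tooling is needed: any SQL client pointed at the dump can run the equivalent query.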

The scope of what these screenshots can reveal includes:

- **Proprietary client code and architecture** visible in IDE windows, terminal sessions, and browser tabs
- **Annotation platform interfaces** showing the exact tasks, rubrics, and datasets used to train frontier AI models
- **Internal Slack channels and email threads** visible in background windows — the `ProjectIntegrations` table confirms contractors are added to client Slack workspaces (`project-mega.slack.com`, `mercor.enterprise.slack.com`)
- **Authentication tokens, API keys, and session cookies** potentially visible in browser URL bars, developer tools, or terminal output
- **Unreleased product features, research results, and trade secrets** visible in dashboards or documents
- **Other contractors' work and personal information** if collaborative tools were open on screen

Perhaps most critically, the screenshots create an involuntary record of **contractor misconduct**. As the [Wall Street Journal has reported](https://www.wsj.com/tech/ai/ai-training-data-mercor-offers-ed37d2a1) on the growing concerns around AI training data supply chains, contractors in these roles often have privileged access to sensitive client systems. If any contractor was engaged in **unauthorized data exfiltration** — copying proprietary datasets, screenshotting confidential research, leaking model weights, or otherwise violating their employment agreements — that activity was **captured frame by frame by the monitoring system and is now available to anyone with the dump**.

The monitoring system that was designed to protect Mercor's clients has become a comprehensive, timestamped, visually indexed archive of everything those clients wanted to keep secret.

This creates a **cascading breach**. Mercor's data exposure is not just a breach of Mercor — it is a **proxy breach of every client organization** whose internal systems, annotation platforms, Slack workspaces, and proprietary tooling were visible on a contractor's screen during monitored work sessions. The number of indirectly breached organizations equals the number of clients Mercor has ever served.

---

## Platform Overview

Mercor presents itself publicly as an AI-powered hiring platform. The database tells a more complete story: it is a **full-stack labor marketplace and employment management system** that spans acquisition, vetting, matching, contracting, surveillance, and payment.

The platform operates across at least three distinct product surfaces:

1. **Talent Portal** — Where contractors create profiles, complete interviews, apply to listings, and track their work
2. **Company Portal** — Where client companies post listings, review candidates, manage projects, and receive invoices
3. **Godmode / Internal Admin** — An internal dashboard (`GodmodeCompanies`, `GodmodeArbitraryCells`) used by Mercor staff for operations

The backend is a **microservices architecture** with at least 13 named services: `coil`, `site_fe`, `team_fe`, `work_fe`, `mercor_go`, `mercor_api`, `mercor_api_nginx`, `celery`, `workflow`, `db_trigger_consumer`, `steve`, `woz`, and `payments_temporal_worker`. These are deployed on AWS ECS and managed via Terraform/Terragrunt in the `mercor-monorepo` GitHub repository.

The primary database is **Aurora MySQL** (AWS), with the analytics warehouse being **Snowflake** (evidenced by `dbt` model tables like `DbtFirmSchoolRank` and `DbtSchoolRankings`). Schema migrations are managed by **Liquibase** (evidenced by `DATABASECHANGELOG` and `DATABASECHANGELOGLOCK` tables).

---

## Evidence - The Database Layer by Layer

The following sections present a systematic walk through every domain of the exposed database, with obfuscated sample records drawn directly from the dump. This is the evidence base for the claims made above.

---

## Part I - User and Identity Layer

### The Contractor Profile

At the core of Mercor's data model is the contractor. The `MercorUsers_New` table stores the primary user record, while `MercorUsers_New_backup` appears to be a historical snapshot. A sample (obfuscated):

| Field | Value |
|-------|-------|
| `userId` | `7d10d057-0c11-438a-ace1-9a9c8a50c925` |
| `email` | `e****a1@gmail.com` |
| `name` | `T** O****` |
| `phone` | `+44795718****` |
| `location` | `United Kingdom, Harrow` |
| `createdAt` | `2025-08-30 09:49:20` |
| `lastLogin` | `2025-09-20 09:16:33` |
| `insightfulId` | `wesvspdyd5m3zg2` |
| `stripeAccountId` | `NULL` |
| `isDeleted` | `0` |

The `insightfulId` field is particularly significant — it links this user to their Insightful (formerly Workpuls) monitoring agent, meaning every screenshot taken of this person while working is tied to this identifier.

The `MercorUsers_New` table extends the backup with additional fields: `phoneVerificationStatus`, `phoneVerifiedAt`, `phoneOptIn` — indicating ongoing additions to the user data model. The `authType` field suggests support for multiple authentication providers (Firebase, Google OAuth, email/password).

### Location and Residence Data

`UserLocation` stores both declared residence and physical presence:

| Field | Value |
|-------|-------|
| `residenceCountry` | `USA` |
| `physicalCountry` | `USA` |
| `residenceState` | `NULL` |
| `physicalState` | `NULL` |

The distinction between residence and physical country is central to Mercor's fraud detection logic — a mismatch between declared location and actual IP-derived location is one of the primary fraud signals.

`UserMetadata` enriches the contractor record with:
- `workAuthorizationStatus` — eligibility to work in specific countries
- `birthday` — date of birth
- `physicalLocation` — freeform address field
- `contractorMail` — a Mercor-provisioned email address (e.g., `@mercor.com`)
- `oktaUserId` / `oktaAccountState` — SSO integration
- `maxContracts` — cap on concurrent engagements
- `fraudStatusEnum` — a denormalized fraud verdict

`UserAvailability_Audit` captures declared working hours: `maxWeeklyHours`, `desiredWeeklyHours`, `expectedStartOffset`, and `timezone` — allowing Mercor to understand contractor bandwidth and scheduling preferences.

### Referral and Social Vouching

`CandidateVouches` is a comprehensive social trust mechanism. When a voucher endorses a candidate, they fill out a structured questionnaire:

- How did you know this person? (social platforms, working together, studying together, other)
- Why are they qualified? (skills, education, employer, expertise, other)

Each field has a paired `*Detail` text field. This creates a rich graph of professional and social relationships.

`UserReferences` stores professional references with names, companies, relationships, and contact emails — conventional hiring data now sitting in an exposed database.

`UserState` tracks lifecycle metrics: `resumeUploaded`, `interviewsCompletedCount`, `jobApplicationsCount`, `totalMillisWorked`.

---

## Part II - Identity Verification and Fraud Detection

### The KYC Layer

Mercor uses **Persona** as its identity verification provider. The `IDVerificationChecks` table records each check with:

- `provider`: `persona`
- `source`: e.g., `interview-face-comparison`
- `sessionId`: the Persona interview session token
- `verificationStatus`, `governmentIdStatus`, `livenessStatus`, `addressStatus`
- `fraudDecision`: `NULL` / escalated / approved
- `providerResponse`: full JSON blob from Persona's API

A sample Persona response shows:
```json
{
  "type": "baseline",
  "interview_id": "intr_AAABnNOWs0wnj7Tmg0hBQpL5",
  "thumbnail_key": "intr_AAABnNOWs0wnj7Tmg0hBQpL5_thumbnail.jpg",
  "persona_account_id": "act_QMTuQh33A4QU23J8ECPSd32BBKb4"
}
```

The thumbnail key references a stored facial image from the verification session.

`BackgroundCheck` and `BackgroundCheck_New` record criminal background and adverse media checks (via **Checkr** or **Certn**):

| Field | Example |
|-------|---------|
| `externalCandidateId` | Checkr candidate UUID |
| `workLocation` | `USA` |
| `package` | `tasker_pro` |
| `status` | `clear` / `consider` |
| `adverseMediaCheckStatus` | `clear` |

`ScreeningPackage` defines what checks are bundled per company engagement, including `checkConfig` (JSON with individual check types) and `graceDays` (how many days a contractor has to complete checks before being blocked).
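The `graceDays` gating implied by `ScreeningPackage` is a simple deadline check. A minimal sketch, assuming (from the column names alone) that the grace window is counted from onboarding and that a contractor is blocked once it elapses without all checks clearing:

```python
# Hypothetical grace-period gating inferred from ScreeningPackage.graceDays.
# The actual trigger logic is not present in the dump.
from datetime import date, timedelta

def is_blocked(onboarded_on: date, grace_days: int,
               checks_cleared: bool, today: date) -> bool:
    """Block when the grace period has expired and checks are still pending."""
    deadline = onboarded_on + timedelta(days=grace_days)
    return (not checks_cleared) and today > deadline

start = date(2025, 9, 1)
assert not is_blocked(start, 14, False, date(2025, 9, 10))  # inside grace window
assert is_blocked(start, 14, False, date(2025, 9, 20))      # window expired
assert not is_blocked(start, 14, True, date(2025, 9, 20))   # checks cleared
```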

### The Fraud Pipeline

Mercor operates a multi-stage fraud pipeline that is one of the most sophisticated components in the database. It runs at four stages: **profile**, **interview**, **post-interview**, and **on-project**.

**`FraudStates`** — The current fraud verdict per user, maintained as a state machine:

| Field | Example Value |
|-------|---------------|
| `userId` | `000087ef-2296-445c-b355-9d5e600e0af2` |
| `currentStage` | `profile` |
| `currentDecision` | `ESCALATE` |
| `currentConfidence` | `medium` |
| `currentReasoning` | *"The primary concern is a maximum location mismatch score of 1.0, indicating the user's IP address is entirely inconsistent with their stated profile location..."* |
| `currentKeySignals` | `["location_mismatch: 1.0", "email_diff: 0.125", "email_is_pwned: False"]` |

The reasoning field contains LLM-generated natural language explanations — almost certainly from Vertex AI / Gemini based on the signal schema.

**`FraudCheck`** — The central fraud queue:
- `stage`, `interviewId`, `jobId` — context of the check
- `process_status`, `retryCount` — pipeline execution state
- `flag_reasons`, `automatedReasons`, `manual_review_rational`, `manual_review_signs`
- `assigned_to`, `assigned_on` — human reviewer assignment
- `splReview` — special review flag

**`FraudSignalAuditLog`** — Every individual signal evaluated:
- `signalType` — e.g., `location_mismatch`, `email_is_pwned`, `vpn_detected`
- `modelName` — which ML model produced the score
- `modelScore` — numeric confidence
- `status` — accepted / rejected

**`FraudEvents`** — Bayesian belief updates per event:
- `priorAlpha`, `priorBeta`, `priorProbability`, `priorStatus`
- `posteriorAlpha`, `posteriorBeta`, `posteriorProbability`, `posteriorStatus`
- `evidence` — JSON describing what caused the update

This is a textbook Beta-Binomial Bayesian fraud model — prior beliefs updated with evidence to produce posterior fraud probability estimates.
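Under that reading, the paired prior/posterior columns update as follows — a minimal sketch, assuming each fraud event is folded in as a single Bernoulli observation (how the `evidence` JSON maps to an observation, and any event weighting, are unknown):

```python
# Beta-Binomial update matching the FraudEvents column pairs
# (priorAlpha/priorBeta -> posteriorAlpha/posteriorBeta).
def update_fraud_belief(prior_alpha: float, prior_beta: float,
                        fraud_observed: bool) -> tuple[float, float, float]:
    """Fold one Bernoulli observation into a Beta(alpha, beta) belief.

    Returns (posterior_alpha, posterior_beta, posterior_probability),
    where the probability is the posterior mean alpha / (alpha + beta).
    """
    alpha = prior_alpha + (1 if fraud_observed else 0)
    beta = prior_beta + (0 if fraud_observed else 1)
    return alpha, beta, alpha / (alpha + beta)

# Start from an uninformative Beta(1, 1) prior (mean 0.5), then observe one
# fraud-consistent event followed by two clean ones.
a, b = 1.0, 1.0
a, b, p = update_fraud_belief(a, b, True)    # -> Beta(2, 1), mean ~0.667
a, b, p = update_fraud_belief(a, b, False)   # -> Beta(2, 2), mean 0.5
a, b, p = update_fraud_belief(a, b, False)   # -> Beta(2, 3), mean 0.4
print(round(p, 3))  # 0.4
```

The appeal of this design is that each row in `FraudEvents` is self-contained: the posterior of one event becomes the prior of the next, so the full belief trajectory can be replayed from the table.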

**`ProductionFraudState`** — Final fraud disposition:
- `fraudModality` — type of fraud (identity, time, quality)
- `source` — automated / manual
- `productionModelId` — versioned model that made the call

**`OnProjectFraudWindows`** — Time-based on-project fraud analysis:
- `fraudType`, `flags`, `flagMetadata`, `windowMetadata`, `screenshotMetadata`
- Analyzes patterns within work sessions using screenshot data

**`CheatingDetection`** / **`CheatingDetection_Audit`** — Interview cheating detection:
- `isCheating`, `cheatingProbability`, `signs`
- Tracks whether candidates used external resources during AI interviews

**`QAReviewLog`** — Manual fraud review outcomes:
- `stage`, `signalType`, `decision`, `comments`
- Assigned to specific `reviewerId` for human-in-the-loop adjudication

**`AutoFraudChecks`** — Automated rule-based checks triggered on a schedule or event.

**`DuplicateGroups`** — Groups of user IDs believed to be the same person (`userIdList`), with merge tracking (`mergedIntoGroupId`).

---

## Part III - The Hiring Pipeline

### Listings

`Listings_New` is the job posting table. A Mercor listing is considerably more structured than a typical job board entry:

| Field | Description |
|-------|-------------|
| `title` | Job title |
| `description` | Full job description |
| `rateMin` / `rateMax` | Pay rate range |
| `hoursPerWeek` | Expected commitment |
| `payRateFrequency` | `hourly` / `monthly` |
| `workArrangement` | Remote / hybrid |
| `eligibleLocation` | Which countries can apply |
| `ineligibleResidenceLocation` | Explicitly excluded countries |
| `listingType` | Job category |
| `evaluationCriteria` | JSON rubric for ranking candidates |
| `automatedCommsOn` | Boolean — auto-send rejection emails |
| `automaticRejectionsOn` | Boolean — auto-reject below threshold |
| `timeToAutoReject` | Days until auto-rejection fires |
| `goalNumHires` | Target headcount |
| `referralBoost` | Bonus multiplier for referred candidates |
| `isExploreAlways` | Always appear on public explore page |
| `disableApplications` | Freeze new applications |

`EvaluationCriteria` stores the per-listing scoring rubric used during candidate ranking — each criterion has `shortCriteria`, `type` (hard filter or soft score), `hardFilter` boolean, and `position` for display ordering.
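How such a rubric gets applied can be sketched from the column names alone (`hardFilter`, `type`, `position`): hard-filter criteria are pass/fail eliminations, soft criteria contribute to an aggregate score. The ranking code itself is not in the dump, so the function below is a hypothetical reconstruction.

```python
# Hypothetical application of a hard-filter/soft-score rubric, inferred from
# the EvaluationCriteria columns. Candidate data is invented.
def rank_candidates(candidates, criteria):
    """candidates: list of (name, {criterion: score in 0..1}).
    criteria: list of (criterion, is_hard_filter) in display order."""
    survivors = []
    for name, scores in candidates:
        # Hard filters are pass/fail: any failing score eliminates outright.
        if any(scores.get(c, 0) == 0 for c, hard in criteria if hard):
            continue
        # Soft criteria are averaged into an aggregate score.
        soft = [scores.get(c, 0) for c, hard in criteria if not hard]
        survivors.append((name, sum(soft) / len(soft) if soft else 0.0))
    return sorted(survivors, key=lambda pair: pair[1], reverse=True)

criteria = [("work_authorization", True), ("python", False), ("writing", False)]
candidates = [
    ("cand-1", {"work_authorization": 1, "python": 0.9, "writing": 0.5}),
    ("cand-2", {"work_authorization": 0, "python": 1.0, "writing": 1.0}),
    ("cand-3", {"work_authorization": 1, "python": 0.5, "writing": 0.8}),
]
# cand-2 is eliminated by the hard filter despite perfect soft scores.
print(rank_candidates(candidates, criteria))
```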

`ListingNotes` stores internal recruiter notes per listing — including candid operational commentary. A sample (obfuscated):

> *"33 leads confirmed on sheet by B****** to send offers — @N*** to staff RM for conversion"*

This reveals that Mercor staff are managing candidate pipelines directly, with named individuals responsible for conversions.

### Candidates

`Candidates` / `Candidates_Audit` tracks every application:

| Field | Description |
|-------|-------------|
| `status` | `applied` / `shortlisted` / `offered` / `rejected` |
| `listingStepConfigId` | Which step in the hiring funnel |
| `notesForCandidate` | Recruiter notes visible to candidate |
| `birthday` | Date of birth at application time |
| `physicalLocation` | Where they were when applying |
| `workAuthorizationStatus` | Work eligibility |
| `rejectionReason` | Categorized rejection reason |
| `starred` | Recruiter-starred flag |
| `automaticRejectAt` | Scheduled auto-rejection timestamp |
| `numCommsSent` / `lastCommSentAt` | Outreach tracking |
| `referralId` | Linked referral if any |

`CandidateMatchScores` provides ML-generated match scores:
- `matchScore` — numeric compatibility score
- `contextualSummary` — LLM-generated natural language explanation of why this candidate fits this listing

`MercorScores` stores the tournament-based ranking scores:
- `mScoreRaw` / `mScoreNormalized` — the MercorScore
- `numComparisons` — how many pairwise comparisons informed the score
- `contextualSummary` — LLM narrative on the candidate's standing
- `aggregateFeatureScore` — combined feature vector score

`PairwiseComparisons` stores individual A/B comparisons:
- `winnerResumeId` / `loserResumeId`
- `winnerUserId` / `loserUserId`
- `reasoning` — LLM explanation of why candidate A beat candidate B

This implements a **Bradley-Terry tournament ranking model** — candidates are repeatedly compared in pairs, with each comparison updating relative ranking scores.

`TalentViewSearchUsers` and `SharableTalentViewConfig` enable companies to create saved talent searches and share curated candidate shortlists with colleagues. `SharableTalentViewConfigUsers` adds per-candidate evaluation data including `likeCount`, `dislikeCount`, and free-text `feedback`.

---

## Part IV - Interviews and Assessments

### AI Interview System

Mercor's interview process is AI-conducted and rubric-graded. The `Forms_Audit` table reveals the full interview configuration:

- `items` — JSON array of interview questions
- `evaluationCriteria` — per-question scoring criteria
- `assessmentRubricId` — linked rubric
- `allowCopyPaste` — flag to restrict copy-paste (cheating prevention)
- `allowFormRetakes` / `maxRetakeAttempts` — retry policy
- `prep` — pre-interview preparation materials shown to candidate
- `feedbackConfig` — how/whether to share scoring feedback

`AssessmentRubrics` defines the grading framework:
- `title`, `instructions` — rubric metadata
- `sumScores`, `sumSquareScores`, `countScores` — aggregate statistics across all uses of this rubric
- `passThreshold` — minimum score to pass
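Storing `sumScores`, `sumSquareScores`, and `countScores` is the classic trick for maintaining a rubric's running mean and variance without rescanning every submission — assuming, from the column names, that this is what they are for:

```python
# Running mean and population variance from the three aggregate columns on
# AssessmentRubrics. That the columns serve this purpose is an inference.
def rubric_stats(sum_scores: float, sum_square_scores: float, count: int):
    """Return (mean, population variance) from running aggregates."""
    mean = sum_scores / count
    variance = sum_square_scores / count - mean ** 2
    return mean, variance

# Incrementally fold in scores 4, 5, 3 — three cheap UPDATEs, no table scan.
s = sq = n = 0
for score in (4, 5, 3):
    s += score
    sq += score ** 2
    n += 1

mean, var = rubric_stats(s, sq, n)
print(mean, round(var, 3))  # 4.0 0.667
```

The trade-off of this sum-of-squares formula is numerical: for very large sums it suffers catastrophic cancellation, which streaming algorithms like Welford's avoid.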

`AssessmentRubricItems_Audit` stores individual rubric criteria:
- `criteria` — the evaluation criterion text
- `shortName` — abbreviated label
- `points` — maximum points
- `format` — scoring format (binary, scale, etc.)
- `webSearch` — whether web search context is provided to the grader
- `smartScoring` — whether AI auto-scoring is enabled
- `type` — criterion type

`FormSubmissions` records every interview submission:
- `responseStatus` — `submitted` / `abandoned` / `in_progress`
- `activeTimeSeconds` — actual time spent on the form
- `posthogSessionIds` — linked PostHog analytics session
- `assessmentVersionId` — which version of the assessment was taken

`AssessmentEvalState` tracks the grading pipeline:
- `assessmentType`, `jobType`, `status`
- `retryCount`, `reason`
- `triggerSource`, `triggeredByUserId`
- `durationMs` — how long grading took

`InterviewEvals` stores scored results:
- `communicationScore`, `technicalScore`
- `qaPairScores` — per question-answer pair scores

`InterviewIssues` records reported problems during interviews:
- `issue` — issue type (technical problem, suspected cheating, content issue)
- `source` — who reported it (candidate, system, reviewer)
- `startPosition` / `endPosition` — timestamp positions within the interview
- `reportedBy` — user ID of reporter

`InterviewScores` provides the final aggregate score per interview.

---

## Part V - Work Trials and Onboarding

### Work Trial Contracts

`WorkTrial_Audit` captures the structured trial engagement contract:

| Field | Description |
|-------|-------------|
| `payableAmount` | Amount payable to contractor (cents) |
| `billableAmount` | Amount charged to company (cents) |
| `ciiaaDirect` | Confidentiality agreement (direct) |
| `ciiaaPassthrough` | Confidentiality agreement (passthrough) |
| `tow` | Terms of work |
| `offerLetter` | S3 key or base64 of signed offer letter |
| `signature` | Digital signature string |
| `startDate` / `endDate` | Trial period |
| `projectId` | Linked project |
| `billingAccountId` | Billing target |

The presence of `offerLetter` and `signature` fields indicates that signed legal documents are stored directly in the production database.

`WorkTrialConfig` defines reusable work trial templates per company:
- `emailTemplateSubject` / `emailTemplateBody` — invitation email content
- `emailTemplateSubjectExtension` / `emailTemplateBodyExtension` — offer extension emails
- `interviewIds`, `formIds` — prerequisite steps before trial activation
- `isUnified` — whether trial is shared across listings

### Onboarding Pipeline

`OnboardingState` defines the onboarding funnel steps:

| Field | Example |
|-------|---------|
| `shortName` | `interview_completed` |
| `name` | `Interview Completed` |
| `threshold` | `1` |
| `order` | `0` |

`OnboardingDocument` stores the per-project onboarding materials (links, instructions, or document content) shown to newly hired contractors.

`TierProgress` tracks contractor progress through Mercor's internal tier/certification system — mapping contractors to `planId`, `tierId`, `status`, and `completedAt`.

`PlanAssignments` assigns contractors to specific plans with defined `startDate`, `endDate`, `userHours` allocation, and `tasksCompleted` tracking.

---

## Part VI - Projects and AI Task Management

### Project Structure

`Projects_Audit` reveals the full project configuration:

| Field | Description |
|-------|-------------|
| `companyId` | Client company |
| `name` | Internal project name |
| `screenshotEnabled` | Whether Insightful monitoring is active |
| `userGroupEmail` | Google Group for project members |
| `projectType` | Project category |
| `annotationPlatform` | e.g., Scale AI, Label Studio |
| `annotationPlatformIDs` | External platform project identifiers |
| `ssotLink` | Single source of truth document URL |
| `taskMetricsDatastore` | Where task data is stored |
| `status` | `active` / `archived` |
| `notes` | Internal notes |
| `offerExtendedText` | Custom text in offer letters for this project |

`ProjectIAM` / `ProjectIAM_Audit` defines role-based access: each record maps a `userId` to a `roleId` within a `projectId`, with `status` and `assignedBy` for audit purposes.

`ProjectIntegrations` is particularly revealing — it links each project to:
- `oktaGroupId` / `oktaOwnerGroupId` / `oktaEPMGroupId` — Okta SSO groups
- `googleGroupId` — Google Workspace group
- `slackChannelId` / `workspaceNotificationChannel` — Slack notification channels
- `projectShortId` — human-readable project identifier

This table effectively maps every production project to its Slack channel and Okta group, providing a near-complete picture of Mercor's organizational structure.
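
The organizational map described above can be reconstructed mechanically from `ProjectIntegrations` rows. A minimal sketch, using the field names from the dump but entirely hypothetical sample values:

```python
# Sketch: indexing each project's external groups by its human-readable
# short ID. Field names match ProjectIntegrations; the rows are invented.

sample_rows = [
    {"projectShortId": "proj-a", "slackChannelId": "C01ABC",
     "oktaGroupId": "okta-grp-1", "googleGroupId": "ggrp-1"},
    {"projectShortId": "proj-b", "slackChannelId": "C02DEF",
     "oktaGroupId": "okta-grp-2", "googleGroupId": "ggrp-2"},
]

def org_map(rows):
    """Build a project -> {slack, okta, google} lookup table."""
    return {
        r["projectShortId"]: {
            "slack": r["slackChannelId"],
            "okta": r["oktaGroupId"],
            "google": r["googleGroupId"],
        }
        for r in rows
    }

print(org_map(sample_rows)["proj-a"]["slack"])  # C01ABC
```

A few dozen lines of this kind of post-processing would turn the table into a navigable chart of every client engagement's Slack and SSO footprint.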

### AI Task System

`TaskDefinitions` / `TaskDefinitions_Audit` define the structure of AI training tasks:

| Field | Description |
|-------|-------------|
| `rubric` | JSON grading rubric for this task type |
| `autograder` | Autograding configuration (model, prompts) |
| `task_schema` | JSON Schema defining the task response format |
| `metadata` | Additional task configuration |

`TaskAudits` records individual task submissions for review:

| Field | Description |
|-------|-------------|
| `taskDefinitionId` | Which task definition was used |
| `recordId` | The submitted task record |
| `s3KeyPrefix` | S3 location of submission artifacts |
| `authorId` | Contractor who submitted |
| `auditorId` | Reviewer assigned |
| `status` | `pending` / `approved` / `rejected` |
| `outcome` | Final grading outcome |
| `autoOutcome` | Automated grading result |
| `dispute` | Dispute information if challenged |
| `disputedBy` | Who filed the dispute |

`TaskAssignments` maps tasks to specific jobs and users, with `appliedBy` tracking who made the assignment.

`DeliverableBatches` groups deliverables for invoicing:
- `uid`, `name` — batch identifier
- `invoiceLineItemId` — linked invoice line
- `taskCount`, `status`
- `metadata` — additional batch configuration

`ProjectCustomColumns` adds arbitrary metadata fields to projects, with `sqlQuery` indicating some columns are dynamically computed from database queries. `ProjectCustomColumnValueHistory` tracks changes to these values over time.

`ProjectArchetypes` stores character/role descriptions for specific project types — suggesting Mercor operates AI roleplay or persona-based annotation tasks (`archetypeText`, `elements`).

`ProductivityProjectRules` defines per-project productivity monitoring rules (`rules` JSON, `is_active`, versioned).

---

## Part VII - Time Tracking and Productivity Surveillance

### The Insightful Integration

This is the most invasive component of the exposed data. Mercor uses **Insightful** (formerly Workpuls) — a workforce monitoring agent installed on contractors' computers — to capture screenshots and activity data.

**`InsightfulScreenshots`** — Every screenshot record contains:

| Field | Example (obfuscated) |
|-------|---------------------|
| `storageUrl` | `https://mercor-insightful-screenshots-production.s3.amazonaws.com/screenshots/[id]/[timestamp]_[uuid].png` |
| `storageKey` | `screenshots/wmcw2pdyvenmluy/1767129970810_3b62edd1-...` |
| `screenshotTimestamp` | `1767129970810` (Unix ms) |
| `ip` | `71.194.*.*` |
| `gateways` | `["1C:93:7C:64:**:**"]` (MAC address) |
| `os` | `win32` |
| `osVersion` | `10.0.19045` |
| `agentVersion` | `7.9.3` |
| `computer` | `desktop-ue2kgro` |
| `hwid` | `8f9f16f0-1fb7-47e4-a2a1-209838aa5c5e` |
| `appName` | `Google Chrome` |
| `appFileName` | `chrome.exe` |
| `appFilePath` | `C:\Program Files\Google\Chrome\Application\chrome.exe` |
| `windowTitle` | `Alabaster Studio - Google Chrome` |
| `browserUrl` | `alabaster-studio.com/project/abacus/conversation/[uuid]` |
| `browserSite` | `alabaster-studio.com` |
| `isBlurred` | `0` |
| `externalProductivityScore` | `1` |

Every screenshot includes the contractor's IP address, MAC address (gateway), hardware fingerprint, operating system, the exact application open, the window title, and the URL being visited — all timestamped to the millisecond.

The `storageUrl` field contains direct S3 URLs to screenshot image files. The S3 bucket `mercor-insightful-screenshots-production` is referenced explicitly.

The `hwid` (hardware ID) field provides a persistent device fingerprint that can re-identify a contractor even if they change their email or create a new account.
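
To illustrate why a persistent hardware ID is so powerful, consider grouping screenshot records by `hwid` and flagging devices that appear under more than one account. The rows below are hypothetical; only the field names come from the table:

```python
from collections import defaultdict

# Sketch: a stable hwid ties records from different accounts back to
# one physical machine, defeating account-level pseudonymity.

records = [
    {"hwid": "8f9f16f0-...", "userId": "user_1"},
    {"hwid": "8f9f16f0-...", "userId": "user_2"},  # same device, new account
    {"hwid": "aaaa0000-...", "userId": "user_3"},
]

def accounts_per_device(rows):
    """Collect the set of user IDs seen on each hardware fingerprint."""
    by_hwid = defaultdict(set)
    for r in rows:
        by_hwid[r["hwid"]].add(r["userId"])
    return by_hwid

# Devices used by more than one account are re-identification candidates.
multi = {h: u for h, u in accounts_per_device(records).items() if len(u) > 1}
print(sorted(multi["8f9f16f0-..."]))  # ['user_1', 'user_2']
```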

### Timelog

`Timelog` / `Timelog_Audit` records every work session:

| Field | Description |
|-------|-------------|
| `externalId` | Insightful shift/session ID |
| `externalProjectId` | Insightful project ID |
| `employeeId` | Insightful employee identifier |
| `duration` | Session duration (ms) |
| `timeStart` / `timeEnd` | Session timestamps |
| `timezone` | Contractor's timezone |
| `taskId` / `taskName` | Task being worked on |
| `lineItemUid` | Linked payment line item |
| `adjustmentReason` | If hours were manually adjusted |
| `userId` | Mercor user ID |
| `isCompleted` | Whether session was completed normally |
| `linkFailReason` | If Insightful–Mercor link failed |

### Deductions

`Deductions` records time deducted from pay:

| Field | Description |
|-------|-------------|
| `durationToSubtractMs` | Milliseconds deducted |
| `appName` | Application that triggered the deduction |
| `reasonForDeduction` | Why the time was removed |
| `payoutCycleID` | Which pay cycle was affected |
| `approvedBy` / `approvedAt` | Approval chain |
| `appliedBy` / `appliedAt` | Application record |

This reveals that Mercor can and does subtract pay from contractors based on monitored activity, with an approval workflow for doing so.
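
The arithmetic implied by `durationToSubtractMs` is straightforward. A minimal sketch, assuming an hourly-rate pay model (the rate conversion is our assumption, not confirmed by the schema):

```python
# Sketch: converting deducted milliseconds into a reduced payout.
# The hourly-rate model is assumed for illustration.

MS_PER_HOUR = 3_600_000

def net_pay_cents(worked_ms, rate_cents_per_hour, deductions_ms):
    """Subtract deducted time before converting the remainder to pay."""
    payable_ms = max(worked_ms - sum(deductions_ms), 0)
    return payable_ms * rate_cents_per_hour // MS_PER_HOUR

# 10h logged at $30/h, with two deductions totalling 1.5h:
print(net_pay_cents(10 * MS_PER_HOUR, 3000, [MS_PER_HOUR, MS_PER_HOUR // 2]))
# 25500 -> $255.00 instead of $300.00
```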

---

## Part VIII - Payments and Financial Infrastructure

### Contractor Payment Methods

`UserPaymentMethods` / `UserPaymentMethods_Audit` stores linked payment accounts:

| Field | Example (obfuscated) |
|-------|---------------------|
| `provider` | `stripe` |
| `providerMethodId` | `acct_1R0V****` (Stripe Express account) |
| `methodType` | `express_account` |
| `status` | `onboarded` |
| `countryCode` | `USA` |

US contractors use **Stripe Express** accounts. International contractors use **Wise** (evidenced by `WiseDisbursements`). The `metadata` field includes context like `"context": "backfill"` — indicating historical payment method imports.

`MercorUserFinancials` stores additional financial account details:
- `paymentProvider` — stripe / wise
- `providerIdentifier` — account identifier
- `accountDetails` — JSON with bank routing and account details
- `lastFetchedOn` — when the financial data was last synced

### Payment Line Items

`PaymentLineItems` is the core payment ledger:

| Field | Description |
|-------|-------------|
| `cycleStartTs` / `cycleEndTs` | Pay period boundaries |
| `totalPayableAmount` | Amount owed to contractor (cents) |
| `totalBillableAmount` | Amount charged to company (cents) |
| `status` | `pending` / `paid` / `failed` |
| `jobUid` | Linked job contract |
| `timelogUid` | Linked timelog entry |
| `bonusUid` | Linked bonus if applicable |
| `referralUid` | Linked referral payment |
| `dispatchFailureReason` | Why a payment failed |
| `moneyOutId` | Linked outbound transfer |
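
Because each line item carries both a payable and a billable amount, the ledger directly exposes Mercor's per-cycle gross margin. A sketch with invented rows (field names from the table):

```python
from collections import defaultdict

# Sketch: aggregating PaymentLineItems per pay cycle; the spread between
# billable and payable totals is the platform's take. Rows are invented.

line_items = [
    {"cycleStartTs": "2025-01-01", "totalPayableAmount": 50_000,
     "totalBillableAmount": 65_000, "status": "paid"},
    {"cycleStartTs": "2025-01-01", "totalPayableAmount": 20_000,
     "totalBillableAmount": 26_000, "status": "paid"},
]

def margin_by_cycle(items):
    """Sum payable/billable cents per cycle and derive the margin."""
    out = defaultdict(lambda: {"payable": 0, "billable": 0})
    for li in items:
        c = out[li["cycleStartTs"]]
        c["payable"] += li["totalPayableAmount"]
        c["billable"] += li["totalBillableAmount"]
    return {k: v | {"margin": v["billable"] - v["payable"]}
            for k, v in out.items()}

print(margin_by_cycle(line_items)["2025-01-01"]["margin"])  # 21000 cents
```

This is exactly the kind of derived business intelligence a buyer of the full dataset could compute per client, per contractor, per pay period.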

`PayoutCycles` defines payment periods:
- `cycleStartTs` / `cycleEndTs` — date boundaries
- `status` — `open` / `processing` / `completed`
- `configId` / `configVersion` — which payout configuration governs this cycle

`PayoutConfigs` stores payout rules:
- `type` — payment cadence (daily, weekly, etc.)
- `configuration` — JSON with limits, caps, and routing rules

`MoneyOut_Audit` records every outbound payment:
- `externalAccountId` — contractor's Stripe or Wise account
- `externalTransferId` — transfer ID at the payment provider
- `totalAmount` — disbursed amount
- `paymentMethod` — stripe / wise
- `status` — `pending` / `paid` / `failed`
- `failureReason` — structured failure code

`WiseDisbursements` records international transfers:
- `wiseTransferId`, `wiseQuoteId` — Wise API identifiers
- `amount`, `currency`
- `sequenceNumber` — ordering within a batch
- `status`, `failureReason`

### Company Billing

`BillingAccounts` manages company-side billing:
- Multiple billing accounts per company
- Linked to Stripe customer IDs

`BillingConfigs` defines billing rules:
- `rules` JSON — markup percentages, caps, billing model
- `isLatestVersion` — versioned configuration

`BillingRateCards` defines per-contract rate structures:
- `formulaType` — e.g., `markup_percentage`, `flat_rate`
- `rateRows` — tiered rate table

`InvoiceLineItems` records invoice line entries:
- `rawAmount` / `adjustedAmount` — pre- and post-adjustment amounts
- `taskCount` — number of tasks in this line
- `sowId` — Statement of Work identifier

`RevenueAdjustments` records revenue corrections:
- `amountCentsUsd`, `category`, `reason`
- `revenueRecognitionDate` — accounting date
- `formula`, `labels`, `aggregationFields`
- `attachments`, `invoices` — supporting documentation

### Referral System

`Referrals` / `Referrals_Audit` tracks the contractor referral program:

| Field | Description |
|-------|-------------|
| `referredUserId` / `referringUserId` | The parties |
| `totalEarned` / `totalEarningsPotential` | Referral payment amounts |
| `state` | Current state |
| `paidAt` | When the referral bonus was paid |
| `disputeStatus` | If disputed |
| `isGuaranteedReferral` | Whether guaranteed payment applies |
| `referral_cap` | Maximum referral earnings |
| `isPaymentBlocked` | Payment hold |

`ReferralEligibility` manages the conditions under which referral payments vest — including `onboardingStateId` requirements and `criteriaId` checks.

`GuaranteedReferralQuota` manages quota-based guaranteed referral programs:
- `quotaId`, `referringUserId`, `offPlatformUserId`
- `shortenedLink` — the referral tracking URL
- `weekStart`, `status`

---

## Part IX - Communications and Outreach

### Internal Messaging

`Comms` is the platform messaging table:

| Field | Description |
|-------|-------------|
| `commId` | Message identifier |
| `groupId` | Conversation thread |
| `senderId` / `receiverId` | Parties |
| `content` | Message body |
| `type` | Message type (system, human, etc.) |
| `triggerRef` | What triggered this message |
| `listingReferenceUID` | Associated listing |

`CommsSent` records delivery tracking — when messages were sent, to whom, via what channel.

`EmailTemplates` stores company-specific email templates:
- `subject`, `content` — template body
- `isGlobal` — available to all companies
- `isPersonal` — creator-private
- `tags` — categorization

### External Outreach

`LinkedinWarmIntros` manages LinkedIn outreach campaigns:

| Field | Example (obfuscated) |
|-------|---------------------|
| `linkedinUrl` | `https://www.linkedin.com/in/[username]` |
| `email` | `s**@homeinheritance.com` |
| `referringUserId` | Internal user who made the intro |
| `commEvent` | `WARM_INTRO/OUTREACH` |
| `status` | `sent` |

`OffPlatformCampaigns` and `OffPlatformCampaignSteps` manage multi-step email/LinkedIn outreach sequences:
- `campaignType` — category
- `stepNumber` — sequence position
- `subject` / `messageTemplate` — email content with template variables
- `scheduledAt` — send time
- `outreachedCandidateIds` / `failedCandidateIds` — delivery tracking

`AircallComms` records phone call logs from Mercor's **Aircall** integration — the VoIP platform used for recruiter outbound calls, with call metadata and outcomes.

`FirstTimeInvites` tracks first-contact outreach to candidates:
- `commEvent` — invitation type
- `contentType` / `subject` — message details
- `listingIdCount` — how many listings the invite covers
- `refListingUid` — the originating listing

### Notification Infrastructure

`AutomationTemplates` defines automated workflow triggers:
- `handler` — which service handles this automation
- `sourceType` / `sourceSql` — SQL query that triggers the automation
- `templateBody` — notification content template
- `cron` — scheduled execution
- `autoApprove` — whether human approval is required
- `triggerConfig`, `config` — detailed trigger configuration

`ProjectAutomations` links automation templates to specific projects.

---

## Reverse Engineering - Architecture and Infrastructure

The database schema, table names, column conventions, and embedded metadata allow us to reverse-engineer Mercor's complete technical architecture — from microservice names to third-party integrations — purely from the contents of this dump.

---

## Part X - Infrastructure and DevOps

### Deployment Pipeline

`IacDeploymentRuns` is one of the most operationally sensitive tables:

| Field | Example |
|-------|---------|
| `runType` | `plan` / `apply` |
| `environment` | `staging` / `production` |
| `status` | `success` / `failed` |
| `commitSha` | `784cfd495ddfa3b67187433cb7cb66f2d27ad458` |
| `branch` | `dacq/backend-v2` |
| `actor` | `k*********77` (GitHub username) |
| `githubRunId` | `23520976410` |
| `githubRunUrl` | `https://github.com/Mercor-io/mercor-monorepo/actions/runs/23520976410` |
| `prNumber` | `26645` |
| `stacksAffected` | `["iac/aws/envs/staging"]` |
| `resourcesAdded` | `25` |
| `resourcesChanged` | `2` |
| `resourcesDestroyed` | `6` |
| `summary` | Full Terraform plan output (including deprecation warnings) |
| `durationSeconds` | `134` |

This table exposes:
- The full URL path to the private GitHub monorepo
- Individual engineer GitHub usernames (actors)
- Branch naming conventions
- Terraform variable names and deprecated configurations
- Number of AWS resources created/modified/destroyed per deployment
- Complete Terraform plan output in `summary`

Named Terraform service stacks include: `talent-success-coil`, `referrals-coil`, `iac/aws/envs/staging`.

`ProductionDeployment` records ECS production releases:
- `releaseTag` — semantic version tag
- `buildHash` — Docker image hash
- `deployedAt` — deployment timestamp
- `deploymentIds` — ECS deployment identifiers
- `taskDefinitionArns` — ECS task definition ARNs (include AWS account ID)

`PreprodDeployment` records pre-production (staging) releases:
- `commitSha` — exact commit deployed
- `loadTestPassed` — whether load testing passed
- `releaseOwner` — engineer responsible for the release

`ProductionVersion` maintains a single-row current version pointer:
- `lastVersion`, `lastReleaseTag`, `lastBuildHash`, `updatedAt`

`RollbackExecution` records emergency rollback events:
- `services` — which microservices were rolled back
- Timestamps in the data show rollbacks completing in as little as four seconds

### Database Schema Management

`DATABASECHANGELOG` and `DATABASECHANGELOGLOCK` are **Liquibase** tables that record every schema migration:
- `ID`, `AUTHOR`, `FILENAME`, `DATEEXECUTED`, `MD5SUM`, `DESCRIPTION`, `COMMENTS`, `EXECTYPE`, `LIQUIBASE`

These tables reveal the full history of schema changes, including the names of engineers who authored migrations, the migration scripts' filenames (revealing internal project structure), and the exact timestamp each change was applied to production.

### Agent Sandboxes

`AgentSandboxes` records AI coding agent sessions:

| Field | Description |
|-------|-------------|
| `agentType` | Type of AI agent |
| `status` | `active` / `stopped` / `expired` |
| `backendType` | Compute backend |
| `host` | Sandbox hostname |
| `stopReason` | Why session ended |
| `transcriptRawUrl` | S3 URL of raw conversation transcript |
| `transcriptConsolidatedUrl` | S3 URL of consolidated transcript |
| `acpSessionId` | Agent control protocol session ID |
| `sandboxToken` | Authentication token for sandbox |
| `claimedAt` / `expiresAt` | Session lifecycle timestamps |

The `sandboxToken` field suggests that expired sandbox tokens are persisted in the database — a potential credential exposure if these tokens have long validity windows.
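
Triaging that exposure is trivial for anyone holding the dump: any row whose `expiresAt` is still in the future is a live credential. A sketch with hypothetical rows:

```python
from datetime import datetime, timezone

# Sketch: filtering AgentSandboxes rows for still-valid tokens.
# Token values and dates are invented.

sandboxes = [
    {"sandboxToken": "tok_live",
     "expiresAt": datetime(2099, 1, 1, tzinfo=timezone.utc)},
    {"sandboxToken": "tok_dead",
     "expiresAt": datetime(2020, 1, 1, tzinfo=timezone.utc)},
]

def live_tokens(rows, now=None):
    """Return tokens whose expiry is still in the future."""
    now = now or datetime.now(timezone.utc)
    return [r["sandboxToken"] for r in rows if r["expiresAt"] > now]

print(live_tokens(sandboxes))  # ['tok_live']
```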

---

## Part XI - Analytics and ML Layer

### School and Firm Rankings

`DbtFirmSchoolRank` contains Mercor's proprietary employer prestige scores:

| Field | Example |
|-------|---------|
| `firmId` | `000013c1653de847e38d755ca1c310a5` |
| `firmName` | `75th ranger regiment, u.s. army` |
| `academicField` | `overall` |
| `nProfiles` | `2` |
| `avgSchoolRank` | `90.00` |
| `firmSchoolRank` | `81723` |
| `firmSchoolRankPercentile` | `0.528839` |

This table represents a proprietary ranking of ~154,000 firms by the average educational prestige of their employees — effectively a derived signal used to score resumes. It is computed from the full contractor profile database using an empirical Bayesian model (`ebPriorStrength`, `ebAvgSchoolRank`).
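
The `ebPriorStrength` / `ebAvgSchoolRank` columns point at the textbook empirical Bayes shrinkage estimator: a firm's observed average is pulled toward a global prior in proportion to how few profiles it has. Mercor's exact model is not confirmed; this is a sketch of the standard form:

```python
# Sketch: empirical Bayes shrinkage of a firm's average school rank.
# The standard weighted-blend form; Mercor's actual model is unconfirmed.

def eb_school_rank(firm_avg, n_profiles, prior_mean, prior_strength):
    """Blend the firm's observed mean with the global prior, weighted
    by sample size vs. prior strength (pseudo-count)."""
    return (n_profiles * firm_avg + prior_strength * prior_mean) / (
        n_profiles + prior_strength
    )

# A firm with only 2 profiles averaging 90.0, against a global prior of
# 50.0 with strength 10, gets shrunk heavily toward the prior:
print(round(eb_school_rank(90.0, 2, 50.0, 10), 2))  # 56.67
```

This explains why a two-profile firm like the one in the sample row can carry a high `avgSchoolRank` yet land at an unremarkable percentile.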

`DbtSchoolRankings` ranks individual schools within academic fields:
- `schoolName`, `academicField`, `schoolScore`, `schoolRank`

### Resume Evaluation

`UserResumeEvaluation` stores ML-generated resume scores:

| Field | Description |
|-------|-------------|
| `workExperienceScore` | Quality of work experience |
| `yearsOfWorkExperience` | Parsed years of experience |
| `graduationYear` | Estimated graduation year |
| `mScore` | Composite score |
| `inferredRole` | Predicted job function |
| `educationScore` | Academic credential score |
| `awardScore` | Competitive award weighting |
| `rateAcademicCompetitions` | Participation in academic competitions |
| `rateCompetitiveProgramming` | Competitive programming score |
| `rateHackathonPerformance` | Hackathon achievement score |
| `technicalSkills` | JSON list of detected skills |
| `highestDegree` | Parsed degree level |
| `searchFlag`, `imageFlag`, `transcriptFlag` | Data quality flags |

### Behavioral Analytics

`PosthogAnalytics` links PostHog behavioral sessions to user identity:
- `userEmail` — email address (PII)
- `company` — company context
- `startTimeUtc` / `endTimeUtc` — session boundaries
- `activetime` / `inactivetime` — engagement metrics
- `startUrl` — entry point URL

This directly links PostHog analytics sessions (which include click-level behavior) to user identity — a significant privacy concern as PostHog sessions are typically anonymized.

`SearchAnalytics` records search quality metrics:
- `avg_relevance_score`, `avg_prestige_score`
- `p99_latency_ms` — 99th percentile search latency
- `position_weighted_relevance_score` — ranking quality metric

`ForecastMetrics` stores ML forecast outputs:
- `entity`, `id`, `dt`, `snapshot_dt`
- `modelVersion`, `predictedValue`

These forecasts appear to drive capacity planning, fill-rate forecasting, and contractor supply predictions.

### ML Experiments

`MLExperimentsJobPerformanceReviews` reveals the experimental ML pipeline:

| Column | Description |
|--------|-------------|
| `Date of review` | Review date |
| `Account` | Client company |
| `Project` | Project name |
| `Reviewer` | Reviewer name (Mercor staff) |
| `Work type` | Category of work |
| `Review type` | Type of performance review |
| `Name` / `Email` | Contractor identity |
| `Quality of Work` | Score |
| `Engagement` | Score |
| `Offboarding Reason` | Why contractor was removed |
| `Justification for rating` | Free-text explanation |

This table contains raw performance review data used to train or evaluate ML models for automated contractor performance assessment — with staff names, contractor names, and qualitative judgments all stored in plaintext.

---

## Part XII - Reference Data Layer

### Skills and Certifications

`Skills` is the platform's skills taxonomy:
- `skillId`, `name`, `description`, `type`, `parent` — hierarchical skill tree
- `CertificationPolicy` — linked certification requirement

`CertificationPolicies_Audit` defines the rules for earning certifications:
- `rules` — JSON eligibility criteria
- `isRevokable`, `requiresApproval`
- `icon`, `iconColor`, `showBadge`, `displayText` — display configuration

`Certifications_Audit` records individual earned certifications:
- `evidence` — JSON array of qualifying events (e.g., `{"id": "proj_...", "score": 88.84, "sourceType": "project_hours_worked"}`)
- `status` — `AUTO_AWARDED` / `MANUALLY_AWARDED` / `REVOKED`
- `isCertified` — current state

`SkillCertifications_Audit` and `SkillCertificationsEvidence_Audit` track per-skill certification with scores and source evidence.

`ContractorEndorsements` stores peer endorsements:
- `endorsingUserId` / `endorsedUserId`
- `contents` — endorsement text
- `tags` — skills endorsed
- `sentiment` — positive/negative
- `source` — where the endorsement originated

### Company Data

`Company` stores client company records:
- `name`, `description`, `website`, `logo`
- `billingModel` — pricing structure
- `billingStartDay` / `billingEndDay` — billing cycle configuration
- `brandVisible` — whether company name is shown to candidates
- `universe` — internal company segmentation
- `externalName` — display name if different from legal name

`IAM` / `IAM_Audit` manages company-level role assignments:
- `roleId` — e.g., `ghost` (internal Mercor staff), `admin`, `member`
- `companyId`, `userId_v4`, `status`

A sample IAM record shows a user with `roleId: ghost` being `REMOVED` from a company — revealing that Mercor's internal staff operated within client company contexts under a `ghost` role identity.

### URL Management

`ShortenedUrls` manages the platform's link shortening system:
- Used for referral tracking, campaign links, and onboarding flows

`UrlClicks` records every click on shortened URLs:
- `urlId`, `clickedAt`, `ipHash`, `userId`, `country`

Even with `ipHash` (rather than raw IP), the combination of `userId`, `country`, and timestamp enables click attribution across the contractor population.
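
The hashing buys little because the identifying column sits in the same row. A sketch of the trivial attribution query, with invented rows:

```python
from datetime import datetime

# Sketch: UrlClicks attribution despite hashed IPs — userId, country,
# and timestamp are stored in cleartext alongside ipHash. Rows invented.

clicks = [
    {"urlId": "u1", "userId": "user_42", "country": "GB",
     "clickedAt": datetime(2025, 3, 1, 9, 0)},
    {"urlId": "u1", "userId": "user_42", "country": "GB",
     "clickedAt": datetime(2025, 3, 1, 9, 5)},
]

def clicks_for_user(rows, user_id, start, end):
    """Every click a given user made inside a time window."""
    return [r for r in rows
            if r["userId"] == user_id and start <= r["clickedAt"] <= end]

hits = clicks_for_user(clicks, "user_42",
                       datetime(2025, 3, 1), datetime(2025, 3, 2))
print(len(hits))  # 2
```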

### Catfish Audit Log

`CatfishAuditLog` is a security/compliance tool:

| Field | Description |
|-------|-------------|
| `slackUserId` / `slackUserName` | Mercor staff member |
| `targetEmail` | Person being looked up |
| `platform` | Where the lookup happened |
| `intent` | Declared reason for the lookup |
| `status` | Success/failure |

This table records every time an internal Mercor employee looks up a user's information through an internal tool called "Catfish" — indicating awareness that internal user lookup is an auditable, privacy-sensitive operation. Ironically, this audit log itself now sits in the exposed dataset.

---

## Exposed Surface Area Summary

| Domain | Tables | Sensitivity | Key Exposure |
|--------|--------|-------------|--------------|
| User & Identity | ~10 | Critical | PII (name, email, phone, location) for all contractors |
| Identity Verification & Fraud | ~12 | Critical | Government ID outcomes, facial comparison tokens, fraud verdicts |
| Hiring Pipeline | ~10 | High | Application status, rejection reasons, recruiter notes |
| Interviews & Assessments | ~15 | High | Interview responses, scores, cheating flags, rubrics |
| Work Trials & Onboarding | ~6 | High | Signed legal documents, offer letters, digital signatures |
| Projects & AI Tasks | ~15 | Medium-High | Client company projects, task definitions, AI training data |
| Time Tracking | ~4 | Critical | Per-minute screenshots, browser URLs, MAC addresses, hardware fingerprints |
| Payments & Finance | ~20 | Critical | Stripe account IDs, bank details, exact payment amounts, payout records |
| Communications | ~10 | Medium | Message content, outreach campaigns, phone call logs |
| Infrastructure & DevOps | ~10 | High | Commit SHAs, GitHub URLs, ECS ARNs, Terraform configs, sandbox tokens |
| Analytics & ML | ~10 | Medium | Resume scores, school rankings, PostHog identity links |
| Reference Data | ~15 | Medium | Skills taxonomy, certifications, endorsements, company configurations |

---

## Technical Architecture Reverse-Engineered

The following architecture is entirely reconstructed from database table names, column values, JSON blobs, and embedded metadata. No source code or documentation was available — everything below was inferred from the data alone.

### Backend Services

Based on the database content, Mercor's backend comprises at least 13 microservices:

| Service | Inferred Function |
|---------|-------------------|
| `mercor_api` | Primary API backend |
| `mercor_api_nginx` | API gateway / reverse proxy |
| `mercor_go` | Go-language service (likely performance-critical paths) |
| `coil` | Contractor-facing service (multiple instances by function) |
| `site_fe` | Public website frontend |
| `team_fe` | Company/team portal frontend |
| `work_fe` | Work/task frontend |
| `celery` | Async task queue |
| `workflow` | Workflow orchestration |
| `db_trigger_consumer` | Database event consumer |
| `steve` | Internal tool/admin service |
| `woz` | Fraud/ML pipeline service |
| `payments_temporal_worker` | Temporal.io worker for payments |

### Frontend Portals

- **Public site** — `site_fe`, routes handled by Next.js (inferred from URL patterns)
- **Company portal** — `team_fe` — for clients to manage listings and review candidates
- **Work portal** — `work_fe` — for contractors to find and complete tasks
- **Internal admin** — Godmode interface used by Mercor staff

### Data Infrastructure

- **Primary DB**: Aurora MySQL (AWS)
- **Analytics warehouse**: Snowflake (via Fivetran sync, evidenced by `dbt` models)
- **Schema migrations**: Liquibase
- **Object storage**: S3 (screenshots, offer letters, transcripts)
- **Monitoring**: Insightful agent on contractor machines
- **Auth**: Firebase + Okta (SSO)
- **Analytics**: PostHog
- **Feature flags / A/B testing**: inferred from `configVersion` patterns

### Third-Party Integration

| Provider | Purpose |
|----------|---------|
| **Persona** | Identity verification (KYC) |
| **Stripe** | US contractor payments (Express accounts) |
| **Wise** | International contractor payments |
| **Insightful** | Workforce monitoring / screenshot capture |
| **Okta** | SSO for company and internal access |
| **Aircall** | Recruiter phone calls |
| **PostHog** | Product analytics |
| **Vertex AI / Gemini** | Fraud LLM reasoning |
| **OpenAI (GPT-4.1 / GPT-5)** | AI interview conductor and task autograder |
| **Checkr / Certn** | Background checks |
| **HaveIBeenPwned** | Email breach checking |
| **Customer.io** | Transactional email |
| **GitHub Actions** | CI/CD pipeline |
| **Terraform / Terragrunt** | Infrastructure as code |
| **Temporal.io** | Payments workflow orchestration |
| **Liquibase** | Database schema versioning |

---

## Grounds for Legal Action

The evidence documented throughout this report supports multiple independent legal claims by distinct plaintiff classes. This section consolidates the factual basis for each claim, cross-referencing the specific database tables, column names, and sample values that constitute the evidentiary foundation.

### I. Client Company Claims - Loss of Proprietary AI Training Data and Trade Secrets

This is the most consequential category of legal exposure. Mercor's client companies — Apple, Amazon, OpenAI, Anthropic, Meta, Google, and others — entrusted Mercor with their most valuable competitive assets: the data, methodologies, and evaluation frameworks that define how their AI models are built. All of it is now in criminal hands.

**A. Trade Secret Misappropriation**

Under the federal Defend Trade Secrets Act (DTSA) and state Uniform Trade Secrets Acts, a trade secret is information that derives economic value from not being generally known and is subject to reasonable efforts to maintain its secrecy. The breach exposes client trade secrets across three categories:

**1. AI Training Data as Trade Secrets.** The SFT data, RLHF preference rankings, and Chain-of-Thought traces produced by Mercor's contractors for each client constitute trade secrets. Each dataset represents millions of dollars of investment and years of iterative refinement. The `TASKS`, `TASK_VERSIONS`, and `PHASE_1_TASKS` tables across 84 Airtable workspaces contain the actual work product — prompts, model responses, and human evaluations — that each client paid to produce. Their value derives entirely from secrecy: once a competitor has access to another lab's RLHF preference data, it can train toward equivalent alignment without bearing the cost.

**2. Evaluation Methodology as Trade Secrets.** How an AI lab evaluates its models — what rubrics it uses, what scoring thresholds it applies, how it structures domain-specific benchmarks — is core intellectual property. The `CRITERIA`, `RUBRIC_VERSIONS`, `QA_SPECS`, and `LLM_CALL_CONFIGURATION` tables across 60+ workspaces expose this methodology in full. Amazon's Chain-of-Thought evaluation framework, Apple's endpoint testing rubrics, and the cross-model preference evaluation criteria are all now available to any buyer. This is not just data — it is the *recipe* for how each lab measures AI progress.

**3. Pre-Release Model Capabilities as Trade Secrets.** The `APPLE_ENDPOINT_SANDBOX` workspace contains actual outputs from Apple's unreleased Foundation Models (`afm-text-083`, `afm-model-086`). These responses reveal the model's capabilities, safety alignment, and failure modes before public launch. Under trade secret law, the unauthorized disclosure of pre-release product capabilities is a textbook misappropriation.

*Key legal point:* Trade secret protection requires "reasonable efforts to maintain secrecy." Mercor's storage of this data — in plaintext, behind a flat network with no segmentation, accessible via a single VPN hop — likely fails this standard. Clients may argue that they maintained secrecy on their end but that Mercor's negligent security destroyed the trade secret status of the data. This creates a damages claim for the full economic value of the lost trade secrets.

**B. Breach of Confidentiality and NDA Violations**

The database confirms confidentiality agreements governed the relationship. The `Jobs` table contains `ciiaa_direct`, `ciiaaPassthrough`, `confidentiality`, and `tow` (terms of work) fields. The `WorkTrial_Audit` table contains signed CIIAs and offer letters. The exposure of:

- **Apple**: Foundation Model outputs (`afm-text-083`, `afm-model-086`), endpoint sandbox testing data, translation evaluation, orchestrator configurations
- **Amazon**: Complete LLM Chain-of-Thought evaluation framework with full reasoning traces, preference judgments, domain taxonomy (math, STEM), and named Mercor staff assignments
- **OpenAI**: Feather platform campaign UUIDs, `Apertus - Elephant` project data, contractor performance reviews naming OpenAI as the account
- **Meta**: Multimedia annotation template command center (`AAIE___META_MULTIMEDIA_TEMPLATE`), project configurations
- **Anthropic**: Claude 3.5 Sonnet evaluation data compared against GPT-4, preference reasoning, agent sandbox configurations running Claude

constitutes a breach of these confidentiality obligations. Each client has a separate breach of contract claim with damages measured by the economic harm caused by the disclosure.

**C. Loss of Competitive Advantage**

The breach does not just expose data; it destroys competitive moats. If a Chinese AI lab purchases the stolen data, it acquires:
- The exact prompts and rubrics that OpenAI uses to fine-tune its models
- The evaluation methodology that Amazon uses to measure Chain-of-Thought reasoning quality
- Apple's pre-release model outputs revealing capabilities and weaknesses
- The preference data that teaches Anthropic's Claude how to respond to contentious queries

Each client's AI training pipeline is now potentially replicable by any competitor with access to the stolen Airtable workspaces. The damages extend beyond the cost of producing the data — they include the competitive harm of having that data available to rivals.

**D. Secondary Breach via Desktop Screenshots**

The `InsightfulScreenshots` table creates a mechanism for **visual intelligence extraction** from client systems. Screenshots captured during monitored work sessions show whatever was on the contractor's screen — client internal dashboards, Slack conversations, code repositories, proprietary tools, unreleased product interfaces. Mercor stored these screenshots on S3 with metadata linking each image to the specific `projectId`. An attacker can systematically extract visual intelligence about every client's internal systems by filtering screenshots by project. This constitutes a secondary breach of each client's confidential systems, for which Mercor bears direct liability.
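The pivot described above requires nothing more than a metadata filter. A minimal sketch of the grouping step, assuming each `InsightfulScreenshots` row carries a `projectId` and an S3 object key; the record shape, project IDs, and key format below are illustrative assumptions, not values from the leak:

```python
from collections import defaultdict

def group_screenshots_by_project(records):
    """Index screenshot records by projectId, yielding a per-client view."""
    by_project = defaultdict(list)
    for rec in records:
        by_project[rec["projectId"]].append(rec["s3Key"])
    return dict(by_project)

# Hypothetical records mimicking the table's projectId -> S3 linkage
sample = [
    {"projectId": "proj_apple_sandbox", "s3Key": "screens/2025/01/a1.png"},
    {"projectId": "proj_amazon_cot",    "s3Key": "screens/2025/01/b2.png"},
    {"projectId": "proj_apple_sandbox", "s3Key": "screens/2025/02/a9.png"},
]
grouped = group_screenshots_by_project(sample)
# grouped["proj_apple_sandbox"] now lists every Apple-linked screenshot key
```

The point is that the `projectId` metadata turns an undifferentiated screenshot dump into a per-client visual intelligence archive with three lines of grouping logic.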

**E. APEX Benchmark Contamination**

Mercor's proprietary APEX benchmark suite — covering 15+ domains from legal to medicine to mechanical engineering — is now compromised. All tasks, criteria, scoring rubrics, and evaluation data are exposed. Any client that relied on APEX benchmark results for vendor selection, model comparison, or procurement decisions now faces the risk that those results are unreliable. Models trained on the leaked APEX data will appear to perform well without genuinely possessing the evaluated capabilities. Clients may claim damages for decisions made in reliance on benchmarks that are now contaminated.

### II. Contractor Class Claims

**A. Financial Data Exposure and Identity Theft Risk**

The `MercorUserFinancials` table stores the **complete Stripe Connect API response** as plaintext JSON — including bank name, routing number, last four digits, account holder name, email, and country. This is sufficient for bank account fraud. Every contractor whose financial data is in this table faces ongoing risk of unauthorized transactions, account takeover, and identity theft. The `UserPaymentMethods` table adds Stripe Express account IDs and Wise transfer identifiers. The exposure of this data — unencrypted, untokenized, in a database accessible via a single VPN hop — constitutes negligence per se under multiple state data breach statutes.

**B. Surveillance Overreach and Privacy Violations**

The Insightful monitoring system captured far more than work activity:
- Full desktop screenshots every few minutes — not just the work application, but everything on screen
- Browser URLs for all tabs, including personal browsing
- IP addresses and MAC addresses from personal home networks
- Hardware fingerprints of personal devices

Contractors used personal computers for Mercor work (the data shows personal Chrome installations, personal hostnames like `desktop-ue2kgro`). The monitoring system captured personal activity on personal devices — personal emails, banking sessions, medical information, or other private content visible in background windows. All of this is now in criminal hands. Under ECPA and state wiretap laws, the capture of third-party communications visible in screenshots (Slack messages, emails, video calls) may constitute unlawful interception.

**C. Wrongful Termination via Automated Fraud Decisions**

The database reveals that automated fraud decisions directly determined whether contractors could earn a living:
- `FraudStates.currentDecision` = `REJECT` → contractor blocked from the platform
- `FraudStates.currentReasoning` contains LLM-generated explanations that were almost certainly never disclosed to affected contractors
- `ProductionFraudState.status` → final production fraud verdict with no apparent appeal mechanism

Under FCRA, if Mercor used these automated fraud scores or background check results (`BackgroundCheck.status`) to deny, suspend, or terminate contractor engagements without providing required adverse action notices, each instance is a separate violation. Under GDPR Article 22, EU/UK contractors have the right not to be subject to decisions based solely on automated processing.

**D. Wage-Related Claims**

The `Deductions` table records pay subtractions based on monitored activity — exact milliseconds deducted, which application triggered the deduction, and who approved it. If deductions were applied using data from the now-compromised monitoring system, or if the breach reveals inconsistent application, contractors have wage theft claims in addition to privacy claims.

### III. Statutory Violations

**A. CCPA/CPRA** — Private right of action for data breaches resulting from failure to maintain reasonable security (Cal. Civ. Code § 1798.150). Plaintext bank routing numbers, unencrypted PII, and excessive data collection constitute failure to implement reasonable security. Statutory damages: $100–$750 per consumer per incident.

**B. GDPR** — EU/UK contractors confirmed in the data (sample: `United Kingdom, Harrow`). Violations include data minimization failure (Article 5(1)(c)), integrity/confidentiality failure (Article 5(1)(f)), automated decision-making without safeguards (Article 22), and breach notification delays (Article 33). Fines up to €20 million or 4% of annual global turnover.

**C. Illinois BIPA** — Persona's liveness detection requires a scan of face geometry, explicitly listed as a biometric identifier (740 ILCS 14/10). The `IDVerificationChecks` table confirms facial geometry scans were captured (`livenessStatus`), facial comparison performed (`interview-face-comparison`), and thumbnail images stored (`thumbnail_key`). Statutory damages: $1,000–$5,000 per violation, no harm requirement. *(Note: MAC addresses and hardware fingerprints are not biometric identifiers under BIPA.)*

**D. FCRA** — Background check results and automated fraud scores used in employment decisions without required adverse action notices. Per-violation damages.

**E. ECPA / State Wiretap Laws** — Desktop screenshots capturing third-party communications visible on screen. Per-interception damages.

**F. PIPEDA** — Canadian contractors confirmed (sample: `country: CA`, `BANK OF M*******`). Breach notification to Privacy Commissioner and affected individuals required.

### IV. Negligence - Security Failures Evidenced in the Data

The database structure itself constitutes evidence of systemic negligence:

- **Plaintext financial data**: Complete Stripe API responses with bank names, routing numbers, and account holder names stored as unencrypted JSON
- **No field-level encryption**: Names, emails, phones, DOBs, and addresses readable as-is in the export
- **Excessive data collection**: Full Stripe API responses when only an account ID was needed; desktop screenshots capturing vastly more than needed to verify work hours; HaveIBeenPwned results stored as fraud signals; Persona KYC session tokens persisted indefinitely
- **Infrastructure failures**: ngrok dev tunnels with developer IPv6 in production config; AWS account ID embedded in S3 bucket names; sandbox tokens persisted after session expiry; GitHub Actions URLs exposing the private monorepo
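For contrast, the excessive-collection failure has a well-known remedy: persist only the fields the payout flow actually needs and discard the rest of the provider response. A minimal data-minimization sketch; the field names mirror Stripe's bank-account object, but the sample response is fabricated and this is not Mercor's code:

```python
# Fields sufficient to reference the account through the payment provider's
# API later; everything else (routing number, holder name) stays unstored.
ALLOWED_FIELDS = {"id", "last4"}

def minimize(full_response: dict) -> dict:
    """Drop everything except the fields the payout flow actually needs."""
    return {k: v for k, v in full_response.items() if k in ALLOWED_FIELDS}

# Fabricated Stripe-style response (routing number is Stripe's test value)
full_response = {
    "id": "ba_1Example",
    "bank_name": "EXAMPLE BANK",
    "routing_number": "110000000",
    "last4": "6789",
    "account_holder_name": "Jane Doe",
    "country": "US",
}
stored = minimize(full_response)
# stored retains only {"id": ..., "last4": ...}; a breach of `stored`
# exposes an opaque reference, not fraud-ready bank details
```

Had `MercorUserFinancials` stored the minimized record instead of the raw JSON, the financial-fraud exposure described in Section II.A would largely disappear.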

### V. Third-Party Claims

Individuals who never created Mercor accounts have their data exposed:
- `UserReferences`: Names, emails, employers, and relationships of professional references
- `LinkedinWarmIntros`: LinkedIn profile URLs and email addresses of people contacted for outreach
- `CandidateVouches`: Relationship details provided by vouchers

These individuals never consented to data collection and likely never received a privacy notice. Under GDPR Article 14, Mercor was required to notify them within one month. The breach exposes them to targeted social engineering using their real relationship data.

### Summary - Combined Legal Exposure

| Claim | Plaintiff Class | Key Evidence |
|-------|----------------|--------------|
| **Trade secret misappropriation** | **Apple, Amazon, OpenAI, Anthropic, Meta, Google** | **Pre-release model outputs, evaluation methodologies, RLHF data, rubrics, CoT traces** |
| **Breach of confidentiality / NDA** | **All client companies** | **Signed CIIAs in database, client-named Airtable workspaces with proprietary data** |
| **Competitive harm** | **All client companies** | **Training data, evaluation frameworks, and benchmark data now available to rivals** |
| **APEX benchmark contamination** | **Companies relying on APEX results** | **Complete benchmark tasks, criteria, and scores exposed** |
| Financial data negligence | 30,000+ contractors | Plaintext bank routing numbers, Stripe account details |
| Surveillance overreach | 30,000+ contractors | Desktop screenshots of personal devices, personal browsing, background windows |
| Automated adverse actions | Contractors denied/terminated | Fraud scores, LLM-generated reasoning, no disclosure or appeal |
| CCPA violations | 30,000+ contractors | Failure to maintain reasonable security |
| GDPR violations | EU/UK contractors | Data minimization, automated decisions, notification delays |
| BIPA violations | Contractors who completed Persona KYC | Facial geometry scans, liveness detection |
| Third-party privacy | References, LinkedIn contacts, vouchers | Data collected without consent, now in criminal hands |

The client claims are likely the largest in dollar terms — the economic value of the lost trade secrets (training data, evaluation methodologies, pre-release model outputs) runs into the billions. The contractor claims are the broadest in scope — affecting every individual who ever used the platform. Together, the total legal exposure is conservatively in the **hundreds of millions of dollars** before punitive damages.

---

## Conclusion - What Happens Now

The breach is not a past event. It is an ongoing situation with no clear resolution.

### The Data Is Still in Circulation

Mercor allegedly paid the attackers to have the data removed from the Lapsus$ leak site — a fact confirmed to us directly by Lapsus$ themselves. The data was taken down briefly. It reappeared. The group is now actively selling the full dataset to private bidders while continuing to distribute samples. The two files analyzed in this report were obtained after the ransom was paid. This is the predictable outcome of paying ransom for digital assets — there is no mechanism to verify deletion, no way to revoke copies already distributed, and every economic incentive for the attackers to continue monetizing the data through private sales, selective leaks, and derivative attacks. Mercor's ransom payment bought nothing except proof that they considered the data worth paying to suppress.

The attackers now possess:

- The **complete identity** of every Mercor contractor — name, email, phone, date of birth, home address, bank routing number, government ID verification status, and a photographic record of their desktop activity
- The **complete client map** — which companies use Mercor, what projects they run, which annotation platforms they use, and what their internal Slack workspaces and Okta SSO groups are called
- **Apple's pre-release Foundation Model outputs**, Amazon's Chain-of-Thought evaluation methodology, OpenAI's Feather platform campaign UUIDs, and Anthropic's model comparison data
- The **source code** for Mercor's entire platform — including its fraud detection algorithms, MercorScore ranking system, and payment infrastructure — providing a complete blueprint for exploitation
- **Tailscale VPN credentials and network topology** — a map of Mercor's internal infrastructure that could enable further unauthorized access if credentials have not been fully rotated
- **939GB of code repositories** that likely contain hardcoded API keys, database credentials, and third-party service tokens scattered across commit history

This is not a dataset that loses value over time. The PII is permanent. The bank routing numbers don't expire. The government ID verification records don't reset. The signed legal documents don't un-sign. And the AI training data — the RLHF annotations, preference rankings, and rubric evaluations produced for frontier AI labs — retains its full value to any competitor seeking to accelerate their own model development.

### The Ongoing Threat

With this data, the attackers (or any subsequent buyer) can:

1. **Launch targeted phishing campaigns** against every Mercor contractor, using their real name, employer, project assignment, and pay rate to craft highly convincing social engineering attacks
2. **Commit financial fraud** using the bank names, routing numbers, and account holder names stored in `MercorUserFinancials`
3. **Blackmail contractors** whose desktop screenshots may reveal confidential client information, personal browsing activity, or employment at companies their current employer doesn't know about
4. **Attack Mercor's clients** using the Slack workspace URLs, Okta SSO configurations, and annotation platform campaign IDs as entry points for further social engineering or credential stuffing
5. **Sell the AI training data** — the prompts, responses, evaluations, and preference rankings — to competitors or foreign actors, undermining billions of dollars of investment by OpenAI, Anthropic, Apple, Amazon, Meta, and Google DeepMind
6. **Exploit the source code** to identify vulnerabilities in Mercor's (and potentially its clients') systems that have not yet been patched
7. **Impersonate Mercor staff** using the internal employee names, Slack IDs, and GitHub usernames found throughout the database to conduct supply-chain attacks against Mercor's clients and partners

Each of these vectors becomes more dangerous the longer the data remains in circulation — and there is no indication it will stop circulating.

### The Case for Radical Transparency

There is an uncomfortable truth that Mercor, its clients, and the affected contractors must confront: **the data is out. It cannot be put back.**

The current trajectory — where the breach is acknowledged in vague corporate language, specific questions are deflected, and affected individuals receive minimal information about what was exposed — serves no one except the attackers. It creates an information asymmetry where the adversary has complete knowledge of what was taken, while the victims operate in the dark.

Every contractor whose bank routing number is in `MercorUserFinancials` deserves to know — specifically — that their bank name, routing number, and account holder name were stored in plaintext JSON and are now in the hands of criminal actors. Every contractor whose desktop screenshots are in the `mercor-insightful-screenshots-production` S3 bucket deserves to know that their IP address, MAC address, browser history, and application usage during work sessions are exposed. Every client whose annotation platform URLs, Slack workspaces, and proprietary model outputs appear in the Airtable exports deserves to understand the exact scope of their secondary exposure.

The alternative to transparency is prolonged paranoia. If Mercor does not disclose the specific contents of the breach, every contractor must assume the worst about what was taken. Every client must assume their internal systems were visible on a contractor's screen. Every reference, every LinkedIn contact, every vouching party must assume their personal information was collected without their knowledge and is now compromised.

Perhaps the most constructive path forward — however counterintuitive — is full, detailed, public disclosure of exactly what the breach contained. Not the raw data itself, but a complete accounting: which tables, which fields, which categories of PII, which clients, which time periods. The world can adjust to a known breach. It cannot adjust to an unknown one. Sunlight remains the best disinfectant, and in the aftermath of a breach of this magnitude, the cost of silence far exceeds the cost of honesty.

The contractors who built the AI training data that powers the world's most valuable models deserve at least that much.

### A Structural Critique - Youth Velocity and the Cost of Immaturity

Mercor's three founders — Brendan Foody, Adarsh Hiremath, and Surya Midha — were [21 years old](https://techcrunch.com/2025/02/20/mercor-an-ai-recruiting-startup-founded-by-21-year-olds-raises-100m-at-2b-valuation/) when they raised their Series A. They became the [world's youngest self-made paper billionaires at 22](https://startupstag.com/startups/mercor-founders-worlds-youngest-self-made-billionaires-10b-valuation/) when their Series C valued the company at $10 billion. The average age of the Mercor team was reported at **22 years old**. They are Thiel Fellows — college dropouts celebrated for building fast. They stored bank routing numbers in plaintext, ran a flat network where a single VPN hop reached everything, and let 4 terabytes walk out the door without anyone noticing.

**Perhaps Mercor is best understood as a phenomenon of hype and strong mimetic desire within the AI industry.** Perhaps the AI labs got ahead of themselves too early. Perhaps researchers and vendor managers chose Mercor not because they evaluated the vendor thoroughly enough to handle critical workloads, but because OpenAI was already using it.

The pattern is worth examining. [OpenAI was one of Mercor's earliest major customers](https://sfstandard.com/2025/11/07/san-francisco-s-youngest-billionaires-betting-new-kind-job-boom/). The relationship began when Mercor's 20-year-old CEO cold-emailed OpenAI's head of human data operations, Shaun VanWeelden, and landed a contract to recruit Math Olympiad winners for model training. VanWeelden later [left OpenAI to become Mercor's managing director](https://en.wikipedia.org/wiki/Mercor). Two sitting OpenAI board members — [Adam D'Angelo](https://techstartups.com/2024/09/18/ai-powered-hiring-platform-startup-mercor-raises-30m-in-series-a-funding-led-by-benchmark/) (Quora CEO) and [Larry Summers](https://www.caproasia.com/2025/01/30/united-states-ai-recruiting-platform-mercor-raised-new-funding-at-2-billion-valuation-raised-32-million-at-250-million-valuation-in-2024-series-a-funding-from-investors-including-benchmark-peter/) (former U.S. Treasury Secretary) — invested in Mercor's earlier funding rounds.

This is not without precedent. Much of the AI data infrastructure landscape has been shaped by proximity to OpenAI. [Scale AI's Alexandr Wang was Sam Altman's roommate](https://x.com/amir/status/1806730856012382291) during the pandemic. Scale went through Y Combinator when Altman ran it. [Altman and Wang later discussed an acquisition](https://x.com/amir/status/1806730856012382291).

With Mercor, the signal was unmistakable. OpenAI used them. OpenAI's board members invested in them. OpenAI's head of data operations joined them. Once that signal propagated, perhaps the other labs followed not because of independent evaluation, but because OpenAI had validated the choice for them. The $10 billion valuation, the press coverage, and the youngest-billionaires narrative reinforced what was already a foregone conclusion.

The Girardian irony is that this breach — the scapegoating event — may produce the same mimetic cycle in reverse. The labs may collectively abandon Mercor, collectively discover the next shiny vendor, and collectively onboard without asking the hard questions about security and privacy. The sacrifice of the scapegoat restores order. The community moves on, having learned nothing structural — only that *this particular* vendor was the wrong one.

Having reverse-engineered Mercor's complete operational architecture from its database schema — the annotation pipeline, the evaluation frameworks, the contractor management system, the payment infrastructure — it is clear that the underlying business is well-understood and replicable. For new entrepreneurs, the opportunity is straightforward: build the same platform, but treat security and privacy as foundational rather than an afterthought. The market for AI training data is not going away. The demand for a vendor that handles it responsibly has never been higher.

---

# Appendix A - Complete Table Inventory

*All 149+ tables organized by functional domain, with column lists and sample data where present.*

---

## Domain 1 - User and Identity

| Table | Key Columns | Notes |
|-------|-------------|-------|
| `MercorUsers_New` | userId, email, name, phone, profilePic, createdAt, lastLogin, location, isWhiteListed, source, firebaseUID, authType, isAnonymous, insightfulId, stripeAccountId, customerId, isDeleted, phoneVerificationStatus, phoneVerifiedAt, phoneOptIn | Primary contractor user table. Sample: `e****a1@gmail.com`, `T** O****`, `+44795718****`, `United Kingdom,Harrow` |
| `MercorUsers_New_backup` | userId, email, name, phone, profilePic, createdAt, lastLogin, location, isWhiteListed, source, firebaseUID, authType, isAnonymous, insightfulId, stripeAccountId, customerId, isDeleted | Historical backup snapshot of user table |
| `UserLocation` | userLocationId, userId, residenceCountry, residenceState, residenceCity, residenceZipCode, physicalCountry, physicalState, physicalCity, physicalZipCode, version, createdAt, updatedAt | Tracks declared residence vs. physical location. Used in fraud detection. Sample: `residenceCountry=USA, physicalCountry=USA` |
| `UserLocation_Audit` | All UserLocation columns + auditAction, auditTimestamp | Audit trail for location changes |
| `UserMetadata` | userMetadataId, userId, workAuthorizationStatus, birthday, physicalLocation, countryOfResidence, createdAt, updatedAt, maxHourCap, contractorMail, fraudStatus, oktaUserId, fraudStatusEnum, oktaAccountState, externalId, maxContracts, offPlatformEmail | Extended user metadata including Okta SSO ID and fraud status |
| `UserState` | id, userId, resumeUploaded, interviewsCompletedCount, jobApplicationsCount, totalMillisWorked, createdAt, updatedAt | Lifecycle counters — tracks user progression through platform |
| `UserAvailability_Audit` | availabilityId, version, userId, maxWeeklyHours, desiredWeeklyHours, expectedStartOffset, expectedStartOffsetUpdatedAt, earliestStartDateChoice, timezone, updatedAt, createdAt, auditAction, auditTimestamp | Declared working hours and timezone preferences |
| `UserReferences` | referenceId, email, name, company, relationship, userId | Professional references provided by contractors |
| `WorkAuthorization_Audit` | workAuthorizationId, userId, birthday, physicalCountry, workAuthorizationStatus, agreedToLocation, signature, attestedAt, source, version, createdAt, updatedAt, auditAction, auditTimestamp | Work authorization attestations with digital signatures |
| `UserPlatformStatus` | id, userId, status, action, source, sourceDetail, isLatest, createdAt | Platform access status (active, suspended, banned) |
| `LinkedinUsers` | id, name, url, email, company, position, lastUpdated | LinkedIn profile cache used for warm intros and candidate sourcing |
| `MembershipSnapshots` | scopeType, scopeId, userId, createdAt | Point-in-time snapshots of group/project memberships |

---

## Domain 2 - Identity Verification and Background Checks

| Table | Key Columns | Notes |
|-------|-------------|-------|
| `IDVerificationChecks` | verificationCheckId, userId, candidateId, jobId, listingId, provider, source, sessionId, sessionToken, onboardingUrl, sessionStatus, verificationStatus, governmentIdStatus, livenessStatus, addressStatus, attemptNumber, maxAttempts, providerResponse, fraudDecision, flagReasons, manualReviewStatus, createdAt, updatedAt, completedAt | Persona KYC session records. `providerResponse` contains full JSON API response including facial thumbnail keys. `provider=persona` |
| `BackgroundCheck` | contractorID, externalCandidateId, workLocation, package, invitationId, invitationCreatedAt, invitationCompletedAt, backgroundCheckId, reportId, status, createdAt, updatedAt, adverseMediaCheckStatus | Criminal background check records (Checkr). Status: `clear` / `consider` |
| `BackgroundCheck_New` | Richer version of BackgroundCheck with additional fields | Updated background check schema |
| `BackgroundCheckDetails` | Detailed per-check results | Granular check outcomes |
| `ScreeningPackage` | id, companyId, name, isActive, lastUpdatedAt, checkConfig, graceDays | Per-company screening package configurations defining which checks are required |

---

## Domain 3 - Fraud Detection

| Table | Key Columns | Notes |
|-------|-------------|-------|
| `FraudStates` | userId, currentStage, currentDecision, currentConfidence, currentReasoning, currentKeySignals, currentTimestamp, previousStageDecision, createdAt, updatedAt | Current fraud state per user. `currentDecision`: APPROVE / ESCALATE / REJECT. LLM-generated reasoning. Sample signal: `location_mismatch: 1.0` |
| `FraudCheck` | id, user_id, stage, interviewId, jobId, triggered_on, process_status, retryCount, flag_reasons, automatedReasons, status, priority, idVerificationStatus, manual_review_status, manual_review_rational, manual_review_signs, isMostRecent, assigned_to, assigned_on, splReview | Central fraud queue. Tracks automated and manual review states |
| `FraudSignalAuditLog` | id, userId, userVersionId, stage, signalType, modelName, triggeredOn, status, modelScore, createdAt | Per-signal audit trail. Every fraud signal evaluated is logged here |
| `FraudEvents` | id, eventId, userId, eventType, stage, priorAlpha, priorBeta, priorProbability, priorStatus, posteriorAlpha, posteriorBeta, posteriorProbability, posteriorStatus, evidence, createdAt, createdBy, notes | Bayesian belief update log. Each event updates prior→posterior fraud probability |
| `ProductionFraudState` | id, userId, status, fraudModality, source, sourceDetail, lastEvaluatedStage, productionModelId, userVersionId, isLatest, createdAt, updatedAt | Final production fraud verdict. `fraudModality`: identity / time / quality |
| `AutoFraudChecks` | Automated rule-based fraud check records | Scheduled fraud scans |
| `OnProjectFraudWindows` | id, employeeId, contractorId, projectId, scanDate, startTime, endTime, fraudType, fragmentCount, flags, flagMetadata, windowMetadata, screenshotMetadata, createdAt, updatedAt, userVersionId | On-project time fraud analysis windows. Analyzes screenshot patterns |
| `QAReviewLog` | id, userId, reviewerId, bucketName, status, assignedOn, completedAt, isActive, lockKey, createdAt, updatedAt, comments, decision, userVersionId, stage, signalType, flags | Human QA reviewer assignments and decisions for fraud cases |
| `CheatingDetection` | annotationId, userId, interviewId, interviewConfigId, formResponseId, formId, isCheating, cheatingProbability, signs, notes, reportedBy, createdAt, updatedAt | Interview cheating detection results |
| `CheatingDetection_Audit` | All CheatingDetection columns + auditAction, auditTimestamp | Audit trail for cheating detection |
| `DuplicateGroups` | groupId, userIdList, mergedIntoGroupId, createdAt | Groups of suspected duplicate/sock-puppet accounts |

---

## Domain 4 - Hiring Pipeline

| Table | Key Columns | Notes |
|-------|-------------|-------|
| `Listings_New` | listingId, version, uid, companyId, title, description, commitment, referralAmount, createdAt, deletedAt, status, requiredInterviewConfigId, rateMin, rateMax, hoursPerWeek, location, formId, automatedCommsOn, payRateFrequency, isPrivate, autoRedirectToApply, evaluationCriteria, offersEquity, rejectionTemplateSubject, rejectionTemplateBody, campaignId, ownerIds, goalNumHires, goalDeadline, isExploreAlways, interviewSchedulingEnabled, interviewScheduleLink, disableApplications, isMostRecent, offerExtendedText, minHeadcount, maxHeadcount, referralBoost, timeToAutoReject, automaticRejectionsOn, computedExplorePageVisibility, workArrangement, eligibleLocation, ineligibleResidenceLocation, listingType | Primary job listing table. Includes pay ranges, location eligibility, automation settings |
| `Listings_New_Audit` | All Listings_New columns + auditAction, auditTimestamp | Audit trail for listing changes |
| `Candidates` | candidateId, userId, companyId, listingUid, createdAt, deletedAt, status, notesForCandidate, birthday, physicalLocation, workAuthorizationStatus, responseId, version, uid, source, countryOfResidence, isMostRecent, listingId, listingStepConfigId, linkedinUrl, actionItem, lastSignificantUpdatedAt, rejectionReason, updatedBy, starred, appliedAt, goalId, automaticRejectAt, addedAt, referralId, isEligible, numCommsSent, lastCommSentAt | Per-application record. Tracks status, notes, scheduled auto-rejection, outreach counts |
| `Candidates_Audit` | All Candidates columns + auditAction, auditTimestamp | Audit trail for application changes |
| `CandidateMatchScores` | candidateId, listingId, matchScore, contextualSummary | ML-generated candidate-to-listing fit scores with LLM explanations |
| `EvaluationCriteria` | evaluationCriteriaId, listingId, criteria, shortCriteria, type, hardFilter, position, updatedAt, evalCriterionCritique, evalCriterionCritiquePass, status | Per-listing scoring rubric criteria |
| `ListingNotes` | listingNoteId, listingId, authorUserId, assigneeUserId, notificationStatus, createdAt, noteBody | Recruiter notes on listings. Contains candid operational commentary |
| `SavedListings` | id, userId, listingId, listingUid, createdAt | Candidates who bookmarked a listing |
| `ListingPipelines` | Pipeline stage configurations per listing | Hiring funnel stage definitions |
| `TalentViewSearchUsers` | searchId, userId, score, addedAt, starredAt, deletedAt | Users surfaced in talent search results |
| `SharableTalentViewConfig` | viewId, name, description, userIds, userCount, maxCandidatesCount, createdAt, updatedAt, revokedAt, createdBy, expiryAt, viewCount, visibleSections, preferredTitle | Shareable talent shortlist configurations |
| `SharableTalentViewConfigUsers` | userId, viewId, workExperience, education, summary, createdAt, updatedAt, yearsOfExperience, interviews, forms, likeCount, dislikeCount, feedback | Per-candidate data within shared talent views |
| `TalentViewUserEvaluations` | criteriaId, userId, criteriaScore | Per-criteria scores for talent view candidates |

---

## Domain 5 - Interviews and Assessments

| Table | Key Columns | Notes |
|-------|-------------|-------|
| `Forms_Audit` | formId, companyId, listingId, title, description, guide, evaluationCriteria, assessmentRubricId, items, isArchived, isAuthed, numQuestions, isUnified, allowFormRetakes, maxRetakeAttempts, allowCopyPaste, version, createdAt, updatedAt, createdBy, auditAction, auditTimestamp, prep, assessmentVersionId, feedbackConfig | Interview/assessment form definitions. `items` contains full question list |
| `FormSubmissions` | formResponseId, formId, companyId, userId, responseStatus, formVersion, startedAt, submittedAt, activeTimeSeconds, posthogSessionIds, createdAt, updatedAt, attempt, isLatestSubmission, assessmentVersionId, feedbackSentAt | Every interview submission. Tracks time spent (`activeTimeSeconds`) |
| `AssessmentRubrics` | assessmentRubricId, title, createdAt, instructions, sumScores, sumSquareScores, countScores, version, passThreshold | Scoring rubric definitions with aggregate statistics |
| `AssessmentRubrics_Audit` | All AssessmentRubrics columns + auditAction, auditTimestamp | Rubric change history |
| `AssessmentRubricItems_Audit` | assessmentRubricItemId, assessmentRubricId, criteria, shortName, points, position, format, relatedQuestionIds, version, auditAction, auditTimestamp, webSearch, smartScoring, type, config, createdAt, updatedAt | Individual rubric criteria with AI scoring configuration |
| `AssessmentEvalState` | id, submissionId, assessmentType, jobType, status, retryCount, createdAt, reason, triggerSource, triggeredByUserId, modalJobId, durationMs, operationId, assessmentId | Grading pipeline execution state |
| `AssessmentVersions` | Versioned assessment configurations | Assessment version tracking |
| `AssessmentAudits` | Assessment activity audit trail | Audit log for assessment operations |
| `GradedRubricItems` | Per-rubric-item graded scores | Individual rubric item scores per submission |
| `GradedRubricItems_Audit` | Audit trail for graded items | Score change history |
| `InterviewEvals` | interviewId, communicationScore, technicalScore, qaPairScores | Aggregate interview scores by dimension |
| `InterviewScores` | scoreId, userId, interviewId, interviewConfigId, points, createdAt | Final interview score per user |
| `InterviewIssues` | issueId, interviewId, issue, source, notes, startPosition, endPosition, reportedBy, createdAt, updatedAt | Technical and integrity issues reported during interviews |
| `PairwiseComparisons` | listingId, listingUid, interviewConfigId, winnerResumeId, loserResumeId, reasoning, winnerUserId, loserUserId | Bradley-Terry tournament comparisons for candidate ranking |
| `MercorScores` | candidateId, listingId, listingUid, resumeId, evaluationCriteria, interviewConfigId, mScoreRaw, mScoreNormalized, numComparisons, contextualSummary, userId, aggregateFeatureScore | Final MercorScore per candidate per listing |
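
The `PairwiseComparisons` and `MercorScores` tables point to a Bradley-Terry-style ranking pipeline: each row records a winner and a loser, and per-candidate strengths (`mScoreRaw`, then `mScoreNormalized`) are fit from the full set of outcomes. The sample files do not include Mercor's actual fitting code; the sketch below is a generic minorization-maximization (MM) fit with made-up candidate IDs, showing how strengths of this kind can be derived from winner/loser pairs.

```python
from collections import defaultdict

def bradley_terry(comparisons, iterations=200):
    """Fit Bradley-Terry strengths from (winnerUserId, loserUserId)
    pairs via the standard MM update; strengths sum to 1."""
    wins = defaultdict(int)
    pair_games = defaultdict(int)  # games per unordered candidate pair
    candidates = set()
    for winner, loser in comparisons:
        wins[winner] += 1
        pair_games[frozenset((winner, loser))] += 1
        candidates.update((winner, loser))

    strength = {c: 1.0 / len(candidates) for c in candidates}
    for _ in range(iterations):
        updated = {}
        for c in candidates:
            # MM update: wins_c / sum over opponents of n / (p_c + p_opp)
            denom = sum(
                n / (strength[c] + strength[next(iter(pair - {c}))])
                for pair, n in pair_games.items() if c in pair
            )
            updated[c] = wins[c] / denom if denom else strength[c]
        total = sum(updated.values())
        strength = {c: v / total for c, v in updated.items()}
    return strength
```

The normalized strengths would then be a plausible basis for an `mScoreNormalized`-like column; the `numComparisons` field suggests scores are only trusted once a candidate has appeared in enough pairings.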

---

## Domain 6 - Work Trials and Onboarding

| Table | Key Columns | Notes |
|-------|-------------|-------|
| `WorkTrial_Audit` | workTrialId, userId, companyId, listingStepConfigId, status, payableAmount, billableAmount, ciiaaDirect, ciiaaPassthrough, tow, offerLetter, startDate, endDate, payout, payment, paymentMethod, signature, projectId, billingAccountId, createdAt, updatedAt, version, auditAction, auditTimestamp, updatedBy | Work trial contract records. Contains signed legal documents and pay amounts |
| `WorkTrialConfig` | workTrialConfigId, title, payableAmount, billableAmount, ciiaaDirect, ciiaaPassthrough, tow, endDate, emailTemplateSubject, emailTemplateBody, emailTemplateSubjectExtension, emailTemplateBodyExtension, interviewIds, formIds, createdAt, updatedAt, deletedAt, companyId, isUnified, projectId | Reusable work trial templates |
| `OnboardingState` | id, shortName, name, threshold, createdAt, updatedAt, order | Onboarding funnel steps. Sample: `interview_completed` threshold=1 order=0 |
| `OnboardingDocument` | onboardingDocumentId, onboardingDocument, createdAt, projectId | Per-project onboarding materials |
| `TierProgress` | id, createdAt, updatedAt, userId, tierId, planId, status, completedAt, paidAt | Contractor tier/level progression tracking |
| `PlanAssignments` | id, createdAt, updatedAt, userId, planId, assignedBy, startDate, endDate, userHours, tasksCompleted, status | Assigns contractors to specific earning/task plans |

---

## Domain 7 - Projects and AI Task Management

| Table | Key Columns | Notes |
|-------|-------------|-------|
| `Projects_Audit` | projectId, name, createdAt, updatedAt, companyId, archivedAt, externalId, onboardingDocumentId, userId, screenshotEnabled, userGroupEmail, description, requireAvailabilityUpdates, skills, projectType, offerExtendedText, annotationPlatform, annotationPlatformIDs, ssotLink, status, notes, version, auditAction, auditTimestamp, taskMetricsDatastore | Full project configuration audit trail |
| `ProjectIAM` | id, projectId, userId, roleId, status, assignedBy, version, createdAt, updatedAt | Role assignments within projects |
| `ProjectIAM_Audit` | All ProjectIAM columns + auditAction, auditTimestamp | Project IAM change history |
| `ProjectCustomColumns` | id, projectId, name, dataType, position, createdBy, createdAt, updatedAt, deletedAt, sqlQuery, source | Dynamic metadata columns per project. Some computed via SQL |
| `ProjectCustomColumnValueHistory` | id, customColumnId, jobId, value, changedBy, createdAt | History of custom column values |
| `ProjectArchetypes` | archetypeId, projectId, archetypeText, createdAt, updatedAt, version, elements | Character/persona definitions for annotation projects |
| `ProjectAttributeValues` | Project attribute key-value pairs | Flexible project attribute storage |
| `ProjectViewConfig` | viewId, title, projectId, viewContext, createdByUserId, createdAt, updatedByUserId, updatedAt, deletedAt, roleId, viewType | Saved view configurations for project management |
| `ProjectIntegrations` | id, projectId, groupMail, autoProvision, createdAt, updatedAt, oktaGroupId, integrationsData, oktaOwnerGroupId, oktaEPMGroupId, latestGroupBatch, latestBatchMemberCount, projectShortId, workspaceNotificationChannel, ownerGwGroup, epmGwGroup, slackChannelId | Project integrations with Okta groups and Slack channels |
| `ProjectAutomations` | Project-specific automation configurations | Automation bindings per project |
| `ProjectFunctions` | id, name, description, createdAt, updatedAt | Named functions available in project automation |
| `TaskDefinitions` | taskDefId, projectId, rubric, autograder, version, createdAt, updatedAt, task_schema, metadata | AI task type definitions with grading rubrics |
| `TaskDefinitions_Audit` | All TaskDefinitions columns + auditAction, auditTimestamp | Task definition change history |
| `TaskAudits` | uid, taskDefinitionId, recordId, s3KeyPrefix, authorId, auditorId, status, outcome, autoOutcome, createdAt, updatedAt, dispute, disputedBy | Individual task submission reviews with dispute tracking |
| `TaskAssignments` | id, createdAt, updatedAt, jobId, taskId, userId, appliedBy | Maps tasks to jobs and users |
| `DeliverableBatches` | id, uid, name, projectId, invoiceLineItemId, status, taskCount, version, isLatest, metadata, createdAt, updatedAt, createdBy | Grouped task deliverable batches for invoicing |
| `Deliverables` | deliverableId, jobId, userId, projectId, entityType, entityId, status, createdAt, updatedAt | Individual deliverable records |
| `Deliverables_Audit` | All Deliverables columns + isMostRecent, auditAction, auditTimestamp | Deliverable change history |
| `ProductivityProjectRules` | id, project_id, description, rules, created_by, is_active, version, created_at | Per-project productivity monitoring rule configurations |

---

## Domain 8 - Jobs and Contracts

| Table | Key Columns | Notes |
|-------|-------------|-------|
| `Jobs` | jobID, contractorID, companyID, status, payableRate, commitment, ciiaa_direct, ciiaa_passthrough, tow, payment, startDate, createdAt, updatedAt, expiresAt, tax_form, expected_hours, title, stripeSubscriptionId, billableRate, version, dismissalDate, insightful, paymentMethod, projectID, checkr, idVerification, uid, payout, offerLetter, listingUID, managerId, signature, backgroundCheck, isLatest, note, referralId, roleId, provisionIdpAccess, safety_waiver, sourceId, confidentiality, billingAccountID, backgroundCheckConfig | Core employment contract. Contains pay rates, legal agreements, Stripe subscription |
| `Jobs_Audit` | All Jobs columns + auditAction, auditTimestamp | Job contract change history |
| `JobEvents` | jobEventId, jobId, contractorId, actorId, actionType, metadata, createdAt | Events on job contracts (status changes, communications). Sample: `comm`, `Contract Reminder` |
| `JobEventsQueue` | queueItemId, sourceType, sourceId, payload, renderedPreview, editedPreview, status, response, createdAt, resolvedBy, resolvedAt, jobEventId | Queued job events pending processing or review |
| `JobEventReasonAssociations` | jobEventId, reasonId, createdAt | Structured reasons associated with job events |
| `JobTasks` | Tasks linked to specific jobs | Job-task mapping |
| `JobPerformanceMetrics_New` | jobPerformanceMetricsId, jobId, performanceScore, standardError, jobPerformanceSummary, version, createdAt, updatedAt | ML-generated job performance metrics |
| `JobPerformanceMetrics_Audit` | jobPerformanceMetricsId, jobId, version, lvr, lvrReasoning, confidenceLevel, isFraud, wasDismissedEarly, jobSummary, auditAction, auditTimestamp, createdAt, updatedAt | Detailed performance metrics audit trail including fraud flags |
| `JobPerformanceReviews_New` | performanceReviewId, jobId, contractorId, companyId, projectName, taskId, score, reviewNotes, performanceReasons, dismissalFlag, dismissalReason, reviewedBy, createdAt, updatedAt, oldReviewId, feedBackFlag | Human-reviewed job performance assessments |
| `WeeklyProjectFeedback` | weeklyProjectFeedbackId, userId, jobID, weekStart, rating, feedbackText, submittedAt, updatedAt, createdAt | Weekly contractor feedback on their project experience |
| `ContractorPerformance_New` | contractorPerformanceId, contractorId, standardError, performanceScore, performanceSummary, version, createdAt, updatedAt | Aggregate contractor performance across all jobs |
| `ContractorPerformance_New_Audit` | All ContractorPerformance_New columns + auditAction, auditTimestamp | Contractor performance change history |
| `PerformanceReviews` | performanceReviewId, contractorId, reviewDate, performanceDetails, stars, taskDetails, reviewBy, createdAt, updatedAt, companyId | Company-authored contractor performance reviews with star ratings |
| `MLExperimentsJobPerformanceReviews` | Date of review, Account, Project, Reviewer, Work type, Review type, Name, Email, Quality of Work, Engagement, Offboarding Reason, Justification for rating | Raw performance data for ML model training |
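
The `_Audit` tables recurring throughout this appendix (e.g. `Jobs_Audit`: "All Jobs columns + auditAction, auditTimestamp") follow one uniform pattern: every mutation copies the complete base row into a shadow table with an action tag and timestamp appended. The samples do not show whether this is done by database triggers or application code; a minimal application-side sketch of the record shape:

```python
from datetime import datetime, timezone

def to_audit_row(base_row, audit_action):
    """Generic <Table>_Audit record: the complete base row plus
    auditAction (e.g. INSERT/UPDATE/DELETE) and auditTimestamp."""
    return {
        **base_row,
        "auditAction": audit_action,
        "auditTimestamp": datetime.now(timezone.utc).isoformat(),
    }
```

The consequence for a breach is significant: audit tables preserve every historical value of sensitive columns (pay rates, legal agreements, fraud flags), not just the current state.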

---

## Domain 9 - Time Tracking and Productivity Surveillance

| Table | Key Columns | Notes |
|-------|-------------|-------|
| `InsightfulScreenshots` | id, externalId, contractorId, projectId, storageBucket, storageKey, storageUrl, storageProvider, fileExtension, contentType, fileSizeBytes, vendorName, schemaVersion, vendorMetadata, externalIdentifiers, screenshotTimestamp, timestampTranslated, timezoneOffset, timezone, isBlurred, isOriginal, isRemoved, removedAt, externalProductivityScore, computer, hwid, os, osVersion, agentVersion, appName, appFileName, appFilePath, windowTitle, browserUrl, document, browserSite, ip, gateways, windowId, activityId, fragmentId, createdAt, updatedAt | Per-screenshot records with full device fingerprint (IP, MAC, HWID), application, URL, and S3 image link |
| `Timelog` | id, externalId, externalProjectId, employeeId, duration, timeStart, timeEnd, timezone, source, taskId, taskName, lineItemUid, adjustmentReason, uid, version, userId, isCompleted, linkFailReason, insightfulCreatedAt, insightfulUpdatedAt, createdAt, updatedAt | Work session records synced from Insightful |
| `Timelog_Audit` | All Timelog columns + audit metadata | Timelog change history |
| `Deductions` | id, contractId, contractorId, durationToSubtractMs, appName, reasonForDeduction, payoutCycleID, externalProjectId, externalEmployeeId, status, approvedBy, approvedAt, appliedBy, appliedAt, createdAt, createdBy, updatedAt | Pay deductions for non-productive time with approval chain |

---

## Domain 10 - Payments and Financial Infrastructure

| Table | Key Columns | Notes |
|-------|-------------|-------|
| `UserPaymentMethods` | id, userId, provider, providerMethodId, methodType, status, metadata, createdAt, updatedAt, version, countryCode | Contractor payment accounts. Sample: `stripe`, `acct_1R0V****`, `express_account`, `onboarded`, `USA` |
| `UserPaymentMethods_Audit` | All UserPaymentMethods columns + auditAction, auditTimestamp | Payment method change history |
| `MercorUserFinancials` | id, userId, paymentProvider, providerIdentifier, accountDetails, lastFetchedOn, createdOn, updatedOn | Full financial account details including bank routing numbers |
| `PaymentLineItems` | id, version, cycleStartTs, cycleEndTs, totalPayableAmount, totalBillableAmount, status, createdAt, updatedAt, uid, jobUid, dispatchFailureReason, timelogUid, bonusUid, transferId, referralUid, companyId, projectId, contractorId, timeStamp, isLatestVersion, referralId, moneyOutId, eventTime, referralEligibilityId | Core payment ledger. Amounts in cents |
| `PaymentLineItems_Audit` | All PaymentLineItems columns + auditAction, auditTimestamp | Payment line item change history |
| `PaymentLineItems_TransactionalAudit` | Transaction-level payment audit | Fine-grained payment operation audit trail |
| `MoneyOut_Audit` | id, statementId, entityId, userId, entity, externalAccountId, externalTransferId, cycleStartTs, cycleEndTs, totalAmount, paymentMethod, status, createdAt, failureReason, payoutCycleId, auditTimestamp, auditAction, version | Outbound payment records |
| `WiseDisbursements` | id, moneyOutId, amount, currency, sequenceNumber, wiseTransferId, wiseQuoteId, status, failureReason, createdAt, updatedAt, accountId | International Wise payment records |
| `PayoutCycles` | cycleStartTs, cycleEndTs, id, status, configId, configVersion | Pay period definitions |
| `PayoutRecords` | Individual payout transaction records | Detailed payout ledger |
| `PayoutConfigs` | payoutConfigId, status, type, configuration, version | Payment configuration rules |
| `InvoiceLineItems` | id, name, companyId, invoiceId, sowId, taskCount, rawAmount, adjustedAmount, status, description, metadata, createdAt, updatedAt, createdBy | Company invoice line items |
| `BillingAccounts` | Company billing account definitions | Client billing account management |
| `BillingConfigs` | id, uid, version, isLatestVersion, rules, projectId, createdAt, updatedAt, createdBy | Billing rule configurations (markup, caps) |
| `BillingRateCards` | billingRateCardId, uid, version, isLatestVersion, sowId, formulaType, rateRows, createdAt, updatedAt, createdBy | Per-SOW rate card definitions |
| `RevenueAdjustments` | id, companyId, projectId, attestationId, cancelledAdjId, amountCentsUsd, category, revenueRecognitionDate, reason, createdAt, creatorId, isCancellation, formula, labels, aggregationFields, attachments, invoices | Revenue adjustments and corrections |
| `FinanceLabels` | Finance label definitions | Labels for financial categorization |
| `CompanyFinanceLabels` | companyId, financeLabelId, createdAt, creatorId | Finance label assignments to companies |
| `ReferralEligibility` | id, createdAt, updatedAt, referralUid, campaignId, referrerAmount, refereeAmount, referrerLineItemId, refereeLineItemId, criteriaId, onboardingStateId, referralId, entity_id, entity_type, type, jobId, billingAccountId, toolingIdempotencyKey, creatorId | Referral payment eligibility and vesting conditions |

---

## Domain 11 - Referrals and Growth

| Table | Key Columns | Notes |
|-------|-------------|-------|
| `Referrals` | referralId, referredUserId, referringUserId, createdAt, version, uid, status, reason, listingId, campaignId, totalEarned, totalEarningsPotential, state, deleted_at, paidAt, disputeStatus, isActive, referral_cap, referralIdempotencyKey, isPaymentBlocked, isGuaranteedReferral | Core referral records with earnings tracking |
| `Referrals_Audit` | All Referrals columns + audit metadata | Referral change history |
| `ReferralReminder` | referralId, createdAt, lastSentAt | Referral reminder email tracking |
| `GuaranteedReferralQuota` | quotaId, referringUserId, offPlatformUserId, shortenedLink, weekStart, status, createdAt, updatedAt, isEmailSent | Guaranteed referral program quota management |
| `ReferrerMeta` | Referrer metadata and configuration | Additional referrer attributes |
| `OffPlatformCampaigns` | Campaign definitions for off-platform outreach | External recruitment campaign management |
| `OffPlatformCampaignSteps` | campaignStepId, stepNumber, campaignId, campaignType, subject, messageTemplate, parameters, scheduledAt, status, outreachedCandidateIds, failedCandidateIds, createdAt, updatedAt | Multi-step outreach sequence steps |
| `OffPlatformRecruitingManager` | id, managerId, offPlatformUserId, listingId, createdAt, updatedAt, updatedBy | Off-platform recruiter assignments |
| `OffPlatformUsersMapping` | mappingId, userId, offPlatformUserId, createdAt, updatedAt, referringUserId, status | Mapping between platform and off-platform user identities |

---

## Domain 12 - Communications and Outreach

| Table | Key Columns | Notes |
|-------|-------------|-------|
| `Comms` | commId, groupId, senderId, receiverId, content, type, triggerRef, createdAt, listingReferenceUID | In-platform messaging with full message content |
| `CommsSent` | Communication delivery records | Message send tracking |
| `EmailTemplates` | emailTemplateId, companyId, subject, content, createdBy, createdAt, updatedAt, isGlobal, tags, isPersonal | Email template library |
| `AircallComms` | Phone call logs from Aircall VoIP integration | Recruiter call records |
| `LinkedinWarmIntros` | warmIntroId, linkedinUrl, email, referringUserId, listingId, commEvent, status, createdAt, updatedAt, sentAt | LinkedIn outreach campaign records |
| `PartnerChatThreads` | threadId, listingId, referralId, partnerId, createdAt | Chat threads with referral partners |
| `FirstTimeInvites` | commId, userId, listingId, createdAt, commEvent, refListingUid, contentType, subject, listingIdCount | First-contact invitations to candidates |
| `AutomationTemplates` | templateId, name, description, category, handler, sourceType, sourceSql, templateBody, paramsSchema, cron, idempotency, autoApprove, version, createdAt, updatedAt, deletedAt, triggerConfig, config | Automated notification/workflow templates |
| `Feedback` | id, user_id, question_text, question_response, rating, device, created_at, updated_at | In-app user feedback submissions |

---

## Domain 13 - Company and Access Management

| Table | Key Columns | Notes |
|-------|-------------|-------|
| `Company` | companyId, name, description, website, externalName, billingModel, logo, brandVisible, billingStartDay, billingEndDay, aboutCompany, universe | Client company master records |
| `IAM` | roleId, companyId, status, userId_v4, id, version | Company-level role assignments |
| `IAM_Audit` | roleId, companyId, status, userId_v4, id, version, auditAction, auditTimestamp | IAM change history. Sample: `roleId=ghost`, `REMOVED` |
| `IAMOutbox` | id, resourceType, resourceId, relation, subjectType, subjectId, operation, requestedBy, requestedByService, createdAt, callerToken | IAM change outbox for event-driven propagation |
| `GodmodeCompanies` | companyId, createdAt, createdBy, includeInFillRate | Companies accessible via internal Godmode admin |
| `GodmodeArbitraryCells` | entityType, entityGmId, acKey, acValueNumber, acValueString, acValueFormula, userId, createdAt, acMetadata | Arbitrary Godmode data cells for internal operations |
| `Audience` | id, projectId, companyId, audienceType, slug, anchorType, anchorId, oktaGroupId, googleGroupId, slackGroupId, insightfulTaskId, createdAt, updatedAt, slackChannelId, query | Audience definitions linking projects to Okta/Slack/Insightful groups |
| `AudienceTargetProviders` | id, audienceId, name, externalId, type, createdAt, metadata | External providers linked to audiences |
| `DrivePermission` | id, driveId, googleGroupId, permissionLevel, googlePermissionId, createdAt, updatedAt | Google Drive access permissions for project documents |

---

## Domain 14 - Skills Certifications and Endorsements

| Table | Key Columns | Notes |
|-------|-------------|-------|
| `Skills` | skillId, name, description, CertificationPolicy, type, parent, createdAt | Hierarchical skills taxonomy |
| `CertificationPolicies_Audit` | certificationPolicyId, companyId, name, description, rules, isActive, isUnified, createdAt, icon, isRevokable, requiresApproval, version, auditAction, auditTimestamp, iconColor, showBadge, displayText | Certification program definitions |
| `Certifications_Audit` | certificationId, certificationPolicyId, userId, evidence, status, isCertified, earnedAt, note, createdAt, updatedAt, version, auditAction, auditTimestamp | Individual earned certifications. `evidence` contains scoring proof |
| `SkillCertifications_Audit` | uid, userId, skillId, isCertified, version, lastEvaluatedAt, auditedAt, auditAction | Per-skill certification status |
| `SkillCertificationsEvidence_Audit` | uid, userId, skillId, isCertified, version, sourceType, sourceId, createdAt, updatedAt, auditedAt, auditAction, score, metadata | Evidence backing skill certifications |
| `ContractorEndorsements` | endorsementId, endorsingJobId, endorsedJobId, endorsingUserId, endorsedUserId, contents, tags, createdAt, updatedAt, source, sentiment | Peer endorsements with text content and sentiment |
| `UserResumeEvaluation` | evaluationId, workExperienceScore, yearsOfWorkExperience, graduationYear, mScore, inferredRole, workExperienceSkills, resumeEvalScore, awardScore, educationScore, rateAcademicCompetitions, rateCompetitiveProgramming, rateHackathonPerformance, sumScore, technicalSkills, normalisedSumScore, highestDegree, userId | ML resume evaluation scores |
| `CandidateVouches` | vouchId, voucherUserId, candidateUserId, candidateEmail, candidateLinkedinId, candidateName, resumeS3Key, resumeHash, howKnowSocialPlatform, howKnowSocially, howKnowWorkedTogether, howKnowStudiedTogether, howKnowOther, reasonSkills, reasonEducation, reasonEmployer, reasonExpertise, reasonOther, createdAt, updatedAt | Structured peer vouching with relationship details |

---

## Domain 15 - Analytics and ML

| Table | Key Columns | Notes |
|-------|-------------|-------|
| `DbtFirmSchoolRank` | firmId, firmName, academicField, nProfiles, avgSchoolRank, medianSchoolRank, priorMeanSchoolRank, ebPriorStrength, ebAvgSchoolRank, firmsInField, firmSchoolRank, firmSchoolRankPercentile | Employer prestige scores for ~154,000 firms. Used in resume scoring |
| `DbtSchoolRankings` | academicField, schoolName, schoolScore, schoolRank | School prestige rankings by field |
| `PosthogAnalytics` | uuid, userEmail, company, startTimeUtc, endTimeUtc, activetime, inactivetime, startUrl | PostHog sessions linked to user email identity |
| `SearchAnalytics` | run_id, run_timestamp, avg_relevance_score, avg_prestige_score, p99_latency_ms, position_weighted_relevance_score, avg_relevant_prestige_score | Search quality metrics over time |
| `ForecastMetrics` | entity, id, dt, snapshot_dt, modelVersion, predictedValue | ML forecast outputs for capacity and fill rate planning |
| `MLExperimentsJobPerformanceReviews` | Date of review, Account, Project, Reviewer, Work type, Review type, Name, Email, Quality of Work, Engagement, Offboarding Reason, Justification for rating | Raw performance review data for ML training |
| `TalentViewUserEvaluations` | criteriaId, userId, criteriaScore | Structured per-criteria talent evaluations |
| `ProductivityProjectRules` | id, project_id, description, rules, created_by, is_active, version, created_at | Per-project productivity monitoring rule definitions |
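
The `DbtFirmSchoolRank` columns (`priorMeanSchoolRank`, `ebPriorStrength`, `ebAvgSchoolRank`) are the signature of empirical-Bayes shrinkage: a firm's average school rank is pulled toward a field-wide prior, more strongly when the firm has few profiles (`nProfiles`). Mercor's exact formula is not in the samples; the standard shrinkage estimator consistent with these column names is:

```python
def eb_avg_school_rank(avg_school_rank, n_profiles, prior_mean, prior_strength):
    """Empirical-Bayes shrunk mean: blend of the firm's observed
    average and the field-wide prior, weighted by nProfiles against
    the prior's pseudo-count strength (ebPriorStrength)."""
    return (prior_strength * prior_mean + n_profiles * avg_school_rank) / (
        prior_strength + n_profiles
    )
```

With zero profiles the estimate is exactly the prior; with many profiles it converges to the firm's own average, which is why small firms' prestige scores cluster near the field mean.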

---

## Domain 16 - Infrastructure and DevOps

| Table | Key Columns | Notes |
|-------|-------------|-------|
| `IacDeploymentRuns` | id, runType, environment, status, commitSha, branch, actor, githubRunId, githubRunUrl, prNumber, stacksAffected, resourcesAdded, resourcesChanged, resourcesDestroyed, summary, durationSeconds, startedAt, completedAt, createdAt | Terraform deployment records. Exposes GitHub monorepo URLs, engineer usernames, Terraform plan output |
| `ProductionDeployment` | deploymentRecordId, releaseTag, buildHash, deployedAt, deploymentIds, taskDefinitionArns, status, createdAt, updatedAt | ECS production deployment records. Contains AWS task definition ARNs |
| `PreprodDeployment` | id, releaseTag, commitSha, deployedAt, loadTestPassed, releaseOwner, status, createdAt, updatedAt | Staging deployment records with load test results |
| `PreprodDeploymentTest` | id, test_message, created_at, updated_at | Test table for pre-production deployment validation |
| `ProductionVersion` | id, lastVersion, lastReleaseTag, lastBuildHash, updatedAt | Single-row pointer to current production version |
| `RollbackExecution` | Rollback event records including affected services | Emergency rollback tracking |
| `DATABASECHANGELOG` | ID, AUTHOR, FILENAME, DATEEXECUTED, MD5SUM, DESCRIPTION, COMMENTS, EXECTYPE, LIQUIBASE | Liquibase schema migration history. Reveals engineer names, migration filenames |
| `DATABASECHANGELOGLOCK` | Liquibase migration lock state | Prevents concurrent schema migrations |
| `AgentSandboxes` | sandboxId, userId, title, agentType, status, backendType, host, stopReason, transcriptRawUrl, transcriptConsolidatedUrl, snapshotId, lastSnapshotId, snapshotStorageKey, acpSessionId, backendId, sandboxToken, claimedAt, expiresAt, createdAt, updatedAt, deletedAt | AI coding agent sandbox sessions. `transcriptRawUrl` links to S3 conversation logs |
| `DrivePermission` | id, driveId, googleGroupId, permissionLevel, googlePermissionId, createdAt, updatedAt | Google Drive permission records |

---

## Domain 17 - Reference and Miscellaneous

| Table | Key Columns | Notes |
|-------|-------------|-------|
| `Country` | id, isoCode3, name, currency, psp, createdAt, updatedAt | Country reference table with payment service provider per country |
| `TagAssignments_Audit` | tagAssignmentId, tagId, entityType, entityId, createdAt, updatedAt, version, auditAction, auditTimestamp | Tag assignments to entities |
| `ShortenedUrls` | URL shortener records | Shortened URL definitions |
| `UrlClicks` | id, urlId, clickedAt, ipHash, userId, country | Click tracking on shortened URLs |
| `BeelineJobMapping` | External job platform mapping | Maps Mercor jobs to Beeline external system |
| `UserManagement` | Internal user management records | Admin user management |
| `UserManagementWorkflows` | User management workflow state | Multi-step user management processes |
| `ActionsQueue` | Queued action records | General purpose action queue |
| `GoldenReviewSample` | Golden reference samples for review calibration | QA calibration data |
| `References` | Professional reference records | Additional reference management |
| `CatfishAuditLog` | id, slackUserId, slackUserName, targetEmail, platform, environment, intent, status, errorMessage, slackChannelId, createdAt | Internal user lookup audit. Records every time staff look up user data via the "Catfish" tool |
| `CapacityApplicationLog` | id, capacityBudgetId, capacityLogId, projectId, actionsTakenJson, status, notes, createdAt | Capacity budget application tracking |
| `OffPlatformCampaignSteps` | campaignStepId, stepNumber, campaignId, campaignType, subject, messageTemplate, parameters, scheduledAt, status, outreachedCandidateIds, failedCandidateIds, createdAt, updatedAt | Off-platform outreach campaign step execution |

---

*End of Appendix A*

---

*Document prepared for security research and educational purposes. All PII has been obfuscated.*
