Anatomy of Mercor's Data Breach

A technical analysis of a complete operational data loss (production database, user and customer data)

Disclaimer: All personally identifiable information (PII) in this document has been obfuscated. Names are partially masked (e.g., T** O****), emails redacted (e.g., e****a1@gmail.com), phone numbers truncated (e.g., +4479571****), bank details masked (e.g., 000**-***), financial identifiers hidden (e.g., acct_1Rc*****), IP addresses truncated (e.g., 71.194.*.*), and MAC addresses partially redacted (e.g., 1C:93:7C:**:**:**). This analysis is conducted for educational and security research purposes.

Note on source material: This entire analysis is based on two small sample files made publicly available by Lapsus$ — a database schema sample and a database export containing table structures with example rows, plus partial Airtable workspace exports. These files were shared after Mercor allegedly paid a ransom to have the data removed from the group's leak site — a fact confirmed to us directly by Lapsus$. Despite receiving payment, the group continues to share samples and is actively engaged in selling the full dataset to private bidders. Together these two files represent a fraction of a percent of the claimed 211GB production database. We did not have access to the full database, the 939GB of source code, the 3TB of cloud storage, the Slack exports, or the Tailscale VPN data. Everything documented in this report — every bank routing number, every Apple Foundation Model output, every Persona KYC session token, every desktop screenshot URL — was found in these two small files alone. The full breach is orders of magnitude larger. What follows is the tip of the iceberg.

Executive Summary
Why This Breach Is Serious
Platform Overview
Evidence - The Database Layer by Layer
Reverse Engineering - Architecture and Infrastructure
Exposed Surface Area Summary
Technical Architecture Reverse-Engineered
Grounds for Legal Action
Conclusion - What Happens Now
Appendix A - Complete Table Inventory

Executive Summary

This document presents a systematic technical analysis of a small sample from a database export from Mercor, an AI-powered talent marketplace that connects software engineers, AI data labelers, and knowledge workers with companies seeking contract labor. As reported by the Wall Street Journal, Mercor has rapidly become one of the key intermediaries in the AI industry — placing contractors inside organizations like Meta, OpenAI, Google DeepMind, Anthropic, Apple, and Amazon to perform AI training, data labeling, software engineering, and other knowledge work.

We analyzed two small sample files shared by Lapsus$ after Mercor allegedly paid a ransom to have the breach data removed. Despite that payment, the group continues to distribute samples and is actively selling the full dataset to private bidders. Together these files represent a tiny sliver of the claimed 211GB production database. Yet even these small samples contain over 250 table schemas with sample data rows exported from Mercor's Aurora MySQL production environment, plus Airtable workspace exports containing actual AI training data and model evaluation records. The samples cover every operational dimension of the platform, from contractor signup through identity verification, AI-conducted interviews, job placement, real-time work surveillance, and payment disbursement.

If these samples, which contain just one or two rows per table, already expose full bank routing numbers, government ID verification tokens, desktop screenshot URLs, signed legal documents, and proprietary AI model outputs from Apple and Amazon, then the full 211GB database presumably holds the same categories of data for every contractor and every transaction Mercor has ever processed.

Scope of This Article and the Full Scale of the Breach

Important: This article analyzes only two small sample files from the production database, shared by Lapsus$ after Mercor allegedly paid a ransom. The full production database is 211GB, which is itself a fraction of the claimed 4-terabyte breach. Every finding documented below was derived from these small samples alone. The full database would contain the complete records for every contractor, every transaction, every screenshot, and every payment Mercor has ever processed.

The Breach at a Glance

Mercor's official account attributes the breach to a supply-chain attack on the open-source Python package LiteLLM — a widely used AI proxy library estimated to be present in 36% of cloud environments. On March 27, 2026, using a maintainer's compromised credentials, the TeamPCP hacking group published two malicious PyPI package versions (1.82.7 and 1.82.8) that were available for download for approximately 40 minutes. The reported attack chain: the poisoned dependency landed in Mercor's development environment, swept the machine for SSH keys, AWS tokens, Kubernetes secrets, and .env files, deployed privileged containers across Mercor's Kubernetes clusters, and used the stolen credentials to begin exfiltrating data through Mercor's Tailscale VPN.

However, there are reasons to question whether LiteLLM was the sole or even primary attack vector. Exfiltrating 4 terabytes of data — production databases, 939GB of source code repositories, 3TB of cloud storage including video recordings and screenshots, plus Slack, Airtable, and Tailscale exports — is not a fast operation. At typical egress speeds, this would have taken days to weeks of sustained data transfer. A 40-minute window of malicious package availability seems insufficient to establish the deep, persistent access required to systematically exfiltrate this volume of data across this many distinct systems (Aurora MySQL, S3 buckets, GitHub repositories, Airtable, Slack, Tailscale). It is entirely possible that Mercor was already compromised through other means — whether through prior credential exposure, an insider threat, or a separate vulnerability — and that the LiteLLM incident was coincidental or merely one of multiple entry points. Mercor's characterization of itself as "one of thousands of companies" affected by LiteLLM may be an attempt to deflect from deeper, more embarrassing security failures.
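The "days to weeks" estimate above depends heavily on the sustained throughput available to the attackers, which is not known. The following is a minimal sketch of that arithmetic in Python. The 4 TB figure is the attackers' claim, taken here as decimal terabytes, and the link speeds are purely hypothetical assumptions chosen for illustration.

```python
def transfer_time_hours(size_bytes: float, link_bits_per_second: float) -> float:
    """Hours needed to move size_bytes over a link sustained at the given rate."""
    return (size_bytes * 8) / link_bits_per_second / 3600


# Claimed breach size: 4 TB, interpreted as decimal terabytes (an assumption).
BREACH_BYTES = 4 * 10**12

# Hypothetical sustained throughputs; none of these figures come from the breach data.
ASSUMED_LINKS_MBPS = [50, 100, 1000]

for mbps in ASSUMED_LINKS_MBPS:
    hours = transfer_time_hours(BREACH_BYTES, mbps * 10**6)
    print(f"{mbps:>5} Mbps: {hours:7.1f} hours ({hours / 24:.1f} days)")
```

Under these assumptions the transfer takes about 7.4 days at 50 Mbps, 3.7 days at 100 Mbps, and just under 9 hours at 1 Gbps. The multi-day estimate therefore holds for throttled or modest egress rates, while a fast, unthrottled path would shorten it considerably.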

The Lapsus$ group subsequently claimed responsibility for the breach, posting samples of the allegedly stolen data. Lapsus$ confirmed to us directly that ransom negotiations with Mercor took place and that Mercor paid. Despite that payment, the group continues to distribute samples and is actively selling the full dataset to private bidders.

Mercor confirmed the security incident but characterized itself as "one of thousands of companies" affected by the LiteLLM compromise. The company declined to answer whether any customer or contractor data had been accessed, exfiltrated, or misused.

Security researcher Archie Sengupta noted it was a "very big breach." Y Combinator president Garry Tan was more direct: "Incredible amount of SOTA training data now just available to China thanks to @mercor_ai leak. Every major lab. Billions and billions of value and a major national security issue."

What Was Taken - The Full 4TB

The attackers claim to have exfiltrated the following assets. This article only analyzes the first item — the production database. The remaining categories are not covered in this analysis but are described here to convey the full scale of exposure.

Asset	Size	Contents
Production Database	211 GB	The subject of this article. 250+ Aurora MySQL tables containing candidate profiles (resumes, work history, skills, education), PII (names, emails, phones, addresses, dates of birth, possibly SSNs and government ID documents), interview recordings/transcripts and AI assessment scores, employer/client data (companies, contracts, pricing), and internal user accounts and credentials.
Source Code	939 GB	The complete contents of Mercor's GitHub organization — including the `mercor-monorepo` and all associated repositories. This exposes proprietary AI/ML models for candidate matching and evaluation, the full platform backend and frontend code, API keys, secrets, and internal service credentials embedded in repositories, and all infrastructure-as-code (Terraform/Terragrunt deployment configs, CI/CD pipelines, cloud architecture).
Cloud Storage Buckets	~3 TB	The actual files referenced by the S3 URLs found in the database. Organized into three categories: Video — AI interview recordings of candidates (the `ai-interviewer-recordings` and `dailyco-recordings` S3 buckets), containing face and voice biometric data; GCF-Source — Google Cloud Function source code, representing additional serverless application logic beyond the main repositories; FME Review & Verification — Identity verification documents including passports, driver's licenses, and facial recognition/biometric data used in the Persona KYC flow (the `mercor-background-check-photos`, `certn-api-s3-certn-images`, `certn-api-s3-one-id-images`, and `certn-api-s3-certn-rcmp-documents` buckets). Also included: every Insightful desktop screenshot ever captured from contractor machines (the `mercor-insightful-screenshots-production` bucket), and signed legal documents (offer letters, CIIAs, NDAs).
Tailscale VPN Data	Included	Internal network topology and routing configurations, device certificates and authentication keys, access paths to internal services, dashboards, and admin tools. This is effectively a map of Mercor's internal network.
Slack Export	Included	A full export of Mercor's enterprise Slack workspace (`mercor.enterprise.slack.com`) and potentially client-specific workspaces like `project-mega.slack.com` and `glowstone-mli-rubrics.slack.com`. Slack exports include every message, file upload, DM, and channel history — candid internal discussions, client communications, incident response threads, and operational decisions.
Airtable Export	Included	Complete exports of all Airtable workspaces used for annotation and project management (6+ distinct workspace IDs found in the database). This exposes task definitions, contractor submissions, quality review data, and client project configurations — effectively the work product of Mercor's annotation pipeline.
Google Workspace	Unknown	It is unclear whether the attackers obtained a full export of Mercor's Google Workspace. Even the small sample analyzed here contains 30+ Google Doc URLs, 10+ shared Drive folder URLs, Google Sheets, and Google Forms. The full database would contain vastly more. If the Workspace was also exfiltrated, it would include all internal documents, email (Gmail), calendar entries, and shared drives.
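
As a quick sanity check on the claimed figures, the three assets with stated sizes can be summed and compared with the 4 TB headline number. This is a small sketch using only the sizes listed in the table, in decimal gigabytes. The cloud storage figure is approximate in the claim itself, and the assets listed as "Included" or "Unknown" are left out because no size is given for them.

```python
# Sizes as claimed by the attackers, in decimal gigabytes.
# The cloud storage figure is approximate ("~3 TB") in the claim itself.
CLAIMED_SIZES_GB = {
    "production_database": 211,
    "source_code": 939,
    "cloud_storage": 3000,
}


def share_of_total(sizes_gb: dict[str, float], key: str) -> float:
    """Fraction of the summed sizes contributed by one asset."""
    return sizes_gb[key] / sum(sizes_gb.values())


total_gb = sum(CLAIMED_SIZES_GB.values())
db_share = share_of_total(CLAIMED_SIZES_GB, "production_database")
print(f"Sized assets total: {total_gb} GB")
print(f"Database share of that total: {db_share:.1%}")
```

The sized assets add up to 4,150 GB, consistent with the roughly 4 TB claim, and the production database accounts for about 5% of that total.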

Why This Matters Beyond Mercor

The database analyzed in this report is merely the index — the structured metadata that describes, catalogs, and points to the stolen assets. Think of it as the card catalog for an entire stolen library:

The source code reveals how the system works — every algorithm, every API endpoint, every security mechanism, and every hardcoded credential
The Slack export reveals what was said about it internally — incident responses, client negotiations, and operational discussions
The cloud storage contains the actual files — the screenshots of contractor screens showing client systems, the video interviews showing candidates' faces and voices, the passport scans and government IDs submitted for verification
The Airtable export contains the work product itself — the annotation data, task submissions, and quality reviews that Mercor's clients (including frontier AI labs) paid for
The Tailscale VPN data provides a map to anything that was missed — the internal network topology that could enable further unauthorized access if credentials haven't been fully rotated

As Garry Tan noted, the AI training data alone — the prompts, responses, evaluations, and RLHF annotations produced by Mercor's contractors for organizations like OpenAI, Meta, and Google DeepMind — represents potentially billions of dollars in value. If this data reaches competitors — whether domestic rivals or labs in other countries — it would allow them to shortcut years of investment. The source code for Mercor's proprietary ranking algorithms (MercorScore, the Bradley-Terry tournament system, the Bayesian fraud model) adds further competitive intelligence value.

Together, this represents one of the most comprehensive corporate breaches in recent memory: not a single database table or a handful of credentials, but the complete digital footprint — code, data, communications, files, network maps, and work product — of an organization entrusted with some of the most sensitive work in the AI industry.

Why This Breach Is Serious

Why AI Training Data Is Worth Billions

To understand why this breach is significant and not just another corporate data leak, it helps to understand what AI training data is and why companies like OpenAI, Anthropic, Apple, Amazon, Meta, and Google pay enormous sums to produce it.

Modern AI models like GPT-4, Claude, and Gemini are not programmed — they are trained. The raw intelligence comes from pre-training on internet text, but the ability to follow instructions, reason carefully, and refuse harmful requests comes from a second phase that depends entirely on human-generated data. This is the data Mercor's contractors produce. It falls into several categories, all of which are present in the breach:

Supervised Fine-Tuning (SFT) data — Humans write high-quality responses to prompts, demonstrating how the model should behave. The TASKS and TASK_VERSIONS tables across Mercor's 84 Airtable workspaces contain these prompt-response pairs, organized by domain (legal, medicine, finance, coding, etc.). A single SFT dataset covering a specialized domain can cost millions of dollars to produce because it requires experts — lawyers, doctors, engineers — writing at $95/hour for months.

Reinforcement Learning (RL) preference data — Humans compare two model outputs and judge which is better. This is the core of RLHF (Reinforcement Learning from Human Feedback), the technique that transformed GPT-3 into ChatGPT. The API_PREFERENCE workspaces, PHASE_1_TASKS (Amazon), and the GPT-4 vs Claude Evaluation project all contain this data — complete with the prompts, both model responses, and the human preference judgment. This data teaches models what humans actually want, which is the hardest and most expensive part of AI development.

RL rubrics and evaluation criteria — Before humans can judge model outputs, someone must define what good looks like. The CRITERIA, RUBRIC_VERSIONS, QA_SPECS, and LLM_CALL_CONFIGURATION tables across 60+ Airtable workspaces contain these rubrics. They encode the evaluation methodology itself — the scoring frameworks, the edge cases, the quality thresholds. This is proprietary intellectual property that defines how each AI lab measures progress. A competitor with access to these rubrics doesn't just get the training data — they get the recipe.

RL environments and Chain-of-Thought data — The AMAZON_LLM_COT_EVALUATION workspace contains full Chain-of-Thought traces — the step-by-step reasoning that models produce before giving a final answer. The ACADEMIC_REASONING_SFT workspace contains a COT table explicitly for reasoning supervision. The Panacea — Consulting RL Envs project built reinforcement learning environments. This data teaches models how to think, not just what to say.

Benchmark evaluation data — The ATHENA_HLE workspaces (likely Humanity's Last Exam) and AIME_RUBRICS (AIME math competition) contain evaluation data for some of the most important AI benchmarks. The MODEL_RESPONSES and AWAITING_REVIEW_METRICS tables contain graded model outputs against these benchmarks. If this data is used to train future models, it contaminates the benchmarks — the models will appear to perform better than they actually do, undermining the entire AI evaluation ecosystem.

Pre-release model outputs — The APPLE_ENDPOINT_SANDBOX workspace contains actual outputs from Apple's unreleased Foundation Models (afm-text-083, afm-model-086). These responses reveal the model's capabilities, limitations, safety alignment, and failure modes before Apple has publicly launched them. For a competitor, this is the equivalent of obtaining a rival's product prototype.
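
To make the preference-data category above more concrete, here is a minimal sketch of what a single comparison record might look like and how it is typically reshaped for reward-model training. The class and field names are hypothetical and chosen for illustration. They are not taken from the Airtable exports, and the example text is invented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PreferenceRecord:
    """One human comparison of two model responses to the same prompt."""
    prompt: str
    response_a: str
    response_b: str
    preferred: str  # "a" or "b": the human judgment


def to_training_pair(record: PreferenceRecord) -> tuple[str, str, str]:
    """Return (prompt, chosen, rejected) for a single comparison."""
    if record.preferred == "a":
        return record.prompt, record.response_a, record.response_b
    if record.preferred == "b":
        return record.prompt, record.response_b, record.response_a
    raise ValueError(f"preferred must be 'a' or 'b', got {record.preferred!r}")


# Invented example content, for illustration only.
example = PreferenceRecord(
    prompt="Summarize the contract clause in one sentence.",
    response_a="A short, accurate summary.",
    response_b="A long answer that misses the point.",
    preferred="a",
)
print(to_training_pair(example))
```

The value of such a record lies in the human judgment field. The prompt and both responses can be regenerated cheaply, but the expert comparison cannot.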

Why this data is so expensive to reproduce:

Each data point requires a skilled human — often a domain expert — spending minutes to hours crafting, evaluating, or comparing model outputs. At Mercor's reported average rate of $95/hour across 30,000+ contractors, the annual cost of data production runs into hundreds of millions of dollars. OpenAI, Anthropic, and the other labs have each spent years and billions of dollars building these datasets incrementally, refining their rubrics, and developing their evaluation methodologies.
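
The "hundreds of millions" figure depends on how many hours each contractor actually bills, which this article's sources do not state. The sketch below works through the arithmetic using the reported $95/hour rate and 30,000 contractors as a lower bound. The hours-per-contractor values are hypothetical assumptions.

```python
HOURLY_RATE_USD = 95   # average rate reported in the text
CONTRACTORS = 30_000   # "30,000+" in the text, used here as a lower bound


def annual_cost_usd(hours_per_contractor: float) -> float:
    """Yearly spend if every contractor bills the given number of hours."""
    return HOURLY_RATE_USD * CONTRACTORS * hours_per_contractor


# Hypothetical workloads; the source does not state hours per contractor.
for hours in (50, 100, 250):
    print(f"{hours:>4} h/year each: ${annual_cost_usd(hours) / 1e6:,.1f}M")
```

Under these assumptions the total is $142.5M at 50 hours per contractor per year, $285M at 100 hours, and $712.5M at 250 hours. The figure crosses $100 million at roughly 35 billed hours per contractor per year.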

The breach doesn't just expose data. It exposes the methodology — the rubrics, the evaluation criteria, the domain taxonomies, the quality control processes, and the scoring frameworks that each lab has spent years developing. Any competitor with access to this material — domestic or foreign — could replicate years of alignment research in months, at a fraction of the cost, by simply adopting the proven evaluation frameworks and training on the stolen preference data.

This is why Garry Tan called it "billions and billions of value." The data in these Airtable workspaces is not supplementary. It is the core competitive advantage of the AI labs that produced it — and it is now for sale.

The Extent - What Data Was Exposed

The breadth of personally identifiable information (PII) in this breach is staggering. The following inventory documents every category of sensitive data present in the database dump, with specific column names, source tables, and — where available — the format of the exposed data as observed in sample records. This inventory is intended to serve as a factual reference for affected individuals, regulators, and legal counsel.

1. Personal Identity Information

Data Element	Database Column(s)	Source Table(s)	Format Observed in Sample
Full legal name	`name`, `first_name`, `last_name`	`MercorUsers_New`, `MercorUserFinancials` (embedded in Stripe JSON)	`T** O****`, `H**i A**a` (full plaintext names)
Personal email address	`email`	`MercorUsers_New`, `Candidates`, `LinkedinWarmIntros`, `UserReferences`, `MLExperimentsJobPerformanceReviews`	`e**a1@gmail.com`, `a*y@gmail.com`, `a***s@gmail.com` (full plaintext)
Phone number with country code	`phone`	`MercorUsers_New`	`+4479571****` (full international format)
Date of birth	`birthday`	`UserMetadata`, `Candidates`, `WorkAuthorization_Audit`	Date field — exact DOB for each contractor
Physical home address	`physicalLocation`, `residenceCity`, `residenceState`, `residenceZipCode`	`UserMetadata`, `UserLocation`, `Candidates`	City, state, zip code, and country of residence
Profile photograph	`profilePic`	`MercorUsers_New`	URL to stored profile image
Country of residence	`residenceCountry`, `countryOfResidence`	`UserLocation`, `UserMetadata`, `Candidates`	`USA`, `United Kingdom`
LinkedIn profile URL	`linkedinUrl`, `url`	`Candidates`, `LinkedinWarmIntros`, `LinkedinUsers`	`https://www.linkedin.com/in/s-s-s****-d***` (full URL with real name)

2. Government Identity Documents and Biometrics

Data Element	Database Column(s)	Source Table(s)	Format Observed in Sample
Government ID verification outcome	`governmentIdStatus`	`IDVerificationChecks`	`not_applicable`, `passed`, `failed`
Liveness detection result	`livenessStatus`	`IDVerificationChecks`	Binary pass/fail — confirms a live facial scan was performed
Facial comparison thumbnail	`thumbnail_key` (in `providerResponse` JSON)	`IDVerificationChecks`	`intr_AAABnNOWs0wnj7Tmg0hBQpL5_thumbnail.jpg` — a stored facial image key
Persona KYC session token	`sessionId`, `sessionToken`	`IDVerificationChecks`	`face_baseline_intr_AAABnNOWs0wnj7Tmg0hBQpL5` — replayable session ID
Persona account identifier	`persona_account_id` (in `providerResponse` JSON)	`IDVerificationChecks`	`act_QMTuQh33A4QU23J8ECPSd32BBKb4`
Address verification status	`addressStatus`	`IDVerificationChecks`	Confirms whether home address was verified against government records
Verification attempt count	`attemptNumber`, `maxAttempts`	`IDVerificationChecks`	Tracks repeated identity verification attempts

Note: The cloud storage buckets (mercor-background-check-photos, certn-api-s3-one-id-images, certn-api-s3-certn-rcmp-documents) reportedly contain the actual document images — passports, driver's licenses, and RCMP criminal record documents — referenced by these database records.

3. Financial and Banking Data

Data Element	Database Column(s)	Source Table(s)	Format Observed in Sample
Bank name	`bank_name` (in `accountDetails` JSON)	`MercorUserFinancials`	`BANK OF M*******` (plaintext)
Bank routing number	`routing_number` (in `accountDetails` JSON)	`MercorUserFinancials`	`000**-***` (full routing number in plaintext)
Bank account last 4 digits	`last4` (in `accountDetails` JSON)	`MercorUserFinancials`	`07**`
Bank account holder name	`account_holder_name` (in `accountDetails` JSON)	`MercorUserFinancials`	`H**i A**a` (full legal name on bank account)
Stripe Express account ID	`providerMethodId`, `stripeAccountId`	`UserPaymentMethods`, `MercorUsers_New`	`acct_1Rc*****`
Full Stripe account JSON	`accountDetails`	`MercorUserFinancials`	Complete Stripe API response including all fields above plus `charges_enabled`, `payouts_enabled`, `default_currency`, TOS acceptance timestamp, and external account details
Wise transfer & quote IDs	`wiseTransferId`, `wiseQuoteId`	`WiseDisbursements`	Transfer identifiers for international payments
Payment amounts	`totalPayableAmount`, `totalBillableAmount`, `totalAmount`	`PaymentLineItems`, `MoneyOut_Audit`, `WiseDisbursements`	Amounts in cents (e.g., `250000` = $2,500.00)
Pay rates	`payableRate`, `billableRate`	`Jobs`, `Jobs_Audit`	Exact hourly/monthly compensation — both what contractor earns and what client pays
Tax form status	`tax_form`	`Jobs`	Tax filing status per contractor
Stripe subscription ID	`stripeSubscriptionId`	`Jobs`	Billing subscription identifier
Payout schedule and currency	`schedule.interval`, `default_currency` (in JSON)	`MercorUserFinancials`	`daily` payout with `7` day delay, currency `cad`
Payment failure reasons	`dispatchFailureReason`, `failureReason`	`PaymentLineItems`, `MoneyOut_Audit`, `WiseDisbursements`	Structured failure codes revealing payment issues

The MercorUserFinancials.accountDetails field is particularly egregious — it stores the complete Stripe Connect API response as a JSON blob, which includes the contractor's full legal name, personal email, bank name, routing number, last four digits of the account, account holder name, country, currency, and TOS acceptance details. This is not a reference or a token — it is the raw financial identity of each contractor stored in a single database column.
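
For contrast, a data-minimizing design would persist only a reference to the provider account and a few non-sensitive flags, fetching anything else from the payment provider on demand. The following is a minimal sketch of that idea. The allow-list, the function name, and the flat dictionary shape are illustrative assumptions. A real Stripe response is nested, and all values here are placeholders.

```python
# Fields kept locally; everything else would be fetched from the payment
# provider when needed. This allow-list is an illustrative assumption.
ALLOWED_FIELDS = {"id", "charges_enabled", "payouts_enabled", "default_currency"}


def minimize_account_details(provider_response: dict) -> dict:
    """Keep only non-sensitive reference fields from a provider response."""
    return {k: v for k, v in provider_response.items() if k in ALLOWED_FIELDS}


# Placeholder values only, shaped loosely after the columns in the table above.
raw = {
    "id": "acct_EXAMPLE",
    "charges_enabled": True,
    "payouts_enabled": True,
    "default_currency": "cad",
    "bank_name": "EXAMPLE BANK",
    "routing_number": "000000000",
    "last4": "0000",
    "account_holder_name": "EXAMPLE NAME",
}

stored = minimize_account_details(raw)
print(stored)
```

With this approach a database dump would reveal that a payout account exists, but not the bank name, routing number, or account holder name.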

4. Employment and Performance Records

Data Element	Database Column(s)	Source Table(s)	Format Observed in Sample
Employment contract terms	`payableRate`, `billableRate`, `commitment`, `expected_hours`, `startDate`, `expiresAt`	`Jobs`, `Jobs_Audit`	Full contract terms including pay rate, hours, and duration
Signed offer letters	`offerLetter`	`Jobs`, `WorkTrial_Audit`	S3 key or base64-encoded signed legal document
Digital signatures	`signature`	`Jobs`, `WorkTrial_Audit`, `WorkAuthorization_Audit`	Contractor's digital signature on legal agreements
CIIA/NDA agreements	`ciiaa_direct`, `ciiaaPassthrough`	`Jobs`, `WorkTrial_Audit`	Confidentiality and IP assignment agreements
Terms of work	`tow`	`Jobs`, `WorkTrial_Audit`	Full terms of engagement
Safety waiver	`safety_waiver`	`Jobs`	Safety waiver acceptance
Dismissal date and reason	`dismissalDate`, `dismissalReason`, `dismissalFlag`	`Jobs`, `JobPerformanceReviews_New`	Date of termination and categorized reason
Offboarding reason	`Offboarding Reason`	`MLExperimentsJobPerformanceReviews`	Plaintext offboarding justification
Performance scores	`score`, `Quality of Work`, `Engagement`, `performanceScore`	`JobPerformanceReviews_New`, `MLExperimentsJobPerformanceReviews`, `ContractorPerformance_New`	Numeric ratings with text justifications
Performance review text	`reviewNotes`, `Justification for rating`, `performanceSummary`, `jobPerformanceSummary`	`JobPerformanceReviews_New`, `MLExperimentsJobPerformanceReviews`	Free-text evaluations of individual contractors
Reviewer identity	`reviewedBy`, `Reviewer`	`JobPerformanceReviews_New`, `MLExperimentsJobPerformanceReviews`	Named Mercor staff who wrote the review (e.g., `A* K***`)
Client project name	`Account`, `Project`, `projectName`	`MLExperimentsJobPerformanceReviews`, `JobPerformanceReviews_New`	`OpenAI`, `Apertus - Elephant` — links contractor performance to specific client

The MLExperimentsJobPerformanceReviews table is especially damaging: it contains the contractor's full name, email, client company name (e.g., OpenAI), project name, reviewer's name, quality score, engagement score, offboarding reason, and a free-text justification — all in a single row. Sample: A***** D****, a*****s@gmail.com, OpenAI, Apertus - Elephant, reviewed by A*** K*****, rated 4 - Redefines Expectations.

5. Criminal Background and Adverse Media Checks

Data Element	Database Column(s)	Source Table(s)	Format Observed in Sample
Criminal background check status	`status`	`BackgroundCheck`, `BackgroundCheck_New`	`clear` / `consider` (whether criminal history was flagged)
Adverse media check status	`adverseMediaCheckStatus`	`BackgroundCheck`	Whether negative news/media was found about the individual
Background check package	`package`	`BackgroundCheck`	e.g., `tasker_pro` — defines which checks were run
RCMP criminal record documents	Referenced via S3 bucket	`certn-api-s3-certn-rcmp-documents-ca-central-1-production`	Royal Canadian Mounted Police criminal record check documents
External candidate ID at Checkr/Certn	`externalCandidateId`, `backgroundCheckId`, `reportId`	`BackgroundCheck`	Cross-references to external background check providers
Work location for check	`workLocation`	`BackgroundCheck`	Country/jurisdiction of background check

6. Work Authorization and Immigration Status

Data Element	Database Column(s)	Source Table(s)	Format Observed in Sample
Work authorization status	`workAuthorizationStatus`	`UserMetadata`, `Candidates`, `WorkAuthorization_Audit`	Whether individual is authorized to work in a given country
Physical country vs. residence country	`physicalCountry` vs. `residenceCountry`	`UserLocation`, `WorkAuthorization_Audit`	Mismatch between these fields is flagged as fraud — revealing who may be working from an unauthorized location
Location attestation with signature	`agreedToLocation`, `signature`, `attestedAt`	`WorkAuthorization_Audit`	Signed attestation of physical work location

Work authorization status is closely tied to immigration status, which several US state privacy laws (including California's CPRA) classify as sensitive personal information, and it is personal data under GDPR. Its exposure, combined with physical location data and location mismatch fraud flags, could be used to identify individuals working from countries where they lack authorization, creating potential immigration enforcement risk.
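
The location mismatch described in the table can be pictured as a simple comparison between two fields. The sketch below is a deliberately simplified illustration: the class and function names are hypothetical, the real system appears to produce a score rather than a boolean, and nothing here is taken from Mercor's code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LocationRecord:
    physical_country: str   # where the user appears to be (mirrors physicalCountry)
    residence_country: str  # where the user says they live (mirrors residenceCountry)


def _normalize(country: str) -> str:
    """Trim whitespace and ignore case; no alias handling (e.g. 'USA' vs 'United States')."""
    return country.strip().casefold()


def location_mismatch(record: LocationRecord) -> bool:
    """True when the two countries differ after trivial normalization."""
    return _normalize(record.physical_country) != _normalize(record.residence_country)


print(location_mismatch(LocationRecord("USA", "United Kingdom")))
print(location_mismatch(LocationRecord(" usa", "USA")))
```

Even this trivial comparison shows why the two columns are sensitive together: anyone holding both can reproduce the flag for every user in the table.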

7. Device Fingerprints, Network Identifiers and Surveillance Data

Data Element	Database Column(s)	Source Table(s)	Format Observed in Sample
IP address	`ip`	`InsightfulScreenshots`	`71.194.*.*` (full IPv4 address, geolocatable)
MAC address	`gateways`	`InsightfulScreenshots`	`["1C:93:7C:**:**:**"]` (unique hardware identifier)
Hardware fingerprint (HWID)	`hwid`	`InsightfulScreenshots`	`8f9f16f0-1fb7-47e4-a2a1-209838aa5c5e` (persistent device ID)
Computer hostname	`computer`	`InsightfulScreenshots`	`desktop-ue2kgro`
Operating system & version	`os`, `osVersion`	`InsightfulScreenshots`	`win32`, `10.0.19045`
Application file path	`appFilePath`	`InsightfulScreenshots`	`C:\Program Files\Google\Chrome\Application\chrome.exe`
Active window title	`windowTitle`	`InsightfulScreenshots`	Full window title revealing document/conversation content
Browser URL visited	`browserUrl`	`InsightfulScreenshots`	Full URL being viewed at time of screenshot
Desktop screenshot image	`storageUrl`	`InsightfulScreenshots`	Direct S3 URL to actual screenshot image file
Productivity score	`externalProductivityScore`	`InsightfulScreenshots`	Numeric productivity rating per screenshot interval
Timezone	`timezone`	`InsightfulScreenshots`, `Timelog`	`America/Chicago` — reveals approximate geographic location
Session duration	`duration`, `timeStart`, `timeEnd`	`Timelog`	Exact milliseconds worked per session
Pay deduction reason	`reasonForDeduction`, `appName`	`Deductions`	Why money was subtracted from pay, linked to specific application

The combination of IP address, MAC address, and HWID creates a triple device fingerprint that identifies not just the person but the specific physical machine they used. Under GDPR, online and device identifiers of this kind are personal data when they can be linked to an individual. Under CCPA, unique device identifiers constitute personal information.
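
To illustrate why the three identifiers are more revealing together than apart, the sketch below combines them into a single stable key. This is not a claim about how Insightful or Mercor processes these fields. The function name is hypothetical and all values are placeholders rather than data from the sample.

```python
import hashlib


def composite_fingerprint(ip: str, mac: str, hwid: str) -> str:
    """SHA-256 over the three identifiers after light normalization."""
    material = "|".join([ip.strip(), mac.strip().upper(), hwid.strip().lower()])
    return hashlib.sha256(material.encode("utf-8")).hexdigest()


# Placeholder identifiers, not values from the sample files.
HWID = "00000000-0000-0000-0000-000000000001"
a = composite_fingerprint("192.0.2.10", "00:00:5E:00:53:01", HWID)
b = composite_fingerprint("192.0.2.10", "00:00:5e:00:53:01", HWID)  # same device, MAC in lower case
c = composite_fingerprint("192.0.2.11", "00:00:5E:00:53:01", HWID)  # different IP

print(a == b, a == c)  # prints: True False
```

Any two records that share the same key can be attributed to the same machine, which is what lets leaked surveillance rows be correlated with each other and with other datasets.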

8. Fraud Profiling and Algorithmic Decision-Making

Data Element	Database Column(s)	Source Table(s)	Format Observed in Sample
Fraud probability score	`posteriorProbability`, `modelScore`	`FraudEvents`, `FraudSignalAuditLog`	Bayesian probability (0.0–1.0) that individual is fraudulent
Fraud decision	`currentDecision`, `status`	`FraudStates`, `FraudCheck`	`APPROVE` / `ESCALATE` / `REJECT` — algorithmic verdict on individual
LLM-generated fraud reasoning	`currentReasoning`, `manual_review_rational`	`FraudStates`, `FraudCheck`	AI-written paragraph explaining why individual was flagged: "The primary concern is a maximum location mismatch score of 1.0, indicating the user's IP address is entirely inconsistent with their stated profile location..."
Fraud signal inventory	`currentKeySignals`, `flag_reasons`	`FraudStates`, `FraudCheck`	`["location_mismatch: 1.0", "email_diff: 0.125", "email_is_pwned: False"]`
HaveIBeenPwned result	`email_is_pwned` (in signals)	`FraudStates`	Whether contractor's email was found in known data breaches
VPN/Tor detection	Referenced in fraud signals	`FraudStates`, `FraudSignalAuditLog`	Whether VPN or Tor usage was detected
Cheating detection	`isCheating`, `cheatingProbability`, `signs`	`CheatingDetection`	Whether individual was flagged for cheating during interviews
Duplicate account detection	`userIdList`	`DuplicateGroups`	Groups of accounts believed to belong to the same person

Automated fraud decisions directly impacted individuals' ability to earn income through the platform. Under GDPR Article 22, individuals have the right not to be subject to decisions based solely on automated processing that produce legal or similarly significant effects. The exposure of the complete fraud reasoning — including the LLM-generated explanations — reveals the inner workings of an automated decision-making system that determined whether people could work and earn money.
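
The table describes a Bayesian posterior probability mapped to one of three verdicts. The following is a minimal sketch of how such a pipeline could work in principle, using a naive Bayes odds update. The prior, the likelihood ratios, and the thresholds are all hypothetical assumptions. The sample data does not reveal Mercor's actual model or cut-offs.

```python
def posterior_probability(prior: float, likelihood_ratios: list[float]) -> float:
    """Update a prior fraud probability with independent likelihood ratios."""
    if not 0.0 < prior < 1.0:
        raise ValueError("prior must be strictly between 0 and 1")
    odds = prior / (1.0 - prior)
    for ratio in likelihood_ratios:
        odds *= ratio
    return odds / (1.0 + odds)


def decide(posterior: float, approve_below: float = 0.2, reject_above: float = 0.8) -> str:
    """Map a posterior to one of the three verdicts named in the table."""
    if posterior < approve_below:
        return "APPROVE"
    if posterior > reject_above:
        return "REJECT"
    return "ESCALATE"


# Hypothetical numbers: a 5% prior and two signals that each raise the odds.
p = posterior_probability(0.05, [6.0, 2.0])
print(round(p, 3), decide(p))
```

With these invented inputs the posterior is about 0.387, which falls between the two thresholds and yields ESCALATE. Small changes to a threshold or a signal weight can flip a person's outcome, which is why the exposed reasoning and signal values matter.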

9. Communications and Third-Party PII

Data Element	Database Column(s)	Source Table(s)	Format Observed in Sample
In-platform message content	`content`	`Comms`, `CommsSent`	Full text of messages between contractors, recruiters, and clients
Outreach email content	`subject`, `content`, `messageTemplate`	`EmailTemplates`, `OffPlatformCampaignSteps`	Full email templates with subject lines
Phone call logs	Call metadata	`AircallComms`	Aircall VoIP call records
Professional reference PII	`name`, `email`, `company`, `relationship`	`UserReferences`	Third parties' names, emails, and employers — people who did not sign up for Mercor
LinkedIn profiles of non-users	`linkedinUrl`, `email`	`LinkedinWarmIntros`	Full LinkedIn URLs and email addresses of people contacted for warm intros
Voucher/endorser PII	`voucherUserId`, `candidateEmail`, `candidateName`, `candidateLinkedinId`	`CandidateVouches`	Names, emails, and LinkedIn IDs of both vouchers and vouched-for candidates
Recruiter notes	`noteBody`, `notesForCandidate`	`ListingNotes`, `Candidates`	Candid internal commentary about individuals

The exposure of third-party PII is particularly significant for legal liability. UserReferences contains the names, email addresses, employers, and relationships of professional references — individuals who never created Mercor accounts and never consented to having their data stored in Mercor's production database. LinkedinWarmIntros contains LinkedIn URLs and emails of people contacted for recruitment outreach. These third parties had no contractual relationship with Mercor and no opportunity to consent to or opt out of data collection.

10. PostHog Behavioral Analytics De-Anonymized

Data Element	Database Column(s)	Source Table(s)	Format Observed in Sample
User email linked to analytics session	`userEmail`	`PosthogAnalytics`	Personally identified analytics sessions (defeating anonymization)
Company context	`company`	`PosthogAnalytics`	Which company the user was associated with during the session
Session timing	`startTimeUtc`, `endTimeUtc`	`PosthogAnalytics`	Exact session start/end times
Active/inactive time	`activetime`, `inactivetime`	`PosthogAnalytics`	How long the user was actively engaged vs. idle
Entry URL	`startUrl`	`PosthogAnalytics`	The URL the user was on when the session started

PostHog sessions are typically anonymous or pseudonymous. The PosthogAnalytics table explicitly links userEmail to session data — effectively de-anonymizing behavioral analytics and creating a personally identifiable record of how each contractor and company user navigated the platform.
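To make the contrast concrete, the sketch below shows what a pseudonymous session key could look like compared with storing `userEmail` directly. It is a minimal illustration, not Mercor's or PostHog's implementation: the `pseudonymize` helper, the `userKey` field, the demo key, and the sample row are all hypothetical, and only the column names `userEmail`, `startUrl`, and `activetime` come from the table above.

```python
import hashlib
import hmac


def pseudonymize(email: str, secret_key: bytes) -> str:
    """Return a stable pseudonymous ID for an email using HMAC-SHA256."""
    normalized = email.strip().lower().encode("utf-8")
    return hmac.new(secret_key, normalized, hashlib.sha256).hexdigest()


# Hypothetical session row; the values are invented for illustration.
session = {
    "userEmail": "user@example.com",
    "startUrl": "https://example.com/dashboard",
    "activetime": 1260,
}

# Illustrative key only; in practice it would live outside the analytics store.
key = b"demo-key-kept-outside-the-analytics-store"

# Drop the direct identifier and keep only the keyed hash.
pseudonymous_session = {k: v for k, v in session.items() if k != "userEmail"}
pseudonymous_session["userKey"] = pseudonymize(session["userEmail"], key)
```

With a keyed hash, sessions from the same person can still be grouped, but a leaked table alone does not reveal whose sessions they are. A table that stores the email in the clear offers no such separation.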

Legal Significance of This PII Inventory

Any single category above would trigger breach notification obligations under most privacy laws. The combination creates exposure across multiple overlapping regulatory regimes:

Regulation	Applicable Data	Key Provisions
GDPR (EU/UK)	All categories — Mercor processes data of EU/UK contractors (sample shows `United Kingdom, Harrow` residence)	Articles 5, 6, 9 (special categories), 13-14 (transparency), 22 (automated decisions), 33-34 (breach notification within 72 hours)
CCPA/CPRA (California)	Personal identity, financial, employment, device identifiers, behavioral analytics	Right to know, right to delete, right to opt-out of sale/sharing, private right of action for data breaches resulting from failure to maintain reasonable security
Illinois BIPA	Facial geometry scans from Persona liveness detection, facial comparison thumbnails stored as image keys	$1,000–$5,000 per violation statutory damages, private right of action, no harm requirement
FCRA (Federal)	Background check results, adverse media checks, fraud decisions used for employment decisions	Requires permissible purpose, adverse action notices, accuracy obligations, private right of action
ECPA / Wiretap Act	Desktop screenshots capturing communications, browser URLs, window titles	Consent requirements for interception of electronic communications
State Data Breach Notification Laws (all 50 US states)	Name + financial account number, name + SSN, name + government ID	Mandatory notification to affected individuals, typically within 30-60 days
PIPEDA (Canada)	All categories — sample shows Canadian contractor (`country: CA`, `BANK OF M*****`, `routing_number: 000-***`)	Breach notification to Privacy Commissioner and affected individuals
SOX / PCI-DSS	Financial account data, payment card information if present	SOX governs public-company financial reporting controls and PCI-DSS governs payment card data, so both apply only where Mercor or its clients fall in scope

The exposed data supports claims for:

Negligence — Failure to implement reasonable security measures for highly sensitive personal data
Breach of contract — Violation of privacy commitments made to contractors in terms of service and privacy policies
Breach of fiduciary duty — Mishandling of financial and identity data entrusted to Mercor as an employment intermediary
Violations of specific statutes — BIPA (facial geometry scans from Persona KYC liveness detection), FCRA (background check data used in employment), CCPA (failure to maintain reasonable security), GDPR (multiple articles)
Unjust enrichment — Mercor profited from collecting and processing this data without adequately protecting it
Third-party claims — Professional references, LinkedIn contacts, and vouching parties whose data was collected without direct consent

The Scope - Who Is Affected

The breach affects multiple distinct populations, each with different legal standing:

Contractors (Primary Class) — Every person who signed up, completed an interview, or performed work through Mercor has their full PII exposed: full legal name, personal email, phone number, date of birth, home address, government ID verification status, bank name and routing number, employment terms with exact pay rates, performance reviews with dismissal reasons, and in many cases desktop screenshots of their computer screens while working. The MercorUserFinancials table alone contains sufficient information for bank account fraud — the bank name, routing number, last four digits of account number, account holder name, and country are all stored in plaintext JSON.
Client Companies — Companies that hired through Mercor have their project names (including OpenAI, Apertus - Elephant), internal tooling references, billing details, hiring criteria, candidate evaluation notes, Slack workspace URLs, Okta SSO group configurations, and annotation platform URLs exposed. These include some of the most valuable and secretive AI organizations on the planet.
Mercor Employees — Internal staff are identifiable through the IacDeploymentRuns table (GitHub usernames as actor fields), CatfishAuditLog (Slack user IDs and real names), DATABASECHANGELOG (migration author names), MLExperimentsJobPerformanceReviews (reviewer names like A*** K*****), and the IAM table (users with ghost role assignments within client companies).
Third Parties Who Never Consented — Professional references (UserReferences) provided their name, email, employer, and relationship to the contractor. LinkedIn contacts (LinkedinWarmIntros) had their profile URLs and email addresses stored. Vouching parties (CandidateVouches) provided detailed relationship information. These individuals had no direct contractual relationship with Mercor, likely received no privacy notice, and had no opportunity to consent to or opt out of data collection. Their data was collected incidentally through the contractors they were associated with.

The Scale - Mercor Client Ecosystem

What elevates this breach from a typical startup data leak to an industry-wide crisis is who Mercor's clients are.

Meta, OpenAI, and Google DeepMind are among Mercor's publicly known clients — as reported by the Wall Street Journal — but even our small sample reveals direct evidence of engagements with at least six major technology companies, plus numerous additional clients identifiable through project codenames and Airtable workspace names.

Confirmed Client Engagements Found in the Sample

The sample file contains not just the production database tables but also an ./EXPORTS/ directory with full Airtable workspace dumps — organized by client name. These exports contain the actual work product: prompts, model responses, evaluation rubrics, and contractor submissions. The client names appear directly in the directory structure:

Client	Evidence in Sample	What Was Exposed
Apple	Airtable workspace: `AIRTABLE_APPLE_ENDPOINT_SANDBOX_APP3PG4U42BALES9K` containing tables: `TEXT`, `DEEP_L`, `TEXT_ORCHESTRATOR`, `RUBRIC_AUTO_GEN`	Apple's proprietary AI model outputs. The `TEXT` table contains prompt-response pairs from Apple Foundation Models (`afm-text-083`, `afm-model-085`, `afm-model-086`) — Apple Intelligence's internal language models. Sample: model `afm-text-083` responding to user prompts with temperature=0.7, top_p=0.9. The `DEEP_L` table shows translation evaluation (text→Spanish). The `TEXT_ORCHESTRATOR` table shows orchestrator model (`afm-model-086`) being tested. This is pre-release Apple Intelligence evaluation data.
Amazon	Airtable workspace: `AIRTABLE_AMAZON_LLM_COT_EVALUATION___UPDATED_APP0JM1SJ4XOHMAQC` containing tables: `DOMAINS`, `PHASE_1_TASKS`, `PHASE_1_REVIEWS`, `TALENT`	Amazon's LLM Chain-of-Thought evaluation data. The `DOMAINS` table shows evaluation categories (`math`, `stem`). The `PHASE_1_TASKS` table contains full model A vs. model B comparison data with complete Chain-of-Thought reasoning traces, final responses, and preference judgments. Tasks are claimed by named Mercor staff (e.g., `n**k@mercor.com`). This exposes Amazon's internal model evaluation methodology and scoring rubrics.
OpenAI	Performance review record: `Account: OpenAI`, `Project: Apertus - Elephant`, reviewed by named staff. Feather platform URL: `feather.openai.com/campaigns/998855ab-...`. Project codename in `Projects_Audit`.	Named contractor (`A*** D`, `a***@gmail.com`) rated `4 - Redefines Expectations` on OpenAI project work. Direct URL to OpenAI's internal Feather annotation platform with campaign UUID.
Anthropic	Airtable workspace: `AIRTABLE_API_PREFERENCE` containing `PROMPTS`, `RESPONSES`, `ROLES`, `DOMAINS` tables. Project: `GPT-4 vs Claude Evaluation` comparing GPT-4 and Claude 3.5 Sonnet. `AgentSandboxes` table shows `agentType: claude`.	LLM preference evaluation data comparing Anthropic's Claude 3.5 Sonnet against GPT-4 across use cases. AI coding agent sandbox sessions running Claude. Exposes model comparison methodology and evaluation criteria.
Meta	Publicly confirmed client per WSJ. Project references in `Projects_Audit` and `ProjectIntegrations`.	Contractor work product, project configurations, Slack workspace integrations.
Google DeepMind	Publicly confirmed client per WSJ.	Contractor work product and project data in the full database.

Airtable Workspace Inventory

The sample file reveals 25+ distinct Airtable workspaces that were exported as part of the breach. Each workspace name follows a pattern that often includes the client name or project identifier. Beyond the named clients above, the Airtable exports include:

Airtable Workspace	Domain	Notable Tables
`APEX_LEGAL`	APEX benchmark - Legal	`TASKS`, `CRITERIA`, `TALENT`, `LLM_CALL_CONFIGURATION`
`APEX_INSURANCE`	APEX benchmark - Insurance	`TASKS`, `CRITERIA`, `TALENT`, `IMPORTED_TABLE`
`APEX_DATA_SCIENCE`	APEX benchmark - Data science	`TASKS`, `CRITERIA`, `TALENT`, `LLM_CALL_CONFIGURATION`
`APEX_MECHANICAL_ENGINEERING`	APEX benchmark - Engineering	`TASKS`, `HELPER`, `FAILURE_ANALYSIS`, `TALENT`
`APEX_DIY`	APEX benchmark - DIY/consumer	`TASKS`, `CRITERIA`, `TALENT`
`ATHENA_HLE___RUBRICS`	Athena HLE (Humanity's Last Exam) rubrics	`TASKS`, `MODEL_RESPONSES`, `AWAITING_REVIEW_METRICS`
`ATHENA_HLE__STEM_`	Athena HLE STEM evaluation	`ATHENA_STEM_V_1`, `QA_SPECS`
`BEAR_MEDICINE`	Medical domain tasks	`DISCIPLINES`, `REVIEWER_ASSESSMENT`, `WRITER_DAILY_ACTIVITY`, `BONUS_PAYOUTS`, `PODS`
`AIME_RUBRICS`	AIME (math competition) rubrics	`TEAMS`, `TASKS`, `USERS`
`ARXIV_Q_A` (multiple versions)	Academic paper Q&A generation	`WORK_QUEUE`, `DOUBLE_BLIND`, `LEAD_AUDIT_QA`, `TESTING_ARXIV_LINKS`
`AUTO_REVIEWER`	Automated review system	`SUBMISSIONS`, `LLM_CALL_CONFIGURATIONS`, `PROJECTS`
`09_29_CAND_MODEL_EVAL`	Candidate model evaluation (IB1, IB2, CML)	`IB_1`, `IB_2`, `CML`, `CML_DEPRECATED_`
`API_PREFERENCE`	API preference evaluation	`PROMPTS`, `RESPONSES`, `ROLES`, `DOMAINS`, `PROMPT_TEMPLATES`
`APEX_EXPANSION_WEBSITE_TASKS`	Website-related expansion	`CRITERION`, `FILE`, `TASK`
`APEX_EVALS`	General evaluation framework	`EVALUATION_RESULTS`
`APEX_V1_REVISION`	Apex V1 revision	`EXPERT`, `RUBRIC`, `CRITERION`, `ROLE`
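The naming pattern can be illustrated with a short parser. This is a sketch under a stated assumption: that export directory names take the form `AIRTABLE_<NAME>_APP<14 characters>`, as the two client workspace names quoted earlier (Apple and Amazon) do, with the suffix appearing to be an uppercased Airtable base ID. The `split_workspace` helper and the regular expression are hypothetical, and names that do not carry the suffix simply do not match.

```python
import re
from typing import Optional, Tuple

# Assumed pattern: AIRTABLE_<NAME>_APP<14 uppercase alphanumerics>.
# Inferred from two quoted directory names; other names may not fit it.
WORKSPACE_RE = re.compile(r"^AIRTABLE_(?P<name>.+?)_(?P<base>APP[0-9A-Z]{14})$")


def split_workspace(dirname: str) -> Optional[Tuple[str, str]]:
    """Split a directory name into (workspace name, base ID suffix), or None."""
    match = WORKSPACE_RE.match(dirname)
    if match is None:
        return None
    return match.group("name"), match.group("base")


apple = split_workspace("AIRTABLE_APPLE_ENDPOINT_SANDBOX_APP3PG4U42BALES9K")
amazon = split_workspace(
    "AIRTABLE_AMAZON_LLM_COT_EVALUATION___UPDATED_APP0JM1SJ4XOHMAQC"
)
```

Here `apple` is `("APPLE_ENDPOINT_SANDBOX", "APP3PG4U42BALES9K")`, which shows how the client name falls out of the directory name without opening a single file.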

The ATHENA_HLE workspaces are particularly significant — "HLE" likely refers to Humanity's Last Exam, a high-profile AI benchmark designed to test frontier model capabilities. The MODEL_RESPONSES table in the rubrics workspace suggests Mercor contractors were grading AI model outputs against this benchmark, and the AWAITING_REVIEW_METRICS table indicates an active review pipeline. If this data reached adversarial actors, it could be used to game or contaminate one of the most important AI evaluation benchmarks.

The BEAR_MEDICINE workspace reveals medical domain annotation work with DISCIPLINES, REVIEWER_ASSESSMENT, and WRITER_DAILY_ACTIVITY tables — indicating Mercor contractors were creating or evaluating medical AI training data, adding healthcare data to the breach's sensitivity profile.

Evidence from Named Projects in the Database

Beyond the Airtable exports, the production database tables contain additional project references:

Project Codename	Domain	Evidence Source
Apertus — Elephant	AI model evaluation (OpenAI-linked)	`MLExperimentsJobPerformanceReviews`: `Account: OpenAI`
Project Mega	Large-scale annotation (dedicated Slack workspace: `project-mega.slack.com`)	`ProjectIntegrations`, `ActionsQueue`
Panacea — Consulting RL Envs	Reinforcement learning environments	`Projects_Audit`, 400+ billable hours
Agentic Code Final QC Audit	AI code generation quality control (GitHub issue solving)	`TaskDefinitions`
GPT-4 vs Claude Evaluation	LLM preference ranking (GPT-4 vs Claude 3.5 Sonnet)	Airtable export: `AIRTABLE_AIRTABLE_AI_AGENT_DEMO`
Creative Writing Evals	Creative content evaluation	`Projects_Audit`
arXiv Q&A	Academic paper Q&A generation (multiple Airtable versions incl. Snowflake integration)	Airtable exports (3+ copies with dates)
Queensland (litigation)	Legal domain	`Projects_Audit`
FP&A / Corporate Finance	Finance domain	`Projects_Audit`
Obsidian	Human data client (`billingModel: "invoice"`, tagged `humandataclient`)	`Company`

The Magnificent Seven, Frontier AI Labs, and the Competitive Fallout

Mercor is not a niche startup. According to Big Think and TechCrunch, Mercor has signed deals with six of the seven "Magnificent Seven" tech giants — Apple, Microsoft, Alphabet, Amazon, Meta, and Nvidia — plus frontier model developers OpenAI and Anthropic. The company employs over 30,000 contractors, pays an average rate of $95/hour, and reached a $500 million annual revenue run rate within 17 months of launch. It is valued at $10 billion.

This means the stolen data (the 211GB database, the 939GB of source code, the 3TB of cloud storage, and the 84 Airtable workspaces documented below) contains the operational records, AI training data, and work product for engagements touching nearly every major AI program in the Western world.

The small sample analyzed in this report already confirms direct evidence of work for Apple (Foundation Model outputs), Amazon (LLM Chain-of-Thought evaluation), OpenAI (Feather platform, Apertus project), Anthropic (Claude evaluation), and Meta (multimedia annotation templates). The full 211GB database — which we have not seen — would contain the complete records for all six Magnificent Seven clients plus the frontier labs.

The competitive implications are severe:

The training data itself is the prize. The leaked RLHF annotations, model evaluation data, and preference rankings produced by Mercor's contractors represent billions of dollars in training data investment. This data — now in the hands of Lapsus$ and available to any buyer — could be used by any competitor to accelerate their own model development without incurring the cost of generating it. As Y Combinator president Garry Tan noted: "Incredible amount of SOTA training data now just available to China thanks to @mercor_ai leak. Every major lab. Billions and billions of value."
Apple Foundation Model outputs are in the dump. The AIRTABLE_APPLE_ENDPOINT_SANDBOX workspace contains actual afm-text-083 and afm-model-086 model responses — pre-release Apple Intelligence outputs. These provide direct insight into Apple's model capabilities, safety alignment approach, and weaknesses before public release. Any competitor — whether a Silicon Valley rival or a lab in Beijing, London, or Tel Aviv — now has access to Apple's unreleased model behavior.
Amazon's Chain-of-Thought evaluation methodology is exposed. The AIRTABLE_AMAZON_LLM_COT_EVALUATION workspace reveals how Amazon evaluates LLM reasoning quality, including the full prompts, complete Chain-of-Thought traces, and preference rubrics. The methodology itself is as valuable as the data — it reveals what Amazon considers "good reasoning" and how they measure it.
The Anthropic/Claude evaluation data could inform adversarial attacks. The preference evaluation data comparing Claude 3.5 Sonnet against GPT-4 — including the exact prompts, response pairs, and preference reasoning — could be used to identify weaknesses in Claude's alignment or to train models that specifically exploit those weaknesses.
Mercor's global contractor base spans dozens of jurisdictions. With 30,000+ contractors across many countries, Mercor's database contains work authorization records, physical location data, and IP-based geolocation. The platform's fraud detection system flags contractors whose physical IP doesn't match their declared residence — meaning the database contains a map of which contractors may be working from undisclosed locations.
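The location check described in the last point can be sketched as a simple comparison. This is an illustration of the idea only, not Mercor's fraud logic: the `ContractorRecord` type, its field names, and the sample records are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ContractorRecord:
    user_id: str            # hypothetical identifier
    declared_country: str   # ISO 3166-1 alpha-2 code the contractor declared
    ip_country: str         # country resolved from the observed IP address


def location_mismatch(record: ContractorRecord) -> bool:
    """True when the IP-derived country differs from the declared residence."""
    declared = record.declared_country.strip().upper()
    observed = record.ip_country.strip().upper()
    return declared != observed


def flag_mismatches(records: List[ContractorRecord]) -> List[str]:
    """Return the user IDs whose declared and observed countries disagree."""
    return [r.user_id for r in records if location_mismatch(r)]


# Invented sample records for illustration.
records = [
    ContractorRecord("user_a", "GB", "GB"),
    ContractorRecord("user_b", "CA", "us"),
]
mismatched = flag_mismatches(records)
```

The output of a check like this is exactly what makes the table sensitive: a stored list of people whose observed location contradicts what they declared.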

Beyond the companies confirmed in the data, multiple sources — including former Mercor employees — claim that Mercor also maintains engagements with Chinese AI laboratories, including companies developing frontier models that compete directly with the labs whose training data is now in the breach. If true, this means Mercor was a single point of compromise connecting competing labs on opposite sides of the global AI race, with training data, evaluation methodologies, model outputs, and contractor talent pools for all of them sitting in the same breached infrastructure.

Even setting aside the question of direct Chinese client relationships, the stolen data — RLHF annotations, preference rankings, model evaluation rubrics, and Chain-of-Thought traces produced for OpenAI, Anthropic, Apple, Amazon, Meta, and Google — is now available on the black market. Given that Lapsus$ is actively auctioning the data, this material will reach whoever is willing to pay for it.

The TaskDefinitions table also references autograder configurations using openai/gpt-4.1 and openai/gpt-5 as scoring models, and task rubrics include constraints like "LLMs other than ChatGPT are prohibited" — rules that only make sense when the work product is destined for a specific model vendor's training pipeline.

The scope of client engagements extends far beyond AI companies. The Airtable workspaces alone span legal, insurance, data science, mechanical engineering, medicine, academic research, and mathematics, suggesting that Mercor's contractor workforce touches data and systems across a wide range of industries. Any attacker with access to the full dump could enumerate every active client engagement by cross-referencing the Company, Projects_Audit, ProjectIntegrations, and Listings_New tables against the complete Airtable export directory.

The Airtable Export - 84 Workspaces, 1,055 Files

A separate directory tree from the breach (EXPORTS/) reveals the full structure of the exfiltrated Airtable data. The export contains 84 unique Airtable workspaces totaling 1,055 JSONL files — each file containing the complete contents of one Airtable table. This is not a sample. It is the complete export of every Airtable base connected to Mercor's Fivetran data pipeline.
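Counts like these can be reproduced from a directory listing alone. The sketch below assumes a layout of one top-level directory per workspace with one `.jsonl` file per table beneath it. That layout is an assumption drawn from the description above, and the `inventory` helper is hypothetical.

```python
from collections import Counter
from pathlib import Path
from typing import Dict, Union


def inventory(export_root: Union[str, Path]) -> Dict[str, int]:
    """Count .jsonl files under each top-level workspace directory."""
    root = Path(export_root)
    counts: Counter = Counter()
    for path in root.rglob("*.jsonl"):
        parts = path.relative_to(root).parts
        if len(parts) < 2:
            continue  # file sits directly in the root, not in a workspace
        counts[parts[0]] += 1
    return dict(counts)


def summarize(counts: Dict[str, int]) -> Dict[str, int]:
    """Return the number of workspaces and the total number of table files."""
    return {"workspaces": len(counts), "files": sum(counts.values())}
```

Calling `summarize(inventory("EXPORTS"))` on a tree laid out this way yields the two headline numbers, workspaces and files, without reading any table contents.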

The directory structure reveals how Airtable sits at the center of Mercor's operation. It is used as:

The annotation task management system — Every domain-specific project has its own Airtable base with a standardized schema: TASKS, TASK_VERSIONS, CRITERIA, DOMAIN, SUBDOMAIN, TALENT, QA_SPECS, WORKFLOW, LLM_CALL_CONFIGURATION, CONTROL_PANEL, and FILES. This is a fully industrialized annotation pipeline.
The work product repository — Tables like PHASE_1_TASKS (Amazon), TEXT (Apple), PROMPTS/RESPONSES (API Preference), and MODEL_RESPONSES (Athena HLE) contain the actual task inputs and outputs — the prompts sent to AI models, the model responses, and the human evaluations. This is the training data itself.
The talent and compensation ledger — TALENT tables appear in nearly every workspace, tracking which contractors worked on which tasks. CALCULATED_BONUSES, BONUS_PAYOUTS, TIMELOG, and CLAIMS tables track compensation. WRITER_STATS, REVIEWER_STATS, and WRITER_DAILY_ACTIVITY tables (in BEAR_MEDICINE) track individual productivity.
The QA and audit system — QA_SPECS, LEAD_AUDIT_QA, DOUBLE_BLIND, and REVIEWER_ASSESSMENT tables track quality control processes.

The named workspaces can be organized into categories that reveal the full breadth of Mercor's operations:

Client-Named Workspaces (Direct Client Evidence):

Workspace	Client	Content
`APPLE_ENDPOINT_SANDBOX`	Apple	Apple Foundation Model outputs (`afm-text-083`, `afm-model-086`), translation testing (`DEEP_L`), orchestrator testing (`TEXT_ORCHESTRATOR`), rubric auto-generation
`AMAZON_LLM_COT_EVALUATION` (2 versions)	Amazon	LLM Chain-of-Thought evaluation: `DOMAINS`, `PHASE_1_TASKS`, `PHASE_1_REVIEWS`, `MODEL_A_STRENGTHS`
`AAIE___META_MULTIMEDIA_TEMPLATE_COMMAND_CENTER`	Meta	Meta multimedia annotation template with `OVERALL_META`, `PROJECTS`, `FORMS`, and `TEMPLATE` tables. Workspace name explicitly says "META" and "USE META_X_MULTIMEDIA_SPL_AIRTABLE_TEMPLATE"
`API_PREFERENCE` / `API_PREFERENCE_V2` / `API_PREFERENCE__COPY__FOR_BRENDAN` / `API_PREF___KANIX`	Anthropic/Multi-vendor	LLM API preference evaluation: `PROMPTS`, `RESPONSES`, `ROLES`, `DOMAINS`, `PROMPT_TEMPLATES`, `QA`. Multiple versions and personal copies for named staff

APEX - Mercor's AI Benchmark Suite (Compromised):

The APEX_ prefix identifies Mercor's proprietary suite of AI benchmarks — domain-specific evaluation frameworks used to measure AI model performance across verticals. Each APEX benchmark has its own Airtable workspace with a standardized schema: TASKS, TASK_VERSIONS, CRITERIA, DOMAIN, SUBDOMAIN, QA_SPECS, WORKFLOW, LLM_CALL_CONFIGURATION, and CONTROL_PANEL. The complete APEX suite spans 15+ domains:

Workspace	Benchmark Domain	Notable Tables
`APEX_LEGAL`	Legal reasoning	Standard APEX schema
`APEX_INSURANCE`	Insurance domain	Standard APEX + `IMPORTED_TABLE`
`APEX_FINANCE`	Financial services	Standard APEX + `HELPER`
`APEX_ACCOUNTING`	Accounting	Standard APEX
`APEX_CONSULTING`	Management consulting	Standard APEX + `TEST_HEX_TABLE`
`APEX_DATA_SCIENCE`	Data science	Standard APEX
`APEX_MECHANICAL_ENGINEERING`	Engineering	Standard APEX + `FAILURE_ANALYSIS`, `HELPER`
`APEX_MEDICINE`	Medical/healthcare	Standard APEX
`APEX_FOOD`	Food industry	Standard APEX + `DELIVERIES`
`APEX_GAMING`	Gaming	Standard APEX
`APEX_RETAIL___E_COMMERCE`	Retail & e-commerce	Standard APEX + `DOMAIN_QC`
`APEX_SALES___MARKETING`	Sales & marketing	Standard APEX
`APEX_SHOPPING_STYLISTS`	Personal shopping	Standard APEX
`APEX_DIY` (2 versions)	DIY/consumer	Standard APEX
`APEX_WEBSITE_TASKS` / `APEX_EXPANSION_WEBSITE_TASKS`	Web content	`CRITERION`, `FILE`, `TASK`
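The "Standard APEX + ..." notation in the table amounts to a set difference against the standardized schema. The sketch below illustrates that comparison. The nine table names come from the schema listed above, while the `classify_tables` helper and the example listing for a food workspace are hypothetical.

```python
from typing import Dict, Iterable, List

# The standardized APEX schema as listed in the text above.
STANDARD_APEX_TABLES = frozenset({
    "TASKS", "TASK_VERSIONS", "CRITERIA", "DOMAIN", "SUBDOMAIN",
    "QA_SPECS", "WORKFLOW", "LLM_CALL_CONFIGURATION", "CONTROL_PANEL",
})


def classify_tables(tables: Iterable[str]) -> Dict[str, List[str]]:
    """Report tables beyond the standard schema and standard tables missing."""
    present = set(tables)
    return {
        "extra": sorted(present - STANDARD_APEX_TABLES),
        "missing": sorted(STANDARD_APEX_TABLES - present),
    }


# Hypothetical listing: the standard schema plus one domain-specific table.
food_tables = list(STANDARD_APEX_TABLES) + ["DELIVERIES"]
food_report = classify_tables(food_tables)
```

For this listing, `food_report` is `{"extra": ["DELIVERIES"], "missing": []}`, which corresponds to the "Standard APEX + `DELIVERIES`" entry in the table.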

The exposure of the complete APEX benchmark suite, including all tasks, criteria, scoring rubrics, and LLM_CALL_CONFIGURATION, renders these benchmarks untrustworthy. Any AI model trained on the leaked APEX data can score well on these benchmarks through memorization, without genuinely possessing the evaluated capabilities. This is benchmark contamination at scale. Unless Mercor rebuilds the entire APEX suite from scratch with new tasks, new criteria, and new evaluation data, every APEX benchmark result produced after this breach is suspect. The EVALS workspace, which contains APEX_RESULTS, BOREALIS_RESULTS, and LUCIUS_RESULTS, further confirms that APEX was actively used to evaluate and compare models, making the contamination risk concrete and immediate.
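One common way to look for contamination is to measure word n-gram overlap between a benchmark item and a candidate training document. The sketch below shows that general technique in its simplest form. It is not a method attributed to Mercor or any lab, and the function names, whitespace tokenization, and default window of 8 words are illustrative choices.

```python
from typing import Set, Tuple


def ngrams(text: str, n: int = 8) -> Set[Tuple[str, ...]]:
    """Return the set of lowercase word n-grams in the text."""
    tokens = text.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def overlap_ratio(benchmark_item: str, corpus_doc: str, n: int = 8) -> float:
    """Fraction of the benchmark item's n-grams that also occur in the document."""
    item_grams = ngrams(benchmark_item, n)
    if not item_grams:
        return 0.0  # item is shorter than n words, so nothing to compare
    return len(item_grams & ngrams(corpus_doc, n)) / len(item_grams)
```

A ratio near 1.0 means the item appears almost verbatim in the document. Paraphrased or translated leaks would evade a check this simple, which is part of why rebuilding the tasks is the only reliable remedy.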

Other AI Benchmark and Evaluation Workspaces:

Workspace	Purpose	Notable Tables
`ATHENA_HLE___RUBRICS`	Humanity's Last Exam rubric grading	`MODEL_RESPONSES`, `AWAITING_REVIEW_METRICS`, `CLAIMS`
`ATHENA_HLE__STEM_` (4 versions incl. July 3, 2025 dated copy)	HLE STEM vertical evaluation	`ATHENA_STEM_V_1`
`APEX_HLE_BASED_RUBRICS`	HLE-derived rubric system	`CRITERIA`, `LLM_CALL_CONFIGURATION`
`APHRODITE__SEARCH_HLE`	Search-based HLE evaluation	HLE search variant
`ACADEMIC_REASONING_SFT`	Supervised fine-tuning for academic reasoning	`COT` (Chain-of-Thought), `ROLES`, `TALENTS`
`AIME_RUBRICS`	AIME math competition rubrics	`TEAMS`, `USERS`, `TASKS`
`EVALS` / `EVALS__COPY_`	General evaluation framework	`APEX_RESULTS`, `BOREALIS_RESULTS`, `LUCIUS_RESULTS`, `_09_04_HLE_RUBRICS`
`09_29_CAND_MODEL_EVAL` (5 versions)	Candidate model evaluation (IB1, IB2, CML)	Iterative model comparison datasets

Medical Domain Workspaces:

Workspace	Purpose	Notable Tables
`BEAR_MEDICINE`	Medical annotation	`DISCIPLINES`, `REVIEWER_ASSESSMENT`, `ASSESSMENT`, `WRITER_DAILY_ACTIVITY`, `REVIEWER_STATS`, `WRITER_STATS`, `ALL_TIME_TOP_5`, `BONUS_PAYOUTS`, `CLAIM_LOCK`, `AHT_STATS`, `ASSESSMENT_ANALYSIS`, `PODS`
`BEAR_RADIOLOGISTS`	Radiology-specific annotation	Radiologist-specific tasks
`BANKERS`	Financial/banking domain	Banking-specific tasks

Aircall Integration (complete phone system export):

The export also includes a full Aircall directory — Mercor's VoIP phone system — containing 27 tables: CALL, CALL_TRANSCRIPTION, CALL_TRANSCRIPTION_CONTENT_UTTERANCE, CALL_SENTIMENT, CALL_SENTIMENT_PARTICIPANT, CALL_SUMMARY, CALL_ACTION_ITEM, CALL_TAG, CALL_TOPIC, CONTACT, CONTACT_EMAIL, CONTACT_NUMBER, USERS, USER_AVAILABILITY, and more. This represents the complete call history including full transcriptions, sentiment analysis, AI-generated summaries, and contact information for every recruiter phone call.

What the Airtable Export Means:

The Airtable export transforms this breach from a database leak into a complete AI training data theft. The database tables documented in the rest of this article provide the metadata — who worked on what, when, and how much they were paid. The Airtable export contains the actual work product: every prompt, every model response, every human evaluation, every rubric score, every Chain-of-Thought trace, and every preference judgment that Mercor's contractors produced for Apple, Amazon, OpenAI, Anthropic, Meta, and dozens of other clients.

The iterative versioning visible in the workspace names (e.g., APEX_RUBRICS with 12+ dated copies from August 7, 2025 through January 23, 2026) reveals that this export captured the complete historical evolution of Mercor's benchmark and evaluation pipeline — not just a snapshot, but the full development history of rubrics, task definitions, and evaluation criteria across months of refinement. For the APEX benchmarks specifically, this means every iteration of every benchmark task is now public — an attacker can study how the benchmarks evolved and craft model training data that targets the final versions.

Customer and Third-Party Platform URLs Found in the Dump

Beyond project codenames, the dump contains direct URLs to customer platforms, internal tools, and third-party services — embedded in configuration fields, JSON blobs, onboarding documents, and metadata columns across dozens of tables. An exhaustive search of the file reveals 1,800+ unique URLs. The most sensitive are catalogued below.
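A URL inventory of this kind can be built with a regular expression pass over the raw text. The sketch below is a minimal version, not the exact procedure used for this report. The pattern only catches URLs that carry an `http://` or `https://` scheme, so bare hostnames stored without a scheme would need a separate pass, and the helper names are hypothetical.

```python
import re
from collections import Counter
from typing import Iterable, Set
from urllib.parse import urlsplit

# Matches scheme-prefixed URLs up to whitespace, quotes, angle brackets,
# or a backslash (common delimiters inside JSON blobs and SQL dumps).
URL_RE = re.compile(r"https?://[^\s\"'<>\\]+")


def unique_urls(text: str) -> Set[str]:
    """Extract distinct URLs, trimming trailing sentence punctuation."""
    return {match.rstrip(".,;)") for match in URL_RE.findall(text)}


def hosts_by_count(urls: Iterable[str]) -> Counter:
    """Count how many distinct URLs point at each hostname."""
    return Counter(urlsplit(url).hostname for url in urls)


# Invented sample text for illustration.
sample = (
    'see https://work.example.com/a, and "https://work.example.com/b" '
    "plus http://api.example.com/x."
)
found = unique_urls(sample)
by_host = hosts_by_count(found)
```

Grouping by hostname is what turns a flat list of URLs into a map of systems: each distinct host is a separate platform, vendor, or internal service.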

Client Annotation and Work Platforms

These are URLs to the actual platforms where Mercor contractors perform work for clients. Each one identifies a specific client engagement and, in many cases, a specific campaign or task within that client's systems:

URL / Domain	What It Reveals	Source Table
`feather.openai.com/campaigns/998855ab-60e7-4aed-9f08-5fccd56fe53e`	OpenAI's internal Feather annotation platform — a specific campaign UUID, confirming Mercor contractors work directly inside OpenAI's tooling	`Projects_Audit` (annotationPlatform)
`alabaster-studio.com/project/abacus/conversation/7c9facb4-...`	A client project management / collaboration platform — captured as the live browser URL during a monitored work session	`InsightfulScreenshots` (browserUrl)
`glowstone-mli-rubrics.slack.com` (channels: `C0994P7BH2N`, `D09969QHV62`)	A client-specific Slack workspace for MLI rubric development — likely a client or partner organization's dedicated workspace	`ProjectIntegrations`, `ActionsQueue`
`project-mega.slack.com`	A dedicated Slack workspace for a single large-scale annotation project	`ProjectIntegrations`
6 distinct Airtable base IDs (`appX7l7xADlyFD3nL`, `appEzeshKTIKSrvBV`, `app9DBchZKUj2auMZ`, `appCZwMqiIUkP7KIQ`, `appLmn3266lQsaUXK`, `appYFQOZicXUoO2yz`)	Airtable used as an annotation and project management platform; each `app` ID identifies a distinct base, likely per-client or per-project	`Projects_Audit`, `OnboardingDocument`
`ta-01km6j8ztpd4vttvzb7ctgqteh-8080-ms3c95f46vnxcii7cwsi84ago.w.modal.host`	A Modal.com serverless deployment — indicating Mercor or a client runs ML model inference on Modal	`AgentSandboxes` or service configuration

Mercor Internal Infrastructure URLs

These URLs expose Mercor's own internal architecture, allowing an attacker to map the entire operational surface:

URL / Domain	What It Reveals	Source Table
`work.mercor.com`	Primary contractor work portal (100+ URLs with job IDs like `/create/job_AAABm...`)	`Comms`, `ActionsQueue`
`team.mercor.com`	Company-facing team portal	`Comms`, `EmailTemplates`
`talent.docs.mercor.com/how-to/okta-access`	Internal documentation portal — includes onboarding guides for Okta and Insightful setup	`ActionsQueue`
`api.mercor.com`	API gateway endpoint	Configuration fields
`dev.coil.mercor.com`	Development webhook endpoint for the `coil` microservice	`ProjectIntegrations`
`coil.mercor.com`	Production `coil` service endpoint	`ProjectIntegrations`
`c-mercor.okta.com`	Okta SSO instance — the identity provider for all contractor and staff authentication	`ActionsQueue`, `UserMetadata`
`linear.app/mercor`	Mercor's Linear issue tracker — exposes internal engineering project management	Configuration metadata
`pic-gen.r2.mercor.com`	Cloudflare R2 image generation service	Asset URLs
`ddcd-2601-642-4c01-5a8d-...ngrok-free.app`	An ngrok development tunnel — a temporary public URL exposing a local dev server, including the developer's IPv6 address embedded in the subdomain	Webhook configurations
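To show how an address can ride along in a tunnel hostname, the sketch below decodes a subdomain of the shape seen in the last row. It assumes the first hyphen-separated group is a short prefix and the remaining eight groups are the IPv6 hextets. That layout is inferred from the row above rather than from ngrok documentation, and the example host uses the IPv6 documentation range, not a real address.

```python
import ipaddress
from typing import Optional


def ipv6_from_tunnel_host(hostname: str) -> Optional[ipaddress.IPv6Address]:
    """Decode an IPv6 address from a '<prefix>-<8 hextets>' style subdomain."""
    label = hostname.split(".", 1)[0]
    groups = label.split("-")
    if len(groups) != 9:  # assumed: 1 prefix group + 8 address groups
        return None
    try:
        return ipaddress.IPv6Address(":".join(groups[1:]))
    except ipaddress.AddressValueError:
        return None


# Hypothetical host built from the 2001:db8::/32 documentation range.
decoded = ipv6_from_tunnel_host("abcd-2001-db8-0-0-0-0-0-1.ngrok-free.app")

# The truncated value from the table cannot be decoded and is left as-is.
truncated = ipv6_from_tunnel_host("ddcd-2601-642-4c01-5a8d-...ngrok-free.app")
```

Here `decoded` equals `ipaddress.IPv6Address("2001:db8::1")` and `truncated` is `None`. The point is that no lookup or guesswork is needed: if the address is in the hostname, it is recoverable by string manipulation alone.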

AWS S3 Buckets

Each S3 bucket below contains files that are directly addressable via URL if the bucket permissions are misconfigured. The bucket names alone reveal the categories of stored data:

S3 Bucket	Contents
`mercor-insightful-screenshots-production`	Every screenshot captured from contractor desktops during monitored work
`mercor-background-check-photos`	Background check identity documents and photographs
`ai-interviewer-recordings`	Audio/video recordings of AI-conducted interviews
`dailyco-recordings`	Daily.co video call recordings
`production-pdx-5557735*****-web-recordings`	Production call recordings (AWS account ID `5557735*****` is embedded in the bucket name)
`kite-uhn-brain-injury.s3.ca-central-1.amazonaws.com`	Medical documents — bucket name references brain injury records at UHN (University Health Network), a major Canadian hospital system
`certn-api-s3-certn-images-ca-central-1-production`	Certn identity verification images
`certn-api-s3-certn-rcmp-documents-ca-central-1-production`	RCMP (Royal Canadian Mounted Police) criminal record check documents
`certn-api-s3-one-id-images-ca-central-1-production`	OneID government identity verification images

The S3 bucket kite-uhn-brain-injury is particularly alarming — it suggests that either Mercor or a client project involved handling protected medical records, and the bucket name alone leaks the nature of the data and the institution involved.

Google Workspace Documents

The dump contains direct URLs to 30+ Google Docs, 2+ Google Sheets, 2+ Google Forms, and 10+ shared Google Drive folders used for project onboarding, task instructions, rubric definitions, and team coordination:

docs.google.com/document/d/1111XpiZ9eZvH8X_... — Onboarding materials
docs.google.com/document/d/1770ZnTy0_Yt-U-U7W... — Project documentation
docs.google.com/spreadsheets/d/10LWCzAD1e-J8W7v... — Tracking spreadsheets
docs.google.com/forms/d/e/1FAIpQLSdLnOJ9DZoq... — Assessment/intake forms
drive.google.com/drive/folders/14eFptQgb2FjWoFh... — 10+ shared project folders

Many of these Google Docs likely remain live and accessible if the sharing permissions are set to "anyone with the link" — a common practice for contractor onboarding materials.

Communication and Collaboration Evidence

Platform	Evidence	Count
Slack	4 distinct workspaces: `mercor.enterprise.slack.com`, `project-mega.slack.com`, `glowstone-mli-rubrics.slack.com`, `6385b64336a9545.slack.com`	4 workspaces, 5+ named channels
Google Meet	Meeting room codes: `deo-ixih-ivt`, `cae-eois-jwn`, `hhr-erjm-svp`, `pmi-ogrs-aap`, `szd-qvcr-hfp`, `zoz-shgt-epy`	6+ meeting rooms
LinkedIn	Contractor profile URLs with full names	Multiple profiles
Aircall	Call recordings via `media-web.aircall.io` and `assets.aircall.io`	Recruiter phone call audio
Ashby HQ	Job postings at `jobs.ashbyhq.com` and `app.ashbyhq.com`	Hiring platform
Certn	Background check portals: `mercor.certn.co/hr/applications/{uuid}/`, enrollment at `certn.trustmatic.ws/web-enrolment/`	Identity verification flows

What This URL Inventory Means

An attacker with this data does not need to guess who Mercor's clients are or which systems contractors access. The URLs are already in the database. Specifically:

OpenAI's Feather platform URL with a campaign UUID gives an attacker a direct entry point to probe OpenAI's annotation infrastructure
S3 bucket names allow targeted enumeration attacks — checking whether buckets are publicly accessible or brute-forcing object keys based on the naming patterns visible in the dump
Google Docs and Drive folders may still be live and accessible if shared via link — giving an attacker access to project rubrics, onboarding materials, and task instructions
Slack workspace identifiers enable social engineering against teams working on specific projects
The ngrok tunnel URL embeds a developer's IPv6 address, adding another vector for targeting Mercor engineering staff
The AWS account ID (5557735*****) embedded in the S3 bucket name enables targeted cloud reconnaissance

The Screenshot Problem

The most dangerous element of this breach is the Insightful time-tracking screenshot system — and the danger compounds with every client Mercor serves, every platform URL catalogued above, and every S3 bucket of screenshots that can be systematically correlated.

Mercor requires contractors to install the Insightful (formerly Workpuls) monitoring agent on their computers. This agent captures a screenshot of the contractor's desktop every few minutes while they are clocked in. Each screenshot is uploaded to mercor-insightful-screenshots-production.s3.amazonaws.com and indexed in the InsightfulScreenshots table with rich metadata:

The full screenshot image (stored at a direct, addressable S3 URL — e.g., https://mercor-insightful-screenshots-production.s3.amazonaws.com/screenshots/[employeeId]/[timestamp]_[uuid].png)
The application open at the time (appName, appFileName, appFilePath)
The window title (which often contains document names, code file paths, or chat conversations)
The browser URL being visited (which can include feather.openai.com, client Airtable workspaces, or any of the platform URLs catalogued above)
The contractor's IP address, MAC address (via gateways), and hardware fingerprint (hwid)
The contractor's timezone, OS version, and Insightful agent version

A sample screenshot record from the dump shows a contractor working in Google Chrome on alabaster-studio.com/project/abacus/conversation/... — with their IP (71.194.*.*), MAC address (1C:93:7C:64:**:**), hardware ID, and full filesystem path to Chrome all recorded.

Here is why this is catastrophic in context:

The database contains all the ingredients for a systematic visual intelligence operation. An attacker can join tables to correlate screenshots with client projects and platform URLs:

Which client project a contractor was assigned to (from ProjectIAM and Jobs)
Which annotation platform that project uses (from Projects_Audit.annotationPlatform — e.g., feather.openai.com, specific Airtable workspace IDs)
Every screenshot taken while the contractor worked on that project (from InsightfulScreenshots filtered by contractorId and projectId)
The exact URLs, window titles, and application contents visible in those screenshots — cross-referenced against the known client platform URLs to confirm which client's systems are shown

This means an attacker doesn't just get a list of Mercor's clients — they get a visual archive of what contractors saw inside those clients' systems. If the project was for OpenAI, the screenshots show OpenAI's Feather annotation interface, the prompts being graded, and the evaluation criteria. If the project was for Meta, the screenshots show Meta's internal tooling. If the project involved reinforcement learning environments, the screenshots show the RL training data and reward models.

The scope of what these screenshots can reveal includes:

Proprietary client code and architecture visible in IDE windows, terminal sessions, and browser tabs
Annotation platform interfaces showing the exact tasks, rubrics, and datasets used to train frontier AI models
Internal Slack channels and email threads visible in background windows — the ProjectIntegrations table confirms contractors are added to client Slack workspaces (project-mega.slack.com, mercor.enterprise.slack.com)
Authentication tokens, API keys, and session cookies potentially visible in browser URL bars, developer tools, or terminal output
Unreleased product features, research results, and trade secrets visible in dashboards or documents
Other contractors' work and personal information if collaborative tools were open on screen

Perhaps most critically, the screenshots create an involuntary record of contractor misconduct. The Wall Street Journal has reported on growing concerns around AI training data supply chains, and contractors in these roles often have privileged access to sensitive client systems. If any contractor was engaged in unauthorized data exfiltration (copying proprietary datasets, screenshotting confidential research, leaking model weights, or otherwise violating their employment agreements), that activity was captured frame by frame by the monitoring system and is now available to anyone with the dump.

The monitoring system that was designed to protect Mercor's clients has become a comprehensive, timestamped, visually indexed archive of everything those clients wanted to keep secret.

This creates a cascading breach. Mercor's data exposure is not just a breach of Mercor; it is a proxy breach of every client organization whose internal systems, annotation platforms, Slack workspaces, and proprietary tooling were visible on a contractor's screen during monitored work sessions. In the worst case, the number of indirectly breached organizations equals the number of clients Mercor has ever served on screenshot-enabled projects.

Platform Overview

Mercor presents itself publicly as an AI-powered hiring platform. The database tells a more complete story: it is a full-stack labor marketplace and employment management system that spans acquisition, vetting, matching, contracting, surveillance, and payment.

The platform operates across at least three distinct product surfaces:

Talent Portal — Where contractors create profiles, complete interviews, apply to listings, and track their work
Company Portal — Where client companies post listings, review candidates, manage projects, and receive invoices
Godmode / Internal Admin — An internal dashboard (GodmodeCompanies, GodmodeArbitraryCells) used by Mercor staff for operations

The backend is a microservices architecture with at least 13 named services: coil, site_fe, team_fe, work_fe, mercor_go, mercor_api, mercor_api_nginx, celery, workflow, db_trigger_consumer, steve, woz, and payments_temporal_worker. These are deployed on AWS ECS and managed via Terraform/Terragrunt in the mercor-monorepo GitHub repository.

The primary database is Aurora MySQL (AWS), with the analytics warehouse being Snowflake (evidenced by dbt model tables like DbtFirmSchoolRank and DbtSchoolRankings). Schema migrations are managed by Liquibase (evidenced by DATABASECHANGELOG and DATABASECHANGELOGLOCK tables).

Evidence - The Database Layer by Layer

The following sections present a systematic walk through every domain of the exposed database, with obfuscated sample records drawn directly from the dump. This is the evidence base for the claims made above.

Part I - User and Identity Layer

The Contractor Profile

At the core of Mercor's data model is the contractor. The MercorUsers_New table stores the primary user record, while MercorUsers_New_backup appears to be a historical snapshot. A sample (obfuscated):

Field	Value
`userId`	`7d10d057-0c11-438a-ace1-9a9c8a50c925`
`email`	`e****a1@gmail.com`
`name`	`T** O****`
`phone`	`+44795718****`
`location`	`United Kingdom, Harrow`
`createdAt`	`2025-08-30 09:49:20`
`lastLogin`	`2025-09-20 09:16:33`
`insightfulId`	`wesvspdyd5m3zg2`
`stripeAccountId`	`NULL`
`isDeleted`	`0`

The insightfulId field is particularly significant — it links this user to their Insightful (formerly Workpuls) monitoring agent, meaning every screenshot taken of this person while working is tied to this identifier.

The MercorUsers_New table extends the backup with additional fields: phoneVerificationStatus, phoneVerifiedAt, phoneOptIn — indicating ongoing additions to the user data model. The authType field suggests support for multiple authentication providers (Firebase, Google OAuth, email/password).

Location and Residence Data

UserLocation stores both declared residence and physical presence:

Field	Value
`residenceCountry`	`USA`
`physicalCountry`	`USA`
`residenceState`	`NULL`
`physicalState`	`NULL`

The distinction between residence and physical country is central to Mercor's fraud detection logic — a mismatch between declared location and actual IP-derived location is one of the primary fraud signals.

UserMetadata enriches the contractor record with:

workAuthorizationStatus — eligibility to work in specific countries
birthday — date of birth
physicalLocation — freeform address field
contractorMail — a Mercor-provisioned email address (e.g., @mercor.com)
oktaUserId / oktaAccountState — SSO integration
maxContracts — cap on concurrent engagements
fraudStatusEnum — a denormalized fraud verdict

UserAvailability_Audit captures declared working hours: maxWeeklyHours, desiredWeeklyHours, expectedStartOffset, and timezone — allowing Mercor to understand contractor bandwidth and scheduling preferences.

CandidateVouches is a comprehensive social trust mechanism. When a voucher endorses a candidate, they fill out a structured questionnaire:

How did you know this person? (social platforms, working together, studying together, other)
Why are they qualified? (skills, education, employer, expertise, other)

Each question has a paired free-text field whose name ends in Detail. This creates a rich graph of professional and social relationships.

UserReferences stores professional references with names, companies, relationships, and contact emails — conventional hiring data now sitting in an exposed database.

UserState tracks lifecycle metrics: resumeUploaded, interviewsCompletedCount, jobApplicationsCount, totalMillisWorked.

Part II - Identity Verification and Fraud Detection

The KYC Layer

Mercor uses Persona as its identity verification provider. The IDVerificationChecks table records each check with:

provider: persona
source: e.g., interview-face-comparison
sessionId: the Persona interview session token
verificationStatus, governmentIdStatus, livenessStatus, addressStatus
fraudDecision: NULL / escalated / approved
providerResponse: full JSON blob from Persona's API

A sample Persona response shows:

{
  "type": "baseline",
  "interview_id": "intr_AAABnNOWs0wnj7Tmg0hBQpL5",
  "thumbnail_key": "intr_AAABnNOWs0wnj7Tmg0hBQpL5_thumbnail.jpg",
  "persona_account_id": "act_QMTuQh33A4QU23J8ECPSd32BBKb4"
}

The thumbnail key references a stored facial image from the verification session.

BackgroundCheck and BackgroundCheck_New record criminal background and adverse media checks (via Checkr or Certn):

Field	Example
`externalCandidateId`	Checkr candidate UUID
`workLocation`	`USA`
`package`	`tasker_pro`
`status`	`clear` / `consider`
`adverseMediaCheckStatus`	`clear`

ScreeningPackage defines what checks are bundled per company engagement, including checkConfig (JSON with individual check types) and graceDays (how many days a contractor has to complete checks before being blocked).

The Fraud Pipeline

Mercor operates a multi-stage fraud pipeline that is one of the most sophisticated components in the database. It runs at four stages: profile, interview, post-interview, and on-project.

FraudStates — The current fraud verdict per user, maintained as a state machine:

Field	Example Value
`userId`	`000087ef-2296-445c-b355-9d5e600e0af2`
`currentStage`	`profile`
`currentDecision`	`ESCALATE`
`currentConfidence`	`medium`
`currentReasoning`	"The primary concern is a maximum location mismatch score of 1.0, indicating the user's IP address is entirely inconsistent with their stated profile location..."
`currentKeySignals`	`["location_mismatch: 1.0", "email_diff: 0.125", "email_is_pwned: False"]`

The reasoning field contains LLM-generated natural language explanations — almost certainly from Vertex AI / Gemini based on the signal schema.
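
The currentKeySignals value in the sample row above is a JSON list of "name: value" strings. As a minimal sketch of how such a value could be turned into typed data for analysis, assuming every row follows the format of that single sample (the function name parse_key_signals is hypothetical and not taken from the dump):

```python
import json

def parse_key_signals(raw: str) -> dict:
    """Parse a JSON list of 'name: value' strings into a dict of typed values."""
    signals = {}
    for item in json.loads(raw):
        name, _, value = item.partition(":")
        name, value = name.strip(), value.strip()
        if value in ("True", "False"):
            signals[name] = (value == "True")   # boolean signals
        else:
            try:
                signals[name] = float(value)    # numeric scores
            except ValueError:
                signals[name] = value           # anything else stays a string
    return signals

# The sample value from the FraudStates row shown above.
sample = '["location_mismatch: 1.0", "email_diff: 0.125", "email_is_pwned: False"]'
parsed = parse_key_signals(sample)
```

Parsing into typed values makes it straightforward to see which signals are scores and which are flags, a distinction the string encoding hides.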

FraudCheck — The central fraud queue:

stage, interviewId, jobId — context of the check
process_status, retryCount — pipeline execution state
flag_reasons, automatedReasons, manual_review_rational, manual_review_signs
assigned_to, assigned_on — human reviewer assignment
splReview — special review flag

FraudSignalAuditLog — Every individual signal evaluated:

signalType — e.g., location_mismatch, email_is_pwned, vpn_detected
modelName — which ML model produced the score
modelScore — numeric confidence
status — accepted / rejected

FraudEvents — Bayesian belief updates per event:

priorAlpha, priorBeta, priorProbability, priorStatus
posteriorAlpha, posteriorBeta, posteriorProbability, posteriorStatus
evidence — JSON describing what caused the update

This is a textbook Beta-Binomial Bayesian fraud model — prior beliefs updated with evidence to produce posterior fraud probability estimates.
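
To make the prior/posterior columns concrete, here is a minimal sketch of the standard Beta conjugate update. It assumes priorAlpha/priorBeta and posteriorAlpha/posteriorBeta are the parameters of a Beta distribution and that the probability columns store its mean. The class, the function, and the example numbers are illustrative and not taken from the dump.

```python
from dataclasses import dataclass

@dataclass
class Belief:
    alpha: float  # pseudo-count of fraud-indicating evidence
    beta: float   # pseudo-count of benign evidence

    @property
    def probability(self) -> float:
        # Mean of a Beta(alpha, beta) distribution.
        return self.alpha / (self.alpha + self.beta)

def update(prior: Belief, fraud_hits: int, clean_hits: int) -> Belief:
    # Conjugate update: fraud evidence adds to alpha, benign evidence adds to beta.
    return Belief(prior.alpha + fraud_hits, prior.beta + clean_hits)

prior = Belief(alpha=1.0, beta=9.0)                     # prior mean 1/10
posterior = update(prior, fraud_hits=2, clean_hits=0)   # posterior mean 3/12
```

Under these assumptions, two fraud-indicating events move the estimate from 0.10 to 0.25, which is the kind of step a FraudEvents row would record.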

ProductionFraudState — Final fraud disposition:

fraudModality — type of fraud (identity, time, quality)
source — automated / manual
productionModelId — versioned model that made the call

OnProjectFraudWindows — Time-based on-project fraud analysis:

fraudType, flags, flagMetadata, windowMetadata, screenshotMetadata
Analyzes patterns within work sessions using screenshot data

CheatingDetection / CheatingDetection_Audit — Interview cheating detection:

isCheating, cheatingProbability, signs
Tracks whether candidates used external resources during AI interviews

QAReviewLog — Manual fraud review outcomes:

stage, signalType, decision, comments
Assigned to specific reviewerId for human-in-the-loop adjudication

AutoFraudChecks — Automated rule-based checks triggered on a schedule or event.

DuplicateGroups — Groups of user IDs believed to be the same person (userIdList), with merge tracking (mergedIntoGroupId).
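
A merge pointer such as mergedIntoGroupId implies that finding the surviving group for a user means following a chain of merges. A minimal sketch of that lookup, assuming the column holds the ID of the group a record was merged into and is NULL for a surviving group (the in-memory mapping and the function name are hypothetical):

```python
def resolve_group(group_id, merged_into):
    """Follow merge pointers until reaching a group that was never merged."""
    seen = set()
    while merged_into.get(group_id) is not None:
        if group_id in seen:
            # Defensive check: corrupted data could contain a merge loop.
            raise ValueError(f"merge cycle detected at group {group_id}")
        seen.add(group_id)
        group_id = merged_into[group_id]
    return group_id

# Hypothetical data: g1 was merged into g2, which was later merged into g3.
merged_into = {"g1": "g2", "g2": "g3", "g3": None}
canonical = resolve_group("g1", merged_into)
```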

Part III - The Hiring Pipeline

Listings

Listings_New is the job posting table. A Mercor listing is considerably more structured than a typical job board entry:

Field	Description
`title`	Job title
`description`	Full job description
`rateMin` / `rateMax`	Pay rate range
`hoursPerWeek`	Expected commitment
`payRateFrequency`	`hourly` / `monthly`
`workArrangement`	Remote / hybrid
`eligibleLocation`	Which countries can apply
`ineligibleResidenceLocation`	Explicitly excluded countries
`listingType`	Job category
`evaluationCriteria`	JSON rubric for ranking candidates
`automatedCommsOn`	Boolean — auto-send rejection emails
`automaticRejectionsOn`	Boolean — auto-reject below threshold
`timeToAutoReject`	Days until auto-rejection fires
`goalNumHires`	Target headcount
`referralBoost`	Bonus multiplier for referred candidates
`isExploreAlways`	Always appear on public explore page
`disableApplications`	Freeze new applications

EvaluationCriteria stores the per-listing scoring rubric used during candidate ranking — each criterion has shortCriteria, type (hard filter or soft score), hardFilter boolean, and position for display ordering.
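
The split between hard filters and soft scores can be illustrated with a short sketch. It assumes hard-filter criteria are pass/fail gates and soft criteria contribute additively to a ranking score. The per-candidate results structure, the criterion names, and the unweighted sum are all hypothetical, since the dump does not show how scores are combined.

```python
def rank_candidates(candidates, criteria):
    """Drop candidates failing any hard filter, then sort by summed soft scores."""
    hard = [c["shortCriteria"] for c in criteria if c["hardFilter"]]
    soft = [c["shortCriteria"] for c in criteria if not c["hardFilter"]]
    eligible = [
        cand for cand in candidates
        if all(cand["results"].get(name) for name in hard)
    ]
    return sorted(
        eligible,
        key=lambda cand: sum(cand["results"].get(name, 0.0) for name in soft),
        reverse=True,
    )

# Hypothetical criteria and candidate results.
criteria = [
    {"shortCriteria": "work_auth", "hardFilter": True},
    {"shortCriteria": "python", "hardFilter": False},
    {"shortCriteria": "writing", "hardFilter": False},
]
candidates = [
    {"id": "c1", "results": {"work_auth": True, "python": 0.9, "writing": 0.4}},
    {"id": "c2", "results": {"work_auth": False, "python": 1.0, "writing": 1.0}},
    {"id": "c3", "results": {"work_auth": True, "python": 0.7, "writing": 0.9}},
]
ranked = [c["id"] for c in rank_candidates(candidates, criteria)]
```

In this example c2 has the highest soft scores but is excluded outright by the hard filter, which is the behavioral difference between the two criterion types.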

ListingNotes stores internal recruiter notes per listing — including candid operational commentary. A sample (obfuscated):

"33 leads confirmed on sheet by B***** to send offers — @N*** to staff RM for conversion"

This reveals that Mercor staff are managing candidate pipelines directly, with named individuals responsible for conversions.

Candidates

Candidates / Candidates_Audit tracks every application:

Field	Description
`status`	`applied` / `shortlisted` / `offered` / `rejected`
`listingStepConfigId`	Which step in the hiring funnel
`notesForCandidate`	Recruiter notes visible to candidate
`birthday`	Date of birth at application time
`physicalLocation`	Where they were when applying
`workAuthorizationStatus`	Work eligibility
`rejectionReason`	Categorized rejection reason
`starred`	Recruiter-starred flag
`automaticRejectAt`	Scheduled auto-rejection timestamp
`numCommsSent` / `lastCommSentAt`	Outreach tracking
`referralId`	Linked referral if any

CandidateMatchScores provides ML-generated match scores:

matchScore — numeric compatibility score
contextualSummary — LLM-generated natural language explanation of why this candidate fits this listing

MercorScores stores the tournament-based ranking scores:

mScoreRaw / mScoreNormalized — the MercorScore
numComparisons — how many pairwise comparisons informed the score
contextualSummary — LLM narrative on the candidate's standing
aggregateFeatureScore — combined feature vector score

PairwiseComparisons stores individual A/B comparisons:

winnerResumeId / loserResumeId
winnerUserId / loserUserId
reasoning — LLM explanation of why candidate A beat candidate B

This structure is consistent with a Bradley-Terry tournament ranking model: candidates are repeatedly compared in pairs, and each comparison updates their relative ranking scores.
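
For readers unfamiliar with Bradley-Terry, the sketch below fits strengths from a list of (winner, loser) pairs using the classic minorization-maximization update. It illustrates the general technique only. Whether Mercor fits scores this way, and how mScoreRaw maps to a strength, is not shown in the dump, and the candidate labels are made up.

```python
from collections import defaultdict

def bradley_terry(comparisons, iterations=200, floor=1e-9):
    """Estimate Bradley-Terry strengths from (winner, loser) pairs.

    Uses the minorization-maximization update:
    s_i <- wins_i / sum_j( n_ij / (s_i + s_j) ).
    """
    wins = defaultdict(int)
    games = defaultdict(int)  # games[{i, j}] = number of comparisons between i and j
    players = set()
    for winner, loser in comparisons:
        wins[winner] += 1
        games[frozenset((winner, loser))] += 1
        players.update((winner, loser))

    strength = {p: 1.0 for p in players}
    for _ in range(iterations):
        updated = {}
        for p in players:
            denom = sum(
                games[frozenset((p, q))] / (strength[p] + strength[q])
                for q in players
                if q != p and frozenset((p, q)) in games
            )
            # The floor keeps strengths positive for candidates with no wins.
            updated[p] = max(wins[p] / denom, floor) if denom else strength[p]
        total = sum(updated.values())
        # Rescale so strengths average to 1 (the model is scale-invariant).
        strength = {p: s * len(players) / total for p, s in updated.items()}
    return strength

# Hypothetical (winner, loser) pairs, as might be read from PairwiseComparisons.
pairs = [("a", "b"), ("a", "b"), ("b", "a"), ("b", "c"), ("c", "b"), ("a", "c")]
scores = bradley_terry(pairs)
```

Under the model, the probability that candidate i beats candidate j is s_i / (s_i + s_j), so a single fitted strength per candidate is enough to rank the whole pool.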

TalentViewSearchUsers and SharableTalentViewConfig enable companies to create saved talent searches and share curated candidate shortlists with colleagues. SharableTalentViewConfigUsers adds per-candidate evaluation data including likeCount, dislikeCount, and free-text feedback.

Part IV - Interviews and Assessments

AI Interview System

Mercor's interview process is AI-conducted and rubric-graded. The Forms_Audit table reveals the full interview configuration:

items — JSON array of interview questions
evaluationCriteria — per-question scoring criteria
assessmentRubricId — linked rubric
allowCopyPaste — flag to restrict copy-paste (cheating prevention)
allowFormRetakes / maxRetakeAttempts — retry policy
prep — pre-interview preparation materials shown to candidate
feedbackConfig — how/whether to share scoring feedback

AssessmentRubrics defines the grading framework:

title, instructions — rubric metadata
sumScores, sumSquareScores, countScores — aggregate statistics across all uses of this rubric
passThreshold — minimum score to pass
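
Storing sumScores, sumSquareScores, and countScores is a common way to keep a running mean and standard deviation without retaining every individual score. A minimal sketch of the recovery, assuming the three columns are plain running sums over all scores given under a rubric (the function name and example numbers are illustrative):

```python
import math

def rubric_stats(sum_scores: float, sum_square_scores: float, count_scores: int):
    """Return (mean, population standard deviation) from running sums."""
    if count_scores == 0:
        return None, None
    mean = sum_scores / count_scores
    # Var[X] = E[X^2] - E[X]^2, clamped at 0 to absorb floating-point error.
    variance = max(sum_square_scores / count_scores - mean ** 2, 0.0)
    return mean, math.sqrt(variance)

# Hypothetical rubric with three recorded scores: 6, 8 and 10.
mean, std = rubric_stats(sum_scores=24.0, sum_square_scores=200.0, count_scores=3)
```

With these aggregates a new score can be compared against the rubric's historical distribution, for example as a z-score, which may be how passThreshold values are calibrated, though the dump does not confirm that.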

AssessmentRubricItems_Audit stores individual rubric criteria:

criteria — the evaluation criterion text
shortName — abbreviated label
points — maximum points
format — scoring format (binary, scale, etc.)
webSearch — whether web search context is provided to the grader
smartScoring — whether AI auto-scoring is enabled
type — criterion type

FormSubmissions records every interview submission:

responseStatus — submitted / abandoned / in_progress
activeTimeSeconds — actual time spent on the form
posthogSessionIds — linked PostHog analytics session
assessmentVersionId — which version of the assessment was taken

AssessmentEvalState tracks the grading pipeline:

assessmentType, jobType, status
retryCount, reason
triggerSource, triggeredByUserId
durationMs — how long grading took

InterviewEvals stores scored results:

communicationScore, technicalScore
qaPairScores — per question-answer pair scores

InterviewIssues records reported problems during interviews:

issue — issue type (technical problem, suspected cheating, content issue)
source — who reported it (candidate, system, reviewer)
startPosition / endPosition — timestamp positions within the interview
reportedBy — user ID of reporter

InterviewScores provides the final aggregate score per interview.

Part V - Work Trials and Onboarding

Work Trial Contracts

WorkTrial_Audit captures the structured trial engagement contract:

Field	Description
`payableAmount`	Amount payable to contractor (cents)
`billableAmount`	Amount charged to company (cents)
`ciiaaDirect`	Confidentiality agreement (direct)
`ciiaaPassthrough`	Confidentiality agreement (passthrough)
`tow`	Terms of work
`offerLetter`	S3 key or base64 of signed offer letter
`signature`	Digital signature string
`startDate` / `endDate`	Trial period
`projectId`	Linked project
`billingAccountId`	Billing target

The presence of offerLetter and signature fields indicates that signed legal documents are either stored directly in the production database (as base64) or referenced from it by S3 key.

WorkTrialConfig defines reusable work trial templates per company:

emailTemplateSubject / emailTemplateBody — invitation email content
emailTemplateSubjectExtension / emailTemplateBodyExtension — offer extension emails
interviewIds, formIds — prerequisite steps before trial activation
isUnified — whether trial is shared across listings

Onboarding Pipeline

OnboardingState defines the onboarding funnel steps:

Field	Example
`shortName`	`interview_completed`
`name`	`Interview Completed`
`threshold`	`1`
`order`	`0`

OnboardingDocument stores the per-project onboarding materials (links, instructions, or document content) shown to newly hired contractors.

TierProgress tracks contractor progress through Mercor's internal tier/certification system — mapping contractors to planId, tierId, status, and completedAt.

PlanAssignments assigns contractors to specific plans with defined startDate, endDate, userHours allocation, and tasksCompleted tracking.

Part VI - Projects and AI Task Management

Project Structure

Projects_Audit reveals the full project configuration:

Field	Description
`companyId`	Client company
`name`	Internal project name
`screenshotEnabled`	Whether Insightful monitoring is active
`userGroupEmail`	Google Group for project members
`projectType`	Project category
`annotationPlatform`	e.g., Scale AI, Label Studio
`annotationPlatformIDs`	External platform project identifiers
`ssotLink`	Single source of truth document URL
`taskMetricsDatastore`	Where task data is stored
`status`	`active` / `archived`
`notes`	Internal notes
`offerExtendedText`	Custom text in offer letters for this project

ProjectIAM / ProjectIAM_Audit defines role-based access: each record maps a userId to a roleId within a projectId, with status and assignedBy for audit purposes.

ProjectIntegrations is particularly revealing — it links each project to:

oktaGroupId / oktaOwnerGroupId / oktaEPMGroupId — Okta SSO groups
googleGroupId — Google Workspace group
slackChannelId / workspaceNotificationChannel — Slack notification channels
projectShortId — human-readable project identifier

This table effectively maps every production project to its Slack workspace and Okta group, providing a complete picture of Mercor's organizational structure.

AI Task System

TaskDefinitions / TaskDefinitions_Audit define the structure of AI training tasks:

Field	Description
`rubric`	JSON grading rubric for this task type
`autograder`	Autograding configuration (model, prompts)
`task_schema`	JSON Schema defining the task response format
`metadata`	Additional task configuration

TaskAudits records individual task submissions for review:

Field	Description
`taskDefinitionId`	Which task definition was used
`recordId`	The submitted task record
`s3KeyPrefix`	S3 location of submission artifacts
`authorId`	Contractor who submitted
`auditorId`	Reviewer assigned
`status`	`pending` / `approved` / `rejected`
`outcome`	Final grading outcome
`autoOutcome`	Automated grading result
`dispute`	Dispute information if challenged
`disputedBy`	Who filed the dispute

TaskAssignments maps tasks to specific jobs and users, with appliedBy tracking who made the assignment.

DeliverableBatches groups deliverables for invoicing:

uid, name — batch identifier
invoiceLineItemId — linked invoice line
taskCount, status
metadata — additional batch configuration

ProjectCustomColumns adds arbitrary metadata fields to projects, with sqlQuery indicating some columns are dynamically computed from database queries. ProjectCustomColumnValueHistory tracks changes to these values over time.

ProjectArchetypes stores character/role descriptions for specific project types — suggesting Mercor operates AI roleplay or persona-based annotation tasks (archetypeText, elements).

ProductivityProjectRules defines per-project productivity monitoring rules (rules JSON, is_active, versioned).

Part VII - Time Tracking and Productivity Surveillance

The Insightful Integration

This is the most invasive component of the exposed data. Mercor uses Insightful (formerly Workpuls) — a workforce monitoring agent installed on contractors' computers — to capture screenshots and activity data.

InsightfulScreenshots — Every screenshot record contains:

Field	Example (obfuscated)
`storageUrl`	`https://mercor-insightful-screenshots-production.s3.amazonaws.com/screenshots/[id]/[timestamp]_[uuid].png`
`storageKey`	`screenshots/wmcw2pdyvenmluy/1767129970810_3b62edd1-...`
`screenshotTimestamp`	`1767129970810` (Unix ms)
`ip`	`71.194.*.*`
`gateways`	`["1C:93:7C:64:**:**"]` (MAC address)
`os`	`win32`
`osVersion`	`10.0.19045`
`agentVersion`	`7.9.3`
`computer`	`desktop-ue2kgro`
`hwid`	`8f9f16f0-1fb7-47e4-a2a1-209838aa5c5e`
`appName`	`Google Chrome`
`appFileName`	`chrome.exe`
`appFilePath`	`C:\Program Files\Google\Chrome\Application\chrome.exe`
`windowTitle`	`Alabaster Studio - Google Chrome`
`browserUrl`	`alabaster-studio.com/project/abacus/conversation/[uuid]`
`browserSite`	`alabaster-studio.com`
`isBlurred`	`0`
`externalProductivityScore`	`1`

Every screenshot includes the contractor's IP address, MAC address (gateway), hardware fingerprint, operating system, the exact application open, the window title, and the URL being visited — all timestamped to the millisecond.
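
The screenshotTimestamp column is labeled as Unix milliseconds in the table above. A minimal sketch of converting such a value to a timezone-aware UTC datetime, using integer arithmetic to avoid floating-point rounding (the helper name ms_to_utc is hypothetical):

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def ms_to_utc(timestamp_ms: int) -> datetime:
    """Convert a Unix epoch value in milliseconds to an aware UTC datetime."""
    return EPOCH + timedelta(milliseconds=timestamp_ms)

# The sample value from the InsightfulScreenshots row above.
captured_at = ms_to_utc(1767129970810)
# captured_at.isoformat() -> "2025-12-30T21:26:10.810000+00:00"
```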

The storageUrl field contains direct S3 URLs to screenshot image files. The S3 bucket mercor-insightful-screenshots-production is referenced explicitly.

The hwid (hardware ID) field provides a persistent device fingerprint that can re-identify a contractor even if they change their email or create a new account.

Timelog

Timelog / Timelog_Audit records every work session:

Field	Description
`externalId`	Insightful shift/session ID
`externalProjectId`	Insightful project ID
`employeeId`	Insightful employee identifier
`duration`	Session duration (ms)
`timeStart` / `timeEnd`	Session timestamps
`timezone`	Contractor's timezone
`taskId` / `taskName`	Task being worked on
`lineItemUid`	Linked payment line item
`adjustmentReason`	If hours were manually adjusted
`userId`	Mercor user ID
`isCompleted`	Whether session was completed normally
`linkFailReason`	If Insightful–Mercor link failed

Deductions

Deductions records time deducted from pay:

Field	Description
`durationToSubtractMs`	Milliseconds deducted
`appName`	Application that triggered the deduction
`reasonForDeduction`	Why the time was removed
`payoutCycleID`	Which pay cycle was affected
`approvedBy` / `approvedAt`	Approval chain
`appliedBy` / `appliedAt`	Application record

This reveals that Mercor can and does subtract pay from contractors based on monitored activity, with an approval workflow for doing so.
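
The arithmetic implied by a deduction can be sketched as follows. This assumes deductions are subtracted from the tracked session duration before pay is computed at an hourly rate expressed in cents. The rate, the rounding rule, and the function name are assumptions, because the dump does not show the payout formula.

```python
MS_PER_HOUR = 3_600_000

def payable_cents(duration_ms: int, deductions_ms: list, hourly_rate_cents: int) -> int:
    """Pay for tracked time net of deductions, rounded down to whole cents."""
    net_ms = max(duration_ms - sum(deductions_ms), 0)  # never pay negative time
    return net_ms * hourly_rate_cents // MS_PER_HOUR

# Hypothetical: a 2-hour session with one 15-minute deduction at $40.00/hour.
amount = payable_cents(
    duration_ms=7_200_000,
    deductions_ms=[900_000],
    hourly_rate_cents=4_000,
)
```

Here the 15-minute deduction reduces a $80.00 session to $70.00, which shows how directly a durationToSubtractMs entry translates into lost pay.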

Part VIII - Payments and Financial Infrastructure

Contractor Payment Methods

UserPaymentMethods / UserPaymentMethods_Audit stores linked payment accounts:

Field	Example (obfuscated)
`provider`	`stripe`
`providerMethodId`	`acct_1R0V****` (Stripe Express account)
`methodType`	`express_account`
`status`	`onboarded`
`countryCode`	`USA`

US contractors use Stripe Express accounts. International contractors use Wise (evidenced by WiseDisbursements). The metadata field includes context like "context": "backfill" — indicating historical payment method imports.

MercorUserFinancials stores additional financial account details:

paymentProvider — stripe / wise
providerIdentifier — account identifier
accountDetails — JSON with bank routing and account details
lastFetchedOn — when the financial data was last synced

Payment Line Items

PaymentLineItems is the core payment ledger:

Field	Description
`cycleStartTs` / `cycleEndTs`	Pay period boundaries
`totalPayableAmount`	Amount owed to contractor (cents)
`totalBillableAmount`	Amount charged to company (cents)
`status`	`pending` / `paid` / `failed`
`jobUid`	Linked job contract
`timelogUid`	Linked timelog entry
`bonusUid`	Linked bonus if applicable
`referralUid`	Linked referral payment
`dispatchFailureReason`	Why a payment failed
`moneyOutId`	Linked outbound transfer
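
Because each line item carries both the amount owed to the contractor and the amount charged to the client, the platform's margin per line item can be derived directly. A minimal sketch, assuming both columns are integer cents in the same currency (the function name and example values are hypothetical):

```python
def take_rate(total_payable_cents: int, total_billable_cents: int):
    """Return (margin in cents, margin as a fraction of the billed amount)."""
    margin = total_billable_cents - total_payable_cents
    rate = margin / total_billable_cents if total_billable_cents else None
    return margin, rate

# Hypothetical line item: contractor is owed $70.00, client is billed $100.00.
margin_cents, rate = take_rate(total_payable_cents=7_000, total_billable_cents=10_000)
```

This is why exposure of the ledger is commercially sensitive: anyone holding it can reconstruct per-client and per-project pricing margins.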

PayoutCycles defines payment periods:

cycleStartTs / cycleEndTs — date boundaries
status — open / processing / completed
configId / configVersion — which payout configuration governs this cycle

PayoutConfigs stores payout rules:

type — payment cadence (daily, weekly, etc.)
configuration — JSON with limits, caps, and routing rules

MoneyOut_Audit records every outbound payment:

externalAccountId — contractor's Stripe or Wise account
externalTransferId — transfer ID at the payment provider
totalAmount — disbursed amount
paymentMethod — stripe / wise
status — pending / paid / failed
failureReason — structured failure code

WiseDisbursements records international transfers:

wiseTransferId, wiseQuoteId — Wise API identifiers
amount, currency
sequenceNumber — ordering within a batch
status, failureReason

Company Billing

BillingAccounts manages company-side billing:

Multiple billing accounts per company
Linked to Stripe customer IDs

BillingConfigs defines billing rules:

rules JSON — markup percentages, caps, billing model
isLatestVersion — versioned configuration

BillingRateCards defines per-contract rate structures:

formulaType — e.g., markup_percentage, flat_rate
rateRows — tiered rate table

InvoiceLineItems records invoice line entries:

rawAmount / adjustedAmount — pre- and post-adjustment amounts
taskCount — number of tasks in this line
sowId — Statement of Work identifier

RevenueAdjustments records revenue corrections:

amountCentsUsd, category, reason
revenueRecognitionDate — accounting date
formula, labels, aggregationFields
attachments, invoices — supporting documentation

Referral System

Referrals / Referrals_Audit tracks the contractor referral program:

Field	Description
`referredUserId` / `referringUserId`	The parties
`totalEarned` / `totalEarningsPotential`	Referral payment amounts
`state`	Current state
`paidAt`	When the referral bonus was paid
`disputeStatus`	If disputed
`isGuaranteedReferral`	Whether guaranteed payment applies
`referral_cap`	Maximum referral earnings
`isPaymentBlocked`	Payment hold

ReferralEligibility manages the conditions under which referral payments vest — including onboardingStateId requirements and criteriaId checks.

GuaranteedReferralQuota manages quota-based guaranteed referral programs:

quotaId, referringUserId, offPlatformUserId
shortenedLink — the referral tracking URL
weekStart, status

Part IX - Communications and Outreach

Internal Messaging

Comms is the platform messaging table:

Field	Description
`commId`	Message identifier
`groupId`	Conversation thread
`senderId` / `receiverId`	Parties
`content`	Message body
`type`	Message type (system, human, etc.)
`triggerRef`	What triggered this message
`listingReferenceUID`	Associated listing

CommsSent records delivery tracking — when messages were sent, to whom, via what channel.

EmailTemplates stores company-specific email templates:

subject, content — template body
isGlobal — available to all companies
isPersonal — creator-private
tags — categorization

External Outreach

LinkedinWarmIntros manages LinkedIn outreach campaigns:

Field	Example (obfuscated)
`linkedinUrl`	`https://www.linkedin.com/in/[username]`
`email`	`s**@homeinheritance.com`
`referringUserId`	Internal user who made the intro
`commEvent`	`WARM_INTRO/OUTREACH`
`status`	`sent`

OffPlatformCampaigns and OffPlatformCampaignSteps manage multi-step email/LinkedIn outreach sequences:

campaignType — category
stepNumber — sequence position
subject / messageTemplate — email content with template variables
scheduledAt — send time
outreachedCandidateIds / failedCandidateIds — delivery tracking

AircallComms records phone call logs from Mercor's Aircall integration — the VoIP platform used for recruiter outbound calls, with call metadata and outcomes.

FirstTimeInvites tracks first-contact outreach to candidates:

commEvent — invitation type
contentType / subject — message details
listingIdCount — how many listings the invite covers
refListingUid — the originating listing

Notification Infrastructure

AutomationTemplates defines automated workflow triggers:

handler — which service handles this automation
sourceType / sourceSql — SQL query that triggers the automation
templateBody — notification content template
cron — scheduled execution
autoApprove — whether human approval is required
triggerConfig, config — detailed trigger configuration

ProjectAutomations links automation templates to specific projects.

Reverse Engineering - Architecture and Infrastructure

The database schema, table names, column conventions, and embedded metadata allow us to reverse-engineer much of Mercor's technical architecture, from microservice names to third-party integrations, purely from the contents of this dump.

Part X - Infrastructure and DevOps

Deployment Pipeline

IacDeploymentRuns is one of the most operationally sensitive tables:

Field	Example
`runType`	`plan` / `apply`
`environment`	`staging` / `production`
`status`	`success` / `failed`
`commitSha`	`784cfd495ddfa3b67187433cb7cb66f2d27ad458`
`branch`	`dacq/backend-v2`
`actor`	`k*********77` (GitHub username)
`githubRunId`	`23520976410`
`githubRunUrl`	`https://github.com/Mercor-io/mercor-monorepo/actions/runs/23520976410`
`prNumber`	`26645`
`stacksAffected`	`["iac/aws/envs/staging"]`
`resourcesAdded`	`25`
`resourcesChanged`	`2`
`resourcesDestroyed`	`6`
`summary`	Full Terraform plan output (including deprecation warnings)
`durationSeconds`	`134`

This table exposes:

The full URL path to the private GitHub monorepo
Individual engineer GitHub usernames (actors)
Branch naming conventions
Terraform variable names and deprecated configurations
Number of AWS resources created/modified/destroyed per deployment
Complete Terraform plan output in summary

Named Terraform service stacks include: talent-success-coil, referrals-coil, iac/aws/envs/staging.

ProductionDeployment records ECS production releases:

releaseTag — semantic version tag
buildHash — Docker image hash
deployedAt — deployment timestamp
deploymentIds — ECS deployment identifiers
taskDefinitionArns — ECS task definition ARNs (include AWS account ID)
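
The account ID exposure follows from the ARN format itself: AWS ARNs are colon-delimited, with the account ID in the fifth field. Below is a minimal sketch of parsing and masking that field before a value is logged or stored. The account ID 123456789012 and the task family name are placeholders, not values from the dump, and the helper names are hypothetical.

```python
def parse_arn(arn: str) -> dict:
    """Split an ARN into its colon-delimited fields."""
    # maxsplit=5 keeps colons inside the resource part (e.g. "family:42") intact
    parts = arn.split(":", 5)
    if len(parts) != 6 or parts[0] != "arn":
        raise ValueError(f"not an ARN: {arn!r}")
    keys = ("partition", "service", "region", "account_id", "resource")
    return dict(zip(keys, parts[1:]))


def redact_account_id(arn: str) -> str:
    """Return the ARN with the account ID masked, for safer logging."""
    fields = parse_arn(arn)
    masked = "*" * len(fields["account_id"])
    return ":".join(["arn", fields["partition"], fields["service"],
                     fields["region"], masked, fields["resource"]])


# Placeholder ARN, not taken from the dump
example = "arn:aws:ecs:us-east-1:123456789012:task-definition/example-api:42"
print(parse_arn(example)["account_id"])
print(redact_account_id(example))
```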

PreprodDeployment records pre-production (staging) releases:

commitSha — exact commit deployed
loadTestPassed — whether load testing passed
releaseOwner — engineer responsible for the release

ProductionVersion maintains a single-row current version pointer:

lastVersion, lastReleaseTag, lastBuildHash, updatedAt

RollbackExecution records emergency rollback events:

services — which microservices were rolled back
Rollback timing details (a 4-second rollback capability is observed in the data)

Database Schema Management

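DATABASECHANGELOG and DATABASECHANGELOGLOCK are Liquibase tables. DATABASECHANGELOG records every applied schema migration (changeset); DATABASECHANGELOGLOCK is the companion lock table that prevents concurrent migration runs. The changelog columns are: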

ID, AUTHOR, FILENAME, DATEEXECUTED, MD5SUM, DESCRIPTION, COMMENTS, EXECTYPE, LIQUIBASE

These tables reveal the full history of schema changes, including the names of engineers who authored migrations, the migration scripts' filenames (revealing internal project structure), and the exact timestamp each change was applied to production.

Agent Sandboxes

AgentSandboxes records AI coding agent sessions:

Field	Description
`agentType`	Type of AI agent
`status`	`active` / `stopped` / `expired`
`backendType`	Compute backend
`host`	Sandbox hostname
`stopReason`	Why session ended
`transcriptRawUrl`	S3 URL of raw conversation transcript
`transcriptConsolidatedUrl`	S3 URL of consolidated transcript
`acpSessionId`	Agent control protocol session ID
`sandboxToken`	Authentication token for sandbox
`claimedAt` / `expiresAt`	Session lifecycle timestamps

The sandboxToken field suggests that expired sandbox tokens are persisted in the database — a potential credential exposure if these tokens have long validity windows.
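
One common way to avoid this class of exposure is to store only a digest of each token, so that a database export never contains a usable credential. The sketch below is a generic illustration of that pattern under stated assumptions; the function names are hypothetical, and nothing in the dump indicates how Mercor generates or validates sandbox tokens.

```python
import hashlib
import hmac
import secrets


def issue_token() -> tuple[str, str]:
    """Create a token for the client and the digest to persist.

    Only the digest would be written to the database; the raw token
    is returned to the caller once and never stored.
    """
    token = secrets.token_urlsafe(32)
    digest = hashlib.sha256(token.encode("ascii")).hexdigest()
    return token, digest


def verify_token(presented: str, stored_digest: str) -> bool:
    """Check a presented token against the stored digest in constant time."""
    candidate = hashlib.sha256(presented.encode("utf-8")).hexdigest()
    return hmac.compare_digest(candidate, stored_digest)


token, digest = issue_token()
print(verify_token(token, digest))          # the genuine token verifies
print(verify_token("wrong-token", digest))  # any other value does not
```

A plain SHA-256 digest is reasonable here only because the token is a long random value; low-entropy secrets such as passwords need a dedicated password-hashing function instead.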

Part XI - Analytics and ML Layer

School and Firm Rankings

DbtFirmSchoolRank contains Mercor's proprietary employer prestige scores:

Field	Example
`firmId`	`000013c1653de847e38d755ca1c310a5`
`firmName`	`75th ranger regiment, u.s. army`
`academicField`	`overall`
`nProfiles`	`2`
`avgSchoolRank`	`90.00`
`firmSchoolRank`	`81723`
`firmSchoolRankPercentile`	`0.528839`

This table represents a proprietary ranking of ~154,000 firms by the average educational prestige of their employees — effectively a derived signal used to score resumes. It is computed from the full contractor profile database using an empirical Bayesian model (ebPriorStrength, ebAvgSchoolRank).
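
The column names suggest a standard empirical Bayes shrinkage estimate, in which a firm's observed average is pulled toward a global prior when few profiles are available. The sketch below shows that textbook formula. Mapping ebPriorStrength to the prior weight is an assumption, and the prior mean and prior strength values are hypothetical; only nProfiles = 2 and avgSchoolRank = 90.00 come from the sample row.

```python
def shrunk_mean(n: int, sample_mean: float,
                prior_mean: float, prior_strength: float) -> float:
    """Weighted blend of an observed mean and a prior mean.

    With few observations (small n) the result stays close to the prior;
    as n grows, the observed mean dominates.
    """
    if n < 0 or prior_strength <= 0:
        raise ValueError("n must be >= 0 and prior_strength must be > 0")
    return (n * sample_mean + prior_strength * prior_mean) / (n + prior_strength)


# n and sample_mean from the sample row; prior values are hypothetical
estimate = shrunk_mean(n=2, sample_mean=90.0, prior_mean=50.0, prior_strength=10.0)
print(round(estimate, 2))  # (2*90 + 10*50) / 12, roughly 56.67
```

Under these assumed parameters, a firm with only two profiles would be scored much closer to the prior than to its raw average, which is the usual motivation for shrinkage when sample sizes are small.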

DbtSchoolRankings ranks individual schools within academic fields:

schoolName, academicField, schoolScore, schoolRank

Resume Evaluation

UserResumeEvaluation stores ML-generated resume scores:

Field	Description
`workExperienceScore`	Quality of work experience
`yearsOfWorkExperience`	Parsed years of experience
`graduationYear`	Estimated graduation year
`mScore`	Composite score
`inferredRole`	Predicted job function
`educationScore`	Academic credential score
`awardScore`	Competitive award weighting
`rateAcademicCompetitions`	Participation in academic competitions
`rateCompetitiveProgramming`	Competitive programming score
`rateHackathonPerformance`	Hackathon achievement score
`technicalSkills`	JSON list of detected skills
`highestDegree`	Parsed degree level
`searchFlag`, `imageFlag`, `transcriptFlag`	Data quality flags

Behavioral Analytics

PosthogAnalytics links PostHog behavioral sessions to user identity:

userEmail — email address (PII)
company — company context
startTimeUtc / endTimeUtc — session boundaries
activetime / inactivetime — engagement metrics
startUrl — entry point URL

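This directly links PostHog analytics sessions (which include click-level behavior) to user identity. That is a significant privacy concern: session analytics are often keyed to pseudonymous identifiers rather than to an email address, whereas here each session is stored alongside the user's email.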

SearchAnalytics records search quality metrics:

avg_relevance_score, avg_prestige_score
p99_latency_ms — 99th percentile search latency
position_weighted_relevance_score — ranking quality metric
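
For readers unfamiliar with these metrics, the sketch below shows one conventional way each could be computed. The nearest-rank percentile and the logarithmic position discount are assumptions chosen for illustration; the dump contains only the aggregated column values, not the formulas behind them.

```python
import math


def percentile_nearest_rank(values: list[float], p: float) -> float:
    """Nearest-rank percentile: the smallest value with at least p% of data at or below it."""
    if not values:
        raise ValueError("values must be non-empty")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(values)
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[rank - 1]


def position_weighted_relevance(relevances: list[float]) -> float:
    """Average relevance where earlier result positions carry more weight.

    Uses a 1 / log2(position + 1) discount, as in DCG-style ranking metrics.
    """
    if not relevances:
        raise ValueError("relevances must be non-empty")
    weights = [1 / math.log2(pos + 1) for pos in range(1, len(relevances) + 1)]
    return sum(w * r for w, r in zip(weights, relevances)) / sum(weights)


latencies_ms = [float(x) for x in range(1, 101)]  # synthetic data: 1..100 ms
print(percentile_nearest_rank(latencies_ms, 99))  # 99.0
print(position_weighted_relevance([1.0, 1.0, 1.0]))
```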

ForecastMetrics stores ML forecast outputs:

entity, id, dt, snapshot_dt
modelVersion, predictedValue

Used for capacity planning, fill rate forecasting, and contractor supply predictions.

ML Experiments

MLExperimentsJobPerformanceReviews reveals the experimental ML pipeline:

Column	Description
`Date of review`	Review date
`Account`	Client company
`Project`	Project name
`Reviewer`	Reviewer name (Mercor staff)
`Work type`	Category of work
`Review type`	Type of performance review
`Name` / `Email`	Contractor identity
`Quality of Work`	Score
`Engagement`	Score
`Offboarding Reason`	Why contractor was removed
`Justification for rating`	Free-text explanation

This table contains raw performance review data used to train or evaluate ML models for automated contractor performance assessment — with staff names, contractor names, and qualitative judgments all stored in plaintext.

Part XII - Reference Data Layer

Skills and Certifications

Skills is the platform's skills taxonomy:

skillId, name, description, type, parent — hierarchical skill tree
CertificationPolicy — linked certification requirement

CertificationPolicies_Audit defines the rules for earning certifications:

rules — JSON eligibility criteria
isRevokable, requiresApproval
icon, iconColor, showBadge, displayText — display configuration

Certifications_Audit records individual earned certifications:

evidence — JSON array of qualifying events (e.g., {"id": "proj_...", "score": 88.84, "sourceType": "project_hours_worked"})
status — AUTO_AWARDED / MANUALLY_AWARDED / REVOKED
isCertified — current state

SkillCertifications_Audit and SkillCertificationsEvidence_Audit track per-skill certification with scores and source evidence.

ContractorEndorsements stores peer endorsements:

endorsingUserId / endorsedUserId
contents — endorsement text
tags — skills endorsed
sentiment — positive/negative
source — where the endorsement originated

Company Data

Company stores client company records:

name, description, website, logo
billingModel — pricing structure
billingStartDay / billingEndDay — billing cycle configuration
brandVisible — whether company name is shown to candidates
universe — internal company segmentation
externalName — display name if different from legal name

IAM / IAM_Audit manages company-level role assignments:

roleId — e.g., ghost (internal Mercor staff), admin, member
companyId, userId_v4, status

A sample IAM record shows a user with roleId: ghost being REMOVED from a company, indicating that Mercor's internal staff operated within client company contexts under a ghost role.

URL Management

ShortenedUrls manages the platform's link shortening system:

Used for referral tracking, campaign links, and onboarding flows

UrlClicks records every click on shortened URLs:

urlId, clickedAt, ipHash, userId, country

Even with ipHash (rather than raw IP), the combination of userId, country, and timestamp enables click attribution across the contractor population.
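
How much protection ipHash offers depends on how it is computed, which the dump does not reveal. An unkeyed hash of an IPv4 address can be reversed by enumerating the roughly 4.3 billion possible inputs, so a keyed hash is the safer design. The sketch below illustrates a keyed variant; the function name and key handling are hypothetical and do not describe Mercor's implementation.

```python
import hashlib
import hmac
import ipaddress


def hash_ip(ip: str, key: bytes) -> str:
    """Return a keyed SHA-256 digest of a canonicalized IP address.

    The key would live outside the database (for example in a secrets
    manager), so an exported table alone cannot be brute-forced.
    """
    # ip_address() validates the input and normalizes IPv6 spelling
    normalized = ipaddress.ip_address(ip).compressed
    return hmac.new(key, normalized.encode("ascii"), hashlib.sha256).hexdigest()


key = b"example-key-not-a-real-secret"
print(hash_ip("192.0.2.1", key))  # 192.0.2.0/24 is a documentation-only range
```

Even a keyed hash remains a stable pseudonym: rows that share an ipHash can still be linked to each other and, through userId, to a person.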

Catfish Audit Log

CatfishAuditLog is the audit log for an internal security/compliance tool:

Field	Description
`slackUserId` / `slackUserName`	Mercor staff member
`targetEmail`	Person being looked up
`platform`	Where the lookup happened
`intent`	Declared reason for the lookup
`status`	Success/failure

This table records every time an internal Mercor employee looks up a user's information through an internal tool called "Catfish" — indicating awareness that internal user lookup is an auditable, privacy-sensitive operation. Ironically, this audit log itself now sits in the exposed dataset.

Exposed Surface Area Summary

Domain	Tables	Sensitivity	Key Exposure
User & Identity	~10	Critical	PII (name, email, phone, location) for all contractors
Identity Verification & Fraud	~12	Critical	Government ID outcomes, facial comparison tokens, fraud verdicts
Hiring Pipeline	~10	High	Application status, rejection reasons, recruiter notes
Interviews & Assessments	~15	High	Interview responses, scores, cheating flags, rubrics
Work Trials & Onboarding	~6	High	Signed legal documents, offer letters, digital signatures
Projects & AI Tasks	~15	Medium-High	Client company projects, task definitions, AI training data
Time Tracking	~4	Critical	Per-minute screenshots, browser URLs, MAC addresses, hardware fingerprints
Payments & Finance	~20	Critical	Stripe account IDs, bank details, exact payment amounts, payout records
Communications	~10	Medium	Message content, outreach campaigns, phone call logs
Infrastructure & DevOps	~10	High	Commit SHAs, GitHub URLs, ECS ARNs, Terraform configs, sandbox tokens
Analytics & ML	~10	Medium	Resume scores, school rankings, PostHog identity links
Reference Data	~15	Medium	Skills taxonomy, certifications, endorsements, company configurations

Technical Architecture Reverse-Engineered

The following architecture is entirely reconstructed from database table names, column values, JSON blobs, and embedded metadata. No source code or documentation was available — everything below was inferred from the data alone.

Backend Services

Based on the database content, Mercor's platform comprises at least 13 separately named services:

Service	Inferred Function
`mercor_api`	Primary API backend
`mercor_api_nginx`	API gateway / reverse proxy
`mercor_go`	Go-language service (likely performance-critical paths)
`coil`	Contractor-facing service (multiple instances by function)
`site_fe`	Public website frontend
`team_fe`	Company/team portal frontend
`work_fe`	Work/task frontend
`celery`	Async task queue
`workflow`	Workflow orchestration
`db_trigger_consumer`	Database event consumer
`steve`	Internal tool/admin service
`woz`	Fraud/ML pipeline service
`payments_temporal_worker`	Temporal.io worker for payments

Frontend Portals

Public site — site_fe, routes handled by Next.js (inferred from URL patterns)
Company portal — team_fe — for clients to manage listings and review candidates
Work portal — work_fe — for contractors to find and complete tasks
Internal admin — Godmode interface used by Mercor staff

Data Infrastructure

Primary DB: Aurora MySQL (AWS)
Analytics warehouse: Snowflake (via Fivetran sync, evidenced by dbt models)
Schema migrations: Liquibase
Object storage: S3 (screenshots, offer letters, transcripts)
Monitoring: Insightful agent on contractor machines
Auth: Firebase + Okta (SSO)
Analytics: PostHog
Feature flags / A-B: Inferred from configVersion patterns

Third-Party Integrations

Provider	Purpose
Persona	Identity verification (KYC)
Stripe	US contractor payments (Express accounts)
Wise	International contractor payments
Insightful	Workforce monitoring / screenshot capture
Okta	SSO for company and internal access
Aircall	Recruiter phone calls
PostHog	Product analytics
Vertex AI / Gemini	Fraud LLM reasoning
OpenAI (GPT-4.1 / GPT-5)	AI interview conductor and task autograder
Checkr / Certn	Background checks
HaveIBeenPwned	Email breach checking
Customer.io	Transactional email
GitHub Actions	CI/CD pipeline
Terraform / Terragrunt	Infrastructure as code
Temporal.io	Payments workflow orchestration
Liquibase	Database schema versioning

Grounds for Legal Action

The evidence documented throughout this report supports multiple independent legal claims by distinct plaintiff classes. This section consolidates the factual basis for each claim, cross-referencing the specific database tables, column names, and sample values that constitute the evidentiary foundation.

I. Client Company Claims - Loss of Proprietary AI Training Data and Trade Secrets

This is the most consequential category of legal exposure. Mercor's client companies — Apple, Amazon, OpenAI, Anthropic, Meta, Google, and others — entrusted Mercor with their most valuable competitive assets: the data, methodologies, and evaluation frameworks that define how their AI models are built. All of it is now in criminal hands.

A. Trade Secret Misappropriation

Under the federal Defend Trade Secrets Act (DTSA) and state Uniform Trade Secrets Acts, a trade secret is information that derives economic value from not being generally known and is subject to reasonable efforts to maintain its secrecy. The breach exposes client trade secrets across three categories:

1. AI Training Data as Trade Secrets. The SFT data, RLHF preference rankings, and Chain-of-Thought traces produced by Mercor's contractors for each client constitute trade secrets. Each dataset represents millions of dollars of investment and years of iterative refinement. The TASKS, TASK_VERSIONS, and PHASE_1_TASKS tables across 84 Airtable workspaces contain the actual work product (prompts, model responses, and human evaluations) that each client paid to produce. Their value derives entirely from secrecy: a competitor with access to another lab's RLHF preference data can train toward comparable alignment behavior without bearing the cost of producing that data.

2. Evaluation Methodology as Trade Secrets. How an AI lab evaluates its models — what rubrics it uses, what scoring thresholds it applies, how it structures domain-specific benchmarks — is core intellectual property. The CRITERIA, RUBRIC_VERSIONS, QA_SPECS, and LLM_CALL_CONFIGURATION tables across 60+ workspaces expose this methodology in full. Amazon's Chain-of-Thought evaluation framework, Apple's endpoint testing rubrics, and the cross-model preference evaluation criteria are all now available to any buyer. This is not just data — it is the recipe for how each lab measures AI progress.

3. Pre-Release Model Capabilities as Trade Secrets. The APPLE_ENDPOINT_SANDBOX workspace contains actual outputs from Apple's unreleased Foundation Models (afm-text-083, afm-model-086). These responses reveal the models' capabilities, safety alignment, and failure modes before public launch. Under trade secret law, the unauthorized disclosure of pre-release product capabilities is textbook misappropriation.

Key legal point: Trade secret protection requires "reasonable efforts to maintain secrecy." Mercor's storage of this data — in plaintext, behind a flat network with no segmentation, accessible via a single VPN hop — likely fails this standard. Clients may argue that they maintained secrecy on their end but that Mercor's negligent security destroyed the trade secret status of the data. This creates a damages claim for the full economic value of the lost trade secrets.

B. Breach of Confidentiality and NDA Violations

The database confirms that confidentiality agreements governed the relationship. The Jobs table contains ciiaa_direct, ciiaaPassthrough, confidentiality, and tow (terms of work) fields. The WorkTrial_Audit table contains signed CIIAAs and offer letters. The exposure of:

Apple: Foundation Model outputs (afm-text-083, afm-model-086), endpoint sandbox testing data, translation evaluation, orchestrator configurations
Amazon: Complete LLM Chain-of-Thought evaluation framework with full reasoning traces, preference judgments, domain taxonomy (math, STEM), and named Mercor staff assignments
OpenAI: Feather platform campaign UUIDs, Apertus - Elephant project data, contractor performance reviews naming OpenAI as the account
Meta: Multimedia annotation template command center (AAIE___META_MULTIMEDIA_TEMPLATE), project configurations
Anthropic: Claude 3.5 Sonnet evaluation data compared against GPT-4, preference reasoning, agent sandbox configurations running Claude

constitutes a breach of these confidentiality obligations. Each client has a separate breach of contract claim with damages measured by the economic harm caused by the disclosure.

C. Loss of Competitive Advantage

The breach doesn't just expose data; it destroys competitive moats. If a Chinese AI lab purchases the stolen data, it acquires:

The exact prompts and rubrics that OpenAI uses to fine-tune its models
The evaluation methodology that Amazon uses to measure Chain-of-Thought reasoning quality
Apple's pre-release model outputs revealing capabilities and weaknesses
The preference data that teaches Anthropic's Claude how to respond to contentious queries

Each client's AI training pipeline is now potentially replicable by any competitor with access to the stolen Airtable workspaces. The damages extend beyond the cost of producing the data — they include the competitive harm of having that data available to rivals.

D. Secondary Breach via Desktop Screenshots

The InsightfulScreenshots table creates a mechanism for visual intelligence extraction from client systems. Screenshots captured during monitored work sessions show whatever was on the contractor's screen — client internal dashboards, Slack conversations, code repositories, proprietary tools, unreleased product interfaces. Mercor stored these screenshots on S3 with metadata linking each image to the specific projectId. An attacker can systematically extract visual intelligence about every client's internal systems by filtering screenshots by project. This constitutes a secondary breach of each client's confidential systems, for which Mercor bears direct liability.

E. APEX Benchmark Contamination

Mercor's proprietary APEX benchmark suite, covering 15+ domains from legal to medicine to mechanical engineering, is now compromised. All tasks, criteria, scoring rubrics, and evaluation data are exposed. Any client that relied on APEX benchmark results for vendor selection, model comparison, or procurement decisions now faces the risk that those results are unreliable. Models trained on the leaked APEX data can appear to perform well without genuinely possessing the evaluated capabilities. Clients may claim damages for decisions made in reliance on benchmarks that are now contaminated.

II. Contractor Class Claims

A. Financial Data Exposure and Identity Theft Risk

The MercorUserFinancials table stores the complete Stripe Connect API response as plaintext JSON, including bank name, routing number, last four digits of the account number, account holder name, email, and country. This is enough to facilitate bank account fraud. Every contractor whose financial data is in this table faces ongoing risk of unauthorized transactions, account takeover, and identity theft. The UserPaymentMethods table adds Stripe Express account IDs and Wise transfer identifiers. The exposure of this data (unencrypted, untokenized, in a database accessible via a single VPN hop) arguably constitutes negligence per se under multiple state data breach statutes.
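
The data minimization alternative is simple to express: persist only an allow-list of fields from a provider response instead of the whole payload. The sketch below illustrates that idea. The payload is fake and only loosely modeled on a bank account object, and the allow-list is a hypothetical choice, not a statement of what Mercor needed to retain.

```python
# Hypothetical allow-list: an opaque reference plus non-sensitive context
ALLOWED_FIELDS = frozenset({"id", "country", "currency"})


def minimize(payload: dict) -> dict:
    """Keep only allow-listed top-level fields from a provider response."""
    return {k: v for k, v in payload.items() if k in ALLOWED_FIELDS}


# Fake payload for illustration; no value here comes from the dump
provider_response = {
    "id": "ba_example",
    "bank_name": "EXAMPLE BANK",
    "routing_number": "000000000",
    "last4": "0000",
    "account_holder_name": "Example Person",
    "country": "US",
    "currency": "usd",
}

print(minimize(provider_response))
```

An allow-list fails closed: if the provider adds new sensitive fields later, they are dropped by default rather than stored by accident.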

B. Surveillance Overreach and Privacy Violations

The Insightful monitoring system captured far more than work activity:

Full desktop screenshots every few minutes — not just the work application, but everything on screen
Browser URLs for all tabs, including personal browsing
IP addresses and MAC addresses from personal home networks
Hardware fingerprints of personal devices

Contractors used personal computers for Mercor work (the data shows personal Chrome installations and personal hostnames like desktop-ue2kgro). The monitoring system therefore captured personal activity on personal devices, potentially including personal emails, banking sessions, medical information, or other private content visible in background windows. All of this is now in criminal hands. Under ECPA and state wiretap laws, the capture of third-party communications visible in screenshots (Slack messages, emails, video calls) may constitute unlawful interception.

C. Wrongful Termination via Automated Fraud Decisions

The database reveals that automated fraud decisions directly determined whether contractors could earn a living:

FraudStates.currentDecision = REJECT → contractor blocked from the platform
FraudStates.currentReasoning contains LLM-generated explanations that were almost certainly never disclosed to affected contractors
ProductionFraudState.status → final production fraud verdict with no apparent appeal mechanism

Under FCRA, if Mercor used these automated fraud scores or background check results (BackgroundCheck.status) to deny, suspend, or terminate contractor engagements without providing required adverse action notices, each instance is a separate violation. Under GDPR Article 22, EU/UK contractors have the right not to be subject to decisions based solely on automated processing.

D. Wage-Related Claims

The Deductions table records pay subtractions based on monitored activity: exact milliseconds deducted, which application triggered the deduction, and who approved it. If deductions were applied using data from the now-compromised monitoring system, or if the breach reveals inconsistent application, contractors may have wage theft claims in addition to privacy claims.

III. Statutory Violations

A. CCPA/CPRA — Private right of action for data breaches resulting from failure to maintain reasonable security (Cal. Civ. Code § 1798.150). Plaintext bank routing numbers, unencrypted PII, and excessive data collection constitute failure to implement reasonable security. Statutory damages: $100–$750 per consumer per incident.

B. GDPR — EU/UK contractors confirmed in the data (sample: United Kingdom, Harrow). Violations include data minimization failure (Article 5(1)(c)), integrity/confidentiality failure (Article 5(1)(f)), automated decision-making without safeguards (Article 22), and breach notification delays (Article 33). Fines of up to €20 million or 4% of annual global turnover, whichever is higher.

C. Illinois BIPA — Persona's liveness detection requires a scan of face geometry, explicitly listed as a biometric identifier (740 ILCS 14/10). The IDVerificationChecks table confirms facial geometry scans were captured (livenessStatus), facial comparison performed (interview-face-comparison), and thumbnail images stored (thumbnail_key). Statutory damages: $1,000 per negligent violation and $5,000 per intentional or reckless violation, with no showing of actual harm required. (Note: MAC addresses and hardware fingerprints are not biometric identifiers under BIPA.)

D. FCRA — Background check results and automated fraud scores used in employment decisions without required adverse action notices. Per-violation damages.

E. ECPA / State Wiretap Laws — Desktop screenshots capturing third-party communications visible on screen. Per-interception damages.

F. PIPEDA — Canadian contractors confirmed (sample: country: CA, BANK OF M*******). Breach notification to Privacy Commissioner and affected individuals required.
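
To show how per-person statutory ranges scale, the sketch below multiplies the CCPA and BIPA figures quoted above by a class size. The class size of 1,000 is purely hypothetical: the samples do not establish how many affected contractors reside in California or Illinois, and the calculation assumes a single violation per person.

```python
def statutory_range(class_size: int, low_usd: int, high_usd: int) -> tuple[int, int]:
    """Return (minimum, maximum) aggregate statutory damages in USD."""
    if class_size < 0 or low_usd > high_usd:
        raise ValueError("invalid class size or damages range")
    return class_size * low_usd, class_size * high_usd


HYPOTHETICAL_CLASS_SIZE = 1_000  # illustrative only, not derived from the data

ccpa = statutory_range(HYPOTHETICAL_CLASS_SIZE, 100, 750)
bipa = statutory_range(HYPOTHETICAL_CLASS_SIZE, 1_000, 5_000)
print(f"CCPA: ${ccpa[0]:,} to ${ccpa[1]:,}")
print(f"BIPA: ${bipa[0]:,} to ${bipa[1]:,}")
```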

IV. Negligence - Security Failures Evidenced in the Data

The database structure itself constitutes evidence of systemic negligence:

Plaintext financial data: Complete Stripe API responses with bank names, routing numbers, and account holder names stored as unencrypted JSON
No field-level encryption: Names, emails, phones, DOBs, and addresses readable as-is in the export
Excessive data collection: Full Stripe API responses when only an account ID was needed; desktop screenshots capturing vastly more than needed to verify work hours; HaveIBeenPwned results stored as fraud signals; Persona KYC session tokens persisted indefinitely
Infrastructure failures: ngrok dev tunnels with developer IPv6 in production config; AWS account ID embedded in S3 bucket names; sandbox tokens persisted after session expiry; GitHub Actions URLs exposing the private monorepo

V. Third-Party Claims

Individuals who never created Mercor accounts have their data exposed:

UserReferences: Names, emails, employers, and relationships of professional references
LinkedinWarmIntros: LinkedIn profile URLs and email addresses of people contacted for outreach
CandidateVouches: Relationship details provided by vouchers

These individuals never consented to data collection and likely never received a privacy notice. Under GDPR Article 14, Mercor was required to notify them within a reasonable period, and at the latest within one month, of obtaining their data. The breach exposes them to targeted social engineering using their real relationship data.

Summary - Combined Legal Exposure

Claim	Plaintiff Class	Key Evidence
Trade secret misappropriation	Apple, Amazon, OpenAI, Anthropic, Meta, Google	Pre-release model outputs, evaluation methodologies, RLHF data, rubrics, CoT traces
Breach of confidentiality / NDA	All client companies	Signed CIIAAs in database, client-named Airtable workspaces with proprietary data
Competitive harm	All client companies	Training data, evaluation frameworks, and benchmark data now available to rivals
APEX benchmark contamination	Companies relying on APEX results	Complete benchmark tasks, criteria, and scores exposed
Financial data negligence	30,000+ contractors	Plaintext bank routing numbers, Stripe account details
Surveillance overreach	30,000+ contractors	Desktop screenshots of personal devices, personal browsing, background windows
Automated adverse actions	Contractors denied/terminated	Fraud scores, LLM-generated reasoning, no disclosure or appeal
CCPA violations	30,000+ contractors	Failure to maintain reasonable security
GDPR violations	EU/UK contractors	Data minimization, automated decisions, notification delays
BIPA violations	Contractors who completed Persona KYC	Facial geometry scans, liveness detection
Third-party privacy	References, LinkedIn contacts, vouchers	Data collected without consent, now in criminal hands

The client claims are likely the largest in dollar terms: the economic value of the lost trade secrets (training data, evaluation methodologies, pre-release model outputs) plausibly runs into the billions. The contractor claims are the broadest in scope, affecting every individual who ever used the platform. Together, the total legal exposure is conservatively in the hundreds of millions of dollars before punitive damages.

Conclusion - What Happens Now

The breach is not a past event. It is an ongoing situation with no clear resolution.

The Data Is Still in Circulation

Mercor allegedly paid the attackers to have the data removed from the Lapsus$ leak site — a fact confirmed to us directly by Lapsus$ themselves. The data was taken down briefly. It reappeared. The group is now actively selling the full dataset to private bidders while continuing to distribute samples. The two files analyzed in this report were obtained after the ransom was paid. This is the predictable outcome of paying ransom for digital assets — there is no mechanism to verify deletion, no way to revoke copies already distributed, and every economic incentive for the attackers to continue monetizing the data through private sales, selective leaks, and derivative attacks. Mercor's ransom payment bought nothing except proof that they considered the data worth paying to suppress.

Based on the analyzed samples and the group's own claims about the full dataset, the attackers now possess:

The complete identity of every Mercor contractor — name, email, phone, date of birth, home address, bank routing number, government ID verification status, and a photographic record of their desktop activity
The complete client map — which companies use Mercor, what projects they run, which annotation platforms they use, and what their internal Slack workspaces and Okta SSO groups are called
Apple's pre-release Foundation Model outputs, Amazon's Chain-of-Thought evaluation methodology, OpenAI's Feather platform campaign UUIDs, and Anthropic's model comparison data
The source code for Mercor's entire platform — including its fraud detection algorithms, MercorScore ranking system, and payment infrastructure — providing a complete blueprint for exploitation
Tailscale VPN credentials and network topology — a map of Mercor's internal infrastructure that could enable further unauthorized access if credentials have not been fully rotated
939GB of code repositories that likely contain hardcoded API keys, database credentials, and third-party service tokens scattered across commit history

This is not a dataset that loses value over time. The PII is permanent. The bank routing numbers don't expire. The government ID verification records don't reset. The signed legal documents don't un-sign. And the AI training data — the RLHF annotations, preference rankings, and rubric evaluations produced for frontier AI labs — retains its full value to any competitor seeking to accelerate their own model development.

The Ongoing Threat

With this data, the attackers (or any subsequent buyer) can:

Launch targeted phishing campaigns against every Mercor contractor, using their real name, employer, project assignment, and pay rate to craft highly convincing social engineering attacks
Commit financial fraud using the bank names, routing numbers, and account holder names stored in MercorUserFinancials
Blackmail contractors whose desktop screenshots may reveal confidential client information, personal browsing activity, or employment at companies their current employer doesn't know about
Attack Mercor's clients using the Slack workspace URLs, Okta SSO configurations, and annotation platform campaign IDs as entry points for further social engineering or credential stuffing
Sell the AI training data — the prompts, responses, evaluations, and preference rankings — to competitors or foreign actors, undermining billions of dollars of investment by OpenAI, Anthropic, Apple, Amazon, Meta, and Google DeepMind
Exploit the source code to identify vulnerabilities in Mercor's (and potentially its clients') systems that have not yet been patched
Impersonate Mercor staff using the internal employee names, Slack IDs, and GitHub usernames found throughout the database to conduct supply-chain attacks against Mercor's clients and partners

Each of these vectors becomes more dangerous the longer the data remains in circulation — and there is no indication it will stop circulating.

The Case for Radical Transparency

There is an uncomfortable truth that Mercor, its clients, and the affected contractors must confront: the data is out. It cannot be put back.

The current trajectory — where the breach is acknowledged in vague corporate language, specific questions are deflected, and affected individuals receive minimal information about what was exposed — serves no one except the attackers. It creates an information asymmetry where the adversary has complete knowledge of what was taken, while the victims operate in the dark.

Every contractor whose bank routing number is in MercorUserFinancials deserves to know — specifically — that their bank name, routing number, and account holder name were stored in plaintext JSON and are now in the hands of criminal actors. Every contractor whose desktop screenshots are in the mercor-insightful-screenshots-production S3 bucket deserves to know that their IP address, MAC address, browser history, and application usage during work sessions are exposed. Every client whose annotation platform URLs, Slack workspaces, and proprietary model outputs appear in the Airtable exports deserves to understand the exact scope of their secondary exposure.

The alternative to transparency is prolonged paranoia. If Mercor does not disclose the specific contents of the breach, every contractor must assume the worst about what was taken. Every client must assume their internal systems were visible on a contractor's screen. Every reference, every LinkedIn contact, every vouching party must assume their personal information was collected without their knowledge and is now compromised.

Perhaps the most constructive path forward — however counterintuitive — is full, detailed, public disclosure of exactly what the breach contained. Not the raw data itself, but a complete accounting: which tables, which fields, which categories of PII, which clients, which time periods. The world can adjust to a known breach. It cannot adjust to an unknown one. Sunlight remains the best disinfectant, and in the aftermath of a breach of this magnitude, the cost of silence far exceeds the cost of honesty.

The contractors who built the AI training data that powers the world's most valuable models deserve at least that much.

A Structural Critique - Youth, Velocity, and the Cost of Immaturity

Mercor's three founders (Brendan Foody, Adarsh Hiremath, and Surya Midha) were 21 years old when they raised their Series A. They became the world's youngest self-made paper billionaires at 22, when their Series C valued the company at $10 billion. The average age of the Mercor team was reported as 22. They are Thiel Fellows: college dropouts celebrated for building fast. They stored bank routing numbers in plaintext, ran a flat network where a single VPN hop reached everything, and let 4 terabytes walk out the door without anyone noticing.

Perhaps Mercor is best understood as a phenomenon of hype and strong mimetic desire within the AI industry. Perhaps the AI labs got ahead of themselves. Perhaps researchers and vendor managers chose Mercor not because they had vetted its ability to handle critical workloads, but because OpenAI was already using it.

The pattern is worth examining. OpenAI was one of Mercor's earliest major customers. The relationship began when Mercor's 20-year-old CEO cold-emailed OpenAI's head of human data operations, Shaun VanWeelden, and landed a contract to recruit Math Olympiad winners for model training. VanWeelden later left OpenAI to become Mercor's managing director. Two sitting OpenAI board members — Adam D'Angelo (Quora CEO) and Larry Summers (former U.S. Treasury Secretary) — invested in Mercor's earlier funding rounds.

This is not without precedent. Much of the AI data infrastructure landscape has been shaped by proximity to OpenAI. Scale AI's Alexandr Wang was Sam Altman's roommate during the pandemic. Scale went through Y Combinator when Altman ran it. Altman and Wang later discussed an acquisition.

With Mercor, the signal was unmistakable. OpenAI used them. OpenAI's board members invested in them. OpenAI's head of data operations joined them. Once that signal propagated, perhaps the other labs followed not because of independent evaluation, but because OpenAI had validated the choice for them. The $10 billion valuation, the press coverage, and the youngest-billionaires narrative reinforced what was already a foregone conclusion.

The Girardian irony is that this breach — the scapegoating event — may produce the same mimetic cycle in reverse. The labs may collectively abandon Mercor, collectively discover the next shiny vendor, and collectively onboard without asking the hard questions about security and privacy. The sacrifice of the scapegoat restores order. The community moves on, having learned nothing structural — only that this particular vendor was the wrong one.

Having reverse-engineered Mercor's complete operational architecture from its database schema (the annotation pipeline, the evaluation frameworks, the contractor management system, the payment infrastructure), it is clear that the underlying business is well understood and replicable. For new entrepreneurs, the opportunity is straightforward: build the same platform, but treat security and privacy as foundational rather than as an afterthought. The market for AI training data is not going away. The demand for a vendor that handles it responsibly has never been higher.

Appendix A - Complete Table Inventory

All 149+ tables organized by functional domain, with column lists and sample data where present.

Domain 1 - User and Identity

Table	Key Columns	Notes
`MercorUsers_New`	userId, email, name, phone, profilePic, createdAt, lastLogin, location, isWhiteListed, source, firebaseUID, authType, isAnonymous, insightfulId, stripeAccountId, customerId, isDeleted, phoneVerificationStatus, phoneVerifiedAt, phoneOptIn	Primary contractor user table. Sample: `e**a1@gmail.com`, `T O**`, `+44795718**`, `United Kingdom,Harrow`
`MercorUsers_New_backup`	userId, email, name, phone, profilePic, createdAt, lastLogin, location, isWhiteListed, source, firebaseUID, authType, isAnonymous, insightfulId, stripeAccountId, customerId, isDeleted	Historical backup snapshot of user table
`UserLocation`	userLocationId, userId, residenceCountry, residenceState, residenceCity, residenceZipCode, physicalCountry, physicalState, physicalCity, physicalZipCode, version, createdAt, updatedAt	Tracks declared residence vs. physical location. Used in fraud detection. Sample: `residenceCountry=USA, physicalCountry=USA`
`UserLocation_Audit`	All UserLocation columns + auditAction, auditTimestamp	Audit trail for location changes
`UserMetadata`	userMetadataId, userId, workAuthorizationStatus, birthday, physicalLocation, countryOfResidence, createdAt, updatedAt, maxHourCap, contractorMail, fraudStatus, oktaUserId, fraudStatusEnum, oktaAccountState, externalId, maxContracts, offPlatformEmail	Extended user metadata including Okta SSO ID and fraud status
`UserState`	id, userId, resumeUploaded, interviewsCompletedCount, jobApplicationsCount, totalMillisWorked, createdAt, updatedAt	Lifecycle counters — tracks user progression through platform
`UserAvailability_Audit`	availabilityId, version, userId, maxWeeklyHours, desiredWeeklyHours, expectedStartOffset, expectedStartOffsetUpdatedAt, earliestStartDateChoice, timezone, updatedAt, createdAt, auditAction, auditTimestamp	Declared working hours and timezone preferences
`UserReferences`	referenceId, email, name, company, relationship, userId	Professional references provided by contractors
`WorkAuthorization_Audit`	workAuthorizationId, userId, birthday, physicalCountry, workAuthorizationStatus, agreedToLocation, signature, attestedAt, source, version, createdAt, updatedAt, auditAction, auditTimestamp	Work authorization attestations with digital signatures
`UserPlatformStatus`	id, userId, status, action, source, sourceDetail, isLatest, createdAt	Platform access status (active, suspended, banned)
`LinkedinUsers`	id, name, url, email, company, position, lastUpdated	LinkedIn profile cache used for warm intros and candidate sourcing
`MembershipSnapshots`	scopeType, scopeId, userId, createdAt	Point-in-time snapshots of group/project memberships
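
The `UserLocation` notes describe a declared-versus-physical comparison that feeds fraud detection, and the `FraudStates` sample in Domain 3 shows a `location_mismatch` signal. The sketch below shows one way such a signal could be derived from a single row. The class name, the function name, the country-only comparison, and the 1.0/0.0 scoring are all assumptions made for illustration; the sample files do not reveal the actual rule.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UserLocationRow:
    """Subset of the UserLocation columns listed above (hypothetical wrapper)."""
    residenceCountry: str
    physicalCountry: str


def location_mismatch(row: UserLocationRow) -> float:
    """Return 1.0 when declared and physical countries differ, else 0.0.

    Assumption: only the country is compared, after trimming whitespace
    and ignoring case. A real system may also weigh state, city, or zip.
    """
    declared = row.residenceCountry.strip().casefold()
    physical = row.physicalCountry.strip().casefold()
    return 1.0 if declared != physical else 0.0


# The sample row quoted in the table (residenceCountry=USA, physicalCountry=USA).
matching = location_mismatch(UserLocationRow("USA", "USA"))
# An invented row in which the two values disagree.
differing = location_mismatch(UserLocationRow("USA", "United Kingdom"))
```

Under these assumptions the sample row quoted in the table would not raise the signal, while any row whose two country fields disagree would.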

Domain 2 - Identity Verification and Background Checks

Table	Key Columns	Notes
`IDVerificationChecks`	verificationCheckId, userId, candidateId, jobId, listingId, provider, source, sessionId, sessionToken, onboardingUrl, sessionStatus, verificationStatus, governmentIdStatus, livenessStatus, addressStatus, attemptNumber, maxAttempts, providerResponse, fraudDecision, flagReasons, manualReviewStatus, createdAt, updatedAt, completedAt	Persona KYC session records. `providerResponse` contains full JSON API response including facial thumbnail keys. `provider=persona`
`BackgroundCheck`	contractorID, externalCandidateId, workLocation, package, invitationId, invitationCreatedAt, invitationCompletedAt, backgroundCheckId, reportId, status, createdAt, updatedAt, adverseMediaCheckStatus	Criminal background check records (Checkr). Status: `clear` / `consider`
`BackgroundCheck_New`	Richer version of BackgroundCheck with additional fields	Updated background check schema
`BackgroundCheckDetails`	Detailed per-check results	Granular check outcomes
`ScreeningPackage`	id, companyId, name, isActive, lastUpdatedAt, checkConfig, graceDays	Per-company screening package configurations defining which checks are required

Domain 3 - Fraud Detection

Table	Key Columns	Notes
`FraudStates`	userId, currentStage, currentDecision, currentConfidence, currentReasoning, currentKeySignals, currentTimestamp, previousStageDecision, createdAt, updatedAt	Current fraud state per user. `currentDecision`: APPROVE / ESCALATE / REJECT. LLM-generated reasoning. Sample signal: `location_mismatch: 1.0`
`FraudCheck`	id, user_id, stage, interviewId, jobId, triggered_on, process_status, retryCount, flag_reasons, automatedReasons, status, priority, idVerificationStatus, manual_review_status, manual_review_rational, manual_review_signs, isMostRecent, assigned_to, assigned_on, splReview	Central fraud queue. Tracks automated and manual review states
`FraudSignalAuditLog`	id, userId, userVersionId, stage, signalType, modelName, triggeredOn, status, modelScore, createdAt	Per-signal audit trail. Every fraud signal evaluated is logged here
`FraudEvents`	id, eventId, userId, eventType, stage, priorAlpha, priorBeta, priorProbability, priorStatus, posteriorAlpha, posteriorBeta, posteriorProbability, posteriorStatus, evidence, createdAt, createdBy, notes	Bayesian belief update log. Each event updates prior→posterior fraud probability
`ProductionFraudState`	id, userId, status, fraudModality, source, sourceDetail, lastEvaluatedStage, productionModelId, userVersionId, isLatest, createdAt, updatedAt	Final production fraud verdict. `fraudModality`: identity / time / quality
`AutoFraudChecks`	Automated rule-based fraud check records	Scheduled fraud scans
`OnProjectFraudWindows`	id, employeeId, contractorId, projectId, scanDate, startTime, endTime, fraudType, fragmentCount, flags, flagMetadata, windowMetadata, screenshotMetadata, createdAt, updatedAt, userVersionId	On-project time fraud analysis windows. Analyzes screenshot patterns
`QAReviewLog`	id, userId, reviewerId, bucketName, status, assignedOn, completedAt, isActive, lockKey, createdAt, updatedAt, comments, decision, userVersionId, stage, signalType, flags	Human QA reviewer assignments and decisions for fraud cases
`CheatingDetection`	annotationId, userId, interviewId, interviewConfigId, formResponseId, formId, isCheating, cheatingProbability, signs, notes, reportedBy, createdAt, updatedAt	Interview cheating detection results
`CheatingDetection_Audit`	All CheatingDetection columns + auditAction, auditTimestamp	Audit trail for cheating detection
`DuplicateGroups`	groupId, userIdList, mergedIntoGroupId, createdAt	Groups of suspected duplicate/sock-puppet accounts
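
The `FraudEvents` columns pair a prior (`priorAlpha`, `priorBeta`, `priorProbability`) with a posterior of the same shape, which is consistent with a Beta-distribution belief that is updated as evidence arrives. The sketch below illustrates that general technique, not Mercor's implementation. It assumes the stored probability is the Beta mean and that each piece of evidence adds a non-negative pseudo-count to one of the two parameters; the names `Belief` and `update`, and the example numbers, are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Belief:
    """Beta(alpha, beta) belief about whether a user is fraudulent."""
    alpha: float  # pseudo-count of fraud-indicating evidence
    beta: float   # pseudo-count of benign evidence

    @property
    def probability(self) -> float:
        # Assumption: the *Probability columns hold the Beta mean.
        return self.alpha / (self.alpha + self.beta)


def update(prior: Belief, fraud_weight: float = 0.0, benign_weight: float = 0.0) -> Belief:
    """Conjugate update: add evidence weights to the matching parameter."""
    if fraud_weight < 0 or benign_weight < 0:
        raise ValueError("evidence weights must be non-negative")
    return Belief(prior.alpha + fraud_weight, prior.beta + benign_weight)


# Invented numbers: a weak prior (mean 0.1) followed by one fraud-indicating event.
prior = Belief(alpha=1.0, beta=9.0)
posterior = update(prior, fraud_weight=1.0)
```

With these invented numbers the belief moves from a mean of 1/10 to 2/11. Logging both sides of each update, as the table does, lets a reviewer replay how every event shifted the estimate.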

Domain 4 - Hiring Pipeline

Table	Key Columns	Notes
`Listings_New`	listingId, version, uid, companyId, title, description, commitment, referralAmount, createdAt, deletedAt, status, requiredInterviewConfigId, rateMin, rateMax, hoursPerWeek, location, formId, automatedCommsOn, payRateFrequency, isPrivate, autoRedirectToApply, evaluationCriteria, offersEquity, rejectionTemplateSubject, rejectionTemplateBody, campaignId, ownerIds, goalNumHires, goalDeadline, isExploreAlways, interviewSchedulingEnabled, interviewScheduleLink, disableApplications, isMostRecent, offerExtendedText, minHeadcount, maxHeadcount, referralBoost, timeToAutoReject, automaticRejectionsOn, computedExplorePageVisibility, workArrangement, eligibleLocation, ineligibleResidenceLocation, listingType	Primary job listing table. Includes pay ranges, location eligibility, automation settings
`Listings_New_Audit`	All Listings_New columns + auditAction, auditTimestamp	Audit trail for listing changes
`Candidates`	candidateId, userId, companyId, listingUid, createdAt, deletedAt, status, notesForCandidate, birthday, physicalLocation, workAuthorizationStatus, responseId, version, uid, source, countryOfResidence, isMostRecent, listingId, listingStepConfigId, linkedinUrl, actionItem, lastSignificantUpdatedAt, rejectionReason, updatedBy, starred, appliedAt, goalId, automaticRejectAt, addedAt, referralId, isEligible, numCommsSent, lastCommSentAt	Per-application record. Tracks status, notes, scheduled auto-rejection, outreach counts
`Candidates_Audit`	All Candidates columns + auditAction, auditTimestamp	Audit trail for application changes
`CandidateMatchScores`	candidateId, listingId, matchScore, contextualSummary	ML-generated candidate-to-listing fit scores with LLM explanations
`EvaluationCriteria`	evaluationCriteriaId, listingId, criteria, shortCriteria, type, hardFilter, position, updatedAt, evalCriterionCritique, evalCriterionCritiquePass, status	Per-listing scoring rubric criteria
`ListingNotes`	listingNoteId, listingId, authorUserId, assigneeUserId, notificationStatus, createdAt, noteBody	Recruiter notes on listings. Contains candid operational commentary
`SavedListings`	id, userId, listingId, listingUid, createdAt	Candidates who bookmarked a listing
`ListingPipelines`	Pipeline stage configurations per listing	Hiring funnel stage definitions
`TalentViewSearchUsers`	searchId, userId, score, addedAt, starredAt, deletedAt	Users surfaced in talent search results
`SharableTalentViewConfig`	viewId, name, description, userIds, userCount, maxCandidatesCount, createdAt, updatedAt, revokedAt, createdBy, expiryAt, viewCount, visibleSections, preferredTitle	Shareable talent shortlist configurations
`SharableTalentViewConfigUsers`	userId, viewId, workExperience, education, summary, createdAt, updatedAt, yearsOfExperience, interviews, forms, likeCount, dislikeCount, feedback	Per-candidate data within shared talent views
`TalentViewUserEvaluations`	criteriaId, userId, criteriaScore	Per-criteria scores for talent view candidates

Domain 5 - Interviews and Assessments

Table	Key Columns	Notes
`Forms_Audit`	formId, companyId, listingId, title, description, guide, evaluationCriteria, assessmentRubricId, items, isArchived, isAuthed, numQuestions, isUnified, allowFormRetakes, maxRetakeAttempts, allowCopyPaste, version, createdAt, updatedAt, createdBy, auditAction, auditTimestamp, prep, assessmentVersionId, feedbackConfig	Interview/assessment form definitions. `items` contains full question list
`FormSubmissions`	formResponseId, formId, companyId, userId, responseStatus, formVersion, startedAt, submittedAt, activeTimeSeconds, posthogSessionIds, createdAt, updatedAt, attempt, isLatestSubmission, assessmentVersionId, feedbackSentAt	Every interview submission. Tracks time spent (`activeTimeSeconds`)
`AssessmentRubrics`	assessmentRubricId, title, createdAt, instructions, sumScores, sumSquareScores, countScores, version, passThreshold	Scoring rubric definitions with aggregate statistics
`AssessmentRubrics_Audit`	All AssessmentRubrics columns + auditAction, auditTimestamp	Rubric change history
`AssessmentRubricItems_Audit`	assessmentRubricItemId, assessmentRubricId, criteria, shortName, points, position, format, relatedQuestionIds, version, auditAction, auditTimestamp, webSearch, smartScoring, type, config, createdAt, updatedAt	Individual rubric criteria with AI scoring configuration
`AssessmentEvalState`	id, submissionId, assessmentType, jobType, status, retryCount, createdAt, reason, triggerSource, triggeredByUserId, modalJobId, durationMs, operationId, assessmentId	Grading pipeline execution state
`AssessmentVersions`	Versioned assessment configurations	Assessment version tracking
`AssessmentAudits`	Assessment activity audit trail	Audit log for assessment operations
`GradedRubricItems`	Per-rubric-item graded scores	Individual rubric item scores per submission
`GradedRubricItems_Audit`	Audit trail for graded items	Score change history
`InterviewEvals`	interviewId, communicationScore, technicalScore, qaPairScores	Aggregate interview scores by dimension
`InterviewScores`	scoreId, userId, interviewId, interviewConfigId, points, createdAt	Final interview score per user
`InterviewIssues`	issueId, interviewId, issue, source, notes, startPosition, endPosition, reportedBy, createdAt, updatedAt	Technical and integrity issues reported during interviews
`PairwiseComparisons`	listingId, listingUid, interviewConfigId, winnerResumeId, loserResumeId, reasoning, winnerUserId, loserUserId	Bradley-Terry tournament comparisons for candidate ranking
`MercorScores`	candidateId, listingId, listingUid, resumeId, evaluationCriteria, interviewConfigId, mScoreRaw, mScoreNormalized, numComparisons, contextualSummary, userId, aggregateFeatureScore	Final MercorScore per candidate per listing
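
The `PairwiseComparisons` notes name the Bradley-Terry model, which turns a list of winner/loser pairs into a strength score per item. The sketch below fits it with the standard iterative (minorization-maximization) update. It illustrates the technique only: the function name, the iteration count, and the sample resume IDs are hypothetical, and nothing in the sample files shows how `mScoreRaw` or `mScoreNormalized` are actually computed.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def bradley_terry(
    comparisons: Iterable[Tuple[str, str]], iterations: int = 200
) -> Dict[str, float]:
    """Estimate Bradley-Terry strengths from (winner, loser) pairs.

    Returned strengths are normalized to sum to 1.
    """
    wins = defaultdict(int)         # total wins per item
    pair_counts = defaultdict(int)  # comparisons per unordered pair
    for winner, loser in comparisons:
        if winner == loser:
            continue  # a self-comparison carries no information
        wins[winner] += 1
        pair_counts[frozenset((winner, loser))] += 1

    items = sorted({item for pair in pair_counts for item in pair})
    strengths = {item: 1.0 / len(items) for item in items}

    for _ in range(iterations):
        updated = {}
        for i in items:
            denom = 0.0
            for pair, n in pair_counts.items():
                if i in pair:
                    (j,) = pair - {i}
                    denom += n / (strengths[i] + strengths[j])
            updated[i] = wins[i] / denom
        total = sum(updated.values())
        strengths = {i: s / total for i, s in updated.items()}
    return strengths


# Invented (winnerResumeId, loserResumeId) rows.
sample = [
    ("resume_a", "resume_b"), ("resume_a", "resume_b"), ("resume_b", "resume_a"),
    ("resume_b", "resume_c"), ("resume_c", "resume_b"), ("resume_a", "resume_c"),
]
ranking = bradley_terry(sample)
```

An item that never wins a comparison ends up with strength zero under this plain estimator, so a production ranking would likely add smoothing or a prior.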

Domain 6 - Work Trials and Onboarding

Table	Key Columns	Notes
`WorkTrial_Audit`	workTrialId, userId, companyId, listingStepConfigId, status, payableAmount, billableAmount, ciiaaDirect, ciiaaPassthrough, tow, offerLetter, startDate, endDate, payout, payment, paymentMethod, signature, projectId, billingAccountId, createdAt, updatedAt, version, auditAction, auditTimestamp, updatedBy	Work trial contract records. Contains signed legal documents and pay amounts
`WorkTrialConfig`	workTrialConfigId, title, payableAmount, billableAmount, ciiaaDirect, ciiaaPassthrough, tow, endDate, emailTemplateSubject, emailTemplateBody, emailTemplateSubjectExtension, emailTemplateBodyExtension, interviewIds, formIds, createdAt, updatedAt, deletedAt, companyId, isUnified, projectId	Reusable work trial templates
`OnboardingState`	id, shortName, name, threshold, createdAt, updatedAt, order	Onboarding funnel steps. Sample: `interview_completed` threshold=1 order=0
`OnboardingDocument`	onboardingDocumentId, onboardingDocument, createdAt, projectId	Per-project onboarding materials
`TierProgress`	id, createdAt, updatedAt, userId, tierId, planId, status, completedAt, paidAt	Contractor tier/level progression tracking
`PlanAssignments`	id, createdAt, updatedAt, userId, planId, assignedBy, startDate, endDate, userHours, tasksCompleted, status	Assigns contractors to specific earning/task plans

Domain 7 - Projects and AI Task Management

Table	Key Columns	Notes
`Projects_Audit`	projectId, name, createdAt, updatedAt, companyId, archivedAt, externalId, onboardingDocumentId, userId, screenshotEnabled, userGroupEmail, description, requireAvailabilityUpdates, skills, projectType, offerExtendedText, annotationPlatform, annotationPlatformIDs, ssotLink, status, notes, version, auditAction, auditTimestamp, taskMetricsDatastore	Full project configuration audit trail
`ProjectIAM`	id, projectId, userId, roleId, status, assignedBy, version, createdAt, updatedAt	Role assignments within projects
`ProjectIAM_Audit`	All ProjectIAM columns + auditAction, auditTimestamp	Project IAM change history
`ProjectCustomColumns`	id, projectId, name, dataType, position, createdBy, createdAt, updatedAt, deletedAt, sqlQuery, source	Dynamic metadata columns per project. Some computed via SQL
`ProjectCustomColumnValueHistory`	id, customColumnId, jobId, value, changedBy, createdAt	History of custom column values
`ProjectArchetypes`	archetypeId, projectId, archetypeText, createdAt, updatedAt, version, elements	Character/persona definitions for annotation projects
`ProjectAttributeValues`	Project attribute key-value pairs	Flexible project attribute storage
`ProjectViewConfig`	viewId, title, projectId, viewContext, createdByUserId, createdAt, updatedByUserId, updatedAt, deletedAt, roleId, viewType	Saved view configurations for project management
`ProjectIntegrations`	id, projectId, groupMail, autoProvision, createdAt, updatedAt, oktaGroupId, integrationsData, oktaOwnerGroupId, oktaEPMGroupId, latestGroupBatch, latestBatchMemberCount, projectShortId, workspaceNotificationChannel, ownerGwGroup, epmGwGroup, slackChannelId	Project integrations with Okta groups and Slack channels
`ProjectAutomations`	Project-specific automation configurations	Automation bindings per project
`ProjectFunctions`	id, name, description, createdAt, updatedAt	Named functions available in project automation
`TaskDefinitions`	taskDefId, projectId, rubric, autograder, version, createdAt, updatedAt, task_schema, metadata	AI task type definitions with grading rubrics
`TaskDefinitions_Audit`	All TaskDefinitions columns + auditAction, auditTimestamp	Task definition change history
`TaskAudits`	uid, taskDefinitionId, recordId, s3KeyPrefix, authorId, auditorId, status, outcome, autoOutcome, createdAt, updatedAt, dispute, disputedBy	Individual task submission reviews with dispute tracking
`TaskAssignments`	id, createdAt, updatedAt, jobId, taskId, userId, appliedBy	Maps tasks to jobs and users
`DeliverableBatches`	id, uid, name, projectId, invoiceLineItemId, status, taskCount, version, isLatest, metadata, createdAt, updatedAt, createdBy	Grouped task deliverable batches for invoicing
`Deliverables`	deliverableId, jobId, userId, projectId, entityType, entityId, status, createdAt, updatedAt	Individual deliverable records
`Deliverables_Audit`	All Deliverables columns + isMostRecent, auditAction, auditTimestamp	Deliverable change history
`ProductivityProjectRules`	id, project_id, description, rules, created_by, is_active, version, created_at	Per-project productivity monitoring rule configurations

Domain 8 - Jobs and Contracts

Table	Key Columns	Notes
`Jobs`	jobID, contractorID, companyID, status, payableRate, commitment, ciiaa_direct, ciiaa_passthrough, tow, payment, startDate, createdAt, updatedAt, expiresAt, tax_form, expected_hours, title, stripeSubscriptionId, billableRate, version, dismissalDate, insightful, paymentMethod, projectID, checkr, idVerification, uid, payout, offerLetter, listingUID, managerId, signature, backgroundCheck, isLatest, note, referralId, roleId, provisionIdpAccess, safety_waiver, sourceId, confidentiality, billingAccountID, backgroundCheckConfig	Core employment contract. Contains pay rates, legal agreements, Stripe subscription
`Jobs_Audit`	All Jobs columns + auditAction, auditTimestamp	Job contract change history
`JobEvents`	jobEventId, jobId, contractorId, actorId, actionType, metadata, createdAt	Events on job contracts (status changes, communications). Sample: `comm`, `Contract Reminder`
`JobEventsQueue`	queueItemId, sourceType, sourceId, payload, renderedPreview, editedPreview, status, response, createdAt, resolvedBy, resolvedAt, jobEventId	Queued job events pending processing or review
`JobEventReasonAssociations`	jobEventId, reasonId, createdAt	Structured reasons associated with job events
`JobTasks`	Tasks linked to specific jobs	Job-task mapping
`JobPerformanceMetrics_New`	jobPerformanceMetricsId, jobId, performanceScore, standardError, jobPerformanceSummary, version, createdAt, updatedAt	ML-generated job performance metrics
`JobPerformanceMetrics_Audit`	jobPerformanceMetricsId, jobId, version, lvr, lvrReasoning, confidenceLevel, isFraud, wasDismissedEarly, jobSummary, auditAction, auditTimestamp, createdAt, updatedAt	Detailed performance metrics audit trail including fraud flags
`JobPerformanceReviews_New`	performanceReviewId, jobId, contractorId, companyId, projectName, taskId, score, reviewNotes, performanceReasons, dismissalFlag, dismissalReason, reviewedBy, createdAt, updatedAt, oldReviewId, feedBackFlag	Human-reviewed job performance assessments
`WeeklyProjectFeedback`	weeklyProjectFeedbackId, userId, jobID, weekStart, rating, feedbackText, submittedAt, updatedAt, createdAt	Weekly contractor feedback on their project experience
`ContractorPerformance_New`	contractorPerformanceId, contractorId, standardError, performanceScore, performanceSummary, version, createdAt, updatedAt	Aggregate contractor performance across all jobs
`ContractorPerformance_New_Audit`	All ContractorPerformance_New columns + auditAction, auditTimestamp	Contractor performance change history
`PerformanceReviews`	performanceReviewId, contractorId, reviewDate, performanceDetails, stars, taskDetails, reviewBy, createdAt, updatedAt, companyId	Company-authored contractor performance reviews with star ratings
`MLExperimentsJobPerformanceReviews`	Date of review, Account, Project, Reviewer, Work type, Review type, Name, Email, Quality of Work, Engagement, Offboarding Reason, Justification for rating	Raw performance data for ML model training

Domain 9 - Time Tracking and Productivity Surveillance

Table	Key Columns	Notes
`InsightfulScreenshots`	id, externalId, contractorId, projectId, storageBucket, storageKey, storageUrl, storageProvider, fileExtension, contentType, fileSizeBytes, vendorName, schemaVersion, vendorMetadata, externalIdentifiers, screenshotTimestamp, timestampTranslated, timezoneOffset, timezone, isBlurred, isOriginal, isRemoved, removedAt, externalProductivityScore, computer, hwid, os, osVersion, agentVersion, appName, appFileName, appFilePath, windowTitle, browserUrl, document, browserSite, ip, gateways, windowId, activityId, fragmentId, createdAt, updatedAt	Per-screenshot records with full device fingerprint (IP, MAC, HWID), application, URL, and S3 image link
`Timelog`	id, externalId, externalProjectId, employeeId, duration, timeStart, timeEnd, timezone, source, taskId, taskName, lineItemUid, adjustmentReason, uid, version, userId, isCompleted, linkFailReason, insightfulCreatedAt, insightfulUpdatedAt, createdAt, updatedAt	Work session records synced from Insightful
`Timelog_Audit`	All Timelog columns + audit metadata	Timelog change history
`Deductions`	id, contractId, contractorId, durationToSubtractMs, appName, reasonForDeduction, payoutCycleID, externalProjectId, externalEmployeeId, status, approvedBy, approvedAt, appliedBy, appliedAt, createdAt, createdBy, updatedAt	Pay deductions for non-productive time with approval chain

Domain 10 - Payments and Financial Infrastructure

Table	Key Columns	Notes
`UserPaymentMethods`	id, userId, provider, providerMethodId, methodType, status, metadata, createdAt, updatedAt, version, countryCode	Contractor payment accounts. Sample: `stripe`, `acct_1R0V****`, `express_account`, `onboarded`, `USA`
`UserPaymentMethods_Audit`	All UserPaymentMethods columns + auditAction, auditTimestamp	Payment method change history
`MercorUserFinancials`	id, userId, paymentProvider, providerIdentifier, accountDetails, lastFetchedOn, createdOn, updatedOn	Full financial account details including bank routing numbers
`PaymentLineItems`	id, version, cycleStartTs, cycleEndTs, totalPayableAmount, totalBillableAmount, status, createdAt, updatedAt, uid, jobUid, dispatchFailureReason, timelogUid, bonusUid, transferId, referralUid, companyId, projectId, contractorId, timeStamp, isLatestVersion, referralId, moneyOutId, eventTime, referralEligibilityId	Core payment ledger. Amounts in cents
`PaymentLineItems_Audit`	All PaymentLineItems columns + auditAction, auditTimestamp	Payment line item change history
`PaymentLineItems_TransactionalAudit`	Transactional-level payment audit	Fine-grained payment operation audit trail
`MoneyOut_Audit`	id, statementId, entityId, userId, entity, externalAccountId, externalTransferId, cycleStartTs, cycleEndTs, totalAmount, paymentMethod, status, createdAt, failureReason, payoutCycleId, auditTimestamp, auditAction, version	Outbound payment records
`WiseDisbursements`	id, moneyOutId, amount, currency, sequenceNumber, wiseTransferId, wiseQuoteId, status, failureReason, createdAt, updatedAt, accountId	International Wise payment records
`PayoutCycles`	cycleStartTs, cycleEndTs, id, status, configId, configVersion	Pay period definitions
`PayoutRecords`	Individual payout transaction records	Detailed payout ledger
`PayoutConfigs`	payoutConfigId, status, type, configuration, version	Payment configuration rules
`InvoiceLineItems`	id, name, companyId, invoiceId, sowId, taskCount, rawAmount, adjustedAmount, status, description, metadata, createdAt, updatedAt, createdBy	Company invoice line items
`BillingAccounts`	Company billing account definitions	Client billing account management
`BillingConfigs`	id, uid, version, isLatestVersion, rules, projectId, createdAt, updatedAt, createdBy	Billing rule configurations (markup, caps)
`BillingRateCards`	billingRateCardId, uid, version, isLatestVersion, sowId, formulaType, rateRows, createdAt, updatedAt, createdBy	Per-SOW rate card definitions
`RevenueAdjustments`	id, companyId, projectId, attestationId, cancelledAdjId, amountCentsUsd, category, revenueRecognitionDate, reason, createdAt, creatorId, isCancellation, formula, labels, aggregationFields, attachments, invoices	Revenue adjustments and corrections
`FinanceLabels`	Finance label definitions	Labels for financial categorization
`CompanyFinanceLabels`	companyId, financeLabelId, createdAt, creatorId	Finance label assignments to companies
`ReferralEligibility`	id, createdAt, updatedAt, referralUid, campaignId, referrerAmount, refereeAmount, referrerLineItemId, refereeLineItemId, criteriaId, onboardingStateId, referralId, entity_id, entity_type, type, jobId, billingAccountId, toolingIdempotencyKey, creatorId	Referral payment eligibility and vesting conditions

Domain 11 - Referrals and Growth

Table	Key Columns	Notes
`Referrals`	referralId, referredUserId, referringUserId, createdAt, version, uid, status, reason, listingId, campaignId, totalEarned, totalEarningsPotential, state, deleted_at, paidAt, disputeStatus, isActive, referral_cap, referralIdempotencyKey, isPaymentBlocked, isGuaranteedReferral	Core referral records with earnings tracking
`Referrals_Audit`	All Referrals columns + audit metadata	Referral change history
`ReferralReminder`	referralId, createdAt, lastSentAt	Referral reminder email tracking
`GuaranteedReferralQuota`	quotaId, referringUserId, offPlatformUserId, shortenedLink, weekStart, status, createdAt, updatedAt, isEmailSent	Guaranteed referral program quota management
`ReferrerMeta`	Referrer metadata and configuration	Additional referrer attributes
`OffPlatformCampaigns`	Campaign definitions for off-platform outreach	External recruitment campaign management
`OffPlatformCampaignSteps`	campaignStepId, stepNumber, campaignId, campaignType, subject, messageTemplate, parameters, scheduledAt, status, outreachedCandidateIds, failedCandidateIds, createdAt, updatedAt	Multi-step outreach sequence steps
`OffPlatformRecruitingManager`	id, managerId, offPlatformUserId, listingId, createdAt, updatedAt, updatedBy	Off-platform recruiter assignments
`OffPlatformUsersMapping`	mappingId, userId, offPlatformUserId, createdAt, updatedAt, referringUserId, status	Mapping between platform and off-platform user identities

Domain 12 - Communications and Outreach

Table	Key Columns	Notes
`Comms`	commId, groupId, senderId, receiverId, content, type, triggerRef, createdAt, listingReferenceUID	In-platform messaging with full message content
`CommsSent`	Communication delivery records	Message send tracking
`EmailTemplates`	emailTemplateId, companyId, subject, content, createdBy, createdAt, updatedAt, isGlobal, tags, isPersonal	Email template library
`AircallComms`	Phone call logs from Aircall VoIP integration	Recruiter call records
`LinkedinWarmIntros`	warmIntroId, linkedinUrl, email, referringUserId, listingId, commEvent, status, createdAt, updatedAt, sentAt	LinkedIn outreach campaign records
`PartnerChatThreads`	threadId, listingId, referralId, partnerId, createdAt	Chat threads with referral partners
`FirstTimeInvites`	commId, userId, listingId, createdAt, commEvent, refListingUid, contentType, subject, listingIdCount	First-contact invitations to candidates
`AutomationTemplates`	templateId, name, description, category, handler, sourceType, sourceSql, templateBody, paramsSchema, cron, idempotency, autoApprove, version, createdAt, updatedAt, deletedAt, triggerConfig, config	Automated notification/workflow templates
`Feedback`	id, user_id, question_text, question_response, rating, device, created_at, updated_at	In-app user feedback submissions

Domain 13 - Company and Access Management

Table	Key Columns	Notes
`Company`	companyId, name, description, website, externalName, billingModel, logo, brandVisible, billingStartDay, billingEndDay, aboutCompany, universe	Client company master records
`IAM`	roleId, companyId, status, userId_v4, id, version	Company-level role assignments
`IAM_Audit`	roleId, companyId, status, userId_v4, id, version, auditAction, auditTimestamp	IAM change history. Sample: `roleId=ghost`, `REMOVED`
`IAMOutbox`	id, resourceType, resourceId, relation, subjectType, subjectId, operation, requestedBy, requestedByService, createdAt, callerToken	IAM change outbox for event-driven propagation
`GodmodeCompanies`	companyId, createdAt, createdBy, includeInFillRate	Companies accessible via internal Godmode admin
`GodmodeArbitraryCells`	entityType, entityGmId, acKey, acValueNumber, acValueString, acValueFormula, userId, createdAt, acMetadata	Arbitrary Godmode data cells for internal operations
`Audience`	id, projectId, companyId, audienceType, slug, anchorType, anchorId, oktaGroupId, googleGroupId, slackGroupId, insightfulTaskId, createdAt, updatedAt, slackChannelId, query	Audience definitions linking projects to Okta/Slack/Insightful groups
`AudienceTargetProviders`	id, audienceId, name, externalId, type, createdAt, metadata	External providers linked to audiences
`DrivePermission`	id, driveId, googleGroupId, permissionLevel, googlePermissionId, createdAt, updatedAt	Google Drive access permissions for project documents
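
The `IAMOutbox` columns are consistent with a transactional outbox: a permission change and the event describing it are written in one database transaction, and a separate relay publishes the event afterwards. The sample does not show how Mercor implemented this. The following is a minimal sketch of the general pattern in Python with SQLite. The `published` flag, the `'ACTIVE'` and `'ADD'` values, and every function name are illustrative assumptions, not data from the dump.

```python
import sqlite3


def open_db():
    """Create an in-memory database with simplified, hypothetical tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE iam (roleId TEXT, companyId TEXT, userId TEXT, status TEXT);
        CREATE TABLE iam_outbox (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            resourceType TEXT, resourceId TEXT, relation TEXT,
            subjectType TEXT, subjectId TEXT, operation TEXT,
            published INTEGER NOT NULL DEFAULT 0  -- assumed column, not in the dump
        );
        """
    )
    return conn


def grant_role(conn, company_id, user_id, role_id):
    """Write the role row and its outbox event atomically."""
    with conn:  # commits both inserts together, or rolls both back on error
        conn.execute(
            "INSERT INTO iam VALUES (?, ?, ?, 'ACTIVE')",
            (role_id, company_id, user_id),
        )
        conn.execute(
            "INSERT INTO iam_outbox (resourceType, resourceId, relation,"
            " subjectType, subjectId, operation)"
            " VALUES ('company', ?, ?, 'user', ?, 'ADD')",
            (company_id, role_id, user_id),
        )


def relay_once(conn, publish):
    """Publish every unpublished event in insertion order, then mark it sent."""
    rows = conn.execute(
        "SELECT id, resourceType, resourceId, relation, subjectType, subjectId,"
        " operation FROM iam_outbox WHERE published = 0 ORDER BY id"
    ).fetchall()
    for row in rows:
        publish(row[1:])  # hand the event (without its id) to the consumer
        with conn:
            conn.execute("UPDATE iam_outbox SET published = 1 WHERE id = ?", (row[0],))
    return len(rows)
```

The security-relevant point is that an outbox table of this kind is a durable log of permission changes. Per the column list above, each row also names who requested the change (`requestedBy`, `requestedByService`) and stores a `callerToken`.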

Domain 14 - Skills, Certifications and Endorsements

Table	Key Columns	Notes
`Skills`	skillId, name, description, CertificationPolicy, type, parent, createdAt	Hierarchical skills taxonomy
`CertificationPolicies_Audit`	certificationPolicyId, companyId, name, description, rules, isActive, isUnified, createdAt, icon, isRevokable, requiresApproval, version, auditAction, auditTimestamp, iconColor, showBadge, displayText	Certification program definitions
`Certifications_Audit`	certificationId, certificationPolicyId, userId, evidence, status, isCertified, earnedAt, note, createdAt, updatedAt, version, auditAction, auditTimestamp	Individual earned certifications. `evidence` contains scoring proof
`SkillCertifications_Audit`	uid, userId, skillId, isCertified, version, lastEvaluatedAt, auditedAt, auditAction	Per-skill certification status
`SkillCertificationsEvidence_Audit`	uid, userId, skillId, isCertified, version, sourceType, sourceId, createdAt, updatedAt, auditedAt, auditAction, score, metadata	Evidence backing skill certifications
`ContractorEndorsements`	endorsementId, endorsingJobId, endorsedJobId, endorsingUserId, endorsedUserId, contents, tags, createdAt, updatedAt, source, sentiment	Peer endorsements with text content and sentiment
`UserResumeEvaluation`	evaluationId, workExperienceScore, yearsOfWorkExperience, graduationYear, mScore, inferredRole, workExperienceSkills, resumeEvalScore, awardScore, educationScore, rateAcademicCompetitions, rateCompetitiveProgramming, rateHackathonPerformance, sumScore, technicalSkills, normalisedSumScore, highestDegree, userId	ML resume evaluation scores
`CandidateVouches`	vouchId, voucherUserId, candidateUserId, candidateEmail, candidateLinkedinId, candidateName, resumeS3Key, resumeHash, howKnowSocialPlatform, howKnowSocially, howKnowWorkedTogether, howKnowStudiedTogether, howKnowOther, reasonSkills, reasonEducation, reasonEmployer, reasonExpertise, reasonOther, createdAt, updatedAt	Structured peer vouching with relationship details
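
Several tables in this appendix are `_Audit` variants that carry `version`, `auditAction`, and `auditTimestamp` columns alongside the base columns. A leaked audit table therefore exposes every recorded revision of a record, not just its current state. The sketch below shows how such rows could be grouped into per-record histories. It assumes rows are available as dictionaries and that `version` increases with each change. The function name and the sample values in the usage note are hypothetical.

```python
from collections import defaultdict


def history_by_key(audit_rows, key_field):
    """Group audit rows by record key and order each group by version."""
    grouped = defaultdict(list)
    for row in audit_rows:
        grouped[row[key_field]].append(row)
    for versions in grouped.values():
        # Assumption: a higher `version` means a later revision of the record.
        versions.sort(key=lambda r: r["version"])
    return dict(grouped)
```

For example, calling `history_by_key(rows, "certificationId")` on rows from `Certifications_Audit` would return, for each certification, the ordered sequence of its recorded states.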

Domain 15 - Analytics and ML

Table	Key Columns	Notes
`DbtFirmSchoolRank`	firmId, firmName, academicField, nProfiles, avgSchoolRank, medianSchoolRank, priorMeanSchoolRank, ebPriorStrength, ebAvgSchoolRank, firmsInField, firmSchoolRank, firmSchoolRankPercentile	Employer prestige scores for ~154,000 firms. Used in resume scoring
`DbtSchoolRankings`	academicField, schoolName, schoolScore, schoolRank	School prestige rankings by field
`PosthogAnalytics`	uuid, userEmail, company, startTimeUtc, endTimeUtc, activetime, inactivetime, startUrl	PostHog sessions linked to user email identity
`SearchAnalytics`	run_id, run_timestamp, avg_relevance_score, avg_prestige_score, p99_latency_ms, position_weighted_relevance_score, avg_relevant_prestige_score	Search quality metrics over time
`ForecastMetrics`	entity, id, dt, snapshot_dt, modelVersion, predictedValue	ML forecast outputs for capacity and fill rate planning
`MLExperimentsJobPerformanceReviews`	Date of review, Account, Project, Reviewer, Work type, Review type, Name, Email, Quality of Work, Engagement, Offboarding Reason, Justification for rating	Raw performance review data for ML training
`TalentViewUserEvaluations`	criteriaId, userId, criteriaScore	Structured per-criteria talent evaluations
`ProductivityProjectRules`	id, project_id, description, rules, created_by, is_active, version, created_at	Per-project productivity monitoring rule definitions
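
The `DbtFirmSchoolRank` column names (`nProfiles`, `avgSchoolRank`, `priorMeanSchoolRank`, `ebPriorStrength`, `ebAvgSchoolRank`) suggest empirical-Bayes shrinkage, in which a firm's observed average is pulled toward a field-wide prior in proportion to how few profiles back it. The sample does not contain the actual formula. The sketch below assumes the standard weighted-average form, and the function and parameter names are illustrative.

```python
def eb_shrunk_mean(n_profiles, avg_rank, prior_mean, prior_strength):
    """Shrink an observed mean toward a prior mean.

    Assumed form: (n * observed + k * prior) / (n + k), where k is the prior
    strength expressed as a pseudo-count of observations.
    """
    if n_profiles < 0 or prior_strength < 0:
        raise ValueError("counts and prior strength must be non-negative")
    total = n_profiles + prior_strength
    if total == 0:
        return prior_mean  # no data and no prior weight: fall back to the prior
    return (n_profiles * avg_rank + prior_strength * prior_mean) / total
```

Under this assumed form, a firm with 20 profiles averaging rank 10, a prior mean of 50, and a prior strength of 20 gets a shrunk value of 30.0. The more profiles a firm has, the closer the result stays to its observed average.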

Domain 16 - Infrastructure and DevOps

Table	Key Columns	Notes
`IacDeploymentRuns`	id, runType, environment, status, commitSha, branch, actor, githubRunId, githubRunUrl, prNumber, stacksAffected, resourcesAdded, resourcesChanged, resourcesDestroyed, summary, durationSeconds, startedAt, completedAt, createdAt	Terraform deployment records. Exposes GitHub monorepo URLs, engineer usernames, Terraform plan output
`ProductionDeployment`	deploymentRecordId, releaseTag, buildHash, deployedAt, deploymentIds, taskDefinitionArns, status, createdAt, updatedAt	ECS production deployment records. Contains AWS task definition ARNs
`PreprodDeployment`	id, releaseTag, commitSha, deployedAt, loadTestPassed, releaseOwner, status, createdAt, updatedAt	Staging deployment records with load test results
`PreprodDeploymentTest`	id, test_message, created_at, updated_at	Test table for pre-production deployment validation
`ProductionVersion`	id, lastVersion, lastReleaseTag, lastBuildHash, updatedAt	Single-row pointer to current production version
`RollbackExecution`	Rollback event records including affected services	Emergency rollback tracking
`DATABASECHANGELOG`	ID, AUTHOR, FILENAME, DATEEXECUTED, MD5SUM, DESCRIPTION, COMMENTS, EXECTYPE, LIQUIBASE	Liquibase schema migration history. Reveals engineer names, migration filenames
`DATABASECHANGELOGLOCK`	Liquibase migration lock state	Prevents concurrent schema migrations
`AgentSandboxes`	sandboxId, userId, title, agentType, status, backendType, host, stopReason, transcriptRawUrl, transcriptConsolidatedUrl, snapshotId, lastSnapshotId, snapshotStorageKey, acpSessionId, backendId, sandboxToken, claimedAt, expiresAt, createdAt, updatedAt, deletedAt	AI coding agent sandbox sessions. `transcriptRawUrl` links to S3 conversation logs
`DrivePermission`	id, driveId, googleGroupId, permissionLevel, googlePermissionId, createdAt, updatedAt	Google Drive permission records (same table as listed under Domain 13)

Domain 17 - Reference and Miscellaneous

Table	Key Columns	Notes
`Country`	id, isoCode3, name, currency, psp, createdAt, updatedAt	Country reference table with payment service provider per country
`TagAssignments_Audit`	tagAssignmentId, tagId, entityType, entityId, createdAt, updatedAt, version, auditAction, auditTimestamp	Tag assignments to entities
`ShortenedUrls`	URL shortener records	Shortened URL definitions
`UrlClicks`	id, urlId, clickedAt, ipHash, userId, country	Click tracking on shortened URLs
`BeelineJobMapping`	External job platform mapping	Maps Mercor jobs to Beeline external system
`UserManagement`	Internal user management records	Admin user management
`UserManagementWorkflows`	User management workflow state	Multi-step user management processes
`ActionsQueue`	Queued action records	General purpose action queue
`GoldenReviewSample`	Golden reference samples for review calibration	QA calibration data
`References`	Professional reference records	Additional reference management
`CatfishAuditLog`	id, slackUserId, slackUserName, targetEmail, platform, environment, intent, status, errorMessage, slackChannelId, createdAt	Internal user lookup audit. Records each staff lookup of user data made through the internal "Catfish" tool
`CapacityApplicationLog`	id, capacityBudgetId, capacityLogId, projectId, actionsTakenJson, status, notes, createdAt	Capacity budget application tracking
`OffPlatformCampaignSteps`	campaignStepId, stepNumber, campaignId, campaignType, subject, messageTemplate, parameters, scheduledAt, status, outreachedCandidateIds, failedCandidateIds, createdAt, updatedAt	Off-platform outreach campaign step execution (same table as listed under Domain 11)

End of Appendix A

Document prepared for security research and educational purposes. All PII has been obfuscated.
