How we validate the 509 dataset: what we corrected, why, and what it changes

Methodology · White paper · Updated 2026-06-28

The ABA Standard 509 disclosures are the best public record of who law schools admit, what they charge, and where their graduates end up — but fifteen years of spreadsheets kept by more than two hundred schools leave the raw record with real errors. This is the full accounting of how Exhibit 509 reconciles that record: the one rule we follow, the audit that backs it, every class of correction we made and why, the year each field actually describes, and the limits we keep visible.

The one rule: each field is checked against its own source of truth

Not every number on a school's page comes from the same document, so not every number is validated the same way. The governing principle is simple: each field is reconciled against the authoritative ABA source for that field, and nowhere else.

Field group | Authoritative source
Admissions (LSAT & GPA 25/50/75, plus GRE and JD-NEXT percentiles where reported), enrollment & volume (applications, offers, JD enrollment), tuition, fees, scholarships & grants (the award-distribution shares and the grant-amount dollar percentiles), bar passage | The ABA Standard 509 PDF reports: 3,009 individual reports, 2011–2025
Employment outcomes (emp_*) | The ABA's separate employment-summary disclosure: an Excel workbook, not carried in the 509 PDF

This distinction matters and we state it plainly: the 509 PDF does not contain employment data. Anyone who claims a school's employment numbers are "validated against the 509 PDF" is mistaken — the PDF has no such section. Employment's only authoritative source is the ABA's separate employment workbook, which is what we reconcile those fields against. For that reason employment sits outside the PDF audit below, by construction, not by oversight.
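The routing rule can be sketched as a lookup that refuses to guess. This is a minimal illustration, not Exhibit 509's own code: apart from the emp_*, schol_*, grant_* and bar names that appear in this paper, the field prefixes and the two source labels are hypothetical.

```python
# Hypothetical labels for the two authoritative sources named in the text.
PDF_509 = "aba_509_pdf"
EMPLOYMENT_WORKBOOK = "aba_employment_workbook"

# Illustrative prefixes; only emp_, schol_, grant_ and bar appear in the paper.
PDF_PREFIXES = ("lsat_", "gpa_", "gre_", "jdnext_", "applications", "offers",
                "enrollment", "tuition", "fees", "schol_", "grant_", "bar")


def authoritative_source(field: str) -> str:
    """Return the single source a field may be reconciled against."""
    if field.startswith("emp_"):
        # Employment is not in the 509 PDF, so it never routes there.
        return EMPLOYMENT_WORKBOOK
    if field.startswith(PDF_PREFIXES):
        return PDF_509
    # An unregistered field is an error rather than a silent default.
    raise KeyError(f"no authoritative source registered for {field!r}")
```

Raising on an unknown field, rather than defaulting to the PDF, keeps a new field from being checked against a document that does not carry it.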

The PDF-controlled audit

To check the PDF-sourced fields we ran a controlled audit: the control is the authoritative 509 PDFs themselves, and each report is identified by its URL root plus reference short-name — deliberately independent of the name crosswalk that populates the dataset, so agreement also proves the crosswalk is sound. The audit compares 25,457 individual cells against the PDF they came from.

Field group | Cells checked | Agreed with the PDF
Fees | 1,596 | 1,596
Tuition | 2,762 | 2,760
Admissions percentiles (LSAT/GPA) | 16,008 | 16,005
Enrollment & volume, 2017–2025 | 3,697 | 3,687
Enrollment & volume, 2011–2016 | 1,394 | 1,392
Total | 25,457 | 25,440
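A cell-level comparison of this kind might look like the sketch below. The key layout (URL root, short-name, year, field) follows the description above, but the function name and the dictionary shapes are assumptions made for illustration.

```python
def audit_cells(shipped: dict, pdf: dict) -> dict:
    """Compare shipped cells with the PDF cells that share the same key.

    Keys are (url_root, short_name, year, field) tuples, an assumed layout
    that deliberately avoids the school-name crosswalk.
    """
    checked = agreed = 0
    mismatches = []
    for key, shipped_value in shipped.items():
        if key not in pdf:
            continue  # no PDF control for this cell, so it is not counted
        checked += 1
        if shipped_value == pdf[key]:
            agreed += 1
        else:
            # Recorded for review; nothing is overwritten here.
            mismatches.append((key, shipped_value, pdf[key]))
    return {"checked": checked, "agreed": agreed, "mismatches": mismatches}
```

The function only counts and reports. Deciding whether a mismatch is a data error or a parser artifact stays a separate, human step.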

The crosswalk audited clean: 0 of 210 school mappings diverged. The 17 cells that did not match are not data errors — every one is a documented artifact of the audit's own block-parser, and the shipped value is the correct one:

Read the table as counts, not as a slogan. On the PDF-sourced fields the dataset matches the source documents cell-for-cell except for seventeen places where the audit tool, not the data, rounds or misreads. There is no category of known, uncorrected data error remaining in the PDF-sourced fields.
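The totals can be re-derived from the per-group rows. The short check below uses only the counts published in the table; the variable names are illustrative.

```python
# (cells checked, cells agreeing with the PDF) per field group, from the audit table.
AUDIT = {
    "Fees": (1596, 1596),
    "Tuition": (2762, 2760),
    "Admissions percentiles (LSAT/GPA)": (16008, 16005),
    "Enrollment & volume, 2017-2025": (3697, 3687),
    "Enrollment & volume, 2011-2016": (1394, 1392),
}

checked = sum(c for c, _ in AUDIT.values())   # 25457
agreed = sum(a for _, a in AUDIT.values())    # 25440
mismatched = checked - agreed                 # 17
agreement = f"{agreed / checked:.2%}"         # "99.93%"
```

The 17 mismatches are about 0.07% of the cells checked, and the prose above attributes all of them to the audit's block-parser rather than to the shipped data.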

A second layer: reconciliation across every metric

The PDF audit checks that a value matches its source document. A second, independent pass checks whether the values are internally consistent — across 94 metrics, 218 schools and 2011–2025 — by testing the relationships the numbers must obey: percentiles that must stay in order, shares that must reach 100, sub-buckets that must add up to their total, and single-year reversals. This is also where we scrutinize the fields the PDF audit doesn't touch, including employment composition and demographics.
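Three of those relationships can be sketched as checks that return flags and leave the row untouched. The row keys (lsat_25, schol_shares, enrollment_buckets and so on) and the tolerance are assumptions for illustration, not the dataset's actual schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Flag:
    severity: str  # a severity label such as "bounds / ordering" or "reconcile"
    message: str


def check_row(row: dict, tol: float = 0.5) -> list:
    """Return flags for one school-year row; the row itself is never modified."""
    flags = []

    # Percentiles must stay in order: 25th <= 50th <= 75th.
    p25, p50, p75 = row["lsat_25"], row["lsat_50"], row["lsat_75"]
    if not (p25 <= p50 <= p75):
        flags.append(Flag("bounds / ordering",
                          f"LSAT percentiles out of order: {p25}, {p50}, {p75}"))

    # Shares must reach 100, within a small rounding tolerance.
    share_sum = sum(row["schol_shares"])
    if abs(share_sum - 100.0) > tol:
        flags.append(Flag("reconcile", f"shares sum to {share_sum}, not 100"))

    # Sub-buckets must add up to their reported total.
    bucket_sum = sum(row["enrollment_buckets"])
    if bucket_sum != row["enrollment_total"]:
        flags.append(Flag("reconcile",
                          f"buckets sum to {bucket_sum}, total is {row['enrollment_total']}"))
    return flags
```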

The rule here is the opposite of silent cleanup: every anomaly is flagged for human review and never auto-corrected. A flag is not proof of error. The latest pass raised 128 flags after de-duplication — and most are real reporting events or structural features of the ABA forms, not defects. We keep them visible rather than smooth them away; the full log is public in validation-report.md.

Severity | What it means
logical | A hard contradiction (A > B where that is impossible). Highest priority.
bounds / ordering | A value outside its valid range, or a percentile out of order; usually a parsing or column-mapping check.
reconcile | A group that should sum or match is off; often structural, verified case by case.
spike | A single-year reversal; kept as REAL when sibling buckets absorb it, flagged for review when they don't.
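The spike rule lends itself to a small sketch. The definitions here are assumptions, not the project's published thresholds: a reversal is a move of at least `threshold` points that is undone the next year, and siblings "absorb" it when their combined move offsets the bucket's move to within `tol`.

```python
def classify_spike(bucket, siblings, i, threshold=10.0, tol=1.0):
    """Classify year index i of a share series as 'none', 'REAL' or 'review'.

    bucket:   one bucket's values over consecutive years
    siblings: the other buckets in the same group, same years
    i:        an interior index, so that i - 1 and i + 1 both exist
    """
    move_in = bucket[i] - bucket[i - 1]
    move_out = bucket[i + 1] - bucket[i]
    is_reversal = (abs(move_in) >= threshold
                   and abs(move_out) >= threshold
                   and move_in * move_out < 0)
    if not is_reversal:
        return "none"
    # If the siblings moved the opposite way by the same amount, the group
    # total is intact and the spike looks like a real reclassification.
    sibling_move = sum(s[i] - s[i - 1] for s in siblings)
    return "REAL" if abs(move_in + sibling_move) <= tol else "review"
```

Either outcome leaves the value as reported; the label only decides whether a person needs to look at it.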

What the flags actually are, in order of how common they are:

A short tail of flags points to genuine open questions we name rather than explain away. Four 2025 clinic entries report more seats filled than available: the figures match what the school disclosed — the University of Pennsylvania's Carey Law shows 140 filled against 132 available, with the same pattern at UNLV, Washburn and Widener Delaware — but the mechanism isn't recoverable from the form. It could be a mid-year clinic expansion, a course-selection or graduation-placement adjustment, or a reporting-input quirk. We treat it as a data-input question that can't be resolved without contacting the school directly, so the values stay exactly as disclosed and flagged rather than guessed at or quietly changed. One grant-mix reversal — Columbia's 2021 share of students on a less-than-half-tuition grant, which reads near 80% for that single year — is likewise kept as reported. We would rather show you an open question than paper over it.
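A "logical" flag of this kind reduces to a single comparison. In the sketch below, the Pennsylvania figures are the ones quoted above; the second entry is invented purely to show a row that passes, and the dictionary keys are hypothetical.

```python
def flag_overfilled(entries):
    """Flag clinic entries reporting more seats filled than available.

    The entries are only read, never changed: a flag records the question,
    and the disclosed values stay exactly as reported.
    """
    return [
        {"severity": "logical", "school": e["school"],
         "filled": e["filled"], "available": e["available"]}
        for e in entries
        if e["filled"] > e["available"]
    ]


clinics_2025 = [
    {"school": "Pennsylvania (Carey Law)", "filled": 140, "available": 132},
    {"school": "Example School", "filled": 90, "available": 100},  # invented row
]
open_questions = flag_overfilled(clinics_2025)
```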

What we corrected, and why

1. The modern enrollment off-by-one

This was the largest single fix. For 2018–2025, the curated JD-enrollment series had been aligned to the prior academic year's population (the denominator the grants section uses) rather than the present-year "J.D. Enrollment as of October [Y]" census on the front of the report. We re-sourced every modern enrollment cell from that present-year census — 1,383 cells across 201 schools — and confirmed it against the raw PDF (e.g. CUNY 2020 = 672) and an independent PDF-verified dataset (99.6% match, zero regressions). A build guard now fails the build if any 2018+ enrollment cell drifts off the present-year census, so the error cannot silently return.
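A build guard of this sort can be as small as the sketch below. The exception name, the (school, report_year) key shape and the choice to skip cells that have no census value are assumptions; only the 2018 cutoff and the CUNY 2020 figure come from the text.

```python
class EnrollmentDriftError(RuntimeError):
    """Raised to fail the build when a modern enrollment cell leaves the census."""


def guard_enrollment(dataset: dict, census: dict, first_year: int = 2018) -> None:
    """Both arguments map (school, report_year) -> JD enrollment."""
    drifted = [
        (key, value, census[key])
        for key, value in sorted(dataset.items())
        if key[1] >= first_year and key in census and value != census[key]
    ]
    if drifted:
        raise EnrollmentDriftError(
            f"{len(drifted)} cell(s) off the present-year census: {drifted}"
        )


# Passes silently: the shipped cell equals the October census figure.
guard_enrollment({("CUNY", 2020): 672}, {("CUNY", 2020): 672})
```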

2. The 2026-06-27 enrollment re-shift

Closely related: modern enrollment had been normalized to comp[Y−1] to line up with the grants section's denominator. It is now sourced from comp[Y] — the year-Y report's JD-enrollment grand total — which is the convention the live site uses and which matches the 2011–2017 regime. The re-shift touched 1,552 cells and filled no blanks; it only moved values onto the correct year.
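In code terms the re-shift is a change of index, not of values. The sketch below assumes `comp` maps a report year to that report's JD-enrollment grand total and has an entry for every populated year; the function names are illustrative.

```python
def old_alignment(comp: dict, year: int):
    """Previous convention: the year-Y cell held the comp[Y-1] figure."""
    return comp.get(year - 1)


def new_alignment(comp: dict, year: int):
    """Current convention: the year-Y cell holds the comp[Y] figure."""
    return comp.get(year)


def reshift(series: dict, comp: dict) -> dict:
    """Move each populated cell onto comp[Y]; blank cells stay blank."""
    return {
        year: (comp[year] if value is not None else None)
        for year, value in series.items()
    }
```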

3. Tuition reconciliation

The raw tuition record carries recurring, mechanical errors: a stray apostrophe that zeroes a cell, per-term figures reported where the annual figure belongs, and cells entered at 2× or ½. Each is reconciled against the master workbook and annualized to a consistent basis. The full field guide is its own piece: The holes in 15 years of 509 tuition data.
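The three error classes can be told apart mechanically once a reference value is available. The sketch below is an illustration under stated assumptions: `master_annual` stands in for the master-workbook figure and is assumed positive, the labels and the 1% tolerance are invented, and a two-term school's per-term figure is exactly half the annual one, so the ratio alone cannot separate "per-term" from "entered at ½".

```python
import re
from typing import Optional


def parse_amount(raw: str) -> Optional[float]:
    """Strip apostrophes, currency signs and commas; return None for a blank."""
    cleaned = re.sub(r"[^0-9.]", "", raw)
    return float(cleaned) if cleaned.strip(".") else None


def classify(raw: str, master_annual: float, tol: float = 0.01) -> str:
    """Label a raw tuition cell relative to the reference annual figure."""
    value = parse_amount(raw)
    if value is None:
        return "blank"
    if "'" in raw:
        # The apostrophe makes the cell text, which a numeric read turns into 0.
        return "stray_apostrophe"
    ratio = value / master_annual
    for target, label in ((1.0, "ok"), (2.0, "doubled"), (0.5, "halved_or_per_term")):
        if abs(ratio - target) <= tol:
            return label
    return "unclassified"
```

Anything that lands in "unclassified" would need individual review rather than a mechanical fix.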

4. Residual PDF-verified corrections

After the systematic passes, a short tail of high-confidence, individually PDF-verified cells remained — applications/offers, a few LSAT/GPA and faculty counts, and two enrollment cells (St. Thomas (Miami) 2017, Missouri-Kansas City 2025). These were applied cell-by-cell. Every adjudicated change, with its before/after value and source, is logged in the public corrections ledger.

5. The crosswalk

Mapping two hundred shifting school names — mergers, renames, campus splits — to stable identities is where silent corruption usually hides. Because the audit keys on URL root independent of the crosswalk and still agreed, the mapping is validated as a side effect: 0 of 210 divergent.

Which year each field actually describes

A 509 report has one year on its cover, but the fields inside describe different moments — we keep each one faithful to how the ABA structures the report rather than forcing them onto a single year. This is essential to reading a school's page correctly: the entering-class medians and the bar-passage number on the same row are not about the same cohort.

Field | Year it describes (in a year-Y report)
Enrollment, demographics, admissions, LSAT/GPA & GRE/JD-NEXT percentiles, faculty, tuition, fees | Report year Y: the Fall-Y census and Fall-Y matriculating class
Scholarships & grants: award-distribution shares (schol_*) and grant-amount dollar percentiles (grant_*) | Prior academic year: the Y−1→Y awards
Bar passage (bar, bar_2yr) | Prior graduating class: the class that graduated in spring Y−1
Employment (emp_*) | Prior graduating class: class-of-(Y−1) outcomes, from the separate employment workbook
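The offsets in the table can be written as a small function. The schol_*, grant_*, bar, bar_2yr and emp_* names come from the table; treating every other field as a report-year field is an assumption of this sketch, as are the function name and the wording of the returned labels.

```python
def described_period(field: str, report_year: int) -> str:
    """Say which period a field in a year-Y report actually describes."""
    y = report_year
    if field.startswith(("schol_", "grant_")):
        return f"awards for academic year {y - 1}-{y}"
    if field in ("bar", "bar_2yr"):
        return f"class that graduated in spring {y - 1}"
    if field.startswith("emp_"):
        return f"class of {y - 1} outcomes"
    # Enrollment, demographics, admissions, faculty, tuition and fees.
    return f"Fall {y} census and matriculating class"
```

For a 2025 report, `described_period("bar", 2025)` returns "class that graduated in spring 2024", while a hypothetical `lsat_50` field on the same row describes the Fall 2025 entering class.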

The newer admissions signals: GRE, JD-NEXT, and grant amounts

Three field groups are recent additions to the 509 form, and the dataset carries them exactly as the PDFs report them — sparse where the reporting is sparse, never filled in. They come from the same 509 PDF sections as the rest of the admissions and grants data (the First-Year Class table and the Grants & Scholarships table), but because they are new and thinly reported they sit outside the headline LSAT/GPA percentile audit rather than inside it.

Coverage

The audit draws on 3,009 PDF reports across the fifteen years 2011–2025, roughly 195–210 reports per year. Coverage is a function of how many schools were ABA-accredited and reporting in a given year, which falls as schools close or merge; blank recent-year cells reflect reports the ABA has not yet released, not data we are missing.

What this does to the data

Net effect: a reader looking at a school today sees admissions, tuition, and fees that match the source PDFs cell-for-cell; enrollment that reads the correct present-year October census instead of lagging a year; tuition normalized to a single annual basis; and bar passage tied to the right graduating cohort. Employment is reconciled against the only source that carries it. The corrections moved the data measurably closer to the documents of record without inventing a single value — every fix is a re-sourcing or a re-alignment, never a fabrication, and every blank stays blank.

Limits we keep visible

Exhibit 509's dataset is derived from the annual questionnaire compilation spreadsheets that the ABA releases publicly under its mandatory disclosure program. No data was scraped from abarequireddisclosures.org; the data was collected manually. The underlying facts are not subject to copyright under Feist Publications, Inc. v. Rural Telephone Service Co., 499 U.S. 340 (1991). The data comes from ABA Standard 509 required disclosures, which law schools self-report; accuracy is not guaranteed, and the raw data may contain errors or omissions. State attorney-salary context is from U.S. BLS OEWS 2024. Methodology: /methodology.html.
Exhibit 509 by 509α · Last synced June 26, 2026 · v1.94.80