For a long time, lawyers and serious law students have depended on paid services like LexisNexis for access to case law, but that is slowly changing. Carl Malamud has posted free electronic copies of every U.S. Supreme Court decision and Court of Appeals ruling since 1950, 1.8 million rulings in all, online.
He has a history of creating public-access databases on the net when the provider of the data has failed to do so or has licensed its data only to a private company that charges for access. His technique is to build a high-profile demonstration project with the intent of getting the actual holder of the public domain information (usually a government agency) to take over the job.
Carl’s done this in the past with the SEC’s Edgar database, with the Smithsonian, and with Congressional hearings.
But now he’s set his eyes on the crown jewels of public data sold for profit: the body of Federal case law that is the foundation of multibillion-dollar businesses such as Westlaw.
On a site that just went live tonight, Carl has begun publishing the full text of legal opinions, starting back in 1880, and has outlined a process that will eventually lead to a full database of U.S. case law. Carl writes:
1. The short-term goal is the creation of an unencumbered full-text repository of the Federal Reporter, the Federal Supplement, and the Federal Appendix.
2. The medium-term goal is the creation of an unencumbered full-text repository of all state and federal cases and codes.
This is clearly public data. Even so, in a letter to West Publishing that accompanies the first data release on his site, Carl asks West to clarify what information it considers proprietary versus public domain:
In looking through the court decisions of a decade ago
where West and your commercial competitors fought over the right to
re-publish case law, it seems fairly clear that a large part of the
publication stream is tightly interwoven into the very substance of the
operation of the courts, with West serving as the either contractual or
de-facto sole vendor reporting on behalf of the court.
Carl’s letter goes on to ask West to release the full text of the Federal Reporter, Federal Supplement, and Federal Appendix. He says:
You have already received rich rewards for the initial
publication of these documents, and releasing this data back into the
public domain would significantly grow your market and thus be an
investment in your future.
Elsewhere in the letter, he writes:
We wish to make this information available to a population that today does not have access to the
decisions of our federal and state courts because they are not commercial subscribers to one of the
handful of services such as your award-winning Westlaw tools. Codes and cases are the very operating
system of our nation of laws, and this system only works if we can all openly read the primary sources.
It is crucial that the public domain data be available for anybody to build upon.
Now, it could be that West will eventually go along. Their real proprietary data isn’t the text of the case law itself so much as their key number system and accompanying summaries, or “headnotes,” as well as their value-added tools for searching and managing the voluminous amount of data. But Carl’s project is intended to point out that if they don’t, he’ll be able to make the data public anyway.
(Note: in the last decade, the Federal courts have begun publishing the data on current opinions, and law professor Tim Wu’s AltLaw site provides a full text search engine for those recent cases. But the historical record is much more difficult.)
Carl’s starting point is the “ultrafiche” version of the Federal
Reporter, which West published before the advent of online database
versions. An ultrafiche presents up to 1000 pages on a 4 by 6 inch
transparency, like the one shown below:
Carl has begun enlarging, processing, and publishing the images,
and has started OCRing them to extract the text. After 87x
enlargement, the test images are quite readable.
In private email, Carl wrote:
The SEC database was fairly straightforward, taking a couple of
years of hard work. But, getting patents online took 5 years of
drawing lines in the sand and sending shots across the bow. Our
line in the sand here is all state and federal cases and codes, and
I guess our shot across the bow is publishing a 3.6 gbyte tiff file
and announcing our intention to systematically walk through the
5 million or so pages of federal case law.
That’s a big challenge, but with computing power and storage getting
ever cheaper, and with the dedication of volunteers like Carl, it does
indeed seem like a possible project. (After all, when Carl pressured
the SEC to put its Edgar database online in the early ’90s, they said
it would take years and millions of dollars. Carl did it in six weeks,
and operated the database for two years before persuading the SEC to
take it over.)
Via Tim O’Reilly