When Big Legal Data Isn’t Big Enough – Limitations in Legal Data Analytics
Author: Robert J. Parnell, CFA, LLB
Executive Summary
The mass harvesting and storage of court records and other legal data provides an opportunity for corporate litigants and their legal counsel to complement decision making with legal data analytics. But without the use of proper statistical methods, the analysis of data can be invalid or misleading.
Legal professionals do not analyze quantitative legal data merely to observe historical facts, but to draw meaningful inferences about the present and to make decisions. It is widely understood, though sometimes forgotten, that we cannot go reliably from past data to present insight, because the past is only a sample of what could happen, and often a very imperfect one.
The central problem is that not all samples of legal data contain sufficient information to be usefully applied to decision making. By the time big data sets are filtered down to the type of matter that is relevant, sample sizes may be too small and measurements may be exposed to potentially large sampling errors. If Big Data becomes ‘small data’, it may in fact be quite useless.
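To make the sample-size point concrete, the following is a minimal sketch of how the uncertainty around a measured success rate widens as a data set is filtered down. It uses the Wilson score interval, a standard confidence interval for a proportion; this is an illustrative choice and not necessarily the method used later in the article. The function name `wilson_interval` and the win/case counts are hypothetical.

```python
import math


def wilson_interval(wins, n, z=1.96):
    """Approximate 95% Wilson score interval for a proportion.

    wins: number of favorable outcomes (hypothetical counts)
    n:    number of matters in the filtered sample
    z:    standard normal quantile (1.96 corresponds to roughly 95%)
    """
    if n <= 0:
        raise ValueError("n must be positive")
    p_hat = wins / n
    denom = 1 + z**2 / n
    centre = (p_hat + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return centre - half_width, centre + half_width


# The same observed 60% success rate at three hypothetical sample sizes.
for wins, n in [(6000, 10000), (60, 100), (6, 10)]:
    low, high = wilson_interval(wins, n)
    print(f"n={n:>5}: observed {wins / n:.0%}, interval {low:.1%} to {high:.1%}")
```

With ten thousand matters the interval spans about two percentage points; with ten matters it runs from roughly 31% to 83%, which is too wide to support most decisions.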
To be of value in real world decisions, legal data analytics must be able to distinguish between the inherent randomness in historical data samples and statistically meaningful legal track records. This necessarily requires the application of inferential statistics.
In this article we provide legal professionals with an introduction to basic inferential statistical methods so that they will be better able to determine when ‘Big Legal Data’ is big enough in practice. Key concepts are presented at an introductory level, and a number of online analytical tools are used to show how counsel can evaluate the statistical merit of their data.
Example analyses illustrate how to quantify the uncertainty in the measurement of judicial decision making, and how to determine if a law firm’s track record is statistically significant relative to its peer group. The results of statistical analyses are presented graphically.
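As a rough illustration of the second kind of analysis, the sketch below runs a one-sided exact binomial test of whether a firm's win rate exceeds a peer-group baseline. It assumes, for simplicity, that the peer baseline is a known fixed rate and that case outcomes are independent; real data may satisfy neither assumption. The function name `binomial_upper_tail` and all counts are hypothetical, and the article itself relies on online tools rather than this code.

```python
from math import comb


def binomial_upper_tail(wins, n, p0):
    """P(X >= wins) for X ~ Binomial(n, p0).

    This is the one-sided p-value for the hypothesis that the firm's
    true win rate is no better than the assumed peer baseline p0.
    """
    return sum(
        comb(n, k) * p0**k * (1 - p0) ** (n - k)
        for k in range(wins, n + 1)
    )


peer_rate = 0.5  # assumed peer-group win rate (hypothetical)

# The same 70% observed win rate at two hypothetical sample sizes.
for wins, n in [(7, 10), (70, 100)]:
    p_value = binomial_upper_tail(wins, n, peer_rate)
    print(f"{wins}/{n} wins: one-sided p-value = {p_value:.6f}")
```

Seven wins in ten matters gives a p-value of about 0.17, which is consistent with chance at the conventional 5% level, whereas seventy wins in one hundred gives a p-value far below 0.001. The observed rate is identical; only the sample size changes the conclusion.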
Using basic inferential statistics such as the methods outlined here, legal professionals will be able to interrogate the statistical validity of their data and evaluate the significance of various quantitative legal metrics.
In practice, although the volume of available legal data will sometimes be sufficient to produce statistically meaningful insights, this will not always be the case. While litigants and law firms would no doubt like to use legal data to extract some kind of informational signal from the random noise that is ever-present in data samples, the hard truth is that there will not always be one. Needless to say, it is important for legal professionals to be able to identify when this is the case.
Overall, the quantitative analysis of legal data is much more challenging and error-prone than is generally acknowledged. Although it is appealing to view data analytics as a simple tool, there is a danger of neglecting the science in what is basically data science. The consequences of this can be harmful to decision making. To draw an analogy, legal data analytics without inferential statistics is like legal argument without case law or rules of precedent: it lacks a meaningful point of reference and authority.
If we are going to examine legal decisions using the quantitative analysis of data, we cannot go halfway. We must make an allowance for the role of inferential statistics – only then will we know if the data have anything to say. With the use of appropriate statistical methods and careful attention to the complexities of data analytics, corporate litigants and Big Law can benefit from this new frontier in Big Data.
Keywords and phrases: Legal data analytics, big data, data analytics, statistics, confidence interval, hypothesis test, data science, data mining, legal tech, analytics, predictive analytics, law, Big Law, cognitive bias, data visualization.