What Are Data Expectations? A Guide For Prosecutors

May 25, 2023

This is the second post in a series describing Justice Innovation Lab’s process for evaluating data quality with our prosecutorial partners. (The first post can be found here.) This post discusses JIL’s use of “data expectations” when working with a jurisdiction to improve and ensure data quality. With better data, offices are able to adjust practices to improve public safety and ensure that all parties are provided access to justice.

At a technical level, data expectations are a set of logical rules that dictate what data should be entered for various real-world scenarios. We began using this term based on our early use of the Python package great expectations, which we initially used to define checks for data quality. For instance, an office might decide that a case is closed once there is a final disposition in the case, even if there is subsequent activity on the case, such as sentencing. Expectations also include data definitions. This entails reviewing all of the possible options for drop-down menus in the Case Management System (CMS) and defining what each entry means and when it should be used.¹
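To make this concrete, here is a minimal sketch of two such rules in Python. The first uses the pandas-style API from older releases of great expectations (newer releases organize checks through a project-level "data context," and the exact result format varies by version); the second expresses the closed-case rule directly in pandas. The column names, menu options, and records are hypothetical, not any jurisdiction's actual data.

    import great_expectations as ge
    import pandas as pd

    # Hypothetical case records; column names and values are illustrative only.
    cases = pd.DataFrame({
        "case_id": [101, 102, 103, 104],
        "case_status": ["Closed", "Open", "closed ", "Closed"],
        "disposition": ["Guilty plea", None, None, None],
    })

    df = ge.from_pandas(cases)

    # Expectation 1: case_status must come from the defined drop-down menu.
    menu_check = df.expect_column_values_to_be_in_set("case_status", ["Open", "Closed"])
    print(menu_check.success)  # False: "closed " is not a defined menu option

    # Expectation 2, in plain pandas: a case marked Closed must have a
    # final disposition recorded. Case 104 is flagged.
    violations = cases[(cases["case_status"] == "Closed") & cases["disposition"].isna()]
    print(violations[["case_id", "case_status", "disposition"]])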

Setting data expectations is a necessary step for improving data quality, but it also helps with cleaning historical data² and should lead to a simpler, more efficient data entry process for all staff. Some common challenges when setting data expectations include:

  • Defining what counts as a “case”
  • Assigning criminal statute charges to different offense types in order to aggregate charges into meaningful categories (see the sketch after this list)
  • Limiting and defining different types of dismissals based on different, meaningful situations
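For the second challenge, the statute-to-offense-type mapping can itself live in code or a lookup table so that it is reviewable, versioned, and easy to update. Below is a minimal sketch in pandas; the statute codes and categories are hypothetical, and real ones would come out of the jurisdiction's own review.

    import pandas as pd

    # Hypothetical statute-to-offense-type mapping.
    OFFENSE_TYPES = {
        "16-11-311": "Property",
        "16-3-600": "Violent",
        "44-53-370": "Drug",
    }

    charges = pd.DataFrame({
        "case_id": [101, 101, 102],
        "statute": ["16-11-311", "44-53-370", "99-9-999"],
    })

    charges["offense_type"] = charges["statute"].map(OFFENSE_TYPES)

    # Unmapped statutes surface as missing values, a prompt to extend the
    # mapping rather than silently dropping charges from aggregate counts.
    print(charges[charges["offense_type"].isna()])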

Setting these expectations addresses issues with data completeness and consistency and makes it more likely that the data is correct.

During our meetings with a jurisdiction, in addition to asking an extensive list of data-definition questions, we keep a running document (a single Google Doc with the agenda from each meeting) from which we extract the jurisdiction's data expectations. Clear data expectations can be written as code by a programmer and run against the data to look for violations. At JIL, we use these checks to identify cases that violate an expectation and provide them back to the jurisdiction so staff can quickly correct data entry errors. These expectations also form the basis of a future data entry manual for the jurisdiction.
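As an illustration, a violation report of this kind might look like the following sketch, where the two rules and all column names are hypothetical stand-ins for a jurisdiction's documented expectations:

    import pandas as pd

    def find_expectation_violations(cases: pd.DataFrame) -> pd.DataFrame:
        """Return one row per case/rule violation for review by office staff."""
        # Both rules are hypothetical stand-ins for documented expectations.
        checks = {
            "closed case missing disposition":
                (cases["case_status"] == "Closed") & cases["disposition"].isna(),
            "disposition date before filing date":
                cases["disposition_date"] < cases["filing_date"],
        }
        frames = []
        for rule, mask in checks.items():
            flagged = cases.loc[mask, ["case_id"]].copy()
            flagged["violated_rule"] = rule
            frames.append(flagged)
        return pd.concat(frames, ignore_index=True)

    # Hypothetical usage with a CMS export:
    # cases = pd.read_csv("cms_export.csv", parse_dates=["filing_date", "disposition_date"])
    # find_expectation_violations(cases).to_csv("violations_for_review.csv", index=False)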

A side note: many of these concepts are prevalent in other social sciences, especially in international development. Many of the best resources for data collection and analysis can be found at the World Bank and J-PAL.

What Is A Data Entry Manual And How Does It Help?

Data entry manuals clearly spell out what data should be entered, when to enter it, and by whom. They often consist of screenshots of the CMS with clear directions and a longer reference manual that defines terms. A data entry manual is a good document for: (1) training new staff, (2) assigning clear responsibility for data entry, and (3) acting as a reference manual for staff when they are unsure what to enter in a given case.

Like all manuals, though, even good ones are mostly a safety valve, referred to only in an emergency and only if the user remembers they exist. Most staff will not use the manual on a daily basis, and offices are prone to letting manuals fall into neglect, sitting in desks or on a SharePoint site. Creating the manual builds momentum toward improving data quality. Great data entry manuals, even if infrequently used, are usually maintained by more than one person in the office and are regularly updated based on new situations and suggestions from the office as a whole. Keeping a data entry manual up to date is part of building a data-informed culture in an office.

To have an accurate, up-to-date data entry manual, an office needs to identify a diverse group of staff who own the document and meet regularly to review and improve it. In addition, office leadership needs to use the data for management and discuss needed changes with the group that owns the document. This regular feedback loop holds people accountable and naturally leads an office to focus on the questions it cares most about and to define the data in such a way that those questions can be answered. Often, when an office first starts using data, the data cannot answer questions at the desired level of specificity; in the process of setting data expectations and assigning responsibility, the office defines the level of specificity it needs.

A word of warning, though: specificity and documentation always need to be weighed against the burden of entering data. Offices often end up with significant bloat and ill-defined data fields because they create idiosyncratic data entry options that become outdated. This can be avoided through clear direction from leadership as to what they would like to track, along with discussions with staff about why that is the question and what the burden of entering the necessary data would be. A general principle is to keep data entry to a minimum and to identify and eliminate redundant data entry, including by connecting directly to data from other agencies that already enter it, such as the court or police. Direct database connections are the gold standard for ensuring data quality. To further improve data entry, the office should provide timely, regular feedback on data entry errors and make the data actionable in the office.

¹ All of these options were likely chosen by the jurisdiction when it selected a CMS, but they change over time and become outdated. Here is a guide to reviewing CMS options that is a helpful technical document if an office is looking to change, or is reviewing, its current CMS.

² Cleaning historical data includes both finding and fixing factually inaccurate data (e.g., a case disposition that was entered incorrectly) and properly aggregating and labeling data (e.g., labeling cases as involving violence or as being closed).

By: Rory Pulvino, Justice Innovation Lab Director of Analytics

For more information about Justice Innovation Lab, visit www.JusticeInnovationLab.org.
