Automated Narrative Extraction from Administrative Records
Charles E. Horowitz, The MITRE Corporation, McLean, VA, USA, chorowitz@mitre.org
Stacy J. Petersen, The MITRE Corporation, McLean, VA, USA, spetersen@mitre.org

Abstract
Staff of the U.S. Probation and Pretrial Services Office produce billions of pages of information on defendants’ and offenders’ profiles and conduct. Probation officers and district chiefs need up-to-date knowledge of their clients to assist them better and to reduce the risk of recidivism. However, the relevant data are often stored as narrative text spread across multiple large documents. As a result, these records remain largely out of reach without painstaking manual review. This paper describes an analytic prototype that automatically extracts structured information from natural language text in probation office documents. It does so by applying PDF content extraction, text mining, and language analytics. Because serious mental illness is highly prevalent in the U.S. corrections system, the first phase of the project focused on extracting information and constructing timelines from narrative text regarding the […] history have allowed the probation office to gain a better understanding of its client population and to perform analyses that were previously unavailable to the organization. This technical approach can be applied across organizations, legal institutions, clinical administrations, and government agencies that maintain large amounts of information as free-text narratives.

1 Introduction
Staff of the U.S. Probation and Pretrial Services Office (PPSO) supervise more than 300,000 people a year. They collect and produce billions of pages of information on defendants’ and offenders’ profiles and conduct, as well as on the strategies and defendants’ mental health conditions, substance use, and treatment history.
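The abstract does not say how timelines are built from narrative text. As an illustration only, the following Python sketch shows one simple way that dated mentions of mental health, substance use, and treatment could be pulled from a supervision note and ordered chronologically. Everything in it is a hypothetical assumption, not the authors' method. That includes the keyword lists, the MM/DD/YYYY date format, the naive sentence splitter, and the function name extract_timeline. A real pipeline would also need PDF text extraction, robust date parsing, and handling of negation ("denies alcohol use").

```python
import re
from datetime import date

# Hypothetical keyword lists for illustration only; a real system would need a
# curated, clinically reviewed vocabulary.
CATEGORY_TERMS = {
    "mental_health": ["schizophrenia", "bipolar", "depression", "psychosis"],
    "substance_use": ["cocaine", "heroin", "alcohol", "methamphetamine"],
    "treatment": ["counseling", "medication", "inpatient", "outpatient"],
}

# Matches dates written as MM/DD/YYYY (one assumed format among many in practice).
DATE_RE = re.compile(r"\b(\d{1,2})/(\d{1,2})/(\d{4})\b")

# Naive sentence splitter: break after ., !, or ? followed by whitespace.
# Abbreviations such as "U.S." will cause spurious splits.
SENTENCE_RE = re.compile(r"(?<=[.!?])\s+")


def extract_timeline(text):
    """Return (date, category, sentence) tuples sorted by date.

    Each sentence is dated by the first MM/DD/YYYY date it contains and is
    tagged with every category whose keywords appear in it (substring match).
    """
    events = []
    for sentence in SENTENCE_RE.split(text):
        match = DATE_RE.search(sentence)
        if not match:
            continue  # skip sentences with no recognizable date
        month, day, year = (int(g) for g in match.groups())
        try:
            when = date(year, month, day)
        except ValueError:
            continue  # skip impossible dates such as 13/45/2020
        lowered = sentence.lower()
        for category, terms in CATEGORY_TERMS.items():
            if any(term in lowered for term in terms):
                events.append((when, category, sentence.strip()))
    return sorted(events)


if __name__ == "__main__":
    note = (
        "On 03/14/2019 the defendant began outpatient counseling. "
        "He reported alcohol use on 01/02/2019. "
        "No contact was made this month."
    )
    for when, category, sentence in extract_timeline(note):
        print(when.isoformat(), category, sentence)
    # 2019-01-02 substance_use He reported alcohol use on 01/02/2019.
    # 2019-03-14 treatment On 03/14/2019 the defendant began outpatient counseling.
```

Sorting by date puts events in chronological order even when the narrative mentions them out of order, as in the sample note. Sentences without a date are dropped here. A fuller system might instead inherit the date of the surrounding entry or document.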
Public released: yes