Data Extraction Process
Extracting electronic health record (EHR) data for your research is a multi-step process that takes time. Clinical data entered into the more user-friendly MiChart (the patient-centric EHR system) are transferred to and stored in the much less user-friendly Clarity (an Oracle relational database that requires specialized expertise and certification to use) in the health system data warehouse. To retrieve data from Clarity, extractors (programmers with specialized expertise in Clarity) need a clear understanding of the data being requested and time to locate and curate them for use.
We at PEACH are here to help! But we want to set expectations clearly by describing key components of the data extraction process and your role in that process. Please note that while sequential, the following components can, and often do, involve iteration.
1. Entering the queue: Hubs like PEACH that facilitate data extraction keep queues for extraction work. Predicting the progression of a project through a queue is an imperfect science, as some projects at the front can involve unforeseen complexity and obstacles that delay completion and others may be completed very quickly. We try to provide approximate deadlines and updates on your project’s status as we progress through our queue.
2. Specification: A foundational task once you are in the queue is drafting and revising a PEACH Data Request Form through meetings and email communication with PEACH staff. The goal is to translate your proposed project into the exact data elements that can be extracted from Clarity. Clearly defined data elements help expedite this process, e.g.:
Diagnoses (i.e., ICD codes) and procedures (CPT codes)
Time parameters
Specific units or locations
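As a rough illustration of what "clearly defined data elements" can look like, the sketch below records a draft request as structured data and lists what is still unspecified. It is not the PEACH Data Request Form: the class name, field names, and the `open_questions` helper are hypothetical, and the codes and unit name are placeholder examples.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class DataRequestSpec:
    """Hypothetical stand-in for a data request; not the actual PEACH form."""
    icd_codes: List[str] = field(default_factory=list)   # diagnoses
    cpt_codes: List[str] = field(default_factory=list)   # procedures
    start_date: Optional[date] = None                    # time parameters
    end_date: Optional[date] = None
    locations: List[str] = field(default_factory=list)   # units or locations


def open_questions(spec: DataRequestSpec) -> List[str]:
    """Return the elements that still need to be pinned down."""
    questions = []
    if not spec.icd_codes and not spec.cpt_codes:
        questions.append("No diagnosis (ICD) or procedure (CPT) codes listed")
    if spec.start_date is None or spec.end_date is None:
        questions.append("Time parameters are incomplete")
    elif spec.start_date > spec.end_date:
        questions.append("Start date falls after end date")
    if not spec.locations:
        questions.append("No units or locations specified")
    return questions


# Illustrative draft: a diagnosis code and a date range, but no location yet.
draft = DataRequestSpec(
    icd_codes=["E11.9"],
    start_date=date(2022, 1, 1),
    end_date=date(2022, 12, 31),
)
remaining = open_questions(draft)
print(remaining)
```

A draft like this one would still prompt a follow-up question about units or locations before an extractor could begin the investigation step.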
3. Investigation: Once an extractor has your preliminary PEACH Data Request Form, they can start locating data elements in Clarity. To do this, extractors rely on prior knowledge, search the available Clarity data dictionary and training manuals, and conduct chart review (looking at a given data point first in MiChart and then in Clarity to determine how data on the front end show up on the back end). This process can take significant time if the extractor is not already familiar with the data type and/or cannot consult with other extractors who can share code or relevant information. It may also lead to more questions and require greater specificity from you about the data you want!
4. Drafting and initiating preliminary data pull: Once the data are located and the Data Request Form is finalized, the extractor can write a SQL (Structured Query Language) query, i.e., “code,” to extract data from Clarity in the form of a cohort table and additional target tables (e.g., all procedures in each time parameter for each patient in a cohort). The purpose of a preliminary data pull is to provide data to Project Leads such as yourself for vetting. This can include initial counts (e.g., number of procedures identified within a time parameter) and comma-separated value (.csv) files that Project Leads can use for chart review to ensure the validity of the data being extracted.
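To make the cohort-table and target-table idea in step 4 concrete, here is a minimal sketch using Python's built-in `sqlite3` module with an in-memory database. It is an illustration only: the table names, column names, and rows are invented for this example and do not reflect Clarity's actual schema or the queries extractors write.

```python
import sqlite3

# In-memory toy database; all tables and rows below are made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE encounters (patient_id INTEGER, icd_code TEXT, encounter_date TEXT);
    CREATE TABLE procedures (patient_id INTEGER, cpt_code TEXT, procedure_date TEXT);
    INSERT INTO encounters VALUES
        (1, 'E11.9', '2022-03-01'),
        (2, 'E11.9', '2021-11-15'),
        (3, 'I10',   '2022-05-20');
    INSERT INTO procedures VALUES
        (1, '99213', '2022-03-01'),
        (1, '99213', '2022-09-10'),
        (2, '99213', '2022-02-02'),
        (3, '99213', '2022-06-01');
""")

# Cohort table: patients with the diagnosis of interest inside the time window.
conn.execute("""
    CREATE TABLE cohort AS
    SELECT DISTINCT patient_id
    FROM encounters
    WHERE icd_code = 'E11.9'
      AND encounter_date BETWEEN '2022-01-01' AND '2022-12-31'
""")

# Target table: every procedure in the time window for each cohort patient.
conn.execute("""
    CREATE TABLE cohort_procedures AS
    SELECT p.patient_id, p.cpt_code, p.procedure_date
    FROM procedures AS p
    JOIN cohort AS c ON c.patient_id = p.patient_id
    WHERE p.procedure_date BETWEEN '2022-01-01' AND '2022-12-31'
""")

# Initial counts of the kind a Project Lead might be asked to vet.
cohort_size = conn.execute("SELECT COUNT(*) FROM cohort").fetchone()[0]
procedure_count = conn.execute("SELECT COUNT(*) FROM cohort_procedures").fetchone()[0]
print(cohort_size, procedure_count)
```

In this toy data only one patient meets both the diagnosis and date criteria, which is the kind of count a Project Lead would compare against expectations before the full pull.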
5. Extracting and transforming data: Once the preliminary data pull has been vetted and any additional issues have been accounted for in the SQL query, the extractor can pull the full dataset. As part of this process, the data need to be transformed so that the cohort and target tables arrive in a format that is functional for you as Project Lead and for data analysts. If a sample of these data has not yet been validated via chart review, it is critical to do so at this stage to ensure validity.
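One possible shape for the transformation and validation described in step 5 is sketched below: a long target table is summarized to one row per patient, written out as .csv text, and a small reproducible sample of patients is drawn for chart review. The rows, column names, and sample size are assumptions made for illustration, not a description of how PEACH formats or samples its extracts.

```python
import csv
import io
import random
from collections import Counter

# Hypothetical target-table rows: (patient_id, cpt_code, procedure_date).
rows = [
    (1, "99213", "2022-03-01"),
    (1, "99213", "2022-09-10"),
    (4, "99213", "2022-04-18"),
    (5, "99214", "2022-07-07"),
]

# Transform: collapse to one row per patient with a procedure count.
counts = Counter(patient_id for patient_id, _, _ in rows)
summary = [
    {"patient_id": pid, "n_procedures": n} for pid, n in sorted(counts.items())
]

# Write the summary as .csv text (an in-memory buffer stands in for a file).
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["patient_id", "n_procedures"])
writer.writeheader()
writer.writerows(summary)
csv_text = buffer.getvalue()

# Draw a reproducible sample of patients to check against their charts.
rng = random.Random(0)  # fixed seed so the same sample can be re-drawn
review_sample = rng.sample(sorted(counts), k=2)
print(csv_text)
print(review_sample)
```

Fixing the random seed means the same patients can be re-identified later if questions come up during chart review.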
6. Re-work: In some cases, once analyses are under way or complete, Project Leads or Quality Improvement Specialists might identify new issues with the data or additional data elements they would like to extract. Requesting further data extraction (“re-work”) is not as simple as pushing a button. Depending on the nature of the request, it can take significant time to pull data again and ensure they are consistent with prior data extraction parameters. Projects that need re-work must re-enter the queue. Our goal is to specify requests and validate data in such a way that we eliminate as much re-work as possible, but when it’s required, we are here to help!