OMOP Lab Instructions
Prerequisites
Updated Code and Environment
Check out the lab repo from GitHub: CS595 Lab Repo
Pull the code to get the latest updates from the GitHub repo
Python version 3.10 or later
A Python virtual environment to link and use for the project
Run git pull on the lab repo to make sure you have the latest updates
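The Python prerequisites above can be checked with a short script. This is a minimal sketch, not part of the lab repo: the function names are illustrative, and the virtual environment check assumes the environment was created with the standard venv module (or a recent virtualenv), which makes sys.prefix differ from sys.base_prefix.

```python
import sys

MIN_VERSION = (3, 10)  # minimum Python version named in the prerequisites


def python_is_recent_enough(version_info=sys.version_info):
    """Return True when the interpreter is Python 3.10 or later."""
    return tuple(version_info[:2]) >= MIN_VERSION


def in_virtual_environment():
    """Return True when running inside a venv (prefix differs from base prefix)."""
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


if __name__ == "__main__":
    print("Python 3.10+:", python_is_recent_enough())
    print("Virtual environment active:", in_virtual_environment())
```

Run it with the interpreter you plan to use for the project; both lines should report True before you continue.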
R Environment
Set up the R environment and verify the installation.
Patient Data and JDBC connector
Create OMOP folder
Download Synthetic Patients
Move the zip file to OMOP folder
Extract the zip file
Download JDBC jar
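The folder, move, and extract steps above can also be scripted. The sketch below is one possible way to do it with the Python standard library; the function name and the file names in the usage comment are hypothetical, since the lab does not name the downloaded archive.

```python
import shutil
import zipfile
from pathlib import Path


def extract_patients(zip_path, omop_dir):
    """Move the patient archive into the OMOP folder and extract it there.

    Returns the sorted list of member names found in the archive.
    """
    zip_path, omop_dir = Path(zip_path), Path(omop_dir)
    omop_dir.mkdir(parents=True, exist_ok=True)   # create the OMOP folder
    moved = omop_dir / zip_path.name
    shutil.move(str(zip_path), str(moved))        # move the zip file into it
    with zipfile.ZipFile(moved) as archive:
        archive.extractall(omop_dir)              # extract the zip file
        return sorted(archive.namelist())


# Hypothetical usage; replace both paths with your own:
# extract_patients("Downloads/synthetic_patients.zip", "OMOP")
```

The returned list is a quick way to confirm that the CSV files you expect are actually in the archive.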
OMOP DB Up and Running
Start CCD Services
Connect to AODB via pgAdmin
Check that the OMOP data is available in the OMOP database, CDM54 schema
SELECT count(1) FROM cdm54.concept;
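If you prefer to run this check from Python rather than pgAdmin, the sketch below wraps the same query. It assumes a DB-API 2.0 connection object (for PostgreSQL that would typically come from a driver such as psycopg2, which the lab does not require); the function names are illustrative.

```python
def count_rows(conn, schema, table):
    """Run SELECT count(1) against schema.table on a DB-API 2.0 connection."""
    # Identifiers are double-quoted; cdm54 and concept are lowercase, so the
    # quoting does not change which objects PostgreSQL resolves.
    query = f'SELECT count(1) FROM "{schema}"."{table}"'
    cur = conn.cursor()
    try:
        cur.execute(query)
        return cur.fetchone()[0]
    finally:
        cur.close()


def vocabulary_is_loaded(conn):
    """Return True when cdm54.concept holds at least one row."""
    return count_rows(conn, "cdm54", "concept") > 0
```

A count of zero, or an error that the table does not exist, means the OMOP database is not ready for the remaining steps.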
OMOP Analytics Schemas
Create the following analytics schemas within the omop DB
results
webapi
temp
synthea
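One way to create the four schemas is to generate the SQL and paste it into the pgAdmin Query Tool. The sketch below only builds the statements; it assumes PostgreSQL's CREATE SCHEMA IF NOT EXISTS form, so rerunning it is harmless. The helper name is illustrative.

```python
ANALYTICS_SCHEMAS = ("results", "webapi", "temp", "synthea")


def create_schema_statements(schemas=ANALYTICS_SCHEMAS):
    """Build one idempotent CREATE SCHEMA statement per schema name."""
    # Names are double-quoted so that words such as temp are always read
    # as identifiers.
    return [f'CREATE SCHEMA IF NOT EXISTS "{name}";' for name in schemas]


if __name__ == "__main__":
    for statement in create_schema_statements():
        print(statement)
```

Run the printed statements while connected to the omop database, then refresh the schema list in pgAdmin to confirm that all four appear.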
Instructions
This is a resource intensive process.
Close any IDEs or tools which are not required
Load Synthea CSV data to OMOP CDM
Open R Studio and open ETL-Synthea.R from <Project ROOT>/labs/omop
You may get a pop-up asking to install 'DatabaseConnector' and 'devtools'.
Click Install and wait for the installation to complete.
Path updates
MacOS
Update the path to pathToDriver
Update the path to syntheaFileLoc
Windows
Update server = "localhost/omop"
Possible Errors:
database name not included in server string but is required for PostgreSQL. Please specify server as <host>/<database>
Error in rJava::.jcall(jdbcDriver, "Ljava/sql/Connection;", "connect", : org.postgresql.util.PSQLException: FATAL: password authentication failed for user "postgres"
If localhost does not work, use "0.0.0.0/omop"
Update the path to pathToDriver (use \\ as path separator)
Update the path to syntheaFileLoc (use \\ as path separator)
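The two Windows pitfalls above (single backslashes in paths, and a server string without the database name) can be checked before pasting values into the R script. This is a small sketch with hypothetical helper names; the example path is made up.

```python
def to_r_string_path(windows_path):
    """Double each backslash so the path can be pasted into an R string."""
    return windows_path.replace("\\", "\\\\")


def is_valid_server(server):
    """Return True when server has the <host>/<database> form."""
    host, sep, database = server.partition("/")
    return bool(host) and sep == "/" and bool(database) and "/" not in database


if __name__ == "__main__":
    # Hypothetical driver folder; use your own location.
    print(to_r_string_path(r"C:\omop\jdbc"))
    print(is_valid_server("localhost/omop"))
```

A server value that fails is_valid_server is the kind that triggers the "database name not included in server string" error listed above.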
Running the script
Place your cursor on the first line of script
Run the script one instruction at a time, waiting for each instruction to complete
Some data-loading instructions take time.
Please wait patiently for each instruction to complete before running the next one.
Debugging
Timeout during data loading
Duplicate Key Value errors or Unique constraint error
Recovering
Stop the CCD Service
Take a backup of postgres_data folder
Delete the postgres_data folder
Repopulate the DB
Download the DB Zip file
Copy the Zip file to <CS595 Lab Folder>/labs/ccd folder
Extract the Zip file
Restart the CCD service
Start running the script from first line
Refer to ETL-Synthea-Success-Run.log file in Sample Logs
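The backup and delete steps of the recovery procedure above can be done in one go. The sketch below uses only the standard library; the function name is illustrative and the paths in the usage comment are placeholders. Stop the CCD service first, since copying a data folder that PostgreSQL is still writing to may produce an unusable backup.

```python
import shutil
from pathlib import Path


def backup_and_remove(data_dir, backup_dir):
    """Copy data_dir to backup_dir, then delete data_dir.

    copytree raises FileExistsError if backup_dir already exists, so an
    earlier backup is never overwritten and data_dir is left untouched.
    """
    data_dir, backup_dir = Path(data_dir), Path(backup_dir)
    shutil.copytree(data_dir, backup_dir)  # take a backup of postgres_data
    shutil.rmtree(data_dir)                # delete the postgres_data folder
    return backup_dir


# Placeholder usage; point both paths at your own lab folder:
# backup_and_remove("labs/ccd/postgres_data", "labs/ccd/postgres_data_backup")
```

After this, continue with the repopulation steps (download, copy, extract, restart).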
Capture Results for the below queries
DO $$
DECLARE
    r RECORD;
    cnt BIGINT;
BEGIN
    FOR r IN SELECT tablename FROM pg_tables WHERE schemaname = 'synthea' LOOP
        EXECUTE format('SELECT COUNT(*) FROM synthea.%I', r.tablename) INTO STRICT cnt;
        RAISE NOTICE 'Table: %, Rows: %', r.tablename, cnt;
    END LOOP;
END $$;

DO $$
DECLARE
    r RECORD;
    cnt BIGINT;
BEGIN
    FOR r IN SELECT tablename FROM pg_tables WHERE schemaname = 'cdm54' LOOP
        EXECUTE format('SELECT COUNT(*) FROM cdm54.%I', r.tablename) INTO STRICT cnt;
        RAISE NOTICE 'Table: %, Rows: %', r.tablename, cnt;
    END LOOP;
END $$;
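These queries report their results as NOTICE messages in the form 'Table: <name>, Rows: <count>'. To include them in the report as a table, the message text can be parsed with a small script. This is a sketch; the function name is illustrative, and the sample text in the usage lines is made up rather than taken from a real run.

```python
import re

# Matches the message format used by RAISE NOTICE in the queries above.
NOTICE_PATTERN = re.compile(r"Table: (?P<table>[^,\s]+), Rows: (?P<rows>\d+)")


def parse_row_counts(messages):
    """Return {table_name: row_count} for every notice found in the text."""
    return {
        match.group("table"): int(match.group("rows"))
        for match in NOTICE_PATTERN.finditer(messages)
    }


if __name__ == "__main__":
    # Made-up sample; paste the text from the pgAdmin Messages tab instead.
    sample = "NOTICE:  Table: patients, Rows: 12\nNOTICE:  Table: encounters, Rows: 345"
    for table, rows in sorted(parse_row_counts(sample).items()):
        print(f"{table}\t{rows}")
```

Run it once on the synthea output and once on the cdm54 output so the two sets of counts stay separate.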
Achilles
ACHILLES is a software tool that provides for characterization and visualization of a database conforming to the CDM. It can also be a critical resource to evaluate the composition of CDM databases in a network. ACHILLES is an R package, and produces reports based on the summary data it generates in the “Data Sources” function of ATLAS.
Populate Results Schema
Go to pgAdmin
Expand Schemas and right-click the results schema
Open the Query Tool
Open the file results_schema.sql from <Project ROOT>/labs/omop
Run the SQL from the beginning
This will take a few minutes.
Please wait patiently for all queries to run.
Open R Studio and open achilles.R from <Project ROOT>/labs/omop
Path updates
MacOS
Update the path to pathToDriver
Windows
Update server = "localhost/omop"
Possible Errors:
database name not included in server string but is required for PostgreSQL. Please specify server as <host>/<database>
Error in rJava::.jcall(jdbcDriver, "Ljava/sql/Connection;", "connect", : org.postgresql.util.PSQLException: FATAL: password authentication failed for user "postgres"
If localhost does not work, use "0.0.0.0/omop"
Update the path to pathToDriver (use \\ as path separator)
Run instructions one by one
Ignore the following error, which appears while running the last instruction (Achilles::achilles):
An error occurred while the 'DatabaseConnector' package was updating the RStudio Connections pane:
Error in NULL: host must be a single element of type 'character'
If necessary, these warnings can be squelched by setting `options(rstudio.connectionObserver.errorsSuppressed = TRUE)`.
Populate Concept Counts
Go to pgAdmin
Expand Schemas and right-click the results schema
Open the Query Tool
Open the file concept_counts.sql from <Project ROOT>/labs/omop
Run the SQL from the beginning
This will take a few minutes.
Please wait patiently for all queries to run.
Capture Counts
SELECT count(*) FROM achilles_results;
SELECT count(*) FROM achilles_results_dist;
SELECT count(*) FROM achilles_analysis;
For more information on Achilles refer to:
Data Quality Dashboard
DATA QUALITY DASHBOARD applies a harmonized data quality assessment terminology to data that has been standardized in the OMOP Common Data Model. Where ACHILLES runs characterization analyses to provide an overall visual understanding of a CDM instance, the DQD goes table by table and field by field to quantify the number of records in a CDM that do not conform to the given specifications. In all, over 1,500 checks are performed, each one organized into the Kahn framework. For each check, the result is compared to a threshold whereby a FAIL is considered to be any percentage of violating rows falling above that value.
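The threshold rule in the last sentence can be illustrated with a few lines of Python. This is a simplified sketch of the comparison only, not the DQD implementation: the function and parameter names are hypothetical, and it covers only PASS and FAIL.

```python
def check_status(num_violated_rows, num_denominator_rows, threshold_pct):
    """Return "FAIL" when the percentage of violating rows exceeds the threshold."""
    if num_denominator_rows == 0:
        pct_violated = 0.0  # nothing to check, so nothing violates
    else:
        pct_violated = 100.0 * num_violated_rows / num_denominator_rows
    return "FAIL" if pct_violated > threshold_pct else "PASS"


if __name__ == "__main__":
    # 6 violating rows out of 100 is 6%, which is above a 5% threshold.
    print(check_status(6, 100, 5))
    # Exactly 5% is not above the threshold, so the check passes.
    print(check_status(5, 100, 5))
```

With a threshold of 0, a single violating row is enough to fail the check.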
Open R Studio and open dqd.R from <Project ROOT>/labs/omop
Path updates
MacOS
Update the path to pathToDriver
Windows
Update server = "localhost/omop"
Possible Errors:
database name not included in server string but is required for PostgreSQL. Please specify server as <host>/<database>
Error in rJava::.jcall(jdbcDriver, "Ljava/sql/Connection;", "connect", : org.postgresql.util.PSQLException: FATAL: password authentication failed for user "postgres"
If localhost does not work, use "0.0.0.0/omop"
Update the path to pathToDriver (use \\ as path separator)
Run instructions one by one
Ignore any errors that you see while running the executeDqChecks
Wait for the entire process to complete
For more information on DQD refer to: https://ohdsi.github.io/DataQualityDashboard/index.html
Analysis
<Project ROOT>/labs/omop
Open omop_analysis.ipynb
Read Through
Example 1 - Age and Gender Distributions
Example 2 - Procedure Count by Age Group
Implement
Exercise 1 - Condition Prevalence
Exercise 2 - Most Prescribed Drugs
Exercise 3 - Length of Stay
Exercise 4 - Condition-Drug Co-Occurrence
Submission
Create a zip file with the below submission items and submit one zip file.
Short Report (1–2 pages PDF)
Document difficulties or errors you encountered, and how you resolved them.
Document your observations on running the script and queries.
Results for Synthea and CDM54 ETL loading - Step 2
Counts for Achilles - Step 3
Screenshots of Data Quality Dashboard - Step 4
Data Quality Assessment (Overview)
METADATA
CSV download of Results
Analysis
Updated Code File
Screenshots of Graphs for Exercise 1 to Exercise 4
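The submission items listed above can be bundled into the single zip file with a short script. This is a sketch; the function name is illustrative and the file names in the usage comment are placeholders for your own files. It checks that every item exists before writing anything, so a missing file does not leave a partial archive behind.

```python
import zipfile
from pathlib import Path


def build_submission(zip_path, items):
    """Write all items into one zip file and return its path."""
    zip_path = Path(zip_path)
    paths = [Path(item) for item in items]
    missing = [str(path) for path in paths if not path.is_file()]
    if missing:
        raise FileNotFoundError(f"Missing submission items: {missing}")
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for path in paths:
            archive.write(path, arcname=path.name)  # store without folder prefix
    return zip_path


# Placeholder usage; list your own report, screenshots, CSV, and notebook:
# build_submission("submission.zip", ["report.pdf", "omop_analysis.ipynb"])
```

Because files are stored under their base names, give each item a distinct file name.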
Sample Logs