Lab Instructions

OMOP Lab Instructions

Prerequisites

  1. Updated Code and Environment

    1. Checked out Lab Repo from GitHub CS595 Lab RepoStep

    2. Pull the code to get the latest updates for GitHub Repo

    3. Python version 3.10 or later

    4. A Python virtual environment to link and use for the project

    5. Do Git Pull on lab repo to make sure you have the latest updates

  2. R Environment Set up the R environment and verify the installation.

  3. Patient Data and JDBC connector

    1. Create OMOP folder

    2. Download Synthetic Patients

      1. Move the zip file to OMOP folder

      2. Extract the zip file

    3. Download JDBC jar

  4. OMOP DB Up and Running

    1. Start CCD Services

    2. Connect to AODB via PgAdmin

    3. Check the OMOP data is available in OMOP Database - CDM54 Schema

      SELECT count(1) FROM cdm54.concept;

  5. OMOP Analytics Schemas Create the below mentioned analytics schemas within omop DB

    1. results

    2. webapi

    3. temp

    4. synthea

Instructions

  1. This is a resource intensive process.

    1. Close any IDEs or tools which are not required

  2. Load Synthea CSV data to OMOP CDM

    1. Open and open ETL-Synthea.R from <Project ROOT>/labs/omop

    2. You may get a pop up to install 'DatabaseConnector', 'devtools'.

      1. Click on Install and wait for installation to complete.

      2. Path updates

        1. MacOS

          1. Update the path to pathToDriver

          2. Update the path to syntheaFileLoc

        2. Windows

          1. Update server = "localhost/omop"

            Possible Errors:
            
            database name not included in server string but is required for PostgreSQL. Please specify server as <host>/<database>
            
            Error in rJava::.jcall(jdbcDriver, "Ljava/sql/Connection;", "connect",  : 
              org.postgresql.util.PSQLException: FATAL: password authentication failed for user "postgres"

            1. If localhost is not working use "0.0.0.0\omop"

          2. Update the path to pathToDriver (use \\ as path seprator)

          3. Update the path to syntheaFileLoc (use \\ as path seprator)

      3. Running the script

        1. Place your cursor on the first line of script

        2. Run the script instruction by instruction by waiting for the instruction to complete

        3. Some instructions related to data loading takes time.

        4. Please wait patiently for each instruction to complete before running the next one.

      4. Debugging

        1. Timeout During loading data

        2. Duplicate Key Value errors or Unique constraint error

        3. Recovering

          1. Stop the CCD Service

          2. Take a backup of postgres_data folder

          3. Delete the postgres_data folder

          4. Repopulate the DB

            • Download the DB Zip file

            • Copy the Zip file to <CS595 Lab Folder>/labs/ccd folder

            • Extract the Zip file

          5. Restart the CCD service

          6. Start running the script from first line

      5. Refer to ETL-Synthea-Success-Run.log file in Sample Logs

    3. Capture Results for the below queries

      DO $$
      DECLARE
          r RECORD;
          cnt BIGINT;
      BEGIN
          FOR r IN 
              SELECT tablename FROM pg_tables WHERE schemaname = 'synthea'
          LOOP
              EXECUTE format('SELECT COUNT(*) FROM synthea.%I', r.tablename)
              INTO STRICT cnt;
      
              RAISE NOTICE 'Table: %, Rows: %', r.tablename, cnt;
          END LOOP;
      END $$;
      

      DO $$
      DECLARE
          r RECORD;
          cnt BIGINT;
      BEGIN
          FOR r IN 
              SELECT tablename FROM pg_tables WHERE schemaname = 'cdm54'
          LOOP
              EXECUTE format('SELECT COUNT(*) FROM cdm54.%I', r.tablename)
              INTO STRICT cnt;
      
              RAISE NOTICE 'Table: %, Rows: %', r.tablename, cnt;
          END LOOP;
      END $$;
      

  3. Achilles ACHILLES is a software tool that provides for characterization and visualization of a database conforming to the CDM. It can also be a critical resource to evaluate the composition of CDM databases in a network. ACHILLES is an R package, and produces reports based on the summary data it generates in the “Data Sources” function of ATLAS.

    1. Populate Results Schema

      1. Go to PGAdmin

      2. Expand schema and right click on results schema

      3. Open QueryTool

      4. Open File results_schema.sql from <Project ROOT>/labs/omop

      5. Run the SQL from begining

      6. This will take few minutes.

      7. Please wait patiently for all queries to run.

    2. Open R Studio and open achilles.R from <Project ROOT>/labs/omop

      1. Path updates

        1. MacOS

          1. Update the path to pathToDriver

        2. Windows

          1. Update server = "localhost/omop"

            Possible Errors:
            
            database name not included in server string but is required for PostgreSQL. Please specify server as <host>/<database>
            
            Error in rJava::.jcall(jdbcDriver, "Ljava/sql/Connection;", "connect",  : 
              org.postgresql.util.PSQLException: FATAL: password authentication failed for user "postgres"

          2. If localhost is not working use "0.0.0.0/omop"

          3. Update the path to pathToDriver (use \\ as path seprator)

      2. Run instructions one by one

        1. Ignore the below error that you see while running the last instruction

          Achilles::achilles
          
          An error occurred while the 'DatabaseConnector' package was updating the RStudio Connections pane:
          Error in NULL: host must be a single element of type 'character'
          If necessary, these warnings can be squelched by setting `options(rstudio.connectionObserver.errorsSuppressed = TRUE)`.
    3. Populate Concept Counts

      1. Go to PGAdmin

        1. Expand schema and right click on results schema

        2. Open QueryTool

        3. Open File concept_counts.sql from <Project ROOT>/labs/omop

        4. Run the SQL from begining

        5. This will take few minutes.

        6. Please wait patiently for all queries to run.

    4. Capture Counts

      select count(*) achilles_results;
      select count(*) achilles_results_dist;
      select count(*) achilles_analysis;

  4. Data Quality Dashboard DATA QUALITY DASHBOARD applies a harmonized data quality assessment terminology to data that has been standardized in the OMOP Common Data Model. Where ACHILLES runs characterization analyses to provide an overall visual understanding of a CDM instance, the DQD goes table by table and field by field to quantify the number of records in a CDM that do not conform to the given specifications. In all, over 1,500 checks are performed, each one organized into the Kahn framework. For each check, the result is compared to a threshold whereby a FAIL is considered to be any percentage of violating rows falling above that value.

    1. Open R Studio and open dqd.R from <Project ROOT>/labs/omop

    2. Path updates

      1. MacOS

        1. Update the path to pathToDriver

      2. Windows

        1. Update server = "localhost/omop"

          Possible Errors:
          
          database name not included in server string but is required for PostgreSQL. Please specify server as <host>/<database>
          
          Error in rJava::.jcall(jdbcDriver, "Ljava/sql/Connection;", "connect",  : 
            org.postgresql.util.PSQLException: FATAL: password authentication failed for user "postgres"

        2. If localhost is not working use "0.0.0.0/omop"

        3. Update the path to pathToDriver (use \\ as path seprator)

      3. Run instructions one by one

        1. Ignore any errors that you see while running the executeDqChecks

        2. Wait for the entire process to complete

    3. For more information on DQD refer to: https://ohdsi.github.io/DataQualityDashboard/index.html

  5. Analysis

    1. <Project ROOT>/labs/omop

    2. Open omop_analysis.ipynb

    3. Read Through

      1. Example 1 - Age and Gender Distributions

      2. Example 2 - Procedure Count by Age Group

    4. Implement

      1. Exercise 1 - Condition Prevalence

      2. Exercise 2 - Most Prescribed Drugs

      3. Exercise 3 - Length of Stay

      4. Exercise 4 - Condition-Drug Co-Occurrence

Submission

Create a zip file with the below submission items and submit one zip file.

  1. Short Report (1–2 pages PDF)

    1. Document difficulties or errors you encountered, and how you resolved them.

    2. Document your observations on running the script and queries.

  2. Results for Synthea and CDM54 ETL loading - Step 2

  3. Counts for Achilles from - Step 3

  4. Screenshots of Data Quality Dashboard - Step 4

    1. Data Quality Assessment (Overview)

    2. METADATA

    3. CSV download of Results

  5. Analysis

    1. Updated Code File

    2. Screenshot of Graphs for Exercise 1 to Excercise 4

Sample Logs

Last updated