Lab Instructions

Lab Instructions for Clinical Ontology Tokenization

Prerequisites

  1. OpenRouter API Key. You will use the free models available on OpenRouter for this exercise.

    1. Sign up for OpenRouter

    2. Go to Profile (Top Right corner) --> Keys

    3. Click on Create Key

    4. Fill in the Name and Credit Limit ($5), then click Create Key

    5. Copy the Key

  2. Updated Code and Environment

    1. Check out the lab repo from GitHub (CS595 Lab Repo)

    2. Run git pull to get the latest updates from the GitHub repo

    3. Python version 3.10 or later

    4. A Python virtual environment set up for the project


  3. Setup and Verify LOF Services

    1. Open <Project Root>

    2. Activate the python virtual environment

    3. Go to the lof folder

    4. Edit the .env file: update the client_id and client_secret values with the credentials you received

      client_id=
      client_secret=
    5. Install requirements: pip install -r requirements.txt

    6. Run services.py

      1. You should see the message: LoF Services verified successfully

    7. Possible error messages and resolutions:

      1. Missing or Incorrect client_id

        1. Failed to get LoF auth token: 400 : {"error":"Invalid client ID"}

        2. Resolution: update the .env file with the correct client_id

      2. Missing or Incorrect client_secret

        1. Failed to get LoF auth token: 401 : {"error":"Unauthorized: Client authentication failed","status":401}

        2. Resolution: update the .env file with the correct client_secret
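As a rough illustration of what services.py verifies, the sketch below reads the credentials from .env and exchanges them for an auth token. The token URL, payload fields, and helper names here are assumptions for illustration, not the actual LoF API.

```python
# Hedged sketch: how a verification script like services.py might read .env
# credentials and request an auth token. The token endpoint URL and payload
# shape below are placeholders, not the real LoF API.

def parse_env(path=".env"):
    """Minimal .env parser: KEY=value lines; blanks and # comments ignored."""
    env = {}
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            env[key.strip()] = value.strip()
    return env

def get_lof_token(env, token_url="https://example.invalid/oauth/token"):
    """Exchange client credentials for a token (placeholder URL and payload)."""
    import requests  # deferred so the .env parsing sketch runs standalone

    resp = requests.post(token_url, data={
        "client_id": env.get("client_id", ""),
        "client_secret": env.get("client_secret", ""),
        "grant_type": "client_credentials",
    })
    if resp.status_code != 200:
        # Mirrors the lab's error pattern: 400 for an invalid client ID,
        # 401 when client authentication fails.
        raise RuntimeError(
            f"Failed to get LoF auth token: {resp.status_code} : {resp.text}")
    return resp.json()["access_token"]
```

If services.py reports one of the errors above, re-check the two lines in .env before rerunning.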

Tokenization lab

In this lab you will:

  1. Retrieve tokens for two sample notes

    1. Using google/gemini-2.0-flash-lite-preview-02-05:free and IMO

  2. Compare the tokens from both and display the results in a table

Setup Tokenization lab

  1. Open <Project Root>

  2. Activate the python virtual environment

  3. Go to /labs/tokenization

  4. Install requirements: pip install -r requirements.txt

  5. Edit medical_note_tokenizer.py

    1. Configure OPENROUTER_API_KEY (created in Prerequisites)
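For orientation, a minimal OpenRouter chat-completions call looks roughly like the sketch below. The endpoint and Authorization header follow OpenRouter's public API; the helper name and prompt variable are illustrative, and the lab's OpenRouterTokenizer may structure this differently.

```python
# Illustrative sketch of an OpenRouter chat-completions request; the lab's
# OpenRouterTokenizer may differ. The helper name is ours, not the lab's.

def build_openrouter_request(api_key, model, prompt):
    """Assemble the URL, headers, and JSON payload for a chat completion."""
    url = "https://openrouter.ai/api/v1/chat/completions"
    headers = {"Authorization": f"Bearer {api_key}"}
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return url, headers, payload

# Sending it would look like:
#   import requests
#   url, headers, payload = build_openrouter_request(
#       OPENROUTER_API_KEY, MODEL, note_text)
#   resp = requests.post(url, headers=headers, json=payload)
```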

Instructions

  1. Understand the prompt and sample response structure in constants.py

  2. Understand the following code blocks in medical_note_tokenizer.py

    1. Tokenizers: OpenRouterTokenizer and IMOTokenizer

    2. TokenizationResult

  3. Implement the code blocks in medical_note_tokenizer.py

    1. process_entity_codes

    2. display_comparison

  4. Run the Tokenizer

    streamlit run medical_note_tokenizer.py
    1. This launches the Streamlit Tokenizer web application

    2. Select sample_1.txt from the sample_notes folder

    3. Click on Tokenize and wait for results

    4. Compare the results from OpenRouter and IMO. Note down your observations.

      1. Difference in Assertion Status

      2. Difference in codes captured (ignore imo:<code>; compare the others, such as SNOMED, ICD10, ICD9, LOINC, RxNorm, CPT, etc.)

        1. In case of differences, search OHDSI Atlas (https://atlas-demo.ohdsi.org/) for the codes given by OpenRouter and IMO, and note your observations on how the representations differ. As showcased in the example below, the SNOMED code given by the OpenRouter Gemini model does not seem right: we got codes for a body structure instead of a problem/condition.

        2. Prepare a report with these observations
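The two code blocks you implement can be sketched as follows. The field names ("codes", "SYSTEM:value" strings) are assumptions about the response structure; match them to what constants.py actually defines.

```python
# Hedged sketch of process_entity_codes / display_comparison. The entity
# and code field names are assumptions; adapt them to the response
# structure defined in constants.py.

def process_entity_codes(entities):
    """Group codes by coding system, skipping imo:<code> entries."""
    by_system = {}
    for entity in entities:
        for code in entity.get("codes", []):
            system, _, value = code.partition(":")
            if system.lower() == "imo":  # lab says to ignore imo:<code>
                continue
            by_system.setdefault(system.upper(), set()).add(value)
    return by_system

def display_comparison(openrouter_codes, imo_codes):
    """Build rows comparing the two tokenizers, one per coding system."""
    rows = []
    for system in sorted(set(openrouter_codes) | set(imo_codes)):
        rows.append({
            "System": system,
            "OpenRouter": sorted(openrouter_codes.get(system, set())),
            "IMO": sorted(imo_codes.get(system, set())),
        })
    return rows  # e.g. feed into pandas.DataFrame / st.table in the app
```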

Debugging

  1. At times the OpenRouter response may be incomplete

    1. This will result in json.decoder.JSONDecodeError.

    2. Simply rerun the tokenization if the error is from OpenRouter.

    3. This should not happen for IMO

  2. At times OpenRouter response may not be in the same format as instructed in TOKEN_PROMPT

    1. You may see errors like

      1. OpenRouter API Error: list indices must be integers or slices, not str

      2. KeyError 'entities'

    2. Simply rerun the tokenization if the error is from OpenRouter.

    3. If the error persists, you may need to adjust TOKEN_PROMPT so the model returns the response in the instructed JSON format
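One way to handle both failure modes in code, instead of rerunning by hand, is a small retry wrapper; fetch below stands in for whatever function performs the actual OpenRouter call, and the function name is ours.

```python
import json

# Hedged sketch: retry truncated or malformed OpenRouter responses a few
# times before giving up. `fetch` stands in for the actual API call.
def entities_with_retry(fetch, attempts=3):
    last_err = None
    for _ in range(attempts):
        raw = fetch()  # returns the model's raw JSON text
        try:
            data = json.loads(raw)      # incomplete text -> JSONDecodeError
            return data["entities"]     # wrong shape -> KeyError / TypeError
        except (json.JSONDecodeError, KeyError, TypeError) as err:
            last_err = err              # mirrors the "simply rerun" advice
    raise last_err
```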

Submission

Create a single zip file containing the submission items below and submit it.

  1. Short Report (1–2 pages PDF)

    1. Document difficulties or errors you encountered, and how you resolved them.

    2. Document your observations on using an LLM to tokenize/code and using IMO.

  2. Observations on code differences for sample_1.txt and sample_2.txt

  3. Screenshots of token listing for sample_1.txt and sample_2.txt

  4. CSV download, from the Tokens Table, for sample_1.txt and sample_2.txt

    1. Hover over or expand the table to see the download option
