# Lab Instructions

## Prerequisites

1. **OpenRouter API Key**\
   You will be using the free models available on OpenRouter for this exercise.
   1. Sign up for [OpenRouter](https://openrouter.ai/)
   2. Go to Profile (Top Right corner) --> Keys
   3. Click on Create Key
   4. Fill Up Name,  Credit Limit($5) : Create Key
   5. Copy the Key
2. **Updated Code and Environment**
   1. Checked out Lab Repo from [#github-cs595-lab-repo](https://leap-of-faith-technologies.gitbook.io/cs-595-digital-healthcare-informatics-and-ai/prerequisites#github-cs595-lab-repo "mention")Step
   2. Pull the code to get the latest updates for GitHub Repo
   3. Python version 3.10 or later
   4. A Python virtual environment to link and use for the project
   5. Do Git Pull on lab repo to make sure you have the latest updates
3. **Setup and Verify LOF Services**
   1. Open \<Project Root>
   2. Activate the python virtual environment
   3. Go to `lof` folder
   4. Edit `.env file`\
      Update client\_id and client\_secret values with the credentials you have received

      ```
      client_id=
      client_secret=
      ```
   5. Install Requirements `pip install -r requirements.txt`
   6. Run `services.py`
      1. You should see the message : `LoF Services verified successfully` <br>
   7. Possible Error Messages and resolution:
      1. Missing or Incorrect client\_id
         1. `Failed to get LoF auth token: 400 : {"error":"Invalid client ID"}`
         2. `Resolution: Update correct client_id in .env file`&#x20;
      2. Missing or Incorrect client\_secret
         1. `Failed to get LoF auth token: 401 : {"error":"Unauthorized: Client authentication failed","status":401}`
         2. `Resolution: Update correct client_secret in .env file`&#x20;

## &#x20;Tokenization lab

In this lab we will be&#x20;

1. Retrieve tokens for two sample notes&#x20;
   1. Using google/gemini-2.0-flash-lite-preview-02-05:free and IMO&#x20;
2. Compare the tokens from both and display the results in a table

### Setup Tokenization lab

1. Open \<Project Root>
2. Activate the python virtual environment
3. Go to /labs/tokenization
4. Install Requirements `pip install -r requirements.txt`
5. Edit medical\_note\_tokenizer.py
   1. Configure OPENROUTER\_API\_KEY (Created in [#prerequisites](#prerequisites "mention"))

###

### Instructions

1. Understand the prompt and sample response structure in constants.py
2. Understand the below code blocks in medical\_note\_tokenizer.py
   1. Tokenizers: OpenRouterTokenizer and IMOTokenizer
   2. TokenizationResult
3. Implement the code blocks in medical\_note\_tokenizer.py
   1. process\_entity\_codes
   2. display\_comparison
4. Run the Tokenizer

   ```
   streamlit run medical_note_tokenizer.py
   ```

   1. This launches the streamlit Tokenizer web application
   2. Select sample 1.txt from sample\_notes folder<br>

      <figure><img src="https://1416543717-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FsLrPphE1zKqQYozL5Ue3%2Fuploads%2FTDCMlpoOcARBWowamdtT%2Fimage.png?alt=media&#x26;token=ac12cb49-f56c-4f98-aeb2-257408df4265" alt=""><figcaption></figcaption></figure>
   3. Click on Tokenize and wait for results<br>

      <figure><img src="https://1416543717-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FsLrPphE1zKqQYozL5Ue3%2Fuploads%2FR937EMRB2YizACYLAIEA%2Fimage.png?alt=media&#x26;token=bedae31d-24c7-4234-bd1c-388d486e50a0" alt=""><figcaption></figcaption></figure>

      <figure><img src="https://1416543717-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FsLrPphE1zKqQYozL5Ue3%2Fuploads%2FvOIfOlD0nchkspKtoMpZ%2Fimage.png?alt=media&#x26;token=a78bc74d-b110-4bd8-a340-f9159fc18406" alt=""><figcaption></figcaption></figure>
   4. Compare the results from OpenRouter and IMO. Note down your observations.
      1. Difference in Assertion Status
      2. Difference in codes captured (Ignore imo:\<code>. Compare for others like SNOMED, ICD10, ICD9, LOINC, RxNORM, CPT etc..)
         1. In case of differences search (<https://atlas-demo.ohdsi.org/>) for codes given by OpenRouter and IMO, note down your observations on what's difference of representation. As showcased in the example below the SNOMED code given by OpenRouter gemini model doesn't seem to be right. We have the got the codes for body structure instead of a problem/condition.<br>

            <figure><img src="https://1416543717-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FsLrPphE1zKqQYozL5Ue3%2Fuploads%2F5MVh3utuzWqa3Wjunj7l%2Fimage.png?alt=media&#x26;token=3f2ef814-d0e7-4a64-bafa-12b5a8a02a43" alt=""><figcaption></figcaption></figure>

            <figure><img src="https://1416543717-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FsLrPphE1zKqQYozL5Ue3%2Fuploads%2FUslUA8Aawlxw7fYU7ybU%2Fimage.png?alt=media&#x26;token=660ec0e4-43f8-4ff4-b650-efbd3eff32ae" alt=""><figcaption></figcaption></figure>

            <figure><img src="https://1416543717-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FsLrPphE1zKqQYozL5Ue3%2Fuploads%2FjMbyv6rfwwAtdkdOgIcd%2Fimage.png?alt=media&#x26;token=4ba6f52c-a3d6-4a32-874d-2a4a0a44ca08" alt=""><figcaption></figcaption></figure>
         2. Prepare a report with these observations

## Debugging

1. At times the OpenRouter response may be incomplete
   1. This will result in json.decoder.JSONDecodeError.&#x20;
   2. Simply rerun if this is for OpenRouter.&#x20;
   3. This should not happen for IMO
2. At times OpenRouter response may not be in the same format as instructed in TOKEN\_PROMPT
   1. You may see errors like&#x20;
      1. OpenRouter API Error: list indices must be integers or slices, not str
      2. KeyError 'entities'
   2. Simply rerun if this is for OpenRouter.&#x20;
   3. If the error is persistent, you may need to adjust TOKEN\_PROMPT to give the response as per instructed JSON format

## Submission

**Create a zip file with the below submission items and submit one zip file.**

1. Short Report (1–2 pages PDF)
   1. Document difficulties or errors you encountered, and how you resolved them.
   2. Document your observations on using an LLM to tokenize/code and using IMO.
2. Observations on code differences for sample\_1.txt and sample\_2.txt
3. Screenshots of token listing for sample\_1.txt and sample\_2.txt
4. CSV download, from the Tokens Table, for sample\_1.txt and sample\_2.txt
   1. You can mouse-hover or expand the table to see the download option<br>

      <figure><img src="https://1416543717-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FsLrPphE1zKqQYozL5Ue3%2Fuploads%2FtyskP3HmO0z9kQg8YQyQ%2Fimage.png?alt=media&#x26;token=13e3f052-0ef0-4c83-b20c-2c4ae8aba4d0" alt=""><figcaption></figcaption></figure>
