Feature #173: ICD-10 pipeline optimization (search_terms anatomical expansion, debug output, larger candidate count)

Added by Sashiba Chou about 1 month ago. Updated about 1 month ago.

Status: Closed
Priority: Normal
Assignee:
Start date: 2026-02-20
Due date: 2026-02-20
% Done: 100%
Estimated time: 10:00 h
Spent time:

Description

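h1. Problem

  1. The search_terms produced by the Stage 1 LLM added almost no expansion (all 20 of 20 test cases were a literal on→of substitution). Colloquial body-part names (palm, shin, scalp, etc.) were not mapped to ICD-10 standard terms (hand, lower leg, head), so the second query of the Stage 2 vector search contributed nothing.
  2. per_query_k=30 was too low: for some body parts the correct ICD-10 code ranked below 30 (e.g. S61.412A, laceration of the left palm).
  3. Only the TOP 3 candidates were returned, which is too few for clinical use.
  4. The Stage 1 prompt examples closely resembled the test cases, so the LLM copied the examples and ignored the actual injury.
  5. Chinese substring bug: "伴有異物" in "未伴有異物" evaluates to True, so codes marked 「未伴有異物」 (without foreign body) were removed by mistake.
  6. There was no debug tool covering the whole pipeline, which made prompt tuning difficult.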

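h1. Changes

A. Hybrid anatomical expansion of search_terms (new feature)

  • Added the _ANATOMY_SYNONYM lookup table (~25 mappings from colloquial body-part names to ICD-10 standard terms)
  • Added the _expand_anatomy_queries() method, which programmatically generates extra vector-search queries in Stage 2
  • Revised the Stage 1 prompt: instruction #6 explicitly asks the LLM to map body parts, and every example now demonstrates the mapping
  • Two-pronged design: when the LLM maps correctly, the programmatic queries are merely redundant (search_codes_multi deduplicates by code); when it does not, the programmatic mapping acts as a fallback
  • Test results: programmatic expansion was triggered in 12 of 20 scenarios, and the quality of the LLM's search_terms improved substantially as well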
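
The programmatic half of this expansion can be sketched as follows. This is a minimal illustration, not the code in backend/rag_service.py: the table holds only the mappings quoted in the Stage 1 prompt (the real _ANATOMY_SYNONYM has about 25 entries), and the function name, signature, and whole-word matching are assumptions.

```python
import re

# Illustrative subset only; these pairs come from instruction #6 of the
# Stage 1 prompt. The real _ANATOMY_SYNONYM table is larger.
_ANATOMY_SYNONYM = {
    "palm": "hand",
    "shin": "lower leg",
    "calf": "lower leg",
    "temple": "head",
    "scalp": "head",
    "forehead": "head",
    "sole": "foot",
    "heel": "foot",
}


def expand_anatomy_queries(injury_text: str) -> list[str]:
    """Return extra search queries with colloquial body parts replaced
    by their ICD-10 standard term (hypothetical helper)."""
    queries = []
    for colloquial, standard in _ANATOMY_SYNONYM.items():
        # Whole-word, case-insensitive match so "sole" does not hit "console".
        pattern = re.compile(rf"\b{re.escape(colloquial)}\b", re.IGNORECASE)
        if pattern.search(injury_text):
            queries.append(pattern.sub(standard, injury_text))
    return queries
```

An injury with no colloquial term yields an empty list, so in that case only the original text and the LLM's search_terms are searched.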

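B. Stage 1 prompt hardening

  • A three-step HOLISTIC REVIEW (STEP A/B/C) replaces section-by-section extraction
  • Added Rule 2: MUST extract ALL injuries (prevents omissions)
  • Added Rule 7: Do NOT copy examples (prevents copying)
  • Examples now use different body parts (right sole / left shin / right temple / left calf) and demonstrate the term mapping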

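C. Stage 2 programmatic filtering hardened

  • Filtering now covers 4 rules: D/S suffix, laterality, foreign body (including the Chinese substring fix), deep structure
  • Fixed the Chinese substring bug: when "伴有異物" matches but "未伴有異物" also matches, the code is treated as "without foreign body" and kept
  • Added a removed_log parameter that records why each candidate was removed
  • Added search_codes_multi(), which merges and deduplicates results from multiple queries (keeping the best distance per code)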
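
The substring pitfall and one possible guard are sketched below. The helper name and the exact English phrases checked are assumptions for illustration; the actual guard in backend/rag_service.py may be written differently.

```python
def mentions_foreign_body(description: str) -> bool:
    """Return True only when a code description affirms a foreign body
    (hypothetical helper)."""
    text = description.lower()
    # "伴有異物" (with foreign body) is a substring of "未伴有異物"
    # (without foreign body), so the negated form must be tested first.
    if "未伴有異物" in text or "without foreign body" in text:
        return False
    return "伴有異物" in text or "with foreign body" in text


def keep_when_no_foreign_body(description: str) -> bool:
    """For an injury that has no foreign body, keep every candidate that
    does not affirm one, including the 'without foreign body' codes."""
    return not mentions_foreign_body(description)
```

Testing the negated phrase before the positive one is what keeps the 「未伴有異物」 codes from being dropped.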

D. Larger candidate counts

  • per_query_k: 30 → 50
  • Post-filter cap: 30 → 50
  • Final output: TOP 3 → TOP 5

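E. Debug MD output

  • Added the _write_debug_md() method, which writes debug_icd10_output.md automatically after every ICD-10 extraction
  • Contents: full text of the medical record, Stage 1 prompt + raw response, Stage 2 raw/removed/filtered candidates, Stage 3 prompt + raw response, final predictions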
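
A debug writer of this kind might look like the sketch below. The function signature, section titles, and Markdown layout are assumptions; only the output file name and the list of contents come from this ticket.

```python
from pathlib import Path


def write_debug_md(path, trauma_note, stages, predictions):
    """Write one Markdown report for a single extraction run.

    stages is a list of (title, body) pairs, for example
    ("Stage 1 prompt", "...") or ("Stage 2 removed candidates", "...").
    All parameter names here are hypothetical.
    """
    lines = ["# ICD-10 debug output", "", "## Trauma note", "", trauma_note, ""]
    for title, body in stages:
        lines += [f"## {title}", "", body, ""]
    lines += ["## Final predictions", ""]
    lines += [f"{rank}. {code}" for rank, code in enumerate(predictions, start=1)]
    # The file is overwritten on every run, so it always shows the latest one.
    Path(path).write_text("\n".join(lines) + "\n", encoding="utf-8")
```

Writing UTF-8 explicitly matters here because the candidate descriptions can contain Chinese text.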

h1. Affected files

  • backend/rag_service.py: all changes are confined to this file

h1. Prompts

h2. Stage 1


You are a medical information extractor specializing in Emergency Room trauma notes.

=== TRAUMA NOTE ===
{trauma_note}
=== END OF NOTE ===

=== HOLISTIC REVIEW (CRITICAL) ===
Do NOT extract injuries section-by-section. Follow these steps:

STEP A - Identify base injuries from Physical Examination (wound type + body location + laterality).
STEP B - Scan Mechanism of Injury, History, and Symptoms for MODIFIERS that change the diagnosis:
  - Retention: Any evidence of material embedded or retained in the wound (e.g. stone debris, glass, seed, any solid object) → add "with foreign body" to the injury text.
  - Bite: Injury caused by a bite from any living being (e.g. dog bite, human bite, insect bite) → reclassify the wound as "open bite wound" instead of "laceration".
  - Crush: Injury from crushing force → classify as "crush injury".
STEP C - Merge modifiers into the corresponding base injury. Match each modifier to the CORRECT body region it belongs to.
  Example: PE says "Laceration on right forearm" + Symptoms says "Right arm pain - glass fragments in wound" → output "laceration with foreign body on right forearm".
  Example: Mechanism says "cat bite" + PE says "Laceration on right ankle" → output "open bite wound on right ankle".

=== EXTRACTION RULES ===
1. Each item = one distinct injury (one wound = one item, one fracture = one item)
2. You MUST extract ALL injuries listed in Physical Examination - do NOT skip any body region
3. Each item must include: injury type + anatomical location + laterality (if mentioned)
4. Do NOT include vital signs, patient demographics, or non-injury information
5. Keep each injury description SHORT (under 15 words)
6. For each injury, also provide "search_terms": map the injury to ICD-10 standard anatomy terms.
   - Body part mapping: palm→hand, shin/calf→lower leg, temple/scalp/forehead→head, sole/heel→foot
   - Injury type mapping: laceration→"Open wound", bruise→"Contusion"
   - Write as a single natural English phrase, NOT comma-separated keywords.
7. Do NOT copy examples below - generate results from the actual trauma note above

=== OUTPUT FORMAT (JSON only) ===
{"injuries": [
  {"text": "laceration on right sole", "search_terms": "Open wound of right foot"},
  {"text": "abrasion on left shin", "search_terms": "Abrasion of left lower leg"},
  {"text": "contusion with foreign body on right temple", "search_terms": "Contusion with foreign body of head"},
  {"text": "open bite wound on left calf", "search_terms": "Open bite of left lower leg"}
]}

YOUR RESPONSE:

h2. Stage 3


  You are an expert ICD-10-CM medical coder working in an Emergency Room setting.

  === MANDATORY RULES ===

  RULE 1 (Initial Encounter ONLY):
  - You MUST select codes ending in 'A' (Initial encounter)
  - REJECT any code ending in 'D' (Subsequent encounter) or 'S' (Sequela)
  - Exception: only if text explicitly says "follow-up", "subsequent visit", or "sequela"

  RULE 2 (Laterality):
  - Match the exact laterality mentioned (left/right)
  - If not specified, use 'unspecified' laterality codes

  RULE 3 (Specificity Matching):
  - If the injury text names a SPECIFIC body part (e.g. "palm"), prefer the code for that exact part over a general parent (e.g. "hand, unspecified").
  - If the injury text is VAGUE (e.g. "hand injury"), prefer "unspecified" codes over overly specific ones.

  === SELECTION ===
  - Select the TOP 5 codes ranked by specificity match (1 = best match)
  - If fewer than 5 valid codes exist, return only the valid ones


  TASK

  === TASK ===
  You are given a trauma note and multiple injuries, each with pre-filtered ICD-10-CM code candidates.
  For EACH injury, select the TOP 5 most appropriate codes.

  TRAUMA NOTE:
  {trauma_note}

  INJURIES AND THEIR CANDIDATES:
  Injury: "{injury_text_1}"
  Candidates:
    1. S61.412A: Laceration without foreign body of left hand ...
    2. S61.402A: Unspecified open wound of left hand ...
    ...

  Injury: "{injury_text_2}"
  Candidates:
    1. S01.01XA: Laceration without foreign body of scalp ...
    ...

  === OUTPUT FORMAT (JSON only) ===
  {
    "results": [
      {
        "injury": "",
        "selected": [
          {"rank": 1, "code": "CODE", "confidence": "high"},
          {"rank": 2, "code": "CODE", "confidence": "high"},
          {"rank": 3, "code": "CODE", "confidence": "medium"},
          {"rank": 4, "code": "CODE", "confidence": "low"},
          {"rank": 5, "code": "CODE", "confidence": "low"}
        ]
      }
    ]
  }

  YOUR RESPONSE:
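
A response in the Stage 3 output format can be parsed defensively along the following lines. This is a sketch under assumptions: the function name, the brace-slicing approach for stripping stray text around the JSON, and the rule of discarding codes that were not among the candidates are illustrative and are not confirmed by this ticket.

```python
import json


def parse_stage3_response(raw, allowed_codes, top_n=5):
    """Map each injury to at most top_n selected codes, ordered by rank.

    raw is the LLM's raw text; allowed_codes is the set of candidate codes
    that were offered in the prompt (both names are hypothetical).
    """
    start, end = raw.find("{"), raw.rfind("}")
    if start == -1 or end < start:
        return {}
    try:
        data = json.loads(raw[start:end + 1])
    except json.JSONDecodeError:
        return {}
    if not isinstance(data, dict):
        return {}

    parsed = {}
    for result in data.get("results", []):
        selected = sorted(result.get("selected", []),
                          key=lambda item: item.get("rank", top_n + 1))
        codes = []
        for item in selected:
            code = item.get("code")
            # Drop codes the model invented and any duplicates.
            if code in allowed_codes and code not in codes:
                codes.append(code)
        parsed[result.get("injury", "")] = codes[:top_n]
    return parsed
```

Returning an empty dict on malformed output lets the caller decide whether to retry or fall back.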

Updated by Sashiba Chou about 1 month ago (#1)

  • Status changed from New to Closed

Supplementary task details

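h1. Stage 2 four-rule filtering

  1. Stage 2 programmatic filtering had only two rules, D/S suffix and foreign body. Without laterality and deep-structure filtering, many irrelevant candidate codes reached Stage 3, wasting LLM tokens and disturbing the ranking.
  2. The foreign-body check used keyword matching (glass/metal/wood...), which cannot detect semantic descriptions such as "stone debris inside wound". In addition, the Chinese substring 「伴有異物」 also matched 「未伴有異物」, producing false positives.

A. Stage 2 programmatic filtering: 2 rules → 4 rules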

┌────────────────────────┬──────────────────────┬──────────────────────────────────────────────────────────────────────────────┐
│ Rule                   │ Before               │ After                                                                        │
├────────────────────────┼──────────────────────┼──────────────────────────────────────────────────────────────────────────────┤
│ Rule 1: D/S suffix     │ ✅ Present           │ ✅ Kept                                                                      │
├────────────────────────┼──────────────────────┼──────────────────────────────────────────────────────────────────────────────┤
│ Rule 2: Laterality     │ ❌ None              │ ✅ Added: compares left/right in the injury text against left/right in the   │
│                        │                      │ candidate code description; removes codes for the opposite side              │
├────────────────────────┼──────────────────────┼──────────────────────────────────────────────────────────────────────────────┤
│ Rule 3: Foreign body   │ ✅ Keyword matching  │ ✅ Rewritten: trusts the Stage 1 LLM's semantic output                       │
│                        │                      │ (HOLISTIC REVIEW STEP B) and checks "foreign body" in injury_text;           │
│                        │                      │ fixes the Chinese substring bug ("伴有異物" in "未伴有異物")                 │
│                        │                      │ with a guard condition                                                       │
├────────────────────────┼──────────────────────┼──────────────────────────────────────────────────────────────────────────────┤
│ Rule 4: Deep structure │ ❌ None              │ ✅ Added: if the injury does not mention a deep structure                    │
│                        │                      │ (artery/vein/nerve/tendon/muscle), removes candidates containing those       │
│                        │                      │ keywords; matches in both English and Chinese                                │
│                        │                      │ (EN: artery/vein/nerve/tendon/muscle/vessel/arch;                            │
│                        │                      │ ZH: 動脈/靜脈/神經/肌腱/肌肉)                                                │
└────────────────────────┴──────────────────────┴──────────────────────────────────────────────────────────────────────────────┘

Added a removed_log parameter that records why each candidate was removed (used by the debug output).
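
Rules 1, 2, and 4 together with removed_log can be sketched as below; Rule 3 is left out to keep the example short. The function name, the (code, description) tuple shape, the log entry format, and the choice to skip the laterality rule when the injury names both sides are all assumptions.

```python
import re

# Keywords listed for Rule 4 (English and Chinese).
DEEP_KEYWORDS = ("artery", "vein", "nerve", "tendon", "muscle", "vessel", "arch",
                 "動脈", "靜脈", "神經", "肌腱", "肌肉")
OPPOSITE = {"left": "right", "right": "left"}


def filter_candidates(injury_text, candidates, removed_log=None):
    """Filter (code, description) pairs; append removal reasons to removed_log."""
    injury = injury_text.lower()
    sides = [s for s in OPPOSITE if re.search(rf"\b{s}\b", injury)]
    # Assumption: only filter by side when the injury names exactly one side.
    wrong_side = OPPOSITE[sides[0]] if len(sides) == 1 else None
    # Plain substring matching; a stricter version would match whole words.
    injury_has_deep = any(k in injury for k in DEEP_KEYWORDS)

    kept = []
    for code, description in candidates:
        desc = description.lower()
        reason = None
        if code.endswith(("D", "S")):
            reason = "Rule 1: D/S suffix"
        elif wrong_side and re.search(rf"\b{wrong_side}\b", desc):
            reason = "Rule 2: laterality"
        elif not injury_has_deep and any(k in desc for k in DEEP_KEYWORDS):
            reason = "Rule 4: deep structure"

        if reason is None:
            kept.append((code, description))
        elif removed_log is not None:
            removed_log.append({"code": code, "reason": reason})
    return kept
```

Passing a list as removed_log gives the debug output one entry per dropped candidate, while callers that do not need the log can omit it.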

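B. Hybrid anatomical expansion of search_terms (new feature)

  • Added the _ANATOMY_SYNONYM lookup table (~25 mappings from colloquial body-part names to ICD-10 standard terms)
  • Added the _expand_anatomy_queries() method, which programmatically generates extra vector-search queries in Stage 2
  • Revised the Stage 1 prompt: instruction #6 explicitly asks the LLM to map body parts, and every example now demonstrates the mapping
  • Two-pronged design: when the LLM maps correctly, the programmatic queries are merely redundant (search_codes_multi deduplicates by code); when it does not, the programmatic mapping acts as a fallback
  • Test results: programmatic expansion was triggered in 12 of 20 scenarios, and the quality of the LLM's search_terms improved substantially as well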

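C. Stage 1 prompt hardening

  • A three-step HOLISTIC REVIEW (STEP A/B/C) replaces section-by-section extraction and cross-checks PE + Mechanism + Symptoms across sections
  • Added Rule 2: MUST extract ALL injuries (prevents omissions)
  • Added Rule 7: Do NOT copy examples (prevents copying)
  • Examples now use different body parts (right sole / left shin / right temple / left calf) and demonstrate the term mapping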

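D. Stage 2 vector-search architecture improvements

  • Added search_codes_multi(), which merges and deduplicates results from multiple queries (keeping the best distance per code)
  • Supports three query layers: the original injury_text, the LLM's search_terms, and the programmatic synonym expansion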
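
The merge-and-deduplicate step can be sketched as follows. The function name, the search_fn callback, and the (code, distance) hit shape are assumptions standing in for the real vector store call; smaller distance is assumed to mean a better match.

```python
def search_codes_multi_sketch(queries, search_fn, per_query_k=50):
    """Run every query, keep the smallest distance per code, sort ascending.

    search_fn(query, k) is a hypothetical callback returning a list of
    (code, distance) pairs for one query.
    """
    best = {}
    for query in queries:
        for code, distance in search_fn(query, per_query_k):
            if code not in best or distance < best[code]:
                best[code] = distance
    return sorted(best.items(), key=lambda hit: hit[1])
```

Because deduplication is by code, a redundant query (for example when the LLM's search_terms already match the programmatic expansion) costs one extra search and never produces duplicate candidates.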

E. Larger candidate counts

  • per_query_k: 30 → 50
  • Post-filter cap: 30 → 50
  • Final output: TOP 3 → TOP 5

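F. Debug MD output (new feature)

  • Added the _write_debug_md() method, which writes debug_icd10_output.md automatically after every ICD-10 extraction
  • Contents: full text of the medical record, Stage 1 prompt + raw response, Stage 2 raw/removed/filtered candidates, Stage 3 prompt + raw response, final predictions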

h1. Affected files

  • backend/rag_service.py: all changes are confined to this file