Feature #173: ICD-10 Pipeline Optimization: search_terms Anatomy Expansion, Debug Output, Increased Candidate Count - ER_note_NG - Mission Center

Added by Sashiba Chou 3 months ago. Updated 3 months ago.

Status: Closed
Priority: Normal
Assignee: Sashiba Chou
Start date: 2026-02-20
Due date: 2026-02-20
% Done: 100%
Estimated time: 10:00 h
Spent time: 10:00 h

Description

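h1. Problem

* The search_terms produced by the Stage 1 LLM added almost no expansion (in 20/20 tests they were only a literal "on" → "of" substitution). Colloquial body-part names (palm, shin, scalp, etc.) were not mapped to ICD-10 standard terms (hand, lower leg, head), so the second query of the Stage 2 vector search was effectively useless.
* per_query_k=30 was too low: for some body parts the correct ICD-10 code ranked below the top 30 (e.g. S61.412A for a left palm laceration).
* Only the TOP 3 candidates were output, which is not enough for clinical use.
* The Stage 1 prompt examples closely resembled the test cases, so the LLM copied the examples and ignored the actual injuries.
* Chinese substring bug: "伴有異物" in "未伴有異物" evaluates to True, so codes described as "未伴有異物" (without foreign body) were removed by mistake.
* No tool existed to debug the full pipeline, which made prompt tuning difficult.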

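h1. Changes

A. Hybrid search_terms anatomy expansion (new feature)

* Added the _ANATOMY_SYNONYM lookup table (~25 mappings from colloquial body parts to ICD-10 standard terms).
* Added the _expand_anatomy_queries() method, which programmatically generates extra vector-search queries in Stage 2.
* Modified the Stage 1 prompt: instruction #6 explicitly asks the LLM to map body parts, and every example now demonstrates the mapping.
* Two-pronged approach: when the LLM maps correctly, the programmatic query is merely redundant (search_codes_multi deduplicates by code); when it does not, the programmatic mapping serves as a fallback.
* Test results: programmatic expansion was triggered in 12/20 scenarios, and the quality of the LLM search_terms improved substantially at the same time.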
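
The programmatic expansion described above can be sketched roughly as follows. This is a minimal illustration, not the code in backend/rag_service.py: the table holds only the eight mappings named in Stage 1 prompt instruction #6 (the real _ANATOMY_SYNONYM has about 25 entries that this ticket does not list), and the whole-word replacement strategy is an assumption.

```python
import re

# Hypothetical subset of the colloquial -> ICD-10 anatomy table. Only the
# mappings named in Stage 1 prompt instruction #6 are included here.
_ANATOMY_SYNONYM = {
    "palm": "hand",
    "shin": "lower leg",
    "calf": "lower leg",
    "temple": "head",
    "scalp": "head",
    "forehead": "head",
    "sole": "foot",
    "heel": "foot",
}


def expand_anatomy_queries(injury_text: str) -> list[str]:
    """Return extra queries with colloquial body parts swapped for ICD-10 terms.

    Assumption: whole-word, case-insensitive replacement. An empty list means
    no colloquial term was found, so no programmatic query is added.
    """
    queries: list[str] = []
    for colloquial, standard in _ANATOMY_SYNONYM.items():
        pattern = re.compile(rf"\b{re.escape(colloquial)}\b", re.IGNORECASE)
        if pattern.search(injury_text):
            expanded = pattern.sub(standard, injury_text)
            if expanded not in queries:
                queries.append(expanded)
    return queries
```

Because the extra query is merged by code later on, producing it even when the LLM already mapped the term costs one more vector search but cannot introduce duplicate candidates.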

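B. Stage 1 prompt strengthening

* A three-step HOLISTIC REVIEW (STEP A/B/C) replaces section-by-section extraction.
* Added Rule 2: MUST extract ALL injuries (prevents omissions).
* Added Rule 7: Do NOT copy examples (prevents copying).
* Examples now use different body parts (right sole / left shin / right temple / left calf) to demonstrate the term mapping.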

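C. Stage 2 programmatic filtering strengthened

* Filtering now applies four rules: D/S suffix, laterality, foreign body (including the Chinese substring fix), and deep structure.
* Fixed the Chinese substring bug: when "伴有異物" matches but "未伴有異物" also matches, the candidate is treated as "without foreign body" and kept.
* Added a removed_log parameter that records the reason each candidate was removed.
* Added search_codes_multi(), which merges and deduplicates the results of multiple queries (keeping the best distance per code).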
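
The substring pitfall and its guard can be illustrated with a small sketch. The helper name below is hypothetical, and the English "without foreign body" guard is an assumption added for symmetry; the ticket only confirms the Chinese case.

```python
def says_with_foreign_body(description: str) -> bool:
    """Return True only when a code description means 'with foreign body'.

    Guard first: "未伴有異物" (without) contains "伴有異物" (with) as a
    substring, and "without foreign body" contains "foreign body", so a
    plain `in` check would flag both kinds of description.
    """
    lowered = description.lower()
    if "未伴有異物" in description or "without foreign body" in lowered:
        return False
    return "伴有異物" in description or "foreign body" in lowered
```

Checking the negated form before the positive one keeps the logic to plain substring tests, which work the same way for Chinese and English text.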

D. Increased candidate counts

* per_query_k: 30 → 50
* Post-filter cap: 30 → 50
* Final output: TOP 3 → TOP 5

E. Debug MD output

* Added the _write_debug_md() method, which writes debug_icd10_output.md automatically after every ICD-10 extraction.
* Contents: full medical record, Stage 1 prompt + raw response, Stage 2 raw/removed/filtered candidates, Stage 3 prompt + raw response, final prediction.
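
A debug writer of this kind might look like the sketch below. The signature, the section layout, and the choice to overwrite the file on every run are assumptions for illustration; only the output file name and the list of contents come from this ticket.

```python
from pathlib import Path


def write_debug_md(path: str, sections: list[tuple[str, object]]) -> None:
    """Write (title, body) pairs as a Markdown report, overwriting `path`.

    Hypothetical helper: each pipeline artifact (prompt, raw response,
    candidate list, ...) becomes one second-level heading followed by its
    text, wrapped in a tilde fence so prompt markup is shown verbatim.
    """
    lines = ["# ICD-10 pipeline debug output", ""]
    for title, body in sections:
        lines += [f"## {title}", "", "~~~text", str(body), "~~~", ""]
    Path(path).write_text("\n".join(lines), encoding="utf-8")
```

A call would pass the artifacts in pipeline order, for example `write_debug_md("debug_icd10_output.md", [("Trauma note", note), ("Stage 1 prompt", prompt), ...])`, so the file reads top to bottom like a single run.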

h1. Affected files

backend/rag_service.py: all changes are confined to this file

h1. Prompts

h2. stage 1


You are a medical information extractor specializing in Emergency Room trauma notes.

=== TRAUMA NOTE ===
{trauma_note}
=== END OF NOTE ===

=== HOLISTIC REVIEW (CRITICAL) ===
Do NOT extract injuries section-by-section. Follow these steps:

STEP A: Identify base injuries from Physical Examination (wound type + body location + laterality).
STEP B: Scan Mechanism of Injury, History, and Symptoms for MODIFIERS that change the diagnosis:
  - Retention: Any evidence of material embedded or retained in the wound (e.g. stone debris, glass, seed, any solid object) → add "with foreign body" to the injury text.
  - Bite: Injury caused by a bite from any living being (e.g. dog bite, human bite, insect bite) → reclassify the wound as "open bite wound" instead of "laceration".
  - Crush: Injury from crushing force → classify as "crush injury".
STEP C: Merge modifiers into the corresponding base injury. Match each modifier to the CORRECT body region it belongs to.
  Example: PE says "Laceration on right forearm" + Symptoms says "Right arm pain - glass fragments in wound" → output "laceration with foreign body on right forearm".
  Example: Mechanism says "cat bite" + PE says "Laceration on right ankle" → output "open bite wound on right ankle".

=== EXTRACTION RULES ===
1. Each item = one distinct injury (one wound = one item, one fracture = one item)
2. You MUST extract ALL injuries listed in Physical Examination; do NOT skip any body region
3. Each item must include: injury type + anatomical location + laterality (if mentioned)
4. Do NOT include vital signs, patient demographics, or non-injury information
5. Keep each injury description SHORT (under 15 words)
6. For each injury, also provide "search_terms": map the injury to ICD-10 standard anatomy terms.
   - Body part mapping: palm→hand, shin/calf→lower leg, temple/scalp/forehead→head, sole/heel→foot
   - Injury type mapping: laceration→"Open wound", bruise→"Contusion"
   - Write as a single natural English phrase, NOT comma-separated keywords.
7. Do NOT copy examples below; generate results from the actual trauma note above

=== OUTPUT FORMAT (JSON only) ===
{"injuries": [
  {"text": "laceration on right sole", "search_terms": "Open wound of right foot"},
  {"text": "abrasion on left shin", "search_terms": "Abrasion of left lower leg"},
  {"text": "contusion with foreign body on right temple", "search_terms": "Contusion with foreign body of head"},
  {"text": "open bite wound on left calf", "search_terms": "Open bite of left lower leg"}
]}

YOUR RESPONSE:

h2. stage 3


  You are an expert ICD-10-CM medical coder working in an Emergency Room setting.

  === MANDATORY RULES ===

  RULE 1 (Initial Encounter ONLY):
  - You MUST select codes ending in 'A' (Initial encounter)
  - REJECT any code ending in 'D' (Subsequent encounter) or 'S' (Sequela)
  - Exception: only if text explicitly says "follow-up", "subsequent visit", or "sequela"

  RULE 2 (Laterality):
  - Match the exact laterality mentioned (left/right)
  - If not specified, use 'unspecified' laterality codes

  RULE 3 (Specificity Matching):
  - If the injury text names a SPECIFIC body part (e.g. "palm"), prefer the code for that exact part over a general parent (e.g. "hand, unspecified").
  - If the injury text is VAGUE (e.g. "hand injury"), prefer "unspecified" codes over overly specific ones.

  === SELECTION ===
  - Select the TOP 5 codes ranked by specificity match (1 = best match)
  - If fewer than 5 valid codes exist, return only the valid ones


  TASK

  === TASK ===
  You are given a trauma note and multiple injuries, each with pre-filtered ICD-10-CM code candidates.
  For EACH injury, select the TOP 5 most appropriate codes.

  TRAUMA NOTE:
  {trauma_note}

  INJURIES AND THEIR CANDIDATES:
  Injury: "{injury_text_1}"
  Candidates:
    1. S61.412A: Laceration without foreign body of left hand ...
    2. S61.402A: Unspecified open wound of left hand ...
    ...

  Injury: "{injury_text_2}"
  Candidates:
    1. S01.01XA: Laceration without foreign body of scalp ...
    ...

  === OUTPUT FORMAT (JSON only) ===
  {
    "results": [
      {
        "injury": "",
        "selected": [
          {"rank": 1, "code": "CODE", "confidence": "high"},
          {"rank": 2, "code": "CODE", "confidence": "high"},
          {"rank": 3, "code": "CODE", "confidence": "medium"},
          {"rank": 4, "code": "CODE", "confidence": "low"},
          {"rank": 5, "code": "CODE", "confidence": "low"}
        ]
      }
    ]
  }

  YOUR RESPONSE:
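
A response in the Stage 3 output format above could be consumed along these lines. This is a sketch under stated assumptions, not the parser in backend/rag_service.py: the function name is hypothetical, and the tolerance for extra text around the JSON and the check against the supplied candidate list are illustrative choices.

```python
import json


def parse_stage3_response(raw: str, candidates_by_injury: dict[str, set[str]],
                          top_n: int = 5) -> dict[str, list[str]]:
    """Extract ranked codes per injury from a raw Stage 3 LLM response.

    Assumptions: the JSON object may be surrounded by stray text, and any
    code that was not among the injury's candidates is discarded.
    """
    start, end = raw.find("{"), raw.rfind("}")
    if start == -1 or end < start:
        return {}
    try:
        data = json.loads(raw[start:end + 1])
    except json.JSONDecodeError:
        return {}
    if not isinstance(data, dict):
        return {}

    def rank_of(selection: dict) -> int:
        rank = selection.get("rank")
        return rank if isinstance(rank, int) else top_n + 1

    results: dict[str, list[str]] = {}
    for item in data.get("results", []):
        injury = item.get("injury", "")
        allowed = candidates_by_injury.get(injury, set())
        codes: list[str] = []
        for selection in sorted(item.get("selected", []), key=rank_of):
            code = selection.get("code")
            if code in allowed and code not in codes:
                codes.append(code)
        results[injury] = codes[:top_n]
    return results
```

Validating against the candidate list guards against codes the model invents, since Stage 3 is only meant to rank the pre-filtered candidates.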

#1 Updated by Sashiba Chou 3 months ago

Status changed from New to Closed

Supplementary task details

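h1. Stage 2 four-rule filtering

* Stage 2 programmatic filtering had only two rules (D/S suffix and foreign body). With no laterality or deep-structure filtering, many irrelevant candidate codes reached Stage 3, wasting LLM tokens and disturbing the ranking.
* The foreign-body check used keyword matching (glass/metal/wood...), which cannot detect semantic descriptions (such as "stone debris inside wound"), and the Chinese substring "伴有異物" matched inside "未伴有異物", producing false positives.

A. Stage 2 programmatic filtering: 2 rules → 4 rules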

|_. Rule |_. Old version |_. New version |
| Rule 1: D/S suffix | ✅ Present | ✅ Kept |
| Rule 2: Laterality | ❌ None | ✅ New: compares left/right in the injury against left/right in the candidate description and removes opposite-side codes |
| Rule 3: Foreign body | ✅ Keyword matching | ✅ Rewritten: trusts the Stage 1 LLM semantic output (HOLISTIC REVIEW STEP B) and checks "foreign body" in injury_text; fixes the Chinese substring bug ("伴有異物" in "未伴有異物") by adding a guard condition |
| Rule 4: Deep structure | ❌ None | ✅ New: if the injury does not mention a deep structure (artery/vein/nerve/tendon/muscle), removes candidates containing these keywords; matches in both English and Chinese (EN: artery/vein/nerve/tendon/muscle/vessel/arch; ZH: 動脈/靜脈/神經/肌腱/肌肉) |

Added a removed_log parameter that records the reason each candidate was removed (used by the debug output).
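
The four rules and removed_log can be sketched together as below. Function names, the candidate dict shape, the rule order, and the use of plain substring matching for deep-structure keywords are assumptions; only the rules themselves and the keyword lists come from the table above.

```python
import re

DEEP_STRUCTURE_KEYWORDS = (
    "artery", "vein", "nerve", "tendon", "muscle", "vessel", "arch",
    "動脈", "靜脈", "神經", "肌腱", "肌肉",
)


def _side(text: str):
    """Return 'left', 'right', or None when no single side is named (English only)."""
    has_left = bool(re.search(r"\bleft\b", text, re.IGNORECASE))
    has_right = bool(re.search(r"\bright\b", text, re.IGNORECASE))
    if has_left == has_right:
        return None
    return "left" if has_left else "right"


def _says_with_foreign_body(description: str) -> bool:
    # Guard: the negated phrases contain the positive ones as substrings.
    lowered = description.lower()
    if "未伴有異物" in description or "without foreign body" in lowered:
        return False
    return "伴有異物" in description or "foreign body" in lowered


def removal_reason(injury_text: str, candidate: dict):
    """Return the name of the first rule that rejects the candidate, else None."""
    injury = injury_text.lower()
    code, desc = candidate["code"], candidate["description"]
    # Rule 1: a 7-character code ending in D (subsequent) or S (sequela).
    if len(code.replace(".", "")) == 7 and code[-1] in "DS":
        return "D/S suffix"
    # Rule 2: candidate names the opposite side from the injury.
    injury_side, cand_side = _side(injury), _side(desc)
    if injury_side and cand_side and injury_side != cand_side:
        return "laterality"
    # Rule 3: injury has no foreign body but the candidate says "with".
    if "foreign body" not in injury and _says_with_foreign_body(desc):
        return "foreign body"
    # Rule 4: candidate names a deep structure the injury never mentions.
    desc_lower = desc.lower()
    if (not any(k in injury for k in DEEP_STRUCTURE_KEYWORDS)
            and any(k in desc_lower for k in DEEP_STRUCTURE_KEYWORDS)):
        return "deep structure"
    return None


def filter_candidates(injury_text, candidates, removed_log=None, limit=50):
    """Keep candidates that pass all rules; append removals to removed_log."""
    kept = []
    for cand in candidates:
        reason = removal_reason(injury_text, cand)
        if reason is None:
            kept.append(cand)
        elif removed_log is not None:
            removed_log.append({"code": cand["code"], "reason": reason})
    return kept[:limit]
```

Returning a reason string from a single function keeps the log and the filter decision in one place, so the debug output cannot drift from what was actually removed.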

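B. Hybrid search_terms anatomy expansion (new feature)

* Added the _ANATOMY_SYNONYM lookup table (~25 mappings from colloquial body parts to ICD-10 standard terms).
* Added the _expand_anatomy_queries() method, which programmatically generates extra vector-search queries in Stage 2.
* Modified the Stage 1 prompt: instruction #6 explicitly asks the LLM to map body parts, and every example now demonstrates the mapping.
* Two-pronged approach: when the LLM maps correctly, the programmatic query is merely redundant (search_codes_multi deduplicates by code); when it does not, the programmatic mapping serves as a fallback.
* Test results: programmatic expansion was triggered in 12/20 scenarios, and the quality of the LLM search_terms improved substantially at the same time.

C. Stage 1 prompt strengthening

* A three-step HOLISTIC REVIEW (STEP A/B/C) replaces section-by-section extraction and cross-checks PE + Mechanism + Symptoms across sections.
* Added Rule 2: MUST extract ALL injuries (prevents omissions).
* Added Rule 7: Do NOT copy examples (prevents copying).
* Examples now use different body parts (right sole / left shin / right temple / left calf) to demonstrate the term mapping.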

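D. Stage 2 vector-search architecture improvements

* Added search_codes_multi(), which merges and deduplicates the results of multiple queries (keeping the best distance per code).
* Supports three layers of queries: the original injury_text, the LLM search_terms, and the programmatic synonym expansion.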
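
The merge step can be sketched as follows, assuming each search hit is a dict with "code" and "distance" keys and that a smaller distance means a better match. The `search_fn` parameter is a hypothetical stand-in for the real vector-store lookup, which this ticket does not show.

```python
from typing import Callable


def search_codes_multi(
    queries: list[str],
    search_fn: Callable[[str, int], list[dict]],
    per_query_k: int = 50,
) -> list[dict]:
    """Run every query, keep the best (smallest) distance per code, sort by it.

    `search_fn(query, k)` is assumed to return up to k hits, each a dict
    containing at least "code" and "distance".
    """
    best: dict[str, dict] = {}
    for query in queries:
        for hit in search_fn(query, per_query_k):
            code = hit["code"]
            if code not in best or hit["distance"] < best[code]["distance"]:
                best[code] = hit
    return sorted(best.values(), key=lambda hit: hit["distance"])
```

With the three query layers, a call would pass something like `[injury_text, search_terms, *expanded_queries]`; a code found by several queries appears once, ranked by its closest match.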

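E. Increased candidate counts

* per_query_k: 30 → 50
* Post-filter cap: 30 → 50
* Final output: TOP 3 → TOP 5

F. Debug MD output (new feature)

* Added the _write_debug_md() method, which writes debug_icd10_output.md automatically after every ICD-10 extraction.
* Contents: full medical record, Stage 1 prompt + raw response, Stage 2 raw/removed/filtered candidates, Stage 3 prompt + raw response, final prediction.

h1. Affected files

backend/rag_service.py: all changes are confined to this file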
