Feature #173
Closed. ICD-10 Pipeline Optimization: search_terms anatomical expansion, debug output, larger candidate pool
Overview
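h1. Problem
- The search_terms produced by the Stage 1 LLM add almost no expansion (in 20/20 tests the only change was a literal on→of substitution). Colloquial body-part names (palm, shin, scalp, etc.) are not mapped to ICD-10 standard terms (hand, lower leg, head), so the second query in the Stage 2 vector search is effectively useless.
- per_query_k=30 is too small: for some body parts the correct ICD-10 code ranks below the top 30 (e.g. S61.412A for a left palm laceration).
- Only the TOP 3 candidates are output, which is not enough for clinical use.
- The Stage 1 prompt examples closely resemble the test cases, so the LLM copies the examples and ignores the actual injuries.
- Chinese substring bug: "伴有異物" in "未伴有異物" evaluates to True ("with foreign body" is a substring of "without foreign body" in Chinese), so "未伴有異物" (without foreign body) codes were removed by mistake.
- There is no tool for debugging the full pipeline, which makes prompt tuning difficult.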
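The substring bug in the list above can be reproduced in a few lines. This is only a sketch: the helper name `mentions_foreign_body` and the sample descriptions are hypothetical, and the guard shown is one plausible way to apply the fix (treat a description as "without foreign body" whenever the negated phrase is present).

```python
POSITIVE_FB = "伴有異物"    # "with foreign body"
NEGATED_FB = "未伴有異物"   # "without foreign body"; contains POSITIVE_FB as a substring


def mentions_foreign_body(description: str) -> bool:
    """Return True only when the description asserts a foreign body."""
    # Guard: check the negated phrase first, because the positive phrase
    # is a substring of it and would otherwise always match.
    if NEGATED_FB in description:
        return False
    return POSITIVE_FB in description


# The naive check is what caused the bug: it is True for a negated description.
naive_result = POSITIVE_FB in "左手撕裂傷,未伴有異物"
guarded_result = mentions_foreign_body("左手撕裂傷,未伴有異物")
```

With the guard in place, "未伴有異物" descriptions are no longer classified as foreign-body codes, so they survive the filter.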
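h1. Changes
A. Hybrid anatomical expansion of search_terms (new feature)
- Added the _ANATOMY_SYNONYM lookup table (~25 mappings from colloquial body-part names to ICD-10 standard terms)
- Added the _expand_anatomy_queries() method, which programmatically generates extra vector-search queries in Stage 2
- Modified the Stage 1 prompt: instruction #6 explicitly asks the LLM to map body parts, and every example now demonstrates the mapping
- Two-pronged approach: when the LLM maps correctly, the programmatic query is merely redundant (search_codes_multi dedupes by code); when the LLM gets it wrong, the programmatic mapping serves as a fallback
- Test results: programmatic expansion was triggered in 12/20 scenarios, and the quality of the LLM's search_terms also improved substantially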
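A minimal sketch of the programmatic fallback described in section A, assuming the table is a plain dict and that expansion is a whole-word substitution. The entries are taken from the body-part mappings listed in the Stage 1 prompt; the actual _ANATOMY_SYNONYM table (~25 entries) and the real method signature are not shown in this issue, so the function below is hypothetical.

```python
import re

# Assumed subset of the mapping table (colloquial term -> ICD-10 standard term).
_ANATOMY_SYNONYM = {
    "palm": "hand",
    "shin": "lower leg",
    "calf": "lower leg",
    "temple": "head",
    "scalp": "head",
    "forehead": "head",
    "sole": "foot",
    "heel": "foot",
}


def expand_anatomy_queries(injury_text: str) -> list[str]:
    """Return extra queries with colloquial body parts replaced by standard terms."""
    queries: list[str] = []
    for colloquial, standard in _ANATOMY_SYNONYM.items():
        # Whole-word match so that e.g. "sole" does not fire inside "console".
        pattern = rf"\b{re.escape(colloquial)}\b"
        if re.search(pattern, injury_text, flags=re.IGNORECASE):
            expanded = re.sub(pattern, standard, injury_text, flags=re.IGNORECASE)
            if expanded not in queries:
                queries.append(expanded)
    return queries


extra = expand_anatomy_queries("laceration on left palm")
```

Texts that already use a standard term produce no extra query, which matches the note above that expansion only fired in part of the test scenarios.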
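B. Stage 1 prompt hardening
- A three-step HOLISTIC REVIEW (STEP A/B/C) replaces section-by-section extraction
- Added Rule 2: MUST extract ALL injuries (prevents omissions)
- Added Rule 7: Do NOT copy examples (prevents copying)
- Examples now use different body parts (right sole / left shin / right temple / left calf) to demonstrate term mapping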
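C. Stage 2 programmatic filtering hardening
- Added 4 rules: D/S suffix, laterality, foreign body (including the Chinese substring fix), deep structure
- Fixed the Chinese substring bug: when "伴有異物" matches but "未伴有異物" also matches, the candidate is treated as "without foreign body" and kept
- Added a removed_log parameter that records why each candidate was removed
- Added search_codes_multi(), which merges and dedupes the results of multiple queries (keeping the best distance per code)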
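The merge-and-dedupe behaviour of search_codes_multi() could look roughly like the sketch below. The hit shape (`code`, `distance`), the `search_fn` callback, and the assumption that a smaller distance means a better match are all illustrative; the real vector-store call is not shown in this issue.

```python
from typing import Callable


def search_codes_multi(
    queries: list[str],
    search_fn: Callable[[str, int], list[dict]],
    per_query_k: int = 50,
) -> list[dict]:
    """Run every query, keep one hit per code (the one with the smallest distance)."""
    best: dict[str, dict] = {}
    for query in queries:
        for hit in search_fn(query, per_query_k):
            current = best.get(hit["code"])
            if current is None or hit["distance"] < current["distance"]:
                best[hit["code"]] = hit
    # Closest matches first.
    return sorted(best.values(), key=lambda hit: hit["distance"])


def fake_search(query: str, k: int) -> list[dict]:
    """Stand-in for the vector search, with made-up distances."""
    index = {
        "laceration on left palm": [
            {"code": "S61.402A", "distance": 0.30},
            {"code": "S61.412A", "distance": 0.45},
        ],
        "Open wound of left hand": [
            {"code": "S61.412A", "distance": 0.12},
        ],
    }
    return index.get(query, [])[:k]


merged = search_codes_multi(
    ["laceration on left palm", "Open wound of left hand"], fake_search
)
```

Because duplicates collapse onto the best-scoring hit, a redundant query costs search time but cannot inflate the candidate list.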
D. Larger candidate pool
- per_query_k: 30 → 50
- Post-filter cap: 30 → 50
- Final output: TOP 3 → TOP 5
E. Debug MD output
- Added a _write_debug_md() method that automatically writes debug_icd10_output.md after each ICD-10 extraction
- Contents: full medical record text, Stage 1 prompt + raw response, Stage 2 raw/removed/filtered candidates, Stage 3 prompt + raw response, final predictions
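One way such a debug file could be assembled is sketched below. The section titles, the tilde fences, and the split into a pure `render_debug_md()` plus a thin writer are assumptions made for illustration; only the output filename comes from this issue.

```python
from pathlib import Path

FENCE = "~~~"  # tilde fence, so prompt text containing backticks stays intact


def render_debug_md(trauma_note: str, sections: dict[str, str]) -> str:
    """Build the Markdown text: the note first, then one fenced block per stage."""
    parts = ["# ICD-10 debug output", "", "## Trauma note", "", FENCE, trauma_note, FENCE, ""]
    for title, body in sections.items():
        parts += [f"## {title}", "", FENCE, body, FENCE, ""]
    return "\n".join(parts)


def write_debug_md(
    trauma_note: str,
    sections: dict[str, str],
    path: str = "debug_icd10_output.md",
) -> None:
    """Overwrite the debug file with the latest run."""
    Path(path).write_text(render_debug_md(trauma_note, sections), encoding="utf-8")


md = render_debug_md(
    "Laceration on left palm.",
    {"Stage 1 prompt": "(prompt text)", "Final predictions": "S61.412A"},
)
```

Keeping the rendering separate from the file write makes the layout easy to check in a unit test without touching the disk.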
h1. Affected files
- backend/rag_service.py: all changes are confined to this file
h1. Prompts
h2. stage 1
You are a medical information extractor specializing in Emergency Room trauma notes.
=== TRAUMA NOTE ===
{trauma_note}
=== END OF NOTE ===
=== HOLISTIC REVIEW (CRITICAL) ===
Do NOT extract injuries section-by-section. Follow these steps:
STEP A: Identify base injuries from Physical Examination (wound type + body location + laterality).
STEP B: Scan Mechanism of Injury, History, and Symptoms for MODIFIERS that change the diagnosis:
- Retention: Any evidence of material embedded or retained in the wound (e.g. stone debris, glass, seed, any solid object) → add "with foreign body" to the injury text.
- Bite: Injury caused by a bite from any living being (e.g. dog bite, human bite, insect bite) → reclassify the wound as "open bite wound" instead of "laceration".
- Crush: Injury from crushing force → classify as "crush injury".
STEP C: Merge modifiers into the corresponding base injury. Match each modifier to the CORRECT body region it belongs to.
Example: PE says "Laceration on right forearm" + Symptoms says "Right arm pain - glass fragments in wound" → output "laceration with foreign body on right forearm".
Example: Mechanism says "cat bite" + PE says "Laceration on right ankle" → output "open bite wound on right ankle".
=== EXTRACTION RULES ===
1. Each item = one distinct injury (one wound = one item, one fracture = one item)
2. You MUST extract ALL injuries listed in Physical Examination; do NOT skip any body region
3. Each item must include: injury type + anatomical location + laterality (if mentioned)
4. Do NOT include vital signs, patient demographics, or non-injury information
5. Keep each injury description SHORT (under 15 words)
6. For each injury, also provide "search_terms": map the injury to ICD-10 standard anatomy terms.
- Body part mapping: palm→hand, shin/calf→lower leg, temple/scalp/forehead→head, sole/heel→foot
- Injury type mapping: laceration→"Open wound", bruise→"Contusion"
- Write as a single natural English phrase, NOT comma-separated keywords.
7. Do NOT copy examples below; generate results from the actual trauma note above
=== OUTPUT FORMAT (JSON only) ===
{"injuries": [
{"text": "laceration on right sole", "search_terms": "Open wound of right foot"},
{"text": "abrasion on left shin", "search_terms": "Abrasion of left lower leg"},
{"text": "contusion with foreign body on right temple", "search_terms": "Contusion with foreign body of head"},
{"text": "open bite wound on left calf", "search_terms": "Open bite of left lower leg"}
]}
YOUR RESPONSE:
h2. stage 3
You are an expert ICD-10-CM medical coder working in an Emergency Room setting.
=== MANDATORY RULES ===
RULE 1 (Initial Encounter ONLY):
- You MUST select codes ending in 'A' (Initial encounter)
- REJECT any code ending in 'D' (Subsequent encounter) or 'S' (Sequela)
- Exception: only if text explicitly says "follow-up", "subsequent visit", or "sequela"
RULE 2 (Laterality):
- Match the exact laterality mentioned (left/right)
- If not specified, use 'unspecified' laterality codes
RULE 3 (Specificity Matching):
- If the injury text names a SPECIFIC body part (e.g. "palm"), prefer the code for that exact part over a general parent (e.g. "hand, unspecified").
- If the injury text is VAGUE (e.g. "hand injury"), prefer "unspecified" codes over overly specific ones.
=== SELECTION ===
- Select the TOP 5 codes ranked by specificity match (1 = best match)
- If fewer than 5 valid codes exist, return only the valid ones
=== TASK ===
You are given a trauma note and multiple injuries, each with pre-filtered ICD-10-CM code candidates.
For EACH injury, select the TOP 5 most appropriate codes.
TRAUMA NOTE:
{trauma_note}
INJURIES AND THEIR CANDIDATES:
Injury: "{injury_text_1}"
Candidates:
1. S61.412A: Laceration without foreign body of left hand ...
2. S61.402A: Unspecified open wound of left hand ...
...
Injury: "{injury_text_2}"
Candidates:
1. S01.01XA: Laceration without foreign body of scalp ...
...
=== OUTPUT FORMAT (JSON only) ===
{
"results": [
{
"injury": "",
"selected": [
{"rank": 1, "code": "CODE", "confidence": "high"},
{"rank": 2, "code": "CODE", "confidence": "high"},
{"rank": 3, "code": "CODE", "confidence": "medium"},
{"rank": 4, "code": "CODE", "confidence": "low"},
{"rank": 5, "code": "CODE", "confidence": "low"}
]
}
]
}
YOUR RESPONSE:
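The issue does not describe how the Stage 3 response is consumed. As a hedged sketch, a parser for the JSON format above might re-check the output against the candidate list, since an LLM can return codes that were never offered. The function name `parse_stage3` and the `candidates_by_injury` argument are hypothetical.

```python
import json


def parse_stage3(
    raw: str,
    candidates_by_injury: dict[str, set[str]],
    top_n: int = 5,
) -> dict[str, list[str]]:
    """Return up to top_n codes per injury, ordered by rank, limited to known candidates."""
    data = json.loads(raw)
    selected: dict[str, list[str]] = {}
    for result in data.get("results", []):
        injury = result.get("injury", "")
        allowed = candidates_by_injury.get(injury, set())
        # Entries without a rank sort last.
        ranked = sorted(result.get("selected", []), key=lambda s: s.get("rank", top_n + 1))
        codes: list[str] = []
        for item in ranked:
            code = item.get("code")
            # Drop codes that were not among the Stage 2 candidates, and duplicates.
            if code in allowed and code not in codes:
                codes.append(code)
        selected[injury] = codes[:top_n]
    return selected


raw_response = json.dumps({
    "results": [{
        "injury": "laceration on left palm",
        "selected": [
            {"rank": 2, "code": "S61.402A", "confidence": "high"},
            {"rank": 1, "code": "S61.412A", "confidence": "high"},
            {"rank": 3, "code": "S99.999A", "confidence": "low"},  # not a candidate
        ],
    }]
})
parsed = parse_stage3(raw_response, {"laceration on left palm": {"S61.412A", "S61.402A"}})
```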
Updated by Sashiba Chou about 1 month ago
- Status changed from New to Closed
Additional task details
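h1. Stage 2 Four-Rule Filtering
- Stage 2 programmatic filtering had only two rules, D/S suffix and foreign body. Without laterality and deep-structure filtering, many irrelevant candidate codes reached Stage 3, wasting LLM tokens and disturbing the ranking.
- The foreign body check used keyword matching (glass/metal/wood...), which cannot detect semantic descriptions (e.g. "stone debris inside wound"). In addition, the Chinese substring "伴有異物" matches inside "未伴有異物", causing misclassification.
A. Stage 2 programmatic filtering: 2 rules → 4 rules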
|_. Rule |_. Old version |_. New version |
| Rule 1: D/S suffix | ✅ Present | ✅ Kept |
| Rule 2: Laterality | ❌ None | ✅ Added: compares left/right in the injury with left/right in the candidate code description and removes codes for the opposite side |
| Rule 3: Foreign body | ✅ Keyword matching | ✅ Rewritten: trusts the Stage 1 LLM's semantic output (HOLISTIC REVIEW STEP B) and checks "foreign body" in injury_text; fixes the Chinese substring bug ("伴有異物" in "未伴有異物") by adding a guard condition |
| Rule 4: Deep structure | ❌ None | ✅ Added: if the injury does not mention a deep structure (artery/vein/nerve/tendon/muscle), removes candidate codes containing these keywords; matches in both English and Chinese (EN: artery/vein/nerve/tendon/muscle/vessel/arch, ZH: 動脈/靜脈/神經/肌腱/肌肉) |
Added a removed_log parameter that records why each candidate was removed (used by the debug output).
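The four rules and the removed_log parameter can be sketched as follows. This is an illustration under assumptions, not the code in backend/rag_service.py: the candidate shape (`code`, `description`), the reason strings, and the 7-character check for the D/S suffix are hypothetical, laterality is matched in English only, and the sample descriptions are illustrative.

```python
import re

_DEEP_EN = re.compile(r"\b(?:arter(?:y|ies)|veins?|nerves?|tendons?|muscles?|vessels?|arch)\b")
_DEEP_ZH = ("動脈", "靜脈", "神經", "肌腱", "肌肉")


def _side(text):
    """Return 'left' or 'right' when exactly one side is mentioned, else None."""
    lowered = text.lower()
    left = re.search(r"\bleft\b", lowered) is not None
    right = re.search(r"\bright\b", lowered) is not None
    if left == right:
        return None
    return "left" if left else "right"


def _has_foreign_body(desc):
    """Check the negated phrases first: '伴有異物' is a substring of '未伴有異物'."""
    lowered = desc.lower()
    if "without foreign body" in lowered or "未伴有異物" in desc:
        return False
    return "with foreign body" in lowered or "伴有異物" in desc


def _has_deep_structure(text):
    return bool(_DEEP_EN.search(text.lower())) or any(term in text for term in _DEEP_ZH)


def filter_candidates(injury_text, candidates, removed_log=None):
    """Apply rules 1 to 4 in order; record the first rule that removes a candidate."""
    injury_side = _side(injury_text)
    kept = []
    for cand in candidates:
        code, desc = cand["code"], cand["description"]
        cand_side = _side(desc)
        reason = None
        if len(code.replace(".", "")) == 7 and code[-1] in "DS":
            reason = "rule 1: D/S suffix"
        elif injury_side and cand_side and injury_side != cand_side:
            reason = "rule 2: opposite laterality"
        elif "foreign body" not in injury_text.lower() and _has_foreign_body(desc):
            reason = "rule 3: foreign body not in injury"
        elif not _has_deep_structure(injury_text) and _has_deep_structure(desc):
            reason = "rule 4: deep structure not in injury"
        if reason is None:
            kept.append(cand)
        elif removed_log is not None:
            removed_log.append({"code": code, "reason": reason})
    return kept


sample = [
    {"code": "S61.412A", "description": "Laceration without foreign body of left hand 未伴有異物"},
    {"code": "S61.412D", "description": "Laceration without foreign body of left hand, subsequent encounter"},
    {"code": "S61.411A", "description": "Laceration without foreign body of right hand"},
    {"code": "S61.422A", "description": "Laceration with foreign body of left hand"},
    {"code": "S64.22XA", "description": "Injury of radial nerve at wrist and hand level of left arm"},
]
log = []
kept = filter_candidates("laceration on left palm", sample, removed_log=log)
```

Passing removed_log as an optional list keeps the filter usable without debugging, while letting the debug writer report one reason per removed code.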
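B. Hybrid anatomical expansion of search_terms (new feature)
- Added the _ANATOMY_SYNONYM lookup table (~25 mappings from colloquial body-part names to ICD-10 standard terms)
- Added the _expand_anatomy_queries() method, which programmatically generates extra vector-search queries in Stage 2
- Modified the Stage 1 prompt: instruction #6 explicitly asks the LLM to map body parts, and every example now demonstrates the mapping
- Two-pronged approach: when the LLM maps correctly, the programmatic query is merely redundant (search_codes_multi dedupes by code); when the LLM gets it wrong, the programmatic mapping serves as a fallback
- Test results: programmatic expansion was triggered in 12/20 scenarios, and the quality of the LLM's search_terms also improved substantially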
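C. Stage 1 prompt hardening
- A three-step HOLISTIC REVIEW (STEP A/B/C) replaces section-by-section extraction and cross-checks PE + Mechanism + Symptoms across sections
- Added Rule 2: MUST extract ALL injuries (prevents omissions)
- Added Rule 7: Do NOT copy examples (prevents copying)
- Examples now use different body parts (right sole / left shin / right temple / left calf) to demonstrate term mapping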
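D. Stage 2 vector search architecture improvements
- Added search_codes_multi(), which merges and dedupes the results of multiple queries (keeping the best distance per code)
- Supports three query tiers: the original injury_text + the LLM search_terms + the programmatic synonym expansion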
E. Larger candidate pool
- per_query_k: 30 → 50
- Post-filter cap: 30 → 50
- Final output: TOP 3 → TOP 5
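F. Debug MD output (new feature)
- Added a _write_debug_md() method that automatically writes debug_icd10_output.md after each ICD-10 extraction
- Contents: full medical record text, Stage 1 prompt + raw response, Stage 2 raw/removed/filtered candidates, Stage 3 prompt + raw response, final predictions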
h1. Affected files
- backend/rag_service.py: all changes are confined to this file