
Hello Potato World

[Potato Paper Review] (TCAV) Interpretability Beyond Feature Attribution: Quantitative Testing with Concept Activation Vectors



Heosuab 2021. 8. 29. 21:01

 

⋆ ｡ ˚ ☁︎ ˚ ｡ ⋆ ｡ ˚ ☽ ˚ ｡ ⋆

[XAI paper review]

 

 

 


 Concepts


 A CNN extracts progressively higher-level features as its layers get deeper, moving away from the input. Lower layers extract concrete information such as edges, corners, and colors (low-level features), while deeper layers extract information that humans can recognize intuitively, such as parts of objects or whole objects (high-level features).

 Many systems, such as image classifiers, operate on low-level features rather than high-level ones. This paper defines a Concept as a feature that humans can understand intuitively, or one that the user defines directly, and introduces a way to understand images in terms of these concepts (CAV: Concept Activation Vector).

 

 

  • Existing approaches to model interpretability

 These approaches explain a model's predictions in terms of its input features. For example, in a logistic regression classifier the coefficient weights can be used to interpret the model, and saliency maps use the first-order derivative of the output with respect to each pixel. Such information is hard for humans to understand intuitively.

  • The concept-based approach

 As mentioned above, this paper aims to understand models through notions that humans grasp immediately (= concepts). A concept can be almost anything, including objects, colors, or abstract ideas: a classifier for "zebra" might be analyzed with concepts such as "Dotted", "Zigzagged", and "Striped", and from the perspective of color, "Red", "Yellow", "Green", and so on can all be defined as concepts.

 

 


 CAV: Concept Activation Vector


 Here, a CAV (Concept Activation Vector) is a tool for translating the low-level features a model uses to interpret an image (for example, pixel values) into concepts.

  • $e_m$ : a vector tied to the data, such as input features or neural activations
  • $E_m$ : the vector space spanned by the $e_m$ vectors
  • $e_h$ : a vector tied to a human-interpretable concept
  • $E_h$ : the vector space spanned by the $e_h$ vectors

 With these definitions, an interpretation of the model (which CAVs provide) is a function $g$ from the model's space to the human-interpretable space; when $g$ is linear, this is called linear interpretability.

$$g: E_m \rightarrow E_h$$

 As an example of such a function, the "Striped" concept of a zebra can be defined from the color, texture, or edge information in an image. The concept space $E_h$ need not be restricted to the input features or the training data; it can include new concepts unrelated to the existing features.

 

 To define a CAV, we use the activations at a chosen layer $l$ produced by data examples related to the concept of interest, following these steps:

  1. From the given examples, define a set of data related to concept $C$ (the concept set) and a set of random examples to contrast it with. (ⓐ: the concept set for "striped" and the random examples)
    • $P_C$: concept set
    • $N$: random examples
  2. Prepare the trained model. (ⓑ: training data for classifying the "zebra" class, ⓒ: the trained network)
  3. At layer $l$, compute the activations produced by $P_C$ and by $N$, and train a linear classifier that separates these two sets of activations. (ⓓ: the two activation sets and the linear classifier)
    • $\{f_l(x): x \in P_C\}$ vs. $\{f_l(x): x \in N\}$
  4. The CAV $v^l_C$ is defined as the vector orthogonal to the decision boundary of this linear classifier, pointing toward the concept examples.
    • $v^l_C \in \mathbb{R}^m$ : the linear CAV of concept $C$, where the input is $x \in \mathbb{R}^n$, layer $l$ has $m$ neurons, and $f_l : \mathbb{R}^n \rightarrow \mathbb{R}^m$ maps an input to its layer-$l$ activations.
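 A minimal pure-Python sketch of steps 3 and 4, assuming the layer-$l$ activations have already been extracted as plain lists of floats. Logistic regression trained by batch gradient descent stands in for "a linear classifier" here (an illustrative choice; any linear classifier would do), and the helper names `train_linear_classifier` and `compute_cav` as well as the toy activations are hypothetical.

```python
import math

def train_linear_classifier(pos, neg, lr=0.5, epochs=500):
    """Logistic regression fit by full-batch gradient descent (illustrative choice).

    pos: activations f_l(x) for x in P_C (label 1)
    neg: activations f_l(x) for x in N   (label 0)
    Returns (w, b) describing the decision boundary w . a + b = 0.
    """
    data = [(a, 1.0) for a in pos] + [(a, 0.0) for a in neg]
    m = len(data[0][0])
    w, b = [0.0] * m, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * m, 0.0
        for a, y in data:
            z = sum(wi * ai for wi, ai in zip(w, a)) + b
            p = 1.0 / (1.0 + math.exp(-z))  # predicted probability of "concept"
            for i in range(m):
                grad_w[i] += (p - y) * a[i]
            grad_b += p - y
        w = [wi - lr * gi / len(data) for wi, gi in zip(w, grad_w)]
        b -= lr * grad_b / len(data)
    return w, b

def compute_cav(pos, neg):
    """Unit normal of the separating hyperplane, pointing toward the concept side."""
    w, _ = train_linear_classifier(pos, neg)
    norm = math.sqrt(sum(wi * wi for wi in w))
    return [wi / norm for wi in w]

# Hypothetical 2-D activations: concept examples are shifted along dimension 0.
concept_acts = [[a0, a1] for a0 in (1.8, 2.2) for a1 in (-1.0, 0.0, 1.0)]
random_acts = [[a0, a1] for a0 in (-0.2, 0.2) for a1 in (-1.0, 0.0, 1.0)]
cav = compute_cav(concept_acts, random_acts)  # points along +dimension 0
```

 The weight vector of a linear classifier is always normal to its decision boundary, so normalizing it gives the CAV direction directly; the sign convention (concept examples labeled 1) makes it point toward the concept.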

 

 


 TCAV: Testing with CAV


 TCAV is the new linear interpretability method, built on CAVs, that this paper proposes. Using directional derivatives, TCAV measures how sensitive a model's prediction is to a high-level concept learned through a CAV. For a model that classifies "zebra", for instance, TCAV can analyze how a newly defined concept such as "striped" influences the "zebra" prediction. In short, TCAV quantifies how strongly a given concept is tied to a given class.

 TCAV์˜ 4๊ฐ€์ง€ Goal์€ ๋‹ค์Œ๊ณผ ๊ฐ™๋‹ค. (๋˜๋Š” ์žฅ์ ์ด๋ผ๊ณ  ํ•  ์ˆ˜ ์žˆ๊ฒ ๋‹ค.)

  1. Accessibility : Machine Learning ์ „๋ฌธ๊ฐ€๊ฐ€ ์•„๋‹ˆ์—ฌ๋„ ๋ถ„์„ํ•  ์ˆ˜ ์žˆ๋‹ค.
  2. Customization : ์–ด๋– ํ•œ Concept์ด๋“  ์ œํ•œ ์—†์ด ์‚ฌ์šฉ ๊ฐ€๋Šฅํ•˜๋‹ค.
  3. Plug-in readiness : ์ด๋ฏธ ํ•™์Šต๋œ ML ๋ชจ๋ธ์„ ์žฌํ•™์Šตํ•˜๊ฑฐ๋‚˜ ์ˆ˜์ •ํ•  ํ•„์š”๊ฐ€ ์—†๋‹ค.
  4. Global quantification : Input data์˜ ๊ฐ๊ฐ์— ๋Œ€ํ•œ ํ•ด์„ ๋ฟ๋งŒ ์•„๋‹ˆ๋ผ ์ „์ฒด data, ๋˜๋Š” ์ „์ฒด class์— ๋Œ€ํ•œ ํ•ด์„์„ ์ œ์‹œํ•  ์ˆ˜ ์žˆ๋‹ค. 

 

 

 Taking saliency maps as an example of how such a sensitivity is computed: let $h_k(x)$ be the logit of a data point $x$ for class $k$, and let $x_{a,b}$ be the pixel of $x$ at coordinate $(a,b)$.

 The sensitivity of the model's class-$k$ output to a change in the value of pixel $(a,b)$ is then given by the derivative

$$\frac{\partial h_k(x)}{\partial x_{a,b}}$$
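 To make this derivative concrete, here is a minimal sketch that approximates $\partial h_k(x) / \partial x_{a,b}$ for every pixel with central finite differences. Real saliency maps are computed with automatic differentiation through the network; `toy_logit` below is a hypothetical stand-in for $h_k$ on a 2x2 "image".

```python
def saliency_map(h_k, x, eps=1e-5):
    """Approximate d h_k(x) / d x[a][b] for every pixel (a, b) by central differences."""
    rows, cols = len(x), len(x[0])
    sal = [[0.0] * cols for _ in range(rows)]
    for a in range(rows):
        for b in range(cols):
            plus = [row[:] for row in x]
            minus = [row[:] for row in x]
            plus[a][b] += eps
            minus[a][b] -= eps
            sal[a][b] = (h_k(plus) - h_k(minus)) / (2 * eps)
    return sal

def toy_logit(x):
    # Hypothetical class-k logit; its exact gradient is
    # [[0.5, -2.0], [x[1][1], x[1][0]]].
    return 0.5 * x[0][0] - 2.0 * x[0][1] + x[1][0] * x[1][1]

image = [[1.0, 2.0], [3.0, 4.0]]
sal = saliency_map(toy_logit, image)  # approximately [[0.5, -2.0], [4.0, 3.0]]
```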

 

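 In the same way, combining the CAV with a directional derivative gives $S_{C,k,l}(x)$, called the conceptual sensitivity: the rate at which the class-$k$ logit changes when the layer-$l$ activation $f_l(x)$ is moved in the direction of concept $C$'s CAV. Here $h_{l,k} : \mathbb{R}^m \rightarrow \mathbb{R}$ maps the layer-$l$ activations to the class-$k$ logit. In other words, the conceptual sensitivity is the dot product of the gradient of the logit with respect to the activations and the CAV:

$$S_{C,k,l}(x) = \lim_{\epsilon \to 0} \frac{h_{l,k}(f_l(x) + \epsilon v^l_C) - h_{l,k}(f_l(x))}{\epsilon} = \nabla h_{l,k}(f_l(x)) \cdot v^l_C$$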
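 A sketch of the conceptual sensitivity, assuming $h_{l,k}$ can be evaluated directly on a layer-$l$ activation vector. The gradient is approximated numerically to stay dependency-free (in practice it would come from automatic differentiation); `toy_head`, the activation values, and the CAV values are all hypothetical.

```python
def numerical_grad(h, act, eps=1e-6):
    """Central-difference gradient of h at the activation vector act."""
    grad = []
    for i in range(len(act)):
        plus, minus = list(act), list(act)
        plus[i] += eps
        minus[i] -= eps
        grad.append((h(plus) - h(minus)) / (2 * eps))
    return grad

def conceptual_sensitivity(h_lk, act, cav):
    """S_{C,k,l}(x) = grad h_{l,k}(f_l(x)) . v_C^l, with act = f_l(x)."""
    return sum(g * v for g, v in zip(numerical_grad(h_lk, act), cav))

def toy_head(a):
    # Hypothetical h_{l,k}: maps layer-l activations (m = 2) to the class-k logit.
    return 3.0 * a[0] - a[1] + 0.5 * a[0] * a[1]

act = [1.0, 2.0]   # hypothetical f_l(x)
cav = [0.6, 0.8]   # hypothetical unit-length CAV v_C^l
s = conceptual_sensitivity(toy_head, act, cav)  # gradient [4.0, -0.5] . cav = 2.0

# Cross-check against the limit definition using a small epsilon.
eps = 1e-6
s_limit = (toy_head([a + eps * v for a, v in zip(act, cav)]) - toy_head(act)) / eps
```

 Both forms agree up to a term of order $\epsilon$, which is why the gradient-dot-CAV form is used in practice: one gradient computation serves any number of CAVs.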

 

 TCAV finally aggregates the conceptual sensitivities computed above. In a supervised task where $k$ is a class label, let $X_k$ denote all inputs with that label. The fraction of inputs in $X_k$ whose conceptual sensitivity to concept $C$ is positive measures the global influence of concept $C$ on that label:

$$TCAV_{Q_{C,k,l}} = \frac{|\{x \in X_k : S_{C,k,l}(x) > 0\}|}{|X_k|}$$

 The numerator counts the inputs in $X_k$ with positive conceptual sensitivity, so $TCAV_{Q_{C,k,l}}$ always lies in the range $[0,1]$.
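 The score itself is a simple ratio. A minimal sketch, assuming the conceptual sensitivities for every input in $X_k$ have already been computed; the sensitivity values below are hypothetical.

```python
def tcav_score(sensitivities):
    """TCAV_Q: fraction of inputs in X_k whose conceptual sensitivity S_{C,k,l}(x) is > 0."""
    if not sensitivities:
        raise ValueError("X_k must contain at least one input")
    positive = sum(1 for s in sensitivities if s > 0)  # strict: 0.0 is not counted
    return positive / len(sensitivities)

# Hypothetical S_{C,k,l}(x) values for five "zebra" images.
scores = [0.8, 1.3, -0.2, 0.05, 0.0]
print(tcav_score(scores))  # 0.6
```

 Note that only the sign of each sensitivity matters, not its magnitude, so a few inputs with very large sensitivities cannot dominate the score.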

 

 

 

 


 Results


  • The paper reports relative TCAV scores for all layers of GoogLeNet and for the last three layers of Inception V3. These results support conclusions such as: the "Red" concept strongly influences the "Fire engine" class, and the "Striped" concept strongly influences the "Zebra" class.

 

  • CAVs can also be used to sort images by how strongly they relate to a particular concept.
    • In the paper's left example, a CAV learned for the more abstract concept "CEO" is used to sort images of "Stripes". The most similar results (top 3) show striped designs that suit the suits or neckties a CEO would typically wear, while the least similar results show striped designs that do not fit a CEO.
    • In the right example, a CAV learned for "Model Women" is used to sort images of the "Neckties" class, and the most similar results (top 3) are images of women wearing neckties.
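 The paper sorts images by the cosine similarity between their layer-$l$ activations and the CAV. A minimal sketch, assuming the activations are already extracted; the image IDs, activation values, and the "CEO" CAV below are hypothetical.

```python
import math

def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def sort_by_cav(named_acts, cav):
    """Sort (image_id, layer-l activation) pairs from most to least similar to the CAV."""
    return sorted(named_acts, key=lambda item: cosine_similarity(item[1], cav), reverse=True)

# Hypothetical layer-l activations for three striped images and a "CEO" CAV.
ceo_cav = [1.0, 0.0]
images = [("stripes_1", [-0.5, 0.5]), ("stripes_2", [0.9, 0.1]), ("stripes_3", [0.5, 0.5])]
ranked = sort_by_cav(images, ceo_cav)
print([name for name, _ in ranked])  # ['stripes_2', 'stripes_3', 'stripes_1']
```

 Taking `ranked[:3]` and `ranked[-3:]` on a larger image set would give the "most similar" and "least similar" groups described above.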

 

 

  • Another application of CAVs is Empirical Deep Dream: optimize a pattern that maximally activates the CAV and compare it with the semantic notion of the concept. Just as Deep Dream visualizes the patterns that maximally activate a particular neuron or set of neurons, the same can be done for a CAV direction. The paper visualizes Deep Dream patterns for CAVs learned from "Knitted Texture", "Corgis", and "Siberian Husky"; these results show that CAVs can capture and visualize the directions or patterns learned at a given layer.
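 A toy sketch of the optimization behind Empirical Deep Dream, under strong simplifying assumptions: $f_l$ is replaced by a hypothetical linear layer `W`, and an L2 penalty (also an assumption, standing in for the image regularizers a real implementation would need) keeps the pattern bounded. With a real CNN, the gradient would instead come from backpropagating through the network to the input image.

```python
def empirical_deep_dream(W, cav, steps=200, lr=0.1, l2=0.5):
    """Gradient ascent on x to maximize (W x) . cav - l2 * ||x||^2.

    W is a hypothetical *linear* layer standing in for f_l, so the gradient of
    the activation term is the constant W^T cav; the L2 penalty keeps x bounded.
    """
    m, n = len(W), len(W[0])
    wt_cav = [sum(W[j][i] * cav[j] for j in range(m)) for i in range(n)]
    x = [0.0] * n
    for _ in range(steps):
        grad = [wt_cav[i] - 2.0 * l2 * x[i] for i in range(n)]
        x = [xi + lr * gi for xi, gi in zip(x, grad)]
    return x

W = [[1.0, 0.0, 2.0],
     [0.0, 1.0, -1.0]]   # hypothetical f_l: R^3 -> R^2
cav = [0.6, 0.8]         # hypothetical unit-length CAV
pattern = empirical_deep_dream(W, cav)
# For this objective the optimum is W^T cav / (2 * l2), i.e. about [0.6, 0.8, 0.4].
```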

 

 

 


  References


[1] Kim, Been, et al. "Interpretability beyond feature attribution: Quantitative testing with concept activation vectors (TCAV)." International Conference on Machine Learning. PMLR, 2018.

 

 

 

 

 

 
