
Hello Potato World

[Potato Study] Detecting Concepts

Heosuab 2021. 6. 28. 18:14

 

 


[XAI study_ Interpretable Machine Learning]

 

 


7.3 Detecting Concepts


- Two perspectives on explaining black-box models

  1. Feature-based approach
  2. Concept-based approach

 

- Drawbacks of the feature-based approaches covered so far

  1. Not every feature is useful from an interpretability standpoint.
    (ex) In an image, the importance of each individual pixel does not yield a meaningful interpretation.
  2. The expressiveness of the resulting explanation is constrained by the number of features.

 

- Concept-based Approach

  • concepts : any abstraction qualifies (a color, an object, even an idea)
  • Concept-based Approach : detects concepts embedded in the latent space learned by a neural network
    A concept can be detected even if the NN was never explicitly trained on it.
    => It can generate explanations that are not constrained by the NN's feature space.

 


7.3.1 TCAV : Testing with Concept Activation Vectors


A method proposed for generating global explanations of neural networks.
It measures how much a given concept influences the model's prediction for a particular class.

 

(ex) TCAV measures how much the concept "striped" influences the model when it classifies an image as "zebra".
Rather than explaining a single prediction, it describes the relationship between the concept and the class (global).

 

 


7.3.1.1 CAV: Concept Activation Vector


CAV : a numerical representation of a particular concept in the activation space of one layer of the NN
$v^C_l$ : the CAV of concept $C$ at neural network layer $l$ ($l$ : a bottleneck of the model)

 

 

     - Computing the CAV of concept $C$

          1. Prepare two datasets

               (1) Concept dataset representing $C$ : for the concept "striped", a dataset of striped objects
               (2) Random dataset : a random dataset of images without stripes

          2. Target one hidden layer $l$ and train a binary classifier on its activations.

               Binary classifier : separates the activations of the two datasets from step 1 (SVM, logistic regression, ...)

               CAV $v^C_l$ : the coefficient vector of the trained binary classifier

          3. Compute the conceptual sensitivity for an input $x$ (additional step)

               Conceptual sensitivity : the sensitivity to the concept, computed using the CAV
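
Steps 1 and 2 can be sketched in a few lines. This is only a minimal illustration, not the reference TCAV implementation: the "activations" are synthetic 3-dimensional vectors (an assumption standing in for real layer-$l$ activations), and `train_cav` and `sample` are hypothetical helper names. The classifier is a plain logistic regression fitted by gradient descent, and its normalized coefficient vector plays the role of $v^C_l$.

```python
import math
import random


def train_cav(concept_acts, random_acts, lr=0.5, epochs=300):
    """Fit logistic regression (concept = 1, random = 0) on activations
    and return the unit-norm weight vector, used here as the CAV."""
    data = [(a, 1.0) for a in concept_acts] + [(a, 0.0) for a in random_acts]
    dim = len(data[0][0])
    n = len(data)
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in data:
            z = sum(wi * xi for wi, xi in zip(w, x)) + b
            p = 1.0 / (1.0 + math.exp(-z))  # predicted P(concept | activation)
            err = p - y
            for i in range(dim):
                grad_w[i] += err * x[i]
            grad_b += err
        # Full-batch gradient step on the mean log-loss.
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    norm = math.sqrt(sum(wi * wi for wi in w))
    return [wi / norm for wi in w]


# Synthetic stand-in for layer activations (assumption): the concept
# examples are shifted along dimension 0, the random examples are not.
rng = random.Random(0)


def sample(shift, n=50):
    return [[rng.gauss(shift, 1.0), rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)]
            for _ in range(n)]


concept_acts = sample(2.0)
random_acts = sample(0.0)
cav = train_cav(concept_acts, random_acts)
print("CAV:", [round(c, 3) for c in cav])
```

Because the two synthetic groups differ only along dimension 0, the learned CAV should point mostly along that axis. With a real network, the activations would come from a forward pass up to layer $l$, and an off-the-shelf SVM or logistic regression would typically replace the hand-written training loop.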

 

 

 

   - Conceptual Sensitivity

Compute the directional derivative of the prediction in the direction of the CAV:

$S_{C,k,l}(x) = \nabla h_{l,k}(f_l(x)) \cdot v^C_l$

($f_l$ : maps input $x$ to its activation vector at layer $l$; $h_{l,k}$ : maps the activation vector to the logit output for class $k$)

   - Interpreting the dot product in the formula

The sign of $S_{C,k,l}(x)$ depends on the angle between the gradient of $h_{l,k}(f_l(x))$ and $v^C_l$:

  • Less than 90 degrees : $S_{C,k,l}(x)$ is positive
  • Greater than 90 degrees : $S_{C,k,l}(x)$ is negative
    conceptual sensitivity : indicates whether $v^C_l$ points in the same direction as the one that increases $h_{l,k}$ the most
    Therefore, $S_{C,k,l}(x) > 0$ means that concept $C$ encourages the model to classify input $x$ as class $k$.
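
The sign rule can be checked numerically: a dot product is positive exactly when the angle between the two vectors is below 90 degrees. The sketch below uses a made-up smooth function `h_k` as a stand-in for the class-$k$ logit head (an assumption, not part of any real model) and compares the gradient dot product with a finite-difference directional derivative. All names are hypothetical.

```python
import math


def h_k(a):
    """Hypothetical logit head for class k on a 3-d activation vector."""
    return 1.5 * a[0] - 0.5 * a[1] + math.tanh(a[2])


def grad_h_k(a):
    """Analytic gradient of h_k with respect to the activation."""
    return [1.5, -0.5, 1.0 - math.tanh(a[2]) ** 2]


def conceptual_sensitivity(a, cav):
    """S = grad h_k(a) . cav, the directional derivative along the CAV."""
    return sum(g * v for g, v in zip(grad_h_k(a), cav))


def finite_difference_sensitivity(a, cav, eps=1e-5):
    """Central finite-difference estimate of the same directional derivative."""
    plus = [ai + eps * vi for ai, vi in zip(a, cav)]
    minus = [ai - eps * vi for ai, vi in zip(a, cav)]
    return (h_k(plus) - h_k(minus)) / (2.0 * eps)


def angle_deg(u, v):
    """Angle between two vectors in degrees."""
    dot = sum(x * y for x, y in zip(u, v))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(y * y for y in v))
    cos = max(-1.0, min(1.0, dot / (nu * nv)))
    return math.degrees(math.acos(cos))


activation = [0.2, -0.1, 0.3]   # stand-in for f_l(x)
cav_aligned = [1.0, 0.0, 0.0]   # hypothetical CAV roughly aligned with the gradient
cav_opposed = [0.0, 1.0, 0.0]   # hypothetical CAV pointing against the gradient

for name, cav in [("aligned", cav_aligned), ("opposed", cav_opposed)]:
    s = conceptual_sensitivity(activation, cav)
    angle = angle_deg(grad_h_k(activation), cav)
    print(f"{name}: S = {s:+.3f}, angle = {angle:.1f} degrees")
```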

 


7.3.1.2 TCAV : Testing with CAVs


The conceptual sensitivity above is computed for a single input $x$.
The goal here is a global explanation that describes the overall conceptual sensitivity of an entire class.

 

 

   - TCAV

Compute the fraction of inputs that have a positive conceptual sensitivity.

(ex) To find out how much the concept "striped" influences the model when it classifies images as "zebra",
collect data labeled "zebra" and compute the conceptual sensitivity for each input.
TCAV score : (number of "zebra" images with positive conceptual sensitivity) / (total number of "zebra" images)
=> If $TCAV_{Q_{"striped","zebra",l}}=0.8$, then 80% of the images predicted as class "zebra" are positively influenced by the concept "striped".
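
The score itself is just a ratio. A minimal sketch, assuming the per-image conceptual sensitivities have already been computed (the ten numbers below are invented for illustration, and `tcav_score` is a hypothetical helper name):

```python
def tcav_score(sensitivities):
    """Fraction of inputs whose conceptual sensitivity is strictly positive."""
    if not sensitivities:
        raise ValueError("need at least one sensitivity value")
    positives = sum(1 for s in sensitivities if s > 0)
    return positives / len(sensitivities)


# Hypothetical sensitivities of ten "zebra" images to the concept "striped".
zebra_sensitivities = [0.9, 0.4, -0.2, 1.3, 0.1, 0.7, -0.5, 0.3, 0.8, 0.2]
score = tcav_score(zebra_sensitivities)
print(score)
```

Only the sign of each sensitivity enters the score, so one image with a very large sensitivity counts the same as one that is barely positive.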

 

 

     - Testing the significance of a TCAV score

          1. Collect $N$ random datasets ($N \ge 10$ recommended)

          2. Fix the concept dataset and compute one TCAV score with each of the $N$ random datasets

          3. Apply a two-sided t-test of these $N$ TCAV scores against another $N$ TCAV scores generated with random CAVs.
             (A random CAV is obtained by using a random dataset in place of the concept dataset.)
             If there are several hypotheses, a multiple testing correction can be applied.
             Number of hypotheses = number of concepts being tested
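
Step 3 can be sketched with the standard library alone. The assumptions here are loud ones: the two lists of $N = 10$ TCAV scores are invented, the test is a pooled-variance (Student) two-sample t-test, since the notes do not say which variant is used, and the p-value comes from numerically integrating the t density rather than from a statistics package. The Bonferroni threshold (alpha divided by the number of concepts) is shown as one possible multiple testing correction.

```python
import math
from statistics import mean, variance


def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + t * t / df) ** (-(df + 1) / 2)


def two_sided_p(t, df, steps=2000):
    """Approximate two-sided p-value via Simpson's rule on [0, |t|]."""
    b = abs(t)
    if b == 0:
        return 1.0
    h = b / steps
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3  # P(0 < T < |t|)
    return max(0.0, 1.0 - 2.0 * area)


def pooled_t_test(a, b):
    """Two-sample t-test with pooled variance; returns (t, two-sided p)."""
    na, nb = len(a), len(b)
    df = na + nb - 2
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / df
    t = (mean(a) - mean(b)) / math.sqrt(pooled_var * (1 / na + 1 / nb))
    return t, two_sided_p(t, df)


# Hypothetical TCAV scores: concept CAVs vs. random CAVs, N = 10 each.
concept_scores = [0.82, 0.78, 0.85, 0.80, 0.76, 0.88, 0.79, 0.83, 0.81, 0.77]
random_scores = [0.52, 0.47, 0.55, 0.49, 0.51, 0.44, 0.53, 0.50, 0.48, 0.56]

alpha = 0.05
num_concepts = 3                        # number of hypotheses being tested
bonferroni_alpha = alpha / num_concepts

t_stat, p_value = pooled_t_test(concept_scores, random_scores)
print(f"t = {t_stat:.2f}, p = {p_value:.2e}, "
      f"significant = {p_value < bonferroni_alpha}")
```

A concept whose scores cannot be distinguished from those of random CAVs would give a p-value above the corrected threshold and would be discarded.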

 

 


7.3.2 Example


TCAV scores for the concepts "striped", "zigzagged", and "dotted"

 

(Image classifier : InceptionV3, a convolutional neural network trained on ImageNet data
Targeted bottleneck : the layer called "mixed4c"
Each concept dataset and each random dataset contains 50 images
Significance test with 10 random datasets at a significance level of 0.05)

"dotted" : does not pass the significance test because its p-value is above the 0.05 significance level
"striped", "zigzagged" : both pass the test and, according to TCAV, are very useful concepts for the model when identifying images as "zebra"

 


7.3.3 Advantages


  • TCAV only requires collecting a dataset that captures the concept of interest, so it is easy for non-experts to use.
  • It compensates for the shortcomings of feature attribution and is customizable.
    Any concept that can be defined by a concept dataset can be investigated.
  • It produces global explanations that relate a concept to any class.

 


7.3.4 Disadvantages


  • It works better the deeper the neural network is, so it may perform poorly on shallow models.
  • Building a concept dataset requires additional annotation, which can be very tedious when the data are not already well labeled.
  • It may not work well for concepts that are too abstract or too general.
  • TCAV has mostly been applied to image data, so applications to text or tabular data are comparatively limited.

 

 


References


[1] Interpretable Machine Learning, Christoph Molnar

 

 

 

 
