[ํฌํ…Œ์ดํ†  ๋…ผ๋ฌธ ๋ฆฌ๋ทฐ] Siamese Contrastive Embedding Network for Compositional Zero-Shot Learning

Heosuab 2023. 1. 8. 18:02

⋆ ｡ ˚ ☁︎ ˚ ｡ ⋆ ｡ ˚ ☽ ˚ ｡ ⋆

[Zero-shot learning paper review]

 Compositional Zero-shot Learning (CZSL)


  ์ธ๊ฐ„์€ ์ด๋ฏธ ์•Œ๊ณ  ์žˆ๋Š” ๊ฐœ์ฒด์— ๋Œ€ํ•œ ์ •๋ณด๋“ค์„ ์กฐํ•ฉํ•˜๊ณ  ๊ตฌ์„ฑํ•˜์—ฌ ์ƒˆ๋กœ์šด ๊ฐœ์ฒด์— ์ผ๋ฐ˜ํ™”ํ•˜๋Š” ๋Šฅ๋ ฅ์„ ๊ฐ€์ง€๊ณ  ์žˆ๋‹ค. ๋‹ค์‹œ ๋งํ•ด์„œ, ์ด๋ฏธ ์•Œ๊ณ  ์žˆ๋Š” "whole apple"๊ณผ "sliced banana"์˜ ์ •๋ณด๋ฅผ ์กฐํ•ฉํ•ด์„œ ์ƒˆ๋กœ์šด ๊ฐœ์ฒด์ธ "sliced apple" ๋˜๋Š” "whole banana"๋ฅผ ์ƒ๊ฐํ•ด๋‚ผ ์ˆ˜ ์žˆ๋‹ค. ์ด ๋Šฅ๋ ฅ์„ AI ์‹œ์Šคํ…œ์—์„œ ๋ชจ๋ฐฉํ•˜๊ณ ์ž ํ•˜๋Š” task๋ฅผ Compositional Zero-shot  Learning (CZSL)์ด๋ผ๊ณ  ํ•œ๋‹ค.

[figure 01] Compositional Zero-shot

  In CZSL, each composition is decomposed into two primitives: a state and an object. Elements such as "whole" and "sliced", which describe the condition of an entity, are states; elements such as "apple" and "banana", which identify the kind of entity, are objects. The goal of CZSL is to recognize novel (unseen) test compositions under the assumption that the training and test compositions are disjoint.

  To recognize both the state and the object, prior CZSL work has mainly taken one of two approaches:

  • Train two separate classifiers, one for states and one for objects. This ignores the interaction, or entanglement, between state and object.
  • Learn a shared embedding space into which all compositions and visual features are projected, and classify by distances between embeddings. This ignores the discrepancy between training and test compositions, so similar compositions are easily confused (e.g., "young cat" and "young tiger").

  ๋”ฐ๋ผ์„œ ํ•ด๋‹น ๋…ผ๋ฌธ์—์„œ๋Š” state์™€ object ๊ฐ๊ฐ์˜ ๊ตฌ๋ถ„์ ์ธ prototype๋ฅผ ํ™œ์šฉํ•˜์ง€๋งŒ ๋‘˜ ์‚ฌ์ด์˜ joint representation๋„ ํ•จ๊ป˜ ํ•™์Šตํ•˜๋Š” Siamese Contrastive Embedding Network (SCEN)์„ ์ œ์•ˆํ•œ๋‹ค.

 

 

 


 Siamese Contrastive Embedding Network (SCEN)


  Let $A$ be the set of states and $O$ the set of objects. The set $C$ of compositions formed by state-object pairs can then be written as

$$C = A \times O = \{(a, o) \mid a \in A,\ o \in O\}$$

  Let $I_s$ denote the set of images in the training dataset and $C_s$ ($C_s \subset C$) the corresponding compositions. The training dataset $D_{tr}$, consisting of image-composition pairs, is then

$$D_{tr} = \{(i, c) \mid i \in I_s,\ c \in C_s\}$$

  By the definition of the CZSL task, the training and test compositions must not overlap: writing $C_s$ for the training compositions and $C_u$ for the test compositions, $C_s \cap C_u = \varnothing$ must hold. Since a new image has to be predicted among both seen and unseen compositions, the goal is to learn, from $\{I_s, C_s\}$, a mapping function $I \rightarrow C_s \cup C_u$.
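
To make the setup concrete, here is a minimal plain-Python sketch of the composition space and the disjoint seen/unseen split (the state/object names and image ids are toy placeholders, not taken from the paper):

```python
from itertools import product

# Toy state and object label sets (A: states, O: objects).
A = {"whole", "sliced", "red"}
O = {"apple", "banana", "fox"}

# C = A x O: every possible state-object composition.
C = set(product(A, O))

# Seen (training) and unseen (test) compositions must be disjoint subsets of C.
C_s = {("whole", "apple"), ("sliced", "banana"), ("red", "fox")}
C_u = {("sliced", "apple"), ("whole", "banana")}
assert C_s <= C and C_u <= C and C_s.isdisjoint(C_u)

# D_tr pairs each training image with its seen composition (image ids are placeholders).
D_tr = [
    ("img_001", ("whole", "apple")),
    ("img_002", ("sliced", "banana")),
    ("img_003", ("red", "fox")),
]

# At test time a model trained on {I_s, C_s} must predict over C_s ∪ C_u.
label_space = C_s | C_u
```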

 

 

[figure 02] SCEN framework

  The SCEN architecture can be summarized as three modules:

  1. Encoding: encode states and objects with separate state- and object-specific encoders
  2. Contrastive learning: extract prototypes in independent state/object contrastive spaces
  3. Augmentation: generate virtual compositions with the State Transition Module (STM)

 

- Module 1. Encoding

  ํ•˜๋‚˜์˜ ์ด๋ฏธ์ง€๊ฐ€ feature extractor FC๋ฅผ ํ†ต๊ณผํ•ด์„œ ์–ป์€ visual feature x๋Š”, state/object ๊ตฌ์„ฑ ์š”์†Œ๋กœ ๋ถ„ํ•ดํ•˜๊ธฐ ์œ„ํ•ด ๋‘ ๊ฐœ์˜ embedding์œผ๋กœ ์ธ์ฝ”๋”ฉ๋œ๋‹ค. State-specific Encoder Es๋Š” state๋ฅผ ์ž˜ ํ‘œํ˜„ํ•˜๊ธฐ ์œ„ํ•ด, Object-specific Encoder Eo๋Š” object๋ฅผ ์ž˜ ํ‘œํ˜„ํ•˜๊ธฐ ์œ„ํ•ด ํ•™์Šต๋œ๋‹ค. 

$$h_s = E_s(x)$$

$$h_o = E_o(x)$$
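
A minimal sketch of the two specific encoders as projection heads over a precomputed backbone feature (the MLP sizes, the L2 normalization, and the 512-d feature assumption are mine, not details confirmed by the paper):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SpecificEncoder(nn.Module):
    """Projects a backbone visual feature x into a state- or object-specific embedding."""
    def __init__(self, feat_dim=512, emb_dim=300):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feat_dim, 768),
            nn.ReLU(inplace=True),
            nn.Linear(768, emb_dim),
        )

    def forward(self, x):
        # L2-normalize so dot products in the contrastive losses act like cosine similarities.
        return F.normalize(self.net(x), dim=-1)

E_s = SpecificEncoder()   # state-specific encoder
E_o = SpecificEncoder()   # object-specific encoder

x = torch.randn(8, 512)   # a batch of (assumed 512-d) backbone features
h_s, h_o = E_s(x), E_o(x)
```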

  Since many different compositions can be formed from state-object combinations, three databases are defined in order to learn this joint representation:

  • ๊ณ ์ •๋œ state์— ๋‹ค์–‘ํ•œ objects๋ฅผ ์กฐํ•ฉํ•˜๋Š” State-constant database Ds
  • ๊ณ ์ •๋œ object์— ๋‹ค์–‘ํ•œ states๋ฅผ ์กฐํ•ฉํ•˜๋Š” Object-constant database Do
  • ๋‹ค์–‘ํ•œ objects-states ์กฐํ•ฉ ์ค‘์—์„œ input instance์™€ ๊ด€๋ จ์ด ์—†๋Š” Irrelevant database Dir

Given an input instance $x = (\hat{a}, \hat{o}) \in I_s$ composed of state $\hat{a}$ and object $\hat{o}$, the three databases are constructed as follows.

$$D_s = \{(a, o) \mid a = \hat{a},\ (a, o) \in C_s\}$$

$$D_o = \{(a, o) \mid o = \hat{o},\ (a, o) \in C_s\}$$

$$D_{ir} = \{(a, o) \mid a \neq \hat{a},\ o \neq \hat{o},\ (a, o) \in C_s\}$$
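
These definitions translate directly into set filtering; a small sketch over the toy composition set from the earlier snippet (the function and variable names are hypothetical):

```python
def build_databases(a_hat, o_hat, C_s):
    """Build D_s, D_o, D_ir for an input instance with state a_hat and object o_hat."""
    D_s  = {(a, o) for (a, o) in C_s if a == a_hat}                  # same state, varying objects
    D_o  = {(a, o) for (a, o) in C_s if o == o_hat}                  # same object, varying states
    D_ir = {(a, o) for (a, o) in C_s if a != a_hat and o != o_hat}   # unrelated to the input
    return D_s, D_o, D_ir

# e.g. with the toy C_s above: build_databases("whole", "apple", C_s)
```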

 

 

- Module 2. Contrastive learning

  The embeddings $h_s$ and $h_o$, projected by $E_s$ and $E_o$ into two independent embedding spaces (the Siamese contrastive spaces), are trained via contrastive learning into prototypes that best represent the state and the object, respectively. However, training the state and object branches separately with a standard contrastive loss would ignore the state-object interaction, so the losses are redefined using the three databases above.


  • State-based contrastive loss $L_{scl}$

      The state encoding $h_s$ of the input $x$ is set as the anchor in the state-based contrastive space. A positive sample $h_s^{ss}$ is drawn from $D_s$, the database whose entries share the state of $x$, and $K$ negative samples $\{h_{s_1}^{ir}, \dots, h_{s_K}^{ir}\}$ are drawn from $D_{ir}$, whose entries have a different state.
      The contrastive loss, which pulls the anchor toward the positive and pushes it away from the negatives, is defined as follows ($\tau_s > 0$ is a temperature parameter):
    $$L_{scl} = -\log \frac{\exp\big((h_s)^{\top} h_s^{ss} / \tau_s\big)}{\exp\big((h_s)^{\top} h_s^{ss} / \tau_s\big) + \sum_{i=1}^{K} \exp\big((h_s)^{\top} h_{s_i}^{ir} / \tau_s\big)}$$

  • Object-based contrastive loss $L_{ocl}$

      In the object-based contrastive space the anchor is $h_o$. A positive sample $h_o^{os}$ is drawn from $D_o$, the database whose entries share the same object, and $K$ negative samples $\{h_{o_1}^{ir}, \dots, h_{o_K}^{ir}\}$ are drawn from $D_{ir}$, whose entries have a different object.
      To account for the state-object interaction, the negative samples drawn from $D_{ir}$ are the same instances in both losses ($\tau_o > 0$ is a temperature parameter):
    $$L_{ocl} = -\log \frac{\exp\big((h_o)^{\top} h_o^{os} / \tau_o\big)}{\exp\big((h_o)^{\top} h_o^{os} / \tau_o\big) + \sum_{j=1}^{K} \exp\big((h_o)^{\top} h_{o_j}^{ir} / \tau_o\big)}$$

  • Classification loss $L_{cls}$

      So that the classifiers can discriminate states and objects from their respective prototypes, a classification loss is computed independently in each space. With $C_a$ and $C_o$ denoting the fully connected layers that classify states and objects respectively, the overall classification loss is
    $$L_{cls} = C_a(h_s, a) + C_o(h_o, o)$$

 ์œ„์—์„œ ์ •์˜ํ•œ ์„ธ ๊ฐ€์ง€์˜ loss๋ฅผ ํ†ตํ•ด  Siamese Contrastive Space์˜ ์ „์ฒด loss Lcts๊ฐ€ ์ •์˜๋œ๋‹ค.

$$L_{cts} = L_{scl} + L_{ocl} + L_{cls}$$
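
To make $L_{cts}$ concrete, here is a minimal PyTorch sketch of the two InfoNCE-style terms and the classification term (how positives/negatives are batched, the temperature values, and treating $C_a$, $C_o$ as logit-producing layers are my own assumptions):

```python
import torch
import torch.nn.functional as F

def info_nce(anchor, positive, negatives, tau):
    """InfoNCE-style contrastive loss.
    anchor: (B, D), positive: (B, D), negatives: (B, K, D); all assumed L2-normalized."""
    pos = (anchor * positive).sum(dim=-1, keepdim=True) / tau      # (B, 1)
    neg = torch.einsum("bd,bkd->bk", anchor, negatives) / tau      # (B, K)
    logits = torch.cat([pos, neg], dim=1)                          # the positive sits at index 0
    targets = torch.zeros(anchor.size(0), dtype=torch.long, device=anchor.device)
    return F.cross_entropy(logits, targets)

def cts_loss(h_s, h_o, h_s_pos, h_o_pos, h_neg_s, h_neg_o,
             C_a, C_o, a_label, o_label, tau_s=0.1, tau_o=0.1):
    """L_cts = L_scl + L_ocl + L_cls.
    h_s_pos comes from D_s, h_o_pos from D_o; h_neg_s / h_neg_o are the *same* K
    irrelevant samples from D_ir, encoded by E_s and E_o respectively."""
    L_scl = info_nce(h_s, h_s_pos, h_neg_s, tau_s)
    L_ocl = info_nce(h_o, h_o_pos, h_neg_o, tau_o)
    L_cls = F.cross_entropy(C_a(h_s), a_label) + F.cross_entropy(C_o(h_o), o_label)
    return L_scl + L_ocl + L_cls
```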

 

 

- Module 3. Augmentation

  To narrow the gap between training and test by improving generalization to unseen compositions that never appear in the training data, the paper proposes the State Transition Module (STM), which generates virtual compositions. Given two training compositions such as "sliced apple" and "red fox", STM aims to generate "red apple", which is absent from the data but plausible in reality, while filtering out "sliced fox", which is absent from the data and does not exist in reality.

[figure 03] STM framework

  1. To combine the object of the input image with states from other images, the corresponding prototypes are extracted first. The object-specific encoder extracts the object prototype $h_o$ of the input $x$, and the state-specific encoder extracts the state prototypes $h_{\tilde{s}} = \{h_{s_1}, h_{s_2}, \dots, h_{s_n}\}$ of other samples $\{s_1, s_2, \dots, s_n\}$.

  2. The generator $G$ combines the extracted prototypes into a virtual composition.
    $$G(h_{\tilde{s}}, h_o) = \hat{x}_{\tilde{s},o}$$

  3. The discriminator $D$ identifies which of the generated virtual compositions are unlikely to exist in reality (irrational compositions).
    $$\max_D \min_{G, E_s, E_o} V(G, D) = \mathbb{E}_{a,o}\big[\log D(x_{a,o})\big] + \mathbb{E}_{h_{\tilde{s}}, h_o}\big[\log\big(1 - D(G(h_{\tilde{s}}, h_o))\big)\big]$$

  4. The goal is to improve $E_s$ and $E_o$ with the newly generated data, but because the generated samples carry no ground-truth annotation they are re-encoded. The state/object prototypes extracted in this second encoding pass define a re-classification loss:
    $$L_{cls}^{re} = C_a\big(E_s(G(h_{\tilde{s}}, h_o)),\, \tilde{a}\big) + C_o\big(E_o(G(h_{\tilde{s}}, h_o)),\, o\big)$$

์œ„์—์„œ ์ •์˜ํ•œ ๋‘ ๊ฐ€์ง€์˜ loss๋ฅผ ํ†ตํ•ด State Transition Module (STM)์˜ ์ „์ฒด loss Lstm๊ฐ€ ์ •์˜๋œ๋‹ค.

$$L_{stm} = \max_D \min_{G, E_s, E_o} V(G, D) + L_{cls}^{re}$$
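
A compact PyTorch sketch of one STM update under my own assumptions: the minimax value function above is written in the usual binary-cross-entropy form with separate discriminator and generator/encoder losses, and $G$, $D$, the classifiers, and the label tensors are assumed to be given:

```python
import torch
import torch.nn.functional as F

def stm_losses(G, D, E_s, E_o, C_a, C_o,
               x_real,            # visual features of real compositions x_{a,o}
               h_s_tilde, h_o,    # state prototypes borrowed from other samples, object prototype of the input
               a_tilde, o_label): # index labels of the borrowed state and of the input object
    """One STM step: returns (discriminator loss, generator/encoder loss)."""
    x_virtual = G(h_s_tilde, h_o)                      # virtual composition x_hat_{s~,o}

    # Discriminator: push real compositions toward 1 and virtual ones toward 0.
    real_logits = D(x_real)
    fake_logits = D(x_virtual.detach())
    d_loss = F.binary_cross_entropy_with_logits(real_logits, torch.ones_like(real_logits)) + \
             F.binary_cross_entropy_with_logits(fake_logits, torch.zeros_like(fake_logits))

    # Generator + encoders: fool D, and keep the virtual feature re-classifiable
    # after re-encoding (the re-classification loss L_cls^re).
    adv_logits = D(x_virtual)
    g_adv = F.binary_cross_entropy_with_logits(adv_logits, torch.ones_like(adv_logits))
    L_cls_re = F.cross_entropy(C_a(E_s(x_virtual)), a_tilde) + \
               F.cross_entropy(C_o(E_o(x_virtual)), o_label)
    return d_loss, g_adv + L_cls_re
```

In practice the two returned losses would be stepped with separate optimizers, alternating as in a standard GAN.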

 

The final loss of the SCEN framework proposed in the paper is defined as a weighted sum of $L_{cts}$ and $L_{stm}$.

$$L_{total} = \alpha L_{cts} + \beta L_{stm}$$
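
In code the final objective is just this weighted sum; the weights below are illustrative placeholders, not values reported in the paper.

```python
alpha, beta = 1.0, 0.5                   # hypothetical weighting hyperparameters
L_total = alpha * L_cts + beta * L_stm   # final SCEN training objective
```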

 

 

 

 


 Results


  Experiments are reported on three representative CZSL benchmark datasets: MIT-States, UT-Zappos, and C-GQA.

[table 01] SCEN in MIT-States, UT-Zappos results

  On MIT-States, SCEN reaches a test AUC of 5.3%, surpassing the previous state of the art of 5.1% (+0.2%), and a Harmonic Mean (HM) of 18.4% (+1.2%). It also achieves the best state and object prediction accuracies, at 28.2% (+0.3%) and 32.2% (+0.4%). SCEN likewise sets a new state of the art on UT-Zappos.

[table 02] SCEN in C-GQA results

  On the most recently released C-GQA dataset, SCEN also improves AUC, HM, and state/object accuracy across the board.

 

 

 


References


[1] Li, Xiangyu, et al. "Siamese Contrastive Embedding Network for Compositional Zero-Shot Learning." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2022.
