Fig. 1 | EURASIP Journal on Image and Video Processing

Fig. 1

From: Multimodal few-shot classification without attribute embedding

The proposed model for multimodal few-shot learning, with the losses between components shown. A reconstruction loss is enforced between the input \(x_i\) and the reconstructed features \(x_i'\). A semantic loss is enforced on the output of the encoder \(E_s\) so that it is as close as possible to the ground-truth attributes. To ensure cyclic consistency, a different loss is used for each modality: a cosine similarity loss between \(z_s\) and \(z_s'\) for the semantic modality, and an L2 loss between \(z_v\) and \(z_v'\) for the visual modality.
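
The four loss terms named in the caption could be combined roughly as in the minimal NumPy sketch below. It is an illustration under stated assumptions, not the authors' implementation: the caption does not give the exact form of the reconstruction and semantic losses (a squared L2 distance is assumed here), the cosine similarity loss is assumed to be one minus the cosine similarity, and the function names and the `weights` parameter are hypothetical.

```python
import numpy as np

def l2_loss(a, b):
    """Mean squared Euclidean distance between paired rows of a and b."""
    return float(np.mean(np.sum((a - b) ** 2, axis=1)))

def cosine_loss(a, b, eps=1e-8):
    """Mean of (1 - cosine similarity) between paired rows of a and b."""
    num = np.sum(a * b, axis=1)
    den = np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1) + eps
    return float(np.mean(1.0 - num / den))

def total_loss(x, x_rec, attr_pred, attr_true,
               z_s, z_s_cyc, z_v, z_v_cyc,
               weights=(1.0, 1.0, 1.0, 1.0)):
    """Weighted sum of the four terms; equal weights are an assumption."""
    terms = (
        l2_loss(x, x_rec),              # reconstruction: x_i vs. x_i' (form assumed)
        l2_loss(attr_pred, attr_true),  # semantic: E_s output vs. attributes (form assumed)
        cosine_loss(z_s, z_s_cyc),      # cyclic consistency, semantic: z_s vs. z_s'
        l2_loss(z_v, z_v_cyc),          # cyclic consistency, visual: z_v vs. z_v'
    )
    return sum(w * t for w, t in zip(weights, terms))
```

Each argument is expected to be a 2-D array with one row per sample. When every reconstructed or cycled quantity equals its target, all four terms are zero up to the small `eps` used to avoid division by zero.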
