Multi-attention-based approach for deepfake face and expression swap detection and localization

EURASIP Journal on Image and Video Processing

Table 2 Quantitative results in terms of accuracy (ACC, %) on the FF++ [28] dataset for four manipulation methods: Deepfake (DF), Face2Face (F2F), FaceSwap (FS), and NeuralTextures (NT)

Method	Input	Mask	Face swap				Expression swap
Method	Input	Mask	DF (HQ)	DF (LQ)	FS (HQ)	FS (LQ)	NT (HQ)	NT (LQ)	F2F (HQ)	F2F (LQ)
Steg.Features+SVM [54]	RGB	N	77.12	65.58	79.51	68.93	76.94	60.69	74.68	60.58
Cozzolino et al. [51]	RGB	N	81.78	68.26	85.69	73.79	80.60	62.42	85.32	62.08
Bayar and Stamm [59]	RGB	N	90.18	80.95	93.14	82.52	86.04	72.38	94.93	76.83
MesoNet [12]	RGB	N	95.26	89.52	81.24	61.17	85.95	75.74	95.84	83.56
XceptionNet [28]	RGB	N	98.85	94.88	98.23	92.17	94.50	82.11	98.23	91.56
Multi-Task [41]	RGB	Y	93.92	85.77	–	–	88.05	80.67	92.77	82.31
Sun et al. [38]	RGB	N	–	69.1	–	68.1	–	60.8	–	65.7
SSTNet [14]	RGB	N	–	95.33	–	94.09	–	–	–	90.48
SPSL [53]	FREQ	N	–	93.48	–	92.26	–	76.78	–	86.02
ADD [43]	RGB	Y	97.45	–	97.20	–	90.84	–	98.33	–
Proposed method	RGB+FREQ	Y	99.97	96.47	97.88	93.88	96.06	90.55	95.97	90.92

In this table, “LQ” indicates low image quality, “HQ” indicates high image quality, “RGB” denotes color-image input, and “FREQ” denotes frequency-domain input. The best results are highlighted in bold font, while “–” indicates unavailable results.