Multi-attention-based approach for deepfake face and expression swap detection and localization

EURASIP Journal on Image and Video Processing

Table 3 Comparison of AUC (%) for cross-dataset evaluation on CelebDF [8] and DFDC-P [29], including results of other methods cited from [11, 24, 37, 38, 60, 61]

Method	FF++	CelebDF	DFDC-P
Two-stream [30]	70.10	53.80	61.4
MesoNET [12]	84.70	54.80	75.3
HeadPose [19]	47.3	54.6	55.9
VA-MLP [18]	66.4	55.0	61.9
FWA [57]	80.10	56.90	72.7
Xception-raw [28]	99.70	48.20	49.9
Xception-C23 [28]	99.70	65.30	72.2
Xception-C40 [28]	95.50	65.50	69.7
Capsule [10]	96.60	57.50	53.3
Multi-task [41]	76.30	54.30	53.6
Two-branch [11]	93.18	73.41	64.0
\(F^3\)Net [16]	98.10	65.17	70.1
EfficientNet [33]	99.70	64.29	70.12
Sun et al. [38]	99.3	64	69
GocNet [17]	97.55	67.43	–
ADD [43]	91.71	66.48	–
CViT [39]	91.08	63.60	67.3
MaDD [24]	99.80	67.44	67.1
FakePoI [37]	94.7	61.2	72.5
Proposed method	97.78	68.25	79.10

Bold values indicate the best performance on the specific dataset in each column