From: Erratum to: Refining deep convolutional features for improving fine-grained image recognition
Method | Train-phase annotation | Test-phase annotation | Dimension | Model | Accuracy | DPD |
---|---|---|---|---|---|---|
Dataset: CUB-200-2011 (cub) |  |  |  |  |  |  |
Part-Stacked CNN [1] | BBox + Parts | BBox | 4,096 | Part-Stacked CNN | 76.2% | 1.484 |
Deep LAC [2] | BBox + Parts | BBox | 12,288 | Alex-Net | 80.3% | 0.521 |
PN-CNN [3] | BBox + Parts | n/a | 13,512 | Alex-Net | 85.4% | 0.506 |
PG-Alignment [4] | BBox | n/a | 126,976 | VGG-19 | 82.8% | 0.052 |
Symbolic [5] | BBox | BBox | 20,992 | Shallow feature: SIFT | 59.4% | 0.226 |
Cross-layer pooling [6] | BBox | BBox | 4,096 | Alex-Net | 73.5% | 1.436 |
Mask-CNN [12] | Parts | n/a | 8,192 | VGG-16 + FCN | 85.4% | 0.834 |
Spatial Transformer CNN [33] | n/a | n/a | 4,096 | ST-CNN | 84.1% | 1.643 |
Bilinear CNN [8] | n/a | n/a | 262,144 | VGG-16 + VGG-M | 84.1% | 0.026 |
Compact Bilinear CNN [25] | n/a | n/a | 8,192 | VGG-16 | 84.0% | 0.820 |
PD + SWFV [14] | n/a | n/a | 69,632 | VGG-16 | 84.5% | 0.097 |
SCDA [13] | n/a | n/a | 4,096 | VGG-16 | 80.5% | 1.572 |
Ours | n/a | n/a | 69,992 | VGG-16 | 86.4% | 0.099 |
Ours (Compact vector) | n/a | n/a | 4,096 | VGG-16 | 84.5% | 1.650 |
Dataset: FGVC-Aircraft (air) |  |  |  |  |  |  |
Symbolic [5] | BBox | BBox | 20,992 | Shallow feature: SIFT | 72.5% | 0.276 |
Re-Fisher Vector [34] | n/a | n/a | 655,360 | Shallow feature: SIFT | 81.5% | 0.001 |
Bilinear CNN [8] | n/a | n/a | 262,144 | VGG-16 + VGG-M | 83.9% | 0.026 |
Ours (Full Vector + MI 2) | n/a | n/a | 69,992 | VGG-16 | 87.7% | 0.100 |
Ours (Compact vector) | n/a | n/a | 4,096 | VGG-16 | 82.5% | 1.611 |
Dataset: Stanford Cars (cars) |  |  |  |  |  |  |
Symbolic [5] | BBox | BBox | 20,992 | Shallow feature: SIFT | 78.0% | 0.297 |
PG-Alignment [4] | BBox | n/a | 126,976 | VGG-19 | 92.6% | 0.058 |
Re-Fisher Vector [34] | n/a | n/a | 655,360 | Shallow feature: SIFT | 82.7% | 0.011 |
Bilinear CNN [8] | n/a | n/a | 262,144 | VGG-16 + VGG-M | 91.3% | 0.028 |
Ours | n/a | n/a | 69,992 | VGG-16 | 92.4% | 0.106 |
Ours (Compact vector) | n/a | n/a | 4,096 | VGG-16 | 87.5% | 1.709 |
Dataset: Stanford Dogs (dogs) |  |  |  |  |  |  |
Symbolic [5] | BBox | BBox | 20,992 | Shallow feature: SIFT | 45.6% | 0.174 |
Selective Pooling [35] | BBox | BBox | 163,840 | Shallow feature: SIFT | 52.0% | 0.025 |
Re-Fisher Vector [34] | n/a | n/a | 327,680 | Shallow feature: SIFT | 52.9% | 0.013 |
NAC [36] | n/a | n/a | 4,096 | Alex-Net | 68.6% | 1.340 |
PD + SWFV [14] | n/a | n/a | 36,864 | Alex-Net | 71.9% | 0.156 |
Ours | n/a | n/a | 40,000 | Alex-Net | 72.6% | 0.145 |
Ours (Compact vector) | n/a | n/a | 4,096 | Alex-Net | 68.4% | 1.335 |