
Table 5 Comparison of the performance of our methods with some recent state-of-the-art methods on cub. BBox and Parts denote bounding-box and parts annotations, respectively

From: Erratum to: Refining deep convolutional features for improving fine-grained image recognition

| Methods | Train phase | Test phase | Dim. | Model | Acc. | DPD |
|---|---|---|---|---|---|---|
| Dataset: cub | | | | | | |
| Part-Stacked CNN [1] | BBox + Parts | BBox | 4,096 | Part-Stacked CNN | 76.2% | 1.484 |
| Deep LAC [2] | BBox + Parts | BBox | 12,288 | Alex-Net | 80.3% | 0.521 |
| PN-CNN [3] | BBox + Parts | n/a | 13,512 | Alex-Net | 85.4% | 0.506 |
| PG-Alignment [4] | BBox | n/a | 126,976 | VGG-19 | 82.8% | 0.052 |
| Symbolic [5] | BBox | BBox | 20,992 | Shallow feature: SIFT | 59.4% | 0.226 |
| Cross layer pooling [6] | BBox | BBox | 4,096 | Alex-Net | 73.5% | 1.436 |
| Mask-CNN [12] | Parts | n/a | 8,192 | VGG-16 + FCN | 85.4% | 0.834 |
| Spatial Transformer CNN [33] | n/a | n/a | 4,096 | ST-CNN | 84.1% | 1.643 |
| Bilinear CNN [8] | n/a | n/a | 262,144 | VGG-16 + VGG-M | 84.1% | 0.026 |
| Compact Bilinear CNN [25] | n/a | n/a | 8,192 | VGG-16 | 84.0% | 0.820 |
| PD + SWFV [14] | n/a | n/a | 69,632 | VGG-16 | 84.5% | 0.097 |
| SCDA [13] | n/a | n/a | 4,096 | VGG-16 | 80.5% | 1.572 |
| Ours | n/a | n/a | 69,992 | VGG-16 | 86.4% | 0.099 |
| Ours (Compact vector) | n/a | n/a | 4,096 | VGG-16 | 84.5% | 1.650 |
| Dataset: air | | | | | | |
| Symbolic [5] | BBox | BBox | 20,992 | Shallow feature: SIFT | 72.5% | 0.276 |
| Re-Fisher Vector [34] | n/a | n/a | 655,360 | Shallow feature: SIFT | 81.5% | 0.001 |
| Bilinear CNN [8] | n/a | n/a | 262,144 | VGG-16 + VGG-M | 83.9% | 0.0256 |
| Ours (Full Vector + MI 2) | n/a | n/a | 69,992 | VGG-16 | 87.7% | 0.100 |
| Ours (Compact vector) | n/a | n/a | 4,096 | VGG-16 | 82.5% | 1.611 |
| Dataset: cars | | | | | | |
| Symbolic [5] | BBox | BBox | 20,992 | Shallow feature: SIFT | 78.0% | 0.297 |
| PG-Alignment [4] | BBox | n/a | 126,976 | VGG-19 | 92.6% | 0.058 |
| Re-Fisher Vector [34] | n/a | n/a | 655,360 | Shallow feature: SIFT | 82.7% | 0.011 |
| Bilinear CNN [8] | n/a | n/a | 262,144 | VGG-16 + VGG-M | 91.3% | 0.028 |
| Ours | n/a | n/a | 69,992 | VGG-16 | 92.4% | 0.106 |
| Ours (Compact vector) | n/a | n/a | 4,096 | VGG-16 | 87.5% | 1.709 |
| Dataset: dogs | | | | | | |
| Symbolic [5] | BBox | BBox | 20,992 | Shallow feature: SIFT | 45.6% | 0.174 |
| Selective Pooling [35] | BBox | BBox | 163,840 | Shallow feature: SIFT | 52.0% | 0.025 |
| Re-Fisher Vector [34] | n/a | n/a | 327,680 | Shallow feature: SIFT | 52.9% | 0.013 |
| NAC [36] | n/a | n/a | 4,096 | Alex-Net | 68.6% | 1.340 |
| PD + SWFV [14] | n/a | n/a | 36,864 | Alex-Net | 71.9% | 0.156 |
| Ours | n/a | n/a | 40,000 | Alex-Net | 72.6% | 0.145 |
| Ours (Compact vector) | n/a | n/a | 4,096 | Alex-Net | 68.4% | 1.335 |

  1. The 'n/a' entries in the table mean that bounding-box or part annotations are not used
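
One way to read the Dim. and Acc. columns together is to relate accuracy to feature size. The sketch below is only an illustration: it takes a few rows from the cub block of the table and ranks them by accuracy points per 1,000 feature dimensions. The helper `acc_per_kdim` and this ratio are assumptions introduced here; they are not the table's DPD column, whose definition is not given on this page.

```python
# Rows copied from the cub block of Table 5:
# (method, feature dimension, accuracy in percent).
rows = [
    ("Bilinear CNN [8]", 262_144, 84.1),
    ("Compact Bilinear CNN [25]", 8_192, 84.0),
    ("SCDA [13]", 4_096, 80.5),
    ("Ours", 69_992, 86.4),
    ("Ours (Compact vector)", 4_096, 84.5),
]


def acc_per_kdim(dim: int, acc: float) -> float:
    """Hypothetical helper: accuracy points per 1,000 feature dimensions.

    This is an illustrative ratio, not the DPD measure reported in the table.
    """
    return acc / (dim / 1000.0)


# Sort from the most to the least accuracy per 1,000 dimensions.
ranked = sorted(rows, key=lambda r: acc_per_kdim(r[1], r[2]), reverse=True)

for method, dim, acc in ranked:
    ratio = acc_per_kdim(dim, acc)
    print(f"{method:28s} dim={dim:>7,d} acc={acc:5.1f}% acc/kdim={ratio:7.3f}")
```

Under this illustrative ratio the 4,096-dimensional descriptors rank first, while the 69,992-dimensional vector keeps the highest raw accuracy among the rows used here.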