Table 2 GoogLeNet (v3) architecture

From: First-person reading activity recognition by deep learning with synthetically generated images

| Index | Module type | Output shape (height × width × channels) |
|---|---|---|
| 1 | Input | 299×299×3 |
| 2 | Convolution | 149×149×32 |
| 3 | Convolution | 147×147×32 |
| 4 | Convolution | 147×147×64 |
| 5 | Max pooling | 73×73×64 |
| 6 | Convolution | 73×73×80 |
| 7 | Convolution | 71×71×192 |
| 8 | Max pooling | 35×35×192 |
| 9 | Inception | 35×35×256 |
| 10 | Inception | 35×35×256 |
| 11 | Inception | 35×35×256 |
| 12 | Inception | 17×17×736 |
| 13 | Inception | 17×17×768 |
| 14 | Inception | 17×17×768 |
| 15 | Inception | 17×17×768 |
| 16 | Inception | 17×17×768 |
| 17 | Inception | 8×8×1280 |
| 18 | Inception | 8×8×2048 |
| 19 | Inception | 8×8×2048 |
| 20 | Average pooling | 2048 |
| 21 | Output | 2 |
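The spatial sizes in the table follow from standard convolution and pooling arithmetic: a layer without padding ("valid") maps a side of length n to ⌊(n − k)/s⌋ + 1 for kernel size k and stride s, and a zero-padded ("same") layer maps it to ⌈n/s⌉. The minimal Python sketch below reproduces the spatial dimensions of rows 2 to 8 and the grid reductions at rows 12 and 17. It assumes the kernel sizes, strides, and padding of the standard Inception v3 design; these hyperparameters are not given in the table, and the helper names `out_size`, `trace_stem`, `reduce_grid`, and `head_params` are hypothetical. It also counts the parameters of the output layer, assuming that layer is a single fully connected layer with bias mapping the 2048 pooled features to the 2 classes.

```python
"""Sketch: reproduce the spatial sizes in Table 2 from conv/pool arithmetic.

Kernel sizes, strides and padding are ASSUMED from the standard Inception v3
design; Table 2 itself lists only the output shapes.
"""


def out_size(n: int, kernel: int, stride: int = 1, padding: str = "valid") -> int:
    """Side length after a square convolution or pooling layer."""
    if padding == "same":
        # Zero padding: ceil(n / stride), independent of the kernel size.
        return -(-n // stride)
    # No padding: floor((n - kernel) / stride) + 1.
    return (n - kernel) // stride + 1


# (row index in Table 2, module type, kernel, stride, padding): assumed values.
STEM = [
    (2, "Convolution", 3, 2, "valid"),
    (3, "Convolution", 3, 1, "valid"),
    (4, "Convolution", 3, 1, "same"),
    (5, "Max pooling", 3, 2, "valid"),
    (6, "Convolution", 1, 1, "valid"),
    (7, "Convolution", 3, 1, "valid"),
    (8, "Max pooling", 3, 2, "valid"),
]


def trace_stem(size: int = 299) -> list[tuple[int, str, int]]:
    """Propagate the input side length through the stem layers."""
    rows = []
    for index, name, kernel, stride, padding in STEM:
        size = out_size(size, kernel, stride, padding)
        rows.append((index, name, size))
    return rows


def reduce_grid(size: int) -> int:
    """Grid reduction assumed for Inception rows 12 and 17 (3x3, stride 2, no padding)."""
    return out_size(size, 3, 2, "valid")


def head_params(features: int = 2048, classes: int = 2) -> int:
    """Weights plus biases of an assumed single fully connected output layer."""
    return features * classes + classes


if __name__ == "__main__":
    for index, name, size in trace_stem():
        print(f"{index:>2}  {name:<12} {size}x{size}")
    print("35 ->", reduce_grid(35), "->", reduce_grid(reduce_grid(35)))
    print("output layer parameters:", head_params())
```

Under these assumptions, the trace yields 149, 147, 147, 73, 73, 71, and 35 for rows 2 to 8, the reductions 35 → 17 → 8 at rows 12 and 17, and 2048 × 2 + 2 = 4098 parameters for the output layer. Only the spatial dimensions are derived here; the channel counts depend on the branch widths inside each module, which the table does not list.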