Table 2 GoogLeNet(v3) architecture

From: First-person reading activity recognition by deep learning with synthetically generated images

Index  Module type      Output shape
1      Input            299×299×3
2      Convolution      149×149×32
3      Convolution      147×147×32
4      Convolution      147×147×64
5      Max pooling      73×73×64
6      Convolution      73×73×80
7      Convolution      71×71×192
8      Max pooling      35×35×192
9      Inception        35×35×256
10     Inception        35×35×256
11     Inception        35×35×256
12     Inception        17×17×736
13     Inception        17×17×768
14     Inception        17×17×768
15     Inception        17×17×768
16     Inception        17×17×768
17     Inception        8×8×1280
18     Inception        8×8×2048
19     Inception        8×8×2048
20     Average pooling  2048
21     Output           2
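
The spatial sizes in rows 1 to 8 can be reproduced with the usual output-size formula for convolution and pooling layers, floor((n + 2p - k) / s) + 1. The sketch below is only an illustration: the kernel sizes, strides, and paddings are not given in the table and are assumed here from the commonly published Inception v3 stem, and the names `out_size`, `STEM`, and `stem_shapes` are hypothetical helpers, not part of the article.

```python
def out_size(size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution or pooling layer (floor division)."""
    return (size + 2 * padding - kernel) // stride + 1


# (module type, kernel, stride, padding, output channels).
# ASSUMPTION: these hyperparameters follow the commonly published
# Inception v3 stem; the table itself lists only the output shapes.
# None means the layer keeps its input channel count (pooling).
STEM = [
    ("Convolution", 3, 2, 0, 32),
    ("Convolution", 3, 1, 0, 32),
    ("Convolution", 3, 1, 1, 64),
    ("Max pooling", 3, 2, 0, None),
    ("Convolution", 1, 1, 0, 80),
    ("Convolution", 3, 1, 0, 192),
    ("Max pooling", 3, 2, 0, None),
]


def stem_shapes(size=299, channels=3):
    """Return (height, width, channels) for the input and each stem layer."""
    shapes = [(size, size, channels)]
    for _module, kernel, stride, padding, out_channels in STEM:
        size = out_size(size, kernel, stride, padding)
        channels = channels if out_channels is None else out_channels
        shapes.append((size, size, channels))
    return shapes


# Print one line per table row, starting at index 1 (the input).
for index, shape in enumerate(stem_shapes(), start=1):
    print(index, "x".join(str(dim) for dim in shape))
```

Under these assumed settings the computed shapes agree with rows 1 to 8 of the table. The Inception modules in rows 9 to 19 combine several parallel branches, so their channel counts cannot be derived from this single formula.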