From: First-person reading activity recognition by deep learning with synthetically generated images
Index | Module type | Output shape (height×width×channels) |
---|---|---|
1 | Input | 299×299×3 |
2 | Convolution | 149×149×32 |
3 | Convolution | 147×147×32 |
4 | Convolution | 147×147×64 |
5 | Max pooling | 73×73×64 |
6 | Convolution | 73×73×80 |
7 | Convolution | 71×71×192 |
8 | Max pooling | 35×35×192 |
9 | Inception | 35×35×256 |
10 | Inception | 35×35×256 |
11 | Inception | 35×35×256 |
12 | Inception | 17×17×736 |
13 | Inception | 17×17×768 |
14 | Inception | 17×17×768 |
15 | Inception | 17×17×768 |
16 | Inception | 17×17×768 |
17 | Inception | 8×8×1280 |
18 | Inception | 8×8×2048 |
19 | Inception | 8×8×2048 |
20 | Average pooling | 2048 |
21 | Output | 2 |
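
The spatial sizes in rows 2 to 8 follow from standard convolution arithmetic, output = floor((input + 2·padding − kernel) / stride) + 1. The sketch below reproduces them under assumed layer settings. The table does not list kernel sizes, strides, or padding, so the values in `STEM` are assumptions borrowed from the commonly published Inception-v3 stem and may differ from the authors' exact configuration. The names `conv_out`, `STEM`, and `stem_sizes` are hypothetical helpers introduced only for illustration.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution or pooling layer along one axis."""
    return (size + 2 * padding - kernel) // stride + 1


# (module type, kernel, stride, padding) for table rows 2 to 8.
# ASSUMPTION: these settings are not given in the table; they mirror the
# commonly published Inception-v3 stem.
STEM = [
    ("Convolution", 3, 2, 0),  # row 2
    ("Convolution", 3, 1, 0),  # row 3
    ("Convolution", 3, 1, 1),  # row 4, padded so the size is preserved
    ("Max pooling", 3, 2, 0),  # row 5
    ("Convolution", 1, 1, 0),  # row 6
    ("Convolution", 3, 1, 0),  # row 7
    ("Max pooling", 3, 2, 0),  # row 8
]


def stem_sizes(input_size=299):
    """Return the spatial size after each stem layer, starting from input_size."""
    sizes = []
    size = input_size
    for _name, kernel, stride, padding in STEM:
        size = conv_out(size, kernel, stride, padding)
        sizes.append(size)
    return sizes


print(stem_sizes())  # [149, 147, 147, 73, 73, 71, 35]
```

Under the same assumption of a 3×3, stride-2, unpadded reduction, `conv_out(35, 3, 2)` gives 17 and `conv_out(17, 3, 2)` gives 8, matching the size changes at rows 12 and 17.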