From: Learning attention for object tracking with adversarial learning network
Block | Output size | Backbone |
---|---|---|
Conv1 | 125 × 125 | 7 × 7, 64, stride 2 |
Conv2_x | 63 × 63 | \( \left[\begin{array}{l}1\times 1,64\\ 3\times 3,64\\ 1\times 1,256\end{array}\right]\times 3 \) |
Conv3_x | 31 × 31 | \( \left[\begin{array}{l}1\times 1,128\\ 3\times 3,128\\ 1\times 1,512\end{array}\right]\times 4 \) |
Conv4_x | 31 × 31 | \( \left[\begin{array}{l}1\times 1,256\\ 3\times 3,256\\ 1\times 1,1024\end{array}\right]\times 6 \) |
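
The output sizes in the table can be checked with the standard convolution size formula. The sketch below is only an illustration under assumptions that the table itself does not state: a 255 × 255 input, no padding in Conv1, a 3 × 3 max pooling with stride 2 and padding 1 before Conv2_x, an unpadded stride-2 3 × 3 convolution at the start of Conv3_x, and a stride of 1 with dilation 2 in Conv4_x (which would explain why its output stays at 31 × 31). The helper names `conv_out` and `trace_sizes` are hypothetical.

```python
def conv_out(size, kernel, stride=1, padding=0, dilation=1):
    """Spatial output size of a convolution or pooling layer (floor mode)."""
    effective_kernel = dilation * (kernel - 1) + 1
    return (size + 2 * padding - effective_kernel) // stride + 1


def trace_sizes(input_size=255):
    """Trace the spatial size through the blocks listed in the table.

    All padding, pooling, and dilation settings here are assumptions chosen
    to reproduce the table; they are not taken from the source.
    """
    sizes = {}
    # Conv1: 7x7, 64 channels, stride 2, assumed no padding.
    s = conv_out(input_size, kernel=7, stride=2, padding=0)
    sizes["Conv1"] = s
    # Assumed 3x3 max pooling, stride 2, padding 1 (not listed in the table).
    s = conv_out(s, kernel=3, stride=2, padding=1)
    # Conv2_x: 3x3 convolutions with stride 1 and padding 1 keep the size.
    s = conv_out(s, kernel=3, stride=1, padding=1)
    sizes["Conv2_x"] = s
    # Conv3_x: assumed stride-2 3x3 convolution without padding.
    s = conv_out(s, kernel=3, stride=2, padding=0)
    sizes["Conv3_x"] = s
    # Conv4_x: assumed stride 1 with dilation 2 and padding 2.
    s = conv_out(s, kernel=3, stride=1, padding=2, dilation=2)
    sizes["Conv4_x"] = s
    return sizes


if __name__ == "__main__":
    for block, size in trace_sizes().items():
        print(f"{block}: {size} x {size}")
```

Under these assumptions the traced sizes are 125, 63, 31, and 31, matching the "Output size" column. A different input resolution or padding scheme would give different values.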