Are CNNs rotation invariant, and how do we cater for this?

Yet another interview question


To answer this question, we first need to distinguish between the individual filters in the network and the final trained network. Individual filters in a CNN are not invariant to rotation: a filter that responds to a pattern at one orientation will not necessarily respond to the same pattern once the image is rotated.
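
To make this concrete, here is a minimal sketch in plain Python (no deep learning library). The filter values and the tiny 4x4 image are made up for illustration and are not taken from any trained network. It applies a hand-made vertical-edge filter to an image containing a vertical edge, and then to the same image rotated by 90 degrees.

```python
def correlate2d(image, kernel):
    """Valid-mode 2D cross-correlation (the operation CNN layers compute)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh) for dj in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def rot90(grid):
    """Rotate a 2D list by 90 degrees clockwise."""
    return [list(row) for row in zip(*grid[::-1])]


def max_response(image, kernel):
    """Largest activation of the filter anywhere in the image."""
    return max(max(row) for row in correlate2d(image, kernel))


# Hypothetical hand-made filter: dark on the left, bright on the right.
vertical_edge_filter = [[-1, 1],
                        [-1, 1]]

# Toy image with a vertical edge: left half 0, right half 1.
vertical_edge = [[0, 0, 1, 1] for _ in range(4)]
# The same edge rotated by 90 degrees becomes a horizontal edge.
horizontal_edge = rot90(vertical_edge)

strong = max_response(vertical_edge, vertical_edge_filter)
weak = max_response(horizontal_edge, vertical_edge_filter)
print(strong, weak)
```

The filter fires on the original edge and stays silent on the rotated one, which is the sense in which a single filter is not rotation invariant.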

However, a CNN as a whole can learn several filters, each of which fires when a pattern is presented at a particular orientation.

Unless your training data includes images that are rotated across the full 360-degree spectrum, your CNN is not truly rotation invariant.
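
One common way to get such data is rotation augmentation. The sketch below is a simplified, dependency-free illustration: `rotate_nearest` and `augment_with_rotations` are hypothetical helper names, and nearest-neighbor sampling is assumed for brevity. In practice you would more likely use the rotation utilities of your image or deep learning library.

```python
import math
import random


def rotate_nearest(image, angle_deg, fill=0):
    """Rotate a 2D list about its center using nearest-neighbor sampling.

    Pixels that map outside the source image are set to `fill`.
    """
    h, w = len(image), len(image[0])
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    theta = math.radians(angle_deg)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    out = [[fill] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Inverse-map each output pixel back to its source location.
            dx, dy = x - cx, y - cy
            src_x = int(round(cos_t * dx + sin_t * dy + cx))
            src_y = int(round(-sin_t * dx + cos_t * dy + cy))
            if 0 <= src_x < w and 0 <= src_y < h:
                out[y][x] = image[src_y][src_x]
    return out


def augment_with_rotations(image, n_copies, rng=None):
    """Return n_copies of the image, each rotated by a random angle in [0, 360]."""
    rng = rng or random.Random(0)  # fixed seed only to keep the sketch reproducible
    return [rotate_nearest(image, rng.uniform(0.0, 360.0)) for _ in range(n_copies)]


toy_image = [[1, 2, 3],
             [4, 5, 6],
             [7, 8, 9]]
rotated_copies = augment_with_rotations(toy_image, n_copies=5)
```

Each training image would be passed through something like `augment_with_rotations` so that the network sees the same pattern at many orientations and can learn a filter for each.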

The same can be said about scaling: the filters themselves are not scale-invariant, but it is highly likely that your CNN has learned a set of filters that fire when patterns exist at varying scales.

We can also “help” our CNNs to be scale-invariant by presenting the input image to them at test time under varying scales and crops, then averaging the resulting predictions.
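
A rough sketch of this averaging idea is shown below. Everything here is an assumption made for illustration: `predict` stands for whatever function returns your model's class scores, `toy_predict` is a stand-in so the example runs on its own, and the scale factors, center cropping, and nearest-neighbor resizing are arbitrary choices.

```python
def resize_nearest(image, new_h, new_w):
    """Resize a 2D list with nearest-neighbor sampling."""
    h, w = len(image), len(image[0])
    return [[image[y * h // new_h][x * w // new_w] for x in range(new_w)]
            for y in range(new_h)]


def center_crop(image, size):
    """Take a size x size crop from the middle of the image."""
    h, w = len(image), len(image[0])
    top, left = (h - size) // 2, (w - size) // 2
    return [row[left:left + size] for row in image[top:top + size]]


def predict_multiscale(predict, image, crop_size, scales=(1.0, 1.25, 1.5)):
    """Average the outputs of `predict` over center crops of rescaled copies."""
    h, w = len(image), len(image[0])
    outputs = []
    for s in scales:
        # Never shrink below the crop size, so the crop is always valid.
        new_h = max(crop_size, int(round(h * s)))
        new_w = max(crop_size, int(round(w * s)))
        crop = center_crop(resize_nearest(image, new_h, new_w), crop_size)
        outputs.append(predict(crop))
    n = len(outputs)
    # Element-wise mean of the per-scale score vectors.
    return [sum(vals) / n for vals in zip(*outputs)]


def toy_predict(crop):
    """Hypothetical stand-in for a model: scores based on mean brightness."""
    mean = sum(sum(row) for row in crop) / (len(crop) * len(crop[0]))
    return [mean, 1.0 - mean]


bright_image = [[1] * 4 for _ in range(4)]
averaged_scores = predict_multiscale(toy_predict, bright_image, crop_size=4)
```

With a real model, `predict` would be a forward pass returning class probabilities, and the averaged vector would be used for the final decision.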

Translation invariance, however, is something that a CNN excels at. Keep in mind that a filter slides from left to right and top to bottom across an input, and will activate when it comes across a particular edge-like region, corner, or color blob. During the pooling operation, this large response is found and “beats” all its neighbors by having a larger activation. Therefore, CNNs can be seen as “not caring” exactly where an activation fires, simply that it does fire, and in this way we naturally handle translation inside a CNN.
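
The sliding-filter-plus-pooling argument can be sketched in a few lines of plain Python. The corner pattern, the filter, and the 6x6 canvas are made up for illustration, and global max pooling is assumed as the simplest case. Strictly speaking, the convolution itself only moves the activation along with the pattern; it is the pooling step that discards the position.

```python
def correlate2d(image, kernel):
    """Valid-mode 2D cross-correlation: slide the kernel over every position."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh) for dj in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def global_max_pool(feature_map):
    """Keep only the largest activation, discarding where it occurred."""
    return max(max(row) for row in feature_map)


def place_patch(patch, top, left, size=6):
    """Paste a small patch onto an otherwise empty size x size canvas."""
    canvas = [[0] * size for _ in range(size)]
    for i, row in enumerate(patch):
        for j, value in enumerate(row):
            canvas[top + i][left + j] = value
    return canvas


# Hypothetical corner pattern and a filter tuned to it.
corner = [[1, 1],
          [1, 0]]
corner_filter = [[1, 1],
                 [1, -1]]

# The same pattern at two different locations.
top_left = place_patch(corner, 0, 0)
bottom_right = place_patch(corner, 3, 4)

fm_a = correlate2d(top_left, corner_filter)
fm_b = correlate2d(bottom_right, corner_filter)

pooled_a = global_max_pool(fm_a)
pooled_b = global_max_pool(fm_b)
print(pooled_a, pooled_b)
```

The two feature maps differ, because the peak sits at a different position in each, but the pooled values are identical. Local pooling, such as 2x2 max pooling, gives the same effect only for small shifts.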