The problem I see with training a supervised linear classifier on top of an unsupervised network is that, if the unsupervised network is large enough, the supervised training can simply select whichever of its components happen to work. As shown in [1], this can make even randomly initialized networks perform well, so a good result does not necessarily demonstrate that the unsupervised training produced useful features.
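To make the criticised setup concrete, here is a minimal sketch of such a linear-probe evaluation (PyTorch assumed; the function name, data loaders, and hyperparameters are illustrative, not from the paper). The encoder is frozen and only the linear layer sees labels, so with a wide enough encoder the probe can pick out whichever features happen to be useful, whether the encoder was trained or left at its random initialization:

```python
import torch
import torch.nn as nn

def linear_probe_accuracy(encoder, train_loader, test_loader,
                          feat_dim, num_classes, epochs=10, device="cpu"):
    encoder.eval()                                   # frozen (trained or random) encoder
    probe = nn.Linear(feat_dim, num_classes).to(device)
    opt = torch.optim.Adam(probe.parameters(), lr=1e-3)
    loss_fn = nn.CrossEntropyLoss()

    for _ in range(epochs):
        for x, y in train_loader:
            x, y = x.to(device), y.to(device)
            with torch.no_grad():                    # no gradient reaches the encoder
                feats = encoder(x)
            opt.zero_grad()
            loss = loss_fn(probe(feats), y)          # only the probe is supervised
            loss.backward()
            opt.step()

    correct = total = 0
    with torch.no_grad():
        for x, y in test_loader:
            x, y = x.to(device), y.to(device)
            pred = probe(encoder(x)).argmax(dim=1)
            correct += (pred == y).sum().item()
            total += y.numel()
    return correct / total
```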
I would instead suggest training the categorization classifier unsupervised as well, for example with a mutual-information loss and the correct number of categories, as proposed in [2]. Afterwards, one can deduce the mapping between the categories learnt unsupervised and the ground-truth categories to allow evaluation; see the sketch below. That way, good results clearly demonstrate a good unsupervised training method.
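A minimal sketch of that mapping step, assuming the common practice of optimal one-to-one assignment (Hungarian matching via scipy); [2] evaluates its clusters with an equivalent procedure, but the exact function below is my own illustration:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def cluster_accuracy(pred_clusters, true_labels, num_classes):
    # Confusion matrix between predicted cluster ids and ground-truth classes.
    cm = np.zeros((num_classes, num_classes), dtype=np.int64)
    for p, t in zip(pred_clusters, true_labels):
        cm[p, t] += 1
    # One-to-one cluster -> class mapping that maximises agreement.
    row_ind, col_ind = linear_sum_assignment(-cm)
    mapping = dict(zip(row_ind, col_ind))
    accuracy = cm[row_ind, col_ind].sum() / len(true_labels)
    return accuracy, mapping
```

Since no labels are used during training and the mapping is only computed once for evaluation, a high accuracy here cannot be explained by the evaluation stage selecting convenient components.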
The problem I meant in the second part is that most networks trained for object recognition rely on low-level features such as colors and textures, as shown in [3]. The adversarial turtle clearly has the shape and part arrangement of a turtle and looks overwhelmingly like a turtle to a human, but its high-frequency surface details are the ones the network associates with a rifle, which is why such networks are fooled even on photos taken from varying perspectives.
Training a network with a loss that requires local regions of an image to produce features highly correlated with the global features of the same image does not avoid this problem, because the high-frequency patterns that the network erroneously relies on for detection are present at both the local and the global scale. Sadly, I do not have a concrete idea for how to improve this either.
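For illustration only, here is a minimal sketch of the kind of local-global agreement objective I mean (assumptions: PyTorch, an encoder that returns a patch feature map and a global vector, and an InfoNCE-style formulation; this is not the authors' exact loss). It asks each local patch to identify the global code of its own image among the batch, which high-frequency texture can already do, so such a loss does not force the network toward shape or arrangement:

```python
import torch
import torch.nn.functional as F

def local_global_infonce(local_feats, global_feats, temperature=0.1):
    # local_feats: (B, C, H, W) patch features; global_feats: (B, C) image features.
    B, C, H, W = local_feats.shape
    local = F.normalize(local_feats.flatten(2), dim=1)    # (B, C, H*W)
    glob = F.normalize(global_feats, dim=1)               # (B, C)
    # Score every patch against every image's global code: (B, H*W, B)
    scores = torch.einsum("bcn,kc->bnk", local, glob) / temperature
    # Each patch's positive is the global code of its own image.
    target = torch.arange(B, device=scores.device).view(B, 1).expand(B, H * W)
    return F.cross_entropy(scores.reshape(B * H * W, B), target.reshape(-1))
```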
[1] What's Hidden in a Randomly Weighted Neural Network? https://arxiv.org/abs/1911.13299
[2] Invariant Information Clustering for Unsupervised Image Classification and Segmentation https://arxiv.org/abs/1807.06653
[3] Synthesizing Robust Adversarial Examples https://arxiv.org/abs/1707.07397