> My take during that era was neural nets were considered taboo after the second AI winter of the early 90s.
I'm sure there is more detail to unpack here (more than a single paragraph, yours or mine, can cover). But as written this isn't accurate.
The key thing missing from "were considered taboo ..." is by whom.
My graduate studies in neural net learning rates (1990-1995) were supported by an NSF grant, part of a larger NSF push. The NeurIPS conferences, then held in Denver, were very well-attended by a pretty broad community during these years. (Nothing like now, of course - I think it maybe drew ~300 people.) A handful of major figures in the academic statistics community would be there -- Leo Breiman of course, but also Rob Tibshirani, Art Owen, Grace Wahba (e.g., https://papers.nips.cc/paper_files/paper/1998/hash/bffc98347...).
So, not taboo. And remember, many of the people in that original tight NeurIPS community (exhibit A, Leo Breiman; or Vladimir Vapnik) were visionaries with enough sophistication to be confident that there was something actually there.
But this was all very research-y. The application of ANNs to real problems was not advanced, and a lot of the people trying were tinkerers who were not in touch with what little theory there was. Many of the very good reasons NNs weren't reliably performing well are (correctly) listed in your reply starting with "At the time".
If you can't reliably get decent performance out of a method that has such patchy theoretical guidance, you'll have to look elsewhere to solve your problem. But that's not taboo, that's just pragmatic engineering consensus.
You're probably right in terms of the NN research world, but I've been staring at a wall reminiscing for a half hour and concluded... Neural networks weren't widely used in the late 90s and early 00s in the field of computer vision.
Face detection was dominated by Viola-Jones and Haar features, and facial feature detection relied on active shape and active appearance models (AAMs), with those iconic Delaunay triangles becoming the emblem of facial recognition. SVMs, kNNs, and hand-tuned feature detectors were used to highlight tumors and lesions. Dynamic programming was used to outline CTs and MRIs of hearts, airways, and other structures; Hough transforms were used for pupil tracking; HOG features were popular for face, car, and body detectors; and Gaussian mixture models and Hidden Markov Models were standard in speech recognition. I remember seeing a few papers attempting to stick a 3-layer NN on the outputs of AAMs, with limited success.
The Yann LeCun paper felt like a breakthrough to me. It seemed biologically plausible, given what I knew of the Neocognitron and the visual cortex, and the shared weights of the kernels provided a way to build deep models beyond one or two hidden layers.
At the time, I felt like Cassandra, going around to former colleagues and computer vision companies in the region, trying to convey to them just how much of a game changer that paper was.