Everyone’s experience is different, but I’ve been in dozens of MLE interviews (some of which I passed!) and have never once been asked to explain the internals of an optimizer. The interviews were all post-2020, though.
Unless someone had a very good reason, I would consider it weird to use anything other than AdamW. The compute you could save with a slightly better optimizer pales in comparison to the time you will spend debugging an opaque training bug.
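For what it’s worth, the boring default is a one-liner in PyTorch. A minimal sketch below; the model, learning rate, and weight decay are placeholders for illustration, not a tuned recipe:

```python
import torch
import torch.nn as nn

# Placeholder model -- swap in your own.
model = nn.Linear(128, 10)

# AdamW: Adam with decoupled weight decay, the widely used default.
# lr and weight_decay here are illustrative values, not recommendations.
optimizer = torch.optim.AdamW(
    model.parameters(),
    lr=3e-4,
    weight_decay=0.01,
)

# Standard training step:
#   optimizer.zero_grad()
#   loss.backward()
#   optimizer.step()
```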