In general, scanning/lexing uses FSMs, while parsing is done with pushdown automata (PDAs): you need a stack to parse, but you don’t really need a stack to scan.
You are making an example of my point? When parsing something like a comma-separated list, I generally don't get carried away with anything more than a switch/case state machine.
Even if it has commas in quotes, it's a comma seen in the state 'collecting text'. There is no need for anything more complicated than an FSM. Maybe there is in leetcode. One character of pushback is not a 'stack' unless you are answering a Google interviewer.
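To make that concrete, here is a minimal sketch of the kind of two-state machine being described. The names (State, split_csv_line) are made up for illustration, and it deliberately ignores escaped quotes and multi-line fields, so treat it as a sketch rather than a real CSV reader.

```python
from enum import Enum, auto


class State(Enum):
    FIELD = auto()   # collecting unquoted text
    QUOTED = auto()  # collecting text inside double quotes


def split_csv_line(line):
    """Split one line on commas, treating commas inside quotes as text.

    Hypothetical helper: no stack, just a current state and the field so far.
    Assumes quotes are never escaped (a doubled quote is not handled).
    """
    fields = []
    current = []
    state = State.FIELD
    for ch in line:
        if state is State.FIELD:
            if ch == ',':
                # A comma outside quotes ends the current field.
                fields.append(''.join(current))
                current = []
            elif ch == '"':
                state = State.QUOTED
            else:
                current.append(ch)
        else:  # State.QUOTED
            if ch == '"':
                state = State.FIELD
            else:
                # A comma seen here is just more text.
                current.append(ch)
    fields.append(''.join(current))
    return fields


print(split_csv_line('a,"b,c",d'))
```

The only memory is which of the two states the machine is in, which is what makes it an FSM and not a PDA.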
In computer science, parsing is distinguished from scanning by nesting. Or you can think of it as the languages you can recognize with FSMs (scanning), i.e. the regular languages that regular expressions describe, versus the languages you need a PDA to recognize (parsing), i.e. the context-free languages that CFGs describe.
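A small illustration of where nesting forces the stack: checking that mixed brackets are properly nested. With arbitrary nesting depth there is no fixed number of states that can remember which brackets are still open, so the recognizer below keeps them on a stack. The names (PAIRS, is_balanced) are hypothetical, and this is only a sketch of a pushdown-style recognizer, not a full parser.

```python
# Maps each closing bracket to the opening bracket it must match.
PAIRS = {')': '(', ']': '[', '}': '{'}


def is_balanced(text):
    """Return True if every bracket in text is closed in properly nested order."""
    stack = []
    for ch in text:
        if ch in PAIRS.values():
            # Opening bracket: remember it until its partner shows up.
            stack.append(ch)
        elif ch in PAIRS:
            # Closing bracket: it must match the most recent unclosed opener.
            if not stack or stack.pop() != PAIRS[ch]:
                return False
    # Anything left on the stack was opened but never closed.
    return not stack


print(is_balanced('([]{})'), is_balanced('([)]'))
```

Compare with a comma-separated list, where nothing ever has to be matched against something seen arbitrarily far back.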
This would never come up in a Google interview and really only comes up in CS 101 (or whatever your program's first CS theory course is) when you study computational models, languages, and automata for the first time. Anyone who doesn't go on to do compilers really doesn't care much about the distinction.