> I assumed MOE was where each expert analyzed the problem in a different way
Uh, sorta, but not at all like the parent described. You have multiple "experts" and one or more routing layers that decide, per token, which experts that token gets sent to. Usually every token is sent to at least 2. You can't just send half the tokens to one expert and half to another.
Also the "experts" are not "domain experts" - there is not a "programming expert" and an "essay expert".
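
To make the per-token routing concrete, here is a rough sketch of top-2 gating in plain Python. Everything in it is made up for illustration: `route_token`, `moe_layer`, the random `router_weights`, and the toy experts that just scale their input are not taken from any real model or library. It only shows the general shape, assuming a linear router and a softmax over the selected experts.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def route_token(token, router_weights, k=2):
    """Pick the top-k experts for ONE token (hypothetical helper).

    Returns (expert_index, gate_weight) pairs; the gate weights sum to 1.
    """
    # One router logit per expert: dot product of the token with that expert's row.
    logits = [sum(w * x for w, x in zip(row, token)) for row in router_weights]
    # Indices of the k largest logits, best first.
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    # Renormalize over the chosen experts only (an assumption; schemes differ).
    gates = softmax([logits[i] for i in top])
    return list(zip(top, gates))


def moe_layer(token, router_weights, experts, k=2):
    """Gate-weighted sum of the outputs of the k selected experts."""
    out = [0.0] * len(token)
    for idx, gate in route_token(token, router_weights, k):
        expert_out = experts[idx](token)
        out = [o + gate * e for o, e in zip(out, expert_out)]
    return out


rng = random.Random(0)
dim, num_experts = 4, 8
router_weights = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(num_experts)]
# Toy experts: expert i multiplies the token by (i + 1). Real experts are small MLPs.
experts = [lambda t, s=i + 1: [s * x for x in t] for i in range(num_experts)]

tokens = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(3)]
for tok in tokens:
    # Each token gets its own pair of experts, chosen from the token's values alone.
    print(route_token(tok, router_weights))
```

Note that the router only ever sees a token vector. Nothing in it knows whether the text is code or an essay, so any specialization the experts pick up comes out of training, not from a topic label.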