We can’t even get human languages right. I can imagine walking into a lion pen with a steak and saying “here is your lunch, eat up!” and having it translated as “I am your lunch, eat me!”
It’s paywalled, but the main issue I see is getting useful labels on the data. For example, we might know that dolphins make all sorts of noises, but without any information about what those noises mean it will be basically impossible to derive any deeper meaning.
Modern translation actually doesn't rely much on explicit labels; it uses structural alignment. I'm more concerned with how to build a "corpus" comparable to what we have for human languages.
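To make "structural alignment" concrete, here's a minimal sketch of the idea behind unsupervised embedding alignment (in the spirit of methods like MUSE/vecmap): if two languages' word-embedding spaces share geometric structure, an orthogonal map between them can be recovered via Procrustes. The toy data and all names here are made up for illustration; real unsupervised methods bootstrap the seed dictionary (e.g. adversarially) instead of assuming known pairs, as I do below.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "source language" embeddings: 50 words in 5 dimensions.
X = rng.normal(size=(50, 5))

# Toy "target language": same structure, hidden orthogonal rotation Q.
Q, _ = np.linalg.qr(rng.normal(size=(5, 5)))
Y = X @ Q

# Procrustes: the orthogonal W minimizing ||X @ W - Y|| is U @ Vt,
# where U, Vt come from the SVD of X.T @ Y. Here I cheat and use all
# pairs as the seed dictionary; unsupervised methods learn the seed.
U, _, Vt = np.linalg.svd(X.T @ Y)
W = U @ Vt

# The recovered map matches the hidden rotation.
print(np.allclose(W, Q, atol=1e-6))  # -> True
```

The point for animal communication: this only works because both spaces have comparable structure and enough data, which is exactly the "corpus" problem.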
Well, there’s an implicit label, isn’t there? Translation models are shown pairs of texts in two languages, so the implicit label is “these two texts are saying the same thing”. Won’t you always need some form of supervision to learn the mapping from one ___domain to another?