Just playing with it now for German, and the second ever set of words it taught me were ‘Er’, ‘aus’, and ‘Dorf’, and it then asked me to fill in the blank:
Er ist aus unserem Dorf ("He is from our village")
…only it hadn’t taught me ‘unserem’, so I had no way of even guessing the right answer. (I just had to get it wrong intentionally.)
Is this intended behaviour or should it also have taught me ‘unserem’ first?
That is not the intended behaviour, but it happens sometimes when the NLP pipeline and the dictionary don't agree on the part of speech of a particular word.
I’ll have to run a script identifying all such cases in the built-in decks and most likely correct them by hand.
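Such a script could be a linear scan over each deck: collect taught words as they appear, and flag any fill-in-the-blank exercise whose answer hasn't been taught yet. A minimal sketch, assuming a hypothetical card format (the real deck data model will differ):

```python
# Sketch only: the "teach"/"blank" card shapes here are invented for illustration.
def find_untaught_blanks(deck):
    """Return fill-in-the-blank answers that no earlier card in the deck taught."""
    taught = set()
    untaught = []
    for card in deck:
        if card["type"] == "teach":
            taught.add(card["word"])
        elif card["type"] == "blank" and card["answer"] not in taught:
            untaught.append(card["answer"])
    return untaught

deck = [
    {"type": "teach", "word": "Er"},
    {"type": "teach", "word": "aus"},
    {"type": "teach", "word": "Dorf"},
    {"type": "blank", "answer": "unserem"},  # tests a word that was never taught
]
print(find_untaught_blanks(deck))  # → ['unserem']
```

Running this over all built-in decks would surface every card affected by the POS mismatch described above.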
Happy to hear any other thoughts or issues so I can make it better :)
(Obviously) I don't fully understand your architecture, but isn't there a (relatively) simple check that it shouldn't test a word it has never taught?
(And on the bright side, I'm sure I'll remember unserem now!)
Well, yes, but the system filters definitions based on part of speech. For example, you don't want to show "to guide or conduct" (lead, verb) for a sentence talking about the metal lead (lead, noun).
In this case, unserem was assigned the part of speech determiner by the NLP pipeline (shoutout to https://stanfordnlp.github.io/stanza/, it's brilliant), but the dictionary only has an entry for unserem as a pronoun.
Perhaps a better design is to always show all entries, and just put the entries whose POS matches the tagged word in the sentence (if any) above the others.
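That ranking idea can be sketched in a few lines: a stable sort where entries matching the tagger's POS sort first, and everything else keeps its dictionary order. The entry shape and tag names below are hypothetical:

```python
# Sketch: rank dictionary entries so POS-matching ones come first,
# instead of filtering non-matching entries out entirely.
def rank_entries(entries, tagged_pos):
    """Sort entries so those whose POS matches the tagger's guess lead;
    Python's sort is stable, so ties keep their original dictionary order."""
    return sorted(entries, key=lambda e: e["pos"] != tagged_pos)

# Hypothetical dictionary entries for 'unserem':
entries = [
    {"pos": "pron", "gloss": "ours (dative)"},
    {"pos": "det",  "gloss": "our (dative)"},
]

# Stanza tagged 'unserem' as a determiner, so the det entry surfaces first,
# but the pronoun entry is still shown below it rather than hidden.
print([e["pos"] for e in rank_entries(entries, "det")])  # → ['det', 'pron']
```

With this approach the earlier "unserem" case degrades gracefully: even when the tagger and the dictionary disagree, the dictionary entry is still visible, just ranked lower.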
Would love to hear any feedback or thoughts!