It was picked because it is difficult to train a reinforcement learning model to play it well. In most other games you can build a reward function from the score or something similar, and the AI can then explore possible actions to find the ones that give the best score. In those games AI players are already doing quite well. In this game, getting any actual reward means finding the key, which requires long-term planning, and AI players have previously got stuck before reaching it.
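
To make the difference concrete, here is a minimal toy sketch in Python. It is not the actual game: the corridor, the two actions, `KEY_POSITION`, and all function names are hypothetical, chosen only to illustrate the idea. It compares a score-like reward, which pays out for any progress, with a key-like reward, which pays out only when the key is reached, and counts what fraction of all possible action sequences receive any reward at all.

```python
from itertools import product

# Hypothetical toy setup: a corridor with a wall at position 0
# and a key 6 steps to the right of the start.
ACTIONS = ("left", "right")
KEY_POSITION = 6


def final_position(actions):
    """Walk the corridor and return where the player ends up."""
    pos = 0
    for action in actions:
        pos += 1 if action == "right" else -1
        pos = max(pos, 0)  # cannot walk through the wall on the left
    return pos


def dense_reward(actions):
    """Score-like reward: every step of progress counts."""
    return final_position(actions)


def sparse_reward(actions):
    """Key-like reward: nothing unless the player ends at the key."""
    return 1 if final_position(actions) >= KEY_POSITION else 0


def fraction_rewarded(reward_fn, horizon):
    """Fraction of all action sequences of this length with reward > 0."""
    sequences = list(product(ACTIONS, repeat=horizon))
    rewarded = sum(1 for seq in sequences if reward_fn(seq) > 0)
    return rewarded / len(sequences)


if __name__ == "__main__":
    print("dense :", fraction_rewarded(dense_reward, 6))
    print("sparse:", fraction_rewarded(sparse_reward, 6))
```

In this toy setting only one of the 64 possible six-step sequences reaches the key, so an agent that explores at random almost never sees the sparse reward, while at least half of the sequences get some feedback from the score-like reward. A real game has far more actions and far longer sequences, which makes the gap much larger.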