Hacker News new | past | comments | ask | show | jobs | submit login

> I wonder, how would an AI perform on the same test.

This situation is called Multi-armed Bandit. In this setup you have a number of actions at your disposal and need to maximise rewards by selecting the most efficient actions. But the results are stochastic and the player doesn't know which action is best. They need to 'spend' some time trying out various actions but then focus on those that work better. In a variant of this problem, the rewards associated to actions are also changing in time. It's a very well studied problem, a form of simple reinforcement learning.




If the rewards are changing, then isnt it a moving target problem?




Join us for AI Startup School this June 16-17 in San Francisco!

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: