Abstract
Temporal difference learning algorithms have been used successfully to train neural networks to play backgammon at human expert level. This approach has subsequently been applied to deterministic games such as chess and Go with little success, but few have attempted to apply it to other nondeterministic games. We use temporal difference learning to train neural networks for four such games: backgammon, hypergammon, pachisi, and Parcheesi. We investigate the influence of two training variables on these networks: the source of training data (learner-versus-self or learner-versus-other game play) and its structure (a simple encoding of the board layout, a set of derived board features, or a combination of both of these). We show that this approach is viable for all four games, that self-play can provide effective training data, and that the combination of raw and derived features allows for the development of stronger players.