
Hippocampal forward sweeps and the balance of goal-directed and habitual controllers: a Bayesian approach.

Giovanni Pezzulo; Francesco Donnarumma; Fabian Chersi
2012

Abstract

How do animals select their actions in uncertain and volatile environments? Converging evidence indicates that there are multiple mechanisms of choice (habitual, goal-directed, and Pavlovian) which compete over time. Researchers are using multiple techniques (animal and human experiments, computational modeling) to understand how these multiple controllers compete, in which situations one controller dominates over the others, and what the neural underpinnings of this multifaceted decision-making architecture are. To organize evidence on goal-directed animal foraging and decision-making, we take a normative approach and cast action selection as the solution of an exploration-exploitation dilemma. Here, "exploitation" means choosing actions on the basis of already available information, and "exploration" means performing lookahead predictions to access expectancies and associated reward predictions, so as to improve the quality of choice (note that these "mental explorations" are not overt exploratory actions). Our framework combines model-based and model-free methods of reinforcement learning. In this framework, goal-directed and habitual processes rely on partially overlapping computational mechanisms and neural substrates (contrary to the view that they are separate behavioral controllers). The choice of using lookahead predictions depends on a level of confidence: a computation that considers (an optimal combination of) multiple factors, such as the value and variance of alternative actions and the volatility of the environment. Habituation makes this process unnecessary. We use this framework to interpret neurophysiological data, and in particular evidence on a neural circuit including the (rat) hippocampus (for forward sweeps), ventral striatum, and orbitofrontal cortex (for different aspects of value learning).
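The confidence-based arbitration described in the abstract can be illustrated with a short sketch. This is a minimal illustrative heuristic, not the authors' actual model: the `confidence` function, its threshold, and the way it combines the value gap, variance of alternatives, and volatility are assumptions for the purpose of the example.

```python
import math

def confidence(q_values, variances, volatility):
    """Heuristic confidence that cached (habitual) values suffice for choice.

    Assumption: confidence is high when the value gap between the two best
    actions is large relative to their combined uncertainty, and is
    discounted by environmental volatility (in [0, 1]).
    """
    ranked = sorted(range(len(q_values)), key=lambda a: q_values[a], reverse=True)
    best, runner_up = ranked[0], ranked[1]
    gap = q_values[best] - q_values[runner_up]
    noise = math.sqrt(variances[best] + variances[runner_up]) + 1e-9
    return (gap / noise) * (1.0 - volatility)

def choose(q_values, variances, volatility, lookahead, threshold=1.0):
    """Act habitually when confident; otherwise run a model-based lookahead
    ("mental exploration") and choose from the refined value estimates."""
    if confidence(q_values, variances, volatility) >= threshold:
        values = q_values            # exploit cached model-free values
    else:
        values = lookahead(q_values) # simulate outcomes to refine values
    return max(range(len(values)), key=lambda a: values[a])

# Clear value gap, low variance, stable environment -> habitual choice (action 0)
action = choose([2.0, 0.5], [0.05, 0.05], volatility=0.1, lookahead=lambda q: q)
```

With nearly equal values, high variance, or high volatility, confidence drops below threshold and the (hypothetical) `lookahead` function is consulted instead of the cached values.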
Istituto di Scienze e Tecnologie della Cognizione - ISTC
decision-making
Pavlovian learning
reinforcement learning

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/140797