Everyday life often requires arbitrating between pursuing an ongoing action plan, possibly adjusting it along the way, and exploring a new action plan instead. Resolving this so-called exploitation-exploration dilemma involves the medial prefrontal cortex (mPFC). Using human intracranial electrophysiological recordings, we discovered that neural activity in the ventral mPFC infers and tracks the reliability of the ongoing plan to proactively encode upcoming action outcomes as either learning signals or potential triggers to explore new plans. By contrast, the dorsal mPFC exhibits neural responses to action outcomes, which result in either improving or abandoning the ongoing plan. Thus, the mPFC resolves the exploitation-exploration dilemma through a two-stage, predictive coding process: a proactive ventromedial stage that constructs the functional signification of upcoming action outcomes and a reactive dorsomedial stage that guides behavior in response to action outcomes.
Domenech P, Rheims S, Koechlin E. Science. 2020 Aug 28