Seek reward, avoid punishment – it’s a simple, well-established concept of learning long known to humans. But new research puts a spin on that concept: rewards can become less appealing and punishments more enduring if the learning took place under even subtle degrees of cognitive conflict.

Researchers at the University of New Mexico and Brown University tested the concept in a new study by adding more conflict to some trials while holding rewards and punishments constant. People showed subtle biases to prefer the trials that they had previously learned without conflict.

The study, titled “Conflict acts as an implicit cost in reinforcement learning” and published recently in Nature Communications, was led by UNM Department of Psychology Assistant Professor James Cavanagh and Brown University Associate Professor Michael Frank.

The study examines the frontal cortex and the striatum, two key areas of the brain involved in reinforcement learning. The frontal cortex receives input from sensory regions and is important in behavioral responses to stimuli, both external and internal. The striatum, which is located in the cerebrum and striped with layers of gray and white matter, plays a pivotal role in learning and motor control.

The relationship between conflict and reinforcement learning suggests that the circuits in the frontal cortex that calculate the degree of conflict, effort and difficulty of actions are integrated with dopamine-driven circuits in the striatum that reinforce the perceptions of reward and punishment.

“This study had a rather simple premise,” said Cavanagh. “If we cause a really minor level of response conflict and give you rewards right after, will your brain learn to down-value the rewards that occurred following conflict as opposed to following more benign circumstances?”

Cavanagh capitalized on two specific areas of psychological science: performance monitoring and simple learning. Performance monitoring involves making an error and adjusting afterward, while simple learning involves seeking reward and avoiding punishment.

“In psychology, these are looked at as two different fields,” Cavanagh said. “One of them occurs when you know how to do things, and the other when you have to learn how to do things. Of course reality isn’t so bifurcated in structure. Reality is more nuanced.”

Cavanagh and Frank added a twist to the research by introducing a touch of conflict to make learning a task more difficult. The researchers used a specific spatial conflict task called a Simon task, in which responses tend to be faster and more accurate when the stimulus appears on the same side of the screen as the response hand.

In the training phase (panels A and B), four different stimuli were associated with different reinforcement probabilities. In a subsequent testing phase (panels C and D), participants had to choose the ‘most rewarding’ stimulus in two-alternative forced-choice scenarios. Panel E shows the hypothesized effects of the cost of conflict during training on action selection during the testing phase.

The conflict introduced during learning involved using the left hand to respond to a stimulus on the right side of the screen, and vice versa. This simple case of spatial conflict is well established in cognitive psychology.

“You present the stimuli to either the left or right visual fields and this creates spatial conflict,” Cavanagh said. “If you put stimuli on the left visual field you want to press the left button. It’s just a basic human tendency. But if the rule is ‘see blue, press right’ and the stimulus is in the left visual field, you have to overcome that innate tendency. That effort is known as cognitive conflict – it’s slightly difficult but it’s very, very rapid.”

The research included four shapes (A-D) that could appear on either side of the screen. Each shape had a different probability of providing a one-point reward when the correct button was pressed. Participants were asked to press the left button on a game pad every time a blue symbol appeared, and the right button every time a yellow symbol appeared.

A was always rewarded and D was seldom rewarded. B and C were each rewarded 50 percent of the time, but B never provided a point when it appeared on the side opposite the button, and C was rewarded only when it appeared on the side opposite the button. In this way, punishment (no points) for B became associated with the opposite-side conflict, as did C’s reward (one point).
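The reward rules described above can be written out as a short sketch. This is an illustration only, not the researchers' task code: the function names are hypothetical, D's reward probability is a placeholder (the article says only that D was "seldom" rewarded), and conflict and no-conflict trials are assumed to be equally likely.

```python
import random

# Hypothetical placeholder: the article only says D was "seldom" rewarded.
D_REWARD_PROB = 0.1


def reward(shape, congruent, rng=random):
    """Return the points (0 or 1) earned for a correct response to `shape`.

    `congruent` is True when the shape appeared on the same side as the
    button that had to be pressed, i.e. there was no spatial conflict.
    """
    if shape == "A":
        return 1                      # always rewarded
    if shape == "B":
        return 1 if congruent else 0  # rewarded only without conflict
    if shape == "C":
        return 0 if congruent else 1  # rewarded only with conflict
    if shape == "D":
        return 1 if rng.random() < D_REWARD_PROB else 0
    raise ValueError(f"unknown shape: {shape!r}")


def mean_reward(shape, n_trials=10_000, seed=0):
    """Average reward per trial, assuming conflict occurs on half of trials."""
    rng = random.Random(seed)
    total = 0
    for _ in range(n_trials):
        congruent = rng.random() < 0.5
        total += reward(shape, congruent, rng)
    return total / n_trials
```

Under these assumptions, `mean_reward("B")` and `mean_reward("C")` both come out close to one half, which is the point of the design: B and C pay equally often and differ only in whether the payoff follows conflict.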

“What we had done after people correctly identified this basic, simple performance rule, we rewarded different stimuli,” Cavanagh said. “Some of them were always rewarded, some of them were never rewarded, some of them were only rewarded when the response was easy and some were only rewarded when the response was hard.”

After the conflict-infused learning phase, test subjects moved on to a second phase of the experiment, where they were shown pairs of the previously observed shapes and had to indicate which one they thought was more rewarding. Everyone learned that A was more rewarding and D was not, but learned perceptions of B and C were skewed in one of two ways for each participant.

The twist biased learning in one of two ways: conflict either reduced the influence of reward or increased the influence of aversion to punishment. For those who learn better from reward, conflict diminished the value of the reward that followed it, which devalued C relative to B. For those who learn better from avoiding punishment, conflict acted to enhance experienced punishment value, leading to a greater avoidance of B. Essentially, the latter effect was like adding insult to injury: conflict made gaining no points even more aversive.
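One way to picture the two biases is a toy value-learning sketch. It is not the computational model used in the paper: the delta-rule update, the coding of no points as -1, and the parameters `reward_discount`, `punish_boost` and `alpha` are all assumptions made for illustration.

```python
def effective_outcome(outcome, conflict, reward_discount, punish_boost):
    """Outcome as it is 'felt' by the learner (illustrative assumption).

    After conflict, a reward is scaled down by `reward_discount` and a
    punishment is scaled up by `punish_boost`. Without conflict the
    outcome is unchanged.
    """
    if not conflict:
        return outcome
    if outcome > 0:
        return outcome * (1.0 - reward_discount)
    return outcome * (1.0 + punish_boost)


def learn_values(reward_discount, punish_boost, alpha=0.05, n_trials=400):
    """Learn values for shapes B and C with a simple delta rule.

    Outcomes are coded +1 for a point and -1 for no point (an assumption).
    Conflict and no-conflict trials alternate so the result is deterministic.
    """
    values = {"B": 0.0, "C": 0.0}
    for trial in range(n_trials):
        conflict = trial % 2 == 1
        outcomes = {
            "B": -1.0 if conflict else 1.0,  # B pays only without conflict
            "C": 1.0 if conflict else -1.0,  # C pays only with conflict
        }
        for shape, outcome in outcomes.items():
            felt = effective_outcome(outcome, conflict, reward_discount, punish_boost)
            values[shape] += alpha * (felt - values[shape])
    return values


# A learner who discounts rewards after conflict ends up valuing B over C;
# a learner who amplifies punishment after conflict ends up valuing C over B.
reward_sensitive = learn_values(reward_discount=0.4, punish_boost=0.0)
punishment_sensitive = learn_values(reward_discount=0.0, punish_boost=0.4)
```

In this sketch B and C are paid equally often, so any difference in their learned values comes entirely from how conflict changes the felt outcome, which is the pattern the article describes.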

Behavioral observation was only one of the methods used. The researchers examined the results with several others, including EEG recordings, genetic tests, manipulation with a low dose of a dopamine-related drug and even tracking of eye blinks.

Training phase EEG (FCz electrode) to conflict and feedback, demonstrating a common theta band burst to conflict and punishment.

The EEG sensors monitored the mid-cingulate cortex, the location in the brain that determines effort, difficulty and conflict. The sensors measured the strength of the theta frequency brainwaves while people carried out the different phases of the task.

“The degree to which conflict reduced reward-related theta activity of C compared with B was related to preferences for B, and the degree to which conflict enhanced punishment-related theta activity of B compared with C was related to avoidance of B,” the authors wrote in the paper. “These findings suggest that conflict acted to both diminish reward value and to boost punishment avoidance within cortical systems associated with interpreting the salience of feedback.”

The researchers also looked at how dopamine is processed in downstream areas, in particular through a gene called DARPP-32. Research has indicated that people with some variants of the gene are more sensitive to reward learning, while people with other variants are more sensitive to punishment avoidance learning. The gene affects dopamine function in neurons sensitive to rewards and punishment in the striatum.

In a third test, the ultimate test of presumed psychological and neuropsychological function, volunteers were given a pharmacological challenge – a drug called cabergoline, which is a specific type of dopamine agonist. An extremely low dose was given so participants couldn’t tell whether they were on it or not. The drug enhanced the effect, increasing the avoidance of conflict.

“We were able to track individual differences on how much the drug affected their dopaminergic system by looking at spontaneous eye blink rate,” Cavanagh said. “This is a really well-known yet still slightly controversial way of noninvasively estimating central dopaminergic tone – the more dopamine you have, the more you blink.”

The researchers combined correlative techniques (neuroimaging, genetics and non-invasive psychophysiology) with a causal one (pharmacological challenge) to support the general hypothesis that some brain systems serve a domain-general function in both performance and learning.

“We’re hoping these findings help merge together what are otherwise separate fields in psychology literature with this common neurobiological basis,” Cavanagh said. “Hopefully we can create a better understanding of how the brain underlies conflict psychological states or how a smaller number of brain systems underlie complex and disparate psychological states.”