Written by Su Yiting | Dopamine is an important neurotransmitter secreted by the brain.
As a "happiness hormone", it can not only transmit happy information, but also is considered by scientists to play an important role in rewarding learning and motivational behavior.
According to the reward prediction error (RPE) hypothesis proposed by Schultz in 1997 [1], the dopamine signal reflects the deviation between the expected reward and the actual reward during reinforcement learning (RL).
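For reference, in the standard temporal-difference formulation of reinforcement learning (the textbook form, not anything specific to this study), the RPE at time t is

    δ_t = r_t + γ·V(s_{t+1}) − V(s_t)

where r_t is the reward received, V(s) is the learned value estimate of state s, and γ is a discount factor. A positive δ_t (better than expected) strengthens the preceding behavior, a negative δ_t weakens it, and δ_t = 0 means the outcome was fully predicted.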
Because dopamine projections are distributed very widely across the cerebral cortex, and dopamine neurons fire with a high degree of synchrony, the system could in principle "broadcast" the RPE signal synchronously over large areas.
On the other hand, if the dopamine signal encodes only a single error value, how can it flexibly and accurately encode information during complex behaviors? Reinforcing every dopamine circuit with one global RPE signal would clearly be inefficient; effective reinforcement learning should instead allocate the RPE to regions and circuits in proportion to how much each is involved in the task.
Recently, the teams of Christopher I. Moore and Michael J. Frank at Brown University published a research paper in Cell, "Wave-like dopamine dynamics as a mechanism for spatiotemporal credit assignment". Combining mouse behavioral experiments, calcium imaging, and fluorescent dopamine-probe recordings with computer simulations, they proposed a spatiotemporal control mechanism by which dopamine encodes reward behavior. They discovered for the first time that dopamine moves in waves across the dorsal striatum, appearing in different regions and at different times depending on the behavioral task.
More importantly, this wave-like dopamine movement encodes reward-related information in a stepwise fashion, making reinforcement learning more efficient and accurate.
Using the calcium indicator GCaMP6f and the fluorescent dopamine sensor dLight, the researchers recorded, in a dark environment, the activity dynamics of dopamine-neuron axons and of dopamine release across the dorsal striatum.
They found that both the calcium and dopamine signals showed asynchronous activity across the striatum, distributed mainly over two major regions: the dorsomedial striatum (DMS) and the dorsolateral striatum (DLS).
These signals originate at a specific location (a "source"), migrate outward to surrounding areas at varying speeds, and finally flow into a "dopamine sink".
Interestingly, this wave-like flow is not random but is clearly biased along particular axes. The waves fall into three main "dopamine wave modules": center-out (CO), lateromedial (LM, from lateral to medial), and mediolateral (ML, from medial to lateral), each with its own distinctive dynamics.
Figure 1: Dopamine waves in the dorsal striatum traveling from the center to the periphery (A), from lateral to medial (B), and from medial to lateral (C).
To explore the biological function and computational significance of dopamine waves for complex behavior in mice, the researchers designed two operant behavioral tasks, varying the contingency between behavior and outcome to test whether dopamine carries information about the mouse's control over the outcome in a given context.
In the first, "instrumental", task, reward delivery depends on the mouse's running: by controlling its own locomotion the mouse drives an ascending auditory progress cue (the faster it runs, the faster the cue frequency rises) until the cue completes and the reward is delivered. In the second, "Pavlovian", task, the cue frequency likewise signals the time of reward, but this time reward timing does not depend on running; instead it is drawn from a uniform distribution.
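As a concrete illustration of how the two schedules differ (a minimal sketch with made-up parameters and formulas, not the actual task code), reward time in the instrumental task falls as running speed rises, while in the Pavlovian task it is drawn from a uniform distribution regardless of speed:

    import random

    def instrumental_reward_time(run_speed, effort_required=10.0):
        # Instrumental task: the cue advances with locomotion, so reward
        # arrives sooner the faster the mouse runs (illustrative formula).
        return effort_required / max(run_speed, 1e-6)

    def pavlovian_reward_time(t_min=2.0, t_max=14.0):
        # Pavlovian task: reward time is independent of running and is
        # drawn from a uniform distribution (bounds here are made up).
        return random.uniform(t_min, t_max)

    # Running faster pays off only in the instrumental task:
    for speed in (0.5, 1.0, 2.0):
        print(speed, instrumental_reward_time(speed), pavlovian_reward_time())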
They found that after reward in the instrumental task, the dorsal striatum produced ML dopamine waves traveling from medial (inside) to lateral (outside), whereas after reward in the Pavlovian task it produced LM waves traveling from lateral to medial.
When the session alternated between the two tasks, the mice ran significantly faster during the instrumental task (because faster progress means earlier reward) and gradually slowed after switching to the Pavlovian task (because reward timing no longer depends on running).
At the same time, the researchers also observed the alternating appearance of ML dopamine waves and LM dopamine waves in the dorsal striatum.
Notably, the direction of dopamine waves on past trials was highly correlated with the mouse's running speed on the next trial, and the more recent the past trial, the stronger the influence of its wave direction on the next trial's behavior.
These experiments demonstrate that the propagation dynamics of dopamine waves switch continually with task context and draw on past experience to guide decision-making.
Figure 2: Switching of mouse running speed (A) and of dopamine wave direction (B) between the two tasks, and the history dependence of behavior (C).
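One simple way to capture this recency effect (an illustration of the statistical pattern only, not the authors' analysis) is an exponentially weighted history, where each past trial's wave direction contributes with a weight that decays with how long ago it occurred:

    def history_influence(wave_directions, decay=0.6):
        # wave_directions: past trials, most recent last; +1 codes an ML
        # wave (instrumental-like), -1 an LM wave (Pavlovian-like).
        # decay: per-trial decay factor in (0, 1); the value is made up.
        # Returns a score in [-1, 1]; positive predicts faster running.
        score, weight_sum = 0.0, 0.0
        for lag, direction in enumerate(reversed(wave_directions)):
            weight = decay ** lag          # most recent trial weighs most
            score += weight * direction
            weight_sum += weight
        return score / weight_sum if weight_sum else 0.0

    print(history_influence([+1, +1, -1]))  # recent LM wave pulls the score down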
How, then, does the mouse's dopamine system infer which situation it is in, and whether its actions are the most efficient route to reward? Combining existing theories with their experimental findings, the researchers proposed a hierarchical, multi-agent mixture-of-experts (MoE) model: at the top, a first-level expert is responsible for the final judgment of whether the animal's actions control the outcome.
To make this judgment, the first-level expert must weigh the various behavior-outcome relationships in past experience, and so it relies on multiple second-level "sub-experts", which are distributed across different regions of the striatum and each represent a different situation. Downstream of these second-level experts sits a third-level RPE coding system (sub-expert RPEs, sRPEs): each time the cue progresses earlier or later than expected, the corresponding second-level expert's reward prediction error (sRPE) increases or decreases.
After the trial ends, reward credit is allocated to the most predictive expert (that is, the one with the lowest RPE), and the result is used to guide behavior on the next trial.
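As a rough illustration of this scheme (a minimal sketch under our own simplifying assumptions, not the authors' implementation), each sub-expert maintains a prediction of the reward time, accumulates an sRPE as cues arrive earlier or later than it expects, and at the end of the trial credit goes to the expert with the smallest error:

    class SubExpert:
        # One striatal sub-expert: predicts reward timing for one
        # hypothesized situation (parameters and learning rule are
        # illustrative, not taken from the paper).
        def __init__(self, name, predicted_time):
            self.name = name
            self.predicted_time = predicted_time
            self.srpe = 0.0

        def observe_cue(self, observed_time, expected_time):
            # Cue earlier than expected -> positive sRPE; later -> negative.
            self.srpe += expected_time - observed_time

    def assign_credit(experts, lr=0.3):
        # After the trial, credit the most predictive expert (lowest |sRPE|)
        # and nudge its prediction toward what it actually observed.
        winner = min(experts, key=lambda e: abs(e.srpe))
        winner.predicted_time -= lr * winner.srpe   # reduce future error
        return winner

    experts = [SubExpert("instrumental", 4.0), SubExpert("pavlovian", 8.0)]
    for frac in (0.25, 0.5, 0.75, 1.0):             # cue checkpoints; true reward at t = 5.0
        for e in experts:
            e.observe_cue(observed_time=frac * 5.0,
                          expected_time=frac * e.predicted_time)
    print(assign_credit(experts).name)              # -> "instrumental"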
To test whether the MoE model is actually realized in the brain, the researchers reasoned that dopamine dynamics could be read out as a measure of each region's predictive ability.
First, in the instrumental task, the dopamine signal in the DMS rose in proportion to task progress. The researchers further reasoned that if this dopamine ramp reflects the accuracy of reward prediction, then the region whose dopamine rises fastest during the prediction phase (the steepest slope) should be the first to receive dopamine reward credit. As expected, they found an inverse relationship between the slope of the dopamine ramp and the timing of dopamine release, supporting the role of the first-level expert in the MoE model.
Second, the researchers found that dopamine slopes differed across subregions of the dorsal striatum; like the second-level experts in the model, each subregion predicts a different situation.
Finally, the third-level sRPE coding system should produce a reward prediction error for each cue in the current situation, and the magnitude of that error should be reflected in changes in dopamine. Indeed, the researchers found that each segment of dopamine axon responded to a specific change in the sound cue, with different axonal regions encoding different portions of the cue sequence until the entire frequency range was covered. Moreover, on short trials (when the sRPE is larger), they detected larger deviations in dopamine, further supporting the third-level sRPE coding scheme.
Figure 3: The theoretical structure of the MoE model (A) and the correspondence between striatal dopamine dynamics and the MoE model (B).
In short, this study challenges the earlier hypothesis that dopamine encodes a single RPE and proposes a new spatiotemporal control mechanism for reward behavior in mice. Arif A. Hamid is the first author and lead corresponding author of the article.
References:
1. Schultz, Wolfram, Peter Dayan, and P. Read Montague. "A neural substrate of prediction and reward." Science 275.5306 (1997): 1593-1599. doi: 10.1126/science.275.5306.1593
Reprinting notice [Original article]: BioArt original articles are welcome to be shared; reprinting without permission is prohibited. The copyrights of all works belong to BioArt, which reserves all statutory rights; infringement will be pursued.
As a "happiness hormone", it can not only transmit happy information, but also is considered by scientists to play an important role in rewarding learning and motivational behavior.
According to the reward prediction error (RPE) hypothesis proposed by Schultz in 1997 [1], the dopamine signal reflects the deviation between the expected reward and the actual reward in reinforcement learning (RL).
Prediction Error, RPE).
Since the neural output of dopamine is very widely distributed in the cerebral cortex, the synchronization of the firing of dopamine neurons is also very high.
In theory, this also provides the possibility for the "synchronous large-area broadcasting" of RPE.
But on the other hand, if the dopamine signal only encodes an error value, how can it flexibly encode information accurately in complex behaviors? If there is only a whole RPE encoding signal to strengthen all dopamine loops, this is obviously a very inefficient way.
Effective reinforcement learning should allocate RPE to these areas and loops in proportion to the degree of relevance of different areas to participate in the task.
Recently, the teams of Christopher I.
Moore and Michael J.
Frank from Brown University published a research paper Wave-like dopamine dynamics as a mechanism for spatiotemporal credit assignment in the Cell journal, through mouse behavior experiments, calcium signal imaging, and dopamine fluorescence.
Probe detection, combined with computer simulation and other methods, proposed a time-conditioning control mechanism for dopamine-encoded reward behavior.They discovered for the first time that dopamine exhibits a wave-like movement in the dorsal striatum, and it appears in different areas and times according to the difference in behavioral tasks.
More importantly, this wave-like dopamine movement encodes stepwise.
Information related to rewards, so that reinforcement learning is more efficient and accurate.
The researchers used the calcium indicator GCaMP6f and the dopamine fluorescent sensor dLight to record the dynamic changes of the dopamine neuron axons in the dopamine neuron axons and the dynamic changes of the dopamine neuron in the dorsal striatum in the dark environment.
They found that both the calcium signal and the dopamine signal showed asynchronous (asynchronous) activities in the striatum region, and these activities were mainly distributed in the dorsal medial (dorsomedial, DMS) and dorsolateral ( dorsolateral, DLS) in two major regions.
These signals originate in a specific place (source), then migrate to the surrounding area at various speeds, and finally flow into the "dopamine sink".
Interestingly, this wave-like flow does not occur randomly, but is more obviously biased in the direction of a specific axis: they are divided into three main "dopamine wave motion modules", which are from the center to the periphery (center-out).
, CO), from the lateral to the medial (lateromedial, LM), and from the medial to the lateral (mediolateral, ML), and each has its own unique dynamic characteristics.
Figure 1: Medial to peripheral dopamine waves in the dorsal striatum (A), lateral to medial dopamine waves (B), medial to lateral dopamine waves (C) In order to explore the biological functions of dopamine waves on the complex behaviors of mice and the underlying factors For computational significance, the researchers designed two operational behavior tasks to explore whether dopamine contains the manipulation information of mouse behavior in a specific context by changing the contextual relationship between behavior and outcome.
In the first “instrumental task”, the reward process depends on the running process of the mouse.
The mouse can control the increasing sound progress bar by controlling its own exercise process (that is, the faster the running speed, The frequency of the prompt sound will increase as quickly as possible) until the end of the reward is issued. In the second "Paplovian" task, the frequency of the prompt sound also reflects the reward time, but this time the reward time for the mouse does not depend on the running process, but obeys an even distribution.
They found that after the reward of the “instrumental task” is over, the dorsal striatum will produce ML dopamine waves from the inside to the outside, and after the reward of the “Paplovian task” is over, the striatum will be outward.
Dopamine waves within the LM.
If the behavior module is designed to alternate between the two tasks, it is found that the running speed of the mice in the "instrumental task" increases significantly (because the faster the progress, the earlier the reward), and the switch to the "Paplovian task" At the same time, the running speed gradually decreased (because the reward time does not depend on the running process).
At the same time, the researchers also observed the alternating appearance of ML dopamine waves and LM dopamine waves in the dorsal striatum.
It is worth noting that the direction of the dopamine waves in the past test is highly correlated with the running speed of the mice in the next test, and the closer the past test is to the current time, the greater the influence of the direction of the dopamine wave on the behavior of the next test.
.
These experiments prove that the transmission dynamics of dopamine waves in mice will constantly switch in different task situations and rely on past experience to guide decision-making.
Figure 2: Mouse running speed switching (A), dopamine wave direction switching (B) and historical correlation of behavior (C) in the two tasks.
Then, how does the dopamine system of mice infer which situation they are in? Can action be the most efficient way to get rewards? The researchers combined the existing theories and experimental conclusions in the past, and proposed a hierarchical multi-agent mixture of experts (MoE): the first-level experts at the highest level are responsible for ultimately judging whether they can control the results of the actions.
In order to achieve the goal, the expert needs to consider various behavior-result relationships in past experience.
Therefore, it needs the help of multiple "sub-experts", which are distributed in the striatum.
Each area, and represents different situations. The downstream of the second-level expert has a third-level RPE coding system (sub-expert RPEs, sRPEs), that is, every time the prompt progresses earlier or later than expected, it can increase or reduce the reward prediction error of the second-level expert ( sRPE).
After the trial is over, the reward credit will be allocated to the expert with the most predictability (that is, the lowest RPE), and the result will be used to guide the behavior of the next trial.
In order to verify the actual manifestation of the MoE model in the brain, the researchers believe that the dynamics of dopamine can be used to reflect the predictive ability of the region.
First of all, in the "instrumental task", the dopamine signal in the DMS area increases proportionally with the progress of the task.
They further speculate that if the increase in dopamine affects the accuracy of reward prediction, then the increase in dopamine is the highest in the prediction phase (the steepest slope) ) Should be the first to receive dopamine reward credits.
As expected, the researchers discovered the inverse relationship between dopamine slope and dopamine release time, proving the role of the first-level expert in the MoE model.
Second, the researchers found that the dopamine slopes in different subregions of the dorsal striatum are also different, just like the second-level experts in the model, they each predict different situations.
Finally, the third-level sRPE coding system will produce a corresponding reward prediction error for each prompt in the current situation.
The magnitude of the error should be reflected in the change of dopamine.
Sure enough, they found that each segment of the dopamine axon Will respond to a specific sound change, and different axon regions correspondingly encode the change process of different sounds until the frequency of the entire sound is covered.
And in the short test (when the sRPE is larger), the researchers also detected a greater deviation of dopamine changes, which also proved the possibility of the third-level sRPE coding system.
Figure 3: The theoretical structure of the MoE model (A) and the biological response of the striatal dopamine dynamics to the MoE model (B).
In short, this study challenges the previous hypothesis that dopamine encodes RPE alone and proposes a brand new reward behavior for mice Time air conditioning control mechanism, Arif A.
Hamid is the first author and main corresponding author of this article. Original link: Platemaker: Eleven References 1.
Schultz, Wolfram, Peter Dayan, and P.
Read Montague.
"A neural substrate of prediction and reward.
" Science 275.
5306 (1997): 1593-1599.
doi: 10.
1126/science.
275.
5306.
1593 Reprinting instructions [Original article] BioArt original articles are welcome to share and reprint without permission is prohibited.
The copyrights of all works are owned by BioArt.
BioArt reserves all statutory rights and offenders must be investigated.