Understanding Operant Conditioning: Reinforcement Explained

School: Red Deer College
Course: PSYCHOLOGY 282
Subject: Psychology
Date: Dec 12, 2024
Pages: 31
Uploaded by LieutenantComputer12693
Topic 5: Operant (Instrumental) Conditioning: Reinforcement

Operant Conditioning Basics

Operant Conditioning
- Operant (Instrumental) Conditioning: learning that is controlled by the consequences of the organism’s behaviour

Reinforcement
- Reinforcement: process in which a behaviour is strengthened by the immediate consequence that reliably follows its occurrence
  o Strengthened = more likely to occur again in the future
- Thorndike’s Law of Effect
- Skinner’s operant boxes

E. L. Thorndike’s Law of Effect
- “If a response, in the presence of a stimulus, is followed by a satisfying state of affairs, the bond between stimulus and response will be strengthened.”
  o Satisfaction = stamping in
  o Discomfort = stamping out

Early Investigations: E. L. Thorndike
- Law of Effect:
  o If a response in the presence of a stimulus is followed by a satisfying event, the association between stimulus (S) and response (R) is strengthened
  o If a response is followed by an annoying event, the association is weakened

Reinforcement Contingencies

Defining Reinforcement
[Figure: The Three-Term Contingency]

Is the Behaviour Strengthened?
- Do we observe:
  o Increase in frequency
  o Increase in duration
  o Increase in intensity
  o Increase in speed (decrease in latency)

What is Operant Behaviour?
- Operant (Behaviour): A behaviour that is strengthened through the process of reinforcement
  o AKA operant response, instrumental behaviour, etc.
- Acts on the environment to produce a consequence
- If the consequence (stimulus or event) strengthens the operant behaviour, it’s a reinforcer
- Operant Learning: A change in a behaviour as a function of the consequences that followed it

Defining Reinforcement
Effect of Consequences

Two Types of Reinforcement

Important Definitions
- Reinforcement: The procedure of providing consequences for a behaviour that increase or maintain the probability of that behaviour occurring in the future.
- Reinforcer: Any event or stimulus that follows an operant response and increases or maintains its future probability.
- Positive Reinforcement: Any event or stimulus that, when presented as a consequence of a behaviour, increases or maintains the future probability of that behaviour.
- Negative Reinforcement: Any event or stimulus that, when removed as a consequence of a behaviour, increases or maintains the future probability of that behaviour.

Escape and Avoidance
- Escape Behaviour:
  o When operant behaviour increases by removing an ongoing event or stimulus
    e.g., Pressing a lever to stop an electric shock
- Avoidance Behaviour:
  o When operant behaviour increases by preventing the onset of the event or stimulus
    e.g., Pressing a lever to prevent an electric shock

Things to Keep in Mind
- Reinforcement is NOT a theory
- Reinforcement IS a functional description
- Reinforcement is NOT circular
  o Incorrect Usage: “The consequence (e.g., food) increased the probability of the response (e.g., lever pressing) because it was reinforcing.”
  o Correct Usage: “The consequence (e.g., food) functioned as a reinforcer for the response (e.g., lever pressing).”
  o Correct Usage: “The consequence (e.g., food) reinforced the response (e.g., lever pressing).”
- The explanatory power of “reinforcement” comes from discovering:
  o the stimuli that will function as reinforcers.
  o the conditions that allow a stimulus to have a reinforcing function.
- “Increase the probability of” is often shortened to “strengthened”

How Do We Look at Operant Behaviour?

Pause for Methodologies

Examining Operant Behaviour
- Discrete Trial Procedure
  o Instrumental response produced once per trial
  o Each training trial ends with removal of the animal from the apparatus

[Figure: Discrete Trial Procedures]
Examining Operant Behaviour
- Free-Operant Procedure
  o Animals remain in the apparatus and can make many responses
  o No intervention by the experimenter
- Developed by B. F. Skinner

[Figure: Free-Operant in Rats]
[Figure: Free-Operant in Pigeons]
[Figure: Free-Operant in Chickadees]

Cumulative Record
- Based on the old cumulative recorder device (1957)
  o Constant paper output, pen jumps with each response
- Plot of cumulative responses (y-axis) over time (x-axis)

“Frequency” vs. “Cumulative Frequency”

Time   Frequency   Cumulative Frequency
1      0           0
2      0           0
3      1           1
4      1           2
5      2           4
6      0           4
7      1           5
8      2           7
9      3           10
10     1           11
11     1           12
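The relationship between the two columns of the frequency table can be checked with a short sketch. This is only an illustration: the list name `frequency` and the print layout are assumptions, and the values are the Frequency column from the table in these notes.

```python
from itertools import accumulate

# Responses counted in each time bin (the "Frequency" column of the table).
frequency = [0, 0, 1, 1, 2, 0, 1, 2, 3, 1, 1]

# Running total of responses: the height of the cumulative record at each bin.
cumulative = list(accumulate(frequency))

# Differences between successive cumulative values give back the frequencies,
# which is why a steeper cumulative record means a higher response rate.
recovered = [after - before for before, after in zip([0] + cumulative, cumulative)]

print("Time  Frequency  Cumulative")
for time_bin, (freq, cum) in enumerate(zip(frequency, cumulative), start=1):
    print(f"{time_bin:>4}  {freq:>9}  {cum:>10}")
```

A flat stretch in the cumulative column (time 5 to 6 here) corresponds to a bin with no responses.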
Qualities of the Reinforcer and Reinforcement Process

Two Types of Reinforcer
1. Unconditional (Primary) Reinforcer:
   a. A reinforcer that acquired its properties as a function of the species’ evolutionary history.
   b. i.e., stimuli and events that have phylogenetic importance.
   c. e.g., food, sex, water, sleep, social interaction, escape from harmful stimuli (e.g., extreme heat), etc.
   d. Usually depends on some amount of deprivation.
   e. Often species specific!

Liberman et al. (1973)
- Differentially reinforced an incompatible set of behaviours
  o “Rational talk” and “irrational talk” are incompatible behaviours
  o Reinforced “rational talk”
  o Did not reinforce (i.e., extinguished) “irrational talk”

Two Types of Reinforcer
2. Conditional (Secondary) Reinforcer:
   a. Otherwise neutral stimuli or events that have acquired the ability to reinforce due to a contingent relationship with other, typically unconditional, reinforcers

Variables Affecting Reinforcement
- Immediacy
  o A stimulus is more effective as a reinforcer when it is delivered immediately after the behaviour.
- Specific Reinforcer Used
  o e.g., Chocolate > Sunflower seeds
- Task Characteristics
  o e.g., Reinforce a pigeon pecking for food vs. a hawk pecking for food
- Contingency
  o A stimulus is more effective as a reinforcer when it is delivered contingent on the behaviour.
- Contiguity
  o Nearness of events in time (temporal contiguity) or space (spatial contiguity)
  o High contiguity is often referred to as “pairing”
  o Less contiguity (i.e., longer delays) between the operant response and the reinforcer diminishes the effectiveness of the reinforcer
  o Well described by the “Hyperbolic Decay Function”

Variables Affecting Reinforcement
- Contingency
  o The degree of correlation between a behaviour and its consequence
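The notes name the “Hyperbolic Decay Function” under Contiguity but do not write it out. A minimal sketch follows, assuming the commonly cited form V = A / (1 + kD), where A is the reinforcer amount, D the delay, and k a free discounting parameter. The function name, the default k = 0.2, and the sample delays are illustrative assumptions, not values from the notes.

```python
def hyperbolic_value(amount: float, delay: float, k: float = 0.2) -> float:
    """Effective value of a reinforcer of size `amount` delivered after `delay`.

    Assumed form: V = A / (1 + k * D). `k` is a hypothetical discounting
    rate; larger k means value falls off faster with delay.
    """
    if delay < 0 or k < 0:
        raise ValueError("delay and k must be non-negative")
    return amount / (1.0 + k * delay)


# Value drops steeply over the first few seconds of delay, then flattens out.
delays = [0, 1, 5, 10, 30]
values = [hyperbolic_value(10.0, d) for d in delays]
for d, v in zip(delays, values):
    print(f"delay {d:>2} s -> value {v:.2f}")
```

With no delay the reinforcer keeps its full value, which matches the immediacy point in the notes.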
Reinforcer Characteristics
- Motivating Operations
  o Establishing operations make a stimulus more effective as a reinforcer at a particular time
    e.g., Deprivation
  o Abolishing operations make a stimulus less potent as a reinforcer at a particular time
    e.g., Satiation
- Reinforcer Magnitude
  o Generally, a more intense stimulus is a more effective reinforcer
  o Relation between size and effectiveness is NOT linear
  o Generally, the more you increase magnitude, the less benefit you get from the increase
  o Effectiveness of unconditional reinforcers tends to diminish quickly

Schedules of Reinforcement
- Schedule of Reinforcement:
  o A rule describing the delivery of reinforcement.
  o Different schedules produce unique schedule effects
- Schedule Effect: Particular pattern and rate of behaviour over time.
  o Over the long term, effects are very predictable
  o Occur in numerous species (humans included)
- Continuous Reinforcement (CRF) Schedule
  o Behaviour is reinforced each time it occurs
  o Rate of behaviour increases rapidly
  o Useful when shaping a new behaviour
  o Rare in the natural environment!
- Intermittent Reinforcement Schedule
  o Many different types
  o Four (4) main types:
    Fixed-Ratio (FR)
    Variable-Ratio (VR)
    Fixed-Interval (FI)
    Variable-Interval (VI)

Fixed-Ratio Schedule (FR)
- Behaviour is reinforced after it has occurred a fixed number of times
  o e.g., FR-120
- Generates a Post-Reinforcement Pause (PRP)
  o Pausing typically increases with ratio size and reinforcer magnitude
- Generates steady run rates following the PRP

Variable-Ratio Schedule (VR)
- The number of responses needed varies each time
- Ratio requirement varies around an average
  o e.g., VR-360
    Ratios: 1, 10, 20, 30, 60, 100, 180, 240, 300, 360, 420, 480, 540, 600, 660, 690, 690, 720, and 739 responses
    Mean = 360 (Average Ratio)
    Shuffled Ordering: 20, 240, 720, 420, 480, 60, 10, 690, 30, 739, 360, 690, 300, 1, 660, 600, 540, 100, 180
- PRPs are rare and very short
  o Influenced by the lowest ratio and/or the average ratio
- Produces higher rates than a comparable Fixed-Ratio Schedule
- Common in natural environments
- Two Common Variations:
  o Random-Ratio
    Schedule is controlled by a random number generator.
    Produces similarly high rates of responding.
    Type of ratio used in casino games & video games!
  o Progressive-Ratio
    Ratio requirements move from small to large
    e.g., 1, 2, 3, 4, 5, 6, 7, 8…
    PRPs increase with ratio size
    Creates a “break-point” measure of how hard an organism will work
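The VR-360 example above can be verified with a few lines: the 19 listed ratios sum to 6840, so their mean is exactly 360. The sketch also shows one way a shuffled ordering and a progressive-ratio sequence might be generated. The function names and the fixed seed are assumptions for illustration only.

```python
import random
from typing import List

# Ratio requirements listed for the VR-360 example in the notes.
ratios = [1, 10, 20, 30, 60, 100, 180, 240, 300, 360,
          420, 480, 540, 600, 660, 690, 690, 720, 739]

# The "360" in VR-360 is the mean of the requirements.
average_ratio = sum(ratios) / len(ratios)


def shuffled_order(requirements: List[int], seed: int = 0) -> List[int]:
    """Return the same requirements in a shuffled order (seed is arbitrary)."""
    rng = random.Random(seed)
    order = list(requirements)
    rng.shuffle(order)
    return order


def progressive_ratio(steps: int) -> List[int]:
    """Progressive-ratio requirements that grow by one each time: 1, 2, 3, ..."""
    return list(range(1, steps + 1))


print(average_ratio)
print(shuffled_order(ratios))
print(progressive_ratio(8))
```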
Fixed-Interval Schedule (FI)
- Behaviour is reinforced when it occurs after a given period of time has elapsed
  o e.g., FI-4 min
- Produces PRPs
- Responding increases gradually, producing a “scallop” shape
- Uncommon in the natural environment

Variable-Interval Schedule (VI)
- The time that must elapse before a response is reinforced varies each time
- Interval varies around an average
  o e.g., VI-3 min
    Intervals (in seconds): 300, 30, 280, 120, 360, 300, 0, 240, 220, 180, 10, 280, 100, 60
- PRPs are rare and short
- Steady rates of responding
  o Not as high as a VR
- Common in natural environments

[Figure: “Ideal” Examples of Different Schedules]
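The ratio and interval rules described for FR, VR, FI, and VI schedules can be sketched as two small classes. This is a simplified illustration, not a model of real responding: the class and method names are hypothetical, a one-item list stands in for a fixed schedule, and a longer list is cycled through for a variable schedule.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RatioSchedule:
    """Reinforce the response that completes the current ratio requirement."""
    requirements: List[int]   # one value for FR, several values for VR
    _index: int = 0           # which requirement is currently in force
    _count: int = 0           # responses made since the last reinforcer

    def respond(self) -> bool:
        self._count += 1
        if self._count >= self.requirements[self._index % len(self.requirements)]:
            self._count = 0
            self._index += 1
            return True
        return False


@dataclass
class IntervalSchedule:
    """Reinforce the first response made after the current interval elapses."""
    intervals: List[float]    # seconds; one value for FI, several for VI
    _index: int = 0
    _start: float = 0.0       # time of the last reinforcer (session starts at 0)

    def respond(self, now: float) -> bool:
        if now - self._start >= self.intervals[self._index % len(self.intervals)]:
            self._start = now
            self._index += 1
            return True
        return False


fr5 = RatioSchedule([5])                  # hypothetical FR-5
print([fr5.respond() for _ in range(10)])

fi4 = IntervalSchedule([240.0])           # FI-4 min, expressed in seconds
print([fi4.respond(t) for t in (100, 240, 300, 480)])
```

On a ratio schedule only the response count matters, while on an interval schedule responding before the interval elapses earns nothing, which is one way to see why interval schedules support lower rates.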
Variables Affecting Reinforcement
- Reinforcer Characteristics
  o Competing Contingencies
    e.g., Should I watch YouTube or study?

Premack Principle
- In nature, different behaviours have different probabilities of occurring
  o e.g., eating → high probability; lever pressing → low probability
- Premack Principle (L = low-probability behaviour, H = high-probability behaviour; “L → H” means access to H is made contingent on L):
  o L → H, reinforces L
  o H → L, does not reinforce H
- High-probability behaviour reinforces low-probability behaviour!
  o e.g., If a child prefers playing pinball to eating candy, you can reinforce eating candy by letting them play pinball each time they eat some candy.
- Problems:
  o Doesn’t nicely account for conditional reinforcement effects
  o Low-probability behaviour can reinforce high-probability behaviour when the organism has been deprived of the low-probability behaviour

Applications of Premack Principle
- Clinical patients
  o Find out what behaviour is reinforcing (high probability of occurring) for each individual
    e.g., Sitting still in individual patients with schizophrenia
    e.g., Stereotyped behaviours in children with autism

Differential Reinforcement
- An operant training procedure in which some behaviours are systematically reinforced and others are not
  o Reinforcement and extinction together

Types of Differential Reinforcement
- Differential Reinforcement of Low rate (DRL)
- Differential Reinforcement of High rate (DRH)
- Differential Reinforcement of Other behaviour (DRO)
- Differential Reinforcement of Alternative behaviour (DRA)
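The rate-based procedures in this list (DRL, DRH, DRO) differ mainly in their timing rule. A rough sketch follows, using the pigeon examples given for each procedure in this topic: 5 s spacing for DRL, 5 pecks in 10 s for DRH, 10 s without pecking for DRO. The function names are hypothetical, the session clock is assumed to start at 0, and DRH is assumed to use consecutive fixed windows.

```python
from typing import List


def drl_reinforced(peck_times: List[float], min_gap: float = 5.0) -> List[bool]:
    """Spaced-responding DRL: reinforce a peck only if at least `min_gap`
    seconds have passed since the previous peck. Every peck resets the clock."""
    outcomes = []
    last_peck = 0.0  # assumption: the clock starts at session onset
    for t in sorted(peck_times):
        outcomes.append(t - last_peck >= min_gap)
        last_peck = t
    return outcomes


def drh_windows(peck_times: List[float], session_length: float,
                window: float = 10.0, min_pecks: int = 5) -> List[bool]:
    """DRH: a window earns reinforcement only if it holds at least
    `min_pecks` pecks. Windows are consecutive and non-overlapping (assumed)."""
    n_windows = int(session_length // window)
    counts = [0] * n_windows
    for t in peck_times:
        i = int(t // window)
        if 0 <= i < n_windows:
            counts[i] += 1
    return [c >= min_pecks for c in counts]


def dro_reinforcer_times(peck_times: List[float], session_length: float,
                         interval: float = 10.0) -> List[float]:
    """Whole-interval DRO: deliver a reinforcer each time `interval` seconds
    pass with no pecking. Any peck resets the clock."""
    deliveries = []
    clock_start = 0.0
    remaining = sorted(t for t in peck_times if 0 <= t <= session_length)
    while clock_start + interval <= session_length:
        due = clock_start + interval
        if remaining and remaining[0] < due:
            clock_start = remaining.pop(0)   # a peck restarts the interval
        else:
            deliveries.append(due)           # interval completed without pecking
            clock_start = due
    return deliveries


print(drl_reinforced([2, 8, 10, 16]))
print(drh_windows([1, 2, 3, 4, 5, 12, 13], session_length=20.0))
print(dro_reinforcer_times([12.0], session_length=40.0))
```

Note the contrast: DRL still reinforces the target response (just at a low rate), whereas under DRO the target response never earns the reinforcer.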
Differential Reinforcement of Low rate (DRL)
- Behaviour is reinforced only if it occurs no more than a specified number of times in a given period.
- Results in low rates of responding
  o Example: Reinforce pigeon peck only after 5 sec have elapsed since the last peck
    Each peck resets the clock
    Pecking before the 5 seconds have elapsed does not provide food
- Longer intervals produce even lower rates
- Might produce superstitious behaviour
- Useful for reducing the rates of problem behaviours

Using DRL
1. Determine if DRL is an appropriate procedure.
   i.e., decrease behaviour, but not eliminate it!
2. Determine an acceptable level of the behaviour.
3. Decide to implement full-session or spaced-responding DRL.
4. Inform the client about the procedure & criterion for reinforcement.
5. Provide feedback on performance.

Differential Reinforcement of High rate (DRH)
- Behaviour is reinforced only if it occurs at least a specified number of times in a given period.
- Results in very high rates of responding
  o Example: Reinforce pigeon peck only when it has pecked at least 5 times within 10 seconds
    Fewer than 5 responses receive nothing and the clock resets
- Useful when the goal is to increase rates of behaviour!

Differential Reinforcement of Other behaviour (DRO)
- Reinforcement is contingent on complete absence of the behaviour for a period of time.
  o Example: Reinforce pigeon only after 10 sec have elapsed with no pecking.
    Reinforcement is only provided if the behaviour doesn’t occur.
    Pecking resets the clock.
- Not the same as DRL
  o DRL reduces rates of behaviour
  o DRO eliminates the behaviour
- Especially useful when extinction is not an option and reinforcers are intrinsic to the behaviour

Using DRO
1. Define the target behaviours to increase and decrease.
2. Choose a reinforcer for DRO.
3. Choose the initial DRO interval.
   The length of the interval should be tied to the baseline rate of problem behaviour
   o i.e., frequent problem behaviour = short DRO interval
   o infrequent problem behaviour = long DRO interval
4. Eliminate the reinforcer for the problem behaviour and deliver the reinforcer for the absence of the problem behaviour.
5. Reset the interval if the problem behaviour occurs.
6. Gradually increase the interval length.

Whole Interval vs. Momentary DRO
- Whole interval DRO: The problem behaviour must be absent for the whole interval for reinforcement (referred to simply as DRO)
- Momentary DRO: The problem behaviour must be absent at the end of the interval for reinforcement

Differential Reinforcement of Alternative behaviour (DRA)
- A desired (replacement) behaviour is reinforced while an undesired behaviour is extinguished

Using DRA
1. Define the target behaviours to increase and decrease.
2. Identify the reinforcer for the problem behaviour.
3. Choose a reinforcer for the desirable behaviour.
4. Reinforce desirable behaviour immediately and consistently.
   Prompt the behaviour if necessary.
   Prompt before undesirable behaviour occurs, not after.
   Prompt when important EO and SD are present.
   Desirable behaviour should require less effort than the undesirable behaviour.
5. Extinguish or devalue the reinforcer for the undesirable behaviour(s).
   Consider using DRL or DRO procedures if not possible.
6. Begin to incorporate intermittent schedules of reinforcement for the desirable behaviour.

- Easiest if the desirable behaviour occurs occasionally or can be prompted
- Can promote creativity
  o Differentially reinforce responses that have not recently been used
  o Creates response variability rather than repetition

Identifying Putative Reinforcers
1. Use the reinforcer maintaining the undesirable behaviour.
2. Observe activities that are enjoyable and occur with high probability.
   e.g., Playing video games, watching T.V.
3. Ask questions.
   What do they like? What do they enjoy doing?
4. Conduct a Preference Assessment

How to Choose Reinforcers
- Observe
- Ask
- Test
  o Single stimulus preference assessment
  o Paired stimulus preference assessment
  o Multiple stimulus preference assessment (MSWO; i.e., without replacement)

Preference Assessment
- Single Stimulus Assessment
  o Potential reinforcers are presented individually multiple times in random orderings
  o Percentage of approaches is calculated
- Paired Stimulus Assessment
  o Potential reinforcers are presented in pairs
  o Each stimulus is presented with every other stimulus multiple times
  o Percentage of approaches/selections is calculated
- Multiple Stimulus Assessment
  o Potential reinforcers are presented in a full array
  o Items are removed as they are chosen
  o Process is repeated with varied item orderings
  o Items chosen first are likely more reinforcing

But what if the problem behaviour is negatively reinforced?

Differential Negative Reinforcement of Alternative behaviour (DNRA)
- Extinction for negatively reinforced SIBs
  e.g., slamming desk and rocking back and forth when asked to complete school work
- Attention delivered every 15 sec without SIB
- Breaks from academic tasks delivered every 20 minutes for the absence of problem behaviours

Variations of DRA
- Differential Reinforcement of Incompatible Behaviour (DRI)
  o Behaviour that is incompatible with the unwanted behaviour is reinforced
  o Increasing the rate of the desired behaviour also decreases the rate of the undesired behaviour because the two cannot occur simultaneously
- Differential Reinforcement of Communication (DRC)
  o A communication response is reinforced to replace the problematic behaviour
  o The communication response delivers the reinforcer more rapidly than the problem behaviour
  o Also called “Functional Communication Training”

More on Differential Reinforcement of Communication (DRC)
- When a child’s problem behaviour is reinforced by attention…
  o taught to ask for attention
  o “How am I doing?” & teacher responds to this behaviour with attention
- When a child’s problem behaviour is reinforced by escape…
  o from a difficult academic task, taught to ask for assistance
  o “I don’t understand” & teacher responds by providing assistance

Summary
- DRL: When you want to decrease but not necessarily eliminate a target behaviour
- DRO: When you want to eliminate a problem behaviour
- DRA: When you want to increase the frequency of an existing desirable behaviour
- Remember that behaviours exist because they are reinforced
- Extinction/punishment is often insufficient without differential reinforcement training
- Focuses attention on the desirable behaviour
  o If you decrease a behaviour, a new behaviour will take its place
  o Extinction and punishment do not teach new behaviours, they only decrease current ones

How to Test Premack Principle
1. Establish baseline responding for different behaviours
2. Instrumental conditioning procedure with:
   L → H
   H → L
- Implications: Any high-probability response can serve as a reinforcer for a lower-probability response
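The two-step Premack test above can be sketched as a comparison of baseline probabilities. This is a bare-bones illustration: the function name and the baseline values are invented for the example, and the sketch deliberately ignores the deprivation exception noted under “Problems” earlier in these notes.

```python
from typing import Dict


def premack_predicts_reinforcement(baseline: Dict[str, float],
                                   instrumental: str,
                                   contingent: str) -> bool:
    """Return True if making `contingent` available only after `instrumental`
    is predicted to reinforce `instrumental`, i.e. the contingent behaviour
    has the higher baseline probability (the L -> H arrangement)."""
    return baseline[contingent] > baseline[instrumental]


# Step 1: hypothetical baseline probabilities for one child (made-up numbers).
baseline = {"playing pinball": 0.6, "eating candy": 0.2}

# Step 2: compare the two arrangements.
l_to_h = premack_predicts_reinforcement(baseline, "eating candy", "playing pinball")
h_to_l = premack_predicts_reinforcement(baseline, "playing pinball", "eating candy")
print(l_to_h, h_to_l)
```

Because the prediction depends only on each individual's baseline, the same activity can be a reinforcer for one person and not for another.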