Beyond Stimulus Cues and Reinforcement Signals: A New Approach to Animal Metacognition

J Comp Psychol. Author manuscript; available in PMC 2011 Nov 1.
Published in final edited form as: J Comp Psychol. 2010 Nov; 124(4): 356–368. doi: 10.1037/a0020129
PMCID: PMC2991470 NIHMSID: NIHMS205291

Justin J. Couchman, Mariana V. C. Coutinho, Michael J. Beran, and J. David Smith

Justin J. Couchman, Department of Psychology, University at Buffalo, the State University of New York. Mariana V. C. Coutinho, Department of Psychology, University at Buffalo, the State University of New York. Michael J. Beran, Language Research Center, Georgia State University. J. David Smith, Department of Psychology and Center for Cognitive Science, University at Buffalo, the State University of New York.

Correspondence concerning this article should be addressed to Justin J. Couchman, Department of Psychology, Park Hall, SUNY Buffalo, Buffalo, NY 14260, or by [email protected].

The publisher's final edited version of this article is available at J Comp Psychol.


Some metacognition paradigms for nonhuman animals encourage the alternative explanation that animals avoid difficult trials based only on reinforcement history and stimulus aversion. To explore this possibility, we placed humans and monkeys in successive uncertainty-monitoring tasks that were qualitatively different, eliminating many associative cues that might support transfer across tasks. In addition, task transfer occurred under conditions of deferred and rearranged feedback—both species completed blocks of trials followed by summary feedback. This ensured that animals received no trial-by-trial reinforcement. Despite distancing performance from associative cues, humans and monkeys still made adaptive uncertainty responses by declining the most difficult trials. These findings suggest that monkeys’ uncertainty responses could represent a higher-level, decisional process of cognitive monitoring, though that process need not involve full self-awareness or consciousness. The dissociation of performance from reinforcement has theoretical implications concerning the status of reinforcement as the critical binding force in animal learning.

Keywords: metacognition, uncertainty monitoring, primate cognition, comparative psychology, monkeys

Humans can discern when they are certain or uncertain. They know when they know and do not know, and they can use this knowledge to avoid difficult situations or seek out additional information. Extensive research on uncertainty monitoring and metacognition has explored these phenomena (Benjamin et al., 1998; Brown et al., 1983; Flavell, 1979; Hart, 1965; Koriat, 1993, 2007; Koriat et al., 2006; Metcalfe, 2000; Metcalfe & Shimamura, 1994; Nelson, 1992; Schwartz, 1994; Serra & Dunlosky, 2005).

Researchers take humans’ metacognitive behaviors to indicate important mental capacities, including hierarchical layers of cognitive control (Nelson & Narens, 1990), self-awareness (Gallup, 1982), and declarative consciousness (Nelson, 1996). In fact, metacognition may be such a sophisticated cognitive capacity in humans that it is unique to humans (Metcalfe & Kober, 2005). Thus, it is an important empirical question with wide-ranging theoretical implications whether nonhuman animals have similar cognitive capacities and what these capacities might say about the nonhuman mind (Kornell, 2009; Smith, Shields, & Washburn, 2003; Terrace & Metcalfe, 2005).

Recent research has suggested that nonhuman animals (hereafter, animals) also have a capacity for metacognition or cognitive monitoring (Beran, Smith, Redford, & Washburn, 2006; Call & Carpenter, 2001; Foote & Crystal, 2007; Hampton, 2001; Inman & Shettleworth, 1999; Kornell, Son, & Terrace, 2007; Shields, Smith, & Washburn, 1997; Shields, Smith, Guttmannova, & Washburn, 2005; Smith, Beran, Redford, & Washburn, 2006; Smith, Schull, et al., 1995; Smith, Shields, Schull, & Washburn, 1997; Smith, Shields, Allendoerfer, & Washburn, 1998; Suda-King, 2008; Sutton & Shettleworth, 2008; Washburn, Smith, & Shields, 2006). In these studies, researchers used perception and memory tasks that presented a mix of easy and difficult trials. They gave animals primary discrimination responses (e.g., Sparse-Dense; Familiar-Unfamiliar) but also a secondary response that allowed them to decline any trials they chose. Animals with the capacity to monitor their cognitive states should recognize difficult trials as problematic and decline these trials selectively. Animals have produced data patterns in some uncertainty-monitoring tasks that are strikingly like those of humans (Shields et al., 1997; Smith et al., 1997; Smith et al., 1998). This secondary trial-decline response has come to be called the uncertainty response.

Comparative researchers naturally proceed cautiously in attributing metacognitive capacities to animals. Indeed, comparative psychology’s tradition of parsimony, as exemplified by Morgan’s Canon (1906, p. 53), demands a search for an explanation of the animal data patterns that relies on the lowest-level psychological capabilities possible. This is why the appropriate psychological interpretation of animals’ uncertainty responses has been a source of ongoing theoretical discussion.

As part of this discussion, Smith, Beran, Couchman, and Coutinho (2008) explored possible associative explanations for uncertainty responses. One persistent methodological concern is that researchers have rewarded animal participants directly for their uncertainty responses (e.g., Foote & Crystal, 2007; Inman & Shettleworth, 1999; Kornell et al., 2007; Hampton, 2001; Suda-King, 2008; Sutton & Shettleworth, 2008). This approach has the problem that it might give the uncertainty response a positive response strength independent of any metacognitive role it plays in a task. It might be used because its reward properties are attractive, producing the observed data patterns absent any metacognitive assessment by the animal. If so, a first-order associative account of those data patterns would be more parsimonious than a metacognitive account. One purpose of the present research was to evaluate animals’ uncertainty responses when no direct food rewards were ever offered for those responses. Of course, eliminating the primary reinforcers attending the uncertainty response does not address all possible associative explanations of it. Accordingly, this article also considers carefully more indirect and secondary associative explanations.

In fact, another potential problem with uncertainty paradigms is that researchers generally make reinforcement transparent by giving feedback on every trial. As a result, every consequence can be immediately and directly associated to the stimulus-response pairing that produced the negative or positive outcome. Difficult stimuli/trials—seldom rewarded and frequently punished—could come to be aversive for animals and they could be conditioned not to make primary discrimination responses in those trial contexts. The uncertainty response could then become the default avoidance response to aversive stimuli, instead of a metacognitive report. This potential problem was raised from a formal-modeling perspective by Smith et al. (2008) and Staddon, Jozefowiez, and Cerutti (2007), and from a philosophical perspective by Carruthers (2008). A second purpose of the present research was to evaluate animals’ uncertainty responses when they were denied the trial-by-trial reinforcement that could produce gradients of stimulus avoidance/aversion.

In a first attempt to dissociate metacognitive from associative strategies in uncertainty tasks, Smith, Beran, Redford, and Washburn (2006) found that humans and one of two monkeys were able to make cognitive, decisional uncertainty responses that were independent of feedback signals.

Smith et al. (2006) gave humans and monkeys a psychophysical density-discrimination task and then trained them to complete the task under deferred feedback. That is, humans and monkeys were adjusted to situations in which they performed blocks of four trials and then received summary feedback for the block. In that way, they were denied trial-by-trial feedback along with the possibility of directly associating outcomes with specific stimulus-response pairs. The reinforcement situation thus became opaque and they could not construct reinforcement histories or response tendencies based on their trial-by-trial experience. Humans and one of two monkeys were able to make cognitive, decisional uncertainty responses that were independent of feedback signals.

However, this study had two significant limitations. First, only one monkey of two successfully transferred the use of the uncertainty response from one sparse-dense task to the next. The present article sought a general finding based on results from several monkeys. Second, and more important, the stimulus continua in all transfer tasks were Sparse-to-Dense continua and the primary responses did not change in kind. Thus, while humans and monkeys could not track the reinforcement history of the new tasks, they could have transferred their general knowledge of reinforcement history and task structure from one task to the next. To address this concern, the present research used transfer tasks that were qualitatively different from one another, so that neither task knowledge, associative cues, reinforcement history, nor aversion/reinforcement gradients could easily transfer from one task to another. If monkeys could adapt their use of an uncertainty response to these qualitatively different tasks, and do so without any recourse to trial-by-trial feedback from their responses, it would indicate that monkeys cognitively construe new tasks and respond Uncertain based on psychological signals of indeterminacy/difficulty, not based on cues of aversion/avoidance. This would represent an important, converging line of evidence that nonhuman animals have an uncertainty-monitoring system that has functional similarities to the metacognition of humans.

Experiment 1: Humans

Threshold tasks play a prominent role in human and animal psychophysics (Au & Moore, 1990; MacMillan & Creelman, 1991; Schusterman & Barrett, 1975; Thompson & Herman, 1975; Yunker & Herman, 1973). In these tasks, the experimenter moves one stimulus distribution nearer to or farther from a stable, contrast stimulus distribution in order to titrate the perceptual limen or discrimination threshold between the stimulus classes (Corso, 1963; Fechner, 1860/1966). The threshold task creates constant, focused task difficulty because at threshold the identity of the stimulus on all trials is barely discernible. Uncertainty monitoring and responding should be at a premium in these tasks. Experiment 1 examines human uncertainty monitoring in psychophysical threshold tasks performed with deferred and rearranged feedback.

We had a specific reason for evaluating humans’ uncertainty monitoring under threshold conditions. In the original study of psychophysical uncertainty monitoring by monkeys (Smith et al., 1997), monkeys performed both threshold and constant-stimuli psychological tasks. (In the constant-stimuli task, each trial represents a random choice from a set/constant distribution of stimuli, not a focused test of the animal’s discrimination performance near threshold.) Both monkeys responded Uncertain adaptively and robustly in the threshold paradigm. But one monkey essentially did not respond Uncertain in the constant-stimuli task. Smith et al. (2006) reported the same lack of uncertainty responding by one monkey in a constant-stimuli task. Thus, we had reason to believe that the threshold paradigm might produce the most robust uncertainty responding by monkeys. We tested humans in the threshold paradigm in order to have a human performance profile that would be directly comparable to that of the monkeys.



Seventy-two undergraduates at the University at Buffalo, the State University of New York participated.

Psychophysical tasks

Each participant first completed a Sparse-Dense task. The threshold task was run along a continuum of 100 density levels designated Levels 20-120. The number of lit (white) pixels in the 200 × 100 box was given by $\mathrm{Pixels}_{\mathrm{Level}} = \mathrm{round}((11800 - (120 - \mathrm{Level})^2) \ \mathrm{div}\ 4)$. Thus, the density continuum went from 450 pixels (Sparse, Level 20) to 2950 pixels (Dense, Level 120).

Participants then transferred to a Continuity task. They judged whether a white circle was Discontinuous or Continuous based on the number of radial dots it contained on its perimeter. The number of radial dots was determined by the formula $\mathrm{Dots}_{\mathrm{Level}} = \mathrm{round}((11800 - (120 - \mathrm{Level})^2) \ \mathrm{div}\ 100)$. Each dot’s X and Y coordinates, respectively, were determined by the formulae $\mathrm{round}(100 \cos(2\pi j / \mathrm{Dots}_{\mathrm{Level}}))$ and $\mathrm{round}(80 \sin(2\pi j / \mathrm{Dots}_{\mathrm{Level}}))$. In this way, as the variable j increased from 1 up to the required number of dots, the dots were evenly distributed around the circle’s perimeter. The angular distances between radial dots ranged from 20 degrees (Discontinuous, Level 20, 18 total radial dots) to about 3 degrees (Continuous, Level 120, 118 radial dots).
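The dot placement just described can be sketched as follows. This is an illustration only: it assumes "div" is integer division and that the coordinates are offsets from the stimulus center, and the names dot_count and dot_positions are hypothetical.

```python
import math


def dot_count(level: int) -> int:
    """Number of radial dots at a given level (assumes 'div' is integer division)."""
    return (11800 - (120 - level) ** 2) // 100


def dot_positions(level: int) -> list[tuple[int, int]]:
    """(x, y) offsets of each dot, assumed to be relative to the stimulus center."""
    n = dot_count(level)
    return [
        (round(100 * math.cos(j * 2 * math.pi / n)),
         round(80 * math.sin(j * 2 * math.pi / n)))
        for j in range(1, n + 1)
    ]


# Level 20 gives 18 dots, so adjacent dots are 360 / 18 = 20 degrees apart.
print(dot_count(20), 360 / dot_count(20))
```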

Finally, participants transferred to an Ellipse task. Humans judged whether a red ellipse was relatively round or flattened. In this task, the lengths of the X-radius and Y-radius of the ellipse were manipulated. The X- and Y-radii, respectively, were given by the formulae $90 + \mathrm{round}((11800 - (120 - \mathrm{Level})^2) \ \mathrm{div}\ 400)$ and $50 - \mathrm{round}((11800 - (120 - \mathrm{Level})^2) \ \mathrm{div}\ 400)$. This resulted in X-radii that ranged from 94 (Round, Level 20) to 119 (Flat, Level 120) and Y-radii that correspondingly ranged from 46 to 21.
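The radii can be checked against the reported ranges with a short sketch. As before, it assumes "div" is integer division; ellipse_radii is a hypothetical name used only for illustration.

```python
def ellipse_radii(level: int) -> tuple[int, int]:
    """(x_radius, y_radius) at a given level, assuming 'div' is integer division."""
    offset = (11800 - (120 - level) ** 2) // 400
    # The same offset widens the ellipse and flattens it.
    return 90 + offset, 50 - offset


print(ellipse_radii(20))   # (94, 46)
print(ellipse_radii(120))  # (119, 21)
```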


Threshold tasks have one stable stimulus distribution against which another roving stimulus distribution moves in order to titrate the participant’s threshold. We arranged the response grammar of the threshold task so as to honor this structural asymmetry. A stimulus was presented at the screen’s top right. A response icon “S” was presented at the screen’s top left. On all Level 120 trials (that is, Dense, Continuous, and Flattened stimuli), participants needed to make a response that selected the stimulus itself. On all Level 20-119 trials (that is, Sparse, Discontinuous, and Rounded stimuli), participants needed to make a response that selected the “S” response icon. This response grammar was close to the asymmetrical Go/No Go discrimination paradigm that is familiar to comparative psychologists. Correct responses resulted in a computer-generated whoop sound. Incorrect responses resulted in an 8s computer-generated buzz. The Uncertain response was a “?” in the bottom-center of the screen, and it allowed humans to escape the current trial (with no reward or penalty). During the Density task, these consequences were given immediately; during the Continuity and Ellipse tasks they were not (see below).

Titrating threshold

As the session began, the participant made Sparse-Dense decisions about the extreme stimuli in the task (Level 20, Level 120). As participants correctly completed trials, the level of the sparse trials increased so that they became more similar in density to Level 120 trials. The level of the sparse trials increased in 2-step increments in the range of Levels 20-64, allowing task difficulty to approach threshold rapidly while difficulty was still fairly low. The level of the sparse trials increased in 1-step increments in the range of Levels 65-119, allowing fine adjustments of task difficulty in the region surrounding each participant’s threshold. As the sparse level approached threshold, the accuracy in comparing Level 120s (true Denses) and (e.g.) Level 85s (threshold Sparses) eventually dropped below 70%. Then, and whenever accuracy on Sparse trials fell below 70%, the roving level of sparse trials decreased, loosening the discrimination and making it easier. Whenever accuracy on Sparse trials rose above 70%, the roving level of sparse trials increased, tightening the discrimination and making it more difficult. In this way, participants’ discriminative capabilities were continually challenged, their thresholds were monitored and maintained, and the task presented highly focused difficulty and sustained uncertainty. Accuracy was determined by finding the proportion of correct responses over the last ten trials completed for the given task (Density, Continuity, or Ellipse).
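The titration rule described above can be sketched as a small state machine. This is a minimal illustration under stated assumptions, not the authors' implementation: the text does not give the size of downward steps (the sketch reuses the upward step size), it does not say how accuracy was computed before ten trials had accumulated (the sketch uses however many trials are available), and the class name Titrator is hypothetical.

```python
from collections import deque


class Titrator:
    """Tracks the roving (sparse) level using accuracy over the last ten trials."""

    def __init__(self, start=20, ceiling=119, window=10, criterion=0.70):
        self.level = start
        self.floor = start
        self.ceiling = ceiling
        self.criterion = criterion
        self.recent = deque(maxlen=window)  # most recent outcomes only

    def step_size(self) -> int:
        # 2-step moves in Levels 20-64, 1-step moves in Levels 65-119.
        return 2 if self.level < 65 else 1

    def record(self, correct: bool) -> int:
        """Record one primary response and return the updated roving level."""
        self.recent.append(correct)
        accuracy = sum(self.recent) / len(self.recent)
        if accuracy > self.criterion:
            # Tighten the discrimination (harder).
            self.level = min(self.ceiling, self.level + self.step_size())
        elif accuracy < self.criterion:
            # Loosen the discrimination (easier). Assumed step size.
            self.level = max(self.floor, self.level - self.step_size())
        return self.level


t = Titrator()
print([t.record(True) for _ in range(3)])  # [22, 24, 26]
```

In this sketch only primary responses are passed to record; how uncertainty responses entered the accuracy window is not specified in the text.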

The threshold-titration method was the same in all three tasks. In each task, approximately 30% of trials were spent approaching the participant’s threshold and 70% of trials were spent titrating around the threshold.


At the beginning of the experiment, participants were told: “You will see boxes that are DENSELY or SPARSELY filled with pixels. Your job is to decide whether the boxes are DENSE or SPARSE.” They were shown the keys to press to make the stimulus-touch or “S” response, informed of the rewards and penalties, then told, “if you are UNCERTAIN, press the ‘?’ key”. Instructions for the Continuity and Ellipse tasks were in exactly the same format, with appropriate changes corresponding to the stimuli (i.e., “boxes” and “DENSE” changed to “ellipses” and “FLATTENED” for the Ellipse task). For the Continuity and Ellipse tasks, they were also told, “you will now find out how you are doing every 4 trials.”

Deferred/rearranged feedback

During deferred and rearranged feedback, participants completed four trials, with each response immediately bringing about the next trial. After four trials were completed, all reward whoops earned for the four trials were given, followed by all penalty buzzes. Consequences were separated by 250ms, so that it was apparent how many rewards and penalties had been accrued. Any uncertainty responses made during the four trials simply reduced the total number of rewards and penalties. For example, if a participant made 2 correct responses, 1 incorrect and 1 Uncertainty response, they would receive 2 reward whoops followed by 1 penalty buzz. It was thus possible to get direct feedback on a trial by making the Uncertainty response to the first 3 stimuli and attempting to answer the fourth. However, no human (and no monkey) responded in this way. The feedback cycle was immediately followed by the next block of trials. Each trial took about 1s to complete. Stimuli were not arranged in blocks. Every trial was selected randomly, with a 60% chance of being a Level 120 and a 40% chance of being the current roving level. Thus, there was no way to know or strategize about upcoming trials in a block based on what trials had already occurred.
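The block-feedback and trial-selection rules above can be illustrated with a brief sketch. The outcome labels and the function names deferred_feedback and next_trial_level are hypothetical; only the ordering rule (all rewards, then all penalties, with uncertainty responses contributing nothing) and the 60/40 selection probabilities come from the text.

```python
import random


def deferred_feedback(outcomes: list[str]) -> list[str]:
    """Return the rearranged feedback sequence for one completed block."""
    rewards = outcomes.count("correct")
    penalties = outcomes.count("incorrect")
    # Rewards are delivered first, then penalties, regardless of trial order.
    return ["whoop"] * rewards + ["buzz"] * penalties


def next_trial_level(roving_level: int, rng: random.Random) -> int:
    """Pick the next trial: Level 120 with probability .60, else the roving level."""
    return 120 if rng.random() < 0.60 else roving_level


# The example from the text: 2 correct, 1 incorrect, 1 Uncertain.
block = ["correct", "incorrect", "uncertain", "correct"]
print(deferred_feedback(block))  # ['whoop', 'whoop', 'buzz']
```

Because the feedback sequence depends only on the counts, it carries no information about which trial in the block produced which outcome.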

Task durations

Participants were given 301 trials in the Density task with trial-by-trial feedback. They were given 453 trials in the Continuity and Ellipse transfer tasks, with all trials receiving only deferred and rearranged feedback. Thus, humans completed two qualitatively novel tasks (making the reinforcement history from previous tasks uninformative and unhelpful) entirely under deferred feedback (making the reinforcement history from the present task unavailable).


Performance bins

Each task contained 100 stimulus levels. Humans were highly accurate on lower stimulus levels, and progressed through those levels quickly, with few trials delivered. To create data bins with more equal numbers of trials, the data were binned as follows: Levels 20-29, Levels 30-39, Levels 40-49, Levels 50-59, and Levels 60-69, respectively, became Bins 1-5 (10 levels per bin). Levels 70-74 and 75-79, respectively, became Bins 6 and 7 (5 levels per bin). Levels 80-115 became Bins 8-19 at 3 levels per bin. Levels 116-119 became Bin 20. For all bins up to and including Bin 20, “S” responses were correct. The final level (Level 120) – the only level for which the stimulus-touch response was correct – became Bin 21. Bins with fewer than 10 trials were eliminated from analysis and from the corresponding graphs, because it is not useful to estimate three response proportions based on fewer than 10 events.
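The binning scheme above maps each level to one of 21 bins. A minimal sketch of that mapping follows; the function name level_to_bin is hypothetical.

```python
def level_to_bin(level: int) -> int:
    """Map a stimulus level (20-120) to its performance bin (1-21)."""
    if not 20 <= level <= 120:
        raise ValueError("level must be between 20 and 120")
    if level == 120:
        return 21                        # the only stimulus-touch level
    if level >= 116:
        return 20                        # Levels 116-119
    if level >= 80:
        return 8 + (level - 80) // 3     # Bins 8-19, 3 levels per bin
    if level >= 70:
        return 6 + (level - 70) // 5     # Bins 6-7, 5 levels per bin
    return 1 + (level - 20) // 10        # Bins 1-5, 10 levels per bin


print([level_to_bin(l) for l in (20, 69, 70, 79, 80, 115, 116, 120)])
# [1, 5, 6, 7, 8, 19, 20, 21]
```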

Bin effects

In the Sparse-Dense task, peak uncertainty responding was 31.9% at Density Bin 15. Accuracy here was 62.5%, appropriately low because this point was near the perceptual threshold of our sample relative to true Dense trials at Density Bin 21. Figure 1A shows humans’ proportional use of the three responses in this task. At each density bin, the three response proportions sum to 1.0 because participants made one of three responses for each trial. This same additivity applies to Figures 2, 4, and 5. The error bars indicate 95% confidence intervals for the peak of uncertainty responding at Bin 15, for the first level to the left of the peak at which uncertainty responding significantly declined, and for Bin 21 representing the true Dense trials. Uncertainty responding fell off to either side of the peak, becoming reliably different at Density Bin 6 to the left (where participants could generally tell that stimuli were Sparse) and Density Bin 21 to the right (where participants could generally tell that stimuli were Dense).

Figure 1 A. Humans’ performance on a density discrimination with trial-by-trial feedback using the titrating threshold method. Grey diamonds, grey triangles, and black squares represent the proportion of Sparse, Dense, and Uncertain responses, respectively. ...

Figure 2 A,B,C. The performance of monkey Gale in the Length, Continuity, and Ellipse discriminations using deferred feedback and the titrating threshold method, depicted as in Figure 1.

Figure 4 A,B,C. The performance of monkey Lou in the Slope, Continuity, and Asterisk discriminations using deferred feedback and the titrating threshold method, depicted as in Figure 1.

Figure 5 A,B,C. The performance of monkey Murph in the Length, Continuity, and Asterisk discriminations using deferred feedback and the titrating threshold method, depicted as in Figure 1.

In the Continuity task, peak uncertainty responding was 33.3% at Bin 14. Accuracy here was 55.0%. In the Ellipse task, peak uncertainty responding was 57.8% at Bin 20. Accuracy here was 58.7%. Figures 1B and 1C show humans’ performance on the Continuity and Ellipse tasks, respectively. The error bars in these figures were placed as just described. In both tasks, participants more often declined the more difficult trials at the crux of their ability to discriminate Discontinuous from Continuous circles and Rounded from Flattened ellipses.

In all three tasks, there was a significant effect of Bin on the level of uncertainty responding, F(1, 20) = 125.3, p < 0.001; η² = .86. Humans were able to decline selectively the most difficult trials. In the two tasks featuring deferred and rearranged reinforcement, they were clearly not dependent on trial-by-trial feedback to do so. There was also a significant effect of task on the pattern of uncertainty responding, F(2, 40) = 7.9, p < 0.001; η² = .28. This suggests that uncertainty responding was flexible and adapted to the structure of each task. It was not a function of previously experienced reinforcement history, nor was it a carryover effect from previous tasks. Participants’ trial-decline responses were made to the most difficult stimuli in the task whether reinforcement-based cues were present (Figure 1A) or not (Figures 1B, 1C). In the latter two tasks, they had to decide to choose the uncertainty response based on their cognitive assessment of the difficulty of the trial. Thus, in both these tasks, humans illustrated a data pattern under deferred reinforcement that suggests their capacity for metacognition shorn of reinforcement cues. Though this conclusion is not surprising, it gives us a standard against which we can compare the uncertainty responding of monkeys placed into a similar situation.

Experiment 2: Monkeys

Experiment 2 gave rhesus monkeys the threshold situation of Experiment 1. If monkeys could make adaptive uncertainty responses even in qualitatively novel tasks when feedback was deferred and rearranged, then it would be difficult for the uncertainty response to be conditioned by feedback signals, to be responsive to reinforcement history, or to be based in low-level associative cues. If that response were preserved, it would suggest that monkeys were choosing uncertainty responses at a higher, decisional level that could be isomorphic to humans’ uncertainty-monitoring performance in Experiment 1.


Gale, Lou, and Murph—all male rhesus monkeys (Macaca mulatta) and 15, 15, and 25 years old, respectively—were tested. They had been trained, using procedures described elsewhere (Rumbaugh, Richardson, Washburn, Savage-Rumbaugh, & Hopkins, 1989; Washburn & Rumbaugh, 1992), to respond to computer-graphic stimuli by manipulating a joystick. They had also been tested in prior studies on a variety of computer tasks, including a related Sparse-Dense discrimination in which they rated their low or high “confidence” in their discrimination response before receiving feedback (Shields et al., 2005). The monkeys were tested in their home cages with ad lib access to the test apparatus, working or resting as they chose during long sessions. They were neither food deprived nor weight reduced for the purposes of testing and they had continuous access to water.


The monkeys were tested using the Language Research Center’s Computerized Test System—LRC-CTS (described in Rumbaugh et al., 1989; Washburn & Rumbaugh, 1992)—comprising a Compaq DeskPro computer, a digital joystick, a color monitor, and a pellet dispenser. Monkeys could manipulate the joystick through the mesh of their home cages, producing isomorphic movements of a computer-graphic cursor on the screen. Contacting appropriate computer-generated stimuli with the cursor resulted in the delivery of a 94-mg fruit-flavored chow pellet (Bioserve, Frenchtown, NJ) using a Gerbrands 5120 dispenser interfaced to the computer using a relay box and output board (PIO-12 and ERA-01; Keithley Instruments, Cleveland, OH). For incorrect responses, the monkeys received no food pellet, and received instead a 20s trial-less timeout period.

Psychophysical tasks

Six threshold tasks were used in Experiment 2. All animals received the Sparse-Dense threshold task described in Experiment 1. This task served as the training task for monkeys as it had for humans. Some animals also received the Continuity and Ellipse tasks described in Experiment 1.

In addition, in the Length task, animals judged whether a light cyan line was short or long. Each line’s length was determined by the formula $\mathrm{round}((11800 - (120 - \mathrm{Level})^2) \ \mathrm{div}\ 70)$. Lengths ranged from 25 pixels (Level 20) to 168 pixels (Level 120).

In the Asterisk task, animals judged whether few or many lines radiated from the center of an invisible circle. Line Number was determined by the formula $\mathrm{LineNumber}_{\mathrm{Level}} = \mathrm{round}(12 \times 1.01^{\mathrm{Level}})$ and ranged from 15 (Level 20) to 40 (Level 120). The formulae for determining the X-Y coordinates of the line endpoints were $\mathrm{round}(100 \cos((2\pi j + \mathrm{random}(12)) / \mathrm{LineNumber}_{\mathrm{Level}}))$ and $\mathrm{round}(80 \sin((2\pi j + \mathrm{random}(12)) / \mathrm{LineNumber}_{\mathrm{Level}}))$, respectively. In this way, as the variable j increased from 1 up to the required number of radii, the radii were evenly distributed around the circle’s perimeter. The resulting asterisk stimuli grew “busier” as Level increased. The “random (12)” in the formulae randomly shifted the lines’ endpoints slightly so animals could not cue off the exact line orientations presented on particular levels.
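The Asterisk stimulus can be sketched in the same way. Two assumptions are worth flagging: the text does not define "random (12)", so the sketch treats it as a random integer from 0 to 11 (the convention in Pascal-family languages), and the endpoint coordinates are taken to be offsets from the stimulus center. The names line_number and line_endpoints are hypothetical.

```python
import math
import random


def line_number(level: int) -> int:
    """Number of radiating lines: 15 at Level 20, 40 at Level 120."""
    return round(12 * 1.01 ** level)


def line_endpoints(level: int, rng: random.Random) -> list[tuple[int, int]]:
    """Jittered (x, y) endpoints, assumed to be relative to the stimulus center."""
    n = line_number(level)
    points = []
    for j in range(1, n + 1):
        jitter = rng.randrange(12)  # assumption: integer in 0..11
        angle = (j * 2 * math.pi + jitter) / n
        points.append((round(100 * math.cos(angle)), round(80 * math.sin(angle))))
    return points


print(line_number(20), line_number(120))  # 15 40
```

Passing the random generator in explicitly makes a stimulus reproducible from a seed, which is convenient when checking the jitter by hand.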

In the Slope task, animals judged whether a line had a steep or gentle slope. One endpoint of the line was held constant, while the X-Y coordinates of the other endpoint were determined by the formulae $\mathrm{round}(100 \cos(2\pi / \mathrm{Level}))$ and $\mathrm{round}(80 \sin(2\pi / \mathrm{Level}))$, respectively. To ensure that monkeys responded to slope, and not to the character of lines of different slopes as drawn on the screen, the line stimuli were drawn either dashed or dotted. Slopes ranged from −2.45 (steeply down, top-left to bottom-right) to −0.17 (gently down, top-left to bottom-right).

Selection of transfer tasks

To be sure that each monkey would complete three suitable and novel transfer tasks, the five transfer tasks already described were created. This was prudent, because some tasks interacted badly with some monkeys. That is, animals sometimes showed response biases on tasks that ruled out titrating their psychophysical threshold and therefore ruled out evaluating their uncertainty responses at threshold. The monkeys completed these transfer tasks: Length, Continuity, and Ellipse (Gale); Slope, Continuity, and Asterisk (Lou); Length, Continuity, and Asterisk (Murph).


The monkeys used the joystick to move their cursor to touch the “S” to make a sparse (Sparse-Dense), discontinuous (Continuity), round (Ellipse), short (Length), few (Asterisk), or steep (Slope) response. The monkeys touched the stimulus itself to make a dense, continuous, flat, long, many, or gentle response. As in Experiment 1, touching the “?” cleared the current trial and brought the next trial without any other feedback or consequence.

Procedure: training task (Sparse-Dense)

Monkeys were first given the extreme anchors of the Sparse-Dense task (Levels 20, 120) with trial-by-trial reinforcement. When they reached a criterion of 70% correct on the most recent 10 trials, their Sparse trials began to be increased in density one level at a time so that we could eventually approach each animal’s psychophysical threshold. At the same time, animals began to be weaned away from trial-by-trial reinforcement. This was accomplished by using 17 stages of deferred and re-arranged feedback, with each stage containing a probabilistic rule that determined how many stimuli the macaques would respond to before receiving feedback for those stimuli. Macaques progressed from one stage to the next when their current accuracy was greater than 70%. The first-stage rule specified feedback after 1 trial 100% of the time. The second-stage rule specified feedback after 1 or 2 trials, 90% or 10% of the time, respectively. This process continued, with longer blocks of trials becoming more frequent, until Stage 17 when macaques received 4-trial blocks before feedback 100% of the time. This process allowed the macaques to gradually transition from their normal feedback situation into completely deferred feedback without sudden changes that might disrupt their response strategies. In addition, by requiring a 70% transition criterion, we ensured that the macaques would treat each trial in a block as a valid trial even though they were not immediately reinforced for it.

Animals reprised the training sequence just described during each session of Sparse-Dense performance. As with other deferred feedback tasks, the progression occurred far more rapidly as they gained comfort and experience with the task and the deferred-feedback regimen (see Couchman, Coutinho, & Smith, in press). All other aspects of deferred and re-arranged feedback were identical to those described for humans in Experiment 1. Each trial took monkeys about 0.5s to complete.

Once again, stimuli were not arranged in blocks. Every trial was selected randomly, with a 60% chance of being a Level 120 and a 40% chance of being the current roving level. For the monkeys, as for the humans, there was no way to know or strategize about upcoming trials in a block based on what trials had already occurred.

Procedure: transfer tasks

With the introduction of each new transfer task, monkeys were first trained on stimuli from the endpoints of the stimulus continuum (Levels 20 and 120) and were given trial-by-trial feedback with no “?” response available. After achieving 90% accuracy, monkeys were moved to fully deferred feedback, while still only responding to trials on the continuum’s endpoints. After achieving 90% accuracy under deferred feedback, monkeys were given all three response options and were gradually moved toward their threshold. The “?” was thus on the screen for all trial levels, ensuring that it did not become associated only with the most difficult levels. Monkeys reprised the training sequence just described during each transfer-task session.

The threshold-titration method was the same in all three tasks. In each task, monkeys spent approximately 20% of trials approaching their threshold and 80% of trials titrating around their threshold.

As with humans, in tasks subsequent to the Sparse-Dense task, monkeys never received direct reinforcement signals for any stimuli at Levels 21-119, and particularly not for any of the difficult and uncertain threshold stimuli that produce the transfer tasks’ critical phenomena. Therefore, they could not know which trial levels they had missed in any block, which trial levels had poor reinforcement histories attached to them, or which trial levels were aversive and should be avoided through uncertainty responses. In this sense, uncertainty responses, considered trial level by trial level, were independent of direct feedback and reinforcement signals in the deferred-feedback tasks. This structure of the tasks ruled out some low-level interpretations of animals’ uncertainty responses based in stimulus aversion or response avoidance.

Results: Gale

Results shown are from 7,470 Length trials, 9,054 Continuity trials, and 5,856 Ellipse trials, all under fully deferred feedback. Figures 2A, 2B, and 2C show his performance on these tasks.

Performance bins

Gale’s data were binned across levels as already described in Experiment 1.

Gale: Length task (Figure 2A)

Gale’s threshold for discriminating Short from Long fell at about Bin 12 in this task; his uncertainty responding peaked there as well. The error bars indicate 95% confidence intervals for the peak of uncertainty responding, for the first level to the left at which uncertainty responding significantly declined, and for Bin 21 representing the true Long trials. Even denied direct signals of reinforcement and assignable feedback across the range of difficult trials, Gale performed the primary discrimination at a high level and used the uncertainty response adaptively and appropriately. For him, the use of that response was not dependent on trial-by-trial feedback to condition or occasion particular responses given particular trial contexts. In all respects, his performance was similar to that by humans in Experiment 1.

However, monkeys were given information about their overall performance, and could potentially determine how well they were doing based on how many rewards and penalties they received during summary feedback. To ensure that monkeys were not simply monitoring their overall feedback and increasing uncertainty responding based on increased penalties, we performed an additional analysis on the monkey data to determine which trials elicited the most uncertainty responding. Figure 3 shows Gale’s uncertainty responding when the stimulus was Short (Bins 1-15) or Long. Long stimuli are all Level 120 (falling into Bin 21), but are shown in the figure at the roving stimulus level that provided the contrasting local context in the task at the point the Level 120 trials occurred. Gale declined more Short trials than Long trials, F(1, 29) = 24.4, p < 0.01; η² = .64. This suggests that the task was always a well-behaved psychophysical threshold task. Uncertainty responding did not simply increase when the animal faced global feedback that contained more penalties. Instead, Gale adaptively increased uncertainty responding to the kind of stimuli that he had seen less often. All monkeys showed similar patterns, declining the trials that they ought to be most uncertain about.

Figure 3 The proportion of uncertainty responses made by monkey Gale in the Length discrimination. Black squares indicate uncertainty responses made for lines of different lengths that were all defined to be Short by the rules of the task. Open squares indicate ...

Gale: Continuity task (Figure 2B)

In this task, Gale performed as in the Length task. However, this task was far more difficult for him, and his threshold for discriminating Discontinuous from Continuous (and his peak of uncertainty responding) fell at about Bin 6. The error bars were placed as already described. Here, too, Gale’s region of uncertainty responding was positioned—absent any direct reinforcement signals—to encompass the most difficult trials.

Gale: Ellipse task (Figure 2C)

The same data pattern was observed again, this time with a broad region of trial difficulty that ran from Bin 6 to Bin 14 and that was coordinated with elevated uncertainty responding.

In all three transfer tasks, there was a significant effect of Bin on the level of uncertainty responding, F(1, 15) = 95.1, p < 0.001; η² = .86. Gale was able to build up a coherent decisional framework for performing these threshold tasks, adaptively declining the most difficult trials. There was also a significant effect of task on uncertainty responding, F(2, 30) = 3.3, p < 0.05; η² = .18, suggesting that uncertainty responding was flexible and adapted to the structure of each task. It was not a function of previously experienced reinforcement history, nor was it a carryover effect from previous tasks.

His performance was independent of reinforcement signals that could have conditioned particular responses in particular trial contexts or reinforcement-histories for the critical trials that could govern his responding at an associative level. Though this does not rule out all other explanations, Gale’s results support the idea that his performance at threshold was cognitive and decisional, not fully determined by associative processes, and functionally isomorphic to humans in Experiment 1.

Results: Lou

Results shown are from 6,091 Slope trials, 8,409 Continuity trials, and 5,585 Asterisk trials, all under fully deferred feedback. Figures 4A, 4B, and 4C show his performance on these tasks.

Performance bins

Lou’s data were binned across levels as already described.

Lou: Slope task (Figure 4A)

Lou’s threshold for discriminating Steep from Gentle fell at about Bin 14; his uncertainty responding peaked there as well. The error bars in Figure 4 are placed as already described. Even denied reinforcement signals and assignable feedback across the range of difficult trials, Lou performed the primary discrimination well and used the uncertainty response adaptively. His use of that response was evidently not dependent on trial-by-trial feedback to condition or occasion particular responses given particular trial contexts. In all respects, his performance was similar to that by Gale and by humans in Experiment 1.

Lou’s performance testifies to the robustness with which threshold paradigms elicit uncertainty responding from animals. In several experiments using the Method of Constant Stimuli (e.g., Smith et al., 2006), Lou had in the past essentially refused to ever respond Uncertain. Yet, in this task, and in all three transfer tasks, he responded Uncertain appropriately. Possibly this difference reflects the focused difficulty of the threshold task. (Constant-Stimuli tasks present many easy trials that could make it seem not worthwhile to include the cognitively effortful Uncertain response in the task’s response repertory.) Or, possibly, the response asymmetry of the threshold task (S vs. stimulus-touch responses, or, in a sense, No Go or Go) facilitated Lou’s recruitment of the uncertainty response here. Whichever is the case, we flag this important strength of the threshold paradigm for practitioners in the field of animal metacognition.

Lou: Continuity task (Figure 4B)

In this task, Lou showed a broad region of trial difficulty and elevated uncertainty responding that ran from Bin 8 to Bin 16. This data pattern clearly shows an appropriate coordination among uncertainty responding, trial difficulty, and poor performance, a coordination achieved in the absence of any direct reinforcement signals over that range of trials.

Lou: Asterisk task (Figure 4C)

The Asterisk task was clearly easy for Lou. He discriminated well between even the hardest few- and many-line asterisks. His uncertainty responding was appropriately reduced given his strong discriminative ability in this task. Still, he appropriately showed elevated uncertainty responding in the region of his poorest performance. That he was able to respond Uncertain less, in a task in which he discriminated better, suggests that monkeys are able to reduce their overall use of the uncertainty response when accuracy on the primary discrimination is high.

Lou responded uncertain significantly more to more difficult Bins, F(1, 20) = 54.7, p < 0.001; η² = .73. The data from Lou’s three transfer tasks confirm that uncertainty-response regions were established adaptively (at different trial levels depending on his perceptual limen in the task, and of different width depending on the task’s difficulty), absent trial-by-trial reinforcement for any of the critical trial levels, F(2, 40) = 5.0, p < 0.01; η² = .20. Thus, Lou’s results also support the idea that monkeys may be choosing uncertainty responses at threshold cognitively and decisionally.

Results: Murph

Results shown are from 6,607 Length trials, 7,768 Continuity trials, and 3,972 Asterisk trials, all under fully deferred feedback. Figures 5A, 5B, and 5C show his performance on these tasks.

Performance bins

Murph’s data were binned across levels as already described.

Murph: Length task (Figure 5A)

Murph found this task quite difficult. There was a broad region of trial difficulty—coordinated with elevated uncertainty responding—that ran from Bin 4 to Bin 11. It is a constant feature of the data from the monkeys that they are able to answer broad-ranging difficulty with broad-ranging uncertainty responding, and focused difficulty with focused uncertainty responding.

Murph: Continuity task (Figure 5B)

In this task, Murph had a high threshold near Bin 17, with coordinated elevated uncertainty responding there. Murph’s performance in this task had many similarities to the human Continuity task (Figure 1B).

Murph: Asterisk task (Figure 5C)

The Asterisk task was clearly easy for Murph, as it was for Lou. Murph reached a high threshold near Bin 19. He answered the focused difficulty of the task in that region with focused uncertainty responding.

Like Gale and Lou, Murph responded uncertain significantly more to more difficult Bins, F(1, 18) = 246.8, p < 0.001; η² = .93, and there was a significant main effect of task on uncertainty responding, F(2, 36) = 3.9, p < 0.05; η² = .18, suggesting that he also made adaptive uncertainty responses that were not motivated solely by reinforcement tracking or associative learning. In all respects, Murph’s data further support the idea that monkeys choose uncertainty responses cognitively and decisionally, without the need for conditioning outcomes and associated reinforcement histories at particular trial levels to ground or occasion these responses.
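The effect sizes that accompany the Bin and task tests for Gale, Lou, and Murph can be related to their F statistics through the standard identity for partial eta squared, (F × df_effect) / (F × df_effect + df_error). The sketch below assumes that the reported effect sizes are partial eta squared, which the text does not state explicitly; the function name and the table layout are illustrative only.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# (label, F, df_effect, df_error, effect size reported in the text)
reported = [
    ("Gale, Bin", 95.1, 1, 15, 0.86),
    ("Gale, task", 3.3, 2, 30, 0.18),
    ("Lou, Bin", 54.7, 1, 20, 0.73),
    ("Lou, task", 5.0, 2, 40, 0.20),
    ("Murph, Bin", 246.8, 1, 18, 0.93),
    ("Murph, task", 3.9, 2, 36, 0.18),
]

for label, f_value, df_effect, df_error, stated in reported:
    recovered = partial_eta_squared(f_value, df_effect, df_error)
    print(f"{label:>12}: recovered {recovered:.2f}, reported {stated:.2f}")
```

Under that assumption, each recovered value agrees with the reported one to two decimal places.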

General Discussion

Some theorists of animal metacognition have suggested that animals avoid difficult trials based on reinforcement history and stimulus aversion—not based on monitored uncertainty (Smith et al., 2008). Some have even argued that all existing demonstrations of uncertainty monitoring by animals have a low-level, associative basis because they can all be modeled using a signal-detection framework (Staddon, Jozefowiez, & Cerutti, 2007; but see Smith et al., 2008). The goal of our discussion is to consider carefully how that strong view stands in light of the present results, and to consider the empirical advance achieved by the present methodology.

The qualitatively different tasks used in Experiments 1 and 2 ensured that associative cues would not transfer across tasks. Clearly, animals cannot condition their responses to lines of different orientations based on their responses to boxes of different densities. For this reason the present results rule out some stimulus generalization phenomena—in particular, those based in cross-task transfer and those that threatened the cross-task transfer results in Smith et al. (2006).

The deferred and rearranged feedback ensured that animals received no trial-by-trial reinforcement for the critical trials near the threshold regions of the perceptual continua. Accordingly, associative descriptions based in stimulus aversion caused by direct, trial-by-trial reinforcement are also ruled out. Yet despite distancing performance from across-task stimulus cues and direct reinforcement signals, humans and monkeys still made adaptive uncertainty responses by declining the most difficult trials. Moreover, they showed that they can broaden or narrow their region of uncertainty responding at will, and place that region in coordination with the difficulty that each task presented to them, even without some classes of stimulus- and reinforcement-based cues.1

Nor can one claim that the uncertainty response had some primary reinforcer attached to it. In fact, the present research is distinctive because in this case all primary reinforcers were removed from the functionality of the uncertainty response. That is, it brought no reward, no hint about the trial’s answer, no especially easy next trials, or any other of the primary reinforcers that have encouraged associative interpretations of the uncertainty response in other studies.

Indeed, one can see that the no-consequence contingency attending the uncertainty response makes it neutral associatively in important ways. As it is used on a trial that would have been answered correctly or incorrectly, respectively, it delays reward (because the animal gives up a future reward it would have earned), or speeds reward (because the animal fends off a future timeout period). Thus, from an economic standpoint, there is no overall rate of uncertainty responding that optimally speeds reward and there is no simple way for the animal to be conditioned toward some overall rate of uncertainty responding. The issue is more cognitively complex than that.

Still, there are associative explanations that appear to meet this cognitive complexity head-on. One idea is that monkeys established generalization gradients for each new task based on their initial training with the easiest stimuli at the extreme ends of the continua. Smith et al.’s (2008) models explored the associative consequences of such gradients. This description might include the idea, as Smith et al.’s models did, that the uncertainty response also has a certain (low) level of attractiveness in the new task based on its usefulness in the initial Sparse-Dense task. In combination, these associative forces could have combined to produce maximal uncertainty responding at some intermediate place along the new continuum, but for gradient-associative or response-strength reasons, not metacognitive reasons. This place would be where the response strengths for the two primary responses were very low, so that the higher response strength for the third, default response controlled behavior.

Regarding the present tasks, we view this idea as less likely than the cognitive explanation for several reasons, but we acknowledge that the present paradigm does not completely rule it out. For one thing, animals never received direct reinforcement on any of the critical (difficult and uncertain) trials. So, the strongest gradient-forming mechanisms of direct trial-by-trial reinforcement were denied them. Instead, they would have had to extrapolate generalization gradients all across the continuum based on experience with only the two extreme stimulus levels. In addition, this endpoint conditioning could only plausibly produce symmetrical generalization gradients that would meet at the middle of the new continuum. There the response strength of the two primary responses would be lowest, and there, if the uncertainty response had some default response strength of its own, is where the most uncertainty responding would have occurred. The results of course do not support the idea of these symmetrical gradients. To the contrary, the animals’ actual cross-over points were highly asymmetrical along the new perceptual continua. Nor do the models in Smith et al. (2008) explain why the cross-over of these gradients always lay just at the animals’ psychophysical threshold, where there was a true point of psychological indeterminacy between the two stimulus input classes. Moreover, the generalization-gradient idea cannot explain the different places where animals chose to respond uncertain in different tasks, or the changing breadths of their uncertainty regions. In contrast, the alternative idea (that animals monitored uncertainty and trial difficulty and responded adaptively on that basis) explains every aspect of the data simply and intuitively.
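The symmetry argument can be illustrated with a toy calculation. This is a sketch under stated assumptions, not the model of Smith et al. (2008): it assumes exponential gradients of identical shape anchored at Levels 20 and 120, a constant default strength for the uncertainty response, and a simple ratio choice rule. The decay rate, the default strength, and all function names are hypothetical.

```python
import math

LOW_ANCHOR, HIGH_ANCHOR = 20, 120   # trained endpoint levels from the text
LEVELS = range(LOW_ANCHOR, HIGH_ANCHOR + 1)


def response_strengths(level, decay):
    """Strengths of the two primary responses under identical (symmetric) decay."""
    low = math.exp(-decay * (level - LOW_ANCHOR))
    high = math.exp(-decay * (HIGH_ANCHOR - level))
    return low, high


def crossover_level(decay):
    """Level at which the two primary gradients are closest to equal."""
    return min(LEVELS, key=lambda x: abs(response_strengths(x, decay)[0]
                                         - response_strengths(x, decay)[1]))


def uncertainty_peak(decay, default_strength):
    """Level at which a constant-strength third response is chosen most often."""
    def share(level):
        low, high = response_strengths(level, decay)
        return default_strength / (default_strength + low + high)
    return max(LEVELS, key=share)


for decay in (0.02, 0.05, 0.10):    # hypothetical decay rates
    print(decay, crossover_level(decay), uncertainty_peak(decay, 0.2))
```

Whatever the decay rate, both the cross-over and the uncertainty peak in this sketch sit at the midpoint (Level 70), which is why symmetric endpoint gradients alone sit poorly with cross-over points displaced toward one end of the continuum.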

Illustrating these points, the associative models used by Smith et al. (2008) to explicate associative concerns about some uncertainty tasks did produce generalization gradients that were symmetrical and never produced generalization gradients with cross-over points at psychophysical threshold. Indeed, the mathematical models in Smith et al. had no way to embody the idea of the psychophysical threshold. In fact, Smith et al. (2008) showed in a milder case that their associative models collapsed when facing asymmetrical data, displaced from the center of the continuum, of the sort reported here. This aspect of that article has been sometimes overlooked but must not be. Those models would also qualitatively fail to fit the present data patterns. Nonetheless, we endorse the ongoing value of discussion and debate between uncertainty/metacognitive interpretations of data patterns like the present ones, and low-level, associative interpretations—if those associative interpretations are disciplined and fully grounded in the animal’s true psychology within a task.

A second idea is that monkeys must have found the uncertainty response rewarding or advantageous in some way, or else they would not have used it in the task. For example, a behavioral-economics model (Crystal & Foote, 2009; Staddon et al., 2009) suggests that the uncertainty response might be used to speed the arrival of rewards, and thus maximize overall rewards, by decreasing the number of penalty timeouts. However, in the present paradigm, just using the “?” more to escape more trials is self-defeating—it reduces penalties on trials that would have been answered incorrectly, but also reduces rewards on trials that would have been answered correctly. Moreover, using the “?” response to randomly decline a certain percentage of trials may fend off a few timeouts, but it will also equally take food rewards off the table. In fact, the “?” response avoids delays (reducing time to the next reward and thus maximizing rewards) only when difficult trials are selectively declined but easy trials are selectively completed. This economic strategy – which we believe is probably employed by animals in this paradigm – can only work when an animal makes an informed “?” response based on its monitoring of the difficulty of the trials in the task. This is why grounding one’s understanding of performance psychologically is so crucial. The psychological process of monitoring is causal and explanatory. The effect of reward maximization alone is not.
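The economic argument can be checked with a small expected-value calculation. The sketch below uses invented numbers (a block of 60 easy trials answered correctly 95% of the time and 40 hard trials answered correctly 55% of the time) and a hypothetical function name; only the no-consequence contingency for declined trials is taken from the text.

```python
def expected_outcomes(n_easy, n_hard, acc_easy, acc_hard,
                      decline_easy, decline_hard):
    """Expected (rewards, penalties) when a fraction of each trial type is declined.

    Declined trials earn no reward and incur no penalty, matching the
    no-consequence contingency of the uncertainty response.
    """
    done_easy = n_easy * (1.0 - decline_easy)
    done_hard = n_hard * (1.0 - decline_hard)
    rewards = done_easy * acc_easy + done_hard * acc_hard
    penalties = done_easy * (1.0 - acc_easy) + done_hard * (1.0 - acc_hard)
    return rewards, penalties


# Hypothetical block composition and accuracies, for illustration only.
block = dict(n_easy=60, n_hard=40, acc_easy=0.95, acc_hard=0.55)

strategies = {
    "no declining": expected_outcomes(**block, decline_easy=0.0, decline_hard=0.0),
    "random 40%": expected_outcomes(**block, decline_easy=0.4, decline_hard=0.4),
    "hard trials only": expected_outcomes(**block, decline_easy=0.0, decline_hard=1.0),
}

for label, (rewards, penalties) in strategies.items():
    print(f"{label:>16}: rewards={rewards:.1f}, penalties={penalties:.1f}")
```

With these made-up values, both declining strategies skip 40 trials, yet random declining merely scales rewards and penalties down together (about 47.4 and 12.6), whereas selective declining keeps about 57 rewards and leaves about 3 penalties. The advantage appears only when the decision to decline tracks trial difficulty.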

This discussion raises a concern about the use of mathematical models to describe animals’ metacognitive performances. Mathematical models do sometimes emulate well animals’ response strategies, as shown by Smith et al. (2008) and Staddon et al. (2009). However, these models are often psychologically empty. Their parameters—response strengths, decision criteria, and so forth—are defined abstractly, without regard to the cognitive processes and representations that may actually organize humans’ or animals’ performances in the existing tasks. In our view, the interpretation of a performance as metacognitive or not must depend on a careful consideration of these processes and representations, not on the fit of the abstract model. The psychological emptiness of model fits in the animal-metacognition area has been insufficiently appreciated. Smith et al. (2008) discussed this issue carefully, but that dimension of the article has also been sometimes overlooked. When one does undertake a careful cognitive analysis of tasks like the present ones, it is clear that a higher-level explanation in terms of uncertainty monitoring is reasonable and probably warranted (see also Smith, Beran, Couchman, Coutinho, & Boomer, 2009). That is, the results probably show that animals are responding to a cognitive signal based in something like difficulty or indeterminacy. However, we do not believe the present results imply that they do so fully consciously, or that their uncertainty systems have all the sophistication and awareness aspects that humans’ systems have.

The overall contribution of the present results is to set aside many associative claims, because they represent the strongest dissociation yet achieved of uncertainty responses from reinforcement signals. They can be viewed as a significant extension of other results that have hinted at the sophisticated, decisional nature of animals’ uncertainty-monitoring capacity. For example, Shields et al. (1997) showed that monkeys responded Uncertain adaptively in a relational-judgment task in which stimulus cues could not be the basis for performance. Hampton (2001), Kornell et al. (2007), and Smith et al. (1998) showed that animals can respond to memory uncertainty. Smith et al. (2008) showed that monkeys made adaptive uncertainty responses while multi-tasking, suggesting the stimulus and task generality of the psychological signal that occasions these responses. Washburn et al. (2006) showed that monkeys sought hints and information on the first trial of novel discrimination problems—when associative forces could not yet have been at work. Kornell et al. (2007) and Shields et al. (2005) showed that monkeys can seemingly rate their retrospective confidence after they have completed a discrimination response.

The important conclusion—that animals’ uncertainty responses might sometimes require a higher-level, non-associative explanation—potentially extends to many other findings in the literature. To see this, note that humans were clearly metacognitive under deferred feedback, but performed similarly under trial-by-trial feedback. Everyone would extend the metacognitive interpretation to both performances. It seems unlikely that humans suddenly reverted to reactive associationism when they could. The latter claim is unparsimonious—the simplest explanation is that humans used basic uncertainty-monitoring processes to behave adaptively in both tasks.

Parsimony cuts the same way for monkeys. Once they show that they are making cognitive, decisional uncertainty responses in some tasks, it may be possible to interpretatively elevate—by bootstrapping—some of their other performances. It is theoretically plausible and parsimonious that they also bring basic uncertainty-monitoring processes to the adaptive performance of many tasks. Hampton (2009) endorsed this possibility that animals may be using more private, uncertainty-based signals to decline trials, even when the task provides more objective, reinforcement-based signals of difficulty. The theoretical resistance to this idea is understandable from a historical perspective, but puzzling from a common-sense perspective. Situations of uncertainty are likely ubiquitous in the natural world (Griffin, 2003; Smith et al., 2003, p. 367), and many of these situations will likely occur absent a full associative history that can trigger adaptive responses. Thus, animals clearly would have benefited greatly from having a generalized uncertainty-monitoring capacity.

Interestingly, one constructive way to theoretically balance between associative and uncertainty-based interpretations would be to set these labels aside, and instead focus more directly on the representations and processes that actually do underlie animals’ uncertainty system (see Hampton, 2009 for some possible mechanisms). Further work is certainly needed to explore whether such mechanisms exist and when each is at work. However, the nature of the threshold task suggests that associative cues alone might not completely account for the metacognitive responding in the present paradigm.

Even associative theorists accept that the state of threshold is psychologically unique: animals are minimally informed observers there; animals struggle to behave adaptively there; the rules of stimulus control are different there (Boneau & Cole, 1967; Commons et al., 1991; Miller et al., 1980; Terman & Terman, 1972). The classical psychophysicists agreed that the threshold state is psychologically complex (Boring, 1920; Fernberger, 1914; George, 1917; Thomson, 1920; Watson et al., 1973; Woodworth, 1938). However, these theoretical perspectives raise the question of the special psychology of threshold responding, without yet answering it.

But one can answer the question intuitively. Consider a threshold light-detection task. Some intervals contain a barely detectable light signal; some do not. Only two stimulus events occur, and two responses are dedicated to them. Notice: there is no intermediate stimulus class that could ground associative processing or prompt the use of a third response. This is not a Green-Blue discrimination wherein one could perceive Teal in between. Here there is no “teal,” because the threshold task plays out within the span of a single just-noticeable difference (JND) of perception. So, there is nothing between Light and No Light except Light-No Light indeterminacy. Thus, one knows that uncertainty responding in tasks of this kind is about resolving indeterminacy. The same thing goes for the threshold tasks in the present article.

Shiffrin and Schneider (1977) went a step further. They explained why cognitive indeterminacy disallows low-level, reactive responding. The indeterminate mental representations map inconsistently and unreliably onto behavioral responses. As a result, the organism must resolve indeterminacy using higher level, controlled cognitive processing (deliberate, slow, serial-order). It is a fair summary of the present results, and the field’s wider range of empirical findings, to say that animals are using second-order, controlled decisional processes to resolve the indeterminacy of the stimuli that threshold tasks present relentlessly.

Of course there are other profoundly important questions. Are these controlled processes executive, explicit, declarative—possibly even conscious? These and related psychological questions can and should be embraced, in addition to ongoing questions about underlying associative mechanisms, as this field’s theoretical horizon expands by including a more cognitive-representational theoretical framework.

However, we and Carruthers (2008) point out that an adaptive response to indeterminacy could exist absent some important psychological components. It would not need to be fully conscious. It would not necessarily need to involve self-awareness by the organism that it is in difficulty (e.g., Proust, 2003, 2007). It would not need to be explicitly reportable or fully meta-representational. Therefore, we caution that the present results do not require one to elevate what animals are doing to the full status of human metacognition with all the trimmings (see also Carruthers, 2009; Couchman, Coutinho, Beran, & Smith, 2009; Proust, 2009; Smith et al., 2008).

This caution highlights a central theme of our research, which is that the field should avoid an all-or-none approach toward animal metacognition. For that approach could distance researchers from the theoretically fertile middle ground wherein one grants animals a fairly sophisticated uncertainty-monitoring capacity without over-interpreting what they do. We believe that in this middle ground lies the phylogenetic emergence of human metacognition, and probably also the ontogenetic emergence of metacognition in human development. Therefore, we believe that it is critical that the animal-metacognition literature not take an all-or-none approach toward a focal construct that itself is not all-or-none.

Finally, we point out that our deferred-feedback approach toward animal metacognition might contribute to many other lines of comparative inquiry. For a century, researchers essentially always gave animals immediate feedback. This methodological approach produced beautiful research, though it also reflected behaviorists’ associative model of animal mind. It is likely that this research focused on animals’ implicit, procedural learning system, within which the association of stimulus and response is accomplished through the catalysis of nearly simultaneous reinforcement signals.

However, recent work in cognitive neuroscience has shown the potential importance of another, dissociable learning system. This explicit system is far less dependent on reinforcement, in some cases operating under conditions of absent or delayed feedback, and relying on self-instruction and self-guided hypothesis testing. The explicit system is thought to use analytic, rule-based processes, and to depend on working memory and executive attention (Ashby et al., 2002; Ashby et al., 1999; Maddox et al., 2003; Waldron & Ashby, 2001; Maddox et al., 2004; Ashby & Maddox, 2005). It is a remarkable fact that little is known about the presence or the robustness of this explicit learning system in the nonhuman primates (see Couchman et al., in press). The deferred-feedback approach, which demands the construction of decisional frameworks for tasks with substantial independence from reinforcement signals, is one of the paradigms that could fill this empirical gap. Explicit learning by humans is a core element of their thought and reasoning that is closely linked to their explicit consciousness and declarative cognition. Thus, research using the deferred-feedback approach could evaluate the possibility of an important continuity across the primate lineage, one that would bear on issues of declarative cognition and consciousness in primates.


The preparation of this article was supported by Grant BCS-0634662 from the National Science Foundation and by Grant HD-38051 from the National Institute of Child Health and Human Development.


1. A reviewer suggested that one might explore alternative indirect reinforcement schedules. In particular, he or she made the intriguing suggestion that one might make the temporally displaced reinforcement regimen even more strongly non-associative by making the deferred reinforcement completely random and completely non-contingent on the animal’s behavior in the previous block. We want to acknowledge our interest in alternative approaches to the problem of deferred/temporally displaced reinforcement, and to endorse the value to the field of further explorations in this area. In our opinion, though, this non-contingent regimen might be too extreme for the present paradigm. Both a metacognitive system and an associative system could be misled and disrupted by completely random feedback, and so if this occurred it would not help one judge which processing system was governing performance.

Publisher's Disclaimer: The following manuscript is the final accepted manuscript. It has not been subjected to the final copyediting, fact-checking, and proofreading required for formal publication. It is not the definitive, publisher-authenticated version. The American Psychological Association and its Council of Editors disclaim any responsibility or liabilities for errors or omissions of this manuscript version, any version derived from this manuscript by NIH, or other third parties. The published version is available at

Contributor Information

Justin J. Couchman, University at Buffalo, The State University of New York.

Mariana V. C. Coutinho, University at Buffalo, The State University of New York.

Michael J. Beran, Language Research Center, Georgia State University.

J. David Smith, University at Buffalo, The State University of New York.


  • Angell F. On judgments of “like” in discrimination experiments. American Journal of Psychology. 1907;18:253.
  • Ashby FG, Maddox WT. Human category learning. Annual Review of Psychology. 2005;56:149–178. [PubMed]
  • Ashby FG, Maddox WT, Bohil CJ. Observational versus feedback training in rule-based and information-integration category learning. Memory & Cognition. 2002;30:666–677. [PubMed]
  • Ashby FG, Queller S, Berretty P. On the dominance of unidimensional rules in unsupervised learning. Perception and Psychophysics. 1999;61:1178–1199. [PubMed]
  • Au WW, Moore PW. Critical ratio and critical bandwidth for the Atlantic bottlenose dolphin. Journal of the Acoustical Society of America. 1990;88:1635–1638. [PubMed]
  • Benjamin AS, Bjork RA, Schwartz BL. The mismeasure of memory: When retrieval fluency is misleading as a metacognitive index. Journal of Experimental Psychology: General. 1998;127:55–68. [PubMed]
  • Beran MJ, Smith JD, Coutinho MVC, Couchman JJ. The psychological organization of “uncertainty” responses and “middle” responses: A dissociation in capuchin monkeys (Cebus apella). Journal of Experimental Psychology: Animal Behavior Processes. 2009;35:371–81. [PMC free article] [PubMed]
  • Beran MJ, Smith JD, Redford JS, Washburn DA. Rhesus macaques (Macaca mulatta) monitor uncertainty during numerosity judgments. Journal of Experimental Psychology: Animal Behavior Processes. 2006;32:111–119. [PubMed]
  • Boneau CA, Cole JL. Decision theory, the pigeon, and the psychophysical function. Psychological Review. 1967;74:123–135. [PubMed]
  • Boring EG. The control of attitude in psychophysical experiments. Psychological Review. 1920;27:440–452.
  • Brown AS. A review of the tip-of-the-tongue experience. Psychological Bulletin. 1991;109:204–223. [PubMed]
  • Brown AL, Bransford JD, Ferrara RA, Campione JC. Learning, remembering, and understanding. In: Flavell JH, Markman EM, editors. Handbook of child psychology. Vol. 3. Wiley; New York: 1983. pp. 77–164.
  • Brown W. University of California Publications in Psychology. Vol. 1. The University Press; Berkeley, CA: 1910. The judgment of difference; pp. 1–71.
  • Call J, Carpenter M. Do apes and children know what they have seen? Animal Cognition. 2001;4:207–220.
  • Carruthers P. Meta-cognition in animals: A skeptical look. Mind and Language. 2008;23:58–89.
  • Carruthers P. How we know our own minds: The relationship between mindreading and metacognition. Behavioral and Brain Sciences. 2009;32 [PubMed]
  • Commons ML, Nevin JA, Davison MC, editors. Signal detection: mechanisms, models, and applications. Erlbaum; Hillsdale, NJ: 1991.
  • Corso JF. A theoretico-historical review of the threshold concept. Psychological Bulletin. 1963;60:356–370. [PubMed]
  • Couchman JJ, Coutinho MVC, Beran MJ, Smith JD. Metacognition is Prior. Behavioral and Brain Sciences. 2009;32
  • Couchman JJ, Coutinho MVC, Smith JD. Rules and resemblance: Their changing balance in the category learning of humans (Homo sapiens) and monkeys (Macaca mulatta). Journal of Experimental Psychology: Animal Behavior Processes. in press. [PMC free article] [PubMed]
  • Crystal JD, Foote AL. Metacognition in animals. Comparative Cognition and Behavior Reviews. 2009;4:1–16. [PMC free article] [PubMed]
  • Fechner GT. Elements of psychophysics. Vol. 1. Holt, Rinehart and Winston, Inc.; New York: 1860/1966.
  • Fernberger SW. The effect of the attitude of the subject upon the measure of sensitivity. American Journal of Psychology. 1914;25:538–543.
  • Fernberger SW. The use of equality judgments in psychophysical procedures. Psychological Review. 1930;37:107–112.
  • Flavell JH. Metacognition and cognitive monitoring: A new area of cognitive-developmental inquiry. American Psychologist. 1979;34:906–911.
  • Foote A, Crystal J. Metacognition in the rat. Current Biology. 2007;17:551–555. [PMC free article] [PubMed]
  • Gallup GG. Self-awareness and the emergence of mind in primates. American Journal of Primatology. 1982;2:237–248.
  • George SS. Attitude in relation to the psychophysical judgment. American Journal of Psychology. 1917;28:1–38.
  • Griffin DR. Significant uncertainty is common in nature. The Behavioral and Brain Sciences. 2003;26:346.
  • Hampton RR. Rhesus monkeys know when they remember. Proceedings of the National Academy of Sciences. 2001;98(9):5359–5362. [PMC free article] [PubMed]
  • Hampton RR. Multiple demonstrations of metacognition in nonhumans: Converging evidence or multiple mechanisms. Comparative Cognition and Behavior Reviews. 2009;4:17–28. [PMC free article] [PubMed]
  • Hart JT. Memory and the feeling-of-knowing experiments. Journal of Educational Psychology. 1965;57:347–349. [PubMed]
  • Inman A, Shettleworth SJ. Detecting metamemory in nonverbal subjects: A test with pigeons. Journal of Experimental Psychology: Animal Behavior Processes. 1999;25:389–395.
  • Koriat A. How do we know that we know? The accessibility model of the feeling of knowing. Psychological Review. 1993;100:609–39. [PubMed]
  • Koriat A. Metacognition and consciousness. In: Zelazo PD, Moscovitch M, Thompson E, editors. The Cambridge handbook of consciousness. Cambridge University Press; Cambridge, UK: 2007. pp. 289–325.
  • Koriat A, Ma’ayan H, Nussinson R. The intricate relationships between monitoring and control in metacognition: Lessons for the cause-and-effect relation between subjective experience and behavior. Journal of Experimental Psychology: General. 2006;135:36–69. [PubMed]
  • Kornell N. Metacognition in humans and animals. Current Directions in Psychological Science. 2009;18:11–15.
  • Kornell N, Son L, Terrace H. Transfer of metacognitive skills and hint seeking in monkeys. Psychological Science. 2007;18:64–71. [PubMed]
  • MacMillan NA, Creelman CD. Detection theory: a user’s guide. Cambridge University Press; Cambridge, UK: 1991.
  • Maddox WT, Ashby FG, Bohil CJ. Delayed feedback effects on rule-based and information-integration category learning. Journal of Experimental Psychology: Learning, Memory, and Cognition. 2003;29:650–662. [PubMed]
  • Maddox WT, Ashby FG, Ing AD, Pickering AD. Disrupting feedback processing interferes with rule-based but not information-integration category learning. Memory & Cognition. 2004;32:582–591. [PubMed]
  • Metcalfe J. Metamemory: Theory and data. In: Tulving E, Craik FIM, editors. The Oxford handbook of memory. Oxford University Press; New York: 2000. pp. 197–211.
  • Metcalfe J, Kober H. Self-reflective consciousness and the projectable self. In: Terrace HS, Metcalfe J, editors. The missing link in cognition: Origins of self-reflective consciousness. Oxford University Press; New York: 2005. pp. 57–83.
  • Metcalfe J, Shimamura A. Metacognition: Knowing about knowing. Bradford Books; Cambridge, MA: 1994.
  • Miller JT, Saunders SS, Bourland G. The role of stimulus disparity in concurrently available reinforcement schedules. Animal Learning & Behavior. 1980;8:635–641.
  • Morgan CL. An introduction to comparative psychology. Walter Scott; London: 1906.
  • Nelson TO, editor. Metacognition: Core readings. Allyn and Bacon; Toronto: 1992.
  • Nelson TO. Consciousness and metacognition. American Psychologist. 1996;51:102–116.
  • Nelson TO, Narens L. Metamemory: A theoretical framework and new findings. The Psychology of Learning and Motivation. 1990;26:125–41.
  • Proust J. Does metacognition necessarily involve metarepresentation? The Behavioral and Brain Sciences. 2003;26:352.
  • Proust J. Metacognition and metarepresentation. Synthese. 2007;159:271–295.
  • Proust J. Overlooking metacognitive experience. Behavioral and Brain Sciences. 2009;32
  • Rumbaugh DM, Richardson WK, Washburn DA, Savage-Rumbaugh ES, Hopkins WD. Rhesus monkeys (Macaca mulatta), video tasks, and implications for stimulus-response spatial contiguity. Journal of Comparative Psychology. 1989;103:32–38. [PubMed]
  • Schusterman RJ, Barrett B. Detection of underwater signals by a California sea lion and a bottlenose porpoise: variation in the payoff matrix. The Journal of the Acoustical Society of America. 1975;57(6):1526–32. [PubMed]
  • Schwartz BL. Sources of information in metamemory: Judgments of learning and feelings of knowing. Psychonomic Bulletin and Review. 1994;1:357–375. [PubMed]
  • Serra MJ, Dunlosky J. Does retrieval fluency contribute to the underconfidence-with-practice effect? Journal of Experimental Psychology: Learning, Memory, and Cognition. 2005;31:1258–1266. [PubMed]
  • Shields WE, Smith JD, Guttmannova K, Washburn DA. Confidence judgments by humans and rhesus monkeys. Journal of General Psychology. 2005;132:165–186. [PMC free article] [PubMed]
  • Shields WE, Smith JD, Washburn DA. Uncertain responses by humans and rhesus monkeys (Macaca mulatta) in a psychophysical same-different task. Journal of Experimental Psychology: General. 1997;126:147–164. [PubMed]
  • Shiffrin RM, Schneider W. Controlled and automatic human information processing: II. Perceptual learning, automatic attending, and a general theory. Psychological Review. 1977;84:127–190.
  • Smith JD, Beran MJ, Redford JS, Washburn DA. Dissociating uncertainty states and reinforcement signals in the comparative study of metacognition. Journal of Experimental Psychology: General. 2006;135:282–297. [PubMed]
  • Smith JD, Beran MJ, Couchman JJ, Coutinho MVC. The Comparative Study of Metacognition: Sharper Paradigms, Safer Inferences. Psychonomic Bulletin and Review. 2008;15:679–691. [PMC free article] [PubMed]
  • Smith JD, Beran MJ, Couchman JJ, Coutinho MVC, Boomer JB. Animal metacognition: Problems and prospects. Comparative Cognition and Behavior Reviews. 2009;4:40–53.
  • Smith JD, Redford JS, Haas SM, Coutinho MVC, Couchman JJ. The comparative psychology of same-different judgments by humans (Homo sapiens) and monkeys (Macaca mulatta). Journal of Experimental Psychology: Animal Behavior Processes. 2008b;34:361–74. [PubMed]
  • Smith JD, Redford JS, Beran MJ, Washburn DA. Rhesus monkeys (Macaca mulatta) adaptively monitor uncertainty while multi-tasking. 2008. Manuscript submitted for publication. [PMC free article] [PubMed]
  • Smith JD, Shields WE, Allendoerfer KR, Washburn DA. Memory monitoring by animals and humans. Journal of Experimental Psychology: General. 1998;127:227–250. [PubMed]
  • Smith JD, Shields WE, Schull J, Washburn DA. The uncertain response in humans and animals. Cognition. 1997;62:75–97. [PubMed]
  • Smith JD, Shields WE, Washburn DA. The comparative psychology of uncertainty monitoring and metacognition. The Behavioral and Brain Sciences. 2003;26:317–373. [PubMed]
  • Smith JD, Washburn DA. Uncertainty monitoring and metacognition by animals. Current Directions in Psychological Science. 2005;14:19–24.
  • Son LK, Kornell N. Metaconfidence judgments in rhesus macaques: explicit vs. implicit mechanisms. In: Terrace HS, Metcalfe J, editors. The missing link in cognition: Origins of self-reflective consciousness. Oxford University Press; New York: 2005. pp. 296–320.
  • Staddon JER, Jozefowiez J, Cerutti D. Metacognition: A problem not a process. PsyCrit. 2007:1–5.
  • Staddon JER, Jozefowiez J, Cerutti D. Metacognition in animals: How do we know that they know? Comparative Cognition and Behavior Reviews. 2009;4:29–39.
  • Suda-King C. Do orangutans (Pongo pygmaeus) know when they do not remember? Animal Cognition. 2008;11:21–42. [PubMed]
  • Sutton JE, Shettleworth SJ. Memory without awareness: Pigeons do not show metamemory in delayed matching to sample. Journal of Experimental Psychology: Animal Behavior Processes. 2008;34(2):266–282. [PubMed]
  • Terman M, Terman J. Concurrent variation of response bias and sensitivity in an operant-psychophysical test. Perception and Psychophysics. 1972;11:428–432.
  • Terrace HS, Metcalfe J, editors. The missing link in cognition: Origins of self-reflective consciousness. Oxford University Press; New York: 2005.
  • Thompson RK, Herman LM. Underwater frequency discrimination in the bottlenosed dolphin (1-140 kHz) and the human (1-8 kHz). Journal of the Acoustical Society of America. 1975;57:943–948. [PubMed]
  • Thomson GH. A new point of view in the interpretation of threshold measurements in psychophysics. Psychological Review. 1920;27:300–307.
  • Waldron EM, Ashby FG. The effects of concurrent task interference on category learning: Evidence for multiple category learning systems. Psychonomic Bulletin & Review. 2001;8:168–176. [PubMed]
  • Washburn DA, Rumbaugh DM. Testing primates with joystick-based automated apparatus: Lessons from the Language Research Center’s Computerized Test System. Behavior Research Methods, Instruments, and Computers. 1992;24:157–164. [PubMed]
  • Washburn DA, Smith JD, Shields WE. Rhesus Monkeys (Macaca mulatta) immediately generalize the uncertain response. Journal of Experimental Psychology: Animal Behavior Processes. 2006;32:85–89. [PubMed]
  • Watson CS, Kellogg SC, Kawanishi DT, Lucas PA. The uncertain response in detection-oriented psychophysics. Journal of Experimental Psychology. 1973;99:180–185.
  • Woodworth RS. Experimental psychology. Holt; New York: 1938.
  • Yunker MP, Herman LM. Discrimination of auditory temporal differences by the bottlenose dolphin and by the human. Journal of the Acoustical Society of America. 1974;56:1870–1875. [PubMed]


