The impact of incentives, and the value of gentle rule enforcement
People are not always rational, but they can learn to choose the rational option when that choice minimizes the probability of regret. This observation clarifies the value of gentle rule enforcement.
Recent research demonstrates that human reactions to incentives are highly sensitive to the probability of regret. Quick learning toward rational behaviour was found only when the option that maximizes expected return also minimizes the probability of regret. This observation implies that a gentle rule enforcement system that ensures a high probability of enforcement is often more effective than more aggressive rule enforcement systems.
Basic decision-making research documents many violations of rational economic theory (see Ariely, 2009; Kahneman, 2011). This research also highlights one condition that facilitates quick learning toward rational choices (Erev & Haruvy, 2016): fast learning toward maximization (choosing the option that is best for the agent) is observed when the option that maximizes returns also minimizes the probability of regret. The current paper clarifies this observation and highlights some of its implications for rule enforcement.
To clarify the effect of experience on choice behaviour we focus on experiments that used the simple clicking paradigm described in Figure 1 (see review in Erev & Haruvy, 2016). In each trial of the experiments described here the participants are asked to select one of two unmarked keys, and then receive feedback consisting of their obtained payoff (the payoff from the selected key), and the forgone payoff (the payoff that the participant could have received had he selected the other key).
Instructions screen shown to participants (Fig. 1): «The current experiment includes many trials. Your task, in each trial, is to click on one of the two keys presented on the screen. Each click will be followed by the presentation of the keys’ payoffs. Your payoff for the trial is the payoff of the selected key.»

Fig. 1. The instructions screen in experimental studies that use the basic version of the «clicking paradigm». In this version the participants do not receive a description of the payoff distributions. The feedback after each choice is a draw from each of the two payoff distributions, one for each key.
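The trial structure of this basic version can be sketched in Python. This is a minimal illustration, not the software used in the studies; the names `Trial`, `run_clicking_paradigm` and the random placeholder chooser are hypothetical, and each key's payoff distribution is assumed to be a function that returns one draw.

```python
import random
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Hypothetical representation: a payoff distribution is a function returning one draw.
Draw = Callable[[random.Random], float]


@dataclass
class Trial:
    choice: int                   # index of the clicked key (0 or 1)
    payoffs: Tuple[float, float]  # one draw per key; both are shown as feedback

    @property
    def obtained(self) -> float:
        """Payoff of the selected key, i.e., the participant's payoff for the trial."""
        return self.payoffs[self.choice]

    @property
    def forgone(self) -> float:
        """Payoff the participant would have received had the other key been selected."""
        return self.payoffs[1 - self.choice]


def run_clicking_paradigm(keys: Tuple[Draw, Draw],
                          choose: Callable[[List[Trial], random.Random], int],
                          n_trials: int,
                          rng: random.Random) -> List[Trial]:
    history: List[Trial] = []
    for _ in range(n_trials):
        choice = choose(history, rng)           # the choice may depend on all earlier feedback
        payoffs = (keys[0](rng), keys[1](rng))  # full feedback: one draw from each key
        history.append(Trial(choice, payoffs))
    return history


# Usage: key 0 is a status quo (0 with certainty), key 1 an illustrative gamble
# (-10 with p = 0.1; +1 otherwise); choices are random as a placeholder strategy.
rng = random.Random(0)
status_quo = lambda r: 0.0
gamble = lambda r: -10.0 if r.random() < 0.1 else 1.0
random_chooser = lambda past, r: r.randrange(2)
history = run_clicking_paradigm((status_quo, gamble), random_chooser, 100, rng)
```

Passing the choice rule in as a function keeps the environment separate from the learning model, so different hypothetical decision rules can be run against the same payoff distributions.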
Figure 2 summarizes the results of two studies that used the clicking paradigm to examine situations in which the prospect that maximizes expected return leads to the worst outcome in most trials (that is, it does not minimize the probability of regret). Both studies focus on the choice between a status quo option (0 with certainty) and an action that can lead to positive or negative outcomes. In Problem 1, the action yielded the gamble (−10 with p = 0.1; +1 otherwise); this choice has a negative expected return (EV = −0.1), but it yields the best payoff in 90% of the trials. In Problem 2, the action (+10 with p = 0.1; −1 otherwise) has a positive expected return (EV = +0.1), but it yields the worst payoff in 90% of the trials. The participants received a show-up fee of 25 Israeli Shekels (1 Shekel = $0.25) plus the payoff (in Shekels) from one randomly selected trial.
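The arithmetic behind Problems 1 and 2 can be checked exactly with Python's `fractions` module. In this sketch each prospect is encoded as a list of (payoff, probability) pairs; the encoding and the function names are illustrative assumptions.

```python
from fractions import Fraction

# Prospects as (payoff, probability) pairs; this encoding is an illustrative assumption.
STATUS_QUO = [(0, Fraction(1))]
PROBLEM_1_ACTION = [(-10, Fraction(1, 10)), (1, Fraction(9, 10))]
PROBLEM_2_ACTION = [(10, Fraction(1, 10)), (-1, Fraction(9, 10))]


def expected_value(prospect):
    """Probability-weighted mean payoff."""
    return sum(x * p for x, p in prospect)


def prob_beats_status_quo(prospect):
    """Probability that one draw from the action pays more than the status quo's 0."""
    return sum(p for x, p in prospect if x > 0)


for name, action in [("Problem 1", PROBLEM_1_ACTION), ("Problem 2", PROBLEM_2_ACTION)]:
    print(name, "EV =", expected_value(action),
          "P(action better) =", prob_beats_status_quo(action))
```

Because the status quo always pays 0, the action in Problem 1 gives the better payoff in 9 of 10 trials despite its negative EV (−1/10), and the reverse holds in Problem 2 (EV = +1/10, better in only 1 of 10 trials).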
The two curves show the aggregated choice rate of the risky action in five blocks of 20 trials (Nevo & Erev, 2012; and Teodorescu et al., 2013; see Figure 2). The results reveal that the typical participant favoured the risky prospect when it impaired expected return (an R-rate, or risky-choice rate, of 60% in Problem 1, where the EV of the risky prospect is −0.1), but not when it maximized expected return (an R-rate of 27% in Problem 2, where the EV of the risky prospect is +0.1). Thus, the typical results in both problems reflect deviation from maximization. The participants appear to be risk seekers in Problem 1 and risk averse in Problem 2. Another way of interpreting the data is that in both cases they reflect underweighting of rare events (see Barron & Erev, 2003). That is, the typical participant behaves «as if» he does not pay enough attention to the rare (10%) outcomes.
Fig. 2: Underweighting of rare events. The action rate (proportion of choices of the alternative to the status quo) in the study of Problems 1 and 2 (described in the Figure) in 5 blocks of twenty trials. The curves present the means over the 128 subjects run (in Nevo & Erev, 2012; and Teodorescu et al., 2013) using the clicking paradigm (see Figure 1).
Another set of robust deviations from maximization is illustrated by the four experiments summarized in Figure 3. The experiments were run using the clicking paradigm and the procedure described above. The results reveal quick learning toward maximization in Problems 3 and 4, and almost no learning in Problems 5 and 6. Notice that the observed deviations from maximization cannot be explained as indications of risk aversion or risk seeking: The difference between Problems 3 and 5 appears to suggest risk aversion (lower maximization rate when the high EV prospect has higher variance), but the difference between Problems 4 and 6 appears to reflect risk seeking.
Fig. 3: The payoff variability effect. The action rate (proportion of choices of the alternative to the status quo) in five blocks of 20 trials in the study of Problems 3, 4, 5 and 6 (described in the Figure). The curves present the means over the 35 subjects run (in Erev & Haruvy, 2016; and Di Guida et al., 2012) using the clicking paradigm.
Erev and Barron (2005) show that the deviations from maximization summarized above can be explained if the agents select the option that has led to the best average payoff in the past, and the average payoff is computed based on a small sample of past experiences. For example, if the subjects rely on five past experiences while facing Problems 1 and 2, the probability that their sample includes the rare (10%) event is only $1 - (0.9)^{5} \approx 0.41$. Reliance on small samples implies fast learning to maximize when the best option minimizes the probability of regret, because in that case the best option also yields the better payoff in most small samples.
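A simplified version of this small-sample account can be simulated. The sketch below is not Erev and Barron's (2005) model; it is a minimal illustration under hypothetical assumptions: full feedback, a status quo that always pays 0, and agents that in every trial draw k = 5 past action payoffs with replacement from their own history, choose the action if the sample mean exceeds 0, and choose at random in the first trial. Problem 1's action is coded as −10 with p = 0.1 and +1 otherwise, Problem 2's as +10 with p = 0.1 and −1 otherwise.

```python
import random


def prob_sample_includes_rare(p_rare: float, k: int) -> float:
    """Probability that k independent draws include at least one rare outcome."""
    return 1 - (1 - p_rare) ** k


def problem_1_action(rng: random.Random) -> float:
    return -10.0 if rng.random() < 0.1 else 1.0


def problem_2_action(rng: random.Random) -> float:
    return 10.0 if rng.random() < 0.1 else -1.0


def small_sample_action_rate(draw_action, k=5, n_trials=100, n_agents=200, seed=1):
    """Share of action choices made by simulated small-sample agents.

    Hypothetical assumptions: full feedback, status quo pays 0, samples of k past
    action payoffs drawn with replacement, random choice before any experience.
    """
    rng = random.Random(seed)
    action_choices = 0
    for _ in range(n_agents):
        past = []  # action payoffs observed so far (seen whether or not it was chosen)
        for _ in range(n_trials):
            if past:
                sample = [rng.choice(past) for _ in range(k)]
                choose_action = sum(sample) / k > 0  # sample mean vs. the status quo's 0
            else:
                choose_action = rng.random() < 0.5
            action_choices += choose_action
            past.append(draw_action(rng))
    return action_choices / (n_agents * n_trials)


print(round(prob_sample_includes_rare(0.1, 5), 2))
print(small_sample_action_rate(problem_1_action), small_sample_action_rate(problem_2_action))
```

In Problem 1 a single −10 in a sample of five makes the mean negative, and in Problem 2 a single +10 makes it positive, so once enough history has accumulated the action is chosen in roughly $(0.9)^{5} \approx 0.59$ and $1 - (0.9)^{5} \approx 0.41$ of the trials, respectively. Under these assumptions the simulated agents favour the action in Problem 1 and the status quo in Problem 2, the same qualitative pattern as the data.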
Erev et al. (2010) evaluated the implications of the results summarized above in the context of rule enforcement. Their analysis starts with the observation that many rule enforcement problems have two extreme Nash equilibria. In one equilibrium, obeying the rule is the norm, and the rule enforcers can easily detect and punish deviations if they occur. Thus, no one is motivated to start violating the rule. In a second equilibrium, violation is the norm, and the enforcers are unable to cope with the frequent violations.
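The two-equilibrium structure can be illustrated with a toy model. Everything in it is a hypothetical assumption made for illustration, not part of Erev et al.'s (2010) analysis: enforcers can detect at most `capacity` violators per period, so the detection probability falls as violations spread; a violation yields `gain` and, if detected, costs `penalty`; obeying yields 0.

```python
def detection_prob(n_violators: int, capacity: int) -> float:
    """Chance that a given violator is detected when enforcers can handle `capacity` cases."""
    if n_violators <= 0:
        return 1.0
    return min(1.0, capacity / n_violators)


def violation_payoff(n_violators: int, capacity: int, gain: float, penalty: float) -> float:
    """Expected payoff of violating when n_violators agents (including oneself) violate."""
    return gain - detection_prob(n_violators, capacity) * penalty


def is_equilibrium(everyone_violates: bool, population: int, capacity: int,
                   gain: float, penalty: float) -> bool:
    """Check whether a single agent gains nothing by deviating from a uniform norm."""
    if everyone_violates:
        # Switching to obedience pays 0; staying pays the payoff with all agents violating.
        return violation_payoff(population, capacity, gain, penalty) >= 0
    # Under universal obedience, a deviator becomes the only violator.
    return violation_payoff(1, capacity, gain, penalty) <= 0


params = dict(population=100, capacity=5, gain=1.0, penalty=4.0)
print(is_equilibrium(False, **params), is_equilibrium(True, **params))
```

With these illustrative numbers both norms are self-sustaining: a lone violator is caught for sure (1 − 4 < 0), whereas when everyone violates each violator is caught with probability 5/100 (1 − 0.05 × 4 = 0.8 > 0).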
The results summarized above imply that, in order to reach the no-violation equilibrium, it is important that obeying the rule maximizes expected return and also minimizes the probability of regret. Erev et al. evaluated this assertion in a project that focused on reducing cheating in exams. Their study was run during the end-of-semester final exam period of undergraduate courses at the Technion. At the time, the instructions for exam proctors at the Technion included the following points:
(1) The student’s ID should be collected at the beginning of the exam.
(2) A map of students’ seating should be prepared.
Since the collection of the ID is the first step in the construction of the map, the common interpretation of these instructions was that the map should be prepared at the beginning of the exam. Early preparation of the map reflects an attempt to use aggressive punishments (the detection of correlation in the mistakes made by two neighbouring students can be used to justify severe punishments), but distracts the proctors and reduces the probability of gentle punishments (like moving a student that looks around suspiciously to the first row).
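The contrast between aggressive and gentle enforcement can be illustrated by combining expected return with the small-sample logic described earlier. The numbers are hypothetical: a student gains 1 from an undetected violation; the aggressive policy detects with probability 0.05 and imposes a loss of 40, while the gentle policy detects with probability 0.8 and imposes a loss of 2 (a detected violator is assumed to receive the loss instead of the gain). The function names are illustrative.

```python
from math import comb


def violation_ev(p_detect: float, penalty: float, gain: float = 1.0) -> float:
    """Expected return of violating (obeying pays 0); a detected violator gets -penalty."""
    return (1 - p_detect) * gain - p_detect * penalty


def prob_sample_favours_violation(p_detect: float, penalty: float,
                                  k: int = 5, gain: float = 1.0) -> float:
    """Probability that the mean of k past violation outcomes exceeds obeying's 0."""
    total = 0.0
    for detections in range(k + 1):
        sample_sum = (k - detections) * gain - detections * penalty
        if sample_sum > 0:
            # Binomial probability of exactly `detections` detections in k draws
            total += comb(k, detections) * p_detect**detections * (1 - p_detect)**(k - detections)
    return total


AGGRESSIVE = dict(p_detect=0.05, penalty=40.0)
GENTLE = dict(p_detect=0.8, penalty=2.0)

for name, policy in [("aggressive", AGGRESSIVE), ("gentle", GENTLE)]:
    print(name, round(violation_ev(**policy), 2),
          round(prob_sample_favours_violation(**policy), 3))
```

Under these assumptions violation has a negative expected return under both policies, yet five-draw samples favour violating about 77% of the time under the aggressive policy and less than 1% of the time under the gentle one. This is the sense in which gentle, high-probability enforcement also minimizes the probability of regret from obeying.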
The experiment compared two conditions. The experimental condition featured a minimal modification of the instructions to proctors that increased the proctors’ ability to follow a gentle rule enforcement policy (i.e., to promptly warn students whose gaze was wandering). The manipulation involved the second instruction to the proctors, which was rewritten as:
(2e) “A map of the students’ seating should be prepared 50 minutes after the beginning of the exam.”
Seven undergraduate courses were selected to participate in the study. In all courses the final exam was conducted in two rooms. One room was randomly assigned to the experimental condition, and the second was assigned to the control condition. The only difference between the two conditions involved the timing of the preparation of the map in the instructions to the proctors. In the control group the instructions stated:
(2c) “A map of the students’ seating should be prepared immediately after the beginning of the exam.”
Key findings
After finishing the exam, students were given a brief questionnaire asking them to “rate the extent to which students cheated in this exam relative to other exams.” The results reveal a large and consistent difference between the two conditions: the perceived cheating level was lower in the experimental condition in all seven comparisons. That is, the intervention that impaired the enforcers’ ability to use severe punishments also facilitated gentle punishments, which appear to reduce rule violations.
Relationship to Terror Networks and Organized Crime
The effort to use economic incentives to take down terrorist networks and organized crime is not always successful. The current research suggests that the effectiveness of economic incentives can be enhanced by selecting incentives with two attributes: they should ensure that the desired behavior maximizes expected return and also minimizes the probability of regret. This suggestion holds for both positive and negative incentives. For example, to reinforce efforts to join normative society, it is important to ensure that the probability of reinforcement is high. Similarly, to reduce the tendency to join illegal organizations, it is important to ensure that steps toward these organizations are rarely reinforcing.
Note: This article is partially based on the papers “Erev, I. and Roth, A.E., 2014. Maximization, learning, and economic behavior. Proceedings of the National Academy of Sciences, 111(Supplement 3), pp. 10818–10825.” and “Erev, I., Ingram, P., Raz, O. and Shany, D., 2010. Continuous punishment and the potential of gentle rule enforcement. Behavioural Processes, 84(1), pp. 366–371.”
Authors
Doron Cohen, Ido Erev (Technion, Israel)
References
Ariely, D., 2009. Predictably irrational (p. 71). New York: HarperCollins.
Barron, G. and Erev, I., 2003. Small feedback‐based decisions and their limited correspondence to description‐based decisions. Journal of Behavioral Decision Making, 16(3), pp. 215–233.
Di Guida, S., Marchiori, D. and Erev, I., 2012. Decisions among defaults and the effect of the option to do nothing. Economics Letters, 117(3), pp. 790–793.
Erev, I. and Barron, G., 2005. On adaptation, maximization, and reinforcement learning among cognitive strategies. Psychological Review, 112(4), p. 912.
Erev, I. and Haruvy, E., 2016. Learning and the economics of small decisions. The handbook of experimental economics, 2
Kahneman, D., 2011. Thinking, fast and slow. Macmillan.
Nevo, I. and Erev, I., 2012. On surprise, change, and the effect of recent outcomes. Frontiers in Psychology, 3, p. 24.
Teodorescu, K., Amir, M. and Erev, I., 2013. The experience–description gap and the role of the inter-decision interval. In N. Srinivasan and V.S.C. Pammi (Eds.), Progress in Brain Research (1st ed., pp. 99–115). Amsterdam: Elsevier.