Here’s an interesting problem from a homework assignment in a class on stochastic processes: A woman standing at an intersection wants to cross the street. She will only cross if she feels that she can get all the way to the other side before a car arrives. Cars arrive randomly at a particular rate, but she can see them far enough off to determine if she has time to cross. On average, how long will the woman need to wait before she can cross the street?1Ross, Sheldon M., Introduction to Probability Models, Academic Press, 2010.
While the original problem provides more details (e.g. an exact description of the rate at which cars arrive, and a parameter that describes how cautious the woman is as a function of how long she requires the gap between cars to be), it is possible to learn something about mathematical thinking without going into those details.
The Hard Way
The basic goal of the problem is to find an average, or expected, value for the amount of time that the woman must wait. One approach would be to attempt to calculate this average directly. The general idea when calculating an average is to multiply each possible outcome by how likely that outcome is, and add up the results.2This may look different from what you were taught in school, which may have been to add up all of the outcomes, then divide by the total number of outcomes. However, it turns out that this is the same thing. Suppose you want to find the average of the five numbers 4, 4, 5, 7, and 10. If you add them all up, you get 30. Dividing by 5 (the number of outcomes), you get 6. So the average is 6. On the other hand, the probability of getting a 4 is 2 out of 5 (0.4) (there are a total of five numbers, two of which are fours), and the probabilities of each of the other numbers is 1 out of 5 (0.2). We then calculate the average as \((0.4)4+(0.2)(5+7+10) = 6\). We get the same result in either case.
For instance, if we know that a bag of M&M’s contains 24 candies with probability 10%, 25 candies with probability 85%, and 26 candies with probability 5%, then the average number of candies per bag is given by
\(24(0.10)+25(0.85)+26(0.05) = 24.95.\)
This tells us that the average bag of M&M’s contains about 24.95 candies (which we would normally round up).
So if we want to know how long the woman has to wait before she can cross the street, we need to figure out what the possible outcomes are (i.e. what are the possible amounts of time that she could wait), and how likely each of those outcomes is. Once we figure those out, we just have to do the multiplication and add everything up.
What are the possible outcomes?
In general, the woman could wait for any period of time. If no cars are coming, then she might be able to cross immediately. If a car is coming, then she is going to have to wait for it to go by, then either cross or wait some more. Cars arrive at random, so she could wait for one second, one minute, one hour, an infinite amount of time, or anything in between.
On the other hand, the situation can be broken down a little bit. Each time a car goes by, the woman is either going to cross the road as soon as it passes, or wait for the next car. That means that we only need to determine how many cars pass before she can cross. She can cross immediately, or after the first car goes by, or after the second car goes by, and so on.
Basically, the possible outcomes are the number of cars that go by before she crosses.
How likely is each outcome?
The problem actually gives the rate at which cars go by, and this rate is related to the probability of each outcome. Again, it is helpful to break things down a bit.
When the woman arrives at the intersection, there is a certain probability that she will be able to cross the street immediately. This probability is related to the rate at which the cars go by, and is given in the problem. The exact details are not important, so I will spare you the notation.
If she can’t cross immediately, then she is going to have to wait for at least one car to go by, then decide whether or not to cross. After the first car goes by, there is a certain probability that she will be able to cross, and a certain probability that she will have to wait. It turns out that these probabilities are the same as when she first arrived, times an extra factor to take into account the fact that she has already waited for the first car to pass.
As we start figuring out the probabilities of waiting for more and more cars, we should discover that there is a pattern. Again, the details aren’t really important—the important bit is that the probability of needing to wait for a certain number of cars to pass goes down as the number of cars increases. That is, she is less likely to have to wait for 100 cars than just 2.
We will take advantage of this pattern in order to determine the probability of waiting for a certain number of cars to go by before the woman can cross.
Putting it together.
The general idea is that we are going to have to multiply some number of cars by the probability that we have to wait for that number of cars, then add up all of these products. As the woman could, in theory, wait for an infinite number of cars to pass, we have to add up an infinite number of things.
Moreover, while there is a helpful pattern for determining the probabilities, this pattern depends on a random variable (the length of time between cars going by). In order to come to terms with these probabilities, we are going to have to integrate.
There are mathematical techniques for getting a handle on infinite sums (they aren’t actually as scary as they sound), and integration isn’t too terribly difficult once you learn about it, but these techniques are relatively advanced. Most students won’t see them until college, if ever. Calculus will get us the correct solution, but there has to be a better way!
The Clever Way
The hard way of solving the problem takes a naive approach, and attacks the problem with advanced mathematical techniques. There is another approach which requires that the problem solver be clever.
The basic logic follows from the question “If I toss a fair coin and it comes up heads, what is the probability that the next toss of the coin will be heads?” If you have taken any probability at all, you probably know that the result of the second toss doesn’t depend on the result of the first toss. They are independent events. That means that the second toss has an equal probability of being heads or tails, no matter what the first toss was.
Another question that we could ask is “If the first toss is tails, how many times will I have to toss the coin until it comes up heads?” It turns out that the number of tosses before I get a head does not depend upon the first toss at all. If I start tossing a fair coin right now, I expect to toss it an average of two times before I get a head. If I toss the coin and get a tail on the first toss, I expect to toss the coin an additional two times before getting a head. In fact, no matter how many times I have tossed the coin, and no matter what the previous tosses were, I can always expect to toss the coin two more times to get a head.
This actually relates to the woman crossing the street. When she first arrives at the intersection, she will expect to wait for a certain amount of time. This waiting time is the sum of two different possible waiting times: if she cross immediately, she doesn’t wait any time at all; otherwise, she waits for the first car to pass, then expects to wait for exactly the same amount of time that she would have expected to wait in the first place. Figure out the probability of either waiting or not waiting, and it isn’t too hard to find the expected waiting time.
The English here is a little tricky, so let me use a little bit of notation, which may clear things up a bit. Let’s call the amount of time that the woman expects to wait \(E[T]\).3The notation \(E[X]\) is a pretty common way of writing “the expected value of X,” which means more or less the same thing as the average of X. The notation \(Pr\{\hbox{an event}\}\) is a way of writing “the probability that an event will occur.” So \(Pr\{\hbox{no wait}\}\) is “the probability of there being no wait,” which, in the context of the woman waiting to cross the road, is the probability that she will not have to wait for any cars to pass.
\(E[T] = (0\hbox{ min})Pr\{\hbox{no wait}\}+([\hbox{wait until first car goes by}]+E[T])Pr\{\hbox{wait}\}\).
The problem gives numbers for the amount of time that the woman has to wait for the first car to go by, and enough information to figure out the probability of the woman not having to wait. Given those numbers, the problem reduces to something that can be done after high school algebra.
Which Way?
There are at least two ways of solving this problem, which I have labeled “the hard way,” and “the clever way.” Both methods ultimately render exactly the same result. However, when I typed up my results, the hard way required two pages of writing, and some dense mathematical notation. The clever way took only one paragraph and some simple algebra.
This dichotomy of methodologies is pretty common, both in mathematics, and in the wider world. There is often a straightforward-but-complicated way of solving a problem, and a clever-but-simple way of solving the same problem. One can jury-rig a quick fix which gets the job done, but could break; or one can engineer a lasting solution.
Finding simple, elegant, clever solutions is at the heart of mathematics. Every mathematician that is worth her salt prefers to use as little notation as possible, and to find solutions that depend on a clever bit of logic rather than a brute-force attack on the problem. Why do things the hard way when they can be done the clever way?