*This is the fourth post in a series that attempts to tackle the issue that “my bus is never on time!”. I recommend you read the first, second, and third posts before this one, as there may be some references to the ideas discussed there.*

Last time we discovered just how much instability passengers can cause in the movement of buses at a single stop. This time, I figured we’d continue our one-stop adventure and dig a little deeper into this holding control strategy I keep talking about. While we do that, we might just learn something about what we’re trying to accomplish in the first place.

Up to now we’ve speculated a lot about bus movements, and some of you might be wary of how valid all this is without any actual data on bus behavior to back it up. That feeling is common in “theoretical modelling”, which is why, as we develop our ideas, we try to keep things as general as possible, so that when someone does go out and collect data, it slots neatly into the model. That’s not to say we want to avoid real data entirely, of course. Looking at how buses actually behave on a statistical level can tell us whether our thinking is on the right track, and may even give us ideas down the road.

All of that being said, sometimes it’s nice to work with something tangible. For that reason, we’re going to introduce some *made up* data to help us investigate the holding control strategy a bit better.

Suppose we decide to sit at a bus stop and count bus arrivals. Over the course of the day, 40 buses arrive at and depart from our stop, and as they do, we check how close each one was to the schedule. For simplicity, we only look at the minute hand of our watch and ignore the seconds (a bus that is 1 minute and 50 seconds late falls under the “1” category). For each bus, we draw a little box above a number line at the spot that corresponds to how early or late the bus was. By the end of the day, we have what is known as a *histogram* that looks something like this:

Because of this nice layout, we can start reporting some information, but *only about how the day went today*, not about how the day always goes. Doing that would involve statistical inference, requiring a decent amount of statistical rigor that we don’t need right now.
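If you wanted to tally a day of observations into a histogram yourself, here is a minimal Python sketch. It assumes each observation is recorded as a deviation from the schedule in seconds (positive for late, negative for early) and drops the seconds by truncating toward zero, so a bus 1 minute 50 seconds late lands in the “1” category. The `observations` list and all function names are hypothetical, not the post’s 40 buses.

```python
import math
from collections import Counter


def minute_category(deviation_seconds: int) -> int:
    """Drop the seconds, keeping whole minutes (truncating toward zero).

    Positive values are late and negative values are early, so a bus that is
    1 minute 50 seconds late (110 s) lands in the "1" category.
    """
    return math.trunc(deviation_seconds / 60)


def build_histogram(deviations_seconds):
    """Count how many buses fall into each whole-minute category."""
    return Counter(minute_category(d) for d in deviations_seconds)


def print_histogram(hist):
    """Print one '#' per bus next to each category, from most early to most late."""
    for minute in range(min(hist), max(hist) + 1):
        print(f"{minute:+3d} | {'#' * hist.get(minute, 0)}")


# Hypothetical observations in seconds (not the post's data).
observations = [110, 45, -30, -75, 480, 20, -130, 65]
hist = build_histogram(observations)
print_histogram(hist)
```

Truncating toward zero is one reading of “only looking at the minute hand”; rounding down instead would put a bus 30 seconds early in the “-1” category rather than “0”.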

There *is* one value (stay tuned for another!) that we are going to need to have any kind of useful discussion about this graph, and that is the average, or mean, arrival time. You may already be familiar with the idea, but one way to find the mean is to pretend that each of those blue blocks weighs the same, and then put your finger on the point at which the line below them balances. You intuitively know that the one block way out at “8 minutes late” is going to have more of an impact than one of the blocks in the “1 minute late” category, and might place your finger somewhere between the “0” and “1 minute late” categories. What you have found is the *centre of mass* of this data set, which is the same as the average. It turns out that the average is 0.875 minutes late, which we will round up to 1.
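The “balance point” idea translates directly into arithmetic: multiply each category by the number of blocks above it, add everything up, and divide by the number of buses. The sketch below uses a hypothetical histogram and hypothetical function names. It also checks that the line really does balance at the mean, because the distances on either side, weighted by the number of blocks, cancel out, and it shows how much a single far-away block pulls the balance point.

```python
def mean_from_histogram(hist):
    """Average deviation: every bus counts equally, so weight each category by its count."""
    total_buses = sum(hist.values())
    return sum(minute * count for minute, count in hist.items()) / total_buses


def net_turning_effect(hist, pivot):
    """Sum of (distance from pivot) x (blocks at that distance); zero means balanced."""
    return sum((minute - pivot) * count for minute, count in hist.items())


# Hypothetical histogram: minutes late (negative = early) -> number of buses.
hist = {-1: 2, 0: 3, 1: 2, 8: 1}

balance_point = mean_from_histogram(hist)
print(balance_point)                            # 1.0
print(net_turning_effect(hist, balance_point))  # 0.0

# Without the single "8 minutes late" block, the balance point moves to 0.
print(mean_from_histogram({-1: 2, 0: 3, 1: 2}))  # 0.0
```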

Now we get to see what our holding control strategy does for us. Let’s see what would have happened if we had added an extra minute to this schedule to match the average, and divide the buses into “early” (green) and “late” (red). Remember, buses that arrive early (or on time) will depart *on time* thanks to our strategy, while buses that arrive late will stay late. Note that we are completely ignoring our earlier discussion of the time a bus spends at a stop (breaking the problem into pieces). With that change, our histogram looks something like this:

So for this particular day, 17 buses would depart this stop late, and 23 buses would depart on time thanks to our strategy. Hooray! We have fixed more than half the buses!
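The holding rule can be written as a short sketch under the same simplification as the text (time spent at the stop is ignored). Adding slack shifts every deviation down by that amount; anything at or below zero leaves on time, and anything above zero leaves late. The histogram and `apply_holding` are hypothetical illustrations, not the post’s data.

```python
def apply_holding(hist, slack_minutes):
    """Return (buses departing on time, buses departing late) under holding control.

    Adding slack shifts each bus's deviation down by `slack_minutes`. Buses at
    or before the new scheduled time are held and leave on time; buses after it
    leave late, because holding cannot make a late bus earlier. Dwell time is
    ignored, as in the text.
    """
    on_time = 0
    late = 0
    for minute, count in hist.items():
        if minute - slack_minutes <= 0:
            on_time += count
        else:
            late += count
    return on_time, late


# Hypothetical histogram: minutes late (negative = early) -> number of buses.
hist = {-2: 1, -1: 2, 0: 3, 1: 2, 3: 1, 8: 1}

print(apply_holding(hist, 0))  # (6, 4): no slack
print(apply_holding(hist, 1))  # (8, 2): one extra minute of slack
```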

Or have we?

Now we have created another problem: *buses are waiting around*. Maybe you’ve experienced this: sitting on a bus that is idling at a stop, with nobody in sight. Maybe the driver is getting a coffee. If that scenario irritated you, you can bet it irritates other people too. So just how much waiting around *is* there? That question is fairly easy to answer: just count each early bus, weighted by *how* early it is (remember, buses now wait an extra minute because we added that minute to the schedule!). That gives 2 buses at 8 minutes, plus 2 buses at 6 minutes, plus 2 buses at 4 minutes, and so on, for a total of 58 minutes (we won’t count the buses in the “1 minute early” category as having to wait at all). That’s close to an hour of total waiting time. Is that worth it?
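The waiting-time count is a weighted sum: relative to the padded schedule, a bus that is *k* minutes early waits *k* minutes. Here is a minimal sketch, assuming a hypothetical histogram whose early blocks mirror the pattern described above (2 buses waiting 8 minutes, 2 waiting 6, 2 waiting 4). Unlike the post, this version does count buses that end up only 1 minute early; leave that term out if you want to follow the post’s convention.

```python
def total_holding_minutes(hist, slack_minutes):
    """Total minutes buses spend held at the stop.

    Relative to the padded schedule, a bus that is k minutes early waits k
    minutes, so each early category is weighted by how early it is. Late buses
    are not held at all.
    """
    total = 0
    for minute, count in hist.items():
        early_by = slack_minutes - minute  # minutes ahead of the padded schedule
        if early_by > 0:
            total += early_by * count
    return total


# Hypothetical histogram: minutes late (negative = early) -> number of buses.
hist = {-7: 2, -5: 2, -3: 2, 0: 4, 2: 3, 8: 1}

# 2 buses x 8 min + 2 x 6 + 2 x 4 + 4 x 1 = 40 minutes of holding.
print(total_holding_minutes(hist, 1))  # 40
```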

The upshot is that *it depends*. Early buses are generally considered bad, because they cause people to miss their intended bus and wait for the next one. If the next one comes in 5 minutes, that’s not such a big deal, but if buses come every 20 minutes, people quickly spend more time waiting for the next bus than they would spend waiting *on* the current one. For each stop, a kind of “cost-benefit analysis” must be done to decide whether it’s worth adding *any* slack time (any amount will cause all the early buses to wait), and if so, how much. If this cost-benefit analysis can be done properly for every stop, accounting for every trade-off, we can start to think that maybe we are “as close as possible”.
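One rough way to start that cost-benefit analysis is to tabulate both sides for a few slack values: how many buses still leave late, and how many minutes buses spend held. This sketch uses a hypothetical histogram and a hypothetical `trade_off` helper. It deliberately stops short of picking a winner, because that depends on how you weigh a late bus against a minute of idling, which the sketch does not model.

```python
def trade_off(hist, slack_minutes):
    """Return (buses departing late, total minutes buses are held) for one slack value."""
    late = 0
    held = 0
    for minute, count in hist.items():
        deviation = minute - slack_minutes  # relative to the padded schedule
        if deviation > 0:
            late += count  # late buses stay late
        else:
            held += -deviation * count  # early buses wait until the scheduled time
    return late, held


# Hypothetical histogram: minutes late (negative = early) -> number of buses.
hist = {-7: 2, -5: 2, -3: 2, 0: 4, 2: 3, 8: 1}

for slack in range(4):
    late, held = trade_off(hist, slack)
    print(f"slack {slack} min: {late} buses late, {held} min of holding")
```

For this made-up histogram, going from no slack to 2 minutes of slack cuts late departures from 4 to 1, at the cost of 20 extra minutes of holding. Whether that is a good deal is exactly the judgment the cost-benefit analysis has to make.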

Before we end this discussion, let’s take one more look at our fake-real data set. In Part II of this series, I talked about two ways to deal with the uncertainty in bus movement. We’ve discussed the “shifting” strategy a lot, but it would be nice to see what the *other* method (reducing the uncertainty overall) would do for us. For that, we need to understand the concept of spread, or *variance*.

Our average/centre of mass trick is useful, but it has a dangerous flaw. I can move blocks further and further away from that balancing point, and as long as the moves on both sides balance each other out, the balancing point will stay in the same place. That means a day where *every* bus was *exactly* on time (0 minutes late/early) could have the same average as a day where half the buses were 100 minutes late and half were 100 minutes early. Clearly the first situation is desirable, while the second is not.
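Variance puts a number on this spread: it is the average squared distance of the blocks from the balance point. Squaring stops distances on opposite sides from cancelling, and it makes far-away blocks count much more than nearby ones. Here is a minimal sketch of the two extreme cases just described, scaled down to 10 hypothetical buses each, cross-checked against the population variance from Python’s standard `statistics` module.

```python
import statistics


def mean(values):
    """Balance point: the sum of the deviations divided by the number of buses."""
    return sum(values) / len(values)


def variance(values):
    """Population variance: the average squared distance from the balance point."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / len(values)


# Two hypothetical days with 10 buses each.
all_on_time = [0] * 10              # every bus exactly on time
extremes = [-100] * 5 + [100] * 5   # half 100 min early, half 100 min late

print(mean(all_on_time), variance(all_on_time))  # 0.0 0.0
print(mean(extremes), variance(extremes))        # 0.0 10000.0

# Cross-check against the standard library's population variance.
assert variance(extremes) == statistics.pvariance(extremes)
```

The population variance fits here because we are only describing the buses we actually saw that day, not inferring anything about other days.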

What “reducing uncertainty” does is take those pesky “7 minutes early” and “8 minutes late” blocks and stack them up somewhere in the middle. This is super helpful in our case, because we calculated our total wait time by weighting each block by its value. If those two “7 minutes early” blocks are closer in, the time savings are substantial. This “reduction in uncertainty” has shifted our cost-benefit analysis: holding early buses now costs far less waiting time.
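To see the effect on waiting time, here is a sketch that takes a hypothetical histogram, moves the two “7 minutes early” blocks in to “3 minutes early” and the “8 minutes late” block in to “0”, and compares the results with 1 minute of slack. Those particular moves are made up, but they keep the balance point (the mean) exactly where it was, so only the spread changes. The function names are hypothetical.

```python
def mean_from_histogram(hist):
    """Balance point of the blocks."""
    n = sum(hist.values())
    return sum(minute * count for minute, count in hist.items()) / n


def variance_from_histogram(hist):
    """Average squared distance of the blocks from the balance point."""
    n = sum(hist.values())
    m = mean_from_histogram(hist)
    return sum((minute - m) ** 2 * count for minute, count in hist.items()) / n


def total_holding_minutes(hist, slack_minutes):
    """Minutes held: each bus waits as long as it is ahead of the padded schedule."""
    return sum(max(slack_minutes - minute, 0) * count for minute, count in hist.items())


# Hypothetical histograms: minutes late (negative = early) -> number of buses.
before = {-7: 2, -5: 2, -3: 2, 0: 4, 2: 3, 8: 1}
# Same 14 buses, with the two "7 minutes early" blocks moved to -3
# and the "8 minutes late" block moved to 0 (the balance point is unchanged).
after = {-5: 2, -3: 4, 0: 5, 2: 3}

for name, hist in (("before", before), ("after", after)):
    print(
        f"{name}: mean {mean_from_histogram(hist):.2f}, "
        f"variance {variance_from_histogram(hist):.2f}, "
        f"held {total_holding_minutes(hist, 1)} min"
    )
```

In this made-up example the variance drops from about 16 to about 5.7, holding time falls from 40 to 33 minutes, and one fewer bus departs late. The bus that moved from “8 minutes late” to “0” now waits 1 minute, which slightly offsets the savings from the two early buses.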

So what have we learned? We know now that our holding control strategy can do wonders for reducing early buses, but we found that we have to be careful about our decision to add in slack time, to avoid too much waiting around. There are plenty of traps, and we haven’t even added in our previous problem of passengers!

I hope that this post has convinced you of a few things. First, that transit planners are trying to balance a great many things at once, many of which are tied together in complex ways. Second, that understanding probability is important, not just for transit, but for anything out there in the real world. Lastly, I hope you can see how important it is to have useful data, and why it is absolutely crucial for making the best decisions.