# Survey Says?

I come from a physics background, an area of study that rewards good critical thinking, problem solving, and mathematical skills. While these skills are certainly transferable to transportation and transportation research (there is a ton of physics that has made its way into how we model transportation), there is one aspect of dealing with transportation that physics cannot help with in any way: people.

People are the real wildcards in most of the research that is done. There are a few strategies for dealing with ourselves as we interact with other people and transportation systems, each with its own set of problems. One way is to try to aggregate the behaviour of individuals into a predictable clump, which in some cases can provide useful big-picture ideas, but can fall victim to the problem of rhetorical annihilation. Another way is to accept that people are perfectly unpredictable, and build models founded on probability and statistics. This has its own set of problems, since people are most certainly not unpredictable. I would say that we as humans fall almost dead centre on the mathematically frustrating (but philosophically encouraging) spectrum between predictable and random, at least when it comes to how we decide to move about.

The third approach, which I plan to discuss in more detail, focuses on observation. The idea is to translate what people do (or say they’ll do) into mathematical “choice” models, and use those models to predict what people will do when situations change. There are two approaches to this, and I’ll focus on one approach in particular in this post.

The research that I’ve done so far (coming soon, hopefully) looks at translating results from studies on transit psychology into mathematical functions. This idea draws on the aggregate approach as well as the observational approach, as it requires generalizing people into a simple graph. Those generalizations are proposed based on observational studies, however. My preferred approach (at the moment) is the probability-based idea, which may give you the impression that I either have no idea what I’m doing, or I am using whatever tool in the toolbox will let me explore a certain problem. None of the tools are very good, but they’re the best we have at the moment.

There is a large amount of research (not just on transportation, either) concerned with trying to go out and observe the behaviour of people. There are two basic philosophies on how to go about doing this: revealed preference, where we watch what people do or buy and infer their preferences from the fact that they chose one option over another, and stated preference, where we directly ask people what they would do given a hypothetical scenario. Again, both of these approaches have their own drawbacks, but I’m going to focus on stated preference and explain a little bit about how it works, so that the title of this blog post can have some relevance.

### Stated Preference Surveys

Suppose we wanted to figure out an equation that described how people make decisions about travelling between Calgary and Edmonton, Alberta. The cities are located about 300 km apart, and are connected by a major four- to six-lane highway. There are also a large number of daily commuter flights between the airports, but the Edmonton airport is about 40 km from the downtown (a 30-minute drive) and the Calgary airport is about 20 km from downtown (a 20-minute drive). For simplicity, let’s say that driving and taking a flight are currently the only two ways that people can get between these two cities, but we’d like to see if it’s feasible to add a high-speed railway line between them.

The first step would be to consider what factors significantly influence people’s choice of transportation. For example, the price of orange juice is not likely to have a significant effect on whether someone chooses between flying and driving between Calgary and Edmonton, so we won’t include that in our equation. The price of gas, on the other hand, might be something that changes whether people drive or fly. For now, and to keep things simple, let’s list a few possible factors that might determine which mode of transportation (car, plane, or train) someone will use:

• The cost of gas spent to drive
• The cost of a plane ticket
• The cost of a train ticket
• The time taken to travel
• The ability to move around after you reach your destination city

You can’t possibly hope to list all the factors that influence everybody’s decision, and so you lump all the missing influences into a big statistical term we’ll call error. Without moving too much into the mathematics and statistics of how this type of model works, the idea is that you can determine the probability of someone taking a certain mode of transportation as some function of each of the factors we listed above. Usually, it’s a weighted sum. For example, the probability $P_d$ that someone will drive to Edmonton from Calgary might be represented as:

$P_d = p_1(\mbox{gas cost}) + p_2(\mbox{plane cost}) + p_3(\mbox{train cost}) + p_4(\mbox{travel time}) + p_5(\mbox{mobility})$

Here, each $p_i$ represents some sort of weighting factor for the variable beside it. The idea is that if you can develop a function like this with a reasonable amount of confidence, and you make the important assumption that this equation reflects the reality of how people behave, you can predict what happens, for example, if the cost of travel by car triples.
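To make the weighted sum concrete, here is a minimal sketch of the equation above. Every number in it is an assumption invented for illustration: the weights are not estimated from any survey, the mobility scale (0 = low, 1 = medium, 2 = high) is made up, and clipping the result to the range 0 to 1 is my own addition so that the output can be read as a probability.

```python
# Hypothetical weights p_1..p_5, invented for illustration only.
WEIGHTS = {
    "gas_cost": -0.004,    # per dollar of gas: driving gets less attractive
    "plane_cost": 0.001,   # per dollar of plane ticket: driving gets more attractive
    "train_cost": 0.001,   # per dollar of train ticket
    "travel_time": -0.05,  # per hour spent driving
    "mobility": 0.3,       # per mobility level (assumed scale: 0 low, 1 medium, 2 high)
}


def drive_probability(factors, weights=WEIGHTS):
    """Weighted sum of the factors, clipped to [0, 1] (the clipping is an assumption)."""
    score = sum(weights[name] * factors[name] for name in weights)
    return min(1.0, max(0.0, score))


# Hypothetical present-day values for a Calgary to Edmonton trip by car.
today = {"gas_cost": 40, "plane_cost": 150, "train_cost": 80,
         "travel_time": 3, "mobility": 2}

# Same trip, but the cost of travelling by car triples.
tripled = dict(today, gas_cost=3 * today["gas_cost"])

p_today = drive_probability(today)
p_tripled = drive_probability(tripled)
print(f"P(drive) today:          {p_today:.2f}")
print(f"P(drive) if gas triples: {p_tripled:.2f}")
```

With these made-up weights, tripling the cost of gas lowers the chance of driving, which is the kind of "what if" question the model is meant to answer. The answer is only as good as the weights, though.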

Hopefully, I’ve already gotten you thinking in the hypothetical, and you can see how you might be more likely to choose to drive (over flying) if the price of a plane ticket went up, or why you might drive to Edmonton because the time it takes to get there is not as important to you as having a car at your destination. Intuitively, this whole idea feels good, which is why I think it’s such a popular model. There’s also a certain cleanliness to figuring out these constants $p_i$, because we do it by getting people to compare sterile, hypothetical scenarios where they have to make a judgement based on only a few factors (which for a lot of people is tricky enough!). In a sense, you’re creating “laboratory conditions” for people to make decisions in, by removing as many external influences as possible. For example, you may be asked, in a survey, to compare and choose between these two situations:

1. You are travelling to Edmonton from Calgary. It takes you 3 hours. It costs you \$40. Your mobility at your destination is high.
2. You are travelling to Edmonton from Calgary. It takes you 2 hours. It costs you \$40. Your mobility at your destination is medium.

Whatever you choose, I now have some information on your preference. You are stating your preference to me by choosing one scenario over another. Perhaps you like scenario 1, which tells me that you value the step up from medium to high mobility at least as much as one hour of travel time. If I were to present you with four scenarios and have you rank them, I could get a whole lot more data about what your stated preferences are, and if I have you do it 10 times for a random collection of scenarios I am suddenly looking at a whole lot of data, just for one person. Do this survey with a bunch of other people, and the statistics become more and more confident about how people feel. In relation to our high-speed rail scenario, we could use this model to figure out how fast, cheap, or mobile the service would have to be, and use that information to see if the whole plan is feasible.
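As a rough sketch of how a pile of "scenario A or scenario B" answers can be turned into weights, the example below simulates respondents and then fits a binary logit model by plain gradient ascent. The post does not say which estimator is used in practice, so the logit model is my assumption, as are the "true" weights, the scenario values, and the units (cost in tens of dollars, mobility as 0, 1, or 2).

```python
import math
import random

# Hypothetical "true" weights, used only to simulate respondents:
# per hour of travel, per $10 of cost, per mobility level.
TRUE_W = (-1.0, -0.5, 0.8)


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def random_scenario(rng):
    """A made-up scenario: (hours, cost in $10s, mobility level)."""
    return (rng.choice([2, 3, 4]), rng.choice([2, 4, 6, 8]), rng.choice([0, 1, 2]))


def simulate_choices(n, rng):
    """Each record is (A minus B for every factor, 1 if the respondent picked A)."""
    data = []
    for _ in range(n):
        a, b = random_scenario(rng), random_scenario(rng)
        diff = tuple(x - y for x, y in zip(a, b))
        p_a = sigmoid(sum(w * d for w, d in zip(TRUE_W, diff)))
        data.append((diff, 1 if rng.random() < p_a else 0))
    return data


def fit_weights(data, steps=400, lr=0.1):
    """Gradient ascent on the average log-likelihood of a binary logit model."""
    w = [0.0, 0.0, 0.0]
    for _ in range(steps):
        grad = [0.0, 0.0, 0.0]
        for diff, chose_a in data:
            p_a = sigmoid(sum(wi * d for wi, d in zip(w, diff)))
            for i, d in enumerate(diff):
                grad[i] += (chose_a - p_a) * d
        w = [wi + lr * g / len(data) for wi, g in zip(w, grad)]
    return w


rng = random.Random(0)
estimated = fit_weights(simulate_choices(1000, rng))
for name, w in zip(("time", "cost", "mobility"), estimated):
    print(f"{name:>8}: {w:+.2f}")
```

The fitted weights should come out with the same signs as the simulated ones (longer and pricier trips are less attractive, more mobility is more attractive), and the ratio of two weights gives a trade-off such as "hours of travel time per mobility level".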

There’s one glaring problem with this whole idea, though. You are asking people to tell you what they would do in a certain situation. There is a huge difference between how people say they would behave and how people actually behave. This, to me, is the number one problem with a stated preference survey. While the error term does try to account for some of the guesswork, this problem of people saying and doing different things is a systematic error, while the statistics really only account for the random error.
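The difference between random and systematic error can be shown with a toy simulation. The numbers are hypothetical: I assume 30% of people would really take the train, and that people overstate their willingness by 15 percentage points when asked.

```python
import random

TRUE_SHARE = 0.30   # hypothetical share who would actually take the train
STATED_BIAS = 0.15  # hypothetical overstatement when people are merely asked


def stated_share(n, rng):
    """Fraction of n simulated respondents who SAY they would take the train."""
    p_say_yes = TRUE_SHARE + STATED_BIAS
    return sum(rng.random() < p_say_yes for _ in range(n)) / n


rng = random.Random(1)
for n in (100, 1000, 20000):
    print(f"n = {n:>5}: stated share = {stated_share(n, rng):.3f} "
          f"(true share = {TRUE_SHARE:.2f})")
```

Surveying more people shrinks the random scatter, so the estimate settles down, but it settles on the biased value (about 0.45 here) rather than the true one. No amount of extra respondents fixes that.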

The point of this post, other than a general introduction to the idea of stated preference surveys, is to show you a little bit about how people and human behaviour are mixed in with transportation research. There are mathematical models that are painstakingly developed, but still have large fundamental flaws. We are borrowing tools from very precise areas of science like physics and math, and using them on something that is admittedly inexact and speculative. That’s not meant to be a discouraging statement; it’s meant to be a cautious one – models are inexact by nature, but human models are even more so.

Because people are, let’s face it, impossible.