This R Markdown file presents an in-depth analysis of transportation choices among college students at Cal Poly, aiming to provide valuable insights and recommendations for optimizing transportation options. I will explore key conclusions from the analysis and provide detailed coding to support our findings.

The main conclusions from my analysis are as follows:

In this document, I will provide code samples and data analysis to support these conclusions. I will also discuss recommendations for the college to increase the demand for the bus system and reduce the number of drivers on the road. This report will guide you through the analysis and coding processes to help inform the college’s transportation planning and policies. Let’s get started.

#Install required packages
library(readxl)
library(dfidx)
library(mlogit)
library(tibble)
library(tidyverse)
#Import data
commute_m <- read_excel("commute_m.xlsx")
head(commute_m)
## # A tibble: 6 x 13
##      id mode  time.car cost.car time.bus cost.~1 time.~2 cost.~3 time.~4 cost.~5
##   <dbl> <chr>    <dbl>    <dbl>    <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>
## 1     1 bus         16     0.82       20       0      20       0      55       0
## 2     2 car         10     0.59       16       0      15       0      34       0
## 3     3 bus         21     1.78       34       0      39       0     105       0
## 4     4 bus         13     0.31       12       0      12       0      24       0
## 5     5 bus         11     0.42       16       0      13       0      35       0
## 6     6 bus          9     0.24       10       0       7       0      18       0
## # ... with 3 more variables: age <dbl>, income <dbl>, marital_status <chr>, and
## #   abbreviated variable names 1: cost.bus, 2: time.bike, 3: cost.bike,
## #   4: time.walk, 5: cost.walk
#Use dfidx package to convert to an indexed data frame for choice indicator
commute_dfidx <- dfidx(commute_m, shape = 'wide', 
                       choice = 'mode', varying = 3:10)

head(tibble(commute_dfidx))
## # A tibble: 6 x 8
##      id mode    age income marital_status  time  cost idx$id1 $id2 
##   <dbl> <lgl> <dbl>  <dbl> <chr>          <dbl> <dbl>   <int> <fct>
## 1     1 FALSE    33     28 married           20  0          1 bike 
## 2     1 TRUE     33     28 married           20  0          1 bus  
## 3     1 FALSE    33     28 married           16  0.82       1 car  
## 4     1 FALSE    33     28 married           55  0          1 walk 
## 5     2 FALSE    25     28 single            15  0          2 bike 
## 6     2 FALSE    25     28 single            16  0          2 bus


Analysis: Multinomial Logit

To understand the factors influencing student transportation decisions, we employed a multinomial logit model. This approach allows us to analyze how various variables affect students’ choices between different modes of travel, including biking, walking, taking the bus, and driving a car.

#Model using Multinomial Logit
mlogit_model = mlogit(formula = mode ~   I(cost/income) | 1 | time, reflevel = 'car', data = commute_dfidx)

The multinomial logit model was specified using the following formula:

\[V_{ij} = a_{j} + \beta_1 \frac{C_{ij}}{Y_i} + \beta_2 T_{ij}\]

Here’s what each part of the formula represents:

The model allows us to estimate how these variables influence the probability of a student selecting a particular mode of transportation. We will now present the results of the analysis, discussing the coefficients, significance, and implications for transportation choices.

summary(mlogit_model)
## 
## Call:
## mlogit(formula = mode ~ I(cost/income) | 1 | time, data = commute_dfidx, 
##     reflevel = "car", method = "nr")
## 
## Frequencies of alternatives:choice
##   car  bike   bus  walk 
## 0.375 0.113 0.453 0.059 
## 
## nr method
## 8 iterations, 0h:0m:0s 
## g'(-H)^-1g = 7.44E-06 
## successive function values within tolerance limits 
## 
## Coefficients :
##                    Estimate Std. Error z-value  Pr(>|z|)    
## (Intercept):bike  -2.816387   0.438281 -6.4260 1.310e-10 ***
## (Intercept):bus   -3.072624   0.365943 -8.3964 < 2.2e-16 ***
## (Intercept):walk   0.184561   0.785896  0.2348 0.8143322    
## I(cost/income)   -57.797383  15.464584 -3.7374 0.0001859 ***
## time:car          -0.413471   0.044998 -9.1887 < 2.2e-16 ***
## time:bike         -0.283030   0.036182 -7.8224 5.107e-15 ***
## time:bus          -0.134744   0.031030 -4.3424 1.410e-05 ***
## time:walk         -0.295081   0.038231 -7.7184 1.177e-14 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Log-Likelihood: -980.22
## McFadden R^2:  0.14008 
## Likelihood ratio test : chisq = 319.34 (p.value = < 2.22e-16)

An essential point to consider is that the outcome variable, \(V_{ij}\), is a measure of the representative utility of each form of travel. When the utility gain for one form of travel exceeds the others, the student will to select that mode of transportation. Because the coefficients prefixed with “time:” are all negative, an increase in travel time will result in a decrease in utility. Similarly, the coefficients for “cost/income” show that an increase in the ratio of cost to income leads to a decrease in utility.

Calculating Price Elasticity of Demand

In this section, we’ll explore the concept of price elasticity of demand, specifically for the choice of driving a car among students. Elasticity measures how sensitive the demand for a product is to changes in price.

We’ll calculate two types of elasticity: own price elasticity and cross price elasticity. Own price elasticity measures how the demand for driving a car responds to changes in the cost of driving a car. Cross price elasticity assesses how the demand for driving a car changes in response to changes in the price of other transportation options.

commute_m <- commute_m %>% mutate(prob_car = fitted(mlogit_model, type = 'probabilities')[, 1])

To calculate the own price elasticity, we use the formula:

\[\text{Own Price Elasticity} = \beta_{2} \left(\frac{\text{cost of driving}}{\text{individual's income}}\right) \left(1 - \text{probability of choosing to drive}\right)\]

#To calculate own price elasticity
commute_m <- commute_m %>% mutate(elas_own_car = coef(mlogit_model)[4]*(cost.car/income)*(1-prob_car))

For the cross price elasticity, we use the formula:

\[\text{Cross Price Elasticity} = -\beta_{2} \left(\frac{\text{cost of driving}}{\text{individual's income}}\right) \left(\text{probability of choosing to drive}\right)\]

# Calculate cross price elasticity
commute_m <- commute_m %>% mutate(elas_cross_car = -coef(mlogit_model)[4]*(cost.car/income)*prob_car)

Now, let’s examine the summary statistics for these elasticities:

commute_m %>% select(starts_with('elas')) %>% summary()
##   elas_own_car     elas_cross_car   
##  Min.   :-4.4549   Min.   :0.01524  
##  1st Qu.:-0.8268   1st Qu.:0.22315  
##  Median :-0.4802   Median :0.31333  
##  Mean   :-0.6645   Mean   :0.31519  
##  3rd Qu.:-0.3070   3rd Qu.:0.40649  
##  Max.   :-0.1000   Max.   :0.83561
#To better illustrate the summary statistics above I plot a kernel density below. 
commute_m %>% 
  ggplot(aes(x = elas_own_car)) +
  geom_density() +
  xlab('Own Price Elasticty of Demand') +
  ylab('Kernel Density') + 
  ggtitle("Own Price Elasticty of Demand with Respect to Cost of Driving a Car")+
  theme(plot.title = element_text(hjust = 0.5))

commute_m %>% 
  ggplot(aes(x = elas_cross_car)) +
  geom_density() +
  xlab('Cross Price Elasticity of Demand') +
  ylab('Kernel Density') + 
  ggtitle("Cross Price Elasticity of Demand with Respect to Cost of Driving a Car")+
  theme(plot.title = element_text(hjust = 0.5))


As we can see from the above, the own price elasticity on average is has an absolute value of 0.665. This put demand for cars as fairly inelastic and does not change much as price of gas increases. The Cross price elsticity above on the other hand is positive and between 0 and 1, suggesting (as should be expected) that the goods are substitutes though only weakly connected.

Analyzing the Impact of Bus Travel Time Changes

In this section, we examine the effects of potential increases and decreases in bus travel time on various forms of travel, including biking, walking, taking the bus, and driving a car.

# Calculate aggregate choices with current parameters
agg_choices_obs <- colSums(predict(mlogit_model, newdata = commute_dfidx))
df<-as.data.frame(t(round(agg_choices_obs)))
time =  list(1)

# Loop through different levels of change in bus travel time
for (i in seq(from = .5, to = 1.5, length.out = 110)){
  if (i==1)  next
  
  comm_counter <- commute_m %>% 
    mutate(time.bus = i * time.bus)
  comm_counter_dfidx <- dfidx(comm_counter, shape = 'wide', 
                            choice = 'mode', varying = 3:10)

  agg_choices_counter <- predict(mlogit_model, newdata = comm_counter_dfidx)
  
  df <- rbind(df,t(round(colSums(agg_choices_counter))))
  time=append(time, i)
}

df$time = unlist(time)

In the code above, we loop through various scenarios of bus travel time adjustments and calculate how they affect students’ choices of transportation modes. The results are stored in the “df” data frame.

#Ploting data above
ggplot(df, aes(time))+ 
  geom_line(aes(y = bike, colour = "bike"), size=1) + 
  geom_line(aes(y = bus, colour = "bus"), size=1) + 
  geom_line(aes(y = car, colour = "car"), size=1) + 
  geom_line(aes(y = walk, colour = "walk"), size=1) +
  scale_x_continuous(labels = function(x) paste0((x-1)*100, "%"))+
  xlab("Percentage Reduction in Bus Time") +
  ylab("Students") +
  ggtitle("Student Choices of Travel as Bus Time Decreases") +
  theme(plot.title = element_text(hjust = 0.5))
## Warning: Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.
## i Please use `linewidth` instead.

The code above generates a plot that visualizes how students’ travel choices change as bus travel time changes. As we can see, number of cars on the rode increase the longer it takes to ride the bus.

# Assessing the impact of a 20% decrease in bus time
comm_counter <- commute_m %>% 
    mutate(time.bus = .8 * time.bus)
  comm_counter_dfidx <- dfidx(comm_counter, shape = 'wide', 
                            choice = 'mode', varying = 3:10)

  agg_choices_counter <- colSums(predict(mlogit_model, newdata = comm_counter_dfidx))
  

# Calculate percentage changes
round(agg_choices_counter - agg_choices_obs)[1]/agg_choices_obs[1]
##    car 
## -0.136
round(agg_choices_counter - agg_choices_obs)[3]/agg_choices_obs[3]
##       bus 
## 0.1633554

In the code above, we specifically assess the impact of a 20% decrease in bus travel time. We calculate the percentage change in the number of cars and the usage of the bus in response to this change.

As a result, a 20% decrease in bus travel time leads to a 13.6% decrease in the number of cars on the road and a 16.3% increase in bus usage among students. These findings provide valuable insights for transportation planning and policy considerations.

# Calculate logsum with current parameters
logsum_old <- logsum(mlogit_model, data = commute_dfidx) 
logsum_new <- logsum(mlogit_model, data = comm_counter_dfidx) 

# Calculate the change in consumer surplus
change_in_consumer_surplus <- sum((logsum_new - logsum_old) / (-coef(mlogit_model)[4]/commute_m$income))

In the code above, we calculate the change in consumer surplus resulting from the 20% decrease in bus travel time. This change is a measure of the additional benefit that students gain from the reduced travel time.

The change in consumer surplus is calculated to be $86.80. This positive value indicates that the decrease in bus travel time by 20% has led to an increase in consumer surplus, indicating that the reduction in travel time is beneficial to students.


Analyzing the Impact of Income Changes

In this section, we investigate how changes in students’ income levels affect their choices of transportation modes, including biking, walking, taking the bus, and driving a car.

# Calculate aggregate choices with current parameters
df2<-as.data.frame(t(round(agg_choices_obs)))
income =  list(1)

# Loop through different levels of income change
for (i in seq(from = .5, to = 1.5, length.out = 110)){
  if (i==1)  next
  
  # Adjust income
  comm_counter <- commute_m %>% 
    mutate(income = i * income)
  comm_counter_dfidx <- dfidx(comm_counter, shape = 'wide', 
                            choice = 'mode', varying = 3:10)
  
  # Calculate aggregate choices with adjusted parameters
  agg_choices_counter <- predict(mlogit_model, newdata = comm_counter_dfidx)
  
  df2 <- rbind(df2,t(round(colSums(agg_choices_counter))))
  income=append(income, i)
}

df2$income = unlist(income)

In the code above, we analyze how different levels of income changes affect students’ transportation choices. The results are stored in the df2 data frame.

ggplot(df2, aes(income))+ 
  geom_line(aes(y = bike, colour = "bike"), size=1) + 
  geom_line(aes(y = bus, colour = "bus"), size=1) + 
  geom_line(aes(y = car, colour = "car"), size=1) + 
  geom_line(aes(y = walk, colour = "walk"), size=1) +
  scale_x_continuous(labels = function(x) paste0((x-1)*100, "%"))+
  xlab("Percentage Change in Income") +
  ylab("Students") +
  ggtitle("Student Choices of Travel as Income Changes") +
  theme(plot.title = element_text(hjust = 0.5))

From the graph above, we can observe the relationship between changes in income levels and students’ choices of transportation modes. The graph shows how various income adjustments affect transportation decisions. As income levels decrease, there is a noticeable increase in the number of students opting for bus travel. This increase in bus ridership becomes more prominent as incomes decline. The relationship is characterized by an accelerating rate, indicating that lower-income students rely more on the bus as their income diminishes. In contrast, as incomes increase, a greater proportion of students choose to drive cars. The graph illustrates that as income levels rise, students become more inclined to drive cars as their preferred mode of transportation.

This insight highlights the impact of income changes on students’ travel choices. It demonstrates the importance of considering income-related factors when planning and assessing transportation policies and services for students.


Conclusions:


  • Demand for commuting by car is fairly inelastic: The demand for cars among college students appears to be relatively insensitive to changes in factors such as gas prices or alternative transportation options.

  • Lowering travel time on bus by 20% will reduce the number of students traveling by car: Implementing measures to reduce bus travel times by 20% will result in 13.6% decrease in cars on the road and a 16.3% increase in bus usage.

  • Time is money: As people’s incomes increase, they tend to opt for commuting options that save them time even if that means spending more money.

These conclusions can help Cal Poly:

  • Increase the demand for the bus system
  • Lower the number of drivers on the road

This can be achieved through:

  • Introducing more buses to reduce travel time
  • Targeting advertising towards lower-income students

However, it should be noted that while increasing gas prices may reduce the number of cars on the road, the college should not expect a massive shift of students to public transit due to the elasticity of demand for Car transport.