This R Markdown file presents an in-depth analysis of transportation choices among college students at Cal Poly, aiming to provide valuable insights and recommendations for optimizing transportation options. I will explore key conclusions from the analysis and provide detailed coding to support our findings.
The main conclusions from my analysis are as follows:
Demand for Cars is fairly inelastic: We have observed that students’ preferences for car usage are relatively insensitive to external factors, such as changes in gas prices or the availability of alternative modes of transportation.
Lowering travel time on bus by 20% will reduce the number of students traveling by car: My analysis suggests that enhancing the efficiency of the bus system by reducing travel time can potentially encourage more students to choose public transportation over personal vehicles.
Time is money: As students’ incomes increase, they are more likely to select commuting options that save them time. This emphasizes this importance of targeting the right student with advertising.
In this document, I will provide code samples and data analysis to support these conclusions. I will also discuss recommendations for the college to increase the demand for the bus system and reduce the number of drivers on the road. This report will guide you through the analysis and coding processes to help inform the college’s transportation planning and policies. Let’s get started.
#Install required packages
library(readxl)
library(dfidx)
library(mlogit)
library(tibble)
library(tidyverse)
#Import data
commute_m <- read_excel("commute_m.xlsx")
head(commute_m)
## # A tibble: 6 x 13
## id mode time.car cost.car time.bus cost.~1 time.~2 cost.~3 time.~4 cost.~5
## <dbl> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 1 bus 16 0.82 20 0 20 0 55 0
## 2 2 car 10 0.59 16 0 15 0 34 0
## 3 3 bus 21 1.78 34 0 39 0 105 0
## 4 4 bus 13 0.31 12 0 12 0 24 0
## 5 5 bus 11 0.42 16 0 13 0 35 0
## 6 6 bus 9 0.24 10 0 7 0 18 0
## # ... with 3 more variables: age <dbl>, income <dbl>, marital_status <chr>, and
## # abbreviated variable names 1: cost.bus, 2: time.bike, 3: cost.bike,
## # 4: time.walk, 5: cost.walk
#Use dfidx package to convert to an indexed data frame for choice indicator
commute_dfidx <- dfidx(commute_m, shape = 'wide',
choice = 'mode', varying = 3:10)
head(tibble(commute_dfidx))
## # A tibble: 6 x 8
## id mode age income marital_status time cost idx$id1 $id2
## <dbl> <lgl> <dbl> <dbl> <chr> <dbl> <dbl> <int> <fct>
## 1 1 FALSE 33 28 married 20 0 1 bike
## 2 1 TRUE 33 28 married 20 0 1 bus
## 3 1 FALSE 33 28 married 16 0.82 1 car
## 4 1 FALSE 33 28 married 55 0 1 walk
## 5 2 FALSE 25 28 single 15 0 2 bike
## 6 2 FALSE 25 28 single 16 0 2 bus
To understand the factors influencing student transportation decisions, we employed a multinomial logit model. This approach allows us to analyze how various variables affect students’ choices between different modes of travel, including biking, walking, taking the bus, and driving a car.
#Model using Multinomial Logit
mlogit_model = mlogit(formula = mode ~ I(cost/income) | 1 | time, reflevel = 'car', data = commute_dfidx)
The multinomial logit model was specified using the following formula:
\[V_{ij} = a_{j} + \beta_1 \frac{C_{ij}}{Y_i} + \beta_2 T_{ij}\]
Here’s what each part of the formula represents:
The model allows us to estimate how these variables influence the probability of a student selecting a particular mode of transportation. We will now present the results of the analysis, discussing the coefficients, significance, and implications for transportation choices.
summary(mlogit_model)
##
## Call:
## mlogit(formula = mode ~ I(cost/income) | 1 | time, data = commute_dfidx,
## reflevel = "car", method = "nr")
##
## Frequencies of alternatives:choice
## car bike bus walk
## 0.375 0.113 0.453 0.059
##
## nr method
## 8 iterations, 0h:0m:0s
## g'(-H)^-1g = 7.44E-06
## successive function values within tolerance limits
##
## Coefficients :
## Estimate Std. Error z-value Pr(>|z|)
## (Intercept):bike -2.816387 0.438281 -6.4260 1.310e-10 ***
## (Intercept):bus -3.072624 0.365943 -8.3964 < 2.2e-16 ***
## (Intercept):walk 0.184561 0.785896 0.2348 0.8143322
## I(cost/income) -57.797383 15.464584 -3.7374 0.0001859 ***
## time:car -0.413471 0.044998 -9.1887 < 2.2e-16 ***
## time:bike -0.283030 0.036182 -7.8224 5.107e-15 ***
## time:bus -0.134744 0.031030 -4.3424 1.410e-05 ***
## time:walk -0.295081 0.038231 -7.7184 1.177e-14 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Log-Likelihood: -980.22
## McFadden R^2: 0.14008
## Likelihood ratio test : chisq = 319.34 (p.value = < 2.22e-16)
An essential point to consider is that the outcome variable, \(V_{ij}\), is a measure of the representative utility of each form of travel. When the utility gain for one form of travel exceeds the others, the student will to select that mode of transportation. Because the coefficients prefixed with “time:” are all negative, an increase in travel time will result in a decrease in utility. Similarly, the coefficients for “cost/income” show that an increase in the ratio of cost to income leads to a decrease in utility.
In this section, we’ll explore the concept of price elasticity of demand, specifically for the choice of driving a car among students. Elasticity measures how sensitive the demand for a product is to changes in price.
We’ll calculate two types of elasticity: own price elasticity and cross price elasticity. Own price elasticity measures how the demand for driving a car responds to changes in the cost of driving a car. Cross price elasticity assesses how the demand for driving a car changes in response to changes in the price of other transportation options.
commute_m <- commute_m %>% mutate(prob_car = fitted(mlogit_model, type = 'probabilities')[, 1])
To calculate the own price elasticity, we use the formula:
\[\text{Own Price Elasticity} = \beta_{2} \left(\frac{\text{cost of driving}}{\text{individual's income}}\right) \left(1 - \text{probability of choosing to drive}\right)\]
#To calculate own price elasticity
commute_m <- commute_m %>% mutate(elas_own_car = coef(mlogit_model)[4]*(cost.car/income)*(1-prob_car))
For the cross price elasticity, we use the formula:
\[\text{Cross Price Elasticity} = -\beta_{2} \left(\frac{\text{cost of driving}}{\text{individual's income}}\right) \left(\text{probability of choosing to drive}\right)\]
# Calculate cross price elasticity
commute_m <- commute_m %>% mutate(elas_cross_car = -coef(mlogit_model)[4]*(cost.car/income)*prob_car)
Now, let’s examine the summary statistics for these elasticities:
commute_m %>% select(starts_with('elas')) %>% summary()
## elas_own_car elas_cross_car
## Min. :-4.4549 Min. :0.01524
## 1st Qu.:-0.8268 1st Qu.:0.22315
## Median :-0.4802 Median :0.31333
## Mean :-0.6645 Mean :0.31519
## 3rd Qu.:-0.3070 3rd Qu.:0.40649
## Max. :-0.1000 Max. :0.83561
#To better illustrate the summary statistics above I plot a kernel density below.
commute_m %>%
ggplot(aes(x = elas_own_car)) +
geom_density() +
xlab('Own Price Elasticty of Demand') +
ylab('Kernel Density') +
ggtitle("Own Price Elasticty of Demand with Respect to Cost of Driving a Car")+
theme(plot.title = element_text(hjust = 0.5))
commute_m %>%
ggplot(aes(x = elas_cross_car)) +
geom_density() +
xlab('Cross Price Elasticity of Demand') +
ylab('Kernel Density') +
ggtitle("Cross Price Elasticity of Demand with Respect to Cost of Driving a Car")+
theme(plot.title = element_text(hjust = 0.5))
As we can see from the above, the own price elasticity on average is has an absolute value of 0.665. This put demand for cars as fairly inelastic and does not change much as price of gas increases. The Cross price elsticity above on the other hand is positive and between 0 and 1, suggesting (as should be expected) that the goods are substitutes though only weakly connected.
In this section, we examine the effects of potential increases and decreases in bus travel time on various forms of travel, including biking, walking, taking the bus, and driving a car.
# Calculate aggregate choices with current parameters
agg_choices_obs <- colSums(predict(mlogit_model, newdata = commute_dfidx))
df<-as.data.frame(t(round(agg_choices_obs)))
time = list(1)
# Loop through different levels of change in bus travel time
for (i in seq(from = .5, to = 1.5, length.out = 110)){
if (i==1) next
comm_counter <- commute_m %>%
mutate(time.bus = i * time.bus)
comm_counter_dfidx <- dfidx(comm_counter, shape = 'wide',
choice = 'mode', varying = 3:10)
agg_choices_counter <- predict(mlogit_model, newdata = comm_counter_dfidx)
df <- rbind(df,t(round(colSums(agg_choices_counter))))
time=append(time, i)
}
df$time = unlist(time)
In the code above, we loop through various scenarios of bus travel time adjustments and calculate how they affect students’ choices of transportation modes. The results are stored in the “df” data frame.
#Ploting data above
ggplot(df, aes(time))+
geom_line(aes(y = bike, colour = "bike"), size=1) +
geom_line(aes(y = bus, colour = "bus"), size=1) +
geom_line(aes(y = car, colour = "car"), size=1) +
geom_line(aes(y = walk, colour = "walk"), size=1) +
scale_x_continuous(labels = function(x) paste0((x-1)*100, "%"))+
xlab("Percentage Reduction in Bus Time") +
ylab("Students") +
ggtitle("Student Choices of Travel as Bus Time Decreases") +
theme(plot.title = element_text(hjust = 0.5))
## Warning: Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.
## i Please use `linewidth` instead.
The code above generates a plot that visualizes how students’ travel choices change as bus travel time changes. As we can see, number of cars on the rode increase the longer it takes to ride the bus.
# Assessing the impact of a 20% decrease in bus time
comm_counter <- commute_m %>%
mutate(time.bus = .8 * time.bus)
comm_counter_dfidx <- dfidx(comm_counter, shape = 'wide',
choice = 'mode', varying = 3:10)
agg_choices_counter <- colSums(predict(mlogit_model, newdata = comm_counter_dfidx))
# Calculate percentage changes
round(agg_choices_counter - agg_choices_obs)[1]/agg_choices_obs[1]
## car
## -0.136
round(agg_choices_counter - agg_choices_obs)[3]/agg_choices_obs[3]
## bus
## 0.1633554
In the code above, we specifically assess the impact of a 20% decrease in bus travel time. We calculate the percentage change in the number of cars and the usage of the bus in response to this change.
As a result, a 20% decrease in bus travel time leads to a 13.6% decrease in the number of cars on the road and a 16.3% increase in bus usage among students. These findings provide valuable insights for transportation planning and policy considerations.
# Calculate logsum with current parameters
logsum_old <- logsum(mlogit_model, data = commute_dfidx)
logsum_new <- logsum(mlogit_model, data = comm_counter_dfidx)
# Calculate the change in consumer surplus
change_in_consumer_surplus <- sum((logsum_new - logsum_old) / (-coef(mlogit_model)[4]/commute_m$income))
In the code above, we calculate the change in consumer surplus resulting from the 20% decrease in bus travel time. This change is a measure of the additional benefit that students gain from the reduced travel time.
The change in consumer surplus is calculated to be $86.80. This positive value indicates that the decrease in bus travel time by 20% has led to an increase in consumer surplus, indicating that the reduction in travel time is beneficial to students.
In this section, we investigate how changes in students’ income levels affect their choices of transportation modes, including biking, walking, taking the bus, and driving a car.
# Calculate aggregate choices with current parameters
df2<-as.data.frame(t(round(agg_choices_obs)))
income = list(1)
# Loop through different levels of income change
for (i in seq(from = .5, to = 1.5, length.out = 110)){
if (i==1) next
# Adjust income
comm_counter <- commute_m %>%
mutate(income = i * income)
comm_counter_dfidx <- dfidx(comm_counter, shape = 'wide',
choice = 'mode', varying = 3:10)
# Calculate aggregate choices with adjusted parameters
agg_choices_counter <- predict(mlogit_model, newdata = comm_counter_dfidx)
df2 <- rbind(df2,t(round(colSums(agg_choices_counter))))
income=append(income, i)
}
df2$income = unlist(income)
In the code above, we analyze how different levels of income changes affect students’ transportation choices. The results are stored in the df2 data frame.
ggplot(df2, aes(income))+
geom_line(aes(y = bike, colour = "bike"), size=1) +
geom_line(aes(y = bus, colour = "bus"), size=1) +
geom_line(aes(y = car, colour = "car"), size=1) +
geom_line(aes(y = walk, colour = "walk"), size=1) +
scale_x_continuous(labels = function(x) paste0((x-1)*100, "%"))+
xlab("Percentage Change in Income") +
ylab("Students") +
ggtitle("Student Choices of Travel as Income Changes") +
theme(plot.title = element_text(hjust = 0.5))
From the graph above, we can observe the relationship between changes in income levels and students’ choices of transportation modes. The graph shows how various income adjustments affect transportation decisions. As income levels decrease, there is a noticeable increase in the number of students opting for bus travel. This increase in bus ridership becomes more prominent as incomes decline. The relationship is characterized by an accelerating rate, indicating that lower-income students rely more on the bus as their income diminishes. In contrast, as incomes increase, a greater proportion of students choose to drive cars. The graph illustrates that as income levels rise, students become more inclined to drive cars as their preferred mode of transportation.
This insight highlights the impact of income changes on students’ travel choices. It demonstrates the importance of considering income-related factors when planning and assessing transportation policies and services for students.
Demand for commuting by car is fairly inelastic: The demand for cars among college students appears to be relatively insensitive to changes in factors such as gas prices or alternative transportation options.
Lowering travel time on bus by 20% will reduce the number of students traveling by car: Implementing measures to reduce bus travel times by 20% will result in 13.6% decrease in cars on the road and a 16.3% increase in bus usage.
Time is money: As people’s incomes increase, they tend to opt for commuting options that save them time even if that means spending more money.
These conclusions can help Cal Poly:
This can be achieved through:
However, it should be noted that while increasing gas prices may reduce the number of cars on the road, the college should not expect a massive shift of students to public transit due to the elasticity of demand for Car transport.