The closing longevity gap between battery electric vehicles and internal combustion vehicles in Great Britain

Anonymized MOT test dataset

The main dataset used in this study is the anonymized MOT (Ministry of Transport) test database. The MOT test is mandatory for almost all passenger and light-goods vehicles, private buses and motorbikes in the United Kingdom, as required by the Road Traffic Act of 1988. The anonymized dataset used in this study, however, covers only tests in Great Britain. To ensure that vehicles are roadworthy and meet minimum environmental requirements, an MOT test must be taken at least once a year for vehicles that are 3 years or older. For certain vehicles, such as taxis, ambulances, and some motor caravans and dual-purpose vehicles, the first test is required at 1 year of age. The dataset includes not only the time, location and final outcome of each MOT test but also a number of vehicle characteristics. MOT test outcomes were computerized in 2005. As MOT computerization was not fully implemented across Great Britain until 1 April 2006, the dataset is not complete for tests conducted between 1 January 2005 and 31 March 2006. We waited for the May 2023 update, which covers tests from 2005 to 2022 and includes revised 2017 results that were previously missing due to a recording error (corrected in June 2022).

MOT tests are carried out primarily in private garages and by certain local authorities. The locations, known as Vehicle Testing Stations (VTS), are authorized and designated as appropriate by the Driver and Vehicle Standards Agency (DVSA). The VTS and their staff are subject to inspections by the DVSA to ensure that testing is conducted properly using approved equipment. Only specifically approved individuals are permitted to conduct tests, sign official test documents and make database entries. Information about the vehicles, such as the mileage, colour, fuel type and cylinder capacity, is entered or validated by the tester at the time of the test. Vehicles can be tracked using the vehicle ID field, which is based on the registration and vehicle identification number. A high-level postcode region (the first one or two characters of the postcode of the VTS) is also provided, but to prevent any individual VTS from being identified, any region with fewer than five active sites is merged under the code ‘XX’.

Data processing

The first stage was to download the MOT test data for each year between 2005 and 2022 from the UK’s Department for Transport (DfT) website and combine them into a single dataset. During the initial cleaning process (Supplementary Table 4), we verified that no records had a missing vehicle ID. Data quality control revealed occasional discrepancies in the information recorded for the same vehicle in different tests, so we established rules to resolve them. For vehicle type and fuel, information from the most recent test was used, as the classification of vehicles tends to improve over time as testers become more familiar with new technologies. For colour and first use time, information from the first test was used. For cylinder capacity, a majority rule was used. The odometer reading and test date from the last test in the dataset were used to calculate the average mileage of each vehicle throughout its lifetime. Since a car can be brought back for multiple MOT tests on the same day (for example, for retesting), we selected the record from the last test day with the highest non-missing odometer reading. After resolving conflicts in the data, we removed all vehicles that had their first MOT test before they were 2 years old, since these vehicles were more likely to be taxis and ambulances. We analysed only Class 4 vehicles, which mainly consist of passenger and light-goods vehicles.
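The conflict-resolution rules above can be illustrated with a minimal sketch. The record layout (a list of dictionaries per vehicle) and all field names are hypothetical and are not taken from the MOT files. The mileage rate formula (last odometer reading divided by the days between first use and the last test) is an assumption about how the average lifetime mileage might be computed.

```python
from collections import Counter
from datetime import date


def resolve_vehicle(tests):
    """Collapse all MOT records of one vehicle into a single row.

    `tests` is a list of dicts with hypothetical keys: test_date, fuel,
    colour, first_use, cylinder_cc and odometer (None when missing).
    """
    ordered = sorted(tests, key=lambda r: r["test_date"])
    first, last = ordered[0], ordered[-1]

    # Majority rule for cylinder capacity (ties go to the value seen first).
    capacities = [r["cylinder_cc"] for r in ordered if r["cylinder_cc"] is not None]
    cylinder = Counter(capacities).most_common(1)[0][0] if capacities else None

    # On the last test day, keep the highest non-missing odometer reading.
    last_day = [
        r["odometer"]
        for r in ordered
        if r["test_date"] == last["test_date"] and r["odometer"] is not None
    ]
    odometer = max(last_day) if last_day else None

    # Assumed definition of the average mileage rate (miles per day).
    days_in_use = (last["test_date"] - first["first_use"]).days
    rate = odometer / days_in_use if odometer is not None and days_in_use > 0 else None

    return {
        "fuel": last["fuel"],              # most recent test
        "colour": first["colour"],         # first test
        "first_use": first["first_use"],   # first test
        "cylinder_cc": cylinder,
        "odometer": odometer,
        "last_test": last["test_date"],
        "miles_per_day": rate,
    }
```

In this sketch each rule is applied independently per field, so a vehicle with a corrected fuel code in a later test keeps its original colour and first use date.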

The final sample is restricted to four major powertrains: PE (petrol), DI (diesel), EL (electric) or HY (hybrid). We treat the electric (clean) and hybrid electric (clean) codes (added since 2022) as EL and HY, respectively. While classifying petrol and diesel was straightforward, it was initially necessary to pool EL and HY as there was no clear and consistent rule to differentiate them, especially in the early years when EVs were much less common. For example, there were a large number of Toyota Prius (a famous HEV model) and Mitsubishi Outlander (a famous plug-in hybrid electric vehicle (PHEV) model) vehicles classified or misclassified as either HY or EL. After this initial pooling, we split the HY/EL pool into two samples.

First, those with non-missing and non-zero cylinder capacity are put into the (P)HEV sample as they all have an electric motor and an engine (suggested by the cylinder capacity information) and so must be either an HEV or PHEV. Unfortunately, the information provided in the MOT test data did not allow us to differentiate between PHEVs and HEVs so we call this sample (P)HEV. Given this limitation, our primary analysis above focuses on comparing BEVs against petrol and diesel vehicles only. However, Supplementary Note 2 provides some results for this mixed sample of HEVs (which are closer to ICEVs) and PHEVs (which are closer to BEVs).

Second, those with missing or zero cylinder capacity are more likely to have no engine and hence are classified as fully electric vehicles (BEVs). Because some vehicles with an engine had no engine size recorded during the MOT test, we consolidated the information on the makes and models of these cars and kept only those recognized by the DVSA as BEVs, so that we did not accidentally include other powertrains. This means that we exclude the small number of (P)HEVs that had no engine size information and whose make and model were not recognized by the DVSA as a BEV.
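The two-step split of the pooled HY/EL records can be sketched as follows. The function name and the `bev_models` lookup are hypothetical. In practice the lookup would hold the make and model pairs recognized by the DVSA as BEVs, which are not listed in the text.

```python
def classify_pooled(cylinder_cc, make_model, bev_models):
    """Assign a pooled HY/EL vehicle to a sample, or return None to exclude it.

    cylinder_cc: engine size in cc, or None when missing.
    make_model:  (make, model) tuple.
    bev_models:  set of (make, model) tuples treated as BEVs (assumed input).
    """
    # Step 1: a recorded, non-zero engine size implies an engine is present.
    if cylinder_cc:
        return "(P)HEV"
    # Step 2: no engine size; keep the vehicle only if it is a known BEV model.
    if make_model in bev_models:
        return "BEV"
    # Hybrids with a missing engine size fall through and are excluded.
    return None
```

Treating `None` and `0` identically mirrors the "missing or zero" wording of the rule.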

For petrol and diesel cars, we also excluded a negligible fraction of vehicles with missing or zero cylinder capacity. Petrol and diesel vehicles were placed into one of three bins based on cylinder capacity: under 1 l, between 1 l and 2 l, and above 2 l. We dropped the make ‘LONDON TAXIS INT’ and standardized major makes. For example, any make recorded as BMW followed by other characters (that is, additional details regarding the BMW model) was shortened to just BMW. Similar rules were applied to other makes. We also removed vehicles with unusually high mileages (exceeding 100 miles per day, as recorded at the first or last test).
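A minimal sketch of these cleaning rules is given below. The list of major makes is illustrative only, and the treatment of vehicles at exactly 1 l or 2 l is an assumption, since the text does not state on which side of each boundary they fall.

```python
MAJOR_MAKES = ("BMW", "TOYOTA", "NISSAN")  # illustrative, not the study's list
MAX_MILES_PER_DAY = 100


def engine_bin(cylinder_cc):
    """Bin cylinder capacity (cc); boundary handling is assumed."""
    if cylinder_cc < 1000:
        return "under 1 l"
    if cylinder_cc <= 2000:
        return "1-2 l"
    return "above 2 l"


def standardize_make(raw):
    """Shorten 'BMW <extra details>' to 'BMW', and likewise for other makes."""
    name = raw.strip().upper()
    for make in MAJOR_MAKES:
        if name == make or name.startswith(make + " "):
            return make
    return name


def keep_vehicle(raw_make, miles_per_day):
    """Drop the excluded taxi make and implausibly high mileage rates."""
    if standardize_make(raw_make) == "LONDON TAXIS INT":
        return False
    return miles_per_day <= MAX_MILES_PER_DAY
```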

Vehicle location was inferred from the postcode area of the first recorded MOT result. Postcodes were then mapped to 11 regions in Great Britain. Relatively aggregated regions were used not only to speed up the computational process but also to allow for easier interpretation since these regions are sufficient to capture some aspects of natural driving patterns, weather conditions and certain socioeconomic characteristics. Vehicles with postcodes coded as ‘XX’ were excluded. Location assumes that owners take the vehicle to a VTS relatively close to where they live.

Finally, a cohort variable was created to capture the vintage of the technology, determined by ‘first use time’ information. Each year is defined as a new cohort and our sample includes vehicles registered in 2005–2017. Cohorts after 2017 are excluded as we want to follow a vehicle for at least two MOT tests from the first test or roughly 5 years from the first use if the vehicle still exists. For sample size reasons, only makes with at least 1,000 unique vehicles for petrol and diesel were included. For BEVs, the threshold was lowered to 100 as this powertrain was still growing from a low base during this period but provides the main motivation for the study. In robustness checks, we also restricted the sample to BEV makes with at least 1,000 vehicles.

The heuristic of death definition

As the anonymized MOT dataset does not contain explicit information on the retirement of vehicles, we use the date of a vehicle attending an MOT test as evidence of its survival up to that point in time. As our data ends on 31 December 2022, we have a right-censoring issue. More precisely, for a vehicle that regularly attends MOT tests, we do not know the exact date of its death but can conclude that it must have happened after the last MOT test is recorded in the data.

The use of MOT records allows us to infer that death occurred within a certain interval of time. A legal requirement is that if a vehicle is over 3 years old and still operating on British roads, it must attend an MOT test every year. As our database contains all MOT tests taken within our sample period, if a vehicle is not recorded as having taken a test, then it raises questions about the continued survival of that vehicle. If all vehicles strictly follow the legal requirement, we can confidently classify a vehicle as ‘retired’ if no test result is observed for a certain period (usually 1 year) after the last MOT test result recorded in the system.

However, there are a number of practical reasons why a vehicle MOT test may be delayed so we allow for a ‘buffer period’ after the date the test should have been taken before concluding that a vehicle has been retired. For example, some drivers may be unaware of the importance of regular MOT testing or when their MOT is due, particularly if the vehicle recently changed ownership. The cost of an MOT test and any necessary repairs can also be a factor for some owners, particularly if they are facing financial difficulties. Vehicles that are not used frequently or have mechanical issues may be kept off the road until they can be repaired, which can also push back the eventual MOT date that is recorded in the system.

Figure 1 gives an example of an MOT attendance pattern and illustrates the vehicle retirement assumptions used in the analysis. The top line shows that the vehicle regularly attended MOT tests at times t1, t2 and t3. As the cut-off point of our data is the end of 2022, in this case, we do not observe the fate of the vehicle as the expected MOT at t4 has not yet happened, and thus we conclude that the vehicle fails at some point after t3, or in other words within the interval (t3, ∞). By contrast, the second line shows a vehicle that attended regular MOT tests up to t2 but missed the MOT test that should have happened at t3. To account for delays in taking the MOT in that year, we allow a buffer Δt and search again. If we do not see the vehicle attending an MOT test within the designated buffer period, we conclude that the vehicle no longer operates on British roads and classify it as retired within the interval (t2, t3 + Δt).

The selection of buffer time Δt is an empirical matter. If we allow for a long Δt, we may miss some real vehicle deaths and lose useful information (that is, classify an interval-censored death as a right-censored death). By contrast, if we assume too short a Δt, we may misclassify some surviving vehicles with late MOT attendance as retired. Our heuristic approach to selecting the appropriate buffer time is to analyse the distribution of the gaps between consecutive MOT test dates in our cleaned database (which includes more than 264 million tests). Our analysis suggests that around 50% of tests, including those impacted by COVID-19 disruptions, fall strictly within a year of the previous MOT test. Recent research indicates that up to 5.2 million cars could be on UK roads without a valid MOT certificate, with 360,000 of these being presented for a new MOT more than a year after their previous certificate had expired52. Therefore, setting the buffer time to zero would classify as retired any vehicle that does not attend an MOT test within 1 year, which would be too strong an assumption. By contrast, when we set the baseline buffer time to 6 months, we capture 99% of tests, since less than 1% of tests occur more than 6 months after the original due date. As our baseline, we classify as retired any vehicles that fail to attend an MOT test within 18 months of their last recorded test. As a sensitivity check, our results also include estimates based on two alternative thresholds, 3 months earlier and 3 months later than our 18-month baseline, at 15 and 21 months.
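The retirement heuristic can be sketched as a small classification function. The function names and the return format are hypothetical, and treating a deadline that falls exactly on the data cut-off as already passed is an assumption. The bounds are returned as calendar dates and would still need to be converted to vehicle ages before estimation.

```python
import calendar
from datetime import date

DATA_END = date(2022, 12, 31)  # end of the sample period


def add_months(d, months):
    """Shift a date by whole months, clamping to the last day of the month."""
    total = d.year * 12 + (d.month - 1) + months
    year, month_index = divmod(total, 12)
    month = month_index + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def classify_censoring(last_test, threshold_months=18, data_end=DATA_END):
    """Return (kind, lower_bound, upper_bound) for one vehicle.

    A vehicle is interval-censored (retired) if the threshold after its last
    recorded test has passed within the data window; otherwise it is
    right-censored and has no upper bound.
    """
    deadline = add_months(last_test, threshold_months)
    if deadline <= data_end:
        return ("interval", last_test, deadline)
    return ("right", last_test, None)
```

Calling `classify_censoring` with `threshold_months=15` or `threshold_months=21` corresponds to the two sensitivity thresholds.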

Survival analysis

To model the longevity of a vehicle, we use survival analysis, a statistical technique that deals with the expected duration of time until an event occurs53. More specifically, we are interested in a non-negative random variable T representing the lifetime of a vehicle, that is, the duration until retirement (being scrapped or no longer driving on British roads). The distribution of T can be characterized by a survival function, S(t) = P(T > t), which gives the probability that a vehicle will survive past a certain time t, and a hazard function, which specifies the probability for a vehicle to be scrapped in the next infinitely small period of time, Δt, conditional on the fact that the vehicle survives to time t.

$$h(t)=\lim_{\Delta t\to 0}\frac{P(t < T < t+\Delta t)}{\Delta t\,S(t)}=\frac{f(t)}{S(t)}=\frac{f(t)}{1-F(t)}$$

(1)

In this equation, f(t) and F(t) are respectively the density function and the cumulative distribution function and the survival function can be expressed as S(t) = 1 − F(t).

Adopting the proportional hazard function, a common approach to model hazard function h(t), we assume that the hazard function of a vehicle is proportionate to a baseline hazard function, h0k(t), and is adjusted by a vector of time-invariant covariates, xj, that is specific to vehicle j, and a vector of coefficients, βk. Here we use the subscript k to denote the different powertrain types, including petrol, diesel and BEVs, in both the baseline hazard and the vector of coefficients, to clarify that we model the data separately for each type.

$$h_j(t)=h_{0k}(t)\exp (x_j^{\prime }\beta _k)$$

(2)

A range of covariates are included in the analysis. (1) We use the mileage rate (MileageRatej) recorded at the last test date as a proxy for the usage pattern of vehicles hypothesizing that a vehicle driven more often will tend to retire earlier. (2) We include a cohort variable (Cohortj) as a proxy for the technology available at the time the vehicle is first on the road. (3) For powertrains with internal combustion engines, we include a vector of indicator variables (EngineSizej) for cylinder capacity to account for the variation in lifespan across engine sizes (1 l and below, 1–2 l, and 2 l and above). (4) We include a vector (Colourj) to capture the colour of the vehicle as this choice may be correlated with some unobserved traits related to the choice of colour and the characteristics of drivers that may influence driving patterns (refs. 54,55 have suggested that the visibility of vehicles may affect their safety). (5) We use the region that the MOT test was taken (Regionj) to proxy regional driving and road conditions. (6) We include a set of vehicle make indicator variables (Makej) to explain the variation in vehicle popularity, demand for luxury or cost sensitivity and to capture the possibility that the make of a vehicle may also be correlated with driver characteristics. Equation 2 can be expanded as follows, where Greek lowercase characters denote coefficients and Greek uppercase characters denote vectors of coefficients:

$$\begin{array}{rcl}h_j(t)&=&h_{0k}(t)\exp \left(\alpha _k+\gamma _k\mathrm{MileageRate}_j+\delta _k\mathrm{Cohort}_j\right.\\ &&\left.+\Pi _k\mathrm{EngineSize}_j+\Phi _k\mathrm{Colour}_j+\Psi _k\mathrm{Make}_j+\Omega _k\mathrm{Region}_j\right)\end{array}$$

(3)

Here we do not explicitly model the impact of policies on the scrappage decisions of vehicle owners. Although there was a UK-wide, government-backed scrappage scheme introduced in the 2009 UK Budget38, it was terminated in March 2010 and did not target vehicles registered after 2005 (which is the first cohort included in our sample). More recent regional scrappage schemes, including Birmingham (2021), Bristol (2022), London (2023) and Scotland (2023)56, had only a negligible effect on the vehicles in our dataset, given their proximity to the end of our study period (2022). As such, the longevity estimates are mainly driven by mechanical ageing, user behaviour, accidents and market factors, rather than explicit policies. Market factors may include various scrappage schemes run by car manufacturers, which typically offer financial incentives to trade in old vehicles for new.

We further assume that the baseline hazard function is parametric and follows a Weibull distribution such that

$$h_j(t)=\rho _kt^{\rho _k-1}\exp (x_j\beta _k)$$

(4)

The key implication of this parametric form is that the hazard rate is monotonic, increasing or decreasing over time depending on whether the shape parameter ρk is greater or smaller than 1, respectively. If ρk = 1, the hazard rate is constant over time and the Weibull simplifies to an exponential distribution. The parameterization λj = exp(xjβk), which is non-negative, time invariant and covariate dependent, scales the baseline hazard rate up or down and is specific to each vehicle27. We use the Weibull proportional hazard model as the literature suggests that it is well suited to model the retirement of vehicles with censored data27,57. Again, the subscript k of ρk highlights the fact that our models permit distinct shape parameters across powertrains. Meanwhile, other observable covariates come into play, affecting the scale parameter of the Weibull distributions within each powertrain.
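The behaviour of the Weibull hazard in equation (4) can be checked numerically with a short sketch. The function names are hypothetical, and `xb` stands for the linear predictor xjβk of a single vehicle. The survival function used here, S(t) = exp(−λ t^ρ) with λ = exp(xb), follows from integrating the hazard in equation (4).

```python
import math


def weibull_hazard(t, rho, xb):
    """Hazard h(t) = rho * t**(rho - 1) * exp(xb), as in equation (4)."""
    return rho * t ** (rho - 1) * math.exp(xb)


def weibull_survival(t, rho, xb):
    """Survival S(t) = exp(-exp(xb) * t**rho), implied by the hazard above."""
    return math.exp(-math.exp(xb) * t ** rho)


if __name__ == "__main__":
    # Illustrative values only: the hazard rises, stays flat or falls with age.
    for rho in (1.5, 1.0, 0.8):
        print(rho, [round(weibull_hazard(t, rho, -4.0), 4) for t in (5, 10, 15)])
```

With ρ > 1 the printed hazards increase with age, with ρ = 1 they are constant, and with ρ < 1 they decrease.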

The vector of coefficients β and the shape parameter ρ were estimated by maximum likelihood. As discussed above, the observations are either right-censored (j ∈ RC) or interval-censored (j ∈ IC). This means that we do not observe tj directly but instead have its lower bound tlj (the last MOT test the vehicle attended) and, for vehicles that missed a recent MOT test, the upper bound tuj. The log-likelihood function for estimation can be written as follows:

$$\log L=\sum \limits_{j\in \mathrm{RC}}\log S_j(t_{lj})+\sum \limits_{j\in \mathrm{IC}}\log [S_j(t_{lj})-S_j(t_{uj})]$$

(5)
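Equation (5) can be evaluated directly once a Weibull survival function is assumed. The sketch below is illustrative rather than the estimation code used in the study: the data layout (tuples of covariates, lower bound and optional upper bound) is hypothetical, and in practice the function would be passed to a numerical optimizer to obtain β and ρ for each powertrain.

```python
import math


def log_likelihood(rho, beta, vehicles):
    """Log-likelihood of equation (5) under a Weibull proportional hazard.

    vehicles: iterable of (x, t_lower, t_upper), where x is a sequence of
    covariates (including a constant if an intercept is wanted) and t_upper
    is None for right-censored vehicles.
    """
    total = 0.0
    for x, t_lower, t_upper in vehicles:
        lam = math.exp(sum(b * v for b, v in zip(beta, x)))
        surv_lower = math.exp(-lam * t_lower ** rho)
        if t_upper is None:
            # Right-censored: the vehicle survived past t_lower.
            total += math.log(surv_lower)
        else:
            # Interval-censored: retirement occurred between the two bounds.
            surv_upper = math.exp(-lam * t_upper ** rho)
            total += math.log(surv_lower - surv_upper)
    return total
```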

As is standard in the literature, for each vehicle we estimate the median lifetime as the point in time where the survival function reaches a value of 0.5:

$$\hat{l}_j=\{t:\hat{S}_j(t)=0.5\}$$

(6)

The median lifetime mileage is then estimated as the product of the estimated median lifespan and the estimated mileage rate \((\hat{r}_j)\) recorded in the last MOT test.

$$\hat{m}_j=\hat{l}_j\times \hat{r}_j$$

(7)
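Under the Weibull form of equation (4), the survival function is S(t) = exp(−λ t^ρ) with λ = exp(xjβk), so equations (6) and (7) have a closed form: solving S(t) = 0.5 gives t = (ln 2 / λ)^(1/ρ). The sketch below illustrates this. The function names are hypothetical, and the time unit of the mileage rate is assumed to match the time unit of the lifetime.

```python
import math


def median_lifetime(rho, xb):
    """Time at which the Weibull survival function equals 0.5 (equation (6))."""
    lam = math.exp(xb)
    return (math.log(2.0) / lam) ** (1.0 / rho)


def median_lifetime_mileage(rho, xb, mileage_rate):
    """Median lifetime multiplied by the mileage rate (equation (7))."""
    return median_lifetime(rho, xb) * mileage_rate
```

For example, with ρ = 1 and λ = ln 2 / 18, the median lifetime is 18 time units, and a mileage rate of 7,000 miles per time unit gives a median lifetime mileage of 126,000 miles. These values are purely illustrative and are not estimates from the study.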

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.
