Appendix A
Overview of NHTSA Methodology

Nan Humphrey
TRB Staff

In this appendix, an overview is provided of the data sources and methodology used by NHTSA staff in their most recent series of studies on the relationships between average vehicle weight, size, and safety. There are six studies in all--one lengthy analysis investigating the relationship between vehicle weight and fatality risk, four analyses investigating vehicle weight effects on moderate to serious injuries, and one analysis investigating effects on less serious injuries--plus a summary presenting the agency's principal findings.

Although the studies use different data sources and methodologies, they make the following common assumptions:


The Kahane analysis is the most comprehensive and complex of the studies. It draws on the Fatal Accident Reporting System (FARS) database--a census of fatalities in highway crashes in the United States--and on the R.L. Polk National Vehicle Population Profile for vehicle registration data. It also uses data from 11 state accident files to derive information on many of the independent variables, such as driver age and gender, that are not available at the vehicle level from the Polk data.

The objective of the analysis is to estimate the relationship between vehicle curb weight and fatality risk, per some measure of exposure, for model year 1985-1993 passenger cars and light trucks, based on their crash experience in the United States from 1989 through 1993 (Table A-1). Exposure data are problematic, however. Data on miles traveled by vehicle make and model are not available. Data on vehicle registrations are available but do not provide information about key control variables, such as the age and gender of drivers. Hence, several analysis strategies were attempted.

Two approaches use the state accident data (described above) and an induced-exposure method to estimate fatality risk. Induced exposure refers to an indirect method of estimating exposure when direct measures, such as vehicle miles traveled or vehicle years, are unavailable at the desired level of detail. There are many ways of defining induced-exposure crashes. In this report, induced-exposure crashes, which are recorded in state accident files, are involvements in multivehicle crashes in which one vehicle was standing still (waiting for traffic to clear or for a green light) and was hit by a second vehicle.[1] The struck vehicle, whose driver is not culpable, is treated as a surrogate for exposure because it measures how often a vehicle "is there" where it can be hit by other vehicles (Kahane, p. 22).
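As an illustration of the narrow induced-exposure definition described above, the following Python sketch flags a vehicle involvement as an induced-exposure case when the vehicle was stationary and was struck in a multivehicle crash, then counts such cases by weight class. The record layout (`Involvement` and its fields) and the weight-class labels are hypothetical; actual state accident files code these items in their own formats.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Involvement:
    """One vehicle's involvement in a reported crash (hypothetical fields)."""
    crash_id: int
    vehicles_in_crash: int
    was_stationary: bool  # standing still, e.g., waiting for a green light
    was_struck: bool      # hit by another vehicle
    weight_class: str     # hypothetical label, e.g., "light" or "heavy"


def is_induced_exposure(inv: Involvement) -> bool:
    """Narrow definition: a stationary vehicle struck in a multivehicle crash.

    No police judgment of culpability is needed for such a vehicle.
    """
    return inv.vehicles_in_crash >= 2 and inv.was_stationary and inv.was_struck


def induced_exposure_counts(involvements) -> Counter:
    """Count induced-exposure involvements per weight class."""
    return Counter(inv.weight_class for inv in involvements if is_induced_exposure(inv))
```

The counts per weight class then serve as the denominator ("how often the vehicle is there") against which fatal involvements can be compared.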

The first of the two approaches attempts to estimate the percentage change in fatality risk per 100 induced-exposure crashes, using logistic regression and accident data from 11 states. The second approach attempts to estimate the percentage change in induced-exposure crashes per 1000 vehicle registration years (a more commonly accepted exposure measure), employing a 2-step aggregate linear regression (explained subsequently) and using state accident and Polk registration data. The purpose of the second analysis is to estimate whether, and to what extent, size-related bias in fatality rates could have been introduced by using the indirect, induced-exposure measure.[2]
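The logic of the second approach (see also note 2) can be sketched as follows: compute induced-exposure crashes per 1000 registration years for groups of vehicles, fit the logarithm of that rate against curb weight, and read off the percentage change per 100 lb. A slope near zero suggests that induced exposure is an unbiased surrogate across weights. This is a minimal sketch under simplifying assumptions: it uses a single predictor, omits the driver age and sex controls and the 2-step procedure used in the study, and takes hypothetical `(avg_curb_weight_lb, induced_exposure_crashes, registration_years)` tuples as input.

```python
import math


def ols_slope(xs, ys):
    """Ordinary least squares slope of ys on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


def exposure_bias_per_100_lb(cells):
    """% change in induced-exposure crashes per 1000 registration years per 100 lb.

    cells: list of (avg_curb_weight_lb, induced_exposure_crashes, registration_years).
    A result near 0 means the ratio is flat across weights (no apparent bias).
    """
    xs = [weight / 100 for weight, _, _ in cells]  # weight in hundreds of lb
    ys = [math.log(1000 * ie / reg_years) for _, ie, reg_years in cells]
    return 100 * (math.exp(ols_slope(xs, ys)) - 1)
```

Fitting the logarithm of the rate makes the slope interpretable as a constant percentage change per 100 lb, which is the form in which the bias is discussed above.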

Unfortunately, in Kahane's view, neither analysis proved successful. The first approach produced results that he judged to be biased for light trucks (Kahane, p. 87). The second approach showed a statistically significant but practically insignificant bias for passenger cars but, in his view, a strong bias for light trucks: even after controlling for driver age and sex as well as other variables, the second regression still showed a significant decline in induced-exposure crash rates per 1000 light truck registration years (Kahane, p. 121).

The third and final analysis strategy, which the author believes produces more reliable results, uses a combination approach. It estimates the percentage change in fatality risk for a 100-lb. change in vehicle weight using the more commonly accepted exposure measure, vehicle registration years. Most of the control variables, however, are still drawn from the induced-exposure accident data, the only source for this information. Because many of the control variables, particularly driver age and sex, are not defined for individual vehicle registration years, the registration and fatality data are aggregated by vehicle make-model and model year. For each make-model and model year, the number of fatalities is divided by the total vehicle registration years to obtain a fatality rate, the dependent variable. Then the average age of drivers, the sex of drivers, the percentage of fatalities occurring at night, and so on are computed from the induced-exposure crash involvements for each make-model and model year, defining each independent variable. Finally, an aggregate linear regression is performed on the fatality rates and the independent variables (Kahane, pp. 110, 112, and 164).
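The aggregation step described above can be sketched in Python as follows: fatalities and registration years are combined into a fatality rate per million registration years for each make-model and model year, and the control variables are averaged over that cell's induced-exposure involvements. The input layouts (a list of fatality keys, a registration-years dictionary, and involvement dictionaries with hypothetical `driver_age`, `driver_male`, and `night` fields) are illustrative stand-ins for the FARS, Polk, and state accident extracts.

```python
from collections import Counter, defaultdict


def aggregate_cells(fatalities, registration_years, ie_involvements):
    """Build one row per (make_model, model_year) cell.

    fatalities: iterable of (make_model, model_year), one entry per fatality
    registration_years: dict (make_model, model_year) -> registration years
    ie_involvements: iterable of dicts with hypothetical keys
        make_model, model_year, driver_age, driver_male (bool), night (bool)
    """
    fatal_counts = Counter(fatalities)
    ie_by_cell = defaultdict(list)
    for inv in ie_involvements:
        ie_by_cell[(inv["make_model"], inv["model_year"])].append(inv)

    cells = {}
    for key, reg_years in registration_years.items():
        invs = ie_by_cell.get(key)
        if not invs or reg_years <= 0:
            continue  # no control-variable data (or no exposure) for this cell
        n = len(invs)
        cells[key] = {
            # dependent variable: fatalities per million registration years
            "fatality_rate": 1e6 * fatal_counts[key] / reg_years,
            # independent variables averaged over induced-exposure involvements
            "mean_driver_age": sum(i["driver_age"] for i in invs) / n,
            "pct_male": 100 * sum(i["driver_male"] for i in invs) / n,
            "pct_night": 100 * sum(i["night"] for i in invs) / n,
        }
    return cells
```

The resulting rows (one per cell) are what the aggregate regression is run on; any standard least-squares routine could be used for that step.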

Two additional adjustments are made. First, to avoid the problem of small cell sizes with numerous independent variables, the regression analysis is performed in two steps. Step 1 estimates the effect of vehicle age, state group, and calendar year on fatalities per million vehicle registration years (Kahane, p. 164).[3] Vehicle registration counts are then adjusted upward or downward based on the coefficients from Step 1. Step 2 reaggregates the data by make-model and model year, with further aggregation across model years, body styles, and/or make-models until a minimum cell size of 334,600 vehicle registration years is reached.[4] Finally, estimates are made of the effect of curb weight (and track width for rollover crashes), driver age and sex, and the remaining control variables on fatalities per million adjusted vehicle registration years.
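A rough sketch of the data handling in the two-step procedure follows: registration years are scaled by a Step 1 adjustment factor, then adjacent cells are pooled until each pooled cell reaches 334,600 adjusted registration years. The factor function, the dictionary fields, and the greedy pooling order are assumptions for illustration only; the study's own pooling rules (across model years, body styles, and/or make-models) are more specific than this.

```python
MIN_CELL = 334_600  # minimum adjusted registration years per pooled cell


def adjust_registrations(cells, step1_factor):
    """Scale registration years by a Step 1 factor.

    cells: list of dicts with 'key', 'reg_years', 'fatalities', 'weight' (lb)
    step1_factor: hypothetical function key -> multiplicative adjustment
    """
    return [dict(c, reg_years=c["reg_years"] * step1_factor(c["key"])) for c in cells]


def _new_pool():
    return {"keys": [], "reg_years": 0.0, "fatalities": 0, "weighted_weight": 0.0}


def _add(pool, c):
    pool["keys"].append(c["key"])
    pool["reg_years"] += c["reg_years"]
    pool["fatalities"] += c["fatalities"]
    pool["weighted_weight"] += c["weight"] * c["reg_years"]


def merge_to_min_size(cells, min_size=MIN_CELL):
    """Greedily pool adjacent cells (in the given order) until each reaches min_size."""
    pooled, current = [], _new_pool()
    for c in cells:
        _add(current, c)
        if current["reg_years"] >= min_size:
            pooled.append(current)
            current = _new_pool()
    if current["keys"]:
        if pooled:  # fold an undersized remainder into the last pooled cell
            last = pooled[-1]
            last["keys"] += current["keys"]
            for field in ("reg_years", "fatalities", "weighted_weight"):
                last[field] += current[field]
        else:
            pooled.append(current)
    for p in pooled:
        # registration-weighted mean curb weight and fatality rate per pooled cell
        p["mean_weight"] = p["weighted_weight"] / p["reg_years"]
        p["rate_per_million"] = 1e6 * p["fatalities"] / p["reg_years"]
    return pooled
```

Weighting curb weight by registration years keeps the pooled cell's weight representative of the vehicles actually on the road in that cell.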

A second adjustment is made because aggregating the data at the make-model, model-year level introduces a high correlation between the independent variables representing vehicle weight and driver age. The coefficients for the age and gender effects from the earlier regressions (i.e., fatalities per 1000 induced-exposure crashes and induced-exposure crashes per 1000 vehicle years) are summed to provide the net effect per car registration year for each crash type, and these net effects are substituted into the estimating equation.[5]
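The two sets of coefficients can be summed because the rates chain together: fatalities per induced-exposure crash multiplied by induced-exposure crashes per registration year gives fatalities per registration year, so on a logarithmic scale the effects add. A minimal sketch, assuming both coefficients are log-scale effects (as from a logistic or log-linear model) of the same variable; the function name is hypothetical and the coefficient values in the check are made up.

```python
import math


def net_effect_pct(coef_fatal_per_ie, coef_ie_per_reg_year):
    """Combine log-scale coefficients from the two earlier regressions.

    Returns the % change in fatalities per registration year for a
    one-unit change in the variable (e.g., one year of driver age).
    """
    net = coef_fatal_per_ie + coef_ie_per_reg_year  # logs of chained rates add
    return 100 * (math.exp(net) - 1)
```

For example, a 10 percent increase in fatalities per induced-exposure crash combined with a 5 percent decrease in induced-exposure crashes per registration year nets out to 1.10 × 0.95 = 1.045, a 4.5 percent increase per registration year.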

The final task of the study is to apply the percentage changes in the fatality rate as estimated by the third analysis approach to the absolute numbers of fatalities in a baseline year--1993 in this analysis--to obtain estimates of the effects of 100-lb. reductions in vehicle weight on the absolute numbers of fatalities.
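The final projection is simple arithmetic: each crash type's estimated percentage change in the fatality rate per 100-lb. reduction is multiplied by that crash type's baseline-year fatality count, and the results are summed. The function name, crash types, percentages, and counts in the check below are placeholders, not the study's estimates.

```python
def projected_fatality_change(baseline_fatalities, pct_change_per_100lb):
    """Per-crash-type and net change in fatalities for a 100-lb. reduction.

    baseline_fatalities: crash type -> fatalities in the baseline year (e.g., 1993)
    pct_change_per_100lb: crash type -> % change in the fatality rate per 100-lb. reduction
    """
    by_type = {
        crash_type: count * pct_change_per_100lb[crash_type] / 100
        for crash_type, count in baseline_fatalities.items()
    }
    return by_type, sum(by_type.values())
```

Keeping the per-crash-type results separate shows where increases in some crash types are offset by decreases in others before the net figure is reported.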


Partyka's analyses focus on the relationship between vehicle weight and moderate to serious occupant injuries. The primary data source is the National Accident Sampling System (NASS), which provides information on a nationally representative sample of police-reported crashes involving at least one towed passenger car or light truck. Injuries are categorized by severity based on the Abbreviated Injury Scale (AIS) and include moderate to severe injuries (AIS ≥ 2) or a fatality.[6]

The injury analyses differ from Kahane's analysis of fatalities in the following ways (Table A-1). First, to assure an adequate sample size, the time frame is extended to cover crashes from CY 1981 through 1993, a 13-year rather than a 5-year period, and vehicles of all model years are included. Second, the injury analyses look only at vehicle crashworthiness (not at the likelihood of being in a crash) and focus only on vehicle occupant injuries in these crashes. As a result, they exclude crash effects on pedestrians and bicyclists. The analysis that is compared with the Kahane study in the summary review (Analysis 1)[7] also excludes rollover crashes, where the primary effect of vehicle weight is on crash proneness, not crashworthiness.[8] Thus, the injury analyses do not provide as comprehensive an assessment of the net safety effects of reductions (or increases) in vehicle weight as does the Kahane study. Finally, the injury analyses simply regress driver injury rates on vehicle weight without controlling for driver age, vehicle speed, or type of highway. Differences in vehicle use, however, are discussed in one analysis (Analysis 2), Passenger Vehicle Weight and Driver Injury Severity, and are the primary topic of another, Patterns of Driver Age, Sex, and Belt Use by Car Weight.

The objective of the injury analyses is to estimate the relationship between vehicle curb weight and driver injury risk for passenger cars and light trucks of all model years involved in towaway crashes during CY 1981 through 1993. In Analysis 1, injury rates are defined at the crash level (i.e., the number of injured drivers divided by the total number of towaway crashes, by crash type, such as nonrollover single-vehicle, car with heavy truck, and car with car). In Analysis 2, injury rates are defined at the driver level (i.e., the number of injured drivers divided by the total number of drivers involved in towaway crashes, by crash type). In both cases, simple linear regression is used to estimate the effect of changes in vehicle weight on injury rates.
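The two rate definitions and the regression described above can be sketched as follows. The definitions differ only in the denominator: towaway crashes for Analysis 1, drivers involved for Analysis 2. The function names and the grouped input (one row per weight group within a crash type) are hypothetical, and the least-squares fit is the ordinary textbook formula rather than the study's own code.

```python
def crash_level_rate(injured_drivers, towaway_crashes):
    """Analysis 1: injured drivers per towaway crash (within one crash type)."""
    return injured_drivers / towaway_crashes


def driver_level_rate(injured_drivers, drivers_involved):
    """Analysis 2: injured drivers per driver involved (within one crash type)."""
    return injured_drivers / drivers_involved


def simple_linear_regression(xs, ys):
    """Ordinary least squares fit of y = a + b * x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope
```

With curb weight in pounds as `xs`, the slope `b` is the change in the injury rate per pound, so `-100 * b` is the change for a 100-lb. reduction.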

A final task is to apply the injury rate coefficients from the regressions to the national-level NASS estimates for baseline year 1993 to project the effects of 100-lb. reductions in vehicle weight on the absolute numbers of moderate to serious injuries. In Analysis 1, driver injury rates are assumed to be representative of all vehicle occupant injury rates and are thus used to estimate the effects of vehicle weight reductions on total vehicle occupant moderate to serious injuries for 1993.
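One way to turn a linear-regression slope into a projected change in injuries is sketched below: the slope gives the change in the driver injury rate for a 100-lb. reduction, dividing by the baseline rate gives a relative change, and, following the Analysis 1 assumption that driver rates represent all occupants, that relative change is applied to baseline-year occupant injuries. The function and its inputs are hypothetical, and the exact sequence of calculations in the NHTSA analysis may differ.

```python
def projected_injury_change(slope_per_lb, baseline_rate, baseline_injuries, reduction_lb=100):
    """Projected change in moderate to serious injuries for a weight reduction.

    slope_per_lb: regression slope of the driver injury rate on curb weight (per lb)
    baseline_rate: driver injury rate for the crash type in the baseline year
    baseline_injuries: baseline-year occupant injuries for the crash type
    """
    # A negative slope means lighter vehicles have higher injury rates,
    # so removing weight raises the rate.
    delta_rate = slope_per_lb * -reduction_lb
    relative_change = delta_rate / baseline_rate
    # Assumption (as in Analysis 1): the driver's relative change applies to all occupants.
    return relative_change * baseline_injuries
```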


Hertz's analysis focuses on the relationship between vehicle weight and less serious occupant injuries, although fatalities are also included. The data are from two state accident files, Florida and Illinois. These states were selected because both experience large numbers of crashes each year, yielding large sample sizes, and both record vehicle identification numbers for crash-involved vehicles, which allows the precise identification of vehicle weight that is crucial to the analysis. Injuries are categorized by severity using the KABCO injury coding scheme and include "A" (incapacitating) injuries, which in practice comprise many minor injuries, and "K" (fatal) injuries.[9]

The Hertz analysis compares with the Kahane and Partyka analyses in the following ways (Table A-1). First, the model year coverage (1985-1993) is the same as in the Kahane analysis, although the crash year coverage is slightly different (1989 is excluded). Second, like the Partyka analyses, Hertz looks only at vehicle crashworthiness. Her reason for not addressing vehicle crash proneness or crash avoidance is the difficulty of predicting changes in driver use patterns that could affect the relationship between vehicle weight and crash risk (Hertz, p. 6). Finally, in contrast to the Partyka analyses, Hertz does attempt to control for driver age and for a surrogate measure of travel speed (rural crash location) in examining the effect of changes in vehicle weight on the likelihood of incapacitating driver injuries in a crash.[10]

The objective of the Hertz analysis is to estimate the percentage change in driver incapacitating injury rates in crashes of MY 1985-1993 passenger cars and light trucks during CY 1990-1993 as a function of vehicle weight, controlling for driver age and travel speed. Logistic regression is used as the analysis technique.
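A compact illustration of the technique: a logistic regression fitted by Newton-Raphson to grouped counts (injured drivers out of drivers in each cell), with curb weight in hundreds of pounds and a rural-location indicator standing in for the study's controls. This is a from-scratch sketch, not Hertz's model specification; the cell layout, variable coding, and function names are assumptions, and a statistics package would normally be used instead. The weight coefficient is converted to a percentage change in the odds of injury for a 100-lb. reduction, which approximates the change in the injury rate when incapacitating injuries are rare.

```python
import math


def solve(a, b):
    """Solve a small linear system a x = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_logistic(rows, iterations=25):
    """Maximum likelihood logistic regression for grouped binomial data.

    rows: list of (features, injured, total); features exclude the intercept.
    Returns [intercept, coef_1, coef_2, ...].
    """
    k = len(rows[0][0]) + 1
    beta = [0.0] * k
    for _ in range(iterations):
        grad = [0.0] * k
        info = [[0.0] * k for _ in range(k)]  # Fisher information X'WX
        for feats, injured, total in rows:
            x = [1.0] + list(feats)
            eta = sum(bj * xj for bj, xj in zip(beta, x))
            p = 1 / (1 + math.exp(-eta))
            w = total * p * (1 - p)
            for i in range(k):
                grad[i] += (injured - total * p) * x[i]
                for j in range(k):
                    info[i][j] += w * x[i] * x[j]
        step = solve(info, grad)  # Newton step: (X'WX)^-1 X'(y - n p)
        beta = [b + s for b, s in zip(beta, step)]
    return beta


def pct_change_per_100_lb(beta_weight_per_100_lb):
    """% change in the odds of injury for a 100-lb. weight reduction."""
    return 100 * (math.exp(-beta_weight_per_100_lb) - 1)
```

A negative weight coefficient (heavier vehicles, lower injury odds) yields a positive percentage, that is, an increase in injuries when weight is removed.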

A final step in the analysis is to apply the percentage change in incapacitating injury rates to national-level estimates for baseline year 1993[11] to project the effects of 100-lb. reductions in vehicle weight on the absolute number of incapacitating injuries. In these calculations, the estimated change in the injury probability for the driver is assumed to be the same for all vehicle occupants, and thus is used to estimate the effects of vehicle weight reductions on total vehicle occupant incapacitating injuries.


  1. The more traditional measure of induced exposure is a nonculpable driver of a vehicle involved in a multivehicle collision. The Kahane paper uses a narrower definition that does not depend on a police determination of culpability: there is no need to assess culpability for a vehicle that was standing still and was hit by another vehicle.

  2. According to Kahane, the strategy is to compute the incidence rate of the questionable exposure measure (induced-exposure crashes) relative to a more commonly accepted exposure measure (vehicle years) as a function of vehicle weight, controlling, to the extent possible, for driver age and sex. If the ratio of induced exposure to vehicle years is constant across vehicle weights, then induced exposure may be considered an unbiased surrogate for exposure. If the ratio moves up or down as weight increases, the amount by which it moves is a measure of the bias (Kahane, p. 102).

  3. In this step of the analysis, the FARS and Polk data are aggregated by state group, calendar year, and model year.

  4. This level of aggregation should assure a minimum of 5 expected fatalities per unit of aggregation, the minimum suggested sample size (Kahane, p. 167).

  5. These regressions had shortcomings (biases, insufficient sample sizes) for accurately estimating the relatively weak weight-safety relationship, but, according to the author, they can provide a sufficiently accurate estimate of the much stronger age-safety relationship (Kahane, p. 173).

  6. The seven categories of the AIS are as follows: (1) minor injury, (2) moderate injury, (3) serious injury, (4) severe injury, (5) critical injury, (6) maximum (untreatable), and (7) injured, unknown severity.

  7. This analysis is entitled Effect of Vehicle Weight on Crash-Level Driver Injury Rates.

  8. One of the analyses, Passenger Vehicle Weight and Driver Injury Severity, does examine rollover crashes but does not find a statistically significant relationship between vehicle weight reduction and injury rates for this type of crash.

  9. The KABCO injury scale includes the following categories: (1) K=fatal injury, (2) A=incapacitating injury, (3) B=non-incapacitating injury, and (4) C=possible injury.

  10. A driver gender variable was also included, but was dropped because it never proved statistically significant. A variable accounting for seat belt use would also have been desirable. However, reported belt use in crashes is not considered to be reliable. To minimize the number of injuries that could be attributed to lack of belt use, ejected drivers were eliminated from the Florida data; ejected drivers were not identifiable in the Illinois data (Hertz, p. 8).

  11. NHTSA's General Estimates System (GES) provides the baseline estimates for 1993. GES data are obtained from a nationally representative probability sample selected from all police-reported crashes. To be eligible for the GES sample, a police accident report must be completed for the crash, and the crash must involve at least one motor vehicle traveling on a highway and result in property damage, injury, or death.

Return to Report on Review of Federal Estimates of the Relationship of Vehicle Weight to Fatality and Injury Risk