Productivity and Local Workforce Composition

Empirical approach

We estimate the relationship between productivity and local workforce characteristics using a gross output Cobb-Douglas production function augmented with area-level workforce composition measures,

`GO_(it)=phi_(it)^A+beta_j^KK_(it)+beta_j^LL_(it)+beta_j^MM_(it)+(lambda_i+alpha_(jt)+epsilon_(it))`, (1)

where i denotes a firm, t denotes a time period, and j indicates parameters that vary by industry. Output (`GO_(it)`), capital services (`K_(it)`), labour input (`L_(it)`), and intermediate consumption (`M_(it)`) are all measured in logarithms. The error term potentially has a firm component (`lambda_i`), an industry-by-period component (`alpha_(jt)`), and an idiosyncratic component (`epsilon_(it)`). The first term (`phi_(it)^A`) is the Hicks-neutral contribution to productivity in period t of the characteristics of the area (`A_i`) in which firm i operates. This contribution enters as a linear combination of local workforce measures,

`phi_(it)^A=gamma^(Dens)[text(% Population Density)]_(it)^A+gamma^(HS)[text(% degree qualified)]_(it)^A+gamma^(New)[text(% New to Area)]_(it)^A+gamma^(Mig)[text(% Foreign-Born)]_(it)^A+e_(it)`, (2)

We use annual production data, combined with area information that is available only every five years. Consequently, we estimate equation (1) in two stages. In the first stage, we estimate productivity using an annual firm-level panel, omitting area characteristics. We estimate a separate regression for each industry, allowing for errors clustered at the firm level.
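
As an illustration of the first stage, the sketch below fits a separate production function for each industry and keeps the residuals as a multi-factor productivity measure. It is a minimal sketch under stated assumptions, not the estimation code used for this paper: it uses pooled ordinary least squares with an intercept, the record keys (`firm`, `year`, `industry`, `go`, `k`, `l`, `m`) and all function names are hypothetical, and firm-clustered standard errors are omitted for brevity.

```python
from collections import defaultdict


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the inputs are left untouched.
    aug = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    x = [0.0] * n
    for row in reversed(range(n)):
        tail = sum(aug[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (aug[row][n] - tail) / aug[row][row]
    return x


def ols_residuals(rows, y):
    """Return (coefficients, residuals) from OLS of y on the given rows."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    beta = solve_linear_system(xtx, xty)
    resid = [yi - sum(b * v for b, v in zip(beta, r)) for r, yi in zip(rows, y)]
    return beta, resid


def first_stage_mfp(records):
    """Fit one regression per industry; return {(firm, year): residual}.

    Each record is a dict with hypothetical keys: firm, year, industry,
    and the logged values go, k, l, m.
    """
    by_industry = defaultdict(list)
    for rec in records:
        by_industry[rec["industry"]].append(rec)
    mfp = {}
    for recs in by_industry.values():
        # Intercept plus log capital, labour, and intermediate inputs.
        rows = [[1.0, r["k"], r["l"], r["m"]] for r in recs]
        y = [r["go"] for r in recs]
        _, resid = ols_residuals(rows, y)
        for r, e in zip(recs, resid):
            mfp[(r["firm"], r["year"])] = e
    return mfp
```

Because each industry is fitted separately with its own intercept, the residuals average to zero within every industry, so the second stage compares firms relative to their industry's estimated technology.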

In the second stage, we regress the residuals from the first-stage regression (multi-factor productivity) on the right-hand-side terms of equation (2). The second-stage regression is estimated using five-yearly firm-level data, with separate intercepts for industry and for year. We allow for area-clustered errors, since the area-level characteristics are common to all firms with the same geographic distribution (Moulton, 1990).[1]
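
To see why clustering matters when a regressor is shared by all firms in an area, the sketch below compares a heteroskedasticity-robust variance with a cluster-robust variance for the slope of a one-regressor regression. It is a simplified illustration, assuming a single area-level regressor, no industry or year intercepts, and no finite-sample correction; the function name and the toy data are hypothetical.

```python
from collections import defaultdict


def slope_with_variances(x, y, clusters):
    """OLS slope of y on x (with intercept) and two variance estimates.

    Returns (slope, robust_var, cluster_var), where robust_var treats every
    observation as independent and cluster_var sums the scores within each
    cluster before squaring.
    """
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    x_dev = [xi - x_bar for xi in x]
    sxx = sum(d * d for d in x_dev)
    slope = sum(d * (yi - y_bar) for d, yi in zip(x_dev, y)) / sxx
    intercept = y_bar - slope * x_bar
    resid = [yi - intercept - slope * xi for xi, yi in zip(x, y)]

    # Score contribution of each observation to the slope estimate.
    scores = [d * e for d, e in zip(x_dev, resid)]
    robust_var = sum(s * s for s in scores) / sxx ** 2

    # Add up scores within each cluster, then square the cluster totals.
    cluster_sums = defaultdict(float)
    for c, s in zip(clusters, scores):
        cluster_sums[c] += s
    cluster_var = sum(s * s for s in cluster_sums.values()) / sxx ** 2
    return slope, robust_var, cluster_var


# Toy data: three areas, two firms each, regressor constant within area.
x = [0, 0, 1, 1, 2, 2]
y = [1, 2, 6, 7, 3, 5]
areas = ["A", "A", "B", "B", "C", "C"]
slope, robust_var, cluster_var = slope_with_variances(x, y, areas)
```

In this toy data the residuals of firms in the same area share a sign, so the cluster-robust variance (0.78125) exceeds the variance that treats firms as independent (0.546875), which is the pattern Moulton (1990) warns about.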

Workforce composition is potentially endogenous, as entrants and high-skilled workers may be attracted to areas with high-productivity firms. We use an instrumental variables approach to adjust for this endogeneity. Specifically, we use five-year lags of the composition variables as instruments in the second-stage regression.
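
The lagged-instrument idea can be sketched for the simplest case of one composition variable, where the instrumental variables estimate of the slope is the ratio of two covariances. This is an illustrative simplification, not the specification estimated here: it assumes a single endogenous regressor with no other controls, and the function name and the `(firm, year)` dictionary layout are hypothetical.

```python
def lagged_iv_slope(outcome, composition, lag=5):
    """Just-identified IV slope using the lagged regressor as instrument.

    outcome and composition map (firm, year) -> value. An observation is
    used only if the firm's composition measure is also observed `lag`
    years earlier.
    """
    y, x, z = [], [], []
    for (firm, year), x_now in composition.items():
        lag_key = (firm, year - lag)
        if lag_key in composition and (firm, year) in outcome:
            y.append(outcome[(firm, year)])
            x.append(x_now)
            z.append(composition[lag_key])
    n = len(y)
    if n < 2:
        raise ValueError("need at least two observations with a lagged value")
    y_bar, x_bar, z_bar = sum(y) / n, sum(x) / n, sum(z) / n
    cov_zy = sum((zi - z_bar) * (yi - y_bar) for zi, yi in zip(z, y))
    cov_zx = sum((zi - z_bar) * (xi - x_bar) for zi, xi in zip(z, x))
    if cov_zx == 0:
        raise ValueError("instrument is uncorrelated with the regressor")
    # IV estimate: covariance of instrument with outcome over covariance
    # of instrument with the endogenous regressor.
    return cov_zy / cov_zx
```

The estimate is only as good as the instrument: the lagged composition must be correlated with current composition, and it must not be related to current productivity shocks other than through current composition.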

We also control for selected firm-level workforce characteristics that may be correlated with the area-level composition measures. Firms in areas where a high proportion of the workforce holds a degree qualification will themselves employ more highly qualified personnel. Productivity in equation (1) is estimated using a headcount measure of labour input, which is likely to understate the effective labour input used by firms in high-skilled areas. Similarly, a high proportion of people new to an area may be reflected in higher worker turnover rates for local firms, which may have an independent influence on productivity. Consequently, we augment equation (2) by adding firm-specific labour quality and turnover measures.


[1] In practice, we observe firms operating in more than one location and measure geographic variables as the firm’s average (employment-weighted) exposure to area characteristics. We cluster errors on groups of firms that share the same combination of area characteristics. Our standard errors do not allow for the variability associated with the use of generated regressors obtained from the first stage, and will therefore be somewhat understated. We generated one-step estimates for our main specifications and found that coefficients and standard errors were very similar to those obtained using our two-step procedure. On this basis, we judge that our results would be largely unchanged if we were to use one-step estimation or generate bootstrap standard errors for our two-stage estimates.
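
The employment-weighted exposure described in this note can be sketched as a weighted average of area characteristics, with weights equal to the share of the firm's employment in each area. The function name, the area labels, and the characteristic names below are hypothetical and chosen only for illustration.

```python
def firm_area_exposure(employment_by_area, area_characteristics):
    """Employment-weighted average of area characteristics for one firm.

    employment_by_area maps area -> the firm's employment in that area.
    area_characteristics maps area -> {characteristic name: value}.
    """
    total = sum(employment_by_area.values())
    if total <= 0:
        raise ValueError("firm must have positive total employment")
    exposure = {}
    for area, employment in employment_by_area.items():
        weight = employment / total
        for name, value in area_characteristics[area].items():
            exposure[name] = exposure.get(name, 0.0) + weight * value
    return exposure


# A firm with 30 employees in one area and 10 in another.
employment = {"north": 30, "south": 10}
characteristics = {
    "north": {"degree_share": 0.2, "new_to_area_share": 0.10},
    "south": {"degree_share": 0.4, "new_to_area_share": 0.30},
}
exposure = firm_area_exposure(employment, characteristics)
```

Here three quarters of the firm's employment is in the first area, so its exposure to each characteristic sits three quarters of the way towards that area's value.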