Linking Error Estimation in Stocking–Lord Linking

Robitzsch, Alexander

doi:10.3390/foundations5010002


by

Alexander Robitzsch

^1,2

¹

IPN—Leibniz Institute for Science and Mathematics Education, Olshausenstraße 62, 24118 Kiel, Germany

²

Centre for International Student Assessment (ZIB), Olshausenstraße 62, 24118 Kiel, Germany

Foundations 2025, 5(1), 2; https://doi.org/10.3390/foundations5010002

Submission received: 28 October 2024 / Revised: 20 December 2024 / Accepted: 26 December 2024 / Published: 27 December 2024

(This article belongs to the Section Mathematical Sciences)


Abstract

Stocking–Lord (SL) linking is a widely used linking method based on item response theory (IRT). This article examines the variability in SL linking parameter estimates within the two-parameter logistic (2PL) model. The uncertainty in SL linking arises from the sampling variability (standard error) and item selection (linking error), which can induce variability due to random differential item functioning (DIF). Three linking error estimation approaches are compared in this paper: the conventional jackknife linking error method, a newly developed approximate jackknife linking error method, and a Taylor approximation-based estimate. Simulation studies showed that the approximate jackknife method closely aligns with the traditional jackknife linking error method and outperforms the linking error estimation approach based on Taylor approximation. The adequacy of coverage rates for SL linking parameter estimates was also assessed using estimates of the total error. Results from a simulation study demonstrate that the bias-corrected total error provides superior coverage rates compared to both the conventional total error and the standard error, which does not account for item-related uncertainty due to random DIF.

Keywords:

Stocking–Lord linking; linking; item response model; jackknife linking error; 2PL model; standard error; linking error; total error; differential item functioning

1. Introduction

Item response theory (IRT) models [1] are multivariate statistical models for analyzing multivariate binary (i.e., dichotomous) random variables. Let X = (X_{1}, \dots, X_{I}) denote a vector of I binary random variables X_{i} \in {0, 1}, often referred to as items or (scored) item responses. A unidimensional IRT model [1] is a statistical model for the probability distribution P (X = x), given by

P (X = x; δ, γ) = \int \prod_{i = 1}^{I} [{P_{i} (θ; γ_{i})}^{x_{i}} {(1 - P_{i} (θ; γ_{i}))}^{1 - x_{i}}] ϕ (θ; μ, σ) d θ

(1)

for x = (x_{1}, \dots, x_{I}) \in {0, 1}^{I}. Here, ϕ denotes the density of the normal distribution, depending on the mean μ and the standard deviation (SD) σ; the distribution parameters of the latent variable θ, often referred to as a trait or ability variable, are contained in the vector δ = (μ, σ). The vector γ = (γ_{1}, \dots, γ_{I}) includes the item parameters for the item response functions (IRFs) P_{i} (θ; γ_{i}) = P (X_{i} = 1 | θ) (for i = 1, \dots, I). The IRF of the two-parameter logistic (2PL) model [2] is provided by

P_{i} (θ; γ_{i}) = Ψ (a_{i} (θ - b_{i})),

(2)

where a_{i} is the item discrimination and b_{i} is the item difficulty, with Ψ (x) = {(1 + exp (- x))}^{- 1} as the logistic distribution function. For independently and identically distributed observations x_{1}, \dots, x_{N} across N persons from the distribution of the random variable X, the unknown model parameters in the IRT model (1) can be consistently estimated through marginal maximum likelihood estimation (MML; [3]). Identification constraints are often required in the IRT model, as item and distribution parameters would otherwise not be statistically identified.
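The model in Equations (1) and (2) can be illustrated with a minimal sketch. The function names, the equidistant grid, and the rectangle-rule integration below are illustrative assumptions; they are not the MML algorithm, and the item values in the usage note are arbitrary.

```python
import math


def logistic(x):
    """Logistic distribution function Psi(x) = 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + math.exp(-x))


def irf_2pl(theta, a, b):
    """2PL item response function from Equation (2)."""
    return logistic(a * (theta - b))


def pattern_probability(x, a, b, mu=0.0, sigma=1.0, n_nodes=201, width=6.0):
    """Approximate P(X = x) in Equation (1) with a rectangle rule.

    The grid (n_nodes points within mu +/- width * sigma) is an
    illustrative assumption, not the quadrature used by MML software.
    """
    lower = mu - width * sigma
    step = 2.0 * width * sigma / (n_nodes - 1)
    total = 0.0
    for k in range(n_nodes):
        theta = lower + k * step
        z = (theta - mu) / sigma
        density = math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))
        # Local independence: multiply the item-wise response probabilities.
        likelihood = 1.0
        for x_i, a_i, b_i in zip(x, a, b):
            p = irf_2pl(theta, a_i, b_i)
            likelihood *= p if x_i == 1 else 1.0 - p
        total += likelihood * density * step
    return total
```

For example, `pattern_probability([1, 0], [0.73, 1.25], [-1.31, 1.44])` approximates the probability of solving the first of two items but not the second for a standard normal ability distribution.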

IRT models are commonly used to compare the performance of two groups on a test, examining differences in the factor variable θ within the IRT model (1). This article focuses on linking methods [4] in the 2PL model. In the first step of the linking method, the 2PL model can be estimated separately for each group, which accommodates the possibility of items functioning differently across groups, known as differential item functioning (DIF; [5,6]). In the second step, differences in item parameters are utilized to calculate group differences in θ through a linking method [4,7].

Random DIF [8] introduces additional variability in the estimated mean μ and the SD σ of the second group in a linking method, making the distribution parameters δ = (μ, σ) sensitive to item selection even with an infinite number of persons. This variability, known as the linking error, can be quantified to assess the stability of linking parameter estimates across selected items [9,10,11,12,13].

In a separate estimation of the 2PL model, as the first step of the linking procedure, group-specific item discriminations {\hat{a}}_{i g} and item difficulties {\hat{b}}_{i g} are obtained (i = 1, \dots, I; g = 1, 2). We arrange all estimated item parameters into vectors {\hat{a}}_{g} and {\hat{b}}_{g} for the two groups g = 1, 2. As presented in [14], the Stocking–Lord (SL) linking method utilizes the linking function H, which is defined as

H (δ, {\hat{a}}_{1}, {\hat{b}}_{1}, {\hat{a}}_{2}, {\hat{b}}_{2}) = \sum_{t = 1}^{T} ω_{t} {[\sum_{i = 1}^{I} Ψ ({\hat{a}}_{i 1} (σ θ_{t} + μ - {\hat{b}}_{i 1})) - \sum_{i = 1}^{I} Ψ ({\hat{a}}_{i 2} (θ_{t} - {\hat{b}}_{i 2}))]}^{2} .

(3)

for δ = (μ, σ). The weights ω_{t} are predetermined, and an equidistant grid of ability values θ_{t} (t = 1, \dots, T) is established, spanning from θ_{1} to θ_{T}. The weights ω_{t} may either be uniform (equal to one) or proportional to a discretized density of a normal distribution with an SD larger than 1 (e.g., an SD of 2). The linking function (3) is referred to as asymmetric Stocking–Lord linking because it aligns the test characteristic function (TCF) from the first group with that of the second group.

The distribution parameters μ and σ can be estimated by minimizing the linking function H:

\hat{δ} = (\hat{μ}, \hat{σ}) = \underset{δ}{arg min} H (δ, {\hat{a}}_{1}, {\hat{b}}_{1}, {\hat{a}}_{2}, {\hat{b}}_{2}) .

(4)

The estimating equations for μ and σ are derived by calculating the partial derivatives of H with respect to μ and σ. The estimate \hat{δ} matches the true parameter δ in the absence of sampling errors and random DIF. This property validates the use of the linking function H in SL linking.
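The minimization in Equation (4) can be sketched as follows. The Gauss–Newton iteration with step halving, the starting values (0, 1), and all function names are assumptions made for illustration; the article does not state which optimizer was used.

```python
import math


def psi(x):
    """Logistic distribution function."""
    return 1.0 / (1.0 + math.exp(-x))


def sl_criterion(mu, sigma, a1, b1, a2, b2, grid, weights):
    """Linking function H from Equation (3)."""
    h = 0.0
    for theta, w in zip(grid, weights):
        tcf1 = sum(psi(a * (sigma * theta + mu - b)) for a, b in zip(a1, b1))
        tcf2 = sum(psi(a * (theta - b)) for a, b in zip(a2, b2))
        h += w * (tcf1 - tcf2) ** 2
    return h


def sl_link(a1, b1, a2, b2, grid, weights, max_iter=100, tol=1e-10):
    """Minimize H over (mu, sigma) by Gauss-Newton steps with step halving."""
    mu, sigma = 0.0, 1.0
    h = sl_criterion(mu, sigma, a1, b1, a2, b2, grid, weights)
    for _ in range(max_iter):
        g_mu = g_sigma = h_mm = h_ms = h_ss = 0.0
        for theta, w in zip(grid, weights):
            # Residual: TCF of group 1 (transformed) minus TCF of group 2.
            resid = -sum(psi(a * (theta - b)) for a, b in zip(a2, b2))
            slope = 0.0  # derivative of the residual with respect to mu
            for a, b in zip(a1, b1):
                p = psi(a * (sigma * theta + mu - b))
                resid += p
                slope += a * p * (1.0 - p)
            g_mu += w * resid * slope
            g_sigma += w * resid * slope * theta
            h_mm += w * slope * slope
            h_ms += w * slope * slope * theta
            h_ss += w * slope * slope * theta * theta
        # Solve the 2 x 2 Gauss-Newton system in closed form.
        det = h_mm * h_ss - h_ms * h_ms
        step_mu = -(h_ss * g_mu - h_ms * g_sigma) / det
        step_sigma = -(h_mm * g_sigma - h_ms * g_mu) / det
        scale, accepted = 1.0, False
        while scale > 1e-8:
            new_mu = mu + scale * step_mu
            new_sigma = sigma + scale * step_sigma
            if new_sigma > 0.0:
                new_h = sl_criterion(new_mu, new_sigma, a1, b1, a2, b2, grid, weights)
                if new_h <= h:
                    accepted = True
                    break
            scale *= 0.5
        if not accepted:
            break
        converged = abs(new_mu - mu) + abs(new_sigma - sigma) < tol
        mu, sigma, h = new_mu, new_sigma, new_h
        if converged:
            break
    return mu, sigma
```

Without DIF and sampling error, the second-group parameters identified under a standard normal ability distribution are σ a_{i} and (b_{i} - μ) / σ, because a_{i} (σ θ + μ - b_{i}) = σ a_{i} (θ - (b_{i} - μ) / σ). Passing such parameters to `sl_link` should return the generating values of μ and σ up to numerical tolerance.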

This article is devoted to the computation of the linking error in SL linking. To the best of our knowledge, there have been no attempts in the literature to derive a linking error estimation approach for SL linking. This article fills this gap by deriving an estimate based on M-estimation theory [15]. Moreover, linking error estimation can alternatively be carried out by applying the jackknife approach to items [10]. The jackknife linking error approach can be computationally intensive for the iterative SL linking method for a moderate to large number of items. For this reason, an approximation of the jackknife linking error is proposed in this article. Finally, bias-corrected linking error estimates are derived which have improved behavior in small sample sizes. The adequacy of the different error estimation approaches is compared through empirical coverage rates of \hat{μ} and \hat{σ} obtained from SL linking.

The rest of this article is organized as follows: Section 2 discusses the estimation of the standard error, linking error, and total error for SL linking; in Section 3, we present findings from a simulation study conducted with an infinite number of persons; Section 4 reports results from a simulation study that examines finite sample sizes; finally, the article closes with a discussion in Section 5 and conclusions in Section 6.

2. Assessing the Standard Error, Linking Error, and Total Error in Stocking–Lord Linking

The SL linking method can generally be formulated as the estimation problem

\hat{δ} = \underset{δ}{arg min} H (δ, \hat{γ}),

(5)

where H is the linking function used in SL linking (see (3)), δ = (μ, σ) is the vector of linking parameters μ and σ that represent the mean and the SD of the second group, the vector \hat{δ} = (\hat{μ}, \hat{σ}) contains the estimated linking parameters obtained from SL linking, and the vector \hat{γ} contains all estimated item parameters from separate estimation of the 2PL model in the first step of the linking method. By using partial derivatives H_{δ} = (\partial H) / (\partial δ), the parameter estimate \hat{δ} from (5) fulfills the nonlinear equation

H_{δ} (\hat{δ}, \hat{γ}) = 0 .

(6)

For a general linking function H, the true parameter vector δ must satisfy the population estimation equation H_{δ} (δ, γ) = 0, where the item parameters γ are assumed to be invariant across the two groups and free from sampling errors.

This article addresses estimation of the uncertainty of the estimated mean \hat{μ} and SD \hat{σ} in the parameter vector δ. This uncertainty arises from two main sources during group comparisons: the sampling of persons (as represented in the scenario where N \to \infty), and the selection or modeling of items (in the context of I \to \infty). The variability in linking parameter estimates due to the selection of items is a consequence of the presence of random DIF.

Much of the existing literature on linking addresses uncertainty in \hat{γ} by calculating the standard error (SE) of \hat{δ} for a fixed number of items [9,16,17,18]. In this situation, the variability in \hat{γ} is attributed to the sampling variability present in the estimated item parameters {\hat{γ}}_{i}. Variance estimation for \hat{δ} operates under the assumption that N \to \infty.

Estimated item parameters {\hat{γ}}_{i} for item i correspond to a population analog γ_{i} when considering an infinite sample size of persons. However, in the presence of model error (e.g., random DIF), the estimated linking parameter \hat{δ} is influenced by the chosen set of items even with an infinite sample size of persons N. This variability stemming from item selection due to DIF is referred to as the linking error (LE, [10,19]). Variance estimation operates under the assumption of I \to \infty.

It has been shown that linking error estimates also contain portions of sampling error [20]. In detail, the variance estimate V_{LE} for the linking error can be decomposed into

V_{LE} = V_{{LE}_{bc}} + f (Var (\hat{γ})),

(7)

where f (Var (\hat{γ})) is the expected value of the variance contribution due to the sampling variance Var (\hat{γ}) in estimated item parameters \hat{γ}. If an estimate \hat{f} (Var (\hat{γ})) for this term were available, a bias-corrected variance estimate V_{{LE}_{bc}} for the linking error could be obtained from (7) as

V_{{LE}_{bc}} = V_{LE} - \hat{f} (Var (\hat{γ})) .

(8)

This article proposes a bias-corrected linking error estimate for SL linking.

The total error (TE) comprises both sources of uncertainty, that is, the standard error caused by randomness due to persons and the linking error caused by randomness (or random DIF) due to items [9,11,19,21,22]. The conventional estimate of the variance matrix for the total error is provided by

V_{TE} = V_{SE} + V_{LE} .

(9)

A bias-corrected variance estimate of the total error is provided by

V_{{TE}_{bc}} = V_{SE} + V_{{LE}_{bc}} .

(10)

In the next subsections, we discuss estimation of the SE, LE, and TE for SL linking.

2.1. Standard Error Estimation

First, we review the SE estimation of the linking parameter estimate \hat{δ} in SL linking. The treatment is based on the delta method, and can be found in the literature [16,17,23,24]. Let V_{\hat{γ}} = Var (\hat{γ}) be the variance matrix of the vector of estimated item parameters \hat{γ}. A Taylor approximation of the partial derivative H_{δ} of the linking function H provides

0 = H_{δ} (\hat{δ}, \hat{γ}) = H_{δ} (δ, γ) + H_{δ δ} (δ, γ) (\hat{δ} - δ) + H_{δ γ} (δ, γ) (\hat{γ} - γ) .

(11)

Because H_{δ} (δ, γ) = 0 holds, solving (11) for \hat{δ} - δ yields

\hat{δ} - δ = A_{SE} (\hat{γ} - γ), where A_{SE} = - H_{δ δ} {(δ, γ)}^{- 1} H_{δ γ} (δ, γ) .

(12)

Hence, the variance of \hat{δ} due to sampling persons can be written as

V_{SE} = Var (\hat{δ}) = A_{SE} V_{\hat{γ}} A_{SE}^{⊤} .

(13)

An estimate of the variance is obtained by

V_{SE} = Var (\hat{δ}) = {\hat{A}}_{SE} V_{\hat{γ}} {\hat{A}}_{SE}^{⊤}, where {\hat{A}}_{SE} = - H_{δ δ} {(\hat{δ}, \hat{γ})}^{- 1} H_{δ γ} (\hat{δ}, \hat{γ}) .

(14)

Standard error estimates for \hat{μ} or \hat{σ} can be obtained by taking the square root of the corresponding diagonal elements in V_{SE}.

The variance matrix V_{SE} correctly assesses the variability in \hat{δ} if no random DIF occurs. In the presence of random DIF, linking errors must also be considered in order to correctly reflect the uncertainty in the linking parameter estimate. The estimation of these errors is discussed in the following sections.

2.2. Linking Error Estimation Based on Taylor Approximation

The assessment of linking errors for linking methods was investigated using M-estimation theory in [19,20]. This theory has been developed for linking functions that are additive regarding the dependence on items, that is,

H_{δ} (δ, \hat{γ}) = \sum_{i = 1}^{I} h_{δ} (δ, {\hat{γ}}_{i}) .

(15)

However, SL linking does not allow such an additive decomposition. For this reason, we derive a variance estimate of the linking error in SL linking that relies on Taylor approximation. We use the notation

P_{i} (θ; γ_{i g}) = Ψ (a_{i g} (θ - b_{i g})) .

(16)

The estimating function in SL linking can be written as

H_{δ} (δ, \hat{γ}) = (\begin{matrix} H_{μ} (δ, \hat{γ}) \\ H_{σ} (δ, \hat{γ}) \end{matrix}) = \sum_{t = 1}^{T} ω_{t} \sum_{i = 1}^{I} Z_{i t} (δ, \hat{γ}) D_{I, t} (δ, \hat{γ}) = 0,

(17)

where we use the abbreviations

Z_{i t} (δ, \hat{γ}) = P_{i} (σ θ_{t} + μ; {\hat{γ}}_{i 1}) - P_{i} (θ_{t}; {\hat{γ}}_{i 2}) and

(18)

D_{I, t} (δ, \hat{γ}) = (\begin{matrix} D_{μ, I, t} (δ, \hat{γ}) \\ D_{σ, I, t} (δ, \hat{γ}) \end{matrix}) = (\begin{matrix} \frac{1}{I} \sum_{i = 1}^{I} a_{i} P_{i}^{'} (σ θ_{t} + μ; {\hat{γ}}_{i 1}) \\ \frac{1}{I} \sum_{i = 1}^{I} a_{i} θ_{t} P_{i}^{'} (σ θ_{t} + μ; {\hat{γ}}_{i 1}) \end{matrix}) .

(19)

Now, we define the stochastic limits

D_{t} (δ, \hat{γ}) = (\begin{matrix} D_{μ, t} (δ, \hat{γ}) \\ D_{σ, t} (δ, \hat{γ}) \end{matrix}) = \underset{I \to \infty}{plim} D_{I, t} (δ, \hat{γ}), where

(20)

D_{μ, t} (δ, \hat{γ}) = \underset{I \to \infty}{plim} D_{μ, I, t} (δ, \hat{γ}) and D_{σ, t} (δ, \hat{γ}) = \underset{I \to \infty}{plim} D_{σ, I, t} (δ, \hat{γ}) .

(21)

We utilize a Taylor expansion of H_{δ} around δ and replace D_{I, t} with D_{t} in (17), yielding

H_{δ} (\hat{δ}, \hat{γ}) = H_{δ} (δ, \hat{γ}) + H_{δ δ} (δ, \hat{γ}) (\hat{δ} - δ) = 0, where

(22)

H_{δ δ} (δ, \hat{γ}) = I \sum_{t = 1}^{T} ω_{t} (\begin{matrix} D_{μ, I, t}^{2} (δ, \hat{γ}) & D_{μ, I, t} (δ, \hat{γ}) D_{σ, I, t} (δ, \hat{γ}) \\ D_{μ, I, t} (δ, \hat{γ}) D_{σ, I, t} (δ, \hat{γ}) & D_{σ, I, t}^{2} (δ, \hat{γ}) \end{matrix}) .

(23)

From (22), we obtain

\hat{δ} - δ = - H_{δ δ} {(δ, \hat{γ})}^{- 1} H_{δ} (δ, \hat{γ}) .

(24)

The linking error can be derived from the variance matrix corresponding to the linear Taylor expansion (24) as

Var (\hat{δ}) = H_{δ δ} {(δ, \hat{γ})}^{- 1} Var (H_{δ} (δ, \hat{γ})) H_{δ δ} {(δ, \hat{γ})}^{- ⊤} .

(25)

Note that we can rewrite (17) approximately as

H_{δ} (δ, \hat{γ}) = \sum_{i = 1}^{I} \sum_{t = 1}^{T} ω_{t} Z_{i t} (δ, \hat{γ}) D_{I, t} (δ, \hat{γ}),

(26)

and Var (H_{δ} (δ, \hat{γ})) can be evaluated as the empirical variance of the random variables

V_{i} (\hat{δ}, \hat{γ}) = \sum_{t = 1}^{T} ω_{t} Z_{i t} (\hat{δ}, \hat{γ}) D_{I, t} (\hat{δ}, \hat{γ}) for i = 1, \dots, I .

(27)

Hence, we finally obtain the variance matrix V_{LE, TAY} due to the linking error as

V_{LE, TAY} = \frac{I}{I - 1} H_{δ δ} {(\hat{δ}, \hat{γ})}^{- 1} Var (H_{δ} (\hat{δ}, \hat{γ})) H_{δ δ} {(\hat{δ}, \hat{γ})}^{- ⊤} .

(28)

Note that the finite-sample correction factor I / (I - 1) > 1 is included in (28) to improve the parameter coverage performance of confidence interval estimates in finite samples (see [19,20,25]). The linking error estimates can be obtained as the square root of the corresponding diagonal entries in V_{LE, TAY}.

2.3. Jackknife Linking Error

The jackknife method as a general method for estimating the linking error has been discussed in [10,11]. The main advantage of the jackknife linking error is that it is generally applicable to any linking method, even in concurrent calibration or fixed item parameter calibration [26]. The idea behind the application of this technique to linking is to remove each item i = 1, \dots, I in turn from the linking procedure and obtain the parameter estimate {\hat{δ}}_{(- i)} from the remaining I - 1 items. This estimate differs slightly from the linking parameter estimate \hat{δ} that involves all items. The variance estimate of the linking error based on the jackknife technique is defined as

V_{LE, JK} = \frac{I}{I - 1} \sum_{i = 1}^{I} ({\hat{δ}}_{(- i)} - \hat{δ}) {({\hat{δ}}_{(- i)} - \hat{δ})}^{⊤} .

(29)

Note that the factor I / (I - 1) > 1 in (29) differs from the originally proposed factor (I - 1) / I < 1 in [10,27]. Here, we chose the former one because it yielded more satisfactory coverage rates for linking parameter estimates. Notably, the relevance of this factor diminishes with an increasing number of items I.
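The leave-one-item-out computation in Equation (29) can be sketched for an arbitrary linking estimator. In the sketch below, `jackknife_variance` is a hypothetical helper name, and `mean_mean_shift` is a deliberately simple stand-in estimator (the mean difference of item difficulties); it is not SL linking and only serves to keep the example short.

```python
def jackknife_variance(estimator, items):
    """Jackknife variance matrix with the factor I / (I - 1) from Equation (29).

    `estimator` maps a list of items to a list of parameter estimates;
    `items` holds one entry per item (any structure the estimator accepts).
    """
    n_items = len(items)
    full = estimator(items)
    dim = len(full)
    v = [[0.0] * dim for _ in range(dim)]
    for i in range(n_items):
        # Re-estimate the linking parameters without item i.
        reduced = estimator(items[:i] + items[i + 1:])
        dev = [r - f for r, f in zip(reduced, full)]
        for p in range(dim):
            for q in range(dim):
                v[p][q] += dev[p] * dev[q]
    factor = n_items / (n_items - 1)
    return [[factor * x for x in row] for row in v]


def mean_mean_shift(items):
    """Stand-in estimator (an assumption for illustration, not SL linking):
    mean difference of item difficulties between the two groups."""
    diffs = [b1 - b2 for b1, b2 in items]
    return [sum(diffs) / len(diffs)]
```

The linking error for a parameter is then the square root of the corresponding diagonal entry, for example `jackknife_variance(mean_mean_shift, items)[0][0] ** 0.5`. For an iterative method such as SL linking, each call to the estimator is a full re-estimation, which is the cost the approximate jackknife in Section 2.4 avoids.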

The variance estimate V_{LE, JK} contains additional variability due to sampling errors. This variance portion can be removed by the double jackknife method [11,21]. An alternative and less computationally demanding method for removing the bias contribution for the jackknife linking error is presented in Section 2.5.

2.4. Approximate Jackknife Linking Error

The jackknife linking error requires recomputing the linking method I times (i.e., once per item). Depending on the complexity of the linking method (e.g., if the estimation needs iterations), the jackknife estimate might be computationally cumbersome. To avoid this, the jackknife parameter estimates can be approximated by closed-form updates that involve only one iteration. We use the notation from Section 2.2 and define

Z_{i t} (\hat{δ}) = P_{i} (\hat{σ} θ_{t} + \hat{μ}; {\hat{γ}}_{i 1}) - P_{i} (θ_{t}; {\hat{γ}}_{i 2}) .

(30)

The linking parameter estimate \hat{δ} based on all items solves the estimating equation

H_{δ} (\hat{δ}, \hat{γ}) = \sum_{t = 1}^{T} ω_{t} \sum_{i = 1}^{I} Z_{i t} (\hat{δ}, \hat{γ}) D_{I, t} (\hat{δ}, \hat{γ}) = 0 .

(31)

The parameter estimate {\tilde{δ}}_{(- i)} obtained by removing item i approximately solves the equation

\sum_{t = 1}^{T} ω_{t} \sum_{\binom{j = 1}{j \neq i}}^{I} Z_{j t} ({\tilde{δ}}_{(- i)}, \hat{γ}) D_{I, t} (\hat{δ}, \hat{γ}) = 0 .

(32)

In (32), we assume that D_{I, t} is a constant that does not depend on δ. Next, we apply a Taylor expansion of (32) around \hat{δ} and use the identity due to (31)

\sum_{t = 1}^{T} ω_{t} \sum_{\binom{j = 1}{j \neq i}}^{I} Z_{j t} (\hat{δ}, \hat{γ}) D_{I, t} (\hat{δ}, \hat{γ}) = - \sum_{t = 1}^{T} ω_{t} Z_{i t} (\hat{δ}, \hat{γ}) D_{I, t} (\hat{δ}, \hat{γ}) .

(33)

Applying the Taylor formula to (32) yields

0 = - c_{i} + B_{i} ({\tilde{δ}}_{(- i)} - \hat{δ}), where

(34)

c_{i} = \sum_{t = 1}^{T} ω_{t} Z_{i t} (\hat{δ}, \hat{γ}) D_{I, t} (\hat{δ}, \hat{γ}) and

(35)

B_{i} = \sum_{t = 1}^{T} ω_{t} [\sum_{\binom{j = 1}{j \neq i}}^{I} a_{j 1} P_{j}^{'} (\hat{σ} θ_{t} + \hat{μ}; {\hat{γ}}_{j 1})] (\begin{matrix} D_{μ, I, t} (\hat{δ}, \hat{γ}) \\ D_{σ, I, t} (\hat{δ}, \hat{γ}) \end{matrix}) (\begin{matrix} 1 & θ_{t} \end{matrix}) .

(36)

Finally, we obtain

{\tilde{δ}}_{(- i)} - \hat{δ} = B_{i}^{- 1} c_{i} .

(37)

In this way, we obtain the variance estimate of the approximate jackknife procedure as

V_{LE, AJK} = \frac{I}{I - 1} \sum_{i = 1}^{I} ({\tilde{δ}}_{(- i)} - \hat{δ}) {({\tilde{δ}}_{(- i)} - \hat{δ})}^{⊤} = \frac{I}{I - 1} \sum_{i = 1}^{I} B_{i}^{- 1} c_{i} c_{i}^{⊤} B_{i}^{- ⊤} .

(38)

Because the derivatives of the linking function H are often easy to obtain, the approximate jackknife method for estimating the linking error is computationally much cheaper than exact jackknife linking error estimation.
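Equations (35) to (38) translate into a single pass over the items once \hat{δ} is available. The following sketch is one possible reading of these formulas, with P_{j}^{'} taken as the derivative of the logistic function so that a_{j 1} P_{j}^{'} equals a_{j 1} P (1 - P); the function name and argument layout are assumptions, and the estimate (mu, sigma) is assumed to have been obtained beforehand by minimizing H.

```python
import math


def psi(x):
    """Logistic distribution function."""
    return 1.0 / (1.0 + math.exp(-x))


def approx_jackknife_variance(mu, sigma, a1, b1, a2, b2, grid, weights):
    """V_{LE,AJK} from Equation (38), evaluated at a given estimate (mu, sigma)."""
    n_items, n_grid = len(a1), len(grid)
    z = [[0.0] * n_grid for _ in range(n_items)]      # Z_it, Equation (30)
    slope = [[0.0] * n_grid for _ in range(n_items)]  # a_i1 * P_i'
    for i in range(n_items):
        for t, theta in enumerate(grid):
            p1 = psi(a1[i] * (sigma * theta + mu - b1[i]))
            p2 = psi(a2[i] * (theta - b2[i]))
            z[i][t] = p1 - p2
            slope[i][t] = a1[i] * p1 * (1.0 - p1)
    total_slope = [sum(slope[i][t] for i in range(n_items)) for t in range(n_grid)]
    d_mu = [s / n_items for s in total_slope]               # D_{mu,I,t}
    d_sigma = [theta * d for theta, d in zip(grid, d_mu)]   # D_{sigma,I,t}

    v = [[0.0, 0.0], [0.0, 0.0]]
    for i in range(n_items):
        c_mu = c_sigma = b11 = b12 = b21 = b22 = 0.0
        for t in range(n_grid):
            w, theta = weights[t], grid[t]
            c_mu += w * z[i][t] * d_mu[t]        # c_i, Equation (35)
            c_sigma += w * z[i][t] * d_sigma[t]
            rest = total_slope[t] - slope[i][t]  # sum over items j != i
            b11 += w * rest * d_mu[t]            # B_i, Equation (36)
            b12 += w * rest * d_mu[t] * theta
            b21 += w * rest * d_sigma[t]
            b22 += w * rest * d_sigma[t] * theta
        # Closed-form 2 x 2 solve of B_i * dev = c_i, Equation (37).
        det = b11 * b22 - b12 * b21
        dev_mu = (b22 * c_mu - b12 * c_sigma) / det
        dev_sigma = (b11 * c_sigma - b21 * c_mu) / det
        v[0][0] += dev_mu * dev_mu
        v[0][1] += dev_mu * dev_sigma
        v[1][0] += dev_sigma * dev_mu
        v[1][1] += dev_sigma * dev_sigma
    factor = n_items / (n_items - 1)
    return [[factor * x for x in row] for row in v]
```

No re-estimation is needed inside the item loop, so the cost grows linearly in the number of items and grid points. When the two test characteristic functions agree item by item (no DIF and no sampling error), all Z_{i t} vanish and the returned matrix is zero up to rounding.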

2.5. Bias-Corrected Approximate Jackknife Linking Error

The jackknife and approximate jackknife approaches to linking error estimation are also affected by sampling errors due to the sampling of persons, which positively biases these variance estimates. In the (approximate) jackknife technique, the expected value of the square of the deviations {\hat{δ}}_{(- i)} - \hat{δ} seeks to estimate the square of the deviation δ_{(- i)} - δ. We observe that \hat{δ} and {\hat{δ}}_{(- i)} solve the nonlinear equations

H_{δ} (\hat{δ}, \hat{γ}) = 0 and H_{δ, (- i)} ({\hat{δ}}_{(- i)}, \hat{γ}) = 0,

(39)

where H_{δ, (- i)} denotes the vector of partial derivatives of the linking function from which the ith item is removed. We can use a Taylor expansion of H_{δ} and H_{δ, (- i)}, yielding

\hat{δ} - δ = U (\hat{γ} - γ) with U = - H_{δ δ} {(\hat{δ}, \hat{γ})}^{- 1} H_{δ γ} (\hat{δ}, \hat{γ}) and

(40)

{\hat{δ}}_{(- i)} - δ_{(- i)} = U_{(- i)} (\hat{γ} - γ) with U_{(- i)} = - H_{δ δ, (- i)} {({\hat{δ}}_{(- i)}, \hat{γ})}^{- 1} H_{δ γ, (- i)} ({\hat{δ}}_{(- i)}, \hat{γ}) .

(41)

Then, from Equations (40) and (41) we can derive

[{\hat{δ}}_{(- i)} - \hat{δ}] - [δ_{(- i)} - δ] = (U_{(- i)} - U) (\hat{γ} - γ) .

(42)

Equation (42) shows that the estimated deviations differ from their population counterparts by a term that is linear in the sampling error \hat{γ} - γ. This suggests a bias correction based on the variance induced by the sampling variability associated with \hat{γ}. Consequently, the variance bias can be expressed as

V_{Bias} = \frac{I}{I - 1} \sum_{i = 1}^{I} Var ({\hat{δ}}_{(- i)} - \hat{δ}) = \frac{I}{I - 1} \sum_{i = 1}^{I} (U_{(- i)} - U) V_{\hat{γ}} {(U_{(- i)} - U)}^{⊤},

(43)

where V_{\hat{γ}} = Var (\hat{γ}). Equation (43) can be utilized to compute the bias-corrected jackknife estimates

V_{{LE}_{bc}, JK} = V_{LE, JK} - V_{Bias} and V_{{LE}_{bc}, AJK} = V_{LE, AJK} - V_{Bias} .

(44)

The corresponding linking error estimates can also be obtained by taking the square root of the corresponding diagonal elements in the variance matrices V_{{LE}_{bc}, JK} or V_{{LE}_{bc}, AJK}. In the case of negative variance estimates, the linking error estimate is defined to be zero. Notably, Equation (44) allows the bias-corrected linking error to be estimated accurately.

3. Simulation Study 1: Infinite Sample Size

3.1. Method

In Simulation Study 1, only item parameters and no item responses were generated for each replication, as the focus was on a scenario with an infinite sample size. Hence, uncertainty was only caused by DIF in item parameters, not by sampling errors due to the sampling (or choice) of persons. The 2PL model for two groups served as the IRF in the data-generating model (DGM). To ensure identification, the mean and SD of the factor variable θ for the first group were fixed at 0 and 1, respectively. For the second group, the mean μ and the SD σ were used to represent group differences, with the fixed values of μ = 0.3 and σ = 1.2 applied throughout all simulation conditions in the DGM.

We carried out the simulation study for I = 10, 20, 40, and 80 items. Group-specific parameters a_{i g} and b_{i g} for each item for groups g = 1, 2 were based on fixed base item parameters and random DIF effects that were newly simulated in each replication of the simulation. The item parameters used in this simulation study align with those in [19,20]. For the case of I = 10 items, base item discrimination values a_{i 0} were set at 0.73, 1.25, 1.20, 1.47, 0.97, 1.38, 1.05, 1.14, 1.15, and 0.67, resulting in a mean M = 1.101 and an SD of 0.257 for the item discriminations. The base item difficulties b_{i 0} were set to −1.31, 1.44, −1.20, 0.10, 0.10, −0.74, 1.48, −0.61, 0.82, and −0.07, yielding a mean item difficulty of M = 0.001 and an SD of 1.002. For the larger item numbers, which were all multiples of 10, the parameters of the ten items were repeated accordingly. Group-specific item difficulties b_{i g} (g = 1, 2) were then simulated as follows:

b_{i 1} = b_{i 0} - e_{i} / 2 and b_{i 2} = b_{i 0} + e_{i} / 2,

(45)

where e_{i} represents a random uniform DIF effect. Note that e_{i} = b_{i 2} - b_{i 1} defines the uniform DIF effect as the difference in item difficulties between the two groups. Group-specific item discriminations a_{i g} (g = 1, 2) were generated as

a_{i 1} = a_{i 0} exp (- f_{i} / 2) and a_{i 2} = a_{i 0} exp (f_{i} / 2),

(46)

where f_{i} introduces a second random DIF effect. This nonuniform DIF effect f_{i} is defined as the difference in the logarithms of item discriminations, that is, f_{i} = log a_{i 2} - log a_{i 1}. In the simulation, the DIF effects e_{i} and f_{i} were modeled as uncorrelated, with both having a mean of zero. Both DIF effects were normally distributed. The SD of the DIF effects was set to τ for e_{i} and 0.3 \cdot τ for f_{i}. In this simulation study, the DIF SDs τ for e_{i} (DIF in item difficulties) were simulated with values of 0.20, 0.40, and 0.60, resulting in respective DIF SDs for f_{i} (DIF in log-transformed item discriminations) of 0.06, 0.12, and 0.18.
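The data-generating steps in Equations (45) and (46) can be sketched as follows. The function name, the use of Python's `random.Random` generator, and the seed in the usage note are assumptions for illustration; the replication materials of the study are written in R.

```python
import math
import random

# Base item parameters for I = 10 items, as listed in the text.
BASE_A = [0.73, 1.25, 1.20, 1.47, 0.97, 1.38, 1.05, 1.14, 1.15, 0.67]
BASE_B = [-1.31, 1.44, -1.20, 0.10, 0.10, -0.74, 1.48, -0.61, 0.82, -0.07]


def simulate_item_parameters(n_items, tau, rng):
    """Draw group-specific 2PL item parameters with random DIF.

    tau is the SD of the uniform DIF effect e_i; the nonuniform DIF
    effect f_i has SD 0.3 * tau. Both effects are independent normal
    draws with mean zero.
    """
    if n_items % len(BASE_A) != 0:
        raise ValueError("n_items must be a multiple of 10")
    reps = n_items // len(BASE_A)
    a1, b1, a2, b2 = [], [], [], []
    for a0, b0 in zip(BASE_A * reps, BASE_B * reps):
        e = rng.gauss(0.0, tau)        # uniform DIF, Equation (45)
        f = rng.gauss(0.0, 0.3 * tau)  # nonuniform DIF, Equation (46)
        b1.append(b0 - e / 2.0)
        b2.append(b0 + e / 2.0)
        a1.append(a0 * math.exp(-f / 2.0))
        a2.append(a0 * math.exp(f / 2.0))
    return a1, b1, a2, b2
```

A call such as `simulate_item_parameters(20, 0.4, random.Random(1))` returns one replication for I = 20 items. By construction, b_{i 2} - b_{i 1} = e_{i} and log a_{i 2} - log a_{i 1} = f_{i}, so the base parameters are recovered as the arithmetic mean of the difficulties and the geometric mean of the discriminations.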

The identified item parameters in the infinite-size sample were computed as

{\hat{a}}_{i 1} = a_{i 1}, {\hat{b}}_{i 1} = b_{i 1}, {\hat{a}}_{i 2} = σ a_{i 2}, and {\hat{b}}_{i 2} = σ^{- 1} (b_{i 2} - μ) .

(47)

SL linking was applied to these obtained item parameters. We compared three methods to assess the linking error. First, we applied the jackknife (JK) method as described in Section 2.3. Second, the approximate jackknife (AJK) method (see Section 2.4) was utilized. Third, the linking error was computed based on Taylor approximation (TAY), as described in Section 2.2.

In each of the 3 (DIF SD τ) × 4 (number of items I) = 12 cells of the simulation, 4000 replications were performed. The adequacy of the alternative linking error estimation methods was assessed for the estimated mean \hat{μ} and estimated SD \hat{σ}. We evaluated the coverage rate for \hat{μ} and \hat{σ} at a 95% confidence level, using the normal distribution to calculate the percentage of instances in which the confidence interval included the true values of μ = 0.3 or σ = 1.2, respectively.
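The coverage criterion can be written down compactly. The sketch below assumes that each replication provides a point estimate and an error estimate (linking error, standard error, or total error); the function name and the rounded normal quantile 1.959964 are illustrative choices.

```python
def coverage_rate(estimates, errors, true_value, z=1.959964):
    """Percentage of replications whose normal-theory confidence interval
    estimate +/- z * error contains the true parameter value."""
    hits = 0
    for est, err in zip(estimates, errors):
        if est - z * err <= true_value <= est + z * err:
            hits += 1
    return 100.0 * hits / len(estimates)
```

For instance, `coverage_rate(mu_hats, le_hats, 0.3)` would give the coverage rate for \hat{μ} in one simulation cell, where `mu_hats` and `le_hats` are hypothetical lists collected over the replications.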

All analyses for this simulation study were conducted using R statistical software (Version 4.4.1; [28]). Dedicated R functions were written to carry out this simulation study. These functions, along with replication materials for the study, are available at https://osf.io/4dr2c (accessed on 28 October 2024).

3.2. Results

Table 1 displays the coverage rates at the 95% confidence level for the mean μ and SD σ in an infinite sample size across different SDs τ of the random DIF and different numbers of items I. For both μ and σ, the JK and AJK methods yielded nearly identical coverage rates, consistently around the target level of 95%; the largest difference between JK and AJK in the coverage rates was 0.2 percentage points for σ in the condition τ = 0.4 and I = 10. The TAY method showed lower coverage rates in conditions of higher DIF (e.g., τ = 0.4 or 0.6), indicating inferior performance of this method in these simulation conditions. Coverage rates for the TAY method decreased with increasing DIF SD τ for both the μ and σ parameters, which was especially evident at τ = 0.6, where the coverage fell below 92.5%. This contrasts with the JK and AJK methods, which maintained closer adherence to the 95% target rate. As the number of items increased from I = 10 items to I = 80 items, the differences in the coverage rates between the three methods became smaller. These findings suggest that a larger number of items improved the quality of linking error estimation based on the Taylor method.

Figure 1 visually represents the results from Table 1 for the AJK and TAY methods. The JK method is not displayed in this figure, as its results closely mirrored those of the AJK method. Overall, Figure 1 underscores that the AJK method provides more reliable coverage across various DIF SD and item number conditions. In contrast, TAY is less precise, with substantial undercoverage, especially in scenarios with a large DIF SD and a small number of items.

4. Simulation Study 2: Finite Sample Sizes

4.1. Method

Simulation Study 2 extends Simulation Study 1 to finite sample sizes. In this case, the variability of the estimated group mean \hat{μ} and estimated SD \hat{σ} arises from the sampling of persons (i.e., standard error) and random DIF (i.e., linking error), resulting in a total error that quantifies the overall uncertainty.

Item responses were generated based on the 2PL model for the two groups, with the tests containing I = 10, 20, or 40 items. The item parameters matched those used in Simulation Study 1 (see Section 3). The factor variable θ followed a normal distribution in both groups. As in Simulation Study 1, the θ distribution of the first group had a mean of 0 and an SD of 1. For the second group, the θ distribution had a fixed mean μ = 0.3 and a fixed SD σ = 1.2. We modeled random DIF effects using a normal distribution, setting the DIF SD τ of item difficulties to 0, 0.3, and 0.6. We employed the same DGM for item parameters as in Simulation Study 1 (see Section 3, Equations (45) and (46)). The DIF SD in log-transformed item discriminations was chosen as 0.3 \cdot τ, resulting in SDs of 0, 0.09, and 0.18. Note that the condition τ = 0 corresponds to a scenario with no DIF. Sample sizes of N = 500, 1000, 2000, and 4000 were selected to reflect typical sample sizes in medium- to large-scale testing applications of the 2PL model.

Unlike in Simulation Study 1, the item parameters for the 2PL model were estimated separately for each group using MML estimation in the first step. In the second step, SL linking was performed. Because of the findings in Simulation Study 1, we only evaluated the approximate jackknife (AJK) method for linking error estimation, yielding the estimation method “LE” for the linking error (see Section 2.4). Moreover, we probed the bias-correction method for estimating the linking error based on the approximate jackknife technique, denoted by “{LE}_{bc}” (see Section 2.5). Standard errors (SE) were computed based on the delta method as described in Section 2.1. Based on these quantities, the overall uncertainty was quantified by the total error TE and the bias-corrected total error {TE}_{bc} defined by

TE = \sqrt{{SE}^{2} + {LE}^{2}} and {TE}_{bc} = \sqrt{{SE}^{2} + {LE}_{bc}^{2}} .

(48)
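As a small numerical illustration of Equation (48), the sketch below combines a standard error with a linking error and applies the truncation at zero that Section 2.5 prescribes for negative bias-corrected variance estimates. The function names and the numbers in the usage note are hypothetical.

```python
import math


def bias_corrected_le(v_le, v_bias):
    """Bias-corrected linking error from variance components.

    Negative variance estimates are set to zero before taking the root.
    """
    return math.sqrt(max(0.0, v_le - v_bias))


def total_error(se, le):
    """Total error from Equation (48): sqrt(SE^2 + LE^2)."""
    return math.sqrt(se ** 2 + le ** 2)
```

With SE = 0.03, a linking error variance of 0.0025, and an estimated bias of 0.0009, the bias-corrected linking error is 0.04 and `total_error(0.03, 0.04)` gives a bias-corrected total error of 0.05, compared with an uncorrected total error of about 0.058.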

In all, 4000 replications were conducted in each of the 3 (DIF SD τ) × 3 (number of items I) × 4 (sample size N) = 36 cells of the simulation. In our analysis of this simulation study, we calculated the median of the linking error estimates LE and {LE}_{bc} for \hat{μ} and \hat{σ}. We opted for the median instead of the mean in order to carefully investigate the performance of the estimate in the scenario with no DIF. As in Simulation Study 1, we mainly focused on assessing the adequacy of coverage rates for the estimates \hat{μ} and \hat{σ}, which were based on the standard error SE, the uncorrected total error TE, and the bias-corrected total error {TE}_{bc}.

All analyses in this simulation study were conducted using R software (Version 4.4.1; [28]). The 2PL model was estimated with the sirt::xxirt() function from the R package sirt (Version 4.2-73; [29]). Dedicated R functions were written for SL linking estimation and the computation of the standard, linking, and total errors. Replication materials for Simulation Study 2 are available at https://osf.io/4dr2c (accessed on 28 October 2024).

4.2. Results

Table 2 shows median estimates of the linking error LE and bias-corrected linking error ${LE}_{bc}$ for the group mean $μ$ and group SD $σ$ in finite sample settings. The results are analyzed across various conditions of DIF SD $τ$, number of items I, and sample sizes N. Both median LE and ${LE}_{bc}$ values increased with a higher amount of DIF (i.e., with increasing DIF SD $τ$), which was particularly noticeable with smaller sample sizes and fewer items. For example, the LE for $μ$ at $τ = 0.6$ was markedly higher than for $τ = 0$ (0.215 versus 0.053 for $I = 10$ and $N = 500$), suggesting that the DIF SD strongly impacted the linking error, as is known from the literature. Across all conditions, the bias-corrected linking error ${LE}_{bc}$ was generally lower than the corresponding LE values, illustrating that the bias correction technique was effective in reducing the bias during estimation of the linking error LE under the no-DIF condition ($τ = 0$). The difference between LE and ${LE}_{bc}$ was more pronounced with smaller numbers of items and smaller sample sizes. As the sample size or the number of items increased, the median LE and ${LE}_{bc}$ estimates decreased, demonstrating that larger samples and more items result in smaller linking errors due to random DIF.

Table 3 presents coverage rates at the 95% confidence level for the mean $μ$ and SD $σ$ across different DIF SDs $τ$, item numbers I, and sample sizes N. We compared three error estimation approaches in all conditions: the standard error (SE), total error (TE), and bias-corrected total error (${TE}_{bc}$). As the DIF SD $τ$ increased from 0 to 0.6, coverage rates for SE consistently decreased, particularly for large sample sizes. For instance, when $I = 10$ and $N = 500$, the SE coverage rate for $μ$ dropped from 95.4% at $τ = 0$ to 57.1% at $τ = 0.6$, indicating that coverage rates based on SE are inadequate under conditions with large random DIF. In contrast, coverage rates based on TE and ${TE}_{bc}$ were more adequate, with the bias-corrected total error ${TE}_{bc}$ typically showing the best performance. For example, with $τ = 0.6$, $I = 10$, and $N = 500$, the coverage rate of ${TE}_{bc}$ for $σ$ was 92.8%, while SE coverage was substantially lower at 81.8% and TE had an inflated coverage rate of 97.7%. This suggests that ${TE}_{bc}$ is more resilient to a larger extent of random DIF. Overall, Table 3 highlights that while SE and TE coverage rates showed substantial undercoverage and overcoverage, respectively, ${TE}_{bc}$ maintained adequate coverage rates across various simulation conditions.
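Using the rule from the note to Table 3 (coverage rates smaller than 92.5 or larger than 97.5 are flagged), the pattern described above can be checked for individual cells. The snippet is a small sketch. The helper name `flag_coverage` is hypothetical, and the values are copied from the Table 3 rows for $I = 10$, $N = 500$, and $τ = 0.6$.

```python
def flag_coverage(rate: float, lower: float = 92.5, upper: float = 97.5) -> str:
    """Classify a 95% coverage rate using the thresholds from the table note."""
    if rate < lower:
        return "undercoverage"
    if rate > upper:
        return "overcoverage"
    return "adequate"


# Coverage rates (in %) from Table 3 for I = 10, N = 500, tau = 0.6.
cells = {
    "mu":    {"SE": 57.1, "TE": 94.8, "TE_bc": 93.8},
    "sigma": {"SE": 81.8, "TE": 97.7, "TE_bc": 92.8},
}

flags = {par: {method: flag_coverage(rate) for method, rate in row.items()}
         for par, row in cells.items()}
for par, row in flags.items():
    print(par, row)
```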

Finally, Figure 2 summarizes the coverage rates for the SE, TE, and ${TE}_{bc}$ error estimation methods across all simulation conditions in the form of a boxplot. The boxplot shows how each error estimation method performed in maintaining accurate 95% coverage rates under varying simulation conditions. The ${TE}_{bc}$ method generally showed the most consistent coverage close to the desired 95% coverage rate across all conditions. This finding indicates the effectiveness of the ${TE}_{bc}$ estimation method for adequately assessing the impact of random DIF in linking parameter estimates $\hat{μ}$ and $\hat{σ}$. Both the SE and TE estimation methods displayed more variability in their coverage rates, with SE often undershooting the target, particularly under higher DIF conditions. This suggests that SE and TE are less reliable in accurately estimating the total amount of uncertainty in linking parameter estimates.

5. Discussion

This article has examined error estimation for Stocking–Lord linking in dichotomous responses, focusing on sampling errors from person sampling and linking errors from random differential item functioning. By introducing alternatives to the computationally demanding jackknife method, we offer two methods—the approximate jackknife approach and a Taylor approximation-based approach—that present efficient substitutes without substantially compromising accuracy. Our results reveal that the approximate jackknife approach performs almost identically to the traditional method, providing a practical tool for scenarios requiring reduced computational overhead.

Moreover, we have developed a bias-corrected linking error estimate that improves reliability, particularly with smaller sample sizes or fewer items. This approach significantly mitigates the influence of sampling error, yielding more robust linking error estimates. Coverage rates based on the bias-corrected total error surpassed those obtained with the conventional total error, highlighting its value for enhancing accuracy in educational testing and other psychometric applications.

However, this study has certain limitations. The error estimation methods were only examined for dichotomous items, suggesting a need for future extensions to polytomous items. Moreover, the linking error estimation technique for Stocking–Lord linking was applied only to the 2PL model, although it can also be used for group comparisons involving the one-parameter, three-parameter, or four-parameter logistic IRT models. Additionally, the studied methods have yet to account for dependent item responses, such as those clustered in testlets. Exploring linking error estimation in multidimensional IRT models presents another promising area for future work. Addressing these limitations could expand the applicability and effectiveness of linking and error estimation methods, especially for complex assessment designs.

As noted by an anonymous reviewer, the group comparison focused solely on two parameters: the mean difference $μ$, and the relative change in standard deviation $σ$. However, other distributional changes, such as quantiles or the proportion of individuals exceeding a specific cutoff value, may also be of interest. When the $θ$ variable follows a normal distribution, these additional parameters are nonlinear functions of $μ$ and $σ$, and their associated error estimates can be derived using the delta method.
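As an illustration of this point, the proportion exceeding a cutoff $c$ under normality is $1 - Φ((c - μ)/σ)$, and its error follows from the gradient with respect to $(μ, σ)$. The following Python sketch applies the delta method to this quantity. The function name, the cutoff, and all numbers are hypothetical. The error covariance matrix of $(\hat{μ}, \hat{σ})$ must be supplied by the user, and the diagonal matrix in the example is a simplifying assumption.

```python
import math
from statistics import NormalDist


def exceedance_with_error(mu, sigma, cov, cutoff):
    """Return P(theta > cutoff) under N(mu, sigma^2) and its delta-method error.

    cov is the 2 x 2 error covariance matrix of (mu_hat, sigma_hat),
    which is assumed to be available from the linking step.
    """
    std_normal = NormalDist()
    z = (cutoff - mu) / sigma
    prop = 1.0 - std_normal.cdf(z)
    # Gradient of 1 - Phi((c - mu) / sigma) with respect to (mu, sigma).
    d_mu = std_normal.pdf(z) / sigma
    d_sigma = std_normal.pdf(z) * z / sigma
    grad = (d_mu, d_sigma)
    # Quadratic form grad' * cov * grad.
    var = sum(grad[i] * cov[i][j] * grad[j]
              for i in range(2) for j in range(2))
    return prop, math.sqrt(var)


# Made-up inputs: squared errors on the diagonal, zero covariance.
cov = [[0.05 ** 2, 0.0],
       [0.0, 0.04 ** 2]]
prop, err = exceedance_with_error(mu=0.3, sigma=1.2, cov=cov, cutoff=1.0)
print(round(prop, 3), round(err, 3))
```

Using squared total errors on the diagonal of `cov` would carry the item-related uncertainty over to the derived parameter, again under the assumptions stated above.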

Driven by the premise that items exhibiting DIF should not be excluded from group comparisons, this article has examined linking errors under random DIF. Excluding such items risks altering the meaning of the construct, as DIF items are considered construct-relevant. In contrast, much of the existing literature assumes a subset of items with fixed DIF, with most items remaining unaffected. Under this assumption, failing to exclude DIF items could lead to biased group comparisons. However, this approach treats DIF items as construct-irrelevant, an assumption that we argue is rarely justifiable in empirical contexts.

This paper has focused on comparing two groups, without specifying the precise meaning of the grouping variable. This variable could represent demographic categories such as gender, country, or socioeconomic background; alternatively, it could define groups based on experimental treatments. The proposed methodology is also applicable to comparisons across different grades or time points.

In summary, our findings suggest that the approximate jackknife and bias-corrected linking error estimates offer valuable, efficient alternatives for Stocking–Lord linking. These contributions could enhance both the precision and practicality of linking error estimation, with implications for various psychometric and educational measurement contexts.

6. Conclusions

This article presents an efficient approach for estimating bias-corrected linking error and total error in Stocking–Lord linking. Two simulation studies demonstrate the necessity of bias correction to avoid inflated coverage rates in linking parameter estimates. The jackknife linking error estimate for Stocking–Lord linking is implemented in a computationally efficient manner, eliminating the need for repeated minimization of the linking function.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

This article only uses simulated datasets. Replication material for creating the simulated datasets in Simulation Study 1 (Section 3) and Simulation Study 2 (Section 4) can be found at https://osf.io/4dr2c (accessed on 28 October 2024).

Conflicts of Interest

The author declares no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:

2PL	two-parameter logistic
AJK	approximate jackknife
DGM	data-generating model
IRF	item response function
IRT	item response theory
JK	jackknife
MML	marginal maximum likelihood
LE	linking error
SD	standard deviation
SE	standard error
SL	Stocking–Lord
TCF	test-characteristic function
TE	total error

References

1. Yen, W.M.; Fitzpatrick, A.R. Item response theory. In Educational Measurement; Brennan, R.L., Ed.; Praeger Publishers: Westport, CT, USA, 2006; pp. 111–154.
2. Birnbaum, A. Some latent trait models and their use in inferring an examinee’s ability. In Statistical Theories of Mental Test Scores; Lord, F.M., Novick, M.R., Eds.; MIT Press: Reading, MA, USA, 1968; pp. 397–479.
3. Bock, R.D.; Aitkin, M. Marginal maximum likelihood estimation of item parameters: Application of an EM algorithm. Psychometrika 1981, 46, 443–459.
4. Kolen, M.J.; Brennan, R.L. Test Equating, Scaling, and Linking; Springer: New York, NY, USA, 2014.
5. Holland, P.W.; Wainer, H. (Eds.) Differential Item Functioning: Theory and Practice; Lawrence Erlbaum: Hillsdale, NJ, USA, 1993.
6. Penfield, R.D.; Camilli, G. Differential item functioning and item bias. In Handbook of Statistics; Rao, C.R., Sinharay, S., Eds.; Elsevier: Amsterdam, The Netherlands, 2007; Volume 26, pp. 125–167.
7. Sansivieri, V.; Wiberg, M.; Matteucci, M. A review of test equating methods with a special focus on IRT-based approaches. Statistica 2017, 77, 329–352.
8. De Boeck, P. Random item IRT models. Psychometrika 2008, 73, 533–559.
9. Battauz, M. Multiple equating of separate IRT calibrations. Psychometrika 2017, 82, 610–636.
10. Monseur, C.; Berezner, A. The computation of equating errors in international surveys in education. J. Appl. Meas. 2007, 8, 323–335.
11. Robitzsch, A. Robust and nonrobust linking of two groups for the Rasch model with balanced and unbalanced random DIF: A comparative simulation study and the simultaneous assessment of standard errors and linking errors with resampling techniques. Symmetry 2021, 13, 2198.
12. Sachse, K.A.; Roppelt, A.; Haag, N. A comparison of linking methods for estimating national trends in international comparative large-scale assessments in the presence of cross-national DIF. J. Educ. Meas. 2016, 53, 152–171.
13. Wu, M. Measurement, sampling, and equating errors in large-scale assessments. Educ. Meas. 2010, 29, 15–27.
14. Stocking, M.L.; Lord, F.M. Developing a common metric in item response theory. Appl. Psychol. Meas. 1983, 7, 201–210.
15. Boos, D.D.; Stefanski, L.A. Essential Statistical Inference; Springer: New York, NY, USA, 2013.
16. Andersson, B. Asymptotic variance of linking coefficient estimators for polytomous IRT models. Appl. Psychol. Meas. 2018, 42, 192–205.
17. Jewsbury, P.A. Generally applicable variance estimation methods for common-population linking. J. Educ. Behav. Stat. 2024; Epub ahead of print.
18. Zhang, Z. Asymptotic standard errors of generalized partial credit model true score equating using characteristic curve methods. Appl. Psychol. Meas. 2021, 45, 331–345.
19. Robitzsch, A. Linking error in the 2PL model. J 2023, 6, 58–84.
20. Robitzsch, A. Estimation of standard error, linking error, and total error for robust and nonrobust linking methods in the two-parameter logistic model. Stats 2024, 7, 592–612.
21. Haberman, S.J.; Lee, Y.H.; Qian, J. Jackknifing Techniques for Evaluation of Equating Accuracy; Research Report No. RR-09-02; Educational Testing Service: Princeton, NJ, USA, 2009.
22. Robitzsch, A. Does random differential item functioning occur in one or two groups? Implications for bias and variance in asymmetric and symmetric Haebara and Stocking-Lord linking. Asymmetry 2024, 1, 0005.
23. Battauz, M. Factors affecting the variability of IRT equating coefficients. Stat. Neerl. 2015, 69, 85–101.
24. Ogasawara, H. Standard errors of item response theory equating/linking by response function methods. Appl. Psychol. Meas. 2001, 25, 53–67.
25. Fay, M.P.; Graubard, B.I. Small-sample adjustments for Wald-type tests using sandwich estimators. Biometrics 2001, 57, 1198–1206.
26. Robitzsch, A. Analytical approximation of the jackknife linking error in item response models utilizing a Taylor expansion of the log-likelihood function. AppliedMath 2023, 3, 49–59.
27. Efron, B.; Tibshirani, R.J. An Introduction to the Bootstrap; CRC Press: Boca Raton, FL, USA, 1994.
28. R Core Team. R: A Language and Environment for Statistical Computing; R Core Team: Vienna, Austria, 2024; Available online: https://www.R-project.org (accessed on 15 June 2024).
29. Robitzsch, A. sirt: Supplementary Item Response Theory Models; R package version 4.2-73. 2024. Available online: https://github.com/alexanderrobitzsch/sirt (accessed on 7 September 2024).

Figure 1. Simulation Study 1: Coverage rates for $μ$ and $σ$ for the linking error based on approximate jackknife (AJK; in blue color) and Taylor approximation (TAY; in red color) as a function of the DIF SD $τ$ and the number of items I.

Figure 2. Simulation Study 2: Boxplot for coverage rates for different error estimates (standard error SE, total error TE, and bias-corrected total error ${TE}_{bc}$) across all simulation conditions.

Table 1. Simulation Study 1: Coverage rates at the 95% confidence level for the mean $μ$ and the standard deviation $σ$ in an infinite sample size as a function of the DIF SD $τ$ and the number of items I.

		$τ = 0.2$			$τ = 0.4$			$τ = 0.6$
Par	$I$	JK	AJK	TAY	JK	AJK	TAY	JK	AJK	TAY
$μ$	10	93.1	93.1	90.1	94.1	94.2	91.5	93.5	93.5	91.0
	20	94.6	94.6	93.1	94.2	94.2	92.9	94.4	94.4	93.1
	40	94.6	94.6	93.8	94.9	94.9	94.3	94.8	94.8	94.0
	80	94.8	94.8	94.4	95.4	95.4	94.9	94.7	94.7	94.5
$σ$	10	95.6	95.5	91.6	95.0	94.8	91.2	94.6	94.6	89.9
	20	95.1	95.0	92.9	95.1	95.0	93.1	95.1	95.0	92.9
	40	94.6	94.5	93.3	94.9	94.9	94.0	95.0	95.0	93.9
	80	95.5	95.5	94.9	95.0	95.0	94.2	94.5	94.5	94.0

Note. Par = parameter; JK = linking error based on jackknife; AJK = linking error based on approximate jackknife; TAY = linking error based on Taylor approximation; Coverage rates smaller than 92.5 and larger than 97.5 are printed in bold font.

Table 2. Simulation Study 2: Median of linking error estimates LE and ${LE}_{bc}$ for the mean $μ$ and the standard deviation $σ$ as a function of the DIF SD $τ$, the number of items I, and sample size N.

		$μ$						$σ$
		$τ = 0$		$τ = 0.3$		$τ = 0.6$		$τ = 0$		$τ = 0.3$		$τ = 0.6$
$I$	$N$	LE	${LE}_{bc}$	LE	${LE}_{bc}$	LE	${LE}_{bc}$	LE	${LE}_{bc}$	LE	${LE}_{bc}$	LE	${LE}_{bc}$
10	500	0.053	0.000	0.117	0.102	0.215	0.207	0.085	0.000	0.101	0.044	0.133	0.095
	1000	0.038	0.000	0.109	0.102	0.211	0.207	0.060	0.000	0.079	0.048	0.117	0.098
	2000	0.027	0.000	0.110	0.107	0.213	0.211	0.043	0.000	0.068	0.051	0.111	0.101
	4000	0.019	0.000	0.106	0.104	0.208	0.207	0.030	0.000	0.059	0.050	0.104	0.099
20	500	0.035	0.000	0.080	0.071	0.146	0.141	0.051	0.000	0.062	0.032	0.086	0.067
	1000	0.025	0.000	0.076	0.071	0.145	0.143	0.036	0.000	0.050	0.034	0.077	0.067
	2000	0.018	0.000	0.073	0.071	0.143	0.141	0.025	0.000	0.043	0.034	0.071	0.066
	4000	0.013	0.000	0.072	0.071	0.143	0.142	0.018	0.000	0.039	0.034	0.070	0.067
40	500	0.024	0.000	0.055	0.049	0.102	0.099	0.033	0.000	0.041	0.023	0.057	0.046
	1000	0.017	0.000	0.053	0.050	0.100	0.099	0.023	0.000	0.033	0.024	0.052	0.047
	2000	0.012	0.000	0.051	0.050	0.099	0.099	0.016	0.000	0.029	0.023	0.049	0.046
	4000	0.009	0.000	0.050	0.049	0.099	0.099	0.012	0.000	0.026	0.024	0.048	0.046

Note. LE = linking error; ${LE}_{bc}$ = bias-corrected linking error.

Table 3. Simulation Study 2: Coverage rates at the 95% confidence level for the mean $μ$ and the standard deviation $σ$ as a function of the DIF SD $τ$, the number of items I, and sample size N.

			$τ = 0$			$τ = 0.3$			$τ = 0.6$
Par	$I$	$N$	SE	TE	${TE}_{bc}$	SE	TE	${TE}_{bc}$	SE	TE	${TE}_{bc}$
$μ$	10	500	95.4	98.1	96.1	80.8	97.3	95.5	57.1	94.8	93.8
		1000	95.0	97.8	95.5	70.4	96.3	95.0	44.0	94.8	94.3
		2000	91.6	97.7	95.6	55.5	94.9	94.3	32.2	94.5	94.1
		4000	94.9	98.1	95.6	44.0	94.6	94.0	23.8	94.1	94.0
	20	500	95.4	97.0	95.6	85.1	96.1	95.1	67.9	96.0	95.2
		1000	95.3	97.1	95.6	78.1	95.8	95.1	53.7	95.2	94.8
		2000	95.7	97.3	96.0	66.3	95.8	95.2	41.3	94.8	94.6
		4000	95.2	97.0	95.4	51.3	94.8	94.4	29.9	94.8	94.7
	40	500	93.6	94.8	93.6	88.4	94.7	93.8	73.7	94.1	93.8
		1000	94.7	95.6	94.9	84.2	95.2	94.6	64.6	95.3	95.0
		2000	94.0	95.3	94.2	76.7	95.2	94.9	50.4	95.0	94.8
		4000	92.9	94.6	93.1	61.8	94.7	94.4	38.4	95.2	95.1
$σ$	10	500	94.9	99.0	95.9	92.0	98.8	95.1	81.8	97.7	92.8
		1000	94.9	99.1	96.2	88.1	98.0	93.9	73.7	97.0	94.1
		2000	92.1	99.1	96.2	77.8	97.3	93.4	57.8	95.4	93.5
		4000	95.1	99.3	96.3	71.4	97.2	93.9	48.6	96.3	95.0
	20	500	95.0	98.0	95.5	93.3	98.0	95.4	86.5	97.7	95.2
		1000	95.1	98.5	95.9	89.9	97.9	95.5	77.7	97.1	95.3
		2000	95.6	98.4	96.3	84.8	97.5	94.9	65.5	96.2	95.0
		4000	94.9	98.2	95.6	77.3	96.8	95.4	53.2	95.4	94.8
	40	500	94.8	97.3	95.0	93.4	97.3	95.0	89.0	96.6	94.9
		1000	95.0	97.1	95.2	91.9	96.9	95.2	83.4	96.6	95.3
		2000	94.4	97.0	94.7	88.1	96.2	94.6	75.0	96.1	95.5
		4000	94.8	97.2	95.1	82.5	96.2	94.9	60.6	95.6	95.2

Note. Par = parameter; SE = standard error; TE = total error; ${TE}_{bc}$ = bias-corrected total error; Coverage rates smaller than 92.5 and larger than 97.5 are printed in bold font.

© 2024 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
