On the Convergence Rate of the SCAD-Penalized Empirical Likelihood Estimator

This paper investigates the asymptotic properties of a penalized empirical likelihood estimator for moment restriction models when the number of parameters ($p_n$) and/or the number of moment restrictions increases with the sample size. Our main result is that the SCAD-penalized empirical likelihood estimator is $\sqrt{n/p_n}$-consistent under a reasonable condition on the regularization parameter. Our consistency rate is better than the existing ones. This paper also provides sufficient conditions under which both $\sqrt{n/p_n}$-consistency and an oracle property are satisfied simultaneously. Our results provide solid theoretical support for the penalized empirical likelihood estimator of Leng and Tang (2012).


Introduction
Recently, sparse regression models have received considerable attention in business, economics, genetics, and various other fields. In these models, the number of possible regressors can be potentially large; however, only a relatively small number of these regressors are relevant.
Penalization is an alternative to classical subset selection. One of the drawbacks of subset selection is its lack of stability due to its discrete nature: variables are either retained in or dropped from a model. As a result, a small perturbation in a sample may cause a drastic change in the post-selection results (Breiman 1996). Penalization addresses this issue by achieving variable selection and estimation simultaneously, through a continuous process.
Several penalization methods have been advocated for linear regression models. Examples include the bridge penalty (Frank and Friedman 1993), LASSO (Tibshirani 1996), the smoothly clipped absolute deviation (SCAD) penalty (Fan and Li 2001), and the elastic net penalty (Zou and Hastie 2005). However, penalized least squares methods are not applicable when endogeneity exists (Fan and Liao 2014). When endogeneity exists, parameters of interest are often identified by moment restrictions, using instrumental variables.
This study investigates the asymptotic properties of a penalized empirical likelihood (PEL) estimator for moment restriction models when the number of parameters and/or the number of moment restrictions increases with the sample size. We extend the EL estimator of Qin and Lawless (1994) by equipping it with the SCAD penalty, so that we can achieve estimation and variable selection simultaneously. Some penalized estimators for moment restriction models have been proposed in the econometric literature. Caner (2009) and Shi (2016b) considered the GMM estimator with a LASSO-type penalty. Caner and Zhang (2014) proposed the adaptive elastic net GMM estimator. Fan and Liao (2014) proposed the penalized focused GMM estimator. Leng and Tang (2012) and Chang, Chen, and Chen (2015) studied the asymptotic properties of the PEL estimator for independent and weakly dependent observations, respectively. Tang, Yan, and Zhao (2017) considered a penalized exponential tilting estimator.
This paper shows that the SCAD-penalized EL estimator is $\sqrt{n/p_n}$-consistent, where $p_n$ is the number of parameters. Leng and Tang (2012) showed that the non-penalized EL estimator is $\sqrt{n/p_n}$-consistent under the assumption that $p_n/r_n \to c \in (0, 1)$, where $r_n$ is the number of moment restrictions. Thus, essentially, they only proved $\sqrt{n/r_n}$-consistency. Chang, Chen, and Chen (2015) proved $\sqrt{n/p_n}$-consistency of the non-penalized EL estimator without imposing $p_n/r_n \to c \in (0, 1)$, but they only obtained $\sqrt{n/r_n}$-consistency for the PEL estimator.
We prove $\sqrt{n/p_n}$-consistency of the PEL estimator under a reasonable condition on the regularization parameter of the penalty function. Therefore, our result strengthens the theoretical property of the PEL estimator for over-identified models.
This paper also shows that the PEL estimator satisfies the oracle property in the sense of Fan and Peng (2004). Although Leng and Tang (2012) and Chang, Chen, and Chen (2015) also discussed the oracle property of the PEL estimator, they did not provide sufficient conditions under which $\sqrt{n/r_n}$-consistency and the oracle property are satisfied simultaneously. As far as we know, this paper is the first to specify sufficient conditions for both $\sqrt{n/p_n}$-consistency and the oracle property of the PEL estimator.
Recently, Chang, Tang, and Wu (2017) proposed an alternative PEL estimator that regularizes both parameters and Lagrange multipliers. Their estimator allows the case where $r_n$ and $p_n$ increase at an exponential rate, while our PEL estimator allows a polynomial rate only. Their method is useful when the truth is actually sparse. In contrast, our estimator is valid even when the truth is not sparse.
There is also a large literature on instrument (moment) selection that addresses the problem of selecting/constructing optimal instruments when a large number of instruments are available (e.g., Donald and Newey 2001; Bai and Ng 2009; Kuersteiner and Okui 2010; Belloni, Chen, Chernozhukov, and Hansen 2012; Caner and Fan 2015; Cheng and Liao 2015; Shi 2016a). In contrast to these papers, here we focus on variable selection in a structural model.
This paper is organized as follows. We first show $\sqrt{n/p_n}$-consistency of the SCAD-penalized EL estimator and compare our assumptions with those of Leng and Tang (2012) and Chang, Chen, and Chen (2015). Then, we obtain the asymptotic distribution. Our proofs of $\sqrt{n/p_n}$-consistency and asymptotic normality are new in the EL literature. All the proofs are found in the Appendix. We omit the Monte Carlo study because the estimator itself is the same as that of Leng and Tang (2012).

PEL estimator and asymptotic results
Let $\{y_1, \ldots, y_n\}$ be a random sample from an unknown distribution on $\mathbb{R}^{d_n}$. This study considers the moment restriction model $E[m(y_i, \theta_0)] = 0$, where $\theta_0 = (\theta_{10}, \ldots, \theta_{p_n 0})' \in \Theta_n$ is a $p_n$-dimensional true parameter and $m(y, \theta) = (m_1(y, \theta), \ldots, m_{r_n}(y, \theta))'$ is an $r_n$-dimensional moment function. For instance, the model includes the linear instrumental variable model, in which the moment function is the product of the instruments and the regression error, where $z_i$ is an $r_n \times 1$ vector of instrumental variables and $x_i$ is a $p_n \times 1$ vector of explanatory variables. We consider the case where $r_n \ge p_n$. The subscript $n$ indicates that $d_n$, $p_n$, and $r_n$ may increase with the sample size.
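To make the linear instrumental variable example concrete, the following minimal sketch evaluates the sample average of the moment function at a candidate parameter value. It assumes the usual linear IV moment function, the instrument vector times the regression residual; the function names, the `outcome` argument, and the toy data are hypothetical and are not taken from the paper.

```python
def iv_moment(z, x, outcome, theta):
    """Moment function for one observation: z * (outcome - x'theta).

    z       : list of r instruments
    x       : list of p regressors
    outcome : scalar dependent variable (hypothetical name)
    theta   : list of p candidate parameter values
    """
    residual = outcome - sum(xj * tj for xj, tj in zip(x, theta))
    return [zk * residual for zk in z]


def sample_moments(Z, X, outcomes, theta):
    """Sample average of the r-dimensional moment function."""
    n = len(outcomes)
    r = len(Z[0])
    total = [0.0] * r
    for z, x, w in zip(Z, X, outcomes):
        for k, value in enumerate(iv_moment(z, x, w, theta)):
            total[k] += value
    return [t / n for t in total]


# Toy, noiseless data with p = 2 parameters and r = 3 instruments (r >= p).
theta_true = [2.0, -1.0]
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
Z = [[1.0, 0.0, 1.0], [0.0, 1.0, 1.0], [1.0, 1.0, 0.0]]
outcomes = [sum(xj * tj for xj, tj in zip(x, theta_true)) for x in X]

moments_at_truth = sample_moments(Z, X, outcomes, theta_true)
moments_at_zero = sample_moments(Z, X, outcomes, [0.0, 0.0])
```

Because the toy data contain no noise, the sample moments vanish at the true parameter and are nonzero elsewhere, which is the identification idea behind the moment restriction model.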
For concreteness, we employ the SCAD penalty of Fan and Li (2001), which depends on the regularization parameter $\kappa_n$ and a constant $a > 2$. Similar asymptotic results can also be obtained with a different penalty function, such as the minimax concave penalty of Zhang (2010).
We impose the following conditions for $\sqrt{n/p_n}$-consistency.
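As an illustration, the SCAD penalty and its derivative can be sketched as follows. The closed form is the standard one from Fan and Li (2001); the function names and the default $a = 3.7$ (the value suggested by Fan and Li) are assumptions of this sketch, and `kappa` plays the role of the regularization parameter $\kappa_n$.

```python
def scad_penalty(theta, kappa, a=3.7):
    """SCAD penalty value at theta for regularization parameter kappa > 0, a > 2."""
    t = abs(theta)
    if t <= kappa:
        # Linear (LASSO-like) part near the origin.
        return kappa * t
    if t <= a * kappa:
        # Quadratic part that smoothly bends the penalty toward a constant.
        return (2.0 * a * kappa * t - t * t - kappa * kappa) / (2.0 * (a - 1.0))
    # Constant part: large coefficients are not penalized further.
    return (a + 1.0) * kappa * kappa / 2.0


def scad_derivative(theta, kappa, a=3.7):
    """Derivative of the SCAD penalty with respect to |theta|."""
    t = abs(theta)
    if t <= kappa:
        return kappa
    return max(a * kappa - t, 0.0) / (a - 1.0)
```

The derivative equals `kappa` near the origin, which produces sparsity, and is exactly zero once `|theta| > a * kappa`, so large coefficients are left unpenalized.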

Assumption 2.1 (i) The true parameter vector $\theta_0$ is the unique minimizer of $Q_n(\theta, \lambda(\theta))$ and belongs to the interior of $\Theta_n$; (ii) There are positive functions $\Delta_1(r, p)$ and $\Delta_2(\epsilon)$ such that for any $\epsilon > 0$

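Assumption 2.4 (i) The moment function $m(y, \theta)$ is twice continuously differentiable in $\theta$ in a neighborhood of $\theta_0$ for all $y$; (ii) There exists $C > 0$ such that the minimum eigenvalue $\lambda_{\min}$ of the Hessian of the EL objective function with respect to $\theta$ is bounded below by $C$ in a neighborhood of $\theta_0$ with probability approaching one.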
Assumption 2.1 is similar to condition 2.1 of Chang, Chen, and Chen (2015). Assumption 2.1 (iii) is an extension of uniform convergence. If we restrict the parameter space such that $\Theta_n$ is compact, a condition that guarantees consistency of the estimator can replace Assumption 2.1.
Assumptions 2.2 (i) and (ii) are similar to Assumptions 2 and 4 in Leng and Tang (2012). However, we do not assume that $p_n/r_n \to c \in (0, 1)$. Thus, $r_n$ can grow faster than $p_n$. We can allow the case where $p_n$ is fixed and only $r_n$ increases with the sample size.
Assumption 2.4 states that the objective function of the EL estimator is strictly convex in $\theta$ in a neighborhood of $\theta_0$. When $r_n$ and $p_n$ are fixed, this condition is satisfied under fairly weak conditions. If we relax the condition so that $\lambda_{\min}$
Assumption 2.5 is similar to condition (B2) in Huang and Xie (2007), who obtained the convergence rate of the SCAD-penalized least squares estimator. Following Huang and Xie (2007), we prove $\sqrt{n/p_n}$-consistency of the PEL estimator in two steps. We first prove a preliminary bound on $\|\hat\theta_n - \theta_0\|$ under Assumptions 2.1-2.4 and $q_n \kappa_n^2 \to 0$ (see Lemma A.3 in the Appendix). Then, we improve the convergence rate by using Assumption 2.5.
Under these conditions, we obtain the convergence rate of the PEL estimator.
The sparsity assumption is not necessary for this theorem. As we will see in the next theorem, if the truth is sparse, then we obtain $\sqrt{n/q_n}$-consistency under certain additional assumptions.
Our convergence rate of the PEL estimator is better than that of Chang, Chen, and Chen (2015). Roughly speaking, different convergence rates are based on different equalities. The asymptotic analyses of Leng and Tang (2012) and Chang, Chen, and Chen (2015) are based on one equality. Leng and Tang (2012) obtained $\sqrt{n/p_n}$-consistency of the non-penalized EL estimator by assuming $p_n/r_n \to c \in (0, 1)$. On the other hand, our asymptotic analysis is based on a different equality. Therefore, our proof is not a straightforward extension of those of Leng and Tang (2012) and Chang, Chen, and Chen (2015).
If we were to obtain a convergence rate along the lines of the proofs of Leng and Tang (2012) and Chang, Chen, and Chen (2015), we would need a rather strong condition on the regularization parameter. For instance, Chang, Chen, and Chen (2015) assumed that $q_n \kappa_n r_n^{-1} n M^{-1} = O(1)$ to prove $\sqrt{n/r_n}$-consistency, where $M$ is the block length, which is equal to unity when the observations are independent. As stated before, we can obtain a preliminary convergence rate under Assumptions 2.1-2.4 and $q_n \kappa_n^2 \to 0$. The condition of Chang, Chen, and Chen (2015) corresponds to the condition that $\sqrt{q_n}\,\kappa_n = o(\sqrt{p_n/n})$ in our case. Although this condition simplifies the proof of $\sqrt{n/p_n}$-consistency, it causes a problem for the sparsity of the estimator, as we will see later.

Assumption 2.8 There exists
Assumption 2.6 (i) is a key condition for sparsity. It requires that the regularization parameter is not too small.
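The role of a regularization parameter that is "not too small" can be illustrated with the univariate SCAD thresholding rule of Fan and Li (2001). The sketch below is for penalized least squares with a single coefficient, not for the PEL estimator itself, and the function name is hypothetical. It returns the minimizer of $\frac{1}{2}(z - \theta)^2$ plus the SCAD penalty, where `z` stands for an unpenalized estimate.

```python
def scad_threshold(z, kappa, a=3.7):
    """Univariate SCAD thresholding rule (Fan and Li 2001), for a > 2.

    Illustrative only: this is the penalized least squares solution for one
    coefficient, used here to show how kappa controls sparsity.
    """
    sign = 1.0 if z >= 0 else -1.0
    t = abs(z)
    if t <= 2.0 * kappa:
        # Soft thresholding: estimates with |z| <= kappa are set exactly to zero.
        return sign * max(t - kappa, 0.0)
    if t <= a * kappa:
        # Transition region between soft thresholding and no shrinkage.
        return ((a - 1.0) * z - sign * a * kappa) / (a - 2.0)
    # Large estimates are left untouched, which avoids bias for big signals.
    return z
```

An unpenalized estimate of a zero coefficient fluctuates around zero at the estimation-error scale. It is set exactly to zero only if `kappa` exceeds that fluctuation, which is the intuition for requiring that the regularization parameter not vanish too quickly.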
Theorem 2.2 Suppose that Assumptions 2.1-2.8 hold. Let $B_n$ be an $l \times q_n$ matrix such that $B_n B_n' \to G$. Then, the PEL estimator satisfies the following:
1. Sparsity: $\hat\theta_{2n} = 0$ with probability approaching one.
2. Asymptotic normality:
Although a detailed proof is given in the Appendix, we give a sketch of the proof of asymptotic normality here. If $\lambda(\theta)$ were known, then $\theta_0$ could be estimated by an estimator $\tilde\theta_n$, which is a penalized maximum likelihood estimator using a least favorable submodel of the moment restriction model (see Sueishi 2016, for instance). Because $\tilde\theta_n$ is the penalized maximum likelihood estimator, its distribution can be obtained in a manner similar to Fan and Peng (2004). We derive the asymptotic distribution of $\hat\theta_n$ by showing that $\hat\theta_n$ is asymptotically equivalent to $\tilde\theta_n$.
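For readers unfamiliar with the Lagrange multiplier $\lambda(\theta)$, the following minimal sketch computes it for a scalar moment function at a fixed $\theta$. It assumes the standard EL first-order condition of Qin and Lawless (1994), $\sum_i m_i / (1 + \lambda m_i) = 0$, and solves it by bisection. The function names are hypothetical, and the vector case treated in this paper would need a multivariate solver.

```python
import math


def el_multiplier(m, tol=1e-12, max_iter=200):
    """Solve sum_i m_i / (1 + lam * m_i) = 0 for lam by bisection.

    m : list of scalar moment values m(y_i, theta) at a fixed theta.
    Zero must lie strictly inside the range of m, otherwise no solution exists.
    """
    if not (min(m) < 0.0 < max(m)):
        raise ValueError("zero must lie strictly between min(m) and max(m)")
    # All implied weights are positive only if 1 + lam * m_i > 0 for every i.
    lower, upper = -1.0 / max(m), -1.0 / min(m)
    width = upper - lower
    a, b = lower + 1e-10 * width, upper - 1e-10 * width
    for _ in range(max_iter):
        mid = 0.5 * (a + b)
        score = sum(mi / (1.0 + mid * mi) for mi in m)
        # The score is strictly decreasing in lam, so bisection applies.
        if score > 0.0:
            a = mid
        else:
            b = mid
        if b - a < tol:
            break
    return 0.5 * (a + b)


def el_log_ratio(m, lam):
    """EL criterion sum_i log(1 + lam * m_i) evaluated at the multiplier."""
    return sum(math.log(1.0 + lam * mi) for mi in m)


lam_example = el_multiplier([-1.0, 2.0])
criterion_example = el_log_ratio([-1.0, 2.0], lam_example)
```

For the two-point example, the first-order condition can be solved by hand and gives a multiplier of 0.25. The multiplier is zero when the sample moments already average to zero.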
By modifying the proofs of Theorems 2.1 and 2.2, we can easily obtain the asymptotic distribution of the non-penalized EL estimator. Although we omit the proof, we can show that the efficiency of the PEL estimator for $\theta_{10}$ is the same as that of the non-penalized EL estimator for which it is known a priori that $\theta_{20} = 0$. Thus, our estimator satisfies the oracle property in the sense of Fan and Peng (2004).
Theorem 2.2 is similar to Theorem 3 of Leng and Tang (2012). However, they proved sparsity by assuming that the PEL estimator is $\sqrt{n/p_n}$-consistent. They did not state explicitly the conditions under which the non-penalized and penalized EL estimators have the same convergence rate.
Chang, Chen, and Chen (2015) showed a result similar to Theorem 2.2 for weakly dependent observations. They obtained $\sqrt{n/r_n}$-consistency and sparsity under two separate rate conditions on $\kappa_n$. Specifically, they assume: (i) $q_n \kappa_n r_n^{-1} n M^{-1} = O(1)$ for $\sqrt{n/r_n}$-consistency and (ii) $\kappa_n \sqrt{n/r_n}\, M^{-1} \to \infty$ for sparsity. If condition (ii) is satisfied, however, condition (i) requires that $q_n \sqrt{n/r_n} \to 0$, which is clearly impossible. We relaxed condition (i) and obtained sufficient conditions under which both $\sqrt{n/r_n}$-consistency and sparsity are satisfied.
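The incompatibility of conditions (i) and (ii) can be verified in one line. The following worked step uses only the two conditions as stated above: dividing the quantity in (i) by the quantity in (ii) isolates $q_n \sqrt{n/r_n}$.

```latex
\[
q_n \sqrt{\frac{n}{r_n}}
  = \frac{q_n \kappa_n r_n^{-1} n M^{-1}}{\kappa_n \sqrt{n/r_n}\, M^{-1}}
  = \frac{O(1)}{\kappa_n \sqrt{n/r_n}\, M^{-1}}
  \longrightarrow 0 .
\]
```

The numerator is bounded by condition (i) and the denominator diverges by condition (ii), so the two conditions together force $q_n \sqrt{n/r_n} \to 0$.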

Conclusion
We investigated the asymptotic properties of the PEL estimator when the number of parameters and/or the number of moment restrictions increases with the sample size. In particular, we showed that the PEL estimator is $\sqrt{n/p_n}$-consistent under a reasonable condition on the regularization parameter. Although we cannot compare our results directly to those of Chang, Chen, and Chen (2015) because they allow weakly dependent observations, our convergence rate improves on the existing ones. Our consistency rate is natural because it implies $\sqrt{n}$-consistency of the estimator when $p_n$ is fixed and only $r_n$ increases with the sample size. This is consistent with previous results in the econometric literature such as Donald, Imbens, and Newey (2003). In terms of convergence rate, our result is even better than those of Tang, Yan, and Zhao (2017) and Chang, Tang, and Wu (2017), because their convergence rates also depend on the number of moment restrictions.
A crucial issue with the PEL estimation concerns selecting the size of the regularization parameter. The asymptotic theory does not tell us how to select the regularization parameter in practice. We are working on this important issue in a separate project.

A Appendix
Throughout the Appendix, $C$ denotes a generic positive constant which may vary according to context. The qualifier "with probability approaching one" is abbreviated as w.p.a.1.
We prepare some lemmas to prove Theorems 2.1 and 2.2.
Combining these results yields the desired result by Assumption 2.2 (ii).
This implies that $\min_{1 \le j \le q_n} |\theta_j| > a\kappa_n$ for sufficiently large $n$.
Furthermore, because $B_n B_n' \to G$, we obtain $\sum_{i=1}^{n} z_{ni} \xrightarrow{d} N(0, G)$ by the Lindeberg-Feller central limit theorem. □