Orbital-invariant spin-extended approximate coupled-cluster for multi-reference systems

We present an approximate treatment of spin-extended coupled-cluster (ECC) based on the spin-projection of the broken-symmetry coupled-cluster (CC) ansatz. ECC completely eliminates the spin-contamination of unrestricted CC and is therefore expected to provide better descriptions of dynamical and static correlation effects, but introduces two distinct problems. The first issue is the emergence of non-terminating amplitude equations, which are caused by the de-excitation effects inherent in symmetry projection operators. In this study, we take a minimalist approach and truncate the Taylor series of the exponential ansatz at a certain order such that the approximation safely recovers the traditional CC without spin-projection. The second issue is that the nonlinear equations of ECC become underdetermined, although consistent, yielding an infinitude of solutions. This problem arises because of the redundancies in the excitation manifold, as is common in other multi-reference approaches. We remove the linear dependencies in ECC by employing an orthogonal projection manifold. We also propose an efficient solver for our method, in which the components are usually sparse but not diagonal-dominant. It is shown that our approach is rigorously orbital-invariant and provides more accurate results than its configuration interaction and linearized CC analogues for chemical systems.


I. INTRODUCTION
Treating quasi-degeneracies accurately is one of the most challenging tasks in electronic structure theory, as many single-reference (SR) methods are not suitable for capturing static correlations in general and thus typically break down. The failures of SR methods are ascribed to their deficient starting point, Hartree-Fock (HF). Although a number of approaches have attempted to approximate static correlation effects in an SR framework, 1-9 multi-reference (MR) methods are usually more reliable and accurate. The fundamental idea is to handle static correlations at some multi-configuration level such as the complete active space self-consistent field (CASSCF) [10][11][12] and add dynamical correlations using perturbation theory, configuration interaction (CI), or coupled-cluster (CC). [13][14][15][16] Thus far, there have been various attempts to develop MRCC approaches. Many Hilbert-space methods are based on the Jeziorski-Monkhorst ansatz, where each determinant in the reference space associates with its own cluster operator. 17 In the state-universal coupled-cluster (SUCC) formulation, all the amplitudes and electronic states are simultaneously determined by diagonalizing the effective Hamiltonian, but its application is severely hampered by the intruder state problem. The state-specific approaches such as the Brillouin-Wigner (BW) MRCC [18][19][20][21] and Mukherjee (Mk) MRCC [22][23][24][25][26][27] a) Electronic mail: tsuchimochi@gmail.com b) Electronic mail: tenno@garnet.kobe-u.ac.jp avoids the intruder state problem by considering only one (lowest) state; however, they in turn suffer from the redundancy problem where the number of amplitude equations is less than the number of amplitudes. In these state-specific approaches, the redundancy problem is solved by sufficiency conditions, [18][19][20]22,23 which unfortunately bring about other undesired consequences such as the improper residuals 28 and the lack of orbital invariance. 29,30 Finally, the internally contracted (ic) approaches resolve these issues, by employing a single cluster operator along with a multi-configuration vacuum, [31][32][33] although the prefactor is significantly large and grows rapidly with the size of the active space. The properties and capabilities of these methods as well as other variants of MRCC are well documented in the recent review papers. 34,35 With these difficulties and complications, it should be fair to point out that most MRCC methods are not straightforward to formulate. They are also cost-intensive and require a priori specification of an active space where electrons are strongly correlated. Hence, it is important to develop efficient and black-box MRCC schemes in an orbital-invariant fashion. In the present work, we aim at formulating such a method based on symmetry projection.
Symmetry-projected HF (PHF) appears to be a good compromise between HF and CASSCF. 36,37 By writing a symmetry-projection operator aŝ its wave function comprises a few non-orthogonal brokensymmetry determinants, which are systematically generated by some unitary rotation operatorR(Ω) specific to the symmetry that is to be restored. In the orthogonal orbital picture, a PHF wave function is also expressed as a linear combination of determinants through all orders of excitations with respect to some symmetry-adapted determinant. 38 Among the symmetries that Fermionic wave functions can break, spin symmetry has been of great interest in quantum chemistry because spontaneous symmetry breaking in spin-unrestricted HF (UHF) is usually associated with quasi-degeneracies in the system. As a result, spin-projected UHF (SUHF) brings a large amount of static correlations as well as physically correct spin states, all in a black-box manner. Many studies have shown its improved accuracy over both restricted HF (RHF) and UHF, 37,[39][40][41] suggesting that it is a more suitable starting point.
We have therefore proposed several post-SUHF methods to include the residual dynamical correlation effect. These include spin-extended perturbation theory 42 and CI with singles and doubles (ECISD). 43,44 It was shown that these approaches generically outperform their SR analogues. We have further explored a generalization to coupled electron-pair approximation (ECEPA) including spin-extended linearized CC (ELCC) and averaged quadratic CC (EAQCC). 45 This builds on ECISD to furnish approximate size-consistent components in the dynamical correlation by writing the quadratiĉ PT 2 elements as products of the correlation energy and lin-earPT terms, whereT is the excitation operator. ECEPA methods were shown to improve both the correlation energy and molecular properties over ECISD. 45 However, as the exclusion-principle-violating terms are treated in an average manner, they are plagued by instabilities; for average CEPA to be a good approximation, the reference state has to be well separated from the interacting space, but this is often not the case in SUHF and ECISD. The same problem is also present in the CAS case, unless the active space is large enough. 46 From this point of view, it is obviously desirable to develop CC from SUHF, a theory we call spin-extended CC (ECC). The wave function ansatz for ECC is given by whereT is the excitation operator and |Φ 0 is an underlying broken-symmetry determinant of SUHF. There have recently been efforts in similar directions by other research groups. [47][48][49][50][51] In particular, the spin-projected unrestricted CC developed by Scuseria and coworkers uses the same ansatz. 51 There are, however, two main difficulties in formulating the rigorous ECC. The first obvious problem is that the corresponding equations do not terminate at finite order becauseR(Ω) is unitary andP is thus an n e -body operator, where n e is the number of electrons in the system. In the pioneering work of Qiu et al.,P was written as a polynomial of particle-hole excitations with respect to some reference RHF-type determinant, 52 and then combined with a cluster operator followed by variational optimization of amplitudes. 50 However, such normal-ordering ofP loses the de-excitations from the CC ansatz, which can play a significant role in describing static correlations. In addition, the variational CC approach results in the same complexity as full CI (FCI). Very recently, Qiu et al.
proposed a truncation scheme to incorporate the de-excitation effects approximately on the basis of the excitation rank of so-called disentangled cluster operators. 51 The method essentially approximates eV 1 (Ω) eT , where eV 1 (Ω) is the deexcitation component inR(Ω) and thus no longer produces pure spin-states in general, although the spin-contamination introduced was shown to be satisfactorily small for a model Hamiltonian.
The second difficulty, which was not realized in previous work, 47,51 is the redundancy problem inherent to many other MR methods. [31][32][33]53,54 In essence, the emergence of nonlinear parametrization in Eq. (2) gives rise to linearly dependent t-amplitudes and equations because of the nonorthogonality of the excitation manifold. Such redundancies are common in ECI and ELCC, but do not pose any problem in these methods because they are strictly variational or quasi-variational in the sense that the CI coefficients or t-amplitudes are optimized with respect to a well-defined energy functional. 44,45 Therefore, these linear methods provide only one unique (lowest) energy, even though there are infinitely many sets of amplitudes that satisfy the underlying generalized eigenvalue problem. However, when ECC is solved in a non-variational, projective manner, its correlation energy is arbitrary up to linear dependencies. As in various ic MRCC approaches, 32,33 we need to orthogonalize the projection manifold by performing singular-value-decomposition (diagonalization) of some metric of the nonlinear ECC system.
In light of the latter problem, a well-defined metric that is, ideally, strictly positive-semidefinite is indispensable. While it is possible to systematically include only the important components of high powers ofR(Ω)eT , such approximations can make it unclear which metric the nonlinear redundant problem is to be solved under. In fact, in the course of our study, we have found several cases in which approximating terms inR(Ω)eT causes severe convergence problems. This instability is likely to be attributable to an inconsistent metric; as a result, the nonlinear ECC equations become unstable or even unsolvable. For this reason, we wish to avoid introducing approximations toP at all costs.
To this end, we will simply truncate the Taylor expansion while securing the (numerical) exactness of the projection operator, thus giving a metric with the desirable properties (symmetric, positive-semidefinite). This permits us to remove the redundancies in the equations correctly and to study the improvements that the quadratic and higher-order terms bring about, especially when the t-amplitudes are optimized in the presence ofP. As will be shown, the divergence behavior of ECEPA can be rectified in the truncated ECCSD method, while offering improved performance over ECISD, just like in the SR case.
The resultant residual equations are still non-orthogonal and non-diagonal-dominant, and cannot be solved by iterative diagonalization, unlike linearized schemes. In this paper, following our work on ECI, 44 we propose a robust converger for solving the nonlinear ECC equations. The use of our scheme in conjunction with the direct inversion of iterative subspace (DIIS) [55][56][57] gives stable and fast convergence of the t-amplitudes.
We organize this article as follows. As the appearance of redundancies is general, regardless of the approximations introduced to the ECC ansatz (as far as it is nonlinear), we first address this issue in Sec. II. Then, in Sec. III, we rewrite the ECCSD wave function as a superposition of non-orthogonal pseudo-CI wave functions in terms of normal-ordering, where the effective coefficients are approximated based on the truncation of the cluster expansion. Sec. IV describes how the truncated ECCSD equations can be solved in practice. Sec. VI confirms the orbital invariance of the method and reports the convergence behavior of the residuals. We also demonstrate the performance of the method for the H4 and H8 models and O 3 as pilot examples. Finally, we discuss the sizeconsistency problem of the method and the structure of metric eigenvalues.

A. Background
Throughout this work, we will adopt the conventional notation for orbitals, namely, p, q, r, s, . . . for general, i, j, k, l, . . . for occupied, and a, b, c, d, . . . for virtual spinorbitals. Greek subscripts are used to represent excited determinants, whereas |Φ 0 denotes the reference broken-symmetry determinant.
For ECCSD in whichT is truncated at doubles, we haveT whereâ ab... ij... are the excitation operators written with creation (â p ) and annihilation (â p ) operatorŝ In this work, we only consider a UHF-type determinant for |Φ 0 and the collinear excitations to preserve theŜ z quantum number, e.g., i and a share the same spin in t a i . For later use, Eqs. (3) are written in the shorthand notation where µ runs over the singles and doubles spaces and M SD is the dimension. The excitation operatorsâ a i andâ ab ij are merged into the row vectorÊ, where each component generates excited (broken-symmetry) determinants from |Φ 0 , i.e., As in regular CC, the correlation energy E c and t-amplitudes may fulfill the following equations: We note thatP is idempotent, Hermitian, and commutable withĤ. Henceforth, we always assume the normalization of an SUHF wave function, but not of |Φ 0 . Note that the Hamiltonian is not similarity-transformed in Eqs. (7), and therefore, the left hand side includes unlinked terms. These are mostly expected to cancel out with the right hand side. As we have already mentioned, the above equation is non-terminating becauseP involves n e de-excitations. Furthermore, the above set of equations does not provide the proper residuals because of the redundancies. The redundancy problem in ECC is essential because the excitation manifold of Eq. (6) is non-orthogonal after the projection operatorP is applied, and certain linear combinations will create the null basis. In other words, such linear combinations of excitation operators can create a mixed state that has no component of the target symmetry space ofP. This is shown in Fig. 1, where |Φ 0 is set to an RHF determinant for simplicity, but two excited determinants |Φ A and |Φ B generate the same configuration state function once the singlet projection operatorP S is enacted and are therefore redundant. A linear combination of the two bare excited determinants (without projection), |Φ A −|Φ B , is a pure triplet state, which is destroyed upon the action ofP S . Therefore, although the nonlinear system of ECCSD as given by Eq. (7b) consists of M SD equations and variables t µ , it is underdetermined (overparametrized). A consequence of this is an infinite number of solutions that fulfill both Eqs. (7a) and (7b), i.e., E c is arbitrary. This is somewhat different from the redundancy problem that appears in statespecific MRCC approaches based on the Jeziorski-Monkhorst ansatz, 17 which can be solved by invoking appropriate sufficiency conditions, 18,19,22,23 at the cost of orbital invariance. 30 Our redundancy problem arises from the non-orthogonal character of the excitation manifold, and, in this respect, is rather similar to the situation in ic MR methods. [31][32][33]53,54 FIG. 1. Redundancy in the singlet-projected excitation manifold. For simplicity, the reference state is RHF, but excited determinants become identical when spin-projected.
One possible method of removing the redundancies is to make use of the property of spin-projection. In previous work, we numerically found that the number of linear dependencies in ECISD is always associated with the number of single excitations for tested singlet systems. 44 Further mathematical analyses should make it possible to identify the redundant excitations analytically for spin-projection, which can then be explicitly removed from the equations and amplitudes. Nevertheless, it is questionable whether this approach retains unitary invariance because there are multiple choices for the removal of such excitations; namely, either |Φ A or |Φ B can be removed. Ideally, any procedure should ensure a unique solution that is orbital-invariant. In the following, we outline such a protocol.

B. Intermediate normalized ansatz
Before eliminating the redundancies in Eqs. (7), we need a clear separation between the reference and excitation manifold. As |Φ µ overlaps with |Φ 0 throughP, the original ECC ansatzPeT |Φ 0 is not intermediate normalized. This is not a desirable characteristic when we wish to solve our redundancy problem because the dimensions of the redundant parameters t µ and redundant basis {Φ 0 , Φ µ } are inconsistent. Accordingly, as previously suggested in ECEPA theory, 45 the projection of the reference SUHF space is first performed, restricting the linear dependence problem only to the excitation manifold.
To this end, we prepare the projectorQ, which eliminates the SUHF referenceP|Φ 0 , aŝ and redefine the ECC wave function as where we have conveniently definedQ =PQ =QP. One could certainly putQ in front ofT to form eQT , but we do not choose this option because it would complicate the algebra with terms likeQTQT , which are obviously much more difficult to deal with (recall thatP has de-excitation components). Rather, it is reasonable to set |Ψ as above by recognizinĝ QeT |Φ 0 as the purely dynamical correlation space orthogonal to SUHF. Then, |Ψ is intermediate normalized (again,P|Φ 0 itself is normalized), and the corresponding t-amplitude equations are separated from the energy definition, by choosing Φ µ |Q † as the projection manifold where we have used the fact that Φ 0 |Ĥ NP |Φ 0 = 0. In the intermediate normalized form, the redundancies in t µ are directly mapped to those inQ|Φ µ , and the metric becomes We note that Eq. (12) is not the only choice for the metric, and this is where the arbitrary nature of the ECC solution comes in Refs. 32, 33, and 58. The singles space is also redundantly present in the doubles space (so-called spectator excitations). The former can be projected out from the latter in a similar way as that presented above. This procedure changes the definition of the metric and gives different orthogonalized excitation operators, vide infra. In ic-MRCC, such a sequential projection of lower-excited functions removes some disconnected terms in the contraction between the metric and amplitudes. As a result, it partially recovers size-extensivity, known as core extensivity, i.e., the correct scaling of the correlation energy associated with the growth of inactive orbitals and electrons. 33,58 The full sizeextensivity is perfectly retained if excitation operators are derived based on the generalized normal-ordering. [58][59][60] In our method, however, the full size-extensivity is particularly difficult, irrespective of the choice of metric, because of the symmetry projection operator, which destroys the full sizeextensivity (see Sec. VI E). Hence, in the present work, we will employ Eq. (12), i.e., a mixture of singles and doubles. Our choice is therefore similar to that of Evangelista et al. 32 and version C of Hanauer and Köhn. 33 A comparison between different metric schemes for ECC will be investigated in future work.

C. Orthogonalization
We are now in a position to deal with the redundancies present in Eq. (11b). To remove the linear dependencies in t µ , it is necessary to employ a set of orthogonalized excitation operatorsÊ as suggested by other authors. 32,33,53,58,[61][62][63] In analogy with these previous works, we adopt the canonical orthogonalization by using the singular-value-decomposition of S. As S is positive-semidefinite, we have in general that where the subscript α is restricted to positive λ α and β corresponds to zero singular values or those below some threshold η (λ β < η). These spaces are called the range and null space, characterized as R(S) and N(S), whose dimensions are respectively M R and M N = M SD − M R . More often than not, M R M N > 0. As several authors have pointed out, 32,63,64 in standard ic-MRCC methods, unlike in ic-MRCI, 65 the truncation threshold η must be carefully chosen because small values cause numerical instability while large values lead to degradation of results. However, we see no such difficulty in our method, and η can always be very small. We present some examples in Sec. VI F.
Using the singular values and vectors, the transformation matrix X is obtained as and one can produce an excitation basis which is orthonormal whenQ-projected, Note that U α may not be defined uniquely because of possible degeneracies in λ α . However, the final result is invariant with respect to the unitary rotation of degenerate singular vectors within the range (and the null space), as will be shown.
The orthonormalized cluster operator is obtained aŝ Clearly, the number of linearly independent parameters is only M R , and the same number of independent equations can be constructed with the orthonormal projection manifold Ξ α |Q † . The resulting set of ECC equations is rewritten as Solving Eqs. (19)- (20) gives unique E c and τ.
In practice, we do not explicitly transform the excitation operators and amplitudes. Suppose that we have successfully obtained some τ that satisfies Eq. (20). From Eq. (18), it is evident that there is the following one-to-one mapping between τ α in the orthonormal basis and somet µ in the original nonorthogonal basis:t Hence,t is also uniquely determined, up to the choice of the metric S. The chief difference between this particulart and other general t is thatt is orthogonal to N(S), whereas t can spill into N(S), i.e., This means thatt is represented as a vector that is projected on to N(S) using the null-space projector P, where Substituting Eqs. (18) and (22) into Eq. (20), we have the proper residuals where we remind the reader that r is the residual vector in the non-orthogonal basis and is defined in Eq. (11b).
As often M R >> M N , it is worth rewriting Eq. (27) in terms of U µβ . To do so, we multiply it by S νλ U λα λ − 1 2 α from the left and sum over α and λ to givẽ where S + is the Moore-Penrose pseudoinverse of S. Botht andr are written in the same non-orthogonal basis restricted in R(S), and therefore, one can directly use the latter to perturbatively update the former. However, in this basis, the metric is not diagonal-dominant. In Sec. IV, we overcome this problem using a robust preconditioning scheme that is effective for non-diagonal-dominant metrics such as S.
Finally, note that the scheme is invariant with respect to a unitary rotation between degenerate vectors in U µα and in U µβ because the null-space projection operator P is uniquely defined in Eq. (26). Whether the method is unitary-invariant with respect to orbital rotations of |Φ 0 is a separate question and will be discussed in Sec. III.

III. APPROXIMATE CCSD IN SPIN-PROJECTED MANIFOLD: TRUNCATION OF TAYLOR SERIES
While we have described how to manage the redundancies in the ECCSD parametrization, it is practically impossible to construct the exact residuals at polynomial cost when an exponential ansatz is associated withP. Here, we consider how to approximate the ansatz withP intact.
By expanding |Ψ (not intermediate normalized) in the Taylor series, we obtain AsĤ is a two-body operator, at most quadruply excited determinants with respect to |Φ 0 could interact with the bra projection Φ ab ij | ifP were absent. A similar relation also holds for the non-orthogonal case when the strings are normalordered.
First, let us write the projection operator as a numerical interaction,P where g is the numerical grid for representing the Ω angle. The rotated creation and annihilation operators due toR g are given byb For the explicit form ofR g as well as the simplification one can exploit for spin-projection, see Ref. 37. Therefore, the rotated determinant |Φ 0 (g) ≡R g |Φ 0 comprisesb i (g) instead ofâ i . Wick's theorem then allows us to write the rotatedT operators as 43,44 R gT1R where we have introduced the concept of normal-ordering Φ|{. . .} g |Φ 0 (g) = 0 and the one-body contraction tensor W, composed of only nonzero contractions, besides the trivial Kronecker delta, Similar to the orthogonal case, the normal-ordering in the non-orthogonal Wick theorem also requires connectivity between the products (35) The essential idea is then to write all operators, e.g.,T and H, in terms of normal-ordered strings. Using the generalized Wick theorem for normal-ordered products, we write |Ψ as a pseudo-CI where we have introduced some effective coefficients ω, which are anti-symmetrized. They depend on both t and g, whose dependencies are suppressed for the sake of simplicity. As the Hamiltonian is a two-body operator, we need only up to normal-ordered doubles for the energy and overlap matrix elements and normal-ordered quadruples for solving the ECCSD equation. Quintuple and higher excitations play no role in the normal-ordered representation. It should be clear that any projected wave functions can be written in this form. For example, ECISD is truncated at doubles in Eq. (36), and there are no higher excitations. Therefore, the previously developed ECISD code can be directly exploited to compute matrix elements up to normal-ordered double excitations for ECCSD. 44 One can also exactly compute matrix elements such as Φ ab ij |Ĥ{b cdef klmn } g |Φ 0 (g) at polynomial cost (see the supplementary material). We should emphasize that it is the t-amplitudes (ort) that are subject to optimization, but not ω, as we use the sameT for all gauge angles.
So far, we have made no approximation except thatT is truncated at doubles; Eq. (36) is still equivalent to Eq. (29). However, this does not pave the way to solving Eq. (11) [or (28)], as all the complexities arising fromP are shifted to the evaluation of the effective coefficients ω. These are represented as highly complicated tensor contractions of several orders of t and W, and the computational cost for evaluating them is formally O(N!), where N is some measure of system size. Hence, some approximation is required to arrive at a tractable method.
In passing, we should note that, as in unitary CC and most MRCC methods, the non-terminating equation may be truncated at the second-order of the Baker-Campbell-Hausdorff expansion of the projected Hamiltonian, i.e., e −TPĤP eT =PĤP + [PĤP,T ] + 1 2 [[PĤP,T ],T ] + · · · . However, the resulting equations comprise a mixture of symmetry-projected and symmetry-broken components, and so the t-amplitudes rarely converge. Therefore, this truncation scheme will not be discussed any further.
Instead, we consider truncating the Taylor expansion in Eq. (29) to include excitations up to quadruples. The sizeextensivity is obviously lost by this approximation, but we should emphasize that this is not an important issue, as it cannot be retained anyway because of the introduction of the spin-projection operator. The wave function ansatz is given by and hereinafter, we call this method spin-extended approximate CCSD (EACCSD). This specific choice of truncation is made based on several observations. First, |Ψ EACCSD is guaranteed to reduce to SRCCSD whenP = 1. Second, the first-order interacting space (FOIS) of projected doubles for ket states ( Φ ab ij |) is completely spanned by the SUHF reference, projected singles, · · · , quadruples, and thus the EACCSD problem covers the Hilbert space essential for exact ECCSD (see Appendix A). Higher excitations such asT n 2 components with n > 2 can play a role in re-optimizing the cluster amplitudes; while it is possible to treat these terms explicitly, the inclusion ofT 3 2 increases the computational cost from O(N 6 ) to O(N 8 ). Additionally, provided that SUHF is a good reference, the cluster amplitudes are expected to be sufficiently small, and it is plausible that the effect ofT 3 2 will be insignificant. Hence, it is advisable to neglect these terms for our approximation to be feasible. Finally, the sizeextensive component of the exact ECCSD correlation energy will eventually become that of UCCSD, and therefore, the role of T 3 2 and higher terms should diminish with system size.
With the EACCSD ansatz, in the normal-ordered pseudo-CI form [i.e., Eq. (36)], the triples and quadruples amplitudes take the following forms: Other ω coefficients, as well as the matrix elements required to implement the EACCSD methods, are summarized in the supplementary material. Finally, we briefly mention the orbital invariance characteristic of EACCSD. It is well known that most MRCC approaches, including those based on the SR formalism [66][67][68][69] and all state-specific MRCC methods based on the Jeziorski-Monkhorst ansatz, [17][18][19][20][21][22][23][24][25][26][27] are not orbital-invariant with respect to active space rotations, which is considered a serious drawback. 30 Achieving invariance requires an unbiased treatment of orbitals in each subspace as well as the use of a single vacuum. On the other hand, as in traditional CC, EACCSD is strictly orbital-invariant because all the tensor contractions appear in the energy and residuals run over all the orbitals in the given subspace, either occupied or virtual spaces of |Φ 0 . A proof is given in Appendix C.

IV. SOLVER FOR THE APPROXIMATE ECCSD EQUATIONS
Having outlined our approach, we now describe how to solve the EACCSD equation in practice. For linearized methods such as ECISD and ELCCSD, one can resort to the Davidson diagonalization approach, 70 as the resulting problems are cast as a generalized eigenvalue problem (with a dressed Hamiltonian). 44,45 However, the EACCSD equation becomes a non-Hermitian and nonlinear problem, which is symbolically written as where H and S are the effective Hamiltonian and overlap of ECCSD, both of which depend ont. This equation must be solved iteratively. To do so, in Eq. (40), we separate H and S into their dominant parts H 0 and S 0 and the remaining parts. They do not have to be diagonal, but it may be assumed that H 0 − E c S 0 is relatively easy to invert. For simplicity, we assume H 0 and S 0 contain only the linear ECISD components, which are considered dominant, because the contributions of the quadratic and higher-order terms are of size-consistent correction and are expected to be several orders of magnitude smaller than the linear part. First-order perturbation theory suggestsr where we have assumed H 0 − E c S 0 , P ≈ 0. This is a reasonable assumption because we have set H 0 ≈ H ECISD , where H ECISD is the ECISD Hamiltonian, and the variational nature of ECISD necessarily insists that H ECISD , P = 0. In other words, H ECISD naturally shares the same null-space as S; otherwise, the ECISD correlation energy would become negative infinity, which is unphysical. The same reasoning applies to S 0 (indeed, in what follows, we choose it to be S, which certainly satisfies the commutativity). Then, noting that P 2 = P, we obtain This allows us to use r instead ofr. Our numerical tests suggest that the use ofr in place of r makes no difference; both become zero at convergence. In practice, the (k + 1)th amplitudest (k+1) µ are extrapolated to minimize the norm of the residual vector ||r (k+1) || by DIIS (assuming thatr is approximately a linear function oft). Then, it is just a matter of an appropriate choice for H 0 and S 0 to carry out the inversion that appears in Eq. (42). The simplest approach is diagonal preconditioning, whereby one takes H 0 µν = δ µν H ECISD µν and S 0 µν = δ µν S µν . This approach is usually stable and easy to implement but suffers from slow convergence because the off-diagonal couplings can make a significant contribution in ECISD and ECCSD. This is a considerable drawback because the computational cost of constructing r (k ) at each cycle scales as O(N 6 ), and we need to reduce the number of these steps as much as possible.
As is well known, appropriate preconditioning is important for fast convergence in iterative methods. In this paper, we follow our previous method with some further modification. 44 Let us write the Hamiltonian aŝ where E g ,F N,g , andV N,g are the normal-ordered zero-, one-, and two-body operators at each projection-grid point, with the transition Fock matrix f(g) defined as As the contribution ofV N,g to the ECISD Hamiltonian elements is generally weak, we ignore this part for H 0 . Integrating over grid points, we define the effective Fock-like matrix as follows: Although F may not be spin-adapted, it holds a one-particle picture and is a good approximation to the ECISD Hamiltonian matrix, which is all we need for our purpose. The zeroth order H 0 then has the following form: where the last two terms come from theQ projection (the subscript 0 indicates the reference determinant). For S 0 , it is natural to take the whole S because of its structural resemblance to H 0 . These definitions exactly parallel the adoption of the orbital energy differences for preconditioning in SRCC with the canonical orbitals. Unfortunately, there is no orbital basis in which H 0 is diagonal for symmetry-projected methods, and hence the inversion in Eq. (42) requires some attention.
By writing we solve the linear equation at each macro iteration k. Eq. (51) is solved with the diagonal preconditioning scheme (and with DIIS). For the sake of clarity, the mth micro-iteration is performed as The superscript (k[m]) denotes the mth micro-cycle of the kth macro-cycle. Once ∆t (k ) is obtained, we perform the null-space projection given by Eq. (50). Note that the computational scaling for solving Eq. (51) is Hence, we previously concluded that the two-step scheme is not likely to be efficient in terms of the overall computational performance. However, it is noteworthy that the accuracy of ∆t is relatively unimportant in the first few macro-cycles because ||r (k) || itself is large. We can therefore approximate ∆t in such a way that Eq. (51) is only solved accurately enough to qualitatively "scan" the structure of H 0 − E (k) c S, which is all that is needed for preconditioning. By contrast, when we are close to convergence, ||r (k) || is small and so is ||∆t (k ) ||, thus necessitating an accurate solution for Eq. (51). These requirements can be simultaneously satisfied by proposing the following adaptive criterion for ρ (k ) : where δ Pert is some loose threshold. We call this the adaptive Fock preconditioning scheme. The pseudo-code of our solver is summarized in Fig. 2.

V. COMPUTATIONAL SCALING
The proposed method retains the formal scaling order of CCSD, O(N 6 ), at each grid point. By factoring the contraction terms, for the macro-steps, the computational cost of EACCSD formally scales as O(  (38), which describes the entanglement between two T 2 amplitudes through W f n , and this is likely to be the limiting step for large basis calculations. We note that, if the projection operator is absent, W q p → δ pq and there is no entanglement between t-amplitudes.
For the micro-steps of the adaptive Fock preconditioning scheme, the necessary contraction H 0 ∆t has a computational cost that scales as O(3o 3 v 2 + 3o 2 v 3 ), whereas the diagonal preconditioning requires only O(o 2 v 2 ).

VI. RESULTS AND DISCUSSION
In the following, we report several EACCSD calculations. All of these calculations used the SUHF orbitals, and the t-amplitudes were determined by solving Eq. (40) perturbatively with DIIS, as outlined in Sec. IV. For spin-extended methods, frozen orbitals were spin-adapted and doubly occupied to ensure the spin-symmetry of a wave function in post-SUHF calculations. Finally, we set η = 10 −12 and N grid = 4 throughout.

A. Unitary invariance and uniqueness in correlation energy
It is important for a method to fulfill orbital invariance because otherwise energy and other properties depend on the choice of orbitals. Here, orbital invariance is characterized as the nature whereby the computed energy is unchanged when orbitals are unitarily transformed within each orbital subspace. A mathematical proof for the orbital invariance of EACCSD is given in Appendix C. In this section, we present numerical examples. Furthermore, it is also our purpose to present a numerical proof that the removal of the linear dependencies (N(S)) is essential for a unique solution. We deem it important to provide these results before discussing the performance of EACCSD in more detail.
The orbital invariance of EACCSD can be most easily checked by employing two different orbital sets, namely (a) semi-canonical 71,72 and (b) corresponding-pair orbitals 73,74 of |Φ . In addition to the standard initial guesst (1) UCCSD amplitudes were also tested as an initial guess, in which case the null-space projection was first performed as necessary. The diagonal preconditioning scheme was used with the maximum DIIS subspace size set to 10 and a very tight convergence criterion, ||r|| < 10 −8 (||r|| < 10 −8 for the results without the null-space projection) to ensure high precision in the evaluated energy. We use the H 2 O molecule at equilibrium (R O−H = 0.9929 Å and ∠HOH = 109.57 • ) with a 6-31G basis as our showcase example. Table I presents the computed energies under these different conditions. As expected, if the null-space projection is not carried out, i.e., if we allow the t-amplitudes to spill into N(S), the arbitrariness of the EACCSD energy is manifest. While all such energies are close to each other under a unitary transformation, using different guess amplitudes can yield significant fluctuations in energy and clearly illustrates the problem of linear dependencies. This arbitrariness makes the method completely useless because molecular properties  such as vibrational frequencies are much more sensitive. We note that the orbital invariance of the EACCSD energy guarantees the definitive energy when evaluated with the UCCSD amplitudes in one shot, even if the null-space projection is not performed. It is the linearly dependent residual equations (7) that are improper and violate the uniqueness. When the null-space of S is projected out of t, the orbital invariance and uniqueness of energy are rigorously achieved, as is clear from Table I.

B. Convergence behavior
In this section, we demonstrate the validity of our iterative solver and the consequence of approximatingP by neglecting some grid-dependent terms. These tests should identify an optimal adaptive criterion for micro-iterations. We use the same water molecule and basis set as in Sec. VI A, but similar behaviors are observed in other systems. The maximum size of the DIIS subspace was set to 10. All calculations used the corresponding-pair orbitals and gave the same energy at convergence. Figure 3 illustrates the norm of the EACCSD residual vectorr µ at the kth iteration. The diagonal preconditioning scheme (plotted as "Diag") is stable, although it converges slowly. This behavior is similar to the Davidson diagonalization of ECISD with the same preconditioning scheme. 44 If we iterate Eq. (52) for a few cycles at each macro-iteration, the convergence performance is substantially improved. The results are plotted as "Fock" in Fig. 3 for different adaptive criteria δ Pert . While there is almost no gain with δ Pert = 1, the number of cycles to achieve ||r|| < 10 −6 decreases to 17, 11, and 10 for δ Pert = 0.5, 0.1, and 0.01, respectively. In spite of a slight increase in the computational cost due to the micro-iterations, the total CPU time required to complete a calculation typically decreases by a factor of about 3-6 for δ Pert = 0.1 compared to the diagonal preconditioning scheme. Table II summarizes the total number of micro-and macroiterations required to reach the final convergence ||r|| < 10 −6 . The average number of micro-iterations required at each macro-step is 1.  for different molecules and basis sets. Based on these results, it is recommended that the adaptive Fock preconditioning scheme for perturbative updates uses δ Pert = 10 −1 . We additionally set the maximum number of micro-steps (m max in Fig. 2) to 10.
In addition, the adaptive preconditioning is much more compatible with DIIS than is the diagonal preconditioning. DIIS is an extrapolation technique in which the new trial vector is expanded as a linear combination of the current and previous vectorst (k) . Therefore, the greater the differences between thet (k) , the more effective the DIIS technique, because it can explore a large space efficiently with a relatively small DIIS dimension. By contrast, if the previous vectors are all similar to one another, as in the diagonal preconditioning scheme, then the extrapolated vectors will be also similar, thus requiring a large subspace size to achieve good performance. This is demonstrated in Fig. 4, which shows the convergence behavior of both preconditioning schemes with DIIS dimensions of 10, 5, and 3 (δ Pert = 0.1 for the adaptive Fock preconditioning scheme). Clearly, the performance of the adaptive preconditioning does not depend on the subspace size, whereas that of the diagonal preconditioning is prominently size-dependent.
As EACCSD is computationally more intensive than ECISD for each cycle, we wish to decrease its cost. The chief difference in cost between the two arises from the presence of the 6th term of ω 3 [Eq. (38)] in EACCSD. The Hamiltonian contraction with this term is where P(pq) is the antisymmetrizer that permutes p and q, andV pq rs are the transformed two-electron integrals through W, which thus depend on g (see the supplementary material). This term evidently requires O(o 2 v 4 ) operations to calculate the intermediate X ad en = V ec ak t cd kn . It is tempting to avoid computing this term in order to reduce the computational cost. However, neglecting the term essentially amounts to approximating ω cde klm (g) inR gT 2 at each grid point, which would cause numerical instability by losing the exactness ofP. To demonstrate this, we plot ||r|| without Eq. (54) in Fig. 3. Convergence is not achieved after hundreds of macro-cycles, strongly indicating the equations are ill-conditioned or inconsistent. While stable (but slow) convergence is occasionally observed in some other molecules, the H 2 O result in Fig. 3 clearly implies that the approximation can introduce numerical difficulties.

C. H4 and H8 systems
The H4 and H8 systems are MR models that have previously been studied with several MRCC schemes, including the state-specific approaches such as BWCC [18][19][20][21] and MkCC [22][23][24][25][26][27] and the state-universal approach of Jeziorski and Monkhorst. 17 These systems illustrate the transition states of chemical reactions, as can be seen in the structures depicted in Fig. 5. In the H4 model, the bond distances between the nearest hydrogen atoms are fixed to 2 bohrs, and its structure changes from square to linear as the angle απ varies with α ranging from 0 to 1 2 . In the H8 model, four hydrogen molecules with R H− −H = 2 bohrs are placed on a plane to form an octagonal configuration, and two of these molecules are horizontally deviated by α, reducing the spatial symmetry from D 8h to D 2h . In both cases, the high spatial symmetry at α = 0 requires a twoconfiguration wave function for a proper description. Hence, all the genuine MRCC methods in previous studies employed a (2e, 2o) CAS. 75 In this study, we conducted EACCSD calculations starting from SUHF, with the same basis sets employed in Ref. 75 (DZP and DZ for H4 and H8) to enable direct comparisons. Figures 6 and 7 present the energy errors from FCI in mHartree as a function of α for the H4 and H8 systems, respectively. Table III lists the non-parallelity errors (NPE) as the difference between the maximum and minimum errors. In both systems, the SRCCSD energy largely deviates from the exact energy around α = 0, as it fails to capture the strong static correlation accurately. All the MRCCSD methods improve the description of both the static and dynamic correlation effects reasonably well, except for SUCCSD in the H4 case, which tends to overestimate the correlation energy (by ∼3 mHartree) as α increases. When triples are included, MRCCSDT methods generically reduce the errors of MRCCSD. However, Evangelista et al. 75 showed that they are much less accurate than SRCCSDT methods when a system has little MR character (i.e., large α region), although this is not shown in the present figures.
For these small systems, ECISD turns out to be more accurate than MRCCSD, but comparing the two systems shows that its accuracy deteriorates as the number of electrons increases from H4 to H8 because of the lack of size-consistency and size-extensivity. Although ECISD with Davidson's sizeconsistent correction 76,77 (we used the variant proposed by Pople 78 ), i.e., ECISD + Q, and ELCCSD are designed to fix this issue, they are both significantly less accurate than ECISD. However, a substantial improvement over these methods can be attained with EACCSD, with NPEs of only 66 and 76 µHartree for H4 and H8, respectively. As shown in Table III, the NPEs of BWCCSD (644 and 1906 µHartree) and MkCCSD (677 and 2019 µHartree) are one order of magnitude larger than those of EACCSD. It is also intriguing that EACCSD is much more accurate than MRCCSDT. However, we should point out that the reference wave functions are different for MRCC and EACCSD; while the former uses CASSCF (2e, 2o), the latter uses SUHF, which is more flexible. To faithfully evaluate the dynamical correlation effect that EACCSD captures, we also performed EACCSD calculations with a CASSCF (2e, 2o) reference, which is a subset of SUHF. In this case, the NPEs of EACCSD for H4 and H8 drastically increase to 666 and 1513 µHartree, which seem more reasonable when compared with the values of genuine MRCCSD. This result indicates that these systems are inherently more MR than a two-configuration description, and an active space of (2e, 2o) may not be appropriate. We should stress that the choice of orbitals, i.e., whether they are obtained by SUHF or CASSCF (2e, 2o), does not influence the computational effort in EACCSD. Occasionally, the former may require slightly more rotational grid points N grid to carry out the numerical integration of spin-projection accurately, but this results in the increase in the computational cost that scales only FIG. 6. Comparison of MR methods for the H4 system. linearly with N grid . Therefore, there is no exponential increase in the cost when enlarging the "active" space of SUHF, a superiority of our method to other MRCC approaches tested here.

D. Ozone
The ozone is a challenging system for many SR methods. Its MR character requires descriptions of both dynamical and static correlation effects for accurate geometry and vibrational frequencies. Recently, we carried out calculations on this molecule to evaluate the performance of SUHF and ECISD analytical gradients. 79 This work revealed the questionable applicability of SUHF; while the bond length is drastically improved over RHF, some of the vibrational frequencies are severely underestimated. ECISD, on the other hand, yielded very similar results to MRCISD. However, compared to both traditional CCSD and MR-CCSD, ECISD and MRCISD slightly overestimated the vibrational frequencies. We therefore concluded that the overestimation can be attributed to the size-inconsistent error associated with these methods. However, the basis set used in the previous work was of double zeta quality, which is usually insufficient to compare results with experiments. Moreover, ECEPA methods tend to produce significant overcorrelation compared with ECISD +Q for this system.
It is therefore interesting to revisit the ozone molecule and learn the potential that EACCSD has to offer for molecular properties. For this purpose, we used cc-pVTZ.  Although the size of the basis set may still not be large enough to allow a direct comparison against the experimental values, this moderate-size basis set has been extensively employed in many MRCC ozone studies and therefore should point out the relative accuracy of EACCSD in comparison with other methods. 29,33,80 As no analytical derivatives are yet available in EACCSD, we computed the gradients and Hessian using a finite difference scheme. The geometry optimization was considered to have converged when all the gradients in the Cartesian coordinates are less than 10 −5 a.u., and the Hessian was computed with a step size of 10 −3 Å. All electrons were correlated except for MRCCSD. The results of MkCCSD and BWCCSD were taken from Ref. 80, whereas those of ic-MRCCSD were obtained from Ref. 33. We summarize the computed geometries and frequencies in Table IV. At the equilibrium geometry, an appropriate zeroth-order wave function of ozone is composed of two configurations, with either HOMO or LUMO of closed-shell HF being doubly occupied. As such, in MkCCSD and BWCCSD, a CASSCF (2e, 2o) model space was used. However, care must be  taken in many MRCC approaches, because the open-shell singlet configuration (HOMO) 1 (LUMO), 1 whose contribution is zero with a C 2v geometry, is also indispensable for correctly describing the antisymmetric stretching frequency, ω(1b 2 ). 80,81 Furthermore, it was pointed out that, for MkCCSD and BWCCSD, which are not unitary-invariant, their values of ω(1b 2 ) depend rather sensitively on a small orbital rotation between these active orbitals; changes can be on the order of hundreds of wavenumbers. 80 Hence, while the reported values obtained with the natural orbitals of CASSCF (2e, 2o) are quite reasonable, they can deteriorate substantially when the active orbitals employed are prepared in a different manner. This clearly exposes the problems inherent in these methods. By contrast, ic-MRCCSD and EACCSD are free from this kind of problem, as they are orbital-invariant. As in our previous study, SUHF was unable to predict reasonable frequencies and ECISD overestimated the 1b 2 frequency. For this system, the instability of ELCCSD became manifest. In the vicinity of the equilibrium geometry, the correlation energy is considerably overestimated and does not converge in most cases. EAQCC, one of the ECEPA methods, mitigates this problem by appropriately treating the exclusionprinciple-violating terms and thereby reducing the energy shift in the dressed Hamiltonian (see Ref. 45 for details), but the computed geometry is notably worse. This is most likely because the approximation to the quadraticT 2 effect is inaccurate.
Including the quadratic term explicitly, EACCSD is immune to such instabilities. We obtain systematic improvements with EACCSD over both EAQCC and ECISD in most aspects. The bond length of EACCSD (1.259 Å) is in better agreement with the experimental value (1.272 Å), although slightly worse than other genuine MRCC methods. We note that, while the results of EACCSD and BWCCSD are very similar, this resemblance may be simply coincidental, given the orbital invariance issue in the latter method. That EACCSD does not correctly predict the ordering of ω(1a 1 ) and ω(1b 2 ) is consistent with other MRCC results. Hanauer and Köhn 33 showed that triple excitations are needed for ic-MRCC to be able to reproduce the correct ordering. Evangelista et al. 29 suggested that the inclusion of perturbative triples in MkCCSD could improve the description of ω(1b 2 ), although the possible sensitivity of their results to active orbital rotations was not addressed. Similarly, we expect that EACCSD(T) and EACCSDT should be able to give more accurate results than EACCSD.

E. Size-consistency
Size-consistency is a very difficult requirement to satisfy with SUHF-based methods. This is simply because a spinprojection operator does not generally allow for the separation of an SUHF wave function, for subsystems X and Y. However, if X is symmetry-adapted and the symmetry-breaking occurs locally in Y, only the latter is subject to the spin projection. In such a case, SUHF is written as and is separable. ECC formally satisfies size-consistency, asT X is symmetry-adapted. Here, we assume the intermediate normalization without loss of generality. This separability holds for EACCSD energy, but not for its residuals. The exact separability of residuals is only retained with the exponential ansatz. Furthermore, even if this problem was circumvented with the rigorous ECC theory, the current orthogonalization procedure might not guarantee Eq. (57), because St contains disconnected terms. Nonetheless, we expect the error due to size-inconsistency to be smaller in EACCSD than in ECISD and ECEPA because of the presence of theT 2 2 term.
To numerically assess the non-separability of the EACCSD wave function, we computed the size-consistency errors of the non-interacting Be 2 and Be 3 molecules with a 6-31G basis, where each atom is either symmetry-adapted (A) or symmetry-broken (B). Here, the size-consistency error is defined as the difference between the energy of the supermolecule and the sum of energies of individual atoms, and all combinations of A and B are considered (except for AAA, which results in SR calculations). The results are presented in Table V. Although SUHF and ELCCSD give large errors for the BB and BBB systems, they are strictly size-consistent for AB and AAB. Furthermore, partial size-consistency is achieved for ABB in the sense that their energies are the sum of the energies of fragments A and BB. Unsurprisingly, ECISD, ECISD+Q, and EAQCC do not possess these properties. EACCSD is not exactly size-consistent for AB and AAB, either. In particular, the increase in error from BB to ABB is rather large, unlike the behavior of most other methods. Nevertheless, overall, the size-consistency error of EACCSD is small for these test cases. As expected, EACCSD drastically reduces the error in the ill-behaved ECISD energy. We expect this improvement to be further facilitated by eliminating unlinked terms via truncation schemes based on the Baker-Campbell-Hausdorff expansion.

F. Analysis of singular values
In ic approaches, one typically sets a numerical threshold η to truncate small singular values of the metric. The energy obtained noticeably depends on η. A tight threshold is therefore desired so that only those energies relevant to redundancies are discarded and a reliable energy is obtained, but such a threshold often causes convergence difficulties, because small singular values trigger large amplitudes. A loose threshold yields discontinuous potential curves, because the magnitudes of singular values change rather significantly across the reaction coordinates, and hence a fixed threshold changes the number of variables included in the cluster operator. In particular, many singular values that are larger than a reasonable threshold at equilibrium smoothly become numerical noise as a bond is stretched, and eventually vanish to zero. This makes it hard to choose an appropriate value for η in ic methods such as canonical transformation theory 64 and ic-MRCC. 32,65 In EACCSD, one also takes the singular-valuedecomposition of S to remove linear dependencies. However, the singular values are always well-behaved and the above problem never occurs. In other words, singular values do not turn to numerical noise and vice versa. In Fig. 8, we show the distribution of singular values of S for the double dissociation of H 2 O. We have used the same geometry as in Sec. VI A for equilibrium and symmetrically stretched the two OH bonds from 1R e to 3R e with R e = 0.9929 Å. Across the internuclear distance, there is always the same number of zero singular values, a feature common in ic-MRCC. 32,65 However, the singular values of S in EACCSD evidently lie in a certain narrow range and the distinction between nonzeros and zeros becomes even clearer as the bonds are stretched. This is a completely opposite characteristic to that seen in ic-MRCC 32,65 and can perhaps be ascribed to the physical simplicity of the spin-projection operator. In all the applications presented in this paper, we have seen the same behavior of S and have experienced no difficulties in choosing η and obtaining convergence.
However, we should note that EACCSD faces a different difficulty due to the black-box nature of SUHF active space. Namely, it will become increasingly difficult to diagonalize S to find U µβ as the system size grows because S has the same dimension as the singles and doubles spaces. One could resort to an iterative diagonalizer 70 by finding the zero eigenvalues of S, but this will eventually fail as M N increases. In particular, in our approach, M N almost certainly scales as O(ov) for singlet systems, as discussed in our previous work. 44 This is clearly manifested in the computational cost of constructing the null-space projector P, which scales as O(o 3 v 4 ), and the storage, which scales as O(o 3 v 3 ). The size issue of N(S) therefore needs to be overcome before EACCSD can be used for larger applications. Dealing with this problem is not trivial and will be addressed in a forthcoming paper.

VII. CONCLUSIONS
In this paper, we have attempted to merge CC and spinprojected HF. Despite the simplicity of the ECC ansatz, the residual equations present two difficulties. First, they do not terminate naturally and require excitations of all electrons in the system. Second, the set of nonlinear equations is linearly dependent and possesses an infinitude of solutions. In the present work, we have avoided introducing approximations to a spin-projection operatorP but rather truncated the exponential series of the wave operator at quadruples with respect to the underlying broken-symmetry determinant. This particular choice ensures that the method recovers the traditional CCSD whenP is absent and that the computational scaling remains at O(N 6 ). To address the second issue, we have introduced an orthogonal excitation basis through the singular-valuedecomposition of the metric S and the projection of its null space from the equations. This scheme was shown to correctly remove redundancies in the parametrization (t-amplitudes) in an orbital-invariant fashion. Furthermore, we have proposed an adaptive preconditioning scheme for updating t-amplitudes that introduces micro-iterations with an O(N 5 ) scaling but dramatically reduces the total number of macro-iterations with an O(N 6 ) scaling.
The current work focused on the performance of EACCSD for systems where genuine MRCCSD results are available. In all these systems, we have shown that EACCSD is more accurate (the H4 and H8 systems) or comparable (O 3 ) to MRCCSD. Moreover, unlike MRCC based on the Jeziorski-Monkhorst ansatz, our method was proven to be orbital-invariant, which is an important property. Although size-consistency is not strictly satisfied, EACCSD achieves remarkable improvements, in terms of correcting the size-inconsistency issue of SUHF, over previous CI-based models such as ECISD and ELCCSD.
Despite these initial encouraging results, the applicability of our method to larger systems is presently hampered by the need to diagonalize prohibitively large S to remove its null space. In principle, the same issue is shared by ic MR methods employing large active spaces (especially in combination with density matrix renormalization groups), 64 where approximating S usually causes significant errors in energy. We therefore need a robust and computationally efficient scheme to project out the null space. We will discuss this issue in a forthcoming paper.

SUPPLEMENTARY MATERIAL
See supplementary material describes the effective coefficients ω, matrix elements, and the total energies of H4 and H8.

ACKNOWLEDGMENTS
We would like to thank Gustavo E. Scuseria for stimulating discussions. This research was supported by MEXT as "Priority Issue on Post-K computer" (Development of new fundamental technologies for high-efficiency energy creation, conversion/storage and use) using the computational resources of the K computer provided by the RIKEN Advanced Institute for Computational Science through the HPCI System Research project (Project ID: hp170259). This work was partly supported by JSPS KAKENHI Grant-in-Aids for Scientific Research (A) (Grant No. JP18H03900) and Young Scientists (B) (Grant No. JP17K14438).

APPENDIX A: FIRST-ORDER INTERACTING SPACE OF SPIN-PROJECTED MANIFOLD
In principle, a spin-projected determinant interacts with essentially any projected n-tuply excited determinantP|Φ (n) µ through the Hamiltonian because the projected configuration space that each excitation level spans overlaps with that of other excitation levels. However, it can easily be shown that the first-order interacting space with respect toP|Φ (n) µ is completely spanned by the projected configuration spaces that consist of determinants with n − 2 ∼ n + 2 excitations (not necessarily different by two electron substitutions). For example, the FOIS of SUHF is composed of the projected singles and doubles, and likewise that of the projected doubles is composed of SUHF and the projected singles to quadruples, with all possible spin combinations that maintain the same S z .
To see this, we first note that, while |Φ 0 is a determinant in a broken-symmetry orbital basis φ α i and φ β i , the orbital transformation to some spin-adapted orbitals ϕ P (capital letters indicate spatial orbitals of this basis) can be achieved by where U is a unitary matrix. The unitary group generatorÊ P Q in the orbital basis of {ϕ P } (represented as the creation and annihilation operators {ĉ P σ ,ĉ P σ }) can be back-transformed to the broken-symmetry orbitals φ P , As Ê p q ,P = 0 for spin-projection, the FOIS of an SUHF wave functionP|Φ 0 is then found to bê where both α and β spins are implicit in a, b, i, j. Similarly, the FOIS of projected doublesP|Φ ab ij is shown to be spanned by the SUHF reference, projected singles, . . ., quadruples.
Finally, we mention that the nonorthogonality between different excitation ranks makes it impossible to distinguish them unless they are properly orthogonalized. This means, for example, that the importance of triples excitations in EACCSDT compared to that of singles and doubles excitations in EACCSD may not be clear. To avoid such ambiguous definitions of excitations in spin-extended methods, the sequential orthogonalization technique should be invoked, as proposed by the pioneering work of Hanauer and Kohn for ic-MRCC (version D). 33

APPENDIX B: SINGULARITIES IN LINEARIZED COUPLED CLUSTER
Standard linearized CC methods have a tendency to diverge as the system size increases unless the active space is large enough. ELCCSD is usually solved as the dressed ECISD eigenvalue problem in the intermediate normalization, i.e., with the basis {P|Φ 0 ,Q|Φ µ }. The equations for SRLCC and MRLCC can be written similarly. The correlation energy E c is therefore formally determined by iteratively solving Eq. (B1). If the first excited state of ECISD is well separated from the ground state, to a good approximation, using the dressed Hamiltonian corresponds to shifting down all the CI excitation energies ε i (i.e., the interacting space) by E c [see Fig. 9(a)]. This procedure is, however, not stable if the interacting space is close to the reference space and, therefore, the low-lying excitation energies are small compared to the CI correlation energy. In this case, as shown in Fig. 9(b), these excited states (red) become intruders and go below the energy of the true ground state (green). Evidently, as the iteration proceeds, E c will eventually overcorrect or diverge. It is noteworthy that the excitation energies are sizeintensive quantities, whereas the correlation energy of LCC is designed to be size-extensive. Therefore, the larger the system size, the more severely LCC is likely to overestimate the correlation energy. This issue is also common in MRLCC, and therefore a large (complete) active space is required to avoid instabilities.

APPENDIX C: ORBITAL INVARIANCE PROPERTY OF ECC
A method is said to be orbital-invariant if the energy does not change on a unitary rotation in a certain orbital space P, which is given byâ¯p where U¯p q is again a unitary matrix and a bar indicates the transformed spin-orbitals.
To show the orbital invariance of SUHF and ECC, we first assume that the SUHF and ECC energies have converged and the residual equations have been solved. We then perform a unitary rotation within each orbital space to verify that the energy and residuals are unchanged. 30 In these methods, the orbital space can be naturally separated into occupied and virtual subspaces of the underlying broken-symmetry determinant |Φ 0 =â i 1 i 2 ···i ne | . Therefore, P spans either an occupied (U occ ) or virtual (U vir ) space.
As occupied orbital rotations in |Φ 0 simply change the phase, |Φ 0 = |Φ 0 det(U occ ), the unitary invariance of the SUHF energy is immediately obvious: It then follows that the residuals of SUHF, defined as If, however, the reference space is too close to the interacting space, the energy shift E c overcorrects the spectrum and low-lying states (red) go below the true ground state.