Delocalization for a class of random block band matrices

We consider $N\times N$ Hermitian random matrices $H$ consisting of blocks of size $M\geq N^{6/7}$. The matrix elements are i.i.d. within the blocks, close to a Gaussian in the four moment matching sense, but their distribution varies from block to block to form a block-band structure, with an essential band width $M$. We show that the entries of the Green's function $G(z)=(H-z)^{-1}$ satisfy the local semicircle law with spectral parameter $z=E+\mathbf{i}\eta$ down to the real axis for any $\eta \gg N^{-1}$, using a combination of the supersymmetry method inspired by \cite{Sh2014} and the Green's function comparison strategy. Previous estimates were valid only for $\eta\gg M^{-1}$. The new estimate also implies that the eigenvectors in the middle of the spectrum are fully delocalized.


1. Introduction
The Hamiltonian of a quantum system on a graph $\Gamma$ is a self-adjoint matrix $H = (h_{ab})_{a,b\in\Gamma}$, $H = H^*$. The matrix element $h_{ab}$ represents the quantum transition rate from vertex $a$ to $b$. Disordered quantum systems have random matrix elements. We assume they are centered, $\mathbb{E}h_{ab} = 0$, and independent subject to the basic symmetry constraint $h_{ab} = \bar h_{ba}$. The variance $\sigma^2_{ab} := \mathbb{E}|h_{ab}|^2$ represents the strength of the transition from $a$ to $b$, and we use a scaling where the norm of $H$ is typically of order 1. The simplest case is the mean field model, where the $h_{ab}$ are identically distributed; this is the standard Wigner matrix ensemble [29]. The other prominent example is the Anderson model [2], or random Schrödinger operator, $H = \Delta + V$, where the kinetic energy $\Delta$ is the (deterministic) graph Laplacian and the potential $V = (V_x)_{x\in\Gamma}$ is an on-site multiplication operator with random multipliers. If $\Gamma$ is a discrete $d$-dimensional torus, then only a few matrix elements $h_{ab}$ are nonzero, and they connect nearest neighbor points in the torus, $\mathrm{dist}(a,b) \le 1$. This is in sharp contrast to the mean field character of the Wigner matrices.
Random band matrices naturally interpolate between the mean field Wigner matrices and the short range Anderson model. They are characterized by a parameter M , called the band width, such that the matrix elements h ab for dist(a, b) ≥ M are zero or negligible. If M is comparable with the diameter N of the system then we are in the mean field regime, while M ∼ 1 corresponds to the short range model. The Anderson model exhibits a metal-insulator phase transition: at high disorder the system is in the localized (insulator) regime, while at small disorder it is in the delocalized (metallic) regime, at least in d ≥ 3 dimensions and away from the spectral edges. The localized regime is characterized by exponentially decaying eigenfunctions and off diagonal decay of the Green's function, while in the complementary regime the eigenfunctions are supported in the whole physical space. In terms of the localization length ℓ, the characteristic length scale of the decay, the localized regime corresponds to ℓ ≪ N , while in the delocalized regime ℓ ∼ N . Starting from the basic papers [1,15], the localized regime is well understood, but the delocalized regime is still an open mathematical problem for the d-dimensional torus.
Since the eigenvectors of the mean field Wigner matrices are always delocalized [13,14], while the short range models are localized, by varying the parameter $M$ in the random band matrix one expects a (de)localization phase transition. Indeed, for $d = 1$ it is conjectured (and supported by nonrigorous supersymmetric calculations [16]) that the system is delocalized for broad bands, $M \gg N^{1/2}$, and localized for $M \ll N^{1/2}$. The optimal power $1/2$ has not yet been achieved from either side. Localization has been shown for $M \ll N^{1/8}$ in [22], while delocalization in a certain sense was proven for $M \gg N^{4/5}$ in [11]. Interestingly, for a special Gaussian model even the sine kernel behavior of the 2-point correlation function of the characteristic polynomials could be proven down to the optimal band width $M \gg N^{1/2}$, see [18,20]. Note that the sine kernel is consistent with delocalization but does not imply it. We remark that our discussion concerns the bulk of the spectrum; the transition at the spectral edge is much better understood. In [24] it was shown that the edge spectrum follows the Tracy-Widom distribution, characteristic of mean field models, for $M \gg N^{5/6}$, but it yields a different distribution for narrow bands, $M \ll N^{5/6}$. Delocalization is closely related to estimates on the diagonal elements of the resolvent $G(z) = (H-z)^{-1}$ at spectral parameters with small imaginary part $\eta = \mathrm{Im}\,z$. Indeed, if $G_{ii}(E + \mathbf{i}\eta)$ is bounded for all $i$ and all $E \in \mathbb{R}$, then each $\ell^2$-normalized eigenvector $\mathbf{u}$ of $H$ is delocalized on scale $\eta^{-1}$, in the sense that $\max_i |u_i|^2 \lesssim \eta$, i.e. $\mathbf{u}$ is supported on at least $\eta^{-1}$ sites. In particular, if $G_{ii}$ can be controlled down to the scale $\eta \sim 1/N$, then the system is in the completely delocalized regime. Moreover, boundedness of $G_{ii}$ also implies that the local semicircle law holds in the same regime of $\eta$.
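This implication is the standard spectral-decomposition bound: if $H\mathbf{u}^{(k)} = \lambda_k \mathbf{u}^{(k)}$ with $\|\mathbf{u}^{(k)}\|_2 = 1$, then for every site $i$,
$\mathrm{Im}\, G_{ii}(\lambda_k + \mathbf{i}\eta) = \sum_{m=1}^{N} \frac{\eta\, |u^{(m)}_i|^2}{(\lambda_m - \lambda_k)^2 + \eta^2} \ge \frac{|u^{(k)}_i|^2}{\eta},$
so $G_{ii}(E + \mathbf{i}\eta) = O(1)$, uniformly in $i$ and $E$, forces $\max_i |u^{(k)}_i|^2 = O(\eta)$, i.e. $\mathbf{u}^{(k)}$ carries mass on at least order $\eta^{-1}$ sites.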
For band matrices with band width $M$, or even under the more general condition $\sigma^2_{ab} \le M^{-1}$, the boundedness of $G_{ii}$ was shown down to the scale $\eta \gg M^{-1}$ in [14] (see also [12]). If $M \gg N^{1/2}$, it is expected that $G_{ii}$ remains bounded even down to $\eta \gg N^{-1}$, which is the typical eigenvalue spacing, the smallest relevant scale in the model. However, the standard approach [14,12] via the self-consistent equations for the Green's function does not seem to work for $\eta \le 1/M$; the fluctuation is hard to control. The more subtle approach using the self-consistent matrix equation in [11] could prove delocalization and an off-diagonal Green's function profile that are consistent with the conventional quantum diffusion picture, but it was valid only for relatively large $\eta$, far from $M^{-1}$. Moment methods, even with a delicate renormalization scheme [23], could not break the barrier $\eta \sim M^{-1}$ either.
In this paper we attack the problem differently, with supersymmetric (SUSY) techniques. Our main result is that $G_{ii}(z)$ is bounded, and the local semicircle law holds, for any $\eta \gg N^{-1}$, i.e. down to the optimal scale, if the band width is not too small, $M \gg N^{6/7}$, but under two technical assumptions. First, we consider a generalization of Wegner's $n$-orbital model [21,28]: namely, we assume that the band matrix has a block structure, i.e. it consists of $M \times M$ blocks and the matrix elements within each block have the same distribution. This assumption is essential to reduce the number of integration variables in the supersymmetric representation, since, roughly speaking, each $M \times M$ block will be represented by a single supermatrix with 16 supersymmetric variables. Second, we assume that the distribution of the matrix elements matches a Gaussian up to four moments in the spirit of [27]. Supersymmetry heavily uses Gaussian integrations; in fact, all mathematically rigorous works on random band matrices with the supersymmetric method assume that the matrix elements are Gaussian, see [4,5,6,18,19,20,25,26]. The Green's function comparison method [14] allows one to compare the Green's functions of two matrix ensembles provided that the distributions match up to four moments and provided that the $G_{ii}$ are bounded. This was an important motivation to reach the optimal scale $\eta \gg N^{-1}$.
In the next subsections we introduce the model precisely and state our main results. While the SUSY approach is ubiquitous in physics, see e.g. the basic monograph by Efetov [7], its application in rigorous proofs is notoriously difficult. Initiated by T. Spencer (see [25] for a summary) and starting with the paper [4] by Disertori, Pinson and Spencer, only a handful of mathematical papers have succeeded in exploiting this powerful tool. Our supersymmetric analysis was inspired by [19], but our observable, $G_{ab}$, requires a partly different formalism; in particular, we use the singular version of the superbosonization formula [3]. Moreover, our analysis is considerably more involved since we consider relatively narrow bands. In Section 1.3, we explain our novelties compared with [19]. The matrix entries satisfy
$\mathbb{E}\, h_{jk,\alpha\beta}\, h_{j'k',\alpha'\beta'} = \frac{1}{M}\,\delta_{jk'}\delta_{j'k}\delta_{\alpha\beta'}\delta_{\beta\alpha'}\big(\delta_{jk} + s_{jk}\big). \qquad (1.1)$
That means, the variance profile of the random matrix $\sqrt{M}H_N$ is given by
$\tilde S = (\tilde s_{jk}) := I + S, \qquad (1.2)$
in which each entry represents the common variance of the entries in the corresponding block of $\sqrt{M}H_N$. Moreover, if the $h_{jk,\alpha\beta}$'s are Gaussian, (1.1) also implies that for each off-diagonal entry $h_{jk,\alpha\beta}$, its real part and imaginary part are i.i.d. $N(0, \tilde s_{jk}/2M)$ variables.
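For concreteness, the following is a minimal numerical sketch of this ensemble (our illustration, not part of the paper; the function and parameter names are ours, and NumPy is assumed). It samples a Gaussian Hermitian block band matrix whose blocks satisfy (1.1) for a profile $\tilde S = I + S$ with $S$ a weighted Laplacian on a ring, and evaluates the normalized trace of the Green's function at a spectral parameter in the local regime:

```python
import numpy as np

def sample_block_band(W, M, S_tilde, rng):
    """Sample an N x N (N = W*M) Hermitian Gaussian block band matrix.

    S_tilde is the W x W block variance profile of (1.2): the entries of
    block (j, k) are centered complex Gaussians with E|h|^2 = S_tilde[j,k]/M,
    as in (1.1); diagonal blocks are Hermitized.
    """
    N = W * M
    H = np.zeros((N, N), dtype=complex)
    for j in range(W):
        for k in range(j, W):
            sd = np.sqrt(S_tilde[j, k] / (2 * M))  # Re, Im ~ N(0, s/(2M))
            A = rng.normal(0.0, sd, (M, M)) + 1j * rng.normal(0.0, sd, (M, M))
            if j == k:
                A = (A + A.conj().T) / np.sqrt(2)  # preserves E|h|^2 = s/M
            H[j*M:(j+1)*M, k*M:(k+1)*M] = A
            if j != k:
                H[k*M:(k+1)*M, j*M:(j+1)*M] = A.conj().T
    return H

rng = np.random.default_rng(0)
W, M, a = 8, 64, 0.2                      # N = 512, nearest-neighbor profile
S = np.zeros((W, W))
for j in range(W):
    S[j, (j + 1) % W] += a                # ring edges with weight a
    S[(j + 1) % W, j] += a
    S[j, j] -= 2 * a                      # weighted Laplacian: rows sum to 0
S_tilde = np.eye(W) + S                   # doubly stochastic profile, cf. (1.2)

H = sample_block_band(W, M, S_tilde, rng)
N = W * M
z = 0.5 + 1j * N**(-0.9)                  # eta slightly above 1/N
G = np.linalg.inv(H - z * np.eye(N))
m_N = np.trace(G) / N                     # should approximate m_sc(z)
```

Since $\tilde S$ is doubly stochastic, the spectrum of $H$ is asymptotically supported on $[-2, 2]$ and the computed normalized trace approximates the Stieltjes transform $m_{sc}(z)$ of the semicircle law introduced below.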

1.2. Assumptions and main results. In the sequel, for a matrix $A = (a_{ij})$ and index sets $I$ and $J$, we introduce the notation $A^{(I|J)}$ to denote the submatrix obtained by deleting the $i$-th row and $j$-th column of $A$ for all $i \in I$ and $j \in J$. We will adopt the abbreviation $A^{(I)} := A^{(I|I)}$. In addition, we use $\|A\|_{\max} := \max_{i,j}|a_{ij}|$ to denote the max norm of $A$. Throughout the paper, we need some assumptions on $S$. Assumption 1.1 (On $S$). (ii) $\tilde S$ defined in (1.2) is strictly diagonally dominant, i.e., there exists some constant $c_0 > 0$ such that $1 + 2s_{ii} > c_0$ for all $i = 1, \dots, W$.
(iv) There exists a spanning tree $G_0 = (V, E_0) \subset G$ on which the weights are bounded below, i.e. for some constant $c > 0$, we have $s_{jk} \ge c$ for all $\{j, k\} \in E_0$. Remark 1.2. From Assumption 1.1 (ii), we easily see that the analogous bound holds for the same positive constant $c_0$. In addition, the lower bound $c$ in (iv) can be weakened to $N^{-\varepsilon}$ for some sufficiently small constant $\varepsilon > 0$; but for simplicity, we will not try to optimize this bound in this paper. One can refer to [8] for more details.
For simplicity, we also introduce the notation (1.8). In particular, when $\gamma = 1$, one has $M \gg N^{6/7}$. In fact, through a more involved analysis, (1.7) (or (1.8)) can be further improved; at least for $\gamma \le 1$, we expect that $M \gg N^{4/5}$ is enough. However, we will not pursue this direction here.
Besides Assumption 1.1 on the variance profile of $H$, we need to impose an additional assumption on the distribution of its entries. To this end, we temporarily employ the notation $H^g = (h^g_{ab})$ to represent a random block band matrix with Gaussian entries, satisfying (1.1), Assumption 1.1 and Assumption 1.3. Assumption 1.5 (On distribution). We assume that for each $a, b \in \{1, \dots, N\}$, the moments of the entry $h_{ab}$ match those of $h^g_{ab}$ up to the 4th order, i.e. (1.9) holds.
The four moment condition (1.9) in the context of random matrices first appeared in Tao and Vu's work [27].
To state our results, we will need the following notion on the comparison of two random sequences, which was introduced in [9] and [12]. Definition 1.6 (Stochastic domination). For some possibly $N$-dependent parameter set $\mathsf{U}_N$, and two families of random variables $X = (X_N(u) : N \in \mathbb{N}, u \in \mathsf{U}_N)$ and $Y = (Y_N(u) : N \in \mathbb{N}, u \in \mathsf{U}_N)$, we say that $X$ is stochastically dominated by $Y$ if for all $\varepsilon > 0$ and $D > 0$ we have
$\sup_{u \in \mathsf{U}_N} \mathbb{P}\big(X_N(u) > N^{\varepsilon}\, Y_N(u)\big) \le N^{-D}$
for all sufficiently large $N \ge N_0(\varepsilon, D)$. In this case we write $X \prec Y$. Note that $\tilde S$ is doubly stochastic. It is known that the empirical eigenvalue distribution of $H_N$ converges to the semicircle law, whose density function is given by $\varrho_{sc}(x) := \frac{1}{2\pi}\sqrt{4 - x^2}\cdot \mathbf{1}(|x| \le 2)$.
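For orientation, a minimal example of this notion (stated for the Gaussian ensemble $H^g$, where all moments are at our disposal): each matrix entry satisfies $|h^g_{ab}| \prec M^{-1/2}$. Indeed, $\mathbb{E}|h^g_{ab}|^{2p} \le C_p M^{-p}$ for every fixed $p$, so Markov's inequality gives
$\mathbb{P}\big(|h^g_{ab}| > N^{\varepsilon} M^{-1/2}\big) \le N^{-2p\varepsilon}\, M^{p}\, \mathbb{E}|h^g_{ab}|^{2p} \le C_p\, N^{-2p\varepsilon} \le N^{-D}$
for $p > D/(2\varepsilon)$ and $N$ large, uniformly over the parameter set $\mathsf{U}_N$ of index pairs $(a, b)$.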
We denote the Green's function of $H_N$ by $G(z) \equiv G_N(z) := (H_N - z)^{-1}$, $z = E + \mathbf{i}\eta \in \mathbb{C}^+ := \{w \in \mathbb{C} : \mathrm{Im}\,w > 0\}$, and its $(a,b)$ matrix element is $G_{ab}(z)$. Throughout the paper, we will always use $E$ and $\eta$ to denote the real and imaginary part of $z$ without further mention. In addition, for simplicity, we suppress the subscript $N$ from the notation of the matrices here and there. The Stieltjes transform of $\varrho_{sc}(x)$ is
$m_{sc}(z) := \int_{\mathbb{R}} \frac{\varrho_{sc}(x)}{x - z}\, {\rm d}x = \frac{-z + \sqrt{z^2 - 4}}{2},$
where we chose the branch of the square root with positive imaginary part for $z \in \mathbb{C}^+$. Note that $m_{sc}(z)$ is a solution to the following self-consistent equation:
$m_{sc}(z) = \frac{1}{-z - m_{sc}(z)}. \qquad (1.13)$
The semicircle law also holds in a local sense, see Theorem 2.3 in [12]. For simplicity, we cite this result with a slight modification adjusted to our assumption. Proposition 1.7 (Erdős, Knowles, Yau, Yin, [12]). Let $H$ be a random block band matrix satisfying Assumptions 1.1, 1.3 and 1.5. Then
$|G_{ij}(z) - \delta_{ij} m_{sc}(z)| \prec \frac{1}{\sqrt{M\eta}}, \quad \text{if } E \in [-2+\kappa, 2-\kappa] \text{ and } M^{-1+\varepsilon} \le \eta \le 10, \qquad (1.14)$
for any fixed small positive constants $\kappa$ and $\varepsilon$.
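One can verify (1.13) directly from this formula: writing $s := \sqrt{z^2 - 4}$, so that $m_{sc} = (-z + s)/2$,
$m_{sc}^2 + z\, m_{sc} + 1 = \frac{z^2 - 2zs + s^2}{4} + \frac{-z^2 + zs}{2} + 1 = \frac{s^2 - z^2}{4} + 1 = 0,$
since $s^2 = z^2 - 4$; rearranging $m_{sc}(m_{sc} + z) = -1$ gives exactly (1.13). The branch with $\mathrm{Im}\,\sqrt{z^2 - 4} > 0$ ensures $\mathrm{Im}\, m_{sc}(z) > 0$ on $\mathbb{C}^+$, as a Stieltjes transform must satisfy.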
Remark 1.8. We remark that Theorem 2.3 in [12] was established under the more general assumptions $\sum_k \sigma^2_{jk} = 1$ and $\sigma^2_{jk} \le C/M$. In particular, the block structure on the variance profile is not needed. In addition, Theorem 2.3 in [12] also covers the edges of the spectrum, which will not be discussed in this paper. We also refer to [14] for a previous result, see Theorem 2.1 therein.
Our aim in this paper is to extend the local semicircle law to the regime $\eta \gg N^{-1}$ and replace $M$ with $N$ in (1.14). More specifically, we will work in the following set, defined for an arbitrarily small constant $\kappa > 0$ and any sufficiently small positive constant $\varepsilon_2 := \varepsilon_2(\varepsilon_1)$:
$\mathsf{D}(N, \kappa, \varepsilon_2) := \big\{z = E + \mathbf{i}\eta \in \mathbb{C}^+ : |E| \le \sqrt{2} - \kappa,\ N^{-1+\varepsilon_2} \le \eta \le M^{-1}N^{\varepsilon_2}\big\}. \qquad (1.15)$
Throughout the paper, we will assume that $\varepsilon_2$ is much smaller than $\varepsilon_1$, see (1.7) for the latter. Specifically, there exists some large enough constant $C$ such that $\varepsilon_2 \le \varepsilon_1/C$. Theorem 1.9 (Local semicircle law). Let $H$ be a random block band matrix satisfying Assumptions 1.1, 1.3 and 1.5. Then
$|G_{ij}(z) - \delta_{ij} m_{sc}(z)| \prec \frac{1}{\sqrt{N\eta}}, \qquad \forall\, z \in \mathsf{D}(N, \kappa, \varepsilon_2). \qquad (1.16)$
Remark 1.10. The restriction $|E| \le \sqrt{2} - \kappa$ in (1.15) is technical. We believe the result can be extended to the whole bulk regime of the spectrum, i.e., $|E| \le 2 - \kappa$; see Section 12 for further comments. The upper bound on $\eta$ in (1.15) is also technical. However, for $\eta > M^{-1}N^{\varepsilon_2}$, one can control the Green's function by (1.14) directly.
Let $\lambda_1, \dots, \lambda_N$ be the eigenvalues of $H_N$. We denote by $\mathbf{u}_i := (u_{i1}, \dots, u_{iN})$ the normalized eigenvector of $H_N$ corresponding to $\lambda_i$. From Theorem 1.9, we can also get the following delocalization property for the eigenvectors. Theorem 1.11 (Complete delocalization). Let $H$ be a random block band matrix satisfying Assumptions 1.1, 1.3 and 1.5. We have
$\max_{i:\, |\lambda_i| \le \sqrt{2} - \kappa} \|\mathbf{u}_i\|_\infty^2 \prec \frac{1}{N}. \qquad (1.17)$
Remark 1.12. We remark that delocalization in a certain weak sense was proven in [11] for an even more general class of random band matrices if $M \gg N^{4/5}$. However, Theorem 1.11 asserts delocalization for all eigenvectors in the bulk in a very strong sense (supremum norm), while Proposition 7.1 of [11] stated that most eigenvectors are delocalized in the sense that their substantial support cannot be too small.

1.3. Outline of the proof strategy and novelties. In this section, we briefly outline the strategy for the proof of Theorem 1.9. The first step, which is the main task of the whole proof, is to establish the following Theorem 1.14, namely, an a priori estimate on the Green's function in the Gaussian case. For technical reasons, we need the following slight modification of Assumption 1.3 to state the result. Assumption 1.13 (On $M$). Let $\varepsilon_1$ be the small positive constant in Assumption 1.3. We assume that, in addition to (1.7),
$M \le N(\log N)^{-10}. \qquad (1.18)$
In the regime $M \ge N(\log N)^{-10}$, we see that (1.16) anyway follows from (1.14) directly.
Theorem 1.14. Assume that $H$ is a Gaussian block band matrix, satisfying Assumptions 1.1 and 1.13. Let $n$ be any fixed positive integer. Let $\kappa$ be an arbitrarily small positive constant and $\varepsilon_2$ be any sufficiently small positive constant. There is $N_0 = N_0(n)$ such that for all $N \ge N_0$ and all $z \in \mathsf{D}(N, \kappa, \varepsilon_2)$, we have
$\mathbb{E}|G_{ij}(z)|^{2n} \le N^{C_0} \qquad (1.19)$
for some positive constant $C_0$ independent of $n$ and $z$.
Remark 1.15. A much more delicate analysis can show that the prefactor $N^{C_0}$ can be improved to some $n$-dependent constant $C_n$. We refer to Section 12 for further comments on this issue.
Using the definition of stochastic domination in Definition 1.6, a simple Markov inequality shows that (1.19) implies
$|G_{ij}(z)| \prec 1, \quad \text{uniformly in } i, j \text{ and } z \in \mathsf{D}(N, \kappa, \varepsilon_2). \qquad (1.20)$
The proof of Theorem 1.14 is the main task of our paper. We will use the supersymmetry method. We partially rely on the arguments from Shcherbina's work [19] concerning universality of the local 2-point function, and we develop new techniques to treat our observable, the high moments of the entries of $G(z)$, under a more general setting. We will comment on the novelties later in this subsection.
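In detail: for any $\varepsilon, D > 0$, Markov's inequality and (1.19) yield
$\mathbb{P}\big(|G_{ij}(z)| > N^{\varepsilon}\big) \le N^{-2n\varepsilon}\, \mathbb{E}|G_{ij}(z)|^{2n} \le N^{C_0 - 2n\varepsilon} \le N^{-D}$
as soon as $n \ge (C_0 + D)/(2\varepsilon)$; since $n$ in (1.19) is arbitrary and $C_0$ does not depend on $n$, this is precisely the stochastic domination (1.20).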
The second step is to generalize Theorem 1.14 from the Gaussian case to more general distributions satisfying Assumption 1.5, via a Green's function comparison strategy initiated in [14]; see Lemma 2.1 below.
The last step is to use Lemma 2.1 and its Corollary 2.2 to prove our main theorems. Using (1.20) below to bound the error term in the self-consistent equation for the Green's function, we can prove Theorem 1.9 by a continuity argument in z, with the aid of the initial estimate for large η provided in Proposition 1.7. Theorem 1.11 will then easily follow from Theorem 1.9.
The second and the last steps are carried out in Section 2. The main body of this paper, Sections 3-11, is devoted to the proof of Theorem 1.14.
One of the main novelties of this work is to combine the supersymmetry method and the Green's function comparison strategy to go beyond the Gaussian ensemble, which was so far the only random band matrix ensemble amenable to the supersymmetry method, as mentioned at the beginning. The comparison strategy requires an a priori control on the individual matrix elements of the Green's function with high probability (see (1.20)); this is one of our main motivations behind Theorem 1.14.
Although we consider a different observable than [19], many technical aspects of the supersymmetric analysis overlap with [19]. For the convenience of the reader, we now briefly introduce the strategy of [19], and highlight the main novelties of our work.
In [19], the author considers the 2-point correlation function of the trace of the resolvent of the Gaussian block band matrix $H$, with the variance profile $\tilde S = 1 + a\Delta$, under the assumption $M \sim N$ (note that we use $M$ for the size of the blocks instead of $W$, which is used in [19]). The 2-point correlation function can be expressed in terms of a superintegral of a superfunction $F(\{\tilde S_i\}_{i=1}^W)$, where $\tilde S_i := Z_i^* Z_i$, $Z_i := (\Psi_{1,i}, \Psi_{2,i}, \Phi_{1,i}, \Phi_{2,i})$ is an $M \times 4$ matrix and $Z_i^*$ is its conjugate transpose; here $\Psi_{1,i}$ and $\Psi_{2,i}$ are Grassmann $M$-vectors whilst $\Phi_{1,i}$ and $\Phi_{2,i}$ are complex $M$-vectors. Then, by using the superbosonization formula in the nonsingular case ($M \ge 4$) from [17], one can rewrite this as an integral over supermatrices $\{S_i\}_{i=1}^W$, where each $S_i$ is a supermatrix akin to $\tilde S_i$, but only consists of 16 independent variables (either complex or Grassmann). We will call the integral representation of the observable after using the superbosonization formula the final integral representation. Schematically it has the form
$\int g(S^c)\, e^{M f_c(S^c) + f_g(S^g, S^c)}\, {\rm d}S, \qquad (1.21)$
for some functions $g(\cdot)$, $f_c(\cdot)$ and $f_g(\cdot)$, where we used the abbreviation $S := \{S_i\}_{i=1}^W$, and $S^c$ and $S^g$ represent the collection of all complex variables and Grassmann variables in $S$, respectively. Here, $g(S^c)$ and $f_c(S^c)$ are some complex functions, and $f_g(S^g, S^c)$ will be mostly regarded as a function of the Grassmann variables with the complex variables as its parameters. The number of variables (either complex or Grassmann) in the final integral representation then turns out to be of order $W$, which is much smaller than the original order $N$. In fact, in [19] it is assumed that $W = O(1)$, although the author also mentions the possibility to deal with the case $W \sim N^{\varepsilon}$ for some small positive $\varepsilon$; see the remark below Theorem 1 therein.
Performing a saddle point analysis for the complex measure $\exp\{M f_c(S^c)\}$, one can restrict the integral to a small vicinity of some saddle point, say, $S^c = S^{c0}$. It turns out that $f_c(S^{c0}) = 0$ and $f_c(S^c)$ decays quadratically away from $S^{c0}$. Consequently, by plugging in the saddle point $S^{c0}$, one can estimate $g(S^c)$ by $g(S^{c0})$ directly. However, $\exp\{M f_c(S^c)\}$ and $\exp\{f_g(S^g, S^c)\}$ have to be expanded around the saddle point. Roughly speaking, in some vicinity of $S^{c0}$, one finds that the expansions read
$\exp\{M f_c(S^c)\} \approx \exp\{-u' A u\}\big(1 + e_c(u)\big), \qquad \exp\{f_g(S^g, S^c)\} \approx \exp\{-\rho' H \tau\}\, p(\rho, \tau, u), \qquad (1.22)$
where $u$ is a complex vector of dimension $O(W)$, which is essentially a vectorization of $\sqrt{M}(S^c - S^{c0})$; $e_c(u) = o(1)$ is some error term; $\rho$ and $\tau$ are two Grassmann vectors of dimension $O(W)$; $A$ is a complex matrix with positive-definite Hermitian part and $H$ is a complex matrix; $p(\rho, \tau, u)$ is the expansion of $\exp\{f_g(S^g, S^c) - f_g(S^g, S^{c0})\}$, which possesses the form
$p(\rho, \tau, u) = \sum_{\ell} M^{-\ell/2}\, p_\ell(\rho, \tau, u), \qquad (1.23)$
where $p_\ell(\rho, \tau, u)$ is a polynomial in the components of $\rho$ and $\tau$ of degree $2\ell$, regarding $u$ as fixed parameters. Now, keeping the leading order term of $p(\rho, \tau, u)$ and discarding the remainder terms, one can get the final estimate of the integral by taking the Gaussian integral over $u$, $\rho$ and $\tau$. This completes the summary of [19]. Similarly to [19], we also use the superbosonization formula to reduce the number of variables and perform the saddle point analysis on the resulting integral. However, owing to the following three main aspects, our analysis is significantly different from [19].
•(Different observable) Our objective is to compute high moments of a single entry of the Green's function. By using Wick's formula (see Proposition 3.1), we express $\mathbb{E}|G_{jk}|^{2n}$ in terms of a superintegral of some superfunction $\tilde F$ of the form
$\big(\bar\phi_{1,q,\beta}\,\phi_{1,p,\alpha}\,\bar\phi_{2,p,\alpha}\,\phi_{2,q,\beta}\big)^n\, F\big(\{\tilde S_i\}_{i=1}^W\big)$
for some $p, q \in \{1, \dots, W\}$ and $\alpha, \beta \in \{1, \dots, M\}$, where $\phi_{1,p,\alpha}$ is the $\alpha$-th coordinate of $\Phi_{1,p}$, and the others are defined analogously. Unlike the case in [19], $\tilde F$ is not a function of $\{\tilde S_i\}_{i=1}^W$ only. Hence, using the superbosonization formula to change $\tilde S_i$ to $S_i$ directly is not feasible in our case. In order to handle the factor $(\bar\phi_{1,q,\beta}\phi_{1,p,\alpha}\bar\phi_{2,p,\alpha}\phi_{2,q,\beta})^n$, the main idea is to split off certain rank-one supermatrices from $\tilde S_p$ and $\tilde S_q$ such that this factor can be expressed in terms of the entries of these rank-one supermatrices.
Then we use the superbosonization formula not only in the nonsingular case from [17] but also in the singular case from [3] to change and reduce the variables, resulting in the final integral representation of $\mathbb{E}|G_{jk}|^{2n}$. Though this final integral representation, very schematically, is still of the form (1.21), due to the decomposition of the supermatrices $\tilde S_p$ and $\tilde S_q$, it is considerably more complicated than its counterpart in [19]. In particular, the function $g(S^c)$ differs from its counterpart in [19], and its estimate at the saddle point follows from a different argument.
•(Small band width) In [19], the author considers the case where the band width $M$ is comparable with $N$, i.e. the number of blocks $W$ is finite. Though the derivation of the 2-point correlation function is highly nontrivial even with such a large band width, our objective, the local semicircle law and delocalization of the eigenvectors, can be proved for the case $M \sim N$ in a similar manner as for the Wigner matrix ($M = N$), see [12,14]. In our work, we will work with a much smaller band width, in order to go beyond the results in [12,14], see Assumption 1.3. Several main difficulties stemming from a narrow band width can be heuristically explained as follows. First, let us focus on the integral over the small vicinity of the saddle point, in which the exponential functions in the integrand in (1.21) approximately look like (1.22).
We regard the first term in (1.22) as a complex Gaussian measure of dimension $O(W)$. When $W \sim 1$, one can discard the error term $e_c(u)$ directly and perform the Gaussian integral over $u$, due to the fact that $\int {\rm d}u\, \exp\{-u'\mathrm{Re}(A)u\}\, |e_c(u)| = o(1)$. However, such an estimate is not allowed when $W \sim N^{\varepsilon}$ (say), because the normalization of the measure $\exp\{-u'\mathrm{Re}(A)u\}$ might be exponentially larger than that of $\exp\{-u'Au\}$. In order to handle this issue, we shall perform a second deformation of the contours of the complex variables in the vicinity of the saddle, following the steepest descent paths exactly, whereby we can transform the complex Gaussian measure into a real one; the error term of the integral can then be controlled. Now, we turn to the second term in (1.22). When $W \sim 1$, there are only finitely many Grassmann variables. Hence, the complex coefficient of each term in the polynomial $p(\rho, \tau, u)$, which is of order $M^{-\ell/2}$ for some $\ell \in \mathbb{N}$ (see (1.23)), actually controls the magnitude of the integral of this term against the Gaussian measure $\exp\{-\rho' H \tau\}$. Consequently, in the case $W \sim 1$, it suffices to keep the leading order term (according to $M^{-\ell/2}$); one may discard the others trivially, and compute the Gaussian integral over $\rho$ and $\tau$ explicitly. However, when $W \sim N^{\varepsilon}$ (say), in light of Wick's formula (3.2) and the fact that the coefficients are of order $M^{-\ell/2}$, the order of the integral of each term of $p(\rho, \tau, u)$ against the Gaussian measure reads $M^{-\ell/2} \det H^{(I|J)}$ for some index sets $I$ and $J$ and some $\ell \in \mathbb{N}$. Due to the fact $W \sim N^{\varepsilon}$, $\det H^{(I|J)}$ is typically exponential in $W$. Hence, it is much more complicated to determine and compare the orders of the integrals of all $e^{O(W)}$ terms. In our discussion, we perform a unified estimate for the integrals of all the terms, rather than simply comparing them by $M^{-\ell/2}$.
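The normalization issue for the complex Gaussian measure can be seen in the following product example (ours, for illustration): take $A = (1 + \mathbf{i}c)I_W$ for a fixed real $c \neq 0$. Then
$\Big|\int_{\mathbb{R}^W} e^{-u'Au}\, {\rm d}u\Big| = \frac{\pi^{W/2}}{(1 + c^2)^{W/4}}, \qquad \int_{\mathbb{R}^W} e^{-u'\mathrm{Re}(A)u}\, {\rm d}u = \pi^{W/2},$
so estimating an oscillatory integral in absolute value against $\exp\{-u'\mathrm{Re}(A)u\}$ overshoots the true normalization by the factor $(1 + c^2)^{W/4}$: negligible for $W \sim 1$, but exponentially large for $W \sim N^{\varepsilon}$.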
In addition, the analysis of the integral away from the vicinity of the saddle point in our work is also quite different from [19]. Actually, the integral over the complement of the vicinity can be trivially ignored in [19], since each factor in the integrand of (1.21) is of order 1; thus gaining any $o(1)$ factor for the integrand outside the vicinity is enough for the estimate. However, in our case, either $\exp\{M f_c(S^c)\}$ or $\int {\rm d}S^g \exp\{f_g(S^g, S^c)\}$ is essentially exponential in $W$. This fact forces us to provide an a priori bound for $\int {\rm d}S^g \exp\{f_g(S^g, S^c)\}$ on the full domain of $S^c$, rather than in the vicinity of the saddle point only. In addition, an analysis of the tail behavior of the measure $\exp\{M f_c(S^c)\}$ also needs to be performed.
•(General variance profile $\tilde S$) In [19], the author considered the special case $S = a\Delta$ with $a < 1/4d$. We generalize the discussion to more general weighted Laplacians $S$ satisfying Assumption 1.1, which, as a special case, includes the standard Laplacian $\Delta$ for any fixed dimension $d$.
1.4. Notation and organization. Throughout the paper, we will need some notation. At first, we conventionally use $U(r)$ to denote the unitary group of degree $r$; likewise, $U(1,1)$ represents the pseudo-unitary group of signature $(1,1)$. Furthermore, we denote by $\mathring{U}(r)$ the quotient $U(r)/U(1)^r$. Recalling the real part $E$ of $z$, we will frequently need the following two parameters:
$a_\pm := \frac{-E \pm \mathbf{i}\sqrt{4 - E^2}}{2}.$
Correspondingly, we define the following four matrices:
$D_\pm := \mathrm{diag}(a_+, a_-), \quad D_\mp := \mathrm{diag}(a_-, a_+), \quad D_+ := a_+ I, \quad D_- := a_- I. \qquad (1.24)$
We remark here that $D_\pm$ does not mean "$D_+$ or $D_-$". For simplicity, we introduce the following notation for some domains used throughout the paper.
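From these definitions, one checks the elementary identities (used repeatedly in the saddle point analysis)
$a_+ + a_- = -E, \qquad a_+ a_- = \frac{E^2 + (4 - E^2)}{4} = 1, \qquad |a_\pm| = 1 \quad (|E| \le 2),$
so $a_+$ and $a_-$ lie on the unit circle and are the two roots of $a^2 + Ea + 1 = 0$; in particular, $a_+ = m_{sc}(E + \mathbf{i}0)$, the boundary value of the Stieltjes transform from (1.13).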
For some $\ell \times \ell$ Hermitian matrix $A$, we use $\lambda_1(A) \le \cdots \le \lambda_\ell(A)$ to represent its ordered eigenvalues. For some possibly $N$-dependent parameter set $\mathsf{U}_N$ and two families of complex functions $\{a_N(u) : N \in \mathbb{N}, u \in \mathsf{U}_N\}$ and $\{b_N(u) : N \in \mathbb{N}, u \in \mathsf{U}_N\}$, we will compare them uniformly in $u$. Conventionally, we use $\{e_i : i = 1, \dots, \ell\}$ to denote the standard basis of $\mathbb{R}^\ell$, in which the dimension $\ell$ has been suppressed for simplicity. For some real quantities $a$ and $b$, we use $a \wedge b$ and $a \vee b$ to represent $\min\{a, b\}$ and $\max\{a, b\}$, respectively.
Throughout the paper, $c, c', c_1, c_2, C, C', C_1, C_2$ represent generic positive constants that are possibly $n$-dependent and may differ from line to line. In contrast, we use $C_0$ to denote a generic positive constant independent of $n$.
The paper is organized as follows. In Section 2, we prove Theorem 1.9 and Theorem 1.11, assuming Theorem 1.14. The proof of Theorem 1.14 will be carried out in Sections 3-11. More specifically, in Section 3, we use the supersymmetric formalism to represent $\mathbb{E}|G_{ij}|^{2n}$ in terms of a superintegral, in which the integrand can be factorized into several functions; Section 4 is devoted to a preliminary analysis of these functions; Sections 5-10 carry out the different steps of the saddle point analysis, whose organization will be further clarified at the end of Section 5; Section 11 is devoted to the final proof of Theorem 1.14, by summing up the discussions in Sections 3-10. Finally, in Section 12, we make some further comments on possible improvements.
2. Proofs of Theorem 1.9 and Theorem 1.11

At first, (1.19) can be generalized to matrices with a general distribution satisfying the four moment matching condition, via the Green's function comparison strategy. Lemma 2.1. Assume that $H$ is a random block band matrix, satisfying Assumptions 1.1, 1.5 and 1.13. Let $\kappa$ be an arbitrarily small positive constant and $\varepsilon_2$ be any sufficiently small positive constant. There is $N_0 = N_0(n)$ such that for all $N \ge N_0$ and all $z \in \mathsf{D}(N, \kappa, \varepsilon_2)$, we have
$\mathbb{E}|G_{ij}(z)|^{2n} \le N^{C_0} \qquad (2.1)$
for some positive constant $C_0$ uniform in $n$ and $z$.
By the definition of stochastic domination in Definition 1.6, we can get the following corollary from Lemma 2.1 immediately.
In the sequel, we first prove Lemma 2.1 from Theorem 1.14 via the Green's function comparison strategy. Then we prove Theorem 1.9, using Lemma 2.1. Finally, we will show that Theorem 1.11 follows easily from Theorem 1.9. Let $\varpi$ be a bijection between the index pairs $\{(i,j) : 1 \le i \le j \le N\}$ and $\{1, \dots, \varsigma(N)\}$, where $\varsigma(N) := N(N+1)/2$. Then we use $H_k$ to represent the $N \times N$ random Hermitian matrix whose $(i,j)$-th entry is $h_{ij}$ if $\varpi(i,j) \le k$, and is $h^g_{ij}$ otherwise. In particular, we have $H_0 = H^g$ and $H_{\varsigma(N)} = H$. Correspondingly, we define the Green's functions by
$G_k(z) := (H_k - z)^{-1}, \qquad k = 0, \dots, \varsigma(N). \qquad (2.4)$
Then, for $(a,b) := \varpi^{-1}(k)$, we write $H_k = H^0_k + V_k$ and $H_{k-1} = H^0_k + V_{k-1}$, where $H^0_k$ is obtained via replacing $h_{ab}$ and $h_{ba}$ by $0$ in $H_k$ (or replacing $h^g_{ab}$ and $h^g_{ba}$ by $0$ in $H_{k-1}$). In addition, we denote $G^0_k(z) := (H^0_k - z)^{-1}$. Set $\varepsilon_3 \equiv \varepsilon_3(\gamma, \varepsilon_1)$ to be a sufficiently small positive constant, satisfying (say) (2.5), where $\gamma$ is from Assumption 1.1 (iii) and $\varepsilon_1$ is from (1.7). For simplicity, we introduce the parameters $\Theta_{\ell,ij}$ in (2.6), for $\ell = 1, \dots, \varsigma(N)$ and $i, j = 1, \dots, N$, where $C$ is a positive constant. Here we used the notation $\delta_{IJ} = 1$ if the two index sets $I$ and $J$ are the same, and $\delta_{IJ} = 0$ otherwise. It is easy to see that for $\eta \le M^{-1}N^{\varepsilon_2}$, we have (2.7), by using (1.8). Now, we compare $G_{k-1}(z)$ and $G_k(z)$. We will prove the following lemma.
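For later reference, the expansions around $G^0_k$ used below are instances of the standard resolvent expansion (a sketch in the notation just introduced): since $G_k = G^0_k - G^0_k V_k G_k$, iteration gives, for any fixed $m$,
$G_k = \sum_{\ell=0}^{m} G^0_k\,(-V_k G^0_k)^{\ell} + G_k\,(-V_k G^0_k)^{m+1},$
and similarly for $G_{k-1}$ with $V_{k-1}$. Each order of the expansion carries one more factor of $h_{ab}$ or $h_{ba}$ (respectively $h^g_{ab}$ or $h^g_{ba}$), of typical size $M^{-1/2}$, which is the source of the gain of $N^{\varepsilon_3}/\sqrt{M}$ per step quoted in the proof of Lemma 2.3 below.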
Lemma 2.3. Suppose that the assumptions in Lemma 2.1 hold. Additionally, we assume that (2.8) holds for some sufficiently small positive constant $\varepsilon_3$ satisfying (2.5), uniformly for $z \in \mathsf{D}(N, \kappa, \varepsilon_2)$. Let $n \in \mathbb{N}$ be any given integer. Then, if (2.9) holds, we also have the analogous estimate for $G_k$, for any $k = 1, \dots, \varsigma(N)$.
Proof of Lemma 2.3. Fix $k$ and omit the argument $z$ from now on. First, under the conditions (2.8) and (2.9), we show that (2.11) holds. To see this, we use the expansion with (2.4), which implies that, for a sufficiently large constant $D > 0$, the required bound holds, where the first step follows from (1.12), (2.8), Definition 1.6 and the trivial bound $\eta^{-1}$ for the Green's functions, and the second step follows from (2.9), (2.7) and a bound involving the factor $N^{2\varepsilon_3}$. Now, recall (2.4) again and expand $G_{k-1}(z)$ and $G_k(z)$ around $G^0_k(z)$, namely via (2.13). We always choose $m$ to be sufficiently large, depending on $\varepsilon_3$ but independent of $N$. Then, we can write (2.14). First, by taking $m$ sufficiently large, from (2.8) and (1.12), we have a trivial bound. For $R_{\ell,ij}$ and $S_{\ell,ij}$, we split the discussion into the off-diagonal case and the diagonal case. In the case $i \neq j$, we keep the first and the last factors of the terms in the expansions, and bound the factors in between by using (1.12) and (2.8), resulting in the bound (2.16). For $i = j$, we only keep the first factor of the terms in the expansions. By substituting the expansion (2.13) into (2.18), we can write the result as a sum, where $A(i,j)$ is the sum of the terms which depend only on $H^0_k$ and the first four moments of $h_{ab}$, and $R_d(i,j)$ is the sum of all the other terms. We claim that $R_d(i,j)$ satisfies the bound (2.20) for some positive constant $C$. Now, we verify (2.20). According to (2.11) and the fact that the sequence $R_{1,ij}, \dots, R_{m,ij}, \tilde R_{m+1,ij}$, as well as $S_{1,ij}, \dots, S_{m,ij}, \tilde S_{m+1,ij}$, decreases by a factor $N^{\varepsilon_3}/\sqrt{M}$ in magnitude, it is not difficult to check that the leading order terms of $R_{k-1}(i,j)$ are of the form (2.21) and those of $R_k(i,j)$ are of the form (2.22). Every other term has at least 6 factors of $h_{ab}$ or $h^g_{ab}$ or their conjugates; thus their sizes are typically controlled by $M^{-3}(N\eta)^{-n}$, i.e. they are subleading. Hence, it suffices to bound (2.21) and (2.22). In the sequel, we only estimate (2.21) in detail; (2.22) can be handled in the same manner. Now, the five factors of $h_{ab}$ or $h_{ba}$ within the $R_{\ell,ij}$'s in (2.21) are independent of the rest and estimated by $M^{-5/2}$. For the remaining factors from $G^0_k$, we use (2.11) to bound $2n$ of them and use (2.8) to bound the rest. In the case that $i \neq j$ and $\{i,j\} \neq \{a,b\}$, by the discussion above, we must have an off-diagonal entry of $G^0_k$ in the product; we keep the off-diagonal entry and bound the other by $N^{\varepsilon_3}$ from assumption (2.8). Hence, by using (2.16) and (2.23), we see that for some $i_r, j_r \in \{i, j, a, b\}$ with $i_r \neq j_r$, $r = 1, \dots, (q_\ell + q'_\ell)$, the following bound holds, in which the last step follows from (2.11) and Hölder's inequality. In the case $i = j$ or $\{i,j\} = \{a,b\}$, we keep an entry in the product $(G^0_k)_{ij'}(G^0_k)_{i'j}$ and bound the other by $N^{\varepsilon_3}$. We remark that in this case the entry being kept can be either diagonal or off-diagonal. Consequently, for some $i_r, j_r \in \{i, j, a, b\}$, $r = 1, \dots, (q_\ell + q'_\ell)$, we have the bound which, together with the assumption (2.9) for $\mathbb{E}|(G_{k-1})_{ij}|^{2n}$ and the definition of the $\Theta_{\ell,ij}$'s in (2.6), yields the claim. This completes the proof of Lemma 2.3.
To show (2.1), we also need the following lemma.
According to our assumption, both $k_1$ and $k_2$ are of order $\log N$. Now, we have the desired chain of estimates, where in the second step we used the fact that the function $y \mapsto y\,\mathrm{Im}\, G_{\ell\ell}(E + \mathbf{i}y)$ is monotonically increasing, the condition (2.27), and the fact $\eta \le \eta_0$. Hence, we conclude the proof of Lemma 2.4. Now, with Theorem 1.14, Lemma 2.3 and Lemma 2.4, we can prove Lemma 2.1.

2.2. Proof of Theorem 1.9. Without loss of generality, we can assume that $M \le N(\log N)^{-10}$; otherwise, Proposition 1.7 implies (1.16) immediately. Now, recalling the notation defined in (1.3), we denote the Green's function of $H^{(i)}$ by $G^{(i)}(z) := (H^{(i)} - z)^{-1}$, with a little abuse of notation. We only need to consider the diagonal entries $G_{ii}$ below, since the bound for the off-diagonal entries of $G(z)$ is implied by (2.1) directly. Set (2.31). We introduce the notation
$\Lambda_d := \max_i |G_{ii}(z) - m_{sc}(z)|.$
We have the following lemma.
Lemma 2.5. Suppose that $H$ satisfies Assumptions 1.1, 1.5 and 1.13. We have (2.32). The proof of Lemma 2.5 will be postponed. Using Lemma 2.5, we see that, with high probability, (2.31) is a small perturbation of the self-consistent equation (1.13) of $m_{sc}$, considering $\sum_a \sigma^2_{ai} = 1$. To control $\Lambda_d$, we use a continuity argument from [12].
We remind the reader that in the sequel, the parameter set of the stochastic domination is always $\mathsf{D}(N, \kappa, \varepsilon_2)$, without further mention. We need to show (2.33), and first we claim that it suffices to show (2.34). Indeed, if (2.34) were proven, we would see that, with high probability, either $\Lambda_d > N^{-\varepsilon_2/4}$ or $\Lambda_d \prec 1/\sqrt{N\eta} \le N^{-\varepsilon_2/2}$ for $z \in \mathsf{D}(N, \kappa, \varepsilon_2)$. That means, there is a gap in the possible range of $\Lambda_d$. Now, choosing $\varepsilon$ in (1.14) to be sufficiently small, we are able to get the initial estimate for $\eta = M^{-1}N^{\varepsilon_2}$. By the fact that $\Lambda_d$ is continuous in $z$, we see that with high probability $\Lambda_d$ can only stay on one side of the range; namely, (2.33) holds. The rigorous details of this argument involve considering a fine discrete grid of the $z$-parameter and using that $G(z)$ is Lipschitz continuous (albeit with a large Lipschitz constant $1/\eta$). The details are found in Section 5.3 of [12]. Hence, what remains is to verify (2.34). The proof of (2.34) is almost the same as that of Lemma 3.5 in [14]. For the convenience of the reader, we sketch it below without reproducing the details. We set the quantities $u_i$ and denote $\mathbf{u} := (u_1, \dots, u_N)$. By the assumption $\Lambda_d \le N^{-\varepsilon_2/4}$, we have (2.36). Now we rewrite (2.31) as (2.37). By using (2.32), Lemma 5.1 in [14], and the assumption $\Lambda_d \le N^{-\varepsilon_2/4}$, we can show (2.38); one can refer to the derivation of (5.14) in [14] for more details. Averaging over $i$ in (2.37) and (2.38) leads to (2.39) and (2.40). Plugging (2.36) and (2.32) into (2.40) yields (2.41). Applying this, and Lemma 5.2 in [14], to (2.39), we get (2.42), where in the first step we have used the fact that $z \in \mathsf{D}(N, \kappa, \varepsilon_2)$ and is thus away from the edges of the semicircle law. Now, we combine (2.37), (2.38) and (2.39), resulting in (2.43); we just take this identity as the definition of $w_i$. Analogously, we set $\mathbf{w} := (w_1, \dots, w_N)'$. Then (2.40) and (2.43) imply the required estimate, where the second step follows from the fact $|z + m_{sc}(z)| \ge 1$ in $\mathsf{D}(N, \kappa, \varepsilon_2)$ (see (5.1) in [14] for instance), together with (2.41) and (2.42), and in the last step we used (2.42) again. Now, using the fact $m_{sc}^2(z) = (m_{sc}(z) + z)^{-2}$ (see (1.13)), we rewrite (2.43) in terms of the matrix $T$ introduced in (1.5) as follows.
Proof of Lemma 2.5. For simplicity, we omit the variable $z$ from the notation below. First, we recall the elementary identity from Schur's complement, namely,
$G_{ii} = \frac{1}{h_{ii} - z - (\mathbf{h}^{(i)}_i)^* G^{(i)} \mathbf{h}^{(i)}_i}, \qquad (2.47)$
where we used the notation $\mathbf{h}^{(i)}_i$ to denote the $i$-th column of $H$, with the $i$-th component deleted. Now, we use the following identity for $a, b \neq i$ (see Lemma 4.5 in [12] for instance):
$G_{ab} = G^{(i)}_{ab} + \frac{G_{ai} G_{ib}}{G_{ii}}. \qquad (2.48)$
By using (1.10) and the large deviation estimate for the quadratic form (see Theorem C.1 of [12] for instance), we have (2.49) and (2.50), where we have used the fact that $\sum_a \sigma^2_{ai} = 1$ in the first inequality. Plugging (1.20) and (2.50) into (2.48) and using Corollary 2.2, we obtain (2.51). In addition, (1.20), (2.48) and (2.51) lead to (2.52). Now, using (2.31), (2.47), (2.49) and (2.52), we can see that the claimed bound follows. This completes the proof of Lemma 2.5.

2.3. Proof of Theorem 1.11. With Theorem 1.9, we can prove Theorem 1.11 routinely. First, due to Definition 1.6 and the fact that $G_{ab}(z)$ and $m_{sc}(z)$ are Lipschitz functions of $z$ with Lipschitz constant $\eta^{-1}$, it is easy to strengthen (1.16) to the statement that, with high probability,
$\sup_{z \in \mathsf{D}(N, \kappa, \varepsilon_2)} \max_{a,b} |G_{ab}(z)| \le C \qquad (2.54)$
for some positive constant $C$, due to the fact that $m_{sc}(z) \sim 1$. Recalling the normalized eigenvector $\mathbf{u}_i = (u_{i1}, \dots, u_{iN})$ corresponding to $\lambda_i$, and using the spectral decomposition, we have
$\mathrm{Im}\, G_{jj}(E + \mathbf{i}\eta) = \sum_{i=1}^{N} \frac{\eta\, |u_{ij}|^2}{(\lambda_i - E)^2 + \eta^2}. \qquad (2.55)$
For any $|\lambda_i| \le \sqrt{2} - \kappa$, we set $E = \lambda_i$ on the r.h.s. of (2.55) and use (2.54) to bound the l.h.s. of it. Then we obtain
$\frac{|u_{ij}|^2}{\eta} \le \mathrm{Im}\, G_{jj}(\lambda_i + \mathbf{i}\eta) \le C.$
Choosing $\eta = N^{-1+\varepsilon_2}$ above and using the fact that $\varepsilon_2$ can be arbitrarily small, we can get (1.17). This completes the proof of Theorem 1.11.

3. Supersymmetric formalism and integral representation for the Green's function
In this section, we will represent $\mathbb{E}|G_{ij}(z)|^{2n}$ in the Gaussian case by a superintegral. The final representation is stated in (3.30). We make the convention here that, for any real argument in an integral below, its domain of integration is always $\mathbb{R}$, unless specified otherwise.

where $I = \{i_1, \dots, i_\ell\}$ and $J = \{j_1, \dots, j_\ell\}$. Now, we introduce the superbosonization formula for superintegrals. Let $\chi = (\chi_{ij})$ be an $\ell \times r$ matrix with Grassmann entries, and $\mathbf{f} = (f_{ij})$ be an $\ell \times r$ matrix with complex entries. In addition, we denote their conjugate transposes by $\chi^*$ and $\mathbf{f}^*$, respectively. Let $F$ be a function of the entries of the matrix $S(\mathbf{f}, \mathbf{f}^*; \chi, \chi^*)$. Let $\mathcal{A}(\chi, \chi^*)$ be the Grassmann algebra generated by the $\chi_{ij}$'s and $\bar\chi_{ij}$'s. Then we can regard $F$ as a function defined on a complex vector space, taking values in $\mathcal{A}(\chi, \chi^*)$. Hence, we can and do view $F(S(\mathbf{f}, \mathbf{f}^*; \chi, \chi^*))$ as a polynomial in the $\chi_{ij}$'s and $\bar\chi_{ij}$'s, in which the coefficients are functions of the $f_{ij}$'s and $\bar f_{ij}$'s. Under this viewpoint, we state the assumption on $F$ as follows.
Assumption 3.2. $F(S(\mathbf{f}, \mathbf{f}^*; \chi, \chi^*))$ is a holomorphic function of the $f_{ij}$'s and $\bar f_{ij}$'s if they are regarded as independent variables, and $F$ is a Schwartz function of the $\mathrm{Re}\,f_{ij}$'s and $\mathrm{Im}\,f_{ij}$'s; by this we mean that all of the coefficients of $F(S(\mathbf{f}, \mathbf{f}^*; \chi, \chi^*))$, as functions of the $f_{ij}$'s and $\bar f_{ij}$'s, possess the above properties.
Proposition 3.3 (Superbosonization formula for the nonsingular case, [17]). Suppose that $F$ satisfies Assumption 3.2. Then we have the superbosonization identity, where $y$ is a positive-definite Hermitian matrix; $\omega$ and $\xi$ are two Grassmann matrices; and all of them are $r \times r$. Here ${\rm d}\hat\mu(\cdot)$ is defined under the parametrization induced by the eigendecomposition. Here ${\rm d}\mu(V)$ is the Haar measure on $\mathring{U}(r)$, and $\Delta(\cdot)$ is the Vandermonde determinant. In addition, the integral w.r.t. $x$ ranges over $U(2)$, and that w.r.t. $y$ ranges over all positive-definite matrices.
For the singular case, i.e. $r > \ell$, we only state the formula for the case $r = 2$ and $\ell = 1$, which is enough for our purpose. We refer to formula (11) in [3] for the result under a more general setting on $r$ and $\ell$. Proposition 3.4 (Superbosonization formula for the singular case, [3]). Suppose that $F$ satisfies Assumption 3.2. If $r = 2$ and $\ell = 1$, we have (3.4):

where $y$ is a positive variable; $x$ is a 2-dimensional unitary matrix; $\omega = (\omega_1, \omega_2)'$ and $\xi = (\xi_1, \xi_2)$ are two vectors with Grassmann components. In addition, $w$ is a unit vector, which can be parameterized as stated below. Moreover, the differentials are defined accordingly. In addition, the integral w.r.t. $x$ ranges over $U(2)$.
In our discussion, for $w$, we will adopt the parametrization below for convenience. Accordingly, we can get ${\rm d}w = v\,{\rm d}v\,{\rm d}\theta$.

3.2. Initial representation. For $a = 1, 2$ and $j = 1, \dots, W$, we set the following vectors. For each $j$ and each $a$, $\Phi_{a,j}$ is a vector with complex components, and $\Psi_{a,j}$ is a vector with Grassmann components. In addition, we use $\Phi^*_{a,j}$ and $\Psi^*_{a,j}$ to represent the conjugate transposes of $\Phi_{a,j}$ and $\Psi_{a,j}$, respectively. Analogously, we adopt the notation $\Phi^*_a$ and $\Psi^*_a$ to represent the conjugate transposes of $\Phi_a$ and $\Psi_a$, respectively. We have the following integral representation for the moments of the Green's function.

3.3. Averaging over the Gaussian random matrix. Recall the variance profile $\tilde S$ in (1.2). Now, we take the expectation of the Green's function, i.e. average over the random matrix. By an elementary Gaussian integral, we get (3.6), where, for each $j = 1, \dots, W$, the matrices $\tilde X_j$, $\tilde Y_j$, $\tilde \Omega_j$ and $\tilde \Xi_j$ are the $2 \times 2$ blocks of a supermatrix $\tilde S_j$. Remark 3.6. The derivation of (3.6) from (3.5) is quite standard. We refer to the proof of (2.14) in [19] for more details and will not reproduce it here.

3.4. Decomposition of the supermatrices. From now on, we split the discussion into the following three cases: Case 1: $p \neq q$; Case 2: $p = q$, $\alpha \neq \beta$; Case 3: $p = q$, $\alpha = \beta$. For each case, we will perform a decomposition of the supermatrix $\tilde S_j$ ($j = p$ or $q$). For a vector $\mathbf{v}$ and some index set $I$, we use $\mathbf{v}^I$ to denote the subvector obtained by deleting the $i$-th component of $\mathbf{v}$ for all $i \in I$. Then, we adopt the following notation. Here, for $A = \tilde X_j$, $\tilde Y_j$, $\tilde \Omega_j$ or $\tilde \Xi_j$, the notation $A^I$ is defined via replacing $\Phi_{a,j}$, $\Psi_{a,j}$, $\Phi^*_{a,j}$ and $\Psi^*_{a,j}$ by $\Phi^I_{a,j}$, $\Psi^I_{a,j}$, $(\Phi^*_{a,j})^I$ and $(\Psi^*_{a,j})^I$, respectively, for $a = 1, 2$, in the definition of $A$. In addition, the notation $A^{[i]}$ is defined via replacing $\Phi_{a,j}$, $\Psi_{a,j}$, $\Phi^*_{a,j}$ and $\Psi^*_{a,j}$ by $\phi_{a,j,i}$, $\psi_{a,j,i}$, $\bar\phi_{a,j,i}$ and $\bar\psi_{a,j,i}$, respectively, for $a = 1, 2$, in the definition of $A$. Moreover, for $A = \tilde S_j$, $\tilde X_j$, $\tilde Y_j$, $\tilde \Omega_j$ or $\tilde \Xi_j$, we will simply abbreviate $A^{\{a,b\}}$ and $A^{\{a\}}$ by $A^{a,b}$ and $A^{a}$, respectively. Note that $\tilde S^{[i]}_j$ is of rank one. For Case 1, due to symmetry, we can assume $\alpha = \beta = 1$. Then we extract two rank-one supermatrices from $\tilde S_p$ and $\tilde S_q$ such that the quantities $\bar\phi_{2,p,1}\phi_{1,p,1}$ and $\bar\phi_{1,q,1}\phi_{2,q,1}$ can be expressed in terms of the entries of these supermatrices. More specifically, we decompose the supermatrices accordingly; consequently, we can write $\bar\phi_{2,p,1}\phi_{1,p,1}$ and $\bar\phi_{1,q,1}\phi_{2,q,1}$ in terms of the entries of the rank-one pieces. For Case 2, due to symmetry, we can assume that $\alpha = 1$, $\beta = 2$. Then we extract two rank-one supermatrices from $\tilde S_p$, namely $\tilde S^{[1]}_p$ and $\tilde S^{[2]}_p$; consequently, we can write the relevant factor in terms of their entries. Finally, for Case 3, due to symmetry, we can assume that $\alpha = 1$. Then we extract only one rank-one supermatrix from $\tilde S_p$, namely $\tilde S^{[1]}_p$; consequently, we can write the relevant factor in terms of its entries. Since the discussions for all three cases are similar, we will only present the details for Case 1. More specifically, in the remaining part of this section and in Sections 4-10, we will only treat Case 1. In Section 11, we will sum up the discussions in the previous sections and explain how to adapt them to Case 2 and Case 3, resulting in the final proof of Theorem 1.14.
3.6. Parametrization for X, B. Similarly to the discussion in [19], we start with some preliminary parametrization. At first, we do the eigendecompositions $X_j = P_j^{-1}\hat X_j P_j$ and $B_j = Q_j^{-1}\hat B_j Q_j$. Further, we introduce the matrices $V_j$ and $T_j$ in (3.23); in particular, we have $V_1 = T_1 = I$. Now, we parameterize $P_1$, $Q_1$, $V_j$ and $T_j$ for all $j = 2, \dots, W$ as in (3.24). Under the parametrization above, we can express the corresponding differentials as follows.
In addition, for simplicity, we perform a change of variables; note that the Berezinian of this change is 1. After this change, $P(\Omega, \Xi, X, B, y^{[1]}, w^{[1]})$ turns out to be independent of $P_1$ and $Q_1$.
To facilitate the discussions in the remaining part, we introduce some additional terms and notation here. Henceforth, we will employ the corresponding notation for the collection of inverse matrices and reciprocals, respectively. For a matrix or a vector $A$ under discussion, we will use the term $A$-variables to refer to all the variables parametrizing it. For example, $\hat X_j$-variables means $x_{j,1}$ and $x_{j,2}$, and $\hat X$-variables refers to the collection of all $\hat X_j$-variables. Analogously, we can define the terms $T$-variables, $y^{[1]}$-variables, $\Omega$-variables, and so on. We use another term, $A$-entries, to refer to the non-zero entries of $A$. Note that $\hat X_j$-variables are just $\hat X_j$-entries. However, for $T_j$, they are different: the $T_j$-entries are expressed through the $T_j$-variables. Analogously, we will also use the term $T$-entries to refer to the collection of all $T_j$-entries. Then $V$-entries, $w^{[1]}$-entries, etc. are defined in the same manner. It is easy to check that $Q_1^{-1}$-entries are the same as $Q_1$-entries, up to a sign, and likewise $T_j^{-1}$-entries are the same as $T_j$-entries, for all $j = 2, \dots, W$. Moreover, to simplify the notation, we make the convention that we will frequently use a dot to represent all the arguments of a function. That means, for instance, we will write $P(\Omega, \Xi, \hat X, \hat B, V, T)$ as $P(\cdot)$ if there is no confusion. Analogously, we will also use the abbreviations $Q(\cdot)$, $F(\cdot)$, $A(\cdot)$, and so on.
Let $\mathsf{a} := \{a_1, \dots, a_\ell\}$ be a set of variables. We will adopt the notation $\mathsf{Q}(\mathsf{a}; \kappa_1, \kappa_2, \kappa_3)$ to denote the class of all multivariate polynomials $p(\mathsf{a})$ in the arguments $a_1, \dots, a_\ell$ such that the following three conditions are satisfied: (i) the total number of monomials in $p(\mathsf{a})$ is bounded by $\kappa_1$; (ii) the coefficients of all monomials in $p(\mathsf{a})$ are bounded by $\kappa_2$ in magnitude; (iii) the power of each $a_i$ in each monomial is bounded by $\kappa_3$, for all $i = 1, \dots, \ell$. For example, $2a_1 a_2^2 + a_3 \in \mathsf{Q}(\mathsf{a}; 2, 2, 2)$. In addition, we define the subset of $\mathsf{Q}(\mathsf{a}; \kappa_1, \kappa_2, \kappa_3)$ consisting of those polynomials in $\mathsf{Q}(\mathsf{a}; \kappa_1, \kappa_2, \kappa_3)$ whose degree is bounded by $\kappa_3$, i.e. the total degree of each monomial is bounded by $\kappa_3$. For example, $2a_1 a_2 + a_3$ belongs to this subset with $\kappa_3 = 2$, while $2a_1 a_2^2 + a_3$ (total degree 3) does not.

4. Preliminary discussion on the integrand
In this section, we perform a preliminary analysis on the factors of the integrand in (3.17). For convenience, we introduce the matrix in (4.1). Recall the parametrization of $\hat B_j$, $\hat X_j$, $T_j$ and $V_j$ in (3.22) and (3.24), as well as the matrices defined in (1.24). According to the discussion in [19], there are three types of saddle points of this function, namely:
• Type I: $(B_j, T_j) = (D_\pm, I)$ for all $j$, and $V_j^* \hat X_j V_j = D_\pm$ for all $j$ or $V_j^* \hat X_j V_j = D_\mp$ for all $j$;
• Type II: $(B_j, T_j) = (D_\pm, I)$ and $\hat X_j = D_+$ for all $j$;
• Type III: $(B_j, T_j) = (D_\pm, I)$ and $\hat X_j = D_-$ for all $j$.
(Actually, since the $\theta_j$'s and $v_j$'s vary on continuous sets, it would be more appropriate to use the term saddle manifolds.) Note that at each type of saddle point, we have $(B_j, T_j) = (D_\pm, I)$ for all $j$. We will see that the main contribution to the integral (3.17) comes from some small vicinities of the Type I saddle points. Furthermore, the contributions from all the Type I saddle points are the same, which can be explained as follows. At first, by the definition in (3.23), we have $V_1 = I$. If we regard the $\theta_j$'s in the parametrization of the $V_j$'s as fixed parameters, it is easy to see that there are in total $2^W$ choices of Type I saddle points. Moreover, if $v_j = 1$, we can perform a further transform. Consequently, it suffices to consider the two saddle points corresponding to $\hat X_1 = D_\pm$ or $D_\mp$, respectively. Furthermore, the contributions to the integral (3.17) from the vicinities of these two saddle points are also the same. To see this, we recall the fact that the original integrand in (3.17) is a function of the entries of $X_j = P_j^{-1}\hat X_j P_j$. Now we do the transform $P_j \to IP_j$ and $\hat X_j \to I\hat X_j I$ for all $j = 1, \dots, W$ to change one saddle in (4.2) to the other. Now, since the Haar measure on $\mathring{U}(2)$ is invariant under the shift $P_1 \to IP_1$, the integral over the $P_1$-variables is unchanged. That means, for the Type I saddle points, it suffices to consider
• Type I': for each $j$, $(B_j, T_j, \hat X_j, V_j) = (D_\pm, I, D_\pm, I)$.
In summary, the total contribution to the integral (3.17) from all Type I saddle points is $2^W$ times that from the Type I' saddle point. Following the discussion in [19], we will show in Section 5 that both $K(X, V) - K(D_\pm, I)$ and $L(B, T) - L(D_\pm, I)$ have positive real parts, bounded from below by some positive quadratic forms, which allows us to perform the saddle point analysis. In addition, it will be seen that in a vicinity of the Type I' saddle point, $\exp\{-M(K(X, V) + L(B, T))\}$ is approximately Gaussian.

4.2. $Q(\Omega, \Xi, \omega^{[1]}, \xi^{[1]}, P_1, Q_1, X^{[1]}, y^{[1]}, w^{[1]})$. The function $Q(\cdot)$ contains both the $\Omega, \Xi$-variables from $P(\cdot)$, and the $P_1, Q_1, X^{[1]}, y^{[1]}, w^{[1]}$-variables from $F(\cdot)$. In addition, note that in the integrand in (3.17), $Q(\cdot)$ is the only factor containing the $\omega^{[1]}$ and $\xi^{[1]}$-variables. Hence, we can compute the integral
$Q(\Omega, \Xi, P_1, Q_1, X^{[1]}, y^{[1]}, w^{[1]}) := \int {\rm d}\omega^{[1]}\, {\rm d}\xi^{[1]}\, Q(\Omega, \Xi, \omega^{[1]}, \xi^{[1]}, P_1, Q_1, X^{[1]}, y^{[1]}, w^{[1]}) \qquad (4.3)$
at first. The explicit formula for $Q(\cdot)$ is complicated and irrelevant for us. From (3.29) and the definition of the Grassmann integral, it is not difficult to see that $Q(\cdot)$ is a polynomial of the $(X^{[1]})^{-1}$, $(y^{[1]})^{-1}$, $w^{[1]}$, $P_1$, $Q_1$, $\Omega$ and $\Xi$-entries. In principle, for each monomial in the polynomial $Q(\cdot)$, we can combine the Grassmann variables with $P(\cdot)$, then perform the integral over the $\Omega$ and $\Xi$-variables, whilst we combine the complex variables with $F(\cdot)$ and perform the integral over the $X^{[1]}, y^{[1]}, w^{[1]}, P_1$ and $Q_1$-variables. A formal discussion on $Q(\cdot)$ will be given in Section 6.1. However, the terms from $Q(\cdot)$ turn out to be irrelevant in our proof. Therefore, in the arguments with $Q(\cdot)$ involved, a typical strategy that we will adopt is as follows: we usually neglect $Q(\cdot)$ at first, and perform the discussion on $P(\cdot)$ and $F(\cdot)$ separately; at the end, we make the necessary comments on how to slightly modify the discussions to take $Q(\cdot)$ into account.
4.3. $P(\Omega, \Xi, \hat X, \hat B, V, T)$. We will mainly regard $P(\cdot)$ as a function of the $\Omega$ and $\Xi$-variables. As mentioned above, we also have some $\Omega$ and $\Xi$-variables from the irrelevant term $Q(\cdot)$. But we temporarily ignore them and regard the integral over the $\Omega$ and $\Xi$-variables as if it reads as in (4.4).

4.4. $F(\hat X, \hat B, V, T, P_1, Q_1, X^{[1]}, y^{[1]}, w^{[1]})$. Observe that $F$ is the only term containing the energy scale $\eta$.
We notice that the factor $e^{\mathbf{i}n\sigma^{[1]}_p} e^{-\mathbf{i}n\sigma^{[1]}_q}$ in (4.6) actually comes from the term (3.20). This factor brings a strong oscillation to the integrand in the integral (4.6). In Case 2, an analogous factor will appear, resulting in the same estimate as (4.6). However, in Case 3, such an oscillating factor is absent, and the estimate for the counterpart of the integral in (4.6) is then of order $1/(N\eta)$ instead of $1/(N\eta)^{n+1}$. The detailed analysis will be presented in Sections 10 and 11.

5. Saddle points and vicinities
In this section, we study the saddle points of $K(X, V)$ and $L(B, T)$ and deform the contours of the $\hat B$-variables to pass through the saddle points. Then we introduce and classify some small vicinities of these saddle points. The derivation of the saddle points of $K(X, V)$ and $L(B, T)$ in Sections 5.1 and 5.2 below is essentially the same as its counterpart in [19]; the only difference is that we are working under a more general setting on $S$. Hence, in Sections 5.1 and 5.2, we just sketch the discussion, list the results, and make the necessary modifications to adapt them to our setting. In the sequel, we employ the notation (5.1). As mentioned above, later we will also need to deform the contours and discuss the integral over some vicinities of the saddle points; thus it is convenient to introduce a notation for the integral over specific domains. To this end, for $a = 1, 2$, we use $I^b_a$ and $I^x_a$ to denote generic domains of $b_a$ and $x_a$, respectively. Analogously, we use $I^t$ and $I^v$ to represent generic domains of $t$ and $v$, respectively. These domains will be specified later. Now, for some collection of domains, we introduce the notation (5.2). For example, (3.30) is then the integral over the full domain.
where we used the notation introduced in (5.1), and the functions $\ell(\cdot)$ and $\ell_S(\cdot)$ are defined in (5.6). Following the discussion in [19] with slight modifications (see Section 3 therein), we see that for $|E| \le \sqrt{2} - \kappa$, the saddle point of $L(B, T)$ is $(B_j, T_j) = (D_\pm, I)$ for all $j = 1, \dots, W$, where $D_\pm$ is defined in (1.24). For simplicity, we will write $L(D_\pm, I)$ for the value of $L(B, T)$ at this saddle point. We now deform the contours of the $\hat B$-variables to pass through the saddle points, based on the following lemma, which will be proved in Section 7. Lemma 5.1. With the notation introduced in (5.2), we have the identity justifying the contour deformation. We introduce the notation $r_{j,1} := |b_{j,1}|$, $r_{j,2} := |b_{j,2}|$, $j = 1, \dots, W$.
Along the new contours, we have the following lemma. Lemma 5.2. Along the new contours, the real part of $L(B, T) - L(D_\pm, I)$ admits a positive quadratic lower bound, for some positive constant $c$.

5.2. Saddle points of K(X, V). Analogously, recalling the definition in (5.6), we can write $K(X, V)$ in the corresponding form, where $\ell(\cdot)$ is defined in the first line of (5.6) and $\ell_S(X, V)$ is defined analogously. Analogously to the notation $L(D_\pm, I)$, we will use $K(D_\pm, I)$ to represent the value of $K(X, V)$ at $(\hat X_j, V_j) = (D_\pm, I)$ for all $j = 1, \dots, W$. In addition, $K(D_+, I)$ and $K(D_-, I)$ are defined in the same manner. Observing the structure of $K$, we obtain (5.18) and (5.19). Moreover, we employ some further shorthand notation. We will need the following elementary observations, which are easy to check from (5.18) and (5.19). In addition, we introduce the $W \times W$ matrix in (5.21) and the $2W \times 2W$ matrices in (5.22), where $S^v$ depends on the $V$-variables according to (5.21). Here we regard the $V$-variables as fixed parameters. Due to the fact $|(V_k V_j^*)_{12}| \in I$, it is easy to see that $S^v$ is a weighted Laplacian of a graph with $2W$ vertices. In particular, $S^v \le 0$. By the definition (5.21), one can see that $S^v_{ii} = 0$ for all $i = 1, \dots, W$. Consequently, we can obtain the corresponding bound on the quadratic form. Similarly to (1.4), we get an analogous lower bound, where $c_0$ is the constant in Assumption 1.1 (ii). Moreover, it is not difficult to see from the definitions in (5.16), (5.21) and (5.22) that the corresponding identity holds, where we used the notation $\mathbf{x}$. Then, recalling the parametrization of the $V_j$'s in (3.24), we have the following lemma.
Lemma 5.3. Assume that $x_{j,1}, x_{j,2} \in \Sigma$ for all $j = 1, \dots, W$. Then $\mathrm{Re}K(X, V)$ admits a lower bound by a positive quadratic form, for some positive constant $c$. In addition, $\mathrm{Re}K(X, V)$ attains its minimum $0$ at the following three types of saddle points, which are the restrictions of the three types of saddle points in Section 4.1 to the $\hat X$ and $V$-variables.
Remark 5.4. The Type I saddle points of $(X, V)$ are exactly those points satisfying $V_j^* \hat X_j V_j = D_\pm$ for all $j$, or $V_j^* \hat X_j V_j = D_\mp$ for all $j$. In Lemma 5.3, we wrote them in terms of $\hat X_j$, $v_j$ and $\theta_j$ in order to evoke the parametrization in (3.22) and (3.24).
Proof. By (5.15), (5.24), and the definitions of the functions $\ell(\cdot)$ in (5.7) and $k(\cdot)$ in (5.4), we can write (5.26). By using (5.25) and the fact $|x_{j,a}| = 1$ for all $j = 1, \dots, W$ and $a = 1, 2$, we can obtain the corresponding bounds via an elementary calculation. In light of the fact $S^v \le 0$ and (5.23), we have the claimed lower bound. It remains to locate the minimizers, i.e. the solutions of (5.29). First, by the second term on the r.h.s. of (5.26), we see that any solution to (5.29) must satisfy $x_{j,a} = a_+$ or $a_-$ for all $j = 1, \dots, W$ and $a = 1, 2$, recalling the definition (5.25) and the definitions of $a_+$ and $a_-$ in Section 1.4. Consequently, for each $j$, $\hat X_j$ can only be one of $D_\pm$, $D_\mp$, $D_+$ and $D_-$. Suppose that $\hat X_1 = D_+$; we claim that $\hat X_j = D_+$ for all $j$. Otherwise, owing to the fact that the graph $G$ is connected, there exists $\{i, j\} \in E$ such that $s_{ij} > 0$ and $\hat X_i = D_+$ but $\hat X_j = D_\pm$, $D_\mp$ or $D_-$. Without loss of generality, we assume $\hat X_j = D_\pm$. In this case, we use the fact (5.30), which follows from (5.26) directly. Now, by the assumption $\hat X_i = D_+$ whilst $\hat X_j = D_\pm$, we have (5.31), which together with (5.30) implies a bound contradicting (5.29). Analogously, we can show that $\hat X_j$ cannot be $D_\mp$ or $D_-$. Consequently, for a solution to (5.29), if $\hat X_1 = D_+$, we have shown that $\hat X_j = D_+$ for all $j$. Similarly, we can show that if $\hat X_1 = D_-$, then $\hat X_j = D_-$ for all $j$. These two kinds of solutions are collected as the Type II and Type III saddle points, respectively. What remains is to show that if $\hat X_1 = D_\pm$ or $D_\mp$, the solution to (5.29) must be one of the Type I saddle points. We only show the case $\hat X_1 = D_\pm$. Assume that $\{1, i\} \in E$ in the graph $G$, i.e. $s_{1i} > 0$. First, similarly to the discussion from (5.30) to (5.31), we can show that $\hat X_i$ can only be $D_\pm$ or $D_\mp$. If $\hat X_i = D_\pm$, then by using (5.30) with $j = 1$, we obtain an inequality in which equality holds if and only if $V_i = I$, according to the assumption $V_1 = I$ and the definition in (5.21). The discussion of the case $\hat X_i = D_\mp$ is analogous. Consequently, we have (5.32). Since the graph $G$ is connected, we can show that (5.32) holds for all $i = 1, \dots, W$. Analogously, if $\hat X_1 = D_\mp$, we can show that $V_j^* \hat X_j V_j = D_\mp$ for all $j = 1, \dots, W$. This completes the proof of Lemma 5.3. Now, we define the following domains.
where the superscripts $b$ and $x$ indicate that these will be domains of the corresponding variables. In order to define the vicinities of the Type I saddle points properly, we introduce a permutation $\epsilon_j$ of $\{1, 2\}$ for each triple $(x_{j,1}, x_{j,2}, v_j)$. Specifically, recalling the fact $u_j = \sqrt{1 - v_j^2}$ from (3.24), we define
$v_{j,\epsilon_j} \equiv v_{j,\epsilon_j}(\epsilon_1) := v_j\, \mathbf{1}(\epsilon_j = \epsilon_1) + u_j\, \mathbf{1}(\epsilon_j \neq \epsilon_1).$
In the following discussion, the parameter $\varepsilon_0$ in $\Theta$ is allowed to be different from line to line. However, given $\varepsilon_1$ in (1.7), we shall always choose $\varepsilon_2$ in (1.15) and $\varepsilon_0$ in (5.33) according to a fixed rule involving some sufficiently large constant $C > 0$. Consequently, by Assumption 1.13, we have the corresponding relation between these parameters. To prove Theorem 1.14, we split the task into three steps. The first step is to exclude the integral outside the vicinities. Specifically, we will show the following lemma.
Lemma 5.6. Under Assumptions 1.1 and 1.13, we have (5.39).

Remark 5.7. The first three terms on the r.h.s. of (5.39) correspond to the integrals over the vicinities of the Type I, II and III saddle points, respectively. Note that for the first term we have used the argument in Section 4.1, namely, the total contribution of the integral over the Type I vicinity is 2^W times that over the Type I' vicinity.
The second step is to estimate the integral over the Type I vicinity. We have the following lemma.
The last step is to show that the integrals over the Type II and III vicinities are also negligible.
Therefore, the remaining task is to prove Lemmas 5.1, 5.6, 5.8 and 5.9. For the convenience of the reader, we outline the organization of the subsequent part as follows.
First, the proofs of Lemmas 5.1 and 5.6 require a discussion of bounds on the integrand, especially on the term A(·), which contains the integral over all the Grassmann variables. To this end, we perform a crude analysis of the function A(·) in Section 6 in advance, with which we are able to prove Lemmas 5.1 and 5.6 in Section 7.
Then, we can restrict ourselves to the integral over the vicinities, i.e., prove Lemmas 5.8 and 5.9. It will be shown that in the vicinity the factor exp{−M L(B, T)} is approximately the product of a complex Gaussian measure in the B-variables and a real Gaussian measure in the t-variables. Here, by a "complex Gaussian measure" we mean a function of the form exp{−u′Au}, where u is a real vector, while A is a complex matrix with positive-definite Hermitian part. In order to estimate integrals against this (approximate) Gaussian measure, we shall get rid of the o(1) terms in integrals of this form against some function f, which however cannot be done directly, owing to the fact that A is complex. In our case, this problem can be solved by further deforming the contours of the B-variables, following the steepest descent paths exactly in the vicinity. By doing so, we obtain a real Gaussian measure, against which the remainder terms can be easily controlled. The situation for exp{−M K(X, V)} is a little more complicated, due to the different types of saddle points; however, in the Type I vicinity we can proceed in the same way. Hence, in Section 8, we will analyze the approximate Gaussian measure exp{−M(K(X, V) + L(B, T))}; in particular, we will further deform the contours of the X- and B-variables in the vicinities, whereby we can prove Lemma 5.8 in Section 9. In the Type II and III vicinities, we will bound exp{−M K(X, V)} by its absolute value directly; this turns out to be enough for the proof of Lemma 5.9, which is given in Section 10.
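The contour-deformation idea described above can be illustrated by a one-dimensional model computation; the following is a sketch under simplifying assumptions, not the actual multi-dimensional deformation performed later. For A ∈ C with Re A > 0, write A = |A| e^{iφ} with |φ| < π/2. The rotation u = e^{−iφ/2} s is admissible by Cauchy's theorem, since Re(A e^{−2iθ}) > 0 for all intermediate angles θ between 0 and φ/2, and it gives
\[
\int_{\mathbb{R}} e^{-A u^2} \, du \;=\; e^{-\mathbf{i}\varphi/2} \int_{\mathbb{R}} e^{-|A| s^2} \, ds \;=\; e^{-\mathbf{i}\varphi/2} \sqrt{\pi/|A|},
\]
so that after the rotation one integrates against a real Gaussian measure, against which o(1) remainder terms can be controlled directly.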

Crude bound on A(X, B, V, T)
In this section, we provide a bound on the function A(·) in terms of the B- and T-variables, which holds on all the domains under discussion in the sequel. Here, by a crude bound we mean a bound of order exp{O(W N^{ε_2})}, which will be specified in Lemma 6.1 below. By the definition in (3.31), we see that A(·) is an integral of the product of Q(·), P(·) and F(·). We will mainly treat Q(·) as a function of the ω^{[1]}- and ξ^{[1]}-variables, P(·) as a function of the Ω- and Ξ-variables, and F(·) as a function of the X^{[1]}-, y^{[1]}-, w^{[1]}-, P_1- and Q_1-variables. However, the function Q(·) actually contains every argument mentioned above. Hence, we first perform the integral over the ω^{[1]}- and ξ^{[1]}-variables in Q(·); the resulting function turns out to be a polynomial in the remaining arguments. As mentioned in Section 4.2, a typical procedure we will adopt is to ignore Q(·) at first, then estimate the integrals of P(·) and F(·), denoted by P(·) and F(·), respectively (see (4.4) and (4.5)), and finally to comment on how to modify the bounding scheme to take Q(·) into account, whereby we obtain the desired bound for A(·).
By the definition in (6.4), the function under consideration is the integral of Q(·) over the ω^{[1]}- and ξ^{[1]}-variables. Now, we regard all the other variables in S_5, except the ω^{[1]}- and ξ^{[1]}-variables, as parameters. By the definition of the Grassmann integral, only the coefficient of the highest order term ∏_{k=p,q} ∏_{a=1,2} ω^{[1]}_{k,a} ξ^{[1]}_{k,a} in Q(·) survives after integrating out the ω^{[1]}- and ξ^{[1]}-variables. Then, it is easy to see (6.5) from (6.11), completing the proof.
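The rule just used is the basic property of Berezin (Grassmann) integration; we record it in a minimal form, with sign conventions depending on the ordering of the differentials. For anticommuting variables ξ_1, . . . , ξ_n one has
\[
\int 1 \, d\xi_i = 0, \qquad \int \xi_i \, d\xi_i = 1,
\]
and consequently, for any F (necessarily a polynomial in the ξ's, by nilpotency),
\[
\int F(\xi_1, \ldots, \xi_n) \, d\xi_n \cdots d\xi_1 \;=\; \text{coefficient of } \xi_1 \cdots \xi_n \text{ in } F.
\]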
6.2. Integral of P. In this subsection, we temporarily ignore the Ω- and Ξ-variables from Q(·) and estimate P(X, B, V, T) defined in (4.4). Recalling r_{j,1} and r_{j,2} defined in (5.12), we can formulate our estimate as follows.
Lemma 6.4. Suppose that the assumptions in Lemma 6.1 hold. Then the stated bound on P(X, B, V, T) holds.

Proof. We start with one factor of P(·) (see (3.28)), namely ϖ_j. Here p_ℓ(·) is a polynomial in the X^{-1}_j-, B^{-1}_j-, V_j-, T_j-, Ω_j- and Ξ_j-entries with bounded degree and bounded coefficients; here we used the fact that the V^*_j- and T^{-1}_j-entries agree with the V_j- and T_j-entries, respectively, up to a sign. Moreover, if we regard p_ℓ(·) as a polynomial in the Ω_j- and Ξ_j-entries, it is homogeneous of degree 2ℓ, and the total degree in the Ω_j-variables is ℓ; thus that in the Ξ_j-entries is also ℓ. More specifically, we can write it in the form (6.13), where we used the notation in (6.1) and denoted α = (α_1, . . . , α_ℓ) and β = (β_1, . . . , β_ℓ). It is easy to verify that ϖ_j is of the form (6.13) by Taylor expansion with respect to the Grassmann variables. The expansion in (6.13) terminates at ℓ = 4, owing to the fact that there are in total 8 Grassmann variables from Ω_j and Ξ_j. In addition, it is also easy to check that p_{ℓ,α,β}(·) is a polynomial in the X^{-1}_j-, B^{-1}_j-, V_j-, T_j-entries with bounded degree and bounded coefficients, which implies that there exist two positive constants C_1 and C_2 such that

|p_{ℓ,α,β}(·)| ≤ C_1 (r^{-1}_{j,1} + r^{-1}_{j,2} + t_j + 1)^{C_2} (6.14)

uniformly in ℓ, α and β. Here we used the fact that the X^{-1}_j- and V_j-entries are all bounded and the T_j-entries are bounded by 1 + t_j. Now, we go back to the definition of P(·) in (3.28) and study the last factor. Similarly to the discussion above, it is easy to see that an analogous expansion holds for k = p or q, where p̃_0(·) = det X_k / det B_k and the p̃_{ℓ,α,β}(·)'s are polynomials in the X_k-, B^{-1}_k-, V_k-, T_k-entries with bounded degree and bounded coefficients. Similarly, the analogue of (6.14) holds for some positive constants C_1 and C_2.
Here we used the notation (6.1). In addition, we introduce a matrix H for which it is easy to check that ∑_{j,k} s_{jk} Tr Ω_j Ξ_k = Ω H Ξ′.
By using the Gaussian integral formula for Grassmann variables (3.2), we obtain the required estimate for each ℓ, α and β. This completes the proof.
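For reference, the Grassmann Gaussian integral invoked above is of the following standard form (up to normalization and ordering conventions, which we do not track here): for a complex n × n matrix A and Grassmann variables ξ̄_1, . . . , ξ̄_n, ξ_1, . . . , ξ_n,
\[
\int \exp\Big\{ - \sum_{i,j=1}^{n} \bar{\xi}_i A_{ij} \xi_j \Big\} \prod_{i=1}^{n} d\bar{\xi}_i \, d\xi_i \;=\; \det A,
\]
which explains why the resulting estimates are naturally expressed through determinants and minors of the matrix H introduced above.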
This completes the proof of Lemma 6.5.
6.4. Summing up: Proof of Lemma 6.1. In the discussions in Sections 6.2 and 6.3, we ignored the irrelevant factor Q(·). However, it is easy to modify the discussion slightly to take this factor into account, whereby we can prove Lemma 6.1.

Proofs of Lemmas 5.1 and 5.6
In this section, with the aid of Lemma 6.1, we prove Lemmas 5.1 and 5.6. According to Lemmas 5.2 and 5.3, away from the saddle points, Re L(B, T) and Re K(X, V) grow quadratically in the B-variables and X-variables, respectively. Hence, it is easy to control the integral (5.2) over these variables outside the vicinities. However, like that of the B-variables, the domain of the t-variables is not compact. This forces us to analyze the exponential function carefully for fixed B-variables.
Recall the definition of the sector K in (6.2). For b_1 ∈ K^W and b_2 ∈ K̄^W, the corresponding minimum over j, k is bounded below by some positive constant c depending on κ from (1.15). From now on, we regard M(t) as a measure on the t-variables and study it separately in the two regions t ∈ I^{W−1} and t ∈ R^{W−1}_+ \ I^{W−1}. Roughly speaking, when t ∈ I^{W−1}, M(t) can be bounded pointwise by a Gaussian measure. More specifically, we have the following lemma.
Lemma 7.1. With the notation above, we have a pointwise Gaussian bound on M(t) for t ∈ I^{W−1}. However, the behavior of M(t) for t ∈ R^{W−1}_+ \ I^{W−1} is much more delicate. We will not attempt a pointwise control of M(t) in this region; instead, we will bound the integral of q(t) against M(t) over this region, for any given monomial q(·) of interest. More specifically, recalling the definition of Θ in (5.33) and the spanning tree G_0 = (V, E_0) in Assumption 1.1, and additionally introducing the quantity L from (7.3), we have the following lemma.
Lemma 7.2. Let q(t) = ∏_{j=2}^{W} t_j^{n_j} be a monomial of the t-variables, with powers n_j = O(1) for all j = 2, . . . , W. Then we have (7.4).

Remark 7.3. Roughly speaking, Lemma 7.2 shows that the integral of q(t) against the measure M(t) over the region R^{W−1}_+ \ I^{W−1} is exponentially small, owing to the fact that Θ^2 ≫ W^2 log N.
We will postpone the proofs of Lemmas 7.1 and 7.2 to the end of this section; we first prove Lemmas 5.1 and 5.6 with the aid of Lemmas 6.1, 7.1 and 7.2. Before commencing the formal proofs, we mention two basic facts, formulated as the following lemma: first, the smallest eigenvalue of S^{(1)} admits a lower bound with some positive constant c; second, if ̺ = (ρ_2, . . . , ρ_W)′ is a real vector, ρ_1 = 0, and there is at least one α ∈ {2, . . . , W} at which the stated condition holds, then the corresponding lower bound on the quadratic form follows.

Proof. Let ̺ = (ρ_2, . . . , ρ_W)′ be a real vector with ρ_1 = 0, and assume |ρ_α| = max_{β=2,...,W} |ρ_β|. Then the first bound follows, where the second step uses Assumption 1.1 (iv) and the Cauchy-Schwarz inequality. The second bound follows analogously, according to the definition of Θ in (5.33). This completes the proof.
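The mechanism behind both facts is an elementary chaining bound, which we sketch with generic weights s_{jk} ≥ c_0 > 0 along the spanning tree (in the spirit of Assumption 1.1; the precise constants differ). If ρ_1 = 0 and |ρ_α| = max_β |ρ_β|, choose a path 1 = i_0, i_1, . . . , i_m = α in the spanning tree, with m ≤ W; then, by the Cauchy-Schwarz inequality,
\[
\rho_\alpha^2 \;=\; \Big( \sum_{\ell=1}^{m} \big( \rho_{i_\ell} - \rho_{i_{\ell-1}} \big) \Big)^2 \;\le\; W \sum_{\ell=1}^{m} \big( \rho_{i_\ell} - \rho_{i_{\ell-1}} \big)^2 \;\le\; \frac{W}{c_0} \sum_{\{j,k\} \in E} s_{jk} (\rho_j - \rho_k)^2,
\]
and since ‖̺‖_2^2 ≤ W ρ_α^2, the Laplacian quadratic form is bounded below by (c_0/W^2) ‖̺‖_2^2.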
Recalling the notation defined in (5.2) and the facts |x_{j,a}| = 1 and |b_{j,a}| = r_{j,a} for all j = 1, . . . , W and a = 1, 2, we obtain, for any sequence of domains, the corresponding bound on the integral. In addition, according to Lemma 6.1, we have a bound in terms of some polynomial p̃(r, r^{-1}, t) with positive coefficients.

7.1. Proof of Lemma 5.1. First, since throughout the whole proof the domains of the x_1-, x_2- and v-variables, namely Σ^W, Σ^W and I^{W−1}, will not be involved, we simply use *'s to represent them, in order to simplify the notation. Now, we introduce the contours I^{b,i}_D with a parameter D ∈ R_+. In addition, we recall the sector K defined in (6.2); then, trivially, we have the corresponding decomposition of the integration domain. We claim that the integrand in (5.2) is an analytic function of the B-variables. To see this, we can go back to the integral representation (3.17) and the definitions of L(B) and P(Ω, Ξ, X, B) in (3.18). Note that since exp{M log det B_j} = (det B_j)^M, the logarithmic terms in L(B) do not actually produce any singularity in the integrand of (3.17). In addition, according to the fact that χ^ℓ = 0 for any Grassmann variable χ and ℓ ≥ 2, the determinant factors are actually polynomials in the Ω_j-, Ξ_j-, X^{-1}_j- and B^{-1}_j-entries of degree 16. The other factors of P(·) containing B-variables can be checked analogously. Hence, it is easy to see that exp{−M L(·)}P(·) is analytic in the B-variables, and the contour deformation is justified. Hence, to prove Lemma 5.1, it suffices to prove the following lemma.

Proof. For simplicity, we use the notation in (7.10). By the assumption |E| ≤ √2 − κ, we see that Re b_{j,a} b_{k,a} > 0 for all b_{j,a}, b_{k,a} ∈ K ∪ K̄. Consequently, when b_{j,1} ∈ K and b_{j,2} ∈ K̄ for all j = 1, . . . , W, we have a lower bound with some positive constant c depending on κ in (1.15), where we used Assumption 1.1 (ii) and the fact that (−1)^{a+1} E Im b_{j,a} ≥ 0. Now, when (b_1, b_2) ∈ I^{b,i}_D for i = 1, 2, 3, we have ∑_{a=1,2} ∑_j r^2_{j,a} ≥ cD^2 for some positive constant c, which implies the corresponding trivial bound. Consequently, we can deduce from (7.10), (7.11) and (7.12) that the integrand decays at the stated rate, for some positive constant c. According to the facts κ_1 = e^{O(W)} and κ_2 = O(1) in (7.9), it suffices to consider one monomial in p̃(r, r^{-1}, t) with bounded coefficient. That means it suffices to estimate the integral

∫ ∏_{j,a} r^M_{j,a} · M(t) · q̃(r, r^{-1}, t), i = 1, 2, 3, (7.14)

for some monomial q̃(r, r^{-1}, t) with exponents ℓ_j and n_j, where the bounds on the ℓ_j's and n_j's follow from the fact that κ_3 = O(1) in (7.9). Bounding the t_j's by 1 trivially in the region t ∈ I^{W−1} and using Lemma 7.2 in the region t ∈ R^{W−1}_+ \ I^{W−1}, we get, for i = 1, 2, 3, (7.14) ≤ e^{O(W^2 log N)} times a Gaussian-type integral over the r-variables. By the definition of A(B) in (7.2) and the assumption M ≫ W^4, this integral is convergent. Consequently, by an elementary Gaussian integral, we get the trivial bound (7.14) ≤ e^{O(N log N)} for i = 1, 2, 3, and then we obtain the required estimate for i = 1, 2, 3. This completes the proof.
In the sequel, we prove Lemmas 7.6 and 7.7.
Proof of Lemma 7.6. Recall (7.16) with the choice of the integration domains specified there. To simplify the integral on the r.h.s. of (7.16), we use the fact Re K(X, V) ≥ 0 implied by (5.26), together with the fact that the x- and v-variables are bounded by 1. Consequently, we can eliminate the integral over the x- and v-variables from the r.h.s. of (7.16). Moreover, according to (7.9), it suffices to prove (7.19) instead, where q̃(·) is the monomial defined in (7.15). Now, by the first inequality of (5.13), the relevant factor of the integrand is bounded in terms of c(r_{j,a} − 1)^2 + (r_{j,a} − log r_{j,a} − 1), multiplied by M(t); see (7.20). First, we integrate the t-variables out by using Lemma 7.2; hence, we obtain a bound on the t-integral. Consequently, (7.20)-(7.23) imply that, for some positive constant c, the bound (7.24) holds, where in the last step we used the obvious fact (5.38) and the definition of A(B) in (7.2). Plugging the bound (7.24) into the l.h.s. of (7.19) and integrating over the B-variables, we see that (7.19) holds, which further implies (7.17). This completes the proof of Lemma 7.6.
To prove Lemma 7.7, we split the exponential function into two parts: one part is used to control the integral, and the other is estimated by its magnitude. More specifically, we shall prove the following two lemmas (Lemmas 7.8 and 7.9).

Proof of Lemma 7.7. For simplicity, in this proof we temporarily use I_full to denote the l.h.s. of (7.18), i.e. the integral over the full domain, and I_I, I_II and I_III to denote the first three terms on the r.h.s. of (7.18). Now, combining (7.16), (7.25) and (7.26), we obtain the claim, in light of the definition of Θ in (5.33) and the assumption (5.37). This completes the proof of Lemma 7.7.
Proof of Lemma 7.8. First, the polynomial p̃(·) in the integrand can again be replaced by the monomial q̃(·) defined in (7.15) in the discussion below, owing to the fact that κ_1 = exp{O(W)} in (7.9). The proof is then similar to that of Lemma 7.6, but much simpler, since the t-variables are now bounded by 1. Consequently, we can eliminate the X-, t- and v-variables from the integral directly and use the trivial bounds, the latter of which is from (5.13). Hence, it suffices to show (7.28). Note that (7.28) follows immediately from an elementary Gaussian integral. This completes the proof of Lemma 7.8.
Proof of Lemma 7.9. First, according to (5.13) and (5.26), both M Re L(B, T) and M Re K(X, V) are nonnegative on the full domain. Hence, it suffices to show that one of them is larger than Θ outside the Type I, II and III vicinities. Note that for each type of vicinity, one of two alternatives holds. If the former holds, then by using (5.13) and the definition of Υ^b we obtain the required bound, where in the second step we used the definition of Υ^S in (5.35). Then (7.20) and (7.30) also imply (7.26). Now, we turn to the x-variables and show the corresponding bound for those j with ε(j) = ε(1), where I is defined in (4.1). Then, it suffices to consider the remaining cases; in either case, we can show that M Re K(X, V) ≥ Θ, analogously to the case of the b-variables. Now, what remains is to show that for those (x_1, x_2) ∈ Υ^x_A but outside the Type A vicinity (A = II, III), we have M Re K(X, V) ≥ Θ. We only discuss the case A = II; the other is analogous. Note that outside the Type II vicinity of the X-variables we have the lower bound (7.36). Observe that we are already in Υ^x_II, which means that all x_{j,a}'s are close to a_+ and far away from a_−; hence sin(arg(x_{j,a})) − E/2 ∼ arg(a_+^{-1} x_{j,a}). Consequently, (7.36) also implies (7.35). This completes the proof of Lemma 7.9.
Proof of Lemma 7.1. A simple estimate using s_j^2 = 1 + t_j^2 yields (7.38). Notice that the assumption t ∈ I^{W−1} is used only in the last inequality. By (7.37), (7.38) and the definition (7.1), Lemma 7.1 follows immediately.
Proof of Lemma 7.2. Note that the total number of choices of such J in the sum above is 2^{W−1} − 1. It suffices to consider one of these sequences J ∈ {I, I^c}^{W−1} in which there is at least one i such that J_i = I^c.
Recall the spanning tree G_0 = (V, E_0) from Assumption 1.1. The simplest case is that G_0 is a linear spanning tree (a path), with E_0 = {{j − 1, j} : j = 2, . . . , W}; see (7.40). We first present the proof in this simplest case. Now, we only keep the edges in the path E_0, i.e. the terms with k = j − 1 in (7.37); we also trivially discard the term 1/(1 + 2t_j^2) from the sum 1/(1 + 2t_{j−1}^2) + 1/(1 + 2t_j^2) in the first inequality of the estimate (7.38); and finally we bound all the factors M A(B) s_{j−1,j}/4 from below by L defined in (7.3). That means we use the bound (7.41), with factors M̃_j(t). Note that, as a function of t, M̃_j(t) depends only on t_{j−1} and t_j.
Having fixed J, assume that k is the largest index such that J_k = I^c, i.e. t_{k+1}, . . . , t_W ∈ I. Now, we claim that (7.43) holds. To see this, we use the elementary facts (7.44) and (7.45), valid for all j = 2, . . . , W. We prove (7.43) by contradiction: if (7.43) is violated, then together with (7.44) and (7.45) we obtain the recursive inequality (7.46). Using (7.46) recursively yields (7.47), where in the second step we used the fact t_1 = 0. Note that (7.47) contradicts t_k ∈ I^c. Hence, we have verified (7.43). Now, we split ∏_{j=2}^{W} M̃_j(t) into two parts: one is used to control the integral, and the other is estimated by (7.43). Specifically, substituting (7.43) into (7.42), we obtain (7.48). Therefore, what remains is to estimate the integral in (7.48), which can be done by elementary Gaussian integration step by step. More specifically, using (7.44) and (7.45) and the change of variable t_j/t_{j−1} − 1 → t_j in case t_{j−1} ∈ I^c, and t_j − t_{j−1} → t_j in case t_{j−1} ∈ I, it is elementary to see that (7.49) holds for any ℓ = O(W). Starting from j = W and using (7.49) to integrate (7.48) successively, the exponent of t_j increases only linearly (n_j = O(1)); thus we obtain the desired bound. Then (7.4) follows from the definition of L in (7.3) and (5.38). Hence, we have proved (7.4) in the case when the spanning tree is given by (7.40). Now, we consider a more general spanning tree G_0 and regard the vertex 1 as its root. We start from the generalization of (7.41), namely (7.50), with factors M̃_{i,j}(t), where we adopt the convention that dist(1, i) = dist(1, j) − 1 for all {i, j} ∈ E_0, with dist(a, b) denoting the distance between a and b. Now, if there is k′ such that J_{k′} = I^c, we can prove the analogue of (7.43) by performing the argument in (7.44)-(7.47) on the path connecting k′ and the root 1. Consequently, we obtain the analogue of (7.48) by replacing the M̃_j(t)'s with the M̃_{i,j}(t)'s. Finally, integrating the t_j's out successively, from the leaves to the root 1, yields the same conclusion, i.e. (7.4), for general G_0. This completes the proof of Lemma 7.2.
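The elementary Gaussian integral underlying (7.49) can be recorded in the following minimal form (our normalization; the actual (7.49) concerns the shifted variables described above): for L > 0 and an integer n ≥ 0,
\[
\int_0^{\infty} t^{n} e^{-L t^{2}} \, dt \;=\; \frac{1}{2}\, \Gamma\Big(\frac{n+1}{2}\Big)\, L^{-\frac{n+1}{2}},
\]
so, as long as the exponents n_j remain O(1), each successive integration costs only a bounded power of 1/L; this is why the linear growth of the exponents along the recursion is harmless.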

Gaussian measure in the vicinities
From now on, we restrict ourselves to the Type I, II and III vicinities. In preparation for the proofs of Lemmas 5.8 and 5.9, we show in this section that the exponential factor is approximately an (unnormalized) Gaussian measure.
8.1. Parametrization and initial approximation in the vicinities. We change the x, b, t, v-variables to a new set of variables, namely x̃, b̃, t̃ and ṽ. The precise definition of x̃ differs among the vicinities. To distinguish the parameterizations, we set κ = ±, + or −, corresponding to the Type I, II or III vicinity, respectively. Recalling D_κ from (1.24), for each j and each κ we then set the corresponding parametrization (8.2). If κ = ±, we also parameterize v_j by v_j = ṽ_j/√M. Accordingly, recalling the quantity Θ from (5.33), we introduce the domains Υ̃ and Υ̃_S. We remind the reader that, as mentioned above, the small constant ε_0 in Υ̃ and Υ̃_S may differ from line to line, subject to (5.37). Now, by the definition of the Type I', II and III vicinities in Definition 5.5 and the parametrization in (8.2) and (8.3), we can redefine the vicinities as follows.
Definition 8.1. We redefine the three types of vicinities as follows.
We recall the fact concerning t̃ from (7.6). Now, we use the representation (5.2). Then, for the Type I vicinity, we change the x, b, t, v-variables to the x̃, b̃, t̃, ṽ-variables according to (8.2) with κ = ±, thus obtaining (8.5). For the Type II and III vicinities, i.e. κ = + or −, we change the x, b, t-variables to the x̃, b̃, t̃-variables; consequently, we obtain (8.6). We will also need the following facts, which always hold in these types of vicinities: the first estimate in (8.7) is trivial, and the second follows from Lemma 6.1. Now, we approximate (8.1) in the vicinities. For any ϑ ∈ L, we introduce the corresponding matrices. Then, with the parameterization above, expanding X_j in (3.22) and T_j in (3.24) up to second order, we can write (8.8); for κ = ±, we also expand V_j in (3.24) up to second order, namely (8.9). We take (8.8) and (8.9) as the definitions of R^x_j, R^t_j and R^v_j. Note that R^x_j is actually κ-dependent; however, this dependence is irrelevant for our analysis and is therefore suppressed from the notation. It is elementary that the corresponding remainder bounds hold, where ||·||_max represents the max-norm of a matrix. Recall the facts (5.10) and (5.19). In light of (5.17)-(5.19), we can also represent M K(X, V) in two alternative ways, (8.12) and (8.13). We will use the three representations of M K(X, V) in (8.11), (8.12) and (8.13) for the Type I', II and III vicinities, respectively. In addition, we introduce the matrices in (8.14). Then, we have the following lemma.
Lemma 8.2. With the parametrization in (8.8), we have the following approximations.
• In the Type I' vicinity, we have (8.15) and (8.16).
• In the Type II vicinity, we have the corresponding expansion.
• In the Type III vicinity, we have the corresponding expansion.
Here R^b, R^x_±, R^x_+ and R^x_− are the remainder terms of the Taylor expansion of the function ℓ(a) defined in (5.6).

Remark 8.3. We have stated (8.15) and (8.16) on domains much larger than the Type I' vicinity for use in the subsequent discussion. In addition, the restriction on ||b_a||_∞ and ||x_a||_∞ for a = 1, 2 is imposed to avoid ambiguity in the definition of the logarithmic term in the function ℓ(a).
Proof. This follows easily from the Taylor expansion of the function ℓ(a).
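For the reader's convenience, the structure of these expansions is just the second-order Taylor formula with cubic remainder; schematically (with ℓ the function from (5.6) and a_κ the relevant saddle value),
\[
\ell(a) \;=\; \ell(a_\kappa) + \ell'(a_\kappa)(a - a_\kappa) + \tfrac{1}{2}\, \ell''(a_\kappa)(a - a_\kappa)^2 + R(a), \qquad |R(a)| \le C\, |a - a_\kappa|^3
\]
for a in a fixed small neighborhood of a_κ, with C controlled by the supremum of |ℓ'''| there. Summing over j = 1, . . . , W and a = 1, 2 produces the quadratic forms and the remainder terms R^x, R^b appearing in Lemma 8.2.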
Then, according to (8.11)-(8.13), what remains is to approximate M ℓ_S(B, T) and M ℓ_S(X, V) in the vicinities. Recalling the definition in (5.6) and the parameterization in (8.2), we can rewrite M ℓ_S(B, T) as (8.19); we take this equation as the definition of R^{t,b}. Now, we set τ_{j,1} := t̃_j cos σ_j and τ_{j,2} := t̃_j sin σ_j for all j = 2, . . . , W, and change the variables and the measure as in (8.20). In the Type I' vicinity, we can do the same for M ℓ_S(X, V), namely (8.21), where R^{v,x}_± is the remainder term. Then we set υ_{j,1} := ṽ_j cos θ_j and υ_{j,2} := ṽ_j sin θ_j for all j = 2, . . . , W, and change the variables and the measure as in (8.22); the underlying change of measure is recalled below. Now, we introduce the vectors τ_a = (τ_{2,a}, . . . , τ_{W,a}) and υ_a = (υ_{2,a}, . . . , υ_{W,a}), a = 1, 2.
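The change of measure used twice above is the standard polar-coordinate Jacobian; for one pair of variables it reads
\[
\tau_{j,1} = \tilde{t}_j \cos \sigma_j, \quad \tau_{j,2} = \tilde{t}_j \sin \sigma_j \;\Longrightarrow\; d\tau_{j,1} \, d\tau_{j,2} = \tilde{t}_j \, d\tilde{t}_j \, d\sigma_j,
\]
and analogously for (υ_{j,1}, υ_{j,2}) in terms of (ṽ_j, θ_j).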
With this notation, we can rewrite (8.19) and (8.21) in vector form. According to (8.20) and (8.22), we can express (8.5) as an integral over the b̃-, x̃-, τ- and υ-variables. However, we need to specify the domains of the τ- and υ-variables in advance; our aim is to restrict the integral to the domains (8.24). Taking t̃ for instance, we see that τ_a ∈ Υ̃_S for a = 1, 2 implies t̃ ∈ Υ̃_S; see (8.25). However, the reverse of (8.25) may not be true. That means (8.24) is stronger than (t̃, ṽ, σ, θ) ∈ Υ̃_S × Υ̃_S × L^{W−1} × L^{W−1}, so the truncation from (t̃, ṽ, σ, θ) ∈ Υ̃_S × Υ̃_S × L^{W−1} × L^{W−1} to (8.24) has to be justified. By the discussion above, for the Type I vicinity we can write (8.5) in the truncated form, where the error term stems from the truncation of the vicinity. Now, for the Type II and III vicinities, the discussion of ℓ_S(B, T) is of course the same. For ℓ_S(X, V), we make the following approximation. For the Type II vicinity, using the notation in (5.21), we can write the corresponding quadratic form in the x̃-variables, recalling that S^v is defined in (5.22). Analogously, for the Type III vicinity, we can write the analogous expression. Consequently, by (8.12) and (8.13), we can write (8.6) for κ = +, − in the corresponding form. To define everything properly, we need to control various remainder terms in (8.32) and (8.38) in order to reduce these integrals to Gaussian ones. The final result is collected in Proposition 8.6 at the end of this section. As a preparation, we shall further deform the contours of the b̃-variables and x̃-variables to the steepest descent paths. We mainly provide the discussion for the b̃-variables; that for the x̃-variables is analogous. For simplicity, in this section we assume 0 ≤ E ≤ √2 − κ; the case −√2 + κ ≤ E ≤ 0 can be discussed similarly. We introduce the eigendecomposition of S, with U the matrix of eigenvectors; note that U is an orthogonal matrix, so its entries are all real. Now, we perform the change of coordinates c_a = (c_{1,a}, . . . , c_{W,a})′ := U′ b̃_a, a = 1, 2.
Obviously, for the differentials we have the corresponding relation, and with the notation introduced above we can rewrite the quadratic form. To simplify the following discussion, we enlarge the domain of the c-variables to Υ^∞; obviously, Υ̃ ⊂ Υ^∞. It is easy to check that (7.26) also holds when c_a ∈ Υ^∞ \ Υ̃ for either a = 1 or 2, according to (8.39); thus such a modification of the domain only produces an error term of order O(exp{−Θ}) in the integral (8.32), by (8.7). Now we perform the scaling (8.41); consequently, the b̃-variables are expressed through the rescaled c-variables, and we adjust the change of differentials accordingly. In addition, the domain of c_1 should be changed from Υ^∞ to ∏_{j=1}^{W} J^+_j, and that of c_2 from Υ^∞ to ∏_{j=1}^{W} J^−_j. By the fact det D_+ D_− = 1/det A_+ A_−, we can write (8.32) as (8.43). Here we used the elementary fact ||Ua||_∞ ≤ √W ||a||_∞ for any a ∈ C^W and any unitary matrix U. Then, for each j = 1, . . . , W, we deform the contour of c_{j,1} from J^+_j to the steepest descent path L^+_j, up to two additional pieces Σ^+_j and −Σ^+_j; this is not difficult to check by using (8.40). Consequently, by (8.42) and (8.7), we can get rid of the integral over Σ^+_j and −Σ^+_j, analogously to the discussion in Section 7. Similarly, we can perform the same argument for c_2. Consequently, we can restrict the integral in (8.43) to the deformed domain; that is, we may assume that ∏_{j=1}^{W} J^+_j and ∏_{j=1}^{W} J^−_j are replaced with ∏_{j=1}^{W} L^+_j and ∏_{j=1}^{W} L^−_j, respectively, in (8.43).
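The point of passing to the c-variables is that an orthogonal change of coordinates diagonalizes the quadratic form while preserving both the Euclidean norm and the Lebesgue measure; in a minimal form, if S = U diag(λ_1, . . . , λ_W) U′ with U ∈ O(W) and c = U′ b̃, then
\[
\tilde{b}' S \tilde{b} \;=\; \sum_{j=1}^{W} \lambda_j c_j^2, \qquad \|\tilde{b}\|_2 = \|c\|_2, \qquad d\tilde{b} = |\det U| \, dc = dc,
\]
so the Gaussian factor factorizes over the coordinates c_j and the contour of each coordinate can be deformed one variable at a time.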
By (8.15), (8.44) and the fact ||a||^3_3 ≤ ||a||_∞ ||a||^2_2 for any vector a, we see that the remainder terms are under control, with some positive constant C, where in the last step we also used the fact that ||b̃_a||_2 = O(||c_a||_2) for a = 1, 2, which is implied by (8.40) and (8.41). This allows us to go one step further and truncate c_1 and c_2 according to their 2-norms, namely c_1, c_2 ∈ Υ̃; see (8.46). Similarly to the discussion in the proof of Lemma 7.7, such a truncation only produces an error of order exp{−Θ} in the integral, by (8.7). Now, analogously to (8.41), we can change the x̃-variables to d-variables and adjust the differentials accordingly. In addition, as in (8.46), we restrict the domain of the d-variables. Finally, from (8.43), we arrive at the representation (8.47), in which the x- and x̃-variables should be regarded as functions of the d-variables, just as the b- and b̃-variables should be regarded as functions of the c-variables. In the Type II and III vicinities, we only perform the change of coordinates for the b̃-variables, which is enough for our purpose. Consequently, we obtain (8.48). We keep the terminology "Type I', II and III vicinities" for the slightly modified domains defined in terms of the c-, d-, τ- and υ-variables. More specifically, we redefine the vicinities as follows.
Definition 8.5. We slightly modify Definition 8.1 as follows.
• Type III vicinity: defined analogously.

Then, by an elementary Gaussian integral we obtain (5.40). Hence, the proof of Lemma 5.8 is complete.
The remaining part of this section is dedicated to the proof of Lemma 9.1. Recall the definitions of the functions A(·), Q(·), P(·) and F(·) in (3.31), (4.3), (4.4) and (4.5). Using the strategy of Section 6 again, we ignore the irrelevant factor Q(·) at the beginning. Hence, we bound P(·) and F(·) first, and then modify the bounding procedure slightly to take Q(·) into account at the end, resulting in a proof of Lemma 9.1.

9.1. P(X, B, V, T) in the Type I' vicinity. As mentioned above, we always regard the b- or b̃-variables as functions of the c-variables, and the x- or x̃-variables as functions of the d-variables. Our aim in this section is to prove the following lemma.
Lemma 9.3. With the notation above, the two stated estimates hold. We postpone the proof of Lemma 9.3 and first prove Lemma 9.2.
Lemma 9.4. For any index sets I, J ⊂ {1, . . . , W} with |I| = |J| = m ≥ 1, we have the following bounds for the determinants of the submatrices of S, A_+ and A_− defined in (8.14).
In addition, if ||ℓ||_1 = 2, by using (9.30) below, one has the corresponding bound. For more general ℓ, the bound follows from Lemma 9.4. Then, by the fact |det A_+ A_−| = |det A_+|^2, we can conclude the proof of Lemma 9.3.
To prove Lemma 9.4, we will need the following lemma.

Proof of Lemma 9.5. Without loss of generality, we assume j > i. We introduce auxiliary matrices P_{ij} and E_j, for which it is not difficult to check the stated identities. Then, by the fact det P_{ij} E_j = (−1)^{j−i}, we obtain the conclusion.
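The proof below repeatedly uses the Schur complement formula for determinants, which we record in a minimal form: for a block matrix with invertible upper-left block A,
\[
\det \begin{pmatrix} A & B \\ C & D \end{pmatrix} \;=\; \det A \cdot \det\big( D - C A^{-1} B \big).
\]
In particular, ratios of the form det S^{(I|J)} / det S^{(1)} can be identified, up to a sign, with determinants of submatrices of the corresponding inverse matrices, which is how the entrywise bounds of Assumption 1.1 (iii) enter.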
Proof of Lemma 9.4. First, by the definition in (8.14), (1.4) and the fact Re a^2_+ = Re a^2_− > 0, it is easy to see that the singular values of A_+ and A_− are all larger than 1. With the aid of the rectangular matrix (A_+)^{(I|∅)} as an intermediate matrix, we can use the Cauchy interlacing property twice to see that the k-th largest singular value of (A_+)^{(I|J)} is always smaller than the k-th largest singular value of A_+. Consequently, we obtain the first inequality of (9.26); in the same manner, we get the second inequality of (9.26). Now, we prove (9.27). First, we address the case I ∩ J ≠ ∅. In light of (9.31), without loss of generality we may assume that 1 ∈ I ∩ J. Then S^{(I|J)} is a submatrix of S^{(1)}. Therefore, we can find two permutation matrices P and Q such that the matrix takes a block form with D = S^{(I|J)}. Now, by the Schur complement formula, we see that det S^{(I|J)} / det S^{(1)} is, up to a sign, the determinant of a submatrix of (S^{(1)})^{-1} (of dimension |I| − 1). Then, by Assumption 1.1 (iii), we can easily get the desired bound. Now, for the case I ∩ J = ∅, we fix one i ∈ I and one j ∈ J. Due to (9.30), it suffices to consider

det S^{(I|J)} / det S^{(i|j)}. (9.33)

By a similar discussion, one can see that (9.33) is, up to a sign, the determinant of a submatrix of (S^{(i|j)})^{-1} of dimension |I| − 1. Hence, it suffices to bound the entries of (S^{(i|j)})^{-1}, for which we use the representation from (9.32). Then it is elementary to see that the entries of (S^{(i|j)})^{-1} are bounded by 2W^γ, in light of (9.34) and Assumption 1.1 (iii). Consequently, we obtain (9.27). This completes the proof of Lemma 9.4.

9.2. F(X, B, V, T) in the Type I' vicinity. Neglecting the X^{[1]}-, y^{[1]}- and w^{[1]}-variables in Q(·) at first, we investigate the integral F(X, B, V, T) in the Type I' vicinity in this section. We have the following lemma.
Lemma 9.7. Suppose that the assumptions in Theorem 1.14 hold. In the Type I' vicinity, we have the stated estimate. Recalling the functions G(B, T) and F(X, V) defined in (6.23) and (6.24), we further introduce the quantities needed below.

9.2.1. Estimate of F(X, V). We have the following lemma.
Obviously, by the facts that the X^{[1]}-variables are all bounded and that |det X^{[1]}_k| = 1 for k = p, q, it is easy to see the claimed bound. This completes the proof.
With the aid of Lemma 9.10, it suffices to work on G(B, T ) in the sequel. We have the following lemma.
Proof of Lemma 9.9. This is a direct consequence of Lemmas 9.10 and 9.11.
Proof of Lemma 9.7. This is a direct consequence of (9.37), Lemma 9.8 and Lemma 9.9.

9.3. Summing up: Proof of Lemma 9.1. In this section, we slightly modify the discussions in Sections 9.1 and 9.2 to prove Lemma 9.1. The combination of Lemmas 9.2 and 9.7 would directly imply Lemma 9.1 if the factor Q(·) were not present in the definition of A(·). Now we take Q(·) into account; the argument is similar to the corresponding discussion in Section 6.4.

Lemma 10.1. Suppose that the assumptions in Theorem 1.14 hold. In the Type II vicinity, we have

|A(X, B, V, T)| ≤ e^{−cNη} |det A_+|^2 det(S^{(1)})^2 (10.2)

for some positive constant c.
With the aid of (10.1) and Lemma 10.1, we can prove Lemma 5.9.
Proof of Lemma 5.9. Recall (8.48). First, by the definition of A^v_+ in (8.35), together with (5.23) and the fact Re a^2_+ > 0, we can bound the integrand, where we absorbed several factors into exp{−cNη}; we also enlarged the domains to the full ones. Then, using the trivial fact ∫_I 2v_j dv_j = 1 and performing the Gaussian integral over the remaining variables, we obtain the stated bound. Observing that |det A_+| ≤ |1 + a^2_+|^W ≤ 2^W, we conclude the desired estimate for some positive constant δ. Hence, we have proved the first part of Lemma 5.9. The second part can be proved analogously.
In the sequel, we prove Lemma 10.1; again, we first ignore the factor Q(·) in the discussion. The key factor is exponentially small by our assumption on η. From (3.19) we can also see that all the other factors of f(P_1, V, X, X^{[1]}) are O(1). Hence, by the definition (9.36), we have F(X, V) = O(exp{−(a_+ − a_−)Nη}), which together with Lemma 9.9 yields the conclusion.
10.3. Summing up: Proof of Lemma 10.1. Analogously, we slightly modify the proofs of Lemmas 10.2 and 10.3 in order to take Q(·) into account. The proof can then be completed in the same manner as that of Lemma 9.1. We omit the details.
Proof of Theorem 1.14

The conclusion for Case 1 is a direct consequence of the discussions in Sections 3.5-10. The proofs of Case 2 and Case 3 can be carried out analogously, with the slight modifications stated below.
we shall only keep the factors with k = p and delete those with k = q. Moreover, we shall also replace A^{[1]}_q by 0 for A = X, Y, Ω, Ξ, ω, ξ, w, y, ũ, ṽ or σ in (3.16). In addition, dA^{[1]} shall be redefined as the differential of the A^{[1]}_p-variables only, for A = X, y, w, ω and ξ. One can check step by step that such a modification does not require any essential change in our discussion of Case 1. In particular, note that our modification has nothing to do with the saddle point analysis of the Gaussian measure exp{−M(K(X, V) + L(B, T))}. Moreover, the term P(·) in (3.28) can be redefined by deleting the factor with k = q in the last term therein; such a modification does not change our analysis of P(·). In addition, the irrelevant term Q(·) can also be redefined accordingly; specifically, we delete the factor with k = q in the last term of (3.29) and replace A^{[1]}_q by 0 for A = Ω, Ξ, ω, ξ, w, y. It is routine to check that Lemma 6.3 still holds under such a modification. Analogously, we can redefine the functions F(·), f(·) and g(·) in (3.18)-(3.20). Now, the main difference between Case 3 and Cases 1 and 2 is that the factor (y^{[1]}_p |(w^{[1]}_p (w^{[1]}_p)^*)_{12}|)^{2n} no longer produces an oscillation in the integral of g(·). Heuristically, the counterpart of (4.6) in Case 3 reads

e^{(a_+ − a_−)Nη} ∫ dy^{[1]} dw^{[1]} dν(Q_1) · g(B, T, Q_1, y^{[1]}, w^{[1]}) ∼ ∫_0^∞ t · e^{−cNη t^2 + c_1 e^{−iσ^{[1]}_p} t} dt ∼ 1/(Nη).
This completes the proof of Theorem 1.14.

Further comments
In this section, we comment on possible further improvements of our results.
• (Comment on how to remove the prefactor N^{C_0} in (1.19))