A Note on Subgeometric Rate Convergence for Ergodic Markov Chains in the Wasserstein Metric

Bulletin of Mathematical Sciences and Applications, ISSN: 2278-9634, Vol. 17, pp 40-45. doi:10.18052/www.scipress.com/BMSA.17.40. Submitted: 2016-08-30; Revised: 2016-10-10; Accepted: 2016-10-17; Online: 2016-11-01. © 2016 SciPress Ltd, Switzerland. SciPress applies the CC-BY 4.0 license to works we publish: https://creativecommons.org/licenses/by/4.0/

We investigate subgeometric rate ergodicity for Markov chains in the Wasserstein metric and show that the finiteness of the expectation $E_{(i,j)}\big[\sum_{k=0}^{\tau_\triangle - 1} r(k)\big]$, where $\tau_\triangle$ is the hitting time on the coupling set $\triangle$ and $r$ is a subgeometric rate function, is equivalent to a sequence of Foster-Lyapunov drift conditions which imply subgeometric convergence in the Wasserstein distance. We give an example for a 'family of nested drift conditions'.

Introduction and Notations
We start with a brief review of ergodicity. Let $\mathbb{Z}_+ = \{0, 1, 2, \ldots\}$, $\mathbb{N}_+ = \{1, 2, \ldots\}$, and $\mathbb{R}_+ = [0, \infty)$. Let $(\Phi_n)_{n \in \mathbb{Z}_+}$ denote a Markov chain with transition kernel $P$ on a countably generated state space denoted by $(\mathcal{X}, \mathcal{B}(\mathcal{X}))$. We write $P^n(i, j) = P_i(\Phi_n = j) = E_i[1_{\Phi_n = j}]$, where $P_i$ and $E_i$ respectively denote the probability and expectation of the chain under the condition that its initial state is $\Phi_0 = i$, and $1_A$ is the indicator function of the set $A$. According to Markov's theorem, a Markov chain $(\Phi_n)_{n \in \mathbb{Z}_+}$ is ergodic if there is positive probability to pass from any state $i \in \mathcal{X}$ to any other state $j \in \mathcal{X}$ in one step. That is, the chain $(\Phi_n)_{n \in \mathbb{Z}_+}$ is ergodic if $P^1(i, j) > 0$ for all states $i, j \in \mathcal{X}$.

The chain $(\Phi_n)_{n \in \mathbb{Z}_+}$ is also said to be (ordinary) ergodic if, for all $i \in \mathcal{X}$, $P^n(i, \cdot) \to \pi(\cdot)$ as $n \to \infty$, where the $\sigma$-finite measure $\pi$ is the invariant limit distribution of the chain. The chain is referred to as geometrically ergodic if there exist some measurable function $V : \mathcal{X} \to (0, \infty)$ and constants $\beta < 1$ and $M < \infty$ such that
$$\|P^n(i, \cdot) - \pi(\cdot)\| \le M V(i) \beta^n, \quad \forall\, n \in \mathbb{N}_+,$$
where here and hereafter, for the (signed) measure $\mu$, we define $\mu(f) = \int \mu(dj) f(j)$, and the norm $\|\mu\|$ is defined by $\sup_{|g| \le f} |\mu(g)|$, whereas the total variation norm is defined similarly but with $f \equiv 1$. The Markov chain $(\Phi_n)_{n \in \mathbb{Z}_+}$ is strongly ergodic if
$$\lim_{n \to \infty} \sup_{i \in \mathcal{X}} \|P^n(i, \cdot) - \pi(\cdot)\| = 0.$$
Loosely speaking, subgeometric ergodicity, which we define next, is a kind of convergence that is faster than ordinary ergodicity but slower than geometric ergodicity.

Let $r \in \Lambda_0$, where $\Lambda_0$ is the family of measurable increasing functions $r : \mathbb{R}_+ \to [1, \infty)$ satisfying $\log r(t)/t \downarrow 0$ as $t \uparrow \infty$. Let $\Lambda$ denote the class of positive functions $\bar r : \mathbb{R}_+ \to (0, \infty)$ such that for some $r \in \Lambda_0$ we have
$$0 < \liminf_{n} \frac{\bar r(n)}{r(n)} \le \limsup_{n} \frac{\bar r(n)}{r(n)} < \infty. \tag{1}$$
Indeed (1) implies the equivalence of the class of functions $\Lambda_0$ with the class of functions $\Lambda$. An example of a function in the class $\Lambda$ is the rate $r(n) = \exp(s n^{1/(1+\alpha)})$, $\alpha > 0$, $s > 0$. Without loss of generality we suppose that $r(0) = 1$ whenever $r \in \Lambda$. The properties of $r \in \Lambda_0$ which follow from (1) and are to be used frequently in this study are
$$r(x + y) \le r(x)\, r(y) \quad \forall\, x, y \in \mathbb{R}_+, \tag{2}$$
$$\frac{r(x + a)}{r(x)} \to 1 \ \text{ as } x \to \infty, \ \text{ for each } a \in \mathbb{R}_+. \tag{3}$$
$\Lambda$ is referred to as the class of subgeometric rate functions (cf. [3]). Let $r \in \Lambda$; then the ergodic chain $\Phi_n$ is said to be subgeometrically ergodic of order $r$ in the $f$-norm (or simply $(f, r)$-ergodic) if, for the unique invariant distribution $\pi$ of the process and for all $i \in \mathcal{X}$,
$$\lim_{n \to +\infty} r(n)\, \|P^n(i, \cdot) - \pi(\cdot)\|_f = 0, \tag{4}$$
where $\|\sigma\|_f = \sup_{|g| \le f} |\sigma(g)|$ and $f : \mathcal{X} \to [1, \infty)$ is a measurable function. Also, for subgeometric ergodicity to hold it is necessary that there exist a deterministic sequence $\{V_n\}$ of functions $V_n : \mathcal{X} \to [1, \infty)$ which satisfy the Foster-Lyapunov drift condition
$$P V_{n+1} \le V_n - r(n) f + b\, r(n) 1_C, \quad n \in \mathbb{Z}_+, \tag{5}$$
for a petite set $C \in \mathcal{B}(\mathcal{X})$ and a constant $b \in \mathbb{R}_+$ such that $\sup_C V_0 < \infty$. The Foster-Lyapunov drift conditions provide bounds on the return time to accessible sets, thereby giving some control over the dynamics of the Markov process by focusing on the hitting times on a particular set.
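As a minimal numerical sketch (not part of the original argument), the properties of the example rate $r(n) = \exp(s n^{1/(1+\alpha)})$ can be checked on a finite grid. The function names and the default values $s = 1$, $\alpha = 1$ are assumptions made only for illustration; the check covers the monotone decay of $\log r(t)/t$, the submultiplicative property (2) and the ratio limit (3).

```python
import math

def rate(n, s=1.0, alpha=1.0):
    """Example rate r(n) = exp(s * n**(1/(1+alpha))); note r(0) = 1."""
    return math.exp(s * n ** (1.0 / (1.0 + alpha)))

def log_rate_over_t(t, s=1.0, alpha=1.0):
    """log r(t) / t for t > 0, which should decrease to 0 as t grows."""
    return s * t ** (1.0 / (1.0 + alpha)) / t

def is_submultiplicative(xs, s=1.0, alpha=1.0):
    """Check property (2), r(x + y) <= r(x) r(y), on a finite grid xs."""
    tol = 1.0 + 1e-12  # small slack for floating-point rounding
    return all(
        rate(x + y, s, alpha) <= rate(x, s, alpha) * rate(y, s, alpha) * tol
        for x in xs
        for y in xs
    )

def shift_ratio(x, a, s=1.0, alpha=1.0):
    """Ratio r(x + a) / r(x) from property (3), computed in log space
    so that large x does not overflow math.exp."""
    p = 1.0 / (1.0 + alpha)
    return math.exp(s * ((x + a) ** p - x ** p))
```

A finite grid can only illustrate these properties, not prove them; for this rate they follow from the concavity of $t \mapsto t^{1/(1+\alpha)}$.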
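Convergence in the Wasserstein distance is an active research area, through which [1], amongst other authors, suggested a new technique for establishing subgeometric ergodicity. Following [1] we define the Wasserstein distance as follows. Let $(\mathcal{X}, d)$ be a Polish space where $d$ is a distance bounded by 1, and let $\mathcal{P}(\mathcal{X})$ denote the set of all probability measures on the state space $(\mathcal{X}, \mathcal{B}(\mathcal{X}))$. Let $\mu, \nu \in \mathcal{P}(\mathcal{X})$; $\lambda$ is a coupling of $\mu$ and $\nu$ if $\lambda$ is a probability on the product space $(\mathcal{X} \times \mathcal{X}, \mathcal{B}(\mathcal{X} \times \mathcal{X}))$ such that $\lambda(A \times \mathcal{X}) = \mu(A)$ and $\lambda(\mathcal{X} \times A) = \nu(A)$ for all $A \in \mathcal{B}(\mathcal{X})$. We further let $\mathcal{C}(\mu, \nu)$ be the set of all probability measures on $(\mathcal{X} \times \mathcal{X}, \mathcal{B}(\mathcal{X} \times \mathcal{X}))$ with marginals $\mu$ and $\nu$, and $Q$ be the coupling Markov kernel on $(\mathcal{X} \times \mathcal{X}, \mathcal{B}(\mathcal{X} \times \mathcal{X}))$ such that for every $i, j \in \mathcal{X}$, $Q((i, j), \cdot)$ is a coupling of $P(i, \cdot)$ and $P(j, \cdot)$. The Wasserstein metric associated with the semimetric $d$ on $\mathcal{X}$, between two probability measures $\mu$ and $\nu$, is then given as
$$W_d(\mu, \nu) := \inf_{\gamma \in \mathcal{C}(\mu, \nu)} \int_{\mathcal{X} \times \mathcal{X}} d(i, j)\, d\gamma(i, j).$$
When $d$ is the trivial metric $d_0(i, j) = 1_{i \ne j}$, the associated Wasserstein metric is, up to the normalising factor, the total variation metric:
$$W_{d_0}(\mu, \nu) = \tfrac{1}{2}\, d_{TV}(\mu, \nu), \qquad d_{TV}(\mu, \nu) := 2 \sup_{C \in \mathcal{B}(\mathcal{X})} |\mu(C) - \nu(C)|, \quad \mu, \nu \in \mathcal{P}(\mathcal{X}).$$
A set $C$ is said to be small if there exists a constant $\varepsilon > 0$ such that for all $i, j \in C$,
$$\tfrac{1}{2}\, d_{TV}(P(i, \cdot), P(j, \cdot)) \le 1 - \varepsilon.$$
A set $C \in \mathcal{B}(\mathcal{X})$ is petite if there exist some non-trivial measure $\nu_a$ on $\mathcal{B}(\mathcal{X})$ and some probability distribution $a = \{a_n : n \in \mathbb{Z}_+\}$ such that
$$\sum_{n=1}^{\infty} a_n P^n(x, \cdot) \ge \nu_a(\cdot), \quad \forall\, x \in C. \tag{6}$$
Petite sets generalize small sets. The first hitting time on a small set $C$ delayed by a constant $\delta > 0$ is given by $\tau_C^\delta = \inf\{n \ge \delta : \Phi_n \in C\}$. We also have $\tau_C^+ = \inf\{n \ge J_1 : \Phi_n \in C\}$ as the first hitting time on the set $C$ after the first jump $J_1$ of the process. We note that $\tau_C^+ = \tau_C$ if $\Phi_0 \notin C$. In the case when $\delta = 0$ we have $\tau_C^0 = \tau_C$. If $C$ is a singleton consisting only of state $i$, then we write $\tau_i^\delta$ for $\tau_C^\delta$ and equivalently $\tau_i^+$ for $\tau_C^+$.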
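On a finite state space the link between the trivial-metric Wasserstein distance and total variation can be made concrete. The following sketch is an illustration only: the function names and the three-state kernel in the usage note are hypothetical. With the convention $d_{TV} := 2 \sup_C |\mu(C) - \nu(C)|$, the optimal coupling cost under $d_0$ is $1 - \sum_i \min(\mu_i, \nu_i)$, which equals $\tfrac{1}{2} d_{TV}(\mu, \nu)$, the quantity that appears in the small-set condition.

```python
def tv_distance(mu, nu):
    """d_TV(mu, nu) = 2 * sup_C |mu(C) - nu(C)| = sum_i |mu_i - nu_i|
    for probability vectors on a finite state space."""
    return sum(abs(m - n) for m, n in zip(mu, nu))

def wasserstein_trivial(mu, nu):
    """W_{d0}(mu, nu) for the trivial metric d0(i, j) = 1{i != j}.

    A maximal coupling keeps mass min(mu_i, nu_i) on the diagonal,
    so the optimal transport cost is 1 - sum_i min(mu_i, nu_i).
    """
    return 1.0 - sum(min(m, n) for m, n in zip(mu, nu))

def is_small(P, C, eps):
    """Check (1/2) d_TV(P(i, .), P(j, .)) <= 1 - eps for all i, j in C,
    where P is a row-stochastic matrix given as a list of rows."""
    return all(
        0.5 * tv_distance(P[i], P[j]) <= 1.0 - eps
        for i in C
        for j in C
    )
```

For instance, for the hypothetical kernel with rows (0.5, 0.5, 0), (0.25, 0.5, 0.25) and (0, 0.5, 0.5), the whole state space passes `is_small` with eps = 0.5 and fails it with eps = 0.6, because the first and last rows share exactly half of their mass.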
It is worth noting that finite mean return times $E_i[\tau_i^+] < \infty$ guarantee ergodicity, or the existence of a stationary probability, and the convergence $P^n(i, j) - \pi(j) \to 0$
as $n \to \infty$. It is also known that subgeometric ergodicity is equivalent to $(f, r)$-regularity, which we define as follows. A set $C \in \mathcal{B}(\mathcal{X})$ is said to be $(f, r)$-regular if, for a measurable function $f : \mathcal{X} \to [1, \infty)$, a rate function $r$, all $i \in C$ and all $B \in \mathcal{B}^+(\mathcal{X})$,
$$E_i\Big[\sum_{k=0}^{\tau_B - 1} r(k) f(\Phi_k)\Big] < \infty,$$
where $\mathcal{B}^+(\mathcal{X})$ is the set of all accessible (or $\Psi$-irreducible) sets.

By finding a suitable contracting metric $d$, which may be different from the discrete metric, and a suitable Foster-Lyapunov function $V$ with a '$d$-small' sublevel set, [1] suggested a new technique for establishing subgeometric ergodicity. Then [2] extended the results of [1] by establishing sufficient conditions for the existence of the invariant distribution and subgeometric rates of convergence for chains that are not necessarily $\Psi$-irreducible. For the Polish space $(\mathcal{X}, d^*)$, the $d$-small set of [1] was extended by [2] to the $(\ell, \epsilon, d)$-coupling set (or simply coupling set) $\triangle \in \mathcal{B}(\mathcal{X} \times \mathcal{X})$, where $\ell \in \mathbb{Z}_+$, $\epsilon \in (0, 1)$, and $d$ is a distance on the state space $\mathcal{X}$, topologically equivalent to $d^*$ and bounded by 1.
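For a finite chain, a modulated moment of the kind used to state $(f, r)$-regularity can be evaluated without simulation. The sketch below assumes the usual form $E_i\big[\sum_{k=0}^{\tau_B - 1} r(k) f(\Phi_k)\big]$ with a starting state $i \notin B$; the function name, the truncation parameter `horizon` and the two-state kernel in the usage note are illustrative assumptions. It propagates the taboo probabilities $P_i(\Phi_k = x, \tau_B > k)$ and truncates the series after `horizon` terms.

```python
def modulated_moment(P, start, B, r, f, horizon):
    """Truncated value of E_start[ sum_{k < tau_B} r(k) f(Phi_k) ].

    P is a row-stochastic matrix (list of rows), B a set of target
    states with start not in B, and r, f are callables. q[x] holds
    P_start(Phi_k = x, tau_B > k); mass that enters B is discarded.
    """
    n = len(P)
    q = [0.0] * n
    q[start] = 1.0
    total = 0.0
    for k in range(horizon):
        # contribution of time k on the event {tau_B > k}
        total += r(k) * sum(q[x] * f(x) for x in range(n))
        # one step of the kernel, killed on entering B
        q = [
            0.0 if y in B else sum(q[x] * P[x][y] for x in range(n))
            for y in range(n)
        ]
    return total
```

For a two-state chain that leaves state 0 with probability 1/2 at each step, taking $r \equiv 1$ and $f \equiv 1$ recovers the mean hitting time 2, while the polynomial rate $r(k) = k + 1$ gives $\sum_{k \ge 0} (k + 1) 2^{-k} = 4$.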
Let $r \in \Lambda$; then we denote the sequence $R$ as
$$R(n) := \sum_{k=0}^{n-1} r(k), \quad n \ge 1.$$
We show in this paper, through Proposition 1, that the sequence of drift inequalities proposed by [2] holds if and only if $R(\tau_\triangle)$ has finite expectation, that is, $E_{(i,j)}[R(\tau_\triangle)] < \infty$. As an example, we explore a 'family of nested drift conditions' as proposed by [4], in both the discrete and continuous cases, whose results we transfer to convergence in the Wasserstein metric through Proposition 3.
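The link between sums of the rate function over a hitting time and a drift sequence can be seen on a toy chain. The sketch below is a hypothetical single-chain illustration in the spirit of the drift condition (5), not the coupled construction of [2]: on the states {0, 1, 2} the chain steps down by one until it reaches $C = \{0\}$, so the hitting time of $C$ from $x$ is exactly $x$, and $V_n(x) = 1 + \sum_{k=0}^{x-1} r(n + k)$ is one plus the shifted sum of $r$ over that hitting time. The rate $r(n) = n + 1$, the kernel and the constant $b = 3.5$ are assumptions chosen for the example.

```python
def r(n):
    """Polynomial rate r(n) = n + 1 (an assumed example with r(0) = 1)."""
    return n + 1

def V(n, x):
    """V_n(x) = 1 + sum_{k=0}^{x-1} r(n + k): one plus the shifted sum of
    r over the x steps the toy chain needs to reach C = {0} from x."""
    return 1 + sum(r(n + k) for k in range(x))

# Toy kernel on {0, 1, 2}: states 1 and 2 step down by one; state 0
# stays put or jumps to 2 with probability 1/2 each.
P = [
    [0.5, 0.0, 0.5],
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
]
C = {0}

def drift_holds(n, b, f=lambda x: 1.0):
    """Check P V_{n+1} <= V_n - r(n) f + b r(n) 1_C at every state."""
    for x in range(len(P)):
        lhs = sum(P[x][y] * V(n + 1, y) for y in range(len(P)))
        rhs = V(n, x) - r(n) * f(x) + (b * r(n) if x in C else 0.0)
        if lhs > rhs + 1e-12:
            return False
    return True
```

Off $C$ the inequality holds with equality, which reflects that $V_n$ is built directly from the sums of $r$ along the path to $C$; on $C$ the constant $b$ has to absorb the cost of the jump back to state 2, so a value such as $b = 1$ is too small.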

Lyapunov Drift Inequalities
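In light of the definitions and notations given above, we state Assumption A1 as follows:

A1. There exists a coupling set $\triangle \in \mathcal{B}(\mathcal{X} \times \mathcal{X})$ such that for a sequence $r \in \Lambda$ and for all $i, j \in \mathcal{X}$,
$$E_{(i,j)}\Big[\sum_{k=0}^{\tau_\triangle - 1} r(k)\Big] < \infty.$$

According to Theorem 2.1(ii) of [5], as mentioned already, the Foster-Lyapunov drift conditions in (5) can also be used to define subgeometric rate ergodicity. Following this result, [2] proposed a sequence of drift functions according to Assumption A2 as follows.

A2. There exist
1. a sequence of measurable functions $\{V_n\}_{n \in \mathbb{Z}_+}$, $V_n : \mathcal{X} \times \mathcal{X} \to \mathbb{R}_+$,
2. a set $\triangle \in \mathcal{B}(\mathcal{X} \times \mathcal{X})$, a constant $b \in \mathbb{R}_+$ and a sequence $r \in \Lambda$ such that for all $i, j \in \mathcal{X}$ and for every coupling $\alpha \in \mathcal{C}(P(i, \cdot), P(j, \cdot))$,
[integral drift inequality missing in source]

Further, there exist measurable functions $(V_n)_{n \in \mathbb{Z}_+}$ such that for all $i, j \in \mathcal{X}$ and any $n \in \mathbb{Z}_+$: [equation missing in source] and [equation missing in source].

Proof.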

A2 ⇒ A1
Analogous to Proposition 11.3.3 in [7], we get from A2 that for some constant $c < \infty$ [equation missing in source]. Then by Eq. 8, Eq. 9 and sup [remainder missing in source]

Family of nested drift conditions
The phenomenon of ergodicity as given in Proposition 1 is not altogether new, as is clear from the following proposition, which deals with a family of nested drift conditions for subgeometrically ergodic general state space Markov processes, analogous to the one proposed by [4].

Proposition 2.
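Suppose that there are functions $V_k, W_k : \mathcal{X} \times \mathcal{X} \to [1, \infty)$, where $k \in \mathbb{Z}_+$, and a coupling set $\triangle \in \mathcal{B}(\mathcal{X} \times \mathcal{X})$ such that for any initial state $(i, j) \in \mathcal{X} \times \mathcal{X}$ of the chain we have [equation missing in source]. Then the chain $\Phi_n$ is subgeometrically ergodic.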
Proposition 2 follows from Proposition 3, which is the extension of Proposition 3.1 in [4] on a family of nested drift conditions. As is evident in the propositions that follow, the results of Proposition 2 stay the same if we replace $R(\tau_\triangle)$ with $R_m(\tau_\triangle)$, where $R_m(n) = \sum_{k=0}^{n-1} r(m + k)$ for $m \ge 0$, $n \ge 1$, with $R(0) = 1$. The results also stay the same if we replace the measurable functions $V_k$ and $W_k$ on $\mathcal{X} \times \mathcal{X}$ with the corresponding functions $V_k$ and $W_k$ on $\mathcal{X}$, as is the case for convergence in the $f$-norm.

Proposition 3.
Let the chain $(\Phi_n)_{n \in \mathbb{Z}_+}$ be irreducible and aperiodic. Further suppose that there are functions $f, V_k, W_k : \mathcal{X} \to [1, \infty)$, where $k \in \mathbb{Z}_+$, with $\sup_C V_k < \infty$, $\sup_C W_k < \infty$, and a small set $C$ such that for a non-decreasing sequence of stopping times $\{T_n : n \in \mathbb{Z}_+\}$ and any $\Phi_{T_n} \in \mathcal{X}$ we have [equation missing in source]. Then the chain $\Phi_n$ is $(f, r)$-ergodic.
Proof. We let $T_n$ be some random stopping time, with $\mathcal{F}_n$ as the $\sigma$-algebra of events generated by $T_n$. Then by Dynkin's inequality we get [equation missing in source], by assuming that $\sup_C V_k < \infty$ and $\sup_C W_k < \infty$. Hence the chain is $(f, r)$-ergodic.
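The remark that $R(\tau_\triangle)$ may be replaced by $R_m(\tau_\triangle)$ can be motivated by a two-sided bound. Since $r$ is increasing and submultiplicative by (2), one has $R_0(n) \le R_m(n) \le r(m) R_0(n)$ with $R_m(n) = \sum_{k=0}^{n-1} r(m + k)$, so the two sums are finite in expectation together. The sketch below checks this bound numerically for the example rate $r(n) = \exp(s n^{1/(1+\alpha)})$; the function names and the defaults $s = 1$, $\alpha = 1$ are assumptions for illustration.

```python
import math

def rate(n, s=1.0, alpha=1.0):
    """Example rate r(n) = exp(s * n**(1/(1+alpha)))."""
    return math.exp(s * n ** (1.0 / (1.0 + alpha)))

def R_shift(m, n):
    """Shifted sum R_m(n) = sum_{k=0}^{n-1} r(m + k) for n >= 1."""
    return sum(rate(m + k) for k in range(n))

def sandwich_holds(m, n):
    """Check R_0(n) <= R_m(n) <= r(m) * R_0(n).

    The lower bound uses that r is increasing; the upper bound uses
    r(m + k) <= r(m) r(k) term by term.
    """
    tol = 1.0 + 1e-12  # small slack for floating-point rounding
    base = R_shift(0, n)
    shifted = R_shift(m, n)
    return base <= shifted * tol and shifted <= rate(m) * base * tol
```

The check is only a finite-grid illustration; the inequality itself holds for every rate that is increasing and satisfies (2).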

Proposition 4.
Let the chain $(\Phi_t)_{t \in \mathbb{R}_+}$ be irreducible. Further suppose that there are functions $f, V_k, W_k : \mathcal{X} \to [1, \infty)$, where $k \in \mathbb{Z}_+$, some constant $\varepsilon > 0$, and a small set $C$ such that for a non-decreasing sequence of stopping times $\{T_n : n \in \mathbb{Z}_+\}$ and any $\Phi_{T_n} \in \mathcal{X}$ we have [equation missing in source]. Then the chain $\Phi_t$ is $(f, r)$-ergodic.
Proof. Note that this proposition is the continuous counterpart to Proposition 3. For the term [expression missing in source], where $\varepsilon > 0$, by the submultiplicative property (2) we have [equation missing in source], because $r \in \Lambda$ is finite for all $k \in \mathbb{Z}_+$ and $E_{\Phi_{T_n}}\big[\int_0^{\varepsilon} f(\Phi_s)\, r(s)\, ds\big] < \infty$ by the proof of Theorem 6 in [6]. Hence we conclude that the chain $\Phi_t$ is $(f, r)$-ergodic.