**Advertisement**

ABSTRACT

Hybrid data assimilation methods combine elements of ensemble

(

1. Introduction

Hybrid data assimilation systems combine two ap- proaches traditionally used in operational weather fore- casting: ensemble

We examine the impacts that a simple 3D-Var has on an EnKF in order to determine the source of these benefits. In an operational environment, the choices of ensemble size and observation coverage are limited by costs of compu- tational facilities and observing equipment. Thus, it is important to identify the preferred algorithmic approach when these parameters are prescribed. We introduce a new hybrid using an EnKF combined with a simple 3D-Var and demonstrate its effectiveness from this perspective. Traditional hybrids start with a variational approach and incorporate the ensemble information through the ensemble-derived covariance matrix. Here we instead start with an EnKF and use a variational approach to apply a correction within the model space to stabilize the EnKF.

2. Methodology

a. Model

For the forecast model, we use the Lorenz-96 model on m 5 40 grid points (Lorenz 1996),

... (1)

with F 5 20 (as used by Wilks 2005, 2006; Messner 2009) and Dt 5 0.01, for j 5 1, ..., m. Because this model's internal doubling time varies strongly with forcing (Orrell 2003), we note that our results were qualitatively similar using F 5 8 with a forecast time step Dt 5 0.05 (as originally used by Lorenz). Lorenz (2005) discusses further implications of varying F. In this model, the first term represents ''advection'' constructed to conserve kinetic energy, the second is damping, and the third is forcing. The boundaries are cyclic, such that xm11 5 x1, and x0 5 xm. The ''truth'' or ''nature'' run is performed with Runge-Kutta-Fehlberg orders 4 and 5, while fore- cast runs are performed with Runge-Kutta-Fehlberg orders 2 and 3 (Fehlberg 1970).

b. Data assimilation methods

We solve the data assimilation problem by minimizing the traditional functional (Kalnay 2003)

... (2)

We minimize J over potential model states x, where xb is the background estimate, yo is the observation vector, and H is an operator transforming x from the model space of dimension m to the observation space of di- mension l. The matrices B and R are the background and observation error covariance matrices, respectively.

For 3D-Var we use the preconditioned conjugate gradient (PCG) minimization algorithm. In PCG, a pre- conditioner matrix M is used to solve M21Ax 5 M21b. The matrix B is used as the preconditioner where

... (3)

... (4)

The B matrix is constructed using an exponential decay function, similar to Derber and Rosati (1989), with maximum radius rB,

... (5)

We implement the local ensemble transform

We implement two hybrid algorithms using LETKF as a basis. First, a hybrid inspired by traditional methods computes a linear combination of B and the ensemble background error covariance matrix Pb for use in a 3D-Var step. A similar approach was shown by Wang et al. (2007b) to be equivalent to the control-variable method of Lorenc (2003) and Buehner et al. (2010a,b). We refer to this method as the Hybrid/Covariance-LETKF.

Our new approach combines the gain matrices of the EnKF and 3D-Var methods, rather than their background covariances. The

... (6)

where

... (7)

... (8)

We form our hybrid analysis by choosing b1 5 1, b2 5 a, and b3 52a, so that

... (9)

... (10)

The modified gain matrix (9) contains an automatic re- duction in the contribution of KB given (I 2 HK)isa contraction. The parameter a allows a further manual scaling of KB, which is equivalent under a monotonic mapping to applying a scalar inflation rB (deflation for 0, a,1)directlytothestaticBmatrixusedtoformKB.

The values for these b coefficients are chosen specif- ically so that we can construct the following algorithm, which gives an algebraically equivalent result (see appendix A for proof). We refer to this algorithm as the Hybrid/Mean-LETKF, and it is first described in words: The standard LETKF is used first. The analysis mean from LETKF is then used as the ''background'' for 3D- Var, which is performed locally in model space after each grid point is analyzed by LETKF. An empirically chosen weighted average of the two analysis solutions is computed, and the LETKF analysis ensemble is recen- tered at the new solution. Kalnay and Toth (1994) per- formed a similar procedure using a single bred vector and 3D-Var. Another choice of coefficients-b1 5 (1 2 a), b2 5 a,andb3 5 0-can be implemented similarly by using the ensemble forecast mean as the background for 3D-Var (see appendix B). For reference in the results section, we will call this the Hybrid/Mean-LETKF(b).

The Hybrid/Mean-LETKF algorithm is detailed as follows: We calculate the LETKF analysis following Hunt et al. (2007), first computing the analysis error covariance in ensemble space,

... (11)

where Yb 5 H(Xb), the columns of Xb are ensemble perturbations from the mean state, and r is the local inflation parameter. The symmetric square root of this matrix is computed to determine the weights for the analysis ensemble,

... (12)

To transform from ensemble space back to model space, we multiply these weights with each of the background ensemble member perturbations,

... (13)

The LETKF analysis perturbations computed in (13) are added to a different estimate of the ensemble mean in each of the following methods. For standard LETKF, the analysis mean is computed as

... (14)

... (15)

where xb is the mean background state. At this point, the standard LETKF algorithm is complete.

Next we relocalize in model space. Greybush et al. (2011) discuss the impacts of localization in observa- tion and model spaces, termed R-localization and B-localization, respectively. We define the local model dimension, mloc 5 2r 1 1, and select the appropriate rows and columns of the full B and Pb matrices to define Bloc and Plboc, respectively. The observations within this local radius are used to form Rloc.

For the Hybrid/Covariance-LETKF, a linear combi- nation is formed with the static B matrix and the ensemble-generated Pb in the local model space with dimension mloc, similar to that performed by Hamill and Snyder (2000),

... (16)

For the Hybrid/Mean-LETKF we minimize the cost function

... (17)

Here we use B^ 5 Bloc and R^ 5 Rloc, but other choices are possible. The preferred choice for B^ is the true analysis error covariance, but this is unknown and ap- proximated with Bloc. Because this approximation over- estimates the analysis error, we adjust the analysis mean back toward the LETKF analysis with a scaling param- eter a and recenter the analysis ensemble to this mean,

... (18)

... (19)

The vector v 5 (1 1 ...11)T is a column of k ones used to add the mean to each column of Xa, resulting in the final analysis ensemble having the hybrid-derived analysis as its mean. Finally, as with the standard LETKF, we up- date the single grid point at the center of the local region with the hybrid solution. For both hybrid methods, a is chosen empirically based on the ensemble size (k) and observation coverage (l).

We note that while the observations have been pro- cedurally reused, the results using the Hybrid/Mean- LETKF algorithm are algebraically equivalent to the previous hybrid gain form given in (9) and (10) in which the observational increments are used only once. The form that defines the Hybrid/Mean-LETKF algorithm has the advantage that existing operational systems can be used without significant modification to generate the hybrid analysis.

3. Experiment design

We first examine special case scenarios using limited observations and a small ensemble size: l 5 4 observa- tions per time step and ensemble size k 5 5. We show the nature run, and compare free-run forecast error and data assimilation analysis error for LETKF, 3D-Var, and the Hybrid/Mean-LETKF. We then generalize the results across the full range of ensemble sizes (2-40) and observation coverage (1-40) for each method, and ex- amine variations in the localization radius.

Observations are generated randomly in space from a uniform distribution on the interval [0, 40] with errors from a normal distribution using a prescribed standard deviation of sr 5 0.5. We assume these observation statistics are known. A linear interpolation scheme is used to construct the observation operator H. The B matrix used for all methods is constructed as a double exponential distribution function with maximum s2b 5 1.0 centered on the diagonals with a local radius of rB 5 5. The Lorenz-96 model is spun up over 14 400 time steps, as per Lorenz (1996), to ensure convergence to the attractor. An additional 600 time steps are run with Dt 5 0.01 to form a nature run xt(t). The experiment initial conditions are sampled from a Gaussian distribution, N[xt(0), 0.1]. Unless otherwise noted, we use a constant multiplicative background covariance inflation of r 5 1.1 (10%) and a local radius of r 5 5 for LETKF and the associated hybrid methods.

4. Results

We show in Fig. 1 that the standard LETKF algorithm performs well with a large ensemble size (e.g., 20, given that 40 would be full rank), but it fails due to cata- strophic filter divergence (Harlim and Majda 2010) when using smaller ensembles (e.g., k 5 5). This filter divergence is typical in our experiment setup for k # 5 and is dependent on the observation locations and Dt. We see that for a large ensemble size, the standard LETKF algorithm is quite accurate. However, as the ensemble size decreases, the analysis solution degrades until the filter eventually diverges from the nature run. When implementing the Hybrid/Mean-LETKF, using k 5 5 ensemble members, the filter recovers stability and has comparable accuracy to the standard LETKF with k 5 20.

The energy for this system, s2 as defined by Lorenz (2005, 2006), is simply the mean square of the system state across all grid points. For longer time periods, the total energy oscillates chaotically in a range from 40 to 90 and is tracked well by the standard LETKF analysis for ensemble sizes k . 5. Bishop and Satterfield (2013) showed that as the variance of the underlying true distri- bution increases, the effective random sample ensemble size decreases. In the standard LETKF (k 5 5), s2 ''blows up'' when the underlying variance increases, while both the Hybrid/Mean-LETKF and Hybrid/Covariance-LETKF using the same k 5 5 members track closely with the standard LETKF (k 5 20) (see Fig. 2).

We next examine the impact of varying observation coverage and ensemble size. We begin with the standard LETKF in Fig. 3, for which k 5 1 represents the pure 3D- Var results. This figure contains results from 40 3 40 5 1600 runs similar to the previously described special case scenarios. We define three regimes within the parameter space: 1) the ensemble/hybrid method outperforms 3D- Var, 2) the ensemble-hybrid method fails, and 3) 3D-Var outperforms the ensemble-hybrid method. Our goal is to maximize the parameter space of regime 1 while simul- taneously minimizing analysis error.

As shown in Fig. 4, the Hybrid/Covariance-LETKF is successful at stabilizing the filter for small ensemble sizes. For improved stability of PCG, we required an initial guess of x 5 xb. With smaller values of a (e.g., a 5 0.0-0.2) for all ensemble sizes, the mean absolute analysis errors are close to that of the standard LETKF (k . 5). However, as the observation coverage decreases, the analysis errors increase. For larger values of a (e.g., a 5 0.5), the mean absolute analysis errors increase throughout regime 1 and converge toward the 3D-Var solution (k 5 1) as a increases to 1.0.

The Hybrid/Mean-LETKF algorithm (Fig. 5) using a 5 0.5 retains much of the accuracy of the standard LETKF (with k 5 20) while still using a small (k 5 5) ensemble size and few (l 5 4) observations. These re- sults are found to hold even when driving the ensemble size down to k 5 3 members. If the number of obser- vations decreases further (e.g., to l 5 3, with k 5 3) however, then this hybrid undergoes the same filter di- vergence as the standard LETKF. For this hybrid method, as a decreases there is a gradual adjustment back to the standard LETKF solution: the mean abso- lute analysis errors decrease throughout regime 1 while the minimum observation count required for filter sta- bility for 2 , k , 5 steadily increases from l 5 3to7(a 5 0.5) to l 5 5to18(a 5 0.2), and finally to l 5 40 (a 5 0.0). We note that these results obtained with a localized 3D- Var are similar when the 3D-Var correction is instead applied globally after LETKF or after a (nonlocalized) ensemble transform

While the a parameter scales the analysis errors with an apparent monotonic relationship between the standard LETKF (a 5 0.0) and 3D-Var (a 5 1.0) for each hybrid method, the exact a values are not directly comparable across methods. All of the methods give a range of mean analysis errors varying from those found with the stan- dard LETKF to approximately those found with 3D- Var. Therefore an appropriate set of parameter values should allow one to tune to any desired mean analysis error accuracy in that range. For example, the Hybrid/ Covariance-LETKF with a 5 0.1 (Fig. 4) appears to have similar analysis errors to the Hybrid/Mean-LETKF (Fig. 5) with a 5 0.5 (with the exception of the cases with very low observation counts, e.g., l , 4, for which the Hybrid/Mean-LETKF has lower mean analysis errors).

Based on their experiments with an extended

The Lorenz-96 system is extensive, which implies that the number of positive Lyapunov exponents grows lin- early with the system size (Pazo et al. 2008; Karimi and Paul 2010). Holding the number of observations fixed at l5 20,wevarythe localizationradiusr andtheensemble size k from 1 to 20 (Fig. 7). The minimum ensemble size for stability of LETKF is k 5 15 for r 5 20 and decreases linearly to k 5 2 for r 5 1. Assigning the covariance localization in 3D-Var to match LETKF (i.e., r [ rB), the Hybrid/Mean-LETKF eliminates the catastrophic filter divergence for all localization distances but ex- hibits increased mean errors that exceed the observation errors for short localization distances. However, when fixing the B-localization at rB 5 5 and applying 3D-Var globally to ensure a minimum dimension for the solution space, the Hybrid/Mean-LETKF has consistent results regardless of the localization radius r used in LETKF.

Multiplicative covariance inflation gives limited ben- efit to the standard LETKF. As shown in Fig. 8, only a few cases with smaller ensemble sizes are afforded improvements in mean analysis error. When using suf- ficient observations (l 5 20), the Hybrid/Mean-LETKF renders inflation unnecessary. However, when using fewer observations (l 5 4), a small inflation (5%-10%) has either a positive or neutral effect on reducing the mean analysis error.

5. Conclusions

This research began with an investigation into the source of benefits arising from hybrid methods when variational techniques are added to an EnKF. We compared solutions from 3D-Var with LETKF, and showed the standard LETKF broke down when using small ensemble sizes. We then introduced two hybrid approaches. The first was the traditionally motivated Hybrid/Covariance-LETKF. The second used a new ap- proach combining gain matrices from LETKF and 3D- Var, and was implemented via the Hybrid/Mean-LETKF algorithm, for which a simple 3D-Var is applied after completion of LETKF to adjust the ensemble mean in model space and to add stability to the filter for small ensemble sizes. While we demonstrated this approach with LETKF and 3D-Var, it is generally applicable to other data assimilation methods as well. A larger value of a tuning parameter a enhanced the stability of LETKF at the cost of accuracy. Thus, in practice, a reasonable approachmaybetobeginwithalargera value and gradually decrease a as long as diagnostic metrics con- tinue to improve. The optimal value for a is dependent on problem specifications, such as the ensemble size, observation coverage, and localization radius.

The filtering solution derived with LETKF is highly accurate when applied to the Lorenz-96 system when allowed a sufficient numberofensemblemembers. However, when using few members, there is catastrophic filter divergence. Because LETKF computes the analysis in the ensemble subspace, its stability is highly dependent on the ensemble dimension. It was shown that for the primary configuration used in this study, regardless of observation coverage, five ensemble members were in- sufficient to prevent filter divergence. For a specified level of observation coverage (l 5 20), this minimum ensemble size was shown to be a linear function of lo- calization radius. While decreasing the localization ra- dius helped stabilize LETKF for small ensemble sizes with Lorenz-96, in practice the minimum localization radius is constrained by the fidelity of the observing network and computational model. A larger localization radius is typically needed for continuity of the analysis field. Of interest is the local dimensionality of the unstable Lyapunov vectors relative to the size of the ensemble k and the local model dimension mloc. Based on the work of Trevisan and Palatella (2011), we suspect the minimum ensemble size for the standard LETKF is directly related to the local dimensionality of the error growth.

Bishop and Satterfield (2013) found that if an ensemble- based estimate of covariance is undersampled, then a su- perior estimate can be obtained by combining that with a climatological estimate. Our results support that find- ing: the Hybrid/Mean-LETKF approach generated so- lutions that outperformed both 3D-Var and the standard LETKF for observation coverage with 2 , l , 10 and ensemble size 1 , k , 5. As has been reported for the traditional hybrid methods (Hamill and Snyder 2000; Wang et al. 2007a), we conclude that it is the computation of the analysis in the higher-dimensional model space that stabilizes the Hybrid/Mean-LETKF when using small ensemble sizes.

In operational settings, all practical ensemble sizes are small relative to the dimension of the state space. Both hybrid LETKF methods are well suited for applications using small ensemble sizes and limited observation cover- age, the typical situation for global ocean data assimilation (Penny 2011; Penny et al. 2013) and coupled atmosphere- ocean data assimilation. In practice, selection of the Hybrid/ Covariance-LETKF versus the Hybrid/Mean-LETKF would depend upon the ease of implementation based on the design of existing operational software.

Success with the Lorenz-96 model does not guarantee success with all models. However, because a similar approach by Kalnay and Toth (1994) was effective on both a Lorenz-63 model and a T62 National Meteoro- logical Center (NMC) model, this gives a positive out- look for use of the Hybrid/Mean-LETKF with more realistic models. As one example, there has been success in preliminary applications of the Hybrid/Mean-LETKF al- gorithm to NCEP's operational Global Ocean Data As- similation System (GODAS). As LETKF is already being used or prepared for use in operational environments in

Acknowledgments. I gratefully acknowledge the thought- ful comments from unofficial reviewer

REFERENCES

Barker, D. M., 1999: The use of synoptic-dependent error structure in 3DVAR. Var Scientific Development Paper 25, Met Office Tech. Rep., 2 pp. [Available from

Bishop, C. H., and

-,

Brett,C.E.A.,K.F.Lam,K.J.H.Law,D.S.McCormick,M.

Buehner, M., 2005: Ensemble-derived stationary and flow-dependent background error covariances: Evaluation in a quasi-operational NWP setting. Quart.

-,

-, -, -, -, and -, 2010b: Intercomparison of vari- ational data assimilation and the ensemble

Derber, J., and A. Rosati, 1989: A global oceanic data assimi- lation system. J. Phys. Oceanogr., 19, 1333-1347, doi:10.1175/ 1520-0485(1989)019,1333:AGODAS.2.0.CO;2.

Fehlberg, E., 1970: Classical Runge-Kutta fourth-order and lower order formulas with step size control and their application to heat conduction problems. Computing, 6, 61-71.

Greybush,S. J., E. Kalnay,T. Miyoshi,

Hamill, T. M., and

Harlim, J., and

Hunt, B. R.,

Kalnay, E., 2003: Data assimilation. Atmospheric Modeling, Data As- similation and Predictability,

-, and Z. Toth, 1994: Removing growing errors in the analysis. Preprints, 10th Conf. on Numerical Weather Prediction, Port- land, OR,

Karimi, A., and

Kleist, D. T., 2012: An evaluation of hybrid variational-ensemble data assimilation for the NCEP GFS. Ph.D. dissertation,

Lorenc,

Lorenz, E. N., 1996: Predictability-A problem partly solved. Proceedings of a Seminar Held at ECMWF on Predictability, ECMWF Seminar Proceedings, Vol. 1, ECMWF, 1-18.

-, 2005:Designing chaotic models. J.

-, 2006: Regimes in simple systems. J.

Messner, J., 2009: Probabilistic forecasting using analogs in the idealized Lorenz96 setting. M.S. thesis,

Orrell, D., 2003: Model error and predictability over different time- scales in the Lorenz '96 systems. J.

Ott, E., and Coauthors, 2004: A local ensemble

Pazo, D.,

Penny, S. G., 2011: Data assimilation of the global ocean using the 4D local ensemble transform

-,

Trevisan, A., and

Wang, X., 2010: Incorporating ensemble covariance in the grid- point statistical interpolation variational minimization: A mathematical framework. Mon. Wea. Rev., 138, 2990-2995, doi:10.1175/2010MWR3245.1.

-,T.M.Hamill,J.S.Whitaker,andC.H.Bishop,2007a: A comparison of hybrid ensemble transform

-,

-, D. M. Barker,

-, -, -, and -, 2008b: A hybrid ETKF-3DVAR data assimilation scheme for the WRF Model. Part II: Real ob- servation experiments. Mon. Wea. Rev., 136, 5132-5147, doi:10.1175/2008MWR2445.1.

-,

Wilks, D. S., 2005: Effects of stochastic parametrizations in the Lorenz '96 system. Quart.

-, 2006: Comparison of ensemble-MOS methods in the Lorenz '96 setting. Meteor. Appl., 13, 243-256, doi:10.1017/ S1350482706002192.

and NOAA/NWS/NCEP,

(Manuscript received

Corresponding author address:

E-mail: Steve.Penny@noaa.gov

DOI: 10.1175/MWR-D-13-00131.1

APPENDIX A

Equivalence of the Hybrid Gain Matrix (Using b1 = 1, b2 = a, and b3 = 2a) and the Hybrid/Mean-LETKF Algorithm

The Hybrid/Mean-LETKF analysis as defined in (18) is

... (A1)

where xa is defined by minimizing the variational equation,

... (A2)

By setting the derivative of J equal to zero, we can solve for xa:

... (A3)

... (A4)

... (A5)

... (A6)

... (A7)

... (A8)

Then, the hybrid analysis (A1) equals

... (A9)

... (A10)

... (A11)

... (A12)

... (A13)

... (A14)

... (A15)

So,

... (A16)

As in (8),

... (A17)

... (A18)

Then, since for LETKF

... (A19)

we can substitute KB and xa into (35) to get

... (A20)

Simplifying, we get

... (A21)

... (A22)

... (A23)

Then, if we define as in (9)

... (A24)

we have that the hybrid analysis is simply given as in (10), with an observation innovation weighted by the modified gain matrix K^ ,

... (A25)

APPENDIX B

Equivalence of the Hybrid Gain Matrix (Using b1 =(1 2 a), b2 = a, and b3 = 0) and the Hybrid/Mean-LETKF(b) Algorithm

If instead of the analysis ensemble mean we assign the forecast ensemble mean as the background for 3D-Var, then analogous to (A2) we have

... (B1)

Similarlyto(A3)-(A8),bysettingthederivativeofJequal to zero, we can solve for xa:

... (B2)

Then the hybrid analysis (A1) equals

... (B3)

Substituting in the definition of xa from (A19), we have

... (B4)

For notational convenience, let D 5(I 1 BHTR21H). Then (B4) can be simplified to

... (B5)

Noting that [aI1(12a)D]xb 5Dxb 2aBHTR21Hxb, we have

... (B6)

Usingdefinition(A18)wecansubstituteKBinto(A16)toget

... (B7)

Thus, we can define as in (6) with b1 5 (1 2 a), b2 5 a, and b3 5 0 that

... (B8)

Again, we have that the hybrid analysis is simply given as in (10), with an observation innovation weighted by the modified gain matrix K^ .