Wavelet Transform as an Alternative to Power Transformation in Time Series Analysis

Abstract
This study examines the discrete wavelet transform as a transformation technique for non-stationary time series, comparing it with power transformation. The need for transformation and the choice of an appropriate one are assessed using Bartlett's test for constant variance, while the Daubechies 4 (D4) Maximal Overlap Discrete Wavelet Transform (MODWT) is used for the wavelet transform. The stationarity of the transformed (power and wavelet) series is examined with the Augmented Dickey-Fuller (ADF) unit root test. The stationary series are modeled with the Autoregressive Moving Average (ARMA) technique. Model precision, in terms of goodness of fit, is ascertained using information criteria (AIC, BIC and SBC), while forecast performance is evaluated with RMSE, MAD and MAPE. The study data are the Nigeria exchange rate (2004-2014) and the Nigeria external reserve (1995-2010). The results show that the power transformed series of the exchange rate data admits a random walk (ARIMA (0, 1, 0)) model, while its wavelet equivalent is adequately fitted by ARIMA (1, 1, 0). Similarly, the power transformed version of the external reserve is adequately fitted by ARIMA (3, 1, 0), while its wavelet equivalent is adequately fitted by ARIMA (0, 1, 3). In terms of model precision (goodness of fit), the model for the power transformed series has the better fit for the exchange rate data, while the model for the wavelet transformed series has the better fit for the external reserve data. In forecast performance, the models for the wavelet transformed series outperformed the models for the power transformed series. We therefore recommend that the wavelet transform be used when a time series is non-stationary in variance and the main interest is forecasting.
Bulletin of Mathematical Sciences and Applications, ISSN 2278-9634, Vol. 17, pp. 57-74. doi:10.18052/www.scipress.com/BMSA.17.57
Submitted: 2016-09-12; Revised: 2016-09-25; Accepted: 2016-10-24; Online: 2016-11-01
© 2016 SciPress Ltd., Switzerland. SciPress applies the CC-BY 4.0 license to works we publish: https://creativecommons.org/licenses/by/4.0/


Introduction
In several organizations, managerial decisions are largely based on the available information from past and present observations, and possibly on the processes that generate such observations. Time series data provide such information. A time series represents the time course of behavior of a wide range of systems, which may be biological, physical or economic. The utility of time series data lies in the results of time series analysis. Such analysis helps achieve the aims for which the data were collected, which could be description (exposing the main properties of a series), explanation (revealing the relationship between variables of a series, especially when observations are taken on two or more variables), forecasting (predicting future values of a series) or control (taking appropriate corrective actions) [1].
To analyze time series data, time series analysis techniques are adopted. The commonly used techniques are the descriptive technique, probability models and spectral density analysis. Inference based on the descriptive method and probability models is often referred to as analysis in the time domain, while inference based on the spectral density function is referred to as analysis in the frequency domain [2,3,4,5].
All these models assume that the error component of the series (e_t) is normally distributed with zero mean and constant variance (σ²), or that the series itself is normally distributed with constant mean (μ) and constant variance (σ²). When any study data violate any or all of these assumptions, the series is subjected to transformation. Transformation helps to (i) stabilize the variance of a series, (ii) make the seasonal effect, when present, additive and (iii) make the data normally distributed [1]. One of the transformations commonly used is the power transformation developed by [6]. [7] noted that the power transformation (i) changes the scale of the original series, (ii) may introduce bias in the forecast, especially when the data have to be transformed back to the original scale, and (iii) often yields a transformed series with no physical interpretation. [1] argued that transformation alone may not be helpful when variance changes through time in the absence of trend. In such a case, he recommended that a model that allows for changes in variance be considered.
The wavelet method is one such method that allows for changes in variance and has been found useful in time series analysis. It involves decomposition, de-noising and reconstruction of a series. Decomposition breaks the time series into two main components, namely the detail and the smooth components; de-noising removes the non-significant components of the series; and reconstruction recovers the original series devoid of noise. Unlike the power transformation, the wavelet method does not change the scale of the series, poses no problem of interpretation and does not rely on any assumed underlying distribution of the study data. Additionally, the wavelet method allows for decomposition of a series without knowledge of the underlying functional form of the series [8]. Could this lead to improved model and forecast performance over those based on power transformation? This and other related questions are what this study intends to address. Therefore, the objective of this study is to examine the precision (in terms of goodness-of-fit) and forecast performance of models for wavelet transformed series while comparing them with models for power transformed time series.

Literature Review
Various transformations exist, but the power transformation developed by [6] is often used. This transformation requires a correct choice of the transformation parameter, often denoted λ. [4] suggested using the maximum likelihood method to choose the value of λ that results in the smallest residual sum of squares. [9] proposed a Bayesian method to choose the value of λ for a given model structure. The correct choice of the transformation parameter λ, and the simultaneous transformation and fitting of the model of a given series, are the noted limitations in the use of [6]'s power transformation. To remedy these limitations, [10] have shown how to apply Bartlett's transformation to time series data. Accordingly, they regress the natural logarithms of the group standard deviations against the natural logarithms of the group means of time series data arranged chronologically in equal groups and determine the slope (β) of the relationship. [11] derived a confidence interval for the index of a power transformation that stabilizes the variance of a time series. They claimed that the confidence interval for the minimum coefficient of variation can also be used to construct a confidence interval for any coefficient of variation. [12] used the Box-Cox transformation to turn a streamflow time series with a non-Gaussian heavy-tailed distribution into a nearly Gaussian series. [13] applied the log-transformation in time series modeling of US macroeconomic data. He demonstrated that claims previously made concerning improvement in forecast accuracy following bias correction for the transformed data were not generally well founded. [14] applied differencing and the Box-Cox transformation to a non-stationary series prior to application of an ARMA model. He discovered that the technique was not optimal for forecasting the study data, and therefore recommended a method for modeling time series data with unstable roots and changing parameters.
The appropriateness, in the context of out-of-sample forecasts, of vector autoregressive (VAR) models when the series is log-transformed was examined by [15]. The examination concerned out-of-sample forecasts of GDP in a multi-country set-up. They discovered that the forecasts performed better when the series was transformed. [16] highlighted that, before application of an ARMA model, achieving variance stationarity via power transformations contributes to improvement in forecasts, especially for long forecasting horizons. [17] applied power transformation to achieve variance stationarity in modeling the Nigeria external reserve. Using Bartlett's version of the power transformation, they discovered that the appropriate transformation for the study data was the logarithmic transformation. They further fitted ARIMA (2, 1, 0) to the transformed series. [18] investigated the potential of wavelet methods in the analysis of biological sequences as a complement to the methods currently in use. He used the maximal overlap discrete wavelet transform (MODWT) to extract the relevant structural features from the data. [19] compared the discrete wavelet transform and the maximal overlap discrete wavelet transform on the Dow Jones index of the US stock market. They used a unit root test to establish the non-stationarity of the study series, and different wavelet families were examined. They discovered that the discrete wavelet transform outperformed the maximal overlap discrete wavelet transform. [20] examined modeling and forecasting using wavelets while comparing them with the ARIMA and X-11 modeling approaches. The study data were the US dollar against DM exchange rate, and ten-steps-ahead forecasts were made. The outcome of the forecast, using the average percentage forecast error (APE) criterion, showed that the wavelet approach performed best. [21] studied the performance of wavelet-ARIMA and wavelet-ANN models for temperature data in Northeastern Bangladesh. Their findings indicate that the predictive capability of the wavelet-ARIMA model was more effective than that of the wavelet-ANN model.
[22] compared the wavelet method of time series analysis with other existing methods such as the ARIMA and Census X-12 methods. They concluded that when a series is characterized by medium- and long-term structure, the wavelet approach reduces the forecast error substantially, but increases the forecast error when the data have only short-term structure.

Methodology
This section discusses the methodology employed in this study. It first presents the methods for evaluating the need for transformation and choosing an appropriate transformation (Bartlett's test for constant variance and the Discrete Wavelet Transform (DWT)); it then discusses the Augmented Dickey-Fuller (ADF) unit root test, the ARMA model and the model selection criteria, and finally the forecast accuracy measures.

Methods of Evaluation for Possible Transformation and Choice of Appropriate Transformation

Bartlett's Transformation
[10] have shown how to apply the transformation technique of [23] to time series data without considering the time series model structure. According to [10], Bartlett's transformation involves arranging the series chronologically in m equal groups, then regressing the natural logarithms of the group standard deviations (σ̂_i, i = 1, 2, …, m) on the natural logarithms of the group means (X̄_i, i = 1, 2, …, m) and determining the slope β of the relationship:

ln σ̂_i = α + β ln X̄_i + ε_i,  i = 1, 2, …, m. (1)

Application of Eq. 1 leads to a power transformation which is given by:

Y_t = X_t^(1−β̂) for β̂ ≠ 1, and Y_t = ln X_t for β̂ = 1, (2)

where Y_t is the transformed series at time t, X_t is the original value of the series at time t, α and β are the regression coefficients given in Eq. 1, and ε_i is zero-mean white noise with constant variance. To test for the significance of the slope β, the hypothesis H₀: β = 0 against H₁: β ≠ 0 is formulated. The test statistic follows a t-distribution with (m − 2) degrees of freedom and is given by:

t = β̂ / SE(β̂). (3)

When β is significantly different from zero, the series is non-stationary in variance and the appropriate transformation of Eq. 2 is applied. [10] further suggested that the value of β̂ should be substituted directly into the transformation instead of using an approximation.
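The grouping-and-regression procedure above can be sketched in code. The snippet below is a minimal illustration written for this exposition, not code from the paper: a hypothetical `bartlett_slope` helper splits the series into m chronological groups, fits the OLS slope of ln(group standard deviation) on ln(group mean), and the resulting slope drives the power transformation of Eq. 2.

```python
import math

def bartlett_slope(series, m):
    """Estimate Bartlett's slope: regress ln(group std) on ln(group mean).

    The series is split chronologically into m equal groups. A slope near
    0 suggests no transformation, near 0.5 the square root, near 1 the log.
    """
    size = len(series) // m
    xs, ys = [], []
    for i in range(m):
        group = series[i * size:(i + 1) * size]
        mean = sum(group) / len(group)
        var = sum((v - mean) ** 2 for v in group) / (len(group) - 1)
        xs.append(math.log(mean))
        ys.append(math.log(math.sqrt(var)))
    xbar = sum(xs) / m
    ybar = sum(ys) / m
    # Ordinary least squares slope of ys on xs.
    return (sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
            / sum((x - xbar) ** 2 for x in xs))

def bartlett_transform(series, beta):
    """Apply Y = X^(1 - beta), or the logarithm when beta is (close to) 1."""
    if abs(1.0 - beta) < 1e-8:
        return [math.log(x) for x in series]
    return [x ** (1.0 - beta) for x in series]
```

For a series whose group standard deviation is proportional to the group mean, the estimated slope is close to 1 and the logarithmic transformation is selected, matching the rule of thumb stated above.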

Discrete Wavelet Transform (DWT)
This section considers the wavelet method for transforming time series data (in the time domain) into wavelet coefficients (data in the wavelet domain). It involves decomposition, de-noising and reconstruction. These steps are briefly discussed in what follows.

(a) Discrete Wavelet Decomposition
Given a real-valued time series {x_t} of dyadic length n = 2^J (a length that is a power of 2), where J is a positive integer, and choosing appropriate transformation filters, we compute the wavelet coefficients. For a sequence x_t, the J-step (level) wavelet transformation is given by:

w = W_J x,

where w is a column vector of length n = 2^J comprising a set of wavelet coefficients (detail coefficients (d_{j,k})) and a set of averages (scaling or smooth coefficients (v_{j,k})), W_J is the J-step n × n real-valued orthogonal matrix, often called a "filter bank", defining the discrete wavelet transform (DWT) and satisfying W_J^T W_J = I, and x is the column vector of values of the original series. The matrix W_J is built from a finite list of numbers called filters [wavelet filters (h_l) and scaling filters (g_l)] and is partitioned into two equal parts: the first n/2 rows form a matrix of wavelet filters used to produce the detail coefficients (d_{j,k}), and the last n/2 rows form a matrix of scaling filters used to produce the smooth coefficients (v_{j,k}), which serve as the input data in the next stage of convolution. The first smooth coefficients (v₁) are convolved with the wavelet filter (h_l) and the scaling filter (g_l) to produce the next detail coefficients (d₂) and smooth coefficients (v₂), and so on. This process is repeated up to J steps.
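One convolution step of the decomposition described above can be sketched as follows. This is an illustrative sketch, not the paper's implementation: it uses the well-known D4 filter coefficients with circular (periodic) boundary handling for a series of even length, and `dwt_step` returns the smooth coefficients that feed the next level together with the detail coefficients for the current level.

```python
import math

# D4 scaling (smooth) filter; the wavelet (detail) filter follows the
# quadrature-mirror relation H[l] = (-1)^l * G[L-1-l] with width L = 4.
_S3 = math.sqrt(3.0)
G = [(1 + _S3) / (4 * math.sqrt(2)), (3 + _S3) / (4 * math.sqrt(2)),
     (3 - _S3) / (4 * math.sqrt(2)), (1 - _S3) / (4 * math.sqrt(2))]
H = [(-1) ** l * G[3 - l] for l in range(4)]

def dwt_step(x):
    """One level of the D4 DWT with circular boundary handling.

    Returns (smooth, detail): n/2 scaling coefficients and n/2 wavelet
    coefficients; the smooth part is the input to the next level.
    """
    n = len(x)
    smooth = [sum(G[l] * x[(2 * k + l) % n] for l in range(4))
              for k in range(n // 2)]
    detail = [sum(H[l] * x[(2 * k + l) % n] for l in range(4))
              for k in range(n // 2)]
    return smooth, detail
```

Because the circulant filter matrix is orthogonal, the energy of the input is preserved across the smooth and detail coefficients, and a constant series produces zero detail coefficients (the detail filter sums to zero).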

Therefore, for J steps, the r-th step wavelet transformation (r = 1, 2, …, J) is built from the partition matrices H_k and G_k. The partition matrix H_k is a circulant matrix of wavelet filter entries: once its first row is determined, the i-th row is the first row circularly shifted to the right by 2(i − 1) units. M is the vanishing moment, or shift parameter, of the wavelet, which is usually half of the wavelet filter width (L). G_k has the same dimension as H_k, with its entries replaced by the smooth filters (g_l). The relationship between the wavelet (detail) filters (h_l) and the scaling (smooth) filters (g_l) can be represented as:

h_l = (−1)^l g_{L−1−l},  l = 0, 1, …, L − 1,

where L is the wavelet filter width.

Using the filter entries and the relation above, H_k can be populated; similarly, the G_k matrix is populated using the smooth filters g_l. The detail and scaling (smooth) filters satisfy the following conditions, as given in [24]:
(a) each filter must have unit norm (energy): Σ_l h_l² = Σ_l g_l² = 1;
(b) the smooth filter must sum to the square root of two, Σ_l g_l = √2, while the detail filter sums to zero;
(c) each filter must be orthogonal to its even shifts: Σ_l h_l h_{l+2k} = Σ_l g_l g_{l+2k} = 0 for all integers k ≠ 0.
The wavelet decomposition level J₀ for the MODWT satisfies J₀ < J and J₀ ≤ log₂(n/(L − 1) + 1), where J₀ is the wavelet decomposition level for the MODWT, n is the number of observations and L is the wavelet filter width.
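Conditions (a)-(c) can be checked numerically for the D4 filters used in this study. The snippet below is a hypothetical verification sketch (not from the paper); `unit_norm` and `even_shift_orthogonal` test the stated conditions directly.

```python
import math

# D4 filters: scaling filter g and wavelet filter via h_l = (-1)^l g_(L-1-l).
S3 = math.sqrt(3.0)
g = [(1 + S3) / (4 * math.sqrt(2)), (3 + S3) / (4 * math.sqrt(2)),
     (3 - S3) / (4 * math.sqrt(2)), (1 - S3) / (4 * math.sqrt(2))]
h = [(-1) ** l * g[3 - l] for l in range(4)]

def unit_norm(f):
    """Condition (a): the filter has unit energy."""
    return abs(sum(c * c for c in f) - 1.0) < 1e-12

def even_shift_orthogonal(f):
    """Condition (c): the filter is orthogonal to its nonzero even shifts."""
    L = len(f)
    return all(abs(sum(f[l] * f[l + 2 * k] for l in range(L - 2 * k))) < 1e-12
               for k in range(1, L // 2))
```

Running these checks confirms that the detail filter sums to zero while the smooth filter sums to √2, as required by condition (b).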

The detail and smooth filters of the MODWT have the following properties, as given in [25]: they are the DWT filters rescaled by 1/√2, that is, h̃_l = h_l/√2 and g̃_l = g_l/√2, so that each filter has squared norm 1/2 and remains orthogonal to its even shifts.

(b) Data De-noising via wavelet
The detail coefficients of the wavelet decomposition are de-noised using the threshold proposed by [26], known as the universal threshold, given by:

λ = σ̂ √(2 ln n),

where λ is the universal threshold, n is the number of observations and σ̂ is the estimate of the noise level at the finest scale. The noise level is estimated by:

σ̂ = median(|d_{1,k}|) / 0.6745,

where d_{1,k} are the finest-scale detail wavelet coefficients and 0.6745 is a scale factor. The thresholded wavelet coefficients are obtained using either the hard thresholding rule:

d*_{j,k} = d_{j,k} if |d_{j,k}| > λ, and 0 otherwise,

where d_{j,k} is the detail wavelet coefficient at the j-th level and k-th scale and λ is the universal threshold, or the soft thresholding rule, known as wavelet shrinkage:

d*_{j,k} = sign(d_{j,k}) (|d_{j,k}| − λ) if |d_{j,k}| > λ, and 0 otherwise.

The de-noised series is then reconstructed from the de-noised detail and smooth wavelet coefficients via the inverse transform.
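The thresholding rules above are simple to state in code. The following is an illustrative sketch under the same definitions (median absolute coefficient divided by 0.6745 as the noise estimate, then hard or soft thresholding); the function names are this edit's own, not the paper's.

```python
import math

def universal_threshold(detail_finest, n):
    """lambda = sigma * sqrt(2 ln n), sigma from the MAD / 0.6745 rule
    applied to the finest-scale detail coefficients."""
    mags = sorted(abs(d) for d in detail_finest)
    m = len(mags)
    mad = mags[m // 2] if m % 2 else 0.5 * (mags[m // 2 - 1] + mags[m // 2])
    sigma = mad / 0.6745
    return sigma * math.sqrt(2.0 * math.log(n))

def hard_threshold(coeffs, lam):
    """Keep coefficients whose magnitude exceeds lam; zero the rest."""
    return [d if abs(d) > lam else 0.0 for d in coeffs]

def soft_threshold(coeffs, lam):
    """Shrink surviving coefficients toward zero by lam (wavelet shrinkage)."""
    return [math.copysign(abs(d) - lam, d) if abs(d) > lam else 0.0
            for d in coeffs]
```

Hard thresholding keeps surviving coefficients unchanged, while soft thresholding additionally shrinks them by λ, which is why the latter produces smoother reconstructions.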

Augmented Dickey-Fuller Unit Root Test
Having arrived at a transformed series (Y_t) which is stable in variance, either through Bartlett's transformation or the DWT, another challenge is to ascertain the stationarity in mean of the transformed series (Y_t). One way to test for this is the Augmented Dickey-Fuller (ADF) unit root test, developed by [27], which can be illustrated with the model:

ΔY_t = β₁ + β₂t + δY_{t−1} + Σ_{i=1}^{p} α_i ΔY_{t−i} + ε_t,

where Y_t is the transformed series at time t, Δ is the first-difference operator, β₁ is a drift term, β₂t is a deterministic trend term and ε_t is white noise. The null hypothesis H₀: δ = 0 (the series has a unit root) is tested against H₁: δ < 0. The estimated t-value of δ follows the τ (tau) statistic. If the null hypothesis is rejected, the implication is that the series is stationary in mean.
If we do not reject the null hypothesis, and we assume that the trend is stochastic and the transformed series Y_t is not seasonal, then the series is made stationary by differencing. This results in a stationary series given by:

Z_t = ∇^d Y_t,

where Z_t is the differenced series at time t, Y_t is the transformed (power or wavelet) series at time t, ∇ is the differencing operator such that ∇Y_t = Y_t − Y_{t−1}, and d is the number of differencings.
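The differencing step can be written compactly. The sketch below (illustrative, not from the paper) applies the operator ∇ d times; each pass shortens the series by one observation.

```python
def difference(series, d=1):
    """Apply the differencing operator d times: (grad Y)_t = Y_t - Y_(t-1)."""
    z = list(series)
    for _ in range(d):
        z = [z[t] - z[t - 1] for t in range(1, len(z))]
    return z
```

For example, one difference of a quadratic-like sequence removes the trend in level, and a second difference removes the trend in slope.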

The Probability (ARMA) Model
Given the stationary series (Z_t), the ARMA(p, q) model is:

φ(B) Z_t = θ(B) a_t,

where B is the backshift operator (B Z_t = Z_{t−1}), a_t is white noise, and φ(B) = 1 − φ₁B − φ₂B² − … − φ_pB^p and θ(B) = 1 − θ₁B − θ₂B² − … − θ_qB^q are polynomials of degree p and q, respectively, in B. We require the roots of the characteristic equations φ(B) = 0 and θ(B) = 0 to lie outside the unit circle for the process to be stationary and invertible, respectively.
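For low-order models the root condition reduces to simple inequalities on the coefficients. The sketch below (this edit's own illustration, not the paper's code) encodes the AR(1) and AR(2) conditions; as a usage example, the ARIMA (1, 1, 0) fit reported later, with φ₁ = 0.5241, satisfies the AR(1) condition.

```python
def ar1_stationary(phi):
    """AR(1): 1 - phi*B = 0 has root B = 1/phi, which lies outside the
    unit circle (stationarity) exactly when |phi| < 1."""
    return abs(phi) < 1.0

def ar2_stationary(phi1, phi2):
    """AR(2) stationarity triangle, equivalent to both roots of
    1 - phi1*B - phi2*B^2 = 0 lying outside the unit circle."""
    return (phi1 + phi2 < 1.0) and (phi2 - phi1 < 1.0) and (abs(phi2) < 1.0)
```

The same inequalities applied to the θ coefficients give the invertibility check for the MA part.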

Model Precision (goodness-of-fit) Measures
Assume that an ARMA model with k parameters is fitted to the stationary time series (Z_t). To assess the precision of the model in terms of goodness-of-fit, the following information criteria are used: (a) the Akaike Information Criterion (AIC), introduced by [28] as a measure of the goodness-of-fit of a model and given by AIC = n ln(σ̂²) + 2k, where n is the number of observations and σ̂² is the residual variance; and (b) the Bayesian and Schwarz Bayesian criteria (BIC and SBC), which replace the 2k penalty with heavier penalties on model complexity, such as k ln(n). The model with the minimum value of the information criteria is adopted as the model with the best fit.
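The criteria can be computed directly from the residual sum of squares. The sketch below uses the common textbook forms stated above; note, as an assumption of this edit, that software packages differ by additive constants, which do not affect model ranking.

```python
import math

def aic(n, rss, k):
    """AIC = n * ln(rss / n) + 2k, with rss the residual sum of squares."""
    return n * math.log(rss / n) + 2 * k

def bic(n, rss, k):
    """Schwarz criterion: replaces the 2k penalty with k * ln(n)."""
    return n * math.log(rss / n) + k * math.log(n)
```

Because ln(n) > 2 for n > 7, BIC penalizes extra parameters more heavily than AIC on any realistic sample size, so it tends to select more parsimonious models.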

Forecast Accuracy Measures
Let the l-step-ahead forecast error be e_n(l) = Z_{n+l} − Ẑ_n(l), where Ẑ_n(l) is the forecast of Z_{n+l} made at origin n. The forecast accuracy measures used are the root mean square error (RMSE = √(Σe²/n)), the mean absolute deviation (MAD = Σ|e|/n) and the mean absolute percentage error (MAPE = (100/n) Σ|e/Z|).
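The three accuracy measures can be sketched as follows; this is a minimal illustration under the definitions above, with function names chosen for this exposition.

```python
import math

def forecast_errors(actual, forecast):
    """Element-wise forecast errors e = actual - forecast."""
    return [a - f for a, f in zip(actual, forecast)]

def rmse(actual, forecast):
    """Root mean square error: sqrt(mean of squared errors)."""
    e = forecast_errors(actual, forecast)
    return math.sqrt(sum(v * v for v in e) / len(e))

def mad(actual, forecast):
    """Mean absolute deviation of the forecast errors."""
    e = forecast_errors(actual, forecast)
    return sum(abs(v) for v in e) / len(e)

def mape(actual, forecast):
    """Mean absolute percentage error, in percent of the actual values."""
    e = forecast_errors(actual, forecast)
    return 100.0 * sum(abs(v / a) for v, a in zip(e, actual)) / len(e)
```

Smaller values of all three measures indicate better forecast performance, which is how the power and wavelet models are compared in the results below.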

Results and Discussion
The time series plots of the monthly exchange rate and the monthly external reserve are displayed in Fig. 1 and Fig. 2, respectively.

Tests for Constant Variance and Choice of Appropriate Transformation
The results of applying Bartlett's transformation technique (Eq. 1) to the study data (exchange rate and external reserve) are shown in Table 1 (level of significance α = 0.05). From Table 1, it can be seen that the original series of the exchange rate and the external reserve both admit transformation, which suggests variance heterogeneity of the series. While direct substitution of the value of the slope (β = −9.124) into Eq. 2 was used for the transformation of the exchange rate data, the square root transformation, which is appropriate when the slope (β) equals 0.5, was used for the external reserve data. To confirm that stationarity in variance had been achieved, the same test (Bartlett's test for constant variance) was carried out on the transformed (power and wavelet) series. The results show that no further transformation is required for either study data set.

Wavelet Decomposition, De-noising and Reconstruction of Time Series Data
The results of applying the wavelet-based transformation (decomposition, de-noising and reconstruction) are displayed in Table 2, while the plots of the wavelet coefficients are shown in Fig. 3 and Fig. 4 for the exchange rate and the external reserve, respectively. The decomposition is based on the MODWT with the Daubechies 4 (D4) filter. The threshold used was the universal threshold, the de-noising was done with the soft thresholding rule, and the wavelet reconstruction was done using multiresolution analysis. From the results in Table 2, the wavelet filter width used for both the exchange rate and the external reserve is 4, which is the Daubechies wavelet (D4). From Table 2, it can also be seen that the noise level for the exchange rate is 1.8398 with a threshold value of 5.7494, while the noise level for the external reserve is 188.3039 with a threshold value of 610.6098. The noise level was estimated from the finest-scale detail coefficients and the decomposition level from the formula given in the methodology.

Augmented Dickey-Fuller Unit Root Test for Stationarity
The results of the Augmented Dickey-Fuller (ADF) unit root test for mean stationarity of the (power and wavelet) transformed series under study (exchange rate and external reserve) are shown in Table 3. From Table 3, the ADF statistic values for the power and wavelet transformed exchange rate series are −2.8680 and −2.4204, with p-values of 0.1764 and 0.3674, respectively, suggesting non-stationarity in mean. The corresponding ADF values for the first differenced series are −11.4714 and −6.4848, each with a p-value of 0.0000, for the power and wavelet transformed series, respectively.

Similarly, for the external reserve data, the ADF values for the power and wavelet transformed series are −1.3508 and −1.3524, with p-values of 0.8719 and 0.8714, respectively. This again suggests non-stationarity in mean, while the corresponding ADF values for the first differenced series are −9.1424 and −6.8824, each with a p-value of 0.0000, for the power and wavelet transformed series, respectively.

Model Identification and Parameter Estimation
The estimates of the parameters for the models of the exchange rate and external reserve data are shown in Table 4. The results in Table 4 suggest that the power transformed series of the exchange rate data admits a random walk; this is evident because the first difference of the power transformed series is found to be white noise. For the wavelet transformed series, ARIMA (1, 1, 0) is adequately fitted to the data with φ₁ = 0.5241 and a p-value of 0.0000.
Based on the values of the information criteria in Table 4, it can be observed that the model for the power transformed series has smaller values than the model for the wavelet transformed series of the exchange rate data. This suggests that, for the exchange rate data in the period under review, the model for the power transformed series has a better fit than the model for the wavelet transformed series. Contrary to the result for the exchange rate data, for the external reserve data the information criteria values for the model of the wavelet transformed series were smaller than those for the model of the power transformed series, suggesting a better fit for the wavelet transformed series.

Model Diagnostic Check
The descriptive statistics for the residuals of the models of the study data, for the power and wavelet transformed series, are shown in Table 5. The results in Table 5 show that the error means are not significantly different from zero and that the distribution of each set of residuals is normal or approximately normal. This is an indication of the adequacy of the models fitted to the series.

Forecast and Forecast Accuracy Measures
The forecasts and their corresponding accuracy measures for the study data (exchange rate and external reserve) are shown in Table 6 and Table 7. The results show that the root mean square error (RMSE) of 54.4304 for the forecast of the model for the power transformed series is greater than the RMSE of 42.8487 for the forecast of the model for the wavelet transformed series of the exchange rate. The mean absolute deviation (MAD) and mean absolute percentage error (MAPE) follow the same pattern as the root mean square error on the same series. This suggests that the model for the wavelet transformed series outperformed the model for the power transformed series in terms of forecasting. Similarly, for the external reserve, the root mean square error of the forecast is 2261.02 for the model of the power transformed series and 1311.455 for the model of the wavelet transformed series. Again, the root mean square error is smaller for the wavelet transform model than for the power transformation model, and the same pattern is seen in the other forecast accuracy measures. This again suggests that the model for the wavelet transformed series outperformed the model for the power transformed series in terms of forecasting. The plots of the actual values and forecasts for the power transformed and wavelet transformed series are displayed in Fig. 5 and Fig. 6 for the exchange rate and external reserve, respectively.

Conclusion
Based on the results of this study, it has been shown that the discrete wavelet transform (DWT) can serve as a transformation tool for time series data and can be a better alternative to power transformation, especially when a series is characterized by discontinuities and heterogeneous variance. Comparatively, on the study data, the models for the wavelet transformed series outperformed the models for the power transformed series, especially in forecasting.
We therefore recommend that the discrete wavelet transform be used as an alternative to power transformation when a time series is characterized by discontinuities and heterogeneous variance.