Strength of ships: Difference between revisions

Revision as of 00:04, 1 February 2014

In time series analysis, the Box–Jenkins methodology, named after the statisticians George Box and Gwilym Jenkins, applies autoregressive moving average (ARMA) or autoregressive integrated moving average (ARIMA) models to find the best fit of a time-series model to past values of a time series, in order to make forecasts.

Modeling approach

The original model uses an iterative three-stage modeling approach:

1. Model identification and model selection: making sure that the variables are stationary, identifying seasonality in the dependent series (seasonally differencing it if necessary), and using plots of the autocorrelation and partial autocorrelation functions of the dependent time series to decide which (if any) autoregressive or moving average component should be used in the model.
2. Parameter estimation using computation algorithms to arrive at coefficients that best fit the selected ARIMA model. The most common methods use maximum likelihood estimation or non-linear least-squares estimation.
3. Model checking by testing whether the estimated model conforms to the specifications of a stationary univariate process. In particular, the residuals should be independent of each other and constant in mean and variance over time. (Plotting the mean and variance of residuals over time and performing a Ljung–Box test or plotting autocorrelation and partial autocorrelation of the residuals are helpful to identify misspecification.) If the estimation is inadequate, we have to return to step one and attempt to build a better model.

The data Box and Jenkins used were from a gas furnace. These data, known as the Box and Jenkins gas furnace data, are a well-known benchmark for predictive models.

Commandeur & Koopman (2007, §10.4) argue that the Box–Jenkins approach is fundamentally problematic. The problem arises because in "the economic and social fields, real series are never stationary however much differencing is done". Thus the investigator has to face the question: how close to stationary is close enough? As the authors note, "This is a hard question to answer". The authors further argue that rather than using Box–Jenkins, it is better to use state space methods, as stationarity of the time series is then not required.

Box–Jenkins model identification

Stationarity and seasonality

The first step in developing a Box–Jenkins model is to determine if the time series is stationary and if there is any significant seasonality that needs to be modelled.

Detecting stationarity

Stationarity can be assessed from a run sequence plot. The run sequence plot should show constant location and scale. It can also be detected from an autocorrelation plot. Specifically, non-stationarity is often indicated by an autocorrelation plot with very slow decay.
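The slow-decay symptom can be illustrated with a small sketch. The function name `sample_acf` and the simulated series are assumptions made for illustration, not part of the original text; the code uses only the Python standard library and compares white noise with its running sum (a random walk, which is non-stationary).

```python
import random

def sample_acf(x, max_lag):
    """Sample autocorrelations r_0..r_max_lag of the sequence x."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    c0 = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t + k] for t in range(n - k)) / c0
            for k in range(max_lag + 1)]

# Illustrative data only: white noise and its cumulative sum (a random walk).
rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in range(500)]
walk = []
total = 0.0
for e in noise:
    total += e
    walk.append(total)

acf_noise = sample_acf(noise, 20)
acf_walk = sample_acf(walk, 20)

# The random walk's autocorrelation is still large at lag 10,
# while the white-noise autocorrelation is close to zero.
print("lag 10, white noise:", round(acf_noise[10], 3))
print("lag 10, random walk:", round(acf_walk[10], 3))
```

In a plot of the two sequences, the white-noise autocorrelations hover around zero after lag 0, whereas the random-walk autocorrelations decline only gradually.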

Detecting seasonality

Seasonality (or periodicity) can usually be assessed from an autocorrelation plot, a seasonal subseries plot, or a spectral plot.

Differencing to achieve stationarity

Box and Jenkins recommend the differencing approach to achieve stationarity. However, fitting a curve and subtracting the fitted values from the original data can also be used in the context of Box–Jenkins models.
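Both options mentioned above can be sketched in a few lines. The helper names `difference` and `detrend_linear` are hypothetical, and the straight-line fit is just one simple choice of curve; the input series are made up so that the results are easy to check by hand.

```python
def difference(x, lag=1):
    """Return x[t] - x[t - lag] for every t where both values exist."""
    return [x[t] - x[t - lag] for t in range(lag, len(x))]

def detrend_linear(x):
    """Subtract an ordinary least-squares straight line fitted against t = 0, 1, ..."""
    n = len(x)
    t_mean = (n - 1) / 2
    x_mean = sum(x) / n
    slope = (sum((t - t_mean) * (v - x_mean) for t, v in enumerate(x))
             / sum((t - t_mean) ** 2 for t in range(n)))
    intercept = x_mean - slope * t_mean
    return [v - (intercept + slope * t) for t, v in enumerate(x)]

# A quadratic trend needs two rounds of differencing to become constant.
series = [t * t for t in range(8)]      # 0, 1, 4, 9, 16, 25, 36, 49
first = difference(series)              # 1, 3, 5, 7, 9, 11, 13
second = difference(first)              # 2, 2, 2, 2, 2, 2
print(first, second)

# A purely linear series leaves (numerically) zero residuals after detrending.
line = [2 * t + 1 for t in range(6)]
print(detrend_linear(line))
```

Each round of differencing shortens the series by `lag` observations, which is worth keeping in mind for short series.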

Seasonal differencing

At the model identification stage, the goal is to detect seasonality, if it exists, and to identify the order for the seasonal autoregressive and seasonal moving average terms. For many series, the period is known and a single seasonality term is sufficient. For example, for monthly data one would typically include either a seasonal AR term at lag 12 or a seasonal MA term at lag 12. For Box–Jenkins models, one does not explicitly remove seasonality before fitting the model. Instead, one includes the order of the seasonal terms in the model specification passed to the ARIMA estimation software. However, it may be helpful to apply a seasonal difference to the data and regenerate the autocorrelation and partial autocorrelation plots. This may help in identifying the non-seasonal component of the model. In some cases, the seasonal differencing may remove most or all of the seasonality effect.
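A seasonal difference at period 12 can be sketched as follows. The function name `seasonal_difference` and the monthly series (a linear trend plus a fixed twelve-month pattern) are invented for illustration; with such an idealised series the seasonal difference removes the seasonal pattern entirely.

```python
def seasonal_difference(x, period):
    """Return x[t] - x[t - period], e.g. period=12 for monthly data."""
    return [x[t] - x[t - period] for t in range(period, len(x))]

# Hypothetical monthly data: trend of 1 unit per month plus a repeating pattern.
pattern = [5, 3, 0, -2, -4, -5, -4, -2, 0, 2, 4, 3]
monthly = [t + pattern[t % 12] for t in range(36)]

deseasonalized = seasonal_difference(monthly, 12)
# The pattern cancels and only the twelve-month trend increment remains.
print(deseasonalized)
```

Real series will not cancel this cleanly, but the autocorrelation and partial autocorrelation plots of the differenced values are usually easier to read for the non-seasonal part of the model.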

Identify p and q

Once stationarity and seasonality have been addressed, the next step is to identify the order (i.e., the p and q) of the autoregressive and moving average terms. Different authors have different approaches for identifying p and q. Brockwell and Davis (1991, p. 273) state "our prime criterion for model selection [among ARMA(p,q) models] is the AICc", i.e., Akaike information criterion with correction.
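As a rough sketch of selection by AICc, the snippet below uses the small-sample correction in the form -2 ln L + 2(p + q + 1)n / (n - p - q - 2), which counts the p + q coefficients plus the noise variance as parameters. The function name `aicc` and the log-likelihood values are hypothetical; in practice the log-likelihoods would come from the estimation software.

```python
def aicc(log_likelihood, n_obs, p, q):
    """Corrected AIC for an ARMA(p, q) fit with maximised log-likelihood."""
    k = p + q + 1  # AR coefficients, MA coefficients, and the noise variance
    return -2.0 * log_likelihood + 2.0 * k * n_obs / (n_obs - k - 1)

# Hypothetical maximised log-likelihoods for three candidate orders, n = 60.
candidates = {(1, 0): -105.2, (1, 1): -103.9, (2, 1): -103.7}
scores = {order: aicc(ll, 60, *order) for order, ll in candidates.items()}
best = min(scores, key=scores.get)

for order in sorted(scores):
    print(order, round(scores[order], 3))
print("smallest AICc:", best)
```

The larger model (2, 1) has the highest likelihood here, but the penalty term outweighs the small gain, so the criterion prefers (1, 1).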

Other authors use the autocorrelation plot and the partial autocorrelation plot.

Autocorrelation and partial autocorrelation plots

The sample autocorrelation plot and the sample partial autocorrelation plot are compared to the theoretical behavior of these plots when the order is known.

Specifically, for an AR(1) process, the sample autocorrelation function should have an exponentially decreasing appearance. However, higher-order AR processes are often a mixture of exponentially decreasing and damped sinusoidal components.
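For reference, the exponential shape follows from the standard result for a stationary AR(1) process. The symbols $\varphi$, $Z_t$, and $\rho(k)$ are notation assumed here, not taken from the text above.

```latex
X_t = \varphi X_{t-1} + Z_t, \quad |\varphi| < 1
\qquad\Longrightarrow\qquad
\rho(k) = \varphi^{k}, \quad k = 0, 1, 2, \dots
```

For $0 < \varphi < 1$ the autocorrelations decay monotonically to zero; for $-1 < \varphi < 0$ they alternate in sign while decaying.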

For higher-order autoregressive processes, the sample autocorrelation needs to be supplemented with a partial autocorrelation plot. The partial autocorrelation of an AR(p) process becomes zero at lag p + 1 and greater, so we examine the sample partial autocorrelation function to see if there is evidence of a departure from zero. This is usually determined by placing a 95% confidence interval on the sample partial autocorrelation plot (most software programs that generate sample autocorrelation plots also plot this confidence interval). If the software program does not generate the confidence band, it is approximately $\pm 2/\sqrt{N}$, with N denoting the sample size.
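One way to obtain partial autocorrelations from autocorrelations is the Durbin–Levinson recursion, sketched below together with the approximate band. The function names are hypothetical, and the input is the theoretical autocorrelation of an AR(1) process with coefficient 0.6 rather than real data; in practice `rho` would hold sample autocorrelations.

```python
import math

def pacf_from_acf(rho):
    """Durbin-Levinson recursion: rho[0..m] -> partial autocorrelations at lags 1..m."""
    pacf = []
    phi_prev = []  # coefficients phi_{k-1,1}, ..., phi_{k-1,k-1}
    for k in range(1, len(rho)):
        num = rho[k] - sum(phi_prev[j] * rho[k - 1 - j] for j in range(k - 1))
        den = 1.0 - sum(phi_prev[j] * rho[j + 1] for j in range(k - 1))
        phi_kk = num / den
        phi_prev = [phi_prev[j] - phi_kk * phi_prev[k - 2 - j]
                    for j in range(k - 1)] + [phi_kk]
        pacf.append(phi_kk)
    return pacf

def confidence_band(n_obs):
    """Approximate 95% band half-width, 2 / sqrt(N)."""
    return 2.0 / math.sqrt(n_obs)

def significant_lags(pacf, band):
    """Lags (starting at 1) whose partial autocorrelation lies outside the band."""
    return [k + 1 for k, v in enumerate(pacf) if abs(v) > band]

rho = [0.6 ** k for k in range(7)]   # theoretical AR(1) autocorrelations
pacf = pacf_from_acf(rho)
band = confidence_band(100)          # assumes a sample of N = 100
print([round(v, 6) for v in pacf])
print("lags outside the band:", significant_lags(pacf, band))
```

Only lag 1 falls outside the band, which is the cut-off pattern expected for an AR(1) process.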

The autocorrelation function of a MA(q) process becomes zero at lag q + 1 and greater, so we examine the sample autocorrelation function to see where it essentially becomes zero. We do this by placing the 95% confidence interval for the sample autocorrelation function on the sample autocorrelation plot. Most software that can generate the autocorrelation plot can also generate this confidence interval.
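The cut-off can be seen from the theoretical autocorrelation of a moving average process. The sketch below assumes the sign convention X_t = Z_t + θ_1 Z_{t-1} + ... + θ_q Z_{t-q} (some texts write the MA terms with minus signs); the function names and coefficient values are illustrative.

```python
def ma_acf(theta, max_lag):
    """Theoretical autocorrelations of X_t = Z_t + theta_1 Z_{t-1} + ... + theta_q Z_{t-q}."""
    psi = [1.0] + list(theta)
    q = len(theta)
    gamma0 = sum(c * c for c in psi)
    acf = []
    for k in range(max_lag + 1):
        if k > q:
            acf.append(0.0)  # exactly zero beyond lag q
        else:
            acf.append(sum(psi[j] * psi[j + k] for j in range(q - k + 1)) / gamma0)
    return acf

def cutoff_lag(acf, band):
    """Largest lag whose autocorrelation lies outside +/- band (0 if none)."""
    return max((k for k in range(1, len(acf)) if abs(acf[k]) > band), default=0)

ma1 = ma_acf([0.5], 6)
ma2 = ma_acf([0.5, 0.4], 6)
band = 0.2  # roughly 2 / sqrt(N) for N = 100
print([round(v, 4) for v in ma1], "-> suggested q =", cutoff_lag(ma1, band))
print([round(v, 4) for v in ma2], "-> suggested q =", cutoff_lag(ma2, band))
```

With sample autocorrelations the values beyond lag q are not exactly zero, so the band is what separates "essentially zero" from a genuine spike.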

The sample partial autocorrelation function is generally not helpful for identifying the order of the moving average process.

The following table summarizes how one can use the sample autocorrelation function for model identification.

Shape	Indicated Model
Exponential, decaying to zero	Autoregressive model. Use the partial autocorrelation plot to identify the order of the autoregressive model.
Alternating positive and negative, decaying to zero	Autoregressive model. Use the partial autocorrelation plot to help identify the order.
One or more spikes, rest are essentially zero	Moving average model, order identified by where plot becomes zero.
Decay, starting after a few lags	Mixed autoregressive and moving average (ARMA) model.
All zero or close to zero	Data are essentially random.
High values at fixed intervals	Include seasonal autoregressive term.
No decay to zero	Series is not stationary.

In practice, the sample autocorrelation and partial autocorrelation functions are random variables and do not give the same picture as the theoretical functions, which makes model identification more difficult. Mixed models can be particularly difficult to identify. Although experience is helpful, developing good models using these sample plots can involve much trial and error.

Box–Jenkins model estimation

Estimating the parameters for Box–Jenkins models is a fairly complicated non-linear estimation problem. For this reason, the parameter estimation should be left to a high-quality software program that fits Box–Jenkins models. Fortunately, many statistical software programs now fit Box–Jenkins models.

The main approaches to fitting Box–Jenkins models are non-linear least squares and maximum likelihood estimation. Maximum likelihood estimation is generally the preferred technique. The likelihood equations for the full Box–Jenkins model are complicated and are not included here. See Brockwell and Davis (1991) for the mathematical details.

Box–Jenkins model diagnostics

Assumptions for a stable univariate process

Model diagnostics for Box–Jenkins models are similar to model validation for non-linear least squares fitting.

That is, the error term A_t is assumed to follow the assumptions for a stationary univariate process. The residuals should be white noise (or independent when their distributions are normal) drawings from a fixed distribution with a constant mean and variance. If the Box–Jenkins model is a good model for the data, the residuals should satisfy these assumptions.

If these assumptions are not satisfied, one needs to fit a more appropriate model. That is, go back to the model identification step and try to develop a better model. Hopefully the analysis of the residuals can provide some clues as to a more appropriate model.

One way to assess whether the residuals from the Box–Jenkins model follow the assumptions is to generate statistical graphics (including an autocorrelation plot) of the residuals. One could also look at the value of the Ljung–Box statistic (also called the Box–Ljung statistic).
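The statistic itself is straightforward to compute: Q = n(n + 2) Σ r_k² / (n − k), summed over lags k = 1..h, where r_k is the lag-k sample autocorrelation of the residuals. The sketch below uses a hypothetical function name and deliberately autocorrelated, made-up residuals; the comparison value 3.841 is the 95% point of a chi-square distribution with one degree of freedom.

```python
def ljung_box_q(residuals, max_lag):
    """Ljung-Box Q statistic over lags 1..max_lag."""
    n = len(residuals)
    mean = sum(residuals) / n
    dev = [v - mean for v in residuals]
    c0 = sum(d * d for d in dev)
    total = 0.0
    for k in range(1, max_lag + 1):
        r_k = sum(dev[t] * dev[t + k] for t in range(n - k)) / c0
        total += r_k * r_k / (n - k)
    return n * (n + 2) * total

# Made-up residuals that alternate in sign, i.e. clearly not white noise.
residuals = [1.0, -1.0] * 10
q_stat = ljung_box_q(residuals, 1)
print("Q =", round(q_stat, 3))
print("exceeds 3.841 (chi-square 95% point, 1 df):", q_stat > 3.841)
```

For residuals from a fitted ARMA(p, q) model, Q is usually compared with a chi-square distribution on h − p − q degrees of freedom; a large Q suggests returning to the identification step.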

References

Box, George and Jenkins, Gwilym (1970) Time series analysis: Forecasting and control, San Francisco: Holden-Day.
Brockwell, Peter J. and Davis, Richard A. (1991) Time Series: Theory and Methods, Springer-Verlag.
Commandeur J.J.F., Koopman S.J. (2007), Introduction to State Space Time Series Analysis (Oxford University Press).
Pankratz, Alan (1983) Forecasting with univariate Box–Jenkins models: concepts and cases, New York: John Wiley & Sons.

External links

A First Course on Time Series Analysis - an open source book on time series analysis with SAS (Chapter 7)
Box–Jenkins models in the Engineering Statistics Handbook of NIST

