Assignment requirements:

(1) After removing the first 10 observations, describe the characteristics of the time series and decompose it into trend and periodic components.
(2) Decompose the time series, fit an ARIMA or ARMA model to the remainder, forecast the last 10 observations, and compare them with the true values.
(3) The report may be in Word, PDF, or HTML format and should include data preprocessing, model identification, parameter estimation and testing, residual analysis, forecasting, and so on, together with the code and appropriate figures and commentary.

Data preprocessing

Reading the data

data<-read.csv("data2.csv",header=TRUE,sep=",",fileEncoding = "GBK",na.strings = "")
mydata<-data[data$姓名=="李馨雨*"&data$学号=="PB19010425",]
mydata<-na.omit(mydata)   # drop rows containing NA values

Preliminary processing

library(tseries)
## Registered S3 method overwritten by 'quantmod':
##   method            from
##   as.zoo.data.frame zoo
mydata<-mydata[-(1:12)]   # drop the name/student-ID columns and the first ten observations
md1<-as.double(as.matrix(mydata[1,]))  # most values are truncated after two decimal places (the third decimal and beyond are zero), which has little effect on the analysis
md<-ts(md1)  # convert to a time-series object

plot(md,main="原始序列时序图",ylab="")

acf(md);pacf(md)

tseries::adf.test(md)
## 
##  Augmented Dickey-Fuller Test
## 
## data:  md
## Dickey-Fuller = 0.91189, Lag order = 5, p-value = 0.99
## alternative hypothesis: stationary
plot(md[1:25],type="l",main="原始序列部分时序图",ylab="",axes = F)
axis(2)
axis(1,at=seq(1:25),labels=seq(1:25),las=1)

The time plot shows a clear upward trend together with periodicity (the series rises in a fluctuating manner), so the series is non-stationary; the unit-root test and the ACF/PACF plots confirm this. Moreover, the partial time plot of the first 25 observations shows that successive peaks are 4 time points apart, so the period can be estimated as 4.
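
As a rough cross-check of the period read off the plot, the following sketch (assuming the forecast package, which is loaded later in this report, is available) uses findfrequency() to estimate the dominant frequency from the spectral density; it is only a heuristic.

# Heuristic cross-check of the visually estimated period of 4:
# findfrequency() estimates the dominant frequency from the spectral density.
forecast::findfrequency(md)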

Decomposing the series

Seasonal component from the decompose() method

library(forecast)
# decompose() method
md<-ts(md,frequency = 4)
d1<-decompose(md)
plot(d1)

md_seas<-d1$seasonal    # seasonal component from decompose()
cat("用decompose函数分解得到序列季节项为:","\n")
## 用decompose函数分解得到序列季节项为:
md_seas[1:4]
## [1]  -50.05591 -250.07171  100.08398  200.04364
# fit the trend with a quadratic function
md_rand_trend<-md-md_seas   # subtracting the seasonal component leaves trend plus random component
t <- seq(length(md_rand_trend))
lmr <- lm(md_rand_trend ~ t+I(t^2)) 
md_trend <- ts(predict(lmr,newdata=data.frame(t=t)),frequency=4,start=start(md_rand_trend))
cat("\n","按照二次函数回归拟合得到趋势项为:","\n")
## 
##  按照二次函数回归拟合得到趋势项为:
md_trend
##           Qtr1        Qtr2        Qtr3        Qtr4
## 1     47.58100    59.08781    71.56377    85.00888
## 2     99.42314   114.80655   131.15912   148.48084
## 3    166.77172   186.03175   206.26093   227.45926
## 4    249.62675   272.76338   296.86918   321.94412
## 5    347.98822   375.00147   402.98387   431.93543
## 6    461.85613   492.74600   524.60501   557.43318
## 7    591.23050   625.99697   661.73260   698.43737
## 8    736.11131   774.75439   814.36663   854.94802
## 9    896.49856   939.01826   982.50710  1026.96511
## 10  1072.39226  1118.78857  1166.15403  1214.48864
## 11  1263.79241  1314.06532  1365.30740  1417.51862
## 12  1470.69900  1524.84853  1579.96721  1636.05504
## 13  1693.11203  1751.13817  1810.13347  1870.09792
## 14  1931.03152  1992.93427  2055.80617  2119.64723
## 15  2184.45744  2250.23681  2316.98532  2384.70299
## 16  2453.38982  2523.04579  2593.67092  2665.26520
## 17  2737.82863  2811.36122  2885.86296  2961.33385
## 18  3037.77390  3115.18310  3193.56145  3272.90895
## 19  3353.22561  3434.51142  3516.76638  3599.99050
## 20  3684.18376  3769.34619  3855.47776  3942.57849
## 21  4030.64837  4119.68740  4209.69558  4300.67292
## 22  4392.61941  4485.53506  4579.41985  4674.27380
## 23  4770.09690  4866.88916  4964.65057  5063.38113
## 24  5163.08084  5263.74971  5365.38773  5467.99490
## 25  5571.57123  5676.11670  5781.63133  5888.11512
## 26  5995.56805  6103.99014  6213.38139  6323.74178
## 27  6435.07133  6547.37003  6660.63788  6774.87489
## 28  6890.08105  7006.25636  7123.40083  7241.51444
## 29  7360.59722  7480.64914  7601.67022  7723.66044
## 30  7846.61983  7970.54836  8095.44605  8221.31289
## 31  8348.14888  8475.95403  8604.72833  8734.47178
## 32  8865.18439  8996.86614  9129.51705  9263.13712
## 33  9397.72633  9533.28470  9669.81223  9807.30890
## 34  9945.77473 10085.20971 10225.61384 10366.98713
## 35 10509.32957 10652.64116 10796.92190 10942.17180
## 36 11088.39085 11235.57906 11383.73641 11532.86292
## 37 11682.95858 11834.02340 11986.05736 12139.06048
## 38 12293.03276 12447.97418 12603.88476 12760.76449
## 39 12918.61338 13077.43142 13237.21861 13397.97495
## 40 13559.70045 13722.39510 13886.05890 14050.69185
## 41 14216.29396 14382.86522 14550.40563 14718.91520
## 42 14888.39392 15058.84179 15230.25881 15402.64499
## 43 15576.00032 15750.32480 15925.61844 16101.88123
## 44 16279.11317 16457.31427 16636.48451 16816.62391
## 45 16997.73247 17179.81017 17362.85703 17546.87304
## 46 17731.85821 17917.81252 18104.73599 18292.62862
## 47 18481.49039 18671.32132 18862.12140 19053.89064
## 48 19246.62902 19440.33656 19635.01326 19830.65910
# random component
md_rand<-ts(md_rand_trend-md_trend)
plot(md_rand,main="随机项by decompose和二次函数拟合")

cat("\n","按照二次函数回归拟合得到随机项为:","\n")
## 
##  按照二次函数回归拟合得到随机项为:
md_rand
## Time Series:
## Start = 1 
## End = 192 
## Frequency = 1 
##   [1]   4.81835312   5.46750589   5.30585166   3.27918335   0.22414200
##   [6]  -0.22304063   2.42759375   4.14331405   2.85249131   0.23848728
##  [11]  -0.74810973  -0.14480083   1.50946503   3.77441961   5.56434120
##  [16]   4.92493871   2.42549318   1.35514636   2.66874656   4.07163267
##  [21]   2.83337575  -0.45978246  -2.65459366  -1.03761894   2.98951273
##  [26]   5.06504313   4.39202054   5.51808386   8.81090414  11.07552315
##  [31]   9.84118916   9.03734109  10.67154998  12.10855758   8.84991220
##  [36]   3.95225274   1.72965023   1.92854644   1.11598966  -1.79528119
##  [41]  -6.02049510  -9.78361028 -12.86637845 -15.51626070 -18.26108600
##  [46] -21.29681258 -23.26419215 -21.98668579 -17.62912248 -13.16346045
##  [51] -12.09545142 -14.39355646 -14.79860454 -11.31755391  -7.26815627
##  [56]  -4.99087270  -4.05353218  -3.07309294  -1.27230670   0.25236547
##  [61]  -0.93490540  -4.76907756  -6.23090270  -2.58884193   2.67627580
##  [66]   4.93249225   3.38105571   0.56550509   0.24901143   0.68561648
##  [71]  -0.87243146  -2.35059347  -2.12669853  -1.37670487  -2.32536420
##  [76]  -4.87113761  -6.11785406  -4.66447180  -2.94374252  -1.95912732
##  [81]  -1.43645517  -0.65068430   0.39743358   2.10943738   3.10549814
##  [86]   2.94065761   4.01816410   5.94155651   6.27300587   3.70255395
##  [91]   0.19744904  -0.92976995   1.59606802   5.52800471   5.55428841
##  [96]   2.75345802   0.54168460   0.33700989   1.73368219   2.38224041
## [101]   0.79185559  -3.14743051  -5.66736960  -4.36642277  -2.12841899
## [106]  -0.66931649  -0.19486698  -0.05653154   0.50186085   1.67235196
## [111]   2.43019007   1.10291411  -2.33530489  -5.44842518  -4.52019846
## [116]   0.96491419   5.01608379   3.79335211   1.06896744   1.10146869
## [121]   3.47502689   4.53468382   2.41668775   1.09657761   2.58052442
## [126]   5.82756995   8.81996249   9.72324095   9.75757636  11.17401050
## [131]  13.10479164  13.12745871  10.25318273   8.72400547   8.97217522
## [136]   8.05923089   6.46634352   5.75055486   6.45411322   8.43455750
## [141]  10.85505873  13.93265868  13.87960564  10.11343852   7.25732836
## [146]   9.25831692  14.36865248  15.70587397  10.47315241   3.85752957
## [151]   2.82125374   7.64186384   9.67253089   3.75029665  -5.23259057
## [156]  -8.48859187  -3.22453622   1.99661815  -2.13288046 -11.14549316
## [161] -14.56804890 -10.08350592  -6.15961594  -9.56884003 -17.08800716
## [166] -19.44007558 -14.70279699 -11.09863247 -14.97441100 -21.79309081
## [171] -22.70242362 -16.42487050 -10.77726042 -12.41255163 -17.81849583
## [176] -17.65755410 -10.92655542  -3.92845802  -1.91101361  -3.94668328
## [181]  -4.53229600  -0.37080999   4.95002302   6.42774196   3.45551785
## [186]   2.39039246   6.96461408  12.93572162  14.33688612  11.41514933
## [191]  10.55275956  15.87330570
acf(md_rand,lag.max=40)

adf.test(md_rand)
## 
##  Augmented Dickey-Fuller Test
## 
## data:  md_rand
## Dickey-Fuller = -2.9823, Lag order = 5, p-value = 0.1651
## alternative hypothesis: stationary
auto.arima(md_rand)
## Series: md_rand 
## ARIMA(4,0,1) with zero mean 
## 
## Coefficients:
##          ar1      ar2     ar3      ar4     ma1
##       2.3819  -2.8057  2.1213  -0.7288  0.2625
## s.e.  0.0634   0.1181  0.1145   0.0603  0.0956
## 
## sigma^2 = 0.8269:  log likelihood = -256.79
## AIC=525.59   AICc=526.04   BIC=545.13

The ACF plot and the ADF test show that the random component from this decomposition is still non-stationary; below we difference the random component to obtain a stationary series and then build an ARIMA model.
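
As a quick sketch of the differencing step planned above, ndiffs() from the already-loaded forecast package (it applies a unit-root test internally) can be used to estimate how many differences are needed; the same function is applied to the textbook-method residuals further below.

# Estimate how many differences are needed to make the random component stationary.
ndiffs(md_rand)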

Textbook worked-example method

library(forecast)
# method from the worked example in He Shuyuan's textbook
md<-ts(md,frequency = 4)

# fit the trend with a cubic function
t <- seq(length(md))
lmr <- lm(md ~ t+I(t^2)+I(t^3)) 
md_trend.lm <- ts(predict(lmr,newdata=data.frame(t=t)),frequency=4,start=start(md))
cat("按照三次函数回归拟合得到趋势项为:","\n")
## 按照三次函数回归拟合得到趋势项为:
md_trend.lm
##           Qtr1        Qtr2        Qtr3        Qtr4
## 1     38.92535    50.75396    63.54440    77.29675
## 2     92.01109   107.68750   124.32605   141.92682
## 3    160.48989   180.01534   200.50324   221.95367
## 4    244.36671   267.74244   292.08094   317.38227
## 5    343.64653   370.87379   399.06412   428.21760
## 6    458.33432   489.41434   521.45775   554.46463
## 7    588.43504   623.36908   659.26681   696.12832
## 8    733.95368   772.74297   812.49626   853.21365
## 9    894.89519   937.54098   981.15108  1025.72558
## 10  1071.26456  1117.76808  1165.23624  1213.66910
## 11  1263.06674  1313.42925  1364.75670  1417.04916
## 12  1470.30672  1524.52945  1579.71744  1635.87075
## 13  1692.98946  1751.07366  1810.12342  1870.13882
## 14  1931.11994  1993.06685  2055.97963  2119.85836
## 15  2184.70312  2250.51398  2317.29103  2385.03433
## 16  2453.74398  2523.42004  2594.06259  2665.67171
## 17  2738.24748  2811.78998  2886.29928  2961.77546
## 18  3038.21860  3115.62878  3194.00608  3273.35056
## 19  3353.66232  3434.94142  3517.18795  3600.40198
## 20  3684.58359  3769.73286  3855.84986  3942.93468
## 21  4030.98739  4120.00807  4209.99679  4300.95364
## 22  4392.87869  4485.77202  4579.63371  4674.46383
## 23  4770.26246  4867.02969  4964.76558  5063.47022
## 24  5163.14368  5263.78604  5365.39738  5467.97778
## 25  5571.52731  5676.04606  5781.53409  5887.99149
## 26  5995.41833  6103.81469  6213.18066  6323.51630
## 27  6434.82170  6547.09693  6660.34207  6774.55720
## 28  6889.74240  7005.89774  7123.02330  7241.11916
## 29  7360.18540  7480.22209  7601.22931  7723.20714
## 30  7846.15566  7970.07495  8094.96507  8220.82612
## 31  8347.65816  8475.46129  8604.23556  8733.98106
## 32  8864.69788  8996.38608  9129.04574  9262.67695
## 33  9397.27977  9532.85430  9669.40059  9806.91874
## 34  9945.40882 10084.87091 10225.30508 10366.71141
## 35 10509.08999 10652.44088 10796.76417 10942.05994
## 36 11088.32825 11235.56920 11383.78285 11532.96928
## 37 11683.12858 11834.26082 11986.36607 12139.44442
## 38 12293.49594 12448.52072 12604.51882 12761.49032
## 39 12919.43531 13078.35386 13238.24605 13399.11196
## 40 13560.95166 13723.76523 13887.55275 14052.31430
## 41 14218.04996 14384.75979 14552.44389 14721.10232
## 42 14890.73517 15061.34251 15232.92442 15405.48098
## 43 15579.01227 15753.51836 15928.99934 16105.45527
## 44 16282.88624 16461.29232 16640.67359 16821.03014
## 45 17002.36203 17184.66935 17367.95217 17552.21057
## 46 17737.44463 17923.65443 18110.84003 18299.00153
## 47 18488.13900 18678.25252 18869.34216 19061.40800
## 48 19254.45012 19448.46860 19643.46351 19839.43493
# seasonal component
md.detrended.lm<-md-md_trend.lm
get.season <- function(yd,n_NA){ # input: detrended series; n_NA is the number of missing values
  ymat <- matrix(c(yd,rep(NA,n_NA)), byrow=TRUE, ncol=4)
  #print(ymat)   # for checking
  ## season
  seas0 <- apply(ymat, 2, mean, na.rm=TRUE)
  return(seas0)
}
md_seas.lm <- get.season(md.detrended.lm,n_NA=0)   # the four seasonal means

# random component
md_seas.lm <- rep(md_seas.lm,length(md)/length(md_seas.lm))
md_seas.lm<- ts(md_seas.lm, start=start(md), frequency=4)
md_rand.lm<-ts(md.detrended.lm-md_seas.lm,frequency = 1)
plot(md_rand.lm,main="书上例题方法得到随机项序列")

# unit-root test
adf.test(md_rand.lm)
## 
##  Augmented Dickey-Fuller Test
## 
## data:  md_rand.lm
## Dickey-Fuller = -2.9727, Lag order = 5, p-value = 0.1691
## alternative hypothesis: stationary
acf(md_rand.lm,lag.max=40);pacf(md_rand.lm)

Following the worked example in He Shuyuan's textbook, the trend is fitted with a cubic curve and subtracted from the original series, and the quarterly means of the detrended series are used as the seasonal component, leaving the random component. The ACF/PACF plots and the unit-root test show that this random component is still not stationary, so an ARIMA model may be appropriate.

Model building and forecasting

Fit an ARIMA or ARMA model to the remainder, forecast the last 10 observations, and compare them with the true values.

Random-component model 1

Note: the seasonal component is obtained with decompose(); after subtracting it from the series, the trend is obtained by a quadratic fit, and what remains is the random component. Below this non-stationary random component is processed and modelled.
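
In symbols, the additive decomposition used here is \(X_t = T_t + S_t + R_t\), where \(S_t\) is the seasonal component returned by decompose(), \(T_t = \hat\beta_0 + \hat\beta_1 t + \hat\beta_2 t^2\) is the quadratic trend fitted to \(X_t - S_t\), and the random component is \(R_t = X_t - S_t - T_t\).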

Model identification

# take the first difference of md_rand
md_rand_diff<-diff(md_rand)
plot(md_rand_diff,main="decompose方法一次差分后的随机序列")

# ACF/PACF plots, stationarity test and white-noise test
acf(md_rand_diff,lag.max=60);pacf(md_rand_diff)

adf.test(md_rand_diff)
## 
##  Augmented Dickey-Fuller Test
## 
## data:  md_rand_diff
## Dickey-Fuller = -3.971, Lag order = 5, p-value = 0.01193
## alternative hypothesis: stationary
for (i in 1:2) print(Box.test(md_rand_diff, lag=6*i, type="Ljung-Box"))
## 
##  Box-Ljung test
## 
## data:  md_rand_diff
## X-squared = 247.68, df = 6, p-value < 2.2e-16
## 
## 
##  Box-Ljung test
## 
## data:  md_rand_diff
## X-squared = 402.31, df = 12, p-value < 2.2e-16
# model identification
auto.arima(md_rand_diff)
## Series: md_rand_diff 
## ARIMA(3,0,1) with zero mean 
## 
## Coefficients:
##          ar1      ar2     ar3     ma1
##       1.3751  -1.3889  0.6938  0.3102
## s.e.  0.0658   0.0603  0.0628  0.0897
## 
## sigma^2 = 0.8594:  log likelihood = -257.33
## AIC=524.66   AICc=524.99   BIC=540.92

The ACF of the once-differenced series md_rand_diff decays in an oscillating fashion while the PACF cuts off after lag 3, suggesting an ARMA model with p = 3. The unit-root test indicates that the series is stationary, the white-noise test indicates that it is not white noise, and auto.arima also fits an ARMA model, namely ARMA(3,1).
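
As an informal cross-check on auto.arima, the sketch below compares the AIC of stats::arima fits over a small grid of ARMA(p, q) orders; the grid bounds (p up to 4, q up to 2) are arbitrary choices for illustration.

# Illustrative AIC comparison over a small grid of ARMA(p, q) orders.
aic_tab <- matrix(NA_real_, nrow = 5, ncol = 3,
                  dimnames = list(paste0("p=", 0:4), paste0("q=", 0:2)))
for (p in 0:4) {
  for (q in 0:2) {
    fit <- try(stats::arima(md_rand_diff, order = c(p, 0, q),
                            include.mean = FALSE, method = "ML"), silent = TRUE)
    if (!inherits(fit, "try-error")) aic_tab[p + 1, q + 1] <- fit$aic
  }
}
aic_tab   # the (p, q) = (3, 1) cell should be among the smallest values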

X1<-md_rand_diff
plot(X1,main="decompose方法得到平稳差分随机项序列")

fit1.X1<-auto.arima(X1,ic="aic")
fit2.X1<-auto.arima(X1,ic="bic")
fit1.X1;fit2.X1
## Series: X1 
## ARIMA(3,0,1) with zero mean 
## 
## Coefficients:
##          ar1      ar2     ar3     ma1
##       1.3751  -1.3889  0.6938  0.3102
## s.e.  0.0658   0.0603  0.0628  0.0897
## 
## sigma^2 = 0.8594:  log likelihood = -257.33
## AIC=524.66   AICc=524.99   BIC=540.92
## Series: X1 
## ARIMA(3,0,1) with zero mean 
## 
## Coefficients:
##          ar1      ar2     ar3     ma1
##       1.3751  -1.3889  0.6938  0.3102
## s.e.  0.0658   0.0603  0.0628  0.0897
## 
## sigma^2 = 0.8594:  log likelihood = -257.33
## AIC=524.66   AICc=524.99   BIC=540.92

Whichever criterion (AIC or BIC) is used, auto.arima selects ARMA(3,1), so we adopt this model.

Model fitting

para.fit.X1 = stats::arima(X1,order=c(3,0,1), method = "ML")  # fit the model with arima()
para.fit.X1
## 
## Call:
## stats::arima(x = X1, order = c(3, 0, 1), method = "ML")
## 
## Coefficients:
##          ar1      ar2     ar3     ma1  intercept
##       1.3744  -1.3883  0.6931  0.3106     0.0864
## s.e.  0.0658   0.0603  0.0628  0.0897     0.2679
## 
## sigma^2 estimated as 0.8409:  log likelihood = -257.28,  aic = 526.56

From the parameter estimates, the model can be written as \(X_t-0.0864=\frac{(1+0.3106B)\epsilon_t}{1-1.3744B+1.3883B^2-0.6931B^3},\ Var(\epsilon_t)=0.8409\) (here \(X\) denotes the time series formed by X1).

Model diagnostics

We now perform residual diagnostics for this model.

par(mfrow=c(1,2))
plot(para.fit.X1$residuals, type = "p", pch = "+")
qqnorm(para.fit.X1$residuals)

for (i in 1:2) {
  print(Box.test(para.fit.X1$residuals, lag = 6*i))
}
## 
##  Box-Pierce test
## 
## data:  para.fit.X1$residuals
## X-squared = 0.38348, df = 6, p-value = 0.999
## 
## 
##  Box-Pierce test
## 
## data:  para.fit.X1$residuals
## X-squared = 8.9455, df = 12, p-value = 0.7076

Judging from the QQ plot (normality) and the scatter plot of residuals (randomness), the residuals of this model are very close to white noise. Since the P values of the portmanteau (Box-Pierce) statistics at all tested lags are well above \(\alpha=0.05\), we accept the null hypothesis that the residual series is white noise; that is, the fitted model is adequate at the 95% confidence level. We therefore choose the ARMA(3,1) model for this data.
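
An equivalent one-call diagnostic is sketched below, using checkresiduals() from the already-loaded forecast package, which combines the residual time plot, the residual ACF, a histogram, and a Ljung-Box test.

# One-call residual diagnostics: time plot, ACF, histogram and Ljung-Box test.
checkresiduals(para.fit.X1)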

Parameter significance tests

para.fit.X1
## 
## Call:
## stats::arima(x = X1, order = c(3, 0, 1), method = "ML")
## 
## Coefficients:
##          ar1      ar2     ar3     ma1  intercept
##       1.3744  -1.3883  0.6931  0.3106     0.0864
## s.e.  0.0658   0.0603  0.0628  0.0897     0.2679
## 
## sigma^2 estimated as 0.8409:  log likelihood = -257.28,  aic = 526.56
# significance test for the ar1 coefficient
t1 = 1.3744/0.0658
# P values computed with pt(); the degrees of freedom are the sample size minus the number of estimated parameters, here 191-5=186
p1 = 2*pt(t1, df=186, lower.tail = F)
# significance test for the ar2 coefficient
t2 = -1.3883/0.0603
p2 = 2*pt(t2, df=186, lower.tail = T)
# significance test for the ar3 coefficient
t3 = 0.6931/0.0628
p3 = 2*pt(t3, df=186, lower.tail = F)
# significance test for the ma1 coefficient
m1 = 0.3106/0.0897
p4 = 2*pt(m1, df=186, lower.tail = F)
# significance test for the constant (intercept)
t0 = 0.0864/0.2679
p0 = 2*pt(t0, df=186, lower.tail = F)
p_value = data.frame(p1, p2,p3,p4, p0)
row.names(p_value) = "P 值"
p_value
##                p1           p2           p3           p4        p0
## P 值 1.164528e-50 2.426785e-56 4.161385e-22 0.0006636692 0.7474298

The results show that the P values of the first four coefficients are all below \(\alpha=0.05\), so the null hypothesis that each coefficient equals 0 is rejected and they are significantly non-zero at the 95% confidence level. The constant term, however, does not reject the hypothesis that it equals 0, so we now refit the series with the constant fixed at 0. (In fact, the earlier auto.arima fit already reported a zero mean, i.e. no mean term was included; so when the textbook method is used later, if auto.arima again reports a zero mean, the mean can be set to 0 directly when fitting.)
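
The same t statistics can also be computed in one step from the components stored in the fitted object (a sketch; coef and var.coef are the estimate vector and its covariance matrix returned by stats::arima, and the degrees of freedom follow the same convention as above).

# Coefficient estimates, standard errors, t statistics and two-sided P values
# computed directly from the arima fit.
est  <- para.fit.X1$coef
se   <- sqrt(diag(para.fit.X1$var.coef))
tval <- est / se
pval <- 2 * pt(abs(tval), df = length(X1) - length(est), lower.tail = FALSE)
round(rbind(estimate = est, s.e. = se, t = tval, P = pval), 4)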

Model fitting (refit)

para.fit.X1 = stats::arima(X1,order=c(3,0,1),include.mean = F, method = "ML")  # refit with arima(), constant term fixed at 0
para.fit.X1
## 
## Call:
## stats::arima(x = X1, order = c(3, 0, 1), include.mean = F, method = "ML")
## 
## Coefficients:
##          ar1      ar2     ar3     ma1
##       1.3751  -1.3889  0.6938  0.3102
## s.e.  0.0658   0.0603  0.0628  0.0897
## 
## sigma^2 estimated as 0.8414:  log likelihood = -257.33,  aic = 524.66

From the parameter estimates, the model can be written as \(X_t=\frac{(1+0.3102B)\epsilon_t}{1-1.3751B+1.3889B^2-0.6938B^3},\ Var(\epsilon_t)=0.8414\) (here \(X\) denotes the time series formed by X1).
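
As a quick sanity check on the fitted ARMA(3,1) (a sketch), one can verify that the roots of the AR and MA characteristic polynomials all lie outside the unit circle, i.e. the model is causal and invertible.

# Roots of the AR and MA polynomials; all moduli should exceed 1.
Mod(polyroot(c(1, -1.3751, 1.3889, -0.6938)))  # AR polynomial
Mod(polyroot(c(1, 0.3102)))                    # MA polynomial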

Model diagnostics (refit)

We now perform residual diagnostics for this model.

par(mfrow=c(1,2))
plot(para.fit.X1$residuals, type = "p", pch = "+")
qqnorm(para.fit.X1$residuals)

for (i in 1:2) {
  print(Box.test(para.fit.X1$residuals, lag = 6*i))
}
## 
##  Box-Pierce test
## 
## data:  para.fit.X1$residuals
## X-squared = 0.38298, df = 6, p-value = 0.999
## 
## 
##  Box-Pierce test
## 
## data:  para.fit.X1$residuals
## X-squared = 8.9344, df = 12, p-value = 0.7085

Judging from the QQ plot (normality) and the scatter plot of residuals (randomness), the residuals of this model are very close to white noise. Since the P values of the portmanteau (Box-Pierce) statistics at all tested lags are well above \(\alpha=0.05\), we accept the null hypothesis that the residual series is white noise; that is, the fitted model is adequate at the 95% confidence level. We therefore choose the ARMA(3,1) model for the once-differenced random component obtained from the decompose() decomposition.

Parameter significance tests (refit)

para.fit.X1
## 
## Call:
## stats::arima(x = X1, order = c(3, 0, 1), include.mean = F, method = "ML")
## 
## Coefficients:
##          ar1      ar2     ar3     ma1
##       1.3751  -1.3889  0.6938  0.3102
## s.e.  0.0658   0.0603  0.0628  0.0897
## 
## sigma^2 estimated as 0.8414:  log likelihood = -257.33,  aic = 524.66
# significance test for the ar1 coefficient
t1 = 1.3751/0.0658
# P values computed with pt(); the degrees of freedom are the sample size minus the number of estimated parameters, here 191-4=187
p1 = 2*pt(t1, df=187, lower.tail = F)
# significance test for the ar2 coefficient
t2 = -1.3889/0.0603
p2 = 2*pt(t2, df=187, lower.tail = T)
# significance test for the ar3 coefficient
t3 = 0.6938/0.0628
p3 = 2*pt(t3, df=187, lower.tail = F)
# significance test for the ma1 coefficient
m1 = 0.3102/0.0897
p4 = 2*pt(m1, df=187, lower.tail = F)
p_value = data.frame(p1, p2,p3,p4)
row.names(p_value) = "P 值"
p_value
##                p1           p2           p3           p4
## P 值 8.445218e-51 1.684701e-56 3.654305e-22 0.0006733279

The results show that the P values of all coefficients are below \(\alpha=0.05\), so the null hypothesis that each coefficient equals 0 is rejected; all coefficients are significantly non-zero at the 95% confidence level.

Residual analysis

shapiro.test(para.fit.X1$residuals)
## 
##  Shapiro-Wilk normality test
## 
## data:  para.fit.X1$residuals
## W = 0.99002, p-value = 0.2058
Box.test(para.fit.X1$residuals)
## 
##  Box-Pierce test
## 
## data:  para.fit.X1$residuals
## X-squared = 2.8496e-05, df = 1, p-value = 0.9957

The Shapiro-Wilk normality test gives a p value of 0.2058, so we do not reject the hypothesis that the residuals are normally distributed.
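
A complementary normality check is sketched below using jarque.bera.test() from the already-loaded tseries package, which is based on the sample skewness and kurtosis rather than the order statistics used by Shapiro-Wilk.

# Complementary normality test based on skewness and kurtosis.
jarque.bera.test(para.fit.X1$residuals)
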
Histogram (with density estimate) and normal QQ plot of the fitted residuals:

par(mfrow=c(1,2))
hist(para.fit.X1$residuals,main="Residuals",col=4,ylim=c(0,0.6),xlab=expression(x-hat(x)),probability = T)
lines(density(para.fit.X1$residuals))
rug(para.fit.X1$residuals)
qqnorm(para.fit.X1$residuals)
qqline(para.fit.X1$residuals)

Residual time plot and residual ACF plot:

plot(para.fit.X1$residuals,ylab="Residual",type="o",pch=16,main="残差时序图(模型1)")
abline(h=0,lty=12)

Acf(para.fit.X1$residuals,main="ACF of residuals(模型1)",lag.max = 60)

Forecasting

library(forecast)
z1=forecast(para.fit.X1,h=10)
md_rand.fit<-diffinv(z1$fitted)   # invert the differencing to recover the fitted random component (its last ten values are used below)

observe.fit<-md_rand.fit[(length(md)-9):length(md)]+md_trend[(length(md)-9):length(md)]+md_seas[(length(md)-9):length(md)] 
# comparison
d1<-data.frame(matrix(c(observe.fit,md[(length(md)-9):length(md)]),ncol=2))
colnames(d1)<-c("预测值(模型1)","真实值")
cat("模型1预测的序列最后十个预测值和真实值如下","\n")
## 模型1预测的序列最后十个预测值和真实值如下
d1
##    预测值(模型1)   真实值
## 1       18204.33 18209.77
## 2       18493.31 18499.10
## 3       18429.71 18434.89
## 4       18416.89 18423.64
## 5       18962.02 18969.17
## 6       19259.58 19266.87
## 7       19203.38 19210.91
## 8       19193.95 19201.68
## 9       19737.03 19745.65
## 10      20036.75 20046.58

Comparing the last ten predicted values with the true values, the differences are small (at most about 10), indicating that the model fits well.

Random-component model 2

Note: the textbook (He Shuyuan) method yields a non-stationary random component; below this series is differenced, and an ARMA model is then fitted to the resulting stationary series and checked.

Model identification

ndiffs(md_rand.lm)
## [1] 1
auto.arima(md_rand.lm)
## Series: md_rand.lm 
## ARIMA(3,1,1) 
## 
## Coefficients:
##          ar1      ar2     ar3     ma1
##       1.3579  -1.3747  0.6789  0.3597
## s.e.  0.0676   0.0617  0.0647  0.0927
## 
## sigma^2 = 0.8336:  log likelihood = -254.46
## AIC=518.92   AICc=519.25   BIC=535.18

The ndiffs() function indicates that this random component still needs to be differenced once, and the auto.arima fit confirms this. Below we work with the once-differenced data and check whether it is stationary.

# differencing
md_rand.diff<-diff(md_rand.lm)
plot(md_rand.diff,main="例题方法差分后的随机序列")

# ACF/PACF plots, stationarity test and white-noise test
acf(md_rand.diff,lag.max=60);pacf(md_rand.diff)

adf.test(md_rand.diff)
## Warning in adf.test(md_rand.diff): p-value smaller than printed p-value
## 
##  Augmented Dickey-Fuller Test
## 
## data:  md_rand.diff
## Dickey-Fuller = -4.1013, Lag order = 5, p-value = 0.01
## alternative hypothesis: stationary
for (i in 1:2) print(Box.test(md_rand.diff, lag=6*i, type="Ljung-Box"))
## 
##  Box-Ljung test
## 
## data:  md_rand.diff
## X-squared = 248.85, df = 6, p-value < 2.2e-16
## 
## 
##  Box-Ljung test
## 
## data:  md_rand.diff
## X-squared = 404.36, df = 12, p-value < 2.2e-16
# model identification
auto.arima(md_rand.diff)
## Series: md_rand.diff 
## ARIMA(3,0,1) with zero mean 
## 
## Coefficients:
##          ar1      ar2     ar3     ma1
##       1.3579  -1.3747  0.6789  0.3597
## s.e.  0.0676   0.0617  0.0647  0.0927
## 
## sigma^2 = 0.8336:  log likelihood = -254.46
## AIC=518.92   AICc=519.25   BIC=535.18

The ACF of the once-differenced series md_rand.diff decays in an oscillating fashion while the PACF cuts off after lag 3, suggesting an ARMA model with p = 3. The unit-root test indicates that the series is stationary, the white-noise test indicates that it is not white noise, and auto.arima again fits ARMA(3,1).

X2<-md_rand.diff
plot(X2,main="例题方法得到平稳差分随机项序列")

fit1.X2<-auto.arima(X2,ic="aic")
fit2.X2<-auto.arima(X2,ic="bic")
fit1.X2;fit2.X2
## Series: X2 
## ARIMA(3,0,1) with zero mean 
## 
## Coefficients:
##          ar1      ar2     ar3     ma1
##       1.3579  -1.3747  0.6789  0.3597
## s.e.  0.0676   0.0617  0.0647  0.0927
## 
## sigma^2 = 0.8336:  log likelihood = -254.46
## AIC=518.92   AICc=519.25   BIC=535.18
## Series: X2 
## ARIMA(3,0,1) with zero mean 
## 
## Coefficients:
##          ar1      ar2     ar3     ma1
##       1.3579  -1.3747  0.6789  0.3597
## s.e.  0.0676   0.0617  0.0647  0.0927
## 
## sigma^2 = 0.8336:  log likelihood = -254.46
## AIC=518.92   AICc=519.25   BIC=535.18

Whichever criterion (AIC or BIC) is used, auto.arima selects ARMA(3,1), so we adopt this model.

Model fitting

para.fit.X2 = stats::arima(X2,order=c(3,0,1), include.mean=F,method = "ML")  # fit the model with arima()
para.fit.X2
## 
## Call:
## stats::arima(x = X2, order = c(3, 0, 1), include.mean = F, method = "ML")
## 
## Coefficients:
##          ar1      ar2     ar3     ma1
##       1.3579  -1.3747  0.6789  0.3597
## s.e.  0.0676   0.0617  0.0647  0.0927
## 
## sigma^2 estimated as 0.8161:  log likelihood = -254.46,  aic = 518.92

From the parameter estimates, the model can be written as \(X_t=\frac{(1+0.3597B)\epsilon_t}{1-1.3579B+1.3747B^2-0.6789B^3},\ Var(\epsilon_t)=0.8161\) (here \(X\) denotes the time series formed by X2).

Model diagnostics

We now perform residual diagnostics for this model.

par(mfrow=c(1,2))
plot(para.fit.X2$residuals, type = "p", pch = "+")
qqnorm(para.fit.X2$residuals)

for (i in 1:2) {
  print(Box.test(para.fit.X2$residuals, lag = 6*i))
}
## 
##  Box-Pierce test
## 
## data:  para.fit.X2$residuals
## X-squared = 1.6363, df = 6, p-value = 0.9499
## 
## 
##  Box-Pierce test
## 
## data:  para.fit.X2$residuals
## X-squared = 7.7447, df = 12, p-value = 0.8048

Judging from the QQ plot (normality) and the scatter plot of residuals (randomness), the residuals of this model are very close to white noise. Since the P values of the portmanteau (Box-Pierce) statistics at all tested lags are well above \(\alpha=0.05\), we accept the null hypothesis that the residual series is white noise; that is, the fitted model is adequate at the 95% confidence level. We therefore choose the ARMA(3,1) model for the once-differenced random component obtained by the textbook (He Shuyuan) method.

Parameter significance tests

para.fit.X2
## 
## Call:
## stats::arima(x = X2, order = c(3, 0, 1), include.mean = F, method = "ML")
## 
## Coefficients:
##          ar1      ar2     ar3     ma1
##       1.3579  -1.3747  0.6789  0.3597
## s.e.  0.0676   0.0617  0.0647  0.0927
## 
## sigma^2 estimated as 0.8161:  log likelihood = -254.46,  aic = 518.92
# significance test for the ar1 coefficient
t1 = 1.3579/0.0676
# P values computed with pt(); the degrees of freedom are the sample size minus the number of estimated parameters, here 191-4=187
p1 = 2*pt(t1, df=187, lower.tail = F)
# significance test for the ar2 coefficient
t2 = -1.3747/0.0617
p2 = 2*pt(t2, df=187, lower.tail = T)
# significance test for the ar3 coefficient
t3 = 0.6789/0.0647
p3 = 2*pt(t3, df=187, lower.tail = F)
# significance test for the ma1 coefficient
m1 = 0.3597/0.0927
p4 = 2*pt(m1, df=187, lower.tail = F)
p_value = data.frame(p1, p2,p3,p4)
row.names(p_value) = "P 值"
p_value
##               p1           p2           p3           p4
## P 值 1.42983e-48 1.614887e-54 1.504592e-20 0.0001445008

The results show that the P values of all four coefficients are below \(\alpha=0.05\), so the null hypothesis that each coefficient equals 0 is rejected; all four coefficients are significantly non-zero at the 95% confidence level.

Residual analysis

shapiro.test(para.fit.X2$residuals)
## 
##  Shapiro-Wilk normality test
## 
## data:  para.fit.X2$residuals
## W = 0.98828, p-value = 0.1169
Box.test(para.fit.X2$residuals)
## 
##  Box-Pierce test
## 
## data:  para.fit.X2$residuals
## X-squared = 0.011236, df = 1, p-value = 0.9156

The Shapiro-Wilk normality test gives a p value of 0.1169, so we do not reject the hypothesis that the residuals are normally distributed.
Histogram (with density estimate) and normal QQ plot of the fitted residuals:

par(mfrow=c(1,2))
hist(para.fit.X2$residuals,main="Residuals",col=4,ylim=c(0,0.6),xlab=expression(x-hat(x)),probability = T)
lines(density(para.fit.X2$residuals))
rug(para.fit.X2$residuals)
qqnorm(para.fit.X2$residuals)
qqline(para.fit.X2$residuals)

Residual time plot and residual ACF plot:

plot(para.fit.X2$residuals,ylab="Residual",type="o",pch=16,main="残差时序图(模型2)")
abline(h=0,lty=12)

Acf(para.fit.X2$residuals,main="ACF of residuals(模型2)",lag.max = 60)

Forecasting

library(forecast)
z2=forecast(para.fit.X2,h=10)
md_rand.lm.fit<-diffinv(z2$fitted)   # invert the differencing to recover the fitted random component (its last ten values are used below)

observe.lm.fit<-md_rand.lm.fit[(length(md)-9):length(md)]+md_trend.lm[(length(md)-9):length(md)]+md_seas.lm[(length(md)-9):length(md)] 
# comparison
d2<-data.frame(matrix(c(observe.lm.fit,md[(length(md)-9):length(md)]),ncol=2))
colnames(d2)<-c("预测值","真实值")
cat("模型2预测的序列最后十个预测值和真实值如下","\n")
## 模型2预测的序列最后十个预测值和真实值如下
d2
##      预测值   真实值
## 1  18199.50 18209.77
## 2  18488.63 18499.10
## 3  18424.96 18434.89
## 4  18412.24 18423.64
## 5  18957.47 18969.17
## 6  19255.17 19266.87
## 7  19198.89 19210.91
## 8  19189.62 19201.68
## 9  19732.75 19745.65
## 10 20032.67 20046.58

Comparing the last ten predicted values with the true values, the differences are modest (at most roughly 10 to 20), indicating that the model fits reasonably well.

Random-component model 3

Note: as in model 1, the seasonal component is obtained with decompose(), the trend by a quadratic fit after subtracting the seasonal component, and the remainder is the random component. Here we model this non-stationary random component directly.

Model identification

X3<-md_rand
fit1.X3<-auto.arima(X3,ic="aic")
fit2.X3<-auto.arima(X3,ic="bic")
plot(X3,main="decompose方法得到不平稳随机项序列")

fit1.X3;fit2.X3
## Series: X3 
## ARIMA(4,0,1) with zero mean 
## 
## Coefficients:
##          ar1      ar2     ar3      ar4     ma1
##       2.3819  -2.8057  2.1213  -0.7288  0.2625
## s.e.  0.0634   0.1181  0.1145   0.0603  0.0956
## 
## sigma^2 = 0.8269:  log likelihood = -256.79
## AIC=525.59   AICc=526.04   BIC=545.13
## Series: X3 
## ARIMA(4,0,1) with zero mean 
## 
## Coefficients:
##          ar1      ar2     ar3      ar4     ma1
##       2.3819  -2.8057  2.1213  -0.7288  0.2625
## s.e.  0.0634   0.1181  0.1145   0.0603  0.0956
## 
## sigma^2 = 0.8269:  log likelihood = -256.79
## AIC=525.59   AICc=526.04   BIC=545.13

Whichever criterion (AIC or BIC) is used, auto.arima selects ARIMA(4,0,1), but the series itself is not stationary. Model 1 above showed that the once-differenced series is stationary, so here I choose an ARIMA(3,1,1) model instead.
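
One informal way to see why differencing once is reasonable (a sketch) is to inspect the roots of the AR polynomial of the ARIMA(4,0,1) fit reported above: a smallest root modulus close to 1 indicates a near-unit root, i.e. the series behaves much like an integrated process.

# Smallest root modulus of the AR polynomial from the ARIMA(4,0,1) fit;
# a value near 1 supports one round of differencing.
min(Mod(polyroot(c(1, -2.3819, 2.8057, -2.1213, 0.7288))))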

Model fitting

para.fit.X3 = stats::arima(X3,order=c(3,1,1),include.mean = F, method = "ML")  # fit the model with arima()
para.fit.X3
## 
## Call:
## stats::arima(x = X3, order = c(3, 1, 1), include.mean = F, method = "ML")
## 
## Coefficients:
##          ar1      ar2     ar3     ma1
##       1.3751  -1.3889  0.6938  0.3102
## s.e.  0.0658   0.0603  0.0628  0.0897
## 
## sigma^2 estimated as 0.8414:  log likelihood = -257.33,  aic = 524.66

From the parameter estimates, the model can be written as:
\((1-B)X_t=\frac{(1+0.3102B)\epsilon_t}{1-1.3751B+1.3889B^2-0.6938B^3},\ Var(\epsilon_t)=0.8414\) (here \(X\) denotes the time series formed by X3).

Model diagnostics

We now perform residual diagnostics for this model.

par(mfrow=c(1,2))
plot(para.fit.X3$residuals, type = "p", pch = "+")
qqnorm(para.fit.X3$residuals)

for (i in 1:2) {
  print(Box.test(para.fit.X3$residuals, lag = 6*i))
}
## 
##  Box-Pierce test
## 
## data:  para.fit.X3$residuals
## X-squared = 0.38337, df = 6, p-value = 0.999
## 
## 
##  Box-Pierce test
## 
## data:  para.fit.X3$residuals
## X-squared = 8.9689, df = 12, p-value = 0.7056

Judging from the QQ plot (normality) and the scatter plot of residuals (randomness), the residuals of this model are very close to white noise. Since the P values of the portmanteau (Box-Pierce) statistics at all tested lags are well above \(\alpha=0.05\), we accept the null hypothesis that the residual series is white noise; that is, the fitted model is adequate at the 95% confidence level. We therefore choose the ARIMA(3,1,1) model for the random component obtained from the decompose() decomposition.

Parameter significance tests

para.fit.X3
## 
## Call:
## stats::arima(x = X3, order = c(3, 1, 1), include.mean = F, method = "ML")
## 
## Coefficients:
##          ar1      ar2     ar3     ma1
##       1.3751  -1.3889  0.6938  0.3102
## s.e.  0.0658   0.0603  0.0628  0.0897
## 
## sigma^2 estimated as 0.8414:  log likelihood = -257.33,  aic = 524.66
# significance test for the ar1 coefficient
t1 = 1.3751/0.0658
# P values computed with pt(); the degrees of freedom are the sample size minus the number of estimated parameters, here 191-4=187
p1 = 2*pt(t1, df=187, lower.tail = F)
# significance test for the ar2 coefficient
t2 = -1.3889/0.0603
p2 = 2*pt(t2, df=187, lower.tail = T)
# significance test for the ar3 coefficient
t3 = 0.6938/0.0628
p3 = 2*pt(t3, df=187, lower.tail = F)
# significance test for the ma1 coefficient
m1 = 0.3102/0.0897
p4 = 2*pt(m1, df=187, lower.tail = F)
p_value = data.frame(p1, p2,p3,p4)
row.names(p_value) = "P 值"
p_value
##                p1           p2           p3           p4
## P 值 8.445218e-51 1.684701e-56 3.654305e-22 0.0006733279

The results show that the P values of all coefficients are below \(\alpha=0.05\), so the null hypothesis that each coefficient equals 0 is rejected; all coefficients are significantly non-zero at the 95% confidence level.

Residual analysis

shapiro.test(para.fit.X3$residuals)
## 
##  Shapiro-Wilk normality test
## 
## data:  para.fit.X3$residuals
## W = 0.9901, p-value = 0.2081
Box.test(para.fit.X3$residuals)
## 
##  Box-Pierce test
## 
## data:  para.fit.X3$residuals
## X-squared = 2.5256e-05, df = 1, p-value = 0.996

The Shapiro-Wilk normality test gives a p value of 0.2081, so we do not reject the hypothesis that the residuals are normally distributed.
Histogram (with density estimate) and normal QQ plot of the fitted residuals:

par(mfrow=c(1,2))
hist(para.fit.X3$residuals,main="Residuals",col=4,ylim=c(0,0.6),xlab=expression(x-hat(x)),probability = T)
lines(density(para.fit.X3$residuals))
rug(para.fit.X3$residuals)
qqnorm(para.fit.X3$residuals)
qqline(para.fit.X3$residuals)

Residual time plot and residual ACF plot:

plot(para.fit.X3$residuals,ylab="Residual",type="o",pch=16,main="残差时序图(模型3)")
abline(h=0,lty=12)

Acf(para.fit.X3$residuals,main="ACF of residuals(模型3)",lag.max = 60)

Forecasting

library(forecast)
z3=forecast(para.fit.X3,h=10)
md_rand.fit.3<-z3$fitted  # fitted values of the random component (the last ten are used below)

observe.fit.3<-md_rand.fit.3[(length(md)-9):length(md)]+md_trend[(length(md)-9):length(md)]+md_seas[(length(md)-9):length(md)] 
# comparison
d3<-data.frame(matrix(c(observe.fit.3,md[(length(md)-9):length(md)]),ncol=2))
colnames(d3)<-c("预测值(模型3)","真实值")
cat("模型3预测的序列最后十个预测值和真实值如下","\n")
## 模型3预测的序列最后十个预测值和真实值如下
d3
##    预测值(模型3)   真实值
## 1       18209.78 18209.77
## 2       18498.75 18499.10
## 3       18435.50 18434.89
## 4       18422.07 18423.64
## 5       18968.77 18969.17
## 6       19266.73 19266.87
## 7       19210.67 19210.91
## 8       19201.48 19201.68
## 9       19744.75 19745.65
## 10      20045.38 20046.58

Comparing the last ten predicted values with the true values, the differences are very small (at most about 2), indicating that the model fits well.

Random-component model 4

Note: the textbook (He Shuyuan) method yields a non-stationary random component; below an ARIMA model is fitted to this series directly, followed by the usual diagnostic checks.

Model identification

X4<-md_rand.lm
plot(X4,main="例题方法得到不平稳随机项序列")

fit1.X4<-auto.arima(X4,ic="aic")
fit2.X4<-auto.arima(X4,ic="bic")
fit1.X4;fit2.X4
## Series: X4 
## ARIMA(3,1,1) 
## 
## Coefficients:
##          ar1      ar2     ar3     ma1
##       1.3579  -1.3747  0.6789  0.3597
## s.e.  0.0676   0.0617  0.0647  0.0927
## 
## sigma^2 = 0.8336:  log likelihood = -254.46
## AIC=518.92   AICc=519.25   BIC=535.18
## Series: X4 
## ARIMA(3,1,1) 
## 
## Coefficients:
##          ar1      ar2     ar3     ma1
##       1.3579  -1.3747  0.6789  0.3597
## s.e.  0.0676   0.0617  0.0647  0.0927
## 
## sigma^2 = 0.8336:  log likelihood = -254.46
## AIC=518.92   AICc=519.25   BIC=535.18

Whichever criterion (AIC or BIC) is used, auto.arima selects ARIMA(3,1,1), so we adopt this model.

Model fitting

para.fit.X4 = stats::arima(X4,order=c(3,1,1), include.mean=F,method = "ML")  # fit the model with arima()
para.fit.X4
## 
## Call:
## stats::arima(x = X4, order = c(3, 1, 1), include.mean = F, method = "ML")
## 
## Coefficients:
##          ar1      ar2     ar3     ma1
##       1.3579  -1.3747  0.6789  0.3597
## s.e.  0.0676   0.0617  0.0647  0.0927
## 
## sigma^2 estimated as 0.8161:  log likelihood = -254.46,  aic = 518.92

From the parameter estimates, the model can be written as \((1-B)X_t=\frac{(1+0.3597B)\epsilon_t}{1-1.3579B+1.3747B^2-0.6789B^3},\ Var(\epsilon_t)=0.8161\) (here \(X\) denotes the time series formed by X4).

Model diagnostics

We now perform residual diagnostics for this model.

par(mfrow=c(1,2))
plot(para.fit.X4$residuals, type = "p", pch = "+")
qqnorm(para.fit.X4$residuals) 

for (i in 1:2) {
  print(Box.test(para.fit.X4$residuals, lag = 6*i))
}
## 
##  Box-Pierce test
## 
## data:  para.fit.X4$residuals
## X-squared = 1.6462, df = 6, p-value = 0.9492
## 
## 
##  Box-Pierce test
## 
## data:  para.fit.X4$residuals
## X-squared = 7.7913, df = 12, p-value = 0.8012

Judging from the QQ plot (normality) and the scatter plot of residuals (randomness), the residuals of this model are very close to white noise. Since the P values of the portmanteau (Box-Pierce) statistics at all tested lags are well above \(\alpha=0.05\), we accept the null hypothesis that the residual series is white noise; that is, the fitted model is adequate at the 95% confidence level. We therefore choose the ARIMA(3,1,1) model for the random component obtained by the textbook (He Shuyuan) method.

Parameter significance tests

para.fit.X4
## 
## Call:
## stats::arima(x = X4, order = c(3, 1, 1), include.mean = F, method = "ML")
## 
## Coefficients:
##          ar1      ar2     ar3     ma1
##       1.3579  -1.3747  0.6789  0.3597
## s.e.  0.0676   0.0617  0.0647  0.0927
## 
## sigma^2 estimated as 0.8161:  log likelihood = -254.46,  aic = 518.92
# significance test for the ar1 coefficient
t1 = 1.3579/0.0676
# P values computed with pt(); the degrees of freedom are the sample size minus the number of estimated parameters, here 191-4=187
p1 = 2*pt(t1, df=187, lower.tail = F)
# significance test for the ar2 coefficient
t2 = -1.3747/0.0617
p2 = 2*pt(t2, df=187, lower.tail = T)
# significance test for the ar3 coefficient
t3 = 0.6789/0.0647
p3 = 2*pt(t3, df=187, lower.tail = F)
# significance test for the ma1 coefficient
m1 = 0.3597/0.0927
p4 = 2*pt(m1, df=187, lower.tail = F)
p_value = data.frame(p1, p2,p3,p4)
row.names(p_value) = "P 值"
p_value 
##               p1           p2           p3           p4
## P 值 1.42983e-48 1.614887e-54 1.504592e-20 0.0001445008

The results show that the P values of all four coefficients are below \(\alpha=0.05\), so the null hypothesis that each coefficient equals 0 is rejected; all four coefficients are significantly non-zero at the 95% confidence level.

Residual analysis

shapiro.test(para.fit.X4$residuals)
## 
##  Shapiro-Wilk normality test
## 
## data:  para.fit.X4$residuals
## W = 0.98841, p-value = 0.1197
Box.test(para.fit.X4$residuals)
## 
##  Box-Pierce test
## 
## data:  para.fit.X2$residuals
## X-squared = 0.011236, df = 1, p-value = 0.9156

The Shapiro-Wilk normality test gives a p value of 0.1197, so we do not reject the hypothesis that the residuals are normally distributed.
Histogram (with density estimate) and normal QQ plot of the fitted residuals:

par(mfrow=c(1,2))
hist(para.fit.X4$residuals,main="Residuals",col=4,ylim=c(0,0.6),xlab=expression(x-hat(x)),probability = T)
lines(density(para.fit.X4$residuals))
rug(para.fit.X4$residuals)
qqnorm(para.fit.X4$residuals)
qqline(para.fit.X4$residuals)

Residual time plot and residual ACF plot:

plot(para.fit.X4$residuals,ylab="Residual",type="o",pch=16,main="残差时序图(模型4)")
abline(h=0,lty=12)

Acf(para.fit.X4$residuals,main="ACF of residuals(模型4)",lag.max = 60)

Forecasting

library(forecast)
z4=forecast(para.fit.X4,h=10)
md_rand.lm.fit.4<-z4$fitted   # fitted values of the random component (the last ten are used below)

observe.lm.fit.4<-md_rand.lm.fit.4[(length(md)-9):length(md)]+md_trend.lm[(length(md)-9):length(md)]+md_seas.lm[(length(md)-9):length(md)] 
# comparison
d4<-data.frame(matrix(c(observe.lm.fit.4,md[(length(md)-9):length(md)]),ncol=2))
colnames(d4)<-c("预测值","真实值")
cat("模型4预测的序列最后十个预测值和真实值如下","\n")
## 模型4预测的序列最后十个预测值和真实值如下
d4
##      预测值   真实值
## 1  18209.83 18209.77
## 2  18498.91 18499.10
## 3  18435.43 18434.89
## 4  18422.17 18423.64
## 5  18968.88 18969.17
## 6  19266.87 19266.87
## 7  19210.59 19210.91
## 8  19201.64 19201.68
## 9  19744.81 19745.65
## 10 20045.57 20046.58

Comparing the last ten predicted values with the true values, the differences are very small (at most about 2), indicating that the model fits well.
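
To make the comparison in the summary below more concrete, the following sketch tabulates the RMSE and MAE of the last ten predictions for each of the four models, reusing the prediction vectors computed above.

# RMSE / MAE of the last ten predictions for the four models.
truth <- as.numeric(md[(length(md) - 9):length(md)])
preds <- list(model1 = observe.fit, model2 = observe.lm.fit,
              model3 = observe.fit.3, model4 = observe.lm.fit.4)
sapply(preds, function(p) c(RMSE = sqrt(mean((p - truth)^2)),
                            MAE  = mean(abs(p - truth))))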

Summary

In summary, after removing the first ten observations I decomposed the remaining 192 observations in two different ways, fitted ARIMA models to the random component either directly or after differencing, verified the models through diagnostic checks, parameter significance tests and residual analysis, and used them to predict the last ten observations for comparison with the true values. The results are summarised as follows:
1. Decomposition method 1:
(1) Decomposition: the seasonal component of the 192 observations is obtained with decompose(); after subtracting it from the original series, the trend of the deseasonalised series is fitted with a quadratic curve and subtracted in turn, leaving the random series Y1.
(2) Fitted model: writing Y for Y1, the model is ARIMA(3,1,1); with the estimated parameters it reads \((1-1.3751B+1.3889B^2-0.6938B^3)(1-B)Y_t=(1+0.3102B)\epsilon_t,\ Var(\epsilon_t)=0.8414\) (the series \((1-B)Y_t\) corresponds to X1 in the code above, i.e. the \(X_t\) in the expression under the model-fitting step of model 1, while \(Y_t\) corresponds to X3, i.e. the \(X_t\) in the expression under model 3). The model passes the diagnostic checks, the parameter significance tests and the residual analysis.
(3) Predictions from the fitted model are very close to the true values, again indicating a good fit.
2. Decomposition method 2:
(1) Decomposition: the trend of the 192-observation series is fitted with a cubic curve; after subtracting it from the original series, the four seasonal terms are obtained as quarterly means and subtracted in turn, leaving the random series Y2.
(2) Fitted model: writing Y for Y2, the model is ARIMA(3,1,1); with the estimated parameters it reads \((1-1.3579B+1.3747B^2-0.6789B^3)(1-B)Y_t=(1+0.3597B)\epsilon_t,\ Var(\epsilon_t)=0.8161\) (the series \((1-B)Y_t\) corresponds to X2 in the code above, i.e. the \(X_t\) in the expression under model 2, while \(Y_t\) corresponds to X4, i.e. the \(X_t\) in the expression under model 4). The model passes the diagnostic checks, the parameter significance tests and the residual analysis.
(3) Predictions from the fitted model are very close to the true values, again indicating a good fit.
3. Comparing the last ten predictions of the four models with the true values, models 1 and 3 fit somewhat better, suggesting that their parameter estimates may be closer to the true model parameters.
4.在分解方式相同情况下,无论是直接用ARIMA模型拟合随机项序列,还是先对随机项序列进行差分再用ARMA模型拟合差分后的序列然后反推回随机项序列的模型,得到的参数估计值都是相同的,但预测时用ARIMA模型直接拟合的参数估计值作为forecast函数参数得到的预测值和真实值相差更小(几乎一致),主要问题在于在前两个模型反推回预测的随机项序列时此时第一项为0,然后按差值递推后续所有项,但实际上第一项真实值不为0,故和原随机序列差别较大。因此综合四个模型,在进行建模和预测时还是选择使用模型3最佳。