(1) After removing the first 10 observations, describe the features of the time series and decompose it into trend and seasonal components.
(2) After decomposing the series, fit an ARIMA or ARMA model to the remainder, forecast the last 10 observations, and compare them with the true values.
(3) The report may be in Word, PDF, or HTML format, and should cover data preprocessing, model identification, parameter estimation and testing, residual analysis, forecasting, etc., with code and accompanying figures and commentary.
data<-read.csv("data2.csv",header=TRUE,sep=",",fileEncoding = "GBK",na.strings = "")
mydata<-data[data$姓名=="李馨雨*"&data$学号=="PB19010425",]
mydata<-na.omit(mydata) #drop NA values
library(tseries)
## Registered S3 method overwritten by 'quantmod':
## method from
## as.zoo.data.frame zoo
mydata<-mydata[-(1:12)] #drop the name/ID columns and the first ten observations
md1<-as.double(as.matrix(mydata[1,])) #on inspection most values are truncated after the second decimal place (the third decimal onward is 0), which has little effect on the analysis
md<-ts(md1) #convert to a time series object
plot(md,main="Time plot of the original series",ylab="")
acf(md);pacf(md)
tseries::adf.test(md)
##
## Augmented Dickey-Fuller Test
##
## data: md
## Dickey-Fuller = 0.91189, Lag order = 5, p-value = 0.99
## alternative hypothesis: stationary
plot(md[1:25],type="l",main="Time plot of the first part of the original series",ylab="",axes = F)
axis(2)
axis(1,at=1:25,labels=1:25,las=1)
The time plot shows a clear upward trend and periodicity (the series rises in waves), so this is a non-stationary time series; the unit-root test and the ACF/PACF plots confirm this. From the partial time plot, adjacent peaks are 4 observations apart, so the period can be estimated as 4.
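The eyeballed period can be cross-checked with a periodogram. Below is a minimal sketch on simulated data (the quadratic trend and period-4 seasonal pattern are illustrative stand-ins, not the report's data): detrend first, then read the period off the frequency with the largest spectral peak.

```r
# Illustrative only: simulate a quarterly-style series, detrend it, and
# estimate the period from the dominant periodogram frequency.
set.seed(1)
n <- 192
t <- 1:n
seas <- rep(c(-50, -250, 100, 200), n/4)   # a period-4 seasonal pattern
y <- 0.05*t^2 + seas + rnorm(n, sd = 5)    # quadratic trend + season + noise
yd <- residuals(lm(y ~ t + I(t^2)))        # remove the trend
sp <- spec.pgram(yd, plot = FALSE)
1/sp$freq[which.max(sp$spec)]              # estimated period, close to 4
```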
library(forecast)
#decompose approach
md<-ts(md,frequency = 4)
d1<-decompose(md)
plot(d1)
md_seas<-d1$seasonal #seasonal component from decompose
cat("Seasonal component obtained with decompose():","\n")
## Seasonal component obtained with decompose():
md_seas[1:4]
## [1] -50.05591 -250.07171 100.08398 200.04364
#fit the trend with a quadratic function
md_rand_trend<-md-md_seas #subtracting the seasonal component leaves trend plus remainder
t <- seq(length(md_rand_trend))
lmr <- lm(md_rand_trend ~ t+I(t^2))
md_trend <- ts(predict(lmr,newdata=list(t)),frequency=4,start=start(md_rand_trend))
cat("\n","Trend component from the quadratic regression fit:","\n")
##
## Trend component from the quadratic regression fit:
md_trend
## Qtr1 Qtr2 Qtr3 Qtr4
## 1 47.58100 59.08781 71.56377 85.00888
## 2 99.42314 114.80655 131.15912 148.48084
## 3 166.77172 186.03175 206.26093 227.45926
## 4 249.62675 272.76338 296.86918 321.94412
## 5 347.98822 375.00147 402.98387 431.93543
## 6 461.85613 492.74600 524.60501 557.43318
## 7 591.23050 625.99697 661.73260 698.43737
## 8 736.11131 774.75439 814.36663 854.94802
## 9 896.49856 939.01826 982.50710 1026.96511
## 10 1072.39226 1118.78857 1166.15403 1214.48864
## 11 1263.79241 1314.06532 1365.30740 1417.51862
## 12 1470.69900 1524.84853 1579.96721 1636.05504
## 13 1693.11203 1751.13817 1810.13347 1870.09792
## 14 1931.03152 1992.93427 2055.80617 2119.64723
## 15 2184.45744 2250.23681 2316.98532 2384.70299
## 16 2453.38982 2523.04579 2593.67092 2665.26520
## 17 2737.82863 2811.36122 2885.86296 2961.33385
## 18 3037.77390 3115.18310 3193.56145 3272.90895
## 19 3353.22561 3434.51142 3516.76638 3599.99050
## 20 3684.18376 3769.34619 3855.47776 3942.57849
## 21 4030.64837 4119.68740 4209.69558 4300.67292
## 22 4392.61941 4485.53506 4579.41985 4674.27380
## 23 4770.09690 4866.88916 4964.65057 5063.38113
## 24 5163.08084 5263.74971 5365.38773 5467.99490
## 25 5571.57123 5676.11670 5781.63133 5888.11512
## 26 5995.56805 6103.99014 6213.38139 6323.74178
## 27 6435.07133 6547.37003 6660.63788 6774.87489
## 28 6890.08105 7006.25636 7123.40083 7241.51444
## 29 7360.59722 7480.64914 7601.67022 7723.66044
## 30 7846.61983 7970.54836 8095.44605 8221.31289
## 31 8348.14888 8475.95403 8604.72833 8734.47178
## 32 8865.18439 8996.86614 9129.51705 9263.13712
## 33 9397.72633 9533.28470 9669.81223 9807.30890
## 34 9945.77473 10085.20971 10225.61384 10366.98713
## 35 10509.32957 10652.64116 10796.92190 10942.17180
## 36 11088.39085 11235.57906 11383.73641 11532.86292
## 37 11682.95858 11834.02340 11986.05736 12139.06048
## 38 12293.03276 12447.97418 12603.88476 12760.76449
## 39 12918.61338 13077.43142 13237.21861 13397.97495
## 40 13559.70045 13722.39510 13886.05890 14050.69185
## 41 14216.29396 14382.86522 14550.40563 14718.91520
## 42 14888.39392 15058.84179 15230.25881 15402.64499
## 43 15576.00032 15750.32480 15925.61844 16101.88123
## 44 16279.11317 16457.31427 16636.48451 16816.62391
## 45 16997.73247 17179.81017 17362.85703 17546.87304
## 46 17731.85821 17917.81252 18104.73599 18292.62862
## 47 18481.49039 18671.32132 18862.12140 19053.89064
## 48 19246.62902 19440.33656 19635.01326 19830.65910
#remainder (random component)
md_rand<-ts(md_rand_trend-md_trend)
plot(md_rand,main="Remainder from decompose + quadratic trend fit")
cat("\n","Remainder from the quadratic regression fit:","\n")
##
## Remainder from the quadratic regression fit:
md_rand
## Time Series:
## Start = 1
## End = 192
## Frequency = 1
## [1] 4.81835312 5.46750589 5.30585166 3.27918335 0.22414200
## [6] -0.22304063 2.42759375 4.14331405 2.85249131 0.23848728
## [11] -0.74810973 -0.14480083 1.50946503 3.77441961 5.56434120
## [16] 4.92493871 2.42549318 1.35514636 2.66874656 4.07163267
## [21] 2.83337575 -0.45978246 -2.65459366 -1.03761894 2.98951273
## [26] 5.06504313 4.39202054 5.51808386 8.81090414 11.07552315
## [31] 9.84118916 9.03734109 10.67154998 12.10855758 8.84991220
## [36] 3.95225274 1.72965023 1.92854644 1.11598966 -1.79528119
## [41] -6.02049510 -9.78361028 -12.86637845 -15.51626070 -18.26108600
## [46] -21.29681258 -23.26419215 -21.98668579 -17.62912248 -13.16346045
## [51] -12.09545142 -14.39355646 -14.79860454 -11.31755391 -7.26815627
## [56] -4.99087270 -4.05353218 -3.07309294 -1.27230670 0.25236547
## [61] -0.93490540 -4.76907756 -6.23090270 -2.58884193 2.67627580
## [66] 4.93249225 3.38105571 0.56550509 0.24901143 0.68561648
## [71] -0.87243146 -2.35059347 -2.12669853 -1.37670487 -2.32536420
## [76] -4.87113761 -6.11785406 -4.66447180 -2.94374252 -1.95912732
## [81] -1.43645517 -0.65068430 0.39743358 2.10943738 3.10549814
## [86] 2.94065761 4.01816410 5.94155651 6.27300587 3.70255395
## [91] 0.19744904 -0.92976995 1.59606802 5.52800471 5.55428841
## [96] 2.75345802 0.54168460 0.33700989 1.73368219 2.38224041
## [101] 0.79185559 -3.14743051 -5.66736960 -4.36642277 -2.12841899
## [106] -0.66931649 -0.19486698 -0.05653154 0.50186085 1.67235196
## [111] 2.43019007 1.10291411 -2.33530489 -5.44842518 -4.52019846
## [116] 0.96491419 5.01608379 3.79335211 1.06896744 1.10146869
## [121] 3.47502689 4.53468382 2.41668775 1.09657761 2.58052442
## [126] 5.82756995 8.81996249 9.72324095 9.75757636 11.17401050
## [131] 13.10479164 13.12745871 10.25318273 8.72400547 8.97217522
## [136] 8.05923089 6.46634352 5.75055486 6.45411322 8.43455750
## [141] 10.85505873 13.93265868 13.87960564 10.11343852 7.25732836
## [146] 9.25831692 14.36865248 15.70587397 10.47315241 3.85752957
## [151] 2.82125374 7.64186384 9.67253089 3.75029665 -5.23259057
## [156] -8.48859187 -3.22453622 1.99661815 -2.13288046 -11.14549316
## [161] -14.56804890 -10.08350592 -6.15961594 -9.56884003 -17.08800716
## [166] -19.44007558 -14.70279699 -11.09863247 -14.97441100 -21.79309081
## [171] -22.70242362 -16.42487050 -10.77726042 -12.41255163 -17.81849583
## [176] -17.65755410 -10.92655542 -3.92845802 -1.91101361 -3.94668328
## [181] -4.53229600 -0.37080999 4.95002302 6.42774196 3.45551785
## [186] 2.39039246 6.96461408 12.93572162 14.33688612 11.41514933
## [191] 10.55275956 15.87330570
acf(md_rand,lag.max=40)
adf.test(md_rand)
##
## Augmented Dickey-Fuller Test
##
## data: md_rand
## Dickey-Fuller = -2.9823, Lag order = 5, p-value = 0.1651
## alternative hypothesis: stationary
auto.arima(md_rand)
## Series: md_rand
## ARIMA(4,0,1) with zero mean
##
## Coefficients:
## ar1 ar2 ar3 ar4 ma1
## 2.3819 -2.8057 2.1213 -0.7288 0.2625
## s.e. 0.0634 0.1181 0.1145 0.0603 0.0956
##
## sigma^2 = 0.8269: log likelihood = -256.79
## AIC=525.59 AICc=526.04 BIC=545.13
The ACF plot and the ADF test show that the remainder from this decomposition is still non-stationary; below we difference it to obtain a stationary series and then build an ARIMA model.
library(forecast)
#He Shuyuan's textbook-example approach
md<-ts(md,frequency = 4)
#fit the trend with a cubic function
t <- seq(length(md))
lmr <- lm(md ~ t+I(t^2)+I(t^3))
md_trend.lm <- ts(predict(lmr,newdata=list(t)),frequency=4,start=start(md))
cat("Trend component from the cubic regression fit:","\n")
## Trend component from the cubic regression fit:
md_trend.lm
## Qtr1 Qtr2 Qtr3 Qtr4
## 1 38.92535 50.75396 63.54440 77.29675
## 2 92.01109 107.68750 124.32605 141.92682
## 3 160.48989 180.01534 200.50324 221.95367
## 4 244.36671 267.74244 292.08094 317.38227
## 5 343.64653 370.87379 399.06412 428.21760
## 6 458.33432 489.41434 521.45775 554.46463
## 7 588.43504 623.36908 659.26681 696.12832
## 8 733.95368 772.74297 812.49626 853.21365
## 9 894.89519 937.54098 981.15108 1025.72558
## 10 1071.26456 1117.76808 1165.23624 1213.66910
## 11 1263.06674 1313.42925 1364.75670 1417.04916
## 12 1470.30672 1524.52945 1579.71744 1635.87075
## 13 1692.98946 1751.07366 1810.12342 1870.13882
## 14 1931.11994 1993.06685 2055.97963 2119.85836
## 15 2184.70312 2250.51398 2317.29103 2385.03433
## 16 2453.74398 2523.42004 2594.06259 2665.67171
## 17 2738.24748 2811.78998 2886.29928 2961.77546
## 18 3038.21860 3115.62878 3194.00608 3273.35056
## 19 3353.66232 3434.94142 3517.18795 3600.40198
## 20 3684.58359 3769.73286 3855.84986 3942.93468
## 21 4030.98739 4120.00807 4209.99679 4300.95364
## 22 4392.87869 4485.77202 4579.63371 4674.46383
## 23 4770.26246 4867.02969 4964.76558 5063.47022
## 24 5163.14368 5263.78604 5365.39738 5467.97778
## 25 5571.52731 5676.04606 5781.53409 5887.99149
## 26 5995.41833 6103.81469 6213.18066 6323.51630
## 27 6434.82170 6547.09693 6660.34207 6774.55720
## 28 6889.74240 7005.89774 7123.02330 7241.11916
## 29 7360.18540 7480.22209 7601.22931 7723.20714
## 30 7846.15566 7970.07495 8094.96507 8220.82612
## 31 8347.65816 8475.46129 8604.23556 8733.98106
## 32 8864.69788 8996.38608 9129.04574 9262.67695
## 33 9397.27977 9532.85430 9669.40059 9806.91874
## 34 9945.40882 10084.87091 10225.30508 10366.71141
## 35 10509.08999 10652.44088 10796.76417 10942.05994
## 36 11088.32825 11235.56920 11383.78285 11532.96928
## 37 11683.12858 11834.26082 11986.36607 12139.44442
## 38 12293.49594 12448.52072 12604.51882 12761.49032
## 39 12919.43531 13078.35386 13238.24605 13399.11196
## 40 13560.95166 13723.76523 13887.55275 14052.31430
## 41 14218.04996 14384.75979 14552.44389 14721.10232
## 42 14890.73517 15061.34251 15232.92442 15405.48098
## 43 15579.01227 15753.51836 15928.99934 16105.45527
## 44 16282.88624 16461.29232 16640.67359 16821.03014
## 45 17002.36203 17184.66935 17367.95217 17552.21057
## 46 17737.44463 17923.65443 18110.84003 18299.00153
## 47 18488.13900 18678.25252 18869.34216 19061.40800
## 48 19254.45012 19448.46860 19643.46351 19839.43493
#seasonal component
md.detrended.lm<-md-md_trend.lm
get.season <- function(yd,n_NA){ # input: detrended series; n_NA is the number of missing values
ymat <- matrix(c(yd,rep(NA,n_NA)), byrow=TRUE, ncol=4)
#print(ymat) #for checking
## season
seas0 <- apply(ymat, 2, mean, na.rm=TRUE)
return(seas0)
}
md_seas.lm <- get.season(md.detrended.lm,n_NA=0) #the 4 seasonal terms
#remainder (random component)
md_seas.lm <- rep(md_seas.lm,length(md)/length(md_seas.lm))
md_seas.lm<- ts(md_seas.lm, start=start(md), frequency=4)
md_rand.lm<-ts(md.detrended.lm-md_seas.lm,frequency = 1)
plot(md_rand.lm,main="Remainder from the textbook-example method")
#unit-root test
adf.test(md_rand.lm)
##
## Augmented Dickey-Fuller Test
##
## data: md_rand.lm
## Dickey-Fuller = -2.9727, Lag order = 5, p-value = 0.1691
## alternative hypothesis: stationary
acf(md_rand.lm,lag.max=40);pacf(md_rand.lm)
Following He Shuyuan's textbook example, we fit the trend with a cubic curve, subtract it from the original series, take quarterly averages of the detrended series as the seasonal component, and subtract that to obtain the remainder. The ACF/PACF plots and the unit-root test show that this remainder is still non-stationary, suggesting an ARIMA model.
Part (2): fit an ARIMA or ARMA model to the remainder, forecast the last 10 observations, and compare them with the true values.
Note: using the seasonal component from decompose and the quadratic trend fit described above, we obtained a non-stationary remainder; below we process this remainder and attempt to model it.
#first difference of md_rand
md_rand_diff<-diff(md_rand)
plot(md_rand_diff,main="First-differenced remainder (decompose method)")
#ACF plot, stationarity test, and white-noise test
acf(md_rand_diff,lag.max=60);pacf(md_rand_diff)
adf.test(md_rand_diff)
##
## Augmented Dickey-Fuller Test
##
## data: md_rand_diff
## Dickey-Fuller = -3.971, Lag order = 5, p-value = 0.01193
## alternative hypothesis: stationary
for (i in 1:2) print(Box.test(md_rand_diff, lag=6*i, type="Ljung-Box"))
##
## Box-Ljung test
##
## data: md_rand_diff
## X-squared = 247.68, df = 6, p-value < 2.2e-16
##
##
## Box-Ljung test
##
## data: md_rand_diff
## X-squared = 402.31, df = 12, p-value < 2.2e-16
#model identification
auto.arima(md_rand_diff)
## Series: md_rand_diff
## ARIMA(3,0,1) with zero mean
##
## Coefficients:
## ar1 ar2 ar3 ma1
## 1.3751 -1.3889 0.6938 0.3102
## s.e. 0.0658 0.0603 0.0628 0.0897
##
## sigma^2 = 0.8594: log likelihood = -257.33
## AIC=524.66 AICc=524.99 BIC=540.92
The ACF of the differenced series md_rand_diff decays in an oscillating fashion while the PACF cuts off after lag 3, suggesting an ARMA model with p = 3. The unit-root test indicates the series is stationary, the Ljung-Box test indicates it is not white noise, and auto.arima also selects an ARMA model, namely ARMA(3,1).
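The "PACF cuts off after lag p" identification rule can be illustrated on simulated data (the AR(3) coefficients below are arbitrary, chosen only to give a stationary series, and are not the report's estimates):

```r
# Illustrative only: for a pure AR(3), the sample PACF at lag 3 is clearly
# significant while higher-lag values hover inside the +/- 2/sqrt(n) band.
set.seed(11)
y <- arima.sim(list(ar = c(0.3, -0.2, 0.25)), n = 500)
pp <- pacf(y, lag.max = 10, plot = FALSE)$acf
band <- 2/sqrt(500)
round(cbind(lag = 1:10, pacf = pp[, 1, 1], significant = abs(pp[, 1, 1]) > band), 3)
```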
X1<-md_rand_diff
plot(X1,main="Stationary differenced remainder (decompose method)")
fit1.X1<-auto.arima(X1,ic="aic")
fit2.X1<-auto.arima(X1,ic="bic")
fit1.X1;fit2.X1
## Series: X1
## ARIMA(3,0,1) with zero mean
##
## Coefficients:
## ar1 ar2 ar3 ma1
## 1.3751 -1.3889 0.6938 0.3102
## s.e. 0.0658 0.0603 0.0628 0.0897
##
## sigma^2 = 0.8594: log likelihood = -257.33
## AIC=524.66 AICc=524.99 BIC=540.92
## Series: X1
## ARIMA(3,0,1) with zero mean
##
## Coefficients:
## ar1 ar2 ar3 ma1
## 1.3751 -1.3889 0.6938 0.3102
## s.e. 0.0658 0.0603 0.0628 0.0897
##
## sigma^2 = 0.8594: log likelihood = -257.33
## AIC=524.66 AICc=524.99 BIC=540.92
Under both the AIC and the BIC criterion, auto.arima selects ARMA(3,1), so we adopt this model.
para.fit.X1 = stats::arima(X1,order=c(3,0,1), method = "ML") #fit the model with arima()
para.fit.X1
##
## Call:
## stats::arima(x = X1, order = c(3, 0, 1), method = "ML")
##
## Coefficients:
## ar1 ar2 ar3 ma1 intercept
## 1.3744 -1.3883 0.6931 0.3106 0.0864
## s.e. 0.0658 0.0603 0.0628 0.0897 0.2679
##
## sigma^2 estimated as 0.8409: log likelihood = -257.28, aic = 526.56
From the estimated parameters, the model can be written as \(X_t-0.0864=\frac{(1+0.3106B)\epsilon_t}{1-1.3744B+1.3883B^2-0.6931B^3},\ Var(\epsilon_t)=0.8409\) (here \(X_t\) denotes the series X1).
We now run residual diagnostics on this model.
par(mfrow=c(1,2))
plot(para.fit.X1$residuals, type = "p", pch = "+")
qqnorm(para.fit.X1$residuals)
for (i in 1:2) {
print(Box.test(para.fit.X1$residuals, lag = 6*i))
}
##
## Box-Pierce test
##
## data: para.fit.X1$residuals
## X-squared = 0.38348, df = 6, p-value = 0.999
##
##
## Box-Pierce test
##
## data: para.fit.X1$residuals
## X-squared = 8.9455, df = 12, p-value = 0.7076
The QQ plot (checking normality) and the scatter plot (checking randomness) both show residuals very close to white noise. Since the p-values of the Box-Pierce portmanteau statistics at all tested lags are well above \(\alpha=0.05\), we fail to reject the null hypothesis that the residuals are white noise, i.e. the fitted model is adequate at the 5% significance level. We therefore adopt the ARMA(3,1) model for these data.
para.fit.X1
##
## Call:
## stats::arima(x = X1, order = c(3, 0, 1), method = "ML")
##
## Coefficients:
## ar1 ar2 ar3 ma1 intercept
## 1.3744 -1.3883 0.6931 0.3106 0.0864
## s.e. 0.0658 0.0603 0.0628 0.0897 0.2679
##
## sigma^2 estimated as 0.8409: log likelihood = -257.28, aic = 526.56
# significance test for the ar1 coefficient
t1 = 1.3744/0.0658
# p-values via pt(); the df is the sample size minus the number of estimated parameters, here 191-5=186
p1 = 2*pt(t1, df=186, lower.tail = F)
# significance test for the ar2 coefficient
t2 = -1.3883/0.0603
p2 = 2*pt(t2, df=186, lower.tail = T)
# significance test for the ar3 coefficient
t3 = 0.6931/0.0628
p3 = 2*pt(t3, df=186, lower.tail = F)
# significance test for the ma1 coefficient
m1 = 0.3106/0.0897
p4 = 2*pt(m1, df=186, lower.tail = F)
# significance test for the intercept
t0 = 0.0864/0.2679
p0 = 2*pt(t0, df=186, lower.tail = F)
p_value = data.frame(p1, p2,p3,p4, p0)
row.names(p_value) = "p-value"
p_value
## p1 p2 p3 p4 p0
## p-value 1.164528e-50 2.426785e-56 4.161385e-22 0.0006636692 0.7474298
The first four coefficients have p-values below \(\alpha=0.05\), so we reject the hypothesis that they equal zero: all four are significantly nonzero at the 5% level. The intercept, however, fails its significance test, so we refit the series with the constant fixed at 0. (In fact auto.arima already reported a zero-mean model, i.e. it fitted without a mean term; so when applying the textbook method later, if auto.arima again reports zero mean we can set the mean to zero directly in the fit.)
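The hand-computed t ratios above can be obtained in one step from any stats::arima fit via coef() and the estimated covariance matrix var.coef. A sketch on simulated ARMA(2,1) data (simulated series, not the report's):

```r
# Illustrative only: per-coefficient t ratios and two-sided p-values from a fit.
set.seed(42)
x <- arima.sim(list(ar = c(0.6, -0.3), ma = 0.4), n = 200)
fit <- stats::arima(x, order = c(2, 0, 1), include.mean = FALSE, method = "ML")
tstat <- coef(fit) / sqrt(diag(fit$var.coef))
pval  <- 2 * pt(abs(tstat), df = length(x) - length(coef(fit)), lower.tail = FALSE)
round(rbind(t = tstat, p = pval), 4)
```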
para.fit.X1 = stats::arima(X1,order=c(3,0,1),include.mean = F, method = "ML") #refit with arima(), without a mean term
para.fit.X1
##
## Call:
## stats::arima(x = X1, order = c(3, 0, 1), include.mean = F, method = "ML")
##
## Coefficients:
## ar1 ar2 ar3 ma1
## 1.3751 -1.3889 0.6938 0.3102
## s.e. 0.0658 0.0603 0.0628 0.0897
##
## sigma^2 estimated as 0.8414: log likelihood = -257.33, aic = 524.66
From the estimated parameters, the model can be written as \(X_t=\frac{(1+0.3102B)\epsilon_t}{1-1.3751B+1.3889B^2-0.6938B^3},\ Var(\epsilon_t)=0.8414\) (here \(X_t\) denotes the series X1).
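As a sanity check, the stationarity/causality of the fitted AR polynomial can be verified by confirming that all of its roots lie outside the unit circle:

```r
# All roots of 1 - 1.3751B + 1.3889B^2 - 0.6938B^3 should have modulus > 1.
ar_roots <- polyroot(c(1, -1.3751, 1.3889, -0.6938))
round(Mod(ar_roots), 3)   # all moduli exceed 1
```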
We now run residual diagnostics on this model.
par(mfrow=c(1,2))
plot(para.fit.X1$residuals, type = "p", pch = "+")
qqnorm(para.fit.X1$residuals)
for (i in 1:2) {
print(Box.test(para.fit.X1$residuals, lag = 6*i))
}
##
## Box-Pierce test
##
## data: para.fit.X1$residuals
## X-squared = 0.38298, df = 6, p-value = 0.999
##
##
## Box-Pierce test
##
## data: para.fit.X1$residuals
## X-squared = 8.9344, df = 12, p-value = 0.7085
The QQ plot (checking normality) and the scatter plot (checking randomness) again show residuals very close to white noise. Since the p-values of the Box-Pierce portmanteau statistics at all tested lags are well above \(\alpha=0.05\), we fail to reject the white-noise null for the residuals, i.e. the fitted model is adequate at the 5% significance level. We therefore adopt this ARMA(3,1) model for the once-differenced remainder obtained via decompose.
para.fit.X1
##
## Call:
## stats::arima(x = X1, order = c(3, 0, 1), include.mean = F, method = "ML")
##
## Coefficients:
## ar1 ar2 ar3 ma1
## 1.3751 -1.3889 0.6938 0.3102
## s.e. 0.0658 0.0603 0.0628 0.0897
##
## sigma^2 estimated as 0.8414: log likelihood = -257.33, aic = 524.66
# significance test for the ar1 coefficient
t1 = 1.3751/0.0658
# p-values via pt(); the df is the sample size minus the number of estimated parameters, here 191-4=187
p1 = 2*pt(t1, df=187, lower.tail = F)
# significance test for the ar2 coefficient
t2 = -1.3889/0.0603
p2 = 2*pt(t2, df=187, lower.tail = T)
# significance test for the ar3 coefficient
t3 = 0.6938/0.0628
p3 = 2*pt(t3, df=187, lower.tail = F)
# significance test for the ma1 coefficient
m1 = 0.3102/0.0897
p4 = 2*pt(m1, df=187, lower.tail = F)
p_value = data.frame(p1, p2,p3,p4)
row.names(p_value) = "p-value"
p_value
## p1 p2 p3 p4
## p-value 8.445218e-51 1.684701e-56 3.654305e-22 0.0006733279
All coefficients have p-values below \(\alpha=0.05\); we reject the zero-coefficient hypotheses, so all are significantly nonzero at the 5% level.
shapiro.test(para.fit.X1$residuals)
##
## Shapiro-Wilk normality test
##
## data: para.fit.X1$residuals
## W = 0.99002, p-value = 0.2058
Box.test(para.fit.X1$residuals)
##
## Box-Pierce test
##
## data: para.fit.X1$residuals
## X-squared = 2.8496e-05, df = 1, p-value = 0.9957
The Shapiro-Wilk normality test gives p = 0.2058, so we cannot reject normality of the residuals.
Residual density histogram and normal QQ plot:
par(mfrow=c(1,2))
hist(para.fit.X1$residuals,main="Residuals",col=4,ylim=c(0,0.6),xlab=expression(x-hat(x)),probability = T)
lines(density(para.fit.X1$residuals))
rug(para.fit.X1$residuals)
qqnorm(para.fit.X1$residuals)
qqline(para.fit.X1$residuals)
Residual time plot and residual ACF plot:
plot(para.fit.X1$residuals,ylab="Residual",type="o",pch=16,main="Residual time plot (model 1)")
abline(h=0,lty=12)
Acf(para.fit.X1$residuals,main="ACF of residuals (model 1)",lag.max = 60)
library(forecast)
z1=forecast(para.fit.X1,h=10)
md_rand.fit<-diffinv(z1$fitted) #invert the differencing to recover fitted values of the last ten remainder terms
observe.fit<-md_rand.fit[(length(md)-9):length(md)]+md_trend[(length(md)-9):length(md)]+md_seas[(length(md)-9):length(md)]
#comparison
d1<-data.frame(matrix(c(observe.fit,md[(length(md)-9):length(md)]),ncol=2))
colnames(d1)<-c("Predicted (model 1)","Actual")
cat("Last ten predictions from model 1 versus the actual values","\n")
## Last ten predictions from model 1 versus the actual values
d1
##    Predicted (model 1)   Actual
## 1 18204.33 18209.77
## 2 18493.31 18499.10
## 3 18429.71 18434.89
## 4 18416.89 18423.64
## 5 18962.02 18969.17
## 6 19259.58 19266.87
## 7 19203.38 19210.91
## 8 19193.95 19201.68
## 9 19737.03 19745.65
## 10 20036.75 20046.58
Comparing the last ten predictions with the actual values, the differences are small (at most about 10), indicating a good fit.
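The gap can also be summarized numerically. A sketch using the ten predicted/actual pairs from the table above (values copied verbatim from the table):

```r
# Error summary for model 1's ten out-of-sample predictions (values from the table).
pred <- c(18204.33, 18493.31, 18429.71, 18416.89, 18962.02,
          19259.58, 19203.38, 19193.95, 19737.03, 20036.75)
actual <- c(18209.77, 18499.10, 18434.89, 18423.64, 18969.17,
            19266.87, 19210.91, 19201.68, 19745.65, 20046.58)
rmse <- sqrt(mean((pred - actual)^2))
mape <- 100 * mean(abs(pred - actual) / actual)
round(c(RMSE = rmse, MAPE_percent = mape), 4)
```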
Note: the textbook-example method produced a non-stationary remainder; below we difference it and then fit and test an ARMA model on the resulting stationary series.
ndiffs(md_rand.lm)
## [1] 1
auto.arima(md_rand.lm)
## Series: md_rand.lm
## ARIMA(3,1,1)
##
## Coefficients:
## ar1 ar2 ar3 ma1
## 1.3579 -1.3747 0.6789 0.3597
## s.e. 0.0676 0.0617 0.0647 0.0927
##
## sigma^2 = 0.8336: log likelihood = -254.46
## AIC=518.92 AICc=519.25 BIC=535.18
ndiffs indicates that this remainder still needs one difference, and the auto.arima fit says the same; below we difference once and check whether the result is stationary.
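ndiffs chooses the differencing order via unit-root tests (KPSS by default). A sketch on a simulated random walk (assumes the forecast package, as loaded above; simulated data, not the report's):

```r
# Illustrative only: a random walk needs exactly one difference.
library(forecast)
set.seed(7)
rw <- cumsum(rnorm(300))
c(before = ndiffs(rw), after = ndiffs(diff(rw)))
```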
#differencing
md_rand.diff<-diff(md_rand.lm)
plot(md_rand.diff,main="Differenced remainder (textbook method)")
#ACF plot, stationarity test, and white-noise test
acf(md_rand.diff,lag.max=60);pacf(md_rand.diff)
adf.test(md_rand.diff)
## Warning in adf.test(md_rand.diff): p-value smaller than printed p-value
##
## Augmented Dickey-Fuller Test
##
## data: md_rand.diff
## Dickey-Fuller = -4.1013, Lag order = 5, p-value = 0.01
## alternative hypothesis: stationary
for (i in 1:2) print(Box.test(md_rand.diff, lag=6*i, type="Ljung-Box"))
##
## Box-Ljung test
##
## data: md_rand.diff
## X-squared = 248.85, df = 6, p-value < 2.2e-16
##
##
## Box-Ljung test
##
## data: md_rand.diff
## X-squared = 404.36, df = 12, p-value < 2.2e-16
#model identification
auto.arima(md_rand.diff)
## Series: md_rand.diff
## ARIMA(3,0,1) with zero mean
##
## Coefficients:
## ar1 ar2 ar3 ma1
## 1.3579 -1.3747 0.6789 0.3597
## s.e. 0.0676 0.0617 0.0647 0.0927
##
## sigma^2 = 0.8336: log likelihood = -254.46
## AIC=518.92 AICc=519.25 BIC=535.18
The ACF of the differenced series md_rand.diff decays in an oscillating fashion while the PACF cuts off after lag 3, suggesting an ARMA model with p = 3. The unit-root test indicates stationarity, the Ljung-Box test rules out white noise, and auto.arima also selects ARMA(3,1).
X2<-md_rand.diff
plot(X2,main="Stationary differenced remainder (textbook method)")
fit1.X2<-auto.arima(X2,ic="aic")
fit2.X2<-auto.arima(X2,ic="bic")
fit1.X2;fit2.X2
## Series: X2
## ARIMA(3,0,1) with zero mean
##
## Coefficients:
## ar1 ar2 ar3 ma1
## 1.3579 -1.3747 0.6789 0.3597
## s.e. 0.0676 0.0617 0.0647 0.0927
##
## sigma^2 = 0.8336: log likelihood = -254.46
## AIC=518.92 AICc=519.25 BIC=535.18
## Series: X2
## ARIMA(3,0,1) with zero mean
##
## Coefficients:
## ar1 ar2 ar3 ma1
## 1.3579 -1.3747 0.6789 0.3597
## s.e. 0.0676 0.0617 0.0647 0.0927
##
## sigma^2 = 0.8336: log likelihood = -254.46
## AIC=518.92 AICc=519.25 BIC=535.18
Under both AIC and BIC, auto.arima selects ARMA(3,1), so we adopt this model.
para.fit.X2 = stats::arima(X2,order=c(3,0,1), include.mean=F,method = "ML") #fit the model with arima()
para.fit.X2
##
## Call:
## stats::arima(x = X2, order = c(3, 0, 1), include.mean = F, method = "ML")
##
## Coefficients:
## ar1 ar2 ar3 ma1
## 1.3579 -1.3747 0.6789 0.3597
## s.e. 0.0676 0.0617 0.0647 0.0927
##
## sigma^2 estimated as 0.8161: log likelihood = -254.46, aic = 518.92
From the estimated parameters, the model can be written as \(X_t=\frac{(1+0.3597B)\epsilon_t}{1-1.3579B+1.3747B^2-0.6789B^3},\ Var(\epsilon_t)=0.8161\) (here \(X_t\) denotes the series X2).
We now run residual diagnostics on this model.
par(mfrow=c(1,2))
plot(para.fit.X2$residuals, type = "p", pch = "+")
qqnorm(para.fit.X2$residuals)
for (i in 1:2) {
print(Box.test(para.fit.X2$residuals, lag = 6*i))
}
##
## Box-Pierce test
##
## data: para.fit.X2$residuals
## X-squared = 1.6363, df = 6, p-value = 0.9499
##
##
## Box-Pierce test
##
## data: para.fit.X2$residuals
## X-squared = 7.7447, df = 12, p-value = 0.8048
The QQ plot (checking normality) and the scatter plot (checking randomness) show residuals very close to white noise. Since the p-values of the Box-Pierce portmanteau statistics at all tested lags are well above \(\alpha=0.05\), we fail to reject the white-noise null for the residuals, i.e. the model is adequate at the 5% significance level. We therefore adopt this ARMA(3,1) model for the once-differenced remainder obtained via the textbook-example method.
para.fit.X2
##
## Call:
## stats::arima(x = X2, order = c(3, 0, 1), include.mean = F, method = "ML")
##
## Coefficients:
## ar1 ar2 ar3 ma1
## 1.3579 -1.3747 0.6789 0.3597
## s.e. 0.0676 0.0617 0.0647 0.0927
##
## sigma^2 estimated as 0.8161: log likelihood = -254.46, aic = 518.92
# significance test for the ar1 coefficient
t1 = 1.3579/0.0676
# p-values via pt(); the df is the sample size minus the number of estimated parameters, here 191-4=187
p1 = 2*pt(t1, df=187, lower.tail = F)
# significance test for the ar2 coefficient
t2 = -1.3747/0.0617
p2 = 2*pt(t2, df=187, lower.tail = T)
# significance test for the ar3 coefficient
t3 = 0.6789/0.0647
p3 = 2*pt(t3, df=187, lower.tail = F)
# significance test for the ma1 coefficient
m1 = 0.3597/0.0927
p4 = 2*pt(m1, df=187, lower.tail = F)
p_value = data.frame(p1, p2,p3,p4)
row.names(p_value) = "p-value"
p_value
## p1 p2 p3 p4
## p-value 1.42983e-48 1.614887e-54 1.504592e-20 0.0001445008
All four coefficients have p-values below \(\alpha=0.05\); we reject the zero-coefficient hypotheses, so all are significantly nonzero at the 5% level.
shapiro.test(para.fit.X2$residuals)
##
## Shapiro-Wilk normality test
##
## data: para.fit.X2$residuals
## W = 0.98828, p-value = 0.1169
Box.test(para.fit.X2$residuals)
##
## Box-Pierce test
##
## data: para.fit.X2$residuals
## X-squared = 0.011236, df = 1, p-value = 0.9156
The Shapiro-Wilk normality test gives p = 0.1169, so we cannot reject normality of the residuals.
Residual density histogram and normal QQ plot:
par(mfrow=c(1,2))
hist(para.fit.X2$residuals,main="Residuals",col=4,ylim=c(0,0.6),xlab=expression(x-hat(x)),probability = T)
lines(density(para.fit.X2$residuals))
rug(para.fit.X2$residuals)
qqnorm(para.fit.X2$residuals)
qqline(para.fit.X2$residuals)
Residual time plot and residual ACF plot:
plot(para.fit.X2$residuals,ylab="Residual",type="o",pch=16,main="Residual time plot (model 2)")
abline(h=0,lty=12)
Acf(para.fit.X2$residuals,main="ACF of residuals (model 2)",lag.max = 60)
library(forecast)
z2=forecast(para.fit.X2,h=10)
md_rand.lm.fit<-diffinv(z2$fitted) #invert the differencing to recover fitted values of the last ten remainder terms
observe.lm.fit<-md_rand.lm.fit[(length(md)-9):length(md)]+md_trend.lm[(length(md)-9):length(md)]+md_seas.lm[(length(md)-9):length(md)]
#comparison
d2<-data.frame(matrix(c(observe.lm.fit,md[(length(md)-9):length(md)]),ncol=2))
colnames(d2)<-c("Predicted","Actual")
cat("Last ten predictions from model 2 versus the actual values","\n")
## Last ten predictions from model 2 versus the actual values
d2
##    Predicted   Actual
## 1 18199.50 18209.77
## 2 18488.63 18499.10
## 3 18424.96 18434.89
## 4 18412.24 18423.64
## 5 18957.47 18969.17
## 6 19255.17 19266.87
## 7 19198.89 19210.91
## 8 19189.62 19201.68
## 9 19732.75 19745.65
## 10 20032.67 20046.58
Comparing the last ten predictions with the actual values, the differences are small (at most about 10-20), indicating a good fit.
Note: we now take the remainder obtained via decompose plus the quadratic trend fit, as before, and model this non-stationary remainder directly, without differencing first.
X3<-md_rand
fit1.X3<-auto.arima(X3,ic="aic")
fit2.X3<-auto.arima(X3,ic="bic")
plot(X3,main="Non-stationary remainder (decompose method)")
fit1.X3;fit2.X3
## Series: X3
## ARIMA(4,0,1) with zero mean
##
## Coefficients:
## ar1 ar2 ar3 ar4 ma1
## 2.3819 -2.8057 2.1213 -0.7288 0.2625
## s.e. 0.0634 0.1181 0.1145 0.0603 0.0956
##
## sigma^2 = 0.8269: log likelihood = -256.79
## AIC=525.59 AICc=526.04 BIC=545.13
## Series: X3
## ARIMA(4,0,1) with zero mean
##
## Coefficients:
## ar1 ar2 ar3 ar4 ma1
## 2.3819 -2.8057 2.1213 -0.7288 0.2625
## s.e. 0.0634 0.1181 0.1145 0.0603 0.0956
##
## sigma^2 = 0.8269: log likelihood = -256.79
## AIC=525.59 AICc=526.04 BIC=545.13
Under both AIC and BIC, auto.arima selects ARIMA(4,0,1), but the series itself is not stationary. Since model 1 showed that the once-differenced series is stationary, we instead choose an ARIMA(3,1,1) model here.
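This choice relies on the identity that an ARIMA(p,1,q) fit to a series is the same as an ARMA(p,q) fit to its first difference. A sketch on simulated data (coefficients arbitrary, not the report's):

```r
# Illustrative only: the two fits produce (essentially) identical estimates.
set.seed(3)
w <- arima.sim(list(ar = 0.5, ma = 0.3), n = 300)  # stationary ARMA(1,1)
x <- diffinv(w)                                    # integrate once
f1 <- stats::arima(x, order = c(1, 1, 1), method = "ML")
f2 <- stats::arima(diff(x), order = c(1, 0, 1), include.mean = FALSE, method = "ML")
max(abs(coef(f1) - coef(f2)))                      # essentially zero
```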
para.fit.X3 = stats::arima(X3,order=c(3,1,1),include.mean = F, method = "ML") #fit the model with arima()
para.fit.X3
##
## Call:
## stats::arima(x = X3, order = c(3, 1, 1), include.mean = F, method = "ML")
##
## Coefficients:
## ar1 ar2 ar3 ma1
## 1.3751 -1.3889 0.6938 0.3102
## s.e. 0.0658 0.0603 0.0628 0.0897
##
## sigma^2 estimated as 0.8414: log likelihood = -257.33, aic = 524.66
From the estimated parameters, the model can be written as:
\((1-B)X_t=\frac{(1+0.3102B)\epsilon_t}{1-1.3751B+1.3889B^2-0.6938B^3},\ Var(\epsilon_t)=0.8414\) (here \(X_t\) denotes the series X3).
We now run residual diagnostics on this model.
par(mfrow=c(1,2))
plot(para.fit.X3$residuals, type = "p", pch = "+")
qqnorm(para.fit.X3$residuals)
for (i in 1:2) {
print(Box.test(para.fit.X3$residuals, lag = 6*i))
}
##
## Box-Pierce test
##
## data: para.fit.X3$residuals
## X-squared = 0.38337, df = 6, p-value = 0.999
##
##
## Box-Pierce test
##
## data: para.fit.X3$residuals
## X-squared = 8.9689, df = 12, p-value = 0.7056
The QQ plot (checking normality) and the scatter plot (checking randomness) show residuals very close to white noise. Since the p-values of the Box-Pierce portmanteau statistics at all tested lags are well above \(\alpha=0.05\), we fail to reject the white-noise null for the residuals, i.e. the model is adequate at the 5% significance level. We therefore adopt the ARIMA(3,1,1) model for the remainder obtained via decompose.
para.fit.X3
##
## Call:
## stats::arima(x = X3, order = c(3, 1, 1), include.mean = F, method = "ML")
##
## Coefficients:
## ar1 ar2 ar3 ma1
## 1.3751 -1.3889 0.6938 0.3102
## s.e. 0.0658 0.0603 0.0628 0.0897
##
## sigma^2 estimated as 0.8414: log likelihood = -257.33, aic = 524.66
# significance test for the ar1 coefficient
t1 = 1.3751/0.0658
# p-values via pt(); the df is the sample size minus the number of estimated parameters, here 191-4=187
p1 = 2*pt(t1, df=187, lower.tail = F)
# significance test for the ar2 coefficient
t2 = -1.3889/0.0603
p2 = 2*pt(t2, df=187, lower.tail = T)
# significance test for the ar3 coefficient
t3 = 0.6938/0.0628
p3 = 2*pt(t3, df=187, lower.tail = F)
# significance test for the ma1 coefficient
m1 = 0.3102/0.0897
p4 = 2*pt(m1, df=187, lower.tail = F)
p_value = data.frame(p1, p2,p3,p4)
row.names(p_value) = "p-value"
p_value
## p1 p2 p3 p4
## p-value 8.445218e-51 1.684701e-56 3.654305e-22 0.0006733279
All coefficients have p-values below \(\alpha=0.05\); we reject the zero-coefficient hypotheses, so all are significantly nonzero at the 5% level.
shapiro.test(para.fit.X3$residuals)
##
## Shapiro-Wilk normality test
##
## data: para.fit.X3$residuals
## W = 0.9901, p-value = 0.2081
Box.test(para.fit.X3$residuals)
##
## Box-Pierce test
##
## data: para.fit.X3$residuals
## X-squared = 2.5256e-05, df = 1, p-value = 0.996
The Shapiro-Wilk normality test gives p = 0.2081, so we cannot reject normality of the residuals.
Residual density histogram and normal QQ plot:
par(mfrow=c(1,2))
hist(para.fit.X3$residuals,main="Residuals",col=4,ylim=c(0,0.6),xlab=expression(x-hat(x)),probability = T)
lines(density(para.fit.X3$residuals))
rug(para.fit.X3$residuals)
qqnorm(para.fit.X3$residuals)
qqline(para.fit.X3$residuals)
Residual time plot and residual ACF plot:
plot(para.fit.X3$residuals,ylab="Residual",type="o",pch=16,main="Residual time plot (model 3)")
abline(h=0,lty=12)
Acf(para.fit.X3$residuals,main="ACF of residuals (model 3)",lag.max = 60)
library(forecast)
z3=forecast(para.fit.X3,h=10)
md_rand.fit.3<-z3$fitted #fitted values of the last ten remainder terms
observe.fit.3<-md_rand.fit.3[(length(md)-9):length(md)]+md_trend[(length(md)-9):length(md)]+md_seas[(length(md)-9):length(md)]
#comparison
d3<-data.frame(matrix(c(observe.fit.3,md[(length(md)-9):length(md)]),ncol=2))
colnames(d3)<-c("Predicted (model 3)","Actual")
cat("Last ten predictions from model 3 versus the actual values","\n")
## Last ten predictions from model 3 versus the actual values
d3
##    Predicted (model 3)   Actual
## 1 18209.78 18209.77
## 2 18498.75 18499.10
## 3 18435.50 18434.89
## 4 18422.07 18423.64
## 5 18968.77 18969.17
## 6 19266.73 19266.87
## 7 19210.67 19210.91
## 8 19201.48 19201.68
## 9 19744.75 19745.65
## 10 20045.38 20046.58
Comparing the last ten predictions with the actual values, the differences are tiny (at most about 2), indicating a very good fit.
Note: finally we take the non-stationary remainder from the textbook-example method and fit an ARIMA model to it directly.
X4<-md_rand.lm
plot(X4,main="Non-stationary remainder (textbook method)")
fit1.X4<-auto.arima(X4,ic="aic")
fit2.X4<-auto.arima(X4,ic="bic")
fit1.X4;fit2.X4
## Series: X4
## ARIMA(3,1,1)
##
## Coefficients:
## ar1 ar2 ar3 ma1
## 1.3579 -1.3747 0.6789 0.3597
## s.e. 0.0676 0.0617 0.0647 0.0927
##
## sigma^2 = 0.8336: log likelihood = -254.46
## AIC=518.92 AICc=519.25 BIC=535.18
## Series: X4
## ARIMA(3,1,1)
##
## Coefficients:
## ar1 ar2 ar3 ma1
## 1.3579 -1.3747 0.6789 0.3597
## s.e. 0.0676 0.0617 0.0647 0.0927
##
## sigma^2 = 0.8336: log likelihood = -254.46
## AIC=518.92 AICc=519.25 BIC=535.18
Under both AIC and BIC, auto.arima selects ARIMA(3,1,1), so we adopt this model.
para.fit.X4 = stats::arima(X4,order=c(3,1,1), include.mean=F,method = "ML") #fit the model with arima()
para.fit.X4
##
## Call:
## stats::arima(x = X4, order = c(3, 1, 1), include.mean = F, method = "ML")
##
## Coefficients:
## ar1 ar2 ar3 ma1
## 1.3579 -1.3747 0.6789 0.3597
## s.e. 0.0676 0.0617 0.0647 0.0927
##
## sigma^2 estimated as 0.8161: log likelihood = -254.46, aic = 518.92
From the estimated parameters, the model can be written as \((1-B)X_t=\frac{(1+0.3597B)\epsilon_t}{1-1.3579B+1.3747B^2-0.6789B^3},\ Var(\epsilon_t)=0.8161\) (here \(X_t\) denotes the series X4).
We now run residual diagnostics on this model.
par(mfrow=c(1,2))
plot(para.fit.X4$residuals, type = "p", pch = "+")
qqnorm(para.fit.X4$residuals)
for (i in 1:2) {
print(Box.test(para.fit.X4$residuals, lag = 6*i))
}
##
## Box-Pierce test
##
## data: para.fit.X4$residuals
## X-squared = 1.6462, df = 6, p-value = 0.9492
##
##
## Box-Pierce test
##
## data: para.fit.X4$residuals
## X-squared = 7.7913, df = 12, p-value = 0.8012
The QQ plot (checking normality) and the scatter plot (checking randomness) show residuals very close to white noise. Since the p-values of the Box-Pierce portmanteau statistics at all tested lags are well above \(\alpha=0.05\), we fail to reject the white-noise null for the residuals, i.e. the model is adequate at the 5% significance level. We therefore adopt the ARIMA(3,1,1) model for the remainder obtained via the textbook-example method.
para.fit.X4
##
## Call:
## stats::arima(x = X4, order = c(3, 1, 1), include.mean = F, method = "ML")
##
## Coefficients:
## ar1 ar2 ar3 ma1
## 1.3579 -1.3747 0.6789 0.3597
## s.e. 0.0676 0.0617 0.0647 0.0927
##
## sigma^2 estimated as 0.8161: log likelihood = -254.46, aic = 518.92
# significance test for the ar1 coefficient
t1 = 1.3579/0.0676
# p-values via pt(); the df is the sample size minus the number of estimated parameters, here 191-4=187
p1 = 2*pt(t1, df=187, lower.tail = F)
# significance test for the ar2 coefficient
t2 = -1.3747/0.0617
p2 = 2*pt(t2, df=187, lower.tail = T)
# significance test for the ar3 coefficient
t3 = 0.6789/0.0647
p3 = 2*pt(t3, df=187, lower.tail = F)
# significance test for the ma1 coefficient
m1 = 0.3597/0.0927
p4 = 2*pt(m1, df=187, lower.tail = F)
p_value = data.frame(p1, p2,p3,p4)
row.names(p_value) = "p-value"
p_value
## p1 p2 p3 p4
## p-value 1.42983e-48 1.614887e-54 1.504592e-20 0.0001445008
All four coefficients have p-values below \(\alpha=0.05\); we reject the zero-coefficient hypotheses, so all are significantly nonzero at the 5% level.
shapiro.test(para.fit.X4$residuals)
##
## Shapiro-Wilk normality test
##
## data: para.fit.X4$residuals
## W = 0.98841, p-value = 0.1197
Box.test(para.fit.X2$residuals) #NB: likely a copy-paste slip -- para.fit.X4$residuals was presumably intended; the output below is for the X2 residuals
##
## Box-Pierce test
##
## data: para.fit.X2$residuals
## X-squared = 0.011236, df = 1, p-value = 0.9156
The Shapiro-Wilk normality test gives p = 0.1197, so we cannot reject normality of the residuals.
Residual density histogram and normal QQ plot:
par(mfrow=c(1,2))
hist(para.fit.X4$residuals,main="Residuals",col=4,ylim=c(0,0.6),xlab=expression(x-hat(x)),probability = T)
lines(density(para.fit.X4$residuals))
rug(para.fit.X4$residuals)
qqnorm(para.fit.X4$residuals)
qqline(para.fit.X4$residuals)
Residual time plot and residual ACF plot:
plot(para.fit.X4$residuals,ylab="Residual",type="o",pch=16,main="Residual time plot (model 4)")
abline(h=0,lty=12)
Acf(para.fit.X4$residuals,main="ACF of residuals (model 4)",lag.max = 60)
library(forecast)
z4=forecast(para.fit.X4,h=10)
md_rand.lm.fit.4<-z4$fitted #fitted values of the last ten remainder terms
observe.lm.fit.4<-md_rand.lm.fit.4[(length(md)-9):length(md)]+md_trend.lm[(length(md)-9):length(md)]+md_seas.lm[(length(md)-9):length(md)]
#comparison
d4<-data.frame(matrix(c(observe.lm.fit.4,md[(length(md)-9):length(md)]),ncol=2))
colnames(d4)<-c("Predicted","Actual")
cat("Last ten predictions from model 4 versus the actual values","\n")
## Last ten predictions from model 4 versus the actual values
d4
##    Predicted   Actual
## 1 18209.83 18209.77
## 2 18498.91 18499.10
## 3 18435.43 18434.89
## 4 18422.17 18423.64
## 5 18968.88 18969.17
## 6 19266.87 19266.87
## 7 19210.59 19210.91
## 8 19201.64 19201.68
## 9 19744.81 19745.65
## 10 20045.57 20046.58
Comparing the last ten predictions with the actual values, the differences are tiny (at most about 2), indicating a very good fit.
In summary, after dropping the first ten observations I decomposed the remaining 192-point series in two ways, fitted ARIMA models to the remainder either directly or after differencing, validated the models via model checking, parameter significance tests, and residual analysis, and used them to forecast the last ten observations for comparison with the true values. The results are summarized below:
1. Decomposition 1:
(1) Method: decompose() extracts the seasonal component; after subtracting it from the original series, a quadratic curve is fitted to the trend; subtracting the fitted trend leaves the remainder Y1.
(2) Fitted model: writing Y for Y1, the model is ARIMA(3,1,1), with estimated form \((1-1.3751B+1.3889B^2-0.6938B^3)(1-B)Y_t=(1+0.3102B)\epsilon_t,\ Var(\epsilon_t)=0.8414\) (here \((1-B)Y_t\) corresponds to the series X1 of model 1, and \(Y_t\) to the series X3 of model 3). The model passes the model checks, parameter significance tests, and residual analysis.
(3) Its forecasts are very close to the true values, again indicating a good fit.
2. Decomposition 2:
(1) Method: a cubic curve is fitted to the trend of the 192-point series; after subtracting it, quarterly averages of the detrended series give the four seasonal terms; subtracting them leaves the remainder Y2.
(2) Fitted model: writing Y for Y2, the model is ARIMA(3,1,1), with estimated form \((1-1.3579B+1.3747B^2-0.6789B^3)(1-B)Y_t=(1+0.3597B)\epsilon_t,\ Var(\epsilon_t)=0.8161\) (here \((1-B)Y_t\) corresponds to the series X2 of model 2, and \(Y_t\) to the series X4 of model 4). The model passes the model checks, parameter significance tests, and residual analysis.
(3) Its forecasts are very close to the true values, again indicating a good fit.
3. Comparing the last ten predictions of the four models, models 1 and 3 fit somewhat better, suggesting that their parameter estimates may be closer to the true model parameters.
4. For a given decomposition, fitting the remainder directly with an ARIMA model and differencing first then fitting an ARMA model to the differenced series yield identical parameter estimates, but forecasting with the directly fitted ARIMA model gives predictions far closer to the truth (almost identical). The problem with the first two models is that inverting the differencing starts the reconstructed remainder at 0 and recurses from there, while the true initial value is not 0, so the reconstructed series is offset from the original. Taking all four models together, model 3 is the best choice for modeling and forecasting.
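The level-offset issue in point 4 comes from how diffinv() undoes a difference: unless an initial value xi is supplied, it assumes the pre-sample value is 0. A minimal sketch:

```r
# diffinv() integrates from 0 by default; passing xi restores the true level.
x <- c(5, 7, 6, 9)
dx <- diff(x)
as.numeric(diffinv(dx))            # 0 2 1 4 -- offset by x[1]
as.numeric(diffinv(dx, xi = x[1])) # 5 7 6 9 -- original series recovered
```

So in models 1 and 2, supplying the appropriate initial remainder value as xi when inverting the differencing would remove the offset.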