Laboratory 6 - Estimation, filtering and system identification - Prof. M. Taragna

Exercise: model identification for a water heater, using linear models (real data)

Introduction
Procedure
Problem setup
ARX identification, residual test and possible validation
ARX comparison
OE identification, residual test and possible validation
OE comparison
Trade-off selection

Introduction

The program code may be splitted in sections using the characters "%%". Each section can run separately with the command "Run Section" (in the Editor toolbar, just to the right of the "Run" button). You can do the same thing by highlighting the code you want to run and by using the button function 9 (F9). This way, you can run only the desired section of your code, saving your time. This script can be considered as a reference example.

clear all, close all, clc

Procedure

Load the .mat file containing input-output data
Remove the mean value from the data
Plot input-output data
Define estimation and validation datasets
For each input-output delay nk from 1 to 5
And for each order na=nb (ARX) / nb=nf (OE) from 1 to 5
Identify the ARX/OE model using the estimation dataset
Perform the whiteness test on the estimation dataset
Compute the predicted output on the validation dataset
Compute the RMSE and MDL criteria
Compare RMSE and MDL values
Select the best trade-off between all the model classes

Problem setup

% Step 1: load of data

load heater

% Step 2: remove the mean value

ue_0mean = ue - mean(ue);
ye_0mean = ye - mean(ye);
uv_0mean = uv - mean(uv);
yv_0mean = yv - mean(yv);

% Step 3: plot of input-output data

Ne = length(ue); % number of estimation data
Nv = length(uv); % number of validation data

figure, plot(ue_0mean), ylabel('ue\_0mean'),
figure, plot(ye_0mean), ylabel('ye\_0mean'),
figure, plot(uv_0mean), ylabel('uv\_0mean'),
figure, plot(yv_0mean), ylabel('yv\_0mean')

% Step 4: definition of estimation and validation datasets

DATAe = [ye_0mean,ue_0mean]; % estimation dataset
DATAv = [yv_0mean,uv_0mean]; % validation dataset

ARX identification, residual test and possible validation

To run this part in a smart way, you can put a break point at the command "close all" at the end of the external for-loop. This way you can check in figure N the corrisponding model with na=nb=N. After that you can click on "Continue" on the toolbar to evaluate for the next nk value. After the analysis you may comment the 'resid' command to speedup the next run instance.

% Step 5: loop on input-output delay

close all
for nk = 1:5,

    % Step 6: loop on model orders

    for k = 1:5,

        % Step 7: ARX(na=k,nb=k,nk) model identification

        model = arx(DATAe,[k, k, nk]);

        % Step 8: computation of model residuals and whiteness test

        % figure, resid(DATAe, model, 'corr', 30);

        % Step 9: computation of the predicted output on validation dataset

        yh = compare(DATAv, model, 1); % III param: 1 -> prediction; inf -> simulation
        % figure, plot(1:Nv,yv_0mean,'-k',1:Nv,yh,'--r'),
        % title(['ARX(',num2str(k),',',num2str(k),',',num2str(nk),')']),
        % legend('Output data yv\_0mean','Predicted output yh');
        N0 = 10;

        % Step 10: RMSE and MDL computation on validation dataset

        MSE = 1 /(Nv-N0)*norm(yv_0mean(N0+1:end)-yh(N0+1:end))^2;
        RMSE_arx(nk,k) = sqrt(MSE);

        n = 2*k;
        n_arx(k)=n;
        MDL_arx(nk,k) = n*log(Nv-N0)/(Nv-N0)+log(MSE);

    end
    close all
end

ARX comparison

The figures generated by resid show in the first part (AutoCorr) the residual values and the confidence intervals. You have to check all the generated plots: the more residual values are inside the confidence interval, the better is the model. You count the number of residuals sufficiently outside the confidence interval and you select a threshold: if this number is greater than the threshold, then the model is wasted, otherwise it is considered for further analysis. In this case, a reasonable threshold could be 3. The whiteness test results for ARX models are:

            na=1, na=2, na=3, na=4, na=5
       nk=1:  10     2     2     1     2
       nk=2:  10     2     2     1     2
       nk=3:   7     2     1     1     1
       nk=4:   4     4     1     1     0
       nk=5:   5     3     1     0     1

=> the ARX models that sufficiently satisfy the residual whiteness test are:

For nk = 1, 2, 3, 5: na = nb >= 2
For nk = 4: na = nb >= 3

Then the performance criteria can be printed

% Step 11: ARX comparison

RMSE_arx, MDL_arx
figure, plot(n_arx,RMSE_arx','o-'), title('RMSE of ARX models'),
legend('ARX(nk=1)','ARX(nk=2)','ARX(nk=3)','ARX(nk=4)','ARX(nk=5)')
figure, plot(n_arx,MDL_arx','o-'), title('MDL of ARX models'),
legend('ARX(nk=1)','ARX(nk=2)','ARX(nk=3)','ARX(nk=4)','ARX(nk=5)')

RMSE_arx =
    1.0199    0.7555    0.6790    0.6301    0.6054
    0.9524    0.7161    0.6286    0.6078    0.6035
    0.8569    0.6615    0.6077    0.6072    0.5988
    0.7309    0.6331    0.6088    0.6033    0.5996
    0.6836    0.6608    0.6394    0.6339    0.6313
MDL_arx =
    0.0533   -0.5330   -0.7325   -0.8680   -0.9340
   -0.0837   -0.6401   -0.8869   -0.9400   -0.9403
   -0.2949   -0.7985   -0.9544   -0.9420   -0.9559
   -0.6129   -0.8864   -0.9508   -0.9548   -0.9533
   -0.7468   -0.8008   -0.8525   -0.8559   -0.8502

OE identification, residual test and possible validation

% Step 5: loop on input-output delay

close all
for nk = 1:5,

    % Step 6: loop on model orders

    for k = 1:5,

        % Step 7: OE(nb=k,nf=k,nk) model identification

        model = oe(DATAe,[k, k, nk]);

        % Step 8: computation of model residuals and whiteness test

        % figure, resid(DATAe, model, 'corr', 30);

    end
    close all
end

OE comparison

The whiteness test results for OE models are:

            nb=1, nb=2, nb=3, nb=4, nb=5
      nk=1:   7    16    17    17    19
      nk=2:   7    16    17    17    14
      nk=3:  10    16    16    14    18
      nk=4:  14    16    16    18    17
      nk=5:  15    16    18    16    17

NO OE model passes the residual whiteness test with Threshold = 3 => OE is not the right model class in this case: for any order and delay, the number of residuals outside the confidence interval is too high

Trade-off selection

The goal is to minimize the RMSE and MDL criteria, taking into account the previous residual test and the model complexity n, that for models in prediction is: n=na+nb (ARX), n=nb+nf (OE) In this case, the ARX model class only has to be considered and the best trade-off between complexity and both the validation criteria is the ARX(3,3,3)

% Step 12: trade-off selection

best_tradeoff = arx(DATAe,[3,3,3]);
yh = compare(DATAv, best_tradeoff, 1); % III param: 1 -> prediction; inf -> simulation
figure, plot(1:Nv,yv_0mean,'-k',1:Nv,yh,'--r'),
legend('Output data yv\_0mean','Predicted output yh',4);

RMSE_best_tradeoff = RMSE_arx(3,3)
MDL_best_tradeoff = MDL_arx(3,3)

present(best_tradeoff)
[A,B,C,D,F] = polydata(best_tradeoff)

% When used in prediction mode, the ARX model transfer functions are:
% - between u(t) and yh(t): B(z)
% - between y(t) and yh(t): 1-A(z)
% Pay care that the polynomials A(z) and B(z) are in the z^-1 variable!!

RMSE_best_tradeoff =
    0.6077
MDL_best_tradeoff =
   -0.9544
                                                                              
best_tradeoff =                                                               
Discrete-time ARX model:  A(z)y(t) = B(z)u(t) + e(t)                          
                                                                              
  A(z) = 1 - 1.072 (+/- 0.02048) z^-1 - 0.09977 (+/- 0.03149) z^-2            
                                               + 0.2512 (+/- 0.01675) z^-3    
                                                                              
                                                                              
  B(z) = 0.3813 (+/- 0.1255) z^-3 + 1.468 (+/- 0.1734) z^-4 + 1.638 (         
                                                          +/- 0.1552) z^-5    
                                                                              
Sample time: 1 seconds                                                        
                                                                              
Parameterization:                                                             
   Polynomial orders:   na=3   nb=3   nk=3                                    
   Number of free coefficients: 6                                             
   Use "polydata", "getpvec", "getcov" for parameters and their uncertainties.
                                                                              
Status:                                                                       
Estimated using ARX on time domain data.                                      
Fit to estimation data: 93.69% (prediction focus)                             
FPE: 0.4125, MSE: 0.4112                                                      
More information in model's "Report" property.                                
A =
    1.0000   -1.0715   -0.0998    0.2512
B =
         0         0         0    0.3813    1.4678    1.6377
C =
     1
D =
     1
F =
     1