Polynomial regression in matrix notation
A self-study attempt using partial derivatives
The general problem of a least-squares approximation/regression can be stated, for instance, as a polynomial version with a polynomial of order 2, where the predicted ŷ_k are estimated from a quadratic formula in the x_k:

    ŷ_k = a + b·x_k + c·x_k²

and where we are searching for the "best" values for a, b, and c such that, as usual, the sum of squares of the residuals is minimal.
A matrix expression for the data might look like the following:

    [ y_1  y_2  …  y_n ] = [ a  b  c ] · [ 1     1     …  1    ]  +  [ r_1  r_2  …  r_n ]
                                         [ x_1   x_2   …  x_n  ]
                                         [ x_1²  x_2²  …  x_n² ]

where we have n data points, the y_k and x_k are measured, and the r_k are the residuals.
Let K denote the vector of sought coefficients, K = [a, b, c], and an extension K1 = [-1, a, b, c]; let V be the matrix of the combined Y- and X-data (with the rows Y, 1, X and X²) and R the vector of residuals after predicting Y from X, so that we have (using ' for transposition)

    K1 * V = R

(here R = Ŷ - Y; the sign convention is immaterial for the squared criterion), where we want

    R * R' = min!    // "least-squares" criterion

So also

    K1 * V * V' * K1' = min!
The inner product, divided by the scalar n, is just a covariance-type matrix which contains the means of the coproducts indicated by the indexes:

    Cov = V * V' / n =

    [ C_yy    C_y1    C_yx    C_yx²  ]
    [ C_1y    C_11    C_1x    C_1x²  ]
    [ C_xy    C_x1    C_xx    C_xx²  ]
    [ C_x²y   C_x²1   C_x²x   C_x²x² ]

where, e.g., C_yx = Σ y_k·x_k / n; note that C_11 = 1 and that the exponents combine, so C_1x = C_x, C_xx = C_x², C_x²x = C_x³ and C_x²x² = C_x⁴.
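The construction of V and Cov can be sketched numerically (a minimal sketch using numpy; the sample data here are made up purely for illustration):

```python
import numpy as np

# Made-up sample data for illustration
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = 2*x**2 - 3*x - 1          # any y-data would do here
n = len(x)

# V stacks the rows Y, 1, X, X^2 as described in the text
V = np.vstack([y, np.ones(n), x, x**2])

# Cov = V * V' / n -- the matrix of means of coproducts
Cov = V @ V.T / n

# e.g. the entry C_1x is simply the mean of x,
# and C_x2x2 is the mean of x^4
print(Cov[1, 2])   # mean(x) = 3.0
print(Cov[3, 3])   # mean(x^4) = 195.8
```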
Now let the vector of coefficients K1 be used as a diagonal matrix dK1 = diag(K1), and let the summing of the matrix elements be indicated by a sum-operation; then, writing

    T = dK1 * Cov * dK1'

the matrix T is now

    [   C_yy      -a·C_y1    -b·C_yx    -c·C_yx²  ]
    [ -a·C_1y     a²·C_11    ab·C_1x    ac·C_1x²  ]
    [ -b·C_xy     ab·C_x1    b²·C_xx    bc·C_xx²  ]
    [ -c·C_x²y    ac·C_x²1   bc·C_x²x   c²·C_x²x² ]
and we search a, b, c such that

    sum(T) = min!

We look at it with respect to a; then a must be such that the partial derivative of the sum is zero:

    ∂ sum(T)/∂a = 2·( -C_y1 + a·C_11 + b·C_1x + c·C_1x² ) = 0

The second derivative is 2·C_11, which is positive, so in the first direction the extremum at a is a minimum.
The same with respect to b; then b must be such that the corresponding sum is zero:

    ∂ sum(T)/∂b = 2·( -C_yx + a·C_x1 + b·C_x² + c·C_x³ ) = 0

The second derivative is 2·C_x², which is positive, so in the second direction the extremum at b is a minimum.
Finally the same with respect to c; then c must be such that the corresponding sum is zero:

    ∂ sum(T)/∂c = 2·( -C_yx² + a·C_x²1 + b·C_x³ + c·C_x⁴ ) = 0

The second derivative is 2·C_x⁴, which is positive, so in the third direction the extremum at c is a minimum.
Result:

In all, if a, b, c are determined by a simultaneous solution, we shall have a minimum in all directions of the function. Due to the symmetry of the matrix, each of these sums involving partial derivatives gives a very simple equation. We get a system of three equations:

    a·C_11 + b·C_x  + c·C_x² = C_y1
    a·C_x  + b·C_x² + c·C_x³ = C_yx
    a·C_x² + b·C_x³ + c·C_x⁴ = C_yx²

or, with CC denoting the symmetric matrix of x-moments on the left-hand side and Cxy the column of mixed y-moments on the right-hand side,

    CC * K' = Cxy

and the solution K by solving

    K' = CC⁻¹ * Cxy
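This system can be checked numerically (a sketch using numpy; the variable names CC and Cxy follow the text, the data are made up for illustration):

```python
import numpy as np

# Made-up quadratic data: y = -1 - 3x + 2x^2, no noise for clarity
x = np.arange(1.0, 11.0)
y = -1 - 3*x + 2*x**2
n = len(x)

# The moment matrix CC and the right-hand side Cxy from the text
CC = np.array([[1,             x.mean(),       (x**2).mean()],
               [x.mean(),      (x**2).mean(),  (x**3).mean()],
               [(x**2).mean(), (x**3).mean(),  (x**4).mean()]])
Cxy = np.array([y.mean(), (x*y).mean(), (x**2*y).mean()])

# Solve CC * K' = Cxy for K = [a, b, c]
a, b, c = np.linalg.solve(CC, Cxy)
print(a, b, c)   # recovers approximately -1, -3, 2
```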
If that model is reduced to y being a linear function of x, then the ansatz is

    ŷ = a + b·x
    Ŷ = K * X
    R = Y - Ŷ
    R * R' = min!

where the row vector K contains the coefficients [a, b] and X is the 2×n matrix with the rows 1 and x. Then we get the solution for K by solving

    [ 1    C_x  ]   [ a ]   [ C_y1 ]
    [ C_x  C_x² ] · [ b ] = [ C_xy ]

i.e. K' = (X * X'/n)⁻¹ * (X * Y'/n).
It is interesting that the usual formulas for the regression parameters become immediately visible when we consider the x- and y-data as centered; then the means C_y1 and C_x1 are zero and the system reduces to

    a = 0 ,    b·C_x² = C_xy    →    b = C_xy / C_x²

which is then the "direct" formula for the simple linear regression: the slope is the covariance of x and y divided by the variance of x.
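A quick numeric check of this slope formula against the general least-squares machinery (a sketch using numpy; the data and the small fixed "noise" are made up):

```python
import numpy as np

# Made-up linear data with a small fixed disturbance
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = 2.0 + 0.5*x + np.array([0.02, -0.01, 0.03, -0.02, 0.01, -0.03])

# Center the data: the means C_y1 and C_x1 become zero
xc = x - x.mean()
yc = y - y.mean()

# b = C_xy / C_x^2 on the centered data; a follows from the means
b = (xc*yc).mean() / (xc**2).mean()
a = y.mean() - b*x.mean()

# The same result via the general least-squares fit
b_ref, a_ref = np.polyfit(x, y, 1)   # polyfit returns highest power first
print(a, b)
```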
The pattern in the matrix form of the result suggests that it can easily be generalized to polynomials of higher order; in general, let X denote the matrix of the x-values with all m+1 required powers x⁰ to x^m. So we formulate the model with the unknown parameters to be estimated:

    ŷ = b_0 + b_1·x + b_2·x² + … + b_m·x^m
    Ŷ = B * X
    R = Y - Ŷ
    R * R' = min!

where the row vector B contains the coefficients [b_0, b_1, b_2, …, b_m].

The data matrix X has now 1+m rows:

    [ x_1⁰   x_2⁰   …  x_n⁰  ]
    [ x_1¹   x_2¹   …  x_n¹  ]
    [  ⋮      ⋮         ⋮    ]
    [ x_1^m  x_2^m  …  x_n^m ]

and Y, the matrix of y-values, looks, as before, like

    [ y_1  y_2  …  y_n ]
then again

    COV_xx = X * X' / n
    COV_xy = X * Y' / n

and the solution is

    B' = COV_xx⁻¹ * COV_xy

which is also the common formula for the multiple linear regression.
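The general case can be sketched as a small function (using numpy; the function name polyreg and the cubic test data are made up for illustration):

```python
import numpy as np

def polyreg(x, y, m):
    """Least-squares polynomial coefficients [b0, ..., bm] via the
    moment matrices COV_xx and COV_xy described in the text."""
    n = len(x)
    X = np.vstack([x**p for p in range(m + 1)])  # 1+m rows: x^0 .. x^m
    COV_xx = X @ X.T / n
    COV_xy = X @ y / n            # y taken as a plain vector here
    return np.linalg.solve(COV_xx, COV_xy)       # B' = COV_xx^-1 * COV_xy

# Made-up cubic data for illustration
x = np.arange(1.0, 9.0)
y = 0.5 - 2*x + 0.25*x**3
print(polyreg(x, y, 3))   # approximately [0.5, -2.0, 0.0, 0.25]
```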
Let's look at the matrix T again. With K1 as a row vector, we observe that T is the Hadamard (elementwise) product of Cov with the outer product of K1 with itself:

    T = Cov # (K1' * K1)

The derivative w.r.t. one of a, b, c then follows the product rule applied to the outer product, e.g. for a:

    ∂T/∂a = Cov # ( (∂K1/∂a)' * K1 + K1' * (∂K1/∂a) )

where the sum over the elements of this matrix must equal zero. Exploiting the symmetry of Cov, both summands contribute the same element sum, so each condition reduces to twice the matching row of Cov multiplied by K1', e.g.

    ∂ sum(T)/∂a = 2·( -C_1y + a·C_11 + b·C_1x + c·C_1x² ) = 0

which reproduces the system of equations derived above.
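The identity sum(T) = R*R'/n behind this construction can be verified numerically (a sketch using numpy; the `#` of the text is elementwise `*` in numpy, and K1'*K1 is np.outer; the data and the trial coefficients are made up):

```python
import numpy as np

# Made-up data and some trial coefficients a, b, c (not the optimum)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = 2*x**2 - 3*x - 1 + np.array([0.01, -0.02, 0.02, -0.01, 0.01])
n = len(x)
a, b, c = -0.9, -3.1, 2.05

V = np.vstack([y, np.ones(n), x, x**2])
Cov = V @ V.T / n
K1 = np.array([-1.0, a, b, c])

# T = Cov # (K1' * K1): Hadamard product with the outer product of K1
T = Cov * np.outer(K1, K1)

# sum(T) equals the mean squared residual R*R'/n
R = K1 @ V                      # = yhat - y
print(T.sum(), (R @ R) / n)     # the two numbers agree
```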
Dataset: the "empirical" y_k were generated using the coefficients a = -1, b = -3, c = 2 and a uniform random disturbance "rnd":

     k       x         y      ŷ_k    rnd*1000
     1    1.00     -2.01     -2.00     14.11
     2    2.00      0.99      1.00     14.35
     3    3.00      7.95      8.00     50.08
     4    4.00     19.00     19.00      2.17
     5    5.00     33.94     34.00     59.29
     6    6.00     53.00     53.00      0.96
     7    7.00     75.92     76.00     77.45
     8    8.00    102.93    103.00     65.07
     9    9.00    133.92    134.00     77.05
    10   10.00    168.93    169.00     70.81
    11   11.00    207.94    208.00     55.75
    12   12.00    250.98    251.00     20.60
    13   13.00    297.93    298.00     68.11
    14   14.00    348.94    349.00     59.29
    15   15.00    403.90    404.00     95.55
    16   16.00    462.94    463.00     64.41
    17   17.00    525.90    526.00     99.84
    18   18.00    592.98    593.00     24.38
    19   19.00    663.93    664.00     67.62
    20   20.00    738.97    739.00     29.57
The disturbance (sum of squares) of the original vector rnd is ssqr(rnd) ≈ 0.06859.
The procedure gave the matrix COV as

              y          1          x         x²
    y    118824.98     254.45    3968.40   65499.74
    1       254.45       1.00      10.50     143.50
    x      3968.40      10.50     143.50    2205.00
    x²    65499.74     143.50    2205.00   36133.30
CC is

              1          x         x²
    1        1.00      10.50     143.50
    x       10.50     143.50    2205.00
    x²     143.50    2205.00   36133.30
CC⁻¹ is

              1          x         x²
    1       11.07      -2.16       0.09
    x       -2.16       0.53      -0.02
    x²       0.09      -0.02       0.00
The solution K = [a, b, c] is found as [-0.9989, -3.0104, 2.0004].

That solution leads to an improved/smaller ssqr of the residuals:

    ssqr(res) ≈ 0.01143

This is then the least possible residual disturbance for this dataset.
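The whole procedure can be retraced numerically (a sketch using numpy; the y-values below are the rounded ones from the table, so the recovered coefficients and residual sum of squares only approximate the values reported above):

```python
import numpy as np

# The dataset from the table (y rounded to two decimals;
# note the first y is negative: 2x^2 - 3x - 1 at x=1 gives -2)
x = np.arange(1.0, 21.0)
y = np.array([-2.01, 0.99, 7.95, 19.00, 33.94, 53.00, 75.92,
              102.93, 133.92, 168.93, 207.94, 250.98, 297.93,
              348.94, 403.90, 462.94, 525.90, 592.98, 663.93, 738.97])
n = len(x)

# Moment matrices as in the text
X = np.vstack([np.ones(n), x, x**2])
CC = X @ X.T / n
Cxy = X @ y / n

# K = [a, b, c] from CC * K' = Cxy
a, b, c = np.linalg.solve(CC, Cxy)
print(a, b, c)                  # close to -1, -3, 2

# Sum of squared residuals of the fitted polynomial
res = y - (a + b*x + c*x**2)
print((res**2).sum())           # of the same order as the reported ssqr
```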