
Curve Fitting

Part 2:  Linear Least Squares

The method we used in Part 1 can be easily adapted to fitting other model functions to data if the model function is of the form

f(t) = c1 g1(t) + c2 g2(t) + ... + ck gk(t),

that is, if the model function is linear in the parameters c1, c2, ... , ck of the model.  In our quadratic model of Part 1, the component functions used in defining f(t) were g1(t) = 1, g2(t) = t, and g3(t) = t^2.

Let's see why the method of Part 1 works for the more general linear model.  Suppose the data points are (Ti, Yi), i = 1, ... , n. Define the following vectors in R^n:

y = (Y1, Y2, ..., Yn)^T,
g1 = (g1(T1), g1(T2), ..., g1(Tn))^T,
g2 = (g2(T1), g2(T2), ..., g2(Tn))^T,
...
gk = (gk(T1), gk(T2), ..., gk(Tn))^T.

Then W = Span(g1, g2, ... , gk) is the subspace of R^n consisting of the vectors of the form

c1 g1 + c2 g2 + ... + ck gk.

This subspace is k-dimensional when g1, g2, ... , gk are linearly independent.

The least squares problem is to find values of the parameters c1, c2, ... , ck that produce the vector in W closest to y.  This vector is the projection p of y onto the subspace W.
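To connect this geometric statement with the usual sum of squares: the i-th component of c1 g1 + c2 g2 + ... + ck gk is f(Ti), so the squared distance from y to that vector is the sum of the squared residuals. Written in LaTeX notation, with the subscripts made explicit:

```latex
\[
\bigl\| y - (c_1 g_1 + c_2 g_2 + \cdots + c_k g_k) \bigr\|^2
  = \sum_{i=1}^{n} \bigl( Y_i - f(T_i) \bigr)^2 ,
\qquad
f(t) = c_1 g_1(t) + c_2 g_2(t) + \cdots + c_k g_k(t).
\]
```

Minimizing the distance from y to W is therefore the same as minimizing the sum of squares of the residuals.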

As we saw in Part 1, the values of the parameters that minimize the distance from y to W are the components of the vector v = (c1, c2, ... , ck)^T that solves the normal equations

X^T X v = X^T y,

where X is the matrix whose columns are g1, g2, ... , gk.
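As an illustration only, here is a minimal Python sketch of this recipe, assuming NumPy is available and that g1, g2, ... , gk are linearly independent (so that X^T X is invertible). The function name fit_linear_model and the demonstration data are hypothetical and are not part of the module's helper application worksheet.

```python
import numpy as np

def fit_linear_model(basis, T, Y):
    """Solve the normal equations X^T X v = X^T y for a model that is
    linear in its parameters. `basis` is a list of functions g1, ..., gk."""
    T = np.asarray(T, dtype=float)
    y = np.asarray(Y, dtype=float)
    # Column j of X is the vector gj = (gj(T1), ..., gj(Tn))^T.
    X = np.column_stack([g(T) for g in basis])
    # Assumes the columns of X are linearly independent.
    v = np.linalg.solve(X.T @ X, X.T @ y)
    return v, X

# Quadratic model of Part 1: g1(t) = 1, g2(t) = t, g3(t) = t^2.
quadratic_basis = [lambda t: np.ones_like(t), lambda t: t, lambda t: t**2]

# Hypothetical data lying exactly on 1 + 2t + 3t^2 (for demonstration only).
T_demo = np.array([0.0, 1.0, 2.0, 3.0])
Y_demo = 1 + 2 * T_demo + 3 * T_demo**2

v_demo, X_demo = fit_linear_model(quadratic_basis, T_demo, Y_demo)
print(v_demo)
```

Because the demonstration data lie exactly on a quadratic, the computed parameters should reproduce its coefficients up to rounding. For poorly conditioned problems, a routine such as np.linalg.lstsq, which works with X directly rather than forming X^T X, is generally the more robust choice.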

Let's look at another example.  The data below are measurements of the signal output by a small electronic device.  The signal is sampled every half second over the given time interval.  We want to find a sinusoidal model function that provides a good fit to the observed data.
 
Sampled Signal

Time (sec)    Signal Strength (millivolts)
   -2.0           -6.32
   -1.5           -3.23
   -1.0            1.62
   -0.5            3.13
    0.0            1.74
    0.5           -0.75
    1.0           -1.41
    1.5            1.78
    2.0            8.88
    2.5            9.98
    3.0            7.10

[Figure: Scatter plot of Signal Strength data]

Because of theoretical considerations based on physical properties of such electronic devices, we believe that a likely model function for the given output is a trigonometric polynomial of the form

f(t) = a0 + a1 sin(t) + b1 cos(t) + a2 sin(2t) + b2 cos(2t).

The figure below shows such a candidate function.  We want to choose the parameters a0, a1, b1, a2, and b2 to minimize the sum of squares of the residuals, which are shown in the figure.  That is, we want the best least squares fit.
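The quantity being minimized can be written out directly. The following Python sketch (assuming NumPy) evaluates the sum of squares of the residuals for any candidate choice of the five parameters, using the sampled signal data from the table. The names trig_poly and sum_of_squares, and the candidate values, are illustrative assumptions only.

```python
import numpy as np

# Sampled signal: times (sec) and signal strength (millivolts).
T = np.linspace(-2.0, 3.0, 11)
Y = np.array([-6.32, -3.23, 1.62, 3.13, 1.74, -0.75,
              -1.41, 1.78, 8.88, 9.98, 7.10])

def trig_poly(t, a0, a1, b1, a2, b2):
    """f(t) = a0 + a1 sin(t) + b1 cos(t) + a2 sin(2t) + b2 cos(2t)."""
    return (a0 + a1 * np.sin(t) + b1 * np.cos(t)
            + a2 * np.sin(2 * t) + b2 * np.cos(2 * t))

def sum_of_squares(params, T, Y):
    """Sum of squared residuals Yi - f(Ti) for the given parameters."""
    residuals = Y - trig_poly(T, *params)
    return float(residuals @ residuals)

# An arbitrary candidate, chosen only to show the call; it is not the best fit.
candidate = (2.0, 0.0, 0.0, 0.0, 0.0)
print(sum_of_squares(candidate, T, Y))
```

The least squares fit is the choice of parameters for which sum_of_squares is as small as possible.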

[Figure: Signal Strength data with sinusoidal curve]

  1. Using the given data, the vectors

    y = (Y1, Y2, ..., Y11)^T,

    s1 = (sin(T1), sin(T2), ..., sin(T11))^T,
    c1 = (cos(T1), cos(T2), ..., cos(T11))^T,
    s2 = (sin(2T1), sin(2T2), ..., sin(2T11))^T,
    c2 = (cos(2T1), cos(2T2), ..., cos(2T11))^T,
    and
    1 = (1, 1, ..., 1)^T

    are defined in your helper application worksheet.  Solve the normal equations to find the trigonometric polynomial of best least squares fit.
     

  2. Plot the least squares trig polynomial that you just found together with a scatter plot of the signal strength data.  How good is the fit?

  3. Compute the residuals y - p and the sum of squares S of the residuals. Make a plot of the residuals versus time t.  Such a plot is called a residual plot.  What do you learn from the plot about the goodness of fit?  (Note that non-random patterns of the residuals often indicate that the model function is not an appropriate choice.)
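For readers working outside the helper application worksheet, the steps of solving the normal equations and computing the residuals might be sketched in Python as follows. This assumes NumPy is available; the variable names are illustrative and are not those of the worksheet, and the plots are left out.

```python
import numpy as np

# Sampled signal from the table: times (sec) and signal strength (millivolts).
T = np.linspace(-2.0, 3.0, 11)
y = np.array([-6.32, -3.23, 1.62, 3.13, 1.74, -0.75,
              -1.41, 1.78, 8.88, 9.98, 7.10])

# Columns of X are the vectors 1, s1, c1, s2, c2, in the same order
# as the parameters a0, a1, b1, a2, b2.
X = np.column_stack([np.ones_like(T), np.sin(T), np.cos(T),
                     np.sin(2 * T), np.cos(2 * T)])

# Solve the normal equations X^T X v = X^T y.
v = np.linalg.solve(X.T @ X, X.T @ y)

# p is the projection of y onto W; the residuals are y - p.
p = X @ v
residuals = y - p
S = float(residuals @ residuals)

print("parameters (a0, a1, b1, a2, b2):", v)
print("sum of squares S:", S)
```

Plotting T against residuals (for example with a plotting library of your choice) gives the residual plot. As a sanity check, X^T (y - p) should be zero up to rounding, since the residual vector is orthogonal to W.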



modules at math.duke.edu