
Curve Fitting

Part 1: Example:  Quadratic Fit to U.S. Population Data

In the module Least Squares, we learned how to find the best fit of a straight line to a set of data points. The method of least squares can be generalized to allow fitting more complex functions to data. In this Part, we review the method while adapting it to the problem of finding a quadratic function to fit the set of U.S. population data given below.
 
U.S. Population (census data)
Year Population (in millions)
1900 75.996
1910 91.972
1920 105.711
1930 122.775
1940 131.669
1950 150.697
1960 179.323
1970 203.185
1980 226.546
1990 248.710
Scatter plot of U.S. Population data

Our model function is a quadratic of the form y = a + b t + c t^2.  Below, we plot such a quadratic function, along with vertical line segments indicating the deviations, or residuals, from the data points to the corresponding points on the model curve. As in the "Least Squares" module, our criterion for best fit is that the best choice of quadratic curve should minimize the sum of the squares of the residuals -- hence the name "least squares."

Population data with quadratic curve
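
As a minimal sketch of the criterion just described, the following Python fragment computes the sum of the squares of the residuals for one trial quadratic. The function name sum_squared_residuals and the trial coefficients are hypothetical, chosen only for illustration; they are not the best-fit values. Time is measured in years since 1900, as in the worksheet described later in this Part.

```python
# U.S. census data from the table above (population in millions).
years = [1900, 1910, 1920, 1930, 1940, 1950, 1960, 1970, 1980, 1990]
population = [75.996, 91.972, 105.711, 122.775, 131.669,
              150.697, 179.323, 203.185, 226.546, 248.710]

# Measure time in years since 1900.
t_values = [year - 1900 for year in years]


def sum_squared_residuals(a, b, c, t_data, y_data):
    """Sum of squared vertical deviations from the curve y = a + b*t + c*t**2."""
    return sum((y - (a + b * t + c * t ** 2)) ** 2
               for t, y in zip(t_data, y_data))


# Hypothetical trial coefficients, picked by eye (NOT the least squares fit).
s_trial = sum_squared_residuals(76.0, 1.0, 0.01, t_values, population)
print("Sum of squared residuals for the trial quadratic:", s_trial)
```

Trying a few different values of a, b, and c by hand shows how the quantity changes from curve to curve; the least squares quadratic is the choice that makes it as small as possible.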

Remark about notation: As in the "Least Squares" module, we will maintain a distinction between vectors and scalars by boldfacing vector names but not boldfacing scalars.

  1. Suppose we represent the data points as

    (T_1, Y_1), (T_2, Y_2), ..., (T_{10}, Y_{10}).

    Explain why the quantity to be minimized is

    [Y_1 - (a + b T_1 + c T_1^2)]^2 + [Y_2 - (a + b T_2 + c T_2^2)]^2 + ...
    + [Y_{10} - (a + b T_{10} + c T_{10}^2)]^2.


     
  2. Our least squares problem is to find numbers a, b, and c so as to minimize the distance in R^10 between the vector

    y = (Y_1, Y_2, ..., Y_{10})^T

    and the set of vectors of the form

    (a + b T_1 + c T_1^2,  a + b T_2 + c T_2^2, ... ,  a + b T_{10} + c T_{10}^2)^T.

    Let

    t = (T_1, T_2, ..., T_{10})^T,

    t^2 = (T_1^2, T_2^2, ..., T_{10}^2)^T

    and

    1 = (1, 1, ..., 1)^T

    be vectors in R^10. Then the set of vectors described two sentences above can also be described as the set of vectors of the form

    a 1 + b t + c t^2

    where a, b, and c are real numbers.

  3. What are the possible dimensions of the model space, that is, the set of all vectors of the form a 1 + b t + c t^2? What would it mean if the dimension were something other than 3?

  4. What would it mean if the vector y were in the model space? Do you think that this is the case for the U.S. population data? Why or why not?

  5. Let  W = Span(1, t, t^2) be the subspace spanned by the three vectors defined in step 2.   If  p = a 1 + b t + c t^2 is the projection of y onto W, then explain why we can write

    p = Xv,

    where X is the matrix with columns 1, t, and t^2 and v is the solution vector (a, b, c)^T.

  6. Since p is the projection of y onto the subspace W, we know that y - p is orthogonal to the entire subspace W, and in particular to the vectors 1, t, and t^2.  Explain why this orthogonality can be expressed as

    X^T (y - p) = 0.

    Show that this implies that v must satisfy the matrix equation

    X^T X v = X^T y.

    This matrix equation consists of three scalar equations in the three parameters a, b, and c of the best fitting quadratic model.  The equations are known as the normal equations.

  7. In your helper application worksheet, you will find the vectors 1, t, t^2, and y for the U.S. population data.  Note that time is measured in years since 1900. Form the matrix X and solve the matrix form of the normal equations for the parameters a, b, and c of the best fitting quadratic.

  8. Plot the least squares quadratic that you just found together with a scatter plot of the U.S. population data.  How good is the fit?

  9. Compute the projection p of y onto the subspace W and also compute y - p.  Explain the relationship between these vectors and features of your plot from the previous step.

  10. One common measure of goodness of fit is the sum of the squares of the residuals, the least-squares function that we have minimized. Tell why this quantity can be computed as  S = norm(y - p)^2.  Compute S for the quadratic fit you have found.
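
The computational steps above can be sketched outside the helper application as follows. This is only an illustration in plain Python, not the module's worksheet: the helper function solve_linear_system is a hypothetical name, and Gaussian elimination with partial pivoting is just one reasonable way to solve the 3-by-3 normal equations. The sketch forms X, solves X^T X v = X^T y, and then computes p, y - p, and S.

```python
# U.S. census data (population in millions); time in years since 1900.
years = list(range(1900, 2000, 10))
y = [75.996, 91.972, 105.711, 122.775, 131.669,
     150.697, 179.323, 203.185, 226.546, 248.710]
t = [year - 1900 for year in years]

# Rows of X are (1, T_i, T_i^2), so its columns are the vectors 1, t, t^2.
X = [[1.0, float(ti), float(ti) ** 2] for ti in t]


def solve_linear_system(A, b):
    """Solve A v = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [A | b], copied so the inputs are not modified.
    M = [[float(x) for x in row] + [float(rhs)] for row, rhs in zip(A, b)]
    for col in range(n):
        # Swap in the row with the largest entry in this column.
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= factor * M[col][k]
    # Back substitution on the upper triangular system.
    v = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(M[r][k] * v[k] for k in range(r + 1, n))
        v[r] = (M[r][n] - tail) / M[r][r]
    return v


# Normal equations: (X^T X) v = X^T y, with v = (a, b, c)^T.
XtX = [[sum(row[i] * row[j] for row in X) for j in range(3)]
       for i in range(3)]
Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(3)]
a, b, c = solve_linear_system(XtX, Xty)

# Projection p = X v, residual vector y - p, and S = norm(y - p)^2.
p = [a + b * ti + c * ti ** 2 for ti in t]
residual = [yi - pi for yi, pi in zip(y, p)]
S = sum(r * r for r in residual)

print("a, b, c =", a, b, c)
print("S =", S)
```

One way to check such a computation is to confirm that the residual vector y - p is numerically orthogonal to each column of X, which is exactly the condition X^T (y - p) = 0 from step 6.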




modules at math.duke.edu