P. Fieguth - SD372 Homepage
Paul Fieguth
Dept. of Systems Design Engineering
Faculty of Engineering
University of Waterloo
Waterloo, Ontario
Canada N2L 3G1
pfieguth@uwaterloo.ca
Tel: (519) 888-4567 x84970
FAX: (519) 746-4791

SD372 Homepage

I'll be putting things here occasionally, although anything important will of course be announced in class.

Because of the SYDE and MTE symposia, this week's schedule has changed a bit:

  Monday, March 23    - 1:30 Lecture on clustering
  Wednesday, March 25 - 1:30 Repeat of Monday's lecture
  Wednesday, March 25 - 2:30 Tutorial regarding Lab 3
  Friday, March 27    - 1:30 Lectures continue

Mar 18: Lab 3 has been posted.

Mar 9: Midterm solutions have been posted.

Feb 27: I have put a few more pages in the Handouts section.

Feb 26: Because the midterm was on Wednesday, the schedule for this week ended up being a little ambiguous (i.e., was Wednesday a double class with a tutorial on Friday, or not?). Since the default plan, schedule-wise, is that we have class on Friday, I *will* plan on being in class tomorrow.


Final Exam: the final exam is scheduled for Monday, April 13, 9:00-11:30am, E2-1303 A,B.
A few things:
  No calculators or any other aids
  One 8.5x11 sheet of paper written on both sides (or two sheets, each written on one side) is permitted, with anything written/typed on it
  The exam will have 4-5 questions, and will have a similar balance to previous exams.

  Important Dates:
      Last Class     Wed April  1, 1:30-3:30
      Last Tutorial  Fri April  3, 1:30-2:30
      Lab 3 Due      Fri April  3, Email to y30liu@engmail

      Office Hour    Wed April  8, 1:00 - 2:30   P. Fieguth  DC-2615
      Office Hour    Thu April  9, 1:00 - 2:30   Amir        DC-2628
      Office Hour    Fri April 10, 1:00 - 2:30   Ying        DC-2620

      Final Exam     Mon April 13, 9:00 - 11:30  E2-1303A/B

Midterm Results:


Overview Material

2009 Course Syllabus

Office Hours

Office hours are:
   Wednesdays, 3:30 - 4:30  Tutorial TA  E2-1303A  (if the tutorial is on Wednesday, the office hour is held in-class after the tutorial; the TA will leave once there are no students around)
   Fridays,    2:30 - 3:30  Tutorial TA  E2-1303A  (if the tutorial is on Friday, the office hour is held in-class after the tutorial; the TA will leave once there are no students around)
   Mondays,   12:30 - 1:20  Fieguth      DC-2643
To talk with me, the easiest arrangement is to ask questions after class, at 2:20 on Monday, Wednesday, or Friday. Since there is no class after ours, I will generally hang around rather than leaving right away at the end of class. I will, of course, schedule additional office hours before the midterm and final exams.

Handouts

  1. GED Classification Boundary Sketching: The following three plots did not reproduce very well in your notes (Figure 3.2), so I'm including the originals here. If you look carefully, you will see how the equi-distance contours (dotted blue curves) intersect at the classification boundary (the interface between the white and grey regions). A past TA has also prepared some additional sketches, and the three-cluster example from the in-class demo is here. (A Matlab sketch for plotting such boundaries appears after this list.)

  2. Overview of Distance-Based Classification Just to make sure that students are getting the big picture, here are two pages that give an overview of distance-based classification:

  3. Eigendecomposition Discussion: The three pages which I showed you in class are here (scans of my handwritten pages). (A small numerical check appears after this list.)

  4. Gaussian ML/MAP and Thresholds: Here is a one-page handout that discusses the thresholds for ML and MAP in the Gaussian case, and clarifies the multiple ways in which the thresholds can be written. (A worked special case appears after this list.)

  5. ML Parameter Estimation Example: Here is a short handout that gives the derivation of the ML estimator for the mean and variance of a Gaussian distribution:

  6. ML Parameter Estimation and Bias: Slightly repetitious relative to the previous handout, but here bias is discussed in a bit more depth. (A small numerical illustration appears after this list.)

  7. kNN Examples: Here are a couple of examples to help you practice kNN nonparametric estimation. These are all computer-generated in Matlab, and so show only the final answers, not how to solve the problems. You should be able to do the first two by hand; the last two are there for context. To see a worked solution, look at the weekly sample problem for week 9. (A minimal kNN sketch appears after this list.)
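
Several of the handouts above lend themselves to small numerical experiments. The Matlab sketches below are my own illustrative examples, not reproductions of the handouts, and every number in them (means, covariances, priors, sample and window sizes) is made up for illustration only.

For handout 1, this sketch computes the squared GED (Mahalanobis) distance to each of two hypothetical classes and plots the equi-distance contours together with the boundary where the two distances are equal:

  mu1 = [0;0];  S1 = [4 1; 1 2];        % hypothetical class A mean / covariance
  mu2 = [4;3];  S2 = [2 0; 0 3];        % hypothetical class B mean / covariance
  [x,y] = meshgrid(-5:0.1:9, -5:0.1:8);
  dA = zeros(size(x));  dB = zeros(size(x));
  iS1 = inv(S1);  iS2 = inv(S2);
  for i = 1:numel(x)
    v = [x(i);y(i)] - mu1;  dA(i) = v' * iS1 * v;   % squared GED to class A
    v = [x(i);y(i)] - mu2;  dB(i) = v' * iS2 * v;   % squared GED to class B
  end
  contour(x,y,dA,'b:');  hold on;       % equi-distance contours, class A
  contour(x,y,dB,'r:');                 % equi-distance contours, class B
  contour(x,y,dA-dB,[0 0],'k');         % classification boundary, where dA = dB
  hold off

For handout 3, any eigendecomposition claim is easy to verify numerically:

  S = [4 1; 1 2];              % a hypothetical covariance matrix
  [V,D] = eig(S);              % columns of V: eigenvectors; diag(D): eigenvalues
  V*D*V' - S                   % reconstructs S, so this should be (numerically) zero
  % the eigenvectors give the axis directions of the equi-probability ellipses,
  % and the square roots of the eigenvalues give the axis lengths

For handout 4, in the special case of two 1D Gaussians with equal variance, the MAP threshold is the ML threshold (the midpoint of the means) plus a prior-dependent shift:

  muA = 3;  muB = 1;  s2 = 1;              % hypothetical means, shared variance
  PA  = 0.7;  PB = 0.3;                    % hypothetical priors
  xML  = (muA + muB)/2;                    % ML threshold: midpoint of the means
  xMAP = xML + s2/(muA - muB)*log(PB/PA);  % MAP threshold (choose A if x > xMAP)
  % since PA > PB, the threshold shifts toward muB, enlarging class A's region

For handouts 5 and 6, a small Monte Carlo experiment makes the bias of the 1/N variance estimate visible:

  N = 5;  T = 10000;  vML = zeros(T,1);  vUB = zeros(T,1);
  for t = 1:T
    z = randn(N,1);                     % samples with true mean 0, variance 1
    m = mean(z);
    vML(t) = sum((z-m).^2)/N;           % ML variance estimate (biased)
    vUB(t) = sum((z-m).^2)/(N-1);       % unbiased variance estimate
  end
  [mean(vML)  mean(vUB)]                % approximately [(N-1)/N, 1] = [0.8, 1]

For handout 7, a minimal 1D kNN density estimate: at each point x the estimate is k/(N*V), where the volume V = 2r is set by the distance r to the k-th nearest sample:

  data = randn(100,1);  k = 10;  N = length(data);
  xs = linspace(-4,4,200);  p = zeros(size(xs));
  for i = 1:length(xs)
    r = sort(abs(data - xs(i)));        % sorted distances to all samples
    p(i) = k/(N * 2*r(k));              % k/(N*V), with V = 2 * k-th NN distance
  end
  plot(xs,p)                            % compare against the true N(0,1) PDF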

Tutorial Schedule

The tutorials are meant to cover areas of potential difficulty for students, and to allow a more informal time for questions and answers. We will try to plan the tutorials quite carefully; in particular, there will be two types of tutorials:
  1. Teaching tutorials, with 20 minutes of review or teaching on a topic, 20 minutes to solve an example problem, and the remaining time open for student questions.
  2. Lab tutorials, which will focus on discussion relevant to one of the three labs this term. The TA will start with 10-20 minutes of clarifications and suggestions, with the rest of the time available for questions.
Please give me comments after class if you feel you need help in some area. Note that some tutorials will take place on Fridays and some on Wednesdays. The Wednesday/Friday schedule is not yet accurate past January.
   Date                              Style  Content
   --------------------------------  -----  -----------------------------------------
   Week  1 - Friday,    January   9  Teach  Statistics & Algebra Review
   Week  2 - Wednesday, January  14   Lab   Matlab Overview & Information (see below)
   Week  3 - Friday,    January  23  Teach  Classifier Overview, Nonparametric Methods
   Week  4 - Friday,    January  30  Teach  Eigendecompositions, Cov. sketching, GED sketch
   Week  5 - Wednesday, February  4   Lab   Lab 1 discussion and help
   Week  7 - Wednesday, February 18   ---   (reading week)
   Week  8 - Monday,    February 23  Teach  GED Sketching, MAP, Pr(Error), Midterm Review
   Week  8 - Wednesday, February 25   ---   (midterm)
   Week  9 - Wednesday, March     4   Lab   Lab 2
   Week 10 - Friday,    March    13  Teach  Parameter Estimation, Nonparametric Estimation (Parz, kNN)
   Week 11 - Friday,    March    20  Teach  Discriminants
   Week 12 - Friday,    March    27   Lab   Lab 3 discussion and help
   Week 13 - Friday,    April     3  Teach  Discrim / Clustering

Weekly Sample Problems

Some of the problems at the end of the chapters in the course notes aren't so helpful. I have tried to update the notes to include more realistic problems; however, I have also started to prepare new problems, which I'll be posting here on a weekly basis. These problems should be roughly similar to those you might expect to see on an exam, and I usually give some sort of partial solution.
  1. Week 1: Basic PDFs, means, and variances.

  2. Week 2: 2D PDFs, marginal PDFs.

  3. Week 3: Basic ellipse plotting, MED.

  4. Week 4: Covariances & Eigendecompositions.

  5. Week 5: Distance-Based Classification.

  6. Week 6: Classifier Error Evaluation.

  7. Week 7: Parameter Estimation.

  8. Week 8: ML Estimation (*** Challenging ***).

  9. Week 9: Non-Parametric Estimation.

  10. Week 10: Discriminants.

  11. Week 11: Unlabeled / Hierarchical.

Suggested Homework Problems

Although there are no "problem sets" in this course, the ability to solve classification problems by hand is very important, so all students should really practice on sample problems. Some of the problems in the course notes are somewhat advanced and probably beyond the level of most undergraduate students; however, there are a number of more basic problems which everyone should be able to do, and these are strongly recommended!

Students who want some basic background problems in statistics should consult Shanmugan & Breipohl (on reserve, TK5102.5.S447):

   Examples 2 - 2, 3, 12
   Problems 2 - 1, 9, 12, 18, 23, 29, 33, 42
There are two other excellent references for students needing some help with statistics and related background: Appendices 1 and 2 in Schalkoff (on reserve, Q327.S27), and Appendix A in the book by Duda, Hart, & Stork.

Suggested problems from course notes:

   Chapter 2:   1    3   5   6    7
   Chapter 3:   1    2   7   8ab  12   13   14
   Chapter 4:   1abc 2   3   7abc  8abcd    10  (typo in 4.8: '3/2', not '2/3')
   Chapter 5:   1    4
   Chapter 6:   3    4
   Chapter 7:   3    4
   Chapter 8:   1    2abd 3   4
Old 372 midterms and finals are available from the exambank. Alternatively, here are PDF versions of my 2001 midterm, 2001 final, and 1997 final exams. The 1997 final is probably a bit on the hard side, however you may find the questions useful for studying purposes.

Solutions to Problems in Course Notes

Many of the solutions below are very poorly done, or written out with poor handwriting. For detailed solutions, please talk to your professor or to one of the TAs. However, if you just want to check whether your answer is right, the summary solutions below should be OK:

Chapter 2: Questions 2.1 - 2.5, Question 2.6ab, Question 2.6c, Question 2.7
Chapter 3: Questions 3.1 - 3.4, Questions 3.5 - 3.6, Questions 3.7 - 3.8, Questions 3.9 - 3.10, Question 3.11, Question 3.12, Question 3.13, Question 3.14
Chapter 4: Questions 4.1 - 4.2, Questions 4.3 and 4.7, Question 4.4, Question 4.5, Question 4.6, Question 4.8, Question 4.9, Question 4.10
Chapter 5: Question 5.1, Question 5.4
Chapter 6: Questions 6.1, 6.2, 6.4, Question 6.4c (i), Question 6.4c (ii)
Chapter 7: Question 7.2a, Question 7.2b, Question 7.3, Question 7.4
Chapter 8: Question 8.1, Question 8.2, Question 8.3, Question 8.4

Reference Material

If students would like some alternate references for course material and for additional problems to work on, I would suggest two books:

Duda, Hart & Stork (Q327.D83, On Reserve) Pattern Classification
(There are two editions: the first, by Duda & Hart alone, was titled Pattern Classification and Scene Analysis; the second is quite new, and is an excellent book. Any student seriously interested in pattern recognition could consider purchasing this book.)

Schalkoff (Q327.S27, On Reserve) Pattern recognition : statistical, structural, and neural approaches
(An excellent book; not as comprehensive as Duda, Hart & Stork, but considerably more readable.)

        Topic                 SD372        Schalkoff      Duda & Hart   Duda, Hart & Stork
                           Course Notes                   (First Ed.)      (Second Ed.)

Introduction                  Ch  1         Ch 1              Ch 1            Ch 1
Statistics Background         Ch  2         Appen. 1, 2                       Appen A
Distance Classification       Ch  3                                   
Statistical Classification    Ch  4         Ch 2              Ch 2            Ch 2
Parameter Estimation          Ch  5.1       Ch 3 (p.58-70)    Ch 3            Ch 3
NonParametric Estimation      Ch  5.2       Ch 3 (p.70-75)    Ch 4            Ch 4
Linear Discriminants          Ch  6         Ch 4              Ch 5            Ch 5
Unlabeled Clustering          Ch  7         Ch 5              Ch 6            Ch 6
Feature Selection             Ch  8                           Ch 6.14         Ch 1.3

Lab Information

There will be three labs this term. The lab reports will be done in groups of two or three; the reports should be formal (with an introduction, neatly organized into sections, etc.), but not long. I have a grammar and style guide which students should take a look at. The first lab will be handed out towards the end of January. Each of the labs will be due two weeks after being handed out. I strongly suggest that you use Matlab; here are several demo files which will help you get started:

A very simple demo page, with a simple Matlab script which calls a short function.

Lab 0 Tutorial for SD372

Second Matlab Tutorial for SD372

Lab 1

The Assignment for Lab 1, a four-page PDF file.

Lab 1 is due by Monday, February 9 at 5pm. Since that is the week before reading week, with midterms, I strongly suggest that students get started early. You already know enough to do nearly the entire lab; only MAP has not yet been discussed in class. In the past students have spent an awful lot of time trying to figure out how to generate the appropriate plots in Matlab, so I am providing you with the appropriate routine here:

Ellipse plotting routine for Lab #1.
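
If you would like to see the idea behind such a plotting routine (this is my own hedged sketch, not necessarily what the routine above does, and the mean and covariance are made-up values), the standard construction maps points on a unit circle through the eigendecomposition of the covariance:

  mu = [1;2];  S = [4 1; 1 2];          % hypothetical mean and covariance
  [V,D] = eig(S);                       % eigenvectors / eigenvalues of S
  th = linspace(0, 2*pi, 100);
  circ = [cos(th); sin(th)];            % points on the unit circle
  ell  = V*sqrt(D)*circ + mu*ones(1,100);   % mapped onto the 1-sigma ellipse
  plot(ell(1,:), ell(2,:))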

Lab 2

Lab Assignment: The assignment for Lab 2 can be found here.

Lab Data Sets:

Lab Comments: For the 2D case, the ideas don't really get much more complicated than 1D. A couple of pointers:

2D Parzen Routine: A few students asked about my 2D Parzen code, saying that they weren't really clear how to use it:
  [p,x,y] = parzen( data, res, win )    

  Input Parameters:
  
    data - a two-column matrix; each row [x y] defines one point

    res  - determines the spatial step between PDF estimates.
           This should be a vector of five values  [res lowx lowy highx highy],
           giving the step size and the limits along the x and y axes over
           which the PDF should be estimated.
           For example, to estimate a PDF over -1 < x < 1  and  3 < y < 7,
           with estimates spaced 0.01 units apart, the vector should be
           [0.01 -1 3 1 7]

    win  - The code says this is optional, but the default window in the code
           is not the one you need to use for Lab 2, so for this lab it is
           NOT optional.
           You should define the window as a matrix.  For example, a rectangular
           window would be the easiest one to do: to get a 10x10 rectangular
           window we would pass in ones(10,10).  In your case you need a Gaussian
           window - this takes a bit more thought:  how do you create a matrix
           with a Gaussian shape?  How big should the matrix be?

  Returned Parameters:

    x    - estimated locations along x-axis; this is just  [lowx:res(1):highx]
    y    - estimated locations along y-axis; this is just  [lowy:res(1):highy]
    p    - estimated 2D PDF, a matrix of values
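
As a hedged usage sketch (assuming parzen() behaves as documented above; the data, window size, and standard deviation below are made-up illustrative values, not the ones needed for the lab):

  data  = [0.5*randn(500,1), 5 + 0.8*randn(500,1)];  % fake two-column [x y] samples
  sigma = 0.1;                          % Gaussian window std. dev., in data units
  res1  = 0.01;                         % spatial step between estimates
  n     = round(6*sigma/res1);          % window spans roughly +/- 3 sigma
  [wx,wy] = meshgrid( ((1:n) - n/2)*res1 );
  win   = exp( -(wx.^2 + wy.^2)/(2*sigma^2) );   % matrix with a Gaussian shape
  [p,x,y] = parzen( data, [res1 -1 3 1 7], win );
  contour(x, y, p)                      % visualize the estimated 2D PDF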

The aggregation part of the lab will not be based directly on material seen in class; however, everything you need to know is specified in the lab handout.

Lab 3

Lab Assignment: The assignment for Lab 3 can be found here.

Here is the ZIP file containing the images, Matlab code, and features which you need for lab 3. There are 13 files:

  1. Ten images
  2. Two Matlab .m files - one to read the images, the other to plot features
  3. A Matlab .mat file which contains the feature values
For ease of visualization, so that you don't have to plot the images yourself, the ten images are shown below:

Cloth, Cotton, Grass, Pigskin, Wood
Cork, Paper, Stone, Raiffa, Face

There is also the combined image, for segmentation, shown below:

Course Notes

Your course notes have been revised considerably from the material of a few years back. However, if you have any feedback, or have errors to point out, it would be appreciated. Certainly the chapters still need more sample problems, and many of the figures need to be improved.




(Page last updated September 13, 2016)