Syllabus
This course is an introduction to data analysis and applied statistics for students who have taken a course in probability and a course in linear algebra. Topics include multiple regression, analysis of variance, and nonparametric methods. It is a full-semester course with three hours of lectures and a one-hour recitation each week. Data analysis is difficult without some computing tools, so the recitations, in addition to answering questions about the course material, will introduce you to statistical computing.
Instructor
Dr. Elizabeth Newton. Often, I am available after class, but the best way to see me is to schedule some time by phone or email. Office hours will be announced. Information about the course will be posted on the class web site.
Text
Statistics and Data Analysis: From Elementary to Intermediate (SDA) by Ajit C. Tamhane and Dorothy D. Dunlop (Prentice Hall, 2000). I will put some other books on reserve. Information about the text and any errors it contains can be found at the authors' web site.
Lectures
I will use overheads during lectures and will put copies on the class web site. Other details will be given on the board. You should read the material before lecture so that you have some idea of what will be discussed, even if you don't understand everything. Please ask questions as I go along. Most of our time will be spent covering more difficult material rather than things you can understand easily. Class lectures definitely will not replace reading the textbook, because I will often work through examples rather than repeat details that are in the text. After a first reading and the lectures, you should attempt the homework. This will require you to reread the material and generate some new questions that you should bring up in recitation or in office hours.
Recitations
Generally, the Teaching Assistant will conduct the recitations and cover material related to data analysis and statistical computing. There will also be time to discuss homework problems and examples and to clear up any confusion from lectures.
Grading and Exams
The idea is to have everyone learn the material. Grades are required, but low grades are an indication that both you and I have failed to do our job. If you are having problems, don't let them slide until the end. There will be homework every week or ten days that will be graded and returned. (Sampling may be used; i.e., only a portion of the problems may be graded. However, solutions will be provided to all of them.) Once we have handed out the solution sheet for a homework set (usually at the next class after it is due), late homework will not be accepted.
The midterm will be a 1.5-hour in-class open-book examination and the final will be a scheduled 3-hour open-book examination. If it appears to be necessary, there may be quizzes during the semester covering material that is emphasized in lecture.
Without quizzes, the homework will count 35%, class participation 10%, the midterm 20%, and the final 35%. If quizzes become necessary, these percentages will be adjusted accordingly.
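As a rough illustration of how the no-quiz weights combine, the short Python sketch below computes a weighted course score. It assumes each component is scored on a 0-100 scale, which the syllabus does not state; the function name course_score and the sample scores are hypothetical and serve only as an example.

```python
# Weights stated in the syllabus for a semester without quizzes.
WEIGHTS = {
    "homework": 0.35,
    "participation": 0.10,
    "midterm": 0.20,
    "final": 0.35,
}


def course_score(scores):
    """Return the weighted course score for component scores on a 0-100 scale.

    The 0-100 scale is an assumption made for this sketch.
    """
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing components: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


# Hypothetical student scores, for illustration only.
example = {"homework": 90, "participation": 100, "midterm": 80, "final": 85}
total = course_score(example)
print(round(total, 2))
```

If quizzes are introduced, the weights in this sketch would need to be replaced by the adjusted percentages, which the syllabus leaves unspecified.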
Computing
Many data analysis packages are available. This semester we will use S-PLUS®, which is easily available at MIT and widely used in teaching and industry. You may use another package if you wish, but we cannot promise support. The server has S-PLUS®, SAS® and STATA®. The Sloan Computing Labs support S-PLUS®, SAS®, JMP®, SPSS®, and STATA®. MATLAB® is also a possibility for those who want to do some programming. We have found that some knowledge of S-PLUS®, SAS® or SPSS® can lead to good academic-year, summer, and permanent jobs.
There are many introductory books on S-PLUS®, including An Introduction to S-Plus® for Windows by Longhow Lam and The Basics of S and S-Plus® by Krause and Olson, both available from the Insightful Corporation.
Datasets not on the disk at the back of the textbook will be on the file server.
Home Computing
There are student versions of S-PLUS®, SAS®, JMP®, STATA® and SPSS® that allow you to have these packages on your home computer. An S-PLUS® 6.1 CD will be available, which students can copy for installation on their home machines.
Work Load
This is a 4-0-8 course. We will have three main hours of lecture and one additional hour of recitation or demonstration each week. Homework should take the median student about 8 hours each week. If we have misjudged this load (most often because computing can take more time than we think), please let us know and we will see how the rest of the class feels as well.
Feedback
Please let me (or the TA) know (anonymously, if you wish) what is going right and what is going wrong with lectures, homework, content, etc. Filling out forms at the end of the course will help future students, but will not help you while you are taking the course. It is far better for all of us if we can work on these problems during the course and leave you satisfied at the end.
Academic Honesty
It is best to attempt the homework on your own and then ask us questions. In a pinch, talk to your classmates for clarification. What goes on your homework paper should be your own work. As a statistician, I expect variation among students both in correct and incorrect solutions. Lack of such variation has led to embarrassing questions and reduced grades in the past. The exams should, of course, be entirely your own work. Any evidence of cheating will result in a failing grade for the course and disciplinary action through the appropriate MIT procedures and committees. This applies to those who give help as well as to those who receive it.
MIT's academic honesty policy can be found at MIT Policies and Procedures.
JMP® and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration.
MATLAB® is a trademark of The MathWorks, Inc.
S-PLUS® is a registered trademark of Insightful Corporation.
SAS® and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration.
SPSS® is a registered trademark of SPSS Inc.
STATA® and the STATA® logo are registered trademarks of StataCorp LP.
