Faculty of Arts & Sciences

Department - STAT

Statistics 91r Supervised Reading and Research
David P. Harrington (FAS/Public Health) and members of the Department

Supervised reading and research in an area of statistics agreed upon by the student and a faculty adviser.

Statistics 98 Tutorial - Junior Year
David P. Harrington (FAS/Public Health) and Viktoriia Liublinska

Introduction to reading, writing, presenting, and research in statistics. Students will learn to formulate and approach a research question, critically review papers that make use of statistics, and clearly communicate statistical ideas and arguments orally and in writing. Limited to junior concentrators in statistics.

Statistics 99hf Tutorial - Senior Year
David P. Harrington (FAS/Public Health)

Supervised research for the senior thesis, under the mentorship of a Harvard faculty member.

Statistics 100 Introduction to Quantitative Methods for the Social Sciences and Humanities
Luke Weisman Miratrix

Introduces the basic concepts of statistical inference and statistical computing, both increasingly used in the social sciences and humanities. The emphasis of this course is on statistical reasoning, visualization, data analysis, and use of statistical software instead of theory. The goal is to provide pragmatic tools for assessing statistical claims and conducting basic statistical analyses. The main areas covered are classic one- and two-sample statistics, regression with one or more predictors, and bootstrap and randomization based inference. Explores applications in a wide range of fields, including the social and political sciences, medical research, and psychology.

Statistics 101 Introduction to Quantitative Methods for Psychology and the Behavioral Sciences
Kevin A. Rader

Similar to Statistics 100, but emphasizes concepts and practice of statistics used in psychology and other social and behavioral sciences. Topics covered: describing center and variability; probability and sampling distributions; estimation and hypothesis testing for comparing means and comparing proportions; contingency tables; correlation and regression; multiple regression; analysis of variance. Emphasis on translation of research questions into statistically testable hypotheses and models, and interpretation of results in context.

Statistics 102 Introduction to Statistics for Life Sciences
David P. Harrington (FAS/Public Health)

Introduces the basic concepts of probability, statistics and statistical computing used in medical and biological research. The emphasis is on data analysis and visualization instead of theory. Designed for students who intend to concentrate in a discipline from the life sciences.

Statistics 104 Introduction to Quantitative Methods for Economics
Michael Isaac Parzen

A rigorous introduction to statistics for students intending to study economics. Examples drawn from finance, decision analysis and economic decision-making. In addition to descriptive statistics, probability, inference and regression modeling, also covers portfolio optimization, decision analysis, and time series analysis. Students with prior exposure to introductory statistics will find some overlap of material but be exposed to new applications and learn more advanced modeling techniques.

Statistics 107 Introduction to Business and Financial Statistics
Michael Isaac Parzen

Introduces the technical skills required for data-driven analysis of business and financial data. Emphasis on applying statistical methods to summarize and make inferences from complex data and to develop quantitative models to assist business decision making. Topics include: how to collect and summarize financial data, understanding the concept of risk, portfolio construction and analysis, testing trading systems, and simulation techniques.

Statistics 110 Introduction to Probability
Kevin Andrew Rader

A comprehensive introduction to probability. Basics: sample spaces and events, conditional probability, and Bayes' Theorem. Univariate distributions: density functions, expectation and variance, Normal, t, Binomial, Negative Binomial, Poisson, Beta, and Gamma distributions. Multivariate distributions: joint and conditional distributions, independence, transformations, and Multivariate Normal. Limit laws: law of large numbers, central limit theorem. Markov chains: transition probabilities, stationary distributions, convergence.

Statistics 111 Introduction to Theoretical Statistics
Kevin Andrew Rader

Basic concepts of statistical inference from frequentist and Bayesian perspectives. Topics include maximum likelihood methods, confidence and Bayesian interval estimation, hypothesis testing, least squares methods and categorical data analysis.

Statistics 115 Introduction to Computational Biology and Bioinformatics
Xiaole Shirley Liu (Public Health)

The course will cover basic technology platforms, data analysis problems and algorithms in computational biology. Topics include sequence alignment and search, high throughput experiments for gene expression, transcription factor binding and epigenetic profiling, motif finding, RNA/protein structure prediction, proteomics and genome-wide association studies. Computational algorithms covered include hidden Markov models, the Gibbs sampler, and clustering and classification methods.

Statistics 120 Introduction to Applied Bayesian Inference and Multilevel Models
Edoardo Maria Airoldi

An introduction to the nuances of statistical inference in applied contexts. Frequentist and Bayesian techniques. A variety of classic and modern models for high-dimensional, categorical, sequence, spatial and network data. Evaluation techniques for modeling assumptions and inference strategies. Hands-on implementation of estimation and inference procedures in R. Knowledge of R programming is required (and assumed).

Statistics 121 Data Science
Rafael A. Irizarry (Public Health) and Verena S. Kaynig-Fittkau

Learning from data in order to gain useful predictions and insights. This course introduces methods for five key facets of an investigation: data wrangling, cleaning, and sampling to get a suitable data set; data management to be able to access big data quickly and reliably; exploratory data analysis to generate hypotheses and intuition; prediction based on statistical methods such as regression and classification; and communication of results through visualization, stories, and interpretable summaries. Built around three modules: prediction and elections, recommendation and business analytics, and sampling and social network analysis.

Statistics 123 Applied Quantitative Finance
Stephen James Blyth

Introduction to financial derivatives and the probabilistic techniques used to analyze them. Topics include: forwards, swaps and options; replication, no-arbitrage and risk-neutrality; martingales, numeraires and the fundamental theorem of asset pricing; and an introduction to interest-rate derivatives and their valuation. Provides a rigorous but accessible treatment of the elegant theory underpinning quantitative finance, motivated by real problems from the financial industry.

Statistics 131 Time Series Analysis and Forecasting
Neil Shephard

Introduction to time series models and associated methods of data analysis and inference. Autoregressive (AR), moving average (MA), ARMA, and ARIMA processes, stationary and non-stationary processes, seasonal processes, autocorrelation and partial autocorrelation functions, identification of models, estimation of parameters, diagnostic checking of fitted models, forecasting, spectral analysis, and transfer function models.

Statistics 135 Statistical Computing Software
Steven Richard Finch

An introduction to major statistics packages used in academia and industry (SAS and R). Will discuss data entry and manipulation, implementing standard analyses and graphics, exploratory data analysis, simulation-based methods, and new programming methods.

Statistics 139 Statistical Sleuthing Through Linear Models
Viktoriia Liublinska (fall term) and Kevin Andrew Rader (spring term)

A serious introduction to statistical inference with linear models and related methods. Topics include t-tools and permutation-based alternatives, multiple-group comparisons, analysis of variance, linear regression, model checking and refinement, and causation versus correlation. Emphasis on thinking statistically, evaluating assumptions, and developing tools for real-life applications.

Statistics 140 Design of Experiments
Tirthankar Dasgupta and Donald B. Rubin

Statistical designs for efficient experimentation in the physical, life, social and management sciences and in engineering. A systematic approach to explore input-output relationships by deliberately manipulating input variables. Topics include completely randomized and randomized block designs, Latin square designs, balanced incomplete block designs, factorial designs, confounding in blocks, fractional replications, and re-randomization. Each topic motivated by real-life examples.

Statistics 149 Statistical Sleuthing through Generalized Linear Models
Mark E. Glickman (Boston University)

Sequel to Statistics 139, emphasizing common methods for analyzing continuous non-normal and categorical data. Topics include logistic regression, log-linear models, multinomial logit models, proportional odds models for ordinal data, Gamma and inverse-Gaussian models, over-dispersion, analysis of deviance, model selection and criticism, model diagnostics, and an introduction to non-parametric regression methods.

Statistics 160 Design and Analysis of Sample Surveys
Alan M. Zaslavsky (Medical School)

Methods for design and analysis of sample surveys. The toolkit of sample design features and their use in optimal design strategies. Sampling weights and variance estimation methods, including resampling methods. Brief overview of nonstatistical aspects of survey methodology such as survey administration and questionnaire design and validation (quantitative and qualitative). Additional topics: calibration estimators, variance estimation for complex surveys and estimators, nonresponse, missing data, hierarchical models, and small-area estimation.

Statistics 170 Quantitative Analysis of Capital Markets
Neil Shephard

An introduction to the analysis of capital markets using quantitative methods. Concepts include risk, expected utility, discounting, binomial-tree valuation methods, martingales, continuous time stochastic calculus methods, stochastic discount factors, financial econometric models and Monte Carlo simulations. These concepts are applied to equities, risk management and derivative pricing.

Statistics 171 Introduction to Stochastic Processes
Natesh S. Pillai

An introductory course in stochastic processes. Topics include Markov chains, branching processes, Poisson processes, birth and death processes, Brownian motion, martingales, introduction to stochastic integrals, and their applications.

Statistics 183 Learning from Big Data
Luke Bornn

Through a series of forecasting and prediction competitions, each based on a large real-world dataset, students will acquire the tools and experience to explore and model large-scale, real-life data. In addition, the course will cover a series of tools for statistical modeling in real-world environments. Some examples include bagging, boosting, collaborative model development, cross-validation, and model validation and verification.

Statistics 186 Statistical Methods for Evaluating Causal Effects
Fabrizia Mealli

Statistical methods for inferring causal effects from data from randomized experiments or observational studies. Students will develop expertise to assess the credibility of causal claims and the ability to apply the relevant statistical methods for causal analyses. Examples from many disciplines: economics, education, other social sciences, epidemiology, and biomedical science. Evaluations of job training programs, educational voucher schemes, changes in laws such as minimum wage laws, medical treatments, smoking, military service.

Statistics 201 Statistical Communication and Graphics
Andrew Gelman

Statistics 210a Probability Theory
Jun S. Liu and Carl N. Morris

Random variables, measure theory, reasoning by representation. Families of distributions: Multivariate Normal, conjugate, marginals, mixtures. Conditional distributions and expectation. Convergence, laws of large numbers, central limit theorems, and martingales.

Statistics 210b Topics in Probability Theory
Natesh S. Pillai and Alexander Volfovsky

A graduate introduction to advanced probability. Foundations of probability: exchangeability and de Finetti-type theorems. Martingales: convergence theorems, optional stopping, Polya urn schemes and applications to statistics. Poisson processes: general spaces, marked processes, applications to Bayesian nonparametrics. Gaussian processes: existence theorem, Brownian bridges, meanders and excursions. Empirical processes: inequalities, central limit theorems and the bootstrap. Markov chains: ergodicity, convergence and MCMC algorithms. Stochastic calculus: Ito's formula.

Statistics 211a Statistical Inference
Tirthankar Dasgupta

Inference: frequency, Bayes, decision analysis, foundations. Likelihood, sufficiency, and information measures. Models: Normal, exponential families, multilevel, and non-parametric. Point, interval and set estimation; hypothesis tests. Computational strategies, large and moderate sample approximations.

Statistics 215 Introduction to Computational Biology and Bioinformatics
Xiaole Shirley Liu (Public Health)

Meets with Statistics 115, but graduate students are required to do more coding, complete a research project and submit a written report during reading period in addition to completing all work assigned for Statistics 115.

Statistics 220 Bayesian Data Analysis
Jun S. Liu

Basic Bayesian models, followed by more complicated hierarchical and mixture models with nonstandard solutions. Includes methods for monitoring adequacy of models and examining sensitivity of models.

Statistics 221 Statistical Computing and Learning
Edoardo Maria Airoldi

Computational methods commonly used in statistics: random number generation, optimization methods, numerical integration, Monte Carlo methods including Metropolis-Hastings and Gibbs samplers, approximate inference techniques including Expectation-Maximization algorithms, Laplace approximation and variational methods, and data augmentation strategies.

Statistics 225 Spatial Statistics
Luke Bornn

Introduction to spatial and spatio-temporal statistics. Classic spatial statistics will be covered in addition to more modern hierarchical techniques and computational methods. The course will blend theory and application, with a focus on the latter.

Statistics 230 Multivariate Statistical Analysis
S. C. Samuel Kou

Multivariate inference and data analysis. Advanced matrix theory and distributions, including Multivariate Normal, Wishart, and multilevel models. Supervised learning: multivariate regression, classification, and discriminant analysis. Unsupervised learning: dimension reduction, principal components, clustering, and factor analysis.

Statistics 231 Time Series Analysis and Forecasting
Tirthankar Dasgupta

A graduate-level course on time series models and associated methods of data analysis and inference. Review of ARIMA models, time series regression, long-memory models, state space models and Kalman filtering, multivariate time series, statistical methods in the frequency domain.

Statistics 232r Topics in Missing Data
Donald B. Rubin and Natesh S. Pillai

The modern era of work on missing data problems began in the 1970s and has seen an explosion of developments since then. Seminar will focus on an updated version of a classic text, supplemented with classic articles.

Statistics 240 Matched Sampling and Study Design
Donald B. Rubin and Luke Weisman Miratrix

This course provides an accessible introduction to the study of matched sampling and other design techniques in any field (e.g., economics, education, epidemiology, medicine, political science) conducting empirical research to evaluate the causal effects of interventions.

Statistics 242 Permutation and Resampling Based Statistical Methods
Luke Weisman Miratrix

Bootstrap and resampling allow for principled data analysis in diverse areas such as social, biological, or physical sciences. We will implement methods in R, conduct simulation studies, tackle applied projects, and do theoretical work.

Statistics 244 Linear and Generalized Linear Models
Alan Agresti (University of Florida)

The theory and application of linear and generalized linear models, including linear models for normal responses, logistic models for binary and multinomial data, loglinear models for count data, overdispersion and quasi-likelihood methods, and models and methods for clustered (e.g., repeated measurement) correlated data.

Statistics 245 Statistics and Litigation
Daniel James Greiner (Law School)

Interaction between quantitative methods and law. Teaming with law students: analyze data, prepare expert reports, and give testimony. Learn how to communicate with and present results to untrained but intelligent users, and to defend conclusions.

Statistics 260 Design and Analysis of Sample Surveys
Alan M. Zaslavsky (Medical School)

Meets with Statistics 160, but graduate students will have an extended class period and complete additional assignments for a more theoretical, in-depth treatment of topics.

Statistics 265r Reading Efron
Xiao-Li Meng, Joseph K. Blitzstein and Viktoriia Liublinska

Exploration of the statistical contributions of Bradley Efron through study of his writings. Both deeply influential and deeply controversial ideas will be discussed; topics include statistical foundations and principles, estimating the number of unseen species, self-consistency, empirical Bayes, large-scale inference, and the bootstrap.

Statistics 286 Theory and Practice of Principal Stratification Analysis
Fabrizia Mealli

Introduces the concept of Principal Stratification (PS), first formalized by Frangakis and Rubin (2002) but with roots in the Instrumental Variables literature. PS has been applied to analyze causal effects in different settings, making it possible to handle various "selection" problems or post-treatment complications, including censoring due to death, noncompliance, missing outcomes, and mediation analysis. It has been applied in various fields, including economic, social and medical studies, using different modes of inference (moment-based, likelihood-based and Bayesian). The course will blend theory and application. Recent papers will be discussed; participants will be encouraged to develop their own research problems in this active area.

Statistics 290 Statistical Communication
Andrew Gelman

Statistics 300hfr Research in Statistics

Participants discuss recent research in statistics and present their own work in progress. Open to doctoral students in statistics.

Statistics 301 Special Reading and Research

Statistics 302 Direction of Doctoral Dissertations

Statistics 303hf The Art and Practice of Teaching Statistics

Required of all first-year doctoral students in Statistics.

Statistics 306 Research Topics in Sports Analytics

Advanced stochastic models for the analysis of sports. Focus will be on methods for understanding player-tracking data.

Statistics 310hfr Topics in Astrostatistics

Statistics 311 Monte Carlo Methods in Scientific Computing

Statistics 312r Estimation Problems for Stochastic Processes and High Dimensional Data

Focuses on inference problems for stochastic processes and statistical modeling in high dimensions. Contemporary papers from different fields will be discussed and presented by students. Participants will be encouraged to develop their own research problems in this active area.

Statistics 314hfr Timely Topics in Statistics

Statistics 315 High Dimensional Causal Inference

Conducting causal inference under the Neyman-Rubin model when the number of possible covariates is on the order of the number of observations, or even larger, is non-obvious. Recent developments using a variety of methods and approaches, such as (sparse) regularization, nonparametric Bayes, BART, model selection, or dimension reduction, claim to address this problem. We will read and discuss the literature in this emerging area with a critical eye.

Statistics 321 Stochastic Modeling and Bayesian Inference

Stochastic processes and their applications in biological, chemical and financial modeling. Bayesian inference about stochastic models based on the Monte Carlo sampling approach.

Statistics 324r Parametric Statistical Inference and Modeling

Theory of multi-level parametric models, including hidden Markov models, and applications likely to include biostatistics, health services, education, and sports.

Statistics 325hfr Topics in Environmental Modeling

Focus will be on research topics in spatial statistics, Monte Carlo, and the overlap and interplay between the two fields.

Statistics 328 Bayesian Nonparametrics

Bayesian nonparametric methods including both random discrete measures and random functions. Gaussian processes (e.g., for nonparametric regression), the Chinese Restaurant process (e.g., for clustering), Pitman-Yor processes (e.g., for hierarchical clustering), and Dirichlet processes (e.g., for topic modeling).

Statistics 329 Special Topics in Bootstrap and Permutation Methods

Bootstrap and permutation methods with readings both applied and theoretical. Selection of topics will vary by interest, potentially including any of Bayesian approaches, high dimensional concerns, the wild bootstrap and regression, semi-parametric likelihood with bootstrap techniques, subsampling, and more complex extensions of permutation tests.

Statistics 340 Random Network Models

Random graph models for biological, social, and information networks, including fixed degree, exponential, power law, small world, and geometric random graphs. Estimation and sampling methods for network data.

Statistics 341 Advanced Topics in Experimental Design

Statistics 342 Causal Graphs in Low and High Dimensions

Papers in this area will be read with a skeptical but judicious eye. When could these methods offer something tangible, when might they fail, and how can we know in which circumstance we lie?

Statistics 366hfr Introduction to Research

Introduction to the process of developing research ideas into publications in Statistics, using case studies and actual research projects. Emphasizes scientific communication in research papers and presentations, deciphering referee reports, and finding the right forum.

Statistics 385 Statistical Machine Learning

Hands-on introduction to network statistics, with applications to social, biological and communication networks. Topics in sampling designs and inference. Modeling network evolution. Processes on networks. Critical literature review, in-class presentations, and final projects.

Statistics 392hf Research Topics in Missing Data, Matching and Causality

Students will make at least one presentation on current research in applied or theoretical statistics. All registered students are expected to participate by offering commentary/suggestions during presentations. This is a requirement to obtain credit.

Statistics 399 Problem Solving in Statistics

Aimed at helping Statistics PhD students transition through the qualifying exams and into research.