Faculty of Arts & Sciences

STAT

Statistics 91r	Supervised Reading and Research David P. Harrington (FAS/Public Health), and members of the Department Supervised reading and research in an area of statistics agreed upon by the student and a faculty adviser.
Statistics 98	Tutorial - Junior Year David P. Harrington (FAS/Public Health) and Viktoriia Liublinska Introduction to reading, writing, presenting, and research in statistics. Students will learn to formulate and approach a research question, critically review papers that make use of statistics, and clearly communicate statistical ideas and arguments orally and in writing. Limited to junior concentrators in statistics.
Statistics 99hf	Tutorial - Senior Year David P. Harrington (FAS/Public Health) Supervised research for the senior thesis, under the mentorship of a Harvard faculty member.
Statistics 100	Introduction to Quantitative Methods for the Social Sciences and Humanities Luke Weisman Miratrix Introduces the basic concepts of statistical inference and statistical computing, both increasingly used in the social sciences and humanities. The emphasis of this course is on statistical reasoning, visualization, data analysis, and use of statistical software instead of theory. The goal is to provide pragmatic tools for assessing statistical claims and conducting basic statistical analyses. The main areas covered are classic one- and two-sample statistics, regression with one or more predictors, and bootstrap and randomization based inference. Explores applications in a wide range of fields, including the social and political sciences, medical research, and psychology.
Statistics 101	Introduction to Quantitative Methods for Psychology and the Behavioral Sciences Kevin A. Rader Similar to Statistics 100, but emphasizes concepts and practice of statistics used in psychology and other social and behavioral sciences. Topics covered: describing center and variability; probability and sampling distributions; estimation and hypothesis testing for comparing means and comparing proportions; contingency tables; correlation and regression; multiple regression; analysis of variance. Emphasis on translation of research questions into statistically testable hypotheses and models, and interpretation of results in context.
Statistics 102	Introduction to Statistics for Life Sciences David P. Harrington (FAS/Public Health) Introduces the basic concepts of probability, statistics and statistical computing used in medical and biological research. The emphasis is on data analysis and visualization instead of theory. Designed for students who intend to concentrate in a discipline from the life sciences.
Statistics 104	Introduction to Quantitative Methods for Economics Michael Isaac Parzen A rigorous introduction to statistics for students intending to study economics. Examples drawn from finance, decision analysis and economic decision-making. In addition to descriptive statistics, probability, inference and regression modeling, also covers portfolio optimization, decision analysis, and time series analysis. Students with prior exposure to introductory statistics will find some overlap of material but be exposed to new applications and learn more advanced modeling techniques.
Statistics 107	Introduction to Business and Financial Statistics Michael Isaac Parzen Introduces the technical skills required for data-driven analysis of business and financial data. Emphasis on applying statistical methods to summarize and make inferences from complex data and to develop quantitative models to assist business decision making. Topics include: how to collect and summarize financial data, understanding the concept of risk, portfolio construction and analysis, testing trading systems, and simulation techniques.
Statistics 110	Introduction to Probability Kevin Andrew Rader A comprehensive introduction to probability. Basics: sample spaces and events, conditional probability, and Bayes' Theorem. Univariate distributions: density functions, expectation and variance, Normal, t, Binomial, Negative Binomial, Poisson, Beta, and Gamma distributions. Multivariate distributions: joint and conditional distributions, independence, transformations, and Multivariate Normal. Limit laws: law of large numbers, central limit theorem. Markov chains: transition probabilities, stationary distributions, convergence.
Statistics 111	Introduction to Theoretical Statistics Kevin Andrew Rader Basic concepts of statistical inference from frequentist and Bayesian perspectives. Topics include maximum likelihood methods, confidence and Bayesian interval estimation, hypothesis testing, least squares methods and categorical data analysis.
Statistics 115	Introduction to Computational Biology and Bioinformatics Xiaole Shirley Liu (Public Health) The course will cover basic technology platforms, data analysis problems and algorithms in computational biology. Topics include sequence alignment and search, high throughput experiments for gene expression, transcription factor binding and epigenetic profiling, motif finding, RNA/protein structure prediction, proteomics and genome-wide association studies. Computational algorithms covered include hidden Markov model, Gibbs sampler, clustering and classification methods.
Statistics 120	Introduction to Applied Bayesian Inference and Multilevel Models Edoardo Maria Airoldi An introduction to the nuances of statistical inference in applied contexts. Frequentist and Bayesian techniques. A variety of classic and modern models for high-dimensional, categorical, sequence, spatial and network data. Evaluation techniques for modeling assumptions and inference strategies. Hands-on implementation of estimation and inference procedures in R. Knowledge of R programming is required (and assumed).
Statistics 121	Data Science Rafael A. Irizarry (Public Health) and Verena S. Kaynig-Fittkau Learning from data in order to gain useful predictions and insights. This course introduces methods for five key facets of an investigation: data wrangling, cleaning, and sampling to get a suitable data set; data management to be able to access big data quickly and reliably; exploratory data analysis to generate hypotheses and intuition; prediction based on statistical methods such as regression and classification; and communication of results through visualization, stories, and interpretable summaries. Built around three modules: prediction and elections, recommendation and business analytics, and sampling and social network analysis.
Statistics 123	Applied Quantitative Finance Stephen James Blyth Introduction to financial derivatives and the probabilistic techniques used to analyze them. Topics include: forwards, swaps and options; replication, no-arbitrage and risk-neutrality; martingales, numeraires and the fundamental theorem of asset pricing; and an introduction to interest-rate derivatives and their valuation. Provides a rigorous but accessible treatment of the elegant theory underpinning quantitative finance, motivated by real problems from the financial industry.
Statistics 131	Time Series Analysis and Forecasting Neil Shephard Introduction to time series models and associated methods of data analysis and inference. Autoregressive (AR), moving average (MA), ARMA, and ARIMA processes, stationary and non-stationary processes, seasonal processes, auto-correlation and partial auto-correlation functions, identification of models, estimation of parameters, diagnostic checking of fitted models, forecasting, spectral analysis, and transfer function models.
Statistics 135	Statistical Computing Software Steven Richard Finch An introduction to major statistics packages used in academics and industry (SAS and R). Will discuss data entry and manipulation, implementing standard analyses and graphics, exploratory data analysis, simulation-based methods, and new programming methods.
Statistics 139	Statistical Sleuthing Through Linear Models Viktoriia Liublinska (fall term) and Kevin Andrew Rader (spring term) A serious introduction to statistical inference with linear models and related methods. Topics include t-tools and permutation-based alternatives, multiple-group comparisons, analysis of variance, linear regression, model checking and refinement, and causation versus correlation. Emphasis on thinking statistically, evaluating assumptions, and developing tools for real-life applications.
Statistics 140	Design of Experiments Tirthankar Dasgupta and Donald B. Rubin Statistical designs for efficient experimentation in the physical, life, social and management sciences and in engineering. A systematic approach to explore input-output relationships by deliberately manipulating input variables. Topics include completely randomized and randomized block designs, Latin square designs, balanced incomplete block designs, factorial designs, confounding in blocks, fractional replications, and re-randomization. Each topic motivated by real-life examples.
Statistics 149	Statistical Sleuthing through Generalized Linear Models Mark E. Glickman (Boston University) Sequel to Statistics 139, emphasizing common methods for analyzing continuous non-normal and categorical data. Topics include logistic regression, log-linear models, multinomial logit models, proportional odds models for ordinal data, Gamma and inverse-Gaussian models, over-dispersion, analysis of deviance, model selection and criticism, model diagnostics, and an introduction to non-parametric regression methods.
Statistics 160	Design and Analysis of Sample Surveys Alan M. Zaslavsky (Medical School) Methods for design and analysis of sample surveys. The toolkit of sample design features and their use in optimal design strategies. Sampling weights and variance estimation methods, including resampling methods. Brief overview of nonstatistical aspects of survey methodology such as survey administration and questionnaire design and validation (quantitative and qualitative). Additional topics: calibration estimators, variance estimation for complex surveys and estimators, nonresponse, missing data, hierarchical models, and small-area estimation.
Statistics 170	Quantitative Analysis of Capital Markets Neil Shephard An introduction to the analysis of capital markets using quantitative methods. Concepts include risk, expected utility, discounting, binomial-tree valuation methods, martingales, continuous time stochastic calculus methods, stochastic discount factors, financial econometric models and Monte Carlo simulations. These concepts are applied to equities, risk management and derivative pricing.
Statistics 171	Introduction to Stochastic Processes Natesh S. Pillai An introductory course in stochastic processes. Topics include Markov chains, branching processes, Poisson processes, birth and death processes, Brownian motion, martingales, introduction to stochastic integrals, and their applications.
Statistics 183	Learning from Big Data Luke Bornn Through a series of forecasting and prediction competitions, each based on a large real-world dataset, students will acquire the tools and experience to explore and model large-scale, real-life data. In addition, the course will cover a series of tools for statistical modeling in real-world environments. Some examples include bagging, boosting, collaborative model development, cross-validation, and model validation and verification.
Statistics 186	Statistical Methods for Evaluating Causal Effects Fabrizia Mealli Statistical methods for inferring causal effects from data from randomized experiments or observational studies. Students will develop expertise to assess the credibility of causal claims and the ability to apply the relevant statistical methods for causal analyses. Examples from many disciplines: economics, education, other social sciences, epidemiology, and biomedical science. Evaluations of job training programs, educational voucher schemes, changes in laws such as minimum wage laws, medical treatments, smoking, military service.
Statistics 201	Statistical Communication and Graphics Andrew Gelman
Statistics 210a	Probability Theory Jun S. Liu and Carl N. Morris Random variables, measure theory, reasoning by representation. Families of distributions: Multivariate Normal, conjugate, marginals, mixtures. Conditional distributions and expectation. Convergence, laws of large numbers, central limit theorems, and martingales.
Statistics 210b	Topics in Probability Theory Natesh S. Pillai and Alexander Volfovsky A graduate introduction to advanced probability. Foundations of probability: exchangeability and de Finetti's type theorem. Martingales: convergence theorems, optional stopping, Polya urn schemes and applications to statistics. Poisson processes: general spaces, marked processes, applications to Bayesian nonparametrics. Gaussian processes: existence theorem, Brownian bridges, meanders and excursions. Empirical processes: inequalities, central limit theorems and the bootstrap. Markov Chains: ergodicity, convergence and MCMC algorithms. Stochastic calculus: Ito's formula.
Statistics 211a	Statistical Inference Tirthankar Dasgupta Inference: frequency, Bayes, decision analysis, foundations. Likelihood, sufficiency, and information measures. Models: Normal, exponential families, multilevel, and non-parametric. Point, interval and set estimation; hypothesis tests. Computational strategies, large and moderate sample approximations.
Statistics 215	Introduction to Computational Biology and Bioinformatics Xiaole Shirley Liu (Public Health) Meets with Statistics 115, but graduate students are required to do more coding, complete a research project and submit a written report during reading period in addition to completing all work assigned for Statistics 115.
Statistics 220	Bayesian Data Analysis Jun S. Liu Basic Bayesian models, followed by more complicated hierarchical and mixture models with nonstandard solutions. Includes methods for monitoring adequacy of models and examining sensitivity of models.
Statistics 221	Statistical Computing and Learning Edoardo Maria Airoldi Computational methods commonly used in statistics: random number generation, optimization methods, numerical integration, Monte Carlo methods including Metropolis-Hastings and Gibbs samplers, approximate inference techniques including Expectation-Maximization algorithms, Laplace approximation and variational methods, and data augmentation strategies.
Statistics 225	Spatial Statistics Luke Bornn Introduction to spatial and spatio-temporal statistics. Classic spatial statistics will be covered in addition to more modern hierarchical techniques and computational methods. The course will blend theory and application, with a focus on the latter.
Statistics 230	Multivariate Statistical Analysis S. C. Samuel Kou Multivariate inference and data analysis. Advanced matrix theory and distributions, including Multivariate Normal, Wishart, and multilevel models. Supervised learning: multivariate regression, classification, and discriminant analysis. Unsupervised learning: dimension reduction, principal components, clustering, and factor analysis.
Statistics 231	Time Series Analysis and Forecasting Tirthankar Dasgupta A graduate-level course on time series models and associated methods of data analysis and inference. Review of ARIMA models, time series regression, long-memory models, state space models and Kalman filtering, multivariate time series, statistical methods in the frequency domain.
Statistics 232r	Topics in Missing Data Donald B. Rubin and Natesh S. Pillai The modern era of work on missing data problems began in the 1970s and has seen an explosion of developments since then. Seminar will focus on an updated version of a classic text, supplemented with classic articles.
Statistics 240	Matched Sampling and Study Design Donald B. Rubin and Luke Weisman Miratrix This course provides an accessible introduction to the study of matched sampling and other design techniques in any field (e.g., economics, education, epidemiology, medicine, political science) conducting empirical research to evaluate the causal effects of interventions.
Statistics 242	Permutation and Resampling Based Statistical Methods Luke Weisman Miratrix Bootstrap and resampling allow for principled data analysis in diverse areas such as social, biological, or physical sciences. We will implement methods in R, conduct simulation studies, tackle applied projects, and do theoretical work.
Statistics 244	Linear and Generalized Linear Models Alan Agresti (University of Florida) The theory and application of linear and generalized linear models, including linear models for normal responses, logistic models for binary and multinomial data, loglinear models for count data, overdispersion and quasi-likelihood methods, and models and methods for clustered (e.g., repeated measurement) correlated data.
Statistics 245	Statistics and Litigation Daniel James Greiner (Law School) Interaction between quantitative methods and law. Teaming with law students: analyze data, prepare expert reports, and give testimony. Learn how to communicate with and present results to untrained but intelligent users, and to defend conclusions.
Statistics 260	Design and Analysis of Sample Surveys Alan M. Zaslavsky (Medical School) Meets with Statistics 160, but graduate students will have an extended class period and complete additional assignments for a more theoretical, in-depth treatment of topics.
Statistics 265r	Reading Efron Xiao-Li Meng, Joseph K. Blitzstein and Viktoriia Liublinska Exploration of the statistical contributions of Bradley Efron through study of his writings. Both deeply influential and deeply controversial ideas will be discussed; topics include statistical foundations and principles, estimating the number of unseen species, self-consistency, empirical Bayes, large-scale inference, and the bootstrap.
Statistics 286	Theory and Practice of Principal Stratification Analysis Fabrizia Mealli Introduces the concept of Principal Stratification (PS), first formalized by Frangakis and Rubin (2002), but with roots in the Instrumental Variables literature. PS has been applied to analyze causal effects in different settings, making it possible to handle various "selection" problems or post-treatment complications, including censoring due to death, noncompliance, missing outcomes, and mediation analysis. It has been applied in various fields, including economic, social and medical studies, using different modes of inference (moment-based, likelihood-based and Bayesian). The course will blend theory and application. Recent papers will be discussed; participants will be encouraged to develop their own research problems in this active area.
Statistics 290	Statistical Communication Andrew Gelman
Statistics 300hfr	Research in Statistics Participants discuss recent research in statistics and present their own work in progress. Open to doctoral students in statistics.
Statistics 301	Special Reading and Research
Statistics 302	Direction of Doctoral Dissertations
Statistics 303hf	The Art and Practice of Teaching Statistics Required of all first-year doctoral students in Statistics.
Statistics 306	Research Topics in Sports Analytics Advanced stochastic models for the analysis of sports. Focus will be on methods for understanding player-tracking data.
Statistics 310hfr	Topics in Astrostatistics
Statistics 311	Monte Carlo Methods in Scientific Computing
Statistics 312r	Estimation Problems for Stochastic Processes and High Dimensional Data Focusing on inference problems for stochastic processes and statistical modeling in high dimensions. Contemporary papers from different fields will be discussed and presented by students. Participants will be encouraged to develop their own research problems in this active area.
Statistics 314hfr	Timely Topics in Statistics
Statistics 315	High Dimensional Causal Inference Conducting causal inference under the Neyman-Rubin model when the number of possible covariates is on the order of the number of observations, or even larger, is non-obvious. Recent developments using a variety of methods and approaches such as (sparse) regularization, nonparametric Bayes, BART, model selection, or dimension reduction, claim to address this problem. We will read and discuss the literature in this emerging area with a critical eye.
Statistics 321	Stochastic Modeling and Bayesian Inference Stochastic processes and their applications in biological, chemical and financial modeling. Bayesian inference about stochastic models based on the Monte Carlo sampling approach.
Statistics 324r	Parametric Statistical Inference and Modeling Theory of multi-level parametric models, including hidden Markov models, and applications likely to include biostatistics, health services, education, and sports.
Statistics 325hfr	Topics in Environmental Modeling Focus will be on research topics in spatial statistics, Monte Carlo, and the overlap and interplay between the two fields.
Statistics 328	Bayesian Nonparametrics Bayesian nonparametric methods including both random discrete measures and random functions. Gaussian processes (e.g., for nonparametric regression), the Chinese Restaurant process (e.g., for clustering), Pitman-Yor processes (e.g., for hierarchical clustering), and Dirichlet processes (e.g., for topic modeling).
Statistics 329	Special Topics in Bootstrap and Permutation Methods Bootstrap and permutation methods with readings both applied and theoretical. Selection of topics will vary by interest, potentially including any of Bayesian approaches, high dimensional concerns, the wild bootstrap and regression, semi-parametric likelihood with bootstrap techniques, subsampling, and more complex extensions of permutation tests.
Statistics 340	Random Network Models Random graph models for biological, social, and information networks, including fixed degree, exponential, power law, small world, and geometric random graphs. Estimation and sampling methods for network data.
Statistics 341	Advanced Topics in Experimental Design
Statistics 342	Causal Graphs in Low and High Dimensions Papers in this area will be read with a skeptical but judicious eye. When could these methods offer something tangible, when might they fail, and how can we know in which circumstance we lie?
Statistics 366hfr	Introduction to Research Introduction to the process of developing research ideas into publications in Statistics, using case studies and actual research projects. Emphasizes scientific communication in research papers and presentations, deciphering referee reports, and finding the right forum.
Statistics 385	Statistical Machine Learning Hands-on introduction to network statistics, with applications to social, biological and communication networks. Topics in sampling designs and inference. Modeling network evolution. Processes on networks. Critical literature review, in-class presentations, and final projects.
Statistics 392hf	Research Topics in Missing Data, Matching and Causality Students will make at least one presentation on current research in applied or theoretical statistics. All registered students are expected to participate by offering commentary/suggestions during presentations. This is a requirement to obtain credit.
Statistics 399	Problem Solving in Statistics Aimed at helping Statistics PhD students transition through the qualifying exams and into research.