Learning and Using R at Stanford

Stanford Classes that teach R

Spring 2024

number title description term / units
BIODS 352 Topics in Computing for Data Science (STATS 352) A seminar-style course with lectures on a range of computational topics important for modern data-intensive science, jointly supported by the Statistics department and Stanford Data Science, and suitable for advanced undergraduate/graduate students engaged in either research on data science techniques (statistical or computational, for example) or research in scientific fields relying on advanced data science to achieve its goals. Seminars will alternate a presentation of a topic, usually by an expert on that topic, typically leading to exercises applying the techniques, with a follow up lecture to further discuss the topic and the exercises. Prerequisites: Understanding of basic modern data science and competence in related programming, e.g., in R or Python. https://stats352.stanford.edu/ Terms: Spr | Units: 1
EPI 270 Big Data Methods for Behavioral, Social, and Population Health Research This course will expose students from a variety of quantitative backgrounds to study design and analysis strategies for addressing specific hypotheses using the varied sources of behavioral, social, and population health sciences research data, and the analytic tools available for analyzing these data. The purpose of this foundational course is to provide a framework for conceptualizing experiments and observational studies that rely on big data in behavioral science and population health. The two types of data included are: (1) intensive or voluminous longitudinal data from mHealth, smartphone, and sensor technologies and (2) large and complex data from internet data sources such as social media and Google search trends. The course features many speakers from Stanford and other institutions who are carrying out cutting-edge research on high-dimensional or heterogeneous data with innovative methods. Students will have the opportunity to choose a data set from among a variety of data sources, analyze the data, and present their findings to the class. Each student will do a final project in an area of their own primary interest; many students are able to substantially develop projects that they subsequently use in their own thesis or dissertation research. Prerequisites: EPI 258/259 (or equivalent statistics course, please contact instructor for approval). Students must have some experience in statistical programming in SAS or R. Terms: Spr | Units: 2-3
MS&E 244 Statistical Arbitrage Practical introduction to statistical arbitrage, which typically refers to trading strategies that are bottom-up and market-neutral, with trading driven by statistical or econometric models. Models may focus on the tendency of short-term returns to revert, leads/lags among correlated instruments, volume momentum, or behavioral effects. A classic statistical arbitrage program is relatively high frequency over a large universe of stocks and is driven algorithmically. This course discusses a taxonomy of market participants and what motivates trading; data: different types, how to obtain data, timestamps, errors and dirty data; methods of exploring relationships between instruments; forecasting; portfolio construction across a large number of instruments; trading: the execution of portfolio changes in real markets; risks inherent in statistical arbitrage; nonstationarity of relationships due to changes in market regulations, fluctuations in market volatility, and other factors; frictions such as costs of trading and constraints, and how strategies scale; analysis of strategies. Prepares students with valuable skills for engaging in quantitative trading in a hedge fund or investment bank trading desk, and for understanding how to evaluate quantitative strategies from the point of view of an investor or asset allocator, including performance evaluation, risk analysis, and strategy capacity analysis. Occasional hands-on data projects supporting weekly topics. Weekly lectures and a final data-driven project. The objective of the final project is to build, test, and analyze some kind of statistical arbitrage strategy. Prerequisites: MS&E 245A or similar, some background in probability and statistics, working knowledge of R, Python or similar computational/statistical package. Terms: Spr | Units: 3
OCEANS 174H Experimental Design and Probability (OCEANS 274H) Nature is inherently variable. Statistics gives us the tools to quantify the uncertainty of our measurements and draw conclusions from data. This course is an introduction to experimental design, probability, and data analysis. Topics include summary statistics, data visualization, probability distributions, statistical inference, and general linear models (e.g., t-tests, analysis of variance, regression). Students will use R to explore and analyze datasets relevant to the life and ocean sciences. No programming or statistical background is assumed. This course takes place in-person only at Hopkins Marine Station; for information on how to spend spring quarter in residence: https://hopkinsmarinestation.stanford.edu/undergraduate-studies/spring-courses-23-24 (Individual course registration also permitted.) Depending on enrollment numbers, a weekly shuttle to Hopkins or mileage reimbursements for qualifying carpools will be provided; terms and conditions apply. Graduate students register for OCEANS 274H. Terms: Spr | Units: 4
OCEANS 274H Experimental Design and Probability (OCEANS 174H) Nature is inherently variable. Statistics gives us the tools to quantify the uncertainty of our measurements and draw conclusions from data. This course is an introduction to experimental design, probability, and data analysis. Topics include summary statistics, data visualization, probability distributions, statistical inference, and general linear models (e.g., t-tests, analysis of variance, regression). Students will use R to explore and analyze datasets relevant to the life and ocean sciences. No programming or statistical background is assumed. This course takes place in-person only at Hopkins Marine Station; for information on how to spend spring quarter in residence: https://hopkinsmarinestation.stanford.edu/undergraduate-studies/spring-courses-23-24 (Individual course registration also permitted.) Depending on enrollment numbers, a weekly shuttle to Hopkins or mileage reimbursements for qualifying carpools will be provided; terms and conditions apply. Graduate students register for OCEANS 274H. Terms: Spr | Units: 4
STATS 32 Introduction to R for Undergraduates This short course runs for weeks one through five of the quarter. It is recommended for undergraduate students who want to use R in the humanities or social sciences and for students who want to learn the basics of R programming. The goal of the short course is to familiarize students with R's tools for data analysis. Lectures will be interactive with a focus on learning by example, and assignments will be application-driven. No prior programming experience is needed. Topics covered include basic data structures, file I/O, data transformation and visualization, simple statistical tests, and some useful packages in R. Prerequisite: undergraduate student. Priority given to non-engineering students. Laptops necessary for use in class. Terms: Aut, Spr | Units: 1
STATS 191 Introduction to Applied Statistics Statistical tools for modern data analysis. Topics include regression and prediction, elements of the analysis of variance, bootstrap, and cross-validation. Emphasis is on conceptual rather than theoretical understanding. Applications to social/biological sciences. Student assignments/projects require use of the software package R. Prerequisite: introductory statistical methods course. Recommended: 60, 110, or 141. Terms: Spr, Sum | Units: 3
STATS 305C Applied Statistics III Methods for multivariate responses. Theory, computation, and practice for multivariate statistical tools. Topics may include multivariate Gaussian models, probabilistic graphical models, MCMC and variational Bayesian inference, dimensionality reduction, principal components, factor analysis, independent components analysis, canonical correlations, linear discriminant analysis, hierarchical clustering, bi-clustering, multidimensional scaling and variants (e.g., Isomap, spectral clustering, t-SNE), matrix completion, topic modeling, and state space models. Extensive work with data involving programming, ideally in Python and/or R. Prerequisites: Stats 305A and Stats 305B or consent of the instructor. Terms: Spr | Units: 3
STATS 352 Topics in Computing for Data Science (BIODS 352) A seminar-style course with lectures on a range of computational topics important for modern data-intensive science, jointly supported by the Statistics department and Stanford Data Science, and suitable for advanced undergraduate/graduate students engaged in either research on data science techniques (statistical or computational, for example) or research in scientific fields relying on advanced data science to achieve its goals. Seminars will alternate a presentation of a topic, usually by an expert on that topic, typically leading to exercises applying the techniques, with a follow up lecture to further discuss the topic and the exercises. Prerequisites: Understanding of basic modern data science and competence in related programming, e.g., in R or Python. https://stats352.stanford.edu/ Terms: Spr | Units: 1