Same school, new name. The School of Informatics and Computing is changing its name effective January 11, 2023.

INFO-I 415 Introduction to Statistical Learning

3 credits

  • Prerequisites: None
  • Delivery: On-Campus, Online
  • Semesters offered: Fall, Spring, Summer 2
    The above are the semesters this course is generally offered. View the course schedule to confirm.
  • This course applies statistical learning methods for data mining and for inferential and predictive analytics to informatics-related fields. The course also covers techniques for exploring and visualizing data, assessing model accuracy, and weighing the merits of different methods for a given real-world application. It provides an essential toolset for transforming large, complex informatics datasets into actionable knowledge.

    Learning Outcomes

    • Analyze datasets with the following supervised learning methods: for function approximation, multiple linear regression, splines, and local regression; for classification, logistic regression, linear discriminant analysis, decision trees, bagging, random forests, boosting, and support vector machines.
    • Analyze datasets with the following unsupervised learning methods: for dimensionality reduction, principal components analysis; for grouping, k-means clustering and hierarchical clustering.
    • Explore, transform, and visualize large, complex datasets with graphs in R.
    • Solve real-world problems by adapting and applying statistical learning methods to large, complex datasets.
    • Identify and select the appropriate statistical learning method for a particular real-world problem; analyze each method with respect to a given dataset or research question in terms of modeling accuracy and the bias-variance trade-off; perform model assessment (i.e., estimate test error rates) and selection by resampling, using cross-validation and bootstrapping; identify overfitting and underfitting; perform model selection and regularization by subset selection and by shrinkage methods (ridge regression and the lasso); explain the relative advantages and disadvantages of each statistical learning method for the real-world problem.
    • Write programs to perform data analytics on large, complex datasets in R.
    • Analyze data from case studies in informatics-related fields (e.g., digital media, human-computer interaction, health informatics, bioinformatics, and business intelligence).