Are these courses built for data science?

Hey everyone,

My college does not offer a data science degree, but I plan to take the statistics courses that will be most relevant for a career in it. How closely related are these courses to data science?

  1. Data analysis:
    This course focuses on choosing, fitting, assessing and using statistical models. Simple linear regression, multiple regression, analysis of variance, general linear models, logistic regression and discrete data analysis will provide the foundation for the course. Classical inference methods that rely on the normality of the error terms will be thoroughly discussed.

  2. Nonparametric Statistics:
    This course will focus on nonparametric and distribution-free statistical procedures. These procedures will rely heavily on counting and ranking techniques. In the one and two sample settings, the sign, signed-rank and Mann-Whitney-Wilcoxon procedures will be discussed. Correlation and one-way analysis of variance techniques also will be investigated.

  3. Linear Regression Models
    Simple linear regression with one predictor variable will serve as the starting point. Models, inferences, diagnostics and remedial measures for dealing with invalid assumptions will be examined. The matrix approach to simple linear regression will be presented and used to develop more general multiple regression models. Building and evaluating models for real data will be the ultimate goal.

  4. Statistical Computing in R
    The primary goal of the course is to learn and apply Monte-Carlo simulation techniques to a wide variety of problems. We will focus on solving problems from a numerical point of view, with methods to complete numerical integration, root finding, curve fitting, variance reduction and optimization. Core knowledge of R and basic programming concepts will be introduced.

These are just some specialized courses, and besides them, I’ll have to meet the requirements for linear algebra, elements of stats, calculus series, and some intro cs classes.

Will greatly appreciate any input/advice. Thanks!

You’ll need to take all of those classes as a foundation for getting into data science.

Data science is simply using scientific processes and computational tools to analyze data. Most people use it to refer to big data - millions or billions of lines of observations or what have you - that require higher computational power and programming knowledge to execute (usually an understanding of machine learning and/or data mining). It’s a fusion of mathematics, computer science, and statistics.

So yes, you’ll need all of those classes. Data analysis is the foundation of data science, and the basic analyses you’ll learn there are required for understanding any higher-level statistical analyses. The second two classes - nonparametric statistics and linear regression models - are also basic to intermediate analysis methods/topics that any data scientist would be required to know. Linear regression is the basic one, and MANY common statistical analyses are built on top of linear regression/linear algebra theory. Nonparametric statistics is a technique that’s more intermediate and is a bit more situational, but is really common in health sciences/biostatistical/epidemiological applications, and may be common in other use cases as well. And R is just essential. Definitely take that.

Other statistics classes may be dictated by what field you want to enter; so for example, stochastic modeling is more common in finance, I think. Machine learning may appeal to you if you’re interested in tech (or really, most applications, I think).

And yeah, I was going to say make sure you’re taking the calculus series, linear algebra, and some other CS classes - I would ask your CS professors what they recommend for data science (they’ll know). SQL is useful to know, as are Hadoop and Python. (I am not a data scientist, but I work with a lot of them.)
