Supervision

Introduction

Supervised learning methods are the most commonly used, but there has been development/use in other methods, such as unsupervised or semi-supervised learning. Here we explore the meaning of supervision, and when supervision might not be the best learning modality.

Developing Intuition

You will recall from the Starter Guide on the meaning of f(x) in data science that we have 3 constituent parts:

The output, $Y$
The function, $f$
The data, $X$

Unsupervised learning is when we remove the output $Y$. Hence, we would merely have a function modifying our data. But what is the function doing to our data? How do we know its the right function? One of the primary purposes of having $Y$ in the first place was to see if our function is working as expected. How can we know if the learning is accurate with no output to verify with? Really, we have no way to confirm the function is working as expected (that is, there is no supervision, no output, to confirm with). So what use is this type of learning?

Another way to state this is that we are not attaching labels to data before training.

Semi-Supervised Learning

Semi-supervised learning is when some of the data is labeled, but not all. Often this is because there are advantages to having data be labeled, but it is expensive to label all of the data at scale; so only some of it gets labeled and used for training, and the rest is unsupervised.

Types of Semi- or Unsupervised Learning

In general, the common approaches fall into a few categories, including:

Clustering, including but not limited to K-means, hierarchical, and mixture models
Latent variable models, including but not limited to method of moments, principal component analysis (PCA), and singular value decomposition (SVD)
Anomaly Identification, including but not limited to local outlier factor and isolation forest

When you want to factor in some labeled data, it becomes semi-supervised. If you want to use no labeled data at all, it is unsupervised.

When Unsupervised Learning Could Be Used

When you have lots of data that is hard to label at scale
When we can’t get accurate labels for our data
When you have lots of computing power (supervised learning methods can be much less complex computationally speaking)
When you are doing anomaly detection, recommendation engines, or marketing segmentation