Thinking About The Output

What are we building? What kind of data are we analyzing? What do we want to make happen? Are we trying to build a model that can classify, or regress? These and many other questions will be considered here, as we think about what kind of output we seek.

Choosing a Model

You may have seen the Starter Guides on choosing models. Naturally, the model choice constrains the possible outputs. Once we have chosen a model, we then have to choose from the range of outputs it can offer, iterate from there, or rethink the model(s) we have chosen.

Imagining The Output

You may recall from the commentary on what a function means in data science that we are looking to get some output $Y$: we put data ($X$) in, apply a function $f$ that we adjust to minimize the reducible error, and get an output $Y$.
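For reference, the standard statistical-learning formulation behind "reducible error" (a sketch in the usual notation, where $\hat{f}$ is our estimate of $f$ and $\epsilon$ is noise with mean zero) is:

$$
Y = f(X) + \epsilon, \qquad
\mathbb{E}\big[(Y - \hat{f}(X))^2\big]
= \underbrace{\big(f(X) - \hat{f}(X)\big)^2}_{\text{reducible}}
+ \underbrace{\operatorname{Var}(\epsilon)}_{\text{irreducible}}.
$$

Training adjusts $\hat{f}$ to shrink the reducible term; the $\operatorname{Var}(\epsilon)$ term is a floor we cannot get below.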

Is $Y$ a fitted regression line? Is it a prediction, say, a probability representing the chance that an image contains a stop sign? Is it a multi-label output, where the labels that can be applied are not mutually exclusive? Is the output a number, a string, an image, or some other data type? Are we trying to deploy the model, or are we just building a graph once and that's it? If you can go through each model-choosing page and identify roughly where your output should lie, the model and the output will often suggest themselves.
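To make the multi-label distinction concrete, here is a minimal sketch (the class names, label names, and probabilities are made up for illustration) contrasting mutually exclusive classes with non-exclusive labels:

```python
import numpy as np

# Multi-class (exclusive): one softmax distribution over classes.
# The probabilities sum to 1 and exactly one class is picked.
class_names = ["cat", "dog", "bird"]          # hypothetical classes
softmax_probs = np.array([0.7, 0.2, 0.1])
print(class_names[int(np.argmax(softmax_probs))])   # -> "cat"

# Multi-label (non-exclusive): one independent score per label.
# Any number of labels can be "on" at the same time.
label_names = ["outdoors", "contains_person", "contains_stop_sign"]
sigmoid_scores = np.array([0.9, 0.1, 0.8])
print([name for name, p in zip(label_names, sigmoid_scores) if p > 0.5])
# -> ["outdoors", "contains_stop_sign"]
```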

For instance, image classification is a very common neural network task. The output in that case is (implicitly) a trained model that can make predictions, plus the individual probability predictions themselves, each a numerical value between 0 and 1. These predictions might come in arrays, one per image in a batch, or they might be single one-off predictions.
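As a rough sketch (the input shape and layer sizes here are placeholders, not a recommended architecture), a tiny Keras classifier makes the shape of that output concrete:

```python
import numpy as np
import tensorflow as tf

# A toy classifier for 32x32 RGB images and 3 classes -- purely illustrative.
model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(32, 32, 3)),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(3, activation="softmax"),  # probabilities in [0, 1]
])

batch = np.random.rand(8, 32, 32, 3).astype("float32")  # 8 fake images
batch_preds = model.predict(batch)      # shape (8, 3): an array of predictions
single_pred = model.predict(batch[:1])  # shape (1, 3): a one-off prediction

print(batch_preds.shape, single_pred.shape)
```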

Being really specific about the form of the output is essential. Maybe you aren't sure when you start, but over time you will get a feel for roughly what you want as an output; it comes with practice.

Metrics

Metrics measure performance not only on the training set, but also on the validation and test sets. They go hand in hand with thinking about the output, because we need to measure that output both during and after training. TensorFlow, a well-known data science toolkit, documents all of its available metrics in its API reference. All of the data science toolkits have metrics, and most have many, to account for the variety of ways output can be measured.
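As a minimal sketch of how such metrics are used (assuming binary labels and predicted probabilities; the numbers are made up), two tf.keras metrics can be updated and read out directly:

```python
import tensorflow as tf

# Track accuracy and AUC outside of model.fit, e.g. on a validation set.
accuracy = tf.keras.metrics.BinaryAccuracy()
auc = tf.keras.metrics.AUC()

y_true = [0, 1, 1, 0]
y_pred = [0.1, 0.8, 0.6, 0.4]  # predicted probabilities between 0 and 1

accuracy.update_state(y_true, y_pred)
auc.update_state(y_true, y_pred)

print("accuracy:", accuracy.result().numpy())
print("auc:", auc.result().numpy())
```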

Summarizing Output Types

Try to summarize the following things to get a clear idea of your output:

  • What kind of model(s) am I building? K-nearest neighbors, multivariable regression, etc.? (For a concrete example in code, see the sketch after this list.)

  • What data types (floats, strings, etc.) will be output?

  • What kind of analysis am I doing in general? Computer vision, NLP, etc?

  • What kind of metrics should I be using to measure the output?

  • Are we building dashboards, interactive web apps, graphs, databases, predictions, etc. out of this? What form does the data need to be in to feed into them?

  • Go through each of the pages in the choosing-models section and write out whether your problem seems to prefer a particular type of model. Maybe you think a supervised model is best, or you're sure that regression is the right type of analysis for your problem. If you aren't sure, try a few different approaches and see if you can develop intuition!
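As an illustration of working through that checklist (the dataset, model choice, and metric here are assumptions chosen purely for the example), a few lines of scikit-learn make the answers explicit:

```python
# Hypothetical walk-through of the checklist for a small classification problem.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)                   # analysis: tabular classification
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = KNeighborsClassifier(n_neighbors=5)         # model: k-nearest neighbors
model.fit(X_train, y_train)

labels = model.predict(X_test)                      # output type: integer class labels
probs = model.predict_proba(X_test)                 # output type: float probabilities
print(labels.dtype, probs.dtype)

print("accuracy:", accuracy_score(y_test, labels))  # metric: accuracy
# Downstream consumer: e.g. a dashboard that expects an array of probabilities.
```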