Understanding the differences between supervised and unsupervised learning is essential for getting the most out of your data. Supervised learning trains a model on a labeled dataset, while unsupervised learning uses algorithms to uncover patterns and insights in unlabeled data. With the right approach, you can use these two techniques to build powerful and effective models.
Supervised and unsupervised learning are two core approaches to building machine learning models. So what's the difference? Well, to cut to the chase, a supervised learning model uses labeled input and output data, while an unsupervised learning model doesn't. But what does that really mean? Well, let's define both learning models, go deeper into the differences between them, and then answer the question of which is best for you. Now, in supervised learning, the machine learning algorithm is trained on a labeled dataset. That means that for each example in the training dataset, the algorithm knows what the correct output is, and it uses this knowledge to try to generalize to new examples it has never seen before.
Now, using labeled inputs and outputs, the model can measure its accuracy and learn over time. Supervised learning can actually be divided into a couple of subcategories. First, there's classification, where the output is a discrete class label, such as spam or not spam. Linear classifiers, support vector machines (SVMs), decision trees, and random forests are all common examples of classification algorithms. The other subcategory is regression, where the output is a continuous value, such as a price or a probability. Linear regression is a common regression algorithm (logistic regression, despite its name, is usually used for classification, since it predicts the probability of a class). Now, unsupervised learning is where the machine learning algorithm is not given any labels at all.
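To make the two subcategories concrete, here's a toy sketch in Python: a one-nearest-neighbour classifier for a discrete label, and a closed-form least-squares line fit for a continuous target. All of the data and function names here are made up for illustration; a real project would typically reach for a library such as scikit-learn.

```python
# Toy supervised learning sketch (hypothetical data).

def nearest_neighbour_predict(train, point):
    """Classification: label `point` with the label of the closest training example."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    features, label = min(train, key=lambda ex: dist(ex[0], point))
    return label

# Labeled examples: (num_links, num_exclamations) -> "spam" / "not spam"
emails = [((5, 8), "spam"), ((6, 7), "spam"),
          ((0, 1), "not spam"), ((1, 0), "not spam")]
print(nearest_neighbour_predict(emails, (4, 6)))  # -> "spam"

def fit_line(xs, ys):
    """Regression: ordinary least squares for y = a*x + b (closed form)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    a = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
    b = mean_y - a * mean_x
    return a, b

# Continuous target: house size (m^2) -> price (arbitrary units)
a, b = fit_line([50, 80, 110, 140], [100, 160, 220, 280])
print(round(a * 100 + b))  # -> 200, the interpolated price for a 100 m^2 house
```

The classifier returns a discrete label; the regression returns a number anywhere on a continuum. That is the whole distinction between the two subcategories.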
And these algorithms discover hidden patterns in data without the need for human intervention. They're unsupervised. Unsupervised learning models are used for three main tasks: clustering, association, and dimensionality reduction. So let's take a look at each one of those, starting with clustering. Clustering is where the algorithm groups similar examples together. A common application of clustering is customer segmentation, where businesses might group customers together based on similarities like age, location, or spending habits. Then you have association, where the algorithm looks for relationships between variables in the data. Association rules are often used in market basket analysis, where businesses want to know which items are often bought together.
Something along the lines of "customers who bought this item also bought...", that sort of thing. The final task to talk about is dimensionality reduction, where the algorithm reduces the number of variables in the data while still preserving as much of the information as possible. This technique is often used in the data pre-processing stage, such as when autoencoders remove noise from images to improve picture quality. Okay, so let's talk about the differences between these two types of learning. In supervised learning, the algorithm learns from the training dataset by iteratively making predictions on the data and then adjusting for the correct answer.
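That iterative predict-and-adjust loop can be sketched with one of the simplest supervised models there is: fitting a line y = a·x + b by gradient descent. This is a minimal illustration on hypothetical data, not how production libraries implement training. Each pass predicts with the current parameters, measures the error against the known correct answers, and nudges the parameters to shrink it.

```python
# Predict-and-adjust, sketched as one-variable gradient descent (toy data).

def train(xs, ys, lr=0.01, epochs=5000):
    a, b = 0.0, 0.0                       # start with a blank model
    n = len(xs)
    for _ in range(epochs):
        # Predict with the current parameters.
        preds = [a * x + b for x in xs]
        # Adjust using the gradient of the mean squared error.
        grad_a = sum(2 * (p - y) * x for p, y, x in zip(preds, ys, xs)) / n
        grad_b = sum(2 * (p - y) for p, y in zip(preds, ys)) / n
        a -= lr * grad_a
        b -= lr * grad_b
    return a, b

# Hypothetical labeled data generated from y = 2x + 1.
a, b = train([1, 2, 3, 4], [3, 5, 7, 9])
print(a, b)  # converges close to a = 2, b = 1
```

The "labels" here are simply the known y values; without them, there would be no error signal to adjust against, which is exactly what separates this from the unsupervised setting.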
While supervised learning models tend to be more accurate than unsupervised learning models, they do require all of this upfront human intervention to label the data appropriately. For example, a supervised learning model can predict how long your commute will be based on the time of day, the weather conditions, and so forth. But first you'll have to train it to know things like "rainy weather extends the driving time". By contrast, unsupervised learning models work on their own to discover the inherent structure of unlabeled data. These models don't need humans to intervene; they can automatically find patterns in data and group them together. So, for example, an unsupervised learning model can cluster images by the objects they contain.
Things like people and animals and buildings without being told what those objects were ahead of time. Now, an important distinction to make is that unsupervised learning models don't make predictions. They only group data together. So if you were to use an unsupervised learning model on that same commute dataset, it would group together commutes with similar conditions like the time of day and the weather, but it wouldn't be able to predict how long each commute would take. Okay, so which of these two options is right for you? In general, supervised learning is more commonly used than unsupervised learning and that's really because it's more accurate and efficient. But that being said, unsupervised learning has its own advantages.
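Here's a minimal sketch of that contrast on hypothetical commute data: the supervised model uses the labeled minutes to predict a duration, while the unsupervised view only groups similar commutes and predicts nothing. All numbers and function names are made up for illustration.

```python
# Each record is ((hour_of_day, is_raining), minutes). Note that in this toy
# data, rain extends the driving time at every hour.
commutes = [((8, 1), 55), ((8, 0), 40), ((17, 1), 60),
            ((17, 0), 45), ((12, 1), 35), ((12, 0), 25)]

def predict_minutes(history, conditions):
    """Supervised: average the minutes of the two most similar past commutes."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    nearest = sorted(history, key=lambda rec: dist(rec[0], conditions))[:2]
    return sum(minutes for _, minutes in nearest) / 2

def group_by_hour(records):
    """Unsupervised view: bucket commutes by time of day, ignoring the minutes."""
    groups = {}
    for (hour, raining), _ in records:
        groups.setdefault(hour, []).append((hour, raining))
    return groups

print(predict_minutes(commutes, (9, 1)))  # -> 47.5, an actual duration estimate
print(sorted(group_by_hour(commutes)))    # -> [8, 12, 17]: groups, but no durations
```

The first function needs the labeled minutes to produce a prediction; the second never looks at them, so it can only tell you which commutes belong together.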
There are two that I can think of. First, unsupervised learning can be used on data that isn't labeled, which is often the case with real-world datasets. Second, unsupervised learning can find hidden patterns in data that supervised learning models just wouldn't find. Classifying big data can be a real challenge in supervised learning, but the results are highly accurate and trustworthy. In contrast, unsupervised learning can handle large volumes of data in real time, but there's less transparency into how that data is clustered and a higher risk of inaccurate results. But wait, it's not an either-or choice. Today I present to you the middle ground known as semi-supervised learning.
This is, well, a happy medium where you use a training dataset with both labeled and unlabeled data. And it's particularly useful when it's difficult to extract relevant features from data and when you have a high volume of data. For example, you could use a semi-supervised learning algorithm on a dataset with millions of images where only a few thousand of those images are actually labeled. Semi-supervised learning is ideal for medical images where a small amount of training data can lead to a significant improvement in accuracy. For example, a radiologist can look at and label some small subset of CT scans for tumors or diseases.
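A minimal self-training sketch of that idea, using made-up two-dimensional "scan feature" points: two seed labels stand in for the radiologist's annotations, and the model pseudo-labels only the unlabeled points that sit close to already-labeled ones. The data, distance threshold, and function names are all hypothetical.

```python
# Semi-supervised learning via self-training (toy illustration).

def nearest_label(labeled, point):
    """Return the label and squared distance of the closest labeled example."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    (_, label), d = min(((ex, dist(ex[0], point)) for ex in labeled),
                        key=lambda t: t[1])
    return label, d

def self_train(labeled, unlabeled, confident=2.0):
    """Repeatedly adopt pseudo-labels for points near already-labeled ones."""
    labeled, unlabeled = list(labeled), list(unlabeled)
    changed = True
    while changed and unlabeled:
        changed = False
        for p in list(unlabeled):
            label, d = nearest_label(labeled, p)
            if d <= confident:            # only adopt confident pseudo-labels
                labeled.append((p, label))
                unlabeled.remove(p)
                changed = True
    return labeled, unlabeled

seeds = [((0, 0), "healthy"), ((10, 10), "tumor")]   # the expert's few labels
pool = [(1, 0), (1, 1), (2, 1), (9, 10), (9, 9), (5, 5)]
labeled, leftover = self_train(seeds, pool)
print(len(labeled), leftover)  # labels spread to nearby points; ambiguous ones remain
```

Starting from just two expert labels, five of the six unlabeled points get pseudo-labeled, while the genuinely ambiguous point halfway between the groups is left alone, which is the behaviour you want from a cautious self-training loop.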
And then the machine can more accurately predict which patients might require more medical attention, without going through and labeling the entire set. Machine learning models are a powerful way to gain the data insights that improve our world. The right model for your data depends on the type of data that you have and what you want to do with it. And the choice between supervised and unsupervised learning is only the first step. If you have any questions, please drop us a line below. And if you want to see more videos like this in the future, please like and subscribe. Thanks for watching.