Machine learning refers to algorithms that learn from data without relying on explicit programming. It has three main purposes: pushing workloads to self-sufficient machines, pattern recognition, and data analysis. There are two main types of machine learning: supervised learning, which uses labeled input and output data to learn relationships and make predictions on new data, and unsupervised learning, which looks for hidden patterns in unlabeled input data. Popular machine learning algorithms discussed include k-nearest neighbors for classification, decision trees, which split data into nodes to make predictions, and random forests, which build an ensemble of decision trees to make predictions.
Machine learning: an algorithm that can learn from data without relying on explicitly programmed rules.

The Purpose of Machine Learning
o Push workload to self-sufficient machines.
o Pattern recognition.
o Analysis of data. Example: trends in house pricing.

Machine Learning: Topics and Terminology
o Supervised and Unsupervised Learning
o Static and Model evaluation
o Popular Algorithms:
   o K-Nearest Neighbor
   o Decision Tree
   o K-means Clustering, and so on.

Supervised Learning
Supervised learning is where you have input variables (X) and an output variable (Y), and you use an algorithm to learn the mapping function from the input to the output:
Y = f(X)
The goal is to approximate the mapping function so well that, when you have new input data (X), you can predict the output variable (Y) for that data.

Supervised learning problems are further grouped into:
o Classification: A classification problem is one where the output variable is a category, such as "red" or "blue", or "disease" and "no disease".
o Regression: A regression problem is one where the output variable is a real value, such as "dollars" or "weight".

Unsupervised Learning
Unsupervised learning is where you only have input data (X) and no corresponding output variables. The goal of unsupervised learning is to model the underlying structure or distribution of the data in order to learn more about the data.

Unsupervised learning problems are further grouped into:
o Clustering: A clustering problem is one where you want to discover the inherent groupings in the data, such as grouping customers by purchasing behavior.
o Association: An association rule learning problem is one where you want to discover rules that describe large portions of your data, such as "people who buy X also tend to buy Y".

Supervised Vs Unsupervised Learning
Supervised Learning | Unsupervised Learning
Deals with labeled data | Deals with unlabeled data
Algorithms for classification and regression | Algorithms for clustering and association
Classification is the organization of labeled data | Clustering is the analysis of patterns and grouping of unlabeled data
Regression is the prediction of trends in labeled data to determine future outcomes | (no counterpart listed)

Classification
Classification works by training on a set of data so that the machine learns the boundaries that separate categories of data. New data fed into the model can then be categorized according to where the point falls relative to those boundaries.
For example:
Iris Data Set
Classification (continued)
With these learned boundaries, the model can classify a new point from an out-of-sample data set.

K-Nearest Neighbors: A Classification Model
In K-Nearest Neighbors, the training data points are already categorized. To determine the category of a new data point, the K nearest points are consulted.
For example: (K = 5)
For example: (K = 14)

Unsupervised Learning
Play a video of 1:41 minutes.

Supervised Learning Algorithms

K-Nearest Neighbors Algorithm
1. Pick a value for K.
2. Search for the K observations in the training data that are nearest to the measurements of the unknown data point.
3. Predict the response of the unknown data point using the most popular response value among the K nearest neighbors.

K-Nearest Neighbors Algorithm (continued)
For classification, the output of the K-NN algorithm is the class of an unknown data point, decided by its K nearest neighbors in the training data.
For regression, the output is the average of the target variable over the K nearest neighbors in the training data.
o K too low: poor predictions, because the model over-fits the data set.
o K too high: the model over-generalizes.
o Moderate K: good predictions.

K-Nearest Neighbors Algorithm (continued)
KNN for regression

Decision Tree
Decision trees are built by splitting the training set into distinct nodes, where one node contains all of, or most of, one category of the data. These categories can be called subsets.
A decision tree may not be optimal, because it is built with a greedy algorithm.

Some Decision Tree Terminology
o Node: A test on the value of a certain attribute.
o Leaf: A terminal node that predicts the outcome.
o Root: The beginning node, which contains the entire dataset.
o Entropy: The amount of information disorder, or randomness, in the data.
o Information Gain: The information collected that increases the level of certainty in a particular prediction.
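The entropy and information gain terms above can be made concrete with a small sketch. The slides do not give formulas, so this assumes the standard Shannon entropy in bits and defines information gain as the parent's entropy minus the size-weighted entropy of the child subsets. The function names and the "yes"/"no" labels are made up for illustration.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return sum(-(c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(parent, children):
    """Parent entropy minus the size-weighted entropy of the child subsets."""
    n = len(parent)
    weighted = sum(len(child) / n * entropy(child) for child in children)
    return entropy(parent) - weighted


# Toy node (illustrative data only): 4 "yes" and 4 "no" labels.
parent = ["yes"] * 4 + ["no"] * 4

# A good split separates the classes completely.
good_split = [["yes"] * 4, ["no"] * 4]
# A bad split leaves each child as mixed as the parent.
bad_split = [["yes", "yes", "no", "no"], ["yes", "yes", "no", "no"]]

print(information_gain(parent, good_split))  # 1.0
print(information_gain(parent, bad_split))   # 0.0
```

A split with high information gain leaves each child node close to a single category, which is what the decision tree's greedy algorithm looks for at every node.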
Information Gain example: Good
Information Gain example: Good
Information Gain example: Bad

Random Forest Algorithm
1. For b = 1 to B:
   a). Draw a bootstrap sample Z* of size N from the training data.
   b). Grow a random-forest tree T_b on the bootstrapped data by recursively repeating the following steps for each terminal node of the tree, until the minimum node size n_min is reached:
      i). Select m variables at random from the p features.
      ii). Pick the best variable/split-point among the m.
      iii). Split the node into two daughter nodes.
2. Output the ensemble of trees {T_b}, b = 1 to B.

Advantages over Decision Tree
o Faster
o Reliable
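The Random Forest Algorithm steps above can be sketched in plain Python. This is a minimal illustration, not a production implementation. The slides do not say how the "best" split is chosen, so Gini impurity is assumed here. Prediction by majority vote over the trees is also an assumption, although it is the usual choice for classification. All function names and the toy data are hypothetical.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (assumed split criterion)."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(rows, labels, feature_ids):
    """Step ii: best (feature, threshold) among the candidates, or None."""
    best, best_score, n = None, gini(labels), len(rows)
    for f in feature_ids:
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best_score:
                best, best_score = (f, t), score
    return best


def grow_tree(rows, labels, m, n_min, rng):
    """Step b: grow one tree. Leaves are labels, inner nodes are tuples."""
    majority = Counter(labels).most_common(1)[0][0]
    if len(rows) <= n_min or len(set(labels)) == 1:
        return majority
    feature_ids = rng.sample(range(len(rows[0])), m)  # step i
    split = best_split(rows, labels, feature_ids)     # step ii
    if split is None:
        return majority
    f, t = split
    left = [(r, y) for r, y in zip(rows, labels) if r[f] <= t]
    right = [(r, y) for r, y in zip(rows, labels) if r[f] > t]
    # Step iii: split the node into two daughter nodes.
    return (f, t,
            grow_tree([r for r, _ in left], [y for _, y in left], m, n_min, rng),
            grow_tree([r for r, _ in right], [y for _, y in right], m, n_min, rng))


def random_forest(rows, labels, B, m, n_min, seed=0):
    """Steps 1 and 2: grow B trees, each on its own bootstrap sample."""
    rng = random.Random(seed)
    n = len(rows)
    forest = []
    for _ in range(B):
        idx = [rng.randrange(n) for _ in range(n)]  # step a: sample with replacement
        forest.append(grow_tree([rows[i] for i in idx],
                                [labels[i] for i in idx], m, n_min, rng))
    return forest


def predict_tree(tree, row):
    """Walk from the root to a leaf (labels must not be tuples)."""
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if row[f] <= t else right
    return tree


def predict_forest(forest, row):
    """Majority vote over all trees (assumed aggregation for classification)."""
    return Counter(predict_tree(t, row) for t in forest).most_common(1)[0][0]


# Toy data, made up for illustration: two well-separated classes.
rows = [[1.0, 5.0], [1.2, 4.8], [0.9, 5.1], [5.0, 1.0], [5.2, 0.8], [4.9, 1.1]]
labels = ["a", "a", "a", "b", "b", "b"]
forest = random_forest(rows, labels, B=25, m=1, n_min=1)
print(predict_forest(forest, [0.5, 6.0]))
print(predict_forest(forest, [6.0, 0.0]))
```

Because each tree sees a different bootstrap sample and a random subset of features at each node, individual trees can disagree. The vote across the ensemble is what makes the combined prediction more stable than a single decision tree.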