
Machine Learning

Machine Learning is generally taken to encompass automatic computing procedures, based on logical or binary operations, that learn a task from a series of examples. Machine Learning aims to generate classifying expressions simple enough to be easily understood by a human. They must mimic human reasoning sufficiently to provide insight into the decision process.

Supervised Learning

In supervised machine learning we have a set of data points or observations for which we know the desired output, class, target variable or outcome. The outcome may take one of many values, called classes or labels. A classic example: given a few thousand emails for which we know whether they are spam or ham (their labels), the idea is to create a model that is able to deduce whether new, unseen emails are spam or not. In other words, we are creating a mapping function where the inputs are the email's sender, subject, date, time, body, attachments and other attributes, and the output is a prediction as to whether the email is spam or ham. The target variable provides a level of supervision: it is used by the learning algorithm to adjust parameters or make decisions that allow it to predict labels for new data. Finally, when an algorithm predicts labels for observations, we call it a classifier.

For supervised learning, the process is:


1. Scale and prepare training data: First we build input vectors that are appropriate for feeding into our supervised learning algorithm.

2. Create a training set and a validation set by randomly splitting the universe of data. The training set is the data that the classifier uses to learn how to classify, whereas the validation set is fed to the already trained model to obtain an error rate (or other measures) that helps us judge the classifier's performance and accuracy. Typically you will use more training data (perhaps 80% of the entire universe) than validation data. Note that there is also cross-validation, but that is beyond the scope of this article.

3. Train the model: We feed the training data into the algorithm. The end result is a model that has (hopefully) learned how to predict our outcome given new, unknown data.

4. Validation and tuning: After we have created a model, we want to test its accuracy. It is critical to do this on data that the model has not yet seen; otherwise we are cheating. This is why in step 2 we separated out a subset of the data that was not used for training. We are testing the model's generalization capabilities: it is very easy to learn every single combination of input vectors and their mappings to the output as observed in the training data, and to achieve a very low error in doing so, but how do those same rules or mappings perform on new data that may have different input-to-output mappings? If the classification error on the validation set is much larger than on the training set, we have to go back and adjust the model parameters. The model has essentially memorized the answers seen in the training data, losing its generalization capabilities. This is called overfitting, and there are various techniques for overcoming it.

5. Validate the model's performance: There are numerous techniques for this, such as ROC analysis. The model's accuracy can be improved by changing its structure or the underlying training data. If the model's performance is not satisfactory, change the model parameters, inputs and/or scaling, go back to step 3 and try again.

6. Use the model to classify new data (see the sketch after this list).
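To make the workflow concrete, here is a minimal sketch of steps 1-6 in Python. The article names no particular library or algorithm, so scikit-learn, logistic regression and the synthetic data standing in for the spam/ham features are assumptions for illustration only.

    # Minimal sketch of the supervised workflow (steps 1-6).
    # scikit-learn and logistic regression are illustrative choices,
    # not prescribed by the article; the data is synthetic.
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import accuracy_score
    from sklearn.model_selection import train_test_split
    from sklearn.preprocessing import StandardScaler

    X, y = make_classification(n_samples=1000, n_features=10, random_state=0)

    # Step 1: scale and prepare the input vectors.
    X = StandardScaler().fit_transform(X)

    # Step 2: randomly split, 80% training / 20% validation.
    X_train, X_val, y_train, y_val = train_test_split(
        X, y, test_size=0.2, random_state=0)

    # Step 3: train the model on the training set only.
    clf = LogisticRegression().fit(X_train, y_train)

    # Steps 4-5: compare training and validation accuracy; a large gap
    # signals overfitting, in which case we adjust and retrain.
    print("train accuracy:", accuracy_score(y_train, clf.predict(X_train)))
    print("validation accuracy:", accuracy_score(y_val, clf.predict(X_val)))

    # Step 6: use the trained model to classify new, unseen data.
    print(clf.predict(X_val[:5]))

If the two accuracies diverge sharply, one would return to step 3 with different parameters, inputs or scaling.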

Unsupervised Learning

While supervised algorithms aim to minimize the classification error, unsupervised algorithms aim to create groups or subsets of the data in which the points belonging to a cluster are as similar to each other as possible, while the difference between clusters is as large as possible. As a simple example, you could imagine clustering customers by their demographics. The learning algorithm may help you discover distinct groups of customers by region, age range, gender and other attributes, in such a way that targeted marketing programs can be developed. Another example is clustering patients by their chronic diseases and comorbidities, in such a way that targeted interventions can be developed to help manage their diseases and improve their lifestyles.

For unsupervised learning, the process is:

1. Scale and prepare raw data: As with supervised learners, this step entails selecting features to feed into our algorithm and scaling them to build a suitable data set.

2. Build the model: We run the unsupervised algorithm on the scaled data set to get groups of like observations.

3. Validate: After clustering the data, we need to verify whether it cleanly separates the data in significant ways. This includes calculating a set of statistics on the resulting clusters (such as the within-group sum of squares; see the sketch after this list), as well as analysis based on domain knowledge, where you may measure how certain attributes behave when aggregated by the clusters.

4. Once we are satisfied with the clusters created, there is no need to run the model on new data.
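A minimal clustering sketch follows. The article names no specific algorithm, so k-means and the two-group synthetic "customer" data are illustrative assumptions; scikit-learn's KMeans exposes the within-cluster sum of squares mentioned in step 3 as inertia_.

    # Minimal sketch of the unsupervised workflow; k-means is an
    # illustrative choice, and the "customer" data is synthetic.
    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.preprocessing import StandardScaler

    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(0, 1, (100, 2)),   # one synthetic group
                   rng.normal(5, 1, (100, 2))])  # a second group

    # Step 1: scale.  Step 2: build the model.
    X_scaled = StandardScaler().fit_transform(X)
    km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X_scaled)

    # Step 3: validate, e.g. via the within-group sum of squares and
    # by inspecting how attributes aggregate within each cluster.
    print("within-cluster sum of squares:", km.inertia_)
    for c in range(2):
        print("cluster", c, "mean:", X[km.labels_ == c].mean(axis=0))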

Reinforcement learning
Reinforcement Learning (RL) is a subfield of Machine Learning. It is considered a hybrid of supervised and unsupervised learning, and it simulates human learning based on trial and error.
In reinforcement learning or learning with a critic, no desired category signal is given; instead, the only teaching feedback is that the tentative category is right or wrong. This is analogous to a critic who merely states that something is right or wrong, but does not say specifically how it is wrong.
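As a toy illustration of learning from right/wrong feedback alone, consider a two-armed bandit: the learner never sees the correct choice, only a 1/0 reward from the critic. The reward probabilities and the epsilon-greedy strategy below are assumptions invented for this sketch.

    import random

    # Trial-and-error learning with only right/wrong feedback.
    # The hidden reward probabilities are made up for this sketch.
    p_reward = [0.3, 0.7]   # chance that the critic says "right"
    value = [0.0, 0.0]      # learned estimate of each action's value
    counts = [0, 0]
    epsilon = 0.1           # fraction of trials spent exploring

    random.seed(0)
    for trial in range(1000):
        # Explore occasionally; otherwise exploit the best estimate.
        if random.random() < epsilon:
            arm = random.randrange(2)
        else:
            arm = value.index(max(value))
        reward = 1 if random.random() < p_reward[arm] else 0  # the critic
        counts[arm] += 1
        value[arm] += (reward - value[arm]) / counts[arm]     # running mean

    print(value)  # estimates drift toward [0.3, 0.7]; arm 1 wins out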

Pattern Recognition Approaches


The four best known approaches for pattern recognition are: 1) Template matching, 2) Statistical classification, 3) Syntactic or structural matching, and 4) Neural networks.

1. Template Matching
One of the simplest and earliest approaches to pattern recognition is based on template matching. Matching is a generic operation in pattern recognition which is used to determine the similarity between two entities (points, curves, or shapes) of the same type. In template matching, a template (typically, a 2D shape) or a prototype of the pattern to be recognized is available. The pattern to be recognized is matched against the stored template while taking into account all allowable pose (translation and rotation) and scale changes. The similarity measure, often a correlation, may be optimized based on the available training set. Often, the template itself is learned from the training set. Template matching is computationally demanding, but the availability of faster processors has now made this approach more feasible.

Disadvantages: The rigid template matching mentioned above, while effective in some application domains, has a number of disadvantages. For instance, it would fail if the patterns are distorted due to the imaging process, viewpoint change, or large intraclass variations among the patterns.
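The following sketch shows the core matching operation: slide a template over an image and score each position with normalized cross-correlation. It handles translation only; searching over rotation and scale, as described above, would extend the loops. The plus-sign image and template are toy data invented for illustration.

    import numpy as np

    def match_template(image, template):
        # Return the normalized cross-correlation of `template`
        # against every valid position in `image`.
        ih, iw = image.shape
        th, tw = template.shape
        t = template - template.mean()
        t_norm = np.sqrt((t ** 2).sum())
        scores = np.zeros((ih - th + 1, iw - tw + 1))
        for r in range(scores.shape[0]):
            for c in range(scores.shape[1]):
                patch = image[r:r + th, c:c + tw]
                p = patch - patch.mean()
                denom = np.sqrt((p ** 2).sum()) * t_norm
                scores[r, c] = (p * t).sum() / denom if denom > 0 else 0.0
        return scores

    # Toy example: locate a plus-sign pattern inside a 5x5 image.
    image = np.array([[0, 0, 0, 0, 0],
                      [0, 0, 9, 0, 0],
                      [0, 9, 9, 9, 0],
                      [0, 0, 9, 0, 0],
                      [0, 0, 0, 0, 0]], dtype=float)
    template = np.array([[0, 9, 0],
                         [9, 9, 9],
                         [0, 9, 0]], dtype=float)
    scores = match_template(image, template)
    r, c = np.unravel_index(scores.argmax(), scores.shape)
    print(int(r), int(c))  # 1 1: top-left corner of the best match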

2. Statistical Approach
Patterns are classified based on an underlying statistical model of the features. The statistical model is defined by a family of class-conditional probability density functions Pr(X|Ci) (the probability of feature vector X given class Ci). In the statistical approach, each pattern is represented in terms of d features or measurements and is viewed as a point in a d-dimensional space. The goal is to choose those features that allow pattern vectors belonging to different categories to occupy compact and disjoint regions in the d-dimensional feature space. The effectiveness of the representation space (feature set) is determined by how well patterns from different classes can be separated. Given a set of training patterns from each class, the objective is to establish decision boundaries in the feature space which separate patterns belonging to different classes. In the statistical decision-theoretic approach, the decision boundaries are determined by the probability distributions of the patterns belonging to each class, which must either be specified or learned.
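As a concrete sketch of the decision-theoretic view, the code below estimates Pr(X|Ci) from training patterns by fitting one Gaussian per class, then assigns a new pattern to the class with the highest log-posterior. The diagonal-covariance Gaussian and the synthetic two-class data are simplifying assumptions, not part of the article.

    import numpy as np

    # Estimate class-conditional densities Pr(X|Ci) with one diagonal
    # Gaussian per class, then classify by the largest log-posterior.
    def fit(X, y):
        model = {}
        for c in np.unique(y):
            Xc = X[y == c]
            model[c] = (Xc.mean(axis=0),        # class mean
                        Xc.var(axis=0) + 1e-9,  # class variance
                        len(Xc) / len(X))       # class prior
        return model

    def log_gaussian(x, mean, var):
        return -0.5 * np.sum(np.log(2 * np.pi * var) + (x - mean) ** 2 / var)

    def predict(model, x):
        return max(model, key=lambda c: log_gaussian(x, model[c][0], model[c][1])
                                        + np.log(model[c][2]))

    # Two synthetic classes occupying separate regions of feature space.
    rng = np.random.default_rng(1)
    X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(4, 1, (50, 2))])
    y = np.array([0] * 50 + [1] * 50)
    model = fit(X, y)
    print(predict(model, np.array([0.2, -0.1])))  # expected: 0
    print(predict(model, np.array([3.8, 4.2])))   # expected: 1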


3. Syntactic Approach
Patterns are classified based on measures of structural similarity, and knowledge is represented by means of formal grammars or relational descriptions (graphs). In many recognition problems involving complex patterns, it is more appropriate to adopt a hierarchical perspective where a pattern is viewed as being composed of simple subpatterns which are themselves built from yet simpler subpatterns. The simplest (elementary) subpatterns to be recognized are called primitives, and the given complex pattern is represented in terms of the interrelationships between these primitives. In syntactic pattern recognition, a formal analogy is drawn between the structure of patterns and the syntax of a language. The patterns are viewed as sentences belonging to a language, the primitives are viewed as the alphabet of the language, and the sentences are generated according to a grammar. Thus, a large collection of complex patterns can be described by a small number of primitives and grammatical rules. The grammar for each pattern class must be inferred from the available training samples. Structural pattern recognition is intuitively appealing because, in addition to classification, this approach also provides a description of how the given pattern is constructed from the primitives.
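A toy sketch of the syntactic view: treat a pattern as a string of primitive symbols and accept it if a grammar generates it. The stroke primitives (u, d, h) and the "triangle" grammar below are invented for illustration; a real system would infer the grammar from training samples.

    import re

    # Primitives (hypothetical): u = up-stroke, d = down-stroke,
    # h = horizontal stroke.  A "triangle-like" contour is described
    # by the regular grammar: one or more u, then d, then h.
    TRIANGLE = re.compile(r"^u+d+h+$")

    def classify(primitive_string):
        return "triangle" if TRIANGLE.match(primitive_string) else "unknown"

    print(classify("uuudddhhh"))  # triangle
    print(classify("udhu"))       # unknown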

4. Neural Networks

Classification is based on the response of a network of processing units (neurons) to an input stimulus (pattern). Knowledge is stored in the connectivity and strength of the synaptic weights. Neural networks can be viewed as massively parallel computing systems consisting of an extremely large number of simple processors with many interconnections. Neural network models attempt to use some organizational principles (such as learning, generalization, adaptivity, fault tolerance, distributed representation and computation) in a network of weighted directed graphs in which the nodes are artificial neurons and the directed edges (with weights) are connections between neuron outputs and neuron inputs. The main characteristics of neural networks are that they can learn complex nonlinear input-output relationships, use sequential training procedures, and adapt themselves to the data.

The learning process involves updating the network architecture and connection weights so that the network can efficiently perform a specific classification or clustering task. A neural network is a powerful data modeling tool that is able to capture and represent complex input/output relationships. The motivation for the development of neural network technology stemmed from the desire to build an artificial system that could perform "intelligent" tasks similar to those performed by the human brain. Neural networks resemble the human brain in the following two ways:

a. A neural network acquires knowledge through learning.
b. A neural network's knowledge is stored within inter-neuron connection strengths known as synaptic weights.

The true power and advantage of neural networks lies in their ability to represent both linear and nonlinear relationships and to learn these relationships directly from the data being modeled. Traditional linear models are simply inadequate when it comes to modeling data that contains nonlinear characteristics.

The most common neural network model is the multilayer perceptron (MLP). This type of neural network is known as a supervised network because it requires a desired output in order to learn. The goal of this type of network is to create a model that correctly maps the input to the output using historical data, so that the model can then be used to produce the output when the desired output is unknown.

[Figure: graphical representation of an MLP]


The MLP and many other neural networks learn using an algorithm called backpropagation. With backpropagation, the input data is repeatedly presented to the neural network. With each presentation the output of the neural network is compared to the desired output and an error is computed. This error is then fed back (backpropagated) to the neural network and used to adjust the weights such that the error decreases with each iteration and the neural model gets closer and closer to producing the desired output. This process is known as "training".
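The sketch below implements this training loop by hand for the classic XOR problem: a forward pass, an error against the desired output, backpropagation of that error, and a weight update that shrinks the error each iteration. The network size, learning rate and epoch count are illustrative choices, not values from the article.

    import numpy as np

    # XOR inputs and desired outputs: a task a single-layer network
    # cannot learn, but a small MLP can.
    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
    y = np.array([[0], [1], [1], [0]], dtype=float)

    rng = np.random.default_rng(0)
    W1 = rng.normal(size=(2, 4))   # input -> hidden weights
    b1 = np.zeros((1, 4))
    W2 = rng.normal(size=(4, 1))   # hidden -> output weights
    b2 = np.zeros((1, 1))

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    lr = 1.0
    for epoch in range(5000):
        # Forward pass: present the inputs to the network.
        h = sigmoid(X @ W1 + b1)
        out = sigmoid(h @ W2 + b2)

        # Compare the output to the desired output.
        err = out - y

        # Backward pass: feed the error back through the network.
        d_out = err * out * (1 - out)        # sigmoid derivative
        d_h = (d_out @ W2.T) * h * (1 - h)

        # Adjust the weights so the error decreases each iteration.
        W2 -= lr * h.T @ d_out
        b2 -= lr * d_out.sum(axis=0, keepdims=True)
        W1 -= lr * X.T @ d_h
        b1 -= lr * d_h.sum(axis=0, keepdims=True)

    print(out.round(2))  # should approach [[0], [1], [1], [0]]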

A good way to introduce the topic is to look at a typical application of neural networks. Many of today's document scanners for the PC come with software that performs a task known as optical character recognition (OCR). OCR software allows you to scan in a printed document and then convert the scanned image into an electronic text format, such as a Word document, enabling you to manipulate the text. To perform this conversion, the software must analyze each group of pixels (0's and 1's) that forms a letter and produce a value that corresponds to that letter. Some of the OCR software on the market uses a neural network as the classification engine.

