ISSN: 2455-5703
Abstract
Supervised Machine Learning (SML) is the search for algorithms that reason from externally supplied instances to produce general hypotheses, which then make predictions about future instances. Supervised classification is one of the tasks most frequently carried out by intelligent systems. This paper describes various supervised machine learning (ML) classification techniques, compares different learning algorithms, and determines the best algorithm based on the data set, the number of instances, and the variables (features): Decision Table, Random Forest (RF), Naive Bayes (NB), Support Vector Machine (SVM), Neural Network (Perceptron), JRip and Decision Tree (J48), using the Waikato Environment for Knowledge Analysis (WEKA) machine learning tool. To exercise the algorithms, a diabetes data set of 768 cases with eight attributes as independent variables was classified and analysed for reliability. The results indicate that SVM was the algorithm with the greatest accuracy and precision. The Naive Bayes and Random Forest classification algorithms were found to be the next most accurate after SVM. The study shows that the time taken to build a model and the accuracy achieved form one pair of factors, while the kappa statistic and Mean Absolute Error (MAE) form another. ML algorithms therefore require high precision, high accuracy and low error for reliable machine learning prediction.
Keywords- Machine Learning, Classifiers, Mining Techniques, Data Analysis, Learning Algorithms, Supervised Machine Learning
I. INTRODUCTION
Machine learning is one of the most rapidly developing areas of computer science. It refers to the automated detection of meaningful patterns in data. Machine learning tools are concerned with building systems that learn and adapt.
Machine learning has become one of the mainstays of information technology and, with that, a central, albeit often hidden, part of our lives. With the ever-increasing amounts of available data, there is good reason to believe that smart data analysis will become an ever more necessary ingredient of technological progress.
There are many applications of Machine Learning (ML), the most significant of which is data mining. People often make mistakes during analyses or, possibly, when trying to establish relationships between multiple features.
Data mining and machine learning are like Siamese twins: useful knowledge can be extracted from data using the right learning algorithms. Great progress has been made in both fields with the emergence of smart and nano technologies, which has heightened interest in discovering hidden patterns in data. The marriage of statistics, machine learning, information theory and computing has created a solid science with firm mathematical foundations and very powerful tools.
Supervised learning builds a mapping function from inputs to desired outputs.
The unprecedented rate of data generation has at times outpaced the sophistication of machine learning techniques. This calls for the use of supervised rather than unsupervised learning algorithms: supervised methods dominate classification problems, because the aim is usually to get the computer to learn a classification scheme that we have created.
ML is largely aimed at unlocking the insight hidden within Big Data. It helps ensure that value is extracted from large and varied data sources with far less reliance on human judgement, since the analysis is data-driven and runs at machine scale. Machine learning is well suited to the complexity of handling disparate data sources and the huge variety of variables and volumes of data involved. The more data fed into an ML system, the more it can be trained, and the higher the level of insight it yields. Freed from the limits of human scale and individual-level analysis, ML is well placed to discover and display the patterns hidden in data.
Another common supervised learning task is the classification problem: the learner must learn (that is, approximate the behaviour of) a function that maps a vector into one of several classes by looking at several input-output examples of the function. Inductive machine learning is the process of learning a set of rules from instances (examples in a training set) or, more generally, of creating a classifier that can be used to generalise to new instances. The procedure for applying supervised ML to a real-world problem is described in Figure 1.
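The workflow just described, namely training on labelled instances and then generalising to new ones, can be sketched in a few lines. The 1-nearest-neighbour rule and the toy data below are illustrative stand-ins, not the classifiers or the data set used in this study:

```python
def squared_distance(a, b):
    # squared Euclidean distance between two feature vectors
    return sum((x - y) ** 2 for x, y in zip(a, b))

def predict_1nn(train, x):
    # predict the label of the closest training example
    nearest = min(train, key=lambda pair: squared_distance(pair[0], x))
    return nearest[1]

# toy labelled data: (feature vector, class)
train = [((1.0, 1.0), "yes"), ((1.2, 0.9), "yes"),
         ((5.0, 5.0), "no"), ((4.8, 5.2), "no")]
test = [((1.1, 1.0), "yes"), ((5.1, 4.9), "no")]

# evaluate on held-out instances, as in the supervised workflow
correct = sum(predict_1nn(train, x) == y for x, y in test)
print(correct / len(test))  # 1.0 on this trivially separable toy split
```

The same split-train-evaluate loop underlies the 10-fold cross-validation used in the experiments below, just repeated over ten different held-out folds.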
This work focuses on classification ML algorithms, determining the most efficient algorithm with the highest accuracy and precision, and examining how the different algorithms behave on large and small data sets, so as to distinguish them clearly and provide insight into how to build supervised machine learning models.
The remainder of this work is organised as follows: Section 2 presents a review of the literature on the categories of supervised learning algorithms; Section 3 presents the methodology used; Section 4 discusses the results of the work; and Section 5 presents the conclusions and recommendations for further work.
1) Linear Classifiers
Linear models for classification separate input vectors into classes using linear (hyperplane) decision boundaries. The goal of linear classification in machine learning is to group items that have similar feature values into classes. A linear classifier achieves this by making a classification decision based on the value of a linear combination of the features. Linear classifiers are often used in situations where the speed of classification is an issue, since they are typically the fastest classifiers. Linear classifiers also tend to work best when the number of dimensions is large, as in document classification, where each element is typically the count of a word in a document. The degree of separation between the data sets, however, depends on the line: in short, the margin specifies how well separated the data are, and hence how easy the given classification problem is to solve.
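The decision rule above can be sketched directly: the class is decided by the sign of a linear combination of the feature values, which is why such classifiers are so fast. The weights below are illustrative, not fitted to any data:

```python
def linear_classify(weights, bias, x):
    # classify by the sign of a linear combination of the features
    score = sum(w * xi for w, xi in zip(weights, x)) + bias
    return "yes" if score >= 0 else "no"

# illustrative weights defining the boundary x1 - x2 = 0
w, b = [1.0, -1.0], 0.0
print(linear_classify(w, b, [3.0, 1.0]))  # "yes": positive side of the line
print(linear_classify(w, b, [1.0, 3.0]))  # "no": negative side of the line
```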
2) Logistic Regression
This is a classification function that builds a model using a single multinomial logistic regression estimator. Logistic regression usually states where the boundary between the classes lies, and also states that the class probabilities depend, in a specific way, on the distance from that boundary: as the data set grows, the estimated probabilities move towards the extremes (0 and 1). These statements about probability make logistic regression more than just a classifier: it makes stronger, more detailed predictions and can be fitted in a different way, but those strong predictions may be wrong. Logistic regression is an estimation approach like Ordinary Least Squares (OLS) regression; with logistic regression, however, the outcome variable is binary.
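The distance-to-boundary behaviour described above can be sketched with the logistic (sigmoid) function. The weights here are illustrative, not fitted:

```python
import math

def logistic_probability(weights, bias, x):
    # the signed score grows with distance from the decision boundary
    score = sum(w * xi for w, xi in zip(weights, x)) + bias
    # squash the score into a class probability in (0, 1)
    return 1.0 / (1.0 + math.exp(-score))

w, b = [2.0], 0.0
# near the boundary the probability is close to 0.5;
# far from it, the probability saturates towards 0 or 1
print(logistic_probability(w, b, [0.1]))   # ~0.55
print(logistic_probability(w, b, [3.0]))   # ~0.998
print(logistic_probability(w, b, [-3.0]))  # ~0.002
```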
4) Multi-layer Perceptron
This is a classifier in which the network's weights are found by solving a convex, unconstrained minimisation problem, rather than the quadratic programming problem with linear constraints solved in standard neural network training. Other popular algorithms rely on stronger assumptions. The perceptron algorithm learns from the training set by running through it repeatedly until it finds a prediction vector that is correct on all of the training examples. This prediction rule is then used to predict the labels of the test set.
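The mistake-driven loop just described can be sketched as follows, on a small linearly separable toy set (the data, learning rate and epoch cap are illustrative choices):

```python
def train_perceptron(data, epochs=20, lr=1.0):
    # data: list of (feature vector, label) with labels in {-1, +1}
    n = len(data[0][0])
    w, b = [0.0] * n, 0.0
    for _ in range(epochs):
        errors = 0
        for x, y in data:
            score = sum(wi * xi for wi, xi in zip(w, x)) + b
            pred = 1 if score >= 0 else -1
            if pred != y:  # mistake-driven update towards the true label
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
                errors += 1
        if errors == 0:  # prediction vector is correct on all examples
            break
    return w, b

# linearly separable toy training set
data = [([2.0, 1.0], 1), ([3.0, 2.0], 1),
        ([-1.0, -2.0], -1), ([-2.0, -1.0], -1)]
w, b = train_perceptron(data)
for x, y in data:
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    assert (1 if score >= 0 else -1) == y  # consistent on the training set
```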
6) K-means
K-means is one of the simplest unsupervised learning algorithms that solves the well-known clustering problem. The procedure follows a simple and easy way to classify a given data set into a certain number of clusters (assume k clusters); the algorithm is used when labelled data are not available. The simplest weak classifiers are rules of thumb: given a weak learning algorithm that can consistently find classifiers (rules of thumb) at least slightly better than random, say with 55% accuracy, a classifier of very high accuracy, say 99%, can be produced given enough data.
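A minimal sketch of the k-means procedure (Lloyd's algorithm) on one-dimensional toy data; the points and the naive initialisation are illustrative:

```python
def kmeans(points, k, iters=10):
    # Lloyd's algorithm: assign each point to its nearest centroid,
    # then move each centroid to the mean of its assigned cluster.
    centroids = list(points[:k])  # naive initialisation from the data
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # keep the old centroid if a cluster ends up empty
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids

# two obvious groups around 1.0 and 8.0
points = [1.0, 1.2, 0.8, 8.0, 8.2, 7.8]
print(sorted(kmeans(points, 2)))  # [1.0, 8.0]
```

Note that no labels appear anywhere: the grouping emerges purely from the distances between points, which is what distinguishes this from the supervised methods above.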
7) Decision Trees
Decision Trees (DT) are trees that classify instances by sorting them based on feature values. Each node in a decision tree represents a feature of an instance to be classified, and each branch represents a value that the node can take. Instances are classified starting at the root node and sorted according to their feature values. Decision tree learning, used in data mining and machine learning, uses a decision tree as a predictive model that maps observations about an item to conclusions about the item's target value. Such tree models are called classification trees or regression trees. Decision tree classifiers usually employ post-pruning techniques that evaluate the performance of the tree as it is pruned using a validation set: any node can be removed and assigned the most common class of the training instances sorted to it.
8) Neural Networks
Neural Networks (NNs) can, in principle, perform a number of regression and/or classification tasks at once, although commonly each network performs only one. Therefore, in most cases the network has a single output variable, although for many-state classification problems this may correspond to a number of output units (a post-processing stage takes care of the mapping from output units to output variables). The behaviour of an Artificial Neural Network (ANN) depends on three basic elements: the input and activation functions of its units, the network architecture, and the weight of each input connection. Once the first two are fixed, the behaviour of the ANN is defined by the current values of the weights. The weights of a net are initially set to random values, and instances of the training set are then repeatedly exposed to the net. The values for the inputs of an instance are placed on the input units, and the output of the net is compared with the desired output for that instance. All the weights in the net are then adjusted slightly in the direction that brings the output values of the net closer to the desired output. There are several algorithms with which a network can be trained.
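The weight-adjustment step described above can be sketched for a single sigmoid unit trained by gradient descent; the task (logical OR), learning rate and epoch count are illustrative choices, not the network configuration used in the experiments:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_unit(data, lr=0.5, epochs=5000):
    # single sigmoid unit trained by gradient descent: each weight is
    # nudged slightly in the direction that reduces the output error
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for x, target in data:
            out = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            delta = (target - out) * out * (1 - out)  # error * sigmoid slope
            w = [wi + lr * delta * xi for wi, xi in zip(w, x)]
            b += lr * delta
    return w, b

# logical OR, which a single unit can represent
data = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 1)]
w, b = train_unit(data)
for x, target in data:
    out = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
    assert round(out) == target  # outputs have moved towards the targets
```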
9) Bayesian Network
A Bayesian Network (BN) is a graphical model for probability relationships among a set of variables. Bayesian networks are among the best-known representatives of statistical learning algorithms. The problem with BN classifiers is that they are not suitable for data sets with many attributes. Prior expertise or domain knowledge about the Bayesian network structure can take the following forms:
– Declaring that a node is a root node, i.e. it has no parents.
– Declaring that a node is a leaf node, i.e. it has no children.
– Declaring that a node is a direct cause or direct effect of another node.
– Declaring that a node is not directly connected to another node.
– Declaring that two nodes are independent, given a condition set.
– Providing a partial node ordering, i.e. that one node appears earlier than another in the ordering.
– Providing a complete node ordering.
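A minimal sketch of how a two-node network A → B encodes a joint distribution via the chain rule over its structure; the probability tables are made up purely for illustration:

```python
# illustrative (made-up) probability tables for a network A -> B
p_a = {True: 0.3, False: 0.7}                  # P(A)
p_b_given_a = {True: {True: 0.9, False: 0.1},  # P(B | A)
               False: {True: 0.2, False: 0.8}}

def joint(a, b):
    # chain rule on the network structure: P(A, B) = P(A) * P(B | A)
    return p_a[a] * p_b_given_a[a][b]

total = sum(joint(a, b) for a in (True, False) for b in (True, False))
print(total)            # sums to 1 (up to float rounding)
print(joint(True, True))  # ~0.27
```

The structural declarations listed above constrain which conditional tables like `p_b_given_a` exist at all, which is what makes prior domain knowledge so valuable when learning the network.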
A. Results
WEKA was used to classify the data and to compare the different machine learning algorithms. Table 3 shows the results for the nine-attribute data set along with the parameters considered.
Table 3: Comparison of the different classification algorithms on the larger data set with more features

Algorithm | Time (sec) | Correctly classified (%) | Incorrectly classified (%) | Test mode | Attributes | No. of instances | Kappa statistic | MAE | Accuracy of Yes | Accuracy of No | Classification
Decision Table | 0.23 | 72.3948 | 27.6042 | 10-fold cross-validation | 9 | 768 | 0.3752 | 0.341 | 0.619 | 0.771 | Rules
Random Forest | 0.55 | 74.7396 | 25.2604 | 10-fold cross-validation | 9 | 768 | 0.4313 | 0.3105 | 0.653 | 0.791 | Trees
Naive Bayes | 0.03 | 76.3021 | 23.6979 | 10-fold cross-validation | 9 | 768 | 0.4664 | 0.2841 | 0.678 | 0.802 | Bayes
SVM | 0.09 | 77.3438 | 22.6563 | 10-fold cross-validation | 9 | 768 | 0.4682 | 0.2266 | 0.740 | 0.785 | Functions
Neural Network | 0.81 | 75.1302 | 24.8698 | 10-fold cross-validation | 9 | 768 | 0.4445 | 0.2938 | 0.653 | 0.799 | Functions
JRip | 0.19 | 74.4792 | 25.5208 | 10-fold cross-validation | 9 | 768 | 0.4171 | 0.3461 | 0.659 | 0.780 | Rules
Decision Tree (J48) | 0.14 | 73.8281 | 26.1719 | 10-fold cross-validation | 9 | 768 | 0.4164 | 0.3158 | 0.632 | 0.790 | Trees
Time is the time taken to build the model. MAE (Mean Absolute Error) is a measure of how close the predictions are to the eventual outcomes.
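For reference, a minimal sketch of how MAE is computed from predicted probabilities and actual labels; the numbers are illustrative, not taken from the experiments:

```python
def mean_absolute_error(predicted, actual):
    # average absolute gap between predicted probabilities and true labels
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)

# hypothetical predicted probabilities of "yes" against true labels (1 = yes)
print(mean_absolute_error([0.9, 0.2, 0.6, 0.1], [1, 0, 1, 0]))  # ~0.2
```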
Table 4: Comparison of the different classification algorithms on the smaller data set with six features

Algorithm | Time (sec) | Correctly classified (%) | Incorrectly classified (%) | Test mode | Attributes | No. of instances | Kappa statistic | MAE | Accuracy of Yes | Accuracy of No | Classification
Decision Table | 0.09 | 67.9688 | 32.0313 | 10-fold cross-validation | 6 | 384 | 0.3748 | 0.3101 | 0.581 | 0.734 | Rules
Random Forest | 0.42 | 71.875 | 28.125 | 10-fold cross-validation | 6 | 384 | 0.3917 | 0.348 | 0.639 | 0.763 | Trees
Naive Bayes | 0.01 | 70.5729 | 29.4271 | 10-fold cross-validation | 6 | 384 | 0.352 | 0.3297 | 0.633 | 0.739 | Bayes
SVM | 0.04 | 72.9167 | 27.0833 | 10-fold cross-validation | 6 | 384 | 0.3837 | 0.2708 | 0.711 | 0.735 | Functions
Neural Network | 0.17 | 59 | 41 | 10-fold cross-validation | 6 | 384 | 0.1156 | 0.4035 | 0.444 | 0.672 | Functions
JRip | 0.01 | 64 | 36 | 10-fold cross-validation | 6 | 384 | 0.2278 | 0.4179 | 0.514 | 0.714 | Rules
Decision Tree (J48) | 0.03 | 64 | 36 | 10-fold cross-validation | 6 | 384 | 0.1822 | 0.4165 | 0.56 | 0.685 | Trees
Time is the time taken to build the model. MAE (Mean Absolute Error) is a measure of how close the predictions are to the eventual outcomes. YES means a positive test for diabetes; NO means a negative test for diabetes.
Table 4 shows the results of comparing the various machine learning algorithms and metrics on the six-feature data set. The kappa statistic is a metric that compares the observed accuracy with the accuracy expected from random chance.
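A minimal sketch of computing the kappa statistic from a binary confusion matrix; the counts below are illustrative, not drawn from the experiments:

```python
def kappa_statistic(tp, fn, fp, tn):
    # compare observed accuracy with the accuracy expected by chance
    n = tp + fn + fp + tn
    p_observed = (tp + tn) / n
    p_yes = ((tp + fp) / n) * ((tp + fn) / n)  # chance agreement on "yes"
    p_no = ((fn + tn) / n) * ((fp + tn) / n)   # chance agreement on "no"
    p_expected = p_yes + p_no
    return (p_observed - p_expected) / (1 - p_expected)

# hypothetical confusion matrix: 40 true positives, 10 false negatives,
# 10 false positives, 40 true negatives
print(kappa_statistic(40, 10, 10, 40))  # ~0.6: well above chance
```

A kappa of 0 means the classifier does no better than chance, and 1 means perfect agreement, which is why the tables pair kappa with MAE as complementary quality indicators.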
Table 5: Ranking of accuracy for positive and negative diabetes classification by the different algorithms on the small data set
Small data set (384 instances)
Algorithm | Yes (positive diabetes) | No (negative diabetes)
SVM | 0.711 | 0.735
Random Forest | 0.639 | 0.761
Naive Bayes | 0.633 | 0.739
Decision Table | 0.581 | 0.734
Decision Tree (J48) | 0.519 | 0.685
JRip | 0.514 | 0.714
Neural Network (Perceptron) | 0.444 | 0.672
Table 6: Ranking of accuracy for positive and negative diabetes classification by the different algorithms on the large data set
Large data set (768 instances)
Algorithm | Yes (positive diabetes) | No (negative diabetes)
SVM | 0.74 | 0.785
Naive Bayes | 0.678 | 0.802
JRip | 0.659 | 0.78
Random Forest | 0.653 | 0.791
Neural Network (Perceptron) | 0.653 | 0.799
Decision Tree (J48) | 0.632 | 0.79
Decision Table | 0.619 | 0.771
Table 7: Model build time and correctly/incorrectly classified percentages for each algorithm on the small data set
Small data set (384 instances)
Algorithm | Time | Correctly classified | Incorrectly classified
SVM | 0.04 sec | 72.92% | 27.08%
Random Forest | 0.42 sec | 71.88% | 28.13%
Naive Bayes | 0.01 sec | 70.57% | 29.43%
Decision Table | 0.09 sec | 67.97% | 32.03%
JRip | 0.01 sec | 64% | 36%
Decision Tree (J48) | 0.03 sec | 64% | 36%
Neural Network (Perceptron) | 0.17 sec | 59% | 41%
Table 8: Model build time and correctly/incorrectly classified percentages for each algorithm on the large data set
Large data set (768 instances)
Algorithm | Time | Correctly classified | Incorrectly classified
SVM | 0.09 sec | 77.34% | 22.66%
Naive Bayes | 0.03 sec | 76.30% | 23.70%
Neural Network (Perceptron) | 0.81 sec | 75.13% | 24.87%
Random Forest | 0.55 sec | 74.74% | 25.26%
JRip | 0.19 sec | 74.48% | 25.52%
Decision Tree (J48) | 0.14 sec | 73.83% | 26.17%
Decision Table | 0.23 sec | 72.40% | 27.60%
Table 9: Detailed analysis of various dataset attributes
Attribute number Mean Standard Deviation
1 3.8 3.4
2 120.9 32.0
3 69.1 19.4
4 20.5 16.0
5 79.8 115.2
6 32.0 7.9
7 0.5 0.3
8 33.2 11.8
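The means and standard deviations in Table 9 are computed per attribute in the usual way; the glucose readings below are illustrative, not the study's data:

```python
import math

def mean_and_sd(values):
    # population mean and standard deviation of one attribute's values
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, math.sqrt(variance)

# hypothetical plasma glucose readings for a handful of cases
glucose = [85.0, 140.0, 120.0, 95.0, 160.0]
mean, sd = mean_and_sd(glucose)
print(round(mean, 1), round(sd, 1))  # 120.0 27.7
```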
B. Discussion
Table 3 shows a comparison of the results for the 768 cases with 9 features. It is observed that all algorithms have kappa statistics higher than their MAE (Mean Absolute Error). Furthermore, the correctly classified percentages exceed the incorrectly classified percentages. With the larger data set, this is an indication that the analysis is more reliable. SVM and NB require large sample sizes to achieve their maximum prediction accuracy, as shown in Table 3, while the Decision Tree and Decision Table have the lowest accuracy.
Table 4 shows a comparison of the results for the 384 cases with 6 features. The kappa statistics for the Neural Network, JRip and J48 are lower than their MAE values and so do not demonstrate accuracy and precision. However, SVM and RF show high accuracy and precision even on this smaller data set. The Decision Table took longer to build a model than JRip and the Decision Tree; conversely, a short build time does not guarantee accuracy. When the kappa statistic is less than the Mean Absolute Error (MAE), the algorithm does not demonstrate accuracy and precision; it follows that such an algorithm cannot usefully exploit those features for that data set. Table 6 shows the accuracy rankings on the large data set, with SVM on top, and Table 5 likewise shows SVM as the most accurate algorithm on the small data set.
Tables 7 and 8 compare the correctly and incorrectly classified percentages for the small and large data sets, together with the time taken to build each model. Table 7 reveals Naive Bayes and JRip as the fastest algorithms to build, although JRip has a lower correctly classified percentage, which shows that a short build time does not imply an accurate model. In the same vein, SVM achieves the highest accuracy with a build time of 0.04 seconds. Table 8 shows a comparable result for the large data set, where the Neural Network (Perceptron) ranks third by correctly classified percentage; this means the neural network works better with larger data sets than with smaller ones. Furthermore, the results indicate that the Decision Table does not perform well with large data sets. The SVM algorithm achieves the highest classification accuracy, and the larger the data set, the greater its accuracy.
Table 9 shows the mean and standard deviation of all the attributes used in this research. Plasma glucose concentration (attribute 2) has the highest mean, while the diabetes pedigree function (attribute 7) has the lowest, indicating strong effects in small data sets. However, its low standard deviation (SD) suggests that the diabetes pedigree function (attribute 7) may not be of much importance when analysing large data sets.