
THE QUANTIFIER

YANGAL RAGHAVENDRA

16BD1A05DF
Introduction:

 The Quantifier is an algorithm that takes the parameterized statistics of cricket tournaments as input and returns the possible permutations of players that give the team a fair chance of winning.
 Using all the retrievable statistical information about the players and the ground conditions, we can forecast the result and also improve the chances of winning.
 By analysing boundary conditions and each player's record against different bowling actions, an optimized result set can be devised.
 This application produces an unambiguous and largely accurate result from the pre-existing data sets of the players.
 Using regression on the data sets, we can arrive at an outcome that is then used to differentiate between options and deploy the result.
 This outcome can then be applied so that the players are permuted and arranged to maximize the chances of a win.
 The parameters can be varied for a particular match or a particular team.
The technologies used to procure the results are:

 Data Mining - to extract the data sets of the matches and run regressions over them


 Machine Learning - to process and interpret the data.

We use these technologies to build an algorithm that predicts the outcome and assembles a well-formed team with a plausible chance of victory.
Data Mining:
 Data Mining is the widely applicable study of how to relate and examine large pre-existing databases in order to generate new information that can be applied in many scenarios.
 It is the process of discovering patterns in large data sets involving
methods at the intersection of Machine Learning, Statistics and
database systems.
 It turns raw data into useful information by finding patterns in large batches of data.
 By using software to look for patterns in large batches of data, businesses and other fields can learn more about their customers and adjust the different parameters that describe them.
 These processes are used to build machine learning models that power
applications including search engine technology and website recommendation
programs.
 It involves exploring and analyzing large blocks of information to glean
meaningful patterns and trends.
 This can be used in a variety of ways, such as database marketing, fraud detection, or even discerning the sentiment or opinion of users.
 The data mining process breaks down into five steps:
a. Organizations collect data and load it into data warehouses.
b. They store and manage the data, either on in-house servers or the cloud.
c. Business analysts, management teams and information technology professionals access the data and determine how to organize it.
d. The application software sorts the data based on the user's results.
e. The end user presents the data in an easy-to-share format, such as a graph or
table.
Machine Learning:
 Machine Learning is an application of Artificial Intelligence that
provides systems the ability to automatically learn and improve
from experience without being explicitly programmed.
 It focuses on the development of computer programs that can access data and use it to learn for themselves.
 The process of learning begins with observations or data, such as examples, direct experience, or instruction, in order to look for patterns in data and make better decisions in the future based on the examples we provide.
 Machine Learning algorithms are often categorized into two types:
1. Supervised machine learning algorithms:
These can apply what has been learned in the past to new data, using labeled examples to predict future events. Starting from the analysis of a known training data set, the learning algorithm produces an inferred function to make predictions about output values.
2. Unsupervised machine learning algorithms:
These are used when the information used to train is neither classified nor labeled. Unsupervised learning studies how systems can infer a function to describe a hidden structure from unlabeled data. A brief sketch contrasting the two types follows.
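As a purely illustrative contrast (the data below is made up, not taken from the project), a supervised classifier learns a mapping from labeled examples, while an unsupervised algorithm looks for structure in unlabeled data:

    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.cluster import KMeans

    # Toy data: six samples with two features each (illustrative only)
    X = np.array([[1, 2], [2, 1], [3, 4], [6, 8], [7, 7], [8, 9]])
    y = np.array([0, 0, 0, 1, 1, 1])  # labels are available -> supervised setting

    # Supervised: learn a mapping from features to the known labels
    clf = LogisticRegression().fit(X, y)
    print(clf.predict([[2, 3], [7, 8]]))  # predict labels for new points

    # Unsupervised: no labels, infer hidden structure (here, two clusters)
    km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
    print(km.labels_)  # cluster assignment for each sample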
 This is the field of study that gives computers the capability to learn without being explicitly programmed.
 The data can be any unprocessed fact, value, text, sound or picture that has not yet been interpreted and analyzed. Data is the most important part of all data analytics.
 We split the data in Machine Learning into 3 parts, as in the sketch that follows this list:
1. Training data: the part of the data used to train our model.
2. Validation data: the part of the data used for frequent evaluation of the model while it is being fit.
3. Testing data: once our model is completely trained, the testing data provides an unbiased evaluation.
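A minimal sketch of such a three-way split using scikit-learn's train_test_split; the 60/20/20 proportions and the toy arrays are assumptions for illustration:

    import numpy as np
    from sklearn.model_selection import train_test_split

    # Toy stand-ins for the real feature matrix and labels
    X = np.arange(100).reshape(50, 2)
    y = np.random.randint(0, 2, size=50)

    # Hold out 20% as the test set, then 20% of the whole as the validation set
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, random_state=42)
    X_train, X_val, y_train, y_val = train_test_split(X_train, y_train, test_size=0.25, random_state=42)

    print(len(X_train), len(X_val), len(X_test))  # 30 10 10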
 The Quantifier has two functionalities: either to predict and forecast the winning team, or to devise a well-positioned team permutation to attain victory.
 The data sets can include player analysis against different bowling actions, ground conditions, and weather conditions.
 Using regression, we can obtain a result set that can be used to produce the desired output; a small regression sketch follows.
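As a hedged illustration of the regression idea (the column names and values below are hypothetical, not the project's actual data), a simple linear regression could relate ground and weather features to an expected score:

    import pandas as pd
    from sklearn.linear_model import LinearRegression

    # Hypothetical historical data: pitch rating, humidity and first-innings score
    df = pd.DataFrame({
        "pitch_rating": [6, 7, 5, 8, 6, 9],
        "humidity": [40, 55, 70, 35, 60, 30],
        "first_innings_score": [165, 172, 150, 190, 158, 201],
    })

    # Fit a regression of score on the two condition features
    reg = LinearRegression().fit(df[["pitch_rating", "humidity"]], df["first_innings_score"])

    # Predict the likely score under a new set of conditions (illustrative values)
    print(reg.predict(pd.DataFrame({"pitch_rating": [7], "humidity": [45]})))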
 Once we retrieve the data sets in a well-formed way, we can use them for any of our purposes.
 Teams, matches, and factors affecting outcomes of matches will be analyzed.
 Some factors that affect match outcomes could be venue (stadium), city, toss
winner, and toss decision (field/bat).
 Python 3+ has helpful analytics, predictive, and charting libraries.
 Libraries we use include linear algebra (numpy), data processing for CSV
(pandas), charting (Matplotlib), statistical data visualization (seaborn), and
machine learning modeling (scikit-learn).
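A typical set of imports for this stack might look as follows (a sketch; the exact modules used in the project may differ):

    import numpy as np                    # linear algebra
    import pandas as pd                   # data processing, CSV file I/O
    import matplotlib.pyplot as plt       # charting
    import seaborn as sns                 # statistical data visualization
    from sklearn.preprocessing import LabelEncoder        # categorical encoding
    from sklearn.ensemble import RandomForestClassifier   # ML modeling
    from sklearn.model_selection import KFold, cross_val_score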
 Consider the predictive functionality.
 First, we address missing data with a process called imputation. There are numerous ways to fill in missing values depending on the scenario. Let's consider the match-related columns team1, team2, city, toss_decision, toss_winner, venue, and winner; a minimal imputation sketch follows.
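A minimal imputation sketch, assuming the matches have been loaded into a pandas DataFrame from a CSV file (the file name matches.csv and the fill strategies are assumptions):

    import pandas as pd

    matches = pd.read_csv("matches.csv")  # assumed CSV of historical matches

    cols = ["team1", "team2", "city", "toss_decision", "toss_winner", "venue", "winner"]
    print(matches[cols].isnull().sum())   # inspect how much data is missing

    # One simple strategy: fill categorical gaps with the most common value
    for col in ["city", "venue"]:
        matches[col] = matches[col].fillna(matches[col].mode()[0])

    # Assumption: a missing winner means the match produced no result
    matches["winner"] = matches["winner"].fillna("No result")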
 Then we compute a team-wise graphical representation of total match wins using
a histogram.
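A sketch of that plot with seaborn's countplot, continuing from the matches DataFrame in the imputation sketch above:

    import matplotlib.pyplot as plt
    import seaborn as sns

    # Count of match wins per team, sorted from most to fewest wins
    plt.figure(figsize=(12, 6))
    sns.countplot(y=matches["winner"], order=matches["winner"].value_counts().index)
    plt.xlabel("Total match wins")
    plt.ylabel("Team")
    plt.tight_layout()
    plt.show()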
 The scikit-learn open-source library provides machine learning in Python. This library requires all inputs to be numeric, so all categorical variables should be converted into numeric variables using encoding with scikit-learn's LabelEncoder. Later, a predictive model is created using a generic function called class_model that takes as parameters the model (algorithm), the data, the predictor inputs, and the outcome (predicted) feature; a sketch follows.
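The text names the generic function class_model but does not show its body; the following is only a sketch of what the encoding step and such a helper might look like (the accuracy reporting inside it is an assumption):

    from sklearn.preprocessing import LabelEncoder
    from sklearn.metrics import accuracy_score

    # Encode every categorical column into integer codes (scikit-learn needs numbers)
    encoders = {}
    for col in ["team1", "team2", "city", "toss_decision", "toss_winner", "venue", "winner"]:
        encoders[col] = LabelEncoder()
        matches[col] = encoders[col].fit_transform(matches[col].astype(str))

    def class_model(model, data, predictors, outcome):
        """Fit `model` on the predictor columns and report training accuracy (sketch)."""
        model.fit(data[predictors], data[outcome])
        predictions = model.predict(data[predictors])
        print("Accuracy: {:.3f}".format(accuracy_score(data[outcome], predictions)))
        return model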
 The k-fold cross-validation technique (k = 5) is used to reserve a sample set on which we do not train the model; it is instead used to test the model before finalizing it, as in the sketch below.
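A sketch of 5-fold cross-validation with scikit-learn; the placeholder LogisticRegression model is an assumption and could be swapped for any classifier:

    from sklearn.model_selection import KFold, cross_val_score
    from sklearn.linear_model import LogisticRegression

    predictors = ["team1", "team2", "city", "toss_decision", "toss_winner", "venue"]
    kf = KFold(n_splits=5, shuffle=True, random_state=42)

    scores = cross_val_score(LogisticRegression(max_iter=1000),
                             matches[predictors], matches["winner"], cv=kf)
    print("Mean CV score: {:.3f}  (std: {:.3f})".format(scores.mean(), scores.std()))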
 The mean cross-validation score is used for the evaluation; the standard deviation of the scores could also be used.
 Adding more predictor variables can make the model fit the training data too closely, so that it performs poorly on unseen data; this is overfitting.
 The user needs to balance the training set and predictor variables based on the
accuracy and cross-validation score.
 Multiple model classifiers were tested on the given data; the Random Forest Classifier model was used, as sketched below.
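A sketch of plugging the Random Forest Classifier into the same pipeline, reusing the class_model helper and the KFold splitter sketched above; the hyperparameters are illustrative defaults, not the project's tuned values:

    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import cross_val_score

    rf = RandomForestClassifier(n_estimators=100, random_state=42)
    predictors = ["team1", "team2", "city", "toss_decision", "toss_winner", "venue"]

    class_model(rf, matches, predictors, "winner")          # training accuracy
    scores = cross_val_score(rf, matches[predictors], matches["winner"], cv=kf)
    print("Random Forest mean CV score: {:.3f}".format(scores.mean()))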
 Consider two teams A and B in an IPL match: we estimate data sets for each player, whether batsman or bowler, covering how one scores against the other, or how the partnership of two batsmen affects the score in a particular over bowled by a given bowler; a hypothetical aggregation sketch follows.
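The exact form of this player-versus-player data is not given; as one hypothetical sketch, ball-by-ball deliveries could be aggregated per batsman-and-bowler pair with pandas (all names and numbers below are invented):

    import pandas as pd

    # Hypothetical ball-by-ball records: batsman, bowler, runs off the ball, dismissal flag
    deliveries = pd.DataFrame({
        "batsman": ["A1", "A1", "A2", "A1", "A2", "A2"],
        "bowler":  ["B1", "B1", "B1", "B2", "B2", "B2"],
        "runs":    [4, 1, 0, 6, 0, 1],
        "dismissed": [0, 0, 1, 0, 0, 0],
    })

    # Aggregate each batsman-vs-bowler matchup: balls faced, runs scored, dismissals
    matchups = deliveries.groupby(["batsman", "bowler"]).agg(
        balls=("runs", "size"),
        runs=("runs", "sum"),
        dismissals=("dismissed", "sum"),
    ).reset_index()
    matchups["strike_rate"] = 100 * matchups["runs"] / matchups["balls"]
    print(matchups)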

 Their capabilities and weaknesses with respect to particular opposing players are derived, and the team with the higher probability of winning is predicted.

 The same data is input into the algorithm to compute a fruitful arrangement of the players in the team.
Future Work:

 There is some future work that can be done to improve this project.
 The data set can include external factors such as player injury, player fatigue, winning streak against a particular team, overall winning streak, average runs scored by a team against a particular team in previous matches, etc., and on the basis of these data we can repeat the prediction and check whether the accuracy improves.
 The prediction can also take into consideration the performance of the players in the team, such as the total number of runs scored by a player in the tournament, the player's form guide, the number of Man of the Match awards earned, etc., rather than only using high-level data about the different matches such as toss winner, toss decision, home team, etc.
 The project currently has no web or mobile application or UI. A web/mobile application could be built that takes the entire data set as input and writes the prediction result for each instance to a PDF or text file.
THANK YOU
