
Machine learning

Study online at quizlet.com/_2yfzom

1. Convergence
Read this: https://www.researchgate.net/post/How_to_proof_the_convergence_properties_of_a_metaheuristic_algorithm

2. Data generating distribution
Now that we have defined our loss function, we need to consider where the data (training and test) comes from. The model that we will use is the probabilistic model of learning. Namely, there is a probability distribution D over input/output pairs. This is often called the data generating distribution. A useful way to think about D is that it gives high probability to reasonable (x, y) pairs, and low probability to unreasonable (x, y) pairs.
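
A toy illustration (my own sketch, not part of the original card): sampling (x, y) pairs from an assumed distribution D in which the underlying concept is sign(x) and a small fraction of labels are noisy:

    import random

    def sample_pair():
        # Assumed toy D: x ~ Uniform(-1, 1), the underlying concept is
        # sign(x), and 10% of labels are flipped, so unreasonable (x, y)
        # pairs have low but nonzero probability.
        x = random.uniform(-1.0, 1.0)
        y = 1 if x >= 0 else -1
        if random.random() < 0.1:
            y = -y
        return x, y

    # Training and test data are drawn from the same distribution D.
    train = [sample_pair() for _ in range(100)]
    test = [sample_pair() for _ in range(100)]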

3. "Divide & Conquer" algorithm
A divide and conquer algorithm works by recursively breaking down a problem into two or more sub-problems of the same (or related) type (divide), until these become simple enough to be solved directly (conquer).
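
A classic instance is merge sort; this sketch (an illustrative example, not from the card) shows the divide and conquer pattern directly:

    def merge_sort(items):
        # Divide: split the problem into two sub-problems of the same type.
        if len(items) <= 1:
            return items  # simple enough to be solved directly
        mid = len(items) // 2
        left = merge_sort(items[:mid])
        right = merge_sort(items[mid:])
        # Conquer/combine: merge the two solved sub-problems.
        merged = []
        i = j = 0
        while i < len(left) and j < len(right):
            if left[i] <= right[j]:
                merged.append(left[i])
                i += 1
            else:
                merged.append(right[j])
                j += 1
        return merged + left[i:] + right[j:]

    print(merge_sort([5, 2, 8, 1, 9, 3]))  # [1, 2, 3, 5, 8, 9]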

4. Expected Loss
The loss l (loss function) we expect given a data generating distribution D.
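In symbols (standard notation, assuming the function f and the loss l defined on the other cards):

    \epsilon = \mathbb{E}_{(x,y) \sim D} [ \ell(y, f(x)) ] = \sum_{(x,y)} D(x,y) \, \ell(y, f(x))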


5. Formal definition of inductive machine learning
Given (i) a loss function l and (ii) a sample D drawn from some unknown distribution 𝒟, you must compute a function f that has low expected error ε over 𝒟 with respect to l.

6. Generalization
The ability to identify the rules, i.e. to generalize, allows the system to make predictions on unknown data.

7. Greedy algorithm
A greedy algorithm works by making the decision that seems most promising at any moment; it never reconsiders this decision, whatever situation may arise later. Greedy algorithms are shortsighted in the sense that they make decisions on the basis of the information at hand, without worrying about the effect these decisions may have in the future.
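
For example, greedy coin change (my sketch, not from the card): each step grabs the largest coin that fits and never looks back:

    def greedy_change(amount, coins=(25, 10, 5, 1)):
        # Each step takes the largest coin that still fits: the decision
        # that looks most promising right now, never reconsidered later.
        taken = []
        for coin in coins:
            while amount >= coin:
                taken.append(coin)
                amount -= coin
        return taken

    print(greedy_change(63))  # [25, 25, 10, 1, 1, 1]

With the coin set (4, 3, 1) and amount 6 the same rule returns [4, 1, 1] and misses the optimum [3, 3]: the early choice of 4 is exactly the shortsightedness described above.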

8. Hypercube
A geometrical figure in four or more dimensions which is analogous to a cube in three dimensions.

9. Hyperparameter
A parameter that controls other parameters in the model. It cannot naively be adjusted using the training data; we need a validation set or development data instead, because tuning on the training data risks overfitting, while tuning on the test data breaks the rule that test data always has to be unseen.
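
A minimal sketch of this workflow (all names and data here are made up for illustration): the hyperparameter k of a nearest-neighbors classifier is tuned on a held-out development set, never on the test set:

    import random

    def knn_predict(train, x, k):
        # Majority vote among the k nearest training points (1-D distance).
        neighbors = sorted(train, key=lambda p: abs(p[0] - x))[:k]
        return 1 if sum(label for _, label in neighbors) > 0 else -1

    random.seed(0)
    points = [random.uniform(-1, 1) for _ in range(200)]
    data = [(x, 1 if x >= 0 else -1) for x in points]
    train, dev = data[:150], data[150:]  # held-out development data

    # Tune the hyperparameter k on the dev set, never on the test set.
    best_k = max([1, 3, 5, 7, 9],
                 key=lambda k: sum(knn_predict(train, x, k) == y
                                   for x, y in dev))
    print("chosen k:", best_k)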

10. Hyperspheres
A geometrical figure in four or more dimensions which is analogous to a sphere in three dimensions.

11. Inductive bias / Learning bias
The set of assumptions that the learner uses to predict outputs given inputs that it has not encountered, e.g. the maximum margin bias (SVM), the nearest neighbors bias (knn), etc.

12. knn scaling
You should normalize when the scale of a feature is irrelevant or misleading, and not normalize when the scale is meaningful. K-means considers Euclidean distance to be meaningful. If a feature has a big scale compared to another, but the first feature truly represents greater diversity, then clustering in that dimension should be penalized.
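
A small sketch of such normalization (my own example): z-scoring each feature so that an arbitrary unit choice cannot dominate the Euclidean distance:

    def zscore_columns(rows):
        # Normalize each feature column to mean 0 and standard deviation 1
        # so an arbitrary unit choice cannot dominate the distance.
        cols = list(zip(*rows))
        means = [sum(c) / len(c) for c in cols]
        stds = [(sum((v - m) ** 2 for v in c) / len(c)) ** 0.5 or 1.0
                for c, m in zip(cols, means)]
        return [[(v - m) / s for v, m, s in zip(row, means, stds)]
                for row in rows]

    # Height in metres vs yearly income in euros: unnormalized, income
    # differences would completely swamp the distance computation.
    rows = [[1.62, 30000.0], [1.80, 31000.0], [1.75, 90000.0]]
    print(zscore_columns(rows))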

13. Loss function
Tells us how 'bad' a system's prediction is in comparison to the truth; it can be seen as a measure of error.
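
Two standard examples (illustrative, not from the card): the zero-one loss for classification and the squared loss for regression:

    def zero_one_loss(y_true, y_pred):
        # Classification: 1 if the prediction is wrong, 0 if it is right.
        return 0.0 if y_true == y_pred else 1.0

    def squared_loss(y_true, y_pred):
        # Regression: error grows quadratically with distance to the truth.
        return (y_true - y_pred) ** 2

    print(zero_one_loss(1, -1))    # 1.0
    print(squared_loss(2.0, 2.5))  # 0.25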

14. Memorization
Memorizing given facts is an obvious task in learning. This can be done by storing the input samples explicitly, or by identifying the concept behind the input data and memorizing their general rules.

15. Reasons for failure in ML
*Noise* in the training data: 1) at feature level (e.g. incorrect values such as typos) or 2) at label level (e.g. the wrong label is assigned to a set of features).
*Insufficient features*: there are not enough features / data available for a learning algorithm to work.
*More than one correct answer*: there might exist more than one correct answer.
*Inductive bias* is too far away from the concept that is being learned.

16. Regularization
Helps avoid overfitting by reducing the magnitude of individual feature weights.
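
A minimal sketch, assuming L2 (ridge) regularization on a linear model with squared loss; the names and the penalty weight lam are illustrative:

    def ridge_objective(w, data, lam):
        # Squared loss over the data plus an L2 penalty on the weights:
        # large weights are punished, which shrinks the influence
        # (magnitude) of individual features.
        loss = sum((y - sum(wi * xi for wi, xi in zip(w, x))) ** 2
                   for x, y in data)
        penalty = lam * sum(wi ** 2 for wi in w)
        return loss + penalty

    data = [([1.0, 2.0], 5.0), ([2.0, 0.5], 3.0)]
    print(ridge_objective([1.0, 2.0], data, lam=0.1))  # 0.5 (pure penalty)

The penalty term lam * sum(w_i^2) is what pushes the weight magnitudes down; lam controls the trade-off and is itself a hyperparameter (see card 9).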

17. Shallow decision tree
We limit the depth of the decision tree.

18. Training error
The training error is simply our average error over the training data.
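
In symbols (standard notation; the N pairs (x_n, y_n) are the training examples and l is the loss function from card 13):

    \hat{\epsilon} = \frac{1}{N} \sum_{n=1}^{N} \ell(y_n, f(x_n))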