
Partial 3

IP
The Condition
All the groups that did not discuss their projects in the classroom have to develop the following
application for the 3rd partial. The conditions will be strictly evaluated.
The application should calculate the correlation between 3 pairs of data and show the results.
The application should have the following:
1. Application of at least 4 of the Java primitive types
2. Application of Loops (while, do-while, for-each, for)
3. Application of conditionals (If-else, switch)
4. Application of a Graphical User Interface (GUI) using Swing (bonus if you plot the graphics)
5. Application of Java input/output to read and write files (the data can be read from a file)
6. Application of Arrays (Matrices)
The Condition
Minimum for the GUI: Label, TextField, TextArea, Button. (Additional point: Canvas)
File (Bonus: https://docs.oracle.com/javase/tutorial/uiswing/components/filechooser.html)
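A minimal Swing skeleton covering the required components could look like the sketch below; all class, method, and label names are illustrative, not prescribed by the assignment:

```java
import javax.swing.*;
import java.awt.*;
import java.util.Locale;

// Minimal sketch of the required GUI: a JLabel, a JTextField,
// a JTextArea and a JButton. Names are illustrative.
public class MiniCorrelationGui {

    // Formats a correlation value for display; kept separate from
    // the GUI so it can be reused for the three required results.
    static String formatResult(double r) {
        return String.format(Locale.US, "r = %.4f", r);
    }

    public static void main(String[] args) {
        SwingUtilities.invokeLater(() -> {
            JFrame frame = new JFrame("Correlation");
            frame.setDefaultCloseOperation(JFrame.EXIT_ON_CLOSE);
            frame.setLayout(new BorderLayout());

            JLabel label = new JLabel("Data file:");
            JTextField fileField = new JTextField(20);
            JTextArea output = new JTextArea(10, 40);
            output.setEditable(false);
            JButton compute = new JButton("Compute");

            // Placeholder action: a real version would read the data
            // and call a correlation function here.
            compute.addActionListener(e ->
                    output.append(formatResult(0.0) + "\n"));

            JPanel top = new JPanel();
            top.add(label);
            top.add(fileField);
            top.add(compute);
            frame.add(top, BorderLayout.NORTH);
            frame.add(new JScrollPane(output), BorderLayout.CENTER);
            frame.pack();
            frame.setVisible(true);
        });
    }
}
```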
Java JTable
The JTable class is used to display data in tabular form. It is composed of rows and columns.
JTable tutorials
http://www.java2s.com/Code/Java/Swing-JFC/AppendingaRowtoaJTableComponent.htm
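Appending rows, as the tutorial above shows, is usually done through a DefaultTableModel backing the JTable. A small sketch (names illustrative):

```java
import javax.swing.JTable;
import javax.swing.table.DefaultTableModel;

// Sketch of a JTable backed by a DefaultTableModel, with rows of
// (x, y) data appended one at a time.
public class TableSketch {

    static DefaultTableModel buildModel(double[] x, double[] y) {
        DefaultTableModel model =
                new DefaultTableModel(new Object[] {"x", "y"}, 0);
        for (int i = 0; i < x.length; i++) {
            model.addRow(new Object[] {x[i], y[i]});  // append one row
        }
        return model;
    }

    public static void main(String[] args) {
        double[] x = {1.0, 2.0, 3.0};
        double[] y = {2.1, 3.9, 6.2};
        JTable table = new JTable(buildModel(x, y));
        System.out.println("rows: " + table.getRowCount());
    }
}
```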
JFileChooser
https://www.mkyong.com/swing/java-swing-jfilechooser-example/
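Combining JFileChooser with the file-reading requirement could look like this sketch; the file format (whitespace-separated doubles) is an assumption, not part of the assignment text:

```java
import javax.swing.JFileChooser;
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Scanner;

// Sketch: let the user pick a data file with JFileChooser, then read
// whitespace-separated doubles from it. The file format is assumed.
public class FileInput {

    static double[] readDoubles(File file) throws IOException {
        List<Double> values = new ArrayList<>();
        try (Scanner in = new Scanner(file)) {
            in.useLocale(Locale.US);  // parse "1.5" regardless of system locale
            while (in.hasNextDouble()) {
                values.add(in.nextDouble());
            }
        }
        double[] result = new double[values.size()];
        for (int i = 0; i < result.length; i++) {
            result[i] = values.get(i);
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        JFileChooser chooser = new JFileChooser();
        if (chooser.showOpenDialog(null) == JFileChooser.APPROVE_OPTION) {
            double[] data = readDoubles(chooser.getSelectedFile());
            System.out.println("read " + data.length + " values");
        }
    }
}
```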
Task at home
The application has 6 arrays of type double, each with at least 100 elements.
The arrays are filled with random numbers. Each pair is generated as follows:
1. Positive correlation: 1 < x + rand < 100 and y = ax + rand (-0.5 < rand < 0.5)
2. Negative correlation: 1 < x + rand < 100 and y = -ax + rand (-0.5 < rand < 0.5)
3. No correlation: 1 < x + rand < 100 and y is completely random

The function correlation receives two arrays and returns the correlation between them.
The three correlation values are printed out or shown on the graphical user interface (GUI).
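A sketch of the data generation described above; the slope a = 2.0 and the exact noise ranges are illustrative assumptions consistent with the spec:

```java
import java.util.Random;

// Sketch: fills three (x, y) pairs of double arrays as described
// above. The slope a = 2.0 is an arbitrary illustrative choice.
public class DataGenerator {
    static final int N = 100;
    static final Random RNG = new Random();

    // Uniform noise in (-0.5, 0.5)
    static double noise() {
        return RNG.nextDouble() - 0.5;
    }

    // x values roughly in (1, 100)
    static double[] xValues() {
        double[] x = new double[N];
        for (int i = 0; i < N; i++) {
            x[i] = 1 + RNG.nextDouble() * 98 + noise();
        }
        return x;
    }

    // y = a*x + noise; a negative slope gives negative correlation
    static double[] yLinear(double[] x, double a) {
        double[] y = new double[N];
        for (int i = 0; i < N; i++) {
            y[i] = a * x[i] + noise();
        }
        return y;
    }

    // y completely random: no correlation with x
    static double[] yRandom() {
        double[] y = new double[N];
        for (int i = 0; i < N; i++) {
            y[i] = RNG.nextDouble() * 100;
        }
        return y;
    }

    public static void main(String[] args) {
        double a = 2.0;
        double[] x1 = xValues(), y1 = yLinear(x1, a);   // positive
        double[] x2 = xValues(), y2 = yLinear(x2, -a);  // negative
        double[] x3 = xValues(), y3 = yRandom();        // none
        System.out.println("generated 6 arrays of " + N + " elements");
    }
}
```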
Correlation
In statistics, dependence or association is any statistical relationship, whether causal or not,
between two random variables or bivariate data. Correlation is any of a broad class of statistical
relationships involving dependence, though in common usage it most often refers to how close
two variables are to having a linear relationship with each other. Familiar examples of
dependent phenomena include the correlation between the physical statures of parents and
their offspring, and the correlation between the demand for a limited supply product and its
price.
Correlations are useful because they can indicate a predictive relationship that can be exploited
in practice. For example, an electrical utility may produce less power on a mild day based on the
correlation between electricity demand and weather. In this example, there is a causal
relationship, because extreme weather causes people to use more electricity for heating or
cooling. However, in general, the presence of a correlation is not sufficient to infer the presence
of a causal relationship (i.e., correlation does not imply causation).
Pearson's product-moment coefficient
The most familiar measure of dependence between two quantities is the Pearson product-
moment correlation coefficient, or "Pearson's correlation coefficient", commonly called simply
"the correlation coefficient". It is obtained by dividing the covariance of the two variables by the
product of their standard deviations.
The population correlation coefficient ρX,Y between two random variables X and Y with
expected values μX and μY and standard deviations σX and σY is defined as

ρX,Y = corr(X, Y) = cov(X, Y) / (σX σY) = E[(X − μX)(Y − μY)] / (σX σY)

where E is the expected value operator, cov means covariance, and corr is a widely used
alternative notation for the correlation coefficient.
The corresponding sample correlation coefficient r is

r = Σ (xi − x̄)(yi − ȳ) / ((n − 1) sx sy)

where x̄ and ȳ are the sample means of X and Y, and sx and sy are the corrected sample standard deviations of X
and Y.
Common Examples
of Positive Correlations
The more time you spend running on a treadmill, the more calories you will burn.
Taller people have larger shoe sizes and shorter people have smaller shoe sizes.
The longer your hair grows, the more shampoo you will need.
When enrollment at college decreases, the number of teachers decreases.
As a student’s study time increases, so does his test average.
Example: Ice Cream Sales
The local ice cream shop keeps track of how much ice cream they sell versus the temperature on
that day, recorded over the last 12 days.
We can easily see that warmer weather and higher sales go
together. The relationship is good but not perfect.
In fact, the correlation is 0.9575.
Steps
Let us call the two sets of data "x" and "y" (in our case Temperature is x and Ice Cream Sales is
y):

Step 1: Find the mean of x, and the mean of y


Step 2: Subtract the mean of x from every x value (call them "a"), do the same for y (call them
"b")
Step 3: Calculate: ab, a^2 and b^2 for every value
Step 4: Sum up ab, sum up a^2 and sum up b^2
Step 5: Divide the sum of ab by the square root of [(sum of a^2) × (sum of b^2)]
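The five steps above translate directly into the correlation function the assignment asks for. A sketch (the class and method names are illustrative):

```java
// Sketch of the assignment's correlation function, following the
// five steps above (Pearson's product-moment coefficient).
public class Correlation {

    static double correlation(double[] x, double[] y) {
        int n = x.length;

        // Step 1: find the mean of x and the mean of y
        double meanX = 0, meanY = 0;
        for (int i = 0; i < n; i++) {
            meanX += x[i];
            meanY += y[i];
        }
        meanX /= n;
        meanY /= n;

        // Steps 2-4: a = x - meanX, b = y - meanY;
        // sum up ab, a^2 and b^2
        double sumAB = 0, sumA2 = 0, sumB2 = 0;
        for (int i = 0; i < n; i++) {
            double a = x[i] - meanX;
            double b = y[i] - meanY;
            sumAB += a * b;
            sumA2 += a * a;
            sumB2 += b * b;
        }

        // Step 5: divide the sum of ab by
        // sqrt((sum of a^2) * (sum of b^2))
        return sumAB / Math.sqrt(sumA2 * sumB2);
    }

    public static void main(String[] args) {
        // Illustrative data only: a perfect linear relationship,
        // so the correlation is exactly 1.
        double[] x = {1, 2, 3, 4};
        double[] y = {2, 4, 6, 8};
        System.out.println(correlation(x, y));
    }
}
```

Note that the (n − 1) factors in the sample formula cancel between numerator and denominator, which is why they do not appear in the steps.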
Other Methods
There are other ways to calculate a correlation coefficient, such as "Spearman's rank correlation
coefficient"
To avoid Greek-symbol phobia, it may be enough to say that ρ is the common symbol used to
represent correlation.
Covariance
In probability theory and statistics, covariance is a measure of the joint variability of two random
variables. If the greater values of one variable mainly correspond with the greater values of the
other variable, and the same holds for the lesser values, i.e., the variables tend to show similar
behavior, the covariance is positive.
The problem with covariance is that it is hard to compare. When you calculate the covariance of
a set of heights and weights, expressed in (respectively) meters and kilograms, you will get a
different covariance than when you do it in other units (which already causes a problem when
comparing people who do the same thing with and without the metric system). It is also hard to
tell whether, say, height and weight 'covary' more strongly than the lengths of your toes and
fingers, simply because the scale on which you calculate the covariance is different.
The solution to this is to 'normalize' the covariance: you divide the covariance by something that
represents the diversity and scale in both the covariates, and end up with a value that is assured
to be between -1 and 1: the correlation. Whatever unit your original variables were in, you will
always get the same result, and this will also ensure that you can, to a certain degree, compare
whether two variables 'correlate' more than two others, simply by comparing their correlation.
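This normalization can be demonstrated numerically: rescaling one variable (say, meters to centimeters) scales the covariance by the same factor but leaves the correlation unchanged. A sketch, with made-up illustrative height/weight values:

```java
// Demonstrates that covariance depends on units while correlation
// does not. The height/weight values are illustrative, not real data.
public class Normalization {

    static double mean(double[] v) {
        double sum = 0;
        for (double d : v) sum += d;
        return sum / v.length;
    }

    // Sample covariance with the (n - 1) denominator
    static double covariance(double[] x, double[] y) {
        double mx = mean(x), my = mean(y), sum = 0;
        for (int i = 0; i < x.length; i++) {
            sum += (x[i] - mx) * (y[i] - my);
        }
        return sum / (x.length - 1);
    }

    // Correlation = covariance normalized by both standard deviations
    static double correlation(double[] x, double[] y) {
        return covariance(x, y)
                / Math.sqrt(covariance(x, x) * covariance(y, y));
    }

    public static void main(String[] args) {
        double[] meters = {1.60, 1.72, 1.85, 1.68};
        double[] kilos = {58, 70, 84, 66};

        double[] centimeters = new double[meters.length];
        for (int i = 0; i < meters.length; i++) {
            centimeters[i] = meters[i] * 100;  // change of units
        }

        System.out.println(covariance(meters, kilos));       // small value
        System.out.println(covariance(centimeters, kilos));  // 100x larger
        System.out.println(correlation(meters, kilos));      // in [-1, 1]
        System.out.println(correlation(centimeters, kilos)); // same value
    }
}
```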
Plot the data
