Statistics Data Collection

Ammaar Saiyed

1. THE DEEPER THE EARTHQUAKE (THE DEPTH) THE BIGGER THE MAGNITUDE ON THE RICHTER
SCALE
For my first hypothesis, I was originally going to test whether the bigger the seismic moment of an
earthquake, the bigger the magnitude, but when I attempted to collect the data I found it very
difficult. I could find no websites which recorded the seismic moment of earthquakes and, after
searching for hours, I decided that it would be best to choose another hypothesis. Therefore, I
changed my hypothesis to: the deeper the earthquake, the bigger the magnitude on the Richter scale.

The source that I used was secondary: earthquakes.usgs.gov. I chose this website because I thought
that it was very reliable. Collecting the data was very time consuming, as there was a lot of data
with a lot of extra fields. It would have been easier to use a less reliable but easier to read
website, such as Wikipedia, but I thought that it would be better to use a reliable source.

I chose to do systematic sampling, as it is less time consuming than other methods but more random
than convenience sampling. Firstly, I put the years of earthquakes that I could choose from in a
bag and picked one out, which happened to be 2000. From the list for that year, I chose the first
earthquake, missed one, chose the next one, and so on. I did this until I had 30 pieces of data: 30
earthquakes with magnitude and depth recorded. I chose 30 earthquakes as this gives a wide variety
of data that is not all similar. I chose not to use more than this because collecting 30
earthquakes took a lot of time in itself, and I thought that collecting more would add little
benefit.

I am going to deal with anomalies by plotting a scatter graph of the bivariate data and seeing
whether any earthquakes are visibly far away from the main distribution of the data. I think that
my data is very reliable, as this website always appears in the Google search results when
searching for earthquake statistics. I don't think that the data is biased in any way. The only
possible problem is that a lot of the earthquakes have depths of 33 or 10, and the depths all have
different letters after them: an N, a D or a G. This may mean that some depths were set to a
standard value rather than measured.

To represent my data, I may do a bar chart as it is simple and easy to view, but I will most likely
plot a scatter graph. With a scatter graph, I can easily and clearly see whether there is any
correlation between the two variables, and I can also plot a line of best fit. Another method that
I am going to use to check for correlation between the two variables is Spearman's rank correlation
coefficient. I will carry out the Spearman's rank calculation after plotting the scatter graph, and
I will leave out any anomalies in the data when doing so.
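The sampling procedure described above can be sketched in code. This is only a minimal sketch under stated assumptions: `candidate_years` and `catalogue` are hypothetical stand-ins (the text lists neither the years in the bag nor the full catalogue), and `systematic_sample` is a name chosen for illustration.

```python
import random

def systematic_sample(records, step=2, size=30):
    """Take every `step`-th record, starting with the first, up to `size` records."""
    return records[::step][:size]

# Hypothetical stand-ins: the real candidate years and catalogue rows are not given in the text.
candidate_years = list(range(1990, 2011))
year = random.choice(candidate_years)  # plays the role of drawing a year out of the bag
catalogue = [f"{year} earthquake {i}" for i in range(1, 101)]

# Choose the first record, miss one, choose the next, and stop at 30 records.
sample = systematic_sample(catalogue)
print(year, len(sample), sample[:3])
```

With `step=2` the sample takes records 1, 3, 5 and so on, so a catalogue needs at least 59 records to give 30 pieces of data.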

Place                        Year  Depth (km)  Magnitude (Mb)
Tonga Islands                2000  183         6.5
Northeast China              2000  10          4.9
Yunnan                       2000  33          4.9
Northern and Central Iran    2000  33          5.1
S Africa                     2000  5           4.5
Mariana Islands              2000  132         6
Volcano Islands              2000  127         6.8
Sulawesi                     2000  26          6.7
Turkey                       2000  5           4.5
Jujuy Province               2000  225         6.2
Off coast of Oregon          2000  10          5.8
Southern Sumatra             2000  33          6.8
Turkey                       2000  10          5.5
Sea of Japan                 2000  10          5.7
Southern Sumatra             2000  33          6.1
West Indian-Antarctic Ridge  2000  10          5.9
South Indian Ocean           2000  10          6.8
South coast of Honshu        2000  10          6
Turkey                       2000  9           4.2
Jawa                         2000  33          5.2
Philippines                  2000  33          6.1
Hindu Kush                   2000  141         6
Honshu                       2000  10          6.5
Sakhalin Islands             2000  10          6.3
Banda Sea                    2000  649         6.5
Kermadec                     2000  358         6
Yunnan                       2000  33          4.9
Banda Sea                    2000  16          6.5
N California                 2000  10          4.9
Coast of Ecuador             2000  33          5.4
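The Spearman's rank calculation planned above could be carried out on the 30 depth and magnitude values in the table as follows. This is a sketch, not necessarily the method used in the final write-up. Because many depths are tied (10 and 33 appear repeatedly), it gives tied values the average of their ranks and then correlates the ranks, rather than using the shortcut formula 1 - 6Σd²/(n(n² - 1)), which is only exact when there are no ties. The function names are chosen for illustration.

```python
def average_ranks(values):
    """Rank values from 1 (smallest), giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based, ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman(xs, ys):
    """Spearman's rank correlation: the Pearson correlation of the two rank lists."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5

# Depth and magnitude columns, in the same row order as the table.
depth = [183, 10, 33, 33, 5, 132, 127, 26, 5, 225, 10, 33, 10, 10, 33,
         10, 10, 10, 9, 33, 33, 141, 10, 10, 649, 358, 33, 16, 10, 33]
magnitude = [6.5, 4.9, 4.9, 5.1, 4.5, 6, 6.8, 6.7, 4.5, 6.2, 5.8, 6.8, 5.5, 5.7, 6.1,
             5.9, 6.8, 6, 4.2, 5.2, 6.1, 6, 6.5, 6.3, 6.5, 6, 4.9, 6.5, 4.9, 5.4]

rho = spearman(depth, magnitude)
print(round(rho, 3))
```

A value near +1 would support the hypothesis, a value near 0 would suggest no correlation, and a value near -1 would suggest the opposite relationship. To leave out anomalies, the same function can be run on the lists with those rows removed.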

2. HYPOTHESIS: EARTHQUAKES FROM 1990-1999 HAVE MORE CASUALTIES THAN EARTHQUAKES
WITH THE SAME MAGNITUDE IN 2000-2010.
For my second hypothesis, I found it very difficult to find a source which would give me a list of
earthquakes from 1990 to 2010 and which also showed the magnitudes and casualties of the
earthquakes. I chose to use secondary data. As a last resort I had to use Wikipedia, although at
first I was very hesitant. I think that some of my data may still be unreliable, as Wikipedia allows
users to edit the pages; however, edits are visible to other editors, who can check and correct
them. Despite this, I still cannot be 100% certain of the reliability of the data. The reliability
may be weak, but all the data had references. I could have researched the earthquakes in a lot more
depth by finding earthquakes, then researching the magnitude and then the casualties of each, which
could have given me more credible and reliable data. Nevertheless, I chose not to do this as I think
it would have been too time consuming and too difficult to get the data.

The method I was originally going to use to choose my data was systematic sampling, but many of the
earthquakes had no casualties, so that data would have been unusable for this hypothesis. I
therefore needed a sampling method which would allow me to choose the pieces of data myself, so I
used convenience sampling. I sorted the data for 1990-1999 from highest casualties to lowest
casualties and chose the first 15 earthquakes. Using this method allowed me to choose manually
which earthquakes to use, and it also meant that all my pieces of data were valid to use. After
this, I went to a list of earthquakes from 2000-2010. I first found as many earthquakes as I could
which had the same magnitudes as the earthquakes from 1990-1999. Then, for each magnitude, I chose
the earthquake which had the most casualties. After doing this, I found that 3 earthquakes from
2000-2010 had more casualties than the earthquakes of the same magnitude from 1990-1999, and
therefore did not match the hypothesis. I made sure I chose the earthquakes with the highest
casualties from the second time range, as well as making sure they had the same magnitude, to
ensure that my data isn't biased; otherwise I could have chosen earthquakes with the same magnitude
but purposely picked the ones with fewer casualties to make my hypothesis appear correct.

The reason I chose 30 pieces of data (15 from each time range) is so that the data is varied rather
than all the same, which means I can get good results from it. If I had chosen fewer, for example 5
from each time period, my data would have been unusable. To represent my data, I will plot 2 sets
of box plots, both comparing the earthquakes from the 2 time periods. I will also calculate the
standard deviation of the data to see how close the data is to the mean. I will work out the mean,
median and range, and also the 1st and 3rd quartiles as well as the interquartile range. The data I
have has 3 anomalies in each set, and in both sets these 3 are in the first five pieces of data. If
I had used only 5 from each, 3/5 or 60% of both my sets of data would have been anomalies, and in
just 5 pieces of data the ranges would have been 14,944 and 315,990.

From the 30 pieces of data, I still thought that the data may have been a bit too varied, so after
calculating the lower quartile, upper quartile and interquartile range of both sets of data, I
multiplied the interquartile range by 1.5, then added this number to the upper quartile and
subtracted it from the lower quartile to see if there were any anomalies. I found 3 anomalies in the
first set of data and 3 in the second set. I will not plot these pieces of data within my box plots,
as this would make the box plots too big and very difficult to plot, because my data has a very big
range. The first set ranges from 78 to 17,127 and the second set from 2 to 316,000. Once the
outliers are removed, the first set ranges from 78 to 2,400 and the second set from 2 to 944. I was
originally planning to plot the outliers, but after attempting a practice box plot to see what it
would be like, I saw that it would be too big and the actual box plots would be too difficult to
plot.

I will have four box plots: 2 sets which each contain 2 comparative box plots. The two sets will
have different medians, quartiles and maximum values. The first set will have the quartiles and
medians of the full data, with the outliers marked with an x; I will not use any of the outliers as
the minimum or maximum values. The second set of box plots will have the medians and quartiles of
only the data which aren't outliers, i.e. of 12 pieces of data for each time period, as both
periods have 3 outliers. The maximum values will be the highest values which aren't outliers, and I
will only plot the outliers with an x if they fit without making the box plot too big. The reason
that I am doing this is that I want one set of box plots which gives a true representation of the
full data, with its true median and quartiles, and another set which excludes the outliers
completely, so that its median and quartiles are determined without them.

Deaths, 2000-2010
Mean                22673.8
Median              32
Mode                4
Range               315998
Lower Quartile      7.5
Upper Quartile      555
IQR                 547.5
Standard Deviation  81308.9771
Outlier boundaries  1376.25 (upper), -813.75 (lower)
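The 1.5 x IQR rule described above can be checked with a short sketch. The death counts are copied from the table of earthquakes for the two periods. The "inclusive" quartile method is an assumption about how the quartiles were calculated; it is chosen because it reproduces the quartiles quoted in this report (for example 131 and 2291.5 for 1990-1999). The function names are illustrative.

```python
from statistics import quantiles

def iqr_fences(data):
    """Return (lower_fence, upper_fence) using the 1.5 x IQR rule."""
    q1, _median, q3 = quantiles(data, n=4, method="inclusive")
    spread = 1.5 * (q3 - q1)
    return q1 - spread, q3 + spread

def split_outliers(data):
    """Separate the values inside the fences from the outliers beyond them."""
    low, high = iqr_fences(data)
    kept = [x for x in data if low <= x <= high]
    outliers = [x for x in data if x < low or x > high]
    return kept, outliers

deaths_1990s = [6434, 17127, 9748, 2400, 2183, 1185, 894, 545, 230, 207, 146, 116, 108, 78, 90]
deaths_2000s = [2698, 166, 10, 20085, 316000, 4, 4, 20, 944, 13, 2, 5, 54, 32, 70]

for label, deaths in (("1990-1999", deaths_1990s), ("2000-2010", deaths_2000s)):
    kept, outliers = split_outliers(deaths)
    # Fences, the outliers beyond them, and the range of what is left.
    print(label, iqr_fences(deaths), sorted(outliers), min(kept), max(kept))
```

The minimum and maximum of the values that are kept are the whisker ends for the first set of box plots, and the 12 kept values per period are the data for the second set.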
1990-1999                                   2000-2010
Place             Year  Deaths  Magnitude   Place            Year  Deaths   Magnitude
Great Hanshin     1995  6,434   6.9         Yushu            2010  2,698    6.9
İzmit             1993  17,127  7.4         Hindu Kush       2002  166      7.4
Latur             1995  9,748   6.2         Aisen Fjord      2007  10       6.2
Chichi            1999  2,400   7.7         Gujarat          2001  20,085   7.7
Papua New Guinea  1999  2,183   7           Haiti            2010  316,000  7
Armenia           1999  1,185   6.2         Taiwan           2013  4        6.2
Düzce             1998  894     7.2         Baja             2010  4        7.2
Cairo             1992  545     5.8         N. Italy         2012  20       5.8
Hokkaido          1993  230     7.7         El Salvador      2001  944      7.7
Liwa              1994  207     6.9         Iwate-Miyagi     2008  13       6.9
Ceyhan            1998  146     6.2         Tatar Strait     2007  2        6.2
Nicaragua         1992  116     7.7         Papua            2009  5        7.7
Biak              1996  108     8.1         Solomon Islands  2007  54       8.1
Mindoro           1994  78      7.1         Papua            2004  32       7.1
Dinar             1995  90      6.1         Borujerd         2006  70       6.1

Deaths, 1990-1999
Mean                2766.06667
Median              545
Mode                None (no repeated value)
Range               17049
Lower Quartile      131
Upper Quartile      2291.5
IQR                 2160.5
Standard Deviation  4841.55716
Outlier boundaries  5532.25 (upper), -3109.75 (lower)
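The summary statistics for the two periods can be reproduced with Python's standard library, as a check on the figures quoted in this report. This is a sketch: it assumes the standard deviation is the sample standard deviation (dividing by n - 1), which is the version that matches the quoted values, and `summarise` is an illustrative name.

```python
from collections import Counter
from statistics import mean, median, stdev

def summarise(data):
    """Mean, median, mode(s), range and sample standard deviation of a list of counts."""
    counts = Counter(data)
    top = max(counts.values())
    # Only report a mode if at least one value actually repeats.
    modes = sorted(v for v, c in counts.items() if c == top) if top > 1 else []
    return {
        "mean": mean(data),
        "median": median(data),
        "modes": modes,
        "range": max(data) - min(data),
        "sd": stdev(data),  # sample standard deviation (n - 1 in the denominator)
    }

deaths_1990s = [6434, 17127, 9748, 2400, 2183, 1185, 894, 545, 230, 207, 146, 116, 108, 78, 90]
deaths_2000s = [2698, 166, 10, 20085, 316000, 4, 4, 20, 944, 13, 2, 5, 54, 32, 70]

for label, deaths in (("1990-1999", deaths_1990s), ("2000-2010", deaths_2000s)):
    print(label, summarise(deaths))
```

In both periods the mean is far larger than the median, because a few very large death tolls pull the mean upwards. This is one reason to compare the medians and quartiles as well as the means.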
3. EARTHQUAKES IN JAPAN HAVE HIGHER MAGNITUDES THAN EARTHQUAKES IN AUSTRALIA
I also changed my hypothesis for this one. I was originally going to investigate whether the bigger
the two tectonic plates colliding, the bigger the magnitude, but this was also very difficult to
find data for and was very time consuming. The reason this happened with 2 of my hypotheses is that
I tried to think of original ideas, but when I tried to research them, the data was very difficult
to get hold of.

For my third hypothesis I used secondary data, once again from Wikipedia, as I couldn't find any
other websites that stored enough information on earthquakes in Australia; most websites listed
fewer than 15. I chose to do convenience sampling, as I thought it would be counterproductive to
use any other method. The reason behind this is that the data on Wikipedia was already sorted by a
field which has no effect on what I was measuring: it was sorted by date, and I was measuring
magnitude, so there would be no benefit in doing systematic sampling. Using this method, I chose 15
pieces of data from Australia and 15 from Japan, giving me a total of 30. I think that 30 is a good
amount of data to have, as it widens the range of data so that it isn't all clustered or similar.
There was a lot more data than this, but I think that I only need about 30 pieces to have fairly
accurate data for my hypothesis.

I don't think that I will have any anomalies in my data, as the ranges of the two sets of data are
both just 2.6, so they are unlikely to contain anomalies. If I do have anomalies, however, I will
exclude them from the calculations and mark them separately when presenting the data.

The methods which I will use to present and process the data are box plots to compare the two
countries, and the mean, median, mode and range. I will then work out the 1st and 3rd quartiles,
which will also allow me to work out whether there are any outliers in the data, by multiplying the
interquartile range by 1.5 and then subtracting this from the 1st quartile and adding it to the 3rd
quartile. Any values below or above these two numbers will be classed as outliers and therefore
excluded from my calculations. Another calculation that I will carry out is the standard deviation,
to measure how close my data is to the mean.

However, I think that the reliability of my data isn't very good, as the website isn't a very
trusted one and the references for the earthquakes are not all from the same source. What I could
do alternatively is research further and look for a more trusted site to obtain my data from. Any
anomalies found won't be part of my box plots and calculations, but will just be plotted as an x.

Australia
Earthquake (name/location)          Date        Magnitude
Newcastle, New South Wales          28/10/1842  5.3
Offshore Cape Schanck               17/9/1855   5.5
Newcastle, New South Wales          18/6/1868   5.3
Eastern Highlands, Victoria         29/9/1868   5
Gayndah, Queensland                 28/9/1883   5.9
Tasman Sea, Tasmania and Victoria   13/7/1884   6.4
Tasman                              12/5/1885   6.8
Cape Liptrap, Victoria              2/7/1885    5.7
Yass, New South Wales/ACT           15/11/1886  5.7
Tasman                              26/1/1892   6.9
Beachport-Robe, South Australia     10/5/1897   6.5
Warooka, South Australia            19/09/1902  6
Warrnambool, Victoria               14/07/1903  5.3
Alpine National Park, Victoria      10/04/1904  5
Indian Ocean, Western Australia     19/11/1906  7.6

Japan
Earthquake (name/location)          Date        Magnitude
Hakuhou Nankai earthquake           29/11/684   8.4
Minoh                               5/7/745     7.9
869 Jogan Sanriku earthquake        13/7/869    9
1293 Kamakura earthquake            27/5/1293   7.5
Shōhei earthquake                   3/8/1361    8.5
1498 Meiō Nankaidō earthquake       20/9/1498   8.6
Tensho or Ise Bay earthquake        18/1/1586   7.9
1605 Keichō Nankaidō earthquake     3/2/1605    7.9
1611 Keicho Sanriku earthquake      2/12/1611   8.1
1703 Genroku earthquake             31/12/1703  8
1707 Hōei earthquake                28/10/1707  8.6
1771 Great Yaeyama Tsunami          24/4/1771   7.4
1792 Unzen earthquake and tsunami   21/5/1792   6.4
1854 Ansei-Tōkai earthquake         23/12/1854  8.4
Ansei-Nankai earthquake             24/12/1854  8.4
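The processing planned for this hypothesis (quartiles, the 1.5 x IQR outlier check, mean and standard deviation) can be sketched on the two magnitude columns above. This is an illustrative sketch: the "inclusive" quartile method and the sample standard deviation are assumptions about how the values would be calculated, and `box_plot_summary` is a name chosen here.

```python
from statistics import mean, quantiles, stdev

# Magnitude columns, in the same row order as the tables.
australia = [5.3, 5.5, 5.3, 5, 5.9, 6.4, 6.8, 5.7, 5.7, 6.9, 6.5, 6, 5.3, 5, 7.6]
japan = [8.4, 7.9, 9, 7.5, 8.5, 8.6, 7.9, 7.9, 8.1, 8, 8.6, 7.4, 6.4, 8.4, 8.4]

def box_plot_summary(data):
    """Values needed for a box plot, with outliers found by the 1.5 x IQR rule."""
    q1, med, q3 = quantiles(data, n=4, method="inclusive")
    spread = 1.5 * (q3 - q1)
    low, high = q1 - spread, q3 + spread
    inside = [x for x in data if low <= x <= high]
    return {
        "min": min(inside),  # whisker ends exclude any outliers
        "q1": q1,
        "median": med,
        "q3": q3,
        "max": max(inside),
        "outliers": sorted(x for x in data if x < low or x > high),
        "mean": mean(data),
        "sd": stdev(data),
    }

for country, magnitudes in (("Australia", australia), ("Japan", japan)):
    print(country, box_plot_summary(magnitudes))
```

Any value listed under "outliers" would be plotted as an x rather than included in the box and whiskers. A small range does not by itself rule out outliers, so it is worth running the check on both countries.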

