[Diagram labels: Transmitting, Storing, Organising, Receiving, Retrieving, Processing, Displaying]
February     22    97
March        20    99
April        18    94
May          15    81
June         13    86
July         12    80
August       14    56
September    16    51
October      18    63
November     20    63
December     22    70
Total              930
Maximum      24    99
Minimum      12    51
Data analysis is used for decision-making.

Another way of analysing data is to find patterns or trends in the data. In the figure, some patterns in the data have been made much more obvious by using the height of the column graphs. For example, the coldest months in Parramatta are not the wettest months.
One of the main reasons for analysing data is to help make decisions. If the data analysis reveals certain trends or patterns, then that information can assist the decision-making process. In planning the Sydney Olympics, for example, a detailed data analysis of Sydney's past weather patterns revealed that September-October had the best weather conditions for holding a range of different sporting events. The design of the Olympic Stadium was influenced by an analysis of the wind patterns experienced in Homebush during those months.
Describe: provide characteristics and features (Note: describe does not mean 'explain')
Describe the role of the analysing process in information systems. The purpose of analysing is to transform collected data into information by giving the data meaning or purpose. The original data are not altered and the information produced can be used as part of a decision-making process.
Figure 5.3 Features of the CPU that affect the analysing process (labels shown: clock speed, floating point operations)
Wildcard matches use the symbols '?' and '*' to represent other characters.

In wildcard matching, not all the characters in the search key have to be described before the search begins. You can replace individual characters or whole groups of characters in the search key with special 'placeholders'. These placeholders are often called wildcard characters and are usually the characters '?' and '*'.

The '?' character is used to represent any single character in a particular position. For example, the search key 'h?t' can be used to search for 'hot', 'hit', 'hat' and any three characters where the first character is 'h' and the last is 't'. In this case '?' represents any single character at the second character position. The three characters can be a complete word like 'hat' or they can be part of a larger word like 'that'.

The '*' wildcard character is used to represent zero or more additional characters in the search key. It actually means 'match the entered text characters plus any following characters up to the first space, punctuation, or non-displayable character'. Non-displayable characters include the 'end-of-line' and 'end-of-document' markers. For example, figure 5.5 shows a search using the search key 'manage*'. This search would find five places in the data which match the search key. Included in the matches is the word 'manage' because the wildcard character also means 'no extra characters' as well as any number of extra characters.
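The two wildcard rules described above can be sketched in a few lines of Python. This is only an illustration, not how any particular search tool is built: the function names `wildcard_to_regex` and `wildcard_search` are hypothetical, and the sketch assumes that '?' and '*' stand for letters, digits or underscores only, so that a match stops at the first space or punctuation mark.

```python
import re


def wildcard_to_regex(search_key):
    """Translate a search key containing '?' and '*' into a regular expression.

    Assumption: a wildcard stands for word characters only (letters, digits,
    underscore), so '*' stops at the first space or punctuation mark.
    """
    parts = []
    for ch in search_key:
        if ch == "?":
            parts.append(r"\w")      # exactly one character in this position
        elif ch == "*":
            parts.append(r"\w*")     # zero or more following characters
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.IGNORECASE)


def wildcard_search(search_key, text):
    """Return every piece of text that matches the search key."""
    return [m.group(0) for m in wildcard_to_regex(search_key).finditer(text)]


# Sample text invented for this sketch.
print(wildcard_search("h?t", "that hot hat"))
print(wildcard_search("manage*", "The manager will manage the management team."))
```

Because '*' may stand for no characters at all, the second search reports the plain word 'manage' alongside 'manager' and 'management'.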
Office management:
The current manager, Mrs Healey, has recently returned from a management conference where she presented a paper on how to manage an office staffed by part-time workers. Her experiences in this new area of management skills were well
Figure 5.5 Using a wildcard match text search
Figure 5.6 Combining searches using the AND and the OR logical operators
Searching image, audio and video data
Image, audio and video data searches are usually based on searching their filenames or file descriptions.

Because there is a huge variety of different data organising methods for image, audio and video data, there is no single effective way of searching any of these data types. The most commonly used methods involve searching through their filenames or their text descriptions. The people who build multimedia file libraries would also have to create the file descriptions manually or use intelligent automated search software for the task.
If you have ever tried to locate a particular image on the Web then you will probably know what a frustrating experience it can be. Most search engines are designed to work with the text found in web pages, not the graphics. The main disadvantage of searching for images is that you have to rely entirely on how a person or a software application has described them. People-supplied descriptions are the most accurate but are limited by the range of keywords and categories they use. Nearly all web search engines use automated software agents to locate and index new web pages. When they find an image, they will attempt to classify it from information on the page, the name of the image file, or where it is stored on the web server. For example, you may be interested in locating images of cirrus clouds - the thin wispy clouds found at high altitudes. When searching for the word 'cirrus' using five image search engines, the results were:
• AltaVista at www.altavista.com (select the Images link first) - returned 719 hits, each with an image thumbnail (a small, low resolution copy of the image). Only 83 were of cirrus clouds but many of these were identical copies of the same image found on different web sites. Products (such as cars and sailboats), pets and even people named 'cirrus' were also present in the results. All the selected images contained the word 'cirrus' in their filename (e.g. 'cirrus01.gif'). AltaVista performs a search that only looks at the filenames of the images it finds.
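A filename-only search of the kind described above can be sketched as follows. The function name `search_filenames` and the sample file library are invented for illustration; the point is simply that matching on filenames returns unrelated hits and misses relevant images whose names do not contain the keyword.

```python
def search_filenames(keyword, filenames):
    """Return the filenames that contain the keyword, ignoring case."""
    keyword = keyword.lower()
    return [name for name in filenames if keyword in name.lower()]


# Hypothetical image library: only the first file is meant to show cirrus clouds
# by name, and 'high_clouds.jpg' might also show them but is not labelled so.
library = [
    "cirrus01.gif",
    "Cirrus_sailboat.jpg",
    "high_clouds.jpg",
    "my_dog_cirrus.png",
]

print(search_filenames("cirrus", library))
```

The search reports the sailboat and the dog but not 'high_clouds.jpg', which is why results based on filenames alone need to be checked by a person.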
Search results will vary depending on when the search is conducted. The databases of most engines are constantly updated and improved, so past performances are not always a good guide. Use the search engines listed above, and other specialised engines that you may know of, to search for particular images, video or audio files. Analyse (see the check box on page 114 for an example) the performances of your selected engines for the different types of data in your searches.
Sorting data
Sorting puts data into order. Sorting text data can cause unexpected results.

A simple sorting operation can be a useful first step in transforming collected data into information. Unlike a manual information system, the data stored in a computer-based information system do not have to be set in a permanent order. Think of the problems you would have using a telephone book if the names were not in alphabetical order. However, the speed of modern computer systems usually makes sorting a large quantity of data a relatively quick and simple task. The result of a sorting operation will depend on the type of data being sorted. Text data, for example, will be sorted differently to numerical data. Table 5.2 gives a summary of the effect sorting has on text and numerical data types.
As shown in table 5.2, the sort operation that usually causes the most problems and confusion is sorting text data. Text characters are usually sorted character by character according to the ASCII value of each character. The first characters are compared, and only if they are the same will the second characters be compared, and so on until a difference is found. All upper case letters, which have lower ASCII values, will be placed ahead of the lower case letters in an ascending (A to Z) sort. So 'XYZ' will come before 'abc' because 'X' has a lower ASCII value than 'a'. When alphabetical symbols are being
sorted, the differences between upper and lower case letters should be ignored to
Data type: Text
  A to Z: using the ASCII value of each individual character starting with the first character. This will result in some unexpected sort orders. For example, 'Z' will be placed before 'a'. Numbers are treated as text characters so '100' will be placed before '2' and the date '14/10/2002' will be placed before '2/5/1999'.
  Z to A: using the ASCII value of each individual character starting with the first character.
Data type: Numerical
  0 to 9: using the numerical value of the entire data item, not its individual characters. 2 will be placed before 100, 0.9 will be placed before 1.0.
  9 to 0: using the numerical value of the entire data item.
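The difference between a text sort and a numerical sort can be seen in a short Python sketch. Python compares strings by character code, which for plain English letters and digits gives the same order as the ASCII values described above. The sample values are invented for illustration.

```python
words = ["XYZ", "abc", "Zebra", "apple"]
numbers_as_text = ["100", "2", "0.9", "1.0"]
dates_as_text = ["2/5/1999", "14/10/2002"]

# Text sort: compared character by character using each character's code.
print(sorted(words))                    # upper case letters come first
print(sorted(numbers_as_text))          # '100' is placed before '2'
print(sorted(dates_as_text))            # '14/10/2002' is placed before '2/5/1999'

# Ignoring the difference between upper and lower case letters.
print(sorted(words, key=str.lower))

# Numerical sort: the value of the entire data item is used.
print(sorted(numbers_as_text, key=float))
```

The `key` argument tells `sorted` what to compare: lower-cased text in one case, the numerical value in the other. The data items themselves are not changed.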
The sorting of image, audio and video data is usually based on sorting their filenames, their text descriptions, or even their file sizes (for example, the largest images to the smallest).
Analyse: identify components and the relationship between them; draw out and relate implications
Analyse the results of this sorting operation. The results are significantly different for the three date formats. Of the three, the Swedish format is probably the most useful as all dates in the same year are grouped together and sorted into their correct month and day order. However, if only two digits are used for the year then dates from different centuries will not be correctly sorted. The other two lists did not sort any of the dates on their year unless two dates had the same month and day. The sorted lists in both Australian and US formats had dates from widely different years placed next to each other. The US format could be useful for locating dates from particular months, regardless of the year.
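The effect described in this analysis can be reproduced with a small Python sketch that writes the same dates in three formats and sorts each list as text. The exact layouts are assumptions made for this sketch (year-month-day for the Swedish format, day/month/year for the Australian format, month/day/year for the US format, all with leading zeros), and the sample dates are invented.

```python
from datetime import date

# Hypothetical sample dates.
sample_dates = [
    date(2002, 10, 14),
    date(1999, 5, 2),
    date(2002, 1, 30),
    date(1987, 10, 14),
]

# Assumed layouts for the three formats.
FORMATS = {
    "Swedish": "%Y-%m-%d",
    "Australian": "%d/%m/%Y",
    "US": "%m/%d/%Y",
}


def sort_as_text(dates, pattern):
    """Write each date as text in the given layout, then sort the text."""
    return sorted(d.strftime(pattern) for d in dates)


for name, pattern in FORMATS.items():
    print(name, sort_as_text(sample_dates, pattern))
```

Only the year-first list comes out in true date order. The other two group the dates by day or by month, so dates from different years end up next to each other.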
A model is a description of a system, process or object.

A computer model uses the analysing process to describe or represent another system, real or imaginary. The data collected about the system are analysed in a way that lets the computer build its model of the system. The model could be an image, a set of equations or even a sound or animation. For it to be useful, a model must be as realistic as possible. Computer modelling helps us to understand systems or processes without always having to build them first.

Computer models are created because someone, such as a scientist, engineer, economist or accountant, needs to study or understand a system. Data, in the form of measurements, observations and rules (or equations) are collected from the system and built into the model.
A simulation uses a model to predict the behaviour of a system or process.

A computer simulation is used to test the behaviour of a model by analysing how it reacts to changes in its data and rules. A simulation is able to predict how the system will react when it is placed under different conditions. It can be used as an experiment because you can alter the conditions and see what happens. This is something you cannot always do in real life. Computer simulations include weather prediction programs, transport simulators and applications in many scientific and business areas. Most computer games are simulations.
Simulations can be run on a spreadsheet, but specialised software is often used to provide faster calculations and greater realism.

Simulators are used in a wide range of research and training areas. Simulations of the economy allow you to alter economic conditions such as interest rates or inflation to observe the results. A business can use a spreadsheet application to simulate the effect of an increase in raw material costs or interest rates (see 'what-if' analysis on page 116). Engineers can use simulation software to predict the flight characteristics of a new aircraft design or the road performance of a new car design, saving both money and time in development costs. Every commercial airline pilot and ship's captain will spend time training and being assessed on simulators. Even military personnel will spend some time training in battle simulators instead of with real troops and real ammunition.

While standard spreadsheet applications can be used to run many simulations, the realism demanded for training simulations means that specialised software and hardware are often required. Specialised simulation programs are also designed to perform their data analysis much faster than standard 'off the shelf' software.
Figure 5.7 shows Auran® Trainz®, a model railroad simulator created by an Australian computer game developer. The program allows you to create a railway layout, complete with bridges, tunnels, points (track switches), scenery and working signals. It uses data and rules that describe:
• track conditions such as gradients and curves, the operation of points
• locomotive characteristics such as weight, pulling power, acceleration, braking, maximum speed
• rolling stock (wagons and carriages) characteristics such as weight, braking and maximum speeds
• the effect of moving trains on the operations of signals and points
• the physics of moving a train, collisions and the movements of derailed locomotives and rolling stock
• weather conditions and scenery.
Figure 5.7 Auran Trainz - a model railroad simulator, cheaper and quicker than building the real thing
The data and rules are analysed to predict the speed and performance of the locomotive and its attached rolling stock along the constructed tracks. This is then translated into a computer-generated animation. A user of this program is able to run simulations using a variety of locomotives, rolling stock, track routes and weather conditions. They can also collect data of their own to create new locomotives, rolling stock and scenery.
'What-if' analysis
'What-if' analysis allows a user to make temporary alterations to data to observe the effects on a model.

One of the advantages of a spreadsheet application is its ability to quickly recalculate a sheet full of equations whenever a single data item is altered. This feature is used for 'what-if' analysis in a wide variety of application areas. Figure 5.8 shows a simple example using a loan repayment calculation. By altering the interest rate in the spreadsheet, a user is able to see the effect of this on the monthly loan repayments. Because the spreadsheet contains all the collected data and the equations that analyse the data, the user can simply change any data value or any equation to observe the result. 'What-if' analysis asks the question 'what would happen if this is changed?' - it is a simulation using a model that can have its data and rules (equations) altered.
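The same kind of 'what-if' question can be asked outside a spreadsheet. The sketch below is an illustration only: it uses the standard loan repayment formula, which may not be the equation used in the spreadsheet example, and the interest rates and loan term are hypothetical values chosen for the sketch. The function names `monthly_repayment` and `what_if` are also invented.

```python
def monthly_repayment(principal, annual_rate_percent, years):
    """Monthly repayment for a loan using the standard repayment formula."""
    payments = years * 12
    monthly_rate = annual_rate_percent / 100 / 12
    if monthly_rate == 0:
        return principal / payments          # no interest: equal instalments
    return principal * monthly_rate / (1 - (1 + monthly_rate) ** -payments)


def what_if(principal, years, rates):
    """Recalculate the repayment for each interest rate the user wants to try."""
    return {rate: round(monthly_repayment(principal, rate, years), 2)
            for rate in rates}


# What would happen to the repayments if the interest rate changed?
# The rates and the three-year term are assumptions for illustration.
for rate, repayment in what_if(10000, 3, [8, 10, 12]).items():
    print(f"{rate}% -> ${repayment:.2f} per month")
```

Changing one data value (the rate) and recalculating is all that 'what-if' analysis involves; the rule that links the data stays the same unless the user chooses to alter it as well.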
[Figure 5.8: spreadsheet extract]
Loan amount                 $10,000.00   $8,000.00   $5,000.00   $4,000.00
Repayments (monthly)           $324.44     $259.56     $162.22     $129.78
Repayments (fortnightly)       $149.74     $119.79      $74.87      $59.90
Charts and graphs
The advantages of charts and graphs over tables are: impact, speed and simplicity.

Charts and graphs are popular methods for analysing data. They can show relationships, trends and comparisons at a glance and they are a much faster way to absorb information than a table full of figures. The advantages of charts and graphs as a method of analysing data are:
• impact - the use of colours, symbols and fill patterns can draw attention to important details in the data
• speed - the trends shown in a well-drawn chart or graph can be very obvious
• simplicity - almost anyone can understand a message displayed in a chart or graph, where the same message would be lost in rows and columns of boring figures.
The choice of chart will depend on the data being analysed.

The selection of the type of chart or graph to use for an analysis is important: the wrong type can easily lead to confusion. Figure 5.9 shows the main chart types found in Microsoft Excel. Each type has several different variations.

[Figure 5.9: Microsoft Excel chart type dialog (Standard Types and Custom Types tabs), listing chart types including Line, Pie, XY (Scatter), Area, Doughnut, Radar, Surface, Bubble and Stock]

In table 5.4, the data are described as being 'continuous' or 'discrete'. These terms refer to how the data were collected. 'Continuous' data refers to data which were collected or sampled repeatedly. Examples include sound data col...
Table 5.4 also uses the term 'data series'. This term means the number of complete data sets, such as columns or rows of data, used to construct the chart or graph. A single series is a single column or row of data, such as the monthly balance of payments figures for Australia throughout one year. Multiple series are several columns and rows of data, such as the monthly balance of payments figures for Australia during each year from 1990 to 2000. Table 5.5 represents a multiple series showing the different types of vehicles crossing an intersection during every hour throughout the day.
Cars          275  322  157  102  112  132  127  105  124  251  276  14...
Vans           13   21   18   34   32   42   21   28   42   41   31  19
Trucks          7   15   16   21   18   21   19   16   13   21    7   4
Taxis          19   21   18   12   16   15   18   16    6   16   21   2
Buses          37   36   31   24   23   23   25   23   30   29   33   1...
Motorcycles    12   18   11    7   10    6    8   12   18   12   19
From figure 5.10 it is clear that some chart types have problems displaying the data shown in table 5.5. The 3D column chart hides almost as much data as it shows. The 2D column chart looks cramped and crowded but it does display all the data. The line chart separates each data series but lacks the 'solid colour' impact of the others. It is also possible to lose lines where they are plotted very close to each other.
Figure 5.10 Different types of charts used to analyse multiple series data
Comparing files
Analysing the differences between data files works best on fixed length files.

One role of analysing is to check the results of a processing task by comparing the processed data with the original data. Often the easiest way to do this is to compare the original data file with the processed data file to detect any changes that have been made. This type of analysis works best on fixed length data files where the information system can directly compare individual data values. A word-processed document, for example, will change its length and the sequence of text characters as text is added, altered or deleted. An analysis that compares the original and the altered documents can easily find the position in the processed document where the first alteration was made. It would then have problems matching up the text characters in the two files after that point to detect all the other changes.
The simplest file comparisons that can be made involve comparing file lengths (in characters or bytes), file creation and modification dates and other file properties as shown in figure 5.11.
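A simple comparison of this kind can be sketched in Python. The function names `compare_files` and `write_temp` and the sample file contents are invented for illustration. The sketch compares the file lengths in bytes and then reports the position of the first difference, which is the part of the comparison that works reliably even when the files are not fixed length.

```python
import os
import tempfile


def compare_files(path_a, path_b):
    """Compare two files by length, then find where they first differ."""
    size_a = os.path.getsize(path_a)
    size_b = os.path.getsize(path_b)
    with open(path_a, "rb") as file_a, open(path_b, "rb") as file_b:
        data_a = file_a.read()
        data_b = file_b.read()

    first_difference = None
    for position, (byte_a, byte_b) in enumerate(zip(data_a, data_b)):
        if byte_a != byte_b:
            first_difference = position
            break
    # Same content so far, but one file is longer than the other.
    if first_difference is None and size_a != size_b:
        first_difference = min(size_a, size_b)

    return {"size_a": size_a, "size_b": size_b,
            "first_difference": first_difference}


def write_temp(text):
    """Helper for this sketch: write text to a temporary file, return its path."""
    handle, path = tempfile.mkstemp()
    with os.fdopen(handle, "w", encoding="ascii") as f:
        f.write(text)
    return path


# Hypothetical 'original' and 'processed' data files of the same length.
original = write_temp("1999,2000,2001")
processed = write_temp("1999,2050,2001")
print(compare_files(original, processed))
os.remove(original)
os.remove(processed)
```

A result of `None` for `first_difference` means the two files are identical. Modification dates could be compared in the same way using `os.path.getmtime`.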
[Figure 5.11: file Properties dialog for 'Rollout Laptops.xls', showing Modified: 31/8/98 2:19 AM and Size: 15KB]
Every year the entertainment and recording industries, amongst others, lose billions of dollars in sales worldwide because of pirated and illegally copied images, music and electronic book files. The trade in pirated data has proven very difficult to stop. Attempts to add copy protection features to data storage devices have always failed because the pirates have eventually been able to overcome them or the consumers have refused to use them.
Recently, the focus has shifted to identifying pirated data files stored on the Internet. Once an illegally copied file has been identified, the copyright owners are able to take legal action against the person or persons who placed it there or the service provider for storing the file on their system. The court case that effectively closed down Napster - for storing only a list of pirated MP3 music files on their system - demonstrated how successful this approach can be. A number of systems have been developed to identify pirated data files.
Figure 5.12 Adding a digital watermark to an image
Question
If, as some claim, the majority of illegal copying is by individuals downloading a music file for their own use, then will such measures have an effect? Discuss.
THINK
4 Can wildcard searches be used on numerical data? Explain why or why not.
5 How could charts and graphs be misused as data analysing tools?
6 Why would data collection be a vital process in the construction of a model or simulation?
INVESTIGATE
9 How do automated intelligent search agents work?
... than the individual databases themselves.

... advantage comes with a price - an increased risk to privacy. Individually, separate databases may not contain much material that could threaten your privacy. For example, your health fund database will contain details of all your medical expenses during your membership of the fund. The database at your bank will contain details of your financial transactions, loans, savings and investments. Your health fund records will not contain any data about your bank accounts, and your bank records will contain nothing about your medical history. There will probably be no reason for the two databases to ever be linked together. The Tax Office database will contain details about your employment and income history. Because membership of a health fund can attract a tax rebate, it would also contain details about your health fund membership, but not your medical claims. Your bank is required to provide the Tax Office with your account numbers, but not your financial history. These three databases are quite separate and cannot be linked except to confirm that you have bank accounts that may be earning taxable interest and are a member of a health fund that may entitle you to a tax rebate.
Consider the implications of an unethical organisation gaining access to your tax
records. The data stored there could be used to analyse your employment history
and income. From the tax database, they may also be able to access your records at
your health fund and your bank. From the analysis of this linked data, they may be
able to construct a profile of your financial position (for example, level of debt),
your state of health and your employment history. This profile could be used to
assess your fitness for a loan, an insurance policy or employment. This illegal
analysis could easily be conducted without your knowledge or consent. Because of
the way most information systems are linked together, it is even possible that it
could be conducted without the knowledge or consent of your bank, health fund
and the Tax Office. Your privacy could easily be violated without anyone, least of all
you, knowing about it.
... information
(c) requires fast computers and a lot of data storage space
(d) can be used to give data meaning and a purpose
3 Searching and sorting are two examples of:
(a) processing data into information
(b) 'what-if' analysis
(c) modelling
(d) the analysing process
4 The clock speed of a computer-based information system:
(a) determines whether the system will work to a set schedule
(b) can measure its analysing speed
(c) sets the date and time stamp on all saved data files
(d) can be increased by adding more parallel CPUs
5 Secondary storage:
(a) stores data and software that are not being used
(b) allows data to be removed for analysis
(c) is used as an emergency backup for the data analysing process
(d) is used to control the data analysis process
... simulations:
(a) have completely replaced manual modelling and simulation systems
(b) are only as good as the data used to construct them
(c) require specialised hardware and software applications
(d) need highly trained participants to supervise the information system
9 An 'exact match' search:
(a) will always find the data you are looking for
(b) is used to search image, audio and video data
(c) allows the use of wildcard characters in the search key
(d) can be used to search text and numerical data
10 One method of increasing the quantity of data that can be analysed by a computer-based information system is to:
(a) compress the data to reduce its size
(b) add extra CPUs to the system
(c) increase the speed at which the data are collected and organised
(d) reduce the data collection errors
19 A sort that places data items into a sequence with the last item placed first is commonly known as a(n) _____ order sort.
20 'Raw facts without any clear meaning or purpose' is a definition of _____
Match the terms
21 COLUMN A - TERMS        COLUMN B - MEANINGS
 1 simulation              A text characters used to locate data
 2 primary storage         B uses a special symbol to represent other characters
 3 logical operator        C graphical display of numerical data
 4 data                    D a description of a system
 5 information             E a single set of data values, usually a column or a row
 6 search key              F produced by an analysis of data
 7 data series             G used to combine two or more searches
 8 chart                   H using a model to observe the effects of altering the data
 9 wildcard                I read only memory
10 model                   J raw facts with no clear meaning or purpose