
Faculty of Science and Technology

ITECH1103
Big Data and Analytics

Tutorial Week 1

Review Questions

1. Fill in the following table as you discuss what data is collected by the different organisation
types, and how this data is used regarding individuals and groups.

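Organization type: Government Departments
  Data collected for individual: Name, age, address, nationality, character certificate, contact number, gender, ID proof, medical history.
  What information can be gained from this data about individuals? Name, DOB, address, contact number, gender, ID proof, medical history.
  What information can be gained from this data about groups? Age group, nationality.

Organization type: Banking / Finance
  Data collected for individual: Name, age, country, address, purpose of visit to the bank.
  What information can be gained from this data about individuals? Name, ID proof, address, purpose.
  What information can be gained from this data about groups? Purpose, feedback.

Organization type: Retail
  Data collected for individual: Card details, mode of payment, interest in brands.
  What information can be gained from this data about individuals? Amount of items, card details.
  What information can be gained from this data about groups? Mode of payment, interested brands.

Organization type: Education Sector
  Data collected for individual: Name, age, address, nationality, character certificate, contact number, gender, ID proof, disability if any, educational background.
  What information can be gained from this data about individuals? Name, age, address, nationality, character certificate, contact number, gender, ID proof, disability if any, educational background.
  What information can be gained from this data about groups? Age group, gender, location.

Organization type: Search Engines
  Data collected for individual: Email address, location, feedback.
  What information can be gained from this data about individuals? Email, location, previous searches, interests.
  What information can be gained from this data about groups? Email, location, previous searches, interests.

Organization type: Social networking sites
  Data collected for individual: Name, age, address, nationality, contact number, gender.
  What information can be gained from this data about individuals? Name, age, address, nationality, area of interest, contact number, gender, hobbies.
  What information can be gained from this data about groups? Number of members in group, last availability.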

2. What is data redundancy, and which characteristics of the file system can lead to it?

Ans:- Data redundancy is the unnecessary duplication of data: the same piece of data is held in two or
more separate places. It is a common issue in computer data storage and database systems. It can occur by
accident, but it is also done deliberately for backup and recovery purposes. In a file system, the
organizational structure, in which each department or application typically keeps its own files, makes it
likely that the same data is stored in several files. Poorly designed databases can lead to data redundancy
as well.

CRICOS Provider No. 00103D ITECH 1103 Tutorial 1 Page 1 of 2


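3. What is data independence, and why is it lacking in file systems?
Ans:- Data independence exists when the data storage characteristics can be changed without affecting the
program's ability to access the data. It is lacking in file systems because every program that accesses a
file must tell the computer not only what to access but also how to access it. The program therefore
depends on the physical data format (how the computer works with the data) and not only on the logical
data format (how a person views the data), so changing how the data is stored means changing every
program that uses it.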

4. What is a DBMS and what are its functions?


Ans:- DBMS stands for database management system. It enables the creation and maintenance of
databases. In other words, it is a collection of programs that manages the database structure.
Functions of a DBMS are:-
1. It lets us describe the database, that is, define its structure and the data it holds.
2. It manages data storage, with tools to configure, provision, archive and report on storage activities.
3. It helps in data transformation and presentation.
4. It provides security for databases.
5. It provides multiuser access control.
6. It provides backup and recovery management.
7. It supports decision making and transaction processing.

5. What is structural independence, and why is it important?


Ans:- Structural independence exists when it is possible to make changes in the file structure without
affecting the application program's ability to access the data. It is important because, without it, any
change to the file structure, such as adding a field, would render the applications that access the file
inoperable until they are modified.
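
The effect of adding a field can be sketched in Python. This is only an illustration under stated assumptions: the file contents, the field names (name, city, phone) and both helper functions are hypothetical. A program that reads fields by position breaks when a field is added, while one that asks for fields by name keeps working.

```python
import csv
import io

# Hypothetical file contents; field names and values are illustrative only.
OLD_FILE = "name,phone\nJoe Smith,8015553211\n"
NEW_FILE = "name,city,phone\nJoe Smith,Salt Lake City,8015553211\n"  # a field was added


def phone_by_position(text):
    """File-system style: the program hard-codes where the phone field sits."""
    rows = list(csv.reader(io.StringIO(text)))
    return rows[1][1]  # second column of the first data row


def phone_by_name(text):
    """Structurally independent style: the program asks for the field by name."""
    rows = list(csv.DictReader(io.StringIO(text)))
    return rows[0]["phone"]


print(phone_by_position(OLD_FILE))  # the phone number
print(phone_by_position(NEW_FILE))  # the city: the program no longer works correctly
print(phone_by_name(NEW_FILE))      # still the phone number
```

The positional reader does not even raise an error after the change; it quietly returns the wrong field, which is why such programs must be found and rewritten whenever the file structure changes.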

6. Explain the difference between data and information?


Ans:- Data is the basis of all information systems and is also called raw data. Information, on the other
hand, is processed data. Data can be any character, text, word, or number and, if not put into context,
means little or nothing to a human. Information is data formatted in a manner that allows it to be
utilized by human beings in some significant way.

Example of Data: UT, 1234, Joe, Circle, SLC, 8015553211, 84084, Smith

Example of information: Joe Smith 1234 Circle, Salt Lake City, UT 84084
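
The step from the data line to the information line can be sketched in Python. The labels attached to each value, the CITY_NAMES lookup and the format_address helper are assumptions made for this illustration; only the raw values and the final address come from the example above.

```python
# Raw data values from the example above; on their own they carry no meaning.
raw = ["UT", "1234", "Joe", "Circle", "SLC", "8015553211", "84084", "Smith"]

# Context (an assumption about what each value represents) turns data into information.
labels = ["state", "street_no", "first_name", "street",
          "city_code", "phone", "zip", "last_name"]
record = dict(zip(labels, raw))

CITY_NAMES = {"SLC": "Salt Lake City"}  # hypothetical lookup table


def format_address(rec):
    """Arrange the labelled values into a mailing address a person can use."""
    city = CITY_NAMES.get(rec["city_code"], rec["city_code"])
    return (f"{rec['first_name']} {rec['last_name']} {rec['street_no']} {rec['street']}, "
            f"{city}, {rec['state']} {rec['zip']}")


print(format_address(record))  # Joe Smith 1234 Circle, Salt Lake City, UT 84084
```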

7. What is metadata in the context of a database system?


Ans:- Metadata is data that describes other data. Meta is a prefix that in most information technology
usages means "an underlying definition or description". In a database system, metadata describes the
characteristics of the data (such as field names, data types, and lengths) and the relationships among the
data. Metadata summarizes basic information about data, which can make finding and working with
particular instances of data easier.
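
As a small illustration, the metadata a DBMS keeps can be inspected with Python's built-in sqlite3 module. The customer table and its columns are hypothetical; PRAGMA table_info and the sqlite_master table are standard SQLite features.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table, created only so that there is some metadata to inspect.
conn.execute("CREATE TABLE customer (cus_name TEXT NOT NULL, cus_city TEXT, cus_zip TEXT)")

# PRAGMA table_info returns one row per column:
# (cid, name, type, notnull, dflt_value, pk)
columns = conn.execute("PRAGMA table_info(customer)").fetchall()
for cid, name, col_type, notnull, default, pk in columns:
    print(name, col_type, "NOT NULL" if notnull else "NULL allowed")

# sqlite_master stores the table definition itself, which is also metadata.
schema_sql = conn.execute(
    "SELECT sql FROM sqlite_master WHERE type = 'table' AND name = 'customer'"
).fetchone()[0]
print(schema_sql)
```

No customer rows were inserted at all, yet the database can already describe the table: that description is the metadata.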



Portfolio Questions

Figure 1

1. Given the file structure shown in the Figure 1, answer the following questions:

a) How many records does the file contain? How many fields are there per record?
Ans:- The file contains 7 records, with 5 fields per record.

b) What problem would you encounter if you wanted to produce a listing by city? How would
you solve this problem by altering the file structure?
Ans:- The city names are contained within the manager address attribute, and
decomposing this character field at the application level is cumbersome. Queries that
must search inside a string are harder to understand, harder to write, and take longer
to execute. If the ability to produce city listings is important, it is best to store
the city name as a separate attribute.

c) If you wanted to produce a listing of the file contents by last name, area code, city, state,
or zip code, how would you alter the file structure?
Ans:- The more we divide the address into its component parts, the greater its
information capabilities. For example, by dividing MANAGER_ADDRESS into its
component parts (MGR_STREET, MGR_CITY, MGR_STATE, and MGR_ZIP), we gain the
ability to easily select records on the basis of zip codes, city names, and states.
Similarly, by subdividing the MANAGER name into its components MGR_LASTNAME,
MGR_FIRSTNAME, and MGR_INITIAL, we gain the ability to produce more efficient
searches and listings. For example, creating a phone directory is easy when you can
sort by last name, first name, and initial. Finally, separating the area code and the
phone number will yield the ability to efficiently group data by area codes. Thus
MGR_PHONE might be decomposed into MGR_AREA_CODE and MGR_PHONE. The more you
decompose the data into their component parts, the greater the search flexibility.
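
A minimal Python sketch of what the decomposed fields make possible is shown here. The contents of Figure 1 are not reproduced, so the records are hypothetical: the third manager and every city, state, zip code, area code and phone number are made up for the illustration.

```python
from operator import itemgetter

# Hypothetical records; every city, zip code and phone number below is made up.
managers = [
    {"MGR_LASTNAME": "Parker", "MGR_FIRSTNAME": "Holly", "MGR_INITIAL": "B",
     "MGR_CITY": "Nashville", "MGR_STATE": "TN", "MGR_ZIP": "37000",
     "MGR_AREA_CODE": "615", "MGR_PHONE": "555-0101"},
    {"MGR_LASTNAME": "Dort", "MGR_FIRSTNAME": "George", "MGR_INITIAL": "F",
     "MGR_CITY": "Lexington", "MGR_STATE": "KY", "MGR_ZIP": "40000",
     "MGR_AREA_CODE": "859", "MGR_PHONE": "555-0102"},
    {"MGR_LASTNAME": "Adams", "MGR_FIRSTNAME": "Ann", "MGR_INITIAL": "K",
     "MGR_CITY": "Nashville", "MGR_STATE": "TN", "MGR_ZIP": "37001",
     "MGR_AREA_CODE": "615", "MGR_PHONE": "555-0103"},
]

# Listing by city is a plain sort once the city is its own field.
by_city = sorted(managers, key=itemgetter("MGR_CITY", "MGR_LASTNAME"))

# A phone directory sorts on last name, first name, and initial.
directory = sorted(managers,
                   key=itemgetter("MGR_LASTNAME", "MGR_FIRSTNAME", "MGR_INITIAL"))

# Grouping by area code needs no searching inside a combined phone field.
by_area_code = {}
for m in managers:
    by_area_code.setdefault(m["MGR_AREA_CODE"], []).append(m["MGR_LASTNAME"])

for m in by_city:
    print(m["MGR_CITY"], m["MGR_LASTNAME"])
```

With a single combined address or name field, each of these listings would first need string parsing, which is the difficulty described in part b).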

d) What data redundancies do you detect? How could those redundancies lead to anomalies?
Ans:- Note that the manager named Holly B. Parker occurs three times, indicating that she
manages three projects coded 21-5Z, 25-9T, and 29-2D, respectively. (The occurrences
indicate that there is a 1:M relationship between MANAGER and PROJECT: each project
is managed by only one manager but, apparently, a manager may manage more than one
project.) Ms. Parker's phone number and address also occur three times. If Ms. Parker
moves and/or changes her phone number, these changes must be made more than once,
and they must all be made correctly, without missing a single occurrence. If any
occurrence is missed during the change, the data are "different" for the same person.
After some time, it may become difficult to determine what the correct data are. In
addition, multiple occurrences invite misspellings and digit transpositions, thus
producing the same anomalies. The same problems exist for the multiple occurrences of
George F. Dort.
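
The update anomaly described above can be sketched in a few lines of Python. The project codes and the manager's name come from the answer; the phone numbers, the MGR_ID key and the record layout are assumptions made for the illustration.

```python
# Flat file: the manager's phone is repeated on every project record.
# Project codes come from the answer above; the phone numbers are made up.
flat = [
    {"PROJ_CODE": "21-5Z", "MANAGER": "Holly B. Parker", "MGR_PHONE": "555-0101"},
    {"PROJ_CODE": "25-9T", "MANAGER": "Holly B. Parker", "MGR_PHONE": "555-0101"},
    {"PROJ_CODE": "29-2D", "MANAGER": "Holly B. Parker", "MGR_PHONE": "555-0101"},
]

# An update that misses one occurrence leaves conflicting data.
for row in flat[:2]:
    row["MGR_PHONE"] = "555-0199"
phones_on_file = {row["MGR_PHONE"] for row in flat
                  if row["MANAGER"] == "Holly B. Parker"}
print(sorted(phones_on_file))  # two different numbers for one person

# Storing the manager once and referring to her by a (hypothetical) key
# removes the repetition.
managers = {1: {"MANAGER": "Holly B. Parker", "MGR_PHONE": "555-0101"}}
projects = [{"PROJ_CODE": code, "MGR_ID": 1} for code in ("21-5Z", "25-9T", "29-2D")]

managers[1]["MGR_PHONE"] = "555-0199"  # one change, made in one place
phones_seen = {managers[p["MGR_ID"]]["MGR_PHONE"] for p in projects}
print(phones_seen)  # every project now sees the same number
```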

Figure 2

2. Given the file structure shown in the Figure 2, answer the following questions.
a) Identify and discuss the serious data redundancy problems exhibited by the file structure
shown in Figure 2.
Ans:- The file structure in the given table contains many data redundancies. For example, if
the charge for JOB_CODE = EE changes from $85.00 to $90.00, that change must be made twice.
Also, if employee June H. Sattlemeier is deleted from the file, you also lose information about
the existence of her JOB_CODE = EE, its hourly charge of $85.00, and the PROJ_HOURS = 17.5.
The loss of the PROJ_HOURS value will ultimately mean that the Coast project costs are not
being charged properly, thus causing a loss of PROJ_HOURS × JOB_CHG_HOUR = 17.5 × $85.00 =
$1,487.50 to the company. Incidentally, note that the file contains different JOB_CHG_HOUR
values for the same CT job code, thus illustrating the effect of changes in the hourly charge rate
over time. The file structure appears to represent transactions that charge project hours to each
project. However, the structure of this file makes it difficult to avoid update anomalies, and it is
not possible to determine whether a charge change is accurately reflected in each record.
Ideally, a change in the hourly charge rate would be made in only one place, and this change
would then be passed on to the transactions based on the hourly charge. Such a structural
change would ensure the historical accuracy of the transactions. Note that the recommended
changes require a lot of work in a file system.
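
The arithmetic and the "one place" idea can be checked with a short Python sketch. The hours, the rates and the EE job code are taken from the answer above; the job_rates table, the record_charge helper and the second charge of 10 hours are hypothetical.

```python
from decimal import Decimal

CENT = Decimal("0.01")

# Values quoted in the answer above.
proj_hours = Decimal("17.5")
job_chg_hour = Decimal("85.00")
lost_charge = (proj_hours * job_chg_hour).quantize(CENT)
print(lost_charge)  # 1487.50

# Hypothetical structure: the rate is maintained in exactly one place ...
job_rates = {"EE": Decimal("85.00")}
transactions = []


def record_charge(job_code, hours):
    """Copy the current rate into the transaction so that history stays accurate."""
    rate = job_rates[job_code]
    transactions.append({
        "JOB_CODE": job_code,
        "PROJ_HOURS": hours,
        "JOB_CHG_HOUR": rate,
        "CHARGE": (hours * rate).quantize(CENT),
    })


record_charge("EE", Decimal("17.5"))
job_rates["EE"] = Decimal("90.00")   # ... so a rate change is a single update
record_charge("EE", Decimal("10"))   # made-up later transaction

for t in transactions:
    print(t["JOB_CODE"], t["JOB_CHG_HOUR"], t["CHARGE"])
```

The earlier transaction keeps the $85.00 rate it was charged at, while later ones pick up $90.00, which is one way to read "passed on to the transactions" without rewriting history.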

b) Looking at the EMP_NAME and EMP_PHONE contents in Figure 2, what changes would you
recommend?
Ans:- Separate EMP_NAME into its components, EMP_FirstNAME and EMP_LastNAME. This
change will make it much easier to organize employee data by the employee name
components. Similarly, the EMP_PHONE data should be split into EMP_AREACODE and
EMP_PHONE. For example, breaking up the phone number 653-234-3245 into the area code 653
and the phone number 234-3245 will make it much easier to organize the phone numbers by area
code. (If you want to print an employee phone directory, the more atomic employee name data
will make the job much easier.)



c) Identify the different data sources in the file you examined in Problem 2a).
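Ans:- In general, a data source gathers all of the technical information needed to access the
data (the driver name, network address, network software, and so on) into a single place and
hides it from the user. In the file examined in Problem 2a), the distinct sources of data that
have been combined into one file are: employee data (EMP_NAME, EMP_PHONE), job data
(JOB_CODE, JOB_CHG_HOUR), project data (the project names), and the hours charged to each
project (PROJ_HOURS).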

d) Given your answer to Problem 2c), what new files should you create to help eliminate the
data redundancies found in the file shown in Figure 2?

Ans:- Creating a database that contains separate PROJECT, EMPLOYEE, JOB, and CHARGE files
will help to eliminate the data redundancy.
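
One possible layout of the four files can be sketched with Python's built-in sqlite3 module. This is an assumption-laden illustration, not the structure of Figure 2: EMP_NUM, PROJ_NUM and PROJ_NAME are hypothetical columns added as keys and labels, and the sample row uses the employee, job code, rate, project and hours mentioned in Problem 2a).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE JOB (
    JOB_CODE     TEXT PRIMARY KEY,
    JOB_CHG_HOUR REAL NOT NULL
);
CREATE TABLE EMPLOYEE (
    EMP_NUM   INTEGER PRIMARY KEY,   -- hypothetical key
    EMP_NAME  TEXT NOT NULL,
    EMP_PHONE TEXT,
    JOB_CODE  TEXT REFERENCES JOB(JOB_CODE)
);
CREATE TABLE PROJECT (
    PROJ_NUM  INTEGER PRIMARY KEY,   -- hypothetical key
    PROJ_NAME TEXT NOT NULL
);
CREATE TABLE CHARGE (
    PROJ_NUM   INTEGER REFERENCES PROJECT(PROJ_NUM),
    EMP_NUM    INTEGER REFERENCES EMPLOYEE(EMP_NUM),
    PROJ_HOURS REAL NOT NULL
);
""")
conn.execute("INSERT INTO JOB VALUES ('EE', 85.00)")
conn.execute("INSERT INTO EMPLOYEE VALUES (1, 'June H. Sattlemeier', NULL, 'EE')")
conn.execute("INSERT INTO PROJECT VALUES (1, 'Coast')")
conn.execute("INSERT INTO CHARGE VALUES (1, 1, 17.5)")

# The charge is derived by joining the files instead of repeating the rate.
charge = conn.execute("""
    SELECT c.PROJ_HOURS * j.JOB_CHG_HOUR
    FROM CHARGE c
    JOIN EMPLOYEE e ON e.EMP_NUM = c.EMP_NUM
    JOIN JOB j ON j.JOB_CODE = e.JOB_CODE
""").fetchone()[0]
print(charge)

# Deleting the employee no longer erases the job code or its hourly rate.
conn.execute("DELETE FROM CHARGE WHERE EMP_NUM = 1")
conn.execute("DELETE FROM EMPLOYEE WHERE EMP_NUM = 1")
remaining_jobs = conn.execute("SELECT JOB_CODE, JOB_CHG_HOUR FROM JOB").fetchall()
print(remaining_jobs)
```

Because each fact is stored once, a rate change is a single update to JOB, and removing an employee does not lose the existence of job code EE.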
