
LECTURE 3:

Representing Numerical Data


CIS510
IS/IT Architectures
Fall 2015
Instructor: Dr. Song Xing
Department of Information Systems
California State University, Los Angeles

Learning Objectives
Describe how data is represented and
stored within computer hardware
Describe how nonnumeric data is
represented
Describe how simple data types are
used as building blocks to create more
complex data structures (e.g., arrays,
records)

Outline
Goals of data representation
Primitive data types

Integer
Real number
Character
Boolean
Memory address

Data structures
Arrays and lists
Records and files
Classes and objects

Goals of Computer Data Representation

Compactness
Accuracy
Range
Ease of manipulation
Standardization

Goals of Computer Data Representation (Cont.)
Compactness
Describes the number of bits used to represent
a numeric value
A more compact data representation format is
less expensive to implement in computer
hardware

Accuracy
Precision of representation increases with the
number of data bits used

Goals of Computer Data Representation (Cont.)
Ease of manipulation
Machine efficiency when executing processor
instructions (addition, subtraction, equality
comparison)
Processor efficiency depends on its complexity

Standardization
Ensures correct and efficient data transmission
Provides flexibility to combine hardware from
different vendors with minimal data
communication problems

Data Stored in Memory

Computers process and store all forms of data in binary format


Primitive Data Types

Five Primitive Data Types:
1. Integer
   Two's Complement Notation
2. Real number
   Floating Point Notation
3. Character
4. Boolean
5. Memory address

The representation format for each type
balances compactness, accuracy, ease of
manipulation, and standardization


Data Types: Integer


An integer is a whole number (for example: 3, 5, 16).
Integers can be signed or unsigned.
Unsigned integer, e.g.,
1111 1111 = +255
A signed integer uses one bit to represent the sign.

Signed integers can be represented as a combination of
Value or magnitude
Sign (plus or minus)

Signed integer formats
Excess notation
Two's complement notation (most common)

Data Range for Integer Numbers


The range is determined by the number of bits, n, used to
represent the data.
Unsigned integer format (all values considered
nonnegative)
0 to 2^n - 1
Signed integer format (left-most bit is
used as the sign bit)
-2^(n-1) to +(2^(n-1) - 1)
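The range formulas can be checked with a few lines of Java. This is a minimal sketch; the class and method names are made up for illustration, and the shifts assume a bit count between 1 and 62 so the results fit in a long.

```java
final class IntegerRanges {
    // Largest unsigned value that fits in the given number of bits: 2^bits - 1.
    static long unsignedMax(int bits) {
        return (1L << bits) - 1;
    }

    // Smallest signed (two's complement) value: -2^(bits-1).
    static long signedMin(int bits) {
        return -(1L << (bits - 1));
    }

    // Largest signed (two's complement) value: 2^(bits-1) - 1.
    static long signedMax(int bits) {
        return (1L << (bits - 1)) - 1;
    }

    public static void main(String[] args) {
        for (int bits : new int[] {4, 8, 16, 32}) {
            System.out.printf("%2d bits: unsigned 0..%d, signed %d..%d%n",
                    bits, unsignedMax(bits), signedMin(bits), signedMax(bits));
        }
    }
}
```

For 8 bits this gives 0 to 255 unsigned and -128 to +127 signed, matching the ranges used in the later overflow example.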

Unsigned Integers: Binary and BCD
BCD (binary-coded decimal)
The number is stored as a digit-by-digit binary representation
of the original decimal integer
4 bits per digit

Decimal                       Binary                                            BCD
68                            0100 0100                                         0110 1000
                              = 2^6 + 2^2 = 64 + 4 = 68                         = (2^2 + 2^1)(2^3) = 6 8
99 (largest 8-bit BCD)        0110 0011                                         1001 1001
                              = 2^6 + 2^5 + 2^1 + 2^0 = 64 + 32 + 2 + 1 = 99    = (2^3 + 2^0)(2^3 + 2^0) = 9 9
255 (largest 8-bit binary)    1111 1111                                         0010 0101 0101
                              = 2^8 - 1 = 255                                   = (2^1)(2^2 + 2^0)(2^2 + 2^0) = 2 5 5
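The digit-by-digit BCD encoding can be sketched in Java as follows. The names BcdDemo and toBcd are hypothetical, and the sketch assumes packed BCD with the least significant decimal digit in the lowest 4 bits.

```java
final class BcdDemo {
    // Packs a non-negative decimal integer into BCD, 4 bits per decimal digit.
    static long toBcd(int value) {
        if (value < 0) {
            throw new IllegalArgumentException("value must be non-negative");
        }
        long bcd = 0;
        int shift = 0;
        do {
            bcd |= (long) (value % 10) << shift; // place the next decimal digit in its own 4-bit group
            value /= 10;
            shift += 4;
        } while (value > 0);
        return bcd;
    }

    public static void main(String[] args) {
        for (int n : new int[] {68, 99, 255}) {
            // Each hexadecimal digit of the packed value equals one decimal digit of n.
            System.out.println(n + " -> binary " + Integer.toBinaryString(n)
                    + ", BCD " + Long.toBinaryString(toBcd(n))
                    + " (hex " + Long.toHexString(toBcd(n)) + ")");
        }
    }
}
```

Because each decimal digit occupies its own 4-bit group, 255 needs 12 bits in BCD but only 8 bits in conventional binary.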

Value Range: Binary vs. BCD


BCD range of values < conventional binary
representation
Binary: 4 bits can hold 16 different values (0 to 15)
BCD: 4 bits can hold only 10 different values (0 to 9)
No. of Bits   BCD Range                    Binary Range
4             0-9           (1 digit)      0-15              (1+ digit)
8             0-99          (2 digits)     0-255             (2+ digits)
12            0-999         (3 digits)     0-4,095           (3+ digits)
16            0-9,999       (4 digits)     0-65,535          (4+ digits)
20            0-99,999      (5 digits)     0-1 million       (6+ digits)
24            0-999,999     (6 digits)     0-16 million      (7+ digits)
32            0-99,999,999  (8 digits)     0-4 billion       (9+ digits)
64            0-(10^16 - 1) (16 digits)    0-16 quintillion  (19+ digits)

Conventional Binary vs. BCD


Binary representation generally preferred
Greater range of values for a given number of bits
Calculations easier

BCD often used in business applications to
maintain decimal rounding and decimal precision
Supported by business-oriented languages like
COBOL

Signed Integers
Sign-and-magnitude representation
Excess-N notation
2's complement (most common)

Signed Integer: Sign-and-Magnitude

Use left-most bit for sign
0 = plus; 1 = minus

Total range of integers
Half of integers positive; half negative

Example using 8 bits:
Unsigned: 1111 1111 = +255
Signed:
0111 1111 = +127
1111 1111 = -127
Note: 2 values for 0:
+0 (0000 0000) and -0 (1000 0000)

Sign-and-magnitude algorithms are complex and
difficult to implement in hardware

Signed Integer: Excess Notation


Can be used to represent signed integers.
Divides a range of ordinary binary numbers in half;
uses the lower half for negative values and the upper half for
nonnegative values.
The leftmost bit always indicates the sign (1
for nonnegative and 0 for negative values).
Excess-N notation:
Pick the middle value N of the ordinary binary range as the
offset, for example, 8 when the ordinary binary number can
take on values 0 to 15, and declare that offset to correspond
to the signed integer 0; then every value below the offset is
negative and every value above it is positive.

Excess-8 Notation
How to represent?
Add the offset to the decimal value of the signed integer to represent it.
For example, 0 is represented by 0 + 8 = 8 = 1000, -5 is represented by
-5 + 8 = 3 = 0011, and +5 is represented by 5 + 8 = 13 = 1101
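The excess-8 rule (stored value = signed value + offset) can be sketched in Java. The class ExcessNotation and its methods are illustrative names, not part of any library.

```java
final class ExcessNotation {
    // Stored (ordinary binary) value = signed value + offset.
    static int encode(int value, int offset) {
        if (value < -offset || value >= offset) {
            throw new IllegalArgumentException("value out of range for excess-" + offset);
        }
        return value + offset;
    }

    // Signed value = stored value - offset.
    static int decode(int stored, int offset) {
        return stored - offset;
    }

    // Formats a stored value as a fixed-width bit string, padding with leading zeros.
    static String toBits(int stored, int width) {
        return String.format("%" + width + "s", Integer.toBinaryString(stored)).replace(' ', '0');
    }

    public static void main(String[] args) {
        for (int value : new int[] {0, -5, 5}) {
            int stored = encode(value, 8);
            System.out.println(value + " -> " + stored + " = " + toBits(stored, 4));
        }
    }
}
```

With an offset of 8 and a width of 4 bits, the three values from the slide encode to 1000, 0011, and 1101.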

Complementary Representation
Sign of the number does not have to be
handled separately
Consistent for all different signed
combinations of input numbers


Signed Integer: Two's Complement

Use left-most bit for sign
0 = plus; 1 = minus, i.e.,
Numbers beginning with 0 are nonnegative
Numbers beginning with 1 are negative

Nonnegative integer values are represented
as ordinary binary numbers.
Negative integer values are represented
using
Complement of positive value + 1
The complement of a number is formed by
changing 1s to 0s and 0s to 1s.

Two's Complement
Most commonly used in computers.
A fixed number of bit positions is used.
Consistent for all different signed combinations of
input numbers
Subtraction can be performed as addition of a negative
value: a - b = a + (-b)

8-bit example:
Numbers                     Negative                 Positive
Representation method       Complement               Number itself
Range of decimal numbers    -128 to -1               +0 to +127
Calculation                 Inversion + 1            None
Representation example      10000000 to 11111111     00000000 to 01111111

Addition and Subtraction

Addition:
Add 2 positive 8-bit numbers:
45 + 58

  0010 1101 =   45
+ 0011 1010 =  +58
  0110 0111 =  103

Subtraction: converted to addition
Add 2 8-bit numbers with different signs:
45 - 58 = 45 + (-58) = -13

Take the 2's complement of 58 to obtain -58 (i.e., invert, then add 1)
1. 58 = 0011 1010
2. Invert of 0011 1010 = 1100 0101
3. 1100 0101 + 1 = 1100 0101 + 0000 0001 = 1100 0110
Thus, -58 = 1100 0110

  0010 1101 =   45
+ 1100 0110 =  -58
  1111 0011 =  -13
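The subtraction-by-addition procedure can be reproduced in Java on 8-bit patterns. This is a sketch under the assumption that patterns are held in int variables and masked to 8 bits; the class and method names are hypothetical.

```java
final class SubtractByAdding {
    // Two's complement of an 8-bit pattern: invert every bit, then add 1.
    static int negate8(int pattern) {
        return (~pattern + 1) & 0xFF;
    }

    // 8-bit addition; the mask discards any carry out of the leftmost bit.
    static int add8(int a, int b) {
        return (a + b) & 0xFF;
    }

    // Interprets an 8-bit pattern as a signed value (Java's byte is 8-bit two's complement).
    static int toSigned(int pattern) {
        return (byte) pattern;
    }

    static String bits8(int pattern) {
        return String.format("%8s", Integer.toBinaryString(pattern & 0xFF)).replace(' ', '0');
    }

    public static void main(String[] args) {
        int minus58 = negate8(58);
        int difference = add8(45, minus58);
        System.out.println("-58     = " + bits8(minus58));
        System.out.println("45 - 58 = " + bits8(difference) + " = " + toSigned(difference));
    }
}
```

The same adder handles both cases: add8(45, 58) yields 103, and add8(45, negate8(58)) yields the pattern 1111 0011, which reads as -13.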

How to Verify the Decimal Value of a 2's Complement Notation?
Apply the 2's complement conversion
procedure again to the 2's complement
notation to obtain the magnitude of its decimal value.
In the last example, the result is 11110011.
Since the sign bit is 1, it is a negative number.
Then,
11110011 = -(invert of 11110011 + 1)
= -(00001100 + 00000001)
= -00001101
= -13
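The verification procedure can be written as a small decoder. This is an illustrative sketch for 8-bit strings only; DecodeTwosComplement and decode8 are made-up names.

```java
final class DecodeTwosComplement {
    // Decodes an 8-bit two's complement bit string into its decimal value.
    static int decode8(String bits) {
        int pattern = Integer.parseInt(bits, 2);
        if ((pattern & 0x80) == 0) {
            return pattern;                    // sign bit 0: ordinary binary value
        }
        int magnitude = (~pattern + 1) & 0xFF; // sign bit 1: invert and add 1 to get the magnitude
        return -magnitude;
    }

    public static void main(String[] args) {
        System.out.println(decode8("11110011"));
        System.out.println(decode8("01100111"));
    }
}
```

The two calls in main decode the results of the subtraction and addition examples, giving -13 and 103.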

Overflow and Carry Conditions


Carry flag: set when the result of an
addition or subtraction exceeds the fixed
number of bits allocated
Overflow: the result of an addition or
subtraction overflows into the sign bit

Integer Overflow
Fixed word size has a fixed range size.
Occurs when absolute value of a
computational result contains too many bits
to fit into fixed-width data format
i.e., the value is too large to be stored.

Complementary arithmetic: numbers out
of range have the opposite sign.
Test: If both inputs to an addition have the
same sign and the output sign is different,
an overflow occurred.
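The sign test can be sketched in Java using the 8-bit byte type. The class OverflowCheck is hypothetical; the cast to byte keeps only the low 8 bits of the sum, which mimics a fixed-width adder.

```java
final class OverflowCheck {
    // 8-bit signed addition; the cast wraps the result into -128..127.
    static byte add8(byte a, byte b) {
        return (byte) (a + b);
    }

    // Overflow occurred if both inputs share a sign and the result's sign differs.
    static boolean overflows(byte a, byte b) {
        byte sum = add8(a, b);
        boolean sameSign = (a < 0) == (b < 0);
        return sameSign && ((sum < 0) != (a < 0));
    }

    public static void main(String[] args) {
        System.out.println(add8((byte) 64, (byte) 65) + " overflow=" + overflows((byte) 64, (byte) 65));
        System.out.println(add8((byte) 45, (byte) 58) + " overflow=" + overflows((byte) 45, (byte) 58));
    }
}
```

For 32-bit and 64-bit values, the standard library method Math.addExact performs the addition and throws an ArithmeticException on overflow instead of wrapping silently.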

Example of Overflow
8-bit signed number
256 different values
Data range:
-128 to +127

  0100 0000 =   64
+ 0100 0001 =  +65
  1000 0001 = -127

Adding 2 positive inputs produced a negative result: overflow!
Wrong answer!
To get the magnitude of 1000 0001, invert and add 1: 0111 1111 = 127 (base 10)

Condition: result of addition or subtraction overflows
into the sign bit.
Avoid overflow: increase the number of bits representing
the data
Use double precision data format;
Careful programming.

Avoiding Overflow
Overflow can be avoided by increasing the number of
bits representing the data.
In the previous example, if we add one more bit, the
new data range will be from -256 to +255 for 9-bit
signed numbers.
Each number is represented in 9-bit notation.

  0 0100 0000 =   64
+ 0 0100 0001 =  +65
  0 1000 0001 = +129

2 positive inputs produced a positive result
Right answer!
0 1000 0001 = +129 (base 10)


Data Types: Real Numbers


A real number can contain both whole
and fractional components
The whole portion appears to the left of the
radix point
The fractional portion appears to the right
of the radix point

Representation formats:
Fixed radix point format
Floating point notation (commonly used)

Real Number: Fixed Radix Point Format

Problem: numeric range is limited.

Real Number: Floating Point Notation
Used in computers when the number
Is outside the integer range of the computer (too
large or too small)
Contains a fraction

Trades numeric range for accuracy
Value can have many digits of precision for large
or small magnitudes, but not both simultaneously

Similar to scientific notation, except that 2 is
the base
value = mantissa x 2^exponent

Scientific Notation of Decimal Number

-6.35790 x 10^-6

Sign of the mantissa: -
Mantissa: 6.35790 (the decimal point follows the first digit)
Base: 10
Sign of the exponent: -
Exponent: 6

Scientific Notation of Binary Number

-1.110010 x 2^-6

Sign of the mantissa: -
Mantissa: 1.110010 (the radix point follows the first digit)
Base: 2
Sign of the exponent: -
Exponent: 6

Floating Point in the Computer: IEEE 754 Single-precision Format
32-bit format
SEEEEEEEEMMMMMMMMMMMMMMMMMMMMMMM
S: sign of the mantissa (1 bit)
E: 8-bit exponent
M: 23-bit mantissa

IEEE 754 Floating Point Format
IEEE standard 754 defines formats for
floating-point data.

Convert Numbers to IEEE 754 Floating Point Format
1. Convert the decimal or hexadecimal
number to a binary number.
2. Convert the binary number into scientific
notation format.
Provide the number with an exponent (0 if not yet specified).
Increase/decrease the exponent to shift the radix point
to the position just after the first 1.
Scientific Notation Format:
+/- 1.MMMMM x 2^(+/-EE)

Convert Numbers to IEEE 754 Floating Point Format (Cont.)
3. Recall Scientific Notation Format: +/- 1.MMMMM x 2^(+/-EE)
Corresponding 32-bit IEEE 754 format:
8-bit exponent: excess-127 notation
Stored exponent = 127 + (+/-EE)
23 bits of mantissa: implicitly preceded by a binary 1 and the radix point.
This extends the precision of the mantissa to 24 bits, although only
23 actually are stored.

More about Exponent

The 8-bit exponent is formatted using excess-127 notation.
Allows an exponent range of 2^-127 to 2^+128
(IEEE 754 reserves the all-zeros and all-ones exponent patterns for
special values, so normalized numbers use 2^-126 to 2^+127)

Excess-N notation: simpler than complementary notation
Pick the middle value N as the offset and add the offset to the
exponent to store it.

Representation in Decimal    Representation in Binary    Exponent being represented
0                            00000000                    -127
127                          01111111                    0
128                          10000000                    +1
255                          11111111                    +128
(exponent value increases from top to bottom)

Conversion Example: Decimal to IEEE 754 Format
Convert -253.75 (base 10) to binary floating point form
Convert to binary: -11111101.11 or
equivalently
-1.111110111 x 2^7
Sign = 1 (negative)
Mantissa = 11111011100000000000000
Exponent = 127 + 7 = 134 (base 10) = 10000110
IEEE 754 32-bit representation:
1 10000110 11111011100000000000000
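Java's float type uses the IEEE 754 single-precision format, so the worked conversion can be checked with the standard method Float.floatToIntBits. The wrapper class Ieee754Fields and its helpers are illustrative names only.

```java
final class Ieee754Fields {
    // Sign bit: the leftmost of the 32 bits.
    static int sign(float f) {
        return Float.floatToIntBits(f) >>> 31;
    }

    // Stored exponent: the next 8 bits, in excess-127 form.
    static int storedExponent(float f) {
        return (Float.floatToIntBits(f) >>> 23) & 0xFF;
    }

    // Stored mantissa: the low 23 bits (the leading 1 is implied, not stored).
    static int mantissa(float f) {
        return Float.floatToIntBits(f) & 0x7FFFFF;
    }

    static String pad(String bits, int width) {
        return String.format("%" + width + "s", bits).replace(' ', '0');
    }

    public static void main(String[] args) {
        float f = -253.75f;
        System.out.println(sign(f) + " "
                + pad(Integer.toBinaryString(storedExponent(f)), 8) + " "
                + pad(Integer.toBinaryString(mantissa(f)), 23));
    }
}
```

For -253.75 the three fields are 1, 134 (10000110), and 11111011100000000000000, in agreement with the hand conversion.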

Conversion Example: IEEE 754 Floating Point to Decimal
IEEE 754 Floating Point Format

Sign   Excess-127 Exponent   Mantissa
0      1000 0000             1011 0000 0000 0000 0000 000
= +1.1011 x 2^1 = +11.011 = 3.375 (base 10)
1      1000 0011             1000 0111 0000 0000 0000 000
= -1.10000111 x 2^4 = -11000.0111 = -24.4375 (base 10)
1      0111 1101             0101 0000 0000 0000 0000 000
= -1.0101 x 2^-2 = -0.010101 = -0.328125 (base 10)
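The reverse conversion follows value = (-1)^sign x 1.mantissa x 2^(stored exponent - 127). The sketch below applies that formula directly; it assumes a normalized number (stored exponent between 1 and 254), and the class name Ieee754Decode is made up for illustration.

```java
final class Ieee754Decode {
    // Decodes the three fields of a normalized single-precision value.
    static double decode(int sign, int storedExponent, int mantissaBits) {
        double significand = 1.0 + mantissaBits / (double) (1 << 23); // restore the implied leading 1
        double magnitude = significand * Math.pow(2, storedExponent - 127);
        return sign == 0 ? magnitude : -magnitude;
    }

    // Convenience overload taking the exponent and mantissa as bit strings.
    static double decode(int sign, String exponentBits, String mantissaBits) {
        return decode(sign, Integer.parseInt(exponentBits, 2), Integer.parseInt(mantissaBits, 2));
    }

    public static void main(String[] args) {
        System.out.println(decode(0, "10000000", "10110000000000000000000"));
        System.out.println(decode(1, "10000011", "10000111000000000000000"));
        System.out.println(decode(1, "01111101", "01010000000000000000000"));
    }
}
```

The three calls correspond to the three rows of the example. The standard method Float.intBitsToFloat offers an independent cross-check when given the full 32-bit pattern.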

Real Number Overflow and Underflow
Overflow: the magnitude of the number is too
large to be stored.
Underflow: the magnitude of the fraction is too
small to be represented.

Precision and Truncation

Precision
Accuracy is reduced as the number of digits
available to store the mantissa is reduced

Truncation
Stores numeric value in the mantissa until
available bits are consumed; discards
remaining bits
Causes an error or approximation that can be
magnified by subsequent calculations
Avoid by using integer types where possible
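A short Java sketch can make the loss of precision visible. The class PrecisionDemo is hypothetical; it relies only on the facts that 0.1 has no exact binary representation and that a float carries 24 bits of mantissa precision.

```java
final class PrecisionDemo {
    // Adds 0.1 ten times; each addition carries a small representation error.
    static double sumOfTenths() {
        double sum = 0.0;
        for (int i = 0; i < 10; i++) {
            sum += 0.1;
        }
        return sum;
    }

    // The same quantity counted in whole tenths stays exact.
    static int sumOfTenthsAsIntegers() {
        int tenths = 0;
        for (int i = 0; i < 10; i++) {
            tenths += 1;
        }
        return tenths;
    }

    // 2^24 + 1 needs 25 significant bits, so a float cannot hold it exactly.
    static boolean floatDropsLowBit() {
        return (float) 16_777_217 == 16_777_216f;
    }

    public static void main(String[] args) {
        System.out.println(sumOfTenths() == 1.0);          // false: accumulated error
        System.out.println(sumOfTenthsAsIntegers() == 10); // true: integers are exact
        System.out.println(floatDropsLowBit());            // true: the lowest bit was lost
    }
}
```

This is the motivation for storing quantities such as currency in integer units (for example, cents) when exact results matter.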

Processing Complexity
Floating point formats
Optimized for processing efficiency
Require complex processing circuitry
(translates to a difference in speed)

Programmers should not use real numbers
when an integer will suffice (speed and
accuracy)

Programming Considerations
Integer advantages
Easier for computer to perform
Potential for higher precision
Faster to execute
Fewer storage locations to save time and
space

Most high-level languages provide 2 or
more formats
Short integer (16 bits)
Long integer (64 bits)

Programming Considerations
Real numbers
Variable or constant has fractional part
Numbers take on very large or very
small values outside integer range
Program should use least precision
sufficient for the task


Data Types: Character


Computers use numbers. They store characters by
assigning a number to each one.
An individual symbol is a character.
Characters grouped together form a string.

Represented indirectly by defining a table that
assigns numeric values to individual characters.
Character data can only be represented in the
computer system using a coding scheme.
All users must share the same coding/decoding method.
Coded values must be capable of being stored or
transmitted.
A coding method represents a tradeoff among compactness,
range, ease of manipulation, accuracy, and standardization

Common Coding Schemes


ASCII (American Standard Code for
Information Interchange, pronounced
as-key): the original coding scheme;
subset of Unicode
Unicode: developed for worldwide use


American Standard Code for Information Interchange (ASCII)
Developed by ANSI (American National
Standards Institute)
Represents
Latin alphabet (lowercase & uppercase English
letters), Arabic numerals, standard punctuation
characters
Plus small set of accents and other European
special characters

It has 128 characters, each stored as a 7-bit
number.
7-bit code: 2^7 = 128 characters

Also includes device control codes.

ASCII Control Codes (Partial Listing)

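ASCII Reference Table

LSD \ MSD   0     1     2       3   4   5   6   7
0           NUL   DLE   space   0   @   P   `   p
1           SOH   DC1   !       1   A   Q   a   q
2           STX   DC2   "       2   B   R   b   r
3           ETX   DC3   #       3   C   S   c   s
4           EOT   DC4   $       4   D   T   d   t
5           ENQ   NAK   %       5   E   U   e   u
6           ACK   SYN   &       6   F   V   f   v
7           BEL   ETB   '       7   G   W   g   w
8           BS    CAN   (       8   H   X   h   x
9           HT    EM    )       9   I   Y   i   y
A           LF    SUB   *       :   J   Z   j   z
B           VT    ESC   +       ;   K   [   k   {
C           FF    FS    ,       <   L   \   l   |
D           CR    GS    -       =   M   ]   m   }
E           SO    RS    .       >   N   ^   n   ~
F           SI    US    /       ?   O   _   o   DEL

Example: t is in column 7, row 4: 74 (hexadecimal) = 111 0100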

ASCII Reference Table (Cont.)

An ASCII encoding combines an MSD (most
significant digit) and an LSD (least significant
digit).
MSD and LSD are represented by
hexadecimal digits in the ASCII table.
For example, t is encoded as
hexadecimal 74, which is 111 0100
in binary.
7 = 111, 4 = 0100

ASCII Data Encoding Example

Original message:
Sam, what time is the meeting with accounting?
Hannah.

Character:                      S         a         m
ASCII strings:                  1010011   1100001   1101101
Hexadecimal encoding:           53        61        6D
Corresponding decimal stream:   83        97        109
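The encoding of the first three characters can be reproduced in Java with the standard US_ASCII charset. The class AsciiEncoding and its helper methods are illustrative names.

```java
import java.nio.charset.StandardCharsets;

final class AsciiEncoding {
    // 7-bit binary string for one ASCII code, padded with leading zeros.
    static String toBinary7(int code) {
        return String.format("%7s", Integer.toBinaryString(code)).replace(' ', '0');
    }

    // Space-separated two-digit hexadecimal codes for every character of the text.
    static String toHex(String text) {
        StringBuilder sb = new StringBuilder();
        for (byte b : text.getBytes(StandardCharsets.US_ASCII)) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        for (byte b : "Sam".getBytes(StandardCharsets.US_ASCII)) {
            System.out.println((char) b + " " + toBinary7(b) + " " + String.format("%02X", b) + " " + b);
        }
        System.out.println(toHex("Sam"));
    }
}
```

Each output row shows the character, its 7-bit code, the hexadecimal form, and the decimal value, in the same order as the slide.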

ASCII Limitations
Insufficient range
Uses 7-bit code, providing 128 table
entries
33 for device control
95 printable characters can be represented

English-based

Unicode
The prevalent code today is Unicode.
Multilingual character encoding standard
encompassing all of the world's written
languages.
Defines codes for
Nearly every character-based alphabet
Large set of ideographs for Chinese, Japanese
and Korean
Composite characters for vowels and syllabic
clusters required by some languages

Unicode (cont.)
Unicode was originally a 2-byte character set.
Each character is coded using a 16-bit binary
string.
65,536, or 2^16, characters can be represented.
ASCII as a subset

Later versions of Unicode extend beyond 16 bits and remain
fully compatible with ASCII.
In the 4-byte UTF-32 encoding, each character is coded using a 32-bit binary
string.
32 bits allow 4,294,967,296, or 2^32, bit patterns, although Unicode
limits code points to the range 0 to 10FFFF (hexadecimal).
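Java strings illustrate the 16-bit origin of Unicode: a Java char is a 16-bit unit, and characters beyond the first 65,536 code points take two units. The sketch below uses only standard Character and String methods; the class name UnicodeDemo is made up.

```java
final class UnicodeDemo {
    // Number of 16-bit units Java needs to store the given Unicode code point.
    static int unitsNeeded(int codePoint) {
        return Character.charCount(codePoint);
    }

    public static void main(String[] args) {
        // ASCII is a subset: 'A' keeps its ASCII value, 65.
        System.out.println("A".codePointAt(0));

        // U+1F600 lies beyond the 16-bit range, so it is stored as two 16-bit units.
        String beyond16Bits = new String(Character.toChars(0x1F600));
        System.out.println(beyond16Bits.length());                                // 16-bit units
        System.out.println(beyond16Bits.codePointCount(0, beyond16Bits.length())); // characters

        // Highest code point Unicode defines, in hexadecimal.
        System.out.println(Integer.toHexString(Character.MAX_CODE_POINT));
    }
}
```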

Data Types: Boolean Data


Two data values: true and false.
Potentially the most concise coding format;
data is represented using a single bit.
1 bit in size
Binary 1 can represent true and binary 0
can represent false.

Boolean Variables in Java


int hours = 38;  // sample value (assumed) so the example compiles
boolean done = false;
/* true and false are the only valid
values for a boolean type in Java */
boolean isSixBigger = (6 > 5);
// Value of isSixBigger would be true
boolean overtime = (hours > 40);
/* Value of overtime would be false
if the hours variable is not greater than 40 */

Data Types: Memory Addresses

Flat memory model and segmented memory model.
Flat memory model:
Memory addresses can be represented using a
single integer ranging from 0 to the number of
addressable locations - 1.
Simple interface for programmers
Maximum execution speed
The vast majority of processor architectures
implement a flat memory design with advanced
memory management and protection technology.

Data Types: Memory Addresses (Cont.)

Segmented memory model:
Used with earlier generations of processors.
Memory is divided into a series of equal-sized segments
called pages.
Memory addresses require two integers: segment:offset
Segment identifies the page, offset identifies the byte within the
page.
Suitable for multitasking, general operating system design,
resource protection and allocation.
Suitable for virtual memory implementation.
Complex to program, difficult for compilers
Intel Pentium and Core 2 processors maintain backward
compatibility with earlier generations of CPUs using
the segmented memory model.

Data Structures
Related groups of primitive data elements
organized for a type of common processing
Defined and manipulated within software
Commonly used data structures: arrays,
linked lists, records, tables, files, indices, and
objects
Many use pointers to link primitive data
components


Arrays and Lists


A list is a set of related data values.
An array is an ordered list in which each element can be
referenced by an index to its position.

Character String Stored Within an Array
An address is the location of some data
element within a storage device.

Linked Lists
A pointer is a data element that contains the address of another
data element.
A linked list is a data structure that uses pointers so list
elements can be scattered among nonsequential storage locations
Easier to expand or shrink than an array

Character String Stored Within a Linked List

Add a New Element Into a Linked List: Easy

Adding a new element to an existing linked list is easy:
1. Allocate an empty storage unit for the new element C;
2. Copy the pointer from element B, which precedes C, into the
pointer field of C;
3. Set B's pointer to the address of C, connecting B and C.
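The three insertion steps map onto two pointer assignments in Java. This is a minimal sketch of a singly linked list of characters; the class LinkedListInsert and its Node type are hypothetical, not a library API.

```java
final class LinkedListInsert {
    static final class Node {
        final char value;
        Node next; // pointer to the following element, or null at the end of the list

        Node(char value) {
            this.value = value;
        }
    }

    // Inserts newNode immediately after prev; no other element moves.
    static void insertAfter(Node prev, Node newNode) {
        newNode.next = prev.next; // step 2: copy B's pointer into C
        prev.next = newNode;      // step 3: make B point to C
    }

    // Follows the pointers from head and collects the values in list order.
    static String asString(Node head) {
        StringBuilder sb = new StringBuilder();
        for (Node n = head; n != null; n = n.next) {
            sb.append(n.value);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Node a = new Node('A');
        Node b = new Node('B');
        Node d = new Node('D');
        a.next = b;
        b.next = d;
        insertAfter(b, new Node('C')); // step 1: allocating the node is the "new Node" call
        System.out.println(asString(a));
    }
}
```

The cost of the insertion does not depend on the length of the list, which is the sense in which it is easy.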

Add a New Element Into an Array: Hard

Inserting a new element into an existing array is not easy:
1. Allocate an empty storage location at the end of the array;
2. For EACH element past the insertion point, copy the element value
to the next storage location, starting with the last element and working
backward to the insertion point;
3. Write the new element value in the storage location at the insertion
point.
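The array procedure can be sketched in Java as follows. Java arrays have a fixed length, so this sketch models "allocate a location at the end" by copying into an array that is one element longer; the class ArrayInsert is an illustrative name.

```java
import java.util.Arrays;

final class ArrayInsert {
    // Returns a new array with value inserted at index; later elements shift right by one.
    static char[] insertAt(char[] array, int index, char value) {
        char[] result = Arrays.copyOf(array, array.length + 1); // step 1: one extra slot at the end
        for (int i = result.length - 1; i > index; i--) {
            result[i] = result[i - 1];                          // step 2: shift, working backward
        }
        result[index] = value;                                  // step 3: write the new element
        return result;
    }

    public static void main(String[] args) {
        char[] letters = {'A', 'B', 'D'};
        System.out.println(new String(insertAt(letters, 2, 'C')));
    }
}
```

Every element after the insertion point is copied once, so the work grows with the size of the array.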


Records and Files/Databases


Records
Data structures composed of other data structures
or primitive data elements.
Used as a unit of input and output to files or
databases.

Files or Databases
Sequence of records on secondary storage.

Two Methods of Organizing Files


Sequential files
Store records in contiguous storage locations.
Suffer the same problems as contiguous arrays when
records are inserted or deleted. The procedure is
even less efficient for files than for arrays because of the
relatively large size of the records that must be moved or
copied.

Indexed files
Records need not be stored in contiguous storage locations.
Use an index, which is an array of pointers to records.
Efficient record insertion, deletion, and retrieval. Each time
records change, the index needs updating, but because the
index is a small array, it is fast to update.
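The idea of an indexed file can be modeled in memory. In this sketch, which is an assumption-laden illustration rather than a real file format, records are appended in arrival order and a separate index holds their positions sorted by account number. The Account type and all method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

final class IndexedFileSketch {
    static final class Account {
        final int number;
        final String owner;

        Account(int number, String owner) {
            this.number = number;
            this.owner = owner;
        }
    }

    private final List<Account> storage = new ArrayList<>(); // records, in arrival order
    private final List<Integer> index = new ArrayList<>();   // storage positions, ordered by account number

    // Appends the record and updates only the small index; stored records never move.
    void insert(Account record) {
        storage.add(record);
        int position = storage.size() - 1;
        int slot = 0;
        while (slot < index.size() && storage.get(index.get(slot)).number < record.number) {
            slot++;
        }
        index.add(slot, position);
    }

    // Account numbers as seen through the index (ascending order).
    List<Integer> accountNumbersInOrder() {
        List<Integer> numbers = new ArrayList<>();
        for (int position : index) {
            numbers.add(storage.get(position).number);
        }
        return numbers;
    }

    // Account numbers in physical storage order (arrival order).
    List<Integer> accountNumbersAsStored() {
        List<Integer> numbers = new ArrayList<>();
        for (Account a : storage) {
            numbers.add(a.number);
        }
        return numbers;
    }

    public static void main(String[] args) {
        IndexedFileSketch file = new IndexedFileSketch();
        file.insert(new Account(300, "Carol"));
        file.insert(new Account(100, "Alice"));
        file.insert(new Account(200, "Bob"));
        System.out.println("stored:  " + file.accountNumbersAsStored());
        System.out.println("indexed: " + file.accountNumbersInOrder());
    }
}
```

Only the index entries shift on insertion; the comparatively large records stay where they were written.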

An Indexed File of Records Ordered by Ascending Account Number


Classes and Objects


Classes
Data structures that contain traditional data
elements and programs that manipulate
that data
The programs in a class are methods.
Combine related data items and extend the
record to include methods that manipulate
the data items

Objects
One instance, or variable, of the class
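A small Java class shows data items and methods combined in one structure. BankAccount and its members are a made-up illustration, not the example from the original slide.

```java
final class BankAccount {
    // Data items, as in a record.
    private final int accountNumber;
    private long balanceCents; // integer cents avoid floating point rounding

    BankAccount(int accountNumber, long openingBalanceCents) {
        this.accountNumber = accountNumber;
        this.balanceCents = openingBalanceCents;
    }

    // Method: a program packaged with the data it manipulates.
    void deposit(long cents) {
        if (cents <= 0) {
            throw new IllegalArgumentException("deposit must be positive");
        }
        balanceCents += cents;
    }

    int getAccountNumber() {
        return accountNumber;
    }

    long getBalanceCents() {
        return balanceCents;
    }

    public static void main(String[] args) {
        BankAccount first = new BankAccount(1001, 5_000);  // one object (instance) of the class
        BankAccount second = new BankAccount(1002, 0);     // a second, independent object
        first.deposit(2_500);
        System.out.println(first.getAccountNumber() + ": " + first.getBalanceCents());
        System.out.println(second.getAccountNumber() + ": " + second.getBalanceCents());
    }
}
```

Each object carries its own copy of the data items, while the methods are defined once in the class.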

A Class Example


Summary
Understanding data representation is
key to understanding hardware and
software technology
All data, including nonnumeric data, are
represented within a modern computer
system as strings of binary digits, or
bits.
Each bit string has a specific data
format and coding method.

Summary (Cont.)
Numeric data is stored using integer, real
number, and floating point formats.
Characters are converted to numbers by
means of a coding table.
Boolean values can have only two values,
true and false.
Data structures are used by programs to
define and manipulate data in larger and
more complex units than primitive CPU data
types.