
Field Data Types

Once you have identified the tables and the fields for your database, the next step is to determine each field's data type. With any relational database management system (RDBMS), you need to define what kind of information each field will contain. In most relational database management systems, there are three primary categories of field types:

1. Text
2. Numbers
3. Dates and Times

Within each of these primary categories there are variations, some of which may be specific to individual RDBMSs. I will highlight particular differences as they arise. It is important to give careful thought to field types because they dictate what information can be stored and how it is stored, which may affect database performance.

MySQL Data Types


Below is a list of data types for MySQL (adapted from Ullman L. MySQL. A visual quickstart guide. Peachpit Press: Berkeley. 2003.):

Note: The square brackets [] indicate an optional parameter to be put in parentheses, while parentheses () indicate required arguments.

Type                       Size                                  Description
CHAR[Length]               Length bytes                          A fixed-length field from 0 to 255 characters long.
VARCHAR(Length)            String length + 1 bytes               A variable-length field from 0 to 255 characters long.
TINYTEXT                   String length + 1 bytes               A string with a maximum length of 255 characters.
TEXT                       String length + 2 bytes               A string with a maximum length of 65,535 characters.
MEDIUMTEXT                 String length + 3 bytes               A string with a maximum length of 16,777,215 characters.
LONGTEXT                   String length + 4 bytes               A string with a maximum length of 4,294,967,295 characters.
TINYINT[Length]            1 byte                                Range of -128 to 127 or 0 to 255 unsigned.
SMALLINT[Length]           2 bytes                               Range of -32,768 to 32,767 or 0 to 65,535 unsigned.
MEDIUMINT[Length]          3 bytes                               Range of -8,388,608 to 8,388,607 or 0 to 16,777,215 unsigned.
INT[Length]                4 bytes                               Range of -2,147,483,648 to 2,147,483,647 or 0 to 4,294,967,295 unsigned.
BIGINT[Length]             8 bytes                               Range of -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807 or 0 to 18,446,744,073,709,551,615 unsigned.
FLOAT                      4 bytes                               A small number with a floating decimal point.
DOUBLE[Length, Decimals]   8 bytes                               A large number with a floating decimal point.
DECIMAL[Length, Decimals]  Length + 1 bytes or Length + 2 bytes  A DOUBLE stored as a string, allowing for a fixed decimal point.
DATE                       3 bytes                               In the format YYYY-MM-DD.
DATETIME                   8 bytes                               In the format YYYY-MM-DD HH:MM:SS.
TIMESTAMP                  4 bytes                               In the format YYYYMMDDHHMMSS; acceptable range ends in the year 2037.
TIME                       3 bytes                               In the format HH:MM:SS.
ENUM                       1 or 2 bytes                          Short for enumeration, that is, each column can have one of several possible values.
SET                        1, 2, 3, 4, or 8 bytes                Like ENUM except that each column can have more than one of several possible values.
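The integer ranges listed above follow directly from the storage sizes: an integer stored in n bytes has 8n bits, so it can hold 2^(8n) distinct values. The short Python sketch below recomputes the ranges from the byte sizes. The function name integer_range and the dictionary MYSQL_INTEGER_SIZES are made up for this illustration, and the sketch assumes the usual two's-complement split of the signed range.

```python
def integer_range(size_bytes, unsigned=False):
    """Return (minimum, maximum) for an integer stored in size_bytes bytes."""
    bits = 8 * size_bytes
    if unsigned:
        # All bit patterns count upward from zero.
        return 0, 2 ** bits - 1
    # Assumption: two's complement, so half the patterns are negative.
    return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1


# Storage sizes as listed for the MySQL integer types above.
MYSQL_INTEGER_SIZES = {
    "TINYINT": 1,
    "SMALLINT": 2,
    "MEDIUMINT": 3,
    "INT": 4,
    "BIGINT": 8,
}

for name, size in MYSQL_INTEGER_SIZES.items():
    print(name, integer_range(size), integer_range(size, unsigned=True))
```

Running it prints one line per type, and each pair of ranges should match the figures quoted in the list.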

Microsoft Access Data Types


Below is a list of data types for Microsoft Access (adapted from Schwartz S. Microsoft Office Access 2003. A visual quickstart guide. Peachpit Press: Berkeley 2004). In Microsoft Access, each field can be one of the following data types:

1. Text: a Text field can store any combination of typed characters: letters, numbers, and punctuation.
2. Memo: a Memo field is an extra large Text field (up to 65,535 characters). When sorting on a Memo field, Access only considers the first 255 characters.
3. Number: a Number field is used to store most types of numeric data (with the exception of monetary amounts, which can be stored using the Currency field type).
4. Currency: this data type has been designed specifically to store currency amounts and prevent rounding errors.
5. Auto Number: the Auto Number data type is used to automatically assign a new number to a primary key field, increasing the previously assigned number by one. Data stored in an Auto Number field cannot be edited.
6. Date/Time: a Date/Time field is used to store dates or times. A specific format to display dates and times can be selected.
7. Yes/No: a Yes/No field is the Access data type for recording one of two opposing values. Yes/No fields can be formatted as Yes/No, True/False, or On/Off (although all are equivalent). Every Yes/No field is formatted as a single check box; checked is Yes, True, or On whereas unchecked is No, False, or Off.
8. OLE Object: Object Linking and Embedding (OLE) Object data types enable you to embed or link to documents created in other programs, such as worksheets created in MS Excel, images (e.g., gif or jpg files), or word processing files (e.g., doc files). You can either embed the object in the Access data file or link the object to the database, thereby storing a pointer or a reference to the original document. OLE is the technology which allows an object (such as a spreadsheet) to be embedded in (and/or linked to) another document (such as a word processor document).
9. Hyperlink: the Hyperlink data type enables you to store a clickable address in the field. For example:
   a. http://
   b. mailto:
   c. ftp://
10. Lookup Wizard: a Lookup Wizard field type enables a field to display a drop-down list of values from which the user can choose; this list of values can come from another table, a query, or even the same table.

A Review of Bits and Bytes


(This section has been adapted from Marshall Brain's article on How Bits and Bytes Work from http://www.howstuffworks.com.)

In order to understand a little about field types and their bytes, it's important to review briefly what bytes (and bits) are. Let's first review the concept of decimal numbers. I think everyone is familiar with the places and digits of a decimal number. Each digit can range from 0 to 9, that is, ten possible values. For example, we know that the number 1,488 has four digits in four places: 8 ones, 8 tens, 4 hundreds, and 1 thousand. So the number 1,488 could be expressed as:

(1*1000) + (4*100) + (8*10) + (8*1) = 1000 + 400 + 80 + 8 = 1,488

Similarly, we could express the same number in terms of powers of 10:

(1*10^3) + (4*10^2) + (8*10^1) + (8*10^0) = 1000 + 400 + 80 + 8 = 1,488

Recall that any nonzero number raised to the zero power is equal to 1.

Bits and bytes can be viewed in a similar way. Computers use the base-2 system, known as the binary number system, just as the base-10 number system is known as the decimal number system. Each binary digit (also called a bit, for Binary digIT) can only take on one of two values, 1 or 0, instead of 0 to 9 as in the decimal system. So the binary number 1100 would represent:

(1*2^3) + (1*2^2) + (0*2^1) + (0*2^0) = 12

This time we use a base of 2 instead of a base of 10. A collection of 8 bits is known as one byte. With 8 bits in a byte, one can represent 256 values ranging from 0 to 255:

00000000 = (0*2^7) + (0*2^6) + (0*2^5) + (0*2^4) + (0*2^3) + (0*2^2) + (0*2^1) + (0*2^0) = 0
11111111 = (1*2^7) + (1*2^6) + (1*2^5) + (1*2^4) + (1*2^3) + (1*2^2) + (1*2^1) + (1*2^0) = 255

Three bytes (24 bits) can then represent a number ranging from 0 to 16,777,215:

000000000000000000000000 = (0*2^23) + (0*2^22) + (0*2^21) + ... + (0*2^4) + (0*2^3) + (0*2^2) + (0*2^1) + (0*2^0) = 0
111111111111111111111111 = (1*2^23) + (1*2^22) + (1*2^21) + ... + (1*2^4) + (1*2^3) + (1*2^2) + (1*2^1) + (1*2^0) = 16,777,215

Bytes are used to hold individual characters in a text document. For example, 00100000 is equal to 32, which is the numeric code for a space. So text characters are coded as numbers, which are stored as bytes in a computer file. The computer usually stores each character in 1 byte.

TEXT Types

CHAR -> A fixed-length string from 0 to 255 characters long.
VARCHAR -> A variable-length string from 0 to 255 characters long.
TINYTEXT -> A string with a maximum length of 255 characters.
TEXT -> A string with a maximum length of 65535 characters.
BLOB -> A string with a maximum length of 65535 characters.
MEDIUMTEXT -> A string with a maximum length of 16777215 characters.
MEDIUMBLOB -> A string with a maximum length of 16777215 characters.
LONGTEXT -> A string with a maximum length of 4294967295 characters.
LONGBLOB -> A string with a maximum length of 4294967295 characters.
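The maximum lengths in the list above line up with the byte arithmetic from the review of bits and bytes: 255, 65535, 16777215, and 4294967295 are the largest numbers that fit in 1, 2, 3, and 4 bytes. The Python sketch below checks the positional sums by hand and then recomputes those limits. The helper names are invented for the example, and reading the extra 1 to 4 bytes per value as a stored length counter is an interpretation of the sizes quoted earlier, not something this text states outright.

```python
def positional_value(digits, base):
    """Add up digit * base**position, with position 0 at the rightmost digit."""
    total = 0
    for position, digit in enumerate(reversed(digits)):
        total += int(digit) * base ** position
    return total


def max_unsigned(byte_count):
    """Largest value that fits in byte_count bytes of 8 bits each."""
    return 2 ** (8 * byte_count) - 1


print(positional_value("1488", 10))   # the decimal example
print(positional_value("1100", 2))    # the binary example
print(positional_value("1" * 24, 2))  # three bytes with every bit set

# Assumed number of bytes used to record each value's length.
LENGTH_BYTES = {"TINYTEXT": 1, "TEXT": 2, "MEDIUMTEXT": 3, "LONGTEXT": 4}
for type_name, byte_count in LENGTH_BYTES.items():
    print(type_name, max_unsigned(byte_count))
```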

CHAR and VARCHAR are the most widely used types. CHAR is a fixed-length string and is mainly used when the data is not going to vary much in its length. VARCHAR is a variable-length string and is mainly used when the data may vary in length. CHAR may be faster for the database to process because the fields stay the same length down the column. VARCHAR may be a bit slower because the length of each field has to be calculated down the column, but it saves storage space. Which one to ultimately use is up to you. If you use both CHAR and VARCHAR columns in the same table, MySQL will automatically change the CHAR into VARCHAR for compatibility reasons.

BLOB stands for Binary Large OBject. Both TEXT and BLOB are variable-length types that store large amounts of data. They are similar to a larger version of VARCHAR. These types can store a large piece of data, but they are also processed much more slowly.

NUMBER TYPES

TINYINT( ) -> -128 to 127 normal, 0 to 255 UNSIGNED.
SMALLINT( ) -> -32768 to 32767 normal, 0 to 65535 UNSIGNED.
MEDIUMINT( ) -> -8388608 to 8388607 normal, 0 to 16777215 UNSIGNED.
INT( ) -> -2147483648 to 2147483647 normal, 0 to 4294967295 UNSIGNED.
BIGINT( ) -> -9223372036854775808 to 9223372036854775807 normal, 0 to 18446744073709551615 UNSIGNED.
FLOAT -> A small number with a floating decimal point.
DOUBLE( , ) -> A large number with a floating decimal point.
DECIMAL( , ) -> A DOUBLE stored as a string, allowing for a fixed decimal point.

The integer types have an extra option called UNSIGNED. Normally, the integer goes from a negative to a positive value. Adding the UNSIGNED attribute moves that range up so it starts at zero instead of at a negative number.

DATE TYPES

DATE -> YYYY-MM-DD.
DATETIME -> YYYY-MM-DD HH:MM:SS.
TIMESTAMP -> YYYYMMDDHHMMSS.
TIME -> HH:MM:SS.

MISC TYPES

ENUM ( ) -> Short for ENUMERATION, which means that each column may have one of the specified possible values.
SET -> Similar to ENUM except each column may have more than one of the specified possible values.

ENUM is short for ENUMERATED list. This column can only store one of the values that are declared in the list specified inside the ( ) brackets. You can list up to 65535 values in an ENUM list. If a value is inserted that is not in the list, a blank value will be inserted. SET is similar to ENUM except that SET may contain up to 64 list items and can store more than one choice.
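The limits of 65535 ENUM values and 64 SET items connect to the storage sizes quoted earlier (1 or 2 bytes for ENUM; 1, 2, 3, 4, or 8 bytes for SET). The Python sketch below shows one way those sizes could be derived. The thresholds are assumptions chosen to be consistent with those limits (255 values fit a 1-byte number, 65535 fit a 2-byte number, and a SET needs one bit per list item); the function names are made up for the example.

```python
def enum_storage_bytes(value_count):
    """Bytes needed to number value_count ENUM values (assumed thresholds)."""
    if not 1 <= value_count <= 65535:
        raise ValueError("an ENUM list holds between 1 and 65535 values")
    return 1 if value_count <= 255 else 2


def set_storage_bytes(member_count):
    """Bytes needed for one bit per SET item, rounded up to an allowed size."""
    if not 1 <= member_count <= 64:
        raise ValueError("a SET list holds between 1 and 64 items")
    needed = (member_count + 7) // 8  # whole bytes needed for the bits
    for size in (1, 2, 3, 4, 8):      # the sizes listed for SET
        if needed <= size:
            return size


print(enum_storage_bytes(4), enum_storage_bytes(300))
print(set_storage_bytes(8), set_storage_bytes(33), set_storage_bytes(64))
```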

Properly defining the fields in a table is important to the overall optimization of your database. You should use only the type and size of field you really need to use; don't define a field as 10 characters wide if you know you're only going to use 2 characters. These types of fields (or columns) are also referred to as data types, after the type of data you will be storing in those fields. MySQL uses many different data types, broken into three categories: numeric, date and time, and string types.

Numeric Data Types:

MySQL uses all the standard ANSI SQL numeric data types, so if you're coming to MySQL from a different database system, these definitions will look familiar to you. The following list shows the common numeric data types and their descriptions.

INT - A normal-sized integer that can be signed or unsigned. If signed, the allowable range is from -2147483648 to 2147483647. If unsigned, the allowable range is from 0 to 4294967295. You can specify a width of up to 11 digits.
TINYINT - A very small integer that can be signed or unsigned. If signed, the allowable range is from -128 to 127. If unsigned, the allowable range is from 0 to 255. You can specify a width of up to 4 digits.
SMALLINT - A small integer that can be signed or unsigned. If signed, the allowable range is from -32768 to 32767. If unsigned, the allowable range is from 0 to 65535. You can specify a width of up to 5 digits.
MEDIUMINT - A medium-sized integer that can be signed or unsigned. If signed, the allowable range is from -8388608 to 8388607. If unsigned, the allowable range is from 0 to 16777215. You can specify a width of up to 9 digits.
BIGINT - A large integer that can be signed or unsigned. If signed, the allowable range is from -9223372036854775808 to 9223372036854775807. If unsigned, the allowable range is from 0 to 18446744073709551615. You can specify a width of up to 20 digits.
FLOAT(M,D) - A floating-point number that cannot be unsigned. You can define the display length (M) and the number of decimals (D). This is not required and will default to 10,2, where 2 is the number of decimals and 10 is the total number of digits (including decimals). Decimal precision can go to 24 places for a FLOAT.
DOUBLE(M,D) - A double-precision floating-point number that cannot be unsigned. You can define the display length (M) and the number of decimals (D). This is not required and will default to 16,4, where 4 is the number of decimals. Decimal precision can go to 53 places for a DOUBLE. REAL is a synonym for DOUBLE.
DECIMAL(M,D) - An unpacked floating-point number that cannot be unsigned. In unpacked decimals, each decimal digit corresponds to one byte. Defining the display length (M) and the number of decimals (D) is required. NUMERIC is a synonym for DECIMAL.

Date and Time Types:

The MySQL date and time datatypes are:

DATE - A date in YYYY-MM-DD format, between 1000-01-01 and 9999-12-31. For example, December 30th, 1973 would be stored as 1973-12-30.
DATETIME - A date and time combination in YYYY-MM-DD HH:MM:SS format, between 1000-01-01 00:00:00 and 9999-12-31 23:59:59. For example, 3:30 in the afternoon on December 30th, 1973 would be stored as 1973-12-30 15:30:00.
TIMESTAMP - A timestamp between midnight, January 1, 1970 and sometime in 2037. This looks like the previous DATETIME format, only without the separators between numbers; 3:30 in the afternoon on December 30th, 1973 would be stored as 19731230153000 (YYYYMMDDHHMMSS).
TIME - Stores the time in HH:MM:SS format.
YEAR(M) - Stores a year in 2-digit or 4-digit format. If the length is specified as 2 (for example YEAR(2)), YEAR can be 1970 to 2069 (70 to 69). If the length is specified as 4, YEAR can be 1901 to 2155. The default length is 4.

String Types:

Although numeric and date types are fun, most data you'll store will be in string format. This list describes the common string datatypes in MySQL.

CHAR(M) - A fixed-length string between 1 and 255 characters in length (for example CHAR(5)), right-padded with spaces to the specified length when stored. Defining a length is not required, but the default is 1.
VARCHAR(M) - A variable-length string between 1 and 255 characters in length; for example VARCHAR(25). You must define a length when creating a VARCHAR field.
BLOB or TEXT - A field with a maximum length of 65535 characters. BLOBs are "Binary Large Objects" and are used to store large amounts of binary data, such as images or other types of files. Fields defined as TEXT also hold large amounts of data; the difference between the two is that sorts and comparisons on stored data are case sensitive on BLOBs and are not case sensitive in TEXT fields. You do not specify a length with BLOB or TEXT.
TINYBLOB or TINYTEXT - A BLOB or TEXT column with a maximum length of 255 characters. You do not specify a length with TINYBLOB or TINYTEXT.
MEDIUMBLOB or MEDIUMTEXT - A BLOB or TEXT column with a maximum length of 16777215 characters. You do not specify a length with MEDIUMBLOB or MEDIUMTEXT.
LONGBLOB or LONGTEXT - A BLOB or TEXT column with a maximum length of 4294967295 characters. You do not specify a length with LONGBLOB or LONGTEXT.

ENUM - An enumeration, which is a fancy term for list. When defining an ENUM, you are creating a list of items from which the value must be selected (or it can be NULL). For example, if you wanted your field to contain "A" or "B" or "C", you would define your ENUM as ENUM ('A', 'B', 'C') and only those values (or NULL) could ever populate that field.

CHAR vs VARCHAR

A value does not need to be invariant in its length to be a candidate for a CHAR field. It should, however, have very little variance. Phone numbers, for example, can be stored safely in a CHAR(13) field even though phone number length varies from nation to nation. The variance simply is not that great, so there is no value in making a phone number field variable in length. The important thing to keep in mind with a CHAR field is that no matter how big the actual string being stored is, the field always takes up exactly the number of characters specified as the field's size -- no more, no less. Any difference between the length of the text being stored and the length of the field is made up by padding the value with spaces.

While the few potential extra characters being wasted on a subset of the phone number data are not anything to worry about, you do not want to be wasting much more. Variable-length text fields meet this need. A good, common example of a field that demands a variable-length datatype is a web URL. Most web addresses can fit into a relatively small amount of space -- http://www.ora.com, http://www.hughes.com.au, http://www.mysql.com -- and consequently do not represent a problem. Occasionally, however, you will run into web addresses like:

http://www.winespectator.com/Wine/Spectator/_notes|5527293926834323221480431354?Xv11=&Xr5=&Xv1=&type-region-search-code=&Xa14=flora+springs&Xv4=

If you construct a CHAR field large enough to hold that URL, you will be wasting a significant amount of space for almost every other URL being stored.
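The padding behaviour described for CHAR fields can be imitated in a few lines of Python. This is only a sketch of the rule as stated in the text (every value occupies exactly the declared width, padded with spaces); the function store_as_char and the phone numbers are made up for the illustration and say nothing about how a database implements it internally.

```python
def store_as_char(value, width):
    """Pad value with trailing spaces so it occupies exactly width characters."""
    if len(value) > width:
        raise ValueError("value does not fit in a CHAR(%d) field" % width)
    return value.ljust(width)


PHONE_WIDTH = 13  # the CHAR(13) example from the text

# Hypothetical phone numbers of different lengths.
phone_numbers = ["555-0100", "212-555-0100", "+612555501000"]

for number in phone_numbers:
    stored = store_as_char(number, PHONE_WIDTH)
    wasted = PHONE_WIDTH - len(number)
    print(repr(stored), len(stored), "characters stored,", wasted, "of them padding")
```

Every stored value comes out 13 characters long; only the amount of padding differs, which is why a small variance in length costs little.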
Variable-length fields let you define a field length that can store the odd, long-length value while not wasting all that space for the common, short-length values. MySQL and mSQL each take separate approaches to this problem. If you are using only mSQL, you can skip this section.

The advantage of variable-length text fields under MySQL is that such fields use precisely the minimum storage space required to store an individual field. A VARCHAR(255) column that holds the string "hello world", for example, only takes up 12 bytes (one byte for each character plus an extra byte to store the length).

NOTE

In opposition to the ANSI standard, VARCHAR fields in MySQL are not padded. Any trailing spaces are removed from a value before it is stored.

You cannot store strings whose lengths are greater than the field length you have specified. With a VARCHAR(4) field, you can store at most a string with 4 characters. If you attempt to store the string "happy birthday", MySQL will truncate the string to "happ". The downside of the MySQL approach to variable-length text fields compared with the mSQL approach is that there is no way to store the odd string that exceeds your designated field size. Table 6-1 shows the storage space required to store the 144-character Wine Spectator URL shown above along with an average-sized 30-character URL.

Table 6-1. The Storage Space Required by the Different MySQL Character Types
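The VARCHAR rules described above (one byte per character plus one length byte, trailing spaces dropped, over-long values cut to the declared length) can be sketched in Python as follows. The function store_as_varchar is hypothetical, it assumes one byte per character, and it applies truncation before stripping spaces, which is a guess at the order rather than documented behaviour.

```python
def store_as_varchar(value, max_length):
    """Return (stored_text, bytes_used) under the rules described in the text.

    Assumptions: one byte per character, one extra byte for the length,
    and truncation applied before trailing spaces are removed.
    """
    stored = value[:max_length].rstrip(" ")
    bytes_used = len(stored) + 1
    return stored, bytes_used


print(store_as_varchar("hello world", 255))    # the VARCHAR(255) example
print(store_as_varchar("happy birthday", 4))   # the VARCHAR(4) example
print(store_as_varchar("hi   ", 10))           # trailing spaces are dropped
```

The first two calls reproduce the figures in the text: 12 bytes for "hello world" and the truncated value "happ".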

6.3.2.2. Variable-length character fields in mSQL


You can skip this section if you are only interested in MySQL. Variable-length character fields in mSQL enable you to define a field's length to be the size of the average character string length it will hold. While every value you insert into this field will still take up at least the amount you specify, it can hold more. The database does this by creating an overflow table to hold the extra data. The downside of this approach comes in the form of performance and the inability to index variable-length fields.

Let's take a moment to examine the impact of different design choices with mSQL. In order to store all of the above URLs in a CHAR field, we would need to have a CHAR(144) column. Under this scenario, the four URLs in question would take up 576 bytes (144x4), even though you are only actually storing 216 bytes of data. The other 360 bytes are simply wasted space. If you multiply that by thousands or millions of rows, you can easily see how this becomes a serious problem. Using a variable-length TEXT(30) field, however, only 234 bytes (30x3+144) are required to store the 216 bytes of data. Only 18 bytes are wasted. The TEXT(30) layout needs only about 41% of the space of the CHAR(144) layout, a savings of roughly 59%!
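The arithmetic in this comparison can be rechecked with a short Python sketch. The individual URL lengths below are hypothetical: three short URLs of 24 characters each are assumed so that, together with the 144-character URL, they add up to the 216 bytes of data quoted in the text. The model of a TEXT(30) field (each value occupies at least the declared 30 bytes and more if it is longer) follows the description above and ignores any overhead of the overflow table.

```python
CHAR_WIDTH = 144      # CHAR(144): every row takes the full width
TEXT_DECLARED = 30    # TEXT(30): every row takes at least 30 bytes

# Hypothetical lengths: three short URLs plus the 144-character one.
url_lengths = [24, 24, 24, 144]

data_bytes = sum(url_lengths)
char_bytes = CHAR_WIDTH * len(url_lengths)
text_bytes = sum(max(length, TEXT_DECLARED) for length in url_lengths)

print("data actually stored:", data_bytes)
print("CHAR(144) total:", char_bytes, "wasted:", char_bytes - data_bytes)
print("TEXT(30) total:", text_bytes, "wasted:", text_bytes - data_bytes)
print("TEXT as a share of CHAR:", round(text_bytes / char_bytes, 2))
print("space saved:", round(1 - text_bytes / char_bytes, 2))
```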

6.3.3. Binary Datatypes


mSQL has no support for binary data. MySQL, on the other hand, provides a set of binary datatypes that closely mirror their character counterparts. The MySQL binary types are CHAR BINARY, VARCHAR BINARY, TINYBLOB, BLOB, MEDIUMBLOB, and LONGBLOB. The practical distinction between character types and their binary counterparts is the concept of encoding. Binary data is basically just a chunk of data that MySQL makes no effort to interpret. Character data, on the other hand, is assumed to represent textual data from human alphabets. It thus is encoded and sorted based on rules appropriate to the character set in question. Specifically, MySQL sorts binary data in case-sensitive, byte-value (ASCII) order, whereas character data is by default sorted without regard to case.
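The practical effect of treating data as uninterpreted bytes rather than as text can be illustrated outside the database. In the Python sketch below, comparing bytes objects stands in for a binary column, and sorting with a lower-casing key stands in for a case-insensitive character column. Both stand-ins are assumptions made for illustration, not a description of MySQL's own sorting code, and the word list is made up.

```python
words = ["banana", "Apple", "cherry", "apple", "Banana"]

# Binary-style ordering: raw byte values are compared, and in ASCII every
# uppercase letter has a smaller code than every lowercase letter.
binary_order = sorted(word.encode("ascii") for word in words)

# Text-style ordering: case is ignored when comparing, so "Apple" and
# "apple" sort next to each other.
text_order = sorted(words, key=str.lower)

print(binary_order)
print(text_order)
```

In the byte-wise result all capitalized words come first; in the case-insensitive result the two spellings of each word sit side by side.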

6.3.4. Enumerations and Sets


MySQL provides two other special kinds of types with no mSQL analog. The ENUM type allows you to specify at table creation a list of possible values that can be inserted into that field. For example, if you had a column named fruit into which you wanted to allow only "apple", "orange", "kiwi", or "banana", you would assign this column the type ENUM:
CREATE TABLE meal(meal_id INT NOT NULL PRIMARY KEY, fruit ENUM('apple', 'orange', 'kiwi', 'banana'))

When you insert a value into that column, it must be one of the specified fruits. Because MySQL knows ahead of time what valid values are for the column, it can abstract them to some underlying numeric type. In other words, instead of storing "apple" in the column as a string, it stores it as a single byte number. You just use "apple" when you call the table or when you view results from the table. The MySQL SET type works in the same way, except it lets you store multiple values in a field at the same time.
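The idea of abstracting list values to small numbers can be sketched in Python. The numbering scheme below (positions starting at 1 for ENUM, one bit per list item for SET) is an assumption made for illustration; the text only says that the values are stored as numbers. All function names are made up for the example.

```python
FRUITS = ("apple", "orange", "kiwi", "banana")  # the list from the ENUM example


def enum_to_number(value, members=FRUITS):
    """Replace a list value by its position, counting from 1 (assumed)."""
    return members.index(value) + 1


def number_to_enum(number, members=FRUITS):
    """Turn a stored position back into the list value."""
    return members[number - 1]


def set_to_number(values, members=FRUITS):
    """Combine several list values into one number, one bit per item (assumed)."""
    number = 0
    for value in values:
        number |= 1 << members.index(value)
    return number


def number_to_set(number, members=FRUITS):
    """Recover the list values whose bits are switched on."""
    return [member for bit, member in enumerate(members) if number & (1 << bit)]


print(enum_to_number("kiwi"), number_to_enum(3))
print(set_to_number(["apple", "kiwi"]), number_to_set(5))
```

A value that is not in the list raises ValueError in this sketch, which mirrors the point that only the declared values are valid.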

6.3.5. Other Kinds of Data


Every piece of data you will ever encounter can be stored using numeric or character types. Technically, you could even store numbers as character types. Just because you can do so, however, does not mean that you should do so. Consider, for example, storing money in the database. You could store that as an INT or a REAL. While a REAL might seem more intuitive -- money requires decimal places, after all -- an INT actually makes more sense. With floating point values like REAL fields, it is often impossible to capture a number with a specific decimal value. If, for example, you insert the number 0.43 to represent $0.43, MySQL and mSQL may store that as 0.42999998. This small difference can be problematic when applied to a large number of mathematical operations. By storing the number as an INT and inserting the decimal point into the right place, you can be certain that the value represents exactly what you intend it to represent.

Isn't all of that a major pain? Wouldn't it be nice if MySQL and mSQL provided some sort of datatype specifically suited to money values? MySQL and, to a lesser degree, mSQL both provide special datatypes to handle special kinds of data. MONEY is an example of one of these kinds of data. DATE is another. For a full description of all
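The point about money and floating point values can be tried out in Python. The sketch below squeezes 0.43 through a 4-byte floating point value, which stands in for a REAL column (whether a particular database stores REAL in 4 or 8 bytes is an assumption here), and then shows the integer alternative of counting cents. The exact digits a database displays may differ from what Python prints.

```python
import struct


def as_float32(value):
    """Round-trip a number through a 4-byte IEEE floating point value."""
    return struct.unpack("<f", struct.pack("<f", value))[0]


price = 0.43
stored = as_float32(price)
print("stored exactly?", stored == price)
print("value that comes back:", repr(stored))

# The integer alternative: keep cents as a whole number and place the
# decimal point only when displaying the amount.
price_cents = 43
total_cents = price_cents * 1000          # one thousand items at $0.43
dollars, cents = divmod(total_cents, 100)
formatted_total = f"${dollars}.{cents:02d}"
print(formatted_total)
```

The integer total is exact no matter how many items are added, which is the behaviour the text recommends relying on.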
