
Difference between TRUNCATE, DELETE and DROP commands

Submitted by admin on Sat, 2006-02-11 02:07

DELETE

The DELETE command is used to remove rows from a table. A WHERE clause can be used to remove only some rows; if no WHERE condition is specified, all rows are removed. After performing a DELETE operation you need to COMMIT or ROLLBACK the transaction to make the change permanent or to undo it. Note that this operation will cause all DELETE triggers on the table to fire.

SQL> SELECT COUNT(*) FROM emp;

  COUNT(*)
----------
        14

SQL> DELETE FROM emp WHERE job = 'CLERK';

4 rows deleted.

SQL> COMMIT;

Commit complete.

SQL> SELECT COUNT(*) FROM emp;

  COUNT(*)
----------
        10

TRUNCATE

TRUNCATE removes all rows from a table. The operation cannot be rolled back and no triggers will be fired. As such, TRUNCATE is faster and does not use as much undo space as a DELETE.

SQL> TRUNCATE TABLE emp;

Table truncated.

SQL> SELECT COUNT(*) FROM emp;

  COUNT(*)
----------
         0

DROP

The DROP command removes a table from the database. All of the table's rows, indexes and privileges will also be removed. No DML triggers will be fired. The operation cannot be rolled back.

SQL> DROP TABLE emp;

Table dropped.

SQL> SELECT * FROM emp;
SELECT * FROM emp
               *
ERROR at line 1:
ORA-00942: table or view does not exist

DROP and TRUNCATE are DDL commands, whereas DELETE is a DML command. Therefore DELETE operations can be rolled back (undone), while DROP and TRUNCATE operations cannot be rolled back.


Difference between TRUNCATE and DELETE commands


Submitted by Dipal havsar (not verified) on Tue, 2006-09-19 07:39.

1. TRUNCATE is a DDL command, whereas DELETE is a DML command.

2. TRUNCATE is much faster than DELETE. The reason: when you issue a DELETE, the data is first copied into the rollback tablespace and then the delete is performed. That is why, if you type ROLLBACK after deleting from a table, you can get the data back (the system restores it for you from the rollback tablespace), and why the whole process takes time. TRUNCATE removes the data directly, without copying it into the rollback tablespace, which is why it is faster. Once you truncate, you cannot get the data back.

3. You cannot roll back a TRUNCATE, but you can roll back a DELETE. TRUNCATE removes the records permanently.

4. In the case of TRUNCATE, triggers do not fire. With DML commands like DELETE, triggers do fire.

5. You cannot use conditions (a WHERE clause) with TRUNCATE, but with DELETE you can.
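A minimal sketch of that contrast (the EMP table and 14-row count mirror the example above; the output is what an Oracle SQL*Plus session would typically show):

SQL> DELETE FROM emp;

14 rows deleted.

SQL> ROLLBACK;

Rollback complete.

SQL> SELECT COUNT(*) FROM emp;

  COUNT(*)
----------
        14

SQL> TRUNCATE TABLE emp;

Table truncated.

SQL> ROLLBACK;

Rollback complete.

SQL> SELECT COUNT(*) FROM emp;

  COUNT(*)
----------
         0

The second ROLLBACK has nothing to undo: TRUNCATE has already committed, so the rows are gone for good.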


One more difference.


Submitted by Shreehari (not verified) on Tue, 2006-11-07 21:25.

Thanks for this information. There is one more difference: the TRUNCATE command resets the high water mark for the table, but DELETE does not. So after a TRUNCATE, operations such as full table scans are much faster.
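One hedged way to observe this (names illustrative; USER_SEGMENTS reports the space allocated to the table's segment):

SQL> SELECT blocks FROM user_segments WHERE segment_name = 'EMP';

-- After DELETE + COMMIT the allocated blocks (and the high water mark) are unchanged,
-- so a full table scan still reads all of them:
SQL> DELETE FROM emp;
SQL> COMMIT;

-- After TRUNCATE (with the default DROP STORAGE) the extents are deallocated and the
-- high water mark is reset, so the same query reports far fewer blocks:
SQL> TRUNCATE TABLE emp;
SQL> SELECT blocks FROM user_segments WHERE segment_name = 'EMP';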

about truncate and drop


The DROP command deletes all the rows and also the structure. TRUNCATE deletes only the contents, not the structure, so there is no need to give the table specification again to recreate it.

DELETE,DROP,TRUNCATE
Submitted by shoaib on Thu, 2007-01-18 00:04.

DELETE: Delete is the command that removes only the data from the table. It is a DML statement, and deleted data can be rolled back. Used without a WHERE clause it deletes all data from the table; if we want to remove only selected data, we should specify a condition in the WHERE clause.

SQL> DELETE FROM employee;                               (removes all the data from the table)
SQL> DELETE FROM employee WHERE employee_name = 'JOHN';  (removes only the rows where employee_name is 'JOHN')

DROP: The Drop command removes the table from the data dictionary. This is a DDL statement. Before Oracle 10g we could not recover the table, but Oracle 10g provides a command to recover it (FLASHBACK).

TRUNCATE: This is the DML command. It deletes the data from the table, but there is one difference from an ordinary DELETE: TRUNCATE drops the storage held by the table, and that dropped storage can be used again by this table or by some other table. It is the faster command because it directly drops the storage.
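For reference, the Oracle 10g recovery mentioned above looks like this (a sketch; it assumes the recycle bin is enabled, which is the default):

SQL> DROP TABLE employee;

Table dropped.

SQL> FLASHBACK TABLE employee TO BEFORE DROP;

Flashback complete.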

Truncate is DDL command


DML commands can be rolled back, but DDL commands cannot. So TRUNCATE is a DDL statement.

DELETE,DROP,TRUNCATE
The DROP command is a DDL command. It removes the data along with the structure, and it also removes all information about the table from the data dictionary. The TRUNCATE command is a DDL command; it removes all of the table's data. Regarding performance, if you have to delete all the rows of a table you should use the TRUNCATE command with the DROP STORAGE option. DELETE is a DML command. It provides condition-based deletion, and it also generates redo and undo information.
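The storage clause mentioned above is spelled out like this (DROP STORAGE is the default and deallocates the extents; REUSE STORAGE keeps them allocated for a reload):

SQL> TRUNCATE TABLE emp DROP STORAGE;

Table truncated.

SQL> TRUNCATE TABLE emp REUSE STORAGE;

Table truncated.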



Delete vs. Truncate

- DELETE fires triggers; when TRUNCATE executes, no triggers fire.
- DELETE can use filtering conditions (a WHERE clause); TRUNCATE cannot.
- When DELETE is used, the database manager locks the table against user manipulation; TRUNCATE does not.
- DELETE is a DML operation and requires a COMMIT or ROLLBACK; TRUNCATE is a DDL operation and commits automatically.
- DELETE can work on views; TRUNCATE works on tables and clusters.
- DELETE is a logged operation on each row of the table; TRUNCATE is logged only as the deallocation of the data pages in which the data exists.
- DELETE physically removes each row; after TRUNCATE the data still sits in the data pages, but the extents are marked empty for reuse, so it cannot be rolled back.
- DELETE can remove any row that does not violate a constraint, leaving foreign keys as they are; a table referenced by an enabled foreign key constraint cannot be truncated, so remove the constraint, truncate, and then add the constraint back.
- DELETE does not reset the identity (default seed) value; TRUNCATE does.
- DELETE is logged and is DML; TRUNCATE is unlogged and is DDL, which is why TRUNCATE is faster than DELETE.
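A small sketch of the trigger and foreign-key points above (the audit table, trigger name, and the usual EMP-to-DEPT foreign key are assumed for illustration):

CREATE TABLE emp_audit (msg VARCHAR2(100), logged_at DATE);

CREATE OR REPLACE TRIGGER trg_emp_delete
AFTER DELETE ON emp
FOR EACH ROW
BEGIN
  INSERT INTO emp_audit VALUES ('row deleted', SYSDATE);
END;
/

-- Fires the trigger once for every deleted row:
DELETE FROM emp WHERE job = 'CLERK';

-- Fires no trigger at all:
TRUNCATE TABLE emp;

-- Fails with ORA-02266 while an enabled foreign key on EMP references DEPT:
TRUNCATE TABLE dept;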

Equi join and union: Indeed, an equi join and a UNION are very different. An equi join is used to establish a condition between two tables in order to select data from both of them, e.g.

SELECT a.employeeid, a.employeename, b.dept_name
FROM employeemaster a, DepartmentMaster b
WHERE a.employeeid = b.employeeid;

This is an example of an equi join, whereas a UNION allows you to select similar data based on different conditions, e.g.

SELECT a.employeeid, a.employeename
FROM employeemaster a
WHERE a.employeeid > 100
UNION
SELECT a.employeeid, a.employeename
FROM employeemaster a
WHERE a.employeename LIKE 'B%';

The above is an example of a UNION, where we select the employee name and id for two different conditions into the same record set, which is then used thereafter. For an equi join there must be a relationship between the two tables, but for a UNION there is no need for any relation between them; both equi joins and UNIONs are used with more than one table.

Function = stored code suitable for use in the select list of a query and in WHERE clause conditions, which generally accepts one parameter and returns one value; it allows the user to extend the list of Oracle-provided row functions.
Procedure = stored code designed to return one or more values to the caller. Procedures are generally called from other stored code routines or directly from the user application, not as part of a query or DML statement.
Package = a collection of stored procedures and functions.
Delete = a DML row operation to remove rows from a table.

Truncate = a DDL operation to quickly, and with minimum overhead, mark a table and its associated indexes as empty.
Drop = a DDL operation to remove a table or other object from the database.

Maybe these definitions will help until you cover the reading. You can also learn this from the Concepts manual; if you are a DBA, that manual is where you should start. For a developer I would start with the Application Developer's Guide - Fundamentals.

Function = stored code suitable for use in the select list of a query and in WHERE clause conditions, which may or may not accept one or more parameters and returns one value. Functions can also be called from other procedures or functions. A function must RETURN a value.
Procedure = stored code designed to return one or more values to the caller; a procedure may or may not return values. Procedures are generally called from other stored code routines or directly from the user application, not as part of a query or DML statement.
Package = a collection of stored procedures and functions.
Delete = a DML row operation to remove rows from a table. With this option one can delete selected rows; to make the delete permanent the user has to issue a COMMIT.
Truncate = a DDL operation to quickly, and with minimum overhead, mark a table (the complete table, not selected rows) and its associated indexes as empty.
Drop = a DDL operation to remove a table or other object from the database.

Surrogate key:
We are trying to construct a job where we update/insert into a dimension table in Oracle using the OCI plugin. The dimension table is to have a surrogate key, and we were hoping to set this within the Oracle environment rather than using the DataStage 4.0 surrogate key generation capability. Therefore, rather than having to do a lookup to see if the dimension record already exists, we would use the update/insert capability of the plugin with an alternative unique index based on the source system keys, and we would not include the SERIAL surrogate key in the columns to update/insert. This would work under SQL Server 7.0/Informix with SERIAL data types, but the only thing I can find in Oracle is sequences, which from what I understand have no direct relationship with a table and therefore do not behave like a SERIAL data type. I know we could do a lookup using the alternate unique index to return the surrogate key value, and therefore determine whether an insert or update was about to occur depending on whether a row was returned, but I was trying to eliminate having to do this.

Do you want the good news or the bad news? The bad news is that you can't do what you are trying to do. The good news is that you have figured out the correct/only way to do it. We extensively use surrogate keys for our dimensions and manage them with sequences. The approach we use is, in one transform, to look up the dimension table with the "natural" key to get the surrogate key (we call it a UID). If it exists, the same transform updates the dimension table. If it does not exist, we pass the data to another transform which looks up the next value from the sequence and inserts the new row into the dimension. Alternatively, instead of a lookup to get the next value from the sequence, you could use a user-defined query for the insert to reference the sequence directly; for that matter, this would remove the need for the second transform. For my money, the extra lookup is easier and makes it clearer in the job as to what is happening. Having done a lookup to the dimension, you may as well test to see whether anything has changed and avoid the update if you don't need it.

&& More good news is that you CAN do it (and I know Phil is a gun BASIC programmer). Construct a routine using BCI (BASIC SQL Client Interface) functions to SELECT MAX(surrogate_key) FROM dimension_table, load this into COMMON, then construct a transform function to increment it when needed to generate the next row. This technique is taught on the "Programming with DataStage BASIC" class.

&&& I can see I'm going to be a little more careful about how I word things in future. Thanks Ray, this is a perfectly reasonable option. Your approach has the advantage that it can all be done in the one Transformer stage, as the transform function will only be executed when one actually inserts (as opposed to looking up a sequence, which needs to be in a separate Transformer stage to avoid incrementing it when one is not inserting). Is the BCI using ODBC (in which case I know what you are talking about) or OCI (something I haven't tried in BASIC code)? I still don't think you can avoid doing the lookup first, though, to see if the dimension record already exists. *sigh* I guess that's what I get for missing the "Programming with DataStage BASIC" class.

The BCI is a set of BASIC functions that mimic the ODBC 2.0 API, so it uses ODBC protocols. You are right that you need to perform a reference lookup against the dimension table to determine existence/change. However, this can be done extremely efficiently by pre-loading a hashed file, with something like:

SELECT A.CODE, A.DESCR
FROM DIMTABLE A
WHERE A.SURRKEY = (SELECT MAX(B.SURRKEY) FROM DIMTABLE B WHERE A.CODE = B.CODE);

Then the lookups can be done against CODE (the "natural key" in your terms) as the key in the hashed file (an equi join, so it's fast). Existence is an ISNULL test on the returned key, and testing for difference needs only to compare the auditable columns. If there are likely to be duplicates in the input stream you must ensure that the hashed file is NOT pre-loaded to memory, and that it is also updated when a changed value occurs. If there cannot be duplicates (SELECT DISTINCT, for example), then this check can be omitted. Have fun.

With DataStage you need to consider two architectures: design time and run time.

Design Time

At design time there are five clients: Administrator, Repository Manager, Designer, Director (as in movie director), and Version Control. Each client connects to the DataStage server, which manages storage of design-time objects in the DataStage repository (a proprietary database originally based on UniVerse); each project is a separate schema. Connection is made over TCP/IP (Windows LAN Manager is also possible if the DataStage server is on a Windows machine) using a proprietary protocol called "DataStage Objects", originally based on UniObjects. On the client side, the clients consume ActiveX objects and methods exposed by OLE servers that are part of DataStage Objects. During design, each connected client has an "agent process" (dsapi_server) associated with it to manage the connection. To perform actual work, an additional process (dsapi_slave) may be forked. These use "helper subroutines" (written in DataStage BASIC) such as DSR_RECORD and DSR_SELECT to effect changes to design-time components in the DataStage repository.

Run Time

At run time the DataStage Engine acts as a data pump. Passive stage types expose methods for opening and closing connections, retrieving data (either the next row from a result set or the row associated with a particular key value), and putting data out, again a row at a time. These rows can be packaged into arrays, and transaction control governed. Properties of the passive stage types are allocated at design time, or at run time from job parameters. When connecting to databases, DataStage jobs appear as just another client to the database servers. Active stage types (such as Transformer stages) not only act upon the data, they also control the sequence of events. Tables in the DataStage repository contain information about the configuration of the job (for example, what comes after what, what depends on what, and so on), record the status of each job and of its active stages, and store the job log.

There are three different modes of run-time operation. Server jobs and job control routines execute in DataStage BASIC (a compiled BASIC language) via the DataStage shell (dssh). Parallel jobs execute in C via the Orchestrate shell (osh); within osh are mechanisms for controlling execution on multiple parallel processing nodes, governed by information in a configuration file that has been associated with the running job. Mainframe jobs execute in COBOL on a mainframe machine; DataStage generates the COBOL source code and JCL for compiling and running it, and transfers it to the mainframe, but does not control its execution.

I'd count "SAP" jobs as a hybrid between server and mainframe jobs. With the SAP R/3 Extract and SAP BW pieces, they fit into server jobs like other stage types. However, instead of running under control of the DataStage Engine, they generate legal ABAP code that is passed (by one of two mechanisms) to SAP for execution. What they did was to create custom (plug-in) stage types that can work with IDOC definitions, generate ABAP code, and communicate with SAP, either receiving data from SAP R/3 or sending data to SAP BW. Control returns to dssh once that's happened. Says a lot for the "plug-in" architecture, which has been in the product since version 1.0, that it allows this kind of thing to be done. Surrogate key: Begin by thinking through the logic. When loading the dimension table, does the row already exist? If no, generate key value and insert new row. If yes, is the row changed? If no, do nothing. If yes, generate key value and insert new row. The reference lookup must be done on the original key value, not the surrogate key value. Therefore take a copy of only the auditable columns from the dimension table into a local hashed file (this will yield much better performance than a lookup on a secondary key). Use the hashed file to feed a Transformer stage via a reference input link, and constrain the output of the Transformer stage to "reference input key is null (not found) or row is changed (columns are different between stream input and reference input)". Auditable columns are the ones you need to compare to see whether a difference has occurred. There are many ways to generate a surrogate key value. Prefer to use a SEQUENCE/IDENTITY column if the target database allows. Otherwise, depending on your skill levels, use the SDK key management routines, or write your own routines: a before-stage subroutine to get the current maximum value of the surrogate key and load it into a variable in COMMON, and a transform function to increment this variable and return the new value. Loading the fact table then involves loading a hashed file with just the true code value (as its key) and the maximum surrogate key value for that code value, and using this hashed file to convert the true code to the corresponding surrogate key value. The hashed file has only two columns.

SCDS

DataStage does not have a wizard as such to create these mappings for you; however, they are not hard to do. Now that you have labelled them: we use Types 1 and 2 extensively in our warehouse. I would think that Type 3 would present difficulties to most reporting tools, but nonetheless DataStage could still handle it. Our warehouse is in Oracle, so I'm generally speaking in the context of how DataStage interacts with Oracle.

Type 1

The output stage (in our case, an Oracle OCI plug-in) can do either an "insert else update" or an "update else insert". The developer would choose based on whether more inserts or updates are expected. For SCD, I would expect that "update else insert" would generally be the preferred option, and in practice we have found this to perform the best.

Type 2

DataStage does not build this automatically or with a wizard. However, it is not hard to do and we use it extensively. The logic partly depends on how good your changed data capture (CDC) is. One of our interfaces uses CDC based on triggers in the source system to record changes. However, not all changes to the source records are relevant to us, so it is not a foregone conclusion that a changed record presented to our warehouse is really changed from our perspective. Obviously the dimension table would have a surrogate key as well as what I refer to as a natural key (i.e. the source system key plus time). DataStage can look up the dimension table to see if any of the attributes have really changed (this is optional, depending on the quality of the CDC). If a change is detected, update the end date of the old record and insert a new record. We use Oracle sequences to generate all our surrogate keys; DataStage can simply select the next value with a reference link, or if you write a custom query for the insert you can include it there (but why write code if you don't have to). Ray Wurlod wrote a good posting about alternative approaches to this some time back.

Type 3

The biggest issue here is your table design in the warehouse. DataStage can do whatever you need to populate the columns of the target table in its transforms; these can be very simple or very complicated. We have an equipment hierarchy table in our warehouse that allows for up to 9 levels in the hierarchy. Every level is held on the single dimension record, so I guess it is a form of what you call Type 3. This involved a transform for each level. Each transform checked to see if the current record had a parent. If it did, it passed the hierarchy so far to the next transform, which does exactly the same thing. If it didn't, it wrote the completed hierarchy data to the target table. A further interesting point is that the native database under DataStage is UniVerse, which handles these multi-value constructs very nicely. However, I have little experience with UniVerse, so I'll leave this for others to explain. I hope this helps a little; I'm sure others will find plenty more to add to it.

A typical example of a slowly changing dimension (SCD) is a PRODUCT dimension in which the detailed description of a given product is occasionally changed or adjusted. Because the detailed description of the product changes, we need to track the changes in the warehouse. There are three main techniques for handling SCDs in a data warehouse.

OVERWRITING (the simple one): A Type One change overwrites an existing dimensional attribute with new information. In the customer name-change example, the new name overwrites the old name, and the value for the old version is lost. A Type One change updates only the attribute, doesn't insert new records, and affects no keys.

PRESERVING HISTORY:

A Type Two change writes a record with the new attribute information and preserves a record of the old dimensional data. Type Two changes let you preserve historical data. Implementing Type Two changes within a data warehouse might require significant analysis and development. Type Two changes partition history across time more accurately than the other types. However, because Type Two changes add records, they can significantly increase the database's size.

PRESERVING A VERSION OF HISTORY (Type Three): You usually implement Type Three changes only if you have a limited need to preserve and accurately describe history, such as when someone gets married and you need to retain the previous name. Instead of creating a new dimensional record to hold the attribute change, a Type Three change places a value for the change in the original dimensional record. You can create multiple fields to hold distinct values for separate points in time. In the case of a name change, you could create OLD_NAME and NEW_NAME fields and a NAME_CHANGE_EFF_DATE field to record when the change occurs. This method preserves the change, but how would you handle a second name change, or a third, and so on? The side effects of this method are increased table size and, more importantly, increased complexity of the queries that analyze historical values from these old fields. After more than a couple of iterations, queries become impossibly complex, and ultimately you're constrained by the maximum number of attributes allowed on a table.

That's all about SCDs. Currently I'm working in both ETL tools, Informatica and DataStage. In Informatica I can see a separate wizard to implement SCDs (all three techniques). I would like to know how to implement the same in DataStage.
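As a rough illustration of the Type 1 and Type 2 techniques described above (table and column names are hypothetical; in practice a DataStage job would drive these statements through its output stages rather than hand-written SQL):

-- Type 1: overwrite the attribute in place, so the old value is lost
UPDATE dim_customer
SET    customer_name = 'New Name'
WHERE  customer_code = 'C100';

-- Type 2: expire the current row, then insert a new version with the new value
UPDATE dim_customer
SET    end_date = SYSDATE
WHERE  customer_code = 'C100'
AND    end_date IS NULL;

INSERT INTO dim_customer (customer_uid, customer_code, customer_name, start_date, end_date)
VALUES (dim_customer_seq.NEXTVAL, 'C100', 'New Name', SYSDATE, NULL);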

Hash files as lookups:


Can anyone explain exactly what all the hash-file creation options mean and, more usefully, in what situations to use them effectively? For example:

- File type: What are the differences between Type 2 to Type 18 (hashed)? Under what scenarios should hashed/B-tree/dynamic be used or avoided?
- Split/merge load options: What do these terms mean? When (if at all) should the defaults be modified?
- Large record vs. record size: When should these be different? What is the effect of over- or under-specifying record size?
- Hash algorithm: Under what scenarios should General vs. Seq.Num be used?

I know tuning hash files can be a complex job and "do this when ..." rules are difficult to provide, but I'm almost completely in the dark about this area! Is it documented any more clearly or completely anywhere than in the on-line help?

Hash files group records into buckets. The different types use different algorithms to decide which bucket a record goes into. If an algorithm is poor, the buckets are not evenly distributed and some buckets have more records than others.

When this happens, performance suffers. All the types exist to improve performance. On old Pick machines only Type 18 was used; it is the safest type. DataStage uses dynamic, or Type 30, as a default; Type 30 files resize themselves.

A hash file has two properties which determine performance. The first is modulo: modulo is the number of buckets. Separation is the size of the bucket. Different versions of the Pick database use a different size for separation: some use separation times 512 as the bucket size, others use 1K to 2K blocks. These are important because performance can be terrible if either separation or modulo is too small. This situation is said to be "in overflow", because when a bucket gets full the overflow is added sequentially to the end of the file. The more in overflow a bucket is, the longer it takes UniVerse to find a record. (The proper term for bucket is "group".)

The safest thing to do is to use dynamic files. Dynamic files have a few options which can help performance; minimum modulo is one of them. A dynamic file will split a group into two parts and grow the modulo by one (this also happens as records are deleted), so a group may be written out twice as it splits, which means twice the I/O. Minimum modulo stops the splitting until that modulo is reached. This can make performance several times faster when writing to large hash files. Split/merge are the percentages of how full a group is before it splits, or how empty it is before it merges.

DataStage mainly uses hash files for lookups. In this situation hash files are better off oversized, because there are fewer records per group. If you use a hash file for primary input, then an undersized hash file will read fewer groups and perform better. "Large records" occur when a record is larger than the default group size, so that a record spans multiple groups. Performance in this situation is terrible, because all such records are in overflow and multiple groups need to be pieced together to make up one record. Very bad.

HASH.HELP and ANALYZE.FILE will tell you how your hash files are set up. You can also buy FAST, a third-party program to manage hash files, which will help resize your files. If you clear your files before loading them, then fragmentation is not an issue. Fragmentation happens when you delete a record, or read a record and write it back with a larger size: UniVerse marks the old space as available and then adds the new record to the end of the group, which leaves holes in the groups. RESIZE file * * * will defragment even a dynamic file; clearing files also fixes this issue. When you read from a fragmented file you read all the holes, which means more disk I/O than necessary.

B-trees are Type 25 files. They are created when you add an index to a file. Indexes are important when you select data from the file using a WITH or a WHERE clause involving the indexed field. Indexes slow down the building of a hash file but improve the selection process on a primary input with a selection clause. They are not very useful in DataStage.

Building a hash file for 10M items can be a challenge. If you are using the default dynamic hash file settings, loading this many rows will take a very long time. As you may know, hash files work by placing the data into buckets; the bucket is chosen by calculating a number from the key. During the load, if a bucket fills up, the system must rearrange all the previously loaded data to distribute it among a new number of buckets. This redistribution becomes very costly as the number of rows gets large (such as in your case). By default, a dynamic hash file starts with one bucket. You can change this by adjusting the minimum modulus option, which specifies the number of buckets to start with. The group size option specifies the size of the bucket: a group size of 1 gives you a 2048-byte bucket and 2 gives you a 4096-byte bucket. Calculate the size of your key to determine how many rows go into a bucket. Divide that into the total number of rows to get the number of buckets you need. Since not all buckets fill completely, multiply that estimate by 2; that would be a good guess for the minimum modulus setting. You can set the value higher if you wish, as there is always a risk that data may clump into some buckets. Doing this will significantly improve the load times.

If that doesn't work, an option in Oracle would be to create a companion table for key lookups. Using an insert trigger to maintain it, create an index-organized table that contains nothing but the lookup key data. Set the caching so that Oracle keeps it in memory, and use the OCI stage to access it. This would not be as good as a hash file, but probably better than what you have.
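As a rough worked example of the sizing rule above (all numbers illustrative): with roughly 50 bytes per stored record, a 2048-byte group holds about 40 rows, so 10,000,000 rows / 40 gives about 250,000 groups, doubled to a minimum modulus of around 500,000. And a hedged sketch of the Oracle companion-table suggestion (table, column, and trigger names are hypothetical):

-- Index-organized companion table holding only the lookup key data
CREATE TABLE customer_key_lookup (
  customer_code VARCHAR2(30) PRIMARY KEY,
  customer_id   NUMBER NOT NULL
) ORGANIZATION INDEX;

-- Encourage Oracle to keep it cached in memory
ALTER TABLE customer_key_lookup STORAGE (BUFFER_POOL KEEP);

-- Maintain the companion table with an insert trigger on the main table
CREATE OR REPLACE TRIGGER trg_customer_key_lookup
AFTER INSERT ON customer_master
FOR EACH ROW
BEGIN
  INSERT INTO customer_key_lookup (customer_code, customer_id)
  VALUES (:NEW.customer_code, :NEW.customer_id);
END;
/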
