
Changing Character Set to UTF8 for Production Database

Rui Wang

May 08, 2010

To support the upgrade of one of our applications, our team scheduled a period of downtime to get
the Oracle database (10.2.0.4) ready for it. The DBA team was asked to create a new Oracle
database identical to the production database. Thanks to the downtime of the production
database, the steps to create the identical database are quite straightforward:

• issue “alter database backup controlfile to trace” against the production database before the downtime

• extract the “create controlfile” command from the trace file under the udump folder

• edit the command to reflect the new location of the target database, and change the line
“CREATE CONTROLFILE REUSE DATABASE "OLDDB" NORESETLOGS NOARCHIVELOG”
to “CREATE CONTROLFILE SET DATABASE "NEWDB" RESETLOGS NOARCHIVELOG”

• shut down the production database

• create the init parameter file and password file, and edit listener.ora and tnsnames.ora

• log on to the idle instance for the target database

• issue “startup nomount”

• issue “create controlfile” command to create control file

• issue “alter database open resetlogs” to open database
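The edit of the “create controlfile” command can be scripted. The following is only a sketch, assuming the statement starts with exactly the header line quoted above and that the datafile paths share one directory prefix; the function name rewrite_controlfile and the /u01 and /u02 directories are hypothetical placeholders, not taken from the original environment.

```shell
# rewrite_controlfile OLD_DB NEW_DB OLD_DIR NEW_DIR
# Reads the CREATE CONTROLFILE script on stdin and writes the edited copy to stdout.
# Assumes the directory names contain no "|" and no regex metacharacters that matter.
rewrite_controlfile() {
    old_db=$1; new_db=$2; old_dir=$3; new_dir=$4
    sed -e "s/^CREATE CONTROLFILE REUSE DATABASE \"$old_db\" NORESETLOGS/CREATE CONTROLFILE SET DATABASE \"$new_db\" RESETLOGS/" \
        -e "s|$old_dir|$new_dir|g"
}

# Demo on two sample lines (the datafile path is a placeholder).
printf '%s\n' \
    'CREATE CONTROLFILE REUSE DATABASE "OLDDB" NORESETLOGS NOARCHIVELOG' \
    "  '/u01/oradata/OLDDB/system01.dbf'," |
    rewrite_controlfile OLDDB NEWDB /u01/oradata/OLDDB /u02/oradata/NEWDB
```

Review the resulting script by hand before running it in the idle instance; the sketch only touches the header line and the path prefix.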

Once the target database is up, we are ready to start the character set conversion. We relied
heavily on the following three Oracle Metalink documents:

• Changing the NLS_CHARACTERSET to AL32UTF8 / UTF8 (Unicode) [ID 260192.1]

• Installing and configuring Csscan in 10g and 11g (Database Character Set Scanner) [ID 745809.1]

• NLS considerations in Import/Export - Frequently Asked Questions [ID 227332.1]


Here is the step-by-step procedure we followed to complete the character set conversion.

Step 1: Installing and Configuring CSSCAN

To install CSSCAN, connect to the database as sysdba and run the script csminst.sql
($ORACLE_HOME/rdbms/admin).

If the script reports that the directories ‘log_file_dir’ and ‘data_file_dir’ do not exist,
ignore the error; the grant of read privilege on these two directories has been removed.

Next, make sure that CSSCAN is installed properly and ready for use. To check, issue the
following OS command.

$ csscan \"sys/password@dbtnsname as sysdba\" FULL=Y TOCHAR=UTF8 LOG=TOUTF8FIN

CAPTURE=Y ARRAY=1000000 PROCESS=2

If the run ends with the message “Scanner terminated successfully.”, CSSCAN is ready to use for
the character set conversion.
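When the scan is run from a script, the success message can be tested instead of read off the screen. A minimal sketch, assuming the csscan screen output has been saved to a file (for example the .out log); the function name scan_succeeded is hypothetical.

```shell
# scan_succeeded FILE: exit status 0 when the success message is present in FILE.
scan_succeeded() {
    grep -q 'Scanner terminated successfully' "$1"
}

# Demo with a stand-in log file; a real run would pass the saved csscan output.
demo_log=$(mktemp)
printf '%s\n' 'Scanner terminated successfully.' > "$demo_log"
if scan_succeeded "$demo_log"; then
    echo "csscan is ready"
else
    echo "csscan did not finish cleanly"
fi
rm -f "$demo_log"
```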

Step 2: Pre-checking for database

Before starting the character set conversion, check the database for the following:

• Invalid objects

• Orphaned datapump master tables (10g and up)

• Objects in the recyclebin (10g and up)

• Leftover temporary tables using CHAR semantics

Note: log on to the database as sysdba for the following steps.

1) Invalid objects

To check for invalid objects in the database, issue the following SQL statement:

SQL> select distinct owner from dba_objects where status='INVALID';

The statement lists every schema that contains invalid objects. These objects need to be
recompiled, or dropped if they are unused. The simplest way to recompile all objects
within a single schema is the package UTL_RECOMP:

SQL> exec utl_recomp.recomp_serial('SCHEMA');
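If the query returns several owners, the recompile calls can be generated rather than typed. This is a small sketch, assuming the owner list has been spooled to a text stream with one schema name per line; the function name recompile_script is hypothetical, and the generated lines still have to be run in SQL*Plus as sysdba.

```shell
# recompile_script: read schema names from stdin, one per line,
# and print one utl_recomp.recomp_serial call per schema.
recompile_script() {
    while IFS= read -r owner; do
        [ -n "$owner" ] || continue      # skip blank lines
        printf "exec utl_recomp.recomp_serial('%s');\n" "$owner"
    done
}

# Demo with two schema names taken from the scan report later in this article.
printf '%s\n' WEBCT MDSYS | recompile_script
```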

2) Orphaned datapump master tables (10g and up)

To check, issue the following SQL statement:

SQL> SELECT o.status, o.object_id, o.object_type,

o.owner||'.'||object_name "OWNER.OBJECT"

FROM dba_objects o, dba_datapump_jobs j

WHERE o.owner=j.owner_name AND o.object_name=j.job_name

AND j.job_name NOT LIKE 'BIN$%' ORDER BY 4,2;

If the query returns “no rows selected”, proceed to the next check. Otherwise, see Note 336014.1,
How To Cleanup Orphaned DataPump Jobs In DBA_DATAPUMP_JOBS ?.

3) Objects in the recyclebin (10g and up)

SQL> SELECT OWNER, ORIGINAL_NAME, OBJECT_NAME, TYPE from dba_recyclebin order

by 1,2;

If there are objects in the recyclebin, purge them:

SQL> PURGE DBA_RECYCLEBIN;

This removes the unneeded objects; otherwise CSALTER will fail with ORA-38301.

4) Leftover temporary tables using CHAR semantics

SQL> select C.owner ||'.'|| C.table_name ||'.'|| C.column_name ||' ('||

C.data_type ||' '|| C.char_length ||' CHAR)'


from all_tab_columns C

where C.char_used = 'C'

and C.table_name in (select table_name from dba_tables where temporary='Y')

and C.data_type in ('VARCHAR2', 'CHAR')

order by 1;

If the query returns “no rows selected”, proceed to the next step. Otherwise, see Note 4157602.8,
DBMS_STATS "ORA_TEMP_%_DS_%" temporary tables not cleaned up.

Step 3. Check the Source database for "Lossy"

Run CSSCAN with the following syntax:

$ csscan \"sys/password@dbtnsname as sysdba\" FULL=Y FROMCHAR=WE8ISO8859P1

TOCHAR=WE8ISO8859P1 LOG=dbcheck CAPTURE=N ARRAY=1000000 PROCESS=2

Here, ‘WE8ISO8859P1’ is the current character set of the database. To find the current character
set of the database, issue the following SQL statement.

SQL> select value from NLS_DATABASE_PARAMETERS where

parameter='NLS_CHARACTERSET';

Running the above CSSCAN command creates three files:

1. dbcheck.out a log of the output of csscan

2. dbcheck.txt a Database Scan Summary Report

3. dbcheck.err contains the rowid's of the Lossy rows reported in dbcheck.txt (if any).

This scan checks that all data is stored correctly in the current character set. Because the TOCHAR
and FROMCHAR character sets are the same, no "Convertible" or "Truncation" data can be
reported in dbcheck.txt. If all the data in the database is stored correctly, dbcheck.txt
reports only "Changeless" data.

If there is any "Lossy" data, those rows contain code points that are not defined in the current
character set, and they must be cleaned up before you continue. The most common case is
a US7ASCII or WE8ISO8859P1 database that reports "Lossy" data. In this case, changing the
US7ASCII/WE8ISO8859P1 source database to WE8MSWIN1252 using Alter Database Character Set /
Csalter will most likely resolve the lossy data.

To convert the character set from WE8ISO8859P1 to WE8MSWIN1252, issue:

SQL> shutdown immediate

SQL> startup restrict

SQL> alter database character set WE8MSWIN1252;

SQL> alter database open;

After the conversion to WE8MSWIN1252, repeat the CSSCAN command above, with FROMCHAR and TOCHAR
set to WE8MSWIN1252, to make sure no “Lossy” data remains.

Step 4. Check for "Convertible" and "Truncation" data when going to UTF8

$ csscan \"sys/password@dbtnsname as sysdba\" FULL=Y TOCHAR=UTF8 LOG=TOUTF8 CAPTURE=Y

ARRAY=1000000 PROCESS=2

This will create 3 files :

toutf8.out a log of the output of csscan

toutf8.txt the Database Scan Summary Report

toutf8.err contains the rowid's of the Convertible and Lossy rows reported in toutf8.txt

There should be NO entries under "Lossy" in toutf8.txt, because they should have been resolved
in step 3. If "Lossy" data is reported, redo step 3.

The file toutf8.txt should contain output similar to the following.

[Scan Summary]

All character type data in the data dictionary are convertible to the new character set

All character type application data remain the same in the new character set
[Data Dictionary Conversion Summary]

Datatype                    Changeless      Convertible       Truncation            Lossy
--------------------- ---------------- ---------------- ---------------- ----------------
VARCHAR2                     4,225,489                0                0                0
CHAR                             1,116                0                0                0
LONG                           159,629                0                0                0
CLOB                            17,743            4,349                0                0
VARRAY                          23,462                0                0                0
--------------------- ---------------- ---------------- ---------------- ----------------
Total                        4,427,439            4,349                0                0
Total in percentage            99.902%           0.098%           0.000%           0.000%

The data dictionary can be safely migrated using the CSALTER script

[Application Data Conversion Summary]

Datatype                    Changeless      Convertible       Truncation            Lossy
--------------------- ---------------- ---------------- ---------------- ----------------
VARCHAR2                     5,262,627                0                0                0
CHAR                                87                0                0                0
LONG                                 0                0                0                0
CLOB                               134                0                0                0
VARRAY                           1,587                0                0                0
--------------------- ---------------- ---------------- ---------------- ----------------
Total                        5,264,435                0                0                0
Total in percentage           100.000%           0.000%           0.000%           0.000%

If the report shows only ‘Changeless’ and ‘Convertible’ data, continue to the next step.
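The check of the Truncation and Lossy columns can be automated. The sketch below assumes the summary tables keep the layout shown above, where each "Total" row lists Changeless, Convertible, Truncation and Lossy in that order; the function name check_scan_summary is hypothetical.

```shell
# check_scan_summary FILE
# Prints "ok" when every "Total" row of the summary tables shows zero
# Truncation and zero Lossy cells, otherwise "not ok".
check_scan_summary() {
    awk '
        $1 == "Total" && $2 != "in" {
            gsub(/,/, "")                 # drop thousands separators; fields are re-split
            if ($4 + 0 > 0 || $5 + 0 > 0) bad = 1
            seen = 1
        }
        END { if (seen && !bad) print "ok"; else print "not ok" }
    ' "$1"
}

# Demo using the Total rows from the sample report.
report=$(mktemp)
cat > "$report" <<'EOF'
Total 4,427,439 4,349 0 0
Total in percentage 99.902% 0.098% 0.000% 0.000%
Total 5,264,435 0 0 0
Total in percentage 100.000% 0.000% 0.000% 0.000%
EOF
check_scan_summary "$report"
rm -f "$report"
```

A file without any "Total" row is reported as "not ok", so a missing or truncated report does not pass by accident.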

Step 5. Export application objects

The toutf8.txt report also contains a detailed list of the objects that hold convertible data;
the application objects in it have to be handled with export/import. A sample of the list:


[Distribution of Convertible, Truncated and Lossy Data by Table]

USER.TABLE                                              Convertible       Truncation            Lossy
-------------------------------------------------- ---------------- ---------------- ----------------
MDSYS.SDO_COORD_OP_PARAM_VALS                                   200                0                0
MDSYS.SDO_GEOR_XMLSCHEMA_TABLE                                    1                0                0
MDSYS.SDO_STYLES_TABLE                                           78                0                0
MDSYS.SDO_XML_SCHEMAS                                             4                0                0
SYS.METASTYLESHEET                                               80                0                0
SYS.RULE$                                                         4                0                0
SYS.SQL$TEXT                                                      1                0                0
SYS.WRH$_SQLTEXT                                              1,420                0                0
SYS.WRH$_SQL_PLAN                                             1,339                0                0
SYS.WRI$_ADV_ACTIONS                                          4,754                0                0
SYS.WRI$_ADV_OBJECTS                                          2,872                0                0
SYS.WRI$_ADV_RATIONALE                                        2,130                0                0
SYS.WRI$_DBU_FEATURE_METADATA                                    99                0                0
SYS.WRI$_DBU_FEATURE_USAGE                                       10                0                0
SYS.WRI$_DBU_HWM_METADATA                                        19                0                0
WEBCT.AGN_ASSIGNMENT                                          4,130                0                0
WEBCT.AGN_GROUPASSIGNMENT                                       221                0                0
WEBCT.AGN_SUBMISSION                                         12,531                0                0
WEBCT.AGN_SUBMISSION_COMMENT                                    594                0                0
WEBCT.ANNOUNCEMENT                                            1,776                0                0
WEBCT.ASSMT_ATTEMPT                                             144                0                0
WEBCT.ASSMT_ATTEMPT_ITEM                                      4,386                0                0
WEBCT.ASSMT_RESPONSE                                          3,264                0                0
WEBCT.ASSMT_SETTING                                              24                0                0
WEBCT.CALENDAR_ENTRY                                          4,634                0                0
WEBCT.CMS_CE_LANGUAGE                                             2                0                0
WEBCT.CMS_CONTENT_ENTRY                                     586,378                0                0
WEBCT.CMS_LINK                                                  653                0                0
WEBCT.CMS_UNIQUE_NAME                                           618                0                0
WEBCT.CMS_UNIQUE_NAME070809133755                               218                0                0
WEBCT.CO_HEADERFOOTER                                        11,479                0                0

Please note that only the data dictionary and the Oracle schemas can be safely converted to the
target character set in place. Other application data, such as the objects in schema WEBCT, has to
be converted with the export/import mechanism. The basic approach is to export the objects as a
backup, drop them, and import them again after the character set conversion is done.

Export these application objects and then drop them from the database. Please note: “Do NOT use
Expdp/Impdp when going to (AL32)UTF8 or an other multibyte characterset on ALL 10g versions
lower then 10.2.0.4 (including 10.1.0.5).”
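The table list for the export can be derived from the distribution section of toutf8.txt. The following sketch assumes that section has been saved to a file in the row layout shown above (table name, then Convertible, Truncation, Lossy) and prints a TABLES=(...) line suitable for an export parameter file; the function name convertible_tables is hypothetical.

```shell
# convertible_tables SCHEMA FILE
# Prints "TABLES=(SCHEMA.T1,SCHEMA.T2,...)" for every table of SCHEMA
# whose Convertible count is greater than zero.
convertible_tables() {
    awk -v schema="$1" '
        index($1, schema ".") == 1 {
            n = $2; gsub(/,/, "", n)          # "4,130" -> "4130"
            if (n + 0 > 0) { list = list sep $1; sep = "," }
        }
        END { if (list != "") print "TABLES=(" list ")" }
    ' "$2"
}

# Demo with a few rows from the sample list.
rows=$(mktemp)
cat > "$rows" <<'EOF'
USER.TABLE Convertible Truncation Lossy
SYS.RULE$ 4 0 0
WEBCT.AGN_ASSIGNMENT 4,130 0 0
WEBCT.CMS_LINK 653 0 0
EOF
convertible_tables WEBCT "$rows"
rm -f "$rows"
```

Nothing is printed when the schema has no convertible rows, which makes the empty case easy to detect in a wrapper script.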

Step 6. Perform Character Set conversion with CSALTER package

After all application objects/data have been dropped from the database, re-run the csscan
command from step 4 and check whether toutf8.txt still lists any application objects/data. If
not, the database is ready for conversion with the CSALTER package.
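One quick way to review the re-run report is to list which schema owners still appear in the distribution section. This is only a sketch under the same layout assumption as the sample list in step 5 (OWNER.TABLE followed by numeric counts); the function name scan_owners is hypothetical, and deciding which owners count as Oracle schemas is left to the DBA.

```shell
# scan_owners FILE
# Prints the distinct schema owners found in the distribution rows of FILE.
scan_owners() {
    awk '
        # keep only data rows: OWNER.TABLE followed by a numeric count
        $1 ~ /^[^.]+\.[^.]+$/ && $2 ~ /^[0-9,]+$/ {
            split($1, part, ".")
            print part[1]
        }
    ' "$1" | sort -u
}

# Demo: the header row is skipped, owners are de-duplicated.
rows=$(mktemp)
cat > "$rows" <<'EOF'
USER.TABLE Convertible Truncation Lossy
MDSYS.SDO_STYLES_TABLE 78 0 0
SYS.RULE$ 4 0 0
SYS.SQL$TEXT 1 0 0
EOF
scan_owners "$rows"
rm -f "$rows"
```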

SQL> shutdown immediate

Database closed.

Database dismounted.

ORACLE instance shut down.

SQL> startup restrict

ORACLE instance started.

Total System Global Area 884998144 bytes

Fixed Size 2044616 bytes

Variable Size 436211000 bytes

Database Buffers 440401920 bytes

Redo Buffers 6340608 bytes

Database mounted.

Database opened.

SQL> @$ORACLE_HOME/rdbms/admin/csalter.plb

0 rows created.
Function created.

Function created.

Procedure created.

This script will update the content of the Oracle Data Dictionary.

Please ensure you have a full backup before initiating this procedure.

Would you like to proceed (Y/N)?Y

old 6: if (UPPER('&conf') <> 'Y') then

new 6: if (UPPER('Y') <> 'Y') then

Checking data validility...

begin converting system objects

15541 rows in table SYS.WRH$_SQL_PLAN are converted

1129 rows in table SYS.WRH$_SQLTEXT are converted

80 rows in table SYS.METASTYLESHEET are converted

421 rows in table SYS.WRI$_ADV_ACTIONS are converted

19 rows in table SYS.WRI$_DBU_HWM_METADATA are converted

87 rows in table SYS.WRI$_DBU_FEATURE_METADATA are converted

4 rows in table SYS.RULE$ are converted

978 rows in table SYS.WRI$_ADV_OBJECTS are converted

117 rows in table SYS.WRI$_DBU_FEATURE_USAGE are converted

1 row in table SYS.SCHEDULER$_EVENT_LOG is converted

354 rows in table SYS.WRI$_ADV_RATIONALE are converted

PL/SQL procedure successfully completed.

Alter the database character set...

CSALTER operation completed, please restart database

PL/SQL procedure successfully completed.

0 rows deleted.

Function dropped.
Function dropped.

Procedure dropped.

SQL> SELECT value$ FROM sys.props$ WHERE name = 'NLS_CHARACTERSET' ;

VALUE$

--------------------------------------------------------------------------------

UTF8

Step 7. Import application object/data with preceding export backup

When you start the import job, the screen output looks like the following.

Connected to: Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - 64bit Production

With the Partitioning, OLAP, Data Mining and Real Application Testing options

Export file created by EXPORT:V10.02.01 via conventional path

import done in AL32UTF8 character set and AL16UTF16 NCHAR character set

import server uses UTF8 character set (possible charset conversion)

The last two lines show that importing the application objects/data performs the character set
conversion automatically.
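The two character sets can be pulled out of a saved import log to confirm what the import session and the server used. A minimal sketch, assuming the log contains the "import done in" and "import server uses" lines exactly as shown above; the function name import_charsets and the client/server labels are hypothetical.

```shell
# import_charsets FILE
# Prints "client=<charset>" and "server=<charset>" as found in the import log.
import_charsets() {
    sed -n \
        -e 's/^import done in \([A-Z0-9]*\) character set.*/client=\1/p' \
        -e 's/^import server uses \([A-Z0-9]*\) character set.*/server=\1/p' "$1"
}

# Demo with the two log lines from the output above.
log=$(mktemp)
cat > "$log" <<'EOF'
import done in AL32UTF8 character set and AL16UTF16 NCHAR character set
import server uses UTF8 character set (possible charset conversion)
EOF
import_charsets "$log"
rm -f "$log"
```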

Step 8. Compare related schemas with source database

In my experience, a successful import of the application objects/data doesn’t mean the
conversion is complete. To be safe, it’s highly recommended to compare all related
schemas between the source database and the target database to make sure that no object is
missing in the target database.
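One way to do the comparison is to spool an object list from each database and compare the two files. The sketch below assumes each file holds one object per line in the form OWNER.OBJECT_NAME OBJECT_TYPE (for example, spooled from dba_objects); that file format and the function name missing_in_target are assumptions made for illustration.

```shell
# missing_in_target SOURCE_LIST TARGET_LIST
# Prints every line that appears in SOURCE_LIST but not in TARGET_LIST.
missing_in_target() {
    src_sorted=$(mktemp) || return 1
    tgt_sorted=$(mktemp) || { rm -f "$src_sorted"; return 1; }
    sort "$1" > "$src_sorted"
    sort "$2" > "$tgt_sorted"
    comm -23 "$src_sorted" "$tgt_sorted"     # lines unique to the source list
    rm -f "$src_sorted" "$tgt_sorted"
}

# Demo: one table is missing in the target list.
src=$(mktemp); tgt=$(mktemp)
printf '%s\n' 'WEBCT.ANNOUNCEMENT TABLE' 'WEBCT.CMS_LINK TABLE' > "$src"
printf '%s\n' 'WEBCT.CMS_LINK TABLE' > "$tgt"
missing_in_target "$src" "$tgt"
rm -f "$src" "$tgt"
```

An empty result means every object in the source list was found in the target list; it does not compare row counts or object status.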

In conclusion, this is the approach I took to convert one of our application databases to
UTF8. The steps above may not apply to other databases or applications.
If you find it helpful, that’s great. If you have any concerns, please refer to the related
articles in Oracle Metalink.

About the Author

Rui Wang currently works as an Oracle DBA in Canada. He is responsible for
database performance tuning and high availability. With over 10 years of
experience in architecting and building Oracle systems, Rui is an evangelist
for Oracle technology and products. Rui is an OCP and holds a master's degree in
computer science from Simon Fraser University in Canada. Rui is also an expert in
integrating EDI with database/software systems, with proven successful projects.

Visit Rui’s blog at http://www.oraclepoint.com/oralife
or join the forum at http://www.oraclepoint.com for more resources on Oracle.
