29 June 2012

What-What? Duplicate Clone using RMAN

Seriously, a duplicate clone using RMAN is about the most fun anyone can have with a database... why??  Because it's EASY. It WORKS.  It never ever goes WRONG.  So here's what to do.... step by step.  

Refreshing a test database from a production database using RMAN duplicate (UNIX SYSTEM)

by DanceRat


Verify server name, database name, and database version on both the production and test sides.

Check the TARGET (PRODUCTION) and AUXILIARY (DEV/TEST) database versions, if same, proceed with refresh.

Check mount points of TARGET (PRODUCTION) and AUXILIARY (DEV/TEST) databases:
df -k | grep <dbname> | sort

Make sure space allocation (disk) in AUXILIARY (DEV/TEST) is equal to that in TARGET (PRODUCTION)

Backup AUXILIARY (DEV/TEST) schema passwords, users, roles using datapump

Backup AUXILIARY (DEV/TEST) functions, procedures, db_links, packages (optional) using datapump

Verify any tables that need to be datapumped and saved.

Backup and remove any AUXILIARY archive logs through RMAN.

Stop management of archive logs through crontab (if that's how you manage your archive logs).
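In practice that just means commenting the job out for the duration. A sketch of what the crontab entry might look like (the script name and schedule here are made up; use whatever yours actually is):

```shell
# In crontab -e on the AUXILIARY host, comment out the archive log
# housekeeping job until the clone is done (script name is hypothetical):
# 0,30 * * * * /home/oracle/scripts/manage_arch_logs.ksh >/dev/null 2>&1
```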

Do a full database backup through RMAN.
Notify clients and fellow DBA Brethren that you are doing work on the database via email:


I will be working on the following, please disregard warnings until renotified.

instance name @ hostname


After backup, find the media needed from NetBackup using an RMAN run command:
run {
allocate channel 'rec_01' type sbt;
set until time "to_date('03/20/2012 09:00:00','mm/dd/yyyy hh24:mi:ss')";
restore database preview recall;
release channel 'rec_01';
}


Turn monitoring off on AUXILIARY (DEV/TEST)

connect target /
connect catalog 
connect auxiliary 'sys/pass@<AUX_SID>'

run {
   allocate auxiliary channel 'aux_01' type sbt;
   allocate auxiliary channel 'aux_02' type sbt;
   send 'NB_ORA_CLIENT=prodhostname.corp.dom';
   set until time "to_date('02/01/2012 06:15:00','mm/dd/yyyy hh24:mi:ss')";
   duplicate target database to <AUX_sidname> nofilenamecheck;
   release channel 'aux_01';
   release channel 'aux_02';
}

 --comment- use the same until time timestamp info from the previous media query.


If running from the AUXILIARY (TEST) database:

run {
   allocate auxiliary channel 'aux_01' type sbt;
   allocate auxiliary channel 'aux_02' type sbt;
   send 'NB_ORA_CLIENT=targethostname.corp.dom';
   set until time "to_date('02/01/2012 06:15:00','mm/dd/yyyy hh24:mi:ss')";
   duplicate target database to <AUX_sidname> nofilenamecheck;
   release channel 'aux_01';
   release channel 'aux_02';
}

IMPORTANT: IF YOU HAVE A WALLET, COPY THE WALLET OVER FROM PROD TO TEST.

From TEST host: scp -r oracle@hostname:/PRODSID_orabase/admin/SID/wallet /TESTSID_orabase/admin/SID/wallet


--comment-  even though the AUX db started in a nomount state, when the database was duplicated, RMAN finishes it by opening it into active mode with a new dbid.
--comment- if, for any reason, the clone is interrupted during the restore process, the whole RMAN duplicate can be started again as long as the memory script has not passed the "switch clone datafile all" stage.  In the case of an interruption, do a shutdown immediate and restart the database clone process from the beginning, i.e. startup nomount pfile=...blah blah blah


Check alert log
Check the current status of updated temp files
SELECT v.file#, t.file_name, v.status
FROM dba_temp_files t, v$tempfile v
WHERE t.file_id = v.file#;

If bad then

alter database tempfile '/<SID>_oradata/oradata/tempFILENAME.dbf' drop;
alter tablespace temp add tempfile '/<SID>_oradata/oradata/tempFILENAME.dbf' size 8388616192;


create temp tablespace from script generated earlier.

select name from v$datafile;
select member from v$logfile;
select name from v$controlfile;
select name from v$tempfile;
select name from v$datafile where name like '%MISS%';
select file_name from dba_data_files where file_name like '%MISS%';
select * from v$recover_file;

Validate database directory paths - these will still have production paths

Start the listener.

Stop and start the database using orastop and orastart.


Once duplication is complete, crontab -e and start the manage archive log scripts again by uncommenting the statement (if you use crontab to manage archive logs).


Don't forget to clean up users if not needed.

drop user <USERNAME> cascade;

Here are a couple of common errors and solutions:

Identify and copy database files:

select  name, bytes
from  (select name, bytes
      from  v$datafile
      union all
      select name, bytes
      from v$tempfile
      union all
      select lf.member "name", l.bytes
      from  v$logfile lf
      ,     v$log l
      where lf.group# = l.group#) used
      ,(select sum(bytes) as poo
      from dba_free_space) free;

To see how much time remains on the RMAN duplicate (the totalwork filter keeps the division from blowing up on zero-work rows):
select sid, serial#, context,
round(sofar/totalwork*100,2) "% Complete",
substr(to_char(sysdate,'yymmdd hh24:mi:ss'),1,15) "Time Now",
elapsed_seconds, opname
from v$session_longops
where totalwork != 0;

After cloning, when registering with rman: "ORA-01400 cannot insert NULL" - see if the dbid has been changed; if not, reset the DBID.

sqlplus / as sysdba
select status from v$instance;
select distinct status from v$datafile;

rman_login <sid>
connect auxiliary 'sys/pass@<SID>'

connect target 'sys/pass@<PROD_SID>'
connect catalog 
connect auxiliary /

--comment- look for the last hotbackup piece without arc logs.  RMAN automatically applies the logs when duplication occurs, but picking a point right after the backup cuts down on time.


Create simple pfile from spfile on Production (Target) and edit with conversion lines:
SQL> CREATE PFILE='/export/home/oracle/pfilenewSID.ora' FROM SPFILE;

 Example: (comment -- Replace {AUXsid} with instance sid name)

#  Special init.ora for clone-style refresh of
#  TEST (auxiliary) from PROD (target)
db_name              = {AUX_sid_name_here}
db_block_size        = 8192
compatible           = 11.1.0
sga_max_size         = 1024M
sga_target           = 840M
log_buffer           = 1048576
control_files        = ('/{AUXsid}_oradata/control01.ctl',
                        '/{AUXsid}_oradata/control02.ctl')
diagnostic_dest      = /{AUXsid}_orabase
audit_file_dest      = /{AUXsid}_orabase/admin/adump
log_archive_dest_1   ='LOCATION=/{AUXsid}_oraarch'
log_archive_format   ={AUXsid}_log_%s_%t_%r.arc
db_file_name_convert   = ('/{PRODsid}_oradata','/{AUXsid}_oradata')
log_file_name_convert  = ('/{PRODsid}_oradata','/{AUXsid}_oradata')
#   redo/undo settings
undo_management      = auto
undo_tablespace      = undots
undo_retention       = 3600          # 900 seconds is default

Verify that password file exists on TARGET and AUXILIARY.  If password file exists ONLY on AUXILIARY, then run RMAN client from TARGET during procedure.

Notify clients and DBA Brethren that you are working on database...


I will be working on the following, please disregard warnings until renotified.
instance name @ hostname

--comment- Shutdown listeners on AUXILIARY (DEV/TEST) before databases are shutdown as a courtesy to yourself and clients.
SQL> STARTUP NOMOUNT PFILE='/<SID>_orabase/product/<version>/dbs/init<sid>.ora_clone';


21 June 2012

REDO Logs? They were IMPORTANT?

Okay so it's been like... a year....

I've been busy!!  Seriously, I've been doing DBA work like a freaking FIEND!!! I've been wrecking test and dev databases left and right, learning about partitioning in all its glory, learning about TUNING! (that's right I know how explain plans work, y'all!) and basically y'know.  Earning a living.

So... here's my first post in a really long time.  Say one is refreshing a database and not doing it the nice safe RMAN clone way, the way Tom Kyte tells us to, but does it in a "I live on the edge and will take my Production database down a walk into the dark side, becuz that's how I ROLL".... and in the excitedness of it all accidentally wipes out NOT the who-cares-about-it DEV database redo logs, but sort of overwrites the redo logs of a QA Test database.  That is up.  And running.  And no longer has the redo logs it was just dreaming about before you so painfully shoved new redo log up into its, um, tender parts.  Somehow that window was just up! And the rest of the database files are intact, but the REDO logs are now history.

This is what you can do.  By the way, you have lost all the transactions in flight that have not been archived.  There MAY be a way to recover those out of memory but those people get paid like 1000.00 an hour and have Black American Expresses.  I am not one of them.  This is the corn bread version.  Oh, and by the way, this is why we MULTIPLEX redo logs.  FYI.
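Since I brought up multiplexing: a minimal sketch of what adding a second member to each redo group looks like. The paths here are invented; the whole point is to put the second member on a different disk or mount than the first.

```sql
-- Hypothetical second members on a separate mount point
ALTER DATABASE ADD LOGFILE MEMBER '/oradata2/foobar/redo01b.log' TO GROUP 1;
ALTER DATABASE ADD LOGFILE MEMBER '/oradata2/foobar/redo02b.log' TO GROUP 2;
ALTER DATABASE ADD LOGFILE MEMBER '/oradata2/foobar/redo03b.log' TO GROUP 3;
```

With two members per group, scribbling over one file leaves the group usable, which is the whole disaster this post is about.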


You have accidentally dropped, overwritten, or otherwise corrupted redo logs.  This can happen a couple of ways: when restoring the whole database, you can accidentally restore the online redo logs, thus overwriting the current online logs with the older, useless backups. This action forces you to perform incomplete recovery instead of the intended complete recovery, thereby losing the ability to recover valuable data contained in the overwritten redo logs.  The database can crash, or it can just give you terrible inconsistent messages.
  • Beginning crash recovery of 1 threads
  • Parallel recovery started with 15 processes
  • ORA-00305: log 2 of thread 1 inconsistent; belongs to another database

SQL>SHUTDOWN IMMEDIATE (or abort, depending on the mood you are in);


{Create three additional redo log file groups}

SQL>ALTER DATABASE ADD LOGFILE GROUP 4 ('/oradata/foobar/redo04.log') SIZE 150m; 

SQL>ALTER DATABASE ADD LOGFILE GROUP 5 ('/oradata/foobar/redo05.log') SIZE 150m; 

SQL>ALTER DATABASE ADD LOGFILE GROUP 6 ('/oradata/foobar/redo06.log') SIZE 150m;


{delete the corrupted or incorrect redo log files in the file system}

{then mount, clear what's left of the old groups, open, and drop them - the new groups 4, 5, and 6 take over. If a drop complains the group is current or active, alter system switch logfile a few times and try again}

SQL>STARTUP MOUNT

SQL>ALTER DATABASE CLEAR UNARCHIVED LOGFILE GROUP 1;
SQL>ALTER DATABASE CLEAR UNARCHIVED LOGFILE GROUP 2;
SQL>ALTER DATABASE CLEAR UNARCHIVED LOGFILE GROUP 3;

SQL>ALTER DATABASE OPEN;

SQL>ALTER DATABASE DROP LOGFILE GROUP 1;
SQL>ALTER DATABASE DROP LOGFILE GROUP 2;
SQL>ALTER DATABASE DROP LOGFILE GROUP 3;
If this still doesn't give you the droids you are looking for then do a DBPITR and then after the RESTORE part, do this again. It will work.

30 July 2011

OMG What Do I Do? - "The database is slow!" - Part III

So, the last time I wrote an epic adventure about my databasing, it was like, three months ago. I've learned a LOT in the last three months.  More vi practicing, how to refresh a database, how to do datapump imports and exports, moving lobs... in fact I am going to move a lob today! Four of them in fact!  I'll post about that later.  What I really wanted to discuss quickly was how the smallest of changes can really F*&^ up how a database coughs up information to users.

So here's the situation. Last week I got an email, then a CALL, from one of my key clients.  "My database is running slow!!"  Oh god.  Again.  But this time, I know a little bit more about databases and slowness, and of all things, I can open Toad AND I have Embarcadero. Two tools!  That makes me like... twice as powerful, right? In fact, I also have something called Precise InDepth!! I am loaded with tools to look at things. So I have all these wonderful tools and I spend literally two days comparing guiful windows stats and numbers.  I just want to say that I was also in the middle of  a 14 day nonstop project so MAYBE I wasn't as thoughtful as I could have been.

I get in late one morning and the senior DBA who sits next to me, who is TOTALLY sweet, mentions to me, "Oh, I opened up topas on your server this morning.  You may want to look at this one process, Hedgehog, that seems to be chewing up some cpu.." then he swivels back around to play some solitaire.

All my tools, and he opens up a command line window, runs one unix command, and finds my issue.  Yep, turns out that Security is monitoring my database and forgot to turn full debugging off, and JUST when my client was running some larger loads, it just HAPPENED to be the straw on the camel's back.  Which is why the slowness issue only popped up occasionally... thresholds and all that.

Which is why it takes years to be a senior DBA.

But now I know... before you start using all your fancy tools.  Look at topas.

Like they say, "In the beginning.... was the command line..."

29 July 2011


No, this is not about how cool and refreshing a database can be, although I certainly do think they can be.

This is like one of the most important things a database administrator can do, and just like everything else we do in our jobs there are like 10 DIFFERENT WAYS TO DO IT!

First let's start with the right terminology.  I think refresh is the common vernacular, but really what most people call a refresh isn't a refresh at all.  When you replace a development or a test database with a complete database from Production, it's not refreshing. It's called cloning or duplication.  After you have said clone or said duplicate, and then you refill its yummy gooey insides with fresh yummy gooey data from Production, this is called loading data.  There really is no such technical term as refreshing. You could search all over Oracle and not find something called refresh database, at least not in the way we use it.  I wonder where it even came from...

So, first way to refresh and the easiest, I swear, is doing a duplicate from a cold backup.

OMG, you say, I can't bring my production database down for ANY amount of time, love, nor money! Whatever shall I do!!

There is a duplicate from a hot backup.  For a long time I called it a warm backup, but really, there is no such thing.  I'll do step by steps for that soon. Not now, but soon.

Oh NO, you say, I can't do THAT, my database is like 250gb!! I don't have enough room to store all that crap while I Frankenstein it into a new database incarnation!

Well, speaking of incarnations, there is something called RMAN duplication.... and you can restore from an RMAN backup.  I personally have done two Point-In-Time restores and am still amazed at the magic that goes on inside it while it's cooking away.  I'll do step by steps for that soon. Not now, but soon.

 But today, I will talk about doing a clone from a cold backup. The easiest, right?

A little background.  When I first started looking around for instructions on how to do a clone from a cold backup, I found like pages here and pages there that sort of hinted around at how to do it but no step by step instructions with explanations. Why is THAT?? I don't KNOW, but it drove me crazy.  Why don't senior magic database wizards want to share this stuff? Now, I will tell you that what I am about to write does NOT cover the stuff that can go wrong. That is why senior wizards make the big bucks, because they can figure out at the drop of their hat what's going wrong if your database doesn't come up. They've seen these arcane messages at least one point in their life and they know how to fix it. But this is like a simple plain vanilla cone. And like all things, read all the way through and perform at your own risk. I disclaim any and all production issues from anyone doing this where it kills your production database and you are in someone's cubicle explaining why you were refreshing a database with instructions you got from some blog called Adventures in DBAsitting. Although I think my instructs are pretty good, I did this with Oracle 10.2 and AIX unix. I have no idea what you have.

Let us start.

The Reader's Digest condensed version, and the one I heard from senior dbas at work, is this:

Shut down the old database.
Copy the database files over to the new database.
Make a new control file.
Start the new database up again.

Oh my GOD, it is so much more complicated than that.  Okay, so here we go....

First, grab a Diet Mountain Dew. Refreshing and filled with caffeine. Yum!

Verify server name, database name, and database version.
Check the source and target database versions, if same, proceed with refresh.

<kim comment - honestly, if you don't know how to check the versions just STOP right here. You are going to seriously damage something. JUST DROP THE MOUSE AND WALK AWAY>

Check mount point of source and target databases

df -k | grep <dbname> | sort

Make sure the space allocated in the target is equal to that in the source for datafiles.

<kim comment - on these next steps, ask your clients if they want to keep any of this stuff. I was talking to another app dba today and he said, look, if they ask for a refresh, then my job is to give them the database; they can put back all their fancy procedures and functions themselves.  But I'm nice that way. I think it depends on your relationship with your clients...>

Backup TARGET schema passwords, users, roles.

Backup TARGET functions, procedures, db_links, packages.

Verify any tables that need to be datapumped and saved.


alter system switch logfile;
alter system checkpoint;

<kim comment – this step saves time when bringing target database up.>

Shutdown listeners on source and target before the databases are shut down

TARGET DATABASE: sign on as SYSDBA and shut the database down

sqlplus / as sysdba

shutdown immediate;


SOURCE DATABASE: sign on as SYSDBA

sqlplus / as sysdba

and issue:

alter database backup controlfile to trace;

This will put the create controlfile syntax in the trace file directory. The trace keyword tells oracle to generate a script containing a create controlfile command and store it in the trace directory identified in the user_dump_dest parameter of the init.ora file (udump location).
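If you can't find the trace file, you can ask the database where it's writing them. A quick sketch (on 10g this is user_dump_dest; 11g also exposes the trace directory through v$diag_info):

```sql
-- Locate the trace directory that will hold the controlfile script
SELECT value FROM v$parameter WHERE name = 'user_dump_dest';
```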
SOURCE DATABASE: sign on as SYSDBA and shutdown the SOURCE database
<kim comment - actually, you are already signed on as sysdba, so you can really just shut it down...>

sqlplus / as sysdba
shutdown immediate;

On the TARGET, delete all old database files: *.dbf, *.log, *.ctl

Copy all data files in the oradata directory from the source server over to the target server. Edit the controlfile script to reflect the new data file names on the new server.

Identify and copy database files:
select  name, bytes
from  (select name, bytes
      from  v$datafile
      union all
      select name, bytes
      from v$tempfile
      union all
      select lf.member "name", l.bytes
      from  v$logfile lf
      ,     v$log l
      where lf.group# = l.group#) used
      ,(select sum(bytes) as poo
      from dba_free_space) free;

Do not copy control files across.

Verify size of datafiles copied (in source and target oradata directories)
ls -ltr /SID_oradata/oradata/* |awk '{s+=$5}; END {print s/1024/1024/1024}'
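If you want to convince yourself the awk one-liner adds up right, you can dry-run it against a couple of throwaway files first (the /tmp directory here is purely for illustration):

```shell
# Sanity-check the size-summing pipeline on two throwaway files
rm -rf /tmp/sizesum_demo && mkdir -p /tmp/sizesum_demo
dd if=/dev/zero of=/tmp/sizesum_demo/a.dbf bs=1024 count=1024 2>/dev/null  # 1 MiB
dd if=/dev/zero of=/tmp/sizesum_demo/b.dbf bs=1024 count=2048 2>/dev/null  # 2 MiB
# Same idiom as above: sum field 5 (bytes) of ls -l output, print in GiB
ls -l /tmp/sizesum_demo/* | awk '{s+=$5}; END {print s/1024/1024/1024}'
# prints 0.00292969 (3 MiB expressed in GiB)
```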

After datafiles are copied, make sure there are no control files in the oradata directory.
GET THAT? NO CONTROL FILES *.ctl on the new target system!

SOURCE DATABASE: Startup the source database again and notify clients that your source is back up.

TARGET DATABASE: Copy and Edit the Control file

Copy whichever new controlfile.sql you want to use into the target /sid_oradata/oradata directory fresh from trace. It should be in the udump directory along with all your other .trc files.
The file needs major editing before it can be used. Use vi! It's your friend!
Remove all lines from the top of the file up to but not including the second 'STARTUP NOMOUNT' line (it's about halfway down the file).
Remove any lines that start with --
Remove any lines that start with a #
Remove any blank lines in the 'CREATE CONTROLFILE' section.

Remove the line 'ALTER DATABASE OPEN'
Move to the top of the file to the 'CREATE CONTROLFILE' line. The word 'REUSE' needs to be changed to 'SET'. The database name needs setting to the new database name (if it is being changed). Decide whether the database will be put into archivelog mode or not. Change the line accordingly.  
If the file paths are being changed, alter the file to reflect the changes.

Example: copying oldval to newval:


It will look something like this AFTER you do edits:

STARTUP NOMOUNT
CREATE CONTROLFILE SET DATABASE "newval" RESETLOGS NOARCHIVELOG
    MAXLOGFILES 50
    MAXDATAFILES 100
LOGFILE
  GROUP 1 '/u02/oradata/newval/redo01.log'  SIZE 100M,
  GROUP 2 '/u02/oradata/newval/redo02.log'  SIZE 100M,
  GROUP 3 '/u02/oradata/newval/redo03.log'  SIZE 100M
DATAFILE
  '/u02/oradata/newval/system01.dbf',
  '/u02/oradata/newval/undotbs01.dbf',
  '/u02/oradata/newval/cwmlite01.dbf',
  '/u02/oradata/newval/drsys01.dbf',
  '/u02/oradata/newval/example01.dbf',
  '/u02/oradata/newval/indx01.dbf',
  '/u02/oradata/newval/odm01.dbf',
  '/u02/oradata/newval/tools01.dbf',
  '/u02/oradata/newval/users01.dbf',
  '/u02/oradata/newval/db01.dbf',
  '/u02/oradata/newval/BIGONE01.dbf',
  '/u02/oradata/newval/STATS01.dbf',
  '/u02/oradata/newval/somerandomname01.dbf'
;

<kim comment - the sql actually runs up until the ; above this sentence. You will still need to change the tempfiles before you really open the database, just fyi. Copy and paste the alter tablespace temp stuff that is in YOUR controlfile trace after you initially open the database.>

ALTER TABLESPACE TEMP ADD TEMPFILE '/u02/oradata/newval/temp01.dbf'
     SIZE 104857600  REUSE AUTOEXTEND OFF;

When editing, make sure to remove empty lines and comments.

TARGET DATABASE: Start the new database with the newly edited create_controlfile.sql

sqlplus / as sysdba

SQL> @create_controlfile.sql

<kim comment – note the startup nomount in the control file>

Then, once the database is open, rerun the tempfile statements from the bottom of the control script (the ALTER TABLESPACE TEMP ADD TEMPFILE part mentioned above).

shutdown immediate;
startup mount
alter database open resetlogs;


If the open fails and you need to go the recovery route, you can also try:

shutdown abort;
startup mount;
recover database using backup controlfile;
cancel (when the error comes back)
alter database open resetlogs;

Then for god's sake change the database ID!


nid TARGET=/

Do you want to change the database ID - YES YOU DO!


sqlplus / as sysdba

Tell the database it's pretty. Databases like that.

select status from v$instance;
select distinct status from v$datafile;

check alert log

archive log list

Validate tempfiles
Check the current status of updated temp files

SELECT v.file#, t.file_name, v.status
FROM dba_temp_files t, v$tempfile v
WHERE t.file_id = v.file#;

If bad then

alter database tempfile '/bkrd_oradata/oradata/temp03.dbf' drop;
alter tablespace temp add tempfile '/bkrd_oradata/oradata/temp03.dbf' size 8388616192;


create temp tablespace from script generated earlier.

Update the global name to target database

select * from global_name;
update global_name set global_name='<targetdb>';
commit;
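For what it's worth, there is also a documented command that does the same job without touching the table directly. A sketch, with <targetdb> standing in for your new global name:

```sql
-- Supported alternative to updating global_name by hand
ALTER DATABASE RENAME GLOBAL_NAME TO <targetdb>;
```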

select name from v$datafile;
select member from v$logfile;
select name from v$controlfile;
select name from v$tempfile;
select name from v$datafile where name like '%MISS%';
select file_name from dba_data_files where file_name like '%MISS%';
select * from v$recover_file;

Validate database directory paths - these are the directory paths in your database itself. If the paths have different file path structures they do not magically change. You have to alter them. FYI.

start the listener

stop and start the database.

I hope that if you are using RMAN that you have an AMAZING senior DBA who writes AMAZING scripts so all you have to do is type something like:

rman_it.ksh <dbid>

Then ice up the cake with your saved users and things and you are good to go!

19 April 2011


Okay, so here's the dealio.  I am most likely, barring an unlikely event, to be taking over a Senior Application DBA's role soon.  This is not to say that I am going to be a Senior Application DBA, this particular crowning achievement will be mine not for many many years.  Maybe 3 or 4.  But for now, I have seven, count them, SEVEN days to glean all the information he has gathered in his brain for the last 30 years. 15 of those years in this particular set of applications.  Sometimes the thought overwhelms me, and I think of the ancient Japanese ritual of seppuku if I accidentally kill a database.

So I have seven days.

What do we do? Imports/Exports.  Tuning.  Partition maintenance.  Architecture advice. Moving databases.  Working with upgrades.  Running scripts in Production.  Reports.

And this is just the stuff I know about.

I'm sure it will be JUST FINE.  In the meantime, I'm getting over a horrible cold and swigging down Sarah Silverman orange Cold/Flu relief cough syrup.


07 March 2011

How Do I Do That? - Installing a pks, pkb file

"S" before "B" when using the "P".

Just when I was feeling a little more secure in tidbits of knowledge, I got a request from a client today.

From: Client
To: Kim

Please run a simple SQL statement in Production.

"Okay", I thought, "Here goes!".  It was easy! I ran it! No fear!  Then I got another request two minutes later.

From: Client
To: Kim

 Please run these two attachments.

[attachments: ThingOne.pkb; ThingOne.pks]


What were THEY?  What was I supposed to DO with them?  Was I supposed to RUN them? Was I supposed to import them into a schema?  I have never seen things with these extensions before!  Yah, again, people -  wonder at my ignorance.  It's pretty vast.

Okay, Google is my friend.  You know what?  pks and pkb files are all over the place in the Google, but they don't tell you what to do with them. They just tell you bugs and issues. Thank GOD I have Sr. DBA R.

And Instant Messenger

I open up Instant Messenger and start typing...

Me:  Um, have you ever seen a pks or pkb file?

Sr. DBA R: Yes.

Me:  Um, do you... run them?

(pause of Instant Messenger silence)

Sr. DBA R.: Yes.
(kim comment - he messages like this. Very brief. To the point. He probably knows I'm trying to compose a message that sounds somewhat knowledgeable and this entertains him.)

Me: Um... how?

Sr. DBA R.: Open them in editor in Toad and then run them.

This is where I am busily scrounging in my mind to figure out how to do this, I mean, yes, technically, I understand what he means, but I am still sort of new to the little green frog.  I've been using this tool called Embarcadero and it seems easier, but none of the cool DBAs use it, and I want to be cool.  Desperately, I want to be cool.  I want to use the Toad.

Me:  So... how do I do that in Toad? (oh, please come over please please please)

(Two moments of Instant Messenger silence)

Sr. DBA R shows up at my cubicle.  He was probably thinking oh my god, she's in Production maybe it's worth the 1 minute I spend with her in her cube to head off the 3 hours of RMAN recovery in case she wipes out the database.  Or maybe not.  I project a lot of my insecurities.

He very kindly shows me how to open editor, open my files and VOILA! There it is!  He cautions me to run pkS, then pkB.  And I run it! And it's against the wrong schema.  Whereupon he kindly shows me select name from v$database, which I have now consigned to making sure I always run before I do anything else ever in my life, ever.
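That check deserves its own little block, because I really do run it first now, every single time:

```sql
-- Am I connected to the database I think I am? Run before touching anything.
SELECT name FROM v$database;
```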

And then I run them again.  They run great!  And ask, "Do I do commit?" and he wanders off saying, "Nope, it's ddl".  (which I KNEW didn't require commit!! Dammit!! But I forgot.)

And now I know what to do with pkS and pkB files.

04 March 2011

OMG What Do I Do? - "The database is slow!" - Part II

Okay, when we last left our heroine, she was bravely dodging the arrows of slow databasity.  Well, here's the thing.  When someone says their database is slow, what does this mean?  I don't mean what's causing it, I mean what is the client really trying to say?  I have complained of what I have called incorrectly a "slow database".  My last job, and in some ways my part-time current job, is Senior Administrator of our enterprise scheduling system.  This system is Applications Manager by UC4, (formerly AppWorx) and I have gone through a couple different application upgrades and two database upgrades.  And I have complained about a slow "database".  My personal experience when *I* was a client went as follows:

Me:  "Oh my god! My database!  It's just... creeping!" (note: complete tone of desperation in my voice. also note: this scheduling system, of which I am/was the Senior Administrator, runs ALL of our billing batch jobs, runs ALL of our payroll, including the top management, runs ALL of our financials, and well, pretty much runs everything in our company.  Normally, I block this all out of my day to day thoughts which helps me cope with all the awesome and scary responsibility.)

Sr. DBA DE:  "Hmm.  When did this start?" (As he types on something completely unrelated in the background.)

Me: "Um, when? Um, it just FLIPPED, it was running okay and now everything is going really really slow, like 10 times as slow as it normally does!!"

Sr. DBA DE: "Is the database slow or is your application slow?"

(kim comment - See, this is what DBAs do.  Blame the application.  Typical comment:  It's not the database.  It's the application.  The database is perfectly healthy! All the data is there, the listener is up and running, the archive logs are shuffling off as planned, there is no heavy I/O or cpu, and nothing is crashed. So therefore the database is FINE, except of course the frickin' phone is ringing off the hook because everything that actually touches the database, YES, application wise, is going 100x slower than normal.)

Me:  Trying not to have the panic show in my voice, I say: "Okay, you're right, it's the application, the database is fine.  Can't you put some indexes in or something? Or run statistics or something?"

Although this sounded perfectly reasonable to me, my Sr. DBA snorts.

Me: "Okay, here's the situation, I can make it run at normal speed.  I can stop the application stack, truncate one table, delete all the data out of another table, start it again and then it runs at normal speed.  UNTIL the truncated table starts filling up the table again.  So it's recreatable!"

Sr. DBA DE:  Swivels around his chair, ears all perked up and suddenly shows interest.  "Really?  Well, then we can probably do something that could fix this... have you ever heard of OEM?"

Me: "No. What does it do?"

Sr. DBA DE: "Magical stuff.  Basically, if we can find that sql statement that is causing us trouble, we can lock it into the good twin behavior when it is playing nice and keep it running fast.  Maybe.  No promises.  But maybe, if you can recreate it.  But really, no promises."

(kim comment - there is an art to giving a client hope but not actually promising anything. A skill I need to work on.)

Me: "Okay, let's do it!  I can totally recreate it!"

I use words like totally a lot because I grew up on the East Coast on the beach.  It's sort of mid 80's Valley Talk.  Some people find it endearing.

So then I found out about OEM and magical things like SQL Advisor, which really is magical.  Of course, it took some time to get access hooked up and in the meantime I did a lot of truncating and bouncing.  But I watched my now Hero Sr DBA DE trap the offending sql statement, tune it by locking it in as its good twin so NO MATTER WHAT, it ALWAYS runs this way!  And just like that my application loved my database again, and they danced together as fast as they were supposed to.

Now wouldn't it be great if I could do that with this database that my clients were meeping about?  Well, a couple of things.  We don't have everything in OEM yet, and of course this one particular database is not in OEM.  So that tool is ruled right out.  It will all be manual and eyeballing.  But here are some magical questions that I followed up with.  I think, that in any troubleshooting scenarios there are a couple of basic fundamentals that will work when trying to figure out what's going on.

  • Journalistic Questions.
  • Sherlock Holmes Most Famous Quotation.
Remember journalism in middle school?  There were basic questions that were taught in that class that have helped me troubleshoot practically everything.

Who is this affecting? Is it affecting everyone? Are you sure? Have you tried it with different permissions?

What is it affecting? Is it doing the same thing on every query? What is it doing exactly? I need times, dates, milliseconds. What is your definition of "it's taking forever"? I mean, forever is a long time.

Is it doing this everywhere or only on one desktop or one floor?

When did it start happening exactly? Does it happen all the time? Is it recreatable?

You know, the client is the closest thing to their application other than the database.  Ask them what their opinion is as to why it's happening.  They have feelings towards their app.  They may have seen it before. Clients have actually pretty good intuition.  And besides which, it makes them feel valued.

How did you discover it? How does it make you feel?  Okay the last question is kinda touchy feely, but I think empathy is something all database administrators should cultivate. I mean, you can go to the best doctor in the world, but there is something lacking if, when you are really sick, he doesn't hold your hand and murmur, "There, there... we will try and make it better..."

And then Sherlock Holmes Most Famous Quotation.

When you have eliminated the impossible, whatever remains, however improbable, must be the truth.

So using these basic tools, I found out the following:

The whole application is fine, except for two update statements that occur every day at about 1:30 p.m.  These are the only things that are slow. I also learned that this didn't start happening last week, this started happening a couple of weeks ago. Interesting!  You know why?  We have something called Veritas Clustering for high availability databases (databases that have to be up all the time).  To analogize, it's something like having a duplex where there is apartment A on the left side and apartment B on the right side. Usually, each database has their own duplex, own fridge, own bathroom and their individual space and privacy.  Which is important to databases. They like being by themselves as much as possible.  But sometimes, something happens to one side of the duplex that forces that database to temporarily move over to the same side that the other database is living in.  Not ideal, but good in a pinch.  But just like any crunched living situation, it can get a little crowded.

Guess what happened a couple of weeks ago, right when the database started getting slow?  Yup, one database moved in with another and never moved back.  Could this be the problem?  Maybe. No-one likes sharing the bathroom.  Even if it doesn't fix the slowness issue, we can't really start analyzing the situation in great depth until we get the databases back to their respective homes.  So that's happening on Monday.  So everybody, cross your fingers for me!!

In the meantime give good thoughts to all the DBAs in Egypt, Tunisia, and Libya.  They exist and are going through a lot right now, good and bad.  My thoughts and well wishes to them all.