Monday, June 3, 2019

Shutting down and restarting an Exadata machine for maintenance


Powering Off Oracle Exadata Rack

Before powering off, you may need to do the following:

1. Save a snapshot of the file systems on all nodes (a sketch of capturing both snapshots follows this list):
       log in as root on each node
       df -h > node1filesystem .......
2. Save a snapshot of the DB service distribution:
       log in as oracle
       srvctl status service -d db_name
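A quick way to capture both snapshots into files for later comparison (a hedged sketch: dbs_group is assumed to be a file listing all database server hostnames, and the output file names are illustrative):

     # as root: file system layout on every database server
     dcli -l root -g dbs_group "df -h" > filesystems_before.txt
     # as oracle: service placement for every database registered in the cluster
     for db in $(srvctl config database); do
         srvctl status service -d $db >> services_before.txt
     done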

Poweroff steps:

1. Shut down the databases, to be safe:
    log in to test01.com as root and su - oracle
    menu to set the environment
    srvctl stop database -d db_name  -- for all databases (just for PX, no worries about DR); a loop over all databases is sketched below
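    If there are many databases, a small loop can stop them all (a hedged sketch; it assumes srvctl config database lists every database registered in the cluster and that the default stop options are acceptable):

    for db in $(srvctl config database); do
        srvctl stop database -d $db
    done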

2. Shut down CRS for the whole cluster:
     log in to test01.com as root
     . oraenv
     +ASM1
     crsctl stop cluster -all
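     Optionally confirm that the stack is really down on every node before powering anything off (a hedged check; once the stack is stopped, this is expected to report the cluster services as offline or unreachable):

     crsctl check cluster -all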

3. Shut down all remote database servers using the following command:

     # dcli -l root -g remote_dbs_group "shutdown -h now"
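     Here remote_dbs_group is just a text file with one remote database server hostname per line (the hostnames below are illustrative only):

     # cat remote_dbs_group
     test02.com
     test03.com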

4. Shut down all Exadata Storage Servers using the following commands:
  
     # dcli -l root -g cell_group "cellcli -e alter griddisk all inactive"
     # dcli -l root -g cell_group "cellcli -e list griddisk attributes name,asmmodestatus,asmdeactivationoutcome|grep -v OFFLINE"
        -- you may need to run this command a couple of times, a few minutes apart, until it produces no output;
        -- only then is it safe to proceed with the shutdown (a polling sketch that automates this follows)
     # dcli -l root -g cell_group "shutdown -h now"
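        Instead of re-running the check by hand, a small polling loop can wait until no grid disk remains non-OFFLINE before the final shutdown command above is issued (a hedged sketch; the 120-second sleep interval is an arbitrary choice):

     while dcli -l root -g cell_group "cellcli -e list griddisk attributes name,asmmodestatus,asmdeactivationoutcome|grep -v OFFLINE" | grep -q .; do
         sleep 120    # wait a couple of minutes between checks
     done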

5. Shut down the local database server using the following command:

    #shutdown -h now

    Now it is safe to unplug the power cables.

Reference:
https://docs.oracle.com/cd/E50790_01/doc/doc.121/e51951/general.htm#CHDGJICE
  Example 1-1 Powering Off Oracle Exadata Rack Using the dcli Utility 
  
Give it 30 minutes after it is powered back on, and then start checking:
1. Cell servers
    log in as root
    # dcli -l root -g cell_group "cellcli -e list griddisk attributes name,asmmodestatus,asmdeactivationoutcome|grep -v ONLINE"
       -- this is a long list; the status should be either ONLINE or SYNCING; any other status indicates cell issues.
          It may take a few hours for the cells to sync, so we can leave it there and proceed to check the other items (a one-liner that summarizes the states follows).
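    A quick way to see how many grid disks remain in each state across all cells (a hedged one-liner; it just counts the last column of the dcli output):

    # dcli -l root -g cell_group "cellcli -e list griddisk attributes asmmodestatus" | awk '{print $NF}' | sort | uniq -c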
2. Check that every file system is properly mounted, comparing against the previous snapshot (a diff sketch follows):
        -- the DBFS database needs to be online before all file systems can be mounted
        -- usually checking the first node is enough, but it is better to check all nodes
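        If the pre-shutdown snapshot was saved to a file (for example filesystems_before.txt from the sketch above), a simple comparison highlights anything that did not come back; sizes and usage will naturally differ, so compare only node and mount point (a hedged sketch):

        # dcli -l root -g dbs_group "df -h" > filesystems_after.txt
        # diff <(awk '{print $1, $NF}' filesystems_before.txt) <(awk '{print $1, $NF}' filesystems_after.txt)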
3. Cluster
    log in as root
    . oraenv
    +ASM1
    crsctl check cluster
    srvctl status nodeapps
    srvctl status database -d db_name  -- for each database, to see if any did not come back online (a loop is sketched below)
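    To check every database without typing each name, loop over the registered databases (same assumption as before, that srvctl config database lists them all):

    for db in $(srvctl config database); do
        srvctl status database -d $db
    done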
4. Check the DB service distribution against the previous snapshot to see if any relocation is needed (an example relocate command follows).
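    If a service ended up on the wrong instance, it can be moved back with srvctl relocate service; the form below is the classic syntax and all names are illustrative only:

    srvctl relocate service -d db_name -s service_name -i current_instance -t target_instance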
5. Check whether CFAEPS comes up in read-only mode.
6. Check Data Guard to see whether any lag is catching up and logs are being applied: dgmgrl -- show configuration (an example follows)
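    A quick broker check from any instance in the configuration (a hedged example; it connects as SYSDBA via OS authentication, so adjust the connect string for your environment):

    dgmgrl / "show configuration"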
7. Check whether a rebalance is going on; if one is, notify the apps team, since performance may be impacted (a more focused query follows):
    su - oracle
    menu -- 1
    sqlplus / as sysdba
      select * from gv$asm_operation;
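      A slightly more focused version of the same check; EST_MINUTES gives a rough idea of how long a rebalance still has to run, and no rows means no rebalance is in progress:

      select inst_id, operation, state, power, est_minutes
        from gv$asm_operation;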
