Monday, February 4, 2019

How to shut down or reboot an Exadata storage cell without affecting ASM

1. Make sure the disk_repair_time attribute is at least 8.5 hours for every disk group. Connect to the ASM instance as oracle or grid and run the query below.

select dg.name, a.value
from v$asm_diskgroup dg, v$asm_attribute a
where dg.group_number = a.group_number
and a.name = 'disk_repair_time';

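2. If the above query returns less than 8.5 hours for a disk group, raise the value. Repeat the command for each affected disk group (DATA is used here):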

ALTER DISKGROUP DATA SET ATTRIBUTE 'DISK_REPAIR_TIME'='8.5H';
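
The comparison in steps 1 and 2 can be sketched in Python as below. This is only an illustration: the function names and the sample rows are made up, and it assumes the value column comes back as a number followed by an optional unit (h for hours, m for minutes, hours when no unit is given).

```python
import re

REQUIRED_HOURS = 8.5  # threshold from step 1


def repair_time_hours(value: str) -> float:
    """Convert a disk_repair_time value such as '3.6h' or '510m' to hours."""
    match = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*([hHmM]?)\s*", value)
    if match is None:
        raise ValueError(f"unrecognised disk_repair_time value: {value!r}")
    number = float(match.group(1))
    unit = match.group(2).lower()
    # Assumption: a value without a unit is interpreted as hours.
    return number / 60.0 if unit == "m" else number


def groups_needing_change(rows):
    """rows: (diskgroup_name, value) pairs as returned by the query in step 1."""
    return [name for name, value in rows
            if repair_time_hours(value) < REQUIRED_HOURS]


if __name__ == "__main__":
    sample_rows = [("DATA", "3.6h"), ("RECO", "8.5H")]  # hypothetical rows
    for name in groups_needing_change(sample_rows):
        print(f"ALTER DISKGROUP {name} SET ATTRIBUTE 'DISK_REPAIR_TIME'='8.5H';")
```

The script only prints the ALTER statement for each disk group that is below the threshold; running it against ASM is still a manual step.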

3. Next, check whether ASM will be OK if the grid disks go OFFLINE. The following command should return asmdeactivationoutcome 'Yes' for every grid disk listed:

cellcli -e list griddisk attributes name,asmmodestatus,asmdeactivationoutcome
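
If you save the output of this command to a file or variable, a small Python helper can point out the disks that are not safe to deactivate. This is a rough sketch: the function names are hypothetical, and it assumes each output line holds the three attributes in the order requested, separated by whitespace, with multi-word outcomes wrapped in double quotes.

```python
import shlex


def parse_griddisks(text):
    """Return (name, asmmodestatus, asmdeactivationoutcome) tuples."""
    rows = []
    for line in text.splitlines():
        # shlex keeps a quoted multi-word outcome together as one field.
        fields = shlex.split(line)
        if len(fields) >= 3:
            rows.append((fields[0], fields[1], fields[2]))
    return rows


def disks_blocking_deactivation(text):
    """Names of grid disks whose asmdeactivationoutcome is not 'Yes'."""
    return [name for name, _mode, outcome in parse_griddisks(text)
            if outcome != "Yes"]


if __name__ == "__main__":
    # Hypothetical sample in the assumed layout.
    sample = (
        'DATA_CD_00_dm01cel01 ONLINE Yes\n'
        'DATA_CD_01_dm01cel01 ONLINE "Cannot de-activate due to other '
        'offline disks in the diskgroup"\n'
    )
    print(disks_blocking_deactivation(sample))
```

An empty list means every listed grid disk reported Yes and you can continue with step 4.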

4. Run the following cellcli command to inactivate all grid disks on the cell you wish to power down or reboot:

cellcli -e alter griddisk all inactive

5. Confirm that the griddisks are now offline by performing the following actions:

(a) Execute the command below. Once the disks are offline in ASM, the output should show asmmodestatus=OFFLINE or asmmodestatus=UNUSED, together with asmdeactivationoutcome=Yes, for all grid disks. Only then is it safe to proceed with shutting down or restarting the cell:

cellcli -e list griddisk attributes name,asmmodestatus,asmdeactivationoutcome

(b) List the griddisks to confirm all now show inactive:

cellcli -e list griddisk
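
The rule in 5(a) can be written down as a small Python check. Treat it as a sketch under assumptions: the function name is made up, and it expects the captured output of the 5(a) command with name, asmmodestatus and asmdeactivationoutcome on each line, separated by whitespace.

```python
SAFE_MODES = {"OFFLINE", "UNUSED"}  # states named in step 5(a)


def safe_to_power_off(text):
    """True only if every grid disk is OFFLINE or UNUSED with outcome Yes."""
    rows = [line.split(None, 2) for line in text.splitlines() if line.strip()]
    if not rows:
        # No output at all is treated as "not safe" rather than "safe".
        return False
    return all(
        len(row) == 3
        and row[1] in SAFE_MODES
        and row[2].strip().strip('"') == "Yes"
        for row in rows
    )


if __name__ == "__main__":
    # Hypothetical sample in the assumed layout.
    sample = "DATA_CD_00_dm01cel01 OFFLINE Yes\nDATA_CD_01_dm01cel01 UNUSED Yes\n"
    print(safe_to_power_off(sample))
```

Returning False on empty input is a deliberate choice here, so that a failed or empty cellcli call is never mistaken for a green light.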


6. You can now shut down or reboot the cell, depending on your need. Oracle Exadata Storage Servers are powered off and rebooted using the Linux shutdown command.

(a) The following command, run as root, shuts down the Oracle Exadata Storage Server immediately:


#shutdown -h now

(When powering off Oracle Exadata Storage Servers, all storage services are automatically stopped.)

(b) The following command will reboot Oracle Exadata Storage Server immediately and force fsck on reboot:


#shutdown -F -r now

========================================================
After the cell comes back online:

7. Reactivate the grid disks:

cellcli -e alter griddisk all active

8. Issue the command below and all disks should show 'active':

cellcli -e list griddisk

9.  Verify grid disk status:

(a) Verify all grid disks have been successfully put online using the following command:

cellcli -e list griddisk attributes name, asmmodestatus

That is it. Once all grid disks show ONLINE, you have successfully shut down or rebooted the server without affecting ASM.
========================================================

The steps below are an example to help you understand what happens while the resync is in progress and rebalancing is taking place.

(b) Wait until asmmodestatus is ONLINE for all grid disks. Each disk will go to a 'SYNCING' state first then 'ONLINE'. The following is an example of the output:

DATA_CD_00_dm01cel01 ONLINE
DATA_CD_01_dm01cel01 SYNCING
DATA_CD_02_dm01cel01 OFFLINE
DATA_CD_03_dm01cel01 OFFLINE
DATA_CD_04_dm01cel01 OFFLINE
DATA_CD_05_dm01cel01 OFFLINE
DATA_CD_06_dm01cel01 OFFLINE
DATA_CD_07_dm01cel01 OFFLINE
DATA_CD_08_dm01cel01 OFFLINE
DATA_CD_09_dm01cel01 OFFLINE
DATA_CD_10_dm01cel01 OFFLINE
DATA_CD_11_dm01cel01 OFFLINE

(c) Oracle ASM synchronization is complete only when all grid disks show asmmodestatus=ONLINE. With GI version 12.1 or higher, the ASM rebalance phase starts automatically after the disk resync phase completes.
(Please note: this operation uses Fast Mirror Resync, which does not trigger an ASM rebalance with GI versions lower than 12.1. The resync restores only the extents that would have been written while the disk was offline.)
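
To follow the resync without reading every line, you can count the grid disks per state. The Python sketch below does that for output shaped like the example in 9(b), that is, a name followed by asmmodestatus on each line. The function names are hypothetical.

```python
from collections import Counter


def resync_progress(text):
    """Count grid disks per asmmodestatus value."""
    counts = Counter()
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 2:
            counts[fields[1]] += 1
    return counts


def resync_complete(text):
    """True when at least one disk is listed and all of them are ONLINE."""
    counts = resync_progress(text)
    return bool(counts) and set(counts) == {"ONLINE"}


if __name__ == "__main__":
    # First lines of the example output from step 9(b).
    sample = (
        "DATA_CD_00_dm01cel01 ONLINE\n"
        "DATA_CD_01_dm01cel01 SYNCING\n"
        "DATA_CD_02_dm01cel01 OFFLINE\n"
    )
    print(dict(resync_progress(sample)), resync_complete(sample))
```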

10. Before taking another storage server offline, Oracle ASM synchronization must complete on the restarted Oracle Exadata Storage Server. If synchronization is not complete, the check performed on the other storage server will fail. The following is an example of the output:

Type cellcli as root on the storage cell to enter the CellCLI command line and run the command below, or use the cellcli -e form shown further down.

CellCLI> list griddisk attributes name,asmdeactivationoutcome where asmdeactivationoutcome != 'Yes'

DATA_CD_00_dm01cel02 "Cannot de-activate due to other offline disks in the diskgroup"
DATA_CD_01_dm01cel02 "Cannot de-activate due to other offline disks in the diskgroup"
DATA_CD_02_dm01cel02 "Cannot de-activate due to other offline disks in the diskgroup"
DATA_CD_03_dm01cel02 "Cannot de-activate due to other offline disks in the diskgroup"
DATA_CD_04_dm01cel02 "Cannot de-activate due to other offline disks in the diskgroup"
DATA_CD_05_dm01cel02 "Cannot de-activate due to other offline disks in the diskgroup"
DATA_CD_06_dm01cel02 "Cannot de-activate due to other offline disks in the diskgroup"
DATA_CD_07_dm01cel02 "Cannot de-activate due to other offline disks in the diskgroup"
DATA_CD_08_dm01cel02 "Cannot de-activate due to other offline disks in the diskgroup"
DATA_CD_09_dm01cel02 "Cannot de-activate due to other offline disks in the diskgroup"
DATA_CD_10_dm01cel02 "Cannot de-activate due to other offline disks in the diskgroup"
DATA_CD_11_dm01cel02 "Cannot de-activate due to other offline disks in the diskgroup"

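Before you proceed on another cell, run the check from step 3 again. Every grid disk should return asmdeactivationoutcome=Yes: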
cellcli -e list griddisk attributes name,asmmodestatus,asmdeactivationoutcome

The output above says we cannot deactivate the disks, unlike in step 3, where all disks returned Yes and it was OK to proceed. So we wait until all disks on the restarted cell are ONLINE; after that, this command will tell us it is OK to proceed with a reboot or shutdown of another server.
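
The waiting can be scripted as a simple polling loop. The Python sketch below is one way to do it and is not an Oracle-provided tool: wait_until_all_online, its parameters, and the default interval and timeout are all made up for illustration. The caller supplies fetch_status, a function returning the text output of the step 9(a) command (name and asmmodestatus per line).

```python
import time


def wait_until_all_online(fetch_status, interval_s=60, timeout_s=3600,
                          sleep=time.sleep):
    """Poll until every grid disk reports ONLINE; False if the timeout passes.

    fetch_status: callable returning lines of '<name> <asmmodestatus>'.
    interval_s / timeout_s: hypothetical defaults, adjust to your resync times.
    """
    waited = 0
    while True:
        modes = [fields[1]
                 for fields in (line.split()
                                for line in fetch_status().splitlines())
                 if len(fields) >= 2]
        if modes and all(mode == "ONLINE" for mode in modes):
            return True
        if waited >= timeout_s:
            return False
        sleep(interval_s)
        waited += interval_s


if __name__ == "__main__":
    # Canned outputs stand in for real cellcli calls in this demonstration.
    canned = iter(["A SYNCING\nB OFFLINE", "A ONLINE\nB SYNCING",
                   "A ONLINE\nB ONLINE"])
    print(wait_until_all_online(lambda: next(canned), interval_s=0,
                                timeout_s=10, sleep=lambda s: None))
```

On a real cell, fetch_status could wrap a subprocess call to cellcli run as root. Once the function returns True, rerun the asmdeactivationoutcome check on the next cell before touching it.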

Sometimes there is an issue with the cell flash cache, and we have to drop and re-create it on Exadata.

How To drop create exadata cell flash cache?

Hope this helps!
