Thursday, May 16, 2019

Pre and post steps for Oracle Platinum Exadata patching.

Note: The actual patch apply is performed by an Oracle engineer. Our job is to get the
environment ready for patching and then hand it back to the customer after patching;
in my case this is for Exadata.

Pre-Patching steps

1. Create a blackout in Enterprise Manager for the hosts being patched.
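If the blackout is started from the agent side, a minimal sketch looks like this (the blackout name and duration are placeholders, and your site may instead use emcli against the OMS):

cd /tools/agents/oracle/occ/agent12c/agent_inst/bin   # same agent home as in step 5
./emctl start blackout exa_patching_blackout -nodeLevel -d 08:00
./emctl status blackout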

2. Check that we have enough "disk repair time" and check for rebalance:
login as root
su - oracle
menu -- choose 1 ASM
sqlplus / as sysasm  -- be careful: log in as sysasm, NOT sysdba

set linesize 100
column name format a30
column value format a30
SELECT group_number, name, value FROM v$asm_attribute
where name = 'disk_repair_time' ORDER BY group_number, name;
If the output shows a value that is too low (less than 12h), change it:
ALTER DISKGROUP DBFS_DG SET ATTRIBUTE 'disk_repair_time' = '12h';
ALTER DISKGROUP DATA_MAEX SET ATTRIBUTE 'disk_repair_time' = '12h';
ALTER DISKGROUP RECO_MAEX SET ATTRIBUTE 'disk_repair_time' = '12h';
ALTER DISKGROUP DATA2_MAEX SET ATTRIBUTE 'disk_repair_time' = '12h';

Re-execute the select to make sure it changed.
-- This is important: if we don't have enough repair time, the offlined disks will be
dropped by ASM and will have to be added back and rebalanced, which takes a long time.
3. Save a snapshot of the file systems on all nodes as the root user; you will compare
against this in the post steps.
   login as root on each node
   df -h > node1filesystem   -- do this on all nodes
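If dcli is configured for the database nodes, the same snapshot can be taken from one node (a sketch; it assumes a dbs_group file listing the DB servers, similar to the cell_group file used in the post steps):

dcli -l root -g dbs_group "df -h" > /root/filesystems_before_patching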
4. Save a snapshot of the database services distribution:

srvctl status service -d <db_name>   -- run this for every database on the server
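A small loop, run as oracle with the environment set, can capture this for every registered database in one file (the output file name is just an example):

for db in $(srvctl config database); do
  echo "=== $db ==="
  srvctl status service -d $db
done > /home/oracle/services_before_patching.txt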
5. This is done on the day of patching: stop the OMS agent.
The agent home depends on your environment; find where the agent is running:
ps -ef | grep agent

and see where it is running. It is usually under a path like /u01/app/oracle/agent12c/agent_inst/bin, but it depends on how your environment was configured.

In my case:
/tools/agents/oracle/occ/agent12c/agent_inst/bin
./emctl status agent
./emctl stop agent
6. Check for an ongoing rebalance. A rebalance won't prevent Oracle from patching, but sometimes they will ask:
select * from gv$asm_operation;
7. Save the cluster health.   
login as root
. oraenv
+ASM1
crsctl check cluster > check_cluster
crsctl stat res -t > cluster_res_t
srvctl status nodeapps
srvctl status database -d <db_name>   -- for each database, so you can later check
whether any did not come back online
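A minimal sketch that saves all of this to files for the post-patching comparison (run as root with the ASM environment set; file names are just examples):

crsctl check cluster > /root/check_cluster
crsctl stat res -t > /root/cluster_res_t
srvctl status nodeapps > /root/nodeapps_before
for db in $(srvctl config database); do
  echo "=== $db ==="
  srvctl status database -d $db
done > /root/db_status_before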

Note: you will compare the cluster health after patching using this saved
output.
 
8. At this point you are all done. Notify Oracle by email and update the SR that the
Exadata is ready for them to patch.
==================================================================
Post-patching steps: Oracle has come back and confirmed patching is done.
(Oracle will notify by email once they are done with patching.)
1. Check cell disks:
Log in to test1.com as root:
# dcli -l root -g cell_group "cellcli -e list griddisk attributes name,asmmodestatus,asmdeactivationoutcome | grep -v OFFLINE"
   -- Make sure everything is online and nothing is offline; otherwise bring the grid disks online.
If you don't know how, notify the engineer who did the patching and they can help (one approach is sketched below).
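One common way to bring inactive grid disks back and then watch them come online is sketched below; confirm with the Oracle engineer before changing anything on the cells:

dcli -l root -g cell_group "cellcli -e alter griddisk all active"
# re-run until asmmodestatus shows ONLINE for every grid disk
dcli -l root -g cell_group "cellcli -e list griddisk attributes name,asmmodestatus"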
2. Check cluster
login as root
. oraenv
+ASM1
crsctl check cluster
crsctl stat res -t
srvctl status nodeapps
srvctl status database -d <db_name>   -- for each database, to see if any did not come back online

-- The file systems will not be fully mounted until the DBFSB database is up and online, so
in my case check the DBFSB database first. Check whether you have the same database.

a- Check the DBFSB database:
srvctl status database -d DBFSB

b- Start the dbfs resource in CRS:

crsctl start resource dbfs_mount   --- (this is what I have in the Prod env)
crsctl start resource mount_dbfs   --- (this is what I have in the CT env)

If it does not start, you might need to unmount and remount it manually:
[oracle@test1 ~]$ fusermount -u -z /dbfs

fusermount: entry for /dbfs not found in /etc/mtab

--> Before performing the unmount, you need to clear all directories and their contents residing under the dbfs mount point, or else you may encounter the above error.

Then reissue the command:
--> unmount the file system using the following command

fusermount -u -z /dbfs

--> start dbfs using crsctl
crsctl start resource dbfs_mount  ---(This is in Prod env)
crsctl start resource mount_dbfs    ---(This is in CT env)


3. Check that every file system is properly mounted, comparing with the output saved in the pre steps.

Make sure the file systems come back on all nodes, as root:
df -h   -- compare with the df -h output saved in root's home directory before patching.
If anything is missing, mount it. (A quick way to compare is sketched below.)
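A minimal way to compare, assuming the pre-patching snapshot was saved as node1filesystem in root's home directory as described in the pre steps:

df -h > /root/node1filesystem_post
diff /root/node1filesystem /root/node1filesystem_post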
4. Relocate services using the output saved before patching started. Make sure you are in the right directory.
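For example, a service that ended up on the wrong instance during patching can be moved back like this (a sketch; the database, service, and instance names are placeholders):

srvctl status service -d MYDB -s batch_svc
srvctl relocate service -d MYDB -s batch_svc -i MYDB2 -t MYDB1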

5. Check whether the standby databases are supposed to be mounted, and verify that they are mounted.
If they are read-only ones, open them read only.
For mounted standbys:
srvctl config database -d <database_name>   ==>> make sure the start option is set to mount for standby databases

srvctl status database -d Database_name
srvctl start database -d Database_name
sqlplus / as sysdba
select status from gv$instance;

For read-only open:
srvctl stop database -d Database_name
srvctl start database -d Database_name -o 'read only'
6. Check Data Guard to make sure there are no errors in the configuration and that log shipping
and apply look good.
set environment
dgmgrl sys/password
show configuration
enable database 'dbname';   #if needed
enable configuration;   #if needed

In case Apply is not ON:
Make sure the intended state is APPLY-ON, not APPLY-OFF; otherwise bring it on:
edit database 'dbname' set state='APPLY-ON';
It should then show Intended State: APPLY-ON

In case MRP is not working:

To stop MRP:
ALTER DATABASE RECOVER MANAGED STANDBY DATABASE CANCEL;

To start MRP:
ALTER DATABASE RECOVER MANAGED STANDBY DATABASE DISCONNECT;

To restart MRP:
ALTER DATABASE RECOVER MANAGED STANDBY DATABASE DISCONNECT FROM SESSION;


If you are still getting errors, test connectivity:
Test the Net service between primary and standby, and vice versa.
i. From the primary (all nodes if it is RAC)

sqlplus sys/<password>@<standby_db_service> as sysdba
select name, open_mode, host_name from v$database,v$instance;

sqlplus sys/<password>@<Primary_db_service> as sysdba
select name, open_mode, host_name from v$database,v$instance;

ii. From the standby (all nodes if it is RAC)

sqlplus sys/<password>@<Primary_db_service> as sysdba
select name, open_mode, host_name from v$database,v$instance;

sqlplus sys/<password>@<standby_db_service> as sysdba
select name, open_mode, host_name from v$database,v$instance;
7. Check the standby databases to see if MRP is running and logs are being applied:
sqlplus / as sysdba
select inst_id, process, status from gv$managed_standby where process='MRP0';
select process "StdbyProcess", client_process "PrimaryProcess", status, thread#, sequence# from gv$managed_standby;
Make sure you see output like the following:
StdbyProcess PrimaryProcess STATUS          THREAD#  SEQUENCE#
------------ -------------- ------------ ---------- ----------
RFS          LGWR           IDLE                  6      67958
RFS          LGWR           IDLE                  8      92721
RFS          LGWR           IDLE                  1      69432
RFS          LGWR           IDLE                  7      94216
RFS          ARCH           IDLE                  0          0
RFS          ARCH           IDLE                  0          0
RFS          ARCH           IDLE                  0          0
MRP0         N/A            APPLYING_LOG          4     161541

DGMGRL> show database test
Database - test
  Role:            PHYSICAL STANDBY
  Intended State:  APPLY-ON
  Transport Lag:   31 minutes 26 seconds
  Apply Lag:       20 hours 37 minutes 17 seconds
  Real Time Query: OFF
  Instance(s):
    test1 (apply instance)
    test2
  
Database Status:
SUCCESS
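As a quick extra sanity check that redo is actually being applied, the last applied sequence per thread can be checked on the standby (a sketch run in SQL*Plus on the standby):

sqlplus / as sysdba
select thread#, max(sequence#) last_applied
from v$archived_log
where applied = 'YES'
group by thread#
order by thread#;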

8. Check whether there is a rebalance going on; if there is, notify the application team since
performance may be impacted:
su - oracle
menu -- asm
sqlplus / as sysdba
  select * from gv$asm_operation;

9. This is done on the day of patching: start the OMS agent.
The agent home depends on your environment:
cd /u01/tools/agents/oracle/occ/agent12c/agent_inst/bin
./emctl status agent
./emctl start agent
10. Verify the encryption wallet is open.

If there are encrypted databases, there is a bug: if we do not run the following select,
the database will not open to the public:
   
sqlplus / as sysdba
set linesize 100
column WRL_PARAMETER format a40
select * from gv$encryption_wallet;
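If the wallet shows CLOSED, on 11g it can usually be opened as below (a sketch, assuming a password-protected software wallet rather than an auto-login wallet; run on each instance that shows it closed):

ALTER SYSTEM SET ENCRYPTION WALLET OPEN IDENTIFIED BY "wallet_password";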

11. There are several files that need to be set up or that may have been changed during a patch upgrade.
a. Verify that $ORACLE_HOME/rdbms/admin/externaljob.ora contains the following lines:
run_user = oracle
run_group = dba
and has the following file permissions (it must be owned by root, and only root can write to it):

-rw-r----- 1 root dba 1531 Mar 26 20:06 /u01/app/oracle/product/11.2.0.4/dbhome_1/rdbms/admin/externaljob.ora

b. Verify that the $ORACLE_HOME/bin/extjob file is owned by root and has the setuid bit set:
-rwsr-x--- 1 root dba 1254458 Mar 4 17:40 /u01/app/oracle/product/11.2.0.4/dbhome_1/bin/extjob
To set the permissions, as the root user:
chown root extjob
chmod 4750 extjob
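A quick way to verify both items at once (a sketch; the paths follow the 11.2.0.4 home shown above):

grep -E '^run_user|^run_group' $ORACLE_HOME/rdbms/admin/externaljob.ora
ls -l $ORACLE_HOME/rdbms/admin/externaljob.ora
ls -l $ORACLE_HOME/bin/extjob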
12. Verify dos2unix is installed; if not, install it.

Run the following as root on all nodes:
yum install dos2unix
Enter option y when prompted, and repeat on the other nodes.
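To check all database nodes from one place, a sketch (again assuming a dbs_group file listing the DB servers):

dcli -l root -g dbs_group "rpm -q dos2unix"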


13. Notify system support to bring up the apps. We are all done with our post-patching verification.
