Friday, September 24, 2010

Restoring OCR & Vote disk due to ASM disk failures - 3

There are few changes to the setup that was used to test various ASM disk failure scenarios and restoring OCR and Vote disks. The main change is, setup used is no longer the RAC setup upgraded from 10gR2 to 11gR1 and then to 11gR2
It's a new installation of 11gR2 grid infrastructure (only GI was installed no RAC database was created. Installing GI was sufficient to test this scenario) and therefore the ASM spfile is also inside the same diskgroup as clusterware files. Metalink note 1082943.1 explains how to move it to another diskgroup. If not a pfile from the spfile should be created to re-create the spfile after the ocr and vote disk restore.

Scenario 3.

1. Both OCR and Vote disks are in ASM diskgroup
2. ASM diskgroup has normal redundancy with only three failure groups
3. All failure groups are affected
4. ASM Spfile is also located in the same diskgroup where clusterware files are located.

1. Current OCR, vote disk and ASM Spfile configuration
# ocrcheck
Status of Oracle Cluster Registry is as follows :
Version : 3
Total space (kbytes) : 262120
Used space (kbytes) : 2272
Available space (kbytes) : 259848
ID : 1242190491
Device/File Name : +clusterdg
Device/File integrity check succeeded
Device/File not configured

Cluster registry integrity check succeeded
Logical corruption check succeeded


$ crsctl query css votedisk
## STATE File Universal Id File Name Disk group
-- ----- ----------------- --------- ---------
1. ONLINE 7d49533611734f3bbf404d32f1759ed5 (ORCL:CLUS1) [CLUSTERDG]
2. ONLINE 4a8c288d1ade4f8cbf6588c145b27489 (ORCL:CLUS2) [CLUSTERDG]
3. ONLINE ad241f9823cd4fb9bf3412ca67e591df (ORCL:CLUS3) [CLUSTERDG]
Located 3 voting disk(s).

SQL> show parameter spfile

NAME TYPE VALUE
----------- ---------- -----
spfile string +CLUSTERDG/hpc-cluster/asmparameterfile/registry.253.730565167

srvctl config asm -a
ASM home: /opt/app/11.2.0/grid
ASM listener: LISTENER
ASM is enabled.
2. Identify the disks beloging to the ASM diskgroup using oracleasm query -p and corrupt them to simulate disk failure
# dd if=/dev/zero of=/dev/sdc10 count=204800 bs=8192
204800+0 records in
204800+0 records out
1677721600 bytes (1.7 GB) copied, 1.61087 seconds, 1.0 GB/s

# dd if=/dev/zero of=/dev/sdc3 count=204800 bs=8192
204800+0 records in
204800+0 records out
1677721600 bytes (1.7 GB) copied, 1.6883 seconds, 994 MB/s

# dd if=/dev/zero of=/dev/sdc2 count=204800 bs=8192
204800+0 records in
204800+0 records out
1677721600 bytes (1.7 GB) copied, 1.74684 seconds, 960 MB/s
3. ocssd.log will show the detection of vote disk corruption
2010-09-24 15:46:02.893: [    CSSD][1136630080]clssgmDestroyProc: cleaning up proc(0x2aaab02131b0) con(0x8db0) skgpid  ospid 12671 with 0 clients, refcount 0
2010-09-24 15:46:02.893: [ CSSD][1136630080]clssgmDiscEndpcl: gipcDestroy 0x8db0
2010-09-24 15:46:03.009: [ CSSD][1231038784]clssnmvDiskKillCheck: voting disk corrupted (0x00000000,0x00000000) (ORCL:CLUS1)
2010-09-24 15:46:03.009: [ CSSD][1231038784]clssnmvDiskAvailabilityChange: voting file ORCL:CLUS1 now offline
2010-09-24 15:46:03.584: [ CLSF][1241528640]Closing handle:0x2aaab008cbe0
...
2010-09-24 15:46:03.584: [ SKGFD][1241528640]Lib :ASM:/opt/oracle/extapi/64/asm/orcl/1/libasm.so: closing handle 0x2aaab0197570 for disk :ORCL:CLUS1:
2010-09-24 15:46:13.269: [ CSSD][1168099648]clssnmvDiskKillCheck: voting disk corrupted (0x00000000,0x00000000) (ORCL:CLUS3)
2010-09-24 15:46:13.269: [ CSSD][1168099648]clssnmvDiskAvailabilityChange: voting file ORCL:CLUS3 now offline
2010-09-24 15:46:13.431: [ CLSF][1157609792]Closing handle:0x2aaab006ac20
2010-09-24 15:46:13.431: [ SKGFD][1157609792]Lib :ASM:/opt/oracle/extapi/64/asm/orcl/1/libasm.so: closing handle 0x2aaab0062830 for disk :ORCL:CLUS3:
Querying vote disks frequently will enable to spot the state change
$ crsctl query css votedisk
## STATE File Universal Id File Name Disk group
-- ----- ----------------- --------- ---------
1. PENDOFFL 7d49533611734f3bbf404d32f1759ed5 (ORCL:CLUS1) [CLUSTERDG]
2. ONLINE 4a8c288d1ade4f8cbf6588c145b27489 (ORCL:CLUS2) [CLUSTERDG]
3. ONLINE ad241f9823cd4fb9bf3412ca67e591df (ORCL:CLUS3) [CLUSTERDG]
Located 3 voting disk(s).

$ crsctl query css votedisk
## STATE File Universal Id File Name Disk group
-- ----- ----------------- --------- ---------
1. ONLINE 7d49533611734f3bbf404d32f1759ed5 (ORCL:CLUS1) [CLUSTERDG]
2. ONLINE 4a8c288d1ade4f8cbf6588c145b27489 (ORCL:CLUS2) [CLUSTERDG]
3. ONLINE ad241f9823cd4fb9bf3412ca67e591df (ORCL:CLUS3) [CLUSTERDG]
Located 3 voting disk(s).
crs_stat (deprecated in 11gR2) shows cluster applications are running but ocrcheck fails
crs_stat -t
Name Type Target State Host
------------------------------------------------------------
ora....ERDG.dg ora....up.type ONLINE ONLINE hpc1
ora....ER.lsnr ora....er.type ONLINE ONLINE hpc1
ora....N1.lsnr ora....er.type OFFLINE OFFLINE
ora.asm ora.asm.type ONLINE ONLINE hpc1
ora.eons ora.eons.type ONLINE ONLINE hpc1
ora.gsd ora.gsd.type OFFLINE OFFLINE
ora....SM1.asm application ONLINE ONLINE hpc1
ora....C1.lsnr application ONLINE ONLINE hpc1
ora.hpc1.gsd application OFFLINE OFFLINE
ora.hpc1.ons application ONLINE OFFLINE
ora.hpc1.vip ora....t1.type ONLINE ONLINE hpc1
ora....network ora....rk.type ONLINE ONLINE hpc1
ora.oc4j ora.oc4j.type OFFLINE OFFLINE
ora.ons ora.ons.type ONLINE OFFLINE
ora....ry.acfs ora....fs.type ONLINE ONLINE hpc1
ora.scan1.vip ora....ip.type OFFLINE OFFLINE

ocrcheck
PROT-602: Failed to retrieve data from the cluster registry
PROC-26: Error while accessing the physical storage
4. Stop the crs on all nodes and start crs in exclusive mode in one node. Manual shutdown of ASM instance and database instance might be required if the stop command is unable complete these operations.
crsctl stop crs
CRS-2791: Starting shutdown of Oracle High Availability Services-managed resources on 'hpc1'
CRS-2673: Attempting to stop 'ora.crsd' on 'hpc1'
CRS-2790: Starting shutdown of Cluster Ready Services-managed resources on 'hpc1'
CRS-2673: Attempting to stop 'ora.CLUSTERDG.dg' on 'hpc1'
CRS-2673: Attempting to stop 'ora.registry.acfs' on 'hpc1'
CRS-2673: Attempting to stop 'ora.LISTENER.lsnr' on 'hpc1'
CRS-2677: Stop of 'ora.registry.acfs' on 'hpc1' succeeded
CRS-4549: Unexpected disconnect while executing shutdown request.
CRS-2677: Stop of 'ora.crsd' on 'hpc1' succeeded
CRS-2673: Attempting to stop 'ora.gpnpd' on 'hpc1'
CRS-2673: Attempting to stop 'ora.cssdmonitor' on 'hpc1'
CRS-2673: Attempting to stop 'ora.ctssd' on 'hpc1'
CRS-2673: Attempting to stop 'ora.evmd' on 'hpc1'
CRS-2673: Attempting to stop 'ora.asm' on 'hpc1'
CRS-2673: Attempting to stop 'ora.mdnsd' on 'hpc1'
CRS-2673: Attempting to stop 'ora.drivers.acfs' on 'hpc1'
CRS-2677: Stop of 'ora.cssdmonitor' on 'hpc1' succeeded
CRS-2677: Stop of 'ora.gpnpd' on 'hpc1' succeeded
CRS-2677: Stop of 'ora.evmd' on 'hpc1' succeeded
CRS-2677: Stop of 'ora.mdnsd' on 'hpc1' succeeded
CRS-2677: Stop of 'ora.ctssd' on 'hpc1' succeeded
CRS-2677: Stop of 'ora.asm' on 'hpc1' succeeded
CRS-2673: Attempting to stop 'ora.cssd' on 'hpc1'
CRS-2677: Stop of 'ora.cssd' on 'hpc1' succeeded
CRS-2673: Attempting to stop 'ora.diskmon' on 'hpc1'
CRS-2673: Attempting to stop 'ora.gipcd' on 'hpc1'
CRS-2677: Stop of 'ora.gipcd' on 'hpc1' succeeded
CRS-2677: Stop of 'ora.drivers.acfs' on 'hpc1' succeeded
CRS-2677: Stop of 'ora.diskmon' on 'hpc1' succeeded
CRS-2793: Shutdown of Oracle High Availability Services-managed resources on 'hpc1' has completed
CRS-4133: Oracle High Availability Services has been stopped.

crsctl start crs -excl
CRS-4123: Oracle High Availability Services has been started.
CRS-2672: Attempting to start 'ora.gipcd' on 'hpc1'
CRS-2672: Attempting to start 'ora.mdnsd' on 'hpc1'
CRS-2676: Start of 'ora.gipcd' on 'hpc1' succeeded
CRS-2676: Start of 'ora.mdnsd' on 'hpc1' succeeded
CRS-2672: Attempting to start 'ora.gpnpd' on 'hpc1'
CRS-2676: Start of 'ora.gpnpd' on 'hpc1' succeeded
CRS-2672: Attempting to start 'ora.cssdmonitor' on 'hpc1'
CRS-2676: Start of 'ora.cssdmonitor' on 'hpc1' succeeded
CRS-2672: Attempting to start 'ora.cssd' on 'hpc1'
CRS-2679: Attempting to clean 'ora.diskmon' on 'hpc1'
CRS-2681: Clean of 'ora.diskmon' on 'hpc1' succeeded
CRS-2672: Attempting to start 'ora.diskmon' on 'hpc1'
CRS-2676: Start of 'ora.diskmon' on 'hpc1' succeeded
CRS-2676: Start of 'ora.cssd' on 'hpc1' succeeded
CRS-2672: Attempting to start 'ora.ctssd' on 'hpc1'
CRS-2672: Attempting to start 'ora.drivers.acfs' on 'hpc1'
CRS-2676: Start of 'ora.drivers.acfs' on 'hpc1' succeeded
CRS-2676: Start of 'ora.ctssd' on 'hpc1' succeeded
CRS-2672: Attempting to start 'ora.asm' on 'hpc1'
CRS-2676: Start of 'ora.asm' on 'hpc1' succeeded
CRS-2672: Attempting to start 'ora.crsd' on 'hpc1'
CRS-2676: Start of 'ora.crsd' on 'hpc1' succeeded
Various log files will show the status of vote disks and ocr disks. ocssd.log
2010-09-24 15:53:43.643: [    CSSD][1147734336]clssnmvDiskVerify: Successful discovery of 0 disks
2010-09-24 15:53:43.643: [ CSSD][1147734336]clssnmCompleteInitVFDiscovery: Completing initial voting file discovery
2010-09-24 15:53:43.643: [ CSSD][1147734336]clssnmvFindInitialConfigs: No voting files found
2010-09-24 15:53:43.644: [ CSSD][1147734336]clssnmCompleteVFDiscovery: Completing voting file discovery
2010-09-24 15:53:43.644: [ CSSD][1147734336]clssnmvVerifyCommittedConfigVFs: Insufficient voting files found, found 0 of 0 configured, needed 1 voting files
crsd.log
2010-09-24 15:54:30.723: [  OCRASM][21660240]proprasmo: kgfoCheckMount returned [6]
2010-09-24 15:54:30.723: [ OCRASM][21660240]proprasmo: The ASM disk group clusterdg is not found or not mounted
2010-09-24 15:54:30.724: [ OCRRAW][21660240]proprioo: Failed to open [+clusterdg].
Returned proprasmo() with [26]. Marking loc
ion as UNAVAILABLE.
2010-09-24 15:54:30.724: [ OCRRAW][21660240]proprioo: No OCR/OLR devices are usable
2010-09-24 15:54:30.724: [ OCRASM][21660240]proprasmcl: asmhandle is NULL
2010-09-24 15:54:30.724: [ OCRRAW][21660240]proprinit: Could not open raw device
2010-09-24 15:54:30.724: [ OCRASM][21660240]proprasmcl: asmhandle is NULL
2010-09-24 15:54:30.724: [ OCRAPI][21660240]a_init:16!: Backend init unsuccessful : [26]
2010-09-24 15:54:30.724: [ CRSOCR][21660240] OCR context init failure. Error: PROC-26: Error while accessing the physical sto
ge ASM error [SLOS: cat=8, opn=kgfoOpenFile01, dep=15056, loc=kgfokge
ORA-17503: ksfdopn:DGOpenFile05 Failed to open file +CLUSTERDG.255.4294967295
ORA-17503: ksfdopn:2 Failed to open file +CLUSTERDG.255.4294967295
ORA-15001: disk] [8]
2010-09-24 15:54:30.724: [ CRSD][21660240][PANIC] CRSD exiting: Could not init OCR, code: 26
2010-09-24 15:54:30.724: [ CRSD][21660240] Done.
ASM alert log shows why the diskgroup containign vote disks and ocr wasn't mounted
SQL> ALTER DISKGROUP ALL MOUNT /* asm agent */
Diskgroup used for OCR is:CLUSTERDG
NOTE: cache registered group CLUSTERDG number=1 incarn=0x6ed824a0
NOTE: cache began mount (first) of group CLUSTERDG number=1 incarn=0x6ed824a0
NOTE: Loaded library: /opt/oracle/extapi/64/asm/orcl/1/libasm.so
ERROR: no PST quorum in group: required 2, found 0
5. There's no SPfile for ASM but instance will be up
SQL> show parameter spfile

NAME TYPE VALUE
------- --------- ------
spfile string
6. To test various replace and repair scenarios, a diskgroup was created with a different name. Oracle Clusterware Admin Guide states "If the original OCR location does not exist, then you must create an empty (0 byte) OCR location before you run the ocrconfig -add or ocrconfig -replace commands. The OCR location that you are replacing can be either online or offline."

Replace and repair options could be used to replace the current location and to add, add and replace existing OCR locations. Cluster Admin guide also states "You cannot repair the OCR on a node on which Oracle Clusterware is running.If you run the ocrconfig -add | -repair | -replace command, then the device, file, or Oracle ASM disk group that you are adding must be accessible. This means that a device must exist. You must create an empty (0 byte) OCR location, or the Oracle ASM disk group must exist and be mounted."

There's the question of moutning ASM diskgroup while the crs down. If you try to manually mount the ASM instnace that is part of a cluster you'd get
sqlplus  / as sysasm

SQL> startup
ORA-01078: failure in processing system parameters
ORA-29701: unable to connect to Cluster Synchronization Service
All of the replace and repair option were useless and failed
# ocrconfig -restore /opt/app/11.2.0/grid/cdata/hpc-cluster/backup_20100924_142540.ocr
PROT-16: Internal Error

# ocrconfig -replace +clusterdg -replacement +clusterdgbk
PROT-28: Cannot delete or replace the only configured Oracle Cluster Registry location

# ocrconfig -repair -replace +clusterdg -replacement +clusterdgbk
PROT-21: Invalid parameter

# ocrconfig -repair -replace +clusterdg -replacement +clusterdgbk
PROT-21: Invalid parameter

# ocrconfig -repair -add +clusterdgbk
PROT-21: Invalid parameter

# ocrconfig -add +clusterdgbk
PROT-1: Failed to initialize ocrconfig
Even creating diskgroup with the original name and another one with new name and trying to repair replace also failed
SQL> create diskgroup clusterdg disk 'ORCL:CLUS1' disk 'ORCL:CLUS2' disk 'ORCL:CLUS3' attribute 'compatible.asm'='11.2';
Diskgroup created.

SQL> create diskgroup clusterdg2 disk 'ORCL:RED1' disk 'ORCL:RED2' disk 'ORCL:RED3' attribute 'compatible.asm'='11.2';
Diskgroup created.

# ocrconfig -repair -replace +clusterdg -replacement +clusterdg2
PROT-21: Invalid parameter
7. Finally created a diskgroup with the same name as original and did the OCR restore
SQL> create diskgroup clusterdg disk 'ORCL:RED1' disk 'ORCL:RED2' disk 'ORCL:RED3' attribute 'compatible.asm'='11.2';

ocrconfig -restore /opt/app/11.2.0/grid/cdata/hpc-cluster/backup_20100924_142540.ocr
Unlike the previous scenario there's no crs restart required to restore the vote disks, it was possible to restore the vote disks after restoring ocr
# crsctl query css votedisk
Located 0 voting disk(s).

# crsctl replace votedisk +clusterdg
Successful addition of voting disk 92814292a2ec4fa7bf2d6fe10960cc55
Successful addition of voting disk 57b152766d5f4f3dbf2935c93556b7f5
Successful addition of voting disk 6120792eb1284f64bf958f13a4947ece
Successfully replaced voting disk group with +clusterdg.
CRS-4266: Voting file(s) successfully replaced
8. Stop the crs on the node and start crs on all nodes
crsctl stop crs -f
CRS-2791: Starting shutdown of Oracle High Availability Services-managed resources on 'hpc1'
CRS-2673: Attempting to stop 'ora.gpnpd' on 'hpc1'
CRS-2673: Attempting to stop 'ora.cssdmonitor' on 'hpc1'
CRS-2673: Attempting to stop 'ora.ctssd' on 'hpc1'
CRS-2673: Attempting to stop 'ora.asm' on 'hpc1'
CRS-2673: Attempting to stop 'ora.mdnsd' on 'hpc1'
CRS-2673: Attempting to stop 'ora.drivers.acfs' on 'hpc1'
CRS-2677: Stop of 'ora.cssdmonitor' on 'hpc1' succeeded
CRS-2677: Stop of 'ora.gpnpd' on 'hpc1' succeeded
CRS-2677: Stop of 'ora.mdnsd' on 'hpc1' succeeded
CRS-2677: Stop of 'ora.drivers.acfs' on 'hpc1' succeeded
CRS-2677: Stop of 'ora.asm' on 'hpc1' succeeded
CRS-2677: Stop of 'ora.ctssd' on 'hpc1' succeeded
CRS-2673: Attempting to stop 'ora.cssd' on 'hpc1'
CRS-2677: Stop of 'ora.cssd' on 'hpc1' succeeded
CRS-2673: Attempting to stop 'ora.diskmon' on 'hpc1'
CRS-2673: Attempting to stop 'ora.gipcd' on 'hpc1'
CRS-2677: Stop of 'ora.gipcd' on 'hpc1' succeeded
CRS-2677: Stop of 'ora.diskmon' on 'hpc1' succeeded
CRS-2793: Shutdown of Oracle High Availability Services-managed resources on 'hpc1' has completed
CRS-4133: Oracle High Availability Services has been stopped.

# crsctl start crs
CRS-4123: Oracle High Availability Services has been started.
9. The last step in this scenario is recreate the ASM SPfile. When ASM instance is started with a spfile or pfile, ASM alert log shows the file location. For spfile
Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - 64bit Production
With the Real Application Clusters and Automatic Storage Management options.
Using parameter settings in server-side spfile +CLUSTERDG/hpc-cluster/asmparameterfile/registry.253.730565167
System parameters with non-default values:
large_pool_size = 12M
instance_type = "asm"
remote_login_passwordfile= "EXCLUSIVE"
asm_diskstring = "ORCL:CLUS*"
asm_power_limit = 1
diagnostic_dest = "/opt/app/oracle"
Cluster communication is configured to use the following interface(s) for this instance
For pfile
Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - 64bit Production
With the Real Application Clusters and Automatic Storage Management options.
Using parameter settings in client-side pfile /opt/app/11.2.0/grid/dbs/init+ASM1.ora on machine hpc1
System parameters with non-default values:
large_pool_size = 12M
instance_type = "asm"
remote_login_passwordfile= "EXCLUSIVE"
asm_diskstring = "ORCL:CLUS*"
asm_power_limit = 1
diagnostic_dest = "/opt/app/oracle"
Cluster communication is configured to use the following interface(s) for this instance
But after the corruption of all the disks on the ASM diskgroup, there won't be a server side spfile and even when there's no client side pfile ASM instance starts. ASM alert log shows that it is using default parameter setting without a parameter file
Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - 64bit Production
With the Real Application Clusters and Automatic Storage Management options.
WARNING: using default parameter settings without any parameter file
Cluster communication is configured to use the following interface(s) for this instance
These default parameters cannot be considered as a spfile or a pfile trying to create one would fail
SQL> create pfile='/home/oracle/pfile.ora' from spfile;
create pfile='/home/oracle/pfile.ora' from spfile
*
ERROR at line 1:
ORA-01565: error in identifying file '?/dbs/spfile@.ora'
ORA-27037: unable to obtain file status
Linux-x86_64 Error: 2: No such file or directory
Additional information: 3

SQL> create spfile from pfile;
create spfile from pfile
*
ERROR at line 1:
ORA-17502: ksfdcre:4 Failed to create file
+CLUSTERDG/hpc-cluster/asmparameterfile/registry.253.730565167
ORA-15177: cannot operate on system aliases
But a pfile could be created from these values in memory using
SQL> create pfile='/home/oracle/pfile.ora' from memory;

File created.
Or if a pfile is available from the time before the diskgroup corruption it could be used to restore the ASM spfile
SQL> create spfile='+clusterdg' from pfile='/home/oracle/asmpfile.ora';

File created.
Last command above would create a new spfile with a different alias than the original and ASM instance would use than during startup.
Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - 64bit Production
With the Real Application Clusters and Automatic Storage Management options.
Using parameter settings in server-side spfile +CLUSTERDG/hpc-cluster/asmparameterfile/registry.253.730908391
System parameters with non-default values:
large_pool_size = 12M
instance_type = "asm"
remote_login_passwordfile= "EXCLUSIVE"
asm_diskstring = "ORCL:CLUS*"
asm_power_limit = 1
diagnostic_dest = "/opt/app/oracle"
Cluster communication is configured to use the following interface(s) for this instance
Useful Metalink note
How to restore ASM based OCR after complete loss of the CRS diskgroup on Linux/Unix systems [ID 1062983.1]

Friday, September 17, 2010

Restoring OCR and Vote disk due to ASM disk failures - 2

Earlier blog posts was about restoring OCR & Vote disk when only one of the disks in the ASM disk group containing them suffers a failure.

This post is when two disks in the ASM diskgroup are affected by some failure. The procedure is different to that of loosing two disks when only one of the clusterware files(OCR, Vote) is in the disk group.

Because the diskgroup (in this case normal redundancy) loses its quorum it won't be mountable and would require the recreation of asm diskgroup and restoring OCR first and then Vote disk.

Scenario 2.
1. Both OCR and Vote disks are in ASM diskgroup
2. ASM diskgroup has normal redundancy with only three failure groups
3. Only Two failure groups are affected

1. Current OCR and vote disk configuration.
ocrcheck
Status of Oracle Cluster Registry is as follows :
Version : 3
Total space (kbytes) : 262120
Used space (kbytes) : 2616
Available space (kbytes) : 259504
ID : 2002737697
Device/File Name : +CLUSTERDG
Device/File integrity check succeeded
Device/File not configured
Cluster registry integrity check succeeded
Logical corruption check succeeded

crsctl query css votedisk
## STATE File Universal Id File Name Disk group
-- ----- ----------------- --------- ---------
1. ONLINE 34845aa0b29d4f36bf8743e3506eba12 (ORCL:CLUS2) [CLUSTERDG]
2. ONLINE 1200c8daed494fbabf6f64ee6e07fde8 (ORCL:CLUS3) [CLUSTERDG]
3. ONLINE 98557cefaca24fcdbf8807f3dd1fbd29 (ORCL:CLUS1) [CLUSTERDG]
Located 3 voting disk(s).
2. Corrupt the disks to simulate disk failure.
# /etc/init.d/oracleasm querydisk -p clus1
Disk "CLUS1" is a valid ASM disk
/dev/sdc2: LABEL="CLUS1" TYPE="oracleasm"

# /etc/init.d/oracleasm querydisk -p clus2
Disk "CLUS2" is a valid ASM disk
/dev/sdc3: LABEL="CLUS2" TYPE="oracleasm"

# dd if=/dev/zero of=/dev/sdc2 count=204800 bs=8192
204800+0 records in
204800+0 records out
1677721600 bytes (1.7 GB) copied, 1.6441 seconds, 1.0 GB/s

# dd if=/dev/zero of=/dev/sdc3 count=204800 bs=8192
204800+0 records in
204800+0 records out
1677721600 bytes (1.7 GB) copied, 1.59438 seconds, 1.1 GB/s
3. ocssd.log will show the vote disk corruption being detected.
2010-09-15 15:26:04.272: [    CSSD][1332816192]clssnmvDiskKillCheck: voting disk corrupted (0x00000000,0x00000000) (ORCL:CLUS1)
2010-09-15 15:26:04.272: [ CSSD][1332816192]clssnmvDiskAvailabilityChange: voting file ORCL:CLUS1 now offline
2010-09-15 15:26:04.364: [ CLSF][1322326336]Closing handle:0x126d1f50
2010-09-15 15:26:04.364: [ SKGFD][1322326336]Lib :ASM:/opt/oracle/extapi/64/asm/orcl/1/libasm.so: closing handle 0x12927d50 for
disk :ORCL:CLUS1:
4. Querying the vote disks possibel whereas ocrcheck fails
crsctl query css votedisk
## STATE File Universal Id File Name Disk group
-- ----- ----------------- --------- ---------
1. ONLINE 34845aa0b29d4f36bf8743e3506eba12 (ORCL:CLUS2) [CLUSTERDG]
2. ONLINE 1200c8daed494fbabf6f64ee6e07fde8 (ORCL:CLUS3) [CLUSTERDG]
3. ONLINE 98557cefaca24fcdbf8807f3dd1fbd29 (ORCL:CLUS1) [CLUSTERDG]
Located 3 voting disk(s).

# ocrcheck
PROT-602: Failed to retrieve data from the cluster registry
PROC-26: Error while accessing the physical storage
5. Stop the cluster and crs services
# crsctl stop crs
CRS-2791: Starting shutdown of Oracle High Availability Services-managed resources on 'hpc1'
CRS-2673: Attempting to stop 'ora.crsd' on 'hpc1'
CRS-2790: Starting shutdown of Cluster Ready Services-managed resources on 'hpc1'
CRS-2673: Attempting to stop 'ora.LISTENER_SCAN1.lsnr' on 'hpc1'
CRS-2673: Attempting to stop 'ora.CLUSTERDG.dg' on 'hpc1'
CRS-2673: Attempting to stop 'ora.clusdb.db' on 'hpc1'
CRS-2673: Attempting to stop 'ora.registry.acfs' on 'hpc1'
CRS-2673: Attempting to stop 'ora.LISTENER.lsnr' on 'hpc1'
CRS-2677: Stop of 'ora.LISTENER_SCAN1.lsnr' on 'hpc1' succeeded
CRS-2673: Attempting to stop 'ora.scan1.vip' on 'hpc1'
CRS-2677: Stop of 'ora.LISTENER.lsnr' on 'hpc1' succeeded
CRS-2673: Attempting to stop 'ora.hpc1.vip' on 'hpc1'
CRS-2677: Stop of 'ora.scan1.vip' on 'hpc1' succeeded
CRS-2677: Stop of 'ora.hpc1.vip' on 'hpc1' succeeded
CRS-4549: Unexpected disconnect while executing shutdown request.
CRS-2675: Stop of 'ora.crsd' on 'hpc1' failed
CRS-2679: Attempting to clean 'ora.crsd' on 'hpc1'
CRS-4548: Unable to connect to CRSD
CRS-2678: 'ora.crsd' on 'hpc1' has experienced an unrecoverable failure
CRS-0267: Human intervention required to resume its availability.

CRS-2795: Shutdown of Oracle High Availability Services-managed resources on 'hpc1' has failed
CRS-4687: Shutdown command has completed with error(s).
CRS-4000: Command Stop failed, or completed with errors.

# crsctl stop cluster
CRS-2673: Attempting to stop 'ora.crsd' on 'hpc1'
CRS-4548: Unable to connect to CRSD
CRS-2675: Stop of 'ora.crsd' on 'hpc1' failed
CRS-2679: Attempting to clean 'ora.crsd' on 'hpc1'
CRS-4548: Unable to connect to CRSD
CRS-2678: 'ora.crsd' on 'hpc1' has experienced an unrecoverable failure
CRS-0267: Human intervention required to resume its availability.
CRS-4000: Command Stop failed, or completed with errors.
Some of the components fails to shutdown. Use the force option to stop the crs but even then ASM instnace and database instnace will remain open, they must be manually shutdown loging in as sys / sysdba[sysasm]
# crsctl stop crs -f
CRS-2791: Starting shutdown of Oracle High Availability Services-managed resources on 'hpc1'
CRS-2673: Attempting to stop 'ora.gpnpd' on 'hpc1'
CRS-2673: Attempting to stop 'ora.cssdmonitor' on 'hpc1'
CRS-2673: Attempting to stop 'ora.ctssd' on 'hpc1'
CRS-2673: Attempting to stop 'ora.evmd' on 'hpc1'
CRS-2673: Attempting to stop 'ora.asm' on 'hpc1'
CRS-2673: Attempting to stop 'ora.mdnsd' on 'hpc1'
CRS-2673: Attempting to stop 'ora.drivers.acfs' on 'hpc1'
CRS-2677: Stop of 'ora.cssdmonitor' on 'hpc1' succeeded
CRS-2677: Stop of 'ora.evmd' on 'hpc1' succeeded
CRS-2677: Stop of 'ora.gpnpd' on 'hpc1' succeeded
CRS-2677: Stop of 'ora.mdnsd' on 'hpc1' succeeded
CRS-2677: Stop of 'ora.ctssd' on 'hpc1' succeeded
CRS-2677: Stop of 'ora.drivers.acfs' on 'hpc1' succeeded
CRS-2677: Stop of 'ora.asm' on 'hpc1' succeeded
CRS-2673: Attempting to stop 'ora.cssd' on 'hpc1'
CRS-2677: Stop of 'ora.cssd' on 'hpc1' succeeded
CRS-2673: Attempting to stop 'ora.diskmon' on 'hpc1'
CRS-2673: Attempting to stop 'ora.gipcd' on 'hpc1'
CRS-2677: Stop of 'ora.gipcd' on 'hpc1' succeeded
CRS-2677: Stop of 'ora.diskmon' on 'hpc1' succeeded
CRS-2793: Shutdown of Oracle High Availability Services-managed resources on 'hpc1' has completed
CRS-4133: Oracle High Availability Services has been stopped.
6. Start the crs on one node in exclusive mode.
# crsctl start crs -excl
CRS-4123: Oracle High Availability Services has been started.
CRS-2672: Attempting to start 'ora.gipcd' on 'hpc1'
CRS-2672: Attempting to start 'ora.mdnsd' on 'hpc1'
CRS-2676: Start of 'ora.gipcd' on 'hpc1' succeeded
CRS-2676: Start of 'ora.mdnsd' on 'hpc1' succeeded
CRS-2672: Attempting to start 'ora.gpnpd' on 'hpc1'
CRS-2676: Start of 'ora.gpnpd' on 'hpc1' succeeded
CRS-2672: Attempting to start 'ora.cssdmonitor' on 'hpc1'
CRS-2676: Start of 'ora.cssdmonitor' on 'hpc1' succeeded
CRS-2672: Attempting to start 'ora.cssd' on 'hpc1'
CRS-2679: Attempting to clean 'ora.diskmon' on 'hpc1'
CRS-2681: Clean of 'ora.diskmon' on 'hpc1' succeeded
CRS-2672: Attempting to start 'ora.diskmon' on 'hpc1'
CRS-2676: Start of 'ora.diskmon' on 'hpc1' succeeded
CRS-2676: Start of 'ora.cssd' on 'hpc1' succeeded
CRS-2672: Attempting to start 'ora.ctssd' on 'hpc1'
CRS-2672: Attempting to start 'ora.drivers.acfs' on 'hpc1'
CRS-2676: Start of 'ora.drivers.acfs' on 'hpc1' succeeded
CRS-2676: Start of 'ora.ctssd' on 'hpc1' succeeded
CRS-2672: Attempting to start 'ora.asm' on 'hpc1'
CRS-2676: Start of 'ora.asm' on 'hpc1' succeeded
CRS-2672: Attempting to start 'ora.crsd' on 'hpc1'
CRS-2676: Start of 'ora.crsd' on 'hpc1' succeeded
Stop the crsd if started
crsctl stop resource ora.crsd -init
ocssd.log will show only one vote disk is detected
2010-09-15 15:34:19.853: [    CLSF][1148365120]Opened hdl:0x13512f20 for dev:ORCL:CLUS3:
2010-09-15 15:34:19.865: [ CSSD][1148365120]clssnmvDiskVerify: Successful discovery for disk ORCL:CLUS3, UID 1200c8da-ed494fba-bf6f64ee-6e07fde8, Pending CIN 0:1284559165:0, Committed CIN 0:1284559165:0
2010-09-15 15:34:19.865: [ CLSF][1148365120]Closing handle:0x13512f20
2010-09-15 15:34:19.865: [ SKGFD][1148365120]Lib :ASM:/opt/oracle/extapi/64/asm/orcl/1/libasm.so: closing handle 0x1350a590 for
disk :ORCL:CLUS3:

2010-09-15 15:34:19.865: [ CSSD][1148365120]clssnmvDiskVerify: file is not a voting file, cannot recognize on-disk signature for a voting
2010-09-15 15:34:19.865: [ CSSD][1148365120]clssnmvDiskVerify: file is not a voting file, cannot recognize on-disk signature for a voting
2010-09-15 15:34:19.865: [ CSSD][1148365120]clssnmvDiskVerify: file is not a voting file, cannot recognize on-disk signature for a voting
2010-09-15 15:34:19.865: [ CSSD][1148365120]clssnmvDiskVerify: file is not a voting file, cannot recognize on-disk signature for a voting
2010-09-15 15:34:19.865: [ CSSD][1148365120]clssnmvDiskVerify: file is not a voting file, cannot recognize on-disk signature for a voting
2010-09-15 15:34:19.865: [ CSSD][1148365120]clssnmvDiskVerify: file is not a voting file, cannot recognize on-disk signature for a voting
2010-09-15 15:34:19.865: [ CSSD][1148365120]clssnmvDiskVerify: Successful discovery of 1 disks
2010-09-15 15:34:19.865: [ CSSD][1148365120]clssnmCompleteInitVFDiscovery: Completing initial voting file discovery
7. At this all the diskgroup will be dismount state and trying to mount them or drop them with force option will result in an error.
SQL> select name,state from v$asm_diskgroup;
NAME STATE
--------------- ---------------
CLUSTERDG DISMOUNTED
DATA DISMOUNTED
FLASH DISMOUNTED

SQL> alter diskgroup clusterdg mount force;
alter diskgroup clusterdg mount
*
ERROR at line 1:
ORA-15032: not all alterations performed
ORA-15017: diskgroup "CLUSTERDG" cannot be mounted
ORA-15063: ASM discovered an insufficient number of disks for diskgroup
"CLUSTERDG"
11g has a new ASM command option which allows asm diskgroup to be dropped when they are offline. But this too fails because disk group contains vote disks
SQL> drop diskgroup clusterdg force including contents;
drop diskgroup clusterdg force including contents
*
ERROR at line 1:
ORA-15039: diskgroup not dropped
ORA-15276: ASM diskgroup CLUSTERDG has cluster voting files
Querying the ora-15276 error code shows
oerr ora 15276
15276, 00000, "ASM diskgroup %s has cluster voting files"
// *Cause: An attempt was made to drop a diskgroup that contained cluster
// voting files.
// *Action: Move the cluster voting files out of the diskgroup and retry the
// operation.
But trying to move the vote disks will also fail due to unavailability of the crs
crsctl query css votedisk
## STATE File Universal Id File Name Disk group
-- ----- ----------------- --------- ---------
1. OFFLINE 6c23b17c25c34fb2bf02b3d6979a3e29 () []
2. OFFLINE 34845aa0b29d4f36bf8743e3506eba12 () []
3. ONLINE 1200c8daed494fbabf6f64ee6e07fde8 (ORCL:CLUS3) [CLUSTERDG]
4. OFFLINE 98557cefaca24fcdbf8807f3dd1fbd29 () []
Located 4 voting disk(s).

crsctl replace votedisk /dev/sdc6
Oracle Cluster Registry initialization failed accessing Oracle Cluster Registry device: PROC-26: Error while accessing the physical storage ASM error [SLOS: cat=6, opn=kgfo, dep=0, loc=kgfoCkMt03
diskgroup CLUSTERDG not mounted ()
] [6]
CRS-4000: Command Replace failed, or completed with errors.
Solution is to restore the OCR first and have the crs up and running before dealing with vote disks.

8. Repair the failed disks and create a new disk group, use the surviving disk in the new disk group with force option
# /etc/init.d/oracleasm deletedisk clus1
Removing ASM disk "clus1": [ OK ]
# /etc/init.d/oracleasm deletedisk clus2
Removing ASM disk "clus2": [ OK ]

# /etc/init.d/oracleasm createdisk clus1 /dev/sdc2
Marking disk "clus1" as an ASM disk: [ OK ]
# /etc/init.d/oracleasm createdisk clus2 /dev/sdc3
Marking disk "clus2" as an ASM disk: [ OK ]

SQL> create diskgroup clusterdgbk disk 'ORCL:CLUS1' DISK 'ORCL:CLUS2' DISK 'ORCL:CLUS3' force attribute 'compatible.asm'='11.2';

Diskgroup created.

SQL> select name,state from v$asm_diskgroup;

NAME STATE
--------------- --------
CLUSTERDG MOUNTED
9. Trying to replace the vote disks would still fail because crs is not available
crsctl replace votedisk +clusterdgbk
Oracle Cluster Registry initialization failed accessing Oracle Cluster Registry device: PROC-26: Error while accessing the physical storage ASM error [SLOS: cat=6, opn=kgfo, dep=0, loc=kgfoCkMt03
diskgroup CLUSTERDG not mounted ()
] [6]
CRS-4000: Command Replace failed, or completed with errors.
Restoring crs would also fail because it is still looking for the old diskgroup name.
/ocrconfig -restore /opt/app/11.2.0/grid/cdata/hpc-cluster/backup_20100915_151240.ocr
PROT-16: Internal Error
Recreate the diskgroup with the same name and restore the ocr from a backup file
SQL> alter diskgroup clusterdgbk dismount;

Diskgroup altered.

SQL> create diskgroup clusterdg disk 'ORCL:CLUS1' force disk 'ORCL:CLUS2' force DISK 'ORCL:CLUS3' force attribute 'compatible.asm'='11.2';

Diskgroup created.

./ocrconfig -restore /opt/app/11.2.0/grid/cdata/hpc-cluster/backup_20100915_151240.ocr
10. Still restoring vote disk fails because ocr files have been only restored. Stop the crs and start again in exclusive mode and restore the vote disks.
crsctl replace votedisk +clusterdg
Failed to create voting files on disk group clusterdg.
Change to configuration failed, but was successfully rolled back.
CRS-4000: Command Replace failed, or completed with errors.

crsctl stop crs -f
CRS-2791: Starting shutdown of Oracle High Availability Services-managed resources on 'hpc1'
CRS-2673: Attempting to stop 'ora.drivers.acfs' on 'hpc1'
CRS-2673: Attempting to stop 'ora.cssdmonitor' on 'hpc1'
CRS-2673: Attempting to stop 'ora.ctssd' on 'hpc1'
CRS-2673: Attempting to stop 'ora.asm' on 'hpc1'
CRS-2673: Attempting to stop 'ora.mdnsd' on 'hpc1'
CRS-2677: Stop of 'ora.cssdmonitor' on 'hpc1' succeeded
CRS-2677: Stop of 'ora.mdnsd' on 'hpc1' succeeded
CRS-2677: Stop of 'ora.drivers.acfs' on 'hpc1' succeeded
CRS-2677: Stop of 'ora.ctssd' on 'hpc1' succeeded
CRS-2677: Stop of 'ora.asm' on 'hpc1' succeeded
CRS-2673: Attempting to stop 'ora.cssd' on 'hpc1'
CRS-2677: Stop of 'ora.cssd' on 'hpc1' succeeded
CRS-2673: Attempting to stop 'ora.gpnpd' on 'hpc1'
CRS-2673: Attempting to stop 'ora.diskmon' on 'hpc1'
CRS-2677: Stop of 'ora.gpnpd' on 'hpc1' succeeded
CRS-2673: Attempting to stop 'ora.gipcd' on 'hpc1'
CRS-2677: Stop of 'ora.gipcd' on 'hpc1' succeeded
CRS-2677: Stop of 'ora.diskmon' on 'hpc1' succeeded
CRS-2793: Shutdown of Oracle High Availability Services-managed resources on 'hpc1' has completed
CRS-4133: Oracle High Availability Services has been stopped.

crsctl start crs -excl
CRS-4123: Oracle High Availability Services has been started.
CRS-2672: Attempting to start 'ora.gipcd' on 'hpc1'
CRS-2672: Attempting to start 'ora.mdnsd' on 'hpc1'
CRS-2676: Start of 'ora.gipcd' on 'hpc1' succeeded
CRS-2676: Start of 'ora.mdnsd' on 'hpc1' succeeded
CRS-2672: Attempting to start 'ora.gpnpd' on 'hpc1'
CRS-2676: Start of 'ora.gpnpd' on 'hpc1' succeeded
CRS-2672: Attempting to start 'ora.cssdmonitor' on 'hpc1'
CRS-2676: Start of 'ora.cssdmonitor' on 'hpc1' succeeded
CRS-2672: Attempting to start 'ora.cssd' on 'hpc1'
CRS-2679: Attempting to clean 'ora.diskmon' on 'hpc1'
CRS-2681: Clean of 'ora.diskmon' on 'hpc1' succeeded
CRS-2672: Attempting to start 'ora.diskmon' on 'hpc1'
CRS-2676: Start of 'ora.diskmon' on 'hpc1' succeeded
CRS-2676: Start of 'ora.cssd' on 'hpc1' succeeded
CRS-2672: Attempting to start 'ora.ctssd' on 'hpc1'
CRS-2672: Attempting to start 'ora.drivers.acfs' on 'hpc1'
CRS-2676: Start of 'ora.drivers.acfs' on 'hpc1' succeeded
CRS-2676: Start of 'ora.ctssd' on 'hpc1' succeeded
CRS-2672: Attempting to start 'ora.asm' on 'hpc1'
CRS-2676: Start of 'ora.asm' on 'hpc1' succeeded
CRS-2672: Attempting to start 'ora.crsd' on 'hpc1'
CRS-2676: Start of 'ora.crsd' on 'hpc1' succeeded

crsctl replace votedisk +clusterdg
Successful addition of voting disk 53dd1707604f4fc5bf910fc59bd857f8
Successful addition of voting disk 7496834c116f4f53bf72b8aa726a8ede
Successful addition of voting disk 9c6c24ea6cd74f66bf616d7624b856af
Successfully replaced voting disk group with +clusterdg.
CRS-4266: Voting file(s) successfully replaced

crsctl query css votedisk
## STATE File Universal Id File Name Disk group
-- ----- ----------------- --------- ---------
1. ONLINE 53dd1707604f4fc5bf910fc59bd857f8 (ORCL:CLUS1) [CLUSTERDG]
2. ONLINE 7496834c116f4f53bf72b8aa726a8ede (ORCL:CLUS2) [CLUSTERDG]
3. ONLINE 9c6c24ea6cd74f66bf616d7624b856af (ORCL:CLUS3) [CLUSTERDG]
Located 3 voting disk(s).

ocrcheck
Status of Oracle Cluster Registry is as follows :
Version : 3
Total space (kbytes) : 262120
Used space (kbytes) : 2588
Available space (kbytes) : 259532
ID : 2002737697
Device/File Name : +CLUSTERDG
Device/File integrity check succeeded
Device/File not configured

Cluster registry integrity check succeeded
Logical corruption check succeeded
11. Stop the crs and start crs on normal mode on all nodes.
crsctl stop crs -f
CRS-2791: Starting shutdown of Oracle High Availability Services-managed resources on 'hpc1'
CRS-2673: Attempting to stop 'ora.crsd' on 'hpc1'
CRS-2677: Stop of 'ora.crsd' on 'hpc1' succeeded
CRS-2673: Attempting to stop 'ora.drivers.acfs' on 'hpc1'
CRS-2673: Attempting to stop 'ora.cssdmonitor' on 'hpc1'
CRS-2673: Attempting to stop 'ora.ctssd' on 'hpc1'
CRS-2673: Attempting to stop 'ora.asm' on 'hpc1'
CRS-2673: Attempting to stop 'ora.mdnsd' on 'hpc1'
CRS-2677: Stop of 'ora.cssdmonitor' on 'hpc1' succeeded
CRS-2677: Stop of 'ora.mdnsd' on 'hpc1' succeeded
CRS-2677: Stop of 'ora.drivers.acfs' on 'hpc1' succeeded
CRS-2677: Stop of 'ora.ctssd' on 'hpc1' succeeded
CRS-2677: Stop of 'ora.asm' on 'hpc1' succeeded
CRS-2673: Attempting to stop 'ora.cssd' on 'hpc1'
CRS-2677: Stop of 'ora.cssd' on 'hpc1' succeeded
CRS-2673: Attempting to stop 'ora.gpnpd' on 'hpc1'
CRS-2673: Attempting to stop 'ora.diskmon' on 'hpc1'
CRS-2677: Stop of 'ora.gpnpd' on 'hpc1' succeeded
CRS-2673: Attempting to stop 'ora.gipcd' on 'hpc1'
CRS-2677: Stop of 'ora.gipcd' on 'hpc1' succeeded
CRS-2677: Stop of 'ora.diskmon' on 'hpc1' succeeded
CRS-2793: Shutdown of Oracle High Availability Services-managed resources on 'hpc1' has completed
CRS-4133: Oracle High Availability Services has been stopped.

crsctl start crs
CRS-4123: Oracle High Availability Services has been started.
Useful Metalink note
How to restore ASM based OCR after complete loss of the CRS diskgroup on Linux/Unix systems [ID 1062983.1]

Wednesday, September 15, 2010

Stopping RAC/Clusterware stack on 11.2

Stopping RAC and Cluster stack without use of force options.
1.Status before the shutdown
HA Resource                                   Target     State
-----------                                   ------     -----
ora.CLUSTERDG.dg                              ONLINE     ONLINE on hpc1
ora.DATA.dg                                   ONLINE     ONLINE on hpc1
ora.FLASH.dg                                  ONLINE     ONLINE on hpc1
ora.LISTENER.lsnr                             ONLINE     ONLINE on hpc1
ora.LISTENER_SCAN1.lsnr                       ONLINE     ONLINE on hpc1
ora.asm                                       ONLINE     ONLINE on hpc1
ora.clusdb.db                                 ONLINE     ONLINE on hpc1
ora.eons                                      ONLINE     ONLINE on hpc1
ora.gsd                                       OFFLINE    OFFLINE
ora.hpc1.vip                                  ONLINE     ONLINE on hpc1
ora.net1.network                              ONLINE     ONLINE on hpc1
ora.oc4j                                      OFFLINE    OFFLINE
ora.ons                                       ONLINE     ONLINE on hpc1
ora.registry.acfs                             ONLINE     ONLINE on hpc1
ora.scan1.vip                                 ONLINE     ONLINE on hpc1
2. Shutdown one RAC component at time. Some components have dependencies and using force option (appending -f) would resolve those issues. But here each dependent component is shutdown beforehand.
srvctl stop database -d clusdb
srvctl stop listener -n hpc1
srvctl stop vip -n hpc1
srvctl stop scan_listener -i 1
srvctl stop scan -i 1
srvctl stop diskgroup -g DATA -n hpc1
srvctl stop diskgroup -g flash -n hpc1
srvctl stop nodeapps -n hpc1
srvctl stop diskgroup -g clusterdg -n hpc1
3. After this ora.registry.acfs and ora.asm would be the only components running and those cannot be stopped srvctl. Use crsctl as root to stop the acfs registry
crsctl stop resource ora.registry.acfs
CRS-2673: Attempting to stop 'ora.registry.acfs' on 'hpc1'
CRS-2677: Stop of 'ora.registry.acfs' on 'hpc1' succeeded
4. ora.asm cannot be stopped with either crsctl as root nor as srvctl even with force option.
srvctl stop asm -n hpc1 -f
PRCR-1014 : Failed to stop resource ora.asm
PRCR-1065 : Failed to stop resource ora.asm
CRS-2673: Attempting to stop 'ora.asm' on 'hpc1'
ORA-15097: cannot SHUTDOWN ASM instance with connected client
CRS-2675: Stop of 'ora.asm' on 'hpc1' failed

crsctl stop resource ora.asm
CRS-2673: Attempting to stop 'ora.asm' on 'hpc1'
CRS-2673: Attempting to stop 'ora.asm' on 'hpc1'
ORA-15097: cannot SHUTDOWN ASM instance with connected client
CRS-2675: Stop of 'ora.asm' on 'hpc1' failed
CRS-2675: Stop of 'ora.asm' on 'hpc1' failed
CRS-4000: Command Stop failed, or completed with errors.
After execution of these commands state of various rac components would be
HA Resource                                   Target     State
-----------                                   ------     -----
ora.CLUSTERDG.dg                              OFFLINE    OFFLINE
ora.DATA.dg                                   OFFLINE    OFFLINE
ora.FLASH.dg                                  OFFLINE    OFFLINE
ora.LISTENER.lsnr                             OFFLINE    OFFLINE
ora.LISTENER_SCAN1.lsnr                       OFFLINE    OFFLINE
ora.asm                                       OFFLINE    ONLINE on hpc1
ora.clusdb.db                                 OFFLINE    OFFLINE
ora.eons                                      OFFLINE    OFFLINE
ora.gsd                                       OFFLINE    OFFLINE
ora.hpc1.vip                                  OFFLINE    OFFLINE
ora.net1.network                              OFFLINE    OFFLINE
ora.oc4j                                      OFFLINE    OFFLINE
ora.ons                                       OFFLINE    OFFLINE
ora.registry.acfs                             OFFLINE    OFFLINE
ora.scan1.vip                                 OFFLINE    OFFLINE
Interestingly clusterdg is the asm diskgroup which has the ocr and vote disks. It seem diskgroup being in a mount or unmount state is irrelevant to access ocr and vote disk as long as ASM instnace is running cluster seem to function.

5. Stop the clusterware stack with
crsctl stop cluster
CRS-2673: Attempting to stop 'ora.crsd' on 'hpc1'
CRS-2790: Starting shutdown of Cluster Ready Services-managed resources on 'hpc1'
CRS-2673: Attempting to stop 'ora.asm' on 'hpc1'
CRS-2677: Stop of 'ora.asm' on 'hpc1' succeeded
CRS-2792: Shutdown of Cluster Ready Services-managed resources on 'hpc1' has completed
CRS-2677: Stop of 'ora.crsd' on 'hpc1' succeeded
CRS-2673: Attempting to stop 'ora.cssdmonitor' on 'hpc1'
CRS-2673: Attempting to stop 'ora.ctssd' on 'hpc1'
CRS-2673: Attempting to stop 'ora.evmd' on 'hpc1'
CRS-2673: Attempting to stop 'ora.asm' on 'hpc1'
CRS-2677: Stop of 'ora.cssdmonitor' on 'hpc1' succeeded
CRS-2677: Stop of 'ora.evmd' on 'hpc1' succeeded
CRS-2677: Stop of 'ora.asm' on 'hpc1' succeeded
CRS-2677: Stop of 'ora.ctssd' on 'hpc1' succeeded
CRS-2673: Attempting to stop 'ora.cssd' on 'hpc1'
CRS-2677: Stop of 'ora.cssd' on 'hpc1' succeeded
CRS-2673: Attempting to stop 'ora.diskmon' on 'hpc1'
CRS-2677: Stop of 'ora.diskmon' on 'hpc1' succeeded
6. Stop the oracle high availability services with
crsctl stop crs
CRS-2791: Starting shutdown of Oracle High Availability Services-managed resources on 'hpc1'
CRS-2673: Attempting to stop 'ora.mdnsd' on 'hpc1'
CRS-2673: Attempting to stop 'ora.drivers.acfs' on 'hpc1'
CRS-2673: Attempting to stop 'ora.gpnpd' on 'hpc1'
CRS-2677: Stop of 'ora.drivers.acfs' on 'hpc1' succeeded
CRS-2677: Stop of 'ora.mdnsd' on 'hpc1' succeeded
CRS-2677: Stop of 'ora.gpnpd' on 'hpc1' succeeded
CRS-2673: Attempting to stop 'ora.gipcd' on 'hpc1'
CRS-2677: Stop of 'ora.gipcd' on 'hpc1' succeeded
CRS-2793: Shutdown of Oracle High Availability Services-managed resources on 'hpc1' has completed
CRS-4133: Oracle High Availability Services has been stopped.

Closing ResultSet early invalidates result cache

When using the result cache (/*+ result_cache */) in a SQL during the query execution phase all the dependent objects will be identified for the query and will have the status PUBLISHED in result cache object view. But it is during the result fetch phase that the actual result will be cached. Status of the result set in the aforementioned view will be NEW when the first result is fetched and then on will be ByPass for subsequent row fetches for the same result set in the same session, once all the rows are fetched without an error status will be PUBLISHED.

If for some reason result set was closed half way through (or an error happens, which is not considered in this post) the result will have a status of invalid. Subsequent running of the sql will try to create a new result set to cache but will also end of invalid if result set is closed without being traversed the full length.

However if one session traverse the entire result set and subsequent sessions only traverse few rows of the result set, these latter sessions will be using the result cache.

Test case is give below.

1. Create the tables and functions needed

SQL> create table x (a number, b varchar2(100), c as (mod(a,4)));

Table created.

SQL> begin
2 for i in 1 .. 1000
3 loop
4 insert into x (a,b) values (i,i||'abcdefg');
5 end loop;
6 end;
7 /

PL/SQL procedure successfully completed.

create or replace package rescachetest as
type ret_type is ref cursor;
function getvalues return ret_type;
end;
/

create or replace Package Body Rescachetest As

Function Getvalues Return Ret_Type Is

Ret_Val Ret_Type;
Begin
Open Ret_Val For
Select /*+ result_cache */ * From X Where C In (3,4);

return ret_val;
end;
End;
/
This uses the result cache hint inside the pl/sql function which returns a ref cursor type. But it could be verified for pure sql as well. The java code used to test use be used to run both these cases.

2. Run the java code which will take only 90 rows from the result set and close the result set.
public class ResCacheTest {

public static void main(String[] args) {
try {
OracleDataSource dataSource = new OracleDataSource();
dataSource.setURL("jdbc:oracle:thin:@url");
dataSource.setUser("username");
dataSource.setPassword("password");

Connection con = dataSource.getConnection();


// PL/SQL
CallableStatement clm = con.prepareCall("begin ? := rescachetest.getvalues; end;");
clm.registerOutParameter(1, OracleTypes.CURSOR);
clm.execute();


// SQL
// PreparedStatement pr = con.prepareStatement("select /*+ result_cache */ * from X Where C In (3,4)");

// System.out.println("executed");
// Thread.sleep(10000);

ResultSet rs = ((OracleCallableStatement)clm).getCursor(1);

// ResultSet rs = pr.executeQuery();
int i = 0;
while(rs.next()){

i++;
System.out.println(rs.getInt(1)+" "+rs.getString(2));

// System.out.println("get result");
// Thread.sleep(10000);
if(i == 90){
break;
}
}

// System.out.println("closing");
rs.close();
clm.close();
// pr.close();
con.close();
dataSource.close();


} catch (Exception ex) {
Logger.getLogger(ResCacheTest.class.getName()).log(Level.SEVERE, null, ex);
}
}

}
3. ADMon has been used to query the v$result_cache_objects view.

When the clm.execute(); has been called observed the following
When the first row was taken from the result set
Subsequent row fetches
Once the result set was closed after taking 90 rows
4. If the java code was to run without closing the result set after taking 90 rows results will be published in the result cache.
Subsequent execution of java code where result set is closed after taking 90 rows will not result in creating additional invalid result caches as before, and these execution will use the already publish result cache.

Tuesday, September 14, 2010

No DML allowed when flashback data archive quota exceeded

When a flashback data archive exceeds its quota on the tablepsace, it will log an alert on to the alert log but more importantly all DML statements will result in an error until space is added.

1. Create a flashback archive and grant privileges to a user. The flashback archive in this case will have a quota of only 2 MB
sqlplus / as sysdba

SQL> create flashback archive flash1 tablespace asmbkp quota 2m retention 1 year;

SQL> grant flashback archive on flash1 to asanga;
2. Create a table with flashback archiving
conn asanga/***

SQL> create table x ( a char(2000), b char(2000), c char(2000), d char(2000)) flashback archive flash1;

Table created.
3. Insert a single row into the table and continue to update that row
SQL> insert into x values ('x','y','z','i');

SQL> begin
2 for i in 1 .. 100000
3 loop
4 update x set a = i||'x', b = a, c = a, d = a;
5 commit;
6 end loop;
7 end;
8 /
begin
*
ERROR at line 1:
ORA-55617: Flashback Archive "FLASH1" runs out of space and tracking on "X" is
suspended
ORA-06512: at line 4
Foreground session receives the flashback runs out of space error and following could be seen on the alert log
ORA-1688: unable to extend table ASANGA.SYS_FBA_HIST_77187 partition HIGH_PART by 1024 in tablespace ASMBKP
Fri Sep 10 02:28:22 2010
ORA-1688: unable to extend table ASANGA.SYS_FBA_HIST_77187 partition HIGH_PART by 1024 in tablespace ASMBKP
Flashback Archive FLASH1 ran out of space in tablespace ASMBKP.
Flashback archive FLASH1 is full, and archiving is suspended.
Please add more space to flashback archive FLASH1.
Flashback Archive FLASH1 ran out of space in tablespace ASMBKP.
ORA-1688: unable to extend table ASANGA.SYS_FBA_HIST_77187 partition HIGH_PART by 1024 in tablespace ASMBKP
Flashback Archive FLASH1 ran out of space in tablespace ASMBKP.
4. Further inserts are also suspended
SQL> insert into x values('1','2','3','4');
insert into x values('1','2','3','4')
*
ERROR at line 1:
ORA-55617: Flashback Archive "FLASH1" runs out of space and tracking on "X" is
suspended


Update on 2011-06-29

There has been some new development with regard to the above post. It seem the error thrown is not due to flashback quota exceeding but tablespace quota exceeding. It seem quota limit has no effect.

Below is the test case.
SQL> create tablespace asmbkp datafile '+DATA(datafile)' size 10M autoextend on next 10M maxsize 100M;

SQL> create flashback archive flash1 tablespace asmbkp quota 2m retention 1 year;

SQL> grant flashback archive on flash1 to asanga;

SQL> conn asanga/asa
Connected.
SQL> create table x ( a char(2000), b char(2000), c char(2000), d char(2000)) flashback archive flash1;

Table created.

SQL> insert into x values ('x','y','z','i');

1 row created.

SQL> commit;

Commit complete.

SQL> begin
2 for i in 1 .. 100000
3 loop
4 update x set a = i||'x', b = a, c = a, d = a;
5 commit;
6 end loop;
7 end;
8 /
begin
*
ERROR at line 1:
ORA-55617: Flashback Archive "FLASH1" runs out of space and tracking on "X" is
suspended
ORA-06512: at line 4
But the error is not due to the fact flashback archive has reached it limit of 2m, ORA-55617 happens because tablespace has reached it max size. This error should have happened when flashback archive internal table reached 2M.

Query showing quota has been set to 2 M
SQL> select * from dba_flashback_archive_ts;

FLASHBACK_ARCHIVE_NAME FLASHBACK_ARCHIVE# TABLESPACE_NAME QUOTA_IN_MB
------------------------- ------------------ ------------------------------ -------------
FLASH1 1 ASMBKP 2
Getting the name of the internal flashback archive table
SQL> select * from dBA_FLASHBACK_ARCHIVE_TABLES;

TABLE_NAME OWNER_NAME FLASHBACK_ARCHIVE_NAME ARCHIVE_TABLE_NAME STATUS
---------- ------------------------------ ------------------------- ------------------- -------------
X ASANGA FLASH1 SYS_FBA_HIST_87387 ENABLED
Size of the internal flashback archive table
SQL> select sum(bytes)/1024/1024 as "MB" from dba_segments where segment_name='SYS_FBA_HIST_87387';

MB
--
96
Maximum and current size of the datafile (only one datafile in this tablespace)
SQL> select tablespace_name,bytes/1024/1024 as "Size MB",maxbytes/1024/1024 as "MaxSize MB" from dba_data_files where tablespace_name='ASMBKP';

TABLESPACE_NAME Size MB MaxSize MB
------------------------------ ---------- ----------
ASMBKP 100 100
Above was tested on 11.2.0.2 With Patchset 11.2.0.2.0 set. Same behavior is also seen on 11.1.0.7 with PSU 11.1.0.7.7. Oracle support has suggested Bug 7120053 - ORA-55617 in flashback archive tablespace even if used size does not reach quota [ID 7120053.8] but issue suggested on the metalink and above issue is not the same.

Update on 2011-07-08

Response from Oracle is The Flashback archiving is handled by fbda Background Process and it checks for Tablespace Quota every 1 Hour. So the Limit can be exceeded until the next fbda-Run detects it. And 11g feature: Flashback Data Archive Guide. [ID 470199.1] will be updated with this information.

So carried out the test but limited the number of loops to 10. This created a flashback archive table of size 8MB exceeding the 2M quote. Waited until fbda to check the quota limit again and following could be seen on the alert log
Fri Jul 08 12:33:49 2011
Flashback Archive FLASH1 ran out of space in tablespace TEST.
Flashback archive FLASH1 is full, and archiving is suspended.
Please add more space to flashback archive FLASH1.
Same message is repeated almost every hour
Fri Jul 08 13:38:49 2011
Flashback Archive FLASH1 ran out of space in tablespace TEST.
And trying to update would cause the following error and no further DML will be allowed.
SQL>  update x set a ='aaa' ,b = a, c = a, d = a;
update x set a ='aaa' ,b = a, c = a, d = a
*
ERROR at line 1:
ORA-55617: Flashback Archive "FLASH1" runs out of space and tracking on "X" is
suspended

SQL> delete from x where a='10x';
delete from x where a='10x'
*
ERROR at line 1:
ORA-55617: Flashback Archive "FLASH1" runs out of space and tracking on "X" is
suspended


Changing Resource Attributes in 11gR2 Grid Infrastructure

In 11gR2 grid infrastructure installations certain resources may have auto start set to never and restore. This was observed both on environments where clusterware was upgraded to 11.2 as well as newly installed environments. Depending on the situation these may not be desirable. Auto start attribute setting could be changed as follows.

1. Check the current auto start values
# crsctl stat res -p
NAME=ora.FLASH.dg
TYPE=ora.diskgroup.type
ACL=owner:oracle:rwx,pgrp:oinstall:rwx,other::r--
ACTION_FAILURE_TEMPLATE=
ACTION_SCRIPT=
AGENT_FILENAME=%CRS_HOME%/bin/oraagent%CRS_EXE_SUFFIX%
ALIAS_NAME=
AUTO_START=never

NAME=ora.DATA.dg
TYPE=ora.diskgroup.type
ACL=owner:oracle:rwx,pgrp:oinstall:rwx,other::r--
ACTION_FAILURE_TEMPLATE=
ACTION_SCRIPT=
AGENT_FILENAME=%CRS_HOME%/bin/oraagent%CRS_EXE_SUFFIX%
ALIAS_NAME=
AUTO_START=never

NAME=ora.clusdb.db
TYPE=ora.database.type
ACL=owner:oracle:rwx,pgrp:oinstall:rwx,other::r--
ACTION_FAILURE_TEMPLATE=
ACTION_SCRIPT=
ACTIVE_PLACEMENT=1
AGENT_FILENAME=%CRS_HOME%/bin/oraagent%CRS_EXE_SUFFIX%
AUTO_START=restore
2. Since ASM diskgroup that database depend on will never auto start database will also be unavailable.

3. Change the resource start attribute with
# crsctl modify resource "ora.FLASH.dg" -attr "AUTO_START=always"
# crsctl modify resource "ora.DATA.dg" -attr "AUTO_START=always"
# crsctl modify resource ora.clusdb.db -attr "AUTO_START=always"
Auto start must be upper case if not command will fail
crsctl modify resource ora.clusdb.db -attr "auto_start=always"
CRS-0160: The attribute 'auto_start' is not supported in this resource type.
CRS-4000: Command Modify failed, or completed with errors.
4. Verify the status change with
# crsctl stat res -p
NAME=ora.clusdb.db
TYPE=ora.database.type
ACL=owner:oracle:rwx,pgrp:oinstall:rwx,other::r--
ACTION_FAILURE_TEMPLATE=
ACTION_SCRIPT=
ACTIVE_PLACEMENT=1
AGENT_FILENAME=%CRS_HOME%/bin/oraagent%CRS_EXE_SUFFIX%
AUTO_START=always


Saturday, September 11, 2010

DB_SECUREFILE Options

db_securefile parameter could be set to one of the five permitted values. They are
PERMITTED :- allows SECUREFILE LOBs to be created provided ASSM tablespace is used. This is the default value.
sqlplus / as sysdba
SQL> show parameter db_secure

NAME TYPE VALUE
-------------- ----------- ---------
db_securefile string PERMITTED

SQL> create table seclob (a number, b blob) lob(b) store as securefile;
create table seclob (a number, b blob) lob(b) store as securefile
*
ERROR at line 1:
ORA-43853: SECUREFILE lobs cannot be used in non-ASSM tablespace "SYSTEM"
NEVER :- disallow SECUREFILE LOBs from being created. If securefile option is used then table create command still succeed but secure file is not used. Error is thorwn only if any of the securefile options are specified in the create statement.
SQL> conn asanga/*****
Connected.

SQL> alter session set db_securefile='NEVER';

Session altered.

SQL> create table seclob (a number, b blob) lob(b) store as securefile;

Table created.

SQL> select table_name,column_name,securefile from user_lobs;

TABLE COLUMN SEC
------ ------ ---
SECLOB B NO

SQL> drop table seclob;

Table dropped.

SQL> create table seclob (a number, b blob) lob(b) store as securefile(compress);
create table seclob (a number, b blob) lob(b) store as securefile(compress)
*
ERROR at line 1:
ORA-43854: use of a BASICFILE LOB where a SECUREFILE LOB was expected
ALWAYS :- creates securefiles even if basicfile is specified.
SQL> alter session set db_securefile='ALWAYS';

SQL> create table seclob (a number, b blob) lob(b) store as basicfile;

Table created.

SQL> select table_name,column_name,securefile from user_lobs;

TABLE COLUMN SEC
------ ------ ---
SECLOB B YES
On the securefile and large object guide it is stated "ALWAYS attempts to create all LOBs as SECUREFILE LOBs but creates any LOBs not in ASSM tablespaces as BASICFILE LOBs, unless SECUREFILE is explicitly specified. Any BASICFILE LOB storage options specified will be ignored, and the SECUREFILE defaults are used for all storage options not specified."
SQL> create tablespace mssm datafile '+DATA(datafile)' segment space management manual;

Tablespace created.

SQL> alter session set db_securefile='ALWAYS';

Session altered.

SQL> create table seclob (a number, b blob) lob(b) store as securefile (tablespace mssm);
create table seclob (a number, b blob) lob(b) store as securefile (tablespace mssm)
*
ERROR at line 1:
ORA-43853: SECUREFILE lobs cannot be used in non-ASSM tablespace "MSSM"
This was tested on 11.1.0.7.4 database and it seem when MSSM tablespace is specified rather than creating basicfile it throws an error.

Securefile and large object guide states "If FORCE is specified, all LOBs created in the system will be created as SECUREFILE LOBs. If the LOB is being created in an MSSM tablespace, an error will be thrown. Any BASICFILE LOB storage options specified will be ignored, and the SECUREFILE defaults are used for all storage options not specified.". This behavior is same as "ALWAYS"
SQL> alter session set db_securefile='FORCE';

Session altered.

SQL> create table seclob (a number, b blob) lob(b) store as basicfile;

Table created.

SQL> select table_name,column_name,securefile from user_lobs;

TABLE_ COLUMN SEC
------ ------ ---
SECLOB B YES

SQL> drop table seclob;

SQL> create table seclob (a number, b blob) lob(b) store as securefile (tablespace mssm);
create table seclob (a number, b blob) lob(b) store as securefile (tablespace mssm)
*
ERROR at line 1:
ORA-43853: SECUREFILE lobs cannot be used in non-ASSM tablespace "MSSM"
Last option is ignore which will ignore securefile keyword and any securefile options specified in the create statement. This is true for all options except encrypt, where instead of being ignored an error is thrown if the wallet is not open.
SQL> alter session set db_securefile='IGNORE';

Session altered.

SQL> create table seclob (a number, b blob) lob(b) store as securefile;

Table created.

SQL> select table_name,column_name,securefile from user_lobs;

TABLE_ COLUMN SEC
------ ------ ---
SECLOB B NO


SQL> drop table seclob;

SQL> create table seclob (a number, b blob) lob(b) store as securefile(compress deduplicate);

Table created.

SQL> select table_name,column_name,securefile,compression,deduplication from user_lobs;

TABLE_ COLUMN SEC COMPRE DEDUPLICATION
------ ------ --- ------ -------------
SECLOB B NO NONE NONE
As soon as the encrypt option is specified following error is thrown. Encrypt option could be specified in two places but not at the same time
SQL> create table seclob (a number, b blob)  lob(b) store as securefile(encrypt using 'AES128');
create table seclob (a number, b blob) lob(b) store as securefile(encrypt using 'AES128')
*
ERROR at line 1:
ORA-28365: wallet is not open


SQL> create table seclob (a number, b blob encrypt using 'AES128') lob(b) store as securefile;
create table seclob (a number, b blob encrypt using 'AES128') lob(b) store as securefile
*
ERROR at line 1:
ORA-28365: wallet is not open

SQL> create table seclob (a number, b blob) lob(b) store as securefile(compress encrypt identified by asanga);
create table seclob (a number, b blob) lob(b) store as securefile(compress encrypt identified by asanga)
*
ERROR at line 1:
ORA-28365: wallet is not open
If a wallet is created and if it is open then this create statement will succeed and encrypt option will be ignored.

Create the default wallet directory if does not exists. If this directory doesn't exists "ORA-28368: cannot auto-create wallet" will be thrown
mkdir -p $ORACLE_BASE/admin/$ORACLE_SID/wallet

sqlplus / as sysdba

SQL> alter system set encryption key identified by "asanga321";

System altered.
Create the securefile with all options
create table seclob (a number, b blob)  lob(b) store as securefile(compress encrypt using 'AES128' deduplicate);

Table created.

SQL> select table_name,column_name,securefile,compression,deduplication,encrypt from user_lobs;

TABLE_ COLUM SEC COMPRE DEDUPLICATION ENCR
------ ----- --- ------ --------------- ----
SECLOB B NO NONE NONE NONE
Closing the wallet and trying to create the table will fail as before
alter system set encryption wallet close;

SQL>drop table seclob;

SQL> create table seclob (a number, b blob) lob(b) store as securefile(compress encrypt using 'AES128' deduplicate);
create table seclob (a number, b blob) lob(b) store as securefile(compress encrypt using 'AES128' deduplicate)
*
ERROR at line 1:
ORA-28365: wallet is not open

SQL> alter system set encryption wallet open identified by "asanga321";

SQL> create table seclob (a number, b blob) lob(b) store as securefile(compress encrypt using 'AES128' deduplicate);

Table created.


compatible.asm and compatible.rdbms attribute changes

compatible.asm controls the format of data structures of ASM metadata on disk and compatible.rdbms is the minimum rdbms compatible version that is allowed the mount the disk group. Values once increased cannot be reverted back to previous lower level.

It is interesting to see that compatible.asm attribute could only be set to major version values such as 10.1 (default), 11.1 and 11.2 but no minor version values such as 10.1.0.2 nor to 10.2.

All those other values are qunatized by the diskgroup.

Setting for 10.1 minor version
SQL> alter diskgroup reduntest set attribute 'compatible.asm'='10.1.0.2';

Diskgroup altered.

Sun Sep 12 01:20:44 2010
SQL> alter diskgroup reduntest set attribute 'compatible.asm'='10.1.0.2'
NOTE: Advancing ASM compatibility to 10.1.0.2.0 for grp 3
NOTE: Quantized compatibility from 10.1.0.2.0
to 10.1.0.0.0

NOTE: initiating PST update: grp = 3
kfdp_update(): 136
Sun Sep 12 01:20:46 2010
kfdp_updateBg(): 136
NOTE: Advancing compatible.asm on grp 3 disk RED1
NOTE: Advancing compatible.asm on grp 3 disk RED2
NOTE: group REDUNTEST: updated PST location: disk 0000 (PST copy 0)
NOTE: group REDUNTEST: updated PST location: disk 0001 (PST copy 1)
NOTE: PST update grp = 3 completed successfully
SUCCESS: Advanced compatible.asm to 10.1.0.0.0 for grp 3
SUCCESS: alter diskgroup reduntest set attribute 'compatible.asm'='10.1.0.2'
Sun Sep 12 01:20:46 2010
kfdp_query(REDUNTEST): 137
kfdp_queryBg(): 137
NOTE: Instance updated compatible.asm to 10.1.0.0.0 for grp 3
No row is added to v$asm_attribute view.

Changing compatible.asm to 10.2 and 10.2 minor versions
SQL> alter diskgroup dg1 set attribute 'compatible.rdbms'='10.2';

Diskgroup altered.

Sun Sep 12 01:04:30 2010
SQL> alter diskgroup dg1 set attribute 'compatible.asm'='10.2'
NOTE: Advancing ASM compatibility to 10.2.0.0.0 for grp 4
NOTE: Quantized compatibility from 10.2.0.0.0
to 10.1.0.0.0

NOTE: initiating PST update: grp = 4
kfdp_update(): 126
Sun Sep 12 01:04:31 2010
kfdp_updateBg(): 126
NOTE: Advancing compatible.asm on grp 4 disk RED3
NOTE: Advancing compatible.asm on grp 4 disk RED4
NOTE: group DG1: updated PST location: disk 0000 (PST copy 0)
NOTE: group DG1: updated PST location: disk 0001 (PST copy 1)
NOTE: PST update grp = 4 completed successfully
SUCCESS: Advanced compatible.asm to 10.1.0.0.0 for grp 4
SUCCESS: alter diskgroup dg1 set attribute 'compatible.asm'='10.2'

SQL> alter diskgroup dg1 set attribute 'compatible.rdbms'='10.2.0.0';

Diskgroup altered.

Sun Sep 12 01:08:06 2010
SQL> alter diskgroup dg1 set attribute 'compatible.asm'='10.2.0.0'
NOTE: Advancing ASM compatibility to 10.2.0.0.0 for grp 4
NOTE: Quantized compatibility from 10.2.0.0.0
to 10.1.0.0.0

NOTE: initiating PST update: grp = 4
kfdp_update(): 128
Sun Sep 12 01:08:07 2010
kfdp_updateBg(): 128
NOTE: Advancing compatible.asm on grp 4 disk RED3
NOTE: Advancing compatible.asm on grp 4 disk RED4
NOTE: group DG1: updated PST location: disk 0000 (PST copy 0)
NOTE: group DG1: updated PST location: disk 0001 (PST copy 1)
NOTE: PST update grp = 4 completed successfully
SUCCESS: Advanced compatible.asm to 10.1.0.0.0 for grp 4
Sun Sep 12 01:08:07 2010
kfdp_query(DG1): 129
SUCCESS: alter diskgroup dg1 set attribute 'compatible.asm'='10.2.0.0'
kfdp_queryBg(): 129
NOTE: Instance updated compatible.asm to 10.1.0.0.0 for grp 4

SQL> alter diskgroup dg1 set attribute 'compatible.rdbms'='10.2.0.5';

Diskgroup altered.

Sun Sep 12 01:08:41 2010
SQL> alter diskgroup dg1 set attribute 'compatible.asm'='10.2.0.5'
NOTE: Advancing ASM compatibility to 10.2.0.5.0 for grp 4
NOTE: Quantized compatibility from 10.2.0.5.0
to 10.1.0.0.0

NOTE: initiating PST update: grp = 4
kfdp_update(): 130
Sun Sep 12 01:08:43 2010
kfdp_updateBg(): 130
NOTE: Advancing compatible.asm on grp 4 disk RED3
NOTE: Advancing compatible.asm on grp 4 disk RED4
NOTE: group DG1: updated PST location: disk 0000 (PST copy 0)
NOTE: group DG1: updated PST location: disk 0001 (PST copy 1)
NOTE: PST update grp = 4 completed successfully
SUCCESS: Advanced compatible.asm to 10.1.0.0.0 for grp 4
Sun Sep 12 01:08:43 2010
kfdp_query(DG1): 131
SUCCESS: alter diskgroup dg1 set attribute 'compatible.asm'='10.2.0.5'
kfdp_queryBg(): 131
NOTE: Instance updated compatible.asm to 10.1.0.0.0 for grp 4
No row added to v$asm_attribute.

Even a non-existing version could be specified and diskgroup would quantiz it to 10.1
SQL> alter diskgroup dg1 set attribute 'compatible.asm'='10.3.0.5';

Diskgroup altered.

SQL> alter diskgroup dg1 set attribute 'compatible.asm'='10.3.0.5'
NOTE: Advancing ASM compatibility to 10.3.0.5.0 for grp 4
NOTE: Quantized compatibility from 10.3.0.5.0
to 10.1.0.0.0

NOTE: initiating PST update: grp = 4
kfdp_update(): 132
Sun Sep 12 01:10:58 2010
kfdp_updateBg(): 132
NOTE: Advancing compatible.asm on grp 4 disk RED3
NOTE: Advancing compatible.asm on grp 4 disk RED4
NOTE: group DG1: updated PST location: disk 0000 (PST copy 0)
NOTE: group DG1: updated PST location: disk 0001 (PST copy 1)
NOTE: PST update grp = 4 completed successfully
SUCCESS: Advanced compatible.asm to 10.1.0.0.0 for grp 4
Sun Sep 12 01:10:58 2010
kfdp_query(DG1): 133
SUCCESS: alter diskgroup dg1 set attribute 'compatible.asm'='10.3.0.5'
kfdp_queryBg(): 133
NOTE: Instance updated compatible.asm to 10.1.0.0.0 for grp 4
Finally updating to 11.1 version
SQL> alter diskgroup dg1 set attribute 'compatible.asm'='11.1';

Diskgroup altered.

SQL> alter diskgroup dg1 set attribute 'compatible.asm'='11.1'
NOTE: Advancing ASM compatibility to 11.1.0.0.0 for grp 4
NOTE: initiating PST update: grp = 4
kfdp_update(): 134
Sun Sep 12 01:11:25 2010
kfdp_updateBg(): 134
NOTE: Advancing compatible.asm on grp 4 disk RED3
NOTE: Advancing compatible.asm on grp 4 disk RED4
NOTE: group DG1: updated PST location: disk 0000 (PST copy 0)
NOTE: group DG1: updated PST location: disk 0001 (PST copy 1)
NOTE: PST update grp = 4 completed successfully
SUCCESS: Advanced compatible.asm to 11.1.0.0.0 for grp 4
Sun Sep 12 01:11:25 2010
kfdp_query(DG1): 135
SUCCESS: alter diskgroup dg1 set attribute 'compatible.asm'='11.1'
kfdp_queryBg(): 135
NOTE: Instance updated compatible.asm to 11.1.0.0.0 for grp 4
v$asm_attribute view is only populated when an attribute is "really" changed.

Unlike the compatible.asm the compatible.rdbms does accept 10.2 and other minor versions.
SQL> alter diskgroup dg1 set attribute 'compatible.rdbms'='10.2';

Diskgroup altered.

Sun Sep 12 01:13:54 2010
SQL> alter diskgroup dg1 set attribute 'compatible.rdbms'='10.2'
NOTE: Advancing RDBMS compatibility to 10.2.0.0.0 for grp 4
SUCCESS: Advanced compatible.rdbms to 10.2.0.0.0 for grp 4
SUCCESS: alter diskgroup dg1 set attribute 'compatible.rdbms'='10.2'
Advancing to minor versions
SQL> alter diskgroup dg1 set attribute 'compatible.rdbms'='10.2.0.5';

Diskgroup altered.

Sun Sep 12 01:14:26 2010
SQL> alter diskgroup dg1 set attribute 'compatible.rdbms'='10.2.0.5'
NOTE: Advancing RDBMS compatibility to 10.2.0.5.0 for grp 4
SUCCESS: Advanced compatible.rdbms to 10.2.0.5.0 for grp 4
SUCCESS: alter diskgroup dg1 set attribute 'compatible.rdbms'='10.2.0.5'
v$asm_attribute is populated with the manually set attributes.
SQL> select name,value from v$asm_attribute;

NAME VALUE
-------------------- ----------
disk_repair_time 3.6h
au_size 1048576
compatible.asm 11.1.0.0.0
compatible.rdbms 10.2.0.5


Friday, September 10, 2010

Cloning Oracle Homes

Oracle Homes could be cloned with either $ORACLE_HOME/clone/bin/clone.pl or $ORACLE_HOME/oui/bin/runInstaller.

Using clone.pl

1. Run prepare_clone.pl on the source oracle home before it's copied to destination and tar the source oracle home and copy to destination.

2. Extract the oracle home at the destination server.

3. set ORACLE_BASE and ORACLE_HOME variables (not necessary) and run the clone.pl
perl clone.pl ORACLE_HOME=/opt/app/oracle/product/10.2.0/ent ORACLE_HOME_NAME=10ghome
./runInstaller -silent -clone -waitForCompletion  "ORACLE_HOME=/opt/app/oracle/product/10.2.0/ent" "ORACLE_HOME_NAME=10ghome" -noConfig -nowait
Starting Oracle Universal Installer...

No pre-requisite checks found in oraparam.ini, no system pre-requisite checks will be executed.
Preparing to launch Oracle Universal Installer from /tmp/OraInstall2010-09-10_10-41-16AM. Please wait ...Oracle Universal Installer, Version 10.2.0.5.0 Production
Copyright (C) 1999, 2010, Oracle. All rights reserved.

SEVERE:1. OUI-10035:You do not have permission to write to the inventory location.
OR
2. OUI-10033:The inventory location /opt/app/oraInventory set by the previous installation session is no longer accessible. Do you still want to continue by creating a new inventory? Note that you may lose the products installed in the earlier session.
SEVERE:OUI-10180:Either a component of the path prefix or the file referred to by path does not exist or is a null pathname.
As shown in the inventory creation post when oracle base is set the inventory location deviate from default 10g inventory position. To fix the problem oraInventory directory could be pre-created or ORACLE_BASE could be unset, in the second case oraInventory will be created in /home/oracle.

3. After fixing the issue with oraInventory run the command again
unset ORACLE_BASE
unset ORACLE_HOME
perl clone.pl ORACLE_HOME=/opt/app/oracle/product/10.2.0/ent ORACLE_HOME_NAME=10ghome
./runInstaller -silent -clone -waitForCompletion  "ORACLE_HOME=/opt/app/oracle/product/10.2.0/ent" "ORACLE_HOME_NAME=10ghome" -noConfig -nowait
Starting Oracle Universal Installer...

No pre-requisite checks found in oraparam.ini, no system pre-requisite checks will be executed.
Preparing to launch Oracle Universal Installer from /tmp/OraInstall2010-09-10_10-43-12AM. Please wait ...Oracle Universal Installer, Version 10.2.0.5.0 Production
Copyright (C) 1999, 2010, Oracle. All rights reserved.

You can find a log of this install session at:
/home/oracle/oraInventory/logs/cloneActions2010-09-10_10-43-12AM.log
.................................................................................................... 100% Done.

Installation in progress (Friday, September 10, 2010 10:43:28 AM BST)
...........................................................................                                                     75% Done.
Install successful

Linking in progress (Friday, September 10, 2010 10:43:39 AM BST)
Link successful

Setup in progress (Friday, September 10, 2010 10:44:17 AM BST)
Setup successful

End of install phases.(Friday, September 10, 2010 10:44:21 AM BST)
WARNING:A new inventory has been created in this session. However, it has not yet been registered as the central inventory of this system.
To register the new inventory please run the script '/home/oracle/oraInventory/orainstRoot.sh' with root privileges.
If you do not register the inventory, you may not be able to update or patch the products you installed.
The following configuration scripts need to be executed as the "root" user.
#!/bin/sh
#Root script to run
/home/oracle/oraInventory/orainstRoot.sh
/opt/app/oracle/product/10.2.0/ent/root.sh
To execute the configuration scripts:
1. Open a terminal window
2. Log in as "root"
3. Run the scripts

The cloning of 10ghome was successful.
Please check '/home/oracle/oraInventory/logs/cloneActions2010-09-10_10-43-12AM.log' for more details.
oraInst.loc would have been created inside oraInventory (this is linux x86_64) instead of in /etc


Using runInstaller

1. Step 1,2 are same as above.

2. Run the runInstaller command from $ORACLE_HOME/oui/bin
./runInstaller -silent -clone ORACLE_HOME=/opt/app/oracle/product/10.2.0/ent ORACLE_HOME_NAME="10g2home1"
Starting Oracle Universal Installer...

No pre-requisite checks found in oraparam.ini, no system pre-requisite checks will be executed.
Preparing to launch Oracle Universal Installer from /tmp/OraInstall2010-09-10_11-49-48AM. Please wait ...
Oracle Universal Installer, Version 10.2.0.5.0 Production
Copyright (C) 1999, 2010, Oracle. All rights reserved.

You can find a log of this install session at:
/home/oracle/oraInventory/logs/cloneActions2010-09-10_11-49-48AM.log
.................................................................................................... 100% Done.

Installation in progress (Friday, September 10, 2010 11:50:05 AM BST)
...........................................................................                                                     75% Done.
Install successful

Linking in progress (Friday, September 10, 2010 11:50:15 AM BST)
Link successful

Setup in progress (Friday, September 10, 2010 11:50:55 AM BST)
Setup successful

End of install phases.(Friday, September 10, 2010 11:50:59 AM BST)
WARNING:A new inventory has been created in this session. However, it has not yet been registered as the central inventory of this system.
To register the new inventory please run the script '/home/oracle/oraInventory/orainstRoot.sh' with root privileges.
If you do not register the inventory, you may not be able to update or patch the products you installed.
The following configuration scripts need to be executed as the "root" user.
#!/bin/sh
#Root script to run
/home/oracle/oraInventory/orainstRoot.sh
/opt/app/oracle/product/10.2.0/ent/root.sh
To execute the configuration scripts:
1. Open a terminal window
2. Log in as "root"
3. Run the scripts

The cloning of 10g2home1 was successful.
Please check '/home/oracle/oraInventory/logs/cloneActions2010-09-10_11-49-48AM.log' for more details.
Same as before oraInst.loc is inside oraInventory not in /etc.

On 11gR1 and 11gR2 ORACLE_BASE must also be specified in the command line if not cloning will fail.
./runInstaller -silent -clone ORACLE_HOME=/opt/app/oracle/product/11.2.0/ent ORACLE_HOME_NAME="11g2home1"
Starting Oracle Universal Installer...

Checking swap space: must be greater than 500 MB.   Actual 16002 MB    Passed
Preparing to launch Oracle Universal Installer from /tmp/OraInstall2010-09-10_11-07-10AM. Please wait ...
Oracle Universal Installer, Version 11.2.0.1.0 Production
Copyright (C) 1999, 2009, Oracle. All rights reserved.

You can find the log of this install session at:
/home/oracle/oraInventory/logs/cloneActions2010-09-10_11-07-10AM.log
Values for the following variables could not be obtained from the command line or response file(s):
ORACLE_BASE
Cloning cannot continue.
This is same even if clone.pl was used
perl clone.pl ORACLE_HOME=/opt/app/oracle/product/11.2.0/ent ORACLE_HOME_NAME="11g2home1"
ERROR: Invalid Oracle Base specified. Aborting the clone operation. 
Once oracle base is specified cloning will suceed.
./runInstaller -silent -clone ORACLE_HOME=/opt/app/oracle/product/11.2.0/ent ORACLE_HOME_NAME="11g2home1" ORACLE_BASE=/opt/app/oracle
Starting Oracle Universal Installer...

Checking swap space: must be greater than 500 MB.   Actual 16002 MB    Passed
Preparing to launch Oracle Universal Installer from /tmp/OraInstall2010-09-10_12-12-04PM. Please wait ...
Oracle Universal Installer, Version 11.2.0.1.0 Production
Copyright (C) 1999, 2009, Oracle. All rights reserved.

You can find the log of this install session at:
/home/oracle/oraInventory/logs/cloneActions2010-09-10_12-12-04PM.log
.................................................................................................... 100% Done.

Installation in progress (Friday, September 10, 2010 12:12:14 PM BST)
.............................................................................                                                   77% Done.
Install successful

Linking in progress (Friday, September 10, 2010 12:12:20 PM BST)
Link successful

Setup in progress (Friday, September 10, 2010 12:12:51 PM BST)
Setup successful

End of install phases.(Friday, September 10, 2010 12:13:38 PM BST)
Starting to execute configuration assistants
Configuration assistant "Oracle Configuration Manager Clone" succeeded
WARNING:A new inventory has been created in this session. However, it has not yet been registered as the central inventory of this system.
To register the new inventory please run the script '/home/oracle/oraInventory/orainstRoot.sh' with root privileges.
If you do not register the inventory, you may not be able to update or patch the products you installed.
The following configuration scripts need to be executed as the "root" user.
/home/oracle/oraInventory/orainstRoot.sh
/opt/app/oracle/product/11.2.0/ent/root.sh
To execute the configuration scripts:
1. Open a terminal window
2. Log in as "root"
3. Run the scripts

The cloning of 11g2home1 was successful.
Please check '/home/oracle/oraInventory/logs/cloneActions2010-09-10_12-12-04PM.log' for more details.


Creating Oracle Inventory

Oracle inventory could be created with runInstaller -attachHome option. Unlike the cloning process which includes creation of oracle inventory as well as relinking all libraries this only creates an inventory. Libraries must be manually relinked with relink -all option if have to. This option is prefered when inventory is damaged as it doesn't requires any relinking.

This method is also useful When copying an oracle home to a new server which doesn't have an oracle inventory but manual relinking of libraries is needed afterwards.

1. tar the source binaries
tar --exclude=excludeList -cvzf 10g2.tgz /opt/app/oracle/product/10.2.0/ent
Exclude log files, EM console related files in OC4J

2. Extract the tar file at the destination.

3. Set Oracle Base and Oracle Home variables, it is not necessary, if Oracle Base is not set and no oraInst.loc file exists oraInventory will be created in /home/oracle
export ORACLE_BASE=/opt/app/oracle
export ORACLE_HOME=`pwd`
export ORACLE_HOME=/opt/app/oracle/product/10.2.0/ent
cd $ORACLE_HOME/oui/bin/

./runInstaller -silent -attachHome ORACLE_HOME=$ORACLE_HOME ORACLE_HOME_NAME="10ghome"
Starting Oracle Universal Installer...

No pre-requisite checks found in oraparam.ini, no system pre-requisite checks will be executed.
OiiolLogger.addFileHandler:Error while adding file handler - /opt/app/oraInventory/logs/AttachHome2010-09-08_02-40-58PM.log
java.io.FileNotFoundException: /opt/app/oraInventory/logs/AttachHome2010-09-08_02-40-58PM.log (No such file or directory)
'AttachHome' failed.
As seen from above output runInstaller doesn't creates the oraInventory inside the oracle base directorywhich is the default location when installing 10g. (Oracle binaries used here have been patched to 10.2.0.5).

4. To fix this problem create the oraInventory manually
mkdir -p /opt/app/oraInventory/
chown oracle:oinstall oraInventory
chmod 770 oraInventory
5. Run the command again
./runInstaller -silent -attachHome ORACLE_HOME=$ORACLE_HOME ORACLE_HOME_NAME="10ghome"
Starting Oracle Universal Installer...

No pre-requisite checks found in oraparam.ini, no system pre-requisite checks will be executed.
The inventory pointer is located at /etc/oraInst.loc
The inventory is located at /opt/app/oraInventory
Please execute the '/opt/app/oraInventory/orainstRoot.sh' script at the end of the session.
'AttachHome' was successful.

/opt/app/oraInventory/orainstRoot.sh
Changing permissions of /opt/app/oraInventory to 770.
Changing groupname of /opt/app/oraInventory to oinstall.
The execution of the script is complete
This would have created the necessary files inside the oraInventory, added the oracle home to inventory and oraInst.loc file in /etc.

5. Use relink -all to relink the oracle binaries.