Wednesday, June 30, 2010

Upgrading 10.2.0.5 Clusterware, ASM, RAC to 11.1.0.7

This uses the installation and databases created when installing 10gR2 on RHEL 5, which used raw devices for the OCR, voting disk, ASM spfile and ASM disks (instead of ASMLib).

This is a brief blog with only the main points. More details are in MetaLink note 338706.1 and the Oracle upgrade guide.

Upgrade the clusterware to 11.1.0.6


1. Run preupdate.sh as the root user; the script is available in the clusterware installation media.
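A sketch of the invocation (run as root on each node), assuming the 11.1 clusterware media is staged under /stage/clusterware (an example path) and using the CRS home and owner from this setup:
# run as the root user on each node of the cluster
cd /stage/clusterware/upgrade
./preupdate.sh -crshome /opt/app/crs/product/10.2.0/crs -crsuser oracle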

2. Run runInstaller; it automatically selects the existing clusterware installation. Click Next through the remaining screens and install.

3. After the installation run rootupgrade.sh as root
[root@hpc1 oracle]# /opt/app/crs/product/10.2.0/crs/install/rootupgrade
Checking to see if Oracle CRS stack is already up...

copying ONS config file to 11.1 CRS home
/bin/cp: `/opt/app/crs/product/10.2.0/crs/opmn/conf/ons.config' and `/opt/app/crs/product/10.2.0/crs/opmn/conf/ons.config' are the same file
/opt/app/crs/product/10.2.0/crs/opmn/conf/ons.config was copied successfully to /opt/app/crs/product/10.2.0/crs/opmn/conf/ons.config
Oracle Cluster Registry configuration upgraded successfully
Adding daemons to inittab
Attempting to start Oracle Clusterware stack
Waiting for Cluster Synchronization Services daemon to start
Cluster Synchronization Services daemon has started
Event Manager daemon has started
Cluster Ready Services daemon has started
Oracle CRS stack is running under init(1M)
clscfg: EXISTING configuration version 3 detected.
clscfg: version 3 is 10G Release 2.
Successfully accumulated necessary OCR keys.
Using ports: CSS=49895 CRS=49896 EVMC=49898 and EVMR=49897.
node :
node 1: hpc1 hpc1-pvt hpc1
Creating OCR keys for user 'root', privgrp 'root'..
Operation successful.
clscfg -upgrade completed successfully
CRS stack on this node, is successfully upgraded to 11.1.0.6.0
Checking the existence of nodeapps on this node
Creating '/opt/app/crs/product/10.2.0/crs/install/paramfile.crs' with data used for CRS configuration
Setting CRS configuration values in /opt/app/crs/product/10.2.0/crs/install/paramfile.crs
4. Click OK on the OUI prompt, which runs cluvfy.

5. All done! The clusterware (now on 11.1.0.6), along with ASM and RAC (still on 10.2.0.5), is up and running.
[oracle@hpc1 clusterwaer ent11g2]$ crsctl query crs softwareversion hpc1
Oracle Clusterware version on node [hpc1] is [11.1.0.6.0]
[oracle@hpc1 clusterwaer ent11g2]$ crsctl query crs activeversion
Oracle Clusterware active version on the cluster is [11.1.0.6.0]


Upgrade the clusterware to 11.1.0.7


1. runInstaller

2. Click through the Next buttons and install, then at the end run root111.sh as the root user
[root@hpc1 oracle]# /opt/app/crs/product/10.2.0/crs/install/root111.sh
Creating pre-patch directory for saving pre-patch clusterware files
Completed patching clusterware files to /opt/app/crs/product/10.2.0/crs
Relinking some shared libraries.
Relinking of patched files is complete.
Preparing to recopy patched init and RC scripts.
Recopying init and RC scripts.
Startup will be queued to init within 30 seconds.
Starting up the CRS daemons.
Waiting for the patched CRS daemons to start.
This may take a while on some systems.
.
11107 patch successfully applied.
clscfg: EXISTING configuration version 4 detected.
clscfg: version 4 is 11 Release 1.
Successfully accumulated necessary OCR keys.
Using ports: CSS=49895 CRS=49896 EVMC=49898 and EVMR=49897.
node :
node 1: hpc1 hpc1-pvt hpc1
Creating OCR keys for user 'root', privgrp 'root'..
Operation successful.
clscfg -upgrade completed successfully
Creating '/opt/app/crs/product/10.2.0/crs/install/paramfile.crs' with data used for CRS configuration
Setting CRS configuration values in /opt/app/crs/product/10.2.0/crs/install/paramfile.crs
Setting cluster unique identifier
Restarting Oracle clusterware
Stopping Oracle clusterware
Stopping resources.
This could take several minutes.
Successfully stopped Oracle Clusterware resources
Stopping Cluster Synchronization Services.
Shutting down the Cluster Synchronization Services daemon.
Shutdown request successfully issued.
Waiting for Cluster Synchronization Services daemon to stop
Cluster Synchronization Services daemon has stopped
Starting Oracle clusterware
Attempting to start Oracle Clusterware stack
Waiting for Cluster Synchronization Services daemon to start
Waiting for Cluster Synchronization Services daemon to start
Waiting for Cluster Synchronization Services daemon to start
Waiting for Cluster Synchronization Services daemon to start
Cluster Synchronization Services daemon has started
Event Manager daemon has started
Cluster Ready Services daemon has started
3. All done!! The clusterware is now upgraded to 11.1.0.7
[oracle@hpc1 ~ ent11g2]$ crsctl query crs softwareversion hpc1
Oracle Clusterware version on node [hpc1] is [11.1.0.7.0]
[oracle@hpc1 ~ ent11g2]$ crsctl query crs activeversion
Oracle Clusterware active version on the cluster is [11.1.0.7.0]


Upgrading ASM & RAC to 11.1.0.7

This setup used a single home for both ASM and the database. If a separate ASM home is installed as part of the upgrade, substitute it where appropriate.

1. This is from the upgrade guide "You must choose an Oracle home directory for Oracle Database 11g Release 1 (11.1) that is separate from the Oracle home directory of your current release. You cannot install the new software into the same Oracle home directory as your current release, unless you are installing an Oracle Database 11g Release 1 (11.1) patch set release. For a patch set release, you can use the same Oracle Database 11g Release 1 (11.1) Oracle home."

2. Install the Oracle Database 11g software (11.1.0.6) in a different location, as a cluster installation, while the clusterware is running.

3. Upgrade it to 11.1.0.7

4. Upgrading ASM (more on MetaLink note 736127.1)

4.1 Shut down the database instances that run on the ASM instances to be upgraded.

4.2 Start the ASM instance from the 10gR2 home if it is not already started.
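A sketch of starting ASM on the first node; the 10gR2 home path below is an assumption for this setup:
export ORACLE_HOME=/opt/app/oracle/product/10.2.0/clusdb
export ORACLE_SID=+ASM1
export PATH=$ORACLE_HOME/bin:$PATH
srvctl start asm -n hpc1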

4.3 From the newly installed 11gR1 home run dbua and select the ASM instance upgrade.
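For example, launching dbua from the new 11gR1 home (the path is the one that later appears in /etc/oratab); the ASM upgrade option is then offered in the GUI:
export ORACLE_HOME=/opt/app/oracle/product/11.1.0/clusdb
export PATH=$ORACLE_HOME/bin:$PATH
dbua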


4.4 If a new listener has not been created in the 11gR1 home, select Yes for the listener upgrade.


4.5 Click Next through the remaining screens and finish


4.6 Watch the progress


4.7 ASM upgrade is done!


5. Upgrading the RAC Database

5.1 Start the database instances using the 10g home

5.2 Run dbua from the new 11g home

5.3 Select upgrade database, continue as usual and click finish


5.4 During the upgrade the following error appeared, but it could be ignored and the upgrade continued (screenshot from the summary at the end of the upgrade)


5.5 End of the upgrade, with the Spatial and OLAP API components reported invalid


5.6 Upgrade result


5.7 New entries automatically added to /etc/oratab
+ASM1:/opt/app/oracle/product/11.1.0/clusdb:N
clusdb:/opt/app/oracle/product/11.1.0/clusdb:N
5.8 Querying dba_registry from the SQL prompt did not show OLAP as invalid.
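The check was a simple query against dba_registry; a sketch (the formatting commands are only for readability):
sqlplus -s / as sysdba <<'EOF'
set pagesize 100 linesize 120
column comp_name format a40
select comp_name, status from dba_registry;
EOF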
COMP_NAME                                STATUS
---------------------------------------- ----------
Oracle Enterprise Manager                VALID
OLAP Catalog                             VALID
Spatial                                  INVALID
Oracle Multimedia                        VALID
Oracle XML Database                      VALID
Oracle Text                              VALID
Oracle Data Mining                       VALID
Oracle Expression Filter                 VALID
Oracle Rules Manager                     VALID
Oracle Workspace Manager                 VALID
Oracle Database Catalog Views            VALID
Oracle Database Packages and Types       VALID
JServer JAVA Virtual Machine             VALID
Oracle XDK                               VALID
Oracle Database Java Packages            VALID
OLAP Analytic Workspace                  VALID
Oracle OLAP API                          VALID
Oracle Real Application Clusters         VALID
Tried running the database scripts to fix the Spatial component problem but still ended up with
OBJECT_NAME                    OBJECT_TYPE         STATUS
------------------------------ ------------------- ----------
QRY2OPT                        FUNCTION            INVALID
SDO_PQRY                       FUNCTION            INVALID
Googling showed it may be related to Spatial component errors specific to version 11.1.0.7. Didn't investigate any further.

Installing 10gR2 clusterware on RHEL 5

In RHEL/OEL 5 and onwards raw devices are deprecated (raw devices were initially deprecated in EL5 (GA-U3), but later un-deprecated from EL5 U4). That would leave OCFS2 as the only option for storing the OCR, voting disks and even the ASM spfile. However, there is a way to enable raw devices on RHEL/OEL 5; the process is explained in MetaLink note 465001.1 for single-path raw devices and 564580.1 for multipath raw devices.

This blog covers the single-path case.

1. Edit scsi_id.config so that scsi_id returns the device ID. This is done by adding options=-g to the /etc/scsi_id.config file.
# some libata drives require vpd page 0x80
vendor="ATA",options=-b 0x80
options=-g
2. Get the device IDs with
# /sbin/scsi_id -g -s /block/sdc/sdc1
360a98000686f6959684a45124114174
Note down these IDs; they will be used in the next step.
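If several partitions are involved, the IDs can be collected in one pass; a sketch where the device names are only examples:
for part in sdc1 sdd1; do
  disk=${part%%[0-9]*}    # strip the partition number to get the parent block device
  echo "$part: $(/sbin/scsi_id -g -s /block/$disk/$part)"
done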

3. Create a udev rules file in
vi /etc/udev/rules.d/61-oracleraw.rules
and add the following lines
ACTION=="add", KERNEL=="sd*", PROGRAM=="/sbin/scsi_id", RESULT=="360a98000686f6959684a45124114174", RUN+="/bin/raw /dev/raw/raw1 %N"
ACTION=="add", KERNEL=="sd*", PROGRAM=="/sbin/scsi_id", RESULT=="360a98000686f6959684a451241131151", RUN+="/bin/raw /dev/raw/raw2 %N"
For each rule, if all the specified keys (KERNEL, PROGRAM, RESULT) are matched, the raw device binding is created. If one or more keys are unmatched, the rule is completely ignored and the default (arbitrary) kernel-assigned device file name is assigned to the device.
The 'RUN+=' directive is always the very last directive to execute in a rules file, therefore a separate file is used to set permissions on the resultant raw devices.
4. Create another udev rules file to set the permissions on the raw devices.
 vi /etc/udev/rules.d/65-raw-permissions.rules
and add the following entries
KERNEL=="raw1", OWNER="root", GROUP="oinstall", MODE="640"
KERNEL=="raw2", OWNER="oracle", GROUP="oinstall", MODE="640"
5. Check the udev rules with
udevtest /block/sdc/sdc1
The device path should be given relative to the sysfs mount point /sys (e.g. /block/sdc/sdc1). Restart the udev service with
start_udev
and verify the raw devices are enabled with
raw -qa
and the permissions on the raw devices with
ls -l /dev/raw/* 

The MetaLink note also describes a way to create named devices. This is not related to the raw device bindings.
vi /etc/udev/rules.d/55-oracle-naming.rules
and add the following lines
KERNEL=="sd*", BUS=="scsi", PROGRAM=="/sbin/scsi_id", RESULT=="360a98000686f6959684a45124114174", NAME="ocr1", OWNER="root", GROUP="oinstall", MODE="0640"
KERNEL=="sd*", BUS=="scsi", PROGRAM=="/sbin/scsi_id", RESULT=="360a98000686f6959684a451241131151", NAME="vote1", OWNER="oracle", GROUP="oinstall", MODE="0640"
For each rule, if all specified keys (KERNEL, BUS, PROGRAM, RESULT) are matched, the rule is applied and the specified assignments (NAME, OWNER, GROUP, MODE) are made and applied to the device. If, however, one or more keys are unmatched, the rule is completely ignored and the default (arbitrary) kernel-assigned device file name is assigned to devices.
Restart the udev service.
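After restarting udev the named devices and their permissions can be checked with something like:
start_udev
ls -l /dev/ocr1 /dev/vote1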

For systems running on low-end hardware for demonstration or proof-of-concept purposes this might not work. The Maxtor external drive that was used to test this installation didn't return a device ID with the above command.
/sbin/scsi_id -g -s /block/sdc/sdc5
1Maxtor OneTouch 2HA1BNAG
In this case the udev rules file can be created with a rule for each individual partition.
vi 61-oracleraw.rules
ACTION=="add", KERNEL=="sdc5", RUN+="/bin/raw /dev/raw/raw1 %N"
ACTION=="add", KERNEL=="sdc6", RUN+="/bin/raw /dev/raw/raw2 %N"
ACTION=="add", KERNEL=="sdc7", RUN+="/bin/raw /dev/raw/raw3 %N"
ACTION=="add", KERNEL=="sdc8", RUN+="/bin/raw /dev/raw/raw4 %N"
ACTION=="add", KERNEL=="sdc9", RUN+="/bin/raw /dev/raw/raw5 %N"
and another rules file for permissions.
vi 65-raw-permissions.rules
KERNEL=="raw1", OWNER="root", GROUP="oinstall", MODE="640"
KERNEL=="raw2", OWNER="oracle", GROUP="oinstall", MODE="660"
KERNEL=="raw3", OWNER="oracle", GROUP="oinstall", MODE="660"
KERNEL=="raw4", OWNER="oracle", GROUP="dba", MODE="660"
KERNEL=="raw5", OWNER="oracle", GROUP="dba", MODE="660"
Restart the udev service and verify raw devices are enabled.

There are several other issues when installing on RHEL 5, and these are documented in MetaLink note 414163.1
1. To run the installer the -ignoreSysPrereqs option is needed
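For example, from the staged installation media directory:
./runInstaller -ignoreSysPrereqs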

2. The Virtual IP Configuration Assistant (VIPCA) fails at the end of root.sh. To solve this, before running root.sh edit
$CRS_HOME/bin/vipca
and unset the LD_ASSUME_KERNEL variable.
if [ "$arch" = "i686" -o "$arch" = "ia64" -o "$arch" = "x86_64" ]
then
LD_ASSUME_KERNEL=2.4.19
export LD_ASSUME_KERNEL
fi

unset LD_ASSUME_KERNEL #add this line

Also edit the following files
$CRS_HOME/bin/srvctl
$RDBMS_HOME/bin/srvctl
$ASM_HOME/bin/srvctl
and unset the same variable as above
LD_ASSUME_KERNEL=2.4.19
export LD_ASSUME_KERNEL

unset LD_ASSUME_KERNEL # add this line
These changes may be necessary again after applying the 10.2.0.2 or 10.2.0.3 patchsets, as those patchsets still include these settings, which are unnecessary on OEL5, RHEL5 or SLES10. This is fixed in the 10.2.0.4 and 10.2.0.5 patchsets.
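The edit can also be scripted; a sketch using sed, assuming $CRS_HOME, $ORACLE_HOME and $ASM_HOME point to the respective homes (the files are backed up first):
for f in $CRS_HOME/bin/vipca $CRS_HOME/bin/srvctl $ORACLE_HOME/bin/srvctl $ASM_HOME/bin/srvctl; do
  cp -p "$f" "$f.orig"                                                 # keep a backup copy
  sed -i '/export LD_ASSUME_KERNEL/a unset LD_ASSUME_KERNEL' "$f"      # unset right after it is exported
done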

3. VIPCA will again fail if the VIP addresses are in a non-routable range, e.g. 10.x.x.x, 172.[16-31].x.x or 192.168.x.x. If the OUI window is open, click OK and it will create the "oifcfg" information. Run VIPCA manually as root and, once done, retry the cluvfy step at the end of OUI; it should succeed.
If not, then use oifcfg setif -global to register the public and cluster_interconnect interfaces, as in the sketch below.
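The interface names and subnets here are only examples; substitute the values for your network:
oifcfg setif -global eth0/192.168.0.0:public
oifcfg setif -global eth1/10.0.0.0:cluster_interconnect
oifcfg getif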

Installing the RAC database software on RHEL 5 after the clusterware did not throw any errors.

Tuesday, June 29, 2010

Ubuntu Cloud Troubleshooting

Anyone interested in setting up an Ubuntu cloud could follow this tutorial.

Tried both the 9.10 and 10.04 versions, but still had no luck creating an instance on the private cloud.

Examining /var/log/eucalyptus/cc.log turned up the following line
[Wed May 19 14:23:04 2010][001218][EUCAWARN  ] vnetInitTunnels(): in MANAGED-NOVLAN mode, priv interface 
'eth0' must be a bridge, tunneling disabled
Changed the eth0 to a bridge
auto eth0
iface eth0 inet manual

auto br0
iface br0 inet static
address 192.168.0.74
netmask 255.255.255.0
network 192.168.0.0
broadcast 192.168.0.255
gateway 192.168.0.100
# dns-* options are implemented by the resolvconf package, if installed
dns-nameservers 100.00.00.00
dns-search domain.net
bridge_ports eth0
bridge_fd 9
bridge_hello 2
bridge_maxage 12
bridge_stp off
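After restarting networking, the bridge can be verified with something like:
sudo /etc/init.d/networking restart
brctl show
ip addr show br0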

Edited /etc/eucalyptus/eucalyptus.conf to use br0 (changed from the default eth0)
# Affects: CC, NC
# See: **NOTE** below
ENABLE_WS_SECURITY="Y"
LOGLEVEL="DEBUG"
VNET_PUBINTERFACE="br0"
VNET_PRIVINTERFACE="br0"
VNET_MODE="MANAGED-NOVLAN"

Edited /etc/eucalyptus/eucalyptus.local.conf to change the default subnet to a network-specific one
# network configuration from the input configuration file
VNET_MODE="MANAGED-NOVLAN"
#VNET_SUBNET="172.19.0.0"
VNET_SUBNET="192.168.0.0"
VNET_NETMASK="255.255.0.0"
VNET_DNS="100.00.00.00"
VNET_ADDRSPERNET="32"
VNET_PUBLICIPS="192.168.0.76-192.168.0.86"
Restarted with
sudo restart eucalyptus-cc CLEAN=1
and the problem went away; an instance could then be created on the private cloud.

Wednesday, June 16, 2010

Maximum on Oracle Cluster

What is the maximum number of nodes under OCFS on Linux?
Oracle 9i RAC on Linux, using OCFS for datafiles, can scale to a maximum of 32 nodes. According to the OCFS2 User's Guide, OCFS2 can support up to 255 nodes.

What is the maximum number of nodes I can have in my cluster if I am using OCFS2?
Theoretically you can have up to 255; however, it has been tested with up to 16 nodes.

What is the maximum distance between nodes in an extended RAC environment?
The high impact of latency creates practical limitations as to where this architecture can be deployed. While there is no fixed distance limitation, the additional latency on I/O round trips and on one-way cache fusion traffic will have an effect on performance as distance increases. For example, tests at 100 km showed a 3-4 ms impact on I/O and a 1 ms impact on cache fusion, so the farther the distance, the greater the impact on performance. This architecture fits best where the two data centers are relatively close (<~25 km) and the impact is negligible. Most customers implement under this distance, with only a handful above, and the farthest known example is at 100 km. Sites at larger distances than commonly implemented may want to estimate or measure the performance hit on their application before implementing. Also ensure a proper setup of SAN buffer credits to limit the impact of distance at the I/O layer.

How many nodes can be had in an HP-UX/Solaris/AIX/Windows/Linux cluster?
The number of nodes supported is not limited by Oracle, but more generally by the clustering software/hardware in question.

When using solely Oracle Clusterware: 63 nodes (Oracle 9i or Oracle RAC 10g Release 1). With 10g Release 2, the maximum is 100 nodes.
When using a third party clusterware:
Sun: 8
HP UX: 16
HP Tru64: 8
IBM AIX:
* 8 nodes for Physical Shared (CLVM) SSA disk
* 16 nodes for Physical Shared (CLVM) non-SSA disk
* 128 nodes for Virtual Shared Disk (VSD)
* 128 nodes for GPFS
* Subject to storage subsystem limitations
Veritas: 8-16 nodes (check w/ Veritas)
For 3rd party vendor clusterware, please check with the vendor.

How many nodes are supported in a RAC Database?
With 10g Release 2, 100 nodes are supported in a cluster using Oracle Clusterware, and 100 instances in a RAC database. Currently DBCA has a bug where it will not go beyond 63 instances, and there is also a documentation bug for the max-instances parameter. With 10g Release 1 the maximum is 63. In 9i it is platform-specific due to the differing clusterware support by vendors; see the platform-specific FAQ for 9i. With 11g Release 1 and 11g Release 2 the maximum is 100.

more on MetaLink note 220970.1

Wednesday, June 9, 2010

EM Console startup/shutdown Error in 11gR2 Cluster

Once the database is installed and started, the EM console is started automatically.
Depending on how the PATH variable is set, the following error may be encountered when trying to stop the EM console
[oracle@hpc1 ~ grid1]$ emctl stop dbconsole
Can't locate CompEMdbconsole.pm in @INC (@INC contains: %s_javapOracleHome%
/sysman/admin/scripts %s_javapOracleHome%/bin .....
The problem is that as of 11gR2 both the grid infrastructure home and the Oracle database home have an emctl in their bin directories.
/opt/app/11.2.0/grid/bin
[oracle@hpc1 bin grid1]$ ls | grep emctl
emctl
emctl.pl
emctl.template
Therefore, when ORA_CRS_HOME/bin appears first in the PATH instead of ORACLE_HOME/bin, the above error will be thrown.
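A sketch of setting the PATH so that emctl resolves to the database home first (the database home path below is an assumption; the grid home path is the one shown above):
export ORACLE_HOME=/opt/app/oracle/product/11.2.0/db_1
export PATH=$ORACLE_HOME/bin:/opt/app/11.2.0/grid/bin:$PATH
which emctl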
Also, ORACLE_UNQNAME is now required when starting EM, otherwise
Environment variable ORACLE_UNQNAME not defined. 
Please set ORACLE_UNQNAME to database unique name
which wasn't the case in earlier releases.

Simply set it to the database unique name (not the instance SID)
export ORACLE_UNQNAME=grid
emctl start dbconsole

11gR2 Grid Infrastructure Offline Targets

After a fresh installation of Oracle 11gR2 Grid Infrastructure (formerly clusterware), you will notice that several resources are offline.
crs_stat -t
HA Resource                                   Target     State
-----------                                   ------     -----   
ora.gsd                                       OFFLINE    OFFLINE
ora.hpc1.gsd                                  OFFLINE    OFFLINE
ora.oc4j                                      OFFLINE    OFFLINE
This is the expected behavior as per the "Grid Infrastructure Installation Guide" section 5 and MetaLink notes 429966.1 and 1068835.1.

The Global Services Daemon (10g and above) exists to service requests from 9i RAC management clients; therefore, when there are no 9i databases present, there is nothing for GSD to do.

Therefore, if there are no 9i instances running on the 11gR2 grid, you can safely ignore this target being offline.

Secondly, ora.oc4j is offline in 11.2.0.1 as Database Workload Management (DBWLM) is unavailable.

Tuesday, June 1, 2010

Hang Analyze and System State Dumps.

Oracle Support requests hang analysis and system state dumps when raising an SR.

On 10.1 or higher versions log in with
sqlplus -prelim / as sysdba
To do a hanganalyze
oradebug setmypid;
oradebug unlimit;
oradebug hanganalyze 3
Wait 60 - 90 seconds and run the last command again to identify the process state changes.

To get a systemstate dump

oradebug setmypid;
oradebug unlimit;
oradebug dump systemstate 266
Wait 60 - 90 seconds and run again to identify the system state changes.
If it is taking too long then cancel and run with level 258.
On a cluster environment use
oradebug -g all hanganalyze 3
for hang analyze and
oradebug -g all dump systemstate 266
for system state dumps.
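Putting the cluster-wide commands together on 10g/11g, a capture session might look like the following sketch (the sleep simply leaves time between the two hanganalyze snapshots):
sqlplus -prelim / as sysdba <<'EOF'
oradebug setmypid
oradebug unlimit
oradebug -g all hanganalyze 3
host sleep 90
oradebug -g all hanganalyze 3
oradebug -g all dump systemstate 266
EOF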

In a cluster, verify that the *diag* trace file (e.g. admin/tbxdb/bdump/tbxdb1_diag_21606.trc) is updated with information on each node.
more on MetaLink notes 452358.1 and 175006.1



On 19.5, with a preliminary connection the unlimit command fails.
SQL> oradebug unlimit;
ORA-02096: specified initialization parameter is not modifiable with this option
MOS Doc 2506099.1 suggests using max_dump_file_size, but that also fails with a preliminary connection.
 sqlplus -prelim "/ as sysdba"

SQL*Plus: Release 19.0.0.0.0 - Production on Thu Feb 20 09:14:48 2020
Version 19.5.0.0.0

Copyright (c) 1982, 2019, Oracle.  All rights reserved.

SQL>  oradebug setmypid;
Statement processed.
SQL> alter session set max_dump_file_size=unlimited;
alter session set max_dump_file_size=unlimited
*
ERROR at line 1:
ORA-01012: not logged on
Process ID: 0
Session ID: 0 Serial number: 0
A workaround is to use an existing process ID with "oradebug setospid {SMON or PMON PID}" as explained in 452358.1.

ORA-01460 when foreign key column on different scale

ORA-01460: unimplemented or unreasonable conversion requested will be thrown when you try to delete from a parent table that is referenced by a foreign key column (declared with "on delete cascade") of a different scale.

Test as below

create table x (a number, b varchar2(100 byte) primary key);
create table y (c varchar2(30 BYTE), d number,
foreign key(c) references x (b) on delete cascade);

Insert into X (A,B) values (2,'abcdefghijklmnopqrstuvwxyzabcdefg');
Insert into X (A,B) values (3,'abcdefghijklmnopqrstuvwxyzabcdefgi');
Insert into X (A,B) values (4,'abcdefghijklmnopqrstuvwxyz abcdf');
Insert into X (A,B) values (5,'abcdefghijklmnopqrstuvwxyz abcdfg');
Insert into X (A,B) values (6,'abcdefghijklmnopqrstuvwxyz abcdfgi');


Try
 delete from x where a = 3;

and the following error
delete from x where a = 3
*
ERROR at line 1:
ORA-01460: unimplemented or unreasonable conversion requested
will be thrown on Oracle versions 10.2 and 11.1, but not on 11.2.

On 11.2 the delete operation succeeds without an error.

To fix the problem, both the referencing and referenced columns should have the same scale, i.e. both should be varchar2(100 byte) in this example. Once the change is made, the delete operation works on both 10.2 and 11.1.
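A sketch of applying the fix and retrying the delete; testuser/testpass is a hypothetical account owning the test tables:
sqlplus -s testuser/testpass <<'EOF'
alter table y modify (c varchar2(100 byte));
delete from x where a = 3;
commit;
EOF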