Friday, September 4, 2009

Oracle 11gR2 Grid Infrastructure Installation

One of the new "features" of Oracle 11gR2 is the Grid Infrastructure. Oracle 11gR2 Grid Infrastructure is the collection of infrastructural components provided for Oracle Database and other software.
I tested the installation of Oracle Grid Infrastructure on two virtual machines running OEL 5.3 under VMware Server 1.08. I suppose this setup is not supported, but it gives an idea of what to expect during the installation. The screenshots below show the installation procedure. I will add comments where appropriate, especially to highlight new features.


The first installation screen. The bullet shows which option I chose here.

Usually the Advanced option is for experienced users. Well, at least I think I am experienced, I suppose...

Languages. Has anyone ever chosen additional languages here? I never did.

Nice new feature here: SCAN (Single Client Access Name). All the servers in the cluster answer to a single hostname. This way, the cluster becomes completely transparent to clients.
GNS: If you enable Grid Naming Service (GNS), then name resolution requests to the cluster are delegated to the GNS, which is listening on the GNS virtual IP address. You define this address in the DNS domain before installation. The DNS must be configured to delegate resolution requests for cluster names (any names in the subdomain delegated to the cluster) to the GNS. When a request comes to the domain, GNS processes the requests and responds with the appropriate addresses for the name requested.
To use GNS, before installation the DNS administrator must establish DNS Lookup to direct DNS resolution of a subdomain to the cluster. If you enable GNS, then you must have a DHCP service on the public network that allows the cluster to dynamically allocate the virtual IP addresses as required by the cluster.
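Once the delegation is in place, a simple way to see it working is to resolve a cluster name from a client and count the addresses that come back. The following is a minimal sketch, not something taken from the installer. The `resolver` parameter is an assumption of mine, added only so the function can be tried without a live DNS.

```python
import socket


def resolve_addresses(name, resolver=socket.getaddrinfo):
    """Return the sorted, de-duplicated IPv4 addresses a host name resolves to."""
    # getaddrinfo returns 5-tuples; the last element is the socket address,
    # and its first item is the IP address as a string.
    results = resolver(name, None, socket.AF_INET, socket.SOCK_STREAM)
    return sorted({sockaddr[0] for *_, sockaddr in results})
```

Calling `resolve_addresses()` with a name from the delegated subdomain should return every address DNS hands out for it. A name that is only listed in a hosts file will normally give a single address.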
Validation is performed... and accepted!

As a result of enabling GNS, I can no longer assign VIP addresses to my cluster node names. These addresses are assigned by GNS from now on. Of course, if I hadn't enabled GNS, I would have been able to assign the addresses on this screen.

Regular SSH connectivity test

Network Interface Usage specification. Should be no surprise for anyone experienced in 10g Clusterware.

Now I come to the choice of my storage solution. Before I started the installation I added another shared disk, enabled ASMLib on both servers and created an ASM disk. Therefore, I can now choose ASM for storage.

Choose the diskgroup name and assign disks to the diskgroup.

Passwords. Be careful here! Passwords should comply with Oracle's recommendations; however, you can ignore those recommendations. You will be notified when something is wrong or should be reviewed according to Oracle:


I am not using IPMI (Intelligent Platform Management Interface), so I didn't select this option.
IPMI provides a set of common interfaces to computer hardware and firmware that system administrators can use to monitor system health and manage the system. With Oracle 11g release 2, Oracle Clusterware can integrate IPMI to provide failure isolation support and to ensure cluster integrity.
You can configure node-termination with IPMI during installation by selecting a node-termination protocol, such as IPMI. You can also configure IPMI after installation with crsctl commands.

Here you can assign different groups to specific management options.
ASM Database Administrator (OSDBA)
ASM Instance Administration Operator (OSOPER)
ASM Instance Administrator (OSASM)
Oracle has separated the privileges for each of these groups of administrators. Therefore, when I chose dba as group for each of these, I got a warning. Under normal (production operational) circumstances, I should have created three different groups and assigned these groups to different users on my system. Well, I have got only one user on my test environment: oracle, so what is the use of setting up three separate groups? Anyway, the idea is clear and it is good practice to follow these guidelines.
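The idea behind the warning can be sketched in a few lines: report every OS group that is assigned to more than one role. This is only an illustration, not how the installer itself checks. The separated group names in the second call are placeholders I picked for the example.

```python
def shared_groups(role_to_group):
    """Return the OS groups used by more than one role, with the roles sharing them."""
    roles_by_group = {}
    for role, group in role_to_group.items():
        roles_by_group.setdefault(group, []).append(role)
    return {group: sorted(roles) for group, roles in roles_by_group.items() if len(roles) > 1}


# The choice described above: one group for everything.
print(shared_groups({"OSDBA": "dba", "OSOPER": "dba", "OSASM": "dba"}))
# {'dba': ['OSASM', 'OSDBA', 'OSOPER']}

# A separated layout with placeholder group names.
print(shared_groups({"OSDBA": "asmdba", "OSOPER": "asmoper", "OSASM": "asmadmin"}))
# {}
```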

Specify the locations of the Oracle Base and Oracle Grid Home. Oracle Grid Home should be installed parallel to the Oracle Base, so not inside the Oracle Base!
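A quick way to check a planned layout against this rule is a purely lexical path comparison. The sketch below assumes `/u01/app/oracle` as the Oracle Base, which is a hypothetical value for the example. It does not resolve symbolic links.

```python
from pathlib import PurePosixPath


def is_inside(path, base):
    """Return True if `path` is `base` itself or lies somewhere below it."""
    path, base = PurePosixPath(path), PurePosixPath(base)
    return path == base or base in path.parents


oracle_base = "/u01/app/oracle"  # hypothetical Oracle Base
print(is_inside("/u01/app/11.2.0/grid", oracle_base))                 # False: parallel to the base
print(is_inside("/u01/app/oracle/product/11.2.0/grid", oracle_base))  # True: nested inside the base
```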

File locations are being checked on both servers.

New feature: Maybe it is better to say there is an enhancement here. The system is thoroughly verified for prerequisites. The result can be seen in the next screenshots:

Uhh. I have got some work to do here...
Nice new feature: the fixup script. For everything that Oracle can change by script, the installer provides one. Just run the script, and the checks that failed and are marked Fixable are fixed. Don't worry about prerequisite checks before running the installer; just let the installer do the work for you. It will report everything you need to fix before installing. Push the "Check Again" button as many times as you wish until everything is reported as passed.

Execute the runfixup.sh script:

...And Check Again...
Well, I didn't have enough memory and swap. Also, NTP wasn't started with the -x option... Who cares? - We'll see at the end of this weblog...
...summary...
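The memory, swap and NTP findings mentioned above are easy to check by hand as well. The sketch below is my own illustration, not what the installer runs. It parses text in the format of `/proc/meminfo` and looks for the `-x` flag on an `OPTIONS=` line. I am assuming the ntpd options live on such a line (on OEL 5 typically in `/etc/sysconfig/ntpd`), and the memory threshold is an assumed value for the example.

```python
import re


def meminfo_kb(meminfo_text, field):
    """Return the value in kB of a field such as MemTotal in /proc/meminfo text."""
    match = re.search(rf"^{re.escape(field)}:\s+(\d+)\s+kB", meminfo_text, re.MULTILINE)
    if match is None:
        raise KeyError(field)
    return int(match.group(1))


def ntpd_uses_slewing(sysconfig_text):
    """Return True if the first uncommented OPTIONS line contains the -x flag."""
    for line in sysconfig_text.splitlines():
        line = line.strip()
        if line.startswith("OPTIONS="):
            options = line.split("=", 1)[1].strip("\"'")
            return "-x" in options.split()
    return False


# Illustrative input only; these are not the values from the virtual machines in this post.
sample_meminfo = "MemTotal:      1048576 kB\nSwapTotal:     2097152 kB\n"
sample_ntpd = 'OPTIONS="-u ntp:ntp -p /var/run/ntpd.pid"'

MIN_MEM_KB = 1536 * 1024  # assumed threshold; take the real one from the installer screen

print("memory ok:", meminfo_kb(sample_meminfo, "MemTotal") >= MIN_MEM_KB)  # False
print("ntpd -x:  ", ntpd_uses_slewing(sample_ntpd))                        # False
```

On a real node you would pass in the contents of `/proc/meminfo` and of the ntpd options file instead of the sample strings.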

Start the installation by clicking on Finish.
This is what you'll see during installation:
Tap, tap... :-)

...tap...!
We're almost there! Just run these two scripts and we're finished with the installation of Grid Infrastructure.
Well, that is what I thought. It took my virtual machines almost all night to finish the last script (root.sh in the Grid Home). In fact, the script timed out after an hour or so, and the virtual machine practically froze... After I had waited a couple of hours it even rebooted my entire machine (including the host OS!).
I will have to assign some more memory to the guests and see whether I can then finish the installation, or whether that is needed at all. However, it is the weekend now, and my wife and kids will not like me sitting behind my laptop all weekend. We'll be back next week...

68 comments:

  1. Good article. BTW there's an environment to configure for Grid? With oracle you have at least ORACLE_BASE, ORACLE_HOME, ORACLE_SID and for grid?

  2. Ste,
    Thanks for the compliment. Basically, it is not necessary to set any environment variables, because if you omit them, you will be asked for the locations during installation. Setting ORACLE_BASE is recommended; the Grid Infrastructure Home should be installed parallel to the ORACLE_BASE, not inside it.

  3. Thanks, Sir. But I am facing a problem executing the root.sh script. When I execute root.sh as the root user it shows the error 'Improper Oracle Clusterware Configuration found on this machine first run rootcrs.pl -deconfig script then re-run root.sh script'. When I run the rootcrs.pl script again it shows errors CRS-4013 and CRS-40000. Please help me. My OS is SUSE Linux Enterprise Server 11.

  4. Shaheer, are you performing an upgrade or a fresh installation?

  5. I am performing a new installation: Oracle 11gR2 Grid Infrastructure for Standalone Server. The installation failed because ACFS/ADVM are not supported on SUSE. Any solution? Please help.

  6. Well, you have a difficult choice to make: let SUSE prevail, or ACFS/ADVM. If you require ACFS, you should either wait for certification (if you don't want to let go of SUSE) or move to OEL5, because OEL5 does support it.
    The alternative for ACFS is OCFS2. Make sure you install ocfs2 version 1.4 or higher, because that version is required for the memory mapping feature that Oracle 11gR2 is using. Or you could just stick to ASM disks to store your database on. Remember that ACFS delivers easy file access (for database files). On the other hand: It is an additional layer which is added to your infrastructure, taking resources away from your system. If you want further contact on the issue, please contact me directly by e-mail. See my profile to get my contact details.

  7. Is it possible to set up SCAN without a network administrator, just for testing an 11gR2 cluster installation?

  8. Well, you can set up SCAN without the network admin; however, you would still have to set up the subdomain delegation in DNS and DHCP, for which you would require a network admin, unless you can set up something separately yourself.
    SCAN is created and instantiated by Oracle and maintained by Oracle. The way to find the SCAN VIPs is done via GNS, which cannot function without DNS delegation.
    The alternative is to put a SCAN VIP address in your hosts file, but this will result in only one SCAN interface in your infrastructure, since the hosts file is only able to resolve a name to a single IP address.

  9. The following components weren't installed. How can I install them manually?
    Oracle Net Configuration Assistant
    Automatic Storage Management Configuration Assistant
    Oracle Private Interconnect Configuration Assistant
    Oracle Cluster Verification Utility

  10. You can select them manually by choosing an advanced installation - custom installation. You can install each of the products separately.
    Did the installer mention these products not being installed? How are you trying to run these products?

  11. I tried re-installing it, but it re-installs the software only and asks to run root.sh; root.sh finds the cluster home and doesn't let me install a second one. The root.sh script just exits. I clicked OK and it never showed the next screen for netca, etc. What would be the best way to deinstall this and reinstall it from scratch, or to manually run each component?

  12. To deinstall, use the deinstall utility present under $ORACLE_HOME/deinstall.
    Just call it with ./deinstall -home <$ORACLE_HOME>
    Hope this helps.
    Arnoud

  13. I didn't need to deinstall; I installed the Oracle software. I have only 1 SCAN listener, and the local listener on node 1 is not working. I'm still figuring out how to solve it. Any ideas?

  14. Are you using GNS and DHCP, or did you add the SCAN interfaces to your hosts file? When you add the SCAN interfaces to the hosts file, you can use only one SCAN interface.
    Where is your local_listener pointing to?

  15. zippythechimp

    Hey, great blog and help..One thing. Implementing
    GNS/SCAN.. Not sure 'bout:

    1> GNS settings (hostname.domain) in DNS...so just get admin to make one?

    2> SCAN - Are these vips that resolve to GNS? If so, are they resolvable @ DNS or just vips GNS knows about. I take it they do NOT go into the /etc/hosts...

    3> Current node-vips...are these still relevant? With GNS don't see how..what do I do with them.

    I am upgrading from a 10.2.0.3 CRS..All I have seen is out-of-place installs ( no prob) but upgrading docs are sketchy (Solaris R2 esp).

    Any guidance...man its a jungle out there

  16. Hi Zippythechimp ;-)
    1> Indeed, that is the way to go
    2> Yes, these are vips and GNS takes care of them. They can be added to the /etc/hosts, but then you get only one SCAN address, which is not good for recoverability/availability, so NOT recommended
    3> They can still be relevant, in case of upgrades, and they will still be used in Grid Infrastructure, because, no matter how big your Grid Infrastructure is, there will only be 3 SCAN vips across your entire infrastructure.

    If I were to do an upgrade, I would also do an out-of-place upgrade. No chance I would want to be risking a mess-up of my current ORACLE_HOME...
    And, you are right, it IS a jungle out there...;-)
    But, keep hanging on in there, you will manage...:-)
    Hope This Helps.
    Arnoud

  17. ZippytheChimp

    U da man! Couple more things 'bout the particulars:

    1> GNS ...got it I'll get my admin to set up one. I take it that since i have three nodes in my cluster that there will be three ips resolvable for the cluster domain name? ergo, the nslookup for each ip will resolve to the cluster dns assigned...correct?

    2> SCANs--err..uhhh, are these indeed magic i.e. is Oracle 11gR2 smart enough to use the ips resolvable to the cluster?? OR are these additional ips at a dns level that are resolvable per nslookup...one for each node.

    I think that'll do it...

    I'll hang around for an answer..thanks!

  18. da man says answer!
    1> Just to make sure, GNS is set up as part of the Grid Infrastructure. Just configure DNS to delegate the subdomain to GNS. The Installation of 11G Grid Infrastructure takes care of the member DHCP IP address registration in GNS (that is one of the reasons why you need a DHCP server).
    the cluster dns assigned address/name is part of the next answer:

    2> SCAN - Single Client Access Name - is your central Grid (or cluster, if you like it better) DNS name, or better: GNS name, because it is registered in and resolved by GNS (still following me? Am I still following myself here?:-)...
    Additionally GNS decides (haven't got any clues about the algorithm, btw) where the SCANs will be activated. So all in all, it is pretty smart...

    Ugh! ;-)

  19. GNS doesn't really decide anything. It stores the name/address mappings and resolves DNS requests. I don't know the particulars of how SCAN determines which IP to use for a connection.

  20. Just checked some materials, the only thing I can see is that SCAN vips are initiated by the orarootagent, which is called by ohasd.

  21. That's right. A thread of the orarootagent runs the SCAN. Once it gets an name and address, it advertises it with GNS.

  22. ZippyTheChimp

    k...looks like gns is not going to be something we use...at least for now (shame). Hey.. ran into something not sure if it is a grid bug but..on the Cluster Topology (running 10.2.0.5 GC) see a funky reference for my underlying ASM instance for my DB instances...err...pointing to a tst server where asm is running. Yet on the home page the asm instances are correct..and working...weird...is this a bug in Grid?? Checked all my install logs and nada reference of test box anywhere. Crs_stat is good and points to the correct ones..I can read/write to the DGs on the correct ASM so I know I'm good but when I run a srvctl status asm -n from the 10g home (err...running 10.2.0.2 on top of 11.1.0.7 cluster and ASM). get a PRKO-2118 yet same command from the 11g crs path is ok...

    any insight? the lions are prowling.

  23. zippythechimp

    ok...is a bug in grid...finally meta SR took root. freked me out for awhile..pouring over logs..I'm good...so.errr...nevahmind...'preciate talking with you...getting ready to install r2...talked to Orc Tech on it...some whizkid..anyhow...got an understanding of it now...u da man tho..i'll let ya know how it went...any jungles in da EU??? Got plenty state-side...

  24. Arnoud,
    I'm in big trouble --
    The 11gR2 install went perfect: 2 nodes on ASM running quite nicely. We want 11gR1 for our R12 EBS. Instead of cloning, we are doing a platform migration using datapump. This was not my task, but I am trying to help by creating an 11gR1 instance on one node of our new cluster. After installing 11gR1, dbca had issues (more later).
    Creating 11gR1 db manually failed also:
    SQL> create database TEST;
    create database TEST
    ERROR at line 1:
    ORA-01501: CREATE DATABASE failed
    ORA-00200: control file could not be created
    ORA-00202: control file: '+TESTDATA1/cntrl01.dbf'
    ORA-17502: ksfdcre:4 Failed to create file +TESTDATA1/cntrl01.dbf
    ORA-15001: diskgroup "TESTDATA1" does not exist or is not mounted
    ORA-15077: could not locate ASM instance serving a required diskgroup
    ORA-29701: unable to connect to Cluster Manager
    ...
    Trying to resolve the ORA-29701, I thought it would be okay to run "localconfig delete" and "localconfig add".
    Forgetting I was the "oracle" user in the 11gR1 $OH and not the "grid" user in the 11gR2 $OH, I ran this from the 11gR1 $OH/bin as root. The server hung, and after restarting the server, none of the cluster services will start:
    /d01/11.2.0/grid/bin/crsctl start cluster
    CRS-4013: This command is not supported in a single-node configuration.
    [root@ebtd1 grid]# $ORACLE_HOME/perl/bin/perl -I $ORACLE_HOME/perl/lib -I $ORACLE_HOME/crs/install $ORACLE_HOME/crs/install/roothas.pl
    2010-01-13 12:57:14: Checking for super user privileges
    2010-01-13 12:57:14: User has super user privileges
    2010-01-13 12:57:14: Parsing the host name
    Using configuration parameter file: /d01/11.2.0/grid/crs/install/crsconfig_params
    Improper Oracle Clusterware configuration found on this host
    Deconfigure the existing cluster configuration before starting
    to configure a new Clusterware
    run '/d01/11.2.0/grid/crs/install/rootcrs.pl -deconfig'
    to configure existing failed configuration and then rerun root.sh
    Now I'm officially in a panic: do you think there is something missing after restarting the server?
    Or, did I really mess something up by doing the 11gR1 localconfig since it's not supported in 11gR2?
    I'm still looking for relevant logs on the system to help.
    Thanks again!
    Kurt Forshee
    kurt.forshee@gmail.com

  25. Hi Kurt,
    Wow, you've got a situation there...
    My guess about the 11gR1 create controlfile issue is that the 11gR2 ASM disk(group) might not have the correct compatibility options. You should also check the compatibility parameter on your ASM instance. Please refer to the Oracle 11gR2 Storage Administrator's Guide (http://download.oracle.com/docs/cd/E11882_01/server.112/e10500/asmdiskgrps.htm#CHDDIGBJ) for compatibility parameters on the diskgroups.

    You can try to perform the suggested actions as stated by oracle (rootcrs.pl -deconfig and rerun root.sh) to solve the issue with clusterware. Hopefully this will work out for you.

    If not, please send me the log files by mail (you know my e-mail address :-)). I'll try to figure it out. If I cannot solve the issue remotely, you will have to get me on-site! :-O

  26. Maybe it's my fault for trying to get 11gR1 to run on 11gR2 grid. I double checked that I have the correct asmlib rpm's loaded for our version of OEL, and I have confirmed everything regarding access, config & availability of ASM services -- so I know it's available; but my "create database" gets this error that totally surprises me. I even went so far as to change the file permissions on the libasm.so, but still got the same error:
    ERROR: asm_version error. err: driver/agent not installed rc:2
    Errors in file /d01/oracle/TEST/db/tech_st/11.1.0/admin/TEST_ebtd2/diag/rdbms/test/TEST/trace/TEST_rbal_29365.trc:
    ORA-15183: ASMLIB initialization error [driver/agent not installed]
    WARNING:FAILED to load library: /opt/oracle/extapi/64/asm/orcl/1/libasm.so
    ERROR: asm_init(): asm_erc:-5 msg:Driver not installed pid:29349

  27. Hi Arnoud,

    I am trying to add a node to a 1-node 11g R2 cluster and I am receiving the following permissions errors, which prevent the Clusterware files from being copied to the second node:

    rssdb2-pub:
    PRCF-2023 : The following contents are not transferred as they are non-readable.
    Directories:
    1) /u01/app/11.2.0/grid/crs/utl
    Files:
    1) /u01/app/11.2.0/grid/evm/admin/conf/evmdaemon.conf
    2) /u01/app/11.2.0/grid/evm/admin/conf/evm.auth
    3) /u01/app/11.2.0/grid/evm/admin/conf/evmlogger.conf
    4) /u01/app/11.2.0/grid/srvm/admin/logging.properties
    5) /u01/app/11.2.0/grid/bin/oradaemonagent
    6) /u01/app/11.2.0/grid/bin/oradnssd

    The 1-node cluster installation was successful, including the root-run scripts, would you be able to provide some advice?

    Many thanks.

  28. @Kurt: Were you able to get 11gR2 back into a stable running state? My guess, but it is only a guess, is that something on the R1 side screwed up some paths that were meant to be pointing to R2, but again, I am not sure.
    Please keep me updated on your progress!
    Arnoud

  29. @Anonymous:
    What exactly is the command you are executing here? As which user are you performing this action?
    Please specify a little more on exactly what you are doing. I might be able to help you better then.
    Arnoud

  30. I was in exactly the same situation as you were during my 11gR2 grid infra install (wife, kids, weekend, root.sh, etc.). So how did you continue with the install? I tried an install on a 2-node configuration. root.sh completed successfully on the first node. On the second node, it timed out at starting CRS. The last step of the OUI session, running the configuration assistants, was skipped. How can I continue? Re-run the last root.sh? Manually execute the config assistants (netca, vipca, etc.)? Please advise.

    Thanks,
    TSY

  31. How fortunate I am to see that people in similar conditions actually do (and experience) similar things!:-) Geeky stuff, regardless of private situation. I guess the wife and kids started nagging because you were becoming more and more frustrated, which was bothering the wife and kids again etc. etc...
    To answer your question: I started all over again, and chose not to implement GNS for that moment. I did the "manual" config and faked the SCAN interface to be resolved via the /etc/hosts file, something utterly discouraged by Oracle, not without reason, btw. A single SCAN ip is not something you want, but hey, we're on VMware here. Something else Oracle would rather see otherwise...;-)
    I am currently setting up a grid with all the fancy stuff (DNS, DHCP, GNS, SCAN, ADVM, ACFS, to name a few...). I will post my findings on that shortly, I hope. Stay tuned!

  32. Great blog.
    Arnoud, I am new to Clusterware and RAC and getting ready to install 11gR2 Grid with two nodes. The nodes are on a build network which has a domain that's different from the production network, which we will move to once everything is set up. Would it be difficult to change the node names down the road to reflect the new domain? If different IP addresses are to be used for the public interface, would that cause an issue?
    Also if GNS is not an option for us now does that mean client connectivity will rely on VIPs?
    Thanks

  33. Usually it is not a problem to change hostnames, and even IP addresses. Use the correct tools to perform those actions, however! See the corresponding administration guides for the correct command line options. I am not quite sure about the 11gR2 command set, but it was possible to perform these kinds of actions in 10g.
    If you cannot use GNS, you will have to fall back to the manual option. Even with GNS, client connectivity will rely on VIPs, because VIPs realize the whole idea of flexibility.

  34. Arnoud -- good news! My RAC node1 is running fine despite my late night gaffe (no more work after 10pm for me!).
    Here was the solution:
    Using Metalink note 747415.1 as guideline (although outdated), I first ran $GRID_HOME/crs/utl/rootdelete.sh
    this gave a message: "this script is obsolete. Please use rootcrs.pl"
    So I ran $GRID_HOME/crs/install/rootcrs.pl -deconfig -force
    Message: "Successfully deconfigured Oracle clusterware stack on this node"
    I then ran $GRID_HOME/crs/utl/rootdeinstall.sh, which appears to execute "$GRID_HOME/crs/install/rootcrs.pl -delete -lastnode" successfully.
    In any case, I rebooted the server and then ran $GRID_HOME/root.sh again.
    This finished normally just as it did during install, and I am back to normal.
    BTW: the R12 RAC config document is still pretty immature, and R12 autocfg does not yet use SCAN nor does rconfig work for me to configure RAC. The only way I was able to get the R12 db RAC config'd was to do "adclonectx.pl" to create CONTEXT_FILE, which required some editing, and then do "adcfgclone.pl dbconfig $CONTEXT_FILE" to get all my environment files, init.ora settings, and listener.ora (which also required editing). But it worked, and after that I repeated this on node2, and I now have R12 RAC'd on 2nodes using 11gR2 Grid.
    As always, thanks a ton for your help and friendship -- it's inspiring!

  35. Hi Kurt,
    Thanks for your feedback! I am sure it will help many of us in solving issues like these.
    Good to see that you got R12 running on 11gR2. That will be my next project... - Got a weblog? :-)

  36. can anyone help me how to create or configure SCAN and GNS in Oracle Enterprise Linux 5.3

  37. Hi Anonymous,
    Can you elaborate a little more? Are you looking for some specific issues you need to resolve?

  38. Arnoud,

    I am planning to install an 11gR2 Grid Infrastructure / RAC DB configuration. I have a server on which I have installed Oracle VM and added 2 (OEL .3) VM templates, so now I have 2 Linux VM slices. I have also created 2 shared folders. Now when I run the Grid Infrastructure installer, at "Grid Plug and Play" it asks for SCAN and GNS information.

    As I am installing in a new VM slice, I didn't create the SCAN and GNS.

    So I need help: how do I create SCAN and GNS? That is, I need the steps to create the 3 IPs and to know which files need to be updated in my Oracle Enterprise Linux so that the installer can work.

  39. I completed an 11gR2 upgrade from 10204. Everything was successful; I even installed the 11.2 database software and upgraded two databases. The issue I am having is that after rebooting the two nodes in my cluster, the OCR starts only on the second node.

    I followed note 1050908.1 and my issue seems to point to a possible problem with the voting disk. Is anyone else having an issue like this?

    2010-03-09 16:47:58.431: [ CLSF][1136679264]Opened hdl:0xeb2d10 for dev:/dev/raw/raw2:
    2010-03-09 16:47:58.433: [ CSSD][1136679264]clssnmvDiskCreate: Cluster guid 4678e2ccd54d5fddbfc8524ef97c0c94 found in voting disk does not match with the cluster guid 910b661b35724f22ff01892b84f8fb47 obtained from the GPnP profile
    2010-03-09 16:47:58.433: [ CSSD][1136679264]clssnmvDiskDestroy: removing the voting disk
    2010-03-09 16:47:58.434: [ CLSF][1136679264]Closing handle:0xeb2d10
    2010-03-09 16:47:58.434: [ SKGFD][1136679264]Lib :UFS:: closing handle 0xeb28f0 for disk :/dev/raw/raw2:

    2010-03-09 16:47:58.434: [ CSSD][1136679264]clssnmvDiskVerify: Successful discovery of 0 disks
    2010-03-09 16:47:58.434: [ CSSD][1136679264]clssnmCompleteInitVFDiscovery: Completing initial voting file discovery
    2010-03-09 16:47:58.434: [ CSSD][1136679264]clssnmvFindInitialConfigs: No voting files found
    2010-03-09 16:47:58.434: [ CSSD][1136679264]###################################
    2010-03-09 16:47:58.434: [ CSSD][1136679264]clssscExit: CSSD signal 11 in thread clssnmvDDiscThread

  40. Is your voting disk on a raw device (/dev/raw/raw2)? Why didn't you put it on ASM?
    I know it is not an answer to your question, but nevertheless, Oracle nowadays recommends putting the Voting and OCR disk on ASM (since 11.2).
    Second question: did you try rebooting the nodes one by one or simultaneously? If you didn't already do so, try to reboot them one by one.
    Another thing that can be seen from the first CSSD message is that the cluster ID stored in the Voting Disk doesn't match the cluster ID derived from Grid Infrastructure Profile. Strange, did you try reinstantiating the Voting Disk?
    You might want to take a look at https://supporthtml.oracle.com/ep/faces/secure/km/DocumentDisplay.jspx?id=1050908.1 (troubleshooting Grid Infrastructure Startup Issues). Another source of information could be Note 942166.1.
    HTH,
    Arnoud
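To spot this kind of mismatch quickly in a long CSSD log, the two identifiers can be pulled out with a regular expression. This is a small sketch that assumes the message wording of the excerpt quoted in the comment above; other versions may phrase it differently.

```python
import re

# Pattern based on the clssnmvDiskCreate message in the log excerpt above.
GUID_MISMATCH = re.compile(
    r"Cluster guid (?P<disk>[0-9a-f]{32}) found in voting disk "
    r"does not match with the cluster guid (?P<profile>[0-9a-f]{32}) "
    r"obtained from the GPnP profile"
)


def find_guid_mismatches(log_text):
    """Return (voting_disk_guid, gpnp_profile_guid) pairs found in CSSD log text."""
    return [(m.group("disk"), m.group("profile")) for m in GUID_MISMATCH.finditer(log_text)]


line = (
    "2010-03-09 16:47:58.433: [ CSSD][1136679264]clssnmvDiskCreate: Cluster guid "
    "4678e2ccd54d5fddbfc8524ef97c0c94 found in voting disk does not match with the "
    "cluster guid 910b661b35724f22ff01892b84f8fb47 obtained from the GPnP profile"
)
for disk_guid, profile_guid in find_guid_mismatches(line):
    print("voting disk: ", disk_guid)
    print("GPnP profile:", profile_guid)
```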

  41. Has anyone used the gpnptool to resync the profile.xml files?

  42. I haven't done so... If anyone else has, please leave a comment here.

  43. Hi Kurt,

    I'm having the same problem: my 11.2 works fine against 11.2 ASM; it's just the 10.2 dbca that is failing.


    [Thread-44] [14:54:28:248] [StepErrorHandler.setFatalErrors:322] setting Fatal Error: ORA-03114
    [Thread-63] [14:54:46:475] [StepErrorHandler.setFatalErrors:322] setting Fatal Error: ORA-03114
    oracle.sysman.assistants.util.sqlEngine.SQLFatalErrorException: ORA-03114: not connected to ORACLE
    ORA-15183: ASMLIB initialization error [driver/agent not installed]
    ORA-15183: ASMLIB initialization error [/opt/oracle/extapi/64/asm/orcl/1/libasm.so]
    ORA-15183: ASMLIB initialization error [driver/agent not installed]
    ORA-00600: internal error code, arguments: [kfkLoadByNum03], [0x06000C4B0], [], [], [], [], [], []
    ORA-00600: internal error code, arguments: [kfkLoadByNum03], [0x06000C4B0], [], [], [], [], [], []


    ORA-15183: Message 15183 not found; No message file for product=RDBMS, facility=ORA; arguments: [driver/agent not installed]
    Thu May 6 14:33:32 2010
    ORA-15183: Message 15183 not found; No message file for product=RDBMS, facility=ORA; arguments: [/opt/oracle/extapi/64/asm/orcl/1/libasm.so]
    ORA-15183: Message 15183 not found; No message file for product=RDBMS, facility=ORA; arguments: [driver/agent not installed]
    Thu May 6 14:33:32 2010
    ORA-00600: Message 600 not found; No message file for product=RDBMS, facility=ORA; arguments: [kfkLoadByNum03] [0x06000C4B0]
    Thu May 6 14:33:33 2010
    ORA-00600: Message 600 not found; No message file for product=RDBMS, facility=ORA; arguments: [kfkLoadByNum03] [0x06000C4B0]
    Thu May 6 14:33:33 2010
    PSP0: terminating instance due to error 490
    Thu May 6 14:33:33 2010
    Trace dumping is performing id=[cdmp_20100506143333]

  44. Hi,
    Are you sure you have the correct environment set? It seems that this is an ORACLE_HOME related issue?

  45. Hi Arnoud,

    I have installed ORACLE 11g R2 Grid infrastructure -cluster on ORACLE VM. Planning to use 2 node cluster.

    During the Grid Infrastructure installation I checked the "configure GNS" check box and provided an IP address. But when root.sh runs on the first node, it gives the output below:
    .....
    addition of GNS failed.
    CLSGN-00154: Invalid subdomain: "rac."
    add gns -i 10.50.0.41 -d rac. ... failed
    Configure Oracle Grid Infrastructure for a Cluster ... failed
    ....
    The inventory is located at /u02/app/oraInventory
    'UpdateNodeList' was successful.

    ....

    So I came to know that GNS is not configured, but the "crsctl stat res -t" command shows GNS, GNS.VIP and VIP in OFFLINE state.

    Is there any way to remove the unsuccessful GNS configuration from the Grid Infrastructure installation, other than just running "./srvctl remove gns"? This command removes GNS from the "crsctl" list, but I guess some configuration still exists in Grid.

  46. Removing GNS using srvctl removes GNS, including its configuration. It shouldn't leave any leftover configuration in your cluster.
    HTH, Arnoud

  47. Thanks Arnoud for your prompt reply.

    One more thing: if I remove GNS, I can't use SCAN any more. Is that right?

  48. Hi,

    I am not able to remove GNS through "srvctl remove GNS".
    It gives this error:
    CLSGN-00135: Mandatory parameter "subdomain" is not configured.

    [ this is because root.sh on the first node failed to create it, giving this error:
    CLSGN-00154: Invalid subdomain: "rac."
    add gns -i 10.50.0.41 -d rac. ... failed]

    If you have any idea, please let me know how to remove this unconfigured GNS.

  49. Yes, you can still use SCAN. SCAN does not depend on GNS... GNS makes life easier, but it is not a required component for SCAN.

  50. Hi Sonu,

    Did you try to run srvctl remove gns -f (force remove)?

    Arnoud

  51. Yes, I tried this as well, but I get the same error:
    # ./srvctl remove gns -f
    CLSGN-00135: Mandatory parameter "subdomain" is not configured.

  52. Do you get the same result when you type "srvctl remove gns -f -d rac."?

    If this doesn't work, my guess is that you entered the subdomain in an illegal format (the dot shouldn't have been given there). I am not sure how you can recover from this. My suggestion is to clean up the entire installation and start over, if that is an option for you. This gives you the chance to define the proper subdomain (rac) and clean up the mess...

    HTH.
    Arnoud
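
To illustrate the guess about the trailing dot, here is a minimal Python sketch that checks a subdomain string against the ordinary DNS hostname label rules and flags a trailing dot. This is only an assumption about what GNS objects to, not Oracle's actual validation logic, and the function name `check_subdomain` is made up for this example.

```python
import re

# One DNS label: letters, digits and hyphens, 1-63 characters,
# with no hyphen at the start or the end.
LABEL = re.compile(r"(?!-)[A-Za-z0-9-]{1,63}(?<!-)")


def check_subdomain(name):
    """Return a list of problems found in a subdomain string (empty if none)."""
    if not name:
        return ["subdomain is empty"]
    problems = []
    if name.endswith("."):
        # Assumption: this is what triggered CLSGN-00154 for "rac."
        problems.append("trailing dot")
        name = name.rstrip(".")
    if len(name) > 253:
        problems.append("longer than 253 characters")
    for label in name.split("."):
        if not LABEL.fullmatch(label):
            problems.append(f"invalid label: {label!r}")
    return problems


if __name__ == "__main__":
    for candidate in ("rac.", "rac", "rac.example.com"):
        print(candidate, "->", check_subdomain(candidate) or "looks fine")
```

Running a check like this on the value before typing it into the installer is cheap compared to cleaning up a failed root.sh run.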

  53. I am getting stuck at the point of running the root.sh script. I get the feedback of "Improper Oracle Clusterware configuration found on this host
    Deconfigure the existing cluster configuration before starting
    to configure a new Clusterware" and it tells me to run "/u01/app/oracle/product/11.2.0/grid/crs/install/rootcrs.pl -deconfig" and when I try that I get "/u01/app/oracle/product/11/2/0/grid/crs/install/rootcrs.pl: No such file or directory". I am not sure where to go at this point. This is a fresh install on OEL 5 using 11gR2. I am new to this world, so any help that you can provide would be greatly appreciated.

  54. Hi Arnoud,

    I have reinstalled the 2-node 11gR2 RAC on Oracle VM 2.2.1. This time I am using only SCAN and not GNS.
    I added the SCAN name and IP address to /etc/hosts and it works. I installed the RAC setup on both nodes without any errors.

    Now I want to access RACDB (my RAC database name) from my desktop. I copied the tns entry from the DB server to my desktop, but when I try to connect using SQL*Plus, it is not able to resolve the SCAN name. So I added the VIPs of both nodes to my desktop's /etc/hosts file, and now I can connect through SQL*Plus. But SQL Developer is still not able to connect.

    Could you please tell me what the right method is? Do I have to tell the whole dev team that, to use the RAC DB, they need to add the list of IPs to their hosts files? Is that the correct method?

    Please advise.

  55. Hi Arnoud,

    I have one more question: if I do not register the SCAN name and IP address in my DNS, do I have to add the two VIPs (one for each node) and the SCAN name to the /etc/hosts file on the client side?

  56. Sonu,

    With SCAN in your tns entry, the machine you are connecting from must be able to resolve that name to an IP address, via either DNS or its hosts file. Why SQL Developer is unable to connect, I cannot say from the information you have given me. The alternative is to use the VIPs of the database nodes instead of the SCAN, in other words the legacy pre-11.2 method of connecting to the database. Unless you are planning to create a grid with multiple database nodes and multiple databases, this is a perfectly good solution. You don't need SCAN to connect to the database.

    HTH,
    Arnoud
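
A quick way to see whether a client machine can resolve the SCAN at all is to ask the operating system's resolver, which consults both the hosts file and DNS. The Python sketch below does that with the standard `socket.getaddrinfo` call; the function name `resolve_name` and the injectable `resolver` argument are choices made for this example only.

```python
import socket


def resolve_name(hostname, resolver=socket.getaddrinfo):
    """Return the sorted IP addresses a hostname resolves to, or [] if it does not resolve."""
    try:
        results = resolver(hostname, None)
    except socket.gaierror:
        return []
    # Each result is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] holds the IP address as a string.
    return sorted({entry[4][0] for entry in results})
```

Calling `resolve_name` with the SCAN from your tns entry on the desktop should return one or more addresses. An empty list means the name is in neither DNS nor the local hosts file, which matches the "cannot resolve" symptom described here.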

  57. Hi Arnoud,

    Please help, I have a problem. I have Oracle 11gR2 on a single node. It was OK, but yesterday it showed this error:

    SQL> startup
    ORA-01078: failure in processing system parameters

    ORA-01565: error in identifying file '+DATA/xxxprod/spfilexxxprod.ora'

    ORA-17503: ksfdopn:2 Failed to open file +DATA/xxxprod/spfilexxxprod.ora

    ORA-15077: could not locate ASM instance serving a required diskgroup


    I recreated the spfile with "create pfile from memory" and then "create spfile from pfile". Now the problem is:
    SQL> startup
    ORACLE instance started.

    Total System Global Area 1603411968 bytes
    Fixed Size 2213776 bytes
    Variable Size 419432560 bytes
    Database Buffers 1174405120 bytes
    Redo Buffers 7360512 bytes
    ORA-00205: error in identifying control file, check alert log for more info

    I would say the problem is that the ASM instance is not started, but when I try to start the ASM instance, the error is:

    export ORACLE_SID=+ASM2
    [oracle@oracle1 ~]$ sqlplus "SYS/iptotal AS SYSDBA"

    SQL*Plus: Release 11.2.0.1.0 Production on Sat Aug 7 16:35:10 2010

    Copyright (c) 1982, 2009, Oracle. All rights reserved.

    Connected to an idle instance.

    SQL> startup
    ORA-01031: insufficient privileges
    SQL>

    Please, what can I do?

  58. I assume the ORACLE_SID (+ASM2) is correct, though it seems a bit strange for a single-server environment (unless you have multiple ASM instances...). Are you sure you have set the correct ORACLE_HOME? Starting ASM while ORACLE_HOME points to the wrong home could lead to issues.
    Next, you should try "sqlplus / as sysdba" and start the ASM instance.
    Keep me updated on your progress.

    HTH.
    Arnoud
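
As a rough pre-flight check before starting an instance, the Python sketch below inspects ORACLE_HOME and ORACLE_SID in the current environment. It assumes a Linux-style home with a `bin/sqlplus` file in it, and the function name `check_oracle_env` is hypothetical. It cannot tell whether the home is the *right* one for the instance, only whether it is plausible.

```python
import os


def check_oracle_env(env=None):
    """Return a list of problems with ORACLE_HOME / ORACLE_SID (empty if none found)."""
    env = os.environ if env is None else env
    problems = []
    home = env.get("ORACLE_HOME")
    sid = env.get("ORACLE_SID")
    if not sid:
        problems.append("ORACLE_SID is not set")
    if not home:
        problems.append("ORACLE_HOME is not set")
    elif not os.path.isdir(home):
        problems.append(f"ORACLE_HOME is not a directory: {home}")
    elif not os.path.isfile(os.path.join(home, "bin", "sqlplus")):
        # Assumption: a usable home ships sqlplus under bin/.
        problems.append(f"no bin/sqlplus under ORACLE_HOME: {home}")
    return problems


if __name__ == "__main__":
    for line in check_oracle_env() or ["ORACLE_HOME and ORACLE_SID look plausible"]:
        print(line)
```

For an ASM instance in 11gR2, ORACLE_HOME would normally have to point at the Grid Infrastructure home rather than the database home, so it is worth comparing the printed path with where the grid software was installed.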

  59. Hi Arnoud,

    Thanks for the explanation. I also want to know what the dependency is between LISTENER_SCAN and the listeners on the nodes. If I start and stop resources one by one, which should be started first and which stopped first: the SCAN listener or the node listeners?

    Thanks..

  60. The SCAN listener provides uniform access to any database in your grid. If you stop it, you can still access these databases through the database listener, but you will have to provide the hostname and service name.
    You will not be able to connect to your database once the listeners on your local nodes have stopped (as far as I am aware). A connection attempt through the SCAN listener is redirected to the local listeners on the nodes. That is the dependency, as far as I know.

    HTH.
    Arnoud

  61. Hi Arnoud Roth,
    This is Sarfaraz.

    I am new to Oracle and I am trying to install Oracle Grid Infrastructure with ASM. It asks me to run the root.sh script, and while creating the trace, root.sh errors out saying it is unable to create the OLR: permission denied. I have set all the permissions on everything, such as the ORACLEASM disks and /u02/app/...../11.2.0, and ran chmod 777 /u02,
    but I don't know what's missing. Please help.

  62. Hi everybody,
    I’ve installed 3 homes on standalone server OEL 5.2 (not cluster installation)
    1. Oracle GRID infrastructure 11.2.0.1 (OHAS, ASM)
    ORACLE_HOME=/u01/app/oracle/product/11.2.0/grid
    2. Oracle DB 11.2.0.1
    ORACLE_HOME=/u01/app/oracle/product/11.2.0/db
    3. Oracle DB 10.2.0.5
    ORACLE_HOME=/u01/app/oracle/product/10.2.0/db

    As can be seen, OHAS controls everything from Grid Infrastructure to the old-fashioned 10g database. I think the important thing is NOT TO USE SRVCTL COMMANDS from the 10gR2 home.
    I did it with the crsctl command from the grid home. There is only one 11gR2 ASM instance with two diskgroups (DATA and FRA), which are used by every database on this machine (in my case elzg is 11gR2 and p10g is 10gR2).
    .
    [oracle@elzg public]$ env | grep ORA
    ORA_CRS_HOME=/u01/app/oracle/product/11.2.0/grid
    ORACLE_SID=+ASM
    ORACLE_BASE=/u01/app/oracle
    ORACLE_HOME=/u01/app/oracle/product/11.2.0/grid
    [oracle@elzg public]$ crs_stat -t
    Name           Type           Target    State     Host
    ------------------------------------------------------------
    ora.DATA.dg    ora....up.type ONLINE    ONLINE    elzg
    ora.FRA.dg     ora....up.type ONLINE    ONLINE    elzg
    ora....ER.lsnr ora....er.type ONLINE    ONLINE    elzg
    ora....0G.lsnr ora....er.type ONLINE    ONLINE    elzg
    ora.asm        ora.asm.type   ONLINE    ONLINE    elzg
    ora.cssd       ora.cssd.type  ONLINE    ONLINE    elzg
    ora.diskmon    ora....on.type ONLINE    ONLINE    elzg
    ora.elzg.db    ora....se.type ONLINE    ONLINE    elzg
    ora.elzg.em    application    ONLINE    ONLINE    elzg
    ora.p10g.db    ora....se.type ONLINE    ONLINE    elzg
    ora.p10g.em    application    ONLINE    ONLINE    elzg

    Only strange behavior is when starting 10gR2 instance thru crs_start command:
    [oracle@elzg bin]$ crs_start ora.p10g.em
    Attempting to start `ora.p10g.db` on member `elzg`
    CRS-5016: Process "/u01/app/oracle/product/11.2.0/grid/bin/setasmgidwrap" spawned by agent "/u01/app/oracle/product/11.2.0/grid/bin/oraagent.bin" for action "start" failed: details at "(:CLSN00010:)" in "/u01/app/oracle/product/11.2.0/grid/log/elzg/agent/ohasd/oraagent_oracle/oraagent_oracle.log"
    KFSG-00312: not an Oracle binary: '/u01/app/oracle/product/10.2.0/db/bin/oracle'
    Start of `ora.p10g.db` on member `elzg` succeeded.
    Attempting to start `ora.p10g.em` on member `elzg`
    Start of `ora.p10g.em` on member `elzg` succeeded.
    BUT still, it is working, although I don’t know if this is supported.

    Regards,
    Drazen

  63. Very good article. When I run "crsctl start crs", the services are not starting;
    I have to bring all of them up manually.
    Can you please suggest a script or command to verify that my cluster is set up correctly?

    Thanks
    -basha

  64. Can anyone please tell me how to do a setup in a real-world environment?

    The requirement is to install Oracle 11gR2 on Red Hat Linux,
    configure RAC,
    and check that clustering is working properly between two servers.

    It should be done with SCAN and with GNS as in the tutorial above, but I don't know what the pre-installation steps are. I have a manual for this, but I still have a few doubts about DNS and DHCP.

    I am very weak on the networking side. I don't know how these IP addresses interact with DNS, the client side and DHCP.

    I have seen this tutorial:

    http://72.32.201.108/rac/Oracle_RAC_with_GNS.html#Pre-Installation_Steps

    There, the DNS and DHCP server is on 197588-oralab1. If you follow the link, there is a chart that explains it.

    I want to know one thing: is the DNS and DHCP server installed on a Windows server or a Linux server?

    The tutorial shows a DNS and DHCP server on Linux, as you can see:

    [root@197588-oralab1 ~]# rpm --query dhcp
    dhcp-3.0.5-18.el5


    197588-oralab1 is acting as the DNS and DHCP server there, so how can we configure this if it is on Windows Server?

    I am very confused and need your assistance with setting up a DNS and DHCP server with IP addresses, ideally as a step-by-step tutorial.

    I will be very grateful for any suggestions.

    Regards,

  65. Hi Anonymous,
    To set up DHCP and DNS on Windows, please consult your administrator. I know you can easily set up DNS and DHCP on Windows; nevertheless, I have not done this for more than 10 years. I am sure you will be able to find some docs via Google on installing and configuring DNS and DHCP.
    If you need further assistance with configuring grid / ASM / RAC, don't hesitate to come back. We can also exchange e-mail addresses to help you out. Please post your address here; I won't publish it (comments are moderated).

  66. Hi Arnoud,

    We have 2 databases on a server, which we are planning to use with ASM. Both DBs are 11gR2. I presume installing 11gR2 automatically comes with Grid Infrastructure.

    I am lost and have no clue where to start. The Unix admin got me 4 disks of 150 GB each. After a little research I saw people suggesting the ASMCA utility, which I tried to invoke, but nothing appeared.
    Could you walk me through it a little, please?

  67. I am installing 11gR2 Grid Infrastructure with a response file. I managed to install the software successfully. When I perform the post-installation configuration, the Net Configuration Assistant runs successfully but the ASM Configuration Assistant fails. Details are given below.

    I followed the document at http://www.oracle-class.com/?p=2433 and it says to ignore the ASM configuration error. But I am not sure what the consequences of ignoring the error are.

    $GRID_HOME/bin>./configToolAllCommands RESPONSE_FILE=/tmp/orainstall/cfgrsp.properties

    The cfgrsp.properties file contains the following:
    oracle.assistants.server|S_SYSPASSWORD=Adak0h0mada
    oracle.assistants.server|S_SYSTEMPASSWORD=Adak0h0mada
    oracle.assistants.server|S_SYSMANPASSWORD=Adak0h0mada
    oracle.assistants.server|S_DBSNMPPASSWORD=Adak0h0mada
    oracle.assistants.server|S_HOSTUSERPASSWORD=Adak0h0mada
    oracle.assistants.server|S_ASMSNMPPASSWORD=Adak0h0mada

    The log file contains:
    Aggregate oracle.assistants.server not found
    ###################################################
    The action configuration is performing
    ------------------------------------------------------
    The plug-in Oracle Net Configuration Assistant is running


    Parsing command line arguments:
    Parameter "orahome" = /u01/app/11.2.0/grid
    Parameter "orahnam" = Ora11g_gridinfrahome1
    Parameter "instype" = typical
    Parameter "inscomp" = client,oraclenet,javavm,server
    Parameter "insprtcl" = tcp
    Parameter "cfg" = local
    Parameter "authadp" = NO_VALUE
    Parameter "responsefile" = /u01/app/11.2.0/grid/network/install/netca_typ.rsp
    Parameter "silent" = true
    Parameter "silent" = true
    Done parsing command line arguments.
    Oracle Net Services Configuration:
    Profile configuration complete.
    Profile configuration complete.
    db3...
    db4...
    Oracle Net Listener Startup:
    Listener started successfully.
    Listener configuration complete.
    Oracle Net Services configuration successful. The exit code is 0

    The plug-in Oracle Net Configuration Assistant has successfully been performed
    ------------------------------------------------------
    ------------------------------------------------------
    The plug-in Automatic Storage Management Configuration Assistant is running


    The plug-in Automatic Storage Management Configuration Assistant has failed its perform method
    ------------------------------------------------------
    The action configuration has failed its perform method
    ###################################################
