Friday, February 21, 2014

RAC -- Listeners in Oracle 11gR2 RAC, IPC, oraagent -- Exadata

I decided to write this post regarding the configuration and working principles of database listeners in an Oracle 11gR2 RAC environment.

First of all, in an 11gR2 RAC environment, we have SCAN listeners.


As can be clearly seen from the picture above, the SCAN listeners are the first point of contact in an 11gR2 RAC environment. Clients connect to the SCAN listeners in the first place. There are multiple SCAN listeners in the cluster, each listening on one of the SCAN IP addresses, and all the database services are registered with the SCAN listeners.
The client sends its connection request to a SCAN listener.
Note that the client only knows the SCAN name; DNS resolves it to one of the SCAN listener addresses in the cluster. Once the SCAN listener receives the client's connection request, it sends back to the client the address of one of the local listeners (on the least loaded node). That local listener is one that services the service name the client requested. Then the client and the local listener communicate with each other and the connection becomes established.
To understand this better, you can take a look at my previous post regarding the SCAN listeners: http://ermanarslan.blogspot.com.tr/2013/05/database-about-scan-11gr2.html
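
For illustration, a client-side tnsnames.ora entry that knows only the SCAN name might look like the following (the SCAN name, domain and service name here are hypothetical);

PROD =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = exa-scan.erman.net)(PORT = 1521))
    (CONNECT_DATA =
      (SERVER = DEDICATED)
      (SERVICE_NAME = PROD)
    )
  )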

The following picture represents the interprocess communication between the listeners and the Oracle instances in a RAC environment.


So, in a 2-node RAC environment (like an Exadata quarter rack), we have 3 SCAN listeners and 2 local listeners. PMON registers the database services with the SCAN listeners via IPC/TCP, and it registers both the database and ASM instances with the local listeners via IPC/TCP.
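
The registration targets come from the instance parameters; the following is a minimal sketch of how they are typically set in 11gR2 (the VIP hostname, SCAN name and instance name here are hypothetical);

-- PMON registers the instance with the listeners these parameters point to
ALTER SYSTEM SET local_listener='(ADDRESS=(PROTOCOL=TCP)(HOST=db01-vip)(PORT=1521))' SCOPE=BOTH SID='PROD1';
ALTER SYSTEM SET remote_listener='exa-scan:1521' SCOPE=BOTH SID='*';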

With the 11gR2 environment, both the SCAN and local listeners are started from the Grid home, and when we look at the listener processes, we see that they use the same binary.
ps -ef |grep inherit
 /u01/app/11.2.0.2/grid/bin/tnslsnr LISTENER -inherit
 /u01/app/11.2.0.2/grid/bin/tnslsnr LISTENER_SCAN3 -inherit
 /u01/app/11.2.0.2/grid/bin/tnslsnr LISTENER_SCAN2 -inherit

Furthermore, we have an "oraagent" process to manage these listeners. Actually, the oraagent process starts a bunch of grid components, such as the clustered ASM instance, ONS, eONS, the SCAN listeners and the local listener.

ps -ef |grep oraagent
oracle    8634     1  0  2013 ?        01:54:19 /u01/app/11.2.0.2/grid/bin/oraagent.bin
oracle   19967     1  0 Feb13 ?        00:26:07 /u01/app/11.2.0.2/grid/bin/oraagent.bin

As you see above, we have 2 oraagent.bin processes.
Now I put on my Linux admin hat and look at their environments;
both of them were started from the GRID_HOME, and the binary used was /u01/app/11.2.0.2/grid/bin/oraagent.bin. The environments are a little different, and the file descriptors of these 2 processes are a little different, too. These differences are what set them apart, as it is known that these two processes are used for starting different components of the cluster environment.
For example, while the first oraagent.bin starts components such as ASM and the listeners, the second oraagent.bin starts the databases and their associated services in the cluster.
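
If you want to compare them yourself, here is a quick sketch (the PIDs are the ones from the ps output above);

# dump and diff the environments of the two oraagent.bin processes
cat /proc/8634/environ  | tr '\0' '\n' | sort > /tmp/oraagent1.env
cat /proc/19967/environ | tr '\0' '\n' | sort > /tmp/oraagent2.env
diff /tmp/oraagent1.env /tmp/oraagent2.env
# the open file descriptors can be compared the same way
ls -l /proc/8634/fd /proc/19967/fd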

Okay, let's come back to our topic.
So, when we look at the listener.ora file in the Grid home, we see a different kind of listener specification, as there are no ports, SIDs, or even a hostname declared there. We also see comments like "line added by Agent".

LISTENER=(DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=IPC)(KEY=LISTENER)))) # line added by Agent
LISTENER_SCAN3=(DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=IPC)(KEY=LISTENER_SCAN3)))) # line added by Agent
LISTENER_SCAN2=(DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=IPC)(KEY=LISTENER_SCAN2)))) # line added by Agent
LISTENER_SCAN1=(DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=IPC)(KEY=LISTENER_SCAN1)))) # line added by Agent
ENABLE_GLOBAL_DYNAMIC_ENDPOINT_LISTENER_SCAN1=ON # line added by Agent
ENABLE_GLOBAL_DYNAMIC_ENDPOINT_LISTENER_SCAN2=ON # line added by Agent
ENABLE_GLOBAL_DYNAMIC_ENDPOINT_LISTENER_SCAN3=ON # line added by Agent
ENABLE_GLOBAL_DYNAMIC_ENDPOINT_LISTENER=ON # line added by Agent
ADMIN_RESTRICTIONS_ADMIN=ON


The Agent that adds these lines is oraagent, as expected. These lines prove that those listeners are managed by oraagent.
The lines with ENABLE_GLOBAL_DYNAMIC_ENDPOINT_LISTENER*=ON are also important.
The ENABLE_GLOBAL_DYNAMIC_ENDPOINT_LISTENER parameter, when set to ON, makes oraagent dynamically activate the listening endpoints for the running listeners.
Actually, when we start a listener, it starts and listens on IPC only. Oraagent uses this IPC endpoint to connect to the listener and register the TCP endpoint. Following is an example of this;

lsnrctl start LISTENER_SCAN3
.....
lsnrctl status LISTENER_SCAN3

Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=IPC)(KEY=LISTENER_SCAN3)))
STATUS of the LISTENER
------------------------
Alias                     LISTENER_SCAN3
Version                   TNSLSNR for Linux: Version 11.2.0.3.0 - Production
Start Date                19-FEB-2014 15:17:56
Uptime                    0 days 0 hr. 0 min. 29 sec
Trace Level               off
Security                  ON: Local OS Authentication
SNMP                      OFF
Listener Parameter File   /u01/app/11.2.0.2/grid/network/admin/listener.ora
Listener Log File         /u01/app/oracle/diag/tnslsnr/osrvdb01/listener_scan3/alert/log.xml
Listening Endpoints Summary...
  (DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=LISTENER_SCAN3)))
The listener supports no services
The command completed successfully

As you see, when we start LISTENER_SCAN3, the only protocol the listener is listening on is IPC.
On the other hand, just after we start the listener, when we look at the oraagent log file, we see that oraagent uses this IPC endpoint to activate the TCP listening endpoint of the listener.

2014-02-19 15:31:21.288: [ora.LISTENER_SCAN3.lsnr][1358747968] {1:19319:12863} [start] Vip Address used for the Scan Listener is 192.168.0.86
2014-02-19 15:31:21.288: [ora.LISTENER_SCAN3.lsnr][1358747968] {1:19319:12863} [start] Listener obtained IP 192.168.0.86 from the VIP Resource
2014-02-19 15:31:21.392: [ora.LISTENER_SCAN3.lsnr][1358747968] {1:19319:12863} [start] LsnrAgent::regEndpoint Listener endpoint is (DESCRIPTION=(ADDRESS=(PROTOCOL=IPC)(KEY=LISTENER_SCAN3)))
2014-02-19 15:31:21.392: [ USRTHRD][1358747968] {1:19319:12863} Thread:RegEndpointThread:LISTENER_SCAN3 start {
2014-02-19 15:31:21.392: [ USRTHRD][1358747968] {1:19319:12863} Thread:RegEndpointThread:LISTENER_SCAN3 start }
2014-02-19 15:31:21.392: [ USRTHRD][1593657664] {1:19319:12863} Thread:RegEndpointThread:LISTENER_SCAN3 Registering Endpoint nsgfei_EndpointInit() (ADDRESS=(PROTOCOL=TCP)(HOST=192.168.0.86)(PORT=1521))
2014-02-19 15:31:21.393: [ USRTHRD][1593657664] {1:19319:12863} Thread:RegEndpointThread:LISTENER_SCAN3 LsnrAgent, registered endpoint (ADDRESS=(PROTOCOL=TCP)(HOST=192.168.0.86)(PORT=1521))
2014-02-19 15:31:22.393: [ora.LISTENER_SCAN3.lsnr][1358747968] {1:19319:12863} [start] LsnrAgent::start }
2014-02-19 15:31:22.393: [ora.LISTENER_SCAN3.lsnr][1358747968] {1:19319:12863} [start] (:CLSN00107:) clsn_agent::start }
2014-02-19 15:31:22.393: [    AGFW][1358747968] {1:19319:12863} Command: start for resource: ora.LISTENER_SCAN3.lsnr 1 1 completed with status: SUCCESS

So far so good; but how does oraagent obtain the needed IP and port information to start this listener?
To answer the question, let's do a demo.
When we use srvctl to add a new listener, we actually update the cluster registry.
For demonstration,

This is my current listener.ora in the Grid home;

LISTENER_PROD=(DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=IPC)(KEY=LISTENER_PROD)))) # line added by Agent
LISTENER_SCAN3=(DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=IPC)(KEY=LISTENER_SCAN3)))) # line added by Agent
LISTENER_SCAN2=(DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=IPC)(KEY=LISTENER_SCAN2)))) # line added by Agent
LISTENER_SCAN1=(DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=IPC)(KEY=LISTENER_SCAN1)))) # line added by Agent
ENABLE_GLOBAL_DYNAMIC_ENDPOINT_LISTENER_SCAN1=ON # line added by Agent
ENABLE_GLOBAL_DYNAMIC_ENDPOINT_LISTENER_SCAN2=ON # line added by Agent
ENABLE_GLOBAL_DYNAMIC_ENDPOINT_LISTENER_SCAN3=ON # line added by Agent
ENABLE_GLOBAL_DYNAMIC_ENDPOINT_LISTENER_PROD=ON # line added by Agent

Let's add a listener using srvctl;
I'll add a listener named listener_erman that runs from the Grid home and listens on port 1555;

srvctl add listener -l listener_erman -o /u01/app/11.2.0.2/grid -p 1555
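
Before looking at listener.ora, we can check what srvctl has stored in the cluster registry with the following command (I omit its output here);

srvctl config listener -l listener_erman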

Okay, listener_erman is added. So let's look at listener.ora again;

LISTENER_PROD=(DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=IPC)(KEY=LISTENER_PROD)))) # line added by Agent
LISTENER_SCAN3=(DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=IPC)(KEY=LISTENER_SCAN3)))) # line added by Agent
LISTENER_SCAN2=(DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=IPC)(KEY=LISTENER_SCAN2)))) # line added by Agent
LISTENER_SCAN1=(DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=IPC)(KEY=LISTENER_SCAN1)))) # line added by Agent
ENABLE_GLOBAL_DYNAMIC_ENDPOINT_LISTENER_SCAN1=ON # line added by Agent
ENABLE_GLOBAL_DYNAMIC_ENDPOINT_LISTENER_SCAN2=ON # line added by Agent
ENABLE_GLOBAL_DYNAMIC_ENDPOINT_LISTENER_SCAN3=ON # line added by Agent
ENABLE_GLOBAL_DYNAMIC_ENDPOINT_LISTENER_PROD=ON # line added by Agent


So, looking above, we can say that nothing has changed. There is not even an IPC endpoint description for listener_erman there.

Now I start listener_erman (note that I don't have any description of this listener in listener.ora);

 srvctl start listener -l listener_erman
srvctl status listener -l listener_erman

Listener LISTENER_ERMAN is enabled
Listener LISTENER_ERMAN is running on node(s): osrvdb01,osrvdb02

ps -ef |grep LISTENER_ERMAN

oracle   30687     1  0 09:18 ?        00:00:00 /u01/app/11.2.0.2/grid/bin/tnslsnr LISTENER_ERMAN -inherit

[oracle@osrvdb01 ~]$ lsnrctl status LISTENER_ERMAN

LSNRCTL for Linux: Version 11.2.0.3.0 - Production on 21-FEB-2014 09:46:02

Copyright (c) 1991, 2011, Oracle.  All rights reserved.

Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=IPC)(KEY=LISTENER_ERMAN)))
STATUS of the LISTENER
------------------------
Alias                     LISTENER_ERMAN
Version                   TNSLSNR for Linux: Version 11.2.0.3.0 - Production
Start Date                21-FEB-2014 09:18:47
Uptime                    0 days 0 hr. 27 min. 15 sec
Trace Level               off
Security                  ON: Local OS Authentication
SNMP                      OFF
Listener Parameter File   /u01/app/11.2.0.2/grid/network/admin/listener.ora
Listener Log File         /u01/app/11.2.0.2/grid/log/diag/tnslsnr/osrvdb01/listener_erman/alert/log.xml
Listening Endpoints Summary...
  (DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=LISTENER_ERMAN)))
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=some ip)(PORT=1555)))
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=some vip )(PORT=1555)))

We see that listener_erman has started and is listening on port 1555.
So let's look at listener.ora again;

LISTENER_ERMAN=(DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=IPC)(KEY=LISTENER_ERMAN)))) # line added by Agent
LISTENER_PROD=(DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=IPC)(KEY=LISTENER_PROD)))) # line added by Agent
LISTENER_SCAN3=(DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=IPC)(KEY=LISTENER_SCAN3)))) # line added by Agent
LISTENER_SCAN2=(DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=IPC)(KEY=LISTENER_SCAN2)))) # line added by Agent
LISTENER_SCAN1=(DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=IPC)(KEY=LISTENER_SCAN1)))) # line added by Agent
ENABLE_GLOBAL_DYNAMIC_ENDPOINT_LISTENER_SCAN1=ON # line added by Agent
ENABLE_GLOBAL_DYNAMIC_ENDPOINT_LISTENER_SCAN2=ON # line added by Agent
ENABLE_GLOBAL_DYNAMIC_ENDPOINT_LISTENER_SCAN3=ON # line added by Agent
ENABLE_GLOBAL_DYNAMIC_ENDPOINT_LISTENER_PROD=ON # line added by Agent
ENABLE_GLOBAL_DYNAMIC_ENDPOINT_LISTENER_ERMAN=ON # line added by Agent

As you see, when we start the listener, oraagent adds the needed listener configuration lines into the listener.ora file in $GRID_HOME. Oraagent also adds a line for the new listener into the endpoints_listener.ora file. The endpoints_listener.ora file is there for backward compatibility with pre-11.2 databases.

endpoints_listener.ora file contents;
LISTENER_ERMAN_DB01=(DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=TCP)(HOST=db01-vip)(PORT=1555))(ADDRESS=(PROTOCOL=TCP)(HOST=10.10.1.110)(PORT=1555)(IP=FIRST)))) # line added by Agent

Now I delete these lines and restart the listener...

Looking at the listener.ora file again;

LISTENER_ERMAN=(DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=IPC)(KEY=LISTENER_ERMAN)))) # line added by Agent
LISTENER_PROD=(DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=IPC)(KEY=LISTENER_PROD)))) # line added by Agent
LISTENER_SCAN3=(DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=IPC)(KEY=LISTENER_SCAN3)))) # line added by Agent
LISTENER_SCAN2=(DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=IPC)(KEY=LISTENER_SCAN2)))) # line added by Agent
LISTENER_SCAN1=(DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=IPC)(KEY=LISTENER_SCAN1)))) # line added by Agent
ENABLE_GLOBAL_DYNAMIC_ENDPOINT_LISTENER_SCAN1=ON # line added by Agent
ENABLE_GLOBAL_DYNAMIC_ENDPOINT_LISTENER_SCAN2=ON # line added by Agent
ENABLE_GLOBAL_DYNAMIC_ENDPOINT_LISTENER_SCAN3=ON # line added by Agent
ENABLE_GLOBAL_DYNAMIC_ENDPOINT_LISTENER_PROD=ON # line added by Agent
ENABLE_GLOBAL_DYNAMIC_ENDPOINT_LISTENER_ERMAN=ON # line added by Agent

Again, oraagent adds the needed listener configuration lines into the listener.ora file in $GRID_HOME.

So, during the listener start process, oraagent looks at listener.ora; if it can't find configuration lines for that listener, it goes to the cluster registry to find the needed configuration for that listener. If it finds the listener configuration in the registry, it updates listener.ora, and the listener becomes able to start.

Or does it? Keep reading :)

Following is a proof of the paragraph above.

Here,
I first delete listener_erman,
then add a fake configuration line for listener_erman to listener.ora,
and lastly add listener_erman back again and try to start it using srvctl.

See what happens;

Stopping and removing listener_erman;

srvctl stop listener -l listener_erman
srvctl remove listener -l listener_erman

Adding a fake line for listener_erman to listener.ora (note that I don't even change the lines written in endpoints_listener.ora);

LISTENER_ERMAN=(This is FAKE)
LISTENER_PROD=(DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=IPC)(KEY=LISTENER_PROD)))) # line added by Agent
LISTENER_SCAN3=(DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=IPC)(KEY=LISTENER_SCAN3)))) # line added by Agent
LISTENER_SCAN2=(DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=IPC)(KEY=LISTENER_SCAN2)))) # line added by Agent
LISTENER_SCAN1=(DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=IPC)(KEY=LISTENER_SCAN1)))) # line added by Agent
ENABLE_GLOBAL_DYNAMIC_ENDPOINT_LISTENER_SCAN1=ON # line added by Agent
ENABLE_GLOBAL_DYNAMIC_ENDPOINT_LISTENER_SCAN2=ON # line added by Agent
ENABLE_GLOBAL_DYNAMIC_ENDPOINT_LISTENER_SCAN3=ON # line added by Agent
ENABLE_GLOBAL_DYNAMIC_ENDPOINT_LISTENER_PROD=ON # line added by Agent
ENABLE_GLOBAL_DYNAMIC_ENDPOINT_LISTENER_ERMAN=ON # line added by Agent

Adding listener_erman back again ;

 srvctl add listener -l listener_erman -o /u01/app/11.2.0.2/grid -p 1555
srvctl status listener -l listener_erman
Listener LISTENER_ERMAN is enabled
Listener LISTENER_ERMAN is not running

Starting listener_erman again;

srvctl start listener -l listener_erman
PRCR-1079 : Failed to start resource ora.LISTENER_ERMAN.lsnr
CRS-5016: Process "/u01/app/11.2.0.2/grid/bin/lsnrctl" spawned by agent "/u01/app/11.2.0.2/grid/bin/oraagent.bin" for action "start" failed: details at "(:CLSN00010:)" in "/u01/app/11.2.0.2/grid/log/osrvdb01/agent/crsd/oraagent_oracle/oraagent_oracle.log"
CRS-5016: Process "/u01/app/11.2.0.2/grid/bin/lsnrctl" spawned by agent "/u01/app/11.2.0.2/grid/bin/oraagent.bin" for action "start" failed: details at "(:CLSN00010:)" in "/u01/app/11.2.0.2/grid/log/osrvdb01/agent/crsd/oraagent_oracle/oraagent_oracle.log"
CRS-2674: Start of 'ora.LISTENER_ERMAN.lsnr' on 'osrvdb01' failed

:) Let's look at the oraagent_oracle.log for details;

2014-02-21 10:01:29.284: [ora.LISTENER_ERMAN.lsnr][1660774720] {1:19319:17079} [start] TNS-01150: The address of the specified listener name is incorrect

So, as you see, oraagent tries to start LISTENER_ERMAN. It looks at the listener.ora file and finds an entry starting with LISTENER_ERMAN. It accepts that entry as true and does not go to the registry at all. That's why it cannot start the listener, since we added a fake line for listener_erman.
This also means oraagent does not update the listener.ora file if it sees a line in it that starts with the name of the listener it is requested to start.

Let's clean up this mess; let's remove that listener and take everything back again.

srvctl remove listener -l listener_erman
And try to start listener_erman again (note that the fake line is still present in listener.ora);

srvctl start listener -l listener_erman
PRCR-1001 : Resource ora.LISTENER_ERMAN.lsnr does not exist

You see, there is no TNS error this time.

So the truth is -> when we issue the srvctl start listener command, Oracle looks at the registry and sees there is no resource named listener_erman; this means it does not read listener.ora in the first place.
To prove that, I start the listener using lsnrctl and get the expected TNS error.

lsnrctl start LISTENER_ERMAN
TNS-01150: The address of the specified listener name is incorrect..

In summary, when we add a listener using srvctl, the listener definition is added to the cluster registry. When we use srvctl to start the listener, oraagent looks at the registry for the definition of that listener;
if there is a correct listener definition in the registry (and no entry in listener.ora) -> it updates listener.ora and starts the listener.
if there isn't any listener definition in the registry -> it just cannot start the listener.
if there is a correct listener definition in the registry and (&&) a false listener entry in listener.ora -> it cannot start the listener, but this time it produces a TNS or listener-specific error; at least it tries.

Alternatively, you can always write a manual declaration into the GRID_HOME/listener.ora file.
For listener_erman, I can write a manual declaration and start the listener using the lsnrctl command as follows;

LISTENER_ERMAN=(DESCRIPTION_LIST=(DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=TCP)(HOST=db01-vip.erman.net)(PORT=1555)))))
lsnrctl status LISTENER_ERMAN

Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=db01-vip.karbosan.net)(PORT=1555)))
STATUS of the LISTENER
------------------------
Alias                     LISTENER_ERMAN
Version                   TNSLSNR for Linux: Version 11.2.0.3.0 - Production
Start Date                21-FEB-2014 13:40:44
Uptime                    0 days 0 hr. 0 min. 11 sec
Trace Level               off
Security                  ON: Local OS Authentication
SNMP                      OFF
Listener Parameter File   /u01/app/11.2.0.2/grid/network/admin/listener.ora
Listener Log File         /u01/app/oracle/diag/tnslsnr/osrvdb01/listener_erman/alert/log.xml
Listening Endpoints Summary...
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=192.168.0.82)(PORT=1555)))

As seen above, the listener starts successfully, but now I can't use srvctl to manage this listener, as expected. Oraagent no longer controls this listener, either.

srvctl status listener -l listener_erman
PRCR-1001 : Resource ora.LISTENER_ERMAN.lsnr does not exist

Also, the listener listens on only a single IP address.

Then, I change my setting in listener.ora to the following;
LISTENER_ERMAN=(DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=IPC)(KEY=LISTENER_ERMAN))))

Then I use lsnrctl to start the listener.
This time I see that listener_erman listens only on IPC.
To make listener_erman listen on multiple hostnames with the same port number, I add the following line to the endpoints_listener.ora file;
LISTENER_ERMAN=(DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=TCP)(HOST=db01-vip.karbosan.net)(PORT=1555))(ADDRESS=(PROTOCOL=TCP)(HOST=db01.karbosan.net)(PORT=1555))))


So I connect to the listener using lsnrctl, which connects to listener_erman through the open IPC endpoint, and set enable_global_dynamic_endpoint.

[oracle@osrvdb01 ~]$ lsnrctl

LSNRCTL for Linux: Version 11.2.0.3.0 - Production on 21-FEB-2014 14:20:31

Copyright (c) 1991, 2011, Oracle.  All rights reserved.

Welcome to LSNRCTL, type "help" for information.

LSNRCTL> set current_listener listener_erman
Current Listener is listener_erman
LSNRCTL> set enable_global_dynamic_endpoint on;
Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=IPC)(KEY=LISTENER_ERMAN)))
listener_erman parameter "enable_global_dynamic_endpoint" set to ON
The command completed successfully
LSNRCTL> services
Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=IPC)(KEY=LISTENER_ERMAN)))
The listener supports no services
The command completed successfully

Still no luck...
By the way, don't bother with the lsnrctl commands above. I wrote them just for fun and found them cool; actually they are not related to this endpoint thing :)

Okay, here is where IP=FIRST comes into play.

The (IP=FIRST) statement makes the listener create a listening endpoint on the IP address to which the given HOST resolves. By default, without (IP=FIRST), the listener listens on all network interfaces (i.e. INADDR_ANY).

So, to be able to listen on the same port using different hostnames on a machine, the listener configuration should contain the IP=FIRST statement.

I add the following listener description to listener.ora;

LISTENER_ERMAN=(DESCRIPTION_LIST=(DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=TCP)(HOST=db01-vip.erman.net)(PORT=1555)(IP=FIRST))))(DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=TCP)(HOST=db01.erman.net)(PORT=1555)(IP=FIRST)))))

This time, the listener starts successfully, listening on two different hostnames with the same port number.

[oracle@db01 admin]$ lsnrctl start LISTENER_ERMAN

LSNRCTL for Linux: Version 11.2.0.3.0 - Production on 21-FEB-2014 15:22:42

Copyright (c) 1991, 2011, Oracle.  All rights reserved.

Starting /u01/app/11.2.0.2/grid/bin/tnslsnr: please wait...

TNSLSNR for Linux: Version 11.2.0.3.0 - Production
System parameter file is /u01/app/11.2.0.2/grid/network/admin/listener.ora
Log messages written to /u01/app/oracle/diag/tnslsnr/db01/listener_erman/alert/log.xml
Listening on: (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=192.168.0.82)(PORT=1555)))
Listening on: (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=10.10.10.100)(PORT=1555)))

Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=db01-vip.karbosan.net)(PORT=1555)(IP=FIRST)))
STATUS of the LISTENER
------------------------
Alias                     LISTENER_ERMAN
Version                   TNSLSNR for Linux: Version 11.2.0.3.0 - Production
Start Date                21-FEB-2014 15:22:42
Uptime                    0 days 0 hr. 0 min. 0 sec
Trace Level               off
Security                  ON: Local OS Authentication
SNMP                      OFF
Listener Parameter File   /u01/app/11.2.0.2/grid/network/admin/listener.ora
Listener Log File         /u01/app/oracle/diag/tnslsnr/db01/listener_erman/alert/log.xml
Listening Endpoints Summary...
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=192.168.0.82)(PORT=1555)))
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=10.10.10.100)(PORT=1555)))

As you see, this is how it's done manually.
Do we need to make our listener listen on the administration network? No :) I wrote the things above just to show you some advanced listener operations.

In conclusion,

In an 11gR2 RAC environment, srvctl should be used to manage the listeners. Adding, removing, starting and stopping the listeners should be done using srvctl commands. Management is easier with srvctl; oraagent takes control and checks our resources for us.
Using $GRID_HOME as the listener home is recommended; by using $GRID_HOME for all our listeners, we have a centralized location and environment for all of our listener processes. This also simplifies the management activities.
On the other hand, if you want to use manual techniques for listener declaration and management in a RAC environment, that is possible as well.
The examples you have read above explain the listener endpoints and manual declarations clearly.

I hope you will find this helpful.  

Monday, February 17, 2014

Rman channels, a need for manual channel allocation, ORA-19554, ORA-27211, ORA-19624, RMAN-10035

In a cluster environment, RMAN works clusterwide. I mean, when we allocate channels automatically, RMAN uses all of the cluster nodes to spread the work across the nodes in the cluster. For example, in a two-node cluster, some channels are allocated from the first node and the others are allocated from the second.
It seems RMAN uses PL/SQL to do this work; it uses the DBMS_BACKUP_RESTORE package.
When we trace such an RMAN session, we see the following;

DBGPLSQL:               30 TEXTNOD =   devtype := sys.dbms_backup_restore.deviceAllocate( ident => chid,
DBGPLSQL:               31 TEXTNOD =        node => node,
DBGPLSQL:               32 PRMVAL = type=>'SBT_TAPE',dupcnt=>1
DBGPLSQL:               33 TEXTNOD =  );

As you see above, even for tape channels RMAN uses database PL/SQL to allocate the channels.
So the thing we need to keep in mind is that when we have a tape device or a tape agent configured to run on only one of the cluster nodes, our RMAN backups can encounter errors like the following;

RMAN-10035: exception raised in RPC:
ORA-19624: Operation failed, retry possible
ORA-19554: error allocating device, device type: SBT_TAPE, device name:
ORA-27211: Failed to load Media Management Library

It can easily be seen that RMAN cannot allocate the SBT_TAPE channel here. But supposing we have already configured our libraries on node 1 to connect to the SBT device, and we connect to the database with RMAN from node 1, these errors should not be seen.
The reason for these errors is actually the clusterwide parallelism...
RMAN connects to the target instance in a load-balanced manner. It creates its sessions and allocates channels through these sessions directly from the database using PL/SQL. So if RMAN goes to node 2 because of the load balancing, and we didn't configure our SBT libraries/agent there, RMAN naturally produces the errors above.

So, in order to prevent these types of errors;
* we should allocate channels manually from the node that is configured to access our tape libraries,
or
* we should configure all of the nodes in the cluster to be able to allocate the necessary RMAN channels.

To allocate channels only from a specific node, we can configure the RMAN channels as follows;

CONFIGURE CHANNEL 1.. CONNECT 'user1/pwd1@node1';  
Note that node1 is the node configured properly for SBT-type backups.
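
A fuller sketch of the same idea is below (the channel names and connect strings are hypothetical); every channel is tied to node1, the node with the working media management configuration;

RUN
{
  ALLOCATE CHANNEL t1 DEVICE TYPE sbt CONNECT 'user1/pwd1@node1';
  ALLOCATE CHANNEL t2 DEVICE TYPE sbt CONNECT 'user1/pwd1@node1';
  BACKUP DATABASE PLUS ARCHIVELOG;
}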

You can also check out the following post:

Monday, February 10, 2014

Oracle EBS on EXADATA X3 Database Machine

In this post, I will write about a POC we did for a leading organization in Turkey.
The POC was done to prove that EBS 11i can run smoothly on Exadata X3 and to demonstrate the performance results achieved with the power of Exadata X3.


The implementation steps of this POC were -> patch the source app & db tiers in place, migrate the database to Exadata, and configure the Exadata and EBS components to maximize performance and supply high availability and load balancing.

For the source, we had an EBS database running on a powerful HP server.
The source system's computing power was provided by 2 x Intel 4-core CPUs, and the I/O subsystem was supplied by NetApp storage at the backend. The data between the storage and the database server was flowing over fiber using the FCP protocol. All of this system was designed to run the EBS database, which was 1.2 TB in size. There were a high number of batches, transactions, ad-hoc queries and reports on the database, and the users were complaining about the significant decrease in response times during peak hours.
The system was servicing the following technologies/tools: application server components (Oracle Forms, Reports, Apache, OC4J instances), Discoverer Admin/Desktop and web reporting tools, external in-house developed web applications, Oracle Concurrent Managers/Programs, lots of integrations, etc.

So it was a busy system (1200 database sessions on average), as you can imagine.

Okay, let's look at the source system specs;

-SOURCE -> HP ProLiant DL380 G7 & Netapp SAN-



Hardware:

148 GB memory
Cpu Count : 2
Core count per Cpu : 4
Cpu model: Intel(R) Xeon(R) CPU E5640 @ 2.67GHz
NetApp storage (a RAID-DP array with 15k disks dedicated to the EBS LUNs)

Software:

OS: RHEL 5.4 64-bit, hugepages enabled & used
Oracle RDBMS: Oracle 10.2.0.4 64-bit Enterprise Edition
EBS: 11i (11.5.10.2), extended support patches applied (the application tier was not migrated to Exadata)
EBS modules installed and used: Financials, Inventory, HR, some custom modules, etc.

The top waits on this source system were db file sequential read, read by other session, db file scattered read, log file sync and CPU time.

Top 5 Wait table (gathered from the AWR snapshots taken during peak hours)




As you see, the database had a lot of I/O activity. The load was high, so the system was busy; I/O waits were significant.

Elapsed:  60.09 (mins)
DB Time:  2,427.01 (mins)

Statistic                  Total
BUSY_TIME                  9,744,849
IDLE_TIME                  36,270,309
IOWAIT_TIME                11,054,996
NICE_TIME                  210
SYS_TIME                   646,362
USER_TIME                  8,875,169
LOAD                       35
RSRC_MGR_CPU_WAIT_TIME     0
PHYSICAL_MEMORY_BYTES      63,350,468,608
NUM_CPUS                   16
NUM_CPU_SOCKETS            2

Now let's look at the target system specs;

-TARGET -> EXADATA Database Machine x3-2 quarter rack-

Hardware


2 x Database Servers, each with:

2 x Eight-Core Intel® Xeon® E5-2690 Processors (2.9 GHz)
256GB Memory
Disk Controller HBA with 512MB Battery Backed Write Cache
4 x 300 GB 10,000 RPM Disks
2 x QDR (40Gb/s) Ports
4 x 1/10 Gb Ethernet Ports (copper)
2 x 10 Gb Ethernet Ports (optical)
1 x ILOM Ethernet Port
2 x Redundant Hot-Swappable Power Supplies
3 x Exadata Storage Servers in total;
36 CPU cores for SQL processing
12 PCI flash cards with 4.8 TB Exadata Smart Flash Cache
36 x 600 GB 15,000 RPM High Performance disks
2 x 36 port QDR (40 Gb/sec) InfiniBand Switches

Software:

Oracle Grid Infrastructure 11.2.0.3 (11.2.0.3.21)
Oracle RDBMS 11.2.0.3 Enterprise Edition
ORACLE RAC
Oracle Exadata Storage Software components.

...

The installation of the Exadata was done by Oracle, but this time I partially participated in the installation process. The Exadata X3-2 Database Machine was re-imaged and Bundle Patch 21 was installed.

We requested 22 IP addresses, as we had to configure internal and external networks as well as the SCAN listeners.

We also requested 5 ethernet cables: 4 of them for the 2 DB nodes, and 1 for the Cisco switch.
The requested IP block template was like the following;
 
For external connections: (1st vlan)
exaclient01 192.168.19.11
exaclient01-vip 192.168.19.12
exaclient02 192.168.19.13
exaclient02-vip 192.168.19.14
exa-scan 192.168.19.15
exa-scan 192.168.19.16
exa-scan 192.168.19.17 --note the 3 exa-scan IPs: different IPs, same hostname (exa-scan)

For Administration and ILOM : (2nd vlan)
exadbadm01 192.168.95.11
exadbadm02 192.168.95.12
exaceladm01 192.168.95.13
exaceladm02 192.168.95.14
exaceladm03 192.168.95.15
exadbadm01-ilom 192.168.95.16
exadbadm02-ilom 192.168.95.17
exaceladm01-ilom 192.168.95.18
exaceladm02-ilom 192.168.95.19
exaceladm03-ilom 192.168.95.20
exasw-adm01 192.168.95.21
exasw-iba01 192.168.95.22
exasw-ibb01 192.168.95.23
exasw-pdua01 192.168.95.24
exasw-pdub01 192.168.95.25

The installation was a standard process; the servers were installed from the images.

Okay, after describing the source and target environments, let's continue with the migration process:

The following MOS docs can be followed for an Exadata migration;
  • Export/import notes on Applications 11i Database 11g (Doc ID 557738.1)
  • Interoperability Notes Oracle EBS 11i with Oracle Database 11gR2 (11.2.0) (Doc ID 881505.1)
  • Migrating an Oracle E-Business Suite Database to Oracle Exadata Database Machine (Doc ID 1133355.1)
  • Migrating Oracle E-Business Suite to Sun Oracle Database Machine Using Oracle Data Pump paper
  • E-Business Suite on Exadata Oracle Maximum Availability Architecture White Paper October 2012 

I will not go into details about the migration steps; I will keep it short.

First of all, the migration method was decided: we would use Data Pump for the operation. Data Pump would work without any problems in spite of the version differences in the OS and the database software. It would also perform our upgrade automatically during the migration, as we would export the data from 10g and import it into 11g. Data Pump would also clear our fragmentation and help us decrease the footprint of our EBS database. We also expected the import to be very fast, as the target system was an Exadata; that is, we had a 2-node RAC environment and several disks, so we could push the limits of parallelism in terms of I/O and CPU.

So, after choosing the appropriate migration method, the first step of the migration was upgrading the related Oracle EBS application tier and database objects to the appropriate versions. This was needed because the target environment was an Exadata, which comes with an 11gR2 database by default. So, in order to migrate our EBS database (which was 10.2.0.4), we had to apply the needed patches to make it interoperable with 11gR2 databases.

Also, we needed to apply other patches for components like autoconfig, exp/imp, etc.; they were needed to support the new technology. In addition, we applied patches to the target database (11.2.0.3) to fix known EBS-specific problems.

During one patch we needed to stop ASM, too. While starting ASM again, we faced the problem, and applied the solution, described in the following post:

http://ermanarslan.blogspot.com.tr/2014/02/start-crs-error-crs-4640-11gr2.html

After applying the necessary patches, I first created our target database using the CREATE DATABASE command. Actually, the target database creation script was generated by an Oracle EBS script. Then we executed the script on the target to create our target database. After the successful completion of the database creation script, we converted the created database into a two-node RAC. Then I set up the SYS and SYSTEM schemas in the target database. In addition, I installed the Java virtual machine and required components such as ORD, Spatial, XDB, OLAP, interMedia and Context into the target database using Oracle application scripts.

I followed the docs line by line. For instance, we recorded the advanced queues, as the docs indicate that advanced queues are not propagated to the target database instance during the export/import process.

I rebuilt the spatial indexes to remove the rebuild index parameters from them, synchronized the indexes which were not synchronized before running the export, and so on. In brief, everything stated in the MOS docs was done before exporting the database.

In the export phase, I used a parameter file as stated and exported the database with 8 parallel export processes. I kept the max dump file size at 20 GB in order not to hit the file count limit.
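
A sketch of such an export invocation (the directory and file names here are hypothetical);

expdp system parfile=export.par

where export.par contains;

FULL=Y
DIRECTORY=ermandmpdir
DUMPFILE=fullexp%U.dmp
FILESIZE=20G
PARALLEL=8
LOGFILE=fullexp.log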

During the export phase, I faced the following Data Pump problems and applied the related solutions stated in the following posts of mine;

http://ermanarslan.blogspot.com.tr/2014/01/rdbms-expdp-ora-39095-dump-file-space.html

http://ermanarslan.blogspot.com.tr/2014/01/ebs-datapump-expdp-error-ora-33272-zpb.html

Note that: during the export phase of the migration, if you cancel your export and need to start it again for some reason, the following note can be used to clean up the mess before issuing the expdp command again;
http://ermanarslan.blogspot.com.tr/2014/02/datapump-cleanup-dbadatapumpjobs.html

After exporting the database, the export dump files were copied to the target system. I used Exadata's local disks as storage for the export files, by creating an LVM volume on them.

I didn't use DBFS, because I didn't find it useful for this case, as this was a POC.

When the export was finished, I started the import job with parallel workers, and I didn't leave the cluster option at its default (Y) with impdp; I imported the database using cluster=n. This parameter prevented the I/O errors caused by the export dump files not being available on the remote nodes (as I used the local storage of node 1).
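
The import command was in this spirit (a sketch; the directory and file names here are hypothetical);

impdp system DIRECTORY=ermandmpdir DUMPFILE=fullexp%U.dmp FULL=Y PARALLEL=8 CLUSTER=NO LOGFILE=fullimp.log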
Again, the MOS documents were followed line by line, so all the necessary little things were done before and after the import process.

When the import finished, our work in the database tier was also finished.
We left the performance tuning to the end. I did a few things in the database tier at this time, such as:
gathering fixed object stats (issue details: http://ermanarslan.blogspot.com.tr/2014/01/dbmsstats-error-gathering-fixed-object.html).
After the import, the srvctl and local listener configurations were done as well.
In addition, Enterprise Manager was configured and started. Also, the FND tables were cleaned and autoconfig was run on the DB tier to populate the tables with the new database information again (note that autoconfig was run on both DB node 1 and DB node 2).
During the import I faced some problems; you can find the explanations and solutions regarding these problems in the following posts;

http://ermanarslan.blogspot.com.tr/2014/01/ebs-impdb-ora-39083-ignorable-apps.html
http://ermanarslan.blogspot.com.tr/2014/01/datapump-impdp-raises-error-ora-942.html

At this point, our database had become an Apps database, so we were ready to go ahead and connect our application tier to this 2-node RAC database. We proceeded with the application post-clone phase.

In the application post-clone phase, we ran adcfgclone.pl to update the application configuration files with the new database information. After this step, the application became connected to our new database, which was running on Exadata. When the post-clone was completed, I did some manual configuration for load balancing, such as modifying the jdbc_url in the apps context file. After these changes, I ran autoconfig to update the necessary configuration files of the application components.
At this moment, I reviewed the docs 881505.1 and 1133355.1 and took actions where previously missed or necessary. I also reviewed the EBS on Exadata MAA papers and took actions accordingly.

Lastly, I configured all the application components to connect to the database in a load-balanced way, increased the memory_target parameter of the database, and started/restarted the application and database services. You can find the details about load balancing in the following post: http://ermanarslan.blogspot.com.tr/2014/01/ebs-on-oracle-rac-exadata-load-balance.html
Note that I faced a problem with tnsping; tnsping couldn't ping the database through a load-balanced TNS entry. In this case, it was normal.
For the details and explanation: http://ermanarslan.blogspot.com.tr/2014/01/tnsing-tns-03507-even-though-sqlnet.html

Initially, the architecture became like the following;




So all the components were connecting to the database using the load-balanced TNS entries (note that EBS 11i can't use SCAN IPs, by the way).

With this configuration, the business users ran some tests on the screens and batches. They seemed pleased with the response times. They also ran their Discoverer reports, which had been suffering in performance, and they seemed pleased again.
At the same time, we analyzed the system performance through AWR reports and Enterprise Manager.
We saw that there weren't any I/O related waits. This made us happy, because all the remaining waits were on CPU.
Top 5 waits..
DB CPU 8,610
PX Deq: Slave Session Stats 982
library cache: mutex X 508
enq: PS - contention 470

The machine was actually idle. So, in order to enhance the performance, we found the top 50 tables, put them into the flash cache, and also compressed them using the QUERY LOW compression method (a sketch of these statements follows below).
After the compression, we moved the related tables/partitions/subpartitions and rebuilt the indexes on them (in order to compress the data already stored in those tables).
We faced an issue while rebuilding the indexes.
Read the following for details: http://ermanarslan.blogspot.com.tr/2014/01/is-my-indexindex-subpartition-unusable.html
At the same time, we increased the parallelism on the tables and made the necessary modifications in the database parameters to support such parallel processing (like max parallel servers, etc.). We did this because the database servers had a lot of CPU cores, but they were idle; we needed to utilize them.
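
A sketch of the statements used for one of those top tables (the table and index names here are hypothetical);

ALTER TABLE apps.xx_big_table STORAGE (CELL_FLASH_CACHE KEEP);   -- pin the table in Exadata Smart Flash Cache
ALTER TABLE apps.xx_big_table MOVE COMPRESS FOR QUERY LOW;       -- rewrite the existing rows HCC-compressed
ALTER INDEX apps.xx_big_table_n1 REBUILD;                        -- the MOVE leaves the indexes unusable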
The Discoverer reports seemed to accelerate with these modifications, but some issues were reported by the business users. The issues were in the Forms components; that is, some screens (like the login page) started to work slowly.

This was because of the parallelization: the Discoverer reports liked the parallelization, but the screens didn't.
On the other hand, we had to do this; we had to utilize the CPUs.
To supply the parallelization without affecting the screen performance, I designed the architecture as follows;


So, with this design, our RAC database service was separated into two logical units.
One node would be serving the Web, Forms, Concurrent and Reports components.
The other node, node 2, would be serving the Discoverer clients only.
By separating them, we gained control of the database according to the application's needs.
DB node 2 was configured to have 64 parallel servers, which was optimal for the Discoverer reports.
DB node 1 was configured to have 8 parallel servers, which was okay for the standard EBS processing (a sketch of these settings follows below).
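
A sketch of those per-instance settings (the instance names here are hypothetical);

ALTER SYSTEM SET parallel_max_servers=64 SCOPE=BOTH SID='PROD2';  -- the Discoverer node
ALTER SYSTEM SET parallel_max_servers=8  SCOPE=BOTH SID='PROD1';  -- the Web/Forms/Concurrent node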

Note that we reverted the parallel degrees we had previously set on the standard tables, as the standard tables were configured to use the application's standard parallel degrees. This action was taken because the application developer explicitly set them; the application developer should be the one who knows the code, and should be the one to answer the question: is this table suitable for high-degree parallelism? So if, during the installation, application tables were configured to be serial by the installer, then they should stay serial; but if the tables were configured to be, for example, degree 8 parallel, then those tables are suitable for parallel operations and we can increase the parallelism degree on them.
Anyway, we tuned the system according to the thought above.
I mean, for the custom tables used in the Discoverer reports, we set degree 64 parallelism.
For the application standard tables, we used a single thread (if they were set so by design). For the other application tables, which were configured to operate at parallel degree 8, we increased the degree to 32,
and then left the system to the EBS functionals/developers for testing.
We also made some indexes invisible; but not all the related indexes, only the ones we found necessary.
The test results were very good, actually better than we expected.
For example, there was a performance problem in the standard HR forms, and an SR had been opened for this problem 3 months earlier. Development would probably send a patch for the problem, but the issue was resolved automatically after migrating to Exadata. In other words, Exadata could run the problematic query with its software & hardware power. The SQL is still problematic, but it is transparent to the application :)
The Discoverer reports were also accelerated without a need for any modification in the SQL; a Discoverer report which took 4 hours to complete could run in 10-20 mins. In addition, the batch concurrent programs were accelerated significantly. The important part of the test results follows; I'm sharing it with you. You can see the improvement in performance by comparing the Source and Exadata columns...


In addition to the improvement in application performance, the database footprint was reduced to 50% of its size in the source.

Of course, the backups were superfast. We could back up the entire database (nearly 600 GB) into ASM in almost 6 mins using only 8 parallel channels. We tried different backup types; we used the high compress and basic compress methods to decrease the backup size. The compression ratio was 80%, which is actually pretty good for an ERP application's database. The backup test results are below, followed by a configuration sketch;

OBJECT_TYPE           START_TIME             END_TIME               INPUT_MB     OUTPUT_MB
DB FULL (high comp)   2/2/2014 12:33:38 PM   2/2/2014 2:55:38 PM    573607936    101987328
DB FULL (basic comp)  2/2/2014 11:47:37 AM   2/2/2014 12:31:36 PM   573576192    118644736
DB FULL (nocompress)  2/2/2014 11:29:29 AM   2/2/2014 11:36:04 AM   573486080    525056000
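
For reference, the compressed backups above can be produced with RMAN commands in this spirit (a sketch; note that the HIGH algorithm requires the Advanced Compression option, while BASIC does not);

CONFIGURE COMPRESSION ALGORITHM 'HIGH';    -- or 'BASIC'
BACKUP AS COMPRESSED BACKUPSET DATABASE;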

As a result of this performance improvement in I/O, the cloning time for this EBS environment would be almost 1 hour, including all the application-specific configurations. The cloning process is an important thing for Apps DBAs, as it can be trouble if it is time consuming. For example, we normally spent 1 day on the cloning activities for this system, but if we had a chance to use Exadata, we could take 24 clones in one day :)

Of course, there are some things that even Exadata can't cure, like one Discoverer report we saw in this POC. That problematic report was running & running; it was running in a loop, and its logic was wrong. On Exadata it couldn't be accelerated, but we could easily say that the report was a defective one, as we didn't have any waits in the database, in I/O or anywhere else. This could be clearly seen by the application owner, too.


In conclusion;

I can say that Exadata is a great cure for busy EBS 11i and R12 environments. We already support EBS production systems running on Exadata X2s and X3s, but the systems we support cannot utilize even 50% of the Exadata. That's why this POC was useful: in this POC, we could see an EBS environment which had already reached the limits of its current server environment. We could see that this system could take advantage of Exadata's power in terms of performance, and at the same time even this system could not saturate either Exadata's I/O subsystem or its CPUs. The test results were promising, even though we didn't have enough time for fine tuning. Exadata didn't let us down :) We could see and achieve in real life the things that were written in the advertisements...


Tuesday, February 4, 2014

Oracle E-Business Suite R12 on Exadata X2

As far as I know, this was the first "EBS R12 on Exadata X2" implementation in Turkey. The year was 2011.
The work was a POC, planned to prove the speed of an Exadata-powered EBS environment.
In this POC, we installed EBS R12 on Exadata X2 and configured application-to-database load balancing and failover. Finally, we ran our tests and prepared a presentation accordingly.


We did this work at Linkplus, which is a distributor of Oracle in Turkey. Linkplus supplied a quarter rack Exadata X2 and two virtual application servers for our POC.
We witnessed a significant gain in the performance of Oracle EBS R12, both on its OLTP-flavoured screens and in its batch jobs.

You can take a look at the presentation we prepared at that time.

Click for EBS on Exadata X2 presentation. 

Start crs error CRS-4640 11gR2 clusterware, crsctl start cluster

I have faced the following problem while working on several Exadata/RAC systems.
The problem arises after a patch application or a maintenance operation, exactly when I try to start the CRS back again.
CRS cannot be started with the "crsctl start crs" command, as you see below;
/u01/app/product/11.2/bin/crsctl start crs
CRS-4640: Oracle High Availability Services is already active
CRS-4000: Command Start failed, or completed with errors.

Note that : crsctl is an interface for controlling Oracle Clusterware objects.

The solution is to use the "crsctl start cluster" command. That's all, but I'm writing this post in order to expose the reason that lies behind this solution.

So, let's look at the difference between start crs and start cluster;

start crs : starts the entire Oracle Clusterware stack on a node, including the OHASD process, which is responsible for starting up all the other clusterware processes. This command can be used only on the local node.
start cluster : starts the Oracle Clusterware stack without the OHASD process; with the -n or -all options it can also be run for remote nodes.

Okay, now we know the difference, but this does not explain the CRS-4640 error produced when we used the start crs command.

Additional info: if your OCR and voting disks are in ASM, you shouldn't shut down the ASM instance alone. You need to stop the Oracle Clusterware stack; you have to use "crsctl stop cluster -n node_name" or "crsctl stop crs" (on the local node).
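
A sketch of the proper sequence;

crsctl stop cluster -all     # stops crsd, ASM and the rest clusterwide; OHASD stays up
crsctl start cluster -all
crsctl stop crs              # local node only; takes down everything, including OHASD
crsctl start crs             # local node only; starts OHASD and the whole stack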

I cannot reproduce it right now, but this can be the reason: the error is produced because we shut down ASM without using the crsctl stop cluster or stop crs commands. The voting disks and the OCR need to be mounted for crsd to operate, because the OCR contains the cluster node list, services, DB instances and node mappings; Oracle Clusterware uses this info to verify cluster node membership and status. On the other hand, the crsctl start crs command should start ASM, too. So why are the errors produced?

Look what happens when I use the "crsctl start cluster" command;

 /u01/app/11.2.0.3/grid/bin/crsctl start cluster -all

CRS-2672: Attempting to start 'ora.cssdmonitor' on 'erm01'
CRS-2672: Attempting to start 'ora.cssd' on 'erm01'
CRS-2672: Attempting to start 'ora.diskmon' on 'erm01'
CRS-2672: Attempting to start 'ora.ctssd' on 'erm01'
CRS-2672: Attempting to start 'ora.evmd' on 'erm01'
CRS-2672: Attempting to start 'ora.cluster_interconnect.haip' on 'erm01'
CRS-2672: Attempting to start 'ora.asm' on 'erm01'
CRS-2672: Attempting to start 'ora.crsd' on 'erm01'

As you see, the crsctl start cluster command starts ASM and all the necessary components before crsd.
Note that it doesn't start OHASD, and that's why the CRS-4640 error is not produced; we have the OHASD process already started. (check -> ps -ef | grep init.ohasd | grep -v grep)

Thanks to Bharat Damarla (http://startupforce.wordpress.com/2013/05/17/rac-11gr2-startup-sequence).
He summarized the clusterware startup sequence with the following diagram.


You can see it in the following figure, too -> OHASD starts crsd.
So what happens is:
"crsctl start crs" tries to start OHASD and cannot do so, as it's already started. In my opinion, that is why it cannot continue and start the CRS stack. The root cause seems to be stopping ASM without using the crsctl stop cluster or stop crs commands, as this leaves an improper environment for the crsctl start crs command.

In conclusion, we use crsctl start cluster in this situation, as we have OHASD up and running, which is a prerequisite for the "crsctl start cluster" command; that's why crsctl start cluster becomes our solution.

RAC-- RMAN CHANNELS MULTI NODE

In RAC environments (e.g. Exadata), we use highly parallel processing for RMAN backups: a high channel count, and so on. We also use highly parallel I/O storage like the DBFS_DG disk group to increase the speed of our RMAN backups. In this context, it is also useful to use all the RAC nodes to parallelize the backup process; in other words, it is useful to spread the RMAN backup processes across the RAC nodes to maximize the processing performance.

In order to do that, we can use manual channel allocation;
I mean, we can direct RMAN to allocate one channel through node 1, another channel through node 2, and so on, like the example below;

CONFIGURE CHANNEL 1.. CONNECT 'user1/pwd1@node1'; 
CONFIGURE CHANNEL 2.. CONNECT 'user2/pwd2@node2'; 
CONFIGURE CHANNEL 3.. CONNECT 'user3/pwd3@node3';

In addition, you can use automatic channel allocation in RMAN to load balance the channels automatically.
In order to do that, you can configure your channel type to connect to the target using a load-balanced service name, which is present in your tnsnames.ora.

CONFIGURE CHANNEL DEVICE TYPE DISK CONNECT 'user1/pwd1@<service_name>';

With this setting, the parallel RMAN backup workers will be automatically spread across the RAC database nodes without manual channel specification.
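
A sketch that combines this with parallelism (the service name here is hypothetical and should resolve to a load-balanced entry in tnsnames.ora);

CONFIGURE DEVICE TYPE DISK PARALLELISM 8;
CONFIGURE CHANNEL DEVICE TYPE DISK CONNECT 'user1/pwd1@PROD_BAL';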

Datapump cleanup-- dba_datapump_jobs, DBMS_DATAPUMP

When you terminate a Data Pump job, remnants of it can remain in the database;
I mean, its master tables and even some Data Pump worker sessions can remain.
These remnants can bring extra cost to the database and constitute a dirty environment.

So, to be able to clean up the mess, the following actions can be taken;

Find the remaining Data Pump jobs, if they still exist, with the following query;

SELECT owner_name, job_name, operation, job_mode,
state, attached_sessions
FROM dba_datapump_jobs;

Use the following to drop the master tables;

DROP TABLE OWNER.JOB_NAME PURGE;   (take OWNER and JOB_NAME from the output of the query above)
Okay, now execute the same query once again;

SELECT owner_name, job_name, operation, job_mode,
state, attached_sessions
FROM dba_datapump_jobs;

If the jobs are still listed in the output, use the following to stop them;

--Again, JOB_NAME and OWNER come from the query above

DECLARE
  jobtostop NUMBER;
BEGIN
  jobtostop := DBMS_DATAPUMP.ATTACH('JOB_NAME','OWNER');  -- attach to the job found above
  DBMS_DATAPUMP.STOP_JOB(jobtostop);                      -- stop it using the returned handle
END;
/

Expdp -- ORA-31693 ,ORA-31617, Linux-x86_64 Error, cluster parameter

While exporting a RAC database, you can encounter the following errors;

ORA-31693: Table data object OWNER:TABLE failed to load/unload and is being skipped due to error:
ORA-31617: unable to open dump file "/erman/expimp/fullexp02.dmp" for write
ORA-19505: failed to identify file "/erman/expimp/fullexp02.dmp"
ORA-27037: unable to obtain file status
Linux-x86_64 Error: 2: No such file or directory

These errors are encountered because the filesystem used for exporting the database is mounted on only one of the nodes, and you are using the parallel export option. When you use the parallel option in Data Pump and it is a RAC database, the Data Pump workers can be started on the other nodes, too.

So, if you cannot mount the export filesystem on all the nodes and you want to use a parallel export, I recommend you use the CLUSTER parameter (introduced in 11gR2). The CLUSTER parameter determines whether Data Pump can use Oracle Real Application Clusters (Oracle RAC) resources and start workers on other Oracle RAC instances. The CLUSTER parameter is TRUE by default, and as you can imagine, this setting triggers the problems above.
So, to work around the problem, you can set this parameter to false (CLUSTER=NO).
By setting CLUSTER=NO, you force Data Pump Export to use only the instance where the job is started.

Ex: expdp hr DIRECTORY=ermandmpdir DUMPFILE=erman.dmp CLUSTER=NO PARALLEL=4