I recently had a problem with ACFS on an ODA X4-2 machine.
The problem was actually in ASM: the asmResilver2 process was running all the time.
Because the problem touched the highest layer of the software stack (ACFS), we needed a wide view of the problematic instance. That is, we needed to see the OS logs, ASM logs, ACFS logs, daemon logs, and so on.
There were several log files spread across different directories of the filesystem, which made it hard to get an analytical view at first glance.
At this point, ADR and ADRCI (Automatic Diagnostic Repository Command Interpreter) came into play and made our job easier, at least for the Grid part.
Let's have a look at the definition of ADR first:
The ADR is a file-based repository for database diagnostic data such as traces, dumps, the alert log, health monitor reports, and more. It has a unified directory structure across multiple instances and multiple products. Beginning with Release 11g, the database, Automatic Storage Management (ASM), and other Oracle products or components store all diagnostic data in the ADR. Each instance of each product stores diagnostic data underneath its own ADR home directory.
For example, in an Oracle Real Application Clusters environment with shared storage and ASM, each database instance and each ASM instance has a home directory within the ADR. The ADR's unified directory structure enables customers and Oracle Support to correlate and analyze diagnostic data across multiple instances and multiple products.
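To make the directory layout a bit more concrete, here is a minimal sketch of how an ADR home path is put together from the ADR base, the product type, the product id, and the instance id. The function name adr_home_path is hypothetical, made up only for this illustration; the values are the ones from the environment in this post.

```shell
#!/bin/sh
# Sketch: build an ADR home path from its components.
# Layout: <ADR base>/diag/<product type>/<product id>/<instance id>
# adr_home_path is a hypothetical helper, not an Oracle-provided command.
adr_home_path() {
    # $1 = ADR base, $2 = product type (e.g. asm, rdbms)
    # $3 = product id, $4 = instance id
    printf '%s/diag/%s/%s/%s\n' "$1" "$2" "$3" "$4"
}

# The ASM instance used in this post:
adr_home_path /u01/app/grid asm +asm +ASM1
```

Inside ADRCI itself, the show homes command lists the ADR homes found under the current ADR base, so you normally do not need to build these paths by hand.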
So ADR is our repository, and ADRCI is our tool both for viewing the data stored in ADR and for creating packages based on the problems and incidents stored there.
Okay, after this general information about the tool, let's continue.
To diagnose the problem, we ran ADRCI, saw the incident, created a package containing all the information we needed to diagnose the problem, and continued from there.
You can see an example of running this utility in the following paragraphs.
Normally I don't like to use this kind of tool, but in this incident it helped a lot, and I recommend using it for diagnosing problems in complicated environments. It might come in handy one day.
Here we had a problem with ASM, and the symptom was obvious: the asmResilver2 process was running all the time. The result was obvious, but the cause was not. The OS logs were easy to reach, but the Grid Infrastructure logs were in several locations, and several daemons were involved. The problem might even have been related to patches; I mean, some ASM patch could have been missing in this environment.
asmResilver2 was something like a kernel thread. So ASM resilvering was involved, ASM was involved, the OS was involved, and all the related daemons were involved. (The environment was a virtualized ODA X4-2 with OAK, ACFS, shared repositories, and so on.) Anything in this software stack could have been related to the problem.
In the Grid part, we had several places to look for logs (crsd, ocssd, ohasd, oracssdmonitor, orarootagent, ocr, and so on).
Here we executed adrci as the Grid software owner and set the ADR home accordingly:
[grid@daroravmsrv1 +ASM1]$ adrci
ADRCI: Release 11.2.0.4.0 - Production on Thu Apr 30 14:22:34 2015
Copyright (c) 1982, 2011, Oracle and/or its affiliates. All rights reserved.
ADR base = "/u01/app/grid"
adrci> set home diag/asm/+asm/+ASM1
adrci> show problem
ADR Home = /u01/app/grid/diag/asm/+asm/+ASM1:
*************************************************************************
PROBLEM_ID PROBLEM_KEY LAST_INCIDENT LASTINC_TIME
-------------------- ----------------------------------------------------------- -------------------- ----------------------------------------
1 ORA 600 [17183] 34825 2015-04-27 18:15:19.358000 +03:00
1 rows fetched
We saw our problem: we had an ORA-600 in the ASM alert log.
Note that we did not check the alert log manually; we did not have to make any effort to find it.
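If you want to repeat this check without opening an interactive session, the same commands can be passed to adrci through its exec option. The following is only a sketch under some assumptions: build_adrci_script is a hypothetical helper name, the ADR home is the one from this environment, and the script only prints what it would run when adrci is not on the PATH.

```shell
#!/bin/sh
# Sketch: run the ADRCI commands from this post non-interactively.
# build_adrci_script is a hypothetical helper; it only assembles the
# semicolon-separated command string that is passed to "adrci exec=".
build_adrci_script() {
    # $1 = ADR home, relative to the ADR base
    printf 'set home %s; show problem; show incident' "$1"
}

script=$(build_adrci_script 'diag/asm/+asm/+ASM1')

if command -v adrci >/dev/null 2>&1; then
    # Run as the Grid software owner, as in the interactive session.
    adrci exec="$script"
else
    echo "adrci not found on PATH; would run: adrci exec=\"$script\""
fi
```

This kind of one-liner is handy when you need to check the same ADR home on several nodes.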
adrci> IPS CREATE PACKAGE INCIDENT 34825
Created package 2 based on incident id 34825, correlation level typical
Here we created our package. The newly created package will be in ADR_HOME/incpkg/pkg_{PACKAGE_NUMBER}
Then we zipped the package contents into a zip file stored in the /tmp directory. (This step is optional; the zip file is useful for sending the package to Oracle Support.)
adrci> ips generate package 2 in "/tmp"
Generated package 2 in file /tmp/ORA600171_20150430142314_COM_1.zip, mode complete
Our work with ADRCI was finished here. At this point we could start our diagnostics using the log files in the ADR package.
The contents of the package:
[root@ermansrv1 pkg_2]# tree seq_1/
seq_1/
|-- config.xml
|-- crs
| |-- alertdaroravmsrv1.log
| |-- crflogd.l04
| |-- crflogd.l05
| |-- crfmond.l05
| |-- crfmond.l06
| |-- crfmond.l07
| |-- crfmond.l08
| |-- crfmond.l09
| |-- crfmond.l10
| |-- crsctl_root.log
| |-- crsd.log
| |-- crsdiag.log
| |-- cvuhelper.log.0
| |-- cvutrace.log.2
| |-- evmd.log
| |-- gipcd.l06
| |-- gipcd.l07
| |-- gipcd.l08
| |-- gipcd.l09
| |-- gipcd.l10
| |-- gipcdOUT.log
| |-- gpnpd.log
| |-- gpnpdOUT.log
| |-- gpnptool_43748.log
| |-- mdnsd.log
| |-- mdnsdOUT.log
| |-- ocr_14337_3.log
| |-- ocr_14392_3.log
| |-- ocr_14543_3.log
| |-- ocr_14601_3.log
| |-- ocr_14654_3.log
| |-- ocr_19298_3.log
| |-- ocr_20334_3.log
| |-- ocr_20730_3.log
| |-- ocr_29140_3.log
| |-- ocr_30921_3.log
| |-- ocr_31641_3.log
| |-- ocr_38810_3.log
| |-- ocr_40936_3.log
| |-- ocr_41358_3.log
| |-- ocr_45262_3.log
| |-- ocr_45592_3.log
| |-- ocr_45599_3.log
| |-- ocr_45684_3.log
| |-- ocr_45837_3.log
| |-- ocr_45920_3.log
| |-- ocr_46467_3.log
| |-- ocr_46517_3.log
| |-- ocr_46581_3.log
| |-- ocr_46656_3.log
| |-- ocr_46706_3.log
| |-- ocr_46753_3.log
| |-- ocr_46846_3.log
| |-- ocr_46891_3.log
| |-- ocr_46938_3.log
| |-- ocr_47009_3.log
| |-- ocr_47052_3.log
| |-- ocr_47574_3.log
| |-- ocr_47670_3.log
| |-- ocr_47726_3.log
| |-- ocr_47818_3.log
| |-- ocr_47897_3.log
| |-- ocr_47945_3.log
| |-- ocr_48002_3.log
| |-- ocr_48071_3.log
| |-- ocr_48126_3.log
| |-- ocr_48613_3.log
| |-- ocr_48734_3.log
| |-- ocr_48790_3.log
| |-- ocr_48837_3.log
| |-- ocr_48930_3.log
| |-- ocr_48975_3.log
| |-- ocr_49025_3.log
| |-- ocr_49094_3.log
| |-- ocr_49276_3.log
| |-- ocr_49367_3.log
| |-- ocr_49906_3.log
| |-- ocr_49976_3.log
| |-- ocr_50049_3.log
| |-- ocr_50173_3.log
| |-- ocr_50221_3.log
| |-- ocr_50265_3.log
| |-- ocr_50335_3.log
| |-- ocr_50381_3.log
| |-- ocr_50428_3.log
| |-- ocr_50985_3.log
| |-- ocr_51049_3.log
| |-- ocr_51108_3.log
| |-- ocr_51170_3.log
| |-- ocr_51265_3.log
| |-- ocr_51319_3.log
| |-- ocr_51391_3.log
| |-- ocr_51448_3.log
| |-- ocr_51488_3.log
| |-- ocr_52071_3.log
| |-- ocr_52180_3.log
| |-- ocssd.l02
| |-- ocssd.l03
| |-- ocssd.l04
| |-- octssd.l03
| |-- octssd.l04
| |-- octssd.l05
| |-- ohasd.log
| |-- olsnodes.log
| |-- oraagent_grid.l01
| |-- oraagent_grid.log
| |-- oracssdagent_root.log
| |-- oracssdmonitor_root.log
| |-- orarootagent_root.l01
| |-- orarootagent_root.l02
| |-- orarootagent_root.l03
| |-- orarootagent_root.log
| |-- scriptagent_grid.l01
| `-- scriptagent_grid.log
|-- export
| |-- DDE_USER_ACTION.dmp
| |-- DDE_USER_ACTION_DEF.dmp
| |-- DDE_USER_ACTION_PARAMETER.dmp
| |-- DDE_USER_ACTION_PARAMETER_DEF.dmp
| |-- DDE_USER_INCIDENT_ACTION_MAP.dmp
| |-- DDE_USER_INCIDENT_TYPE.dmp
| |-- EM_USER_ACTIVITY.dmp
| |-- HM_RUN.dmp
| |-- INCCKEY.dmp
| |-- INCIDENT.dmp
| |-- INCIDENT_FILE.dmp
| |-- IPS_CONFIGURATION.dmp
| |-- IPS_FILE_COPY_LOG.dmp
| |-- IPS_FILE_METADATA.dmp
| |-- IPS_PACKAGE.dmp
| |-- IPS_PACKAGE_FILE.dmp
| |-- IPS_PACKAGE_HISTORY.dmp
| |-- IPS_PACKAGE_INCIDENT.dmp
| `-- PROBLEM.dmp
|-- manifest_2_1.html
|-- manifest_2_1.txt
|-- manifest_2_1.xml
|-- metadata.xml
|-- opatch
| |-- opatch.log
| `-- opatch.xml
`-- progress.log
3 directories, 141 files
So as you can see, using ADRCI makes the diagnostic process easier. We reach all the relevant log files (or, let's say, the files that can be used to diagnose a problem) in a single directory.
Note that files we do not even know about will be in the package, too. This not only makes the diagnostics easier but also makes the diagnostic process more thorough. I mean, all the files that may need to be checked for errors are there in the ADR package, so this eliminates the risk of missing a log file during the analysis.
On the other hand, as I mentioned above, in complicated environments we need to check other files too (OS messages, other program logs, and so on).
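Once all the files are in a single directory, a first pass over them is easy to script. The following is a rough sketch, not what we actually ran in this incident: scan_package is a hypothetical helper name, and the ORA-/CRS- pattern is my assumption about what is worth looking for first. It lists the files that contain such messages, with the number of matching lines per file, busiest file first.

```shell
#!/bin/sh
# Sketch: list files in an ADR package directory that mention ORA- or CRS-
# errors, with the number of matching lines per file, busiest file first.
# scan_package is a hypothetical helper; the error pattern is an assumption.
scan_package() {
    # $1 = package directory, e.g. <ADR home>/incpkg/pkg_2/seq_1
    grep -rcE 'ORA-[0-9]+|CRS-[0-9]+' "$1" 2>/dev/null |
        grep -v ':0$' |
        sort -t: -k2,2nr
}

# Example usage (path taken from this post):
# scan_package /u01/app/grid/diag/asm/+asm/+ASM1/incpkg/pkg_2/seq_1
```

The output is one file:count pair per line, so the files with the most error messages come to the top, and you can start reading from there.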