Monday, January 16, 2017

ODA -- Replacing a failed disk, missing disk diagnostics, firmware upgrade required or not?

The environment discussed in this article is a bare metal ODA X5 machine.
In this blog post, I will walk through a real-life case in which we needed to replace a failed hard drive in an ODA X5 machine.

What made me write this post is actually a missing piece of crucial information. You will understand what I mean once I go into the details of our case, but first, let's take a look at the process of replacing a disk drive in an ODA machine.

Disk replacement on ODA is easy.

We basically follow the MOS document "How to Replace an ODA (Oracle Database Appliance) FAILED/ PredictiveFail Shared Storage Disk" (Doc ID 1435946.1).

On ODA, we (customers or consultants) replace the disks ourselves; there is no need for an Oracle field engineer. That's why the disk components are called CRUs (Customer Replaceable Units).

The actions to be taken for replacing a failed disk can be summarized as follows:
  • Identify the failed disk (do the diagnostics)
  • Take out the failed disk
  • Wait 2-3 mins
  • Attach the new disk
  • Check the conditions and take post actions if necessary
Physically replacing the disk is so easy that I will not concentrate on it in this article.
However, the steps of identifying the failed disk and taking the post actions are interesting, so in this post I will give some details about them.

Before the replacement, we check the failed disk and make sure that it is in a failed status (STATE_DETAILS=DiskRemoved or PredictiveFail).

We use the oakcli show disk command to check the disks and their status.

Note that the failed disk may not appear in the output at all; if this happens, it is totally fine.
It means the failed disk has been removed.

In this example, I am checking a failed disk drive in slot 10 (the slot number can also be obtained by looking at the machine in the rack).

First, I run the oakcli show disk command and see that the disk in slot 10 is not listed at all.

[root@ermanoda0 ~]# oakcli show disk
        NAME            PATH            TYPE            STATE           STATE_DETAILS
        e0_pd_00        /dev/sdc        HDD             ONLINE          Good           
        e0_pd_01        /dev/sdd        HDD             ONLINE          Good           
        e0_pd_02        /dev/sde        HDD             ONLINE          Good           
        e0_pd_03        /dev/sdf        HDD             ONLINE          Good           
        e0_pd_04        /dev/sdg        HDD             ONLINE          Good           
        e0_pd_05        /dev/sdh        HDD             ONLINE          Good           
        e0_pd_06        /dev/sdi        HDD             ONLINE          Good           
        e0_pd_07        /dev/sdj        HDD             ONLINE          Good           
        e0_pd_08        /dev/sdaa       HDD             ONLINE          Good           
        e0_pd_09        /dev/sdac       HDD             ONLINE          Good
        --- Attention: no output for e0_pd_10 ---
        e0_pd_11        /dev/sdag       HDD             ONLINE          Good           
        e0_pd_12        /dev/sdai       HDD             ONLINE          Good           
        e0_pd_13        /dev/sdak       HDD             ONLINE          Good           
        e0_pd_14        /dev/sdam       HDD             ONLINE          Good           
        e0_pd_15        /dev/sdao       HDD             ONLINE          Good           
        e0_pd_16        /dev/sdab       SSD             ONLINE          Good           
        e0_pd_17        /dev/sdad       SSD             ONLINE          Good           
        e0_pd_18        /dev/sdaf       SSD             ONLINE          Good           
        e0_pd_19        /dev/sdah       SSD             ONLINE          Good           
        e0_pd_20        /dev/sdaj       SSD             ONLINE          Good           
        e0_pd_21        /dev/sdal       SSD             ONLINE          Good           
        e0_pd_22        /dev/sdan       SSD             ONLINE          Good           
        e0_pd_23        /dev/sdap       SSD             ONLINE          Good       
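Scanning 24 lines by eye is easy to get wrong, so the same check can be scripted. The function below is a minimal sketch, not an ODA tool: check_oak_disks is a hypothetical name, and it assumes the column layout shown above (NAME PATH TYPE STATE STATE_DETAILS) and a single 24-slot shelf (e0_pd_00 to e0_pd_23). It prints every disk whose STATE_DETAILS is not Good and every expected slot that is missing from the listing.

```shell
# Hypothetical helper (not shipped with ODA). Reads "oakcli show disk" output
# on stdin; $1 = number of slots to expect (default 24, e0_pd_00..e0_pd_23).
check_oak_disks() {
  awk -v slots="${1:-24}" '
    # Only data rows start with a resource name such as e0_pd_07.
    $1 ~ /^e0_pd_[0-9]+$/ {
      seen[$1] = 1
      # Column 5 is STATE_DETAILS; anything other than Good is reported.
      if ($5 != "Good") print $1, $4, $5
    }
    END {
      # A slot absent from the listing is what we saw for e0_pd_10.
      for (i = 0; i < slots; i++) {
        name = sprintf("e0_pd_%02d", i)
        if (!(name in seen)) print name, "MISSING"
      }
    }'
}

# Usage on an ODA node:
#   oakcli show disk | check_oak_disks
# For the listing above this would print: e0_pd_10 MISSING
```

Reporting a missing slot separately from a non-Good state covers both cases mentioned earlier: a disk still listed as FAILED/PredictiveFail, and a disk that OAK no longer lists at all.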

Since the disk is not in the output, I do more checks to make sure that it is not seen by the OS or by any other software component on the ODA system.

(I do these checks on both ODA nodes.)

First, I check the multipath devices:

multipath -ll output:

HDD_E0_S10_992975636 (35000cca23b2f9b14) dm-14 
size=7.2T features='0' hwhandler='0' wp=rw
"no disk paths listed here"

The failed disk's multipath device should be dm-14: its alias refers to slot 10 (S10), and no disk paths are listed for it.

I also don't see any slaves for it:

cd /sys/block/dm-14/slaves
ls -al
total 0
drwxr-xr-x 2 root root 0 Aug  1 10:28 .
drwxr-xr-x 8 root root 0 Mar  1  2016 ..
No devices...

For a working multipath device, this directory lists the real (physical) device names.

For example, on a working dm device:
cd /sys/block/dm-14/slaves
ls -al
total 0
drwxr-xr-x 2 root root 0 Aug  1 10:28 .
drwxr-xr-x 8 root root 0 Mar  1  2016 ..
lrwxrwxrwx 1 root root 0 Jan  3 18:19 sde -> ../../sde
lrwxrwxrwx 1 root root 0 Jan  3 18:19 sdo -> ../../sdo
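The slaves check can be run over all device-mapper devices at once. This is a sketch under assumptions: pathless_dm_devices is a made-up name, and it relies on the standard Linux sysfs layout, where /sys/block/dm-N/slaves lists the underlying paths and /sys/block/dm-N/dm/name holds the map name (for example HDD_E0_S10_992975636). The sysfs root is a parameter only so that the function can be tried against a mock directory tree.

```shell
# Hypothetical helper: list dm devices whose slaves/ directory is empty,
# i.e. multipath maps that currently have no physical paths behind them.
# $1 = sysfs block directory (default /sys/block).
pathless_dm_devices() {
  local root="${1:-/sys/block}" dev name
  for dev in "$root"/dm-*; do
    # Skip the literal pattern when there are no dm devices at all.
    [ -d "$dev/slaves" ] || continue
    if [ -z "$(ls -A "$dev/slaves")" ]; then
      name=$(cat "$dev/dm/name" 2>/dev/null)
      echo "${dev##*/} ${name:-unknown}"
    fi
  done
}

# Usage: pathless_dm_devices
# On the node above, the output would be expected to include:
#   dm-14 HDD_E0_S10_992975636
```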

[root@ermanoda0 mapper]# ls -lrt|grep S10
brw-rw---- 1 grid asmadmin 252,  14 Mar  1  2016 HDD_E0_S10_992975636
brw-rw---- 1 grid asmadmin 252,  49 Jul 24 17:26 HDD_E0_S10_992975636p2
brw-rw---- 1 grid asmadmin 252,  36 Sep 24 09:59 HDD_E0_S10_992975636p1

So the multipath device nodes for slot 10 are still there. But these are multipath (device-mapper) devices, not physical ones, so this is not a bad thing.
Continuing my diagnostics...

Next, I check the OAK logs:

log/ermanoda0/oak/oakd.l45:2016-12-15 11:22:08.886: [CLSFRAME][4160715072]{0:35:2} payload=|OAKERR : 9009 : Couldn't find the resource: e0_pd_10||
log/ermanoda0/oak/oakd.l45:2016-12-15 11:22:08.886: [CLSFRAME][4160715072]{0:35:2} String params:CmdUniqId=|ServiceName=e0_pd_10|pname=Error|
log/ermanoda0/oak/oakd.l45:2016-12-15 11:22:08.886: [   OAKFW][4160715072]{0:35:2} PE sending last reply for: MIDTo:1|OpID:1|FromA:{Relative|Node:0|Process:35|Type:2}|ToA:{Relative|Node:0|Process:0|Type:1}|MIDFrom:4|Type:1|Pri2|Id:8:Ver:2Value params:payload=|OAKERR : 9009 : Couldn't find the resource: e0_pd_10||String params:CmdUniqId=|ServiceName=e0_pd_10|pname=Error|Int params:ErrCode=0|MsgId=4359|flag=2|sflag=64|

OAK says that it can't find the e0_pd_10 resource, which corresponds to our failed disk. So this is normal.
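When the oakd logs are large, the OAKERR 9009 messages can be reduced to a list of missing resources. A minimal sketch, assuming the message text exactly as it appears in the lines above; oak_missing_resources is a hypothetical name, and the log files are passed as arguments (or piped on stdin).

```shell
# Hypothetical helper: print the distinct resource names reported as
# "OAKERR : 9009 : Couldn't find the resource: <name>" in oakd logs.
oak_missing_resources() {
  # -h: no file name prefixes, -o: only the matching part of each line
  grep -ho "OAKERR : 9009 : Couldn't find the resource: [A-Za-z0-9_]*" "$@" \
    | sed 's/.*resource: //' \
    | sort -u
}

# Usage (log file names as in the excerpt above):
#   oak_missing_resources log/ermanoda0/oak/oakd.l*
# For the excerpt above this prints: e0_pd_10
```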

In the OAKCLI logs, I see the DiskRemoved state for our failed disk, which is totally expected.

log/ermanoda0/client/oakcli.log:2016-08-01 11:02:51.999: [  OAKCLI][2575886656]     e0_pd_10        /dev/sdae       HDD             FAILED          DiskRemoved  

I check the physical device name, and it is not there (as expected):

--> ls -al /dev/sdae
ls: /dev/sdae: No such file or directory

I also check the fishwrap logs and see that the failed disk has been deleted:

log/fishwrap/fishwrap.log:Sun Jul 24 17:27:39 2016: deleting an old disk: /dev/sg17

In the fishwrap log: Sun Jul 24 17:27:39 2016: Slot [10] sas-addr = 5000cca23b2f9b16

Sun Jul 24 17:27:39 2016: fwr_scsi_tree_topology_update finish, device num = 51

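In this log, I see that the SCSI device count was 53 before the disk failed and dropped to 51 afterwards (this is also expected). A drop of 2 for a single disk is consistent with each shared disk being visible through both SAS controllers: in the lsscsi output further below, the replacement disk shows up twice, as 8:0:25:0 and 9:0:25:0.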

EARLIER:

Tue Mar  1 14:13:37 2016: Number of SCSI device found = 53, existing = 53 

NOW:

Sun Jul 24 17:27:38 2016: Number of SCSI device found = 51, existing = 53
Sun Jul 24 17:27:38 2016: fwr_scsi_tree_topology_update: expander update start
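The device-count lines can be pulled out of the fishwrap log and compared in one pass. A sketch, assuming the "Number of SCSI device found = N, existing = M" wording shown above; scsi_count_changes is a hypothetical name.

```shell
# Hypothetical helper: summarise fishwrap "Number of SCSI device found"
# lines as timestamp, found, existing and the difference between them.
scsi_count_changes() {
  awk -F': Number of SCSI device found = ' 'NF == 2 {
      # $1 is the timestamp, $2 looks like: 51, existing = 53
      split($2, n, ", existing = ")
      found = n[1] + 0; existing = n[2] + 0
      print $1 ": found=" found " existing=" existing " delta=" (found - existing)
    }' "$@"
}

# Usage: scsi_count_changes log/fishwrap/fishwrap.log
```

For the two lines above, this yields delta=0 for the March entry and delta=-2 for the July entry.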

I run the storage diagnostics and see the following in its output:

 8  : fwupdate
          [INFO]: fwupdate does not see disk from both controllers

   9  : Fishwrap
          [INFO]: Fishwrap not able to discover disk

  10  : Check for shared disk write cache status
          [INFO]: Unable to find OS devices for slot 10

  11  : SCSI INQUIRY
          [INFO]: Unable to run scsi inquiry command on disk as OS device are absent
          [INFO]: Unable to run scsi inquiry command on disk as OS device are absent

  12  : Multipath Conf for device    
           multipath {
             wwid 35000cca23b2f9b14
             alias HDD_E0_S10_992975636
       }

  13  : Last few LSI Events Received for slot 10
          [INFO]: No LSI events are recorded in OAKD logs

  14  : Version Information
          OAK              :  12.1.2.4.0
          kernel           :  2.6.39-400.250.6.el5uek
          mpt2sas          :  17.00.06.00
          Multipath        :  0.4.9  

  15  : OAK Conf Parms
          [INFO]: No scsi devices found for slot 10

In summary:

Only the multipath device names are still present for the failed disk; everything else related to it has been removed.
That's why I conclude that the failed disk has been eliminated from the OS and OAK.
So the disk is ready to be replaced, and the environment is in the expected state.
I did these diagnostics to make sure that the disk had been removed properly, because I could not see the failed disk with a failure status in the oakcli output (remember, oakcli show disk doesn't report this disk at all).
So OAK removed the disk and no longer lists it. Maybe this is its behaviour after a reboot, but anyway, this is something that should be added to the documentation, right? :)

After the disk is replaced online, everything related to the OS and OAK is done automatically and transparently. Here is the status of the checks:

fwupdate list disk

===============
ID        Manufacturer   Model               Chassis Slot   Type   Media   Size(GiB) FW Version XML Support
-----------------------------------------------------------------------------------------------------------
c2d0      HGST           H7280A520SUN8.0T    0       0      sas    HDD     7325      P554       N/A        
c2d1      HGST           H7280A520SUN8.0T    0       1      sas    HDD     7325      P554       N/A        
c2d2      HGST           H7280A520SUN8.0T    0       2      sas    HDD     7325      P554       N/A        
c2d3      HGST           H7280A520SUN8.0T    0       3      sas    HDD     7325      P554       N/A        
c2d4      HGST           H7280A520SUN8.0T    0       4      sas    HDD     7325      P554       N/A        
c2d5      HGST           H7280A520SUN8.0T    0       5      sas    HDD     7325      P554       N/A        
c2d6      HGST           H7280A520SUN8.0T    0       6      sas    HDD     7325      P554       N/A        
c2d7      HGST           H7280A520SUN8.0T    0       7      sas    HDD     7325      P554       N/A        
c2d8      HGST           H7280A520SUN8.0T    0       8      sas    HDD     7325      P554       N/A        
c2d9      HGST           H7280A520SUN8.0T    0       9      sas    HDD     7325      P554       N/A        
c2d10     HGST           H7280A520SUN8.0T    0       10     sas    HDD     7325      P9E2       N/A        
c2d11     HGST           H7280A520SUN8.0T    0       11     sas    HDD     7325      P554       N/A        
c2d12     HGST           H7280A520SUN8.0T    0       12     sas    HDD     7325      P554       N/A        
c2d13     HGST           H7280A520SUN8.0T    0       13     sas    HDD     7325      P554       N/A        
c2d14     HGST           H7280A520SUN8.0T    0       14     sas    HDD     7325      P554       N/A        
c2d15     HGST           H7280A520SUN8.0T    0       15     sas    HDD     7325      P554       N/A        
c2d16     HGST           HSCAC2DA4SUN400G    0       16     sas    SSD     373       A29A       N/A       
c2d17     HGST           HSCAC2DA4SUN400G    0       17     sas    SSD     373       A29A       N/A       
c2d18     HGST           HSCAC2DA4SUN400G    0       18     sas    SSD     373       A29A       N/A       
c2d19     HGST           HSCAC2DA4SUN400G    0       19     sas    SSD     373       A29A       N/A       
c2d20     HGST           HSCAC2DA6SUN200G    0       20     sas    SSD     186       A29A       N/A       
c2d21     HGST           HSCAC2DA6SUN200G    0       21     sas    SSD     186       A29A       N/A       
c2d22     HGST           HSCAC2DA6SUN200G    0       22     sas    SSD     186       A29A       N/A       
c2d23     HGST           HSCAC2DA6SUN200G    0       23     sas    SSD     186       A29A       N/A  
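With 24 rows, an odd firmware version is easy to miss. The sketch below groups the fwupdate list disk rows by model and reports disks whose FW Version differs from the most common one for that model. It is a hypothetical helper (fw_outliers), assumes the column order shown above (Model in column 3, FW Version in column 9), and breaks ties between equally common versions arbitrarily. Note that "different" is not the same as "older"; that question is exactly what the rest of this post is about.

```shell
# Hypothetical helper: read "fwupdate list disk" output on stdin and print
# the disks whose firmware differs from the most common version of their model.
fw_outliers() {
  awk '
    # Data rows start with an ID such as c2d10; header/separator rows do not.
    $1 ~ /^c[0-9]+d[0-9]+$/ {
      rows++
      id[rows] = $1; model[rows] = $3; fw[rows] = $9
      count[$3 SUBSEP $9]++
    }
    END {
      # Pick the most common firmware per model.
      for (k in count) {
        split(k, p, SUBSEP)
        if (count[k] > best[p[1]]) { best[p[1]] = count[k]; common[p[1]] = p[2] }
      }
      for (r = 1; r <= rows; r++)
        if (fw[r] != common[model[r]])
          print id[r], model[r], fw[r], "(most disks: " common[model[r]] ")"
    }'
}

# Usage: fwupdate list disk | fw_outliers
# For the listing above this would print:
#   c2d10 H7280A520SUN8.0T P9E2 (most disks: P554)
```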

[root@ermanoda0 ~]# oakcli show disk
        NAME            PATH            TYPE            STATE           STATE_DETAILS 
        e0_pd_00        /dev/sdc        HDD             ONLINE          Good           
        e0_pd_01        /dev/sdd        HDD             ONLINE          Good           
        e0_pd_02        /dev/sde        HDD             ONLINE          Good           
        e0_pd_03        /dev/sdf        HDD             ONLINE          Good           
        e0_pd_04        /dev/sdg        HDD             ONLINE          Good           
        e0_pd_05        /dev/sdh        HDD             ONLINE          Good           
        e0_pd_06        /dev/sdi        HDD             ONLINE          Good           
        e0_pd_07        /dev/sdj        HDD             ONLINE          Good           
        e0_pd_08        /dev/sdaa       HDD             ONLINE          Good           
        e0_pd_09        /dev/sdac       HDD             ONLINE          Good           
        e0_pd_10        /dev/sdp        HDD             ONLINE          Good           
        e0_pd_11        /dev/sdag       HDD             ONLINE          Good           
        e0_pd_12        /dev/sdai       HDD             ONLINE          Good           
        e0_pd_13        /dev/sdak       HDD             ONLINE          Good           
        e0_pd_14        /dev/sdam       HDD             ONLINE          Good           
        e0_pd_15        /dev/sdao       HDD             ONLINE          Good           
        e0_pd_16        /dev/sdab       SSD             ONLINE          Good           
        e0_pd_17        /dev/sdad       SSD             ONLINE          Good           
        e0_pd_18        /dev/sdaf       SSD             ONLINE          Good           
        e0_pd_19        /dev/sdah       SSD             ONLINE          Good           
        e0_pd_20        /dev/sdaj       SSD             ONLINE          Good           
        e0_pd_21        /dev/sdal       SSD             ONLINE          Good           
        e0_pd_22        /dev/sdan       SSD             ONLINE          Good           
        e0_pd_23        /dev/sdap       SSD             ONLINE          Good          

Even the multipath.conf file is updated automatically:

    multipath {
      wwid 35000cca2604a90ac   --> this is the wwid of the newly added disk
      alias HDD_E0_S10_1615499436
}
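To confirm the new entry without reading the whole file, the alias for a WWID can be looked up with a few lines of awk. A sketch only: mp_alias_for_wwid is a hypothetical name, and it assumes the simple layout shown above (wwid and alias on their own lines inside a multipath { ... } block) in /etc/multipath.conf.

```shell
# Hypothetical helper: print the alias configured for WWID $1 in a
# multipath.conf-style file ($2, default /etc/multipath.conf).
mp_alias_for_wwid() {
  awk -v want="$1" '
    $1 == "multipath" { wwid = ""; alias = "" }  # start of a new block
    $1 == "wwid"      { wwid = $2 }
    $1 == "alias"     { alias = $2 }
    $1 == "}"         { if (wwid == want && alias != "") print alias
                        wwid = ""; alias = "" }
  ' "${2:-/etc/multipath.conf}"
}

# Usage: mp_alias_for_wwid 35000cca2604a90ac
# Expected here: HDD_E0_S10_1615499436
```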

[root@ermanoda0 etc]# lsscsi
[0:2:0:0]    disk    LSI      MR9361-8i        4.23  /dev/sda 
[7:0:0:0]    disk    ORACLE   SSM              PMAP  /dev/sdb 
[8:0:0:0]    enclosu ORACLE   DE2-24C          0018  -       
[8:0:1:0]    disk    HGST     H7280A520SUN8.0T P554  /dev/sdc 
[8:0:2:0]    disk    HGST     H7280A520SUN8.0T P554  /dev/sdd 
[8:0:3:0]    disk    HGST     H7280A520SUN8.0T P554  /dev/sde 
[8:0:4:0]    disk    HGST     H7280A520SUN8.0T P554  /dev/sdf 
[8:0:5:0]    disk    HGST     H7280A520SUN8.0T P554  /dev/sdg 
[8:0:6:0]    disk    HGST     H7280A520SUN8.0T P554  /dev/sdh 
[8:0:7:0]    disk    HGST     H7280A520SUN8.0T P554  /dev/sdi 
[8:0:8:0]    disk    HGST     H7280A520SUN8.0T P554  /dev/sdj 
[8:0:9:0]    disk    HGST     H7280A520SUN8.0T P554  /dev/sdl 
[8:0:10:0]   disk    HGST     H7280A520SUN8.0T P554  /dev/sdn 
[8:0:12:0]   disk    HGST     H7280A520SUN8.0T P554  /dev/sdr 
[8:0:13:0]   disk    HGST     H7280A520SUN8.0T P554  /dev/sdt 
[8:0:14:0]   disk    HGST     H7280A520SUN8.0T P554  /dev/sdv 
[8:0:15:0]   disk    HGST     H7280A520SUN8.0T P554  /dev/sdx 
[8:0:16:0]   disk    HGST     H7280A520SUN8.0T P554  /dev/sdz 
[8:0:17:0]   disk    HGST     HSCAC2DA4SUN400G A29A  /dev/sdab
[8:0:18:0]   disk    HGST     HSCAC2DA4SUN400G A29A  /dev/sdad
[8:0:19:0]   disk    HGST     HSCAC2DA4SUN400G A29A  /dev/sdaf
[8:0:20:0]   disk    HGST     HSCAC2DA4SUN400G A29A  /dev/sdah
[8:0:21:0]   disk    HGST     HSCAC2DA6SUN200G A29A  /dev/sdaj
[8:0:22:0]   disk    HGST     HSCAC2DA6SUN200G A29A  /dev/sdal
[8:0:23:0]   disk    HGST     HSCAC2DA6SUN200G A29A  /dev/sdan
[8:0:24:0]   disk    HGST     HSCAC2DA6SUN200G A29A  /dev/sdap
[8:0:25:0]   disk    HGST     H7280A520SUN8.0T P9E2  /dev/sdp 
[9:0:0:0]    enclosu ORACLE   DE2-24C          0018  -       
[9:0:1:0]    disk    HGST     H7280A520SUN8.0T P554  /dev/sdk 
[9:0:2:0]    disk    HGST     H7280A520SUN8.0T P554  /dev/sdm 
[9:0:3:0]    disk    HGST     H7280A520SUN8.0T P554  /dev/sdo 
[9:0:4:0]    disk    HGST     H7280A520SUN8.0T P554  /dev/sdq 
[9:0:5:0]    disk    HGST     H7280A520SUN8.0T P554  /dev/sds 
[9:0:6:0]    disk    HGST     H7280A520SUN8.0T P554  /dev/sdu 
[9:0:7:0]    disk    HGST     H7280A520SUN8.0T P554  /dev/sdw 
[9:0:8:0]    disk    HGST     H7280A520SUN8.0T P554  /dev/sdy 
[9:0:9:0]    disk    HGST     H7280A520SUN8.0T P554  /dev/sdaa
[9:0:10:0]   disk    HGST     H7280A520SUN8.0T P554  /dev/sdac
[9:0:12:0]   disk    HGST     H7280A520SUN8.0T P554  /dev/sdag
[9:0:13:0]   disk    HGST     H7280A520SUN8.0T P554  /dev/sdai
[9:0:14:0]   disk    HGST     H7280A520SUN8.0T P554  /dev/sdak
[9:0:15:0]   disk    HGST     H7280A520SUN8.0T P554  /dev/sdam
[9:0:16:0]   disk    HGST     H7280A520SUN8.0T P554  /dev/sdao
[9:0:17:0]   disk    HGST     HSCAC2DA4SUN400G A29A  /dev/sdaq
[9:0:18:0]   disk    HGST     HSCAC2DA4SUN400G A29A  /dev/sdar
[9:0:19:0]   disk    HGST     HSCAC2DA4SUN400G A29A  /dev/sdas
[9:0:20:0]   disk    HGST     HSCAC2DA4SUN400G A29A  /dev/sdat
[9:0:21:0]   disk    HGST     HSCAC2DA6SUN200G A29A  /dev/sdau
[9:0:22:0]   disk    HGST     HSCAC2DA6SUN200G A29A  /dev/sdav
[9:0:23:0]   disk    HGST     HSCAC2DA6SUN200G A29A  /dev/sdaw
[9:0:24:0]   disk    HGST     HSCAC2DA6SUN200G A29A  /dev/sdax
[9:0:25:0]   disk    HGST     H7280A520SUN8.0T P9E2  /dev/sdae
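To pick the replacement disk's paths out of this listing, it is enough to filter on its firmware revision. A small sketch; scsi_paths_by_rev is a hypothetical name, and it assumes lsscsi's default layout as shown above, where the revision is the second-to-last column and the device node the last.

```shell
# Hypothetical helper: read lsscsi output on stdin and print the SCSI address
# and device node of every entry whose revision column equals $1.
scsi_paths_by_rev() {
  awk -v rev="$1" '
    substr($1, 1, 1) == "[" && $(NF-1) == rev {
      # "[8:0:25:0]" -> "8:0:25:0"
      print substr($1, 2, length($1) - 2), $NF
    }'
}

# Usage: lsscsi | scsi_paths_by_rev P9E2
# For the listing above this would print:
#   8:0:25:0 /dev/sdp
#   9:0:25:0 /dev/sdae
```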

[root@ermanoda0 etc]# oakcli show disk e0_pd_10
Resource: e0_pd_10
        ActionTimeout   :       1500           
        ActivePath      :       /dev/sdp       
        AsmDiskList     :       |e0_data_10||e0_reco_10|
        AutoDiscovery   :       1              
        AutoDiscoveryHi :       |data:43:HDD||reco:57:HDD||redo:100
                                :SSD||flash:100:SSD|
        CheckInterval   :       300            
        ColNum          :       2              
        DependListOpr   :       add            
        Dependency      :       |0|            
        DiskId          :       35000cca2604a90ac
        DiskType        :       HDD            
        Enabled         :       0              
        ExpNum          :       0              
        IState          :       0              
        Initialized     :       1              
        IsConfigDepende :       false          
        MonitorFlag     :       0              
        MultiPathList   :       |/dev/sdae||/dev/sdp|
        Name            :       e0_pd_10       
        NewPartAddr     :       0              
        OSUserType      :       |userType:Multiuser|
        PrevState       :       UnInitialized  
        PrevUsrDevName  :       HDD_E0_S10_1615499436
        SectorSize      :       512            
        SerialNum       :       001634PA07WV   
        Size            :       7865536647168  
        SlotNum         :       10             
        State           :       Online         
        StateChangeTs   :       1483596721     
        StateDetails    :       Good           
        TotalSectors    :       15362376264    
        TypeName        :       0              
        UsrDevName      :       HDD_E0_S10_1615499436
        gid             :       0              
        mode            :       660            
        uid             :       0              

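Individual attributes of this resource output can be checked in scripts as well. A sketch with a hypothetical helper name (oak_field); it assumes the "Key : value" layout shown above and returns only the first token of the value, so wrapped values such as AutoDiscoveryHi are not handled.

```shell
# Hypothetical helper: read "oakcli show disk <resource>" output on stdin and
# print the first token of the value for attribute $1 (e.g. StateDetails).
oak_field() {
  awk -v key="$1" '$1 == key && $2 == ":" { print $3; exit }'
}

# Usage: a replaced disk is expected to be Online/Good with two paths.
#   oakcli show disk e0_pd_10 | oak_field StateDetails    # Good here
#   oakcli show disk e0_pd_10 | oak_field MultiPathList   # |/dev/sdae||/dev/sdp|
```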

The OS logs also show that the disk was discovered properly:

/var/log/messages: 

Jan  5 08:10:29 ermanoda0 kernel: mpt3sas0: detecting: handle(0x0024), sas_address(0x5000cca2604a90ae), phy(10)
Jan  5 08:10:35 ermanoda0 kernel: scsi 9:0:25:0: Direct-Access     HGST     H7280A520SUN8.0T P9E2 PQ: 0 ANSI: 6
Jan  5 08:10:35 ermanoda0 kernel: scsi 8:0:25:0: Direct-Access     HGST     H7280A520SUN8.0T P9E2 PQ: 0 ANSI: 6
Jan  5 08:10:35 ermanoda0 kernel: scsi 8:0:25:0: SSP: handle(0x0024), sas_addr(0x5000cca2604a90ae), phy(10), device_name(0x5000cca2604a90af)
Jan  5 08:10:35 ermanoda0 kernel: scsi 8:0:25:0: SSP: enclosure_logical_id(0x5080020001ecf27e), slot(80)
Jan  5 08:10:35 ermanoda0 kernel: scsi 8:0:25:0: serial_number(001634PA07WV        VLHA07WV)
Jan  5 08:10:35 ermanoda0 kernel: scsi 8:0:25:0: qdepth(254), tagged(1), simple(0), ordered(0), scsi_level(7), cmd_que(1)
Jan  5 08:10:35 ermanoda0 kernel: scsi 9:0:25:0: SSP: handle(0x0024), sas_addr(0x5000cca2604a90ad), phy(10), device_name(0x5000cca2604a90af)
Jan  5 08:10:35 ermanoda0 kernel: scsi 9:0:25:0: SSP: enclosure_logical_id(0x5080020001eceb7e), slot(80)
Jan  5 08:10:35 ermanoda0 kernel: scsi 9:0:25:0: serial_number(001634PA07WV        VLHA07WV)
Jan  5 08:10:35 ermanoda0 kernel: scsi 9:0:25:0: qdepth(254), tagged(1), simple(0), ordered(0), scsi_level(7), cmd_que(1)
Jan  5 08:10:35 ermanoda0 kernel: sd 8:0:25:0: Attached scsi generic sg17 type 0
Jan  5 08:10:35 ermanoda0 kernel: sd 8:0:25:0: [sdp] Enabling DIF Type 1 protection
Jan  5 08:10:35 ermanoda0 kernel: sd 8:0:25:0: [sdp] 15362376264 512-byte logical blocks: (7.86 TB/7.15 TiB)
Jan  5 08:10:35 ermanoda0 kernel: sd 8:0:25:0: [sdp] 4096-byte physical blocks
Jan  5 08:10:35 ermanoda0 kernel: sd 9:0:25:0: [sdae] Enabling DIF Type 1 protection
Jan  5 08:10:35 ermanoda0 kernel: sd 9:0:25:0: [sdae] 15362376264 512-byte logical blocks: (7.86 TB/7.15 TiB)
Jan  5 08:10:35 ermanoda0 kernel: sd 9:0:25:0: [sdae] 4096-byte physical blocks
Jan  5 08:10:35 ermanoda0 kernel: sd 8:0:25:0: [sdp] Write Protect is off
Jan  5 08:10:35 ermanoda0 kernel: sd 9:0:25:0: [sdae] Write Protect is off
Jan  5 08:10:35 ermanoda0 kernel: sd 9:0:25:0: [sdae] Write cache: disabled, read cache: enabled, supports DPO and FUA
Jan  5 08:10:35 ermanoda0 kernel: sd 8:0:25:0: [sdp] Write cache: disabled, read cache: enabled, supports DPO and FUA
Jan  5 08:10:35 ermanoda0 kernel:  sdae:
Jan  5 08:10:35 ermanoda0 kernel: sd 9:0:25:0: Attached scsi generic sg32 type 0
Jan  5 08:10:35 ermanoda0 kernel:  sdp:
Jan  5 08:10:35 ermanoda0 kernel: sd 9:0:25:0: [sdae] Attached SCSI disk
Jan  5 08:10:35 ermanoda0 kernel: sd 8:0:25:0: [sdp] Attached SCSI disk
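The serial number that oakcli reports (SerialNum 001634PA07WV) can be used to find the kernel messages for the new disk; it should appear once per controller path. A sketch, with a hypothetical function name, assuming the "kernel: scsi H:C:T:L: serial_number(...)" message format shown above.

```shell
# Hypothetical helper: print the SCSI addresses (H:C:T:L) whose kernel
# "serial_number(...)" message contains serial $1. Remaining arguments are
# log files; with none, the log is read from stdin.
scsi_addrs_for_serial() {
  local serial="$1"
  shift
  grep -hF "serial_number($serial" "$@" \
    | sed -n 's/.* scsi \([0-9:]*\): serial_number.*/\1/p' \
    | sort -u
}

# Usage: scsi_addrs_for_serial 001634PA07WV /var/log/messages
# For the excerpt above this prints 8:0:25:0 and 9:0:25:0.
```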

However, one thing still needs to be clarified: the firmware of the newly added disk.
This is actually the step named "Check the conditions and take post actions if necessary".

Here, I use the oakcli show version -detail command to see the installed and supported firmware versions of the ODA components.

[root@ermanoda0 device]# oakcli show version -detail
Reading the metadata. It takes a while...
System Version  Component Name            Installed Version         Supported Version        
--------------  ---------------           ------------------        -----------------        
12.1.2.4.0                                                                                   
                Controller_INT            4.230.40-3739             Up-to-date               
                Controller_EXT            06.00.02.00               Up-to-date               
                Expander                  0018                      Up-to-date               
                SSD_SHARED {                                                                 
                [ c2d20,c2d21,c2d22,      A29A                      A122                     
                c2d23 ]                                                                      
                [ c2d16,c2d17,c2d18,      A29A                      A122                     
                c2d19 ]                                                                      
                             }                                                               
                HDD_LOCAL                 A720                      Up-to-date               
                HDD_SHARED {                                                                 
                [ c2d0,c2d1,c2d2,c2d      P554                      Up-to-date               
                3,c2d4,c2d5,c2d6,c2d                                                         
                7,c2d8,c2d9,c2d11,c2                                                         
                d12,c2d13,c2d14,c2d1                                                         
                5 ]                                                                          
                [ c2d10 ]                 P9E2                      P554                     
                             }                                                               
                ILOM                      3.2.4.42 r99377           Up-to-date               
                BIOS                      30040200                  Up-to-date               
                IPMI                      1.8.12.0                  Up-to-date               
                HMP                       2.3.2.4.1                 Up-to-date               
                OAK                       12.1.2.4.0                Up-to-date               
                OL                        5.11                      Up-to-date               
                GI_HOME                   12.1.0.2.4(20831110,      Up-to-date               
                                          20831113)                                          
                DB_HOME {                                                                    
                [ OraDb12102_home1 ]      12.1.0.2.4(20831110,      Up-to-date               
                                          20831113)                                          
                [ OraDb11203_home1 ]      11.2.0.3.15(20760997      Up-to-date               
                                          ,17592127)                                         
                             }  

As seen in the output above, all the other shared HDDs have P554 firmware, while the newly added disk has P9E2.
The "Supported Version" column shows P554 for it.
So here is the question that needs to be answered: is P9E2 newer than P554?

This matters because, if the firmware of the newly added disk is older than the supported one, a firmware upgrade needs to be done using the "oakcli update -patch <patch_bundle_version> --infra" command.

From the note "How to Replace an ODA (Oracle Database Appliance) FAILED/ PredictiveFail Shared Storage Disk" (Doc ID 1435946.1):

"If the newly replaced disk has older firmware from what the ODA Software is expecting, you will need to update the firmware on this disk.
If the disk has newer firmware from the existing disks, this is fine, and the firmware does not need to be downgraded to match existing disks."!!!


However, there is no information about P9E2 on the Internet or in Oracle Support documents.
That said, there is some information about P901.
The note "Oracle Database Appliance FAQ: Common Questions Regarding Firmware Versions on ODA" (Doc ID 2119003.1) says:

The P901 firmware is a newer firmware version for the 8 TB drive, and firmware P554 is the one we released in 12.1.2.5.

So P901 > P554, right? I mean, 901 is a greater value.
By the same logic, P9E2 should be greater than P901, but we still created an SR to be sure.
Here is what Oracle Support said in the SR:

The FirmWare P9E2 of the replaced shard storage disk in slot 10 is the latest FW which has been released with ODA image 12.1.2.8.0 in Sept 2016.

So the problem was solved: no firmware upgrade was needed.
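For completeness, the "greater value" reasoning used above amounts to a plain string comparison, sketched below (fw_lex_max is a hypothetical name). Treat it as a heuristic only: nothing in the notes quoted here says that these firmware revision strings sort chronologically, which is why the SR was the right way to get a definitive answer.

```shell
# Heuristic only: print the lexically greatest firmware string (byte order,
# C locale). This mirrors the "P901 > P554" reasoning; it is NOT an
# authoritative way to decide which firmware is newer.
fw_lex_max() {
  printf '%s\n' "$@" | LC_ALL=C sort | tail -n 1
}

# Usage: fw_lex_max P554 P901 P9E2   # prints P9E2
```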

At the end of the day, we see that the FAQ is not up to date, and there is no document that gives information about ODA HDD firmware versions (i.e., a compatibility matrix between ODA versions and component versions).
Fortunately, we at least have Oracle Support :)
