Tuesday, May 23, 2017

RDBMS/Exadata -- Estimating the HCC compression ratio and the gain //using DBMS_COMPRESSION.GET_COMPRESSION_RATIO

If you are considering buying an Exadata and wondering how much storage space you can save with the Exadata-specific compression methods, you should take a look at this one.

You can see the compression ratio and calculate the percentage gain for a table by executing dbms_compression.get_compression_ratio, specifying COMP_FOR_QUERY_HIGH (or another compression type, such as COMP_FOR_QUERY_LOW) as the compression type.

You can use this method to compute the compression ratio and the gain even if you do not currently have the hardware required to run HCC. (In other words, you don't need your database running on Exadata, and you don't need your database files residing on Pillar or ZFS storage hardware.)

Note that the names of the compression type constants change according to the release. For example, it is COMP_QUERY_HIGH if the database is 12c.

Anyway, by using this method, you can get an idea of what you can gain in terms of storage savings.

Of course, there are some requirements for using this method:
  • Your database must be 11gR2 (I recommend >= 11.2.0.2, because there are bug records for dbms_compression in 11.2.0.1)
  • The RDBMS compatibility parameter must be set to at least 11.2.0.
  • The table to be estimated needs to have a minimum of 1 million rows.
The procedure reports the stats for both the uncompressed and the compressed version of the table (it creates both of these itself in the scratch tablespace that we specify as the "scratchtbsname").

The procedure creates 2 internal tables named DBMS_TABCOMP_TEMP_UNCMP (uncompressed) and DBMS_TABCOMP_TEMP_CMP (compressed). It generates the COMPRESSION_RATIO by comparing these two tables, and it drops these tables immediately after the estimation.

Let's see an example for QUERY HIGH compression->
  • We get a big table such as XLA_DIAG_SOURCES;

SQL> select owner,table_name,num_rows from dba_tables where num_rows>100000 order by num_rows desc;

OWNER                          TABLE_NAME                       NUM_ROWS
------------------------------ ------------------------------ ----------
XLA                            XLA_DIAG_SOURCES                226354744

  • The user that owns the table to be analyzed must have quota on the scratch tablespace, so we grant quota to the user. This is to avoid "ORA-01950: no privileges on tablespace 'XXERM'" errors.
SQL>

 ALTER USER XLA QUOTA UNLIMITED ON XXERM;

User altered.    
  • We execute the DBMS_COMPRESSION.GET_COMPRESSION_RATIO.
SQL>

set serveroutput on
declare
 v_scratchtbsname varchar2(32) := 'XXERM';
 v_ownname varchar2(32) := 'XLA';
 v_tabname varchar2(32) := 'XLA_DIAG_SOURCES';
 v_partname varchar2(32) := null;
 v_comptype number(10,0) := DBMS_COMPRESSION.COMP_FOR_QUERY_HIGH;
 v_blkcnt_cmp pls_integer;
 v_blkcnt_uncmp pls_integer;
 v_row_cmp pls_integer;
 v_row_uncmp pls_integer;
 v_cmp_ratio number;
 v_comptype_str varchar2(4000);
begin
 DBMS_COMPRESSION.GET_COMPRESSION_RATIO (
 Scratchtbsname => v_scratchtbsname,
 ownname => v_ownname,
 tabname => v_tabname,
 partname => v_partname,
 comptype => v_comptype,
 blkcnt_cmp => v_blkcnt_cmp,
 blkcnt_uncmp => v_blkcnt_uncmp,
 row_cmp => v_row_cmp,
 row_uncmp => v_row_uncmp,
 cmp_ratio => v_cmp_ratio,
 comptype_str => v_comptype_str
 );
 dbms_output.put_line('Block count if compressed  :' || v_blkcnt_cmp);
 dbms_output.put_line('Block count if uncompressed: ' || v_blkcnt_uncmp);
 dbms_output.put_line('Row count per block if compressed: ' || v_row_cmp);
 dbms_output.put_line('Row count per block if uncompressed : ' || v_row_uncmp);
 dbms_output.put_line('Compression ratio :' || v_cmp_ratio);
 dbms_output.put_line('Gain (%) :' || round((v_blkcnt_uncmp-v_blkcnt_cmp)/v_blkcnt_uncmp*100,2));
 dbms_output.put_line('Method : ' || v_comptype_str);
end;
/

OUTPUT
--------------
Compression Advisor self-check validation successful. select count(*) on both
Uncompressed and EHCC Compressed format = 1000001 rows

Block count if compressed: 353
Block count if uncompressed:  19745
Row count if compressed: 2833
Row count if uncompressed : 50
Compression ratio :55.9
Gain (%) :98.21
Method : "Compress For Query High"

Let's see an example for QUERY LOW compression->

SQL> set serveroutput on
declare
v_scratchtbsname varchar2(32) := 'XXERM';
v_ownname varchar2(32) := 'XLA';
v_tabname varchar2(32) := 'XLA_DIAG_SOURCES';
v_partname varchar2(32) := null;
v_comptype number(10,0) := DBMS_COMPRESSION.COMP_FOR_QUERY_LOW;
v_blkcnt_cmp pls_integer;
v_blkcnt_uncmp pls_integer;
v_row_cmp pls_integer;
v_row_uncmp pls_integer;
v_cmp_ratio number;
v_comptype_str varchar2(4000);
begin
DBMS_COMPRESSION.GET_COMPRESSION_RATIO (
Scratchtbsname => v_scratchtbsname,
ownname => v_ownname,
tabname => v_tabname,
partname => v_partname,
comptype => v_comptype,
blkcnt_cmp => v_blkcnt_cmp,
blkcnt_uncmp => v_blkcnt_uncmp,
row_cmp => v_row_cmp,
row_uncmp => v_row_uncmp,
cmp_ratio => v_cmp_ratio,
comptype_str => v_comptype_str
);
 dbms_output.put_line('Block count if compressed  :' || v_blkcnt_cmp);
 dbms_output.put_line('Block count if uncompressed: ' || v_blkcnt_uncmp);
 dbms_output.put_line('Row count per block if compressed: ' || v_row_cmp);
 dbms_output.put_line('Row count per block if uncompressed : ' || v_row_uncmp);
 dbms_output.put_line('Compression ratio :' || v_cmp_ratio);
 dbms_output.put_line('Gain (%) :' || round((v_blkcnt_uncmp-v_blkcnt_cmp)/v_blkcnt_uncmp*100,2));
 dbms_output.put_line('Method : ' || v_comptype_str);
end;
/

OUTPUT
--------------------
Compression Advisor self-check validation successful. select count(*) on both
Uncompressed and EHCC Compressed format = 1000001 rows
Block count if compressed: 836
Block count if uncompressed:  19745
Row count if compressed: 1196
Row count if uncompressed : 50
Compression ratio : 23.6
Gain (%) : 95.77
Method : "Compress For Query Low"

Wednesday, May 17, 2017

RAC ASM -- Disaster Recovery solution (using storage techniques) snapshot/snapmirror, Netapp

This blog post is about an alternative Disaster Recovery solution that I recently built through a POC (Proof of Concept).
The source system was a 2-node Oracle RAC (Grid + RDBMS, both 11.2.0.4) and the target system residing in the DR site was a 1-node environment (Grid + RDBMS, both 11.2.0.4).
Both source and target were using ASM for storing all the db files.
This made me think a little bit, because this alternative disaster recovery solution was relying on the NetApp snapshot and snapmirror mechanisms.
We were planning to put the database into backup mode (hot backup mode / alter database begin backup), take the snapshot on the local storage, transfer the snapshot to the DR storage using snapmirror, map the luns created from this snapshot/snapmirror to the target server, end the backup on the target, and open the database successfully on the target.

ASM was my concern, because I had never done this before.
Actually, I had implemented (even automated) a similar solution 6 years ago, but the filesystems were ext3 (a cooked filesystem) in that case.

So, I was afraid that ASM header info might get corrupted during the snapshot/snapmirror operations.
However, everything went well.

Of course, it is not supported by Oracle. However, I can say that ASM diskgroups can be replicated using snapshot/snapmirror operations. (Of course, if you know what you are doing..)

I did 3 tests and didn't encounter any problems.

In this blog post, I will show you the things that I did to replicate a 2-node RAC system (Grid/ASM + RDBMS) to a single-node system (Grid/ASM + RDBMS) using NetApp's snapshot and snapmirror functionalities.

Following is a representative picture explaining the work done.
As shown in the picture, we take a snapshot of the ASM disks, replicate it to the DR site using snapmirror, then map the luns to the target server running on the DR site, and make the ASM running on the target server see those disks.



The operation requires both Linux admin and storage admin skills. (I don't see a need to mention the required DBA skills :) )
It requires Linux admin skills, because we do things like formatting the drives (we are working on new luns, since this is a POC), rescanning the scsi bus, etc.
It requires storage admin skills, because we do things like taking snapshots, initiating snapmirror, mapping luns, etc.
Lastly, it requires DBA skills, because we put the database into backup mode, change some init.ora parameters, make the replicated disks known to ASM, open our database successfully on the target node (recovery may be needed), check the log files in case of an error, and so on.

Let's see what I did in this POC ->
Note that this is a technical overview. (It doesn't include every step that I performed or every command that I used.)

RAC to SINGLE STORAGE REPLICATION (ASM diskgroup which is built on top of 3 disks)

1. The storage team creates 3 ASM disks (each sized approx. 13 GB) and maps these luns to our Linux RAC nodes (the source nodes).
We do the scsi rescan to make Linux see these luns/disks, and then we create partitions on them. (oracleasm createdisk requires a partition)

fdisk output just after the scsi rescan:

Disk /dev/sdbk: 11.8 GB, 11811160064 bytes
64 heads, 32 sectors/track, 11264 cylinders
Units = cylinders of 2048 * 512 = 1048576 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 65536 bytes
Disk identifier: 0x00000000


Disk /dev/sdbl: 12.9 GB, 12884901888 bytes
64 heads, 32 sectors/track, 12288 cylinders
Units = cylinders of 2048 * 512 = 1048576 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 65536 bytes
Disk identifier: 0x00000000


Disk /dev/sdbm: 14.0 GB, 13960740864 bytes
64 heads, 32 sectors/track, 13314 cylinders
Units = cylinders of 2048 * 512 = 1048576 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 65536 bytes
Disk identifier: 0x00000000

Note that, before discovering the disks, we use fdisk to create partitions on them.
fdisk output just after the partition creations:

Disk /dev/sdbk: 11.8 GB, 11811160064 bytes
64 heads, 32 sectors/track, 11264 cylinders
Units = cylinders of 2048 * 512 = 1048576 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 65536 bytes
Disk identifier: 0x42926436

    Device Boot      Start         End      Blocks   Id  System
/dev/sdbk1               1       11264    11534320   83  Linux

Disk /dev/sdbl: 12.9 GB, 12884901888 bytes
64 heads, 32 sectors/track, 12288 cylinders
Units = cylinders of 2048 * 512 = 1048576 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 65536 bytes
Disk identifier: 0xb900c4fd

    Device Boot      Start         End      Blocks   Id  System
/dev/sdbl1               1       12288    12582896   83  Linux

Disk /dev/sdbm: 14.0 GB, 13960740864 bytes
64 heads, 32 sectors/track, 13314 cylinders
Units = cylinders of 2048 * 512 = 1048576 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 65536 bytes
Disk identifier: 0x65f91faf

    Device Boot      Start         End      Blocks   Id  System
/dev/sdbm1               1       13314    13633520   83  Linux
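
By the way, both the rescan and the partitioning can be scripted for repeatability. Here is a minimal sketch, assuming the device names /dev/sdbk, /dev/sdbl and /dev/sdbm from above (double-check the device names before running anything like this, because a scripted fdisk will happily destroy the wrong disk):

# rescan every scsi host instead of typing them one by one
for h in /sys/class/scsi_host/host*/scan; do echo "- - -" > "$h"; done
# create one primary partition spanning each new disk (non-interactive fdisk)
for d in /dev/sdbk /dev/sdbl /dev/sdbm; do
  printf 'n\np\n1\n\n\nw\n' | fdisk "$d"
done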


2. We use oracleasm to create the ASM disks, and then, again using oracleasm, we make ASM see them.

[root@source01 ~]# oracleasm createdisk DEV1 /dev/sdbk1
Writing disk header: done
Instantiating disk: done
[root@source01 ~]# oracleasm createdisk DEV2 /dev/sdbl1
Writing disk header: done
Instantiating disk: done
[root@source01 ~]# oracleasm createdisk DEV3 /dev/sdbm1
Writing disk header: done
Instantiating disk: done

[root@source01 ~]# oracleasm scandisks
Reloading disk partitions: done
Cleaning any stale ASM disks...
Scanning system for ASM disks...
[root@source01 ~]# oracleasm listdisks

DATA1
DATA2
DATA3
DATA4
DATA5
DATA6
DATA7
DATA8
DATA9
DEV1 ** new disk
DEV2 ** new disk
DEV3 ** new disk
FRA1
FRA2
TEST1

3. In order to make these new ASM disks available on source node 2 (remember, this is RAC), we rescan the scsi bus and run oracleasm scandisks on source node 2 as well.

[root@source02 ~]#  echo "- - -" > /sys/class/scsi_host/host0/scan
[root@source02 ~]#  echo "- - -" > /sys/class/scsi_host/host1/scan
[root@source02 ~]#  echo "- - -" > /sys/class/scsi_host/host2/scan
[root@source02 ~]#  echo "- - -" > /sys/class/scsi_host/host3/scan
[root@source02 ~]#  echo "- - -" > /sys/class/scsi_host/host4/scan
[root@source02 ~]# oracleasm  scandisks
Reloading disk partitions: done
Cleaning any stale ASM disks...
Scanning system for ASM disks...
Instantiating disk "DEV2"
Instantiating disk "DEV1"
Instantiating disk "DEV3"
[root@source02 ~]# oracleasm listdisks
DATA1
DATA2
DATA3
DATA4
DATA5
DATA6
DATA7
DATA8
DATA9
DEV1  ** new disk
DEV2  ** new disk
DEV3  ** new disk
FRA1
FRA2
TEST1

4. After these new luns and ASM disks become available on both of the RAC nodes, we create a diskgroup called +DEV on these 3 new disks using ASMCA (the equivalent SQL is shown just below). After that, we create a 2-node RAC database called DEVRAC, and we configure this database so that all its files (controlfiles, redologs, datafiles, etc.) are placed into the +DEV diskgroup.
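
ASMCA is just a convenience here; the same diskgroup can also be created with SQL while connected to the ASM instance as sysasm. A minimal sketch, assuming external redundancy and the ASMLib disk names created above:

SQL> CREATE DISKGROUP DEV EXTERNAL REDUNDANCY
     DISK 'ORCL:DEV1', 'ORCL:DEV2', 'ORCL:DEV3';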

SQL*Plus: Release 11.2.0.4.0 Production on Wed May 10 13:39:43 2017

Copyright (c) 1982, 2013, Oracle.  All rights reserved.

Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP,
Data Mining and Real Application Testing options

SQL> select instance_name from gv$instance;

INSTANCE_NAME
----------------
DEVRAC1
DEVRAC2

SQL> select name from v$datafile;

NAME
--------------------------------------------------------------------------------
+DEV/devrac/system01.dbf
+DEV/devrac/sysaux01.dbf
+DEV/devrac/undotbs01.dbf
+DEV/devrac/users01.dbf
+DEV/devrac/undotbs02.dbf

SQL> select name from v$controlfile;

NAME
--------------------------------------------------------------------------------
+DEV/devrac/control01.ctl
+DEV/devrac/control02.ctl

SQL> select member from v$logfile;

MEMBER
--------------------------------------------------------------------------------
+DEV/devrac/redo02.log
+DEV/devrac/redo01.log
+DEV/devrac/redo03.log
+DEV/devrac/redo04.log


5. At this point, in order to make the necessary checks in the target environment, we create a table called ERMAN in the source. This table resides in a tablespace called ERMAN, and all the datafiles of that tablespace reside in the +DEV diskgroup.
So, we first create the ERMAN tablespace, placing all its datafiles on the +DEV diskgroup. Then, we create a table named ERMAN in the tablespace called ERMAN. (Lots of ERMANs.., I know :)

SQL> CREATE TABLESPACE ERMAN DATAFILE '+DEV';

Tablespace created.

SQL> select name from v$datafile;

NAME
--------------------------------------------------------------------------------
+DEV/devrac/system01.dbf
+DEV/devrac/sysaux01.dbf
+DEV/devrac/undotbs01.dbf
+DEV/devrac/users01.dbf
+DEV/devrac/undotbs02.dbf
+DEV/devrac/datafile/erman.270.943623731


SQL> create table ERMAN tablespace ERMAN as select  * from dba_objectS;

Table created.

SQL> insert into ERMAN select * from ERMAN;

86293 rows created.

SQL> commit;

Commit complete.

SQL> insert into ERMAN select * from ERMAN;

172586 rows created.

6. We put our database (DEVRAC) into backup mode (hot backup mode) and, while it is in backup mode, we create a storage snapshot and then do a snapmirror update (the source-side part of this step is sketched just below). After the snapmirror update, we create the luns on the target storage system (using the snapmirrored data) and map these luns to the target database node. After the lun mapping, we rescan the scsi bus on the target node (Linux), and make the ASM running on the target node see these new luns (using oracleasm).
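
Here is a minimal sketch of the source-side flow. The sqlplus part is exact; the storage commands are 7-Mode-style placeholders (the volume name vol_dev, the filer names srcfiler/dstfiler and the exact snapshot/snapmirror syntax are assumptions; adjust them for your NetApp/ONTAP version):

# put the database into backup mode
sqlplus -s "/ as sysdba" <<EOF
alter database begin backup;
EOF
# take the storage snapshot while the database is in backup mode
ssh storageadmin@srcfiler "snap create vol_dev devrac_snap"
# end the backup mode on the source
sqlplus -s "/ as sysdba" <<EOF
alter database end backup;
EOF
# update the mirror on the DR storage (run on the destination filer)
ssh storageadmin@dstfiler "snapmirror update dstfiler:vol_dev_dr"

Note that the snapshot was taken while the datafiles were in backup mode, so the replicated copy still believes it is in backup mode; this is why we run "alter database end backup" on the target before opening it (as you will see in the next step).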

[root@targetsinglenode ~]# echo "- - -" > /sys/class/scsi_host/host0/scan
[root@targetsinglenode ~]# echo "- - -" > /sys/class/scsi_host/host1/scan
[root@targetsinglenode ~]# echo "- - -" > /sys/class/scsi_host/host2/scan
[root@targetsinglenode ~]# echo "- - -" > /sys/class/scsi_host/host3/scan
[root@targetsinglenode ~]# echo "- - -" > /sys/class/scsi_host/host4/scan

Disk /dev/sdn: 11.8 GB, 11811160064 bytes
64 heads, 32 sectors/track, 11264 cylinders
Units = cylinders of 2048 * 512 = 1048576 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 65536 bytes
Disk identifier: 0x42926436

   Device Boot      Start         End      Blocks   Id  System
/dev/sdn1               1       11264    11534320   83  Linux
Disk /dev/sdo: 12.9 GB, 12884901888 bytes
64 heads, 32 sectors/track, 12288 cylinders
Units = cylinders of 2048 * 512 = 1048576 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 65536 bytes
Disk identifier: 0xb900c4fd

   Device Boot      Start         End      Blocks   Id  System
/dev/sdo1               1       12288    12582896   83  Linux

Disk /dev/sdp: 14.0 GB, 13960740864 bytes
64 heads, 32 sectors/track, 13314 cylinders
Units = cylinders of 2048 * 512 = 1048576 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 65536 bytes
Disk identifier: 0x65f91faf

   Device Boot      Start         End      Blocks   Id  System
/dev/sdp1               1       13314    13633520   83  Linux

[root@targetsinglenode ~]# oracleasm scandisks
Reloading disk partitions: done
Cleaning any stale ASM disks...
Scanning system for ASM disks...
Instantiating disk "DEV1"
Instantiating disk "DEV2"
Instantiating disk "DEV3" 

7. After making the snapmirrored disks available on the target host, we modify the init.ora that we copied from the source. We modify it to exclude the RAC-related parameters, because the target node is a single node (an illustrative set of changes is sketched just below).
After modifying the init.ora, we mount the +DEV diskgroup in the target node's ASM, and we start up the DEVRAC database as a single-node database on our target node.
Note that, after opening the database, we check the table named ERMAN in the target instance to ensure that it was replicated successfully.
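
The init.ora changes might look like the following (an illustrative set; the exact parameters depend on how the source pfile/spfile was built):

*.cluster_database=false                 # was true on the source
# removed: the instance-specific entries of the second instance, e.g.
# DEVRAC2.instance_number=2
# DEVRAC2.thread=2
# DEVRAC2.undo_tablespace='UNDOTBS2'
# removed: remote_listener and the other RAC/SCAN-related entries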

[oracle@targetsinglenode ~]$ sqlplus "/as sysasm"

SQL*Plus: Release 11.2.0.4.0 Production on Wed May 10 14:26:35 2017

Copyright (c) 1982, 2013, Oracle.  All rights reserved.


Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Automatic Storage Management option

SQL> alter diskgroup DEV mount;


SQL> startup mount;
ORACLE instance started.

Total System Global Area 2355015680 bytes
Fixed Size                  2255472 bytes
Variable Size             587203984 bytes
Database Buffers         1761607680 bytes
Redo Buffers                3948544 bytes
Database mounted.
SQL> alter database end backup;

Database altered.

SQL> alter database open;

Database altered.

SQL> select name from v$datafile;

NAME
--------------------------------------------------------------------------------
+DEV/devrac/system01.dbf
+DEV/devrac/sysaux01.dbf
+DEV/devrac/undotbs01.dbf
+DEV/devrac/users01.dbf
+DEV/devrac/undotbs02.dbf
+DEV/devrac/datafile/erman.270.943623731

6 rows selected.

SQL> select count(*) from ERMAN;

  COUNT(*)
----------
    172586

SQL> alter system checkpoint;

Note: After opening the database, we monitor the alert logs and check the related log files of both RDBMS and GRID to see whether any errors were recorded in them. In conclusion, we don't see any errors.

8. Just in case, we use the replicated database a little bit. We do some read/write tests by inserting some data using a loop, switching the log files, and so on. During these tests, we do not encounter any abnormalities.

SQL*Plus: Release 11.2.0.4.0 Production on Wed May 10 14:57:11 2017

Copyright (c) 1982, 2013, Oracle.  All rights reserved.


Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Partitioning, Automatic Storage Management, OLAP, Data Mining
and Real Application Testing options

SQL> create table ERMAN2 as select * from ERMAN;

Table created.

SQL> BEGIN
  2      FOR x IN 1 .. 10 LOOP
  3           INSERT INTO ERMAN2
  4           SELECT * from ERMAN;
  5      END LOOP;
  6  END;
  7  /

PL/SQL procedure successfully completed.

SQL>
SQL>
SQL> commit;

Commit complete.

SQL> archive log all;
ORA-00271: there are no logs that need archiving
SQL> alter system checkpoint;

System altered.

SQL> alter system switch logfile;

System altered.

9. Actually, at this point the POC is successfully finished, but I want to give you some bonus info about the deletion of the databases and the other things that we created during this POC.
Basically, we drop the database using rman, we drop the diskgroups using ASMCA, and we clear the freed candidate disks using oracleasm (deletedisk).
You can see the "rman drop database" method in the following example. The method that I give here is for a single-node database, but it can be used in RAC environments as well. (Note: in order to drop a RAC database using rman, the parameter named cluster_database must be set to "false" before executing the drop database command.)

[oracle@targetsinglenode trace]$ sqlplus "/as sysdba"
SQL*Plus: Release 11.2.0.4.0 Production on Wed May 10 15:01:06 2017
Copyright (c) 1982, 2013, Oracle.  All rights reserved. 

Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Partitioning, Automatic Storage Management, OLAP, Data Mining
and Real Application Testing options

SQL> shu immediaTE;
Database closed.
Database dismounted.
ORACLE instance shut down.
SQL>
SQL> startup mount exclusive;
ORACLE instance started.

Total System Global Area 2355015680 bytes
Fixed Size                  2255472 bytes
Variable Size             587203984 bytes
Database Buffers         1761607680 bytes
Redo Buffers                3948544 bytes
Database mounted.
SQL> alter system enable restricted session;

System altered.

[oracle@targetsinglenode trace]$ echo $ORACLE_SID
DEVRAC
[oracle@targetsinglenode trace]$ rman target /

Recovery Manager: Release 11.2.0.4.0 - Production on Wed May 10 15:04:12 2017

Copyright (c) 1982, 2011, Oracle and/or its affiliates.  All rights reserved.

connected to target database: DEVRAC (DBID=680719421, not open)

RMAN> drop database including backups;

database name is "DEVRAC" and DBID is 680719421

Do you really want to drop all backups and the database (enter YES or NO)? YES

using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=25 device type=DISK
specification does not match any backup in the repository

released channel: ORA_DISK_1
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=25 device type=DISK
specification does not match any datafile copy in the repository
specification does not match any control file copy in the repository
specification does not match any control file copy in the repository
List of Archived Log Copies for database with db_unique_name DEVRAC
=====================================================================

Key     Thrd Seq     S Low Time
------- ---- ------- - ---------
2       1    7       A 10-MAY-17
        Name: +DEV/devrac/archivelog/2017_05_10/thread_1_seq_7.271.943623819

3       1    8       A 10-MAY-17
        Name: +DEV/devrac/archivelog/2017_05_10/thread_1_seq_8.272.943623843

4       1    9       A 10-MAY-17
        Name: +DEV/devrac/archivelog/2017_05_10/thread_1_seq_9.273.943626473

7       1    10      A 10-MAY-17
        Name: +DEV/devrac/archivelog/2017_05_10/thread_1_seq_10.276.943628345

8       1    11      A 10-MAY-17
        Name: +DEV/devrac/archivelog/2017_05_10/thread_1_seq_11.277.943628349

9       1    12      A 10-MAY-17
        Name: +DEV/devrac/archivelog/2017_05_10/thread_1_seq_12.278.943628353

10      1    13      A 10-MAY-17
        Name: +DEV/devrac/archivelog/2017_05_10/thread_1_seq_13.279.943628355

11      1    14      A 10-MAY-17
        Name: +DEV/devrac/archivelog/2017_05_10/thread_1_seq_14.280.943628377

1       2    1       A 10-MAY-17
        Name: +DEV/devrac/archivelog/2017_05_10/thread_2_seq_1.269.943619773

5       2    2       A 10-MAY-17
        Name: +DEV/devrac/archivelog/2017_05_10/thread_2_seq_2.274.943626473

6       2    3       A 10-MAY-17
        Name: +DEV/devrac/archivelog/2017_05_10/thread_2_seq_3.275.943626475

deleted archived log
archived log file name=+DEV/devrac/archivelog/2017_05_10/thread_1_seq_7.271.943623819 RECID=2 STAMP=943623818
deleted archived log
archived log file name=+DEV/devrac/archivelog/2017_05_10/thread_1_seq_8.272.943623843 RECID=3 STAMP=943623842
deleted archived log
archived log file name=+DEV/devrac/archivelog/2017_05_10/thread_1_seq_9.273.943626473 RECID=4 STAMP=943626472
deleted archived log
archived log file name=+DEV/devrac/archivelog/2017_05_10/thread_1_seq_10.276.943628345 RECID=7 STAMP=943628345
deleted archived log
archived log file name=+DEV/devrac/archivelog/2017_05_10/thread_1_seq_11.277.943628349 RECID=8 STAMP=943628349
deleted archived log
archived log file name=+DEV/devrac/archivelog/2017_05_10/thread_1_seq_12.278.943628353 RECID=9 STAMP=943628352
deleted archived log
archived log file name=+DEV/devrac/archivelog/2017_05_10/thread_1_seq_13.279.943628355 RECID=10 STAMP=943628356
deleted archived log
archived log file name=+DEV/devrac/archivelog/2017_05_10/thread_1_seq_14.280.943628377 RECID=11 STAMP=943628377
deleted archived log
archived log file name=+DEV/devrac/archivelog/2017_05_10/thread_2_seq_1.269.943619773 RECID=1 STAMP=943619772
deleted archived log
archived log file name=+DEV/devrac/archivelog/2017_05_10/thread_2_seq_2.274.943626473 RECID=5 STAMP=943626473
deleted archived log
archived log file name=+DEV/devrac/archivelog/2017_05_10/thread_2_seq_3.275.943626475 RECID=6 STAMP=943626474
Deleted 11 objects

database name is "DEVRAC" and DBID is 680719421
database dropped

Well, that is it :) It works.. Please feel free to comment.

Friday, May 5, 2017

Fractured Blocks- ORA-01578 & tail checks

I recently dealt with an ORA-01578 error, which can be an indication of a fractured block.
Because of this error, the database could not be opened.
The complete Oracle error stack recorded in the trace file was like the following;

Corrupt block relative dba: 0x0101342a (file 4, block 78890)
.....
........
.............
ORA-00704: bootstrap process failure
ORA-00604: error occurred at recursive SQL level 1
ORA-01578: ORACLE data block corrupted (file # 4, block # 78890)
ORA-01110: data file 4: '/ortest/testdata/system04.dbf'

So, without the need for further diagnostics, it was obvious that a block in system04.dbf was corrupted.
However, as the error was ORA-01578, this made me think that the problem was due to a physical corruption, the result of an OS or hardware I/O error.
In addition, this physical corruption seemed to be in the form of a fractured block.

I like to explain the fractured block every time it is asked about, because I find it interesting. It is closely related to the layers below Oracle (OS, storage, and so on).

Oracle I/O is done through the OS, and the size of a particular I/O depends on the OS block size, which is 512 bytes for Linux.

So, if we have an Oracle database with an 8K block size and we modify one of its blocks (Oracle does its modifications at the block level), then we actually do multiple I/Os when we look at the operation from the OS perspective.

Almost all the backup-consistency-related solutions are built with this in mind. For example, hot backup mode (backup mode) is designed to guarantee consistency, and to guarantee recovery even though there may be fractured blocks in the backed-up files.

So, a fractured block means that the database block is incomplete/physically corrupted.

Suppose we are doing a DB block I/O. It is 8K, right? So, considering our OS block size is 512 bytes, a single DB block I/O means multiple OS I/Os (16 of them).

Suppose the OS does 15 of these I/Os, and during the 16th I/O it encounters a problem and cannot complete it.

Now suppose Oracle cannot even catch it: we on the Oracle side don't get the return code, or let's say, cannot handle it (or the OS says it is OK, although it is not).

At this moment, our block is physically corrupt and inconsistent, since it has one part coming from the future and one part coming from the past.

On the other hand, Oracle knows about this danger, and it has mechanisms to check for it.

In order to check for this, Oracle updates a field at the end of each block. The size of this field is 4 bytes.

Oracle writes a number into these 4 bytes for checking the consistency of the tail of the block against the head on subsequent reads.

That is, Oracle updates this 4-byte field, which is located in the tail of each block, before writing the block to disk completely (before giving the block to the OS for I/O).

The value stored in this 4-byte field is called the tailchk. It is a number built from information stored in the block itself (which Oracle reads from the block head and remembers).

If the tailchk is aligned with the related data stored in the block header, Oracle ensures that at least the tail of the block is consistent with the head.

The tailchk value is computed using the last 4 digits of the SCN base, the block type, and the SCN sequence number.
Additional info can be found in the Fractured Block explanation (Doc ID 1392417.1).

Note: The tailchk only verifies whether the header and the tail of the block have been written correctly. Complete (whole) block verification can be achieved using DB_BLOCK_CHECKSUM.
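
For reference, you can quickly check which level of checksum protection is enabled in your database; the possible values are OFF, TYPICAL (the default), and FULL:

SQL> show parameter db_block_checksum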

So in our case, as seen below, the block type = 0, the last 4 digits of the SCN are 0001, and the seq is 0x1.
So, our tailchk value should be -> 0x00010001; however, it is 0x19b90601, as you see below.
(A block type of 0 is wrong in the first place, but I still wanted to show you how we calculate the tailchk.)

Corrupt block relative dba: 0x0101342a (file 4, block 78890)
Bad header found during buffer read
Data in bad block:
 type: 0 format: 3 rdba: 0x0000000c
 last change scn: 0x056e.ea020001 seq: 0x1 flg: 0x06
 spare1: 0x0 spare2: 0x0 spare3: 0x0
 consistency value in tail: 0x19b90601
 check value in block header: 0x747
 computed block checksum: 0x8577
Reread of rdba: 0x0101342a (file 4, block 78890) found same corrupted data
ORA-00704: bootstrap process failure
ORA-00604: error occurred at recursive SQL level 1
ORA-01578: ORACLE data block corrupted (file # 4, block # 78890)
ORA-01110: data file 4: '/ortest/testdata/system04.dbf'

Also, the computed block checksum confirms that this block is corrupt, as it is different from 0x0.
Actually, the block type is 0 here, and type 0 is an unknown type. So this block really is a corrupt one; it is not only the computed block checksum that proves it. (The computed block checksum is actually meaningless here, because the type is 0.)
In addition, the content of the block actually belongs to a different block: the rdba value is different from the requested dba value.
So this block is both corrupted and misplaced.
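
By the way, when you suspect such a corruption, you can also verify the datafile from the OS side using the DBVERIFY utility. A minimal example for our case (adjust blocksize to your DB block size):

dbv file=/ortest/testdata/system04.dbf blocksize=8192

dbv walks through the datafile and reports the failing/corrupt pages, which is a handy confirmation that the problem is in the file itself rather than in the buffer cache.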

The methods that Oracle suggests for fixing these kinds of problems can be seen below;

Note 144911.1 RMAN : Block-Level Media Recovery - Concept & Example
Note 342972.1 How to perform Block Media Recovery (BMR) when backups are not taken by RMAN
ORA-1578 Methods to Skip Block Corruption (Doc ID 2199133.1)

The methods above rely on backups. So we need to have RMAN or non-RMAN backups in order to fix these kinds of problems using them.

However, what if we don't have backups?

Well, there are 4 things that I can say about this ->

1) We can recreate the index, if the fractured block is an index block.

2) We can use the methods for skipping block corruption (documented in Doc ID 2199133.1).

3) We can use the "bbed" utility to properly update the consistency value in the tail and the computed block checksum, in order to at least start our database. (Using bbed is at our own risk.) This is a very advanced method.

4) There are fixes for specific scenarios. For example, this one -> Error ORA-1578 reported on the database after changing compatible.rdbms >= 11 and resizing a datafile (Doc ID 965751.1). Taking a look at Oracle Support is always a good idea.

Tuesday, May 2, 2017

EBS -- OAF Page is not updatable - no errors received // caused by Accessibility features settings

Here is a strange issue encountered in an EBS 12.2 environment (EBS 12.2.4).
It was related to a specific OAF page (aka Self Service page) named GL_ASF_ALC_SUMMARY, which is shown in the following screenshot.


The problem was weird.. After rendering, the page just didn't respond to any of the user inputs.
We clicked on the buttons, tried to use the LOVs, and so on... We even tried using different browsers and different clients, but couldn't make it work.
There were no errors in the OAF logs, no errors in the HTTP trace tools, and no warnings in our browsers.

In addition, the url produced for this page seemed to be okay.
There was no personalization for this page, and the classes and everything were there in the filesystem; they all looked fine.

The url:
http://ermansrv:8000/OA_HTML/OA.jsp?page=/oracle/apps/gl/asflow/config/webui/GL_ASF_CONFIG_ALC_SUMMARY_PG&_ri=101&objectId=2070&objectType=PRIMARY&_ti=1375218639&retainAM=Y&addBreadCrumb=Y&OASF=GL_ASF_ALC_SUMMARY&oapc=27&oas=xMIKbnyHRvklsGwM8W3WcA..
However, the page just didn't work..
Okay.. I will keep this short..
Guess what the cause was..

"The accessibility features"...

The users encountering this problem had "Screen Reader Optimized" set in their user preferences (as the Accessibility Features setting). In other words, the profile option named "Self Service Accessibility Features" was set to Screen Reader Optimized for these users.

Note: when we set a preference, EBS actually updates the related profile option for our user (at the user level).

I said I would keep it short, but I just can't give the solution directly without giving some extra info :)

The Accessibility Features preference is used to better support the use of assistive technologies in EBS, such as screen readers for blind users.

When "Screen Reader Optimised" mode is set, EBS builds the pages in a way to better support the screen readers.
Here is the defition for values that can be set for the Screen Reader Optimized user preference:

* Standard Accessibility - pages are accessible to users using assistive technologies.
* Screen Reader Optimized - pages are optimized for screen readers. This may degrade the output for a sighted user.
* None - pages include behaviors that are not accessible to users using assistive technologies

A screen reader is software that enables a blind or visually impaired person to use the computer. Screen readers usually read the text displayed on the screen and convert it into a form those users can perceive: tactile, auditory, or a combination of both.

I have tested this screen reader thing using ChromeVox, which is an add-on for the Chrome browser, and I found it interesting, as it actually understands the mouse moves, interprets the screen, and reads the contents accordingly.

Okay after giving this extra info, let's see what the solution was:

Solution:

Query the system profile "Self Service Accessibility Features".
Update the value to None (or null) at the user level for the user that is getting this error.
Save and retest. (A query for checking the current user-level value from the backend is shown just below.)
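
If you want to verify the value before or after the change, something like the following can be used (a sketch; level_id 10004 is the user level, and the view and column names should be double-checked on your release):

SELECT v.level_value user_id, v.profile_option_value
  FROM fnd_profile_options_vl o, fnd_profile_option_values v
 WHERE o.user_profile_option_name = 'Self Service Accessibility Features'
   AND v.profile_option_id = o.profile_option_id
   AND v.level_id = 10004;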

Alternative solution:

Go to Preferences -> Accessibility Features and set it to None.
Apply the changes.
Retest the issue.

One more thing.. You may ask why this error was encountered in the first place.
The question is fair, because the screen reader optimized mode should not normally break anything in EBS.
However, in the real world it does. It is a bug, of course, and the solutions above are just workarounds.
I believe it is a bug in the failing page itself (in this case GL_ASF_ALC_SUMMARY).
So I think the module should be upgraded to the latest code level in order to get the fix. (In this case, GL should be upgraded.)

Anyway, I'm giving you the related MOS documents here, and I promise I will revisit this post if I find something more about it.

Until then, you have the workarounds :)
  • Checklist for Assistive Technologies Issues with Oracle E-Business Suite (Doc ID 1100934.1)
  • http://www.oracle.com/us/corporate/accessibility/ebs-accessibility-white-paper-163185.pdf (E-Business Suite Accessibility , an Oracle Whitepaper)

Wednesday, April 26, 2017

Linux -- using the screen for uninterrupted command operations (especially when you don't have Vnc)

This is a very old topic, actually. Linux folks especially will not pay much attention to this one (because they have probably known it for a long time). However, in the Oracle DBA world, it is a valuable topic.
We always use VNC for uninterrupted operations (sometimes nohup as well).
That's what we were told. Especially for critical and long-running OS operations (including copies, moves, manual database upgrades, and RMAN operations), we most of the time use VNC and get an X session, which is not interrupted even if we lose our connection to the server.

the "screen" command, on the other hand; can be thought as an alternative to the VNC connections in a way.
Ofcourse screen doesn't have the capability (it is not designed for this aim) to give us a X session (GUI), but it let us to have uninterrupted command operations (like the VNC does).
Besides screen program let us continue with our shell/terminal even after we disconnect from the server. That is even after disconnecting and reconnecting , we can still see all the shell outputs which have been produced since we have started working in our screen terminal.

Following filtered info is from the man page of the screen program;

Screen  is  a  full-screen window manager that multiplexes a physical terminal between several processes (typically interactive shells). 
When screen is called, it creates a single window with a shell in it (or the specified command) and then gets out of your way so that you can use the program as you normally would. 
Programs continue to run when their window is currently not visible and even when the whole screen session is detached from the user's terminal.

Well... screen is an easy-to-install and easy-to-use tool. (In the case of Oracle Linux, screen can be installed using the "yum install screen" command.)

Here is a demo for you ->

We connect to our server and execute the screen command as follows;
When we execute the screen command, a screen session/terminal gets created as shown below;
(I showed the pids of the sessions using echo $$ to let you see that the terminal's pid changes when we execute the screen command.)


In the screen session, we first execute the ls -al command and see its output. Then, we execute our next command, which is "du -sh /". While our command is working, we close our terminal (by closing our terminal program, to mimic an unexpected disconnection).



After we close our terminal program (which is an SSH client in my case), we log in to our server again and check our ongoing screen sessions using the "screen -ls" command, as shown below;



Once we identify our screen session using its id (its pid actually, 3729 in this case), we use the "screen -r" command to reattach to our ongoing screen session, as depicted below;


After we execute the screen -r command, we reattach to our screen terminal and see that our du -sh operation is still ongoing. (We even see the output of the ls -al command that we executed earlier.)


If we decide to kill our screen terminal, we can always use the exit command while we are at the screen terminal prompt. Alternatively, we can use the "screen -S <pid> -X kill" command to kill our screen terminal without even attaching to it. The whole demo is summarized as a plain transcript below.
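
Since the demo above is based on screenshots, here is the same flow as a plain terminal transcript (illustrative; your session id will differ from 3729):

$ screen                 # start a new screen session; echo $$ now shows a new pid
$ ls -al                 # first command; its output stays in the screen scrollback
$ du -sh /               # long-running command; now close the terminal program
... relogin to the server ...
$ screen -ls             # lists the sessions, e.g. "3729.pts-0.host (Detached)"
$ screen -r 3729         # reattach; du -sh is still running, ls -al output is still there
$ exit                   # ends the screen session when we are done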


The last thing that I want to mention in this blog post is the difference between the nohup command and the screen command.
That is, screen is not only used for daemonizing a process; it works more like a terminal window manager. For instance, we can disconnect from the screen terminal while our command is working, and then reconnect to that terminal in case our command requires input from us. Also, we can reconnect to the terminal and check the command outputs that were produced while we were not connected to the server at all, and so on. So, screen and nohup are not the same thing.

Tuesday, April 25, 2017

Top 60 Oracle Blogs And Websites for Oracle DBAs ( Erman Arslan's Oracle Blog is on the list!)

I'm proud to be on the TOP-60 Oracle Blogs and Websites list.
I have worked hard on this blog from the first day, and now it is a big pleasure to be on the list with the big names.
I would like to thank you, my followers and readers... I would also like to thank you, my forum users.
Your interest, comments, and feedback have been my biggest motivation, both for writing this stuff and for all the research and lab work that I have done to supply unique content on this blog.

Thursday, April 20, 2017

EBS 12.2 -- Blank page problem viewing concurrent out and logs, Hostname with Capital Letters, FNDWRR & node_info.txt

Recently encountered a problem on a new EBS implementation.
The problem appeared when the customer wanted to see the outputs and logs of the concurrent programs.
When they tried to open a concurrent log in the browser, a blank page was displayed.

In order to solve this issue, we did almost all the diagnostics, such as;
  • Enabling sqlnet trace in the apps tier listener.ora
  • Running "FS Diagnostic Test Program" with the parameter MODE=FULL 
The only error that we saw after running FS Diag Test Program was the following;

-- FNDFS did not return an error code 
-- FNDWRR did not create a debug logfile. 
-- FNDFS did not create a debug logfile. 
-- Displaying first 25 lines returned by FNDWRR.exe: 
-- BEGIN FILE ------------------------------------- 
-- END FILE --------------------------------------- 
-- ERROR: Unable to transfer file successfully! 
  • Setting the profile "Viewer: Text" to have no value (removing 'browser') and retesting viewing a log file. This was to bypass the FNDWRR process and pass the FNDFS output directly to the built-in viewer.
This almost worked. We could display text output, but we could not display any other output formats, such as pdf or xml.
  • We did the tests documented in "How to Troubleshoot FNDFS and FNDWRR (Problems Viewing Concurrent Request Output) (Doc ID 847844.1)". FNDWRR could display output from the OS successfully.
  • Checked the Apache logs for CGI entries (FNDWRR is a CGI program): access_log.*, error_log.*, etc. We could only see HTTP 200 there: GET /OA_CGI/FNDWRR.exe?temp_id=796020226 HTTP/1.1" 200 -
  • Checked database alert log (just in case)
  • Generated a Fiddler trace and reviewed it. No errors.
  • Applied "Patch 17075726:R12.FND.C - SUPPORT CONCURRENT MANAGER ON A SERVER CONFIGURED TO USE VIRTUAL HOSTING", which didn't fix it.
  • We disabled all the anti-virus and proxy checks on the client side, but the issue remained.
  • We created an SR (Oracle Support Service Request), but it didn't help much.
  • Did the browser configurations below, but they didn't help at all.

  • Set up Oracle E-Business Suite to run through the 'Trusted Sites' zone with a 'Medium' Security Setting.
    Tools -> Pop-up Blocker -> Pop-up Blocker Settings -> Allowed Sites
    Tools -> Internet Options -> Privacy -> Pop-up Blocker: Settings -> Allowed sites
    Selected (Checked) the following values:
    Tools -> Internet Options -> Security -> <zone> (e.g. Trusted Sites) -> Enable Protected Mode
    Tools -> Internet Options -> Advanced -> Enable 64-bit processes for Enhanced Protection Mode

THE CAUSE AND THE SOLUTION:

Well.. After all this diagnostic work, guess what we realized?

"The server name was written in capital letters."

Yes, the issue was caused by that..

The hostnames of the EBS servers should not be written in capital letters.

This is not supported, and this issue is one of the consequences of that lack of support. (Note that this issue is not documented; that's why I am writing it :) )

Autoconfig and postclone are also affected by this problem, but there are workarounds for them.

However, you can't get rid of the FNDWRR problems (viewing concurrent logs and outputs) if you have a hostname with capital letters.

As for the solution, we updated $EBS_APPS_DEPLOYMENT_DIR/oacore/APP-INF/node_info.txt
and modified the lines which included the hostname.

We made all the hostnames lower-case, and the issue was resolved without even restarting anything. (An illustrative fix is sketched below.)
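
The fix itself is a one-liner. A sketch of what we did (the hostname ERMANSRV is a placeholder here; replace it with your actual upper-case hostname):

cd $EBS_APPS_DEPLOYMENT_DIR/oacore/APP-INF
cp node_info.txt node_info.txt.bak     # back up before editing
sed -i 's/ERMANSRV/ermansrv/g' node_info.txt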

Note:

txkFNDWRR.pl reads node_info.txt and sets the environment variables accordingly. Using the $OA_HTML environment variable, txkFNDWRR.pl executes FNDWRR, which in turn makes the concurrent log or output displayed in the browser.

I wrote this blog post to identify the diagnostic actions that need to be done while troubleshooting such a problem, and to show you the troubles we may find ourselves in if we have an unsupported configuration.

Tuesday, April 18, 2017

GRC -- GRC & EBS implementation //errors and solutions

I recently installed GRC 8.6.6, PCG 7.3.3, and PEA 8.6.6, using the installation guides delivered with the products, to implement GRC with an EBS 12.1.3 instance.

The main document to follow for this type of installation was Enterprise Governance Risk and Compliance (EGRC) Product Information (Doc ID 1084596.1), and basically what we did was the following;
  • Installing an 11g Oracle Database for GRC
  • Installing a Weblogic 12C for GRC
  • Installing ADR 12C on Weblogic
  • Installing the database schemas for GRC using the RCU utility
  • Creating a Weblogic Domain for GRC
  • Deploying GRC application using Weblogic Console
  • Upgrading GRC
  • Installing PCG on EBS
  • Installing critical PCG Patches
  • Installing PEA on EBS
  • Fixing any errors reported by the functional team
This is not something that is frequently done in our customer environments, so maybe that's why we encountered some errors and spent some time finding their solutions, which I find useful to share with you.

We basically encountered 3 errors. Two of them were caused by the complexity of the documentation, and one of them was directly related to corrupted data.

Let's see what those errors and their solutions are;

ERROR 1 - unable to synchronize access in grc 8.6.6

The connection test that we did for the datasource we created in GRC was successful, but the synchronize access job (in the GRC application) failed with the following error (reported in grc.log):

ERROR [ExecutorThread-11] DataSourceService:1343 Error while setting ETL completed
java.lang.RuntimeException: Failed to serialize the object:
Descriptor Exceptions:
---------------------------------------------------------
Exception [EclipseLink-59] (Eclipse Persistence Services - 2.3.1.v20111018-r10243): org.eclipse.persistence.exceptions.DescriptorException
Exception Description: The instance variable [thingSavedStates] is not defined in the domain class [oracle.apps.grc.domain.datasource.SourceSyncState$SourceSyncStateBuilder], or it is not accessible.
Internal Exception: java.lang.NoSuchFieldException: thingSavedStates
Mapping: org.eclipse.persistence.oxm.mappings.XMLCompositeCollectionMapping[thingSavedStates]
Descriptor: XMLDescriptor(oracle.apps.grc.domain.datasource.SourceSyncState$SourceSyncStateBuilder --> [])
Runtime Exceptions:
---------------------------------------------------------
at oracle.apps.grc.domain.datasource.SourceSyncState.toXML(SourceSyncState.java:135)
at oracle.apps.grc.dataservices.dao.impl.spring.datasource.DataSourceDaoSpr.updateSyncState(DataSourceDaoSpr.java:1249)
at oracle.apps.grc.dataservices.dao.impl.spring.datasource.DataSourceDaoSpr.setEtlCompleted(DataSourceDaoSpr.java:1946) 


Solution:

USER_MEM_ARGS should be updated correctly in setDomainEnv.sh,
as documented on page 2-14 of:
http://docs.oracle.com/cd/E51797_01/doc.8651/e52268.pdf

Action plan:

1. Stop the application server.
2. Back up the setDomainEnv.sh file.
3. Update the USER_MEM_ARGS parameter in the setDomainEnv.sh file. The modification should be done between the comment and the if statement.

# IF USER_MEM_ARGS the environment variable is set, use it to override ALL MEM_ARGS values 
<<<<<Changes should come here>>>>>>>
case "${SERVER_NAME}" in "AdminServer")
USER_MEM_ARGS="-Xms2048M –Xmx2048M" ;;
...
....
USER_MEM_ARGS="${USER_MEM_ARGS} -XX:PermSize=256m -XX:MaxPermSize=512m -XX:ReservedCodeCacheSize=128M -Djava.awt.headless=true -Djbo.ampool.maxpoolsize=600000 -Dfile.encoding=UTF-8 -Djavax.xml.bind.context.factory=com.sun.xml.internal.bind.v2.ContextFactory" 
 <<<<<Changes should come here>>>>>>>
if [ "${USER_MEM_ARGS}" != "" ] ; then

4. Ensure the eclipselink-2.3.1.jar file exists in the <MW_HOME>/grc866/grc/WEB-INF/lib/ location.

5. Start the application server

6. Retest the issue.

ERROR 2 - unable to synchronize access in grc 8.6.6

The connection test that we did for the datasource we created in GRC was successful, but the synchronize access job (in the GRC application) failed with the following error (reported in grc.log):

DEBUG [EtlExtractor-1254779240] GrcLogPrintStream:73 STDOUT (oracle.core.ojdl.logging.ConsoleHandler:118) <Apr 6, 2017 11:10:11 AM EEST> <Error> <Default> <ODI-1217> <Session TCG_SCEN_Users_21 (85154) fails with return code 7000.
ODI-1226: Step TCG_INTR_Users_21 fails after 1 attempt(s).
Caused by: ODI-1240: Flow TCG_INTR_Users_21 fails while performing a Integration operation. This flow loads target table Users.
Caused by: org.apache.bsf.BSFException: exception from Jython:
Traceback (most recent call last):
File "<string>", line 50, in <module>


Solution:

PRE_CLASSPATH was not properly set in setDomainEnv.sh.

As documented on page 2-2 of http://docs.oracle.com/cd/E51797_01/doc.8651/e52268.pdf, the PRE_CLASSPATH should be updated properly.

Action Plan:

1. Locate the following lines in setDomainEnv.sh: 

if [ "${PRE_CLASSPATH}" != "" ] ; then CLASSPATH="${PRE_CLASSPATH}${CLASSPATHSEP}${CLASSPATH}" 
export CLASSPATH 
fi 

2. Add the following before the above lines: 

PRE_CLASSPATH="/grc865/grc/WEB-INF/lib/jython- 2.5.1.jar:${PRE_CLASSPATH}" 
export PRE_CLASSPATH

Note: Replace with the actual path to your middleware home.

ERROR 3 - unable to synchronize access in grc 8.6.6 

The connection test that we did for the datasource we created in GRC was successful, but the synchronize access job (in the GRC application) failed with the following error (reported in grc.log):

DEBUG [EtlExtractor-976389891] DataSourceDaoSpr:1234 updateSyncState(oracle.apps.grc.domain.datasource.SourceSyncState@22489656) 
ERROR [EtlExtractor-976389891] AvailableResource:173 myBlocks null 
ERROR [EtlExtractor-976389891] LocalEtlTcgWriter:412 A problem occurred in LocalEtlTcgWriter.writeData: 
java.lang.NullPointerException: myBlocks null 
at oracle.apps.odin.reasonerio.file.page.AvailableResource.askBlock(AvailableResource.java:174) 
at oracle.apps.odin.reasonerio.file.page.PageManager.createNewPage(PageManager.java:92) 
at oracle.apps.odin.reasonerio.file.page.PagingGrccChannelManager.expandChannel(PagingGrccChannelManager.java:498) 
at oracle.apps.odin.reasonerio.file.page.PageChannelController.<init>(PageChannelController.java:79) 
at oracle.apps.odin.reasonerio.file.page.PagingGrccChannelManager.createChannel(PagingGrccChannelManager.java:456) 
at oracle.apps.grc.reasonerio.graph.blockbytype.writer.BBTGraphWriter.getAttributeWriter(BBTGraphWriter.java:148) 
at oracle.apps.grc.appservices.connector.LocalEtlTcgWriter.writeNode(LocalEtlTcgWriter.java:669) 
at oracle.apps.grc.appservices.connector.LocalEtlTcgWriter.persistResults(LocalEtlTcgWriter.java:395) 
at oracle.apps.grc.appservices.connector.LocalEtlExtractor.persistResults(LocalEtlExtractor.java:540) 
at oracle.apps.grc.appservices.connector.LocalEtlExtractor.retrievePersistResultsLocal(LocalEtlExtractor.java:349) 
at oracle.apps.grc.appservices.connector.LocalEtlExtractor.retrievePersistResults(LocalEtlExtractor.java:239) 
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) 
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) 
at java.lang.reflect.Method.invoke(Method.java:606) 
at org.python.core.PyReflectedFunction.__call__(PyReflectedFunction.java:175) 
at org.python.core.PyObject.__call__(PyObject.java:355) 
at org.python.core.PyMethod.__call__(PyMethod.java:215) 
at org.python.core.PyMethod.instancemethod___call__(PyMethod.java:221) 
at org.python.core.PyMethod.__call__(PyMethod.java:206) 
at org.python.core.PyObject.__call__(PyObject.java:381) 
at org.python.core.PyObject.__call__(PyObject.java:385) 
at org.python.pycode._pyx0.f$0(<string>:51) 
at org.python.pycode._pyx0.call_function(<string>) 
at org.python.core.PyTableCode.call(PyTableCode.java:165) 
at org.python.core.PyCode.call(PyCode.java:18) 
at org.python.core.Py.runCode(Py.java:1204) 
at org.python.core.Py.exec(Py.java:1248) 
at org.python.util.PythonInterpreter.exec(PythonInterpreter.java:172) 
at org.apache.bsf.engines.jython.JythonEngine.exec(JythonEngine.java:144) 
at com.sunopsis.dwg.codeinterpretor.SnpScriptingInterpretor.execInBSFEngine(SnpScriptingInterpretor.java:322) 
at com.sunopsis.dwg.codeinterpretor.SnpScriptingInterpretor.exec(SnpScriptingInterpretor.java:170) 
at com.sunopsis.dwg.dbobj.SnpSessTaskSql.scripting(SnpSessTaskSql.java:2472) 
at oracle.odi.runtime.agent.execution.cmd.ScriptingExecutor.execute(ScriptingExecutor.java:47) 
at oracle.odi.runtime.agent.execution.cmd.ScriptingExecutor.execute(ScriptingExecutor.java:1) 
at oracle.odi.runtime.agent.execution.TaskExecutionHandler.handleTask(TaskExecutionHandler.java:50) 
at com.sunopsis.dwg.dbobj.SnpSessTaskSql.processTask(SnpSessTaskSql.java:2913) 
at com.sunopsis.dwg.dbobj.SnpSessTaskSql.treatTask(SnpSessTaskSql.java:2625) 
at com.sunopsis.dwg.dbobj.SnpSessStep.treatAttachedTasks(SnpSessStep.java:577) 
at com.sunopsis.dwg.dbobj.SnpSessStep.treatSessStep(SnpSessStep.java:468) 
at com.sunopsis.dwg.dbobj.SnpSession.treatSession(SnpSession.java:2128) 
at com.sunopsis.dwg.dbobj.SnpSession.treatSession(SnpSession.java:1930)

Solution:

While working to solve the issues (ERROR 1 and ERROR 2), we tried to run the synchronize access job again and again. After solving those issues, we realized that those failed tries had corrupted the ETL data.

Action Plan:

1. Take a backup of the GRC database, the WLS filesystem, the ETL repository filesystem, and the GRC Reports filesystem.
Note: your ETL repository and GRC Reports locations can be found in the GRC application
(Navigator > Setup and Administration > Manage Application Configuration).

Example ETL repository: /Oracle/EGRCC/grcc_etl
Example report location: /Oracle/EGRCC/grcc_rep

2. Stop the GRC application server.

3. Kill any pending java processes (if they are still active).

4. Clear the cache (<domain_name> is your GRC domain):

cd $MW_HOME/user_projects/domains/<domain_name>/bin
rm -rf ../servers/AdminServer/tmp/*
rm -rf ../servers/AdminServer/logs/*
rm -rf ../servers/AdminServer/cache/*

5. Stop and start the GRC database.

6. Drop the SNP_* tables (the ODI repository).
Drop all the tables whose names start with SNP_ (a helper query for generating the drop statements is shown just below).
Check that the following query returns 0 after dropping the tables:
select count(*) from all_tables where table_name like 'SNP_%';
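
Rather than typing the drop statements one by one, you can generate them. A sketch (run as the ODI repository schema owner, and review the generated statements before executing them):

SELECT 'DROP TABLE ' || table_name || ' CASCADE CONSTRAINTS;'
  FROM user_tables
 WHERE table_name LIKE 'SNP%';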

7. Delete "temp.repository" and "raw" directories and their sub directories under home/grc_etl(Example ETL repository: /Oracle/EGRCC/grcc_etl) 
--"DO NOT touch "persistence"
--application should be shut down for deleting these directories.

9. Start the GRC application server 

10. Run Access and Transaction synchronization as the first step after the GRC application is started.