Erman Arslan's Oracle Blog: August 2016

Friday, August 26, 2016

RDBMS -- SYSAUX tablespace full, huge sized "WRI$_OPTSTAT_HISTHEAD_HISTORY" , DBMS_STATS.PURGE_ALL demo and shrink tablespace

You may encounter a tablespace problem in SYSAUX tablespace.
Actually, altough you don't have a disk size problem, you may have an unnecessarly big sized SYSAUX tablespace and you may not pay attention to.
In this blog post, I will explain one of the most likely causes of having a big sized SYSAUX tablespace, and also try to explain how to get rid of them, by purging the related content from the Oracle Database.

The things is that , the occupied space in SYSAUX tablespace is mostly caused by optimizer history statistics tables.

Here is an example output produced, when we query and sort the objects residing in the SYSAUX tablespace by their sizes;

TABLESPACE_NAME SEGMENT_NAME MB

SYSAUX WRI$_OPTSTAT_HISTHEAD_HISTORY 47,264
SYSAUX I_WRI$_OPTSTAT_HH_OBJ_ICOL_ST 42,150
SYSAUX I_WRI$_OPTSTAT_HH_ST 22,497

As seen, in the first place there is a object statistics history related table and the follower are the related indexes.

The sizes are huge (considering this is an Oracle EBS database sized almost 1 TB and have only few customizations)

Also, altough this size of statistics history can be created because of frequent statistics collection, it is not acceptable.

In fact, MMON process should purge these statistics accordint to the default retention, which is 31 days but it seem it can not. MMON performs these purge activities automatically, but it has a limit of 5 minutes to perform these activities. So if the purging takes more than 5 mins, then the activities are aborted and as a result the stats are not purged.

There are several bugs records about this situation already:

Document 10279045.8 Slow Statistics purging (SYSAUX grows)
Document 8553944.8 SYSAUX tablespace grows
Document 14373728.8 Bug 14373728 - Old statistics not purged from SYSAUX tablespace
Document 11869207.8 Improvements to archived statistics purging / SYSAUX tablespace grows

The similar issue is adressed in Oracle Support document :"SYSAUX Grows Because Optimizer Stats History is Not Purged (Doc ID 1055547.1)"

There are 3 things to do for getting rid of this size, actually to purge the optimizer stats history records.

1) Applying patch 14373728, which superseded 11869207 'i supersede ediyor. The patch 11869207 was faulty anyways. The new patch:14373728, which is an OPTIMIZER overlay patch, can be applied.. This patch is for MMON, the automatic purge.

2) If we can't apply the patch 14373728, we can go ahead and purge the optimizer statistics manually;

begin
for i in reverse 10..100
loop
dbms_stats.purge_stats(sysdate-i);
end loop;
end;
/

This takes time? Then let's see our 3rd solution alternative;

If the patch 1027045 (the related option comes with this patch) is already applied , we can go ahead and use DBMS_STATS.PURGE_ALL to purge all our historical objects statistics. DBMS_STATS.PURGE_ALL just truncates the related table, so all purge activity takes only 5-10 seconds to finish.

The tradeoff of this type of purging can be the inability to make forensic sql performance analysis and corrections accordingly, but in most of the customer environments these operations are done quire rare right? :)

Following is a demo of running DBMS_STATS with PURGE_ALL argument;

Note that, we also change the retention period from the default(31 days) to 15 days.

Initially, the tables are fully loaded. The SYSAUX tablespace is 41 gigs(which is used almost fully) and the WRI$_OPTSTAT_HISTHEAD_HISTORY s almost 14 gigs as seen below;

SQL> set linesize 1000;

SQL> select dbms_stats.get_stats_history_retention from dual;

GET_STATS_HISTORY_RETENTION

---------------------------

SQL> select dbms_stats.get_stats_history_availability from dual;

GET_STATS_HISTORY_AVAILABILITY

---------------------------------------------------------------------------

23-JUL-16 12.23.12.224357000 PM +03:00

SQL> select * from (select owner, segment_name, bytes/(1024*1024) from dba_segments where owner='SYS' order by bytes desc) where rownum <10;

OWNER SEGMENT_NAME BYTES/(1024*1024)

------------------------------ --------------------------------------------------------------------------------- -----------------

SYS WRI$_OPTSTAT_HISTHEAD_HISTORY 14863

SYS I_WRI$_OPTSTAT_HH_OBJ_ICOL_ST 11179

SYS I_WRI$_OPTSTAT_HH_ST 7074

SYS SOURCE$ 5481.64063

SYS I_SOURCE1 4280.03125

SYS IDL_UB1$ 2931.52344

SYS I_WRI$_OPTSTAT_H_OBJ#_ICOL#_ST 2224

SYS WRI$_OPTSTAT_HISTGRM_HISTORY 1473

SYS IDL_UB2$ 1087.52344

SQL> select nvl(df.tablespace_name, nvl(fs.tablespace_name,'UNKOWN')) name,

2 to_char(kbytes_alloc,'999,999,990') kbytes,

3 to_char(kbytes_alloc-nvl(kbytes_free,0),'999,999,990') used,

4 to_char(nvl(kbytes_free,0),'999,999,990') free,

5 to_char(((kbytes_alloc-nvl(kbytes_free,0))/kbytes_alloc)*100,'990.9999') pct_used,

6 to_char(nvl(largest,0),'999,999,990') largest

7 from ( select sum(bytes)/1024 Kbytes_free,

8 max(bytes)/1024 largest,

9 tablespace_name

10 from dba_free_space

11 where tablespace_name='SYSAUX'

12 group by tablespace_name ) df,

13 ( select sum(bytes)/1024 Kbytes_alloc,

14 tablespace_name

15 from dba_data_files

where tablespace_name='SYSAUX'

16 17 group by tablespace_name ) fs;

NAME KBYTES USED FREE PCT_USED LARGEST

------------------------------ ------------ ------------ ------------ --------- ------------

SYSAUX 41,943,040 41,878,016 65,024 99.8450 960

NOW WE PURGE!!!

SQL> exec DBMS_STATS.PURGE_STATS(DBMS_STATS.PURGE_ALL);

PL/SQL procedure successfully completed.

SQL> select dbms_stats.get_stats_history_retention from dual;

GET_STATS_HISTORY_RETENTION

---------------------------

SQL> select dbms_stats.get_stats_history_availability from dual;

GET_STATS_HISTORY_AVAILABILITY

---------------------------------------------------------------------------

24-AUG-16 11.16.26.195234000 AM +03:00

SQL> select * from (select owner, segment_name, bytes/(1024*1024) from dba_segments where owner='SYS' order by bytes desc) where rownum <10;

OWNER SEGMENT_NAME BYTES/(1024*1024)

------------------------------ --------------------------------------------------------------------------------- -----------------

SYS SOURCE$ 5481.64063

SYS I_SOURCE1 4280.03125

SYS IDL_UB1$ 2931.52344

SYS IDL_UB2$ 1087.52344

SYS ARGUMENT$ 1011.33594

SYS _SYSSMU9_2885769297$ 849.125

SYS _SYSSMU2_735814084$ 839.125

SYS _SYSSMU4_179149818$ 839.125

SYS _SYSSMU8_751394697$ 836.125

9 rows selected.

SQL> select nvl(df.tablespace_name, nvl(fs.tablespace_name,'UNKOWN')) name,

2 to_char(kbytes_alloc,'999,999,990') kbytes,

3 to_char(kbytes_alloc-nvl(kbytes_free,0),'999,999,990') used,

4 to_char(nvl(kbytes_free,0),'999,999,990') free,

5 to_char(((kbytes_alloc-nvl(kbytes_free,0))/kbytes_alloc)*100,'990.9999') pct_used,

6 to_char(nvl(largest,0),'999,999,990') largest

7 from ( select sum(bytes)/1024 Kbytes_free,

8 max(bytes)/1024 largest,

9 tablespace_name

from dba_free_space

10 11 where tablespace_name='SYSAUX'

12 group by tablespace_name ) df,

13 ( select sum(bytes)/1024 Kbytes_alloc,

14 tablespace_name

15 from dba_data_files

16 where tablespace_name='SYSAUX'

17 group by tablespace_name ) fs;

NAME KBYTES USED FREE PCT_USED LARGEST

------------------------------ ------------ ------------ ------------ --------- ------------

SYSAUX 41,943,040 1,944,960 39,998,080 4.6371 1,434,624

"After the purge we have 39G free ..."

Now, we change the retention-->

SQL> exec dbms_stats.alter_stats_history_retention(15);

PL/SQL procedure successfully completed.

SQL> select dbms_stats.get_stats_history_retention from dual;

GET_STATS_HISTORY_RETENTION

---------------------------

"After the purge we have 3.5 MB for optimizer statistics history related objects..."

SQL> select occupant_desc, space_usage_kbytes from v$sysaux_occupants where OCCUPANT_DESC like '%Statistics%';

OCCUPANT_DESC SPACE_USAGE_KBYTES

---------------------------------------------------------------- ------------------

Server Manageability - Optimizer Statistics History 3584

So, with this demo, we have seen how to empty the SYSAUX tablespace by purging the optimizer statistics history table, so what 's next; I say you what... Now we need to resize our SYSAUX tablespace by giving back the newly emerged free space.

In order to do this; we need to shrink or move the tables and makes sure the continous empty space is configured to be in the end of the datafiles of the SYSAUX tablespace..

So in order to accomplish this, we can follow use the Tanel Poder ' s script. This script will check the current situation of the datafiles and generate the recuired alter operation statements for resizing the database ..

Ref: http://blog.tanelpoder.com/files/scripts/trim_database.sql

with query as ( select /*+ NO_MERGE MATERIALIZE */ file_id, tablespace_name, max(block_id + blocks) highblock from dba_extents group by file_id, tablespace_name ) select 'alter database datafile '|| q.file_id || ' resize ' || ceil ((q.highblock * t.block_size + t.block_size)/1024) || 'K;' cmd from query q, dba_tablespaces t where q.tablespace_name = t.tablespace_name;

But if the free space is not in the end of the datafiles, this script is useless . So in that case, we need to reorganize the objects residing the SYSAUX tablespace to make the free space to be located in the end of the SYSAUX datafiles. In order to do this, we can follow the article published Jonathan Levis -> Ref: https://jonathanlewis.wordpress.com/2010/02/06/shrink-tablespace

That is it for this topic. I hope it will help you.

Friday, August 19, 2016

RAC/Exadata -- Scan listener registration problem, EBS 11i , autoconfig TNS-12514

You may encounter a scan listener registration problem after enabling autoconfig on your EBS database tier. Altough , everything including remote_listener, local_listener parameters as well as network related oracle conf file such as tnsnames.ora,sqlnet.ora and listener.ora may seem perfect, the dynamic registration from EBS database to the scan listener(pointed by the remote_listener parameter) can be problematic.
This problem is encountered in an Exadata X6 environment, on which we hosted an EBS 11i environment with a 12C Oracle database.

The problem is directly related with autoconfig, as autoconfig in some cases can not handle RAC specific configuration perfectly well.
In this case, autoconfig creates a tns entry named as the scan_name:port and actually creates it wrongly by putting the vips as for the host entry. Like below;

SCAN_NAME:SCAN_PORT =
(ADDRESS_LIST =
(ADDRESS = (PROTOCOL = TCP)(HOST = exa1vip)(PORT = 1521))
(ADDRESS = (PROTOCOL = TCP)(HOST = exa2vip)(PORT = 1521))

The result is ORA-12514 / TNS-12514 when trying to connect to scan name and port .

The solution is; removing the scan_name:scan_port tns entry from the tnsnames.ora , as there should not be any tns entry for registering the scan listener, no need.

After removing the tns entry, it is better to refill the remote_listener and make the pmon to register the database to all available listeners using alter system register; as shown in the following example;

sqlplus / as sysdba
alter system set remote_listener='' scope=both sid='*';
alter system set remote_listener='exa-rac:1521' scope=both sid='*';
alter system register;

Also, it is important to keep in mind that this solution should be reimplemented everytime the autoconfig is run.

An alternative maybe, modifying the autoconfig templates or codes to fix this issue, but it won't be supported and actually I dont have time for this , these days :)

Tuesday, August 16, 2016

Exadata - Cloning Oracle Home and installing bundles patch & psu in to the database afterwards, datapatch

If you clone a database oracle home( to create a new home -- only home...) in Exadata, you need to know that, your cloned home may be patched with the latest bundle patches & PSUs applied to your source oracle home, but any database that you will create from this newly cloned home will not have the bundle patches inside.

So, the bundle patches that are applied to the source home can not be cloned to the cloned home directly, as the bundle also writes to the database.

Well, what we should after such an oracle home cloning operation is, to install the Bundle in to our newly created database using the datapatch utility residing in our newly created oracle home (in opatch directory)

This utility will do the db work of bundle and make our databases created using our new cloned oracle home to be aligned with the RDBMS binaries in terms of Bundle levels.

Here , I m making a demo of applying the datapatch and making my newly created database on my new cloned oracle home to have Bundles inside.
Note that: datapatch requires the database to be in upgrade mode.

SQL> startup nomount;
ORACLE instance started.

Total System Global Area 2.5770E+10 bytes
Fixed Size 6870952 bytes
Variable Size 5894412376 bytes
Database Buffers 1.9730E+10 bytes
Redo Buffers 138514432 bytes
SQL> alter system set cluster_database=false scope=spfile;

System altered.

SQL> shu immediaTE;
ORA-01507: database not mounted
ORACLE instance shut down.

[oracle@exadb01 OPatch]$ srvctl stop database -d TEST70
[oracle@exadb01 OPatch]$ sqlplus "/as sysdba"

SQL*Plus: Release 12.1.0.2.0 Production on Mon Aug 15 16:02:31 2016
Copyright (c) 1982, 2014, Oracle. All rights reserved.
Connected to an idle instance.

SQL> startup upgrade;
ORACLE instance started.

Total System Global Area 2.5770E+10 bytes
Fixed Size 6870952 bytes
Variable Size 5894412376 bytes
Database Buffers 1.9730E+10 bytes
Redo Buffers 138514432 bytes
Database mounted.
Database opened.
SQL> exit

[oracle@exadb01 OPatch]$ ./datapatch -verbose
SQL Patching tool version 12.1.0.2.0 on Mon Aug 15 16:02:53 2016
Copyright (c) 2015, Oracle. All rights reserved.

Log file for this invocation: /u01/app/oracle/cfgtoollogs/sqlpatch/sqlpatch_166016_2016_08_15_16_02_53/sqlpatch_invocation.log

Connecting to database...OK
Bootstrapping registry and package to current versions...done
Determining current state...done

Current state of SQL patches:
Patch 22674709 (Database PSU 12.1.0.2.160419, Oracle JavaVM Component (Apr2016)):
Installed in the binary registry only
Bundle series DBBP:
ID 160419 in the binary registry and not installed in the SQL registry

Adding patches to installation queue and performing prereq checks...
Installation queue:
Nothing to roll back
The following patches will be applied:
22674709 (Database PSU 12.1.0.2.160419, Oracle JavaVM Component (Apr2016))
22806133 (DATABASE BUNDLE PATCH: 12.1.0.2.160419 (22806133))

Installing patches...
Patch installation complete. Total patches installed: 2

Validating logfiles...
Patch 22674709 apply: SUCCESS
logfile: /u01/app/oracle/cfgtoollogs/sqlpatch/22674709/20057886/22674709_apply_TEST70_2016Aug15_16_04_01.log (no errors)
Patch 22806133 apply: SUCCESS
logfile: /u01/app/oracle/cfgtoollogs/sqlpatch/22806133/19983161/22806133_apply_TEST70_2016Aug15_16_04_02.log (no errors)
SQL Patching tool complete on Mon Aug 15 16:04:30 2016

that s it; now I can query my database and see the Bundles 's applied.

Here is a query for you:

For 12C:

select
TO_CHAR(to_timestamp(substr(action_time,1,30),'DD-MON-YY HH.MI.SS.ff AM'),'DD.MM.YYYY HH24:MI:SS') AS ACTION_TIME,
--substr(patch_id,1,10) as patch_id,
--substr(patch_uid,1,10) as patch_uid,
substr(action,1,10) as action,
substr(status,1,8) as status,
substr(version,1,8) as version,
substr(BUNDLE_SERIES,1,6)||substr(BUNDLE_ID,1,6) as bundle,
description,
substr(logfile, instr(logfile,'/',-1)+1) as logfile
from DBA_REGISTRY_SQLPATCH;

Decreasing Exadata core count , X6-2 1/8

Decreasing the core count of Exadata compute nodes can be your issue one day, as Exadata does not let you to do so easily.
Decreasing the core count is not an operation that is frequently done, but still it may necessary some day.
That is, normally when you purchase Exadata and when you decide to have a minimal capacity on demand configuration, these cpu count decreases are done in the deployment phase, but I have seen some customer environment, on which we need to decrease the cpu counts to be aligned with the license terms , even after the deployment.

So the utility we use dbmcli, but we use it with an undocumented force parameter.
Note that, we execute the dbmcli and the commands in both of nodes and reboot the nodes afterwards.
Without using the force argument, dbmcli our the resoucecountrol do not let us to decrease the cpu cores to the minimum ( 8 cores)
Note, we also disable iaasMode just in case;

Here;

[root@ermanexa01~]# dbmcli
DBMCLI: Release - Production on Mon Aug 15 15:56:16 EEST 2016
Copyright (c) 2007, 2014, Oracle. All rights reserved.
DBMCLI> ALTER DBSERVER pendingCoreCount = 8 force
DBM-10022: At least 26 physical cores need to be active in order to support IaaS.
DBMCLI> ALTER DBSERVER iaasMode = "off"
DBServer exadb01 successfully altered
DBMCLI> ALTER DBSERVER pendingCoreCount = 8 force
DBServer exadb01 successfully altered. Please reboot the system to make the new pendingCoreCount effective.

Exadata X6, fixing the martian source errors, static routing, asymetric network, route-eth3, reverse path filtering and more

This blog post will be about martian source errors which may be recorded in the syslog(/var/log/messages) of Exadata compute nodes. I want to give a place to this subject, as I have faced this problem in one of my customer environments.

The platform was an Exadata X6 and it was newly deployed. I was there to migrate EBS database to this new platform and seen the following errors in /var/log/messages when I was doing my very first health check.

martian source someip from someip, on dev some_interface

ll header: ff:ff:ff:ff:ff:ff:00:18:f8:0e:..........

When I interpreted this and spoked with the clients, realized that some clients could not ping or connect via ssh to Exadata compute nodes.

The martian erros was reported because of a network configuration problem. That is, for some reason, Linux Operating System running on Exadata compute nodes did not like the network interface from where some packets were coming from.

If I need to be more specific , Linux OS did not like the packets that are coming to bondeth0(client connection interface) ,as it thought that those packets had to come from the eth3 interface(the backup interface).

At this point; I want to remind you something important for the Exadata network:

"If the backup network is on a tagged VLAN network, the client network must also be on a separate tagged VLAN network."

So this phrase clearly states that, if your backup network is on a vlan (suppose it is 10.56.23.0) then your client network must be on another vlan . So the client network should not be on the same VLAN as the backup network. So if the backup network is 10.56.23.0, there should not be any clients on 10.56.23.0. (client term can be thought as anything that connects to the Exadata via client interface or through the management interface.) So, the clients that are there for managing the Exadata machine via the management network should be on another VLAN too.

Let me explain this with an example scenario;

Suppose;

the backup interface (eth3 for Exadata X6) is configured to run on 10.56.23.35

the management interface is configured to run on 10.10.10.100

the bonded client network is configured to run on the ip addresses like 192.168.0.80 192.168.0.81 and so on.

suppose there is a client with ip address 10.56.23.81. (same network with the backup interface 10.56.23.0)

So now imagine the client with ip address 10.56.23.81 wants to ping the Exadata management interface. (ping 10.10.10.100)

When he pings; the OS sees the packet and sees that packet is coming from the same network as the eth3(backup interface) 10.56.23.0, however what OS also sees that the packet is coming to the management interface which is on 10.10.10.0 network.

So at that time, OS thinks that this packet should have come from the backup interface(eth3), as its network is the same as the sender of the packet. Well, at this point; the reverse path filter of OS comes into play and filters that packet and prevent the network connection between that client and the management interface. so ping doesn't work. (any connection won't work actually)

This is a problem right? Yes , it is . This is a problem of the deployment actually, but as we may already deployed the Exadata machine, let's see what we can do to fix/or workaround it.

Well, this behaviour can be changed by a workaround like disabling the packet filter in Exadata Compute node Operating Systems by modifying the /etc/sysctl.conf file and running sysctl -p to make it effect. (it is on by default, so we can make it off)

net.ipv4.conf.all.rp_filter = 0
net.ipv4.conf.eth0.rp_filter = 0
and so on..

After disabling the reverse path filter, now when we ping the management interface (ping 10.10.10.100), we see that it works.

This move may not be supported by Oracle, but it solves the problem.
So this is an asymetric routing problem, and turning of rp filter does the job.

After disabling the filter , the client can ping. However it still can not telnet to Database listener port using the client network.

That is, the client with ip 10.56.23.81 is okay with the ping command after disabling the rp filter but now it wants to test the database connection.

Now , it uses telnet to reach the listener port running on the client interface of one of the Exadata compute nodes (telnet 192.168.0.80 1523), however, this test fails.
The client cannot telnet to the client network using port 1523. (listener is up and running, no firewall is running).

So after disabling the rp filter, the problematic client can reach the management network via ping, but can not reach the client network via port 1523 using telnet.

Why?

ping uses icmp (level 3). On the other hand, telnet is upper level. When the client sends ping; the client first sends a ICMP Echo Request Packet to host and then, the host will reply with an ICMP Echo Reply to client.

However, as for the telnet, the tcp packets are dropped.
That is, as there is an asymetric routing happening out there. The network components visited all the way down from client to server and server to client such as routers/firewalls(any network components) block the SYN-ACK (the first server response in 3 way connection establishment of TCP) packets. Remember, we disabled the reverse path filtering but , still we have a wrong route defined in our exadata compute nodes for the problematic client's network.
That is , client still tries to connect to the management network, but the route for the server replies are still configured to be through the backup network.

Info: Asymetric routing can be a problem for TCP which has strict state tracking but often does not affect "stateless" protocols such as ICMP or UDP.

In Exadata, we see some static routing are in place. When we look at the /etc/sysconfig/network-scripts directory of a compute node we see the rule and route files.

An example route for eth3 (backup interface):

10.56.23.0/24 dev eth3 table 223
default via 10.56.23.254 dev eth3 table 223

An example rule for eth3 (backup interface):

from 10.56.23.0/24 table 223
to 10.56.23.0/24 table 223

So these route and rule files simply say: use eth3 if you want to go to 10.56.23.0 (remember our problematic client was in the same network as the backup interface and its ip address was 10.56.23.81)

So when these files are in place, the routing table of the compute nodes have a static routing for eth3, so no matter what we do , we can't disable the path that packets will follow for going out from the server to the 10.56.23.* network.

But there is one thing that we can do ;

That is; we can simply change the backup network interface IP and take it to another network where there is no Exadata clients exists and lastly remove those static route and rule files from the /etc/sysconfig/network-scripts directory!

So as for disabling the static route(for backup interface) that comes with the deployment of Exadata, we simply remove the route-eth3 and rule-eth3 files and restart network services.

After this change, our compute node that we are dealing with does not contain any static route for the eth3 device, and then we can telnet to our listener using the client network. ( next move may be adding new route and rule files according to the new ip/network addresses of the backup interface)

This is what I did in our recent project. I did the exact same approach for working around the exact same problem and it did work.

I find this problem and its fix quite interesting and that's why shared this arcticle with you. Thanks for reading.

Wednesday, August 10, 2016

Resizing LVM mounted on root filesystem (online) on Redhat 4, ext2online

I already explained how to resize an LVM online in one of my previous post. (http://ermanarslan.blogspot.fr/2013/10/linux-extend-logical-volume-size-lvm.html)
In that post, I gave all the info for resizing LVM based mount points. The explanation was from stratch. (scan scsi bus, fdisk, resize2fs and so on) and I have followed that approach many many times.
However, today, when I was working on a EBS application tier environment which was running on a RHEL 4 32 bit server, I noticed something and I want to share it with you.

I was requested to enlarge a mount which was based on LVM and which was mounted to root (/).
Again, I followed the same approach as documented in the blogpost, pointed by the above url but this was when I was enlarging the filesystem, the following error produced

[root@erpapp ~]# cat /etc/redhat-release
Red Hat Enterprise Linux AS release 4 (Nahant Update 5)

[root@erpapp ~]# resize2fs /dev/mapper/Cluster1-Root
resize2fs 1.35 (28-Feb-2004)
/dev/mapper/Cluster1-Root is mounted; can't resize a mounted filesystem!

So, it seemed resize2fs was not capable of resizing a mounted filesystem in Redhat 4. In other words, the version of resize2fs that comes with the Redhat 4, can not enlarge a mounted filesystem.

While I was trying to find a way to make it work, the utility named ext2online helped me.
Actually it was the right utility to enlarge a mounted filesystem in Redhat 4 or in some of the other Linux Operating systems similar it.

[root@erpapp ~]# ext2online
ext2online v1.1.18 - 2001/03/18 for EXT2FS 0.5b
usage: ext2online [-C fd] [-dfqvV] device [new_size[bkmgt]]
-C, --completion : print completion information
-d, --debug : turn debug info on
-f, --force : skip safety checks
-q, --quiet : be quiet (print only errors)
-v, --verbose : be verbose
-V, --version : print version and exit
fd is the file descriptor to output completion data to,
new_size is in ext2 blocks (1k, 2k, or 4k) (default),
disk Blocks (512 byte), Kilo-, Mega-, Giga-, or Terabytes

[root@erpapp ~]# ext2online /dev/mapper/Cluster1-Root
ext2online v1.1.18 - 2001/03/18 for EXT2FS 0.5b

[root@erpapp ~]# df -h
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/Cluster1-Root
450G 12G 416G 3% /
/dev/sda1 112M 17M 90M 16% /boot
none 16G 0 16G 0% /dev/shm
/dev/mapper/Cluster1-Temp
2.0G 288M 1.6G 16% /tmp
/dev/mapper/Cluster1-Log

So, at the end of the day, there is an important lesson learned. That is, when you are on Redhat 4, use ext2online to resize your mounted filesystems. (not the resize2fs).
Note: The rest of the approach was exactly the same as documented in http://ermanarslan.blogspot.fr/2013/10/linux-extend-logical-volume-size-lvm.html.

Monday, August 8, 2016

Delphix - Notes vol 1, my new toy for the upcoming months

Recently I have taken a look at the Delphix software and its data as a service solution, as it was one of the prerequisites of our new delphix partner enablement process. I must admit that, I find it quite handy.

When we talk about Delphix, we actually talk about the Delphix engine. That is; there is an engine called Delphix engine, which runs on a Virtual machine based on a Solaris Guest OS and it sits between hosts and the storage for enabling fast clones, rewinds, restores, masking, versioning and provisioining of the non-prod enviroments.

Delphix supports a bunch of well known databases such as Oracle and Microsoft Sql Server and as far as I can see it works smoothly. In addition, it supports virtualizing the filesystems(such as Linux ext3 and ext4) and take fast filesystem clones from the sources to targets.

Using Delphix, we can create our clone environments in just 1-2 minutes. Also we can rewind them, restore them, take another clones from them easly and quickly. We can mask the data while creating those clones and we can even offload the rewind and cloning operations to our development teams by giving them the sophisticated Delphix tool called Jetstream.

In this post, I will give you my notes about the Delphix and this will be just a beginning, as we will dive in to this software deeply in the next months.

Before giving you the notes, I want to give you a quick and a -from surfrace- review about how Delphix works in Oracle environments. I want to give the overview about Oracle, as it is our main focus and I want to give it short because as i already mentioned it is just a beginning;

Delphix is installed on the site where our clone environments supposed to be.(recommended)
It is installed on a Virtual machine. This virtual machine is the platform where Delphix engine actually runs.
Delphix engine is responsible for the Delphix operations and it can be invoked using Delphix tools, CLIs and APIS.
Delphix engine has all access to the underlying storage via the Virtual Server platform that it runs on.
For cloning an Oracle Database, Delphix connects to the production machine and triggers and rman level 0 backup there. However, this level 0 is only taken for one time, for the inialization. After the initialization is completed, Delphix regulary takes level 1 rman backups from the production and store them in its storage in a compressed format. These rman backups are called Delphix Snapshots by the way. (note that, there are 2 snapshot terms in Delphix, 1) rman backups 2) the snapshots that Delphix creates its inside, while we are using the jetstream)
Delphix , if required, can also regularly take the redologs + archivelogs from the production. This is called the log sync and it is done in a time basis way for having a more point in time based clones in the target site..
When we initialized our source environment in Delphix, we can take easy clones by just clicking a few buttons located in the delphix engine Web UI. When we clone a database, which is called "creating VDB operation", Delphix restores the database backup that it took in the first place and roll it forward to the point in time that we want to have our clone database to be. Delphix does these operations transparently and when we look at our target machine, we see our new clone database is running on our target environment and its database files are stored in NFS mounts which are coming from Delphix engine.

So, ofcourse there are a lots of things that Delphix can does as well. There are things that Delphix enables too. Imagine a site migration for example, or imagine a cloud migration. Imagine EBS database to be cloned in just a few miniutes and masked, imagine EBS apps tier can be cloned in just a few minutes. Well, lots of things to say, lots of things to remember, lots of procedures to follow and also lots of expertise to have.
I will give you much deeper details in my next blog posts, but for now, I give you the notes;

For what can we use Delphix:

Application developments: Create a QA environment for Developers. Create restore points(bookmarks), switch between those bookmarks with just a couple of clicks.

Refresh/Restore: Replace from Production/Referesh from Production.

Refesh from branches/bookmarks. It can mask the data from PROD when it is restored to DEV.

Creating new version of applications: By creating Branches/bookmarks, we can create a new timeline for the new version without deleting the old version data or applicaiton.

Data Masking: real data is replaced with fake data which is still realistic. Delphix delivers secure data to development environments.(secure virtual copies are created from the production) -- masking done is approx 4 hours..Deplhix masks the data without breaking the referential integrity-> for ex: Mary is always masked as Clara... in every table, every column...

Test: Test, fail, then go back and restest...

Migration: Place delphix to the target site, create clone environments in the target without copying lots of data. (done in Clorox Company, probably they had lots of clones in the source environment)

Delphix Tools:

JetStream is the Delphix UI console for the use of End Users to control Data Copies. (We give this to DEV and TEST developer, so they use it their own) Developers do the rollback and rollforward for their databases.
DEV and TEST developers creates their copies without touching the infrastructure.(no risk there)

Operation Console(Ops console) : is the tool used by Deplhix admins. The things that are done here are; provision,convert, replicate,mask, role based user privileges, policy management and more.

Security console:
Delphix finds the sensitive data by profiling.
Secure sensitive data by masking.
Audit the sensitive data and ensure it is masked. Alert admins if there are vulnerabilities.

Missing Control Management Console: Used for Reporting,monitoring health, enabling auditing for compliance(user based audit,limited data views and so on)

Delphix captures the production data (initialization), then it records the changes continously and lastly, creates the point it time virtual copies on demand.(delphix shares the data blocks for building virtual copies, rather than copying them)
3 test environment runs in parallel on shared data blocks. no copies are needed for building a new/4th environment.
These test environments can be refreshed , rewinded fastly.

References:

Facebook used it. Facebook financials are on EBS. They reduced their Quarterly Financial close from 3 weeks to 2 days. They use delphix for doing parallel environments so that analysts could find the transactional errors before doing the close on production.

Delphix answers "YES" to the following questions:

Create a new fresh copy of 5 TB DB in 5 mins?
Reset Database copy just before descrutive change without losing all of my work?
can you provide 3,5,10 copies of my database?
Can you provide securely masked copies in minutes?
Can I self service my data copies and version control them like source code?

Delphix generally work as -> Take data , compress it in Delphix engine, virtualize and connect the data to any kind of server. (data can be an Oracle DAtabase, or EBS application files, or can be gathered from any data source)
From One physical data, Delphix creates several virtual TEST, DEV etc environments --> %90 decrease in redundant data
Gains both in OPEX (Operation expenditure, management man/hours) and CAPEX(CApital Expenditure, less storage)

In DEV environment, the errors can not be found. Most of the time, the errors are found QA and UAT environments (too late..thus it increases the release time) This is because DEV environments are not fresh, not full, not synch. With delphix, the errors are found in DEV, less work for errors in UAT and QA, which is good.
So the development becomes more efficient.Thus, when the time of production , there are less errors to deal with.

How Delphix enables its capabilities?

Deplhix(Deplhix engine) sits between RDBMS and Storage. Deplhix runs on any hypervisor and on any server.
Deplhix doesnt touch production at all. It just takes data from there.
Delphix take the copy and compress it . Also record the changes from source. So it tracks all the deltas coming from the source. Delphix masks the data if wanted(mask once, use many times, once masked several environments can be created without need to mask once again). Thus, deplhix allows users to provision any point in time.

Gains & Some References & Some Success stories

Deutsche Bank -> Financial stress testing.. There were various reporting jobs which needed to be distributed. With delphix Deutsche Bank, they doubled application development outputs, virtual environments created, offload reporting from prod to these environments. Data management turned into self service.
Coherent -> Required to do the Critical upgrade of EBS 11i to R12 + migration of this system to a new data center. They used Deplhix to accelerate the upgrade and migration. Using delphix they proveed the ability to archive and the ability to recover the database efficiently.(in case needed)
Molina Healthcare-> personel healthcare information(sensitive) should be secured in Dev, QA, Reporting, Staging envs and so on. With delphix, Molina built lots of virtual test environments. (over 3000) They avoided 3Petabytes of Storage capacity. They also used efficient Data masking in deplhix in a way like "mask once", provision many.
Clorox -> increase the migration speed. They parallelized testing, used self service virtual environments for their migration related works.
Cisco -> on demand data access across all region. They increased their testing cycle speed. They also used Deplhix for migration.
US Marines -> They nneded to Oracle EBS 11i to R12, in parallel a migration to a new site was planned. They used Deplhix 2x acceleration in ERP upgrade. They also used it in the retirement of their legacy data centers.
Facebook -> Increased the speed of Financial close from 3 weeks to 2 days.(as a result of increased error resolution and proactivity.)
Wallmart -> Accelerate release cycles, reduced online errors by cutting frequent refresh cycles (from 6 weeks to 2 days). Eliminated operational and capital cost of having multiple environments by %80.
StubHubb -> Accelerated Application releases. Normally, it was taking 3 weeks to create a testing environment. From monthly to daily mobile application releases. They reduced production errors by %20.
Gap -> Used fresh copies for offloading the ETL access and increased the speed the daily and weekly reports. They offloaded all BI and Reporting from Prod to virtual copies.
Informatica -> With Deplhix, accelerated its master data management project 6 months ahead of its schedule.
Wind River -> They used Deplhix several virtual environments for continous and up-to-date access.. Intel could take the data from these virtual environments. (Intel purchased Wind River, so needed to process its data). Generally, production access is decreased from 8 hours to 15 minutes
Carnival -> Their main challange was Data Protection. They used Delphix to create low cost backup and disaster recovery solution. Also they gained 10x Storage and 2X Server utilization improvement.
P&G -> They used Delphix for application development, delivering quick copies, data protection. They also improved their RTO and RPO. They get superior SLAs.
Comcast -> They needed Much faster point in time recovery and they used Delphix for this, as Deplhix provides quick access to the data in any point of time. They increased the speed of their recovery also. Delphix -> Agility & Availability for Comcast

One of the business usecase of Delphix is moving data between site securely, such as "Delivering secure data from on-premise to cloud".

Using Deplhix, the data can be securely transffered between delphix engines. So Delphix can do engine to engine replication. This replication is done with the compression , so it is very quick.

Using Delphix bookmarks and branches several versions application can be coexist and developed in parallel.
We can create a new version , work on it, then create a bookmark to save our work. After that we can switch the old version's branch and work on it, save our work by creating a bookmark etc...
We can create branches from bookmarks, as well. For example: for version 1.1 -> to 1.1.1 : we create a branch from version's 1.1 's bookmark.
So we restore, we go forward-backward in time, but our data in any data & time is not deleted unless we want it to be deleted.
Bookmarks can be deleted according to the retention policies, automatically.
We do all the branch-bookmark things using jetstream tool.

With Delphix migrataions become : virtualize and migrate. Delphix even provides Linux to Unix endian conversions for Oracle Databases.

Accessing Delphix Information/the sources:

Support Portal: support.delphix.com (download delphix software, knowledge base, public documentation, community forum and more)
Public Documentation: docs.delphix.com (most current documentation, all versions are available)
Shart Tank: Delphix user community, a forum community.delphix.com

Virtual Databases are created using Deplhix Engine graphical interface.
-Deplhix has Web Server GUI, CLI through SSH and APIs for administration. Two user roles are there for administration in Delphix: Delphix Admin(the real admin) and Sysadmin(a lightweight admin who can do the day to day tasks).

Delphix can virtualize Oracle, DB2, postgres, Mysql, Sybase ASE databases.

DELPHIX FOR ORACLE:
---------------------------

Delphix engine is installed as a VM guest on VMware VSphere environment(Version 4.x or 5.x) or Amazon Ec2 Cloud . It requires minimum 8vcpus , 64 gb memory minimum.
Delphix supports Oracle Databases 9.2.0.8, 10.2, 11.1, 11.2 and 12.1
Delphix supports Oracle Databases on Solaris(Sparc and X86), RHEL, OEL, Suse, AIX and HPUX.
There is no supported storage list for Delphix , so it can run on any storage.
Delphix supports both standalone and RAC hosts.

In Delphix, target environment means Virtual Databases.
In Delphix, the delphix-side representation of a Source database is called as Dsource (Data Source).

Source environment speaks with Delphix using SSH, JDBC and DSP. DSP(Delphix Session Protocol) runs on port 8415 . Delpix requires port 8415 to be opened between Source and Delphix systems.
Basically, what Delphix does is , it takes the Dsource and creates a VDB from it and then provision it to a target system as a Read Write fully functional Oracle Database.
spfile is also provisioned by Delphix while create VDBs.

Flow of data for Oracle:

Rman takes Rman Level 0 backup using Rman APIs for the initalization. (this is done once while adding the source system in to Delphix).
Delphix takes that level 0 backup in to the Delphix system for the initalization and then takes level 1 and optionally archive/redo backups to synch itself with the source.
After the level 0 is taken, Delphix compress it 2 to 4 times. Level 1 backups and archivelog backups used to synch the Delphix are also compresses in the same way.
The VDBs are presented to the target servers using NFS or DNFS (Direct NFS). --> Direct NFS has direct IO (no OS cache) and aysnc IO capabilities and it is the power of it. (when compared to NFS)
The changes done in the VDBS are stored in the Delphix Storage Layer Server , compressed.
the VDBS are like the snapshots of Dsources, so no extra storage is needed for creating them, and so that they are very fast to create.
The changes in VDBS are stored in the Delphix Storage layer (like snapshot deltas)
Dsources have a timeflow. We can see the snapshots in those timeflow. The snapshot are actually the level1 backups taken in to the Delphix. (Delphix iniates level1 backups by sending commands to the source environment)
This snapshots, in other words; level1 backups are done via the Delphix Snap Sync service.
Alternatively, the snapshots can be done by collecting the archivelogs and redologs from the source system. This is done via the Delphix Log Sync service.

IO is done : from target server -> Delphix -> Storage.

Deplhix can take data from ASM but can not provision to ASM. (In this scope, Exadata can be a source or target, but Delphix can not provision to Exadata Storage, but it can provision the data to Exadata as an NFS mount point)

VDBs can be created from other VDBS as well.
VDBs can be rolled back or rewinded point in time, as they have also their own timeflow. This rewinds takes place purely in Delphix's Storage Layer (No oracle technologies are used for that)
VDBs can be refreshed from production as well. (from the dsource and from the snaphsots taken for that dsource)

The IO performance can be monitored and analyzed using Delphix 's Network Performance and Performance Analytics pages.

DELPHIX FOR MSSQLSERVER:
---------------------------

The requirements for the Delphix engine is the same as Oracle .
Delphix supports SQL Server 2005, 2008, 2008R2, 2012, 2014 and Windows Server 2003 SP2, 2008, 2008R2, 2012 and 2012R2 (64 bit)
Delphix supports Always on(for only source) + Windows Failover clusters(for both source and target)

When we talk about the database term in Delphix means Data files + Transaction logs + transaction log backups for Sql Server.

As for the Sql Server, Delphix recreates the source system in an intermediate system called the validated sync target/environment.
Validated sync environment is a replica, which is recovered using the database file backups and log backups from the production server via the SMB protocol.
Validated sync environment can be a target server or a standalone sqlserver.
The reason for having this environment is to take the backups from source sqlserver contiously.
When a dsource is created, Delphix kick offs a job to have the validated sync server to take the backups from the source server.
Validated Sync should be in the same Sql Server version as the source.
There is a database created in the validated sync environment to be recovered by the source continously. (this databases is called the staging database)
there should be 1 Validated Sync should be for each Delphix engine. (one to one) . This is recommended.

So, Delhix itself does not consume any backup files.
Once the initilaziation is completed, Delphix remains in sync in terms of differential backups and transaction log backups.

Delphix talks with the source instance using jdbc and see if there are new transaction logs or backup have been taken. Then Delhpix instructs the validated sync database to apply them.
As for the sqlserver (like Oracle), Delphix provides network based storage services for the target servers (ISCI protocol)

Again, snap sync service takes backups from the source database.
Log sync service can also be used for retrieving the transaction logs.

The source Sql Server database must be Full recovery mode.
Without log sync, you can't restore to any time you want. So, with snap sync, you can only recover the time that are associated with backups.(you can only recover to snapshots)
Note: Backups taken from the source automatically creates a snapshot in Delphix.

Key factors on Installation of Delphix:

With IOCollector.sql , performance test are done.
We run it in PROD. Then we run it in current CLONE environment and see the difference.
Then, we will take the CLONE outputs as a performance goal, because we are producing clones(so no need to compare Delphix with the Production environment performance)
We expect most of the CLONE performance to be %50 of PROD performance.
If there are lots of clone environments needed, then multiple engine can be implemented. (this decision will be based on the outputs collected in the analysis phase/sizing phase)
For analyisis and sizing , 3 things matter.

1)CPU,RAM for Delphix Engine
2)Network between Source and Delphix + Delphix and Target
3)IO for Storage

Installation and POV should be done by following the document: Delphix Pre Install User Guide

Jumbo Frames can be enabled between the Source and Deplhix Engine and Target. Jumbo Frames will increase the network performance (because we do DB IO , with big block sizes)

However, if the network between Source,Delphix and Target is not isolated, then all the network will be affected from that.

Jumbo frames will be enabled for all the hosts in the network and thay may harm them.. So be careful and analyze the environment of the customer.

Delphix should be installed closer to the target environment (not the source)

Delhpix Storage should be created on top of one to one relationship. 1 lun should be connected for 1 data store and for 1 vmdk....

Raid configuration is a concern for Delphix. Delhpix can only give recommendations... It does not matter for Delphix as long as it performs well.

VMDKs should be "thick" provisioned zeroed. (not thin provisioning) Thin provisioning can be used for this VMDKs as well.

Landshark environment is the demo environment for Delphix. It consists of 1 Delphix Engine, 1 target and 1 source systems.

The network for installing Landshark is very important. The documentation should be followed for configuring the network of the virtual machines, before starting the Landshark environment.

Landshark when configured, comes as a preconfigured inventory. So that, it automatically sees the delphix engine in it sees its source and its target automatically.

With just 1-2 click, we can create our first VDBs in our target. Also, we can create our Vfiles as well.

Vfiles is used when there is need to create a target filesytem from a source filesystem.

Cloning EBS apps tier using Delphix is an example where we use vfiles.

Tuesday, August 2, 2016

Reverse Proxy-- Enabling SSL on JIRA + Ngnix + Apache

This post is not directly related with Oracle products.
It is actually not so so related with JIRA, as well.
My main focus is to give a recall to one of the web tier components, that we use in some of our advanced EBS configurations like offloading SSL works to Web Servers or enhancing the security of our environment by putting an extra layer in front of our application server, which is supposed to be open to the internet.

However, altough we use this web tier component (called Reverse Proxy) in our EBS configurations as well, this time; I will make a change, and write about this topic, by going through an example of SSL enablement on JIRA, not Oracle EBS.

The Forward Proxy is the the default Proxy that we use in our daily lives with the Proxy term.
Reverse Proxy is something else. Altough reverse proxy is not the direct opposite of forward proxy, they are very different. That is, reverse proxy is for servers, and the forward proxy is for clients. I will not go very deep in that, but let's say, in forward proxy, the servers that are reached through the proxy, do not know the clients, but in reverse proxy, clients do not know the backend servers.
So in other words; in forward proxy, the clients want to go to the servers (like google.com), in reverse proxy, the clients think that they are speaking with the proxy server, but they are actually speaking with the servers in backend.
That is Reverse Proxy talks with the client and passes his/or her request to the other servers that are logically behind of it, and then returns the responses of the servers to the clients.
For example, if I configure my Apache as a reverse proxy and say there :"change every url with the google's url/reverse proxy everything to google", and then if you write my Apache Server's ip address to your browser's address bar and it enter , you will reach the google through my ip address. So, you will not see any change in your browser's address bar, as you will think that you are talking with my Apache Server.

This reverse proxy component can be widely used in the Enterprises. It is used for security , it is used for adding an intermediate levels, it is used for SSL offloading and so on.

Below, you will find how we used a reverse proxy to open a JIRA system to internet.
In this scenario, we will use nginx reverse proxy for redirecting the https traffic to http . The clients will come to the reverse proxy with an internet dns name using https and we will pipe them transparently to our tomcat web server(JIRA) using http, which is running inside our network.
So the connection between the clients and our reverse proxy server will be https. Also from client's perspective, the connection from our clients to jira will be https as well.
We will also use another Apache(running on the same server as ngnix, and listening on port 80 of our internet DNS name) to redirect the http requests that may come to our internet dns name, to https automatically.

So , if I summarize;

Consider our internet dns name (public jira address) as : ermanblabla.erman.com.tr

Consider , we want the clients to access our jira system which is running inside our network, using https://ermanblabla.erman.com.tr:443

Consider , we have a ngnix server running on our reverse proxy server and configured to listen on port 443.

Consider, we have configured our network, so that all the request(that are coming from outside of our network) that are coming to ports 443 and 80 to be redirected to this reverse proxy server.

Consider, a separate Apache is running on this reverse proxy server with nginx and it is listening on port 80.

Lastly, consider, we want our JIRA to be accesses with https and http requests should be redirected to https automatically, but we also want the SSL works to be done by our ngnix proxy server. (not by JIRA-Tomcat itself)

I will give the instructions in a brief format, as follow;

First, we change the server.xml of JIRA (Tomcat) -- this is a jira specific configuration file, and it is a JIRA requirement for enabling SSL.

/home/jira/Atlassian/Jira/conf/server.xml
<Service name="Catalina">
<Connector acceptCount="100"
connectionTimeout="20000"
disableUploadTimeout="true"
enableLookups="false"
maxHttpHeaderSize="8192"
maxThreads="150"
minSpareThreads="25" port="8080" protocol="HTTP/1.1" redirectPort="443" useBodyEncodingForURI="true"
scheme="https"
proxyName="ermanblabla.erman.com.tr"
proxyPort="443"
/>

Then in Jira administration screens; We choose > System, then select "General Configuration to open the Administration page" and then set "Use gzip compression Default: OFF". This is also a jira specific recommandation , as GZIP compression is known to cause performance issues using a reverse-proxy, especially if the proxy is also compressing the traffic.

After that, for SSL offloading on Nginx; We create new conf file under nginx/conf.d folder as below and restart Nginx. With this configuration Ngnix server will redirect this url https://ermanblabla.erman.com.tr to http://internal_ip (server that is inside our network and hosting JIRA)

server {
server_name ermanblabla.erman.com.trr;
listen 443 ssl;
ssl on;
ssl_certificate /etc/nginx/erman.pem; --> the ssl certificates should be already loaded in to this pem file.
ssl_certificate_key /etc/nginx/erman.pem; --> the ssl certificates should be already loaded in to this pem file.
ssl_session_cache shared:SSL:1m;
ssl_session_timeout 5m;
ssl_ciphers HIGH:!aNULL:!MD5;
ssl_prefer_server_ciphers on;
location / {
proxy_pass http://<internal_ip>:8080; --> forward the request to the Jira Server that is listening on port 8080 with the internal ip.
}
access_log /var/log/nginx/jiratest.access.log;

Then lastly, we add the following configuration to Apache and that's it. Apache will redirect all request from http://ermanblabla.erman.com.tr to https://https://ermanblabla.erman.com.tr (just in case) . So, We edit the conf file under /etc/httpd/conf.d/jiratest.conf and restart Apache to make the Apache forward all http request coming from port 80 to https:

<VirtualHost *:80>
ServerName https://ermanblabla.erman.com.tr
Redirect "/" "https://ermanblabla.erman.com.tr"
</VirtualHost>

Subscribe to: Posts ( Atom )