Tuesday, August 2, 2022

OCI -- Running Core Banking on Cloud (+ Weblogic Cluster-aware Auto-scale configuration)

It is already up in the cloud! We are running Core Banking applications on Oracle Cloud Infrastructure! 

Our solution is almost fully managed. It is Weblogic cluster-aware, and it provides a fully automated scale-out mechanism that both optimizes the compute resources and distributes the Weblogic and Http Server-related load across the nodes (with increased fault tolerance as a result of those automatic horizontal compute node expansions).

With this solution, the cloud infra of the core banking applications (including the Weblogic application servers (WLS) and the Http Servers) is able to scale out and scale in according to the load on the cloud nodes (IaaS compute nodes).

In this blog post, I will try to shed some light on the enablement process, the purpose, the motivation, the challenges and the final picture of such an architecture. The aim is to make you think about the things that can be done on Cloud, the benefits of having a cloud-based architecture and the ability to run mission-critical applications on Cloud. This is especially for Oracle Cloud.. (We are also a Google Partner and have the ability to build a similar architecture on GCP as well!)

Just for intro; the name of our Core Banking application is Symphony. 

Our purpose -> Having a highly available and auto-scaling Core Banking environment on Oracle Cloud Infrastructure.

Our motivation -> Leveraging the advanced features of Oracle Cloud to build a managed and auto-scaling banking environment that is cloud-ready and easy to manage.

SCOPE OF THE WORK:

Configure OCI compute and storage resources to support the Core Banking application needs.

Migrate the whole Core Banking environment from on-premises to OCI (IaaS Compute + WLS).

Do the development that is needed for auto scale-out operations.

Do the development that is needed for auto scale-in operations.

Do the functional and stability tests.

SOLUTION:

Environment is completely on OCI.

Load Balancers in the forefront.

A 2-node RAC database is in the database tier. (will be replaced by the Oracle Autonomous database soon..)

Application servers are in an instance pool that we defined and configured.

A template image is built for adding a new machine to the managed group when a scale-out operation is triggered. (When the load increases, OCI triggers the scale-out and custom code does the rest.)

Scale-in operations are also triggered by OCI. However, custom code and a Cloud Function run to remove the configuration of the removed node from the Weblogic Cluster.

Python and Bash scripts do the Weblogic side of things in both scale-out and scale-in operations. (Configuring the node manager and the cluster)
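To give an idea of what the Weblogic side of a scale-out or scale-in can look like, here is a minimal sketch in Python. It is not our production code. It only builds the text of a WLST script that registers a new managed server in a cluster, or removes one. The function names, the server, cluster and machine names, and the port are all hypothetical, and the generated script assumes that it runs inside a WLST session that is already connected to the Admin Server. Please verify the WLST calls against your own Weblogic version before using anything like this.

```python
import re

# Names are interpolated into WLST source, so only allow a safe character set.
_SAFE = re.compile(r"^[A-Za-z0-9_.\-]+$")


def _check(*values):
    for value in values:
        if not _SAFE.match(str(value)):
            raise ValueError("unsafe value for a WLST script: %r" % (value,))


def build_add_server_script(server, address, port, cluster, machine):
    """Return WLST text that creates a managed server and joins it to a cluster."""
    _check(server, address, port, cluster, machine)
    lines = [
        "edit()",
        "startEdit()",
        "cd('/')",
        "cmo.createServer('%s')" % server,
        "cd('/Servers/%s')" % server,
        "cmo.setListenAddress('%s')" % address,
        "cmo.setListenPort(%d)" % int(port),
        "cmo.setCluster(getMBean('/Clusters/%s'))" % cluster,
        "cmo.setMachine(getMBean('/Machines/%s'))" % machine,
        "save()",
        "activate()",
    ]
    return "\n".join(lines) + "\n"


def build_remove_server_script(server):
    """Return WLST text that detaches a managed server and deletes its config."""
    _check(server)
    lines = [
        "edit()",
        "startEdit()",
        "cd('/Servers/%s')" % server,
        "cmo.setCluster(None)",
        "cmo.setMachine(None)",
        "cd('/')",
        "cmo.destroyServer(getMBean('/Servers/%s'))" % server,
        "save()",
        "activate()",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    print(build_add_server_script("ms3", "10.0.0.13", 8001, "cb_cluster", "machine3"))
```

A Bash wrapper could then write this text to a file and pass it to wlst.sh together with the connection step. Keeping the script generation separate from the execution makes it easy to review what will be changed in the domain before it is applied.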

ARCHITECTURE:

The whole stack is on the OCI Frankfurt Region. (I can't share our core banking architecture here due to privacy concerns. If you are interested, please contact me -> erman.arslan@gtech.com.tr)

By looking at the values of the metrics (such as the CPU utilization), OCI decides when to trigger the scale-out and scale-in events, and this process can be tuned according to the needs. The metric that is used can be changed, and all these operations can be monitored.

A notification mechanism is also in place for notifying the Cloud admins when a scale-out/in operation is triggered.

The auto-scale solution is fast. For instance, when a scale-out operation is triggered, a node can be added to the Weblogic configuration in almost 5 minutes.
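The metric-based decision described above can be sketched in a few lines of Python. This is only an illustration of the idea and not the OCI implementation: in our environment the decision is made by OCI itself. The thresholds, the node limits and the names ScalingPolicy and decide are assumptions made up for this example.

```python
from dataclasses import dataclass


@dataclass
class ScalingPolicy:
    # All values below are hypothetical and would be tuned per environment.
    scale_out_cpu: float = 80.0  # average CPU % at or above which a node is added
    scale_in_cpu: float = 30.0   # average CPU % at or below which a node is removed
    min_nodes: int = 2
    max_nodes: int = 6


def decide(policy: ScalingPolicy, avg_cpu: float, current_nodes: int) -> int:
    """Return +1 for scale-out, -1 for scale-in and 0 for no change."""
    if avg_cpu >= policy.scale_out_cpu and current_nodes < policy.max_nodes:
        return 1
    if avg_cpu <= policy.scale_in_cpu and current_nodes > policy.min_nodes:
        return -1
    return 0


if __name__ == "__main__":
    policy = ScalingPolicy()
    for cpu, nodes in [(90.0, 2), (50.0, 3), (20.0, 3), (20.0, 2)]:
        print(cpu, nodes, decide(policy, cpu, nodes))
```

The gap between the two thresholds is there on purpose. It keeps the pool from adding and removing nodes back and forth when the load sits around a single value.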

FLOW CHART:

In the following diagram, we see the general flow that enables the solution to make automated decisions (these are made by the auto-scale mechanism) for starting scale-out and scale-in operations. We also see the built-in monitoring mechanism of OCI that watches the load of the nodes in the instance pool, and the alarm system that kicks in when the load level increases or decreases.

In the left pane, we see the custom code that adds a new machine, configures the Weblogic and starts the managed servers automatically (or removes them when we are scaling in).

The Load Balancer is also integrated and automatically adds the new node to its configuration (to its backend) when we add a node into the cluster. Similarly, the load balancer removes the node from its backend when we remove a node due to a scale-in operation.

AUTO-SCALE IN ACTION:

The following output shows the auto-scale side of things.

Here, we started with 2 nodes around 7 am and then the auto-scale mechanism kicked in.

As we were short of resources, the auto-scale mechanism added 2 more nodes to the config at around 11 am.

And then, around 13:15, the load decreased, so the auto-scale mechanism removed a node this time.

Note that our custom code is integrated with this mechanism, so all the WLS configuration is altered automatically while these things happen.


ROOM FOR IMPROVEMENT:

Wait for the active application sessions to finish before removing the node from the configuration. (during a scale-in operation)

Optimize the local max JVM count, because we may be using all the JVM processes even though we don't have any CPU utilization or load on the server.. (This improvement should be done by the Application Developer Team. There are also things that should be done by Oracle Development to provide the ability to make the code wait when a scale-in or scale-out kicks in.)

Replace the database layer with the Oracle Autonomous Database. (This is already on its way!)
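For the first improvement item (waiting for the active sessions before a node is removed), a possible shape of the logic is sketched below in Python. This is an idea, not something that is implemented in our solution yet. The function wait_for_drain and its parameters are hypothetical, and how the active session count is actually read from the managed server is deliberately left to a callable supplied by the caller.

```python
import time
from typing import Callable


def wait_for_drain(
    get_active_sessions: Callable[[], int],
    timeout_s: float = 300.0,
    poll_s: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll until no sessions remain or the timeout passes.

    Returns True if the node drained, False if the timeout was reached.
    """
    deadline = clock() + timeout_s
    while True:
        if get_active_sessions() <= 0:
            return True
        if clock() >= deadline:
            return False
        sleep(poll_s)


if __name__ == "__main__":
    # Simulated session counts, for illustration only.
    counts = iter([3, 1, 0])
    drained = wait_for_drain(lambda: next(counts), timeout_s=10.0, poll_s=0.1)
    print("drained" if drained else "timed out")
```

The scale-in code would call something like this first, and remove the node from the Weblogic configuration only after it returns. What to do on a timeout (force the removal or postpone the scale-in) is a decision that depends on the application.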

