Tuesday, November 16, 2021

EAI Tool Building With IBM Public Cloud Services

There are plenty of EAI middleware tools in the market, so why build a new one? The answer is always debatable, with reasons cited such as complete control over the tool, extensibility, customization for your own unique use cases, and of course low pricing. The low-pricing argument is all the more tempting when cheap cloud infrastructure, varying loads, and fast-changing business dynamics support the justification.

If low price is what you are after, don't forget to check the important features any EAI tool should perform and compare them with your requirements. Every popular tool in the market offers pretty much the same basic features: minimal coding, quick mapping, real-time/batch mode, multiple connectors, a retry mechanism, error handling, monitoring, and reports. Some modern tools also provide niche features like AI-based data mapping or integration. If you are looking for the most basic features and you don't want to end up learning yet another tool, it's time to try IBM Public Cloud (IPC) services for developing your own EAI tool.

A microservices-based container architecture on the cloud provides the mechanism for building such a tool quickly and efficiently, and IBM public cloud has multiple offerings to support it. There is a Cloud Pak that comes with perfectly packaged services to quick-start the journey. On the other hand, there are individual services, from compute clusters to databases, that you can cherry-pick to build your own infrastructure package.

I want to share my experience of setting up my own services and how easy it was. To start with, I wanted all the basic features mentioned above in the tool. A few requirements I focused on are:

  • Integration options demanding REST API, Kafka, and DB integration, so there is a clear need for multiple connectors
  • Data sync interval - real time is the requirement in my case
  • Custom data mapping, since heterogeneous applications need to be integrated
  • Assured delivery - can't miss even a single transaction

The above requirements clearly point to the kind of system that needs to be developed, which includes:

  • API server to listen to calling applications - a microservice with a lightweight server needs to be developed here (a minimal sketch follows this list)
  • Independent connectors to the API, Kafka, and DB of each application - utility backend microservices providing connection and retry logic
  • Data processor - yet another microservice that processes the data and prepares the mapping to a specific target system
  • A database that can persist data temporarily in case of connection issues
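
To make the API server concrete, here is a minimal sketch of what such a lightweight microservice could look like in Flask. The /sync route and the payload shape are illustrative assumptions for this post, not part of any particular design.

```python
# Minimal sketch of the API server microservice (Flask).
# The /sync route and payload shape are illustrative assumptions.
from flask import Flask, jsonify, request

app = Flask(__name__)

@app.route("/sync", methods=["POST"])
def receive():
    payload = request.get_json(force=True)
    # In the real tool this would hand the payload off to the data
    # processor service; here we just acknowledge receipt.
    return jsonify({"status": "accepted",
                    "records": len(payload.get("records", []))}), 202

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8080)
```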
[Figure: Techno-functional overview]


Development Effort

Below is the effort required to set up the services fulfilling the above requirements.

[Figure: IPC infra overview]

Minimal IPC services required:
  • IKS cluster with 3 nodes and a persistent volume (PV) - to develop the microservices
  • MongoDB - NoSQL DB for temporary persistence
  • LogDNA - log analysis for debugging purposes

Developing Microservices

Setting up Red Hat OpenShift on IBM Cloud (ROKS) or IBM Kubernetes Service (IKS) to build cloud-native containerized microservices takes no more than 20 minutes. This includes all operational tasks like creating separate resources for the test and production environments.

The technology used is Python with Flask. The connectors are the main services and would take the longest, as they require end-to-end testing and a retry mechanism to handle errors across the different integration approaches.
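
As an illustration of the retry mechanism, here is a minimal sketch of a REST connector; the attempt count, backoff values, and function name are assumptions made for the example.

```python
# Sketch of a REST connector with a simple retry mechanism.
# The URL, attempt count, and backoff values are illustrative assumptions.
import time
import requests

def deliver(url, record, attempts=3, backoff=2):
    """Try to POST a record; return True on success, False if all attempts fail."""
    for attempt in range(1, attempts + 1):
        try:
            resp = requests.post(url, json=record, timeout=10)
            resp.raise_for_status()
            return True
        except requests.RequestException:
            if attempt == attempts:
                # Caller persists the record to MongoDB for a later retry.
                return False
            time.sleep(backoff ** attempt)  # exponential backoff between attempts
```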

Microservices created:

  • Three different connectors for DB, Kafka and REST API
  • Auth server for OAuth2.0
  • API server to receive requests
  • Data processor that handles mapping, transforming, and orchestration (see the mapping sketch below)

A CronJob scheduler was created to handle error retries at regular intervals.
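
Here is a minimal sketch of the data processor's mapping step, assuming a simple declarative field map; the field names are purely illustrative.

```python
# Sketch of the data processor's mapping step: translate a source record
# into a target system's schema using a declarative field map.
# The field names below are illustrative assumptions.
FIELD_MAP = {
    "cust_id": "customerId",
    "cust_name": "customerName",
    "order_ts": "orderTimestamp",
}

def transform(source_record: dict) -> dict:
    """Map source fields to target fields, dropping anything unmapped."""
    return {target: source_record[src]
            for src, target in FIELD_MAP.items()
            if src in source_record}
```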

MongoDB Collection Setup and LogDNA Setup

MongoDB is a quick setup, with a simple collection created to store the failed records for later retries. LogDNA for log analysis is also fairly easy to set up before you start coding and adding log statements.
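
A sketch of that temporary persistence using pymongo follows; the connection string, database, and collection names are assumptions for the example.

```python
# Sketch of the temporary-persistence setup: store records that failed
# delivery so the CronJob can pick them up for retry later.
# Connection string, database, and collection names are assumptions.
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")  # replace with the IPC MongoDB URI
failed = client["eai"]["failed_records"]

def persist_failure(record: dict, target: str) -> None:
    """Called by a connector when all delivery attempts fail."""
    failed.insert_one({"target": target, "record": record, "status": "pending"})

def pending_retries(target: str):
    """Called by the CronJob to fetch records awaiting retry."""
    return failed.find({"target": target, "status": "pending"})
```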

DevOps Setup

  • GitHub setup is not part of the Cloud services and needs to be done separately
  • Jenkins comes by default with ROKS, so you can quickly develop pipelines with the stages you need using Groovy scripts. I added unit test, test coverage, static code analysis, build, and publish-to-image-repository stages to the CI pipeline (see the sketch after this list)
  • Separate test and production environments take duplicated effort
  • YAML files for shared configuration, or a Vault setup to manage secrets
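
For reference, a declarative Jenkinsfile with those CI stages could look like the sketch below; the shell commands and registry path are illustrative assumptions, not the actual pipeline.

```groovy
// Sketch of the CI pipeline (declarative Jenkinsfile). Stage names mirror
// the list above; the shell commands and image path are placeholders.
pipeline {
    agent any
    stages {
        stage('Unit Test')            { steps { sh 'pytest tests/' } }
        stage('Test Coverage')        { steps { sh 'pytest --cov=app tests/' } }
        stage('Static Code Analysis') { steps { sh 'pylint app/' } }
        stage('Build') {
            steps { sh 'docker build -t us.icr.io/eai/rest-connector:latest .' }
        }
        stage('Publish') {
            // Push to the IBM Cloud Container Registry (path is an assumption)
            steps { sh 'docker push us.icr.io/eai/rest-connector:latest' }
        }
    }
}
```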

Deployment Setup

The deployment task was set up as part of the CD pipeline in Jenkins. Each microservice's YAML was created with 200m CPU and 256Mi memory limits and deployed with a maximum of 3 replicas based on 70% CPU utilization. The scheduler was created as a CronJob and scheduled to run at specific intervals to check and process error records.
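
A sketch of one such deployment follows, assuming a Deployment plus a HorizontalPodAutoscaler; the names, image path, and resource requests are assumptions, while the limits and scaling numbers are the ones from above.

```yaml
# Sketch of one microservice's deployment with the limits described above,
# plus an HPA capped at 3 replicas on 70% CPU. Names and image path are
# assumptions.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: rest-connector
spec:
  replicas: 1
  selector:
    matchLabels: { app: rest-connector }
  template:
    metadata:
      labels: { app: rest-connector }
    spec:
      containers:
        - name: rest-connector
          image: us.icr.io/eai/rest-connector:latest  # hypothetical registry path
          resources:
            requests: { cpu: 100m, memory: 128Mi }  # HPA utilization is measured against requests
            limits:   { cpu: 200m, memory: 256Mi }
---
apiVersion: autoscaling/v2beta2  # HPA API version current in late 2021
kind: HorizontalPodAutoscaler
metadata:
  name: rest-connector
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: rest-connector
  minReplicas: 1
  maxReplicas: 3
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
```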


Conclusion

The details shared above are at a high level, and the low-level details would define the exact effort required. My intention is to share this as quick reference material, since I found this to be the easiest way of developing middleware, using IBM public cloud services.