Large enterprises run many applications, tools and utilities (ERP, CRM, Sales, HR, Finance etc.) across their business domains, which makes it hard for IT to support effective collaboration across domains, data analysis and synchronization of critical data. As more heterogeneous applications are added over time, integrating each pair point to point makes the number of connections grow roughly with the square of the number of applications. The Hub and Spoke design paradigm is regarded as one of the best solutions for reducing this integration complexity. The hub needs to be highly concurrent, distributed, scalable, microservices oriented, cloud hosted and container orchestrated. It must offer all the elements of data integration, such as real-time data streaming, transformation, synchronization, quality and management, so that information is timely, accurate and consistent across heterogeneous applications.
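To make the complexity argument concrete, here is a small arithmetic sketch. It assumes every pair of applications needs one link in the point-to-point case and every application needs exactly one link to the hub in the hub-and-spoke case; the function names are illustrative only.

```python
def point_to_point_links(n: int) -> int:
    """Links needed when each of n applications connects directly to every other one."""
    return n * (n - 1) // 2


def hub_and_spoke_links(n: int) -> int:
    """Links needed when each of n applications connects only to the hub."""
    return n


for n in (5, 10, 20):
    print(n, point_to_point_links(n), hub_and_spoke_links(n))
# prints:
# 5 10 5
# 10 45 10
# 20 190 20
```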
Event driven design:
A spoke application sends a message to the hub whenever one of its records is created or updated. On receiving a message, the hub applies its routing rules to decide which spokes the message is forwarded to. The routing rules determine the message flow across spoke systems, and they rely on trusted data sources for the data they need. Validation, synchronization, transformation, enrichment and filtering of every message are carried out by integrating the hub with these trusted data sources.
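The routing step described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the hub's actual code: the `Message`, `RoutingRule` and `Hub` names, the dictionary standing in for a trusted data source, and the sample spoke names are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Message:
    source: str               # spoke that raised the event
    event: str                # "create" or "update"
    payload: Dict[str, str]


@dataclass
class RoutingRule:
    destination: str                     # spoke that should receive the message
    matches: Callable[[Message], bool]   # predicate deciding whether the rule applies


@dataclass
class Hub:
    rules: List[RoutingRule]
    trusted_source: Dict[str, Dict[str, str]]   # hypothetical stand-in for a trusted data source
    delivered: Dict[str, List[Message]] = field(default_factory=dict)

    def enrich(self, message: Message) -> Message:
        # Look up extra attributes for the customer in the trusted source.
        extra = self.trusted_source.get(message.payload.get("customer_id", ""), {})
        return Message(message.source, message.event, {**message.payload, **extra})

    def receive(self, message: Message) -> List[str]:
        enriched = self.enrich(message)
        # Never route a message back to the spoke that sent it.
        destinations = [
            rule.destination
            for rule in self.rules
            if rule.destination != message.source and rule.matches(enriched)
        ]
        for destination in destinations:
            self.delivered.setdefault(destination, []).append(enriched)
        return destinations


hub = Hub(
    rules=[
        RoutingRule("finance", lambda m: m.payload.get("region") == "EMEA"),
        RoutingRule("hr", lambda m: m.event == "create" and m.payload.get("type") == "employee"),
        RoutingRule("crm", lambda m: True),
    ],
    trusted_source={"c-42": {"region": "EMEA"}},
)
targets = hub.receive(Message("crm", "update", {"customer_id": "c-42", "type": "account"}))
print(targets)  # ['finance']
```

Only the finance rule fires here: the region it tests for comes from the enrichment step, the HR rule does not match an update event, and the CRM rule is skipped because CRM is the sender.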
Tools & technologies for building the hub on IBM public cloud:
· The real-time data pipeline is built on a Kafka cluster, which helps the hub handle large volumes of data and provides message reliability.
· The Akka toolkit is used to simplify concurrent code development and to provide infrastructure that lets us scale without modifying the application. Akka handles message distribution and communication at large scale, and its simpler concurrency model also improves coding efficiency.
· The Play web server provides a lightweight, stateless web framework; we chose it to expose APIs with minimal resource consumption.
· Redis is used for caching and error handling.
· Grafana and DataDog are used to monitor infrastructure and services and to keep the administrator informed about system health.
· Zipkin is used to trace the message flow across systems and provides real-time updates on message transitions.
Core components that the hub offers are:
· API service: Exposes a REST interface through which spokes integrate with the hub. An interface contract standardizes communication from spoke to hub, and the service provides an HTTPS endpoint secured with OAuth 2.0.
· Processor service: Validates, standardizes, enriches, filters and transforms the incoming data, then applies message routing rules to determine the destination spokes for each incoming message.
· Persistence service: Provides data persistence by storing every transaction that flows through the hub. DashDB is used to store the consolidated data.
· Adapter service: Consumes the spoke interfaces. Each spoke adapter derives from this generalized service so that it complies with its own spoke's interface.
· Dashboard service: Provides a consolidated report on the opportunity messages and helps support and administration staff with message transaction details.
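The adapter service's "generalized service, derived per spoke" idea might look roughly like the following. This is a sketch under assumptions: the class names, the JSON and CSV wire formats, and the `send` placeholder are hypothetical and only illustrate how a shared delivery flow can be specialized for each spoke interface.

```python
import json
from abc import ABC, abstractmethod
from typing import Dict


class SpokeAdapter(ABC):
    """Generalized adapter: shared delivery flow, spoke-specific formatting."""

    def deliver(self, message: Dict[str, str]) -> str:
        body = self.to_spoke_format(message)
        return self.send(body)

    @abstractmethod
    def to_spoke_format(self, message: Dict[str, str]) -> str:
        """Convert a hub message into the format the spoke interface expects."""

    def send(self, body: str) -> str:
        # Placeholder for the real remote call; here it just echoes what would be sent.
        return body


class JsonSpokeAdapter(SpokeAdapter):
    """Hypothetical spoke that accepts JSON documents."""

    def to_spoke_format(self, message: Dict[str, str]) -> str:
        return json.dumps(message, sort_keys=True)


class CsvSpokeAdapter(SpokeAdapter):
    """Hypothetical spoke that accepts a fixed-column CSV line."""

    FIELDS = ("id", "name")

    def to_spoke_format(self, message: Dict[str, str]) -> str:
        return ",".join(message.get(name, "") for name in self.FIELDS)


message = {"name": "Acme", "id": "7"}
print(JsonSpokeAdapter().deliver(message))  # {"id": "7", "name": "Acme"}
print(CsvSpokeAdapter().deliver(message))   # 7,Acme
```

Keeping `deliver` in the base class means cross-cutting concerns such as logging or retries can be added once, while each derived adapter only describes its own spoke's format.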
Real-time integration exception handling and assured delivery:
It is critical to handle exceptional conditions arising from unexpected events, invalid data and process errors at runtime. The ecosystem must adopt a multi-pronged approach to handling the various types of errors: assured delivery, error routing, error mapping, logging, notification and dashboard monitors for error details. A robust retry framework automatically resolves recoverable errors. The hub forks the workflow between recoverable and non-recoverable exceptions so that each kind is managed appropriately.
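The fork between recoverable and non-recoverable exceptions can be illustrated with a short retry sketch. The exception classes, the `error_queue` list standing in for error routing, and the exponential backoff policy are assumptions made for this example; the text does not specify how the hub's retry framework schedules its attempts.

```python
import time
from typing import Callable, List, Optional, Tuple


class RecoverableError(Exception):
    """Transient failure, e.g. a timeout; retrying may succeed."""


class NonRecoverableError(Exception):
    """Permanent failure, e.g. invalid data; retrying cannot help."""


def deliver_with_retry(
    send: Callable[[], str],
    error_queue: List[Tuple[str, str]],
    max_attempts: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[str]:
    """Retry recoverable errors with backoff; route everything else to error_queue."""
    for attempt in range(1, max_attempts + 1):
        try:
            return send()
        except NonRecoverableError as exc:
            # No point retrying: hand the error over for routing and notification.
            error_queue.append(("non-recoverable", str(exc)))
            return None
        except RecoverableError as exc:
            if attempt == max_attempts:
                error_queue.append(("retries-exhausted", str(exc)))
                return None
            # Assumed policy: wait twice as long after each failed attempt.
            sleep(base_delay * 2 ** (attempt - 1))
    return None


attempts = {"count": 0}


def flaky_send() -> str:
    # Fails twice with a transient error, then succeeds.
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise RecoverableError("spoke timed out")
    return "delivered"


errors: List[Tuple[str, str]] = []
print(deliver_with_retry(flaky_send, errors, sleep=lambda seconds: None))  # delivered
print(errors)  # []
```

Passing `sleep` as a parameter keeps the example fast to run; a real deployment would leave the default in place.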
A circuit breaker mechanism is adopted to handle remote calls efficiently. It saves resources by not repeatedly calling a remote system that is already failing, and it mitigates cascading failures across systems.
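For readers unfamiliar with the pattern, a circuit breaker can be sketched as a small state machine with closed, open and half-open states. The version below is a plain-Python illustration, not the hub's implementation; the threshold, timeout and class names are assumptions chosen for the example.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitOpenError(Exception):
    """Raised when a call is skipped because the circuit is open."""


class CircuitBreaker:
    def __init__(
        self,
        failure_threshold: int = 3,
        reset_timeout: float = 30.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None

    @property
    def state(self) -> str:
        if self.opened_at is None:
            return "closed"
        if self.clock() - self.opened_at >= self.reset_timeout:
            return "half-open"   # timeout elapsed: allow one trial call
        return "open"

    def call(self, remote_call: Callable[[], T]) -> T:
        if self.state == "open":
            # Fail fast instead of spending resources on a system that is down.
            raise CircuitOpenError("circuit is open; call skipped")
        try:
            result = remote_call()
        except Exception:
            self.failures += 1
            # Open on reaching the threshold, or re-open after a failed trial call.
            if self.opened_at is not None or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()
            raise
        # Success closes the circuit and clears the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

A caller wraps each remote invocation, for example `breaker.call(lambda: adapter.deliver(message))`, and treats `CircuitOpenError` as a signal to queue the message for later delivery rather than retrying immediately.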