Introduction
Siloed applications across an enterprise need to be interconnected to leverage data produced in other trusted source applications. This helps different business units collaborate better, reuse and analyze data, make informed decisions, and ultimately add value to their offerings. Synchronizing data across applications is a crucial exercise in enterprises, and teams spend significant effort building robust integrations. Many types of integration mechanisms are used in industry today to build robust systems that interact with one another, yet challenges remain that need to be addressed.
Data can be tagged as invalid for various reasons: it may be corrupted at the source or in transit, through incorrect processing or network issues. Such invalid data causes systems that depend on real-time data to fail to deliver it to customers on time. The turnaround time to fetch valid data in further attempts delays report delivery and analytics, which in turn slows down informed decision making. In this article, I propose a solution to mitigate this issue.
Widely Used Approach
Source Application: a data source that is trusted across the enterprise.
Middleware: links two or more separate applications; provides the common connection, orchestration, data transformation, and mapping logic for integrating heterogeneous applications.
Target Application: an application that receives data from external sources and stores it.
The common approach in automated real-time integration systems, which suffers from time delays when it encounters invalid data:
Structured data is always subjected to validation testing of each attribute when it is migrated to a new application. Receiving applications are designed and developed to reject the whole record even if a single attribute in that record is invalid. The rejected record goes back to the source application, where it must be inspected and corrected, either automatically or manually. Manual intervention to correct the data is time-consuming and error prone. This delay can make reporting or analytics applications that do not even depend on the invalid attributes lose time unnecessarily.
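This all-or-nothing behavior can be sketched as follows. The field names and validation rule (non-empty required attributes) are hypothetical, chosen only to illustrate how one bad attribute bounces an entire record:

```python
# Sketch of the all-or-nothing validation commonly implemented in
# receiving applications: the whole record is rejected if ANY attribute
# fails its check. Field names and the rule are illustrative assumptions.

REQUIRED_FIELDS = ("customer_id", "order_date", "amount")

def validate_record(record: dict) -> tuple[bool, list[str]]:
    """Return (is_valid, names of attributes that failed validation)."""
    invalid = [f for f in REQUIRED_FIELDS
               if f not in record or record[f] in (None, "")]
    return (len(invalid) == 0, invalid)

def ingest(record: dict) -> str:
    is_valid, invalid = validate_record(record)
    if not is_valid:
        # The whole record bounces back to the source, even though only
        # one attribute is bad -- this is the delay described above.
        return f"REJECTED: invalid attributes {invalid}"
    return "STORED"
```

Note that a record with two valid attributes and one empty one is still rejected in full, forcing consumers that only need the valid attributes to wait.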
Step by step process:
1. Initial data load: the middleware receives the data and finds a certain attribute empty
2. Handling invalid data: the middleware tries to find the data in other source systems if orchestration is enabled
3. Middleware processing: the middleware detects the record as invalid
4. Rejection: the middleware rejects the data back to the source application
5. Data correction at source: the source application corrects the data and pushes it to the middleware. This can take a long time depending on the type of error; manual corrections often require days
6. Reconciled data in target system: once corrected, the data is pushed again to the middleware
7. The middleware validates the data and, if valid, pushes it to the target application
8. The target application validates the data and, if valid, stores it
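The eight steps above can be sketched as a single round trip. All functions here are stand-ins: the alternative-source lookup, the source-side correction, and the field values are assumptions, and the days-long manual correction of step 5 is collapsed into an instant call so the control flow stays visible:

```python
# Hedged sketch of the round-trip in steps 1-8. Source, middleware, and
# target are stand-in functions; correction happens instantly here,
# whereas the article notes it often takes days in practice.

def lookup_other_sources(record: dict, field: str):
    # Step 2: orchestration may fill the gap from another trusted source.
    # In this sketch no alternative source is available.
    return None

def correct_at_source(record: dict, field: str) -> dict:
    # Step 5: automated or manual correction at the source application.
    record[field] = "corrected-value"
    return record

def middleware_pipeline(record: dict):
    trail = []                                     # audit of what happened
    for attempt in (1, 2):                         # initial load + resubmission
        missing = [k for k, v in record.items() if v in (None, "")]
        if not missing:
            trail.append("pushed-to-target")       # steps 7-8
            return record, trail
        field = missing[0]                         # step 1/3: invalid attribute
        filled = lookup_other_sources(record, field)
        if filled is not None:                     # step 2 succeeded
            record[field] = filled
            continue
        trail.append(f"rejected:{field}")          # steps 3-4
        record = correct_at_source(record, field)  # steps 5-6
    return record, trail

record, trail = middleware_pipeline({"customer_id": "C1", "region": ""})
# trail records one full rejection round trip before the data
# finally reaches the target application
```

Every entry in `trail` before `pushed-to-target` represents a round trip back to the source, which is exactly where the reporting and analytics delay accumulates.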
Commonly used integration mechanism
Issues in Current Approaches
My Solution Proposal
Invalid data flagging design
Design implementation
Conclusion
1. Real-time integration of applications
2. Structured data migration to other applications
3. Real-time reporting and analytics applications that depend on external applications for data