Breaking Down Data Silos

Integrating Information in Merged and Acquired Organizations

Tony Seno Hartono
4 min read · Jun 14, 2023

The Bahasa Indonesia version can be read here.

At the moment, I am addressing a silo issue within a large organization that arose from a previous merger and acquisition. I am sharing the strategies I use to dismantle these silos and integrate the affected departments, in the hope that they will be useful to others facing similar challenges.

Data Silo Issues

Mergers and acquisitions can lead to data silos because every time a new company is acquired, its data must be integrated with the existing company’s data. Each integration can inject more data quality issues and defects into the combined environment. Companies may not have time to cleanse the incoming data during migration, except where cleansing is strictly required to make it fit the existing structure.

Furthermore, software applications have historically been developed to solve one particular aspect of a business problem and have rarely considered the impact on other business processes or data consumers. The result is years of accumulated, distributed business applications with disparate rules. Different systems might contain multiple instances of the same customer record, each with different details and transactions linked to it. For these reasons, most companies suffer from fragmented and inconsistent data, which ultimately leads to data silos.
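
To make the duplicate customer record problem concrete, here is a minimal sketch in Python. It assumes two hypothetical extracts (a CRM and a billing system) and uses a normalized e-mail address as the matching key. The field names, sample values, and the choice of key are illustrative assumptions, not something taken from a real system; actual record matching usually needs more than one attribute.

```python
from collections import defaultdict

# Hypothetical extracts from two systems; all names and values are illustrative.
crm_records = [
    {"id": "C-001", "name": "Siti Rahma", "email": "Siti.Rahma@example.com"},
    {"id": "C-002", "name": "Budi Santoso", "email": "budi@example.com"},
]
billing_records = [
    {"id": "B-917", "name": "S. Rahma", "email": " siti.rahma@example.com "},
    {"id": "B-918", "name": "Dewi Lestari", "email": "dewi@example.com"},
]


def match_key(record):
    """Normalize the e-mail so trivially different spellings compare equal."""
    return record["email"].strip().lower()


def find_cross_system_duplicates(systems):
    """Group records by match key and keep keys found in more than one system."""
    groups = defaultdict(list)
    for system_name, records in systems.items():
        for record in records:
            groups[match_key(record)].append((system_name, record["id"]))
    return {
        key: entries
        for key, entries in groups.items()
        if len({system for system, _ in entries}) > 1
    }


duplicates = find_cross_system_duplicates(
    {"crm": crm_records, "billing": billing_records}
)
print(duplicates)
```

In this sample, the same person appears under two different IDs and two different name spellings, which is exactly the kind of inconsistency that keeps the two systems siloed.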

Examples of Data Silos

Below are the most common examples of data silos that result from mergers and acquisitions:

  1. Varied Data Storage Solutions: Acquired companies may have different data storage systems, such as on-premises servers or cloud platforms. This can result in data fragmentation and difficulties in integrating and analyzing information across the organization.
  2. Incompatible Data Formats: Acquisitions and mergers often bring together disparate systems with different data formats and structures. These inconsistencies can create barriers to data integration and hinder data-driven insights and reporting.
  3. Department-Specific Data Repositories: Each department within the merged organization may maintain its own data repositories or databases, leading to isolated data sources. This lack of data integration limits the ability to gain a holistic view of operations and impedes cross-functional analysis.
  4. Legacy IT Systems: Acquired companies may have legacy IT systems that are outdated and incompatible with the organization’s existing technology infrastructure. This can result in challenges when attempting to integrate and share data between systems.
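
As an illustration of the incompatible data formats problem, the following Python sketch maps records from two hypothetical source systems onto one common schema. The field aliases and the two date formats are assumptions made up for this example; in practice they would come from profiling each source.

```python
from datetime import datetime

# Assumed source formats: one company uses ISO dates, the other DD/MM/YYYY.
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y")

# Hypothetical mapping from source-specific field names to a common schema.
FIELD_ALIASES = {
    "cust_name": "customer_name",
    "CustomerName": "customer_name",
    "signup": "signup_date",
    "SignupDate": "signup_date",
}


def parse_date(value):
    """Try each known format and return the date as an ISO 8601 string."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(value, fmt).date().isoformat()
        except ValueError:
            continue
    raise ValueError(f"Unrecognized date format: {value!r}")


def to_common_schema(record):
    """Rename fields to the common names and normalize the signup date."""
    normalized = {FIELD_ALIASES.get(key, key): value for key, value in record.items()}
    normalized["signup_date"] = parse_date(normalized["signup_date"])
    return normalized


record_a = to_common_schema({"cust_name": "Budi", "signup": "14/06/2023"})
record_b = to_common_schema({"CustomerName": "Dewi", "SignupDate": "2023-06-14"})
print(record_a)
print(record_b)
```

Trying formats in sequence is only safe when the formats cannot be confused with each other. Two slash-separated formats such as DD/MM/YYYY and MM/DD/YYYY are ambiguous, so a real pipeline would normally fix the expected format per source system.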

Data Integration

Data integration is the process of combining data from different data sources to create a unified view of all the data. This involves joining, transforming, enriching, and cleansing the data values to ensure accuracy, consistency, and data quality. The integration can take place physically by copying and storing the data in a single location or virtually by creating references to where the data is originally stored.
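
The difference between physical and virtual integration can be sketched with Python's built-in sqlite3 module. This is only an analogy on a single in-memory database: the two source tables stand in for separate systems, a copied table stands in for physical integration, and a view stands in for virtual integration. All table and column names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE company_a_customers (name TEXT, city TEXT);
    CREATE TABLE company_b_customers (name TEXT, city TEXT);
    INSERT INTO company_a_customers VALUES ('Siti', 'Jakarta');
    INSERT INTO company_b_customers VALUES ('Budi', 'Bandung');

    -- Physical integration: copy the rows into one new table.
    CREATE TABLE customers_physical AS
        SELECT name, city, 'A' AS source FROM company_a_customers
        UNION ALL
        SELECT name, city, 'B' AS source FROM company_b_customers;

    -- Virtual integration: a view that only references the source tables.
    CREATE VIEW customers_virtual AS
        SELECT name, city, 'A' AS source FROM company_a_customers
        UNION ALL
        SELECT name, city, 'B' AS source FROM company_b_customers;
""")

# A new customer appears in one source after the integration was set up.
conn.execute("INSERT INTO company_b_customers VALUES ('Dewi', 'Surabaya')")

physical_count = conn.execute(
    "SELECT COUNT(*) FROM customers_physical"
).fetchone()[0]
virtual_count = conn.execute(
    "SELECT COUNT(*) FROM customers_virtual"
).fetchone()[0]
print(physical_count, virtual_count)
```

The physical copy stays at its original row count until it is refreshed, while the view reflects the new source row immediately, because it reads the source tables every time it is queried.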

Data integration is an essential foundation for building a high-quality repository of trusted and governed data, enabling businesses to identify duplicates, create relationships, build intelligence, minimize maintenance costs, and control access. It is a complex exercise, involving many details and both business and technical challenges, that requires rigorous data management disciplines such as data governance, data stewardship, data quality, and metadata management.

Technology

There are several ways to achieve data integration:

  1. ETL (Extract, Transform, Load), which involves extracting data from multiple source data stores, transforming it, and storing the transformed result in a separate target data store.
  2. ELT (Extract, Load, Transform), which involves extracting data from multiple source data stores, loading it as-is into a target data store, and transforming it there.
  3. Replication, which involves copying data from source data stores to target data stores on a regular basis.
  4. Data virtualization, which involves creating a unified view of data from different data stores without physically copying or moving the data.
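
The difference between ETL and ELT is mainly where the transformation runs. Below is a minimal sketch using Python's built-in sqlite3 module, with an in-memory database standing in for the target data store. The sample rows, table names, and the cleansing rule (trim whitespace, capitalize the city) are assumptions chosen for illustration.

```python
import sqlite3

# Hypothetical raw rows extracted from a source system.
raw_rows = [(" Siti Rahma ", "jakarta"), ("Budi Santoso", " BANDUNG")]


def clean(name, city):
    """Transformation step: trim whitespace and capitalize the city name."""
    return name.strip(), city.strip().title()


def run_etl(rows):
    """ETL: transform in the pipeline, then load only the cleaned rows."""
    warehouse = sqlite3.connect(":memory:")
    warehouse.execute("CREATE TABLE customers (name TEXT, city TEXT)")
    warehouse.executemany(
        "INSERT INTO customers VALUES (?, ?)",
        [clean(name, city) for name, city in rows],
    )
    return warehouse.execute(
        "SELECT name, city FROM customers ORDER BY name"
    ).fetchall()


def run_elt(rows):
    """ELT: load the raw rows first, then transform inside the target with SQL."""
    warehouse = sqlite3.connect(":memory:")
    warehouse.execute("CREATE TABLE raw_customers (name TEXT, city TEXT)")
    warehouse.executemany("INSERT INTO raw_customers VALUES (?, ?)", rows)
    warehouse.execute("""
        CREATE TABLE customers AS
        SELECT TRIM(name) AS name,
               UPPER(SUBSTR(TRIM(city), 1, 1)) || LOWER(SUBSTR(TRIM(city), 2)) AS city
        FROM raw_customers
    """)
    return warehouse.execute(
        "SELECT name, city FROM customers ORDER BY name"
    ).fetchall()


etl_result = run_etl(raw_rows)
elt_result = run_elt(raw_rows)
print(etl_result)
print(elt_result)
```

Both functions end with the same cleaned table. The difference is that the ELT version also keeps the untouched raw_customers table in the target, which is convenient for reprocessing but only useful if the transformation step is actually run.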

Pros and Cons

ETL, ELT, replication, and data virtualization are different approaches used in data integration and management. Here are the pros and cons of each:

  1. ETL (Extract, Transform, Load) is robust, is optimized for large-scale batch processing and data warehousing, and offers data cleansing and validation during the transformation phase. However, it requires significant upfront planning and development effort.
  2. ELT (Extract, Load, Transform) is simpler than ETL because data is loaded first and transformed within the target system. However, it creates a temptation to skip the transformation steps, which can turn the data warehouse into a garbage dump.
  3. Replication is faster than ETL or ELT, but it has higher hardware requirements.
  4. Data virtualization provides independence from the underlying data stores, but it adds an extra layer of software, which requires more CPU.

How to Choose?

When selecting the most appropriate data integration technology, there are several key factors to consider:

  1. It is essential to develop a thorough understanding of data integration, which involves merging data from various sources to create a unified and comprehensive view.
  2. The chosen technology should possess the necessary capabilities to support critical functionalities such as data transformation, profiling, cleansing, and conformation.
  3. Effective data management strategies, including metadata management, data governance, data stewardship, and data quality management, must be taken into account to ensure accurate, consistent, and controlled management of master data across multiple domains.
  4. Given the rapidly evolving nature of data management, the selected technology should align with the organization’s goals and vision for the future of data management and integration. This ensures that the chosen solution remains relevant and adaptable, accommodating changing requirements and emerging trends in data management practices.
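
Data profiling, one of the capabilities mentioned above, can be illustrated with a small Python sketch that counts missing and distinct values per column. The column names and sample rows are hypothetical, and a real profiling tool would report much more (value patterns, ranges, referential checks).

```python
def is_missing(value):
    """Treat None and blank strings as missing."""
    return value is None or str(value).strip() == ""


def profile(rows, columns):
    """Return row count, missing count, and distinct count per column."""
    report = {}
    for column in columns:
        values = [row.get(column) for row in rows]
        present = [value for value in values if not is_missing(value)]
        report[column] = {
            "rows": len(values),
            "missing": len(values) - len(present),
            "distinct": len(set(present)),
        }
    return report


# Hypothetical sample extracted from an acquired company's customer table.
sample_rows = [
    {"customer_id": "C-001", "city": "Jakarta"},
    {"customer_id": "C-002", "city": ""},
    {"customer_id": "C-002", "city": "Bandung"},
    {"customer_id": "C-004"},
]
report = profile(sample_rows, ["customer_id", "city"])
for column, stats in report.items():
    print(column, stats)
```

In this sample, fewer distinct customer IDs than rows hints at duplicates, and the missing city values show where cleansing would be needed before integration.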

Tony Seno Hartono

IT consultant: Cloud, Data Center, Security & Privacy, Tech & Policy, in Government and Healthcare Sectors, worked with World Bank, Microsoft, Cisco & IBM.