Report: Data Warehouses/Lakes Converge, Go Mainstream in the Cloud
The data warehouse architecture and the data lake design pattern have converged to form a new, richer data architecture, a new report says, and both have already gone mainstream in the cloud.
Published May 28 by TDWI, the "Building the Unified Data Warehouse and Data Lake" report examines the convergence of the two architectures, details the drivers, challenges and opportunities for the unified DW/DL, and offers best practices for leveraging the evolving architecture. It's based on a survey of 220 analytics and data professionals conducted early this year.
Having tracked the modernization and evolution of data warehouse (DW) architectures and the more recent emergence of the data lake (DL) design pattern, TDWI said it has seen both the DW and the DL grow in popularity, especially in the cloud. DLs are known for organizing massive volumes of analytics data. According to the report, "The new generation of DWs are, in fact, DLs that are designed, first and foremost, to govern the cleansed, consolidated, and sanctioned data used to build and train machine learning models."
The graphic below illustrates the importance of advanced AI/ML analytics in data lakes:
The DW/DL convergence has resulted in a new and richer data architecture. "The architecture is fairly new, and not many organizations have embraced it yet," the report says. "The majority of respondents to this survey see it as an opportunity because it provides more options for managing an increasingly diverse range of data structures, end user types, and business use cases."
Although the architecture is fairly new, 84 percent of respondents stated that the unified DW/DL was either extremely important (48 percent) or moderately important (36 percent), and 89 percent view it as an opportunity.
As noted at the outset, the cloud also plays a prominent role in the report.
"Data warehouses and data lakes in the cloud are already mainstream," the report says, "with 36 percent of respondents reporting that they had one or the other," as illustrated in this graphic:
"Interestingly, about half of those with data warehouses in the cloud did not yet have a data lake in the cloud and vice versa. This supports the fact that today, many organizations use either the data lake or the data warehouse in the cloud, and a growing number use both. The cloud provides elasticity, scalability, and flexibility. The provider often deals with software and infrastructure management and updates so the IT team does not need to."
Looking at DWs alone, however, on-premises implementations still dominate, with a majority of respondents (53 percent) reporting that they have a data warehouse on premises. "The on-premises data warehouse is a staple for many organizations, especially in large enterprises. We do not expect that to change any time soon. Many fewer enterprises (23 percent) have a data lake on premises. This may be because the first generation of data lakes (often on Hadoop) turned into data swamps because they lacked strong data governance and information life cycle management practices."
Other highlights of the report include:
- The biggest value factors for the new architecture are: silo consolidation (mentioned by 53 percent of respondents); providing a better foundation for analysis of new and traditional data types (49 percent); and storage and cost considerations (28 percent).
- When integrating technologies, companies can implement tools (data pipelines, data catalogs, business glossaries, etc.) and practices in data areas such as data governance, master data management and metadata management.
- AI use cases are the main force driving the evolution from data warehousing to integrated data warehousing/data lakes.
- Modern software tools (even those not specifically designed for data governance) support important practices and help extend data governance capabilities across the enterprise.
- 53 percent of respondents believed that data lakes need more robust data curation, data and model governance, and query optimization capabilities.
- When asked "What is the point of the unified data warehouse and data lake architecture?" the top three answers were "Get more business value from data, whether in operations or analytics" (64 percent); "Unify existing, siloed data environs without consolidating or restructuring them" (53 percent); and "Expand analytics into more advanced forms, such as machine learning and AI" (49 percent).
- The most likely barriers to implementing a data lake that complements and integrates with an existing data warehouse include: lack of data governance (44 percent); inadequate skills for data lake design (29 percent); inadequate skills for designing big data analytics systems (27 percent); and lack of business sponsorship (27 percent).
The report also lists many recommendations for organizations seeking to take advantage of unified DWs/DLs. Those recommendations, which are fleshed out in the report, include:
- Know why you're unifying
- Plan the convergence strategy deliberately
- Architecture is key
- Utilize a phased approach
- Plan for new skills
- Plan for modern pipelining and data engineering tools
- Stay abreast of new technologies
- Don't forget about data governance
- Proactively nurture a better data culture
"Think of the best practices as recommendations that can guide your organization into successful model implementations," says the report, published by TDWI, a sister company to Virtualization and Cloud Review. The free PDF report can be downloaded from TDWI upon providing registration information.
About the Author
David Ramel is an editor and writer at Converge 360.