Data Lake benefits, architecture and its adoption

January 06 2017
Author: v2softadmin
A look at Data Lake benefits, architecture and its adoption

What is a Data Lake?

A data lake is a large storage repository that holds data in its native format. An Enterprise Data Lake (EDL) is a data lake that serves as the storage repository for enterprise-wide information.
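To make "native format" concrete, here is a minimal Python sketch of a raw storage zone. It assumes an in-memory dict in place of real object storage, and the `RawZone` and `RawObject` names and the `raw/<source>/<name>` key layout are hypothetical. Payloads are stored byte-for-byte; only descriptive metadata (source, content type, ingestion time, checksum) is added alongside them.

```python
import hashlib
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class RawObject:
    """One stored object: the untouched bytes plus descriptive metadata."""
    key: str
    payload: bytes      # kept exactly as received, in its native format
    source: str         # label of the (hypothetical) source system
    content_type: str   # e.g. "text/csv" or "application/json"
    ingested_at: str    # UTC timestamp, ISO 8601
    checksum: str       # SHA-256 of the payload, for integrity checks


@dataclass
class RawZone:
    """Toy raw storage layer: a dict stands in for real object storage."""
    objects: dict[str, RawObject] = field(default_factory=dict)

    def ingest(self, source: str, name: str, payload: bytes, content_type: str) -> str:
        # No parsing or conversion happens here; only metadata is added.
        key = f"raw/{source}/{name}"
        self.objects[key] = RawObject(
            key=key,
            payload=payload,
            source=source,
            content_type=content_type,
            ingested_at=datetime.now(timezone.utc).isoformat(),
            checksum=hashlib.sha256(payload).hexdigest(),
        )
        return key

    def read(self, key: str) -> bytes:
        return self.objects[key].payload


lake = RawZone()
k1 = lake.ingest("crm", "customers.json", b'[{"id": 1, "name": "Ada"}]', "application/json")
k2 = lake.ingest("erp", "orders.csv", b"order_id,customer_id\n10,1\n", "text/csv")
print(k1)  # raw/crm/customers.json
```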

What are the benefits of a Data Lake?

With a data lake, data is managed centrally regardless of its source. Once stored, the data can be combined and processed using big data and analytics techniques. Because enterprise information is sensitive, a data lake must implement a proper security mechanism.
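The sketch below illustrates combining data from different sources once it sits in one place. It assumes two hypothetical sources (a JSON customer list and a CSV order export) and hypothetical helper names; each payload is parsed only at the moment it is combined, not when it was stored.

```python
import csv
import io
import json

# Raw payloads as they might arrive from two hypothetical source systems,
# each kept in its native format.
customers_json = b'[{"id": 1, "name": "Ada"}, {"id": 2, "name": "Grace"}]'
orders_csv = b"order_id,customer_id,amount\n10,1,25.0\n11,2,40.0\n12,1,15.5\n"


def load_json(payload: bytes) -> list[dict]:
    return json.loads(payload.decode("utf-8"))


def load_csv(payload: bytes) -> list[dict]:
    return list(csv.DictReader(io.StringIO(payload.decode("utf-8"))))


def spend_per_customer(customers: list[dict], orders: list[dict]) -> dict[str, float]:
    """Join orders to customers and total the order amounts per customer name."""
    names = {c["id"]: c["name"] for c in customers}
    totals: dict[str, float] = {}
    for row in orders:
        name = names[int(row["customer_id"])]  # CSV values arrive as strings
        totals[name] = totals.get(name, 0.0) + float(row["amount"])
    return totals


result = spend_per_customer(load_json(customers_json), load_csv(orders_csv))
print(result)  # {'Ada': 40.5, 'Grace': 40.0}
```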

The security measures inside a data lake grant access to specific information only; without such a grant, a user cannot access the original source content.
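A minimal sketch of this deny-by-default grant model, assuming a simple in-memory mapping from user to dataset keys. The class and function names are hypothetical; a production data lake would rely on its platform's own access-control service, with roles, auditing, and column- or row-level policies, none of which are modeled here.

```python
from dataclasses import dataclass, field


@dataclass
class LakeAccessControl:
    """Deny by default: a user may read a dataset only if explicitly granted."""
    grants: dict[str, set[str]] = field(default_factory=dict)  # user -> dataset keys

    def grant(self, user: str, dataset: str) -> None:
        self.grants.setdefault(user, set()).add(dataset)

    def revoke(self, user: str, dataset: str) -> None:
        self.grants.get(user, set()).discard(dataset)

    def can_read(self, user: str, dataset: str) -> bool:
        return dataset in self.grants.get(user, set())


def read_dataset(acl: LakeAccessControl, store: dict[str, bytes], user: str, dataset: str) -> bytes:
    # The check happens before any bytes are returned.
    if not acl.can_read(user, dataset):
        raise PermissionError(f"{user} has no grant for {dataset}")
    return store[dataset]


store = {"raw/hr/salaries.csv": b"emp,salary\n1,100\n", "raw/erp/orders.csv": b"id\n10\n"}
acl = LakeAccessControl()
acl.grant("analyst", "raw/erp/orders.csv")

print(read_dataset(acl, store, "analyst", "raw/erp/orders.csv"))  # b'id\n10\n'
try:
    read_dataset(acl, store, "analyst", "raw/hr/salaries.csv")
except PermissionError as err:
    print(err)  # analyst has no grant for raw/hr/salaries.csv
```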

Once the content is in the data lake, it can be normalized and enriched. This includes metadata extraction, format conversion, data augmentation, entity extraction, cross-linking, aggregation, de-normalization, or indexing.
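Several of these enrichment steps can be sketched on a single raw CSV payload. This is a toy illustration that assumes a hypothetical `note` free-text column. It shows format conversion (CSV to records), metadata extraction (columns and row count), entity extraction (e-mail addresses found with a simple regular expression, not a full entity extractor), and aggregation (addresses per domain). Augmentation, cross-linking, de-normalization, and indexing are left out.

```python
import csv
import io
import re
from collections import Counter

# Simple e-mail pattern; good enough for a demo, not a full RFC 5322 parser.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def enrich(raw_csv: bytes) -> dict:
    """Normalize one raw CSV payload and attach derived information."""
    # Format conversion: CSV text -> list of dict records.
    records = list(csv.DictReader(io.StringIO(raw_csv.decode("utf-8"))))
    # Metadata extraction: schema and size, taken before any fields are added.
    metadata = {"columns": list(records[0].keys()) if records else [], "row_count": len(records)}
    # Entity extraction: pull e-mail addresses out of the free-text column.
    for rec in records:
        rec["emails"] = EMAIL.findall(rec.get("note", ""))
    # Aggregation: number of extracted addresses per mail domain.
    domains = Counter(addr.split("@", 1)[1] for rec in records for addr in rec["emails"])
    return {"metadata": metadata, "records": records, "domain_counts": dict(domains)}


raw = b"ticket,note\n1,contact ada@example.com\n2,cc bob@example.com and eve@test.org\n"
out = enrich(raw)
print(out["metadata"])       # {'columns': ['ticket', 'note'], 'row_count': 2}
print(out["domain_counts"])  # {'example.com': 2, 'test.org': 1}
```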

Data is prepared “as needed”, which reduces preparation costs compared with up-front processing. A big data compute fabric makes it possible to scale this processing to the largest enterprise-wide data sets.
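One way to read “as needed” preparation is schema-on-read with caching: raw data is parsed only when a consumer first asks for it, and the prepared result is reused afterwards. The sketch below assumes JSON-lines payloads and uses Python's `functools.lru_cache` as a stand-in for a prepared-data zone; the dataset names are hypothetical.

```python
import json
from functools import lru_cache

# Raw zone: payloads stay untouched until someone asks for them.
RAW = {
    "clicks": b'{"user": "u1", "ms": 120}\n{"user": "u2", "ms": 340}\n',
    "logs": b'{"level": "INFO"}\n',
}
prepare_calls: list[str] = []  # records which datasets were actually prepared


@lru_cache(maxsize=None)
def prepared(name: str) -> tuple:
    """Parse a raw JSON-lines dataset on first request, then serve the cached result."""
    prepare_calls.append(name)
    lines = RAW[name].decode("utf-8").splitlines()
    return tuple(json.loads(line) for line in lines if line.strip())


avg_ms = sum(r["ms"] for r in prepared("clicks")) / len(prepared("clicks"))
print(avg_ms)         # 230.0
print(prepare_calls)  # ['clicks']  ("logs" has not been prepared)
```

The cost of parsing is paid once per dataset that is actually used, and never for datasets nobody queries.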

Searching the Data Lake

Data in a data lake is often unstructured and widely varying, and its volume can be very large. In this environment, search is a necessary tool:

  • To find tables that you need - based on table schema and table content
  • To extract sub-sets of records for further processing
  • To work with unstructured (or unknown-structured) data sets
  • And most importantly, to handle analytics at scale
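The first two uses of search in this list can be sketched with a toy catalog, assuming each table is described by its column names and a few sample rows (the table names and data are hypothetical). `find_tables` matches a term against both schema and content, and `extract_subset` pulls matching records out for further processing. A real deployment would query a search engine index rather than scan a dict.

```python
# Hypothetical catalog: table name -> (column names, a few sample rows).
CATALOG = {
    "sales_2016": (["order_id", "region", "amount"], [[1, "EMEA", 20.0], [2, "APAC", 35.5]]),
    "hr_staff": (["emp_id", "name", "region"], [[7, "Ada", "EMEA"]]),
    "web_logs": (["ts", "url", "status"], [["2017-01-06", "/blog", 200]]),
}


def find_tables(term: str) -> list[str]:
    """Return tables whose column names or sample values contain the term (case-insensitive)."""
    term = term.lower()
    hits = []
    for table, (columns, rows) in CATALOG.items():
        in_schema = any(term in col.lower() for col in columns)
        in_content = any(term in str(v).lower() for row in rows for v in row)
        if in_schema or in_content:
            hits.append(table)
    return hits


def extract_subset(table: str, column: str, value) -> list[dict]:
    """Return the rows of one table whose column equals value, as dicts."""
    columns, rows = CATALOG[table]
    idx = columns.index(column)
    return [dict(zip(columns, row)) for row in rows if row[idx] == value]


print(find_tables("region"))  # ['sales_2016', 'hr_staff']  (schema match)
print(find_tables("blog"))    # ['web_logs']  (content match)
print(extract_subset("sales_2016", "region", "APAC"))
# [{'order_id': 2, 'region': 'APAC', 'amount': 35.5}]
```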

Search engines can perform near-real-time analytics at billion-record scale at reasonable cost.

Search engines are the ideal tool for managing the enterprise data lake because:
  1. Search engines are easy to use – Everyone knows how to use a search engine.
  2. Search engines are schema-free – Schemas do not need to be pre-defined. Search engines can handle records with varying schemas in the same index.
  3. Search engines naturally scale to billions of records.
  4. Search can sift through wholly unstructured content.
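Schema-free indexing can be illustrated with a tiny inverted index in Python. This is a simplified sketch, not how any particular search engine is implemented: every field value of any record is tokenized, so records with different fields, and free text, share one index. Each token is posted both on its own and scoped to its field (`field:token`), and multi-term queries intersect the posting sets (AND semantics). The class name and query syntax are hypothetical.

```python
import re
from collections import defaultdict

TOKEN = re.compile(r"\w+")


class SchemaFreeIndex:
    """Tiny inverted index: any record shape is accepted; every field value is tokenized."""

    def __init__(self) -> None:
        self.docs: list[dict] = []
        self.postings: dict[str, set[int]] = defaultdict(set)

    def add(self, record: dict) -> int:
        doc_id = len(self.docs)
        self.docs.append(record)
        for name, value in record.items():
            for tok in TOKEN.findall(str(value).lower()):
                self.postings[tok].add(doc_id)                      # free-text match
                self.postings[f"{name.lower()}:{tok}"].add(doc_id)  # field-scoped match
        return doc_id

    def search(self, query: str) -> list[dict]:
        """AND together whitespace-separated terms; a term is 'word' or 'field:word'."""
        terms = query.lower().split()
        if not terms:
            return []
        ids = set.intersection(*(self.postings.get(t, set()) for t in terms))
        return [self.docs[i] for i in sorted(ids)]


idx = SchemaFreeIndex()
idx.add({"type": "order", "region": "EMEA", "amount": 20})
idx.add({"type": "ticket", "text": "Customer in EMEA reports a late delivery"})
idx.add({"sensor": "t-17", "reading": 21.5})

print(len(idx.search("emea")))    # 2  (matches across different schemas)
print(idx.search("region:emea"))  # [{'type': 'order', 'region': 'EMEA', 'amount': 20}]
print(idx.search("late delivery"))
```

Lookups touch only the posting sets for the query terms instead of scanning every record; production engines add relevance ranking, compressed postings, and sharding on top of this idea.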

Data Lake Architecture

(Figure: data lake architecture)

Data Lake Adoption

Data lake adoption is driven by the following findings:

  1. Data lakes are increasingly recognized as both a viable and a compelling component of a data strategy, and both small and large companies continue to adopt them.
  2. Governance and security are still top-of-mind as key challenges and success factors for the data lake.