A data lake is a large storage repository that holds data in its native format. An Enterprise Data Lake (EDL) is a data lake that serves as an enterprise-wide information storage repository.
In a data lake, data is managed centrally irrespective of its source. Once stored, the data can be combined and processed using big data and analytics techniques. Because enterprise information is sensitive, a proper security mechanism must be implemented in the data lake.
The security measures inside the data lake grant access to specific information only; without such a grant, a user has no access to the original source content.
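As a rough illustration, the sketch below models this kind of grant-based access in plain Python; the user names, dataset paths, and the GRANTS table are hypothetical placeholders, not part of any specific data lake product.

```python
# Minimal sketch of grant-based access control in a data lake (hypothetical names).

# Each user is granted access only to specific datasets inside the lake.
GRANTS = {
    "analyst_a": {"sales/orders", "sales/customers"},
    "analyst_b": {"marketing/campaigns"},
}

def can_read(user: str, dataset: str) -> bool:
    """Return True only if the user holds an explicit grant for the dataset."""
    return dataset in GRANTS.get(user, set())

# Without a grant, the original source content stays out of reach.
assert can_read("analyst_a", "sales/orders")
assert not can_read("analyst_b", "sales/orders")
```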
Once the content is in the data lake, it can be normalized and enriched. This includes metadata extraction, format conversion, data augmentation, entity extraction, cross-linking, aggregation, de-normalization, and indexing.
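To make two of these steps concrete, here is a small sketch, assuming pandas and pyarrow are available, that extracts basic metadata from a raw CSV file and converts it to Parquet; the file paths and column handling are illustrative only.

```python
# Sketch of two enrichment steps: metadata extraction and format conversion.
# Assumes pandas and pyarrow are installed; paths are illustrative.
import os
import pandas as pd

raw_path = "lake/raw/orders.csv"             # content landed in its native format
curated_path = "lake/curated/orders.parquet"

df = pd.read_csv(raw_path)

# Metadata extraction: capture basic facts about the raw file for the catalog.
metadata = {
    "source_file": raw_path,
    "size_bytes": os.path.getsize(raw_path),
    "row_count": len(df),
    "columns": list(df.columns),
}

# Format conversion: store a columnar copy for efficient downstream analytics.
df.to_parquet(curated_path, index=False)
```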
Data is prepared “as needed,” reducing preparation costs compared with up-front processing. A big data compute fabric makes it possible to scale this processing to the largest enterprise-wide data sets.
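One common way to realize “as needed” preparation is schema-on-read on a distributed compute engine. The sketch below uses PySpark as an illustrative fabric; the schema, path, and column names are assumptions, not a prescribed design.

```python
# Sketch of schema-on-read: raw JSON stays untouched in the lake, and the
# schema is applied only when the data is actually queried (PySpark assumed).
from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, StringType, DoubleType

spark = SparkSession.builder.appName("schema-on-read-sketch").getOrCreate()

# Schema is defined at read time, not at ingestion time (illustrative fields).
order_schema = StructType([
    StructField("order_id", StringType()),
    StructField("region", StringType()),
    StructField("amount", DoubleType()),
])

orders = spark.read.schema(order_schema).json("lake/raw/orders/")

# Preparation happens only for the question being asked right now.
orders.groupBy("region").sum("amount").show()
```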
Data in a data lake is unstructured and widely varying, and its volume is very large. In this environment, search is a necessary tool:
Only search engines can perform real-time analytics at billion-record scale with reasonable cost.
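As an illustration of that role, the snippet below posts an aggregation query to a search engine over HTTP; an Elasticsearch-style query DSL is assumed, and the endpoint, index name, and field names are placeholders.

```python
# Sketch of a search-driven, near-real-time rollup over documents indexed from
# the lake. The endpoint, index, and fields are placeholder assumptions.
import json
import requests

query = {
    "query": {"match": {"status": "shipped"}},               # full-text style filter
    "aggs": {"by_region": {"terms": {"field": "region"}}},   # real-time rollup
    "size": 0,                                               # return aggregates only
}

resp = requests.post(
    "http://localhost:9200/orders/_search",                  # Elasticsearch-style API
    headers={"Content-Type": "application/json"},
    data=json.dumps(query),
)

for bucket in resp.json()["aggregations"]["by_region"]["buckets"]:
    print(bucket["key"], bucket["doc_count"])
```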
Data lakes are adopted because of the following findings: