Simply deploying a data lake isn't enough to deliver a big ROI from big data. Loading massive data into Hadoop while only a few experts can make sense of it throttles the value you can derive. A data lake should be a data marketplace: open to all, yet secure.
Enterprise data is moving from "big" data to "huge" data: more of it, more complex, and ever-changing. You need a single view of your data assets to be able to "find the needle in the stack of needles."
Unlocking value needs to happen fast, in minutes, not months. Finding an SME or searching across data repositories is slow and error-prone. You need to automate the cataloging of data assets to quickly capture and reuse the tribal data knowledge of business users.
You can't provide free-for-all access for every business user; top-down compliance and data access policies have to be in place.
Alex Gorelik, author of the forthcoming O'Reilly book "The Enterprise Big Data Lake," discusses considerations and best practices for turning data lakes into data marketplaces, with examples from some of the world's leading big data enterprises.
Topics include:
- How to start and grow a data lake and enterprise data marketplace
- Setting up different tiers of data: from raw, untreated landing areas to carefully managed and summarized data
- How to enable self-service that helps users find, understand, and provision data, and how to provide different interfaces for users with different skill levels
- Staying in compliance with enterprise data governance policies