S3: Building a Modern Big Data Enterprise: Hadoop, Spark, and Beyond

In this tutorial you will learn about modern Big Data technologies. We briefly explain the core ideas of Big Data and then focus on the technologies used to process and manipulate it. We start with Hadoop and its reliable distributed file system, HDFS. We show how to do Big Data analytics with Hive, Pig, and popular alternatives. We then introduce Spark, which addresses weaknesses of Hadoop and enables high-performance, near-real-time processing, including the processing of data streams. We conclude with an overview of notable members of the Hadoop ecosystem, such as HBase, Flume, and Sqoop, as well as newer systems such as Kafka and Flink.

Big Data Differences
Hadoop Architecture
Programming Hadoop
Spark: “Lightning-Fast Cluster Computing”
The Hadoop Ecosystem
Big Data in the Enterprise

Dr. Vladimir Bacvanski has over two decades of engineering experience with mission-critical and distributed enterprise systems and data technologies. Vladimir has helped a number of organizations, including the US Treasury, the Federal Reserve Bank, the US Navy, IBM, Dell, Hewlett Packard, JP Morgan Chase, General Electric, BAE Systems, and AMD, to select, transition to, and apply new software and data technologies.

Vladimir is published worldwide and is a keynote speaker, session chair, and workshop organizer at leading industry events. As a founder of SciSpike, Vladimir focuses on Big Data technologies and highly scalable reactive software architectures with Node.js and Scala. Vladimir is the author of the O'Reilly course on Big Data and NoSQL.