Paweł holds a PhD in distributed databases, and his interests focus on making Big Data easy. He has seven years of technical experience at Allegro and currently works as a Hadoop Product Owner in the Big Data Solutions Team. The team develops and maintains a petabyte-scale Hadoop cluster with endpoints such as Apache Kafka messaging.
The shell command line is surely the best user interface in the world. Unfortunately, some disagree and avoid anything that requires a terminal.
At Allegro we operate a petabyte-scale, secured Hadoop cluster used by more than two hundred of our employees. In this talk we present our experience in building a user-friendly big data ecosystem.
This will include:
* Jupyter Spark notebooks to write and run Spark jobs from a web browser,
* Hue webapp for executing Hive queries and scheduling Oozie workflows,
* Spark deployment platform integrated within Atlassian Bamboo,
* Hadoop desktop client to access HDFS from workstations,
* Active Directory integration.
All the presented solutions are built on top of open source projects.