Deep Dive into Sparkling Water, Jakub Háva - H2O World San Francisco

This session was recorded in San Francisco on February 4th, 2019 and can be viewed here: https://youtu.be/QSWYCDV-hhs

Bio: Jakub (or “Kuba” as we call him) completed his Bachelor’s Degree in Computer Science and Master’s Degree in Software Systems at Charles University in Prague. For his bachelor’s thesis, Kuba wrote a small platform for distributed computing of arbitrary types of tasks. During his master’s degree studies, he developed a cluster monitoring tool for JVM-based languages which makes debugging and reasoning about the performance of distributed systems easier using a concept called distributed stack traces. Kuba enjoys dealing with problems and learning new programming languages. At H2O.ai, Kuba works on Sparkling Water. Aside from programming, Kuba enjoys exploring new cultures and bouldering. He’s also a big fan of tea preparation and the associated ceremony.
Transcript
  • 1. Sparkling Water Hands-On Jakub Hava Senior Software Engineer H2O.ai https://www.linkedin.com/in/havaj/ #H2OWORLD
  • 2. MEET THE MAKERS
      • MICHAL MALOHLAVA: VP of Engineering at H2O.ai and creator of Sparkling Water
      • JAKUB HAVA: Senior Software Engineer at H2O.ai on the Sparkling Water project
      • MICHAL KURKA: Head of H2O and Senior Software Engineer at H2O.ai
  • 3. Sparkling Water
      • Transparent integration of H2O with the Spark ecosystem
          o Spark and H2O side-by-side
      • Transparent use of H2O data structures and algorithms with the Spark API
      • Excels in existing Spark workflows requiring advanced Machine Learning algorithms
      • Deployment tool for Driverless AI MOJOs
      Functionality missing in H2O can be replaced by Spark and vice versa
  • 4. Benefits
      • Additional algorithms
      • NLP
      • Powerful data munging
      • SQL
      • ML Pipelines
      • Advanced algorithms
      • Speed vs. accuracy
      • Advanced parameters
      • Fully distributed and parallelised
      • Graphical environment
      • R/Python interface
  • 5. Model Building (diagram): Data Source -> Data munging -> Modelling (Deep Learning, GBM, DRF, GLM, GLRM, K-Means, PCA, CoxPH, Ensembles) -> Prediction processing
  • 6. Data Munging (diagram): Data Source -> Data load/munging/exploration -> Modelling
  • 7. Stream Processing (diagram)
      • Off-line model training: Data Source -> Data munging -> Modelling -> Export model as MOJO
      • Stream processing: Data Stream -> Spark Streaming/Storm/Flink -> Deploy the model -> Model prediction
  • 8. Stream Processing 2 (diagram)
      • Off-line model training: Data Source -> Data munging -> Modelling -> Export model as DAI MOJO
      • Stream processing: Data Stream -> Spark Streaming/Storm/Flink -> Deploy the model -> Model prediction
  • 9. Scoring
      • MOJO
          o Model Object Optimised
      • MOJO Pipeline
          o MOJO from DAI with additional feature transformations
      • No runtime dependency on H2O/Driverless AI frameworks
  • 10. New Features Overview
      • Integration with Driverless AI MOJO
      • Continuous Integration with Spark Pipelines
      • Enterprise Security (LDAP, Kerberos/Kerberized Clusters)
      • Integration with Enterprise Steam
      • PySparkling Conda Packages
      • Stabilization & Client’s Speedup
  • 11. Machine Learning Pipelines
      • Wrap our algorithms as Transformers and Estimators
      • Support for embedding them into Spark ML Pipelines
      • Can serialise fitted/unfitted pipelines
      • Unified API => Arguments are set in the same way for Spark and H2O Models
      • Integration with H2O-3 & Driverless AI MOJO pipelines
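
The Estimator/Transformer pattern named on slide 11 can be sketched in plain Python. This is only an illustration of the idea, not the Sparkling Water or Spark ML API: every class name here (`Tokenizer`, `KeywordEstimator`, `KeywordModel`, `Pipeline`, `PipelineModel`) and the toy spam data are hypothetical. An Estimator learns from data and returns a fitted Transformer; a pipeline chains both kinds of stage behind the same `fit`/`transform` calls.

```python
class Transformer:
    """A stage that maps rows to rows."""
    def transform(self, rows):
        raise NotImplementedError


class Estimator:
    """A stage that learns from rows and returns a fitted Transformer."""
    def fit(self, rows):
        raise NotImplementedError


class Tokenizer(Transformer):
    """Stateless transformer: splits the 'text' field into lowercase tokens."""
    def transform(self, rows):
        return [dict(r, tokens=r["text"].lower().split()) for r in rows]


class KeywordModel(Transformer):
    """Fitted model: predicts 'spam' if any token is a learned spam word."""
    def __init__(self, spam_words):
        self.spam_words = spam_words

    def transform(self, rows):
        return [
            dict(r, prediction="spam" if self.spam_words & set(r["tokens"]) else "ham")
            for r in rows
        ]


class KeywordEstimator(Estimator):
    """Toy estimator: learns the words that occur only in rows labelled 'spam'."""
    def fit(self, rows):
        spam, ham = set(), set()
        for r in rows:
            (spam if r["label"] == "spam" else ham).update(r["tokens"])
        return KeywordModel(spam - ham)


class PipelineModel(Transformer):
    """A fitted pipeline: every stage is a Transformer."""
    def __init__(self, stages):
        self.stages = stages

    def transform(self, rows):
        for stage in self.stages:
            rows = stage.transform(rows)
        return rows


class Pipeline(Estimator):
    """An unfitted pipeline: fits each Estimator on the output of earlier stages."""
    def __init__(self, stages):
        self.stages = stages

    def fit(self, rows):
        fitted = []
        for stage in self.stages:
            if isinstance(stage, Estimator):
                stage = stage.fit(rows)  # replace the estimator by its fitted model
            rows = stage.transform(rows)
            fitted.append(stage)
        return PipelineModel(fitted)


train = [
    {"text": "Win a free prize now", "label": "spam"},
    {"text": "Lunch at noon", "label": "ham"},
]
model = Pipeline([Tokenizer(), KeywordEstimator()]).fit(train)
predictions = model.transform([{"text": "Free prize inside"}, {"text": "Noon meeting"}])
print([p["prediction"] for p in predictions])
```

Because a fitted pipeline is itself a Transformer, it can be embedded as a stage in a larger pipeline, which is the property that lets H2O algorithms sit next to Spark stages.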
  • 12. Sparkling Water Roadmap
      Timeline labels on the slide: FY18 QTR 4, FY19 QTR 1, FY19 QTR 2, FUTURE, with a COMPLETED marker. The assignment of items to quarters is not recoverable from the transcript.
      ▪ Stability fixes for client mode
      ▪ Full API parity for H2O Algos in Pipelines
      ▪ Monitoring aspects for Spark pipelines
      ▪ K8S support
      ▪ Deployment templates (e.g., AWS EMR)
      ▪ Steam Integration
      ▪ Official Support for multi-node XGBoost
      ▪ Job Queues
      ▪ Support for Spark 2.4
      ▪ Integration with Spark 3 (Hydrogen)
      ▪ Computation backend for DAI
      ▪ Running H2O on subset of Spark executors
      ▪ Sparse Data Handling Speedup
      ▪ H2O-3 Algos as on-demand service (up-down-up)
      ▪ H2O-3 AutoML clever autoscaling
  • 13. Ecosystem
  • 14. H2O.ai Machine Intelligence Architecture
  • 15. Internal Backend (diagram components): Sparkling App jar file; spark-submit; Spark Master JVM; Spark Worker JVM (x3); Sparkling Water Cluster; Spark Executor JVM (x3); H2O (x3)
  • 16. External Backend (diagram components): Sparkling App jar file; spark-submit; Spark Master JVM; Spark Worker JVM (x3); Sparkling Water Cluster; Spark Executor JVM (x3); H2O (x4)
  • 17. DEMO TIME!
  • 18. The Topic
      • The goal of this demo is to train the pipeline in PySpark
      • The resulting pipeline will be exported into a language-independent format
      • The stored pipeline will be deployed in Scala as part of a streaming app
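
The train, export, deploy flow outlined on slide 18 can be sketched without Spark or H2O. This is a simplified stand-in under stated assumptions: JSON plays the role of the language-independent format (it is not the MOJO format), a Python generator plays the role of the stream, and all function names and the toy `amount`/`label` data are hypothetical. The point is that the scoring side rebuilds the predictor from the exported artifact alone, with no dependency on the training code.

```python
import json


def train_threshold_model(rows):
    """Offline training: use the midpoint between the class means of 'amount'."""
    positives = [r["amount"] for r in rows if r["label"] == 1]
    negatives = [r["amount"] for r in rows if r["label"] == 0]
    threshold = (sum(positives) / len(positives) + sum(negatives) / len(negatives)) / 2
    return {"feature": "amount", "threshold": threshold}


def export_model(model):
    """Export to a language-independent text artifact (JSON here, not a real MOJO)."""
    return json.dumps(model)


def load_scorer(artifact):
    """Scoring side: rebuild a predict function from the artifact alone."""
    spec = json.loads(artifact)
    feature, threshold = spec["feature"], spec["threshold"]
    return lambda record: int(record[feature] > threshold)


def score_stream(records, scorer):
    """Score records one at a time as they arrive from the stream."""
    for record in records:
        yield dict(record, prediction=scorer(record))


training_rows = [
    {"amount": 10.0, "label": 0},
    {"amount": 20.0, "label": 0},
    {"amount": 90.0, "label": 1},
    {"amount": 110.0, "label": 1},
]
artifact = export_model(train_threshold_model(training_rows))

# In the demo the artifact would be loaded in a separate Scala streaming app;
# here the same process loads it back to keep the sketch self-contained.
scorer = load_scorer(artifact)
incoming = iter([{"amount": 12.5}, {"amount": 95.0}])
scored = list(score_stream(incoming, scorer))
print(scored)
```

Any language with a JSON parser could implement `load_scorer`, which mirrors how an exported model can be trained in one language and scored in another.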
  • 19. Resources
      • Documentation: http://docs.h2o.ai
      • Tutorials: https://github.com/h2oai/h2o-tutorials
      • Slidedecks: https://github.com/h2oai/h2o-meetups
      • Videos: https://www.youtube.com/user/0xdata
      • Events & Meetups: http://h2o.ai/events
      • Stack Overflow: https://stackoverflow.com/tags/sparkling-water
      • Google Group: https://tinyurl.com/h2ostream
      • Gitter: http://gitter.im/h2oai/sparkling-water
  • 20. Sparkling Water is an open-source ML application platform combining the power of Spark and H2O. Learn more at h2o.ai. Follow us at @h2oai. Thank you!