Sundar Ranganathan, NetApp + Vinod Iyengar, H2O.ai - Driverless AI integration with NetApp’s Hybrid

  This session was recorded in San Francisco on February 5th, 2019 and can be viewed here: https://youtu.be/nZzHFwaoMpU

  In this presentation, we demonstrate the integration of H2O Driverless AI with NetApp Cloud Volumes Service. We also describe key considerations for developing deep learning environments and the solutions that enable seamless data management across edge environments, on-premises data centers, and the cloud. The presentation is intended for data scientists, data engineers, and line-of-business leaders.

  Vinod comes with over 10 years of marketing and data science experience at multiple startups. He was the founding employee of his previous startup, Activehours, where he helped build the product and bootstrap user acquisition through growth hacking. He has seen the user base of his companies grow from scratch to millions of customers. He has built models to score leads, reduce churn, increase conversion, prevent fraud, and address many other use cases. He brings a strong analytical side and a metrics-driven approach to marketing.
Transcript
  • 1. Driverless AI Integration with NetApp’s Hybrid-Cloud Data Fabric. Sundar Ranganathan, Sr. Product Manager, NetApp. Vinod Iyengar, Business Development, H2O.ai. #H2OWORLD
  • 2. Artificial Intelligence
      - Artificial Intelligence (1950’s): techniques that enable computers to mimic human intelligence
      - Machine Learning (1980’s): ability to learn without being explicitly programmed
      - Deep Learning (2010’s): learning based on deep neural networks
  • 3. DL Model Training Flow
      - Your AI application?
      - Raw data set; data preprocessing (labeling, cleaning, …); data set ready
      - Choose framework (H2O, TensorFlow, Caffe, MXNet, …): simplifies the model-building procedure with APIs; allows connecting layers; allows back propagation; scales out training
      - CNN: image recognition; object detection; ResNet 50, 152, VGG16, Inception V3, AlexNet
      - RNN: time series; result depends on past info; LSTM
      - Deep reinforcement learning: don’t have the correct info; history of moves, self feedback; chess, Go
      - Prepare environment: NGC; download containers for a specific framework; CUDA libraries (TF uses CUDA libs)
      - Python scripts, using framework APIs: read data; build models; feed data into model; train
      - Training/inference: save trained model; push for inferencing; push back to adjust weights
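The script steps on slide 3 (read data, build models, feed data into the model, train, save the trained model, push for inferencing) can be sketched without any framework. The following is a minimal illustration in plain Python, not how H2O, TensorFlow, or any other framework named on the slide implements training. The two-layer network, the toy data set, and every function name here (`build_model`, `forward`, `train`, `save_model`, `load_model`) are assumptions made for the example.

```python
import json
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def build_model(n_in, n_hidden, seed=0):
    """Build step: two connected layers with small random weights."""
    rng = random.Random(seed)
    return {
        "w1": [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_hidden)],
        "b1": [0.0] * n_hidden,
        "w2": [rng.uniform(-1, 1) for _ in range(n_hidden)],
        "b2": 0.0,
    }


def forward(model, x):
    """Feed one sample through the hidden layer (tanh) and the output (sigmoid)."""
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(model["w1"], model["b1"])]
    out = sigmoid(sum(w * h for w, h in zip(model["w2"], hidden)) + model["b2"])
    return hidden, out


def mean_loss(model, data):
    """Mean binary cross-entropy over the data set."""
    total = 0.0
    for x, y in data:
        _, p = forward(model, x)
        p = min(max(p, 1e-12), 1 - 1e-12)  # avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(data)


def train(model, data, epochs=500, lr=0.1):
    """Train step: per-sample gradient descent with back propagation."""
    for _ in range(epochs):
        for x, y in data:
            hidden, p = forward(model, x)
            d_out = p - y  # loss gradient at the output pre-activation
            for j, h in enumerate(hidden):
                # Propagate the error back through the tanh hidden unit
                # before the output weight is updated.
                d_hid = d_out * model["w2"][j] * (1 - h * h)
                model["w2"][j] -= lr * d_out * h
                model["b1"][j] -= lr * d_hid
                for i, xi in enumerate(x):
                    model["w1"][j][i] -= lr * d_hid * xi
            model["b2"] -= lr * d_out
    return model


def save_model(model, path):
    """Save the trained model so it can be pushed for inferencing."""
    with open(path, "w") as f:
        json.dump(model, f)


def load_model(path):
    """Load a saved model on the inference side."""
    with open(path) as f:
        return json.load(f)


if __name__ == "__main__":
    # Toy data set (logical OR), standing in for a prepared training set.
    data = [([0.0, 0.0], 0), ([0.0, 1.0], 1), ([1.0, 0.0], 1), ([1.0, 1.0], 1)]
    model = build_model(n_in=2, n_hidden=3)
    print("loss before training:", mean_loss(model, data))
    train(model, data)
    print("loss after training:", mean_loss(model, data))
```

In a real deployment the framework supplies the layers, the gradients, and scale-out training; the sketch only shows where each step of the flow sits.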
  • 4. ML Model Training Flow
      - Your AI application?
      - Raw data set; data preprocessing (labeling, cleaning, …); data set ready
      - Choose framework or tools: H2O; Matlab; R or SAS; RAPIDS; TensorFlow, Caffe2
      - Choose algorithm (depends on the application):
          SVM (Support Vector Machine): image classification; regression analysis
          Random Forest: ensemble learning; for classification and regression
          K-Means: clustering
          Lasso: regression, e.g. stock price prediction
          Deep Learning
      - Prepare environment
      - Python scripts, using framework APIs: read data; build models; feed data into model; train
      - Training/inference: save trained model; push for inferencing; push back to adjust weights
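As one concrete instance of the "choose algorithm" step on slide 4, here is a minimal K-Means sketch in plain Python. It covers a cleaning step, training, and inference on a new point. The function names (`clean`, `kmeans`, `predict`) and the toy data are assumptions for illustration; a real workflow would use one of the frameworks listed on the slide rather than hand-written code.

```python
import random


def clean(rows):
    """Preprocessing step: drop rows that contain missing values."""
    return [tuple(r) for r in rows if all(v is not None for v in r)]


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def predict(centroids, point):
    """Inference step: index of the centroid closest to the point."""
    return min(range(len(centroids)),
               key=lambda c: squared_distance(point, centroids[c]))


def kmeans(points, k, iterations=20, seed=0):
    """Training step: alternate assignment and centroid update."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct data points
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[predict(centroids, p)].append(p)
        for c, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(sum(dim) / len(members)
                                     for dim in zip(*members))
    return centroids


if __name__ == "__main__":
    raw = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (None, 5.0),
           (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
    points = clean(raw)
    centroids = kmeans(points, k=2)
    for p in points:
        print(p, "-> cluster", predict(centroids, p))
```

The fixed iteration count keeps the sketch short; production implementations usually stop when the assignments no longer change.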
  • 5. Deployment Considerations
      - Where? Cloud; On-Premises; Hybrid
      - Using? AI Frameworks; Software Tools; APIs
      - How? Data Mgmt.; Training; Inference
  • 6. Data is Key, but also the Problem
  • 7. NetApp ONTAP AI
      - Proven Architecture: GPUs @ 95% util.; ~600us latency
      - Simple to Deploy: Ansible automation; container support via NGC
      - Performance and Scalability: 300 GB/s throughput; 1 DGX to 100s of DGXs
      - Integrated Data Pipeline: edge to core to cloud; end-to-end data management
  • 8. Start Small, Scale Big

      Model      HA pairs     Throughput**  Raw capacity (typical)  Raw capacity (w/ expansion)  Connectivity
      AFF A800   1 HA pair*   25GB/s        364.8TB                 6.6PB                        100GbE
      AFF A800   4 HA pairs   100GB/s       1.5PB                   26.4PB                       100GbE
      AFF A800   12 HA pairs  300GB/s       4.4PB                   79.2PB                       100GbE
      AFF A700s  1 HA pair    18GB/s        367.2TB                 6.6PB                        40GbE
      AFF A700s  4 HA pairs   72GB/s        1.5PB                   26.4PB                       40GbE
      AFF A700s  12 HA pairs  216GB/s       4.4PB                   79.2PB                       40GbE
      AFF A300   1 HA pair    9.7GB/s       182.4TB                 11.7PB                       40GbE
      AFF A300   4 HA pairs   38.8GB/s      729.6TB                 46.8PB                       40GbE
      AFF A300   12 HA pairs  116GB/s       2.2PB                   140.4PB                      40GbE
      AFF A220   1 HA pair    5.3GB/s       91.2TB                  4.4PB                        10GbE
      AFF A220   4 HA pairs   21.2GB/s      364.8TB                 17.6PB                       10GbE
      AFF A220   12 HA pairs  63.6GB/s      1.1PB                   52.8PB                       10GbE

      * 1 AFF = 1 HA pair = 2 Nodes
      ** Based on ONTAP 9.4, NAS workloads
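The throughput figures on slide 8 scale linearly with the number of HA pairs: 25GB/s × 4 = 100GB/s and 25GB/s × 12 = 300GB/s for the AFF A800, for example. The short sketch below reproduces that arithmetic. The per-pair values come from the slide; the function names, and the assumption that linear scaling also holds for pair counts the slide does not list, are illustrative only.

```python
import math

# Per-HA-pair throughput in GB/s, as listed on the slide
# (ONTAP 9.4, NAS workloads).
PER_PAIR_GBPS = {
    "AFF A800": 25.0,
    "AFF A700s": 18.0,
    "AFF A300": 9.7,
    "AFF A220": 5.3,
}


def aggregate_throughput(model, ha_pairs):
    """Aggregate throughput in GB/s, assuming linear scaling per HA pair."""
    return round(PER_PAIR_GBPS[model] * ha_pairs, 1)


def pairs_needed(model, target_gbps):
    """Smallest number of HA pairs that reaches the target throughput."""
    return math.ceil(target_gbps / PER_PAIR_GBPS[model])


if __name__ == "__main__":
    for model in PER_PAIR_GBPS:
        figures = [aggregate_throughput(model, n) for n in (1, 4, 12)]
        print(model, figures)
```

For the AFF A300 the product is 116.4GB/s at 12 HA pairs, which the slide shows as 116GB/s.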
  • 9. DL & ML on ONTAP AI
      - [Chart: ML with NVIDIA RAPIDS using XGBoost. Time in seconds (shorter is better) for Local Storage (RAID), In DGX-1 Memory, and A800, broken down into Data Load + Feature Engineering, Data Conversion, and Machine Learning Training.]
      - [Chart: DL with image dataset, 1 to 7 node DGX-1. ResNet50, batch 256; images/sec versus number of DGX-1 systems; series: Linear Scale, Mem Cached, A800.]
  • 10. Edge to Core to Cloud: Seamless data management
      - [Diagram spanning Edge, Core, and Cloud. Stage labels: Ingest; Data prep.; Unified data lake; Training cluster; Training sets; Test; Deployment; IM1, IM2, IM3; Repo; Analysis / Tiering.]
      - Activities: Aggregation; Normalization; Data collection; Edge-level AI; Exploration; Training; Deployment; Model serving; Cloud AI (GPU instances); Data tiering
      - Products: ONTAP Select; Cloud Sync; SnapMirror; ONTAP AI; FabricPool; Cloud Sync; Cloud Volumes Service; NetApp Private Store
  • 11. NetApp Cloud Volumes
      - Cloud native NFS & SMB file services with extreme performance
      - Consistent storage capabilities across cloud providers (AWS, Azure, and GCP)
      - Any workload in the cloud
      - From zero to 100 TB deployed in seconds
      - Pay-as-you-go to shift expense from CAPEX to OPEX
      - Data protection without application performance effect
      - Cloning of DevOps workspaces instantly for faster releases
      - Replication across geographies for robust collaboration*
  • 12. Cloud Volumes Service for AI
      - Performance: maintains all GPUs at >95% utilization; achieves maximum training rates
      - Quick: scale from 0 to 100TB in seconds; no specialized training required
      - Integration: integrates w/ NVIDIA GPU Cloud containers and AWS EC2; integration with other NetApp services like Cloud Sync
      - Easy: easy on-demand deployment with API support; single payment model through AWS
  • 13. H2O.ai on NetApp Cloud Volumes
  • 14. Driverless AI: Automates Data Science and ML Workflows. [Diagram labels: Features; Target; Data Quality and Transformation; Modeling Table; Model Building; Model; Data Integration.]
  • 15. AI Architecture: Machine Learning Workflow. [Diagram labels: Data Sources; Modeling (Train, Test); Deployment (Production Data, Model Store, Batch Scoring, Real-time Scoring); dev/ops; Infrastructure (On-Prem, Cloud).]
  • 16. AI Architecture: Machine Learning Workflow. [Same diagram as slide 15, with Infrastructure shown as On-Prem, Cloud, Hybrid.]
  • 17. Integration of H2O Driverless AI with NetApp Cloud Volumes: deploy and collaborate on multiple instances of H2O Driverless AI with NetApp Cloud Volumes
      What’s available today (as of Feb. 2019)
      - Multiple instances of Driverless AI launched with the same Cloud Volume mounted
      - Automatic snapshots: if Driverless AI goes down, all data and experiments are saved and snapshotted
      - Easy start/stop of Driverless AI with shared experiments and data
      - Shared data access management
      Roadmap
      - True multi-node support across multiple Driverless AI instances
      - Hybrid cloud + on-prem deployments with shared experiments and datasets
      - Integration with Stackpoint to launch Driverless AI instances on Kubernetes
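The idea behind slide 17, several instances working against one mounted volume so that experiments written by one are visible to the others, can be illustrated with a small simulation. This is not the Driverless AI API or its on-disk format: the directory layout, the file name `summary.json`, and the functions `save_experiment` and `list_experiments` are hypothetical, and a temporary directory stands in for the mounted Cloud Volume.

```python
import json
import os
import tempfile


def save_experiment(volume_path, name, summary):
    """Write an experiment summary under the shared volume (hypothetical layout)."""
    exp_dir = os.path.join(volume_path, "experiments", name)
    os.makedirs(exp_dir, exist_ok=True)
    with open(os.path.join(exp_dir, "summary.json"), "w") as f:
        json.dump(summary, f)


def list_experiments(volume_path):
    """Return the names of all experiments found on the shared volume."""
    root = os.path.join(volume_path, "experiments")
    if not os.path.isdir(root):
        return []
    return sorted(os.listdir(root))


if __name__ == "__main__":
    # A temporary directory stands in for the path where every instance
    # would mount the same Cloud Volume.
    with tempfile.TemporaryDirectory() as shared_volume:
        # "Instance A" finishes an experiment and saves it to the volume.
        save_experiment(shared_volume, "churn_model", {"owner": "instance-a"})
        # "Instance B", started later with the same mount, sees it.
        print(list_experiments(shared_volume))
```

Because the state lives on the volume rather than on the instance, stopping or losing an instance does not remove the saved experiments, which is the property the slide attributes to the shared mount and its snapshots.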
  • 18. Resources (#NetAppAI, netapp.com/ai)
      Technical Whitepapers
      - Solution brief SB-3939
      - ONTAP AI Reference architecture NVA-1121-design
      - ONTAP AI Deployment guide NVA-1121-deploy
      - Edge to Core to Cloud white paper WP-7271
      - AI with GPUs on AWS & Cloud Volumes Service TR-4718
      - Scalable AI Infrastructure WP-7267
      - Designing data pipeline for your AI workflows WP-7264
      - IDC Technology Spotlight paper
      - Cambridge Consultants success story
      AI Blogs
      - AI across industries: Manufacturing, Telecom, & Healthcare
      - Bridging the CPU and GPU Universes
      - Is Your Infrastructure Ready for AI Workflows in Production?
      - Accelerate I/O for Your Deep Learning Pipeline
      - Addressing AI Data Lifecycle Challenges with Data Fabric
      - Choosing an Optimal Filesystem for the AI Pipeline
      - NVIDIA GTC 2018: New GPUs and Data Storage for AI
      - Five Advantages of ONTAP AI for AI and Deep Learning
      - Deep Dive into ONTAP AI Performance and Sizing
      - Land of Robots meets AI at GTC Japan 2018