This interview was recorded for the GOTO Book Club. #GOTOcon #GOTObookclub
http://gotopia.tech/bookclub
Read the full transcription of the interview here:
https://gotopia.tech/bookclub/episodes/234/Scaling-Machine-Learning-with-Spark
Adi Polak - VP of Developer Experience at Treeverse & Contributing to lakeFS OSS @polakadi
Holden Karau - Co-Author of "Kubeflow for Machine Learning" & many more books & Open Source Engineer at Netflix @HoldenKarau
RESOURCES
Adi
https://adipolak.substack.com
https://mastodon.online/@adipolak
https://blog.adipolak.com
https://www.linkedin.com/in/
-adi-polak-68548365
Holden
https://www.twitch.tv/holdenkarau
https://tech.lgbt/@holden
http://www.holdenkarau.com
DESCRIPTION
Learn how to build end-to-end scalable machine learning solutions with Apache Spark. With this practical guide, author Adi Polak introduces data and ML practitioners to creative solutions that supersede today's traditional methods. You'll learn a more holistic approach that takes you beyond specific requirements and organizational goals--allowing data and ML practitioners to collaborate and understand each other better.
Scaling Machine Learning with Spark examines several technologies for building end-to-end distributed ML workflows based on the Apache Spark ecosystem with Spark MLlib, MLflow, TensorFlow, and PyTorch. If you're a data scientist who works with machine learning, this book shows you when and why to use each technology.
You will:
• Explore machine learning, including distributed computing concepts and terminology
• Manage the ML lifecycle with MLflow
• Ingest data and perform basic preprocessing with Spark
• Explore feature engineering, and use Spark to extract features
• Train a model with MLlib and build a pipeline to reproduce it
• Build a data system to combine the power of Spark with deep learning
• Get a step-by-step example of working with distributed TensorFlow
• Use PyTorch to scale machine learning and its internal architecture
* Book description: © O'Reilly:
https://www.oreilly.com/library/view/scaling-machine-learning/9781098106812
The interview is based on the book "Scaling Machine Learning with Spark": https://amzn.to/3ppdUkB
TIMECODES
00:00 Intro
02:25 Lead with the tools & resources you have
04:06 The Apache Spark ecosystem
08:44 Book chapter overview
12:22 Exploring the glue spaces in ML & data engineering
19:18 Navigating the trade-offs of distributed ML
29:37 Challenges of keeping up with Open Source software
35:22 Can we expect another book?
38:11 Outro
RECOMMENDED BOOKS
Adi Polak • Scaling Machine Learning with Spark • https://amzn.to/3ppdUkB
Holden Karau, Trevor Grant, Boris Lublinsky, Richard Liu & Ilan Filonenko • Kubeflow for Machine Learning • https://amzn.to/3JVngcx
Holden Karau • Distributed Computing 4 Kids • https://www.distributedcomputing4kids.com
Holden Karau • Scaling Python with Dask • https://www.oreilly.com/library/view/scaling-python-with/9781098119867
Holden Karau & Boris Lublinsky • Scaling Python with Ray • https://amzn.to/44GU6cC
Holden Karau & Rachel Warren • High Performance Spark • https://amzn.to/3v2eLbn
Holden Karau, Andy Konwinski, Patrick Wendell & Matei Zaharia • Learning Spark • https://amzn.to/397e2NE
Holden Karau & Krishna Sankar • Fast Data Processing with Spark 2nd Edition • https://amzn.to/3xKhXKu
Holden Karau • Fast Data Processing with Spark 1st Edition • https://amzn.to/3rHQgOu
https://www.linkedin.com/company/goto-
https://www.facebook.com/GOTOConferences
#Spark #ApacheSpark #ML #MachineLearning #MLlib #TensorFlow #PyTorch #DataScience #AI #ComputerScience #AdiPolak #HoldenKarau #Programming #SoftwareEngineering
Looking for a unique learning experience?
Attend the next GOTO conference near you! Get your ticket at https://gotopia.tech
SUBSCRIBE TO OUR CHANNEL - new videos posted almost daily.
https://www.youtube.com/user/GotoConferences/?sub_confirmation=1