Senior ML Ops Engineer
About the Role
Join the leader in entertainment innovation and help us design the future. At Dolby, science meets art, and high tech means more than computer code. As a member of the Dolby team, you'll see and hear the results of your work everywhere, from movie theaters to smartphones. We continue to revolutionize how people create, deliver, and enjoy entertainment worldwide. To do that, we need the absolute best talent. We offer a collegial culture, challenging projects, and excellent compensation and benefits, not to mention a Flex Work approach that is truly flexible to support where, when, and how you do your best work.
Responsibilities
- Troubleshoot high-performance computing, storage, and networks for machine-learning workloads.
- Collaborate with research, development, and engineering to establish machine-learning and data-management workflows, along with the supporting tools and processes, that maximize machine-learning productivity and resource utilization.
- Improve capabilities for dataset exploration, transformation, and overall management of large to very large datasets.
- Collaborate with research and development to proactively iterate on and fine-tune model training for the best performance and most efficient use of machine-learning resources.
- Collaborate with infrastructure teams, including physical compute, storage, and network infrastructure experts, to enhance on-premises and cloud infrastructure.
- Improve the use of cloud compute and storage for global research teams while staying within budget.
Education and Experience
- BS or MS degree in Computer Science or equivalent experience.
- 4+ years of professional hands-on experience in machine learning operations or equivalent.
- Comprehensive knowledge of AWS and infrastructure-as-code techniques.
- Advanced proficiency with Python, Terraform, CloudFormation, Ansible, Git, and related technologies.
- Experience with machine learning and with scaling workloads across both cloud and on-premises GPU server environments.
- Experience managing and coordinating storage for large machine-learning datasets.
- Proficiency in Kubernetes cluster design, deployment, and management.
- An interest in, and understanding of, industry trends in machine-learning development techniques, tools, and processes.
- Comprehensive knowledge of continuous integration and continuous delivery (CI/CD) processes and tools.
Recommended Skills
- Exceptional understanding of, and practical experience with, software and infrastructure configuration management for high-performance compute and storage, with an emphasis on maximizing high availability.
- An active collaborator who helps build a positive community with researchers, scientists, and engineers around machine learning operations and resources.
- AWS resource management and provisioning.
About the Company
Dolby Laboratories Inc. is seeking a Machine Learning Operations Engineer to join our Consumer Entertainment Group, enhancing audio and video experiences.