Skip to main content

Cloud Computing

Welcome to the world of cloud and hybrid computing with SchedMD!

In the dynamic world of high performance computing, cloud computing emerges as a game changer, providing unparalleled flexibility, scalability, and efficiency. Slurm seamlessly integrates HPC capabilities with the limitless possibilities of cloud technology, extending the potential of your environment.

High performance computing for Machine Learning - SchedMD
Cloud Computing Slurm Software - SchedMD

How Can Slurm Help Streamline My HPC Experience?

Slurm is not just an on-prem software solution, but can help sites harness the power of the Cloud. With auto-scaling capabilities, Slurm can automate elastic scaling of instances according to factors like queue depth and job requirements and spin resources down once idle time is reached.

Slurm can also support hybrid clusters to dynamically offload jobs using auto-scaling functionality. Configure existing Slurm controllers, without additional cluster requirements, to burst nodes into specific cloud projects.

Slurm for the Cloud

High throughput scalability icon

High Throughput & Scalability

Slurm can easily manage performance requirements for small , large , and massive cloud environment needs. Slurm outperforms competitive schedulers with consistent execution of 500 batch jobs per second. Pair Slurm’s scalability feature with cloud technology and your computing limits become unprecedented.

Complex business rules icon

Complex Business Rules

Slurm can map to complex business rules and existing organizational priorities, easily establish data governance policies and ensure compliance with industry standards. Our plugin-based architecture makes Slurm adaptable to on prem and cloud conditions to maintain the needs of your individual organization.

First class gpu icon

First Class GPUs

With first class resource management for GPUs, Slurm allows users to request GPU resources alongside CPUs. This flexibility ensures that jobs are executed quickly and efficiently, while maximizing resource utilization. Maintain this efficiency in both on-prem and cloud environments.

Take Your Computing to the Next Level

Join us on the journey to computational excellence – where innovation meets efficiency, and where Slurm becomes the catalyst for unlocking the full potential of your HPCl workflows. Welcome to a new era of performance and productivity with Slurm!

Praise for SchedMD Support

“We have been a SchedMD support customer for seven years. They’ve always given timely, high quality responses.”

Technical University of Denmark

Recent Articles & Publications

March 26, 2024

Slurm releases move to a six-month cycle

February 21, 2024

Common Questions About Slurm

February 21, 2024

How to Use Common Slurm Commands

Cloud Computing FAQs

What security measures does Slurm have in place?

With job and resource isolation capabilities, Slurm allows administrators to define partitions, ensuring that jobs run independently of one another. Partitions ensure sensitive research data is only processed and stored within designated and controlled environments. These isolations help prevent unauthorized access and reduce the risk of data leaks and tampering.

Other checkpoints include comprehensive logging and auditing which tracks user activity, ensuring accountability and traceability in data handling processes. Administrators can also enforce controls and limit access to sensitive research data based on user roles and permissions.

What documentation, training and support resources are available for admin and end users?

SchedMD has a number of services available including:

  • Support contracts
  • On-site trainings
  • Consultations hours
  • Custom development
  • Configuration Assistance
  • Migration Assistance/Proof of Concept

Administrators and users can review Slurm documentation and more information on SchedMD Services.

Does Slurm have any cloud/hybrid capabilities?

Cloud bursting in Slurm is a feature that allows a Slurm cluster to expand its computing resources into a cloud environment, meeting increased demand for computing resources. When the on-premises resources of a Slurm cluster are insufficient, bursting allows the cluster to temporarily extend its capacity by utilizing cloud resources. This can help organizations manage peak workloads without having to invest in and maintain additional physical hardware

Slurm can be configured to work with various cloud providers such as Amazon Web Services (AWS), Microsoft Azure, Google Cloud Platform (GCP), and more. It uses cloud APIs to create and manage virtual machines (VMs) in the cloud.

Slurm ensures a seamless experience for users. If a job starts on the on-premises cluster and then needs to burst to the cloud, it can be migrated without user intervention.

How does Slurm utilize GPUs?

With first class resource management for GPUs, Slurm allows users to request GPU resources alongside CPUs. This flexibility ensures that jobs are executed quickly and efficiently, while maximizing resource utilization.

Slurm provides features and flexibility that allow for effective GPU resource management including resource allocation, scheduling policies, GPU partitioning, and GPU reporting and monitoring.

It’s important to note that the exact behavior of Slurm in managing GPUs can be customized through its configuration files and policies, making it flexible for various HPC cluster setups.

What monitoring and reporting features does Slurm offer?

Slurm has multiple features and commands in place to help administrators and end users monitor cluster activity, track resource utilization, diagnose performance issues, and integrate with monitoring systems. Learn more about features like squeue, scontrol, sinfo, and more in our Common Slurm Commands blog.

Does Slurm support containerized applications in Life Sciences research?

Slurm can support and interact with containers in various ways to manage and execute jobs efficiently.

Slurm supports multiple container runtimes (Docker, Singularity, Shifter) and can be integrated with container orchestrators (Kubernetes, Docker Swarm). Slurm will allocate resources based on job submission requirements and manage the execution of jobs within containers using the specified runtime. The integrated container orchestrator handles the deployment and management of containers across the cluster.

Containers provide isolation between jobs running on the same node, preventing interference and conflicts. Slurm ensures that containers are properly isolated and securely managed within the HPC cluster environment.

Slurm’s support for containers provides users with flexibility in managing and executing jobs in HPC environments, allowing them to leverage container technologies to enhance productivity and resource utilization.

How does Slurm integrate with my site’s existing software and industry tools?

Slurm utilizes REST API, opening a wide array of possibilities for a site’s HPC environment. REST API enables Slurm’s integration with existing software and industry preferred tools. Examples of REST API integrations include:

  • Workflow Management systems to orchestrate complex data processing pipelines
  • Data analytics platforms to efficiently distribute computational tasks across clusters, dynamically allocating and scaling resources based on workload demands.
  • Container orchestration tools to allow users to deploy containerized applications as jobs, manage resources allocation, and scale container instances.
  • Monitoring and logging systems to provide administrators with real-time insights on cluster performance, resource utilization, and job execution.

REST API serves as a versatile integration mechanism to enable seamless communication between SLurm and a wide array of tools, empowering users to leverage their HPC resources to the fullest power.

Organize Your Workload Efficiently & Smoothly with SchedMD

Take your efficiency to the next level with Slurm from SchedMD. We can’t wait to do amazing things with you.

Request a Technical Call Today
Slurm Workload Manager - Download Slurm - SchedMd