News

Slurm version 17.11.0 is now available

After 9 months of development and testing we are pleased to announce the availability of Slurm version 17.11.0!

As usual this can be downloaded from here.

Thanks again for all the help and support to get this out the door. It was fun to see most of you at SC17!

Slurm version 17.11.0rc3 is now available

We are pleased to announce the availability of Slurm version 17.11.0-0rc3 (release candidate 3).

The release candidate series marks the end of feature development for each release and the finalization of the RPC layer. Except for bug fixes developed during the RC time frame, it will be functionally identical to the 17.11.0 release when made available.

Please install and test this to help find any issues during the rc stage before the 17.11.0 release at the end of November.

Downloads are available here.

Slurm version 16.05.11, 17.02.9, and 17.11.0rc2 are now available

Slurm versions 16.05.11, 17.02.9 and 17.11.0rc2 are now available, and include a series of recent bug fixes as well as a fix for a recently discovered security vulnerability (CVE-2017-15566).

Downloads are available here.

Ryan Day (LLNL) reported an issue in SPANK environment variable handling that could allow any normal user to execute code as root during the Prolog or Epilog. All systems using a Prolog or Epilog script are vulnerable, regardless of whether SPANK plugins are in use.

This issue affects all Slurm versions from 15.08.0 (August 2015) to present. This issue was reported to SchedMD on October 16th. SchedMD customers were informed on October 17th and provided a patch on request. This is in keeping with our responsible disclosure process.

The only mitigation, aside from installing a patched version, is to disable both Prolog and Epilog settings on your system and restart all slurmd processes.
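As a sketch of that mitigation, the Prolog and Epilog settings can be commented out of slurm.conf; the paths shown below are illustrative, not taken from this announcement:

```shell
# slurm.conf -- comment out any Prolog/Epilog lines until a patched
# version of Slurm is installed (paths shown are illustrative):
#Prolog=/etc/slurm/prolog.sh
#Epilog=/etc/slurm/epilog.sh
```

After distributing the updated slurm.conf, every slurmd process must still be restarted for the change to take effect.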

Release notes follow below. Please note that support for the 16.05 release series ends in November as support for the upcoming 17.11 release starts, and as such 16.05.11 will be the final maintenance update for that branch.

Note that 17.11.0rc2 is the second release candidate for the 17.11 series, and is not considered a stable release suited for production use. We do encourage sites to test this out, and report issues ahead of the 17.11.0 release in November.

Slurm version 17.11.0-0rc1 is now available

We are pleased to announce the availability of Slurm version 17.11.0-0rc1 (release candidate 1). This release marks the end of development on the next major version of Slurm, 17.11; only bug fixes will be added to this branch going forward.

We anticipate another rc release before a .0 is tagged in November. Please install and test this to help find any issues during the rc stage.

Slurm can be downloaded from here.

Slurm version 17.02.8 available

We are pleased to announce the release of Slurm version 17.02.8, which contains 42 bug fixes developed over the past two months.

Slurm can be downloaded from here.

Slurm versions 17.02.7 and 17.11.0-pre2 are now available

Slurm version 17.02.7 contains about 35 bug fixes developed over the past six weeks.

Slurm version 17.11.0-pre2 is the second pre-release of version 17.11, to be released in November 2017.

Slurm downloads are available from http://www.schedmd.com/#repos

Slurm version 17.02.6 is now available

Slurm version 17.02.6 is now available. It contains several bug fixes, including one which can result in communications between the slurmctld and slurmdbd daemons stopping.

Slurm downloads are available from http://www.schedmd.com/#repos

Slurm versions 17.02.5 and 17.11.0-pre1 are now available

Slurm version 17.02.5 contains 18 bug fixes developed over the past month.

Slurm version 17.11.0-pre1 is the first pre-release of version 17.11, to be released in November 2017. This version contains the support for scheduling of a workload across a set (federation) of clusters which is described in some detail here.

Slurm downloads are available from here.

Slurm version 17.02.4 is now available

We are pleased to announce the release of Slurm version 17.02.4, which contains about 40 fixes developed over the past month.

Slurm can be downloaded from here.

Slurm version 17.02.3 is now available

We are pleased to announce the release of Slurm version 17.02.3, which contains 40 bug fixes developed over the past month.

Slurm can be downloaded from here.

Slurm version 17.02.2 available

We are pleased to announce the release of Slurm version 17.02.2, which contains 49 bug fixes developed over the past month.

Slurm can be downloaded from here.

Slurm versions 17.02.1 and 16.05.10 are now available

We are pleased to announce the release of versions 17.02.1 and 16.05.10.

Version 17.02.1 contains 19 bug fixes discovered over the past week including a deadlock in the slurmctld daemon. Version 16.05.10 contains 30 relatively minor bug fixes discovered over the past 5 weeks. Future changes to version 16.05 will be limited to more significant bugs with our focus being shifted to version 17.02.

Both versions can be downloaded from here.

Slurm version 17.02.0 is now available

After 9 months of development we are pleased to announce the availability of Slurm version 17.02.0.

For a description of what has changed please consult the RELEASE_NOTES file available in the source.

Slurm downloads are available here.

Slurm versions 16.05.9 and 17.02.0-0rc1 are now available

We are pleased to announce the availability of Slurm versions 16.05.9 and 17.02.0-0rc1 (release candidate 1).

16.05.9 contains around 25 rather minor bug fixes. Please upgrade at your leisure.

The rc release contains all of the features intended for release 17.02. Development has ended for this release and we are continuing with our testing phase, which will most likely result in another rc before we tag 17.02.0 near the middle of February. A description of what this release contains is in the RELEASE_NOTES file available in the source. Your help in hardening this version is greatly appreciated, and you are invited to download it and assist in testing. As with all rc releases, you should be able to install it without worrying about protocol or state changes going forward within this version.

Slurm downloads are available from here.

Slurm versions 15.08.13, 16.05.8, and 17.02.0-pre4 are now available

Slurm versions 15.08.13, 16.05.8 and 17.02.0-pre4 are now available, and include a series of recent bug fixes as well as a fix for a recently discovered security vulnerability (CVE-2016-10030).

During a code review on a recent commit, a vulnerability was discovered in how the slurmd daemon informs users of a Prolog failure on a compute node. That vulnerability could allow a user to assume control of an arbitrary file on the system. Any exploitation of this is dependent on the user being able to cause or anticipate the failure (non-zero return code) of a Prolog script that their job would run on.

This issue affects all Slurm versions from 0.6.0 (September 2005) to present. This issue was discovered on December 16th. SchedMD customers were informed on December 21st and provided a version of the fix on request.

Workarounds to prevent exploitation of this are to either disable your Prolog script, or modify it so that it always returns 0 ("success") and adjust it to set the node down using scontrol instead of relying on the slurmd to handle that automatically. If you do not have a Prolog set, you are unaffected by this issue.
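A minimal sketch of the second workaround follows. The function form, the REAL_PROLOG path, and the exact scontrol invocation are illustrative assumptions, not code from this announcement; it relies on SLURMD_NODENAME, which slurmd sets in the Prolog environment:

```shell
# Illustrative wrapper around a site's real prolog script. It always
# returns 0, so slurmd never sees a prolog failure; instead it marks
# the node down itself via scontrol when the real prolog fails.
prolog_wrapper() {
    real_prolog="${REAL_PROLOG:-/etc/slurm/prolog.real}"  # illustrative path
    if ! "$real_prolog"; then
        # Mark this node down ourselves rather than letting slurmd
        # react to a non-zero prolog exit code.
        scontrol update NodeName="$SLURMD_NODENAME" State=DOWN \
            Reason="prolog failure"
    fi
    return 0  # always report success to slurmd
}
```

The key point is the unconditional success return: slurmd never observes the failure, so the vulnerable error path is never exercised.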

Downloads are available here.

Slurm version 16.05.7 is now available

We are pleased to announce the immediate availability of Slurm 16.05.7. It contains about 40 relatively minor bug fixes.

Slurm downloads are available from here.

Slurm versions 16.05.6 and 17.02.0-pre3 are now available

Slurm version 16.05.6 is now available and includes around 40 bug fixes developed over the past month.

We have also made the third pre-release of version 17.02, which is under development and scheduled for release in February 2017.

Slurm downloads are available from here.

We are excited to see you all next month at SC16, please feel free to come by our booth #412.

The Slurm BoF will be Thursday, November 17th, 12:15pm - 1:15pm in room 355-E.

More information about that can be found here.

Slurm versions 16.05.5 and 17.02.0-pre2 are now available

Slurm version 16.05.5 is now available and includes about 50 bug fixes developed over the past six weeks. We have also made the second pre-release of version 17.02, which is under development and scheduled for release in February 2017. Slurm downloads are available from http://www.schedmd.com/#repos.

Slurm version 16.05.4 is now available

Slurm version 16.05.4 is now available and includes about 30 bug fixes developed over the past few weeks.

Slurm downloads are available from here.

Slurm 16.05.3 and 17.02.0-pre1 are now available

Slurm version 16.05.3 is now available and includes about 30 bug fixes developed over the past few weeks. We have also released the first pre-release of version 17.02, which is under development and scheduled for release in February 2017. A description of the changes in each version is appended.

Slurm downloads are available from here.

Slurm version 16.05.2 is now available

Slurm version 16.05.2 is now available and includes 16 bug fixes developed over the past week, including two which can cause the slurmctld daemon to crash. Slurm downloads are available from http://www.schedmd.com/#repos.

Slurm version 16.05.1 is now available

Slurm version 16.05.1 is now available and includes 40 bug fixes developed over the past month. Slurm downloads are available from http://www.schedmd.com/#repos.

CFP: Slurm User Group Meeting

You are invited to submit an abstract of a tutorial, technical presentation or site report to be given at the Slurm User Group Meeting 2016. This event is sponsored and organized by SchedMD and the Greek Research and Technology Network (GRNET) and will be held in Athens, Greece on 26-27 September 2016. This international event is open to those who want to:

  • Learn more about Slurm, a highly scalable Resource Manager and Job Scheduler
  • Share their knowledge and experience with other users and administrators
  • Get detailed information about the latest features and developments
  • Share requirements and discuss future developments

Everyone who wants to present their own usage, developments, site report, or tutorial about Slurm is invited to send an abstract to slugc@schedmd.com.

Slurm 16.05.0 and 15.08.12 are now available

We are pleased to announce the release of 16.05.0! It contains many new features and performance enhancements. Please read the RELEASE_NOTES file to get an idea of the new items that have been added. The online Slurm documentation has been updated to reflect this release.

We have also released one of the last tags of 15.08 in the form of 15.08.12.

Both versions can be downloaded from the normal spot.

Slurm version 16.05.0-rc2 available

Slurm version 16.05.0-rc2 (Release Candidate 2) is now available and includes about 11 bug fixes developed over the past week. Slurm downloads are available from http://www.schedmd.com/#repos.

Slurm versions 15.08.11 and 16.05.0-rc1 now available

We are pleased to announce the availability of Slurm versions 15.08.11 and 16.05.0-rc1 (release candidate 1).

15.08.11 contains around 25 rather minor bug fixes. Please upgrade at your leisure.

The rc release contains all of the features intended for release 16.05. Development has ended for this release and we are continuing with our testing phase which will most likely result in another rc before we tag 16.05.0 near the end of the month. A description of what this release contains is in the RELEASE_NOTES file available in the source. Your help in hardening this version is greatly appreciated. You are invited to download this version and assist in testing.

Slurm downloads are available from here.

Slurm version 15.08.10 now available

Slurm version 15.08.10 is now available and includes 10 bug fixes developed over the past week including a race condition that could cause the slurmctld daemon to crash. Details about the changes are listed in the distribution's NEWS file. Slurm downloads are available from http://www.schedmd.com/#repos.

Slurm versions 15.08.9 and 16.05.0-pre2 now available

Slurm version 15.08.9 is now available and includes about 40 bug fixes developed over the past six weeks. Details about the changes are listed in the distribution's NEWS file. Slurm version 16.05.0-pre2 is also available and includes new development for the next major release in May. Slurm downloads are available from http://www.schedmd.com/#repos.

Slurm versions 15.08.8 and 16.05.0-pre1 now available

Slurm version 15.08.8 is now available and includes about 30 bug fixes developed over the past four weeks. Details about the changes are listed in the distribution's NEWS file. Slurm version 16.05.0-pre1 is also available and includes new development for the next major release in May. Slurm downloads are available from http://www.schedmd.com/#repos.

Slurm version 15.08.7 is now available

We are pleased to announce the availability of Slurm version 15.08.7. It contains 46 relatively minor bug fixes you may find interesting. Slurm downloads are available from here.

Slurm version 15.08.6 is now available

We are pleased to announce the availability of Slurm version 15.08.6. This release is primarily in response to the regression in 15.08.5 with respect to finding the lua library. It also contains a few other minor bug fixes you may find interesting. Slurm downloads are available from here.

We hope everyone has a great holiday and thanks for a great year!

Slurm version 15.08.5 is now available

We are pleased to announce the availability of Slurm version 15.08.5 which includes about 30 bug fixes developed over the past few weeks as listed below. Slurm downloads are available from here.

Slurm version 15.08.4 is now available

Slurm version 15.08.4 is now available and includes about 25 bug fixes developed over the past couple of weeks.

One notable fix is found in commits 8e66e2677 and d72f132d42 which will fix a slurmctld bug in which a pending job array could be canceled by a user different from the owner or the administrator. This appears to exist in the 15.08.* as well as the 14.11.* branches.

It is recommended you update at your earliest convenience. If upgrading isn't an option generating a patch from those 2 commits is recommended.
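As a sketch of how such a patch could be generated, assuming a clone of the Slurm git repository containing those two commits (the commands are illustrative of the general approach, not taken from this announcement):

```shell
# Generate standalone patch files for the two fixes named above
# (run inside a clone of the Slurm repository):
git format-patch -1 8e66e2677    # writes one 0001-*.patch file
git format-patch -1 d72f132d42
# The resulting files can then be applied to a 15.08 or 14.11
# source tree, e.g. with "git am" or "patch -p1".
```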

Details about the changes are listed in the distribution's NEWS file. Slurm downloads are available from here.

See you all at SC15 next week, Slurm booth #1851!

Slurm version 15.08.3 now available

Slurm version 15.08.3 includes about 25 bug fixes developed over the past couple of weeks. Details about the changes are listed in the distribution's NEWS file. Slurm downloads are available from http://www.schedmd.com/#repos.

Slurm version 15.08.2 now available

Slurm version 15.08.2 includes about 40 bug fixes developed over the past four weeks. Details about the changes are listed in the distribution's NEWS file. Slurm downloads are available from http://www.schedmd.com/#repos.

Slurm version 15.08.1 is now available

We are pleased to announce the availability of Slurm version 15.08.1 with about 40 bug fixes to 15.08.0. A list of changes is available in the NEWS file. Slurm downloads are available from http://www.schedmd.com/#repos.

Slurm versions 15.08.0 and 14.11.9 have been released!

We are pleased to announce the release of 15.08.0! It contains many new features and performance enhancements. Please read the RELEASE_NOTES file to get an idea of the new items that have been added. The on-line Slurm documentation has been updated to reflect this release.

We have also released one of the last tags of 14.11 in the form of 14.11.9.

Both versions can be downloaded from the normal spot here.

Slurm version 15.08.0-0rc1 is now available

We are pleased to announce the availability of Slurm version 15.08.0-rc1 (release candidate 1). This version contains all of the features intended for release 15.08 (with the exception of some minor burst buffer work) and we are moving into a testing phase. You are invited to download this version and assist in testing.

Slurm downloads are available from here.

If you would like to find out more about these new features and others, please join us at the Slurm User Group meeting.

Slurm versions 14.11.8 and 15.08.0-pre6 are now available

Slurm version 14.11.8 includes about 30 relatively minor bug fixes developed over the past seven weeks while version 15.08.0-pre6 contains new development scheduled for release next month. Details about the changes are listed in the distribution's NEWS file. Slurm downloads are available from http://www.schedmd.com/#repos.

Slurm versions 14.11.7 and 15.08.0-0pre5 now available

Slurm version 14.11.7 is now available with quite a few bug fixes.

A development tag for 15.08 (pre5) has also been made. It represents the current state of Slurm development for the release planned in August 2015 and is intended for development and test purposes only. One notable enhancement included is the idea of Trackable Resources (TRES) for accounting for cpu, memory, energy, GRES, licenses, etc.

Both are available for download here.

Slurm version 14.11.6 is now available

Slurm version 14.11.6 is now available with quite a few bug fixes. See the distribution's NEWS file for details. Slurm downloads are available from http://www.schedmd.com/#repos.

Slurm versions 14.11.5 and 15.08.0-pre3 are now available

Version 14.11.5 contains quite a few bug fixes generated over the past five weeks, including two high-impact bugs. There is a fix for the slurmdbd daemon aborting if a node is set to a DOWN state and its "reason" field is NULL. The other important bug fix will prevent someone from being able to kill a job array belonging to another user.

Version 15.08.0-pre3 represents the current state of Slurm development for the release planned in August 2015 and is intended for development and test purposes only. Notable enhancements include power capping support for Cray systems and the ability for a compute node to be allocated to multiple jobs, but restricted to one user at a time.

Both versions can be downloaded from here.

Slurm versions 14.11.4 and 15.08.0-pre2 are now available

Version 14.11.4 contains quite a few bug fixes generated over the past five weeks. Several of these are related to job arrays, including one that can cause the slurmctld daemon to abort.

Version 15.08.0-pre2 represents the current state of Slurm development for the release planned in August 2015 and is intended for development and test purposes only. It includes some development work for burst buffers, power management, and inter-cluster job dependencies.

Both versions can be downloaded from here.

Slurm version 14.11.3 is now available

Slurm version 14.11.3 is now available. It includes quite a few bug fixes, most of which are relatively minor. A few more significant issues were also fixed that previously could cause various daemons to seg fault in corner-case scenarios.

Anyone running 14.11 is encouraged to upgrade to 14.11.3. Everyone else is encouraged to do the same :).

The new tarball can be downloaded here.

Slurm versions 14.11.2 and 15.08.0-pre1 are now available

Slurm versions 14.11.2 and 15.08.0-pre1 are now available. Version 14.11.2 includes quite a few relatively minor bug fixes.

Version 15.08.0 is under active development and its release is planned in August 2015. While this is the first pre-release there is already quite a bit of new functionality.

Both versions can be downloaded from here.

Slurm version 14.11.1 is now available

Slurm version 14.11.1 is now available. This includes a fix for a race condition that can deadlock the slurmctld daemon when job_submit plugins are used, plus a few minor changes as identified in the distribution's NEWS file. Slurm downloads are available from http://www.schedmd.com/#repos.

Slurm version 14.11.0 is now available

Slurm version 14.11.0 is now available. This is a major Slurm release with many new features. See the RELEASE_NOTES and NEWS files in the distribution for detailed descriptions of the changes, a few of which are noted below.

Upgrading from Slurm versions 2.6 or 14.03 should proceed without loss of jobs or other state. Just be sure to upgrade the slurmdbd first. (Upgrades from pre-releases of version 14.11 may result in job loss.)

Slurm downloads are available from here.

Thanks to all those who helped make this release!

Highlights of changes in Slurm version 14.11.0 include:
-- Added job array data structure and removed 64k array size restriction.
-- Added support for reserving CPUs and/or memory on a compute node for system use.
-- Added support for allocation of generic resources by model type for heterogeneous systems (e.g. request a Kepler GPU, a Tesla GPU, or a GPU of any type).
-- Added support for non-consumable generic resources that are limited, but can be shared between jobs.
-- Added support for automatic job requeue policy based on exit value.
-- Refactor job_submit/lua interface. LUA FUNCTIONS NEED TO CHANGE! The lua script no longer needs to explicitly load meta-tables, but information is available directly using names slurm.reservations, slurm.jobs, slurm.log_info, etc. Also, the job_submit.lua script is reloaded when updated without restarting the slurmctld daemon.
-- Eliminate native Cray specific port management. Native Cray systems must now use the MpiParams configuration parameter to specify ports to be used for communications. When upgrading Native Cray systems from version 14.03, all running jobs should be killed and the switch_cray_state file (in SaveStateLocation of the nodes where the slurmctld daemon runs) must be explicitly deleted.
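The port range mentioned in the last item is supplied through MpiParams; a minimal slurm.conf fragment might look like the following, with an illustrative range chosen here rather than taken from this announcement:

```shell
# slurm.conf on a native Cray system -- reserve a port range for
# communications now that Slurm no longer manages ports itself.
# The range below is illustrative; pick one appropriate for your site.
MpiParams=ports=20000-32767
```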

Slurm versions 14.03.10 and 14.11.0-rc3 are now available

Slurm version 14.03.10 includes quite a few relatively minor bug fixes, and will most likely be the last 14.03 release. Thanks to all those who helped make this a very stable release.

We hope to officially tag 14.11.0 before SC14. Version 14.11.0-rc3 includes a few bug fixes discovered in recent testing but is looking very stable. Thanks to everyone participating in the testing! If you can, please test this release so we can attempt to fix as many issues as we can before we tag 14.11.0.

Just a heads up: version 15.08 is already starting development; we will most likely tag a pre1 of it later this month.

Slurm downloads are available from here.

Slurm versions 14.03.9 and 14.11.0-rc2 are now available

Version 14.03.9 includes quite a few relatively minor bug fixes. Version 14.11.0-rc2 includes a few bug fixes discovered in recent testing. Thanks to everyone participating in the testing! Version 14.11.0 is no longer under active development, but is undergoing testing for a planned release in early November. Slurm downloads are available from http://www.schedmd.com/#repos.

Slurm versions 14.03.8 and 14.11.0-pre5 are now available

Slurm versions 14.03.8 and 14.11.0-pre5 are now available. Version 14.03.8 includes quite a few relatively minor bug fixes.

Version 14.11.0 is under active development and its release is planned in November 2014. Many of its features and performance enhancements will be discussed next week at SLUG 2014 in Lugano, Switzerland.

Note to all developers, code freeze for new features in 14.11 will be at the end of this month (September).

Slurm downloads are available here.

Slurm versions 14.03.7 and 14.11.0-pre4 are now available

Slurm versions 14.03.7 and 14.11.0-pre4 are now available. Version 14.03.7 includes quite a few relatively minor bug fixes. Version 14.11.0-pre4 includes a new job array data structure and APIs for managing job arrays. These changes provide vastly improved scalability with respect to job arrays. Version 14.11.0 is under active development and its release is planned in November 2014. Slurm downloads are available from http://www.schedmd.com/#repos.

Slurm versions 14.03.6 and 14.11.0-pre3 are now available

Slurm versions 14.03.6 and 14.11.0-pre3 are now available. Version 14.03.6 includes a couple of bug fixes, including a bug related to generic resources that can result in the slurmctld daemon aborting. Version 14.11.0-pre3 includes some performance and scalability enhancements plus some new job prioritization options. Slurm downloads are available from http://www.schedmd.com/#repos.

Slurm versions 14.03.5 and 14.11.0-pre2 are now available

Slurm versions 14.03.5 and 14.11.0-pre2 are now available. Version 14.03.5 includes about 40 relatively minor bug fixes and enhancements as described below.
Version 14.11.0-pre2 is the second pre-release of the next major release of Slurm scheduled for November 2014. This is very much a work in progress and not intended for production use.

Slurm downloads are available from http://www.schedmd.com/#repos.

Highlights of changes in Slurm version 14.03.5 include:

  • If srun runs in an exclusive allocation, doesn't use the entire allocation, and CR_PACK_NODES is set, lay out tasks appropriately.
  • Correct Shared field in job state information seen by scontrol, sview, etc.
  • Print Slurm error string in scontrol update job and reset the Slurm errno before each call to the API.
  • Fix task/cgroup to handle -mblock:fcyclic correctly.
  • Fix for core-based advanced reservations where the distribution of cores across nodes is not even.
  • Fix issue where association maxnodes wouldn't be evaluated correctly if a QOS had a GrpNodes set.
  • GRES fix with multiple files defined per line in gres.conf.
  • When a job is requeued make sure accounting marks it as such.
  • Print the state of requeued job as REQUEUED.
  • If a job's partition was taken away from it, don't allow a requeue.
  • Make sure we lock on the conf when sending slurmd's conf to the slurmstepd.
  • Fix issue with sacctmgr 'load' not being able to gracefully handle a badly formatted file.
  • sched/backfill: Correct job start time estimate with advanced reservations.
  • Added an error message, for debugging, when in proctrack/cgroup the step freezer path cannot be destroyed.
  • Added extra indexes into the database for better performance when deleting users.
  • Fix issue where, when tracking wckeys but not enforcing them, you could get multiple '*' wckeys.
  • Fix bug which could report to squeue the wrong partition for a running job that is submitted to multiple partitions.
  • Report correct CPU count allocated to job when allocated whole node even if not using all CPUs.
  • If job's constraints cannot be satisfied put it in pending state with reason BadConstraints and don't remove it.
  • sched/backfill - If job started with infinite time limit, set its end_time one year in the future.
  • Clear record of a job's gres when requeued.
  • Clear QOS GrpUsedCPUs when resetting raw usage if QOS is not using any cpus.
  • Remove log message left over from debugging.
  • When using CR_PACK_NODES, make --ntasks-per-node work correctly.
  • Report correct partition associated with a step if the job is submitted to multiple partitions.
  • Fix to allow removing of preemption from a QOS.
  • If the proctrack plugin fails to destroy the job container, print an error message and avoid looping forever; give up after 120 seconds.
  • Make srun obey the POSIX convention and increase the exit code by 128 when the process is terminated by a signal.
  • Sanity check for acct_gather_energy/rapl.
  • If the sbatch command specifies the option --signal=B:signum, send the signal to the batch script only.
  • If we cancel a task and have no other exit code, send the signal and exit code.
  • Added note about InnoDB storage engine being used with MySQL.
  • Set the job exit code when the job is signaled and set the log level to debug2() when processing an already completed job.
  • Reset diagnostics time stamp when "sdiag --reset" is called.
  • squeue and scontrol to report a job's "shared" value based upon partition options rather than reporting "unknown" if job submission does not use --exclusive or --shared option.
  • task/cgroup - Fix cpuset binding for batch script.
  • sched/backfill - Fix anomaly that could result in jobs being scheduled out of order.
  • Expand pseudo-terminal size data structure field sizes from 8 to 16 bits.
  • Distinguish between two identical error messages.
  • If using accounting_storage/mysql directly without a DBD fix issue with start of requeued jobs.
  • If a job fails because of batch node failure and is requeued, and an epilog complete message arrives from that node, do not process the batch step information, since the job has already been requeued (the epilog script is not guaranteed to run in this situation).
  • Change message to note that a NO_VAL return code could have come from node failure as well as from an interactive user.
  • Modify test4.5 to only look at one partition instead of all of them.
  • Fix sh5util -u to accept username different from the user that runs the command.
  • Corrections to man pages: salloc.1, sbatch.1, srun.1, nonstop.conf.5, slurm.conf.5.
  • Restore srun --pty resize ability.
  • Have sacctmgr dump cluster handle situations where users or such have special characters in their names like ':'.
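One of the fixes above makes srun follow the POSIX convention of reporting 128 plus the signal number when a process is terminated by a signal. Plain shell demonstrates the same convention; nothing Slurm-specific is assumed here:

```shell
# A process terminated by SIGTERM (signal 15) exits with status 128+15.
sh -c 'kill -TERM $$'
echo $?   # prints 143
```

So a job step killed by SIGKILL (signal 9) would likewise be reported by srun as exit code 137.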


Highlights of changes in Slurm version 14.11.0-pre2 (pre-release) include:
  • Added AllowSpecResourcesUsage configuration parameter in slurm.conf. This allows jobs to use specialized resources on nodes allocated to them if the job designates --core-spec=0.
  • Add new SchedulerParameters option of build_queue_timeout to throttle how much time can be consumed building the job queue for scheduling.
  • Added HealthCheckNodeState option of "cycle" to cycle through the compute nodes over the course of HealthCheckInterval rather than running all at the same time.
  • Add job "reboot" option for Linux clusters. This invokes the configured RebootProgram to reboot nodes allocated to a job before it begins execution.
  • Added squeue -O/--Format option that makes all job and step fields available for printing.
  • Improve database slurmctld entry speed dramatically.
  • Add "CPUs" count to output of "scontrol show step".
  • Add support for lua5.2.
  • scancel -b signals only the batch step, not any other step nor any children of the shell script.
  • MySQL - enforce NO_ENGINE_SUBSTITUTION.
  • Added CpuFreqDef configuration parameter in slurm.conf to specify the default CPU frequency and governor to be set at job end.
  • Added support for job email triggers: TIME_LIMIT, TIME_LIMIT_90 (reached 90% of time limit), TIME_LIMIT_80 (reached 80% of time limit), and TIME_LIMIT_50 (reached 50% of time limit). Applies to salloc, sbatch and srun commands.
  • In slurm.conf add the parameter SrunPortRange=min-max. If this is configured then srun will use its dynamic ports only from the configured range.
  • Make debug_flags 64 bit to handle more flags.
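Several of the items above are plain slurm.conf settings; a fragment combining a few of them might look like this, with illustrative values rather than recommended defaults:

```shell
# Illustrative slurm.conf fragment using options new in 14.11
# (values shown are examples, not recommended defaults):
AllowSpecResourcesUsage=1                         # jobs with --core-spec=0 may use specialized resources
SchedulerParameters=build_queue_timeout=2000000   # throttle time spent building the job queue
HealthCheckNodeState=CYCLE                        # spread checks over HealthCheckInterval
SrunPortRange=60001-63000                         # restrict srun's dynamic ports to this range
```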

Slurm versions 14.03.4 and 14.11.0-pre1 are now available

Slurm versions 14.03.4 and 14.11.0-pre1 are now available. Version 14.03.4 includes about 40 relatively minor bug fixes and enhancements as described below. Of particular note, there are several enhancements to control layout of tasks across resources and significant performance improvements for backfill scheduling.
Version 14.11.0-pre1 is the first pre-release of the next major release of Slurm scheduled for November 2014. This is very much a work in progress and not intended for production use.

Slurm downloads are available from http://www.schedmd.com/#repos.

Highlights of changes in Slurm version 14.03.4 include:

  • Fix issue when QOS limits are not being enforced but a partition either allows or denies QOSes.
  • CRAY - Make switch/cray default when running on a Cray natively.
  • CRAY - Make job_container/cncu default when running on a Cray natively.
  • Disable job time limit changes if the job's preemption is in progress.
  • Correct logic to properly enforce job preemption GraceTime.
  • Fix sinfo -R to print each down/drained node once, rather than once per partition.
  • If a job has a non-responding node, retry job step creation rather than returning with a DOWN node error.
  • Support SLURM_CONF path which does not have "slurm.conf" as the file name.
  • Fix issue where batch cpuset wasn't looked at correctly in jobacct_gather/cgroup.
  • Correct squeue's job node and CPU counts for requeued jobs.
  • Correct SelectTypeParameters=CR_LLN with job selection of specific nodes.
  • Only if ALL of their partitions are hidden will a job be hidden by default.
  • Run EpilogSlurmctld when a job is killed during slurmctld reconfiguration.
  • Close a window in srun where receiving a signal while waiting for an allocation and printing output could produce a deadlock.
  • Add SelectTypeParameters option of CR_PACK_NODES to pack a job's tasks tightly on its allocated nodes rather than distributing them evenly across the allocated nodes.
  • cpus-per-task support: Try to pack all CPUs of each task onto one socket. Previous logic could spread a task's CPUs across multiple sockets.
  • Add new distribution method fcyclic so when a task is using multiple cpus it can bind cyclically across sockets.
  • task/affinity - When using --hint=nomultithread only bind to the first thread in a core.
  • Make cgroup task layout (block | cyclic) method mirror that of task/affinity.
  • If TaskProlog sets SLURM_PROLOG_CPU_MASK reset affinity for that task based on the mask given.
  • Keep supporting 'srun -N x --pty bash' for historical reasons.
  • If EnforcePartLimits=Yes and the QOS the job is using can override limits, allow it.
  • Fix issues when a partition allows or denies accounts or QOSes and either is not set.
  • If a job requests a partition that does not allow the job's QOS or account, the job now pends unless EnforcePartLimits=Yes. Previously the job would always be killed at submit time.
  • Fix format output of scontrol command when printing node state.
  • Improve the clean up of cgroup hierarchy when using the jobacct_gather/cgroup plugin.
  • Added SchedulerParameters value of Ignore_NUMA.
  • Fix issues with code when using automake 1.14.1.
  • select/cons_res plugin: Fix memory leak related to job preemption.
  • After reconfig rebuild the job node counters only for jobs that have not finished yet, otherwise if requeued the job may enter an invalid COMPLETING state.
  • Do not purge the script and environment files for completed jobs on slurmctld reconfiguration or restart (they might be later requeued).
  • scontrol now accepts the option job=xxx or jobid=xxx for the requeue, requeuehold and release operations.
  • task/cgroup - fix to bind batch job in the proper CPUs.
  • Added strigger option of -N, --noheader to not print the header when displaying a list of triggers.
  • Modify strigger to accept arguments to the program to execute when an event trigger occurs.
  • Attempt to create duplicate event trigger now generates ESLURM_TRIGGER_DUP ("Duplicate event trigger").
  • Treat special characters like %A, %s etc. literally in the file names when specified escaped e.g. sbatch -o /home/zebra\\%s will not expand %s as the stepid of the running job.
  • CRAYALPS - Add better support for CLE 5.2 when running Slurm over ALPS.
  • Test time when job_state file was written to detect multiple primary slurmctld daemons (e.g. both backup and primary are functioning as primary and there is a split brain problem).
  • Fix scontrol to accept "update jobid=# numtasks=#".
  • If the backup slurmctld assumes primary status, then do NOT purge any job state files (batch script and environment files) and do not re-use them. This may indicate that multiple primary slurmctld daemons are active (e.g. both backup and primary are functioning as primary and there is a split brain problem).
  • Set correct error code when requeuing a completing/pending job.
  • When checking dependencies of type afterany, afterok and afternotok, don't clear the dependency if the job is completing.
  • Cleanup the JOB_COMPLETING flag and eventually requeue the job when the last epilog completes, either slurmd epilog or slurmctld epilog, whichever comes last.
  • When attempting to requeue a job distinguish the case in which the job is JOB_COMPLETING or already pending.
  • When reconfiguring the controller don't restart the slurmctld epilog if it is already running.
  • Email messages for job array events now print the job ID using the format "#_# (#)" rather than just the internal job ID.
  • Set the number of free licenses to be 0 if the global license count decreases and total is less than in use.
  • Add DebugFlag of BackfillMap. Previously a DebugFlag value of Backfill logged information about what it was doing plus a map of expected resource use in the future. Now that very verbose resource use map is only logged with a DebugFlag value of BackfillMap.
  • Fix slurmstepd core dump.
  • Modify the description of the sacct -E and -S options as the points in time 'before' or 'after' which database records are returned.
  • Correct support for partition with Shared=YES configuration.
  • If job requests --exclusive then do not use nodes which have any cores in an advanced reservation. Also prevents case where nodes can be shared by other jobs.
  • For "scontrol --details show job" report the correct CPU_IDs when there are multiple threads per core (we are translating a core bitmap to CPU IDs).
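The literal-character handling above can be demonstrated with sbatch's output file pattern (the paths and script name here are hypothetical):

```shell
# a backslash-escaped "%s" is kept literally in the file name
sbatch -o /home/zebra\\%s job.sh    # output file is literally /home/zebra%s

# unescaped replacement symbols still expand as before
sbatch -o slurm-%j.out job.sh       # "%j" expands to the job ID
```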


Highlights of changes in Slurm version 14.11.0-pre1 include:
  • Modify sdiag to report Slurm RPC traffic by user, type, count and time consumed.
  • Add support for allocation of GRES by model type for heterogeneous systems (e.g. request a Kepler GPU, a Tesla GPU, or a GPU of any type).
  • Modify squeue --start option to print the nodes expected to be used for pending job (in addition to expected start time, etc.).
  • Add support for non-consumable generic resources for resources that are limited, but can be shared between jobs.
  • Introduce automatic job requeue policy based on exit value. See RequeueExit and RequeueExitHold descriptions in slurm.conf man page.
  • Modify slurmd to cache launched job IDs for more responsive job suspend and gang scheduling.
  • Add srun --cpu-freq options to set the CPU governor (OnDemand, Performance, PowerSave or UserSpace).
  • Add support for a job step's CPU governor and/or frequency to be reset on suspend/resume (or gang scheduling). The default for an idle CPU will now be "ondemand" rather than "userspace" with the lowest frequency (to recover from hard slurmd failures and support gang scheduling).
  • Replace round-robin front-end node selection with least-loaded algorithm.
  • Add new node configuration parameters CoreSpecCount, CPUSpecList and MemSpecLimit which support the reservation of resources for system use with Linux cgroup.
  • Cray/ALPS system - Enable backup controller to run outside of the Cray to accept new job submissions and most other operations on the pending jobs.
  • sview - Better job_array support.
  • Provide more precise error message when job allocation can not be satisfied (e.g. memory, disk, cpu count, etc. rather than just "node configuration not available").
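The automatic requeue policy mentioned above is driven by two slurm.conf parameters. A sketch with made-up exit codes (see the RequeueExit and RequeueExitHold descriptions in the slurm.conf man page for the real syntax):

```
# slurm.conf fragment -- exit codes are illustrative
RequeueExit=1,2,64          # jobs ending with these exit codes are requeued
RequeueExitHold=100-120     # these exit codes requeue the job in a held state
```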

Slurm 14.03.3 is now available

We are pleased to announce that Slurm 14.03.3 is now available at http://www.schedmd.com/#repos.
Changes in Slurm 14.03.3

  • Fix perlapi to compile correctly with perl 5.18.
  • Correction to default batch output file name. Version 14.03.2 was using "slurm__4294967294.out" due to an error in job array logic.
  • In slurm.spec file, replace "Requires cray-MySQL-devel-enterprise" with "Requires mysql-devel".
  • Switch/nrt - On switch resource allocation failure, free partial allocation.
  • Switch/nrt - Properly track usage of CAU and RDMA resources with multiple tasks per compute node.
  • Fix issue where user is requesting --acctg-freq=0 and no memory limits.
  • BGQ - Temporary fix for an issue where a job could be left on the job_list after it finished.
  • BGQ - Fix issue where limits were checked on midplane counts instead of cnode counts.
  • BGQ - Move code to only start job on a block after limits are checked.
  • Handle node ranges better when dealing with accounting max node limits.

Slurm version 14.03.2 is now available

We are pleased to announce that Slurm 14.03.2 is now available here.

Please upgrade at your earliest convenience. Refer to the NEWS file for list of changes.

Slurm version 14.03.1 is now available.

Slurm version 14.03.1 is now available. This release includes four weeks of bug fixes since the release of version 14.03.0. Upgrading from Slurm versions 2.5 or 2.6 should proceed without loss of jobs or other state. Just be sure to upgrade the slurmdbd first. (Upgrades from pre-releases of version 14.03 may result in job loss.)

Slurm downloads are available from http://www.schedmd.com/#repos.

Highlights of changes in Slurm version 14.03.1 include:

  • Add support for job std_in, std_out and std_err fields in Perl API.
  • Add "Scheduling Configuration Guide" web page.
  • BGQ - fix check for jobinfo when it is NULL.
  • Do not check cleaning on "pending" steps.
  • task/cgroup plugin - Fix for building on older hwloc (v1.0.2).
  • In the PMI implementation, by default don't check for duplicate keys. Set the SLURM_PMI_KVS_DUP_KEYS environment variable if you want the code to check for duplicate keys.
  • Add job submission time to squeue.
  • Permit user root to propagate resource limits higher than the hard limit slurmd has on that compute node (i.e. raise both current and maximum limits).
  • Fix issue with license used count when doing an scontrol reconfig.
  • Fix the PMI iterator to not report duplicated keys.
  • Fix issue with sinfo when -o is used without the %P option.
  • Rather than immediately invoking an execution of the scheduling logic on every event type that can enable the execution of a new job, queue its execution. This permits faster execution of some operations, such as modifying large counts of jobs, by executing the scheduling logic less frequently, but still in a timely fashion.
  • If the environment variable is greater than MAX_ENV_STRLEN don't set it in the job env otherwise the exec() fails.
  • Optimize scontrol hold/release logic for job arrays.
  • Modify srun to report an exit code of zero rather than nine if some tasks exit with a return code of zero and others are killed with SIGKILL. Previously srun reported an exit code of zero only if all tasks exited with zero.
  • Fix a typo in scontrol man page.
  • Avoid slurmctld crash getting job info if detail_ptr is NULL.
  • Fix sacctmgr add user where both defaultaccount and accounts are specified.
  • Added SchedulerParameters option of max_sched_time to limit how long the main scheduling loop can execute.
  • Added SchedulerParameters option of sched_interval to control how frequently the main scheduling loop will execute.
  • Move start time of main scheduling loop timeout after locks are acquired.
  • Add squeue job format option of "%y" to print a job's nice value.
  • Update scontrol update jobID logic to operate on entire job arrays.
  • Fix PrologFlags=Alloc to run the prolog on each of the nodes in the allocation instead of just the first.
  • Fix race condition if a step is starting while the slurmd is being restarted.
  • Make sure a job's prolog has run before starting a step.
  • BGQ - Fix invalid memory read when using DefaultConnType in the bluegene.conf.
  • Make sure we send node state to the DBD on clean start of controller.
  • Fix some sinfo and squeue sorting anomalies due to differences in data types.
  • Only send message back to slurmctld when PrologFlags=Alloc is used on a Cray/ALPS system, otherwise use the slurmd to wait on the prolog to gate the start of the step.
  • Remove need to check PrologFlags=Alloc in slurmd since we can tell if the prolog has run yet or not.
  • Fix squeue to use a correct macro to check job state.
  • BGQ - Fix incorrect logic issues if MaxBlockInError=0 in the bluegene.conf.
  • priority/basic - Ensure job priorities continue to decrease when jobs are submitted with the --nice option.
  • Make the PrologFlag=Alloc work on batch scripts.
  • Make PrologFlag=NoHold (automatically sets PrologFlag=Alloc) not hold in salloc/srun, instead wait in the slurmd when a step hits a node and the prolog is still running.
  • Added --cpu-freq=highm1 (high minus one) option.
  • Expand StdIn/Out/Err string length output by "scontrol show job" from 128 to 1024 bytes.
  • squeue %F format will now print the job ID for non-array jobs.
  • Use quicksort for all priority based job sorting, which improves performance significantly with large job counts.
  • If a job has already been released from a held state ignore successive release requests.
  • Fix srun/salloc/sbatch man pages for the --no-kill option.
  • Add squeue -L/--licenses option to filter jobs by license names.
  • Handle abort job on node on front end systems without core dumping.
  • Fix dependency support for job arrays.
  • When updating jobs verify the update request is not identical to the current settings.
  • When sorting jobs and priorities are equal sort by job_id.
  • Do not overwrite existing reason for node being down or drained.
  • Requeue batch job if Munge is down and credential can not be created.
  • Make _slurm_init_msg_engine() tolerate bug in bind() returning a busy ephemeral port.
  • Don't block scheduling of entire job array if it could run in multiple partitions.
  • Introduce a new debug flag Protocol to print protocol requests received together with the remote IP address and port.
  • CRAY - Set up the network even when only using 1 node.
  • CRAY - Greatly reduce the number of error messages produced from the task plugin and provide more information in the message.
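A few of the new command options above in use; the license names and format string are illustrative:

```shell
squeue -o "%.10i %.9P %.8y"      # "%y" prints each job's nice value
squeue -L matlab,ansys           # filter jobs by license name (hypothetical licenses)
srun --cpu-freq=highm1 ./a.out   # highest available CPU frequency minus one step
```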
Slurm version 14.03.0 is now available

Slurm version 14.03.0 is now available. This is a major Slurm release with many new features. See the RELEASE_NOTES and NEWS files in the distribution for detailed descriptions of the changes, a few of which are noted below. Upgrading from Slurm versions 2.5 or 2.6 should proceed without loss of jobs or other state. Just be sure to upgrade the slurmdbd first. (Upgrades from pre-releases of version 14.03 may result in job loss.) Slurm downloads are available from http://www.schedmd.com/#repos. Highlights of changes in Slurm version 14.03.0 include:

  • Added support for native Slurm operation on Cray systems (without ALPS).
  • Added partition configuration parameters AllowAccounts, AllowQOS, DenyAccounts and DenyQOS to provide greater control over use.
  • Added the ability to perform load based scheduling, allocating resources to jobs on the nodes with the largest number of idle CPUs.
  • Added support for reserving cores on a compute node for system services (core specialization).
  • Add mechanism for job_submit plugin to generate error message for srun, salloc or sbatch to stderr.
  • Added new structures and support for both server and cluster global resources (e.g. license mechanism).
  • Support for the Postgres database has long been out of date and problematic, so it has been removed entirely. If you would like to use it, the code still exists in versions <= 2.6, but will not be included in this or future versions of the code.
  • Significant performance improvements, especially with respect to job array support.
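The new partition access controls are set on the PartitionName line in slurm.conf. A sketch; the partition, node, account and QOS names are hypothetical:

```
# slurm.conf fragment -- names are illustrative
PartitionName=debug Nodes=tux[0-31] AllowAccounts=physics,chem DenyQOS=long
```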
    Slurm versions 2.6.7 and 14.03.0-rc1 are now available

    We are pleased to announce the availability of Slurm version 2.6.7, plus version 14.03.0-rc1 (release candidate 1). We plan to release version 14.03.0 by the end of the month. See the "RELEASE_NOTES" file in the distribution for a description of the major changes in version 14.03.

    This will most likely be the last 2.6 release. 14.03 code has been frozen for development and will only get bug fixes from here on out. Thanks to all those that have contributed to the effort!

    The Slurm distributions are available from: here

    Slurm versions 2.6.6 and 14.03.0-pre6 are now available

    Slurm version 2.6.6 with a multitude of bug fixes is now available. We are also making available version 14.03.0-pre6 with more development work for the next major release. See the NEWS file in the distribution for detailed descriptions of the changes. Downloads are available here.

    Slurm versions 2.6.5 and 14.03.0-pre5 are now available

    Slurm version 2.6.5 with a multitude of bug fixes is now available. We are also making available version 14.03.0-pre5 with more development work for the next major release. A summary of changes is found in the NEWS file of the tarball. Downloads are available from here.

    Slurm versions 2.6.4 and 13.12.0-pre4 are now available

    Slurm version 2.6.4 with a multitude of bug fixes plus some new development to better support Torque/PBS commands and options is now available. We are also making available version 13.12.0-pre4 with more development work for the next major release. See the NEWS file in the distribution for detailed descriptions of the changes. Downloads are available from http://www.schedmd.com/#repos.

    Slurm versions 2.6.3 and 13.12.0-pre3 are now available

    Slurm version 2.6.3 with a multitude of bug fixes plus some new development to better support Torque/PBS commands and options is now available. We are also making available version 13.12.0-pre3 with more development work for the next major release. See the NEWS file in the distribution for detailed descriptions of the changes. Downloads are available from http://www.schedmd.com/#repos.

    Slurm versions 2.6.2 and 13.12.0-pre2 are now available

    We are pleased to announce the availability of Slurm version 2.6.2 (with various bug fixes) and 13.12.0-pre2 (with second installment of development for the next major release). Downloads are available from http://www.schedmd.com/#repos. Highlights of changes in Slurm version 2.6.2 include:

    • Fix issue with reconfig and GrpCPURunMins
    • Fix of wrong node/job state problem after reconfig
    • Allow users who are coordinators to update their own limits in the accounts they are coordinators over.
    • BackupController - Make sure we have a connection to the DBD first thing to avoid it thinking we don't have a cluster name.
    • Correct value of min_nodes returned by loading job information to consider the job's task count and maximum CPUs per node.
    • Fix issue unpacking step completion records when running jobacct_gather/none.
    • Reservation with CoreCnt: Avoid possible invalid memory reference.
    • sjstat - Add man page when generating rpms.
    • Make sure GrpCPURunMins is added when creating a user, account or QOS with sacctmgr.
    • Fix for invalid memory reference due to multiple free calls caused by job arrays submitted to multiple partitions.
    • Enforce --ntasks-per-socket=1 job option when allocating by socket.
    • Validate permissions of key directories at slurmctld startup. Report anything that is world writable.
    • Improve GRES support for CPU topology. Previous logic would pick CPUs then reject jobs that can not match GRES to the allocated CPUs. New logic first filters out CPUs that can not use the GRES, next picks CPUs for the job, and finally picks the GRES that best match those CPUs.
    • Switch/nrt - Prevent invalid memory reference when allocating single adapter per node of specific adapter type
    • CRAY - Make Slurm work with CLE 5.1.1
    • Fix segfault if submitting to multiple partitions and holding the job.
    • Use MAXPATHLEN instead of the hardcoded value 1024 for maximum file path lengths.
    • If OverTimeLimit is defined, do not declare failed those jobs that ended in the OverTimeLimit interval.

    Slurm versions 2.6.1 and 13.12.0-pre1 are now available

    Slurm versions 2.6.1 and 13.12.0-pre1 are now available We are pleased to announce the availability of Slurm version 2.6.1 (with various bug fixes) and 13.12.0-pre1 (with first installment of development for the next major release). Downloads are available from http://www.schedmd.com/#repos.

    Slurm version 2.6.0 available

    We are pleased to announce the availability of Slurm version 2.6. Changes from version 2.5 are extensive and highlights are listed below. Please see the RELEASE_NOTES file in the Slurm distribution for more details. Note the Slurm documentation at schedmd.com has been updated to version 2.6. Highlights of changes in Slurm version 2.6 include:

    • Added support for job arrays, which increases performance and ease of use for sets of similar jobs. This may necessitate changes in prolog and/or epilog scripts due to the change in the job ID format, which is now of the form "<job_id>_<array_index>" for job arrays.
    • Added support for job profiling to periodically capture each task's CPU use, memory use, power consumption, Lustre use and Infiniband network use.
    • Added support for generic external sensor plugins which can be used to capture temperature and power consumption data.
    • Added mpi/pmi2 plugin with much more scalable performance for MPI implementations using PMI communications interface.
    • Added prolog and epilog support for advanced reservations.
    • Much faster throughput for job step execution with --exclusive option. The srun process is notified when resources become available rather than periodic polling.
    • Advanced reservations with hostname and core counts now support asymmetric reservations (e.g. a different core count for each node).
    • Added slurmctld/dynalloc plugin for MapReduce+ support. New versions of OpenMPI and MapReduce are required to enable this functionality.
    • Make sched/backfill the default scheduling plugin rather than sched/builtin (FIFO).
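    A quick sketch of the job array support described above (the script name is hypothetical):

```shell
# submit a 16-element job array with indices 0-15
sbatch --array=0-15 job.sh
# each element is identified as <job_id>_<index>, e.g. 1234_0, 1234_1, ...
```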

    Slurm version 2.6.0-rc2 is now available

    We are pleased to announce the availability of Slurm version 2.6.0-rc2 (release candidate 2).

    We plan to release version 2.6.0 very soon. See the "RELEASE_NOTES" file in the distribution for a description of the major changes in version 2.6.

    A great way to find out about Slurm development is to attend the Slurm User Group Meeting, September 18 - 19 in Oakland, California, USA.

    The Slurm distributions are available from here.

    Slurm versions 2.5.7 and 2.6.0-rc1 are now available

    We are pleased to announce the availability of Slurm version 2.5.7 plus version 2.6.0-rc1 (release candidate 1).

    We plan to release version 2.6.0 after more testing. See the "RELEASE_NOTES" file in the distribution for a description of the major changes in version 2.6.

    A great way to find out about Slurm development is to attend the Slurm User Group Meeting, September 18 - 19 in Oakland, California, USA.

    The Slurm distributions are available from here.

    Slurm version 2.5.6 is now available

    We have just found a regression in 2.5.5 when using the MySQL database for accounting along with GRES. Therefore we have tagged a 2.5.6 release with the fix.

    You can download it from here.

    This bug only exists in 2.5.5 and 2.6.0-0pre3 systems. Those running 2.6 pre-releases are advised to patch their code base or just do a pull from github.

    A simple patch is found here.

    2.5.6 also contains a patch dealing with requeuing jobs that use GRES as well.

    Slurm versions 2.5.5 and 2.6.0-pre3 are now available

    Feel free to update from here. There have been quite a few changes for Cray and BGQ systems, so anyone running them should take a serious look. As always, it is a good idea to run with the latest on any system.

    * Changes in Slurm 2.5.5
    ========================
    -- Fix for sacctmgr add qos to handle the 'flags' option.
    -- Export SLURM_ environment variables from sbatch, even if "--export"
    option does not explicitly list them.
    -- If node is in more than one partition, correct counting of allocated CPUs.
    -- If step requests more CPUs than possible in specified node count of job
    allocation then return ESLURM_TOO_MANY_REQUESTED_CPUS rather than
    ESLURM_NODES_BUSY and retrying.
    -- CRAY - Fix SLURM_TASKS_PER_NODE to be set correctly.
    -- Accounting - more checks for strings with a possible `'` in it.
    -- sreport - Fix by adding planned down time to utilization reports.
    -- Do not report an error when sstat identifies job steps terminated during
    its execution, but log using debug type message.
    -- Select/cons_res - Permit node removed from job by going down to be returned
    to service and re-used by another job.
    -- Select/cons_res - Tighter packing of job allocations on sockets.
    -- SlurmDBD - fix to allow user root along with the slurm user to register a
    cluster.
    -- Select/cons_res - Fix for support of consecutive node option.
    -- Select/cray - Modify build to enable direct use of libslurm library.
    -- Bug fixes related to job step allocation logic.
    -- Cray - Disable enforcement of MaxTasksPerNode, which is not applicable
    with launch/aprun.
    -- Accounting - When rolling up data from past usage, ignore "idle" time
    from a reservation when it has the "Ignore_Jobs" flag set. Since jobs
    could run outside of the reservation on its nodes, without this the time
    could be double counted.
    -- Accounting - Minor fix to avoid reuse of variable erroneously.
    -- Reject job at submit time if the node count is invalid. Previously such a
    job submitted to a DOWN partition would be queued.
    -- Purge vestigial job scripts when the slurmd cold starts or slurmstepd
    terminates abnormally.
    -- Add support for FreeBSD.
    -- Add sanity check for NULL cluster names trying to register.
    -- BGQ - Push action 'D' info to scontrol for admins.
    -- Reset a job's reason from PartitionDown when the partition is set up.
    -- BGQ - Handle issue where blocks would have a pending job on them and,
    while the block was being freed, cnodes would go into software error and
    kill the job.
    -- BGQ - Fix issue where if for some reason we are freeing a block with
    a pending job on it we don't kill the job.
    -- BGQ - Fix race condition where a job could have been removed from a
    block without it still existing there. This is extremely rare.
    -- BGQ - Fix for when a step completes in Slurm before the runjob_mux notifies
    the slurmctld there were software errors on some nodes.
    -- BGQ - Fix issue on state recover if block states are not around
    and when reading in state from DB2 we find a block that can't be created.
    You can now do a clean start to rid the bad block.
    -- Modify slurmdbd to retransmit to slurmctld daemon if it is not responding.
    -- BLUEGENE - Fix issue where, when doing backfill, preemptable jobs were
    never considered when determining the eligibility of a backfillable job.
    -- Cray/BlueGene - Disable srun --pty option unless LaunchType=launch/slurm.
    -- CRAY - Fix sanity check for systems with more than 32 cores per node.
    -- CRAY - Remove other objects from MySQL query that are available from
    the XML.
    -- BLUEGENE - Set the geometry of a job when a block is picked and the job
    isn't a sub-block job.
    -- Cray - avoid check of macro versions of CLE for version 5.0.
    -- CRAY - Fix memory issue with reading in the cray.conf file.
    -- CRAY - If hostlist is given with srun make sure the node count is the same
    as the hosts given.
    -- CRAY - If task count specified, but no tasks-per-node, then set the tasks
    per node in the BASIL reservation request.
    -- CRAY - fix issue with --mem option not giving correct amount of memory
    per cpu.
    -- CRAY - Fix if srun --mem is given outside an allocation to set the
    APRUN_DEFAULT_MEMORY env var for aprun. This scenario will not display
    the option when used with --launch-cmd.
    -- Change sview to use GMutex instead of GStaticMutex
    -- CRAY - set APRUN_DEFAULT_MEMORY instead of CRAY_AUTO_APRUN_OPTIONS
    -- sview - fix issue where, if a partition was completely in one state,
    the cpu count would not be reflected correctly.
    -- BGQ - fix for handling half rack system in STATIC or OVERLAP mode to
    implicitly create full system block.
    -- CRAY - Dynamically create BASIL XML buffer to resize as needed.
    -- Fix checking if QOS limit MaxCPUMinsPJ is set along with DenyOnLimit to
    deny the job instead of holding it.
    -- Make sure on systems that use a different launcher than launch/slurm not
    to attempt to signal tasks on the frontend node.
    -- Cray - when a step is requested count other steps running on nodes in the
    allocation as taking up the entire node instead of just part of the node
    allocated. And always enforce exclusive on a step request.
    -- Cray - display correct nodelist, node/cpu count on steps.

    * Changes in Slurm 2.6.0pre3
    ============================
    -- Add milliseconds to default log message header (both RFC 5424 and ISO 8601
    time formats). Disable milliseconds logging using the configure
    parameter "--disable-log-time-msec". Default time format changes to
    ISO 8601 (without time zone information). Specify "--enable-rfc5424time"
    to restore the time zone information.
    -- Add username (%u) to the filename pattern in the batch script.
    -- Added options for front end nodes of AllowGroups, AllowUsers, DenyGroups,
    and DenyUsers.
    -- Fix sched/backfill logic to initiate jobs whose maximum time limit is
    over the partition limit, but whose minimum time limit permits them to start.
    -- gres/gpu - Fix for gres.conf file with multiple files on a single line
    using a slurm expression (e.g. "File=/dev/nvidia[0-1]").
    -- Replaced ipmi.conf with generic acct_gather.conf file for all acct_gather
    plugins. For those doing development to use this follow the model set
    forth in the acct_gather_energy_ipmi plugin.
    -- Added more options to update a step's information
    -- Add DebugFlags=ThreadID which will print the thread id of the calling
    thread.
    -- CRAY - Allocate whole node (CPUs) in reservation despite what the
    user requests. We have found any srun/aprun afterwards will work on a
    subset of resources.
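
    The log time format changes above are controlled at build time; a sketch of the configure switches named in the entry:

```shell
./configure --disable-log-time-msec   # drop milliseconds from log headers
./configure --enable-rfc5424time      # keep RFC 5424 time stamps (with time zone)
```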

    Slurm User Group Meeting CFA

    You are invited to submit an abstract of a presentation or tutorial to be given at the Slurm User Group Meeting 2013. This event is sponsored and organized by SchedMD and will be held in Oakland, California, USA on September 18 and 19, 2013.

    This international event is open to everyone who wants to:

    • Learn more about Slurm, a highly scalable Resource Manager and Job Scheduler
    • Share their knowledge and experience with other users and administrators
    • Get detailed information about the latest features and developments
    • Share requirements and discuss future developments

    Everyone who wants to present their own usage, developments, site report, or tutorial about Slurm is invited to send an abstract to sugc@schedmd.com.

    IMPORTANT DATES:
    May 24, 2013: Abstracts due
    June 21, 2013: Notification of acceptance
    September 18-19, 2013: Slurm User Group Meeting 2013

    Program Committee:
    Yiannis Georgiou (Bull)
    Matthieu Hautreux (CEA)
    Morris Jette (SchedMD)
    Donald Lipari (LLNL, Lawrence Livermore National Laboratory)
    Colin McMurtrie (CSCS, Swiss National Supercomputing Centre)
    Stephen Trofinoff (CSCS, Swiss National Supercomputing Centre)

    Slurm versions 2.5.4 and 2.6.0-pre2 are now available

    Slurm version 2.5.4 is now available with the bug fixes listed below. The latest versions of Slurm are available from www.schedmd.com/#repos.

    - Fix bug in PrologSlurmctld use that would block job steps until node responds.
    - CRAY - If a partition has MinNodes=0 and a batch job doesn't request nodes, set the allocation to 1 node instead of 0, which would prevent the allocation from happening.
    - Better debug output when the database is down and the --cluster option is used in the user commands.
    - When asking for job states with sacct, default to 'now' instead of midnight of the current day.
    - Fix for handling a test-only job or immediate job that fails while being built.
    - Comment out all of the logic in the job_submit/defaults plugin. The logic is only an example and not meant for actual use.
    - Eliminate configuration file 4096 character line limitation.
    - More robust logic for tree message forwarding.
    - BGQ - When cnodes fail in a timeout fashion correctly look up parent midplane.
    - Correct sinfo "%c" (node's CPU count) output value for Bluegene systems.
    - Backfill - Responsive improvements for systems with large numbers of jobs (more than 5000) and using the SchedulerParameters option bf_max_job_user.
    - slurmstepd: ensure that I/O redirection from/to files correctly handles interruption.
    - BGQ - Able to handle when midplanes go into Hardware::SoftwareFailure
    - GRES - Correct tracking of specific resources used after slurmctld restart. Counts would previously go negative as jobs terminate and decrement from a base value of zero.
    - Fix for priority/multifactor2 plugin to not assert when configured with --enable-debug.
    - Select/cons_res - If the job request specified --ntasks-per-socket and the allocation is using cores, then pack the tasks onto the sockets up to the specified value.
    - BGQ - If a cnode goes into an 'error' state and the block containing the cnode does not have a job running on it do not resume the block.
    - BGQ - Better handling of blocks that don't free themselves in a reasonable time.
    - BGQ - Fix for signaling steps when allocation ends before step.
    - Fix for backfill scheduling logic with job preemption; starts more jobs.
    - xcgroup - remove bugs with EINTR management in write calls
    - jobacct_gather - fix total values so they do not always equal the max values.
    - Fix for handling node registration messages from older versions without energy data.
    - BGQ - Allow user to request full dimensional mesh.
    - sdiag command - Correction to jobs started value reported.
    - Prevent slurmctld assert when invalid change to reservation with running jobs is made.
    - BGQ - If the signal is NODE_FAIL, allow forwarding even if the job is completing, and time out in the runjob_mux when trying to send in this situation.
    - BGQ - More robust checking for correct node, task, and ntasks-per-node options in srun, and push that logic to salloc and sbatch.
    - GRES topology bug in core selection logic fixed.
    - Fix init.d script to not return 1 on success when querying status.

    Slurm version 2.6.0-pre2 contains the enhancements listed below.

    - Do not purge inactive interactive jobs that lack a port to ping (added for MR+ operation).
    - Advanced reservations with hostnames and core counts now support asymmetric reservations (e.g. a different core count for each node).
    - Added slurmctld/dynalloc plugin for MapReduce+ support.
    - Added "DynAllocPort" configuration parameter.
    - Added partition parameter of SelectTypeParameters to override the system-wide value.
    - Added cr_type to partition_info data structure.
    - Added allocated memory to node information available (within the existing select_nodeinfo field of the node_info_t data structure). Added Allocated Memory to node information displayed by sview and scontrol commands.
    - Make sched/backfill the default scheduling plugin rather than sched/builtin (FIFO).
    - Added support for a job having different priorities in different partitions.
    - Added new SchedulerParameters configuration parameter of "bf_continue", which permits the backfill scheduler to continue considering jobs for backfill scheduling after yielding locks, even if new jobs have been submitted. This can result in lower priority jobs being backfill scheduled instead of newly arrived higher priority jobs, but will permit more queued jobs to be considered for backfill scheduling.
    - Added support to purge reservation records from accounting.
    - Cray - Add support for Basil 1.3
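    Several of the enhancements above are controlled from slurm.conf. A minimal sketch of what enabling them might look like (the values shown are illustrative examples, not defaults or recommendations):

    ```
    # slurm.conf fragment (illustrative example values)
    SchedulerType=sched/backfill     # backfill is now the default scheduler
    SchedulerParameters=bf_continue  # keep considering jobs for backfill after yielding locks
    DynAllocPort=6820                # example port for the slurmctld/dynalloc plugin
    ```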

    Slurm version 2.5.3 is now available

    Slurm version 2.5.3 is now available with the bug fixes listed below. The latest versions of Slurm are available from www.schedmd.com/#repos.

    Gres/gpu plugin - If no GPUs requested, set CUDA_VISIBLE_DEVICES=NoDevFiles. This bug was introduced in 2.5.2 for the case where a GPU count was configured, but without device files.
    task/affinity plugin - Fix bug in CPU masks for some processors.
    Modify sacct command to get its default format from the SACCT_FORMAT environment variable.
    BGQ - Changed order of library inclusions and fixed incorrect declaration to compile correctly on newer compilers.
    Fix for not building sview if glib exists on a system but not the gtk libs.
    BGQ - Fix for handling a job cleanup on a small block if the job has long since left the system.
    Fix race condition in job dependency logic which can result in invalid memory reference.
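    As a usage sketch of the SACCT_FORMAT change above (the field names below are illustrative; see the sacct man page for the full list), the environment variable sets a default output format for subsequent invocations:

    ```
    # Illustrative only: define a default sacct output format once,
    # instead of passing --format on every invocation.
    export SACCT_FORMAT="jobid,jobname,partition,elapsed,state,exitcode"
    sacct   # now uses the fields above by default
    ```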

    Slurm versions 2.5.2 and 2.6.0-pre1 available

    Slurm version 2.5.2 is now available with various bug fixes. We have also made available a pre-release of version 2.6, which is still under development. Notable features in v2.6 include support for job arrays and accounting for a job's energy consumption using IPMI. The job array documentation is available at www.schedmd.com/slurmdocs/job_array.html. The latest versions of Slurm are available from www.schedmd.com/#repos.
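    As a sketch of the new job array support (the script contents are hypothetical; see "man sbatch" for the -a/--array option):

    ```
    #!/bin/bash
    # Hypothetical job array sketch: "sbatch --array=0-3 this_script.sh"
    # runs four copies of the script, each seeing its own task index.
    #SBATCH --array=0-3
    echo "processing chunk ${SLURM_ARRAY_TASK_ID}"
    ```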

    Slurm version 2.5.0 released

    We are pleased to announce the availability of Slurm version 2.5.0. This is a major upgrade from version 2.4 with changes to the Slurm commands and API. Pending and running jobs should be preserved through the upgrade. You should plan to upgrade your slurmdbd (Slurm DataBase Daemon) before upgrading other Slurm daemons or programs. You can get the latest Slurm tar-ball from the repository here.

    We have also released version 2.4.5 with various minor bug fixes. This will likely be the final release of version 2.4.

    Highlights of version 2.5 include:

    • Major performance improvements for high-throughput computing.
    • Added srun option "--cpu-freq" to enable user control over the job's CPU frequency and thus its power consumption.
    • Account for power consumption by job.
    • Added "boards" count to node information and "boards_per_node" to job request and job information. Optimize resource allocation to minimize number of boards used by a job.
    • Added support for IBM Parallel Environment (PE) including the launching of jobs using either the srun or poe command.
    • Add support for advanced reservation of specific cores rather than whole nodes.
    • Added priority/multifactor2 plugin supporting ticket based shares.
    • Added gres/mic plugin supporting Intel Many Integrated Core (MIC) processors.
    • Added launch plugin to support srun interface to launch tasks using different methods like IBM's poe and Cray's aprun.
    • Web pages have a different appearance.
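    As a usage sketch of the new "--cpu-freq" option (the frequency value and application name are hypothetical; the accepted argument forms are described in the srun man page):

    ```
    # Illustrative only: request that CPUs allocated to this job step
    # run at roughly 2.4 GHz (value given in kilohertz).
    srun --cpu-freq=2400000 -n 16 ./my_app
    ```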

    Slurm version 2.5.0-rc2 now available

    Slurm version 2.5.0-rc2 is now available. Slurm version 2.5.0-rc1 had a bad slurm.spec file, resulting in plugins not being packaged. The slurm.spec file and several bugs have been fixed in version 2.5.0-rc2, which is available from here. We currently expect to tag versions 2.5.0 and 2.4.5 around December 4, a bit later than expected, but preferable for improved stability.

    Slurm version 2.5.0-rc1 now available

    We are pleased to announce that Slurm version 2.5.0-rc1 (release candidate 1) is now available for download from here.

    This version should be considered stable, and we encourage all early adopters to upgrade and test so we can flush out any major issues before the scheduled release of version 2.5.0 at the end of November.

    Thanks for everyone's help with this release. It has a host of new features; see the RELEASE_NOTES file for more information.

    Slurm version 2.4.4 is now available

    We are pleased to announce that Slurm version 2.4.4 has been tagged and is now available for download from here. It contains a variety of bug fixes, almost all of them for IBM BlueGene/Q systems.

    SLURM versions 2.4.3 and 2.5.0-pre3 now available

    We are pleased to announce the availability of SLURM version 2.4.3 with a sizable number of bug fixes, primarily for IBM Bluegene systems.

    Both are available now for download here.

    We have also made available version 2.5.0-pre3, a pre-release of the version 2.5 code, which is still under development. Of particular note, this version of SLURM supports the IBM Parallel Environment (PE), including POE and IBM's NRT switch interface. We are nearing the end of development for version 2.5 and will soon move into a testing phase before release, planned for November. If you are developing new code, please develop against the master git repo (2.5), as it is constantly updated, to avoid as many conflicts as possible.

    As always if you find any bugs let us know through http://bugs.schedmd.com or the slurm-dev list.

    SLURM version 2.4.2 now available

    SLURM version 2.4.2 is now available for download from here.

    This includes many bug fixes, most of which are IBM BlueGene system related.

    As always if you find any bugs let us know through http://bugs.schedmd.com or the slurm-dev list.

    SLURM version 2.4.1 is now available

    It has come to our attention that a bug in 2.4.0 results in job loss when upgrading from 2.3.* to 2.4.0.

    2.4.1 fixes this problem. This is the only patch in 2.4.1 relative to 2.4.0.

    2.4.1 will preserve job state from 2.4.0 as well as state from 2.1+.

    Sorry for the inconvenience, thanks to Carles Fenoy for bringing the issue to our attention.

    You may download it here. To avoid future job loss we have removed 2.4.0 from download. If you need it for historic purposes please feel free to download the tag from github.

    SLURM versions 2.4.0 and 2.5.0-pre1 are now available

    We are pleased to announce the formal 2.4.0 release, along with the first development release of 2.5!

    Both are available now for download here.

    If you are developing new code, please develop against the master git repo, as it is constantly updated, to avoid as many conflicts as possible.

    Note to BGQ early adopters: Recently there have been a few changes that require the runjob_mux to run as your SLURM user. The plugin_flags must also be updated to avoid a possible runjob_mux crash if you are starting a job and decide to turn off the slurmctld at the same time. Please read the updated bluegene web page and look for "System Administration for BlueGene/Q only" for full instructions.

    Thanks for all your help and support. Among other things, 2.4 brings substantial performance enhancements and many other improvements, many of which can be found in the RELEASE_NOTES file in the code.

    As always if you find any bugs let us know through http://bugs.schedmd.com or the slurm-dev list.

    SLURM 2.4.0-rc1 tagged and released

    SchedMD is pleased to announce the immediate availability of SLURM 2.4.0-rc1!

    A summary of the changes in version 2.4.0-rc1 from version 2.4.0-pre4 can be found in the file "NEWS" in the distributed files. As 2.4 has graduated from "pre" to "rc" only bug fixes will be contained in future 2.4 releases.

    Our current plan is to release another rc in a couple of weeks and then a genuine 2.4 tag in June.

    This code should be considered stable and production ready, please test and let us know if you find any issues.

    The code is available here: http://www.schedmd.com/#repos

    Please report any bug to http://bugs.schedmd.com or the slurm-dev list at slurm-dev@schedmd.com

    Enjoy!

    SLURM 2.3.5 tagged and released

    SchedMD is pleased to announce the immediate availability of SLURM 2.3.5!

    A summary of the changes in version 2.3.5 from version 2.3.4 can be found in the file "NEWS" in the distributed files. Only bug fixes are and will be contained in the 2.3 releases.

    The code is available here: http://www.schedmd.com/#repos

    Please report any bug to http://bugs.schedmd.com or the slurm-dev list at slurm-dev@schedmd.com

    Enjoy!

    SLURM 2.3.4 and 2.4.0-pre4 tagged and released

    SchedMD is pleased to announce the immediate availability of SLURM 2.3.4!

    A summary of the changes in version 2.3.4 from version 2.3.3 can be found in the file "RELEASE_NOTES" in the distributed files. Only bug fixes are and will be contained in the 2.3 releases.

    Also tagged is a new development release, 2.4.0-pre4. Future development will be added to this release, along with any bug fixes found in the 2.3 branch. If you are developing new code or want to run the most bleeding-edge SLURM, please use this version. While the code may be fairly stable, this version is beta and should be treated as such. In most cases this version is not recommended for production systems.

    2.4 NOTE:
    Because internal data structures may change from one -pre release to another, preserving state is not always possible, so jobs may be lost.
    2.4 NOTE:
    If running on a BGQ system, this version is most likely the version you want. 2.3 provides only a very small subset of the functionality 2.4 already offers.

    The code for both versions is available here: http://www.schedmd.com/#repos

    Please report any bug to http://bugs.schedmd.com or the slurm-dev list at slurm-dev@schedmd.com

    Enjoy!

    SLURM 2.4.0-pre3 tagged and released

    For those of you working on SLURM version 2.4 development SchedMD is pleased to announce the immediate availability of the development release SLURM 2.4.0-pre3!

    IBM BlueGene/Q systems are fully supported in this release including documentation.

    Major changes in version 2.4.0-pre3 from version 2.4.0-pre2 can be found in the file "NEWS" in the distributed files.

    Future development will be added to this release, along with any bug fixes found in the 2.3 branch. If you are developing new code or want to run the most bleeding-edge SLURM, please use this version. While the code may be fairly stable, this version is beta and should be treated as such. In most cases this version is not recommended for production systems.

    2.4 NOTE:
    Because internal data structures may change from one -pre release to another, preserving state is not always possible, so jobs may be lost.
    2.4 NOTE:
    If running on a BGQ system, this version is the version you want. 2.3 provides only a very small subset of the functionality, where 2.4 now delivers the complete package.

    The code is available here: http://www.schedmd.com/#repos

    Please report any bug to http://bugs.schedmd.com or the slurm-dev list at slurm-dev@lists.llnl.gov

    Enjoy!

    SLURM 2.3.3 tagged and released

    SchedMD is pleased to announce the immediate availability of SLURM 2.3.3!

    A summary of the changes in version 2.3.3 from version 2.3.2 can be found in the file "NEWS" in the distributed files. Only bug fixes are and will be contained in the 2.3 releases.

    The code is available here: http://www.schedmd.com/#repos

    Please report any bug to http://bugs.schedmd.com or the slurm-dev list at slurm-dev@lists.llnl.gov

    Enjoy!

    SLURM 2.3.2 and 2.4.0-pre2 tagged and released

    SchedMD is pleased to announce the immediate availability of SLURM 2.3.2!

    A summary of the changes in version 2.3.2 from version 2.3.1 can be found in the file "RELEASE_NOTES" in the distributed files. Only bug fixes are and will be contained in the 2.3 releases.

    Also tagged is a new development release, 2.4.0-pre2. Future development will be added to this release, along with any bug fixes found in the 2.3 branch. If you are developing new code or want to run the most bleeding-edge SLURM, please use this version. While the code may be fairly stable, this version is beta and should be treated as such. In most cases this version is not recommended for production systems.

    2.4 NOTE:
    Because internal data structures may change from one -pre release to another, preserving state is not always possible, so jobs may be lost.
    2.4 NOTE:
    If running on a BGQ system, this version is most likely the version you want. 2.3 provides only a very small subset of the functionality 2.4 already offers.

    The code for both versions is available here: http://www.schedmd.com/#repos

    Please report any bug to http://bugs.schedmd.com or the slurm-dev list at slurm-dev@lists.llnl.gov

    Enjoy!

    SLURM 2.3.1 and 2.4.0-pre1 tagged and released

    SchedMD is pleased to announce the immediate availability of SLURM 2.3.1!

    A summary of the changes in version 2.3.1 from version 2.3.0 can be found in the file "RELEASE_NOTES" in the distributed files. Only bug fixes are and will be contained in the 2.3 releases.

    Also tagged is a new development release, 2.4.0-pre1. Future development will be added to this release, along with any bug fixes found in the 2.3 branch. If you are developing new code or want to run the most bleeding-edge SLURM, please use this version. While the code may be fairly stable, this version is beta and should be treated as such. In most cases this version is not recommended for production systems.

    2.4 NOTE:
    Because internal data structures may change from one -pre release to another, preserving state is not always possible, so jobs may be lost.
    2.4 NOTE:
    If running on a BGQ system, this version is most likely the version you want. 2.3 provides only a very small subset of the functionality 2.4 already offers.

    The code for both versions is available here: http://www.schedmd.com/#repos

    Please report any bug to http://bugs.schedmd.com or the slurm-dev list at slurm-dev@lists.llnl.gov

    Enjoy!

    SLURM 2.3.0 tagged and released

    SchedMD is pleased to announce the immediate availability of SLURM 2.3.0!

    A summary of the major changes in version 2.3 from version 2.2 can be found in the file "RELEASE_NOTES" with the distributed files.

    This version should be considered stable and ready for production use.

    The code is available here: http://www.schedmd.com/#repos

    Enjoy!

    SchedMD Porting SLURM to BlueGene/Q for LLNL

    SchedMD LLC announced today a contract signing with Lawrence Livermore National Laboratory (LLNL) to provide development services for the SLURM workload scheduler. Technical activities center around making SLURM operational on Sequoia, a 20 petaFLOP IBM BlueGene/Q computer slated for delivery to LLNL in late 2011, with deployment scheduled in 2012. Sequoia will have 1.6 million cores, 1.6 petabytes of memory, 96 racks and 98,304 compute nodes, making it one of the most powerful computers in the world.

    Moe Jette, Chief Technology Officer of SchedMD, reports "We eagerly anticipate working with LLNL and extending SLURM's capabilities to the latest generation of hardware from IBM. SLURM was designed for very high scalability from its inception, and we anticipate no difficulties in managing the workload on Sequoia's 1.6 million cores."

    SchedMD Announces SLURM Support Contract with Swiss National Supercomputer Centre

    Livermore, CA - SchedMD LLC announced today a contract signing with Swiss National Supercomputer Centre (CSCS) to provide support services for the SLURM workload scheduler. CSCS recently installed the SLURM workload scheduler across their supercomputers including the 22,032 core Cray XT5 system.

    Colin McMurtrie, Head of Systems at CSCS, said "The ease with which we have made this transition is testament to the robustness and high quality of the product but also to the no-fuss installation and configuration procedure and the high quality documentation. We have no qualms about recommending SLURM to any facility, large or small, who wish to make the break from the various commercial options available today."

    Morris "Moe" Jette, Chief Technology Officer of SchedMD, said "We look forward to working with one of the premier computer centers in Europe. The workload scheduler is a critical component of any supercomputer center and SchedMD will provide CSCS with the support they need to use SLURM."

    SLURM Developers Depart Lawrence Livermore National Laboratory

    The primary SLURM developers, Morris Jette and Danny Auble, have decided to depart Lawrence Livermore National Laboratory (LLNL) in order to concentrate their energies on SchedMD LLC, a company they formed in 2010 to provide SLURM development and support.

    SLURM development was begun at Lawrence Livermore National Laboratory (LLNL) in 2002. It has since become one of the most popular job schedulers in high-performance computing, currently installed on the largest computer in the world at the National University of Defence Technology (NUDT) in China, Europe's largest system at Commissariat a l'Energie Atomique (CEA) in France and many others. This popularity was achieved with limited commercial support and no marketing. Our expectation is that by leaving LLNL, more resources can be made available for SLURM development and that its rate of development will increase. We also anticipate that commercial support will make it more attractive to many consumers and lead to more widespread acceptance.

    Our intent is that SLURM remain open-source and freely available to the public. The short-term impact of this transition is a short delay in the release of SLURM version 2.3 to the Fall of 2011. Version 2.3 includes the ability to increase job size, task binding to resources using Linux cgroups, plus support for IBM BlueGene/Q, Cray XT and Cray XE systems.

    The longer-term effect will be largely driven by market forces.

    SchedMD to port SLURM to Cray Computers

    SchedMD LLC announced today that an agreement had been reached with Oak Ridge National Laboratory to port SLURM to Cray systems.

    Oak Ridge National Laboratory (ORNL) operates the most powerful computer in the United States, a Cray XT5 with a peak speed of 2.33 petaflops (over two thousand trillion calculations per second). ORNL also has a contract with Cray for a 20 petaflop computer to begin shipment in 2011.

    Morris "Moe" Jette, Chief Technology Officer of SchedMD, says "SLURM supports all of the high-performance computing architectures today except for Cray systems. This work will open the door for Cray customers to a state of the art, open source job scheduler with tremendous cost savings compared to proprietary schedulers."

    Slurm versions 2.5.2 and 2.6.0-pre1 available

    Slurm version 2.5.2 is now available with the bug fixes described below. We have also made available a pre-release of version 2.6, which is still under development. Notable features in v2.6 include support for job arrays and accounting for a job's energy consumption using IPMI. The job array documentation is available at www.schedmd.com/slurmdocs/job_array.html. The latest versions of Slurm are available from www.schedmd.com/#repos.

    Changes in SLURM 2.5.2:

    -- Fix advanced reservation recovery logic when upgrading from version 2.4.
    -- BLUEGENE - Fix for QOS/Association node limits.
    -- Add missing "safe" flag to print of AccountStorageEnforce option.
    -- Fix logic to optimize GRES topology with respect to allocated CPUs.
    -- Add job_submit/all_partitions plugin to set a job's default partition to ALL available partitions in the cluster.
    -- Modify switch/nrt logic to permit build without the libnrt.so library.
    -- Handle srun task launch failure without duplicate error messages or abort.
    -- Fix bug in QOS limits enforcement when slurmctld restarts and the user is not yet added to the QOS list.
    -- Fix issue where sjstat and sjobexitmod were installed in 2 different RPMs.
    -- Fix for job request of multiple partitions in which some partitions lack nodes with required features.
    -- Permit a job to use a QOS it does not have access to if an administrator manually set the job's QOS (previously the job would be rejected).
    -- Make more variables available to the job_submit/lua plugin: slurm.MEM_PER_CPU, slurm.NO_VAL, etc.
    -- Fix topology/tree logic when nodes defined in slurm.conf get re-ordered.
    -- In select/cons_res, correct logic to allocate whole sockets to jobs. Work by Magnus Jonsson, Umea University.
    -- In select/cons_res, correct logic when a job is removed from only some nodes.
    -- Avoid apparent kernel bug in 2.6.32 which appears to be solved in at least 3.5.0. This avoids a stack overflow when running jobs on more than 120k nodes.
    -- BLUEGENE - If we made a block that isn't runnable because of an overlapping block, destroy it correctly.
    -- Switch/nrt - Dynamically load libnrt.so from within the plugin as needed. This eliminates the need for libnrt.so on the head node.
    -- BLUEGENE - Fix in reservation logic that could cause an abort.

    Changes in SLURM 2.6.0-pre1:

    -- Add "state" field to job step information reported by scontrol.
    -- Notify srun to retry step creation upon completion of other job steps rather than polling. This results in much faster throughput for job step execution with the --exclusive option.
    -- Added "ResvEpilog" and "ResvProlog" configuration parameters to execute a program at the beginning and end of each reservation.
    -- Added "slurm_load_job_user" function. This is a variation of "slurm_load_jobs", but accepts a user ID argument, potentially resulting in substantial performance improvement for "squeue --user=ID".
    -- Added "slurm_load_node_single" function. This is a variation of "slurm_load_nodes", but accepts a node name argument, potentially resulting in substantial performance improvement for "sinfo --nodes=NAME".
    -- Added "HealthCheckNodeState" configuration parameter to identify node states on which HealthCheckProgram should be executed.
    -- Remove sacct --dump and --formatted-dump options, which were deprecated in 2.5.
    -- Added support for job arrays (phase 1 of effort). See "man sbatch" option -a/--array for details.
    -- Add new AccountStorageEnforce options of 'nojobs' and 'nosteps', which allow the use of accounting features like associations, QOS and limits without keeping track of jobs or steps in accounting.
    -- Cray - Add new cray.conf parameter of "AlpsEngine" to specify the communication protocol to be used for ALPS/BASIL.
    -- select/cons_res plugin: Correction to CPU allocation count logic for cores without hyperthreading.
    -- Added new SelectTypeParameters value of "CR_ALLOCATE_FULL_SOCKET".
    -- Added PriorityFlags value of "TICKET_BASED" and merged the priority/multifactor2 plugin into the priority/multifactor plugin.
    -- Add "KeepAliveTime" configuration parameter controlling how long sockets used for srun/slurmstepd communications are kept alive after disconnect.
    -- Added SLURM_SUBMIT_HOST to the salloc, sbatch and srun job environment.
    -- Added SLURM_ARRAY_TASK_ID to the environment of job array tasks.
    -- Added squeue --array/-r option to optimize output for job arrays.
    -- Added "SlurmctldPlugstack" configuration parameter for a generic stack of slurmctld daemon plugins.
    -- Removed contribs/arrayrun tool. Use the native support for job arrays.
    -- Modify default installation locations for RPMs to match "make install": _prefix /usr/local, _slurm_sysconfdir %{_prefix}/etc/slurm, _mandir %{_prefix}/share/man, _infodir %{_prefix}/share/info.
    -- Add acct_gather_energy/ipmi plugin, which works off FreeIPMI for energy gathering.
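    Several of the new 2.6.0-pre1 additions listed above are slurm.conf parameters. A minimal illustrative fragment (the paths and values here are hypothetical examples, not recommendations):

    ```
    # slurm.conf sketch (hypothetical example values)
    ResvProlog=/etc/slurm/resv_prolog.sh   # run at the start of each reservation
    ResvEpilog=/etc/slurm/resv_epilog.sh   # run at the end of each reservation
    KeepAliveTime=30                       # seconds to keep srun/slurmstepd sockets open after disconnect
    ```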