Parallel Environment Runtime Edition (PE RTE)¶
xCAT software kits for PE RTE for Linux is available on: [1]
- PE RTE 1.3 1 and newer
PE RTE and mlnxofed_ib_install
Conflict¶
PPE requires the 32-bit version of libibverbs
. The default behavior of the mlnxofed_ib_install
postscript used to install the Mellanox OFED Infiniband (IB) driver is to remove any of the old IB related packages when installing. To bypass this behavior, set the variable mlnxofed_options=--force
when running the mlnxofed_ib_install
script.
Install Multiple Versions¶
Beginning with PE RTE 1.2.0.10, the packages are designed to allow for multiple versions of PE RTE to coexist on the same machine.
The default behavior of xCAT software kits is to only allow one version of a kitcomponent
to be associated with an xCAT osimage.
When using addkitcomp
to add a newer version of a kit component, xCAT will first remove the old version of the kit component before adding the new one.
To add multiple versions of PE RTE kit components to the same osimage, use the -n | --noupgrade
option. For example, to add PE RTE 1.3.0.1 and PE RTE 1.3.0.2 to the compute
osimage:
addkitcomp -i compute pperte_compute-1.3.0.1-0-rhels-6-x86_64
addkitcomp -i compute -n pperte_compute-1.3.0.2-0-rhels-6-x86_64
POE hostlist¶
When running parallel jobs, POE requires the user pass it a host list file. xCAT can help to create this hostlist file by running the nodels
command against the desired node range and redirecting to a file.
nodels compute > /tmp/hostlist
Known Issues¶
- [PE RTE 1.3.0.7] - For developers creating the complete software kit. The src rpm is no longer required. It is recommended to create the new software kit for PE RTE 1.3.0.7 from scratch and not to use the older kits as a starting point.
- [PE RTE 1.3.0.7] - When upgrading
ppe_rte_man
in a diskless image, there may be errors reported during the genimage process. The new packages are actually upgraded, so the errors can be ignored with low risk. - [PE RTE 1.3.0.1 to 1.3.0.6] - When uninstalling or upgrading ppe_rte_man in an diskless image,
genimage <osimage>
may fail and stop an an error. To workaround, simply rerungenimage <osimage>
to finish the creation of the diskless image
[1] | If using older releases of PE RTE, refer to IBM HPC Stack in an xCAT Cluster |