RHEL 7.5

xCAT provides sample package list (pkglist) files for CUDA. You can find them in the following locations:

  • Diskful: /opt/xcat/share/xcat/install/rh/cuda*

  • Diskless: /opt/xcat/share/xcat/netboot/rh/cuda*

Diskful images

The following examples will create diskful images for cudafull and cudaruntime. The osimage definitions will be created from the base rhels7.5-ppc64le-install-compute osimage.

[Note]: The machine must be rebooted after the CUDA drivers are installed. To satisfy this requirement, the CUDA software is installed through the pkglist attribute of the osimage definition, because a reboot happens after the operating system is installed.

cudafull

  1. Create a copy of the install-compute image and label it cudafull:

    lsdef -t osimage -z rhels7.5-ppc64le-install-compute \
      | sed 's/install-compute:/install-cudafull:/' \
      | mkdef -z
    
  2. Add the CUDA repo created in the previous step to the pkgdir attribute:

    chdef -t osimage -o rhels7.5-ppc64le-install-cudafull -p \
      pkgdir=/install/cuda-9.2/ppc64le/cuda-core,/install/cuda-9.2/ppc64le/cuda-deps
    
  3. Use the provided cudafull pkglist to install the CUDA packages:

    chdef -t osimage -o rhels7.5-ppc64le-install-cudafull \
      pkglist=/opt/xcat/share/xcat/install/rh/cudafull.rhels7.ppc64le.pkglist
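
The copy in step 1 works because the object name in stanza output is a line ending in a colon, which is the only text the sed expression rewrites. The following is a minimal sketch that applies the same substitution to a hand-written, abridged stanza. The helper names rename_stanza and sample_stanza are hypothetical, and the sample attributes are for illustration only, not real lsdef -z output.

```shell
#!/bin/bash
# Hypothetical helper: apply the same substitution used in step 1 to a stanza on stdin.
rename_stanza() {
    sed 's/install-compute:/install-cudafull:/'
}

# Abridged, hand-written stanza for illustration only;
# real lsdef -z output carries many more attributes.
sample_stanza() {
    cat <<'EOF'
rhels7.5-ppc64le-install-compute:
    objtype=osimage
    provmethod=install
EOF
}

# Only the object-name line changes; attribute lines pass through untouched.
sample_stanza | rename_stanza
```

After the real pipeline has run, lsdef -t osimage rhels7.5-ppc64le-install-cudafull should display the new definition.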
    

cudaruntime

  1. Create a copy of the install-compute image and label it cudaruntime:

    lsdef -t osimage -z rhels7.5-ppc64le-install-compute \
      | sed 's/install-compute:/install-cudaruntime:/' \
      | mkdef -z
    
  2. Add the CUDA repo created in the previous step to the pkgdir attribute:

    chdef -t osimage -o rhels7.5-ppc64le-install-cudaruntime -p \
      pkgdir=/install/cuda-9.2/ppc64le/cuda-core,/install/cuda-9.2/ppc64le/cuda-deps
    
  3. Use the provided cudaruntime pkglist to install the CUDA packages:

    chdef -t osimage -o rhels7.5-ppc64le-install-cudaruntime \
      pkglist=/opt/xcat/share/xcat/install/rh/cudaruntime.rhels7.ppc64le.pkglist
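
A mistyped path in a pkglist or pkgdir attribute can easily go unnoticed when the attribute is set, so it can help to confirm that each path exists before running chdef. The following is a minimal sketch of such a check. The helper name report_missing_paths is hypothetical, and the paths in the example call assume the CUDA 9.2 repository layout used in this section.

```shell
#!/bin/bash
# Hypothetical helper: print every argument that is not an existing file or
# directory. Returns non-zero if anything is missing.
report_missing_paths() {
    local missing=0 p
    for p in "$@"; do
        if [ ! -e "$p" ]; then
            echo "missing: $p"
            missing=1
        fi
    done
    return "$missing"
}

# Example call with the paths this section assumes on the management node.
report_missing_paths \
    /opt/xcat/share/xcat/install/rh/cudaruntime.rhels7.ppc64le.pkglist \
    /install/cuda-9.2/ppc64le/cuda-core \
    /install/cuda-9.2/ppc64le/cuda-deps \
    || echo "fix the paths above before running chdef"
```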
    

Diskless images

The following examples will create diskless images for cudafull and cudaruntime. The osimage definitions will be created from the base rhels7.5-ppc64le-netboot-compute osimage.

[Note]: For diskless images, the CUDA packages MUST be installed through the otherpkglist attribute and NOT through the pkglist attribute, as is done for diskful images. The reboot requirement does not apply to diskless nodes because the image is loaded on each reboot.

cudafull

  1. Create a copy of the netboot-compute image and label it cudafull:

    lsdef -t osimage -z rhels7.5-ppc64le-netboot-compute \
      | sed 's/netboot-compute:/netboot-cudafull:/' \
      | mkdef -z
    
  2. Verify that the CUDA repo created in the previous step is available in the directory specified by the otherpkgdir attribute.

    The otherpkgdir directory can be obtained by running lsdef on the osimage:

    # lsdef -t osimage rhels7.5-ppc64le-netboot-cudafull -i otherpkgdir
    Object name: rhels7.5-ppc64le-netboot-cudafull
        otherpkgdir=/install/post/otherpkgs/rhels7.5/ppc64le
    

    Create a symbolic link to the CUDA repository in the directory specified by otherpkgdir:

    ln -s /install/cuda-9.2 /install/post/otherpkgs/rhels7.5/ppc64le/cuda-9.2
    
  3. Change the rootimgdir for the cudafull osimage:

    chdef -t osimage -o rhels7.5-ppc64le-netboot-cudafull \
       rootimgdir=/install/netboot/rhels7.5/ppc64le/cudafull
    
  4. Create a custom pkglist file to install additional operating system packages for your CUDA node.

    1. Copy the default compute pkglist file as a starting point:

      mkdir -p /install/custom/netboot/rh/
      
      cp /opt/xcat/share/xcat/netboot/rh/compute.rhels7.ppc64le.pkglist \
        /install/custom/netboot/rh/cudafull.rhels7.ppc64le.pkglist
      
    2. Edit the pkglist file and append any packages you desire to be installed. For example:

      vi /install/custom/netboot/rh/cudafull.rhels7.ppc64le.pkglist
      ...
      # Additional packages for CUDA
      pciutils
      
    3. Set the new file as the pkglist attribute for the cudafull osimage:

      chdef -t osimage -o rhels7.5-ppc64le-netboot-cudafull \
        pkglist=/install/custom/netboot/rh/cudafull.rhels7.ppc64le.pkglist
      
  5. Create the otherpkgs.pkglist file that installs the full CUDA packages:

    1. Create the otherpkgs.pkglist file for cudafull:

      vi /install/custom/netboot/rh/cudafull.rhels7.ppc64le.otherpkgs.pkglist
      # add the following packages
      cuda-9.2/ppc64le/cuda-deps/dkms
      cuda-9.2/ppc64le/cuda-core/cuda
      
    2. Set the otherpkglist attribute for the cudafull osimage:

      chdef -t osimage -o rhels7.5-ppc64le-netboot-cudafull \
        otherpkglist=/install/custom/netboot/rh/cudafull.rhels7.ppc64le.otherpkgs.pkglist
      
  6. Generate the image:

    genimage rhels7.5-ppc64le-netboot-cudafull
    
  7. Package the image:

    packimage rhels7.5-ppc64le-netboot-cudafull
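
The otherpkgs.pkglist file from step 5 can also be written without an interactive editor, which is convenient when the procedure is scripted. The following is a minimal sketch. The helper name write_otherpkglist and the PKGLIST_DIR variable are hypothetical, and PKGLIST_DIR defaults to a scratch directory so that the sketch is safe to run anywhere.

```shell
#!/bin/bash
# Hypothetical helper: write one otherpkgs entry per line to the given file,
# creating the parent directory if needed. Overwrites an existing file.
write_otherpkglist() {
    local file="$1"
    shift
    mkdir -p "$(dirname "$file")"
    printf '%s\n' "$@" > "$file"
}

# Assumption: PKGLIST_DIR defaults to a scratch directory; set it to
# /install/custom/netboot/rh on a management node.
PKGLIST_DIR="${PKGLIST_DIR:-$(mktemp -d)}"

write_otherpkglist "$PKGLIST_DIR/cudafull.rhels7.ppc64le.otherpkgs.pkglist" \
    cuda-9.2/ppc64le/cuda-deps/dkms \
    cuda-9.2/ppc64le/cuda-core/cuda

# Show the result so it can be compared with the entries listed in step 5.
cat "$PKGLIST_DIR/cudafull.rhels7.ppc64le.otherpkgs.pkglist"
```

Each entry is a path relative to the otherpkgdir directory, which is why the symbolic link from step 2 must be in place before genimage runs.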
    

cudaruntime

  1. Create a copy of the netboot-compute image and label it cudaruntime:

    lsdef -t osimage -z rhels7.5-ppc64le-netboot-compute \
      | sed 's/netboot-compute:/netboot-cudaruntime:/' \
      | mkdef -z
    
  2. Verify that the CUDA repo created previously is available in the directory specified by the otherpkgdir attribute.

    1. Obtain the otherpkgdir directory using the lsdef command:

      # lsdef -t osimage rhels7.5-ppc64le-netboot-cudaruntime -i otherpkgdir
        Object name: rhels7.5-ppc64le-netboot-cudaruntime
           otherpkgdir=/install/post/otherpkgs/rhels7.5/ppc64le
      
    2. Create a symbolic link to the CUDA repository in the directory specified by otherpkgdir:

      ln -s /install/cuda-9.2 /install/post/otherpkgs/rhels7.5/ppc64le/cuda-9.2
      
  3. Change the rootimgdir for the cudaruntime osimage:

    chdef -t osimage -o rhels7.5-ppc64le-netboot-cudaruntime \
       rootimgdir=/install/netboot/rhels7.5/ppc64le/cudaruntime
    
  4. Create the otherpkgs.pkglist file that installs the CUDA runtime packages:

    1. Create the otherpkgs.pkglist file for cudaruntime:

      vi /install/custom/netboot/rh/cudaruntime.rhels7.ppc64le.otherpkgs.pkglist
      
      # Add the following packages:
      cuda-9.2/ppc64le/cuda-deps/dkms
      cuda-9.2/ppc64le/cuda-core/cuda-runtime-9-2
      
    2. Set the otherpkglist attribute for the cudaruntime osimage:

      chdef -t osimage -o rhels7.5-ppc64le-netboot-cudaruntime \
        otherpkglist=/install/custom/netboot/rh/cudaruntime.rhels7.ppc64le.otherpkgs.pkglist
      
  5. Generate the image:

    genimage rhels7.5-ppc64le-netboot-cudaruntime
    
  6. Package the image:

    packimage rhels7.5-ppc64le-netboot-cudaruntime
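
Because the cudafull and cudaruntime procedures both create the same symbolic link, a second plain ln -s fails if the link already exists. The following is a minimal sketch of an idempotent alternative that also checks that the link resolves to a directory. The helper name link_repo is hypothetical, and the demo runs in a scratch directory rather than under /install.

```shell
#!/bin/bash
# Hypothetical helper: link SRC into DEST_DIR under its own basename and confirm
# that the link resolves to a directory. The -n option stops ln from descending
# into an existing link, so a re-run replaces the link instead of nesting a new
# one inside the target directory.
link_repo() {
    local src="$1" dest_dir="$2"
    local link="$dest_dir/$(basename "$src")"
    mkdir -p "$dest_dir"
    ln -sfn "$src" "$link"
    [ -d "$link" ] || { echo "broken link: $link" >&2; return 1; }
    echo "$link"
}

# Demo in a scratch directory. On a management node the arguments would be
# /install/cuda-9.2 and the otherpkgdir value reported by lsdef.
scratch="$(mktemp -d)"
mkdir -p "$scratch/install/cuda-9.2"
link_repo "$scratch/install/cuda-9.2" "$scratch/otherpkgs"
link_repo "$scratch/install/cuda-9.2" "$scratch/otherpkgs"   # second run is harmless
```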
    

POWER9 Setup

The NVIDIA CUDA driver for POWER9 needs some additional setup. Refer to the URL below for details.

http://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html#power9-setup

xCAT includes an example script, cuda_power9_setup, to help users perform this setup.

Diskful osimage

For diskful deployment, there is no need to change the osimage definition. Instead, add this postscript to your compute node postscripts list.

chdef p9compute -p postscripts=cuda_power9_setup

Diskless osimage

For diskless deployment, the script needs to be added to the postinstall script of the osimage, and it should be run in the chroot environment. Refer to the following commands as an example.

mkdir -p /install/custom/netboot/rh
cp /opt/xcat/share/xcat/netboot/rh/compute.rhels7.ppc64le.postinstall /install/custom/netboot/rh/cudafull.rhels7.ppc64le.postinstall

cat >>/install/custom/netboot/rh/cudafull.rhels7.ppc64le.postinstall <<-EOF

/install/postscripts/cuda_power9_setup
EOF

chdef -t osimage rhels7.5-ppc64le-netboot-cudafull postinstall=/install/custom/netboot/rh/cudafull.rhels7.ppc64le.postinstall
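
Running the cat >> command above more than once adds the cuda_power9_setup line to the postinstall script each time. The following is a minimal sketch of an append that skips the write when an identical line is already present. The helper name append_once is hypothetical, and the demo uses a scratch file standing in for the copied postinstall script.

```shell
#!/bin/bash
# Hypothetical helper: append LINE to FILE only if no identical line exists.
# grep -x matches whole lines and -F treats the text literally.
append_once() {
    local file="$1" line="$2"
    grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# Demo on a scratch file standing in for the copied postinstall script.
postinstall="$(mktemp)"
printf '#!/bin/sh\n' > "$postinstall"
append_once "$postinstall" /install/postscripts/cuda_power9_setup
append_once "$postinstall" /install/postscripts/cuda_power9_setup   # no duplicate added
cat "$postinstall"
```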