Ray is a unified framework for scaling AI and general Python workflows. Outside of machine learning (ML), its core distributed runtime and data libraries can be used for writing parallel applications that launch multiple processes, both on the same node and across multiple cluster nodes. These processes can subsequently execute a variety of workloads, e.g. Numba-compiled functions, NumPy calculations, and even GPU-enabled codes.
In this webinar, we will focus on scaling Ray workflows to multiple HPC cluster nodes to speed up various (non-ML) numerical workflows. We will look at both a loosely coupled (embarrassingly parallel) problem and a tightly coupled parallel problem.
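As a rough illustration of the loosely coupled case, the sketch below splits a series sum into independent chunks and evaluates each chunk as a Ray task. It is not taken from the webinar materials: the helper names (`partial_sum`, `make_chunks`, `run_with_ray`) and the choice of series are assumptions made for this example. Only `ray.init`, `ray.remote`, `ray.get`, and `ray.shutdown` come from Ray itself.

```python
import math


def partial_sum(start, stop):
    """Sum 1/k**2 for k in [start, stop); each chunk is independent of the others."""
    return math.fsum(1.0 / (k * k) for k in range(start, stop))


def make_chunks(n_terms, n_chunks):
    """Split the index range 1..n_terms into contiguous [start, stop) chunks."""
    step = -(-n_terms // n_chunks)  # ceiling division
    return [(s, min(s + step, n_terms + 1)) for s in range(1, n_terms + 1, step)]


def run_with_ray(n_terms=1_000_000, n_chunks=8):
    """Evaluate the chunks as Ray tasks (hypothetical driver; requires Ray)."""
    import ray  # imported here so the helpers above work without Ray installed

    # With no arguments this typically starts a local Ray instance; on a
    # cluster you would usually connect to a running one instead, e.g. with
    # ray.init(address="auto").
    ray.init(ignore_reinit_error=True)
    remote_sum = ray.remote(partial_sum)  # same effect as the @ray.remote decorator
    futures = [remote_sum.remote(a, b) for a, b in make_chunks(n_terms, n_chunks)]
    total = sum(ray.get(futures))  # block until all tasks finish, then combine
    ray.shutdown()
    return total
```

Calling `run_with_ray()` should return a value close to π²/6. Because the chunks never exchange data, the same pattern spreads across cluster nodes without changes to the task code. A tightly coupled problem would need communication between tasks at every step.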
Click on each webinar for its recording and materials.
The programming language R is not known for its speed. However, with some code optimization, R can be used for relatively heavy computations. Additional speedup can be achieved through various parallel techniques, both with multi-threading and distributed computing. This workshop introduces you to working with R from the command line on the Alliance clusters with a focus on performance. We discuss code profiling and benchmarking, various packages for parallelization, as well as using C++ from inside R to speed up your calculations.
Introduction to high-performance research computing in R (2023-Jan-31)
You can also browse some of our Julia programming materials here.
In this webinar, we cover parallel stencil computations in Julia using the ParallelStencil.jl package. This package enables you to write high-level code for fast computations on CPUs and GPUs. These computations are common in all numerical simulations involving the solution of discretized partial differential equations (PDEs) on a grid. ParallelStencil.jl provides high-level functions for computing derivatives and updating arrays. You can execute the same code on a single CPU, on multiple CPUs with multithreading via Base.Threads, or on GPUs using CUDA.jl (NVIDIA GPUs), AMDGPU.jl (AMD GPUs), or Metal.jl (Apple Silicon GPUs). Regardless of the underlying parallel hardware, all low-level communication between threads is hidden behind ParallelStencil.jl's macro calls, so it remains invisible in the simulation code. This makes the framework highly accessible to domain scientists. Furthermore, you can extend it to multiple processes by integrating ParallelStencil.jl with ImplicitGlobalGrid.jl (built upon MPI.jl). This combination facilitates easy scaling to multiple cluster nodes, with further parallelization on multiple cores and GPUs on each node. This architecture has been shown to scale efficiently to hundreds of GPUs and hundreds of cluster nodes.
Large-scale numerical experiments are central to much of contemporary scientific and mathematical research. Performing these numerical experiments in a valid, reproducible and scalable fashion is not easy. In this webinar I provide an introduction and pointers to two tools my research group uses to perform numerical experiments:
Designed specifically for HPC and inspired by the Python library Dask, Dagger is a distributed framework with a scheduler built on top of Distributed.jl for efficient parallel and out-of-core execution of tasks represented by a directed acyclic graph (DAG). Dagger supports computing with multiple threads, multiple processes, and on GPUs. Checkpoints are easy to create if you need to interrupt and resume computations. Finally, Dagger provides some debugging and runtime profiling tools.
In this webinar, we start with a quick review of Julia's multi-threading features but focus primarily on the Distributed standard library and its large array of tools. We demonstrate parallelization using three problems: a slowly converging series, a Julia set, and an N-body solver. We run the examples on a multi-core laptop and an HPC cluster.
High-level parallel stencil computations on CPUs and GPUs (2025-Jan-21)
Nextflow and Julia for scalable computation (2024-Nov-12)
Easier parallel Julia workflow with Dagger.jl (2021-Oct-27)
Parallel programming in Julia (2021-Mar-17)
High-performance research computing with Julia (2020-Mar-04)
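To make the term "stencil computation" from the ParallelStencil.jl abstract above concrete: each grid point is updated from a fixed pattern of neighbouring points. The sketch below is plain serial Python, not ParallelStencil.jl code, and its function names (`diffuse_step`, `diffuse`) are hypothetical. It applies a three-point stencil for the 1D heat equation with fixed boundary values.

```python
def diffuse_step(u, alpha):
    """One explicit finite-difference step of the 1D heat equation.

    Each interior point is updated from itself and its two neighbours
    (a three-point stencil); the two boundary values are held fixed.
    alpha = D * dt / dx**2 should be at most 0.5 for this scheme to be stable.
    """
    new = u[:]  # copy, so every update reads only values from the previous step
    for i in range(1, len(u) - 1):
        new[i] = u[i] + alpha * (u[i - 1] - 2.0 * u[i] + u[i + 1])
    return new


def diffuse(u, alpha, n_steps):
    """Apply diffuse_step repeatedly and return the final state."""
    for _ in range(n_steps):
        u = diffuse_step(u, alpha)
    return u
```

Every interior update within a step depends only on values from the previous step, so the loop iterations are independent of one another. That independence is what lets stencil codes map well onto many threads or GPU cores.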
Chapel is a parallel programming language for scientific computing designed to exploit parallelism across a wide range of hardware, from multi-core computers to large HPC clusters. Recently, Chapel introduced support for GPUs, allowing the same code to run seamlessly on both NVIDIA and AMD GPUs, without modification. In addition, for testing and development, Chapel offers a "CPU-as-device" mode, which lets you prototype GPU code on a regular computer without a dedicated GPU. Programming GPUs in Chapel is significantly easier than using CUDA or ROCm/HIP, and more flexible than OpenACC, as you can run fairly generic Chapel code on GPUs. Obviously, you will benefit from GPU acceleration the most with calculations that can be broken into many independent identical pieces. In Chapel, data transfer to/from a GPU (and between GPUs) is straightforward, thanks to a well-defined coding model that associates both calculations and data with a clear concept of locality. As of this writing, on the Alliance systems, you can run multi-locale (multiple nodes) GPU Chapel natively on Cedar, and single-locale GPU Chapel on all other clusters with NVIDIA cards via a container. Efforts are underway to expand native GPU support to more systems. In this webinar, we guide you through Chapel's key GPU programming features with live demos.
In this three-part online webinar series, we introduce the main concepts of the Chapel parallel programming language. Chapel is a relatively new language for both shared- and distributed-memory programming, with easy-to-use, high-level features that make it ideal for learning parallel programming for a novice HPC user. Unlike other high-level data-processing languages and workflows, the primary application of Chapel is numerical modelling and simulation codes, so this workshop is ideal for anyone who wants to learn how to write efficient large-scale numerical codes.
GPU computing with Chapel (2024-Oct-01)
Working with data files and external C libraries in Chapel (2020-Mar-18)
Working with distributed unstructured data in Chapel (2019-Apr-17)
Intro to Parallel Programming in Chapel (3-part series, early 2018)
Part 1: Basic language features (2018-Feb-28)
Part 2: Task parallelism in Chapel (2018-Mar-07)
Part 3: Data parallelism in Chapel (2018-Mar-14)
As part of their contribution to HPC Carpentry, WestGrid staff authored a Parallel programming in Chapel course. The materials and exercises in this course can be delivered as a full-day workshop. If you have questions about the materials, please contact Alex Razoumov - alex.razoumov@westgrid.ca.
A Brief Introduction to the Boost MPI Library (2018-May-09)
This online workshop explores how to use OpenMP to improve the speed of serial jobs on multi-core machines. We review how to add OpenMP constructs to a serial program in order to run it using multiple cores. Viewers are led through a series of hands-on, interactive examples, focusing on multi-threading parallel programming. The topics covered include:
Intro to Parallel Programming for Shared Memory Machines (2019-Oct)
Memory debugging with Valgrind (2019-Feb-20)