Coding A Triton Kernel For - Detailed Analysis & Overview

Coding a Triton Kernel for Softmax (fwd pass) Computation
Stanford CS336 Language Modeling from Scratch | Spring 2025 | Lecture 6: Kernels, Triton
Triton Beginner Coding Tutorial From Scratch - Step by Step - Kernel Fusion
THE TRITON LANGUAGE | PHILIPPE TILLET
Flash Attention derived and coded from first principles with Triton (Python)
Triton GPU Programming From Scratch - Tutorial
triton_lite, a Triton clone in Mojo: Jeff Niu at the Modular GPU Kernel Hackathon
How to Beat PyTorch? Writing a Fast MatMul Kernel in Triton - Tensor Cores, L2 Caching & Auto-Tuning
Lecture 28: Liger Kernel - Efficient Triton Kernels for LLM Training
Intro to Triton: A MyTorch Sidequest!
Kernel Fusion from Scratch: Writing a Triton Kernel and Patching It Into a Live LLM
Stanford CS336 Language Modeling from Scratch | Spring 2026 | Lecture 6: Kernels, Triton, XLA
Coding a Triton Kernel for Softmax (fwd pass) Computation

Let's ...
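The blurb for this video is truncated, so here is only a hedged sketch of what a softmax forward kernel computes, assuming the common design in which each program instance handles one row. It is written in plain Python so it runs without a GPU. In a real Triton kernel, `tl.load(..., mask=mask, other=-float('inf'))` plays the role of the padding step, and the helper names here (`softmax_row_program`, `softmax_fwd`) are hypothetical.

```python
import math


def next_power_of_2(n: int) -> int:
    """Smallest power of two >= n (Triton block sizes must be powers of two)."""
    p = 1
    while p < n:
        p *= 2
    return p


def softmax_row_program(row: list[float], block_size: int) -> list[float]:
    """Emulates one program instance processing a single row (hypothetical helper)."""
    n_cols = len(row)
    # Load the row into a power-of-two block; masked lanes get -inf so exp() maps them to 0.
    block = [row[i] if i < n_cols else -math.inf for i in range(block_size)]
    # Subtract the row max for numerical stability (softmax is shift-invariant).
    row_max = max(block)
    numerator = [math.exp(x - row_max) for x in block]
    denominator = sum(numerator)
    # Store only the unmasked lanes.
    return [numerator[i] / denominator for i in range(n_cols)]


def softmax_fwd(x: list[list[float]]) -> list[list[float]]:
    """One 'program' per row, as a row-wise softmax launch grid would assign them."""
    block_size = next_power_of_2(len(x[0]))
    return [softmax_row_program(row, block_size) for row in x]
```

For example, `softmax_fwd([[1000.0, 1000.0, 1000.0]])` returns three values of 1/3, whereas a naive `exp(1000.0)` would overflow. Subtracting the row max is what keeps the kernel safe for large logits.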

Stanford CS336 Language Modeling from Scratch | Spring 2025 | Lecture 6: Kernels, Triton

For more information about Stanford's online Artificial Intelligence programs, visit https://stanford.io/ai. To learn more about ...

Triton Beginner Coding Tutorial From Scratch - Step by Step - Kernel Fusion

New video: ...
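Kernel fusion pays off mainly by cutting global-memory traffic. The sketch below is a hedged illustration using a hypothetical add-then-ReLU operation (not necessarily the one fused in the video) and a deliberately simplified cost model that counts only element loads and stores, ignoring caches.

```python
def unfused_add_relu(x: list[float], b: list[float]) -> list[float]:
    # Kernel 1: t = x + b -> reads 2N elements, writes N (t is materialized in global memory)
    t = [xi + bi for xi, bi in zip(x, b)]
    # Kernel 2: y = relu(t) -> reads N elements, writes N
    return [max(ti, 0.0) for ti in t]


def fused_add_relu(x: list[float], b: list[float]) -> list[float]:
    # One kernel: each element is loaded once, combined in registers, stored once; no t.
    return [max(xi + bi, 0.0) for xi, bi in zip(x, b)]


def global_memory_bytes(n: int, bytes_per_elem: int, fused: bool) -> int:
    """Simplified traffic model: 3N element accesses fused vs 5N unfused."""
    elems = 3 * n if fused else 5 * n
    return elems * bytes_per_elem
```

Under this model, one million fp32 elements move 20 MB unfused and 12 MB fused, a 40% reduction for a memory-bound operation, while both versions produce identical results.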

THE TRITON LANGUAGE | PHILIPPE TILLET

Triton ...

Flash Attention derived and coded from first principles with Triton (Python)

In this video, I'll be deriving and ...
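At the core of FlashAttention is the online-softmax recurrence: keys and values are processed tile by tile, and a running max `m`, running sum `l`, and unnormalized output `acc` are rescaled whenever a larger score appears. The sketch below is a single-query, forward-only, pure-Python illustration of that recurrence, not the video's derivation. `block_n` stands in for the key/value tile size.

```python
import math


def naive_attention(q, K, V):
    """Reference: softmax(q . K^T / sqrt(d)) @ V for a single query."""
    scale = 1.0 / math.sqrt(len(q))
    s = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in K]
    m = max(s)
    p = [math.exp(si - m) for si in s]
    l = sum(p)
    return [sum(p[j] * V[j][c] for j in range(len(K))) / l for c in range(len(V[0]))]


def tiled_attention(q, K, V, block_n):
    """Same result, computed tile by tile with online softmax."""
    scale = 1.0 / math.sqrt(len(q))
    m = -math.inf             # running max of scores seen so far
    l = 0.0                   # running sum of exp(score - m)
    acc = [0.0] * len(V[0])   # running unnormalized output
    for start in range(0, len(K), block_n):
        K_blk = K[start:start + block_n]
        V_blk = V[start:start + block_n]
        s = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in K_blk]
        m_new = max(m, max(s))
        # Rescale old statistics to the new max (alpha is 0 on the first tile).
        alpha = math.exp(m - m_new)
        p = [math.exp(sj - m_new) for sj in s]
        l = l * alpha + sum(p)
        acc = [a * alpha + sum(pj * v[c] for pj, v in zip(p, V_blk))
               for c, a in enumerate(acc)]
        m = m_new
    # Normalize once at the end.
    return [a / l for a in acc]
```

Because the full row of scores is never materialized, memory use per query stays proportional to the tile size rather than the sequence length. The result matches `naive_attention` for any `block_n`.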

Triton GPU Programming From Scratch - Tutorial

Become AI Researcher (Skool): https://www.skool.com/become-ai-researcher-2669/about. In this tutorial you'll learn ...
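Triton's programming model launches a grid of programs, each of which handles a whole block of elements rather than a single thread's element. The pure-Python emulation below is a hedged sketch of the usual vector-add pattern. In real Triton the offsets would come from `tl.program_id(axis=0) * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)` and the grid size from `triton.cdiv`. The helper `add_program` is hypothetical.

```python
def cdiv(a: int, b: int) -> int:
    """Ceiling division: number of blocks needed to cover a elements."""
    return (a + b - 1) // b


def add_program(pid: int, x: list[float], y: list[float], out: list[float], block_size: int) -> None:
    """One program instance: computes a block of outputs (hypothetical helper)."""
    n = len(x)
    offsets = [pid * block_size + i for i in range(block_size)]
    # Mask guards the last, partially filled block against out-of-bounds access.
    mask = [o < n for o in offsets]
    for o, m in zip(offsets, mask):
        if m:
            out[o] = x[o] + y[o]


def vector_add(x: list[float], y: list[float], block_size: int = 4) -> list[float]:
    out = [0.0] * len(x)
    grid = cdiv(len(x), block_size)
    for pid in range(grid):  # on a GPU these programs run in parallel
        add_program(pid, x, y, out, block_size)
    return out
```

With 10 elements and `block_size=4`, the grid has 3 programs, and the mask disables the last two lanes of the third one.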

triton_lite, a Triton clone in Mojo: Jeff Niu at the Modular GPU Kernel Hackathon

In this talk, Jeff Niu from OpenAI explores how he brought ...

How to Beat PyTorch? Writing a Fast MatMul Kernel in Triton - Tensor Cores, L2 Caching & Auto-Tuning

Matrix Multiplication is the heart of every Transformer model. If it's slow, your model is slow. In this episode of Bielik Anatomy, we ...
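One L2-caching trick for tiled matmul kernels is to reorder which output tile each program computes, so that programs running at the same time share rows of A and columns of B. The sketch below follows the grouped ordering described in Triton's official matrix-multiplication tutorial, as we understand it. It also uses a crude proxy for cache reuse (the number of distinct A row-blocks plus B column-blocks touched by the first few programs), which is not a real cache simulation.

```python
def row_major_order(pid: int, num_pid_m: int, num_pid_n: int) -> tuple[int, int]:
    """Naive mapping: walk output tiles row by row."""
    return pid // num_pid_n, pid % num_pid_n


def grouped_order(pid: int, num_pid_m: int, num_pid_n: int, group_size_m: int) -> tuple[int, int]:
    """Grouped mapping: walk down group_size_m tile-rows before moving right."""
    num_pid_in_group = group_size_m * num_pid_n
    group_id = pid // num_pid_in_group
    first_pid_m = group_id * group_size_m
    # The last group may contain fewer than group_size_m tile-rows.
    this_group_m = min(num_pid_m - first_pid_m, group_size_m)
    pid_m = first_pid_m + (pid % num_pid_in_group) % this_group_m
    pid_n = (pid % num_pid_in_group) // this_group_m
    return pid_m, pid_n


def tiles_touched(order, n_programs: int) -> int:
    """Distinct A row-blocks plus B column-blocks loaded by the first n_programs."""
    a_blocks = {order(pid)[0] for pid in range(n_programs)}
    b_blocks = {order(pid)[1] for pid in range(n_programs)}
    return len(a_blocks) + len(b_blocks)
```

On an 8x8 grid of output tiles, the first 16 programs touch 10 input blocks in row-major order but only 8 with `group_size_m=4`. Fewer distinct blocks in flight means more hits in L2.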

Lecture 28: Liger Kernel - Efficient Triton Kernels for LLM Training

Byron Hsu presents LinkedIn's open-source collection of ...
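A technique often associated with fused cross-entropy kernels is computing the gradient with respect to the logits during the forward pass, since it is just `(softmax - one_hot) / N`. This avoids a second pass over the large vocabulary dimension. The pure-Python sketch below illustrates that idea under that assumption. It is not Liger Kernel's actual implementation, and the function name is hypothetical.

```python
import math


def cross_entropy_fwd_bwd(logits: list[list[float]], targets: list[int]):
    """Mean cross-entropy loss and d(loss)/d(logits), computed row by row in one pass."""
    n = len(logits)
    loss = 0.0
    grads = []
    for row, t in zip(logits, targets):
        m = max(row)                      # max trick for a stable log-sum-exp
        exps = [math.exp(z - m) for z in row]
        s = sum(exps)
        lse = m + math.log(s)
        loss += lse - row[t]              # -log softmax(row)[t]
        # Gradient of the mean loss: (softmax_j - [j == t]) / n
        g = [e / s for e in exps]
        g[t] -= 1.0
        grads.append([gj / n for gj in g])
    return loss / n, grads
```

For logits `[[0.0, 0.0]]` and target `0`, the loss is `log 2` and the gradient is `[-0.5, 0.5]`. A fused GPU kernel could write such gradients over the logits buffer in place, which is where the memory saving would come from.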

Intro to Triton: A MyTorch Sidequest!

In our quest to build a deep learning framework, we have hit a roadblock! Training is too slow and needs too much memory for ...

Kernel Fusion from Scratch: Writing a Triton Kernel and Patching It Into a Live LLM

I loaded a 1.5B-parameter LLM on a GTX 1650 Ti, wrote a 30-line ...

Stanford CS336 Language Modeling from Scratch | Spring 2026 | Lecture 6: Kernels, Triton, XLA

CUDA Programming Course – High-Performance Computing with GPUs

Learn how to program with Nvidia CUDA and leverage GPUs for high-performance computing and deep learning.
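Unlike Triton's block-level programs, CUDA kernels are written per thread. Each thread derives a global index as `blockIdx.x * blockDim.x + threadIdx.x`, and a grid-stride loop lets a fixed number of threads cover an array of any size. CUDA itself is C++, so to keep to one example language this is a hedged Python emulation of that indexing. The SAXPY example and helper names are illustrative assumptions.

```python
def global_thread_ids(grid_dim: int, block_dim: int) -> list[int]:
    # Mirrors CUDA's idx = blockIdx.x * blockDim.x + threadIdx.x
    return [b * block_dim + t for b in range(grid_dim) for t in range(block_dim)]


def saxpy_grid_stride(a: float, x: list[float], y: list[float], grid_dim: int, block_dim: int) -> list[float]:
    """out = a*x + y, with each emulated thread looping by the total thread count."""
    n = len(x)
    out = list(y)
    stride = grid_dim * block_dim  # total number of threads in the grid
    for idx in global_thread_ids(grid_dim, block_dim):
        i = idx
        while i < n:  # grid-stride loop: thread handles i, i + stride, i + 2*stride, ...
            out[i] = a * x[i] + y[i]
            i += stride
    return out
```

With 2 blocks of 2 threads and 10 elements, thread 0 handles indices 0, 4, and 8. The bounds check `i < n` serves the same purpose as Triton's mask.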