- Intro
- core concepts, installation, CUDA setup, tensor/NumPy interop (sketch below)
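A minimal sketch of the version/CUDA check and the tensor/NumPy interop; the array values are illustrative:

```python
import numpy as np
import torch

# Verify the install and whether a CUDA device is visible.
print(torch.__version__, torch.cuda.is_available())

# from_numpy shares memory with the source array:
# an in-place edit on one side shows up on the other.
a = np.ones(3)
t = torch.from_numpy(a)
t.add_(1)
print(a)              # [2. 2. 2.] -- the array changed too

# .numpy() goes the other way (CPU tensors only) and also shares memory.
back = t.numpy()
```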
- Tensor fundamentals
- operations, memory management
- creation, math ops, autograd tracking, memory handling for the DataLoader
- GPU tensors, shared memory (sketch below)
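A sketch tying these fundamentals together: creation, a tracked op, detaching, and pinned/shared memory. Shapes are arbitrary:

```python
import torch

x = torch.zeros(2, 3)                         # creation routine
y = torch.randn(2, 3, requires_grad=True)     # tracked by autograd

z = (x + y).sum()
z.backward()
print(y.grad)             # all ones: d(sum)/dy

frozen = y.detach()       # same storage, no graph tracking

# Pinned (page-locked) host memory enables fast async host-to-GPU copies;
# DataLoader workers likewise move returned batches into shared memory.
if torch.cuda.is_available():
    pinned = x.pin_memory()
    on_gpu = pinned.to("cuda", non_blocking=True)
shared = torch.zeros(4).share_memory_()
```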
- Data pipeline
- Dataset/DataLoader, transforms
- custom Dataset, batching strategies, custom transforms with lambdas
- workers, prefetching, GPU acceleration, sharding (sketch below)
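A minimal custom-Dataset/DataLoader sketch; `SquaresDataset` and the lambda transform are invented for illustration. Note that a lambda transform only survives worker processes under the fork start method (the Linux default); use a named function where pickling is required:

```python
import torch
from torch.utils.data import DataLoader, Dataset

class SquaresDataset(Dataset):
    """Toy map-style dataset with an optional per-item transform."""
    def __init__(self, n, transform=None):
        self.n, self.transform = n, transform

    def __len__(self):
        return self.n

    def __getitem__(self, i):
        x = torch.tensor([float(i)])
        if self.transform is not None:
            x = self.transform(x)
        return x, x * x

ds = SquaresDataset(100, transform=lambda t: t / 100.0)
loader = DataLoader(
    ds,
    batch_size=8,
    shuffle=True,
    num_workers=2,      # parallel worker processes
    pin_memory=True,    # page-locked batches for faster GPU copies
    prefetch_factor=2,  # batches each worker keeps ready
)

for xb, yb in loader:
    pass                # training step would consume the batch here
```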
- Neural network core
- nn.Module, layers, weight init
- parameter registration
- convolutions (1d/2d/3d), Xavier/Kaiming init, mixing CNNs and RNNs
- hooks, multimodal nets (sketch below)
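A small nn.Module sketch pulling these threads together: auto-registered parameters, Kaiming/Xavier init, and a forward hook. The net itself is made up for illustration:

```python
import torch
import torch.nn as nn

class SmallConvNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 16, kernel_size=3, padding=1)
        self.fc = nn.Linear(16, 10)
        # Assigning an nn.Parameter attribute registers it automatically.
        self.scale = nn.Parameter(torch.ones(1))
        # Kaiming init suits ReLU-family activations; Xavier suits linear/tanh.
        nn.init.kaiming_normal_(self.conv.weight, nonlinearity="relu")
        nn.init.xavier_uniform_(self.fc.weight)

    def forward(self, x):
        x = torch.relu(self.conv(x))
        x = x.mean(dim=(2, 3))        # global average pool
        return self.fc(x) * self.scale

net = SmallConvNet()

# A forward hook observes intermediate activations without editing forward().
def log_shape(module, inputs, output):
    print(module.__class__.__name__, tuple(output.shape))

handle = net.conv.register_forward_hook(log_shape)
net(torch.randn(2, 3, 8, 8))
handle.remove()
```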
- Training workflow
- autograd, loss functions, optimizers
- gradient accumulation, custom losses in C++, SWA
- retain_graph, class-weighted losses, LBFGS, AMP (sketch below)
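A sketch of gradient accumulation combined with a class-weighted loss; the weights, shapes, and step counts are arbitrary:

```python
import torch
import torch.nn as nn

model = nn.Linear(10, 3)
# Per-class weights upweight rare classes in the loss.
loss_fn = nn.CrossEntropyLoss(weight=torch.tensor([1.0, 2.0, 4.0]))
opt = torch.optim.SGD(model.parameters(), lr=0.1)

accum_steps = 4   # effective batch = 4 micro-batches
for step in range(8):
    x = torch.randn(16, 10)
    y = torch.randint(0, 3, (16,))
    loss = loss_fn(model(x), y) / accum_steps   # scale so grads average
    loss.backward()                             # grads accumulate in .grad
    if (step + 1) % accum_steps == 0:
        opt.step()
        opt.zero_grad()
```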
- Model deployment
- serialization, TorchScript, ONNX export
- zipfile serialization, Python quirks (sketch below)
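A deployment sketch covering all three export paths; the filenames are illustrative, the ONNX step assumes the onnx package is installed, and the exact export path varies by PyTorch version:

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))
model.eval()

# Plain serialization: the checkpoint is a zip archive of tensor data
# plus pickled metadata (hence the zipfile format and Python quirks).
torch.save(model.state_dict(), "model.pt")
model.load_state_dict(torch.load("model.pt"))

# TorchScript via tracing: records the ops for one example input,
# producing a Python-free artifact loadable from C++ as well.
example = torch.randn(1, 4)
scripted = torch.jit.trace(model, example)
scripted.save("model_ts.pt")

# ONNX export for non-PyTorch runtimes.
torch.onnx.export(model, example, "model.onnx",
                  input_names=["x"], output_names=["y"])
```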
- Distributed computing
- data parallelism, model parallelism
- DistributedDataParallel, torch.distributed.pipeline, remote modules
- gradient bucketing, tensor parallelism, asynchronous RPC, NCCL backend tuning (sketch below)
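A minimal DDP sketch, assuming launch via `torchrun --nproc_per_node=N train.py`, which sets RANK/LOCAL_RANK/WORLD_SIZE in the environment; it needs multiple GPUs to actually run:

```python
import os
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    dist.init_process_group(backend="nccl")   # NCCL for GPU collectives
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    model = nn.Linear(10, 10).cuda(local_rank)
    # DDP buckets gradients and overlaps their all-reduce with backward;
    # bucket_cap_mb tunes the bucket size.
    model = DDP(model, device_ids=[local_rank], bucket_cap_mb=25)

    opt = torch.optim.SGD(model.parameters(), lr=0.01)
    for _ in range(3):
        x = torch.randn(32, 10, device=local_rank)
        loss = model(x).sum()
        opt.zero_grad()
        loss.backward()     # gradients all-reduced across ranks here
        opt.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```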
- Performance
- profiler tools, GPU utilization, JIT compilation, quantization
- TensorBoard profiling, stream semantics, fusion passes, QAT
- memory snapshot analysis, MPS backend, graph optimization, dynamic quantization (sketch below)
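A sketch of the profiler plus dynamic quantization; the model is a toy, and on GPU you would add `ProfilerActivity.CUDA`:

```python
import torch
import torch.nn as nn
from torch.profiler import ProfilerActivity, profile

model = nn.Sequential(nn.Linear(256, 256), nn.ReLU(), nn.Linear(256, 10))
x = torch.randn(64, 256)

# Profile CPU ops and print the hotspots.
with profile(activities=[ProfilerActivity.CPU]) as prof:
    model(x)
print(prof.key_averages().table(sort_by="cpu_time_total", row_limit=5))

# Dynamic quantization: weights stored as int8, activations quantized
# on the fly -- a cheap win for Linear/LSTM-heavy CPU inference.
qmodel = torch.ao.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)
```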
- Advanced architectures
- graph nets, meta-learning, probabilistic DL, sparse nets
- PyG, MAML, torch.distributions, pruning APIs
- gradient-based meta-learning, block sparsity (sketch below)
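A sketch of two of the core-library pieces above: the pruning API and a reparameterized distribution sample; the layer and amounts are arbitrary:

```python
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

layer = nn.Linear(8, 8)
# L1 unstructured pruning zeroes the 50% smallest-magnitude weights,
# adding a weight_mask buffer and reparametrizing layer.weight.
prune.l1_unstructured(layer, name="weight", amount=0.5)
print(float((layer.weight == 0).float().mean()))   # ~0.5 sparsity

# Fold the mask into the tensor and drop the reparametrization.
prune.remove(layer, "weight")

# Probabilistic building block: rsample() gives reparameterized
# samples, so gradients flow through them (e.g. in a VAE).
dist = torch.distributions.Normal(torch.zeros(3), torch.ones(3))
sample = dist.rsample()
logp = dist.log_prob(sample).sum()
```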
- Ecosystem
- torchvision, torchtext, torchaudio, PyTorch Lightning
- Mask R-CNN detection, streaming API, Fabric API
- video models, BERT tokenizers, text-to-speech, Lightning CLI (sketch below)
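A torchvision sketch for the Mask R-CNN item, assuming torchvision >= 0.13 (for the `weights="DEFAULT"` API); it downloads pretrained weights on first use, and the input image is random noise just to show the interface:

```python
import torch
from torchvision import models

# Pretrained Mask R-CNN from torchvision's detection model zoo.
model = models.detection.maskrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()

# Detection models take a list of CHW float images scaled to [0, 1].
img = torch.rand(3, 480, 640)
with torch.no_grad():
    out = model([img])[0]
print(out.keys())   # boxes, labels, scores, masks
```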
- Debugging & testing
- debugging tools, unit testing
- CUDA OOM, autograd detect_anomaly, pytest, debugging hooks, gradient testing (sketch below)
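A sketch of two debugging tools named above: anomaly detection for NaN gradients and gradcheck for gradient testing; the functions are toy examples, and the assert fits naturally inside a pytest test:

```python
import torch

# detect_anomaly re-runs backward with extra checks and points at the
# op that produced a NaN/Inf gradient (slow: debugging only).
def forward(x):
    return (x * torch.log(x)).sum()   # log(x) is NaN for x < 0

x = torch.tensor([-1.0], requires_grad=True)
try:
    with torch.autograd.detect_anomaly():
        forward(x).backward()
except RuntimeError as e:
    print("anomaly:", e)

# gradcheck compares analytic gradients to finite differences; it
# wants float64 inputs for numerical headroom.
def f(t):
    return (t ** 3).sum()

t = torch.randn(4, dtype=torch.double, requires_grad=True)
assert torch.autograd.gradcheck(f, (t,))
```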