Steps
  1. Prepare the data and model
  2. Use profiler to record execution events
  3. Run the profiler
  4. Use TensorBoard to view results and analyze model performance
  5. Improve performance with the help of profiler
  6. Analyze performance with other advanced features
  7. Additional Practices: Profiling PyTorch on AMD GPUs
1. Prepare the data and model

Import the required libraries:

import torch
import torch.nn
import torch.optim
import torch.profiler
import torch.utils.data
import torchvision.datasets
import torchvision.models
import torchvision.transforms as T

Prepare the dataset (CIFAR-10):

transform = T.Compose(
    [T.Resize(224),
     T.ToTensor(),
     T.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])
train_set = torchvision.datasets.CIFAR10(root='./data', train=True, download=True, transform=transform)
train_loader = torch.utils.data.DataLoader(train_set, batch_size=32, shuffle=True)
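CIFAR-10 images are 32×32. Because T.Resize is given a single int, it resizes the shorter edge to 224, so these square images become 224×224. T.ToTensor() scales 8-bit pixels to [0.0, 1.0], and T.Normalize then applies (input - mean) / std per channel. With mean = std = 0.5 this maps [0.0, 1.0] to [-1.0, 1.0]. The plain-Python sketch below reproduces only that arithmetic; `normalize` is a hypothetical helper, not the torchvision implementation.

```python
def normalize(pixel, mean=0.5, std=0.5):
    # Same per-channel formula T.Normalize applies: (input - mean) / std
    return (pixel - mean) / std


# T.ToTensor() divides 8-bit values by 255; Normalize then shifts/scales them.
for raw in (0, 128, 255):
    scaled = raw / 255
    print(raw, round(normalize(scaled), 3))
```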

Define the model, loss function, and optimizer:

device = torch.device("cuda:0")
model = torchvision.models.resnet18(weights='IMAGENET1K_V1').cuda(device)
criterion = torch.nn.CrossEntropyLoss().cuda(device)
optimizer = torch.optim.SGD(model.parameters(), lr=0.001, momentum=0.9)
model.train()

Define one training step:

def train(data):
    inputs, labels = data[0].to(device=device), data[1].to(device=device)
    outputs = model(inputs)
    loss = criterion(outputs, labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
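The step above follows the usual order: forward pass, loss, clear old gradients, backward pass, parameter update. To check that order without CIFAR-10 or a GPU, the sketch below runs the same sequence on a hypothetical toy linear classifier with random data. It returns loss.item() only for this demo. On a GPU, .item() waits for the device, so avoid it inside code you are profiling.

```python
import torch

torch.manual_seed(0)

# Hypothetical toy stand-ins: a 4-feature, 3-class linear classifier on random data.
model = torch.nn.Linear(4, 3)
criterion = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
inputs = torch.randn(8, 4)
labels = torch.randint(0, 3, (8,))


def train_step(inputs, labels):
    outputs = model(inputs)            # forward pass
    loss = criterion(outputs, labels)
    optimizer.zero_grad()              # clear gradients from the previous step
    loss.backward()                    # backward pass fills .grad
    optimizer.step()                   # parameter update
    return loss.item()                 # demo only: forces a sync on GPU


losses = [train_step(inputs, labels) for _ in range(20)]
print(f"first loss {losses[0]:.3f}, last loss {losses[-1]:.3f}")
```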
2. Use the profiler to record execution events

Some useful parameters are as follows:

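schedule: takes parameters such as wait=1, warmup=1, active=3, repeat=1. With these values the profiler skips the first step/iteration, warms up on the second, and records the following three iterations. In total, the cycle repeats once. Each cycle is called a “span” in the TensorBoard plugin.

During the wait phase the profiler is disabled. During the warmup phase the profiler starts tracing but discards the results. This keeps overhead out of the results: starting a trace is expensive, and recording from the very first traced step would skew the measurements.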

on_trace_ready: a callable invoked at the end of each cycle. For example, torch.profiler.tensorboard_trace_handler generates result files for TensorBoard. After profiling, the result files are stored in ./log/resnet18.

record_shapes: whether to record the shapes of operator inputs.

profile_memory: track tensor memory allocation and deallocation.

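with_stack: record source information (file and line number) for operators. If TensorBoard is integrated into VS Code, clicking a stack frame jumps to the corresponding line of code: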

https://code.visualstudio.com/docs/datascience/pytorch-support#_tensorboard-integration
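on_trace_ready accepts any callable that takes the profiler object, not only tensorboard_trace_handler. The sketch below is a hypothetical `summary_handler` that prints an operator table instead of writing files. It profiles a CPU-only torch.mm in place of train() so it runs without a GPU. It also shows how record_shapes pays off: key_averages(group_by_input_shape=True) can split rows by input shape.

```python
import torch
from torch.profiler import profile, schedule, ProfilerActivity

summaries = []


def summary_handler(p):
    # Custom on_trace_ready callback: called once per cycle, after the active steps.
    # group_by_input_shape=True splits rows by input shape (needs record_shapes=True).
    averages = p.key_averages(group_by_input_shape=True)
    summaries.append(averages)
    print(averages.table(sort_by="self_cpu_time_total", row_limit=5))


a = torch.randn(32, 64)
b = torch.randn(64, 16)
with profile(
    activities=[ProfilerActivity.CPU],  # CPU only, so the sketch runs without a GPU
    schedule=schedule(wait=1, warmup=1, active=3, repeat=1),
    on_trace_ready=summary_handler,
    record_shapes=True,
    profile_memory=True,
    with_stack=True,
) as prof:
    for _ in range(6):
        torch.mm(a, b)  # stands in for train(batch_data)
        prof.step()     # mark the step boundary

mm_rows = [row for row in summaries[0] if row.key == "aten::mm"]
print(mm_rows[0].input_shapes)
```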

Start/stop the profiler as a context manager:

with torch.profiler.profile(
        schedule=torch.profiler.schedule(wait=1, warmup=1, active=3, repeat=1),
        on_trace_ready=torch.profiler.tensorboard_trace_handler('./log/resnet18'),
        record_shapes=True,
        profile_memory=True,
        with_stack=True
) as prof:
    for step, batch_data in enumerate(train_loader):
        prof.step()  # Need to call this at each step to notify profiler of steps' boundary.
        if step >= 1 + 1 + 3:
            break
        train(batch_data)
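torch.profiler.schedule returns a callable that maps the profiler's step counter to a ProfilerAction. The counter starts at 0 and is advanced by each prof.step() call. Printing the actions for the schedule used above shows which steps are idle, warmed up, and recorded. It also shows why the loop can stop after 1 + 1 + 3 steps: with repeat=1, every step from 5 on returns NONE, so further iterations would not be profiled.

```python
from torch.profiler import ProfilerAction, schedule

# Same schedule as in the profiling loop above.
sched = schedule(wait=1, warmup=1, active=3, repeat=1)

actions = [sched(step) for step in range(7)]
for step, action in enumerate(actions):
    print(step, action.name)
```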

Alternatively, start/stop the profiler without a context manager:

prof = torch.profiler.profile(
        schedule=torch.profiler.schedule(wait=1, warmup=1, active=3, repeat=1),
        on_trace_ready=torch.profiler.tensorboard_trace_handler('./log/resnet18'),
        record_shapes=True,
        with_stack=True)
prof.start()
for step, batch_data in enumerate(train_loader):
    prof.step()
    if step >= 1 + 1 + 3:
        break
    train(batch_data)
prof.stop()
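Without a context manager, an exception inside the loop would skip prof.stop() and leave the profiler running. A minimal sketch of a more defensive variant wraps the loop in try/finally. It uses a CPU-only torch.mm in place of train() and a hypothetical handler that only counts completed cycles; in practice you would keep tensorboard_trace_handler.

```python
import torch
from torch.profiler import profile, schedule, ProfilerActivity

ready_calls = []

prof = profile(
    activities=[ProfilerActivity.CPU],
    schedule=schedule(wait=1, warmup=1, active=3, repeat=1),
    # Hypothetical handler that only counts cycles; swap in tensorboard_trace_handler.
    on_trace_ready=lambda p: ready_calls.append(p.step_num),
)
x = torch.randn(64, 64)
prof.start()
try:
    for step in range(10):
        prof.step()
        if step >= 1 + 1 + 3:
            break
        torch.mm(x, x)  # stands in for train(batch_data)
finally:
    # Runs even if the loop raises, so the profiler is always stopped.
    prof.stop()

print(len(ready_calls))  # 1
```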
3. Run the profiler

Run the code above. The profiling results are saved in the ./log/resnet18 directory.
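To see what the run produces without a GPU or the CIFAR-10 download, the sketch below runs the same context-manager loop end to end. It uses a hypothetical toy linear model, random batches, and a temporary directory standing in for ./log/resnet18. tensorboard_trace_handler writes one file per cycle whose name ends in .pt.trace.json, and that is the file TensorBoard reads.

```python
import os
import tempfile

import torch
from torch.profiler import ProfilerActivity, profile, schedule, tensorboard_trace_handler

# Hypothetical stand-ins for the ResNet-18 / CIFAR-10 setup (CPU only).
model = torch.nn.Linear(128, 10)
optimizer = torch.optim.SGD(model.parameters(), lr=0.001, momentum=0.9)
criterion = torch.nn.CrossEntropyLoss()
batches = [(torch.randn(32, 128), torch.randint(0, 10, (32,))) for _ in range(8)]


def train(data):
    inputs, labels = data
    loss = criterion(model(inputs), labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()


log_dir = tempfile.mkdtemp()  # stands in for './log/resnet18'
with profile(
    activities=[ProfilerActivity.CPU],
    schedule=schedule(wait=1, warmup=1, active=3, repeat=1),
    on_trace_ready=tensorboard_trace_handler(log_dir),
    record_shapes=True,
) as prof:
    for step, batch_data in enumerate(batches):
        prof.step()
        if step >= 1 + 1 + 3:
            break
        train(batch_data)

traces = [f for f in os.listdir(log_dir) if f.endswith(".pt.trace.json")]
print(traces)
```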
4. Use TensorBoard to view results

Install the PyTorch Profiler TensorBoard Plugin:

pip install torch_tb_profiler

Launch TensorBoard:

tensorboard --logdir=./log

Open the PyTorch Profiler page in TensorBoard:

http://localhost:6006/#pytorch_profiler
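Recent PyTorch documentation describes the TensorBoard integration as deprecated in favor of opening trace files directly in Perfetto or Chrome's trace viewer. If the plugin is unavailable in your environment, a sketch of that alternative follows. It exports a Chrome-format trace with prof.export_chrome_trace and checks that the recorded operator appears in it. The CPU-only torch.mm workload and the temporary path are assumptions for illustration.

```python
import json
import os
import tempfile

import torch
from torch.profiler import ProfilerActivity, profile

x = torch.randn(128, 128)
with profile(activities=[ProfilerActivity.CPU]) as prof:
    torch.mm(x, x)  # stands in for a training step

# The exported JSON can be opened in Perfetto or chrome://tracing.
path = os.path.join(tempfile.mkdtemp(), "trace.json")
prof.export_chrome_trace(path)

with open(path) as f:
    trace = json.load(f)
op_names = {event.get("name") for event in trace["traceEvents"]}
print("aten::mm" in op_names)
```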