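A YOLOv11 Beginner's Journey (Part 8): Understanding the Grad-CAM Visualization Source Code

This post reads through the Grad-CAM source code and traces how its parts call one another.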

水静川流 · Published 2025-02-24 23:00:55

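I have recently been visualizing my modified YOLOv11, mainly with the Grad-CAM method, so I spent some time reading and analyzing its source code.

The GradCAM class

Class definition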

class GradCAM:  
    def __init__(self,  
                 model,  
                 target_layers,  
                 reshape_transform=None,  
                 use_cuda=False):

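class GradCAM: defines a class named GradCAM.
def __init__(self, ...): the initializer, called when an instance of the class is created.
- model: the deep learning model to be explained.
- target_layers: the target layers from which activations and gradients are extracted.
- reshape_transform: an optional function that reshapes the activations and gradients.
- use_cuda: a boolean indicating whether to use CUDA acceleration.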

        self.model = model.eval()  
        self.target_layers = target_layers  
        self.reshape_transform = reshape_transform  
        self.cuda = use_cuda

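self.model = model.eval(): puts the model in evaluation mode (for example, dropout is disabled and batch normalization uses its running statistics).
self.target_layers = target_layers: stores the list of target layers.
self.reshape_transform = reshape_transform: stores the reshape function.
self.cuda = use_cuda: stores the CUDA setting.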

        if self.cuda:  
            self.model = model.cuda()

if self.cuda: if CUDA is requested, moves the model to the GPU.

        self.activations_and_grads = ActivationsAndGradients(  
            self.model, target_layers, reshape_transform)

self.activations_and_grads = ActivationsAndGradients(...): creates an ActivationsAndGradients instance that records the activations and gradients of the target layers.

Computing the weights and the loss

    @staticmethod  
    def get_cam_weights(grads):  
        return np.mean(grads, axis=(2, 3), keepdims=True)

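@staticmethod: declares the method as static, so it does not receive self.
def get_cam_weights(grads): computes one weight per channel by averaging the gradients over the spatial dimensions (axes 2 and 3 of an (N, C, H, W) array). keepdims=True keeps the result at shape (N, C, 1, 1) so that it broadcasts against the activations.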
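
To make the shapes concrete, here is a small sketch of the same averaging step on a made-up gradient array. The array contents and sizes are arbitrary values chosen for illustration, not data from a real model.

```python
import numpy as np

# Made-up gradients for a batch of 2 samples, 3 channels, 4x4 feature maps.
grads = np.arange(2 * 3 * 4 * 4, dtype=np.float32).reshape(2, 3, 4, 4)

# Average over the spatial axes (H and W); keepdims keeps them as size-1 axes.
weights = np.mean(grads, axis=(2, 3), keepdims=True)

print(weights.shape)  # (2, 3, 1, 1)
```

Because the spatial axes are kept with size 1, weights can be multiplied directly with an (N, C, H, W) activation array.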

    @staticmethod  
    def get_loss(output, target_category):  
        loss = 0  
        for i in range(len(target_category)):  
            loss = loss + output[i, target_category[i]]  
        return loss

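def get_loss(output, target_category): computes the value that will be backpropagated. It is not a training loss: for each sample i in the batch it adds up the model's output score at that sample's target category, output[i, target_category[i]].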
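
The indexing can be checked on a tiny example. In this sketch a NumPy array stands in for the model's output tensor (an assumption made so that the example runs without a model), and the scores are made up.

```python
import numpy as np

def get_loss(output, target_category):
    # Sum the score of the chosen category for every sample in the batch.
    loss = 0
    for i in range(len(target_category)):
        loss = loss + output[i, target_category[i]]
    return loss

# Made-up scores for 2 samples and 3 categories.
scores = np.array([[0.1, 2.0, -1.0],
                   [3.0, 0.5, 0.2]])

# Sample 0 targets category 1 (score 2.0), sample 1 targets category 0 (score 3.0).
total = get_loss(scores, [1, 0])
print(total)  # 5.0
```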

Computing the CAM image

    def get_cam_image(self, activations, grads):  
        weights = self.get_cam_weights(grads)  
        weighted_activations = weights * activations  
        cam = weighted_activations.sum(axis=1)  
   
        return cam

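def get_cam_image(self, activations, grads): computes the CAM image.
weights = self.get_cam_weights(grads): gets the weight of each channel.
weighted_activations = weights * activations: weights each channel's activation map by its channel weight.
cam = weighted_activations.sum(axis=1): sums over the channel dimension to produce the CAM image.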
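
The three steps above can be followed by hand on a small example. The activation and gradient values below are made up for illustration: one sample with two channels of 2x2 feature maps.

```python
import numpy as np

# One sample, two channels, 2x2 feature maps (made-up values).
activations = np.array([[[[1.0, 2.0], [3.0, 4.0]],
                         [[0.0, 1.0], [1.0, 0.0]]]])
grads = np.array([[[[1.0, 1.0], [1.0, 1.0]],
                   [[-2.0, -2.0], [-2.0, -2.0]]]])

# Channel weights: spatial mean of the gradients, here 1 and -2.
weights = np.mean(grads, axis=(2, 3), keepdims=True)

# Weight each channel, then sum over the channel axis.
weighted_activations = weights * activations
cam = weighted_activations.sum(axis=1)

print(cam.shape)  # (1, 2, 2)
print(cam[0])     # [[1. 0.] [1. 4.]]
```

The second channel has a negative weight, so it lowers the CAM wherever that channel is active.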

Getting the target width and height

    @staticmethod  
    def get_target_width_height(input_tensor):  
        width, height = input_tensor.size(-1), input_tensor.size(-2)  
        return width, height

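def get_target_width_height(input_tensor): returns the width and height of the input tensor, read from its last and second-to-last dimensions. The (width, height) order matches the size argument that cv2.resize expects.

Computing the CAM for each layer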

    def compute_cam_per_layer(self, input_tensor):  
        activations_list = [a.cpu().data.numpy()  
                            for a in self.activations_and_grads.activations]  
        grads_list = [g.cpu().data.numpy()  
                      for g in self.activations_and_grads.gradients]  
        target_size = self.get_target_width_height(input_tensor)

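def compute_cam_per_layer(self, input_tensor): computes a CAM for each target layer.
activations_list and grads_list: the recorded activations and gradients, moved to the CPU and converted to NumPy arrays.
target_size = self.get_target_width_height(input_tensor): gets the target size, that is, the size of the input image.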

        cam_per_target_layer = []  
        for layer_activations, layer_grads in zip(activations_list, grads_list):  
            cam = self.get_cam_image(layer_activations, layer_grads)  
            cam[cam < 0] = 0  
            scaled = self.scale_cam_image(cam, target_size)  
            cam_per_target_layer.append(scaled[:, None, :])  
   
        return cam_per_target_layer

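cam_per_target_layer: stores the CAM image of each layer.
zip(activations_list, grads_list): iterates over the activations and gradients together, one pair per target layer.
cam[cam < 0] = 0: sets negative values to zero (the ReLU step of Grad-CAM), keeping only regions that contribute positively to the target category.
scaled = self.scale_cam_image(cam, target_size): normalizes the CAM image and resizes it to the target size.
scaled[:, None, :]: inserts a new axis at position 1 so that the per-layer CAMs can later be concatenated along it.
return cam_per_target_layer: returns the CAMs of all target layers.

Aggregating the CAMs of multiple layers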

    def aggregate_multi_layers(self, cam_per_target_layer):  
        cam_per_target_layer = np.concatenate(cam_per_target_layer, axis=1)  
        cam_per_target_layer = np.maximum(cam_per_target_layer, 0)  
        result = np.mean(cam_per_target_layer, axis=1)  
        return self.scale_cam_image(result)

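def aggregate_multi_layers(self, cam_per_target_layer): aggregates the CAM images of multiple layers.
np.concatenate(cam_per_target_layer, axis=1): concatenates the per-layer CAMs along axis 1. This is the layer axis inserted by scaled[:, None, :] in compute_cam_per_layer, not a channel axis.
np.maximum(cam_per_target_layer, 0): clips any negative values to zero.
result = np.mean(cam_per_target_layer, axis=1): averages over the layers.
return self.scale_cam_image(result): normalizes the averaged CAM again. No target size is passed, so it is not resized.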
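
A short sketch of the aggregation with two made-up per-layer CAMs shows how the axes line up. The values are arbitrary, and the final re-normalization by scale_cam_image is left out to keep the arithmetic easy to check.

```python
import numpy as np

# Made-up CAMs from two target layers, each of shape (batch=1, H=2, W=2).
layer_a = np.array([[[0.2, 0.8], [0.4, 0.0]]], dtype=np.float32)
layer_b = np.array([[[0.6, 0.0], [0.0, 1.0]]], dtype=np.float32)

# Insert the layer axis, as compute_cam_per_layer does: (1, 2, 2) -> (1, 1, 2, 2).
cam_per_target_layer = [layer_a[:, None, :], layer_b[:, None, :]]

stacked = np.concatenate(cam_per_target_layer, axis=1)  # (1, 2, 2, 2)
stacked = np.maximum(stacked, 0)                        # clip negatives
result = np.mean(stacked, axis=1)                       # average over layers

print(stacked.shape)  # (1, 2, 2, 2)
print(result.shape)   # (1, 2, 2)
```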

Scaling the CAM image

    @staticmethod  
    def scale_cam_image(cam, target_size=None):  
        result = []  
        for img in cam:  
            img = img - np.min(img)  
            img = img / (1e-7 + np.max(img))  
            if target_size is not None:  
                img = cv2.resize(img, target_size)  
            result.append(img)  
        result = np.float32(result)  
   
        return result

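def scale_cam_image(cam, target_size=None): min-max normalizes each CAM in the batch to the range [0, 1] and, if target_size is given, resizes it to that size with cv2.resize.
The 1e-7 in the denominator avoids division by zero when a CAM is constant.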
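
The normalization part can be tried without OpenCV. The sketch below repeats only the min-max step; the function name normalize_cam is a hypothetical name used here, and the resize branch is intentionally omitted.

```python
import numpy as np

def normalize_cam(cam):
    # Hypothetical helper: the min-max part of scale_cam_image, without resizing.
    result = []
    for img in cam:
        img = img - np.min(img)            # shift so the minimum is 0
        img = img / (1e-7 + np.max(img))   # scale so the maximum is about 1
        result.append(img)
    return np.float32(result)

cam = np.array([[[2.0, 4.0], [6.0, 10.0]]])  # made-up CAM, shape (1, 2, 2)
scaled = normalize_cam(cam)

flat = normalize_cam(np.zeros((1, 2, 2)))    # constant CAM: stays all zeros
```

With the made-up input, the values 2, 4, 6, 10 map to approximately 0, 0.25, 0.5, 1.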

The __call__ method

    def __call__(self, input_tensor, target_category=None):  
        if self.cuda:  
            input_tensor = input_tensor.cuda()  
   
        output = self.activations_and_grads(input_tensor)

        if isinstance(target_category, int):  
            target_category = [target_category] * input_tensor.size(0)

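def __call__(self, input_tensor, target_category=None): makes instances of the class callable like a function.
if self.cuda: if CUDA is used, moves the input tensor to the GPU.
output = self.activations_and_grads(input_tensor): runs a forward pass through the model; the hooks record the activations of the target layers along the way.
if isinstance(target_category, int): if the target category is a single integer, repeats it into a list with one entry per sample in the batch.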

        if target_category is None:  
            target_category = np.argmax(output.cpu().data.numpy(), axis=-1)  
            print(f"category id: {target_category}")

        self.model.zero_grad()  
        loss = self.get_loss(output, target_category)  
        loss.backward(retain_graph=True)

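if target_category is None: if no target category is given, uses the category with the highest output score for each sample.
self.model.zero_grad(): clears previously accumulated gradients.
loss = self.get_loss(output, target_category): computes the sum of the target-category scores.
loss.backward(retain_graph=True): backpropagates to compute the gradients, which the hooks record at the target layers. retain_graph=True keeps the computation graph so that backward can be run on it again.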
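
The two target_category branches can be isolated into one small function. This is only a sketch: resolve_target_category is a hypothetical helper name, and a NumPy array stands in for the model output so the example runs without a model.

```python
import numpy as np

def resolve_target_category(scores, target_category=None):
    # Hypothetical helper that mirrors the two branches in __call__.
    if isinstance(target_category, int):
        # One integer: use the same category for every sample in the batch.
        return [target_category] * scores.shape[0]
    if target_category is None:
        # Nothing given: pick the highest-scoring category per sample.
        return np.argmax(scores, axis=-1)
    return target_category

# Made-up scores for 2 samples and 3 categories.
scores = np.array([[0.1, 2.0, -1.0],
                   [3.0, 0.5, 0.2]])

same_for_all = resolve_target_category(scores, 2)
best_per_sample = resolve_target_category(scores)
```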

Computing and returning the CAM

        cam_per_layer = self.compute_cam_per_layer(input_tensor)  
        return self.aggregate_multi_layers(cam_per_layer)

cam_per_layer = self.compute_cam_per_layer(input_tensor): computes the CAM of each layer.
return self.aggregate_multi_layers(cam_per_layer): returns the aggregated CAM result.

Cleanup and context management

    def __del__(self):  
        self.activations_and_grads.release()

    def __enter__(self):  
        return self  
   
    def __exit__(self, exc_type, exc_value, exc_tb):  
        self.activations_and_grads.release()  
        if isinstance(exc_value, IndexError):  
            print(  
                f"An exception occurred in CAM with block: {exc_type}. Message: {exc_value}")  
            return True

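def __del__(self): releases the hooks when the instance is destroyed.
__enter__ and __exit__: implement the context manager protocol, so the hooks are released automatically at the end of a with block. __exit__ also returns True when the exception is an IndexError, which suppresses that exception after printing a message; any other exception propagates as usual.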
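
The suppression behaviour of __exit__ is easy to miss, so here is a minimal sketch that reproduces it without a model. FakeRecorder and CamLike are hypothetical stand-ins for ActivationsAndGradients and GradCAM; only the __enter__ and __exit__ logic is copied from the code above.

```python
class FakeRecorder:
    """Hypothetical stand-in for ActivationsAndGradients."""
    def __init__(self):
        self.released = False

    def release(self):
        self.released = True


class CamLike:
    """Hypothetical stand-in for GradCAM with the same __enter__/__exit__ logic."""
    def __init__(self):
        self.activations_and_grads = FakeRecorder()

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, exc_tb):
        self.activations_and_grads.release()
        if isinstance(exc_value, IndexError):
            print(f"An exception occurred in CAM with block: {exc_type}. Message: {exc_value}")
            return True  # a truthy return value suppresses the exception


# An IndexError inside the block is printed and suppressed.
with CamLike() as cam:
    [][0]
recorder = cam.activations_and_grads

# Any other exception still propagates, but the hooks are released first.
propagated = False
try:
    with CamLike() as other:
        raise ValueError("not suppressed")
except ValueError:
    propagated = True
```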

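The ActivationsAndGradients class

This class registers hooks on the target layers of the network to extract their activations and gradients for later visualization: it saves the activations during the forward pass, saves the gradients during the backward pass, and provides a convenient way to remove the hooks afterwards.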
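
The source of this class is not listed in this post. As a rough illustration of the hook mechanism just described, here is a minimal sketch built on PyTorch forward hooks and tensor hooks. The class name HookRecorder, its method names, and the toy model are assumptions made for illustration; the actual ActivationsAndGradients implementation may differ.

```python
import torch
import torch.nn as nn

class HookRecorder:
    """Hypothetical sketch of a class that records activations and gradients."""
    def __init__(self, model, target_layers):
        self.model = model
        self.activations = []
        self.gradients = []
        # Forward hooks run after each target layer produces its output.
        self.handles = [layer.register_forward_hook(self._save_activation)
                        for layer in target_layers]

    def _save_activation(self, module, inputs, output):
        self.activations.append(output.detach())
        if output.requires_grad:
            # A tensor hook receives the gradient w.r.t. this output during backward.
            output.register_hook(self._save_gradient)

    def _save_gradient(self, grad):
        # Backward visits layers in reverse order, so prepend to keep layer order.
        self.gradients = [grad.detach()] + self.gradients

    def __call__(self, x):
        self.activations = []
        self.gradients = []
        return self.model(x)

    def release(self):
        for handle in self.handles:
            handle.remove()

# Toy model, used only to exercise the hooks.
model = nn.Sequential(nn.Conv2d(3, 4, kernel_size=3, padding=1), nn.ReLU(),
                      nn.Flatten(), nn.Linear(4 * 8 * 8, 2)).eval()
recorder = HookRecorder(model, [model[0]])
output = recorder(torch.randn(1, 3, 8, 8))
output[0, 1].backward()
activation_shape = tuple(recorder.activations[0].shape)
gradient_shape = tuple(recorder.gradients[0].shape)
recorder.release()
```

After the backward call, the recorded activation and gradient of the convolution layer have the same shape, which is what get_cam_image relies on.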

The show_cam_on_image function

Function definition

def show_cam_on_image(img: np.ndarray,  
                      mask: np.ndarray,  
                      use_rgb: bool = False,  
                      colormap: int = cv2.COLORMAP_JET) -> np.ndarray:

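def show_cam_on_image(...): defines a function named show_cam_on_image.
img: np.ndarray: the input image as a NumPy array (RGB or BGR).
mask: np.ndarray: the CAM image, which the heatmap is built from.
use_rgb: bool = False: a boolean indicating whether the input image is RGB or BGR; the default False means BGR.
colormap: int = cv2.COLORMAP_JET: the OpenCV colormap to use, cv2.COLORMAP_JET by default.

Heatmap generation and color space conversion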

heatmap = cv2.applyColorMap(np.uint8(255 * mask), colormap)

if use_rgb:  
    heatmap = cv2.cvtColor(heatmap, cv2.COLOR_BGR2RGB)

heatmap = np.float32(heatmap) / 255

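cv2.applyColorMap(...): converts the CAM mask into a heatmap. The mask is first multiplied by 255 and converted to unsigned 8-bit integers (np.uint8), the value range the colormap is applied to.
if use_rgb: if the input image is RGB, converts the heatmap from BGR to RGB.
np.float32(heatmap) / 255: converts the heatmap to floating point and normalizes it to the range [0, 1].

Input image validation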

if np.max(img) > 1:  
    raise Exception(  
        "The input image should be np.float32 in the range [0, 1]")

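if np.max(img) > 1: checks whether the maximum value of the input image exceeds 1 and raises an exception if it does. This ensures the image is in the range [0, 1] so that the later overlay does not produce unexpected results.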
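
Images loaded from disk are usually 8-bit with values in [0, 255], so they would fail this check as they are. A common way to prepare them, sketched here on a made-up one-pixel image, is to convert to float32 and divide by 255.

```python
import numpy as np

# Made-up 8-bit image: 1x1 pixel with three channel values.
img_uint8 = np.array([[[0, 128, 255]]], dtype=np.uint8)

# Convert to float32 in [0, 1], the format show_cam_on_image expects.
img = np.float32(img_uint8) / 255

passes_check = not (np.max(img) > 1)
```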

Overlay and normalization

cam = heatmap + img  
cam = cam / np.max(cam)

return np.uint8(255 * cam)

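cam = heatmap + img: adds the heatmap to the original image to form the final visualization.
cam = cam / np.max(cam): divides by the maximum so that the largest value becomes 1. Since the heatmap is non-negative, all values fall in [0, 1] as long as the image is non-negative too.
return np.uint8(255 * cam): converts the normalized image back to the range [0, 255] and returns it.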
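
The overlay arithmetic can be tried without OpenCV by supplying a ready-made heatmap. In this sketch the function name overlay and the tiny arrays are assumptions for illustration; in the real function the heatmap comes from cv2.applyColorMap.

```python
import numpy as np

def overlay(img, heatmap):
    # Hypothetical helper: the validation, addition and normalization steps only.
    if np.max(img) > 1:
        raise Exception("The input image should be np.float32 in the range [0, 1]")
    cam = heatmap + img
    cam = cam / np.max(cam)
    return np.uint8(255 * cam)

# Uniform grey 2x2 image and a heatmap that is hot in one pixel only.
img = np.full((2, 2, 3), 0.5, dtype=np.float32)
heatmap = np.zeros((2, 2, 3), dtype=np.float32)
heatmap[0, 0] = [0.0, 0.0, 1.0]

out = overlay(img, heatmap)
print(out.dtype, out.shape)  # uint8 (2, 2, 3)
```

The hot pixel reaches the maximum value 255 in its last channel, while the rest of the image is dimmed by the division. This is why overlays often look darker than the original image.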

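Other functions

The show_cam_on_image function overlays a given heatmap mask on a given image, producing a visualization in which the regions the model attends to are clearly visible. In other words, OpenCV's colormap turns the CAM values into a form that is easy to read.

The center_crop_img function center-crops and resizes the input image so that the output matches the specified size. It is a basic utility, but a useful one when preparing model inputs: it keeps the central region of the image and ensures a consistent image size.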
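
The source of center_crop_img is not shown in this post. As a partial illustration, the sketch below performs only the center-cropping step with NumPy slicing; the function name center_crop is hypothetical, and the resize that the real function also performs is left out.

```python
import numpy as np

def center_crop(img, size):
    # Hypothetical sketch: crop a size x size window around the image center.
    h, w = img.shape[:2]
    if h < size or w < size:
        raise ValueError("image is smaller than the requested crop size")
    top = (h - size) // 2
    left = (w - size) // 2
    return img[top:top + size, left:left + size]

# Made-up 6x8 image with 3 channels.
img = np.arange(6 * 8 * 3).reshape(6, 8, 3)
crop = center_crop(img, 4)

print(crop.shape)  # (4, 4, 3)
```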

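Call flow

Input image processing: the user passes the original image to the GradCAM instance as input. The image is usually preprocessed with center_crop_img first so that image sizes are consistent.
Forward pass: GradCAM calls activations_and_grads(input_tensor), which invokes the __call__ method of the ActivationsAndGradients instance and runs a forward pass to get the model output.
Loss computation and backpropagation: once the output is available, GradCAM computes the loss from the output and the target category, then backpropagates to obtain the gradients of the target layers.
CAM computation: GradCAM computes the CAM image from the recorded activations and gradients.
Visualization: the CAM produced by GradCAM is passed as the mask to show_cam_on_image, which overlays the heatmap on the original image to produce the final visualization.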