Limiting the Usable GPU Memory of a Process with PyTorch

Shared by Byrx.net on 2024-04-11 03:04:23


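Starting with PyTorch 1.8, the function torch.cuda.set_per_process_memory_fraction(fraction, device) lets you cap how much memory the current process may use on a given GPU, expressed as a fraction of that device's total memory.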

Test code:

import torch

torch.cuda.empty_cache()

# Cap this process at 50% of GPU 0's memory
torch.cuda.set_per_process_memory_fraction(0.5, device=0)

# Total memory of GPU 0
total_memory = torch.cuda.get_device_properties(0).total_memory
print("Total memory:", round(total_memory / (1024 * 1024), 1), "MB")

# Try allocations of different sizes (int8 = 1 byte per element)
try:
    # Use 10% of the memory:
    tmp_tensor = torch.empty(int(total_memory * 0.1), dtype=torch.int8, device='cuda:0')
    print("Allocated:", round(torch.cuda.memory_allocated(0) / (1024 * 1024), 1), "MB")
    print("Reserved:", round(torch.cuda.memory_reserved(0) / (1024 * 1024), 1), "MB")
    # Free the tensor and release the cached blocks
    del tmp_tensor
    torch.cuda.empty_cache()
    # Use 50% of the memory: expected to exceed the cap and raise
    torch.empty(int(total_memory * 0.5), dtype=torch.int8, device='cuda:0')
except RuntimeError as e:  # in recent versions torch.cuda.OutOfMemoryError subclasses RuntimeError
    print("Error allocating tensor:", e)

# Print the current memory usage of GPU 0
print("Allocated:", round(torch.cuda.memory_allocated(0) / (1024 * 1024), 1), "MB")
print("Reserved:", round(torch.cuda.memory_reserved(0) / (1024 * 1024), 1), "MB")

The script prints two memory figures:

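Allocated memory: queried with torch.cuda.memory_allocated(device), it returns the total memory currently occupied by tensors, i.e. memory actually in use by Tensor objects.

Reserved memory: queried with torch.cuda.memory_reserved(device), it is the total memory held by PyTorch's CUDA caching allocator. It includes the allocated memory plus blocks the allocator keeps cached so it can serve future requests quickly instead of calling cudaMalloc/cudaFree again. The cached portion holds no Tensor data, but acts as a buffer for answering future allocation requests fast.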
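To make the difference concrete, here is a toy model in plain Python. It does not use PyTorch, and the class `ToyCachingAllocator` is hypothetical: it mimics only the bookkeeping, where freeing a tensor lowers the allocated count but keeps the block cached as reserved memory until an `empty_cache`-style call releases it. The real allocator also rounds, splits and merges blocks, which this sketch ignores.

```python
class ToyCachingAllocator:
    """Hypothetical model of allocated vs reserved bookkeeping (not PyTorch code)."""

    def __init__(self):
        self.allocated = 0  # bytes held by live "tensors"
        self.cached = []    # sizes of freed blocks kept for reuse

    @property
    def reserved(self):
        # Reserved = bytes in use by tensors + bytes sitting in the cache
        return self.allocated + sum(self.cached)

    def malloc(self, size):
        # Reuse the smallest cached block that is large enough, if any
        candidates = [b for b in self.cached if b >= size]
        if candidates:
            block = min(candidates)
            self.cached.remove(block)
        else:
            block = size  # pretend a new block was requested from the driver
        self.allocated += block
        return block

    def free(self, block):
        # The block goes back to the cache, not to the driver
        self.allocated -= block
        self.cached.append(block)

    def empty_cache(self):
        # Release all unused cached blocks back to the driver
        self.cached.clear()


alloc = ToyCachingAllocator()
b = alloc.malloc(100)
alloc.free(b)
print(alloc.allocated, alloc.reserved)  # 0 100
alloc.empty_cache()
print(alloc.allocated, alloc.reserved)  # 0 0
```

This is why, in the test code, the reserved figure can stay high after `del tmp_tensor` until `torch.cuda.empty_cache()` is called.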

Further notes

Besides the method above, here are some other ways to limit GPU usage in PyTorch, for reference.

Limiting memory usage

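import torch

# Run all subsequent CUDA operations on GPU 3
torch.cuda.set_device(3)

# Cap GPU 3 at 50%; with device omitted, the current device (GPU 3) is used
desired_memory_fraction = 0.5  # 50% of the memory
torch.cuda.set_per_process_memory_fraction(desired_memory_fraction)

# Total memory of GPU 3
total_memory = torch.cuda.get_device_properties(3).total_memory

# Allocate on the current device: "cuda" here means GPU 3
tmp_tensor = torch.empty(int(total_memory * 0.4999), dtype=torch.int8, device="cuda")

# Memory allocated to tensors, and what is left
allocated_memory = torch.cuda.memory_allocated()
available_memory = total_memory - allocated_memory  # ignores the cap and other processes
remaining_under_cap = total_memory * desired_memory_fraction - torch.cuda.memory_reserved()

# Print the results
print(f"Total GPU Memory: {total_memory / (1024**3):.2f} GB")
print(f"Allocated GPU Memory: {allocated_memory / (1024**3):.2f} GB")
print(f"Available GPU Memory: {available_memory / (1024**3):.2f} GB")
print(f"Remaining under cap: {remaining_under_cap / (1024**3):.2f} GB")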

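With 0.4999 the tensor occupies just under 50% of the memory, while changing 0.4999 to 0.5 causes an out-of-memory error. This is most likely not a floating-point precision issue (multiplying by 0.5 is exact): the caching allocator rounds large requests up to its block size, so a request of exactly half the memory ends up slightly above the cap.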
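The arithmetic can be checked without a GPU. This sketch assumes, based on how the caching allocator treats large requests in recent PyTorch versions, that big allocations are rounded up to a multiple of 2 MiB and that a request is refused when its rounded size exceeds `total * fraction` (with nothing else reserved). The 12,000,000,000-byte card is a made-up figure, and `round_up` and `fits_under_cap` are hypothetical helpers.

```python
ROUND_LARGE = 2 * 1024 * 1024  # assumed rounding granularity for large blocks (2 MiB)


def round_up(size, granularity=ROUND_LARGE):
    # Round size up to the next multiple of granularity
    return -(-size // granularity) * granularity


def fits_under_cap(request_bytes, total_memory, fraction):
    # Hypothetical check: the rounded request must not exceed total * fraction
    cap = int(total_memory * fraction)
    return round_up(request_bytes) <= cap


total = 12_000_000_000  # made-up card size in bytes
for share in (0.4999, 0.5):
    request = int(total * share)
    # 0.4999 fits under the 50% cap; 0.5 is pushed over it by rounding
    print(share, request, round_up(request), fits_under_cap(request, total, 0.5))
```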

The PyTorch function for limiting GPU memory and how to use it

Function form:

torch.cuda.set_per_process_memory_fraction(0.5, 0)

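Parameter 1: fraction, the cap as a fraction of total GPU memory; e.g. 0.5 means half of the GPU's memory. Any float from 0 to 1 is accepted;

Parameter 2: device, the device index; e.g. 0 means GPU 0. If omitted, the current CUDA device is used;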
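As a worked example of the two parameters, the helper below converts a fraction into the byte budget it implies, so 0.5 of a 12 GB card gives 6 GB. `cap_in_bytes` is a hypothetical helper written for illustration; its range check mirrors the 0 to 1 range described above rather than reproducing PyTorch's own code.

```python
GB = 1024 ** 3


def cap_in_bytes(total_memory, fraction):
    # Hypothetical helper: byte budget implied by a given fraction
    if not 0.0 <= fraction <= 1.0:
        raise ValueError(f"fraction must be in [0, 1], got {fraction}")
    return int(total_memory * fraction)


total = 12 * GB  # a nominal 12 GB card
print(cap_in_bytes(total, 0.5) / GB)  # 6.0
```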

Usage example:

import torch
# Cap device 0 at 0.5, i.e. half the card: on a 12 GB card, 0.5 means 6 GB.
torch.cuda.set_per_process_memory_fraction(0.5, 0)
torch.cuda.empty_cache()
# Total memory of the card.
total_memory = torch.cuda.get_device_properties(0).total_memory
# Use 0.499 of the memory ('cuda' is the current device, GPU 0 by default):
tmp_tensor = torch.empty(int(total_memory * 0.499), dtype=torch.int8, device='cuda')

# Free that memory:
del tmp_tensor
torch.cuda.empty_cache()

# The next line triggers a CUDA OOM error because the request reaches the cap exactly:
torch.empty(total_memory // 2, dtype=torch.int8, device='cuda')

"""
It raises an error as follows: 
RuntimeError: CUDA out of memory. Tried to allocate 5.59 GiB (GPU 0; 11.17 GiB total capacity; 0 bytes already allocated; 10.91 GiB free; 5.59 GiB allowed; 0 bytes reserved in total by PyTorch)
"""
Once the cap is exceeded, the error message contains one more item than it does without a limit: "5.59 GiB allowed;".
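Because of this extra field, a script can tell a cap-induced OOM apart from an ordinary one by looking for it. The sketch below is a hypothetical helper that extracts the "allowed" figure from an error string with a regular expression. It assumes the message format shown above, which may differ between PyTorch versions, so treat it as a heuristic.

```python
import re

# Matches e.g. "5.59 GiB allowed" (format assumed from the example message above)
ALLOWED_RE = re.compile(r"([\d.]+) (GiB|MiB|KiB|bytes) allowed")


def allowed_from_oom_message(message):
    """Return (amount, unit) if the message reports a per-process cap, else None."""
    match = ALLOWED_RE.search(message)
    if match is None:
        return None
    return float(match.group(1)), match.group(2)


msg = ("CUDA out of memory. Tried to allocate 5.59 GiB (GPU 0; 11.17 GiB total capacity; "
       "0 bytes already allocated; 10.91 GiB free; 5.59 GiB allowed; "
       "0 bytes reserved in total by PyTorch)")
print(allowed_from_oom_message(msg))  # (5.59, 'GiB')
```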

Notes:

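The function limits memory per process, similar to TensorFlow's GPU memory limits. The cap is enforced by PyTorch's CUDA caching allocator, so memory allocated outside it, such as the CUDA context itself, is not counted.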
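Since the cap is per process, several processes sharing one card each need their own call, and their fractions should add up to at most 1. The planning helper below is hypothetical (not part of PyTorch): it splits the card evenly while keeping some headroom for memory the allocator does not track, such as each process's CUDA context. The default 10% headroom is an arbitrary assumption; each worker would then pass its value to torch.cuda.set_per_process_memory_fraction.

```python
def per_worker_fraction(num_workers, headroom=0.1):
    # Hypothetical helper: evenly split (1 - headroom) of the card among workers
    if num_workers < 1:
        raise ValueError("num_workers must be at least 1")
    if not 0.0 <= headroom < 1.0:
        raise ValueError("headroom must be in [0, 1)")
    return (1.0 - headroom) / num_workers


fraction = per_worker_fraction(4)
print(fraction)
```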
