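machine-learning - How does OpenPose implement its loss function when the output shape and the ground truth do not match?
Problem description
I have recently been implementing a model based on OpenPose. OpenPose uses VGG as its backbone to extract feature maps, but VGG contains max-pooling layers, which shrink the output: with the three stride-2 MaxPool2d layers in the printout below, each spatial dimension ends up at 1/8 of the input. Here is the model structure of OpenPose: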
VGGOpenPose(
(model0): OpenPose_Feature(
(model): Sequential(
(0): Conv2d(3, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(1): ReLU(inplace=True)
(2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(3): ReLU(inplace=True)
(4): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
(5): Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(6): ReLU(inplace=True)
(7): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(8): ReLU(inplace=True)
(9): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
(10): Conv2d(128, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(11): ReLU(inplace=True)
(12): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(13): ReLU(inplace=True)
(14): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(15): ReLU(inplace=True)
(16): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(17): ReLU(inplace=True)
(18): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
(19): Conv2d(256, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(20): ReLU(inplace=True)
(21): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(22): ReLU(inplace=True)
(23): Conv2d(512, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(24): ReLU(inplace=True)
(25): Conv2d(256, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(26): ReLU(inplace=True)
)
)
(model1_1): Sequential(
(0): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(1): ReLU(inplace=True)
(2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(3): ReLU(inplace=True)
(4): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(5): ReLU(inplace=True)
(6): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1))
(7): ReLU(inplace=True)
(8): Conv2d(512, 38, kernel_size=(1, 1), stride=(1, 1))
)
(model2_1): Sequential(
(0): Conv2d(185, 128, kernel_size=(7, 7), stride=(1, 1), padding=(3, 3))
(1): ReLU(inplace=True)
(2): Conv2d(128, 128, kernel_size=(7, 7), stride=(1, 1), padding=(3, 3))
(3): ReLU(inplace=True)
(4): Conv2d(128, 128, kernel_size=(7, 7), stride=(1, 1), padding=(3, 3))
(5): ReLU(inplace=True)
(6): Conv2d(128, 128, kernel_size=(7, 7), stride=(1, 1), padding=(3, 3))
(7): ReLU(inplace=True)
(8): Conv2d(128, 128, kernel_size=(7, 7), stride=(1, 1), padding=(3, 3))
(9): ReLU(inplace=True)
(10): Conv2d(128, 128, kernel_size=(1, 1), stride=(1, 1))
(11): ReLU(inplace=True)
(12): Conv2d(128, 38, kernel_size=(1, 1), stride=(1, 1))
)
(model3_1): Sequential(
(0): Conv2d(185, 128, kernel_size=(7, 7), stride=(1, 1), padding=(3, 3))
(1): ReLU(inplace=True)
(2): Conv2d(128, 128, kernel_size=(7, 7), stride=(1, 1), padding=(3, 3))
(3): ReLU(inplace=True)
(4): Conv2d(128, 128, kernel_size=(7, 7), stride=(1, 1), padding=(3, 3))
(5): ReLU(inplace=True)
(6): Conv2d(128, 128, kernel_size=(7, 7), stride=(1, 1), padding=(3, 3))
(7): ReLU(inplace=True)
(8): Conv2d(128, 128, kernel_size=(7, 7), stride=(1, 1), padding=(3, 3))
(9): ReLU(inplace=True)
(10): Conv2d(128, 128, kernel_size=(1, 1), stride=(1, 1))
(11): ReLU(inplace=True)
(12): Conv2d(128, 38, kernel_size=(1, 1), stride=(1, 1))
)
(model4_1): Sequential(
(0): Conv2d(185, 128, kernel_size=(7, 7), stride=(1, 1), padding=(3, 3))
(1): ReLU(inplace=True)
(2): Conv2d(128, 128, kernel_size=(7, 7), stride=(1, 1), padding=(3, 3))
(3): ReLU(inplace=True)
(4): Conv2d(128, 128, kernel_size=(7, 7), stride=(1, 1), padding=(3, 3))
(5): ReLU(inplace=True)
(6): Conv2d(128, 128, kernel_size=(7, 7), stride=(1, 1), padding=(3, 3))
(7): ReLU(inplace=True)
(8): Conv2d(128, 128, kernel_size=(7, 7), stride=(1, 1), padding=(3, 3))
(9): ReLU(inplace=True)
(10): Conv2d(128, 128, kernel_size=(1, 1), stride=(1, 1))
(11): ReLU(inplace=True)
(12): Conv2d(128, 38, kernel_size=(1, 1), stride=(1, 1))
)
(model5_1): Sequential(
(0): Conv2d(185, 128, kernel_size=(7, 7), stride=(1, 1), padding=(3, 3))
(1): ReLU(inplace=True)
(2): Conv2d(128, 128, kernel_size=(7, 7), stride=(1, 1), padding=(3, 3))
(3): ReLU(inplace=True)
(4): Conv2d(128, 128, kernel_size=(7, 7), stride=(1, 1), padding=(3, 3))
(5): ReLU(inplace=True)
(6): Conv2d(128, 128, kernel_size=(7, 7), stride=(1, 1), padding=(3, 3))
(7): ReLU(inplace=True)
(8): Conv2d(128, 128, kernel_size=(7, 7), stride=(1, 1), padding=(3, 3))
(9): ReLU(inplace=True)
(10): Conv2d(128, 128, kernel_size=(1, 1), stride=(1, 1))
(11): ReLU(inplace=True)
(12): Conv2d(128, 38, kernel_size=(1, 1), stride=(1, 1))
)
(model6_1): Sequential(
(0): Conv2d(185, 128, kernel_size=(7, 7), stride=(1, 1), padding=(3, 3))
(1): ReLU(inplace=True)
(2): Conv2d(128, 128, kernel_size=(7, 7), stride=(1, 1), padding=(3, 3))
(3): ReLU(inplace=True)
(4): Conv2d(128, 128, kernel_size=(7, 7), stride=(1, 1), padding=(3, 3))
(5): ReLU(inplace=True)
(6): Conv2d(128, 128, kernel_size=(7, 7), stride=(1, 1), padding=(3, 3))
(7): ReLU(inplace=True)
(8): Conv2d(128, 128, kernel_size=(7, 7), stride=(1, 1), padding=(3, 3))
(9): ReLU(inplace=True)
(10): Conv2d(128, 128, kernel_size=(1, 1), stride=(1, 1))
(11): ReLU(inplace=True)
(12): Conv2d(128, 38, kernel_size=(1, 1), stride=(1, 1))
)
(model1_2): Sequential(
(0): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(1): ReLU(inplace=True)
(2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(3): ReLU(inplace=True)
(4): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(5): ReLU(inplace=True)
(6): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1))
(7): ReLU(inplace=True)
(8): Conv2d(512, 19, kernel_size=(1, 1), stride=(1, 1))
)
(model2_2): Sequential(
(0): Conv2d(185, 128, kernel_size=(7, 7), stride=(1, 1), padding=(3, 3))
(1): ReLU(inplace=True)
(2): Conv2d(128, 128, kernel_size=(7, 7), stride=(1, 1), padding=(3, 3))
(3): ReLU(inplace=True)
(4): Conv2d(128, 128, kernel_size=(7, 7), stride=(1, 1), padding=(3, 3))
(5): ReLU(inplace=True)
(6): Conv2d(128, 128, kernel_size=(7, 7), stride=(1, 1), padding=(3, 3))
(7): ReLU(inplace=True)
(8): Conv2d(128, 128, kernel_size=(7, 7), stride=(1, 1), padding=(3, 3))
(9): ReLU(inplace=True)
(10): Conv2d(128, 128, kernel_size=(1, 1), stride=(1, 1))
(11): ReLU(inplace=True)
(12): Conv2d(128, 19, kernel_size=(1, 1), stride=(1, 1))
)
(model3_2): Sequential(
(0): Conv2d(185, 128, kernel_size=(7, 7), stride=(1, 1), padding=(3, 3))
(1): ReLU(inplace=True)
(2): Conv2d(128, 128, kernel_size=(7, 7), stride=(1, 1), padding=(3, 3))
(3): ReLU(inplace=True)
(4): Conv2d(128, 128, kernel_size=(7, 7), stride=(1, 1), padding=(3, 3))
(5): ReLU(inplace=True)
(6): Conv2d(128, 128, kernel_size=(7, 7), stride=(1, 1), padding=(3, 3))
(7): ReLU(inplace=True)
(8): Conv2d(128, 128, kernel_size=(7, 7), stride=(1, 1), padding=(3, 3))
(9): ReLU(inplace=True)
(10): Conv2d(128, 128, kernel_size=(1, 1), stride=(1, 1))
(11): ReLU(inplace=True)
(12): Conv2d(128, 19, kernel_size=(1, 1), stride=(1, 1))
)
(model4_2): Sequential(
(0): Conv2d(185, 128, kernel_size=(7, 7), stride=(1, 1), padding=(3, 3))
(1): ReLU(inplace=True)
(2): Conv2d(128, 128, kernel_size=(7, 7), stride=(1, 1), padding=(3, 3))
(3): ReLU(inplace=True)
(4): Conv2d(128, 128, kernel_size=(7, 7), stride=(1, 1), padding=(3, 3))
(5): ReLU(inplace=True)
(6): Conv2d(128, 128, kernel_size=(7, 7), stride=(1, 1), padding=(3, 3))
(7): ReLU(inplace=True)
(8): Conv2d(128, 128, kernel_size=(7, 7), stride=(1, 1), padding=(3, 3))
(9): ReLU(inplace=True)
(10): Conv2d(128, 128, kernel_size=(1, 1), stride=(1, 1))
(11): ReLU(inplace=True)
(12): Conv2d(128, 19, kernel_size=(1, 1), stride=(1, 1))
)
(model5_2): Sequential(
(0): Conv2d(185, 128, kernel_size=(7, 7), stride=(1, 1), padding=(3, 3))
(1): ReLU(inplace=True)
(2): Conv2d(128, 128, kernel_size=(7, 7), stride=(1, 1), padding=(3, 3))
(3): ReLU(inplace=True)
(4): Conv2d(128, 128, kernel_size=(7, 7), stride=(1, 1), padding=(3, 3))
(5): ReLU(inplace=True)
(6): Conv2d(128, 128, kernel_size=(7, 7), stride=(1, 1), padding=(3, 3))
(7): ReLU(inplace=True)
(8): Conv2d(128, 128, kernel_size=(7, 7), stride=(1, 1), padding=(3, 3))
(9): ReLU(inplace=True)
(10): Conv2d(128, 128, kernel_size=(1, 1), stride=(1, 1))
(11): ReLU(inplace=True)
(12): Conv2d(128, 19, kernel_size=(1, 1), stride=(1, 1))
)
(model6_2): Sequential(
(0): Conv2d(185, 128, kernel_size=(7, 7), stride=(1, 1), padding=(3, 3))
(1): ReLU(inplace=True)
(2): Conv2d(128, 128, kernel_size=(7, 7), stride=(1, 1), padding=(3, 3))
(3): ReLU(inplace=True)
(4): Conv2d(128, 128, kernel_size=(7, 7), stride=(1, 1), padding=(3, 3))
(5): ReLU(inplace=True)
(6): Conv2d(128, 128, kernel_size=(7, 7), stride=(1, 1), padding=(3, 3))
(7): ReLU(inplace=True)
(8): Conv2d(128, 128, kernel_size=(7, 7), stride=(1, 1), padding=(3, 3))
(9): ReLU(inplace=True)
(10): Conv2d(128, 128, kernel_size=(1, 1), stride=(1, 1))
(11): ReLU(inplace=True)
(12): Conv2d(128, 19, kernel_size=(1, 1), stride=(1, 1))
)
)
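To make the size mismatch concrete, here is a minimal sketch that traces one spatial dimension through the model0 layers printed above, using the standard Conv2d/MaxPool2d output-size formula. The 368-pixel input side is an assumption chosen for illustration, and the helper names conv_out and backbone_out are hypothetical.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Spatial size after a Conv2d/MaxPool2d layer (dilation 1, floor mode)."""
    return (size + 2 * padding - kernel) // stride + 1


def backbone_out(size):
    """Trace one spatial dimension through the model0 layers printed above."""
    # (number of 3x3 convs, followed by a 2x2 max-pool?) read off the printout
    blocks = [(2, True), (2, True), (4, True), (4, False)]
    for n_convs, has_pool in blocks:
        for _ in range(n_convs):
            # 3x3 conv, stride 1, padding 1: keeps the size unchanged
            size = conv_out(size, kernel=3, stride=1, padding=1)
        if has_pool:
            # 2x2 max-pool, stride 2: halves the size
            size = conv_out(size, kernel=2, stride=2, padding=0)
    return size


# Assumed input side of 368 pixels (illustrative, not taken from the question)
print(backbone_out(368))  # 46, i.e. 368 / 8

# Later stages take 185 channels: backbone features plus both branch outputs
stage_in_channels = 128 + 38 + 19
print(stage_in_channels)  # 185
```

Every stage after the backbone uses stride-1 convolutions with size-preserving padding, so the 38-channel and 19-channel outputs keep this reduced resolution.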
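The original paper, "OpenPose: Realtime Multi-Person 2D Pose Estimation using Part Affinity Fields", says that the ground-truth heatmaps and PAFs have the same width and height as the input image.
I have looked at several Python implementations of OpenPose. Most of them use an element-wise loss function to compute the loss between the output and the ground-truth labels, like the function given in the paper.
What I would like to know is: if OpenPose's output has a different size from the input image, how does OpenPose compute the loss between the output and the ground-truth heatmaps/PAFs?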
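For reference, the per-stage losses in the paper have, to the best of my reading, the following form. The notation is taken from the paper and not from the question: $S_j^t$ and $L_c^t$ are the predicted confidence map for part $j$ and PAF for limb $c$ at stage $t$, starred symbols are the ground truth, and $W(\mathbf{p})$ is a binary mask that is zero where the annotation is missing at location $\mathbf{p}$.

$$
f_{\mathbf{S}}^{t} = \sum_{j=1}^{J} \sum_{\mathbf{p}} W(\mathbf{p}) \cdot \left\| S_j^{t}(\mathbf{p}) - S_j^{*}(\mathbf{p}) \right\|_2^2,
\qquad
f_{\mathbf{L}}^{t} = \sum_{c=1}^{C} \sum_{\mathbf{p}} W(\mathbf{p}) \cdot \left\| L_c^{t}(\mathbf{p}) - L_c^{*}(\mathbf{p}) \right\|_2^2
$$

$$
f = \sum_{t=1}^{T} \left( f_{\mathbf{S}}^{t} + f_{\mathbf{L}}^{t} \right)
$$

Both sums run over locations $\mathbf{p}$, so the prediction and the ground truth must live on the same grid for the expression to be well defined.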
Solution
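One way the two sizes can be reconciled, offered here as a sketch and not as a confirmed description of any particular implementation, is to build the ground-truth maps directly on the network's output grid (input size divided by the backbone stride) and then apply the element-wise masked L2 loss there. The stride of 8, the Gaussian width sigma, the 368x368 input, and the helper names gaussian_heatmap and masked_l2_loss are all assumptions made for illustration.

```python
import math

STRIDE = 8  # assumed: three stride-2 max-pools in the printed backbone


def gaussian_heatmap(height, width, keypoint_xy, sigma=7.0, stride=STRIDE):
    """Ground-truth confidence map built on the output grid.

    keypoint_xy is given in input-image pixels. Each output cell is mapped
    back to the centre of the input patch it covers before the distance to
    the keypoint is measured. sigma is an illustrative value.
    """
    kx, ky = keypoint_xy
    out_h, out_w = height // stride, width // stride
    heatmap = []
    for gy in range(out_h):
        row = []
        for gx in range(out_w):
            # centre of this output cell in input-pixel coordinates
            px = gx * stride + stride / 2 - 0.5
            py = gy * stride + stride / 2 - 0.5
            d2 = (px - kx) ** 2 + (py - ky) ** 2
            row.append(math.exp(-d2 / (2 * sigma ** 2)))
        heatmap.append(row)
    return heatmap


def masked_l2_loss(pred, target, mask):
    """Sum over grid positions of mask * (pred - target)^2 for one channel."""
    return sum(
        m * (p - t) ** 2
        for pred_row, tgt_row, mask_row in zip(pred, target, mask)
        for p, t, m in zip(pred_row, tgt_row, mask_row)
    )


# Hypothetical 368x368 input with one keypoint at (x=100, y=200)
target = gaussian_heatmap(368, 368, keypoint_xy=(100.0, 200.0))
pred = [[0.0] * 46 for _ in range(46)]  # stand-in for one output channel
mask = [[1.0] * 46 for _ in range(46)]  # 1 where annotations are valid

print(len(target), len(target[0]))       # grid size matches the 46x46 output
print(masked_l2_loss(pred, target, mask))
```

With PyTorch tensors the same loss is the element-wise expression ((pred - target) ** 2 * mask).sum(). An alternative under the same assumptions is to keep full-resolution labels and resize them, or upsample the predictions, before the comparison; which approach a given implementation uses has to be checked in its data loader.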