Competition Introduction
In the era of mobile internet and big data, video data on the internet has grown explosively. As an increasingly rich carrier of information, video demands deep semantic understanding, which underpins many intelligent video applications, including some of the most widely used mobile apps, and is of significant research and practical value. Traditional perception-based video content analysis lacks the ability to understand semantics: in recent years, such methods have proven inadequate in cases where semantic knowledge or multi-modal information provides the essential cues. Fully exploiting the semantic knowledge in knowledge graphs, combined with multi-modal learning and knowledge reasoning, promises deeper video semantic understanding.
The knowledge-enhanced video semantic understanding task addresses this issue by fusing knowledge, NLP, vision, and speech technologies with multi-modal information to generate semantic tags that capture the main theme of a video. The evaluation task takes internet videos as input and, building on perceptual content analysis (such as face recognition, OCR, and speech recognition), fuses multi-modal information with knowledge graph computation and reasoning to generate semantic tags across multiple knowledge dimensions, thereby better characterizing each video's semantics. In this competition, you are challenged to develop classification and video tagging algorithms that accurately assign video-level labels using the provided datasets and knowledge resources. To keep the task focused, perception information for each video is provided alongside, including face recognition results, OCR results, ASR results, and visual feature vectors.
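As a concrete illustration of how the provided modalities might be combined, below is a minimal late-fusion sketch in PyTorch for multi-label video tagging. It is not the official baseline: the class name, feature dimensions, and the assumption that OCR/ASR text has been pooled into a single vector (e.g. by a text encoder) are all hypothetical choices made for this example; the actual data format and any knowledge-graph reasoning component would replace or extend this.

```python
# Minimal sketch (assumed setup, not the competition baseline) of late-fusion
# multi-label video tagging from pooled text features and a visual feature vector.
import torch
import torch.nn as nn


class LateFusionTagger(nn.Module):
    """Projects a text feature (e.g. pooled OCR/ASR embedding) and the provided
    visual feature into a shared space, concatenates them, and predicts one
    logit per video-level tag (multi-label classification)."""

    def __init__(self, text_dim: int, visual_dim: int, hidden_dim: int, num_tags: int):
        super().__init__()
        self.text_proj = nn.Linear(text_dim, hidden_dim)
        self.visual_proj = nn.Linear(visual_dim, hidden_dim)
        self.classifier = nn.Sequential(
            nn.ReLU(),
            nn.Linear(2 * hidden_dim, num_tags),
        )

    def forward(self, text_feat: torch.Tensor, visual_feat: torch.Tensor) -> torch.Tensor:
        # Late fusion: concatenate the projected modalities, then classify.
        fused = torch.cat([self.text_proj(text_feat), self.visual_proj(visual_feat)], dim=-1)
        return self.classifier(fused)


# Example forward pass with random stand-in features (all dimensions assumed).
model = LateFusionTagger(text_dim=768, visual_dim=2048, hidden_dim=512, num_tags=100)
text_feat = torch.randn(4, 768)     # e.g. pooled embedding of OCR + ASR text (assumed)
visual_feat = torch.randn(4, 2048)  # e.g. provided video-level visual feature (assumed)
logits = model(text_feat, visual_feat)
loss = nn.BCEWithLogitsLoss()(logits, torch.zeros(4, 100))  # multi-label objective
```

Late fusion is only one of several reasonable designs; attention-based fusion over frame-level features, or injecting knowledge-graph entity embeddings for the recognized faces and OCR entities, are natural extensions within the same interface.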