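As a major application scenario of the future Internet 3.0, the metaverse has become a hot topic in many application areas, including IT. Starting from basic data concepts and drawing on the research and development progress of the national key research program project led by the speaker, this talk presents the speaker's understanding of and views on the current opportunities and state of development of the metaverse. It then turns to the future application needs of the industrial Internet, introducing the technologies and development trends of the industrial metaverse, and goes on to discuss deploying intelligent technologies in more industrial scenarios.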
Video Moment Retrieval (VMR) aims to retrieve the temporal moment in an untrimmed video that semantically corresponds to a language query. Connecting computer vision and natural language processing, VMR has drawn significant attention from researchers in both communities. Existing solutions can be roughly divided into two categories, depending on whether candidate moments are generated: moment-based approaches and clip-based approaches. Each framework has its shortcomings: moment-based models suffer from heavy computation, while clip-based models generally perform worse than their moment-based counterparts. To address this, we design an intuitive and efficient Dual-Channel Localization Network (DCLN) that balances computational cost and retrieval performance. In addition, despite their effectiveness, moment-based and clip-based methods mostly focus on aligning the query with single-level clip or moment features and ignore the different granularities present in the video itself, such as clip, moment, and video, which results in insufficient cross-modal interaction. We therefore also propose a Temporal Localization Network with Hierarchical Contrastive Learning (HCLNet) for the VMR task. This talk will detail these two works and share our deeper insights.
Hosted by: CCF
Organized by: CCF Technical Committee on Cooperative Computing; Taiyuan University of Science and Technology