Lipreading using a difference image of lips and convolutional neural networks

李, 一婷; 高島, 悠樹; 滝口, 哲也; 有木, 康雄

DOI: https://doi.org/10.24546/81011508

Access count for this item: 7 (tallied 2024-04-23 16:09)

Available files

File	Format	Size	Views	Description
81011508 (fulltext)	pdf	1.04 MB	7

Metadata

Metadata ID	81011508
Access rights	open access
Publication type	Version of Record
Title	唇差分画像と畳み込みニューラルネットワークを用いたリップリーディング
Alternative title	Lipreading using a difference image of lips and convolutional neural networks
Authors	李, 一婷 ; 高島, 悠樹 ; 滝口, 哲也 ; 有木, 康雄
Author name	李, 一婷
Author name	高島, 悠樹
Author name	滝口, 哲也 (Takiguchi, Tetsuya; タキグチ, テツヤ); Author ID A1279; Researcher ID 1000040397815; KUID https://kuid-rm-web.ofc.kobe-u.ac.jp/search/detail?systemId=b3ec2a1710d8267b520e17560c007669; Affiliation: Research Center for Urban Safety and Security
Author name	有木, 康雄 (Ariki, Yasuo; アリキ, ヤスオ); Author ID A0260; Researcher ID 1000010135519; KUID https://kuid-rm-web.ofc.kobe-u.ac.jp/search/detail?systemId=09a784b8ffbc912c520e17560c007669; Affiliation: Research Center for Urban Safety and Security
Source title	神戸大学都市安全研究センター研究報告
Volume (issue)	20
Pages	73-78
Publisher	Research Center for Urban Safety and Security, Kobe University
Date of issue	2016-03
Date made public	2019-05-30
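Abstract (translated from the Japanese)	In recent years, speech recognition has spread as an interface for PCs and mobile phones and has attracted attention as a hands-free technology that replaces keyboard input. However, its recognition performance drops markedly in noisy conditions, and it cannot be used in places where speaking is difficult. An input method that is unaffected by noise and similar factors is therefore needed. This study aims to realize lipreading as one such input method. Lipreading is a method of understanding what is being said from the shape and movement of the lips. In this paper, we propose a method that applies to lip image sequences both a lip difference image, which can represent the change between frames, and a convolutional neural network (CNN), a model widely used in recent years for image and speech recognition, and we show its effectiveness.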
Abstract	In recent years, speech recognition has spread as an interface for computers and mobile phones, and it has attracted attention as a hands-free technology that replaces keyboard input. However, recognition performance decreases significantly under noisy conditions, and speech input cannot be used in situations where it is hard to speak aloud. Lipreading has been studied as a noise-robust method to deal with this problem. In this paper, we investigate a difference image of lips, which expresses the difference between frames of an image sequence, and we propose a robust feature extraction method using convolutional neural networks (CNNs). CNNs have been successful in achieving translation invariance in many image-processing tasks, a success largely attributed to the local filtering and pooling in the CNN architecture. The effectiveness of the method is confirmed by word recognition experiments: the CNN-based feature extraction method using the difference image of lips outperformed the conventional method.
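Keywords	lipreading
	lip difference image
	convolutional neural network (CNN)
Category	Research Center for Urban Safety and Security
	神戸大学都市安全研究センター研究報告 > No. 20 (2016-03)
	Departmental bulletin paper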

Resource type	departmental bulletin paper
Language	Japanese
ISSN	1342-9167
NCID	AA11142997
Related information	URI http://www.rcuss.kobe-u.ac.jp/publication/publication.html
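
The abstract names two ingredients: a difference image between consecutive lip frames, and a CNN whose behaviour rests on local filtering and pooling. The following is a minimal sketch of those three operations, assuming grayscale frames stored as nested lists of pixel intensities. The function names (`difference_images`, `conv2d_valid`, `max_pool2d`) are hypothetical and do not come from the paper. This record does not describe the network architecture, filter sizes, or training procedure, so this is an illustration of the general operations and not the authors' implementation.

```python
from typing import List

# Assumption: a grayscale image is a list of rows of pixel intensities.
Image = List[List[float]]


def difference_images(frames: List[Image]) -> List[Image]:
    """Pixel-wise difference between consecutive frames (frame t+1 minus frame t)."""
    return [
        [[b - a for a, b in zip(row_prev, row_next)]
         for row_prev, row_next in zip(prev, nxt)]
        for prev, nxt in zip(frames, frames[1:])
    ]


def conv2d_valid(image: Image, kernel: Image) -> Image:
    """Local filtering: slide a small filter over the image.

    No padding, stride 1. Like most CNN libraries, this computes a
    cross-correlation (the kernel is not flipped).
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [sum(image[i + di][j + dj] * kernel[di][dj]
             for di in range(kh) for dj in range(kw))
         for j in range(out_w)]
        for i in range(out_h)
    ]


def max_pool2d(feature: Image, size: int = 2) -> Image:
    """Pooling: keep the maximum of each non-overlapping size x size block.

    Rows and columns that do not fill a complete block are dropped.
    """
    out_h = len(feature) // size
    out_w = len(feature[0]) // size
    return [
        [max(feature[i * size + di][j * size + dj]
             for di in range(size) for dj in range(size))
         for j in range(out_w)]
        for i in range(out_h)
    ]
```

A sequence of N frames yields N-1 difference images, each of which could be passed through `conv2d_valid` and `max_pool2d` in turn. In a trained CNN the filter weights are learned, whereas here the kernel is supplied by the caller.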
