Reference Information
Title: Activity analysis enabling real-time video communication on mobile phones for deaf users
Authors: Neva Cherniavsky University of Washington, Seattle, WA, USA
Jaehong Chon University of Washington, Seattle, WA, USA
Jacob O. Wobbrock University of Washington, Seattle, WA, USA
Richard E. Ladner University of Washington, Seattle, WA, USA
Eve A. Riskin University of Washington, Seattle, WA, USA
Presentation Venue: UIST 2009: 22nd annual ACM symposium on User interface software and technology;
Date: 2009;
Location: New York, NY, USA
Summary
This paper presents MobileASL, a system for real-time sign language video communication over the current U.S. mobile phone network. The goal is to enable Deaf people to converse in American Sign Language on mobile phones.
The authors describe the techniques they used to implement this system and situate it within prior work on video compression in general and sign language video compression in particular.
Challenges
The authors are faced with three main challenges that they need to resolve for MobileASL:
- Low bandwidth: the majority of U.S. mobile phone networks use GPRS, which offers only limited transfer rates
- Low processing speed: even the best available mobile phones have a 400 MHz processor, so the authors must work within this constraint and optimize video encoding to run in real time
- Limited battery life: video compression is computationally intensive and drains the battery quickly
All these challenges must be resolved while producing intelligible video.
The two main techniques the authors are using are:
- variable frame rate (VFR)
- region-of-interest (ROI)
Variable frame rate saves processor cycles by automatically determining whether the user is signing: when signing is detected, video is encoded at the highest possible frame rate; otherwise, the frame rate drops to 1 frame per second. To detect signing, the authors use a baseline differencing technique that computes the sum of absolute differences between successive frames.
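The differencing idea can be sketched as follows. This is an illustrative sketch only: the threshold constant and frame rates here are hypothetical placeholders, not values from the paper.

```python
import numpy as np

# Hypothetical tuning constants (not from the paper).
SAD_THRESHOLD = 500_000   # motion threshold for deciding "signing"
FULL_FPS = 12             # highest sustainable frame rate
IDLE_FPS = 1              # rate used when the user is not signing

def sum_abs_diff(prev_frame: np.ndarray, curr_frame: np.ndarray) -> int:
    """Sum of absolute pixel differences between two successive frames."""
    # Widen to int32 first so uint8 subtraction cannot wrap around.
    diff = curr_frame.astype(np.int32) - prev_frame.astype(np.int32)
    return int(np.abs(diff).sum())

def choose_frame_rate(prev_frame: np.ndarray, curr_frame: np.ndarray) -> int:
    """Encode at full rate only when enough motion suggests active signing."""
    if sum_abs_diff(prev_frame, curr_frame) > SAD_THRESHOLD:
        return FULL_FPS
    return IDLE_FPS
```

A near-identical frame yields a small SAD and the idle rate; a frame with large overall pixel change triggers the full rate. In practice the threshold would be tuned per camera and lighting conditions.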
As for the region-of-interest technique, the authors explain that prior researchers found that viewers of sign language video focus almost entirely on the signer's face. This is one reason the authors use ROI encoding: an area of the frame is coded at higher quality at the expense of the rest of the frame.
The ROI is implemented by varying the quantizer step size (QP): a lower QP yields a higher-quality region of the frame but requires a higher bit rate.
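One simple way to realize such an ROI is a per-macroblock QP map, with a lower QP inside the region of interest and a higher QP elsewhere. The sketch below is hypothetical (the specific QP values and the rectangular ROI are illustrative assumptions, not the paper's implementation):

```python
import numpy as np

# Illustrative constants (not from the paper).
BASE_QP = 36      # coarser quantization outside the ROI
ROI_QP_DROP = 8   # how much finer the ROI is coded

def qp_map(mb_rows: int, mb_cols: int, roi: tuple) -> np.ndarray:
    """Return one QP per 16x16 macroblock.

    roi = (top, left, bottom, right) in macroblock units; macroblocks
    inside it get a lower QP (higher quality) than the rest of the frame.
    """
    qps = np.full((mb_rows, mb_cols), BASE_QP, dtype=np.int32)
    top, left, bottom, right = roi
    qps[top:bottom, left:right] = BASE_QP - ROI_QP_DROP
    return qps

# QCIF is 176x144 pixels, i.e. 11x9 macroblocks of 16x16.
# Hypothetical face region near the top center of the frame:
m = qp_map(9, 11, (0, 3, 5, 8))
```

The encoder would then quantize each macroblock with its entry from this map, spending more bits on the face region at the same overall bit rate.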
As for hardware, the authors currently use HTC TyTN-II phones, which run Windows Mobile 6.1 on a Qualcomm MSM7200 chipset with a 400 MHz ARM processor and a Li-polymer battery, and capture video at QCIF (176×144) resolution.
To evaluate their system, the authors ran a series of user studies combining objective measures with subjective measures and questionnaires. These helped them determine whether users liked the VFR and ROI techniques and how likely users would be to adopt the system. The studies also revealed how often users had to guess during a conversation, how often they had to ask their partner to repeat something, and how annoying they found the system.
The authors mention that user interest in the system has encouraged them to pursue the project; however, they recognize that their techniques need refinement, particularly VFR, which received mixed responses. The current activity analysis also cannot distinguish a person signing from a person making other movements, such as drinking coffee. The authors also plan to move out of the lab and test their system in the field.
Discussion
This was an interesting paper to read. Unlike previous papers where the math discouraged me from reading on, the math described here was not hard to follow. I also appreciated how the authors put their work in the context of previous work in the field. This gave a clear view of what has been done, what the authors are currently doing and why, and the future work they plan to do.
The application itself is interesting and could be a very useful tool for the Deaf community. Users from the studies have already shown interest in the system and were somewhat disappointed that it wasn't available on the market yet.