Tom Yeh
Tom Yeh is an assistant research scientist in the University of Maryland Institute for Advanced Computer Studies (UMIACS). He received his Ph.D. in Computer Science from MIT in 2009 and joined the University of Maryland in 2010. His research interests span human-computer interaction, computer vision, and software engineering. He has written over 30 research publications on algorithms for interactive computer vision, vision-based interactive systems, multimedia information retrieval, and visual software test automation. He has served on the program committees of conferences in his area, including the Symposium on User Interface Software and Technology and the Workshop on Computer Vision Application. He has won a number of best paper awards. He is one of the creators of the popular Sikuli software, which enables non-programmers to write simple image-based automation scripts.
Publications
2012
2012. Co-designing an e-health tutorial for older adults. Proceedings of the 2012 iConference. :240-247.
2011
2011. Photo-based mobile deixis system and related techniques. 10/762,941(7872669)
2011. Associating the visual representation of user interfaces with their internal structures and metadata. Proceedings of the 24th annual ACM symposium on User interface software and technology. :245-256.
2011. Co‐designing contextual tutorials for older adults on searching health information on the internet. Proceedings of the American Society for Information Science and Technology. 48(1):1-4.
2011. A case for query by image and text content: searching computer help using screenshots and keywords. Proceedings of the 20th international conference on World wide web. :775-784.
2011. Creating contextual help for GUIs using screenshots. Proceedings of the 24th annual ACM symposium on User interface software and technology. :145-154.
2011. Active inference for retrieval in camera networks. Person-Oriented Vision (POV), 2011 IEEE Workshop on. :13-20.
2010
2010. VizWiz: nearly real-time answers to visual questions. Proceedings of the 23rd annual ACM symposium on User interface software and technology. :333-342.
2010. VizWiz::LocateIt - enabling blind people to locate objects in their environment. Computer Vision and Pattern Recognition Workshops (CVPRW), 2010 IEEE Computer Society Conference on. :65-72.
2010. Why Did the Person Cross the Road (There)? Scene Understanding Using Probabilistic Logic Models and Common Sense Reasoning. Computer Vision – ECCV 2010. 6312:693-706.
2010. GUI testing using computer vision. Proceedings of the 28th international conference on Human factors in computing systems. :1535-1544.
2010. Web-scale computer vision using MapReduce for multimedia data mining. Proceedings of the Tenth International Workshop on Multimedia Data Mining. :9:1–9:10.
2009
2009. Searching documentation using text, OCR, and image. Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval. :776-777.
2009. Sikuli: using GUI screenshots for search and automation. Proceedings of the 22nd annual ACM symposium on User interface software and technology. :183-192.
2009. Fast concurrent object localization and recognition. Computer Vision and Pattern Recognition, 2009. CVPR 2009. IEEE Conference on. :280-287.
2008
2008. Photo-based question answering. Proceedings of the 16th ACM international conference on Multimedia. :389-398.
2008. Scalable classifiers for Internet vision tasks. Computer Vision and Pattern Recognition Workshops, 2008. CVPRW'08. IEEE Computer Society Conference on. :1-8.
2008. Dynamic visual category learning. Computer Vision and Pattern Recognition, 2008. CVPR 2008. IEEE Conference on. :1-8.
2008. Fast concurrent object classification and localization. CSAIL Technical Reports (July 1, 2003 - present).
2008. Multimodal question answering for mobile devices. Proceedings of the 13th international conference on Intelligent user interfaces. :405-408.
2007
2007. Adaptive Vocabulary Forests for Dynamic Indexing and Category Learning. Computer Vision, 2007. ICCV 2007. IEEE 11th International Conference on. :1-8.
2006
2006. IDeixis: Mobile image-based search on world wide web - a picture is worth a thousand keywords. Proc. of Mobisys.
2005
2005. Doubleshot: an interactive user-aided segmentation tool. Proceedings of the 10th international conference on Intelligent user interfaces. :287-289.
2005. A picture is worth a thousand keywords: image-based object search on a mobile platform. CHI '05 extended abstracts on Human factors in computing systems. :2025-2028.
2004
2004. Searching the web with mobile images for location recognition. Computer Vision and Pattern Recognition, 2004. CVPR 2004. Proceedings of the 2004 IEEE Computer Society Conference on. 2:76-81.
2004. IDeixis: image-based deixis for finding location-based information. Mobile HCI, Vienna, Austria. :781-782.
2004. IDeixis–Searching the Web with Mobile Images for Location-Based Information. Mobile Human-Computer Interaction–MobileHCI 2004. :61-125.
2004. IDeixis: image-based Deixis for finding location-based information. CHI '04 extended abstracts on Human factors in computing systems. :781-782.
2000
2000. An assumptive logic programming methodology for parsing. Tools with Artificial Intelligence, 2000. ICTAI 2000. Proceedings. 12th IEEE International Conference on. :11-18.