Title: Computer Vision with a Billion Eyes
Abstract: A recent trend in computer vision is driven by images and video generated by heterogeneous and multi-perspective visual sensing networks. We present a few examples of research along this line. First, we will present a framework for event recognition. Semantic event recognition based only on unconstrained still images available on the Internet or in personal repositories is a challenging problem. With GPS information, we obtain satellite images corresponding to picture locations and investigate their novel use to recognize the picture-taking environment. We then combine this inference with classical vision-based event detection methods and demonstrate the synergistic fusion of the two approaches. However, current GPS data identify only the camera location, leaving the viewing direction uncertain. Second, to determine the viewing direction of geotagged photos, we utilize both Google Street View and Google Earth satellite images: 1) visual matching between a user photo and any available street views in the vicinity determines the viewing direction, and 2) when only an overhead satellite view is available, near-orthogonal view matching between the user photo and satellite imagery computes the viewing direction. Third, we explore using phone-captured images for localization, as they contain more context information than the embedded sensory GPS coordinates. The proposed approach provides a comprehensive set of accurate geo-context based on the current image and its associated sensory GPS location, including the true locations of the mobile user and the scene, the viewing angle, and the distance between the user and the scene. We then take advantage of these techniques to build applications that enable people to enjoy ubiquitous location-based services (LBS) on their phones, including: 1) accurate augmented reality, 2) collaborative localization for rendezvous routing, and 3) routing for photography.
Fourth, we leverage crowd-sourced photos to remove unwanted bystanders from tourist photos taken at popular attractions and to measure air pollution in major cities in China. Finally, given a new source of visual data from public webcams deployed in urban environments, we will present ongoing work on crowd analytics using such data.
Speaker: Jiebo Luo is a professor of Computer Science at the University of Rochester. Prior to joining Rochester in Fall 2011, he was a Senior Principal Scientist with the Kodak Research Laboratories. His research interests include image processing, computer vision, machine learning, social media data mining, medical imaging, and pervasive computing. Dr. Luo has authored over 200 technical papers and holds over 70 US patents. He has been actively involved in numerous technical conferences, including serving as the general chair of ACM CIVR 2008 and ACM Multimedia 2018, program co-chair of ACM Multimedia 2010, IEEE CVPR 2012, and IEEE ICIP 2017, area chair of IEEE ICASSP 2009-2012, ICIP 2008-2012, CVPR 2008, and ICCV 2011, and an organizer of ICME 2006/2008/2010 and ICIP 2002. Currently, he serves on several IEEE SPS Technical Committees (IMDSP, MMSP, and MLSP) and conference steering committees (ACM ICMR and IEEE ICME). He is the Editor-in-Chief of the Journal of Multimedia, and has served on the editorial boards of the IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), the IEEE Transactions on Multimedia (TMM), the IEEE Transactions on Circuits and Systems for Video Technology (TCSVT), Pattern Recognition (PR), Machine Vision and Applications (MVA), and the Journal of Electronic Imaging (JEI). He is a Fellow of the SPIE, IEEE, and IAPR.