Nowadays, increasingly rich and massive social media data (texts, images, audio, video, blogs, and so on) are being posted to the web, including social networking websites (e.g., MySpace, Facebook), photo and video sharing websites (e.g., Flickr, YouTube), and photo forums (e.g., Photosig.com and Photo.net). Recently, researchers from multiple disciplines have proposed data-driven approaches to multimedia content understanding that leverage this virtually unlimited supply of web images and videos along with their rich contextual information (e.g., tags, comments, categories, titles, and metadata). First, we introduce the important general concepts and themes of this timely topic. We then review and summarize recent multimedia content analysis methods that use web-scale social media data, highlighting two examples: personal image search and video event recognition. Finally, we provide some insight into the challenges and future directions in this area.
Jiebo Luo is a Senior Principal Scientist with the Kodak Research Laboratories in Rochester, NY. His research interests include image processing, computer vision, machine learning, social media data mining, medical imaging, and computational photography. Dr. Luo has authored over 160 technical papers and holds over 60 US patents. Dr. Luo has been actively involved in numerous technical conferences, including serving as the general chair of ACM CIVR 2008, program co-chair of IEEE CVPR 2012, ACM Multimedia 2010 and SPIE VCIP 2007, area chair of IEEE ICASSP 2009-2010, ICIP 2008-2010, CVPR 2008 and ICCV 2011, and an organizer of ICME 2006/2008/2010 and ICIP 2002. Currently, he serves on several IEEE SPS Technical Committees (IMDSP, MMSP, and MLSP) and conference steering committees (ACM ICMR and IEEE ICME). He is the Editor-in-Chief of the Journal of Multimedia, and has served on the editorial boards of the IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), the IEEE Transactions on Multimedia (TMM), the IEEE Transactions on Circuits and Systems for Video Technology (TCSVT), Pattern Recognition (PR), Machine Vision and Applications (MVA), and Journal of Electronic Imaging (JEI). He is a Fellow of the SPIE, IEEE, and IAPR.
When extracting semantic information from generic image and video databases, such as those created and maintained by broadcasting companies, one can identify three different types of scenarios: (i) strongly contextualized applications, where the semantic objects to be extracted are very specific and no other objects are to be extracted; (ii) contextualized applications, where the context helps adapt and improve the object detection algorithms; and (iii) generic applications, where the lack of context requires the use of generic algorithms. In this talk, we will overview and exemplify each of these scenarios and then concentrate on the use of region-based generic tools to analyze images and to detect instances of previously modeled semantic concepts.
Ferran Marqués received a degree in Electrical Engineering from the Technical University of Catalonia (UPC), Barcelona, Spain, in 1988. From 1989 to 1990, he was with the Digital Image Sequence Processing and Coding Group at the Signal Processing Laboratory of the Swiss Federal Institute of Technology (EPFL). In June 1990, he joined the Department of Signal Theory and Communications of UPC. From June 1991 to September 1991, he was with the Signal and Image Processing Institute (SIPI) at the University of Southern California (USC). He received his Ph.D. from UPC in 1992 and won the 1992 Spanish Best Ph.D. Thesis in Electrical Engineering Award.
In 1993 he joined the faculty of the Technical University of Catalonia, where he is currently a Professor. He served as Associate Dean for International Relations of the Telecommunications School (ETSETB) at UPC (1997-2000) and was President of the European Association for Signal Processing (EURASIP) for the 2002-2004 term. He has been a member of the governing board of the European Master of Research on Information Technologies (MERIT), which was awarded the Erasmus Mundus distinction (2004-2009).
He served as Associate Editor of the Journal of Electronic Imaging (SPIE) in the area of Image Communications (1996-2000), and as a member of the editorial boards of the EURASIP Journal on Applied Signal Processing (2001-2003) and the EURASIP International Journal of Image and Video Processing. Currently, he is an Associate Editor of the IEEE Transactions on Image Processing (2009-) and an Area Editor for Signal Processing: Image Communication, Elsevier (2010-).
He was Special Sessions Co-Chairman of the International Conference on Image Processing (ICIP-2003, Barcelona, Spain, September 2003) and Technical Chairman of the 4th International Workshop on Content-Based Multimedia Indexing (CBMI-2005, Riga, Latvia, June 2005). Currently, he is Special Sessions Co-Chairman of the International Conference on Image Processing (ICIP-2011, Brussels, Belgium, September 2011) and Plenary Talks Chair for the 19th European Signal Processing Conference (EUSIPCO-2011).
He is author or co-author of more than 100 journal papers and proceedings articles, 4 book chapters, and 4 international patents. He recently co-authored, with Thierry Dutoit, the book “Applied Signal Processing: A Matlab-based Proof of Concept” (Springer, 2009) and edited, with Jean-Philippe Thiran and Hervé Bourlard, the book “Multimodal Signal Processing: Theory and Applications for Human-Computer Interaction” (Elsevier, 2009).
This talk takes stock of two decades of intense research in content-based multimedia indexing. On the success side, near-duplicate detection has matured into deployed services such as Shazam and SnapTell, built on fingerprinting and quantisation. Yet most of the fundamental challenges remain: the semantic gap, polysemy, and the scalability of indexing. Even more challenges (and opportunities!) have been created by connecting social networks to multimedia and by extending the notion of multimedia to sensor networks. I will look into some particular applications and discuss research opportunities and challenges.
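The fingerprinting idea behind such near-duplicate detectors can be illustrated with a deliberately simplified average-hash sketch (a toy stand-in, not the actual Shazam or SnapTell algorithms, which are far more sophisticated): reduce each item to a compact binary fingerprint, then declare near-duplicates when the Hamming distance between fingerprints is small.

```python
def average_hash(pixels):
    """Fingerprint an 8x8 grayscale image (a list of 64 ints, 0-255).

    Each bit records whether a pixel is brighter than the image mean,
    giving a 64-bit fingerprint that tolerates small global changes
    such as uniform brightening or mild compression noise.
    """
    mean = sum(pixels) / len(pixels)
    bits = 0
    for p in pixels:
        bits = (bits << 1) | (1 if p > mean else 0)
    return bits

def hamming(a, b):
    """Number of differing bits between two fingerprints."""
    return bin(a ^ b).count("1")

# Synthetic demo data: img2 is a slightly brightened copy of img1
# (a near-duplicate); img3 is the photographic negative (unrelated).
img1 = [(i * 7) % 256 for i in range(64)]
img2 = [min(255, p + 10) for p in img1]
img3 = [255 - p for p in img1]

d_near = hamming(average_hash(img1), average_hash(img2))  # small
d_far = hamming(average_hash(img1), average_hash(img3))   # large
print("near-duplicate distance:", d_near)
print("unrelated distance:", d_far)
```

A real system would add robustness (downsampling, frequency-domain hashes, local features) and use quantisation with inverted indexes so that fingerprints can be matched against millions of items without pairwise comparison.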
Prof. Stefan Rüger read Physics at the Freie Universität Berlin and gained his PhD at the Technische Universität Berlin (1996). He carved out his academic career at Imperial College London (1997-2006), where he also held an EPSRC Advanced Research Fellowship (1999-2004). In 2006 he became a Professor of Knowledge Media when he joined The Open University's Knowledge Media Institute to head a research group on Multimedia and Information Systems. Since 2009 he has held an Honorary Professorship from the University of Waikato, New Zealand, for his collaboration with the Greenstone Digital Library group on multimedia digital libraries. Rüger has published widely in the area of multimedia information retrieval. He was Principal Investigator of the EPSRC-funded Multimedia Knowledge Management Network, of a recent EPSRC grant to research and develop video digital libraries, and for The Open University in the European FP6-ICT project PHAROS, which established a horizontal layer of technologies for large-scale audio-visual search engines. As of 2011, he has served the academic community in various roles as conference chair (3x), programme chair (3x), journal editor (3x), and guest editor (3x), and as a referee for a wide range of computing journals (>25), international conferences (>50), and research sponsors (>10). Rüger is a member of the EPSRC College, the ACM, the BCS, and the BCS IRSG committee, and a Fellow of the Higher Education Academy.