Article

Pointwise and pairwise clothing annotation: combining features from social media

Citation

Nogueira K, Veloso AA & dos Santos JA (2016) Pointwise and pairwise clothing annotation: combining features from social media. Multimedia Tools and Applications, 75 (7), pp. 4083-4113. https://doi.org/10.1007/s11042-015-3087-2

Abstract
In this paper, we present effective algorithms to automatically annotate clothes from social media data, such as photos from Facebook and Instagram. Clothing annotation can be informally stated as recognizing, as accurately as possible, the garment items appearing in the query photo. This task brings huge opportunities for recommender and e-commerce systems, such as capturing new fashion trends based on which clothes have been used more recently. It also poses interesting challenges for existing vision and recognition algorithms, such as distinguishing between similar but different types of clothes or identifying the pattern of a garment even when it appears in different colors and shapes. We formulate the annotation task as a multi-label and multi-modal classification problem: (i) both image and textual content (i.e., tags about the image) are available for learning classifiers, (ii) the classifiers must recognize a set of labels (i.e., a set of garment items), and (iii) the decision on which labels to assign to the query photo comes from a set of instances that is used to build a function that separates labels that should be assigned to the query photo from those that should not. Using this configuration, we propose two approaches: (i) the pointwise one, called MMCA, which receives a single image as input, and (ii) a multi-instance classification, called M3CA, also known as the pairwise approach, which uses pairs of images to create the classifiers. We conducted a systematic evaluation of the proposed algorithms using everyday photos collected from two major fashion-related social media sites, namely pose.com and chictopia.com. Our results show that the proposed approaches improve accuracy by 20% to 30% when compared to popular first-choice multi-label, multi-modal, multi-instance algorithms.
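The multi-label, multi-modal formulation in the abstract can be illustrated with a small sketch. The code below is not MMCA or M3CA and is not taken from the paper. It is a toy similarity-weighted vote, assuming each photo carries a visual feature vector and a set of textual tags. The names `similarity`, `annotate`, `alpha` and `threshold`, and the toy data, are hypothetical choices made for illustration only.

```python
from math import sqrt


def cosine(u, v):
    """Cosine similarity between two visual feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = sqrt(sum(a * a for a in u))
    nv = sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def jaccard(s, t):
    """Jaccard overlap between two tag sets (the textual modality)."""
    union = s | t
    return len(s & t) / len(union) if union else 0.0


def similarity(a, b, alpha=0.5):
    """Blend both modalities; alpha is an assumed mixing weight."""
    return (alpha * cosine(a["visual"], b["visual"])
            + (1 - alpha) * jaccard(a["tags"], b["tags"]))


def annotate(query, training, threshold=0.5, alpha=0.5):
    """Return every label whose similarity-weighted vote reaches the threshold.

    The threshold plays the role of a function separating labels that
    should be assigned from those that should not (an illustrative
    stand-in, not the paper's learned function).
    """
    weights = [similarity(query, ex, alpha) for ex in training]
    total = sum(weights)
    if total == 0:
        return set()
    labels = set().union(*(ex["labels"] for ex in training))
    scores = {
        lab: sum(w for w, ex in zip(weights, training) if lab in ex["labels"]) / total
        for lab in labels
    }
    return {lab for lab, score in scores.items() if score >= threshold}


# Toy, made-up training photos: visual features, tags, garment labels.
training = [
    {"visual": [1.0, 0.0], "tags": {"summer", "denim"}, "labels": {"jeans", "t-shirt"}},
    {"visual": [0.9, 0.1], "tags": {"denim"}, "labels": {"jeans"}},
    {"visual": [0.0, 1.0], "tags": {"formal", "office"}, "labels": {"blazer", "skirt"}},
]
query = {"visual": [1.0, 0.0], "tags": {"denim", "casual"}}

print(annotate(query, training))
```

Because several labels can pass the threshold at once, the output is a set of garment items rather than a single class, which is what makes the problem multi-label. Lowering `threshold` admits more labels at the cost of precision.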

Keywords
Media Technology; Computer Networks and Communications; Hardware and Architecture; Software; Image annotation; Clothing annotation; Bag of visual words; Machine learning; Multi-modal; Multi-instance; Multi-label

Journal
Multimedia Tools and Applications: Volume 75, Issue 7

Status: Published
Funders: Brazilian National Research Council
Publication date: 30/04/2016
Publication date online: 08/12/2015
Date accepted by journal: 17/11/2015
URL: http://hdl.handle.net/1893/30351
Publisher: Springer Science and Business Media LLC
ISSN: 1380-7501
eISSN: 1573-7721

Research centres/groups