Pointwise and pairwise clothing annotation: combining features from social media


Nogueira K, Veloso AA & dos Santos JA (2016) Pointwise and pairwise clothing annotation: combining features from social media. Multimedia Tools and Applications, 75 (7), pp. 4083-4113.

In this paper, we present effective algorithms to automatically annotate clothes from social media data, such as Facebook and Instagram. Clothing annotation can be informally stated as recognizing, as accurately as possible, the garment items appearing in the query photo. This task brings huge opportunities for recommender and e-commerce systems, such as capturing new fashion trends based on which clothes have been worn most recently. It also poses interesting challenges for existing vision and recognition algorithms, such as distinguishing between similar but different types of clothes, or identifying the pattern of a garment even when it appears in different colors and shapes. We formulate the annotation task as a multi-label and multi-modal classification problem: (i) both image and textual content (i.e., tags about the image) are available for learning classifiers, (ii) the classifiers must recognize a set of labels (i.e., a set of garment items), and (iii) the decision on which labels to assign to the query photo comes from a set of instances used to build a function that separates the labels that should be assigned to the query photo from those that should not. Using this configuration, we propose two approaches: (i) the pointwise one, called MMCA, which receives a single image as input, and (ii) the pairwise one, called M3CA, a multi-instance classification approach that uses pairs of images to create the classifiers. We conducted a systematic evaluation of the proposed algorithms using everyday photos collected from two major fashion-related social media sites. Our results show that the proposed approaches improve accuracy by 20 % to 30 % when compared to popular first-choice multi-label, multi-modal, multi-instance algorithms.
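The multi-label, multi-modal formulation in the abstract can be illustrated with a small sketch. This is not the authors' MMCA or M3CA algorithm: it swaps in a simple per-label nearest-centroid rule, and the tag vocabulary, feature values, garment labels, and function names (`combine`, `train`, `annotate`) are all hypothetical. It only shows how image and tag features can be merged, and how a learned function can separate the labels to assign from those to withhold.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical tag vocabulary and toy data; none of this comes from the paper.
TAG_VOCAB = ["denim", "summer", "formal", "wool"]

# (visual features, textual tags, garment labels)
Instance = Tuple[List[float], Set[str], Set[str]]
Model = Dict[str, Tuple[List[float], List[float]]]


def combine(visual: List[float], tags: Set[str]) -> List[float]:
    """Multi-modal step: concatenate visual features with a binary tag vector."""
    return list(visual) + [1.0 if t in tags else 0.0 for t in TAG_VOCAB]


def centroid(rows: List[List[float]]) -> List[float]:
    """Component-wise mean of a non-empty list of equal-length vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def sq_dist(a: List[float], b: List[float]) -> float:
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def train(instances: List[Instance]) -> Model:
    """Multi-label step: one positive and one negative centroid per label."""
    labels = set().union(*(y for _, _, y in instances))
    model: Model = {}
    for label in sorted(labels):
        pos = [combine(v, t) for v, t, y in instances if label in y]
        neg = [combine(v, t) for v, t, y in instances if label not in y]
        if pos and neg:  # skip labels that cannot be separated
            model[label] = (centroid(pos), centroid(neg))
    return model


def annotate(model: Model, visual: List[float], tags: Set[str]) -> Set[str]:
    """Assign every label whose positive centroid is closer than its negative one."""
    x = combine(visual, tags)
    return {
        label
        for label, (pos, neg) in model.items()
        if sq_dist(x, pos) < sq_dist(x, neg)
    }


TRAIN: List[Instance] = [
    ([0.9, 0.1], {"denim", "summer"}, {"jeans", "t-shirt"}),
    ([0.8, 0.2], {"denim"}, {"jeans"}),
    ([0.1, 0.9], {"formal", "wool"}, {"blazer"}),
    ([0.2, 0.8], {"formal"}, {"blazer", "skirt"}),
]

model = train(TRAIN)
casual = annotate(model, [0.85, 0.15], {"denim"})
formal = annotate(model, [0.15, 0.85], {"formal"})
print(sorted(casual), sorted(formal))
```

Each query is a single photo with its tags, which corresponds loosely to the pointwise setting. A pairwise variant would instead build its training instances from pairs of images, which this sketch does not attempt.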

Media Technology; Computer Networks and Communications; Hardware and Architecture; Software; Image annotation; Clothing annotation; Bag of visual words; Machine learning; Multi-modal; Multi-instance; Multi-label

Multimedia Tools and Applications: Volume 75, Issue 7

Funders: Brazilian National Research Council
Publication date: 30/04/2016
Publication date online: 08/12/2015
Date accepted by journal: 17/11/2015
Publisher: Springer Science and Business Media LLC
