Models and Algorithms for Multiple Instance Regression

Giallombardo, Giovanni; Miglionico, Giovanna; Sammarra, Marcello

Unlike single-instance learning, where each data sample is represented as a vector of features, in multiple-instance learning (MIL) data samples are complex objects (bags) described by sets of instances. In this framework, each instance vector is characterized only by its feature components and its membership in a bag, while a label is available only for the entire bag. In the multi-instance classification (MIC) version of the supervised problem, each training bag is associated with a categorical label. The aim is thus to learn, from the training set, a prediction model that determines the class label of new bags. Multi-instance regression (MIR), an extension of the traditional regression paradigm, differs in that each bag is associated with a real-valued label rather than a class. Although there is significant motivation for developing MIR applications, as testified by some interesting recent studies on remote sensing, age estimation, landmark recognition, and drug activity prediction, MIR is still less widespread than MIC. This is mainly due to the intrinsically more challenging nature of MIR: instead of learning a classification surface based on the relative positioning of instances of categorical bags, one has to learn a function that associates real numbers with sets of points. This introduces an obvious difficulty as soon as one recognizes that there exist multiple possible mathematical descriptions of the same bag. After briefly surveying existing approaches to MIR, in this work we focus on models that adopt the support vector regression paradigm, discussing the training algorithms and addressing the issues posed by the validation phase.