1. Introduction
Active learning has recently received substantial and growing interest in the literature. Although unlabeled data are often abundant, labeling them is generally expensive. Active learning therefore selects the most informative, or “beneficial”, training samples for the learning model in order to achieve high accuracy using as few labeled examples as possible [1]. Active learning has proved effective in diverse applications such as natural language processing [2] and image processing [3]. The active learning process proceeds as follows: first, an initial learning model is trained using a few labeled samples. Then, additional samples are sequentially selected according to a certain querying strategy and added to the training data. This process repeats until a certain stopping criterion is satisfied [
4].
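For concreteness, the generic sequential loop just described can be sketched as follows; this is a minimal illustration in Python, where the train, query_strategy and oracle_label callables are placeholders rather than components of the proposed framework.

```python
def active_learning_loop(initial_data, unlabeled_pool, budget,
                         train, query_strategy, oracle_label):
    """Generic sequential active learning cycle (illustrative sketch).

    train(labeled)              -> fitted model
    query_strategy(model, pool) -> the sample to query next
    oracle_label(sample)        -> its true label
    """
    labeled = list(initial_data)
    model = train(labeled)                         # initial model from a few samples
    for _ in range(budget):                        # stopping criterion: labeling budget
        x = query_strategy(model, unlabeled_pool)  # select the most informative sample
        y = oracle_label(x)                        # obtain its label from the oracle
        labeled.append((x, y))
        unlabeled_pool.remove(x)
        model = train(labeled)                     # retrain on the enlarged training set
    return model
```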
Most active learning research focuses on querying data labels to optimize the learning model’s accuracy; only a few contributions utilize active learning for achieving other objectives. However, in many applications the data labeling process is costly and the ultimate goal is to optimize a domain-specific objective function rather than to minimize the learning model’s predictive error. Accordingly, in this work we propose a comprehensive active learning framework consisting of several novel querying strategies for handling general optimization problems in which the objective can be a general utility function, not necessarily the learning model’s accuracy. The problem can be framed as choosing the right trade-off between exploration and exploitation. In other words, we face a trade-off between minimizing the uncertainty of the target objective function, known as exploration, and maximizing the underlying objective function given the available function estimates, known as exploitation. The exploration-exploitation trade-off is encountered in machine learning [
5] and optimization algorithms [
6]. Furthermore, this class of optimization problems, experiencing a trade-off between exploration and exploitation, is prevalent in many real-world applications across various fields, such as recommender systems [
7] and dynamic pricing [
8].
In this paper, we provide a comprehensive analysis of active learning from the point of view of the exploration-exploitation trade-off. Our focus is on optimizing a general objective function, rather than prediction accuracy. For example, the user may wish to select a query point that maximizes profit. As a case study, we apply the proposed active learning framework to a real-world application, namely dynamic pricing for revenue maximization when the customers’ demand behavior is unknown [
9]. Specifically, firms offering a certain good or service seek to adjust prices in a way that maximizes the obtained revenue. However, the price-demand curve, which governs the relation between the price and the corresponding customer behavior, is usually not known beforehand and has to be inferred. Generally, companies learn the price-demand curve through price experimentation, testing a number of prices and observing the corresponding demands in actual selling situations. On the other hand, choosing prices for revealing the price-demand relation could yield revenue losses, since such prices are not designed to maximize the achieved revenue [
10,
11].
Therefore, we are dealing with two conflicting goals: exploration in the form of choosing prices that minimize the uncertainty of the learned demand model and exploitation in the form of setting prices to maximize the objective function, that is, the obtained revenue. The former is accomplished in a framework of active learning: what price should we suggest next to gain the most knowledge of the demand-price function?
The aforementioned problem of revenue maximization with demand learning represents a case study which can be considered an application of our proposed framework. However, the presented active learning framework is general and it can be applied to any objective optimization problem incurring a trade-off between exploration and exploitation.
The proposed active learning framework consists of three main families of active learning approaches: exploration-based strategies, exploitation-based strategies and balancing strategies that handle both exploration and exploitation. For the exploration-based methods, we propose several novel information-theoretic strategies with the aim of minimizing the learning model uncertainty. The exploitation-based methods, on the other hand, are designed merely to optimize the target objective function, without taking the model accuracy into consideration. Finally, we present several active learning strategies specifically designed to address the exploration-exploitation trade-off by combining both objectives: optimizing the target objective and obtaining an accurate learning model.
We conduct a set of experiments to evaluate the performance of our proposed active learning methods with respect to both aspects: exploitation, measured by the gained utility, and exploration, measured by the regression model’s accuracy. In these experiments, we compare the performance of our proposed methods to some standard baselines.
Active learning has been extensively studied in classification problems [
4]. However, only a few studies investigate applying active learning to regression tasks [
12,
13,
14]. In this work, our presented active learning framework mainly targets regression problems. However, it could be easily adapted to handle classification problems, as well.
Active learning is generally classified into sequential and batch mode settings. In the sequential setting, one query sample is selected per iteration. On the other hand, for the batch mode, a group of samples are simultaneously selected for labeling. In this work, we adopt the sequential active learning approach.
Another scheme used for classifying the active learning methods is based on the query generation process. Specifically, active learning is classified into: pool-based and query synthesis approaches. The pool-based approach is the conventional method which is most commonly used in the active learning literature [
4]. In the pool-based scheme, at each iteration, one or more query samples are selected from an unlabeled pool of existing data according to a certain querying criterion and labeling is carried out for these selected samples. On the other hand, the membership query synthesis approach generates one or more synthetic samples from the whole input space. In this paper, we apply both approaches—pool-based and query synthesis—and perform a comparative study between the two. The experimental results essentially elucidate the significance and superiority of the query synthesis approach over the commonly used pool-based approach. More detailed results are discussed in
Section 7 and
Section 8.
The goal of this work is not merely to provide a collection of active learning strategies; instead, we aim to introduce a comprehensive active learning framework, including novel strategies, for handling a wide class of objective optimization problems confronting the exploration-exploitation dilemma.
The main contributions of this paper are summarized as follows:
Provide a comprehensive active learning framework for general objective optimization, analyzed from the point of view of the exploration-exploitation trade-off.
Propose several novel information-theoretic active learning strategies, designed for minimizing the learning model uncertainty.
Design active learning methods for regression tasks.
Present a less-myopic active learning method focusing on exploitation or target optimization.
Develop query synthesis and pool-based variants of the proposed active learning strategies and compare the two approaches.
Apply the proposed active learning framework to a real-world application, namely dynamic pricing with demand learning, as a case study.
The paper is organized as follows:
Section 2 presents a literature review.
Section 3 presents the problem formulation.
Section 4 briefly describes the Bayesian formulation of the linear regression model that is applied in our experiments. Then, our proposed active learning strategies are presented in
Section 5. Section 6 describes the case study of dynamic pricing with demand learning. After that,
Section 7 presents experimental results.
Section 8 discusses the main findings. Finally,
Section 9 concludes the paper.
3. Problem Formulation
As mentioned in the introduction, this work focuses on regression tasks, since regression is prominent in various applications such as energy consumption prediction [
38] and price-demand elasticity estimation [
8,
39]. Specifically, in this work we apply a linear regression model, but our proposed active learning framework is general and can be applied to any other regression model. Furthermore, the proposed strategies can be adapted to classification models as well.
We consider the following linear regression problem:
\[ y = \mathbf{w}^{\top}\mathbf{x} + \epsilon, \quad (1) \]
where \(\mathbf{x} \in \mathbb{R}^{d}\) is the input feature vector, \(d\) is the dimensionality of the feature vector, \(y\) denotes the regression response variable, \(\epsilon\) is a random error term with \(\epsilon \sim \mathcal{N}(0, \sigma^{2})\) and \(\mathbf{w}\) denotes the vector of regression model coefficients.
This work particularly tackles the class of optimization problems that have a certain utility function u to be optimized, for any underlying regression task. The utility function u incurs some uncertainty, which can be estimated using a probabilistic regression model. Such problems pose the challenge of how to strike a balance between maximizing the objective function u (exploitation) and minimizing the uncertainty about the utility function (exploration). In this work, we develop a novel active learning framework consisting of various strategies to interactively seek a balance between exploitation and exploration.
Notation
In this section, we introduce the adopted notation used in the proposed active learning framework.
First, the training data acquired so far is denoted as a set of pairs of input data samples and their corresponding labels, where N is the number of data samples acquired so far.
The matrix of input data points is denoted such that each row represents one data sample and d is the dimensionality of a data point; the corresponding output variables are collected in a response vector. The matrix of data samples whose outputs require prediction is denoted analogously, where m is the number of data samples to be predicted, together with the vector of their corresponding output variables. Similarly, in the case of predicting a single data point, the data sample and its corresponding output are denoted individually.
In the adopted linear regression algorithm described in
Section 4, the regression coefficients are denoted as
. In addition,
and
are the mean and covariance matrix of
, respectively.
In the proposed active learning framework,
denotes the unlabeled pool of data samples and
represents the responses of the samples in the pool. The utility function
u represents the objective function to be optimized using active learning as defined in
Section 5.3 and
Section 5.4.
4. Preliminaries: Bayesian Linear Regression
In this section, we briefly describe the Bayesian linear regression model used in the proposed active learning framework. We adopt Bayesian linear regression for several reasons. First, the class of optimization problems that we handle involves uncertainty of the utility function, which can be estimated using probabilistic regression models such as Bayesian linear regression. Moreover, most active learning querying strategies depend on the uncertainty of predictions, so it is compelling to use a regression model providing not only predictions but also the uncertainty of those predictions; Bayesian linear regression provides such information. Finally, in active learning settings, the initial data points available for training are typically limited, which could result in over-fitting, especially for noisy data; applying Bayesian linear regression helps to combat this potential over-fitting.
The underlying regression problem is formulated as indicated in Equation (
1), in
Section 3. According to Equation (
1), we have two major parameters in the regression model, the regression model coefficients
and the noise variance
, so we adopt Bayesian linear regression with conjugate prior of
.
Since the noise variance parameter
is a key parameter in the model and we have some prior knowledge about it, for example it must be positive, we can use a conjugate prior distribution for both parameters
and
We assume an Inverse Gamma prior distribution for the noise variance parameter. The conjugate prior
can be expressed as a Normal Inverse Gamma (NIG) distribution as follows:
In this section, we have provided the final formulations of the Bayesian linear regression model. Interested readers can find more details in References [
40,
42].
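As an illustration of the formulations referred to above, the following Python sketch implements a standard conjugate Normal Inverse Gamma update for Bayesian linear regression and the resulting Student-t predictive moments; the hyperparameter names (mu0, Lambda0, a0, b0) are generic and may differ from the paper’s exact notation.

```python
import numpy as np

def nig_posterior(X, y, mu0, Lambda0, a0, b0):
    """Conjugate NIG update for y = X w + eps, eps ~ N(0, sigma^2) (standard formulas)."""
    Lambda_n = Lambda0 + X.T @ X
    mu_n = np.linalg.solve(Lambda_n, Lambda0 @ mu0 + X.T @ y)
    a_n = a0 + len(y) / 2.0
    b_n = b0 + 0.5 * (y @ y + mu0 @ Lambda0 @ mu0 - mu_n @ Lambda_n @ mu_n)
    return mu_n, Lambda_n, a_n, b_n

def predictive_moments(x_star, mu_n, Lambda_n, a_n, b_n):
    """Posterior predictive at x_star is Student-t with 2*a_n degrees of freedom."""
    mean = x_star @ mu_n
    scale2 = (b_n / a_n) * (1.0 + x_star @ np.linalg.solve(Lambda_n, x_star))
    var = scale2 * 2 * a_n / (2 * a_n - 2)   # predictive variance, valid when 2*a_n > 2
    return mean, var
```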
5. Proposed Active Learning Framework
In this section, we present our proposed active learning framework for handling optimization problems that encounter an exploration-exploitation trade-off.
First, we describe the general active learning settings. Then, we introduce our proposed active learning strategies, which are classified into exploration-based strategies, exploitation-based strategies and strategies that balance exploration and exploitation.
Figure 1 shows the proposed active learning framework.
5.1. Active Learning Schemes
Active learning can be applied in different modes that define how a new query point is generated. We describe three different schemes: the first two are generally known in the literature, and we define the third one because we incorporate it into some of our proposed strategies.
Pool-based
This is the conventional approach that is mostly used in the active learning literature. In the pool-based approach, there exists an unlabeled pool of data samples and at each iteration, one or more query example(s) is selected from the pool according to a certain querying criterion. Algorithm 1 describes the pool-based active learning approach.
Membership Query Synthesis
Unlike the pool-based approach, the membership query synthesis scheme is not commonly used in the active learning literature. Rather than selecting data samples out of a pool of unlabeled data, this approach generates and queries synthetic data samples from the entire input space. Algorithm 2 explains the query synthesis approach.
This approach is efficient and not computationally intensive compared to the pool-based approach. The reason for the query synthesis’s computational efficiency is that, instead of iterating over the large unlabeled pool of samples and evaluating a certain selection criterion such as mutual information, the query synthesis approach directly generates a synthetic data sample to achieve a certain objective. For example, our proposed query synthesis approach optimizes the underlying querying metric using optimization algorithms. The query synthesis approach is not only computationally efficient but could also be more compelling than the pool-based approach, since the generated query sample is not restricted to be part of an unlabeled pool; the synthetically generated query sample could therefore be more informative and beneficial than the examples in the pool.
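The computational difference between the two schemes can be illustrated with the following hedged sketch: the pool-based variant scans a finite pool, whereas the query synthesis variant hands the same criterion to a numerical optimizer over the input space (the criterion, bounds and x_init arguments are placeholders).

```python
import numpy as np
from scipy.optimize import minimize

def pool_based_query(criterion, pool):
    """Evaluate the querying criterion on every pool sample and pick the best one."""
    scores = np.array([criterion(x) for x in pool])
    return pool[int(np.argmax(scores))]

def query_synthesis(criterion, bounds, x_init):
    """Synthesize a query directly by maximizing the criterion over the input space."""
    result = minimize(lambda x: -criterion(x), x0=x_init, bounds=bounds)
    return result.x
```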
Membership Query Synthesis without a Predefined Pool
The query synthesis approach does not need to have a pool of samples. However, some active learning strategies exploit the potential information in the unlabeled data to guide the sample selection such as mutual information strategy defined subsequently in Equation (
28) and the KL divergence strategy defined in Equation (
47). Consequently, such strategies rely on the existence of some unlabeled data to estimate how useful or how representative a certain query point is. However, for some applications, unlabeled data may not exist or, if they do, they may not be a representative sample of the input space. In such cases, one could generate a representative and diverse sample of unlabeled data using domain knowledge of the feature space. Another way of generating representative unlabeled data is to apply any reasonable clustering algorithm to the available training data and use the cluster centroids as representatives of the unobserved data. Algorithm 3 elucidates this approach.
Algorithm 1 Pool-based Active Learning
Input: A dataset, a general active learning strategy S, a utility function u, a number of iterations T, a discount factor and a generation method for creating synthetic queries.
Output: A learned model and a cumulative gained utility.
Initialize the training data with labeled data samples randomly chosen out of the dataset.
Train the regression model using the initial training data to obtain an initial model.
repeat
  for each sample in the unlabeled pool do
    Apply the active learning strategy S to the sample, using the current model estimate.
  end for
  Select the query sample optimizing the strategy criterion and obtain its true label.
  Add the acquired data point to the training data.
  Evaluate the utility using the newly acquired point.
  Update the regression model using the newly acquired point.
until T iterations executed
return The learned model and the cumulative discounted utility.
In our experiments, we develop several novel active learning strategies and apply them in the pool-based and query synthesis schemes. For the strategies that use the unlabeled data samples to guide their selection, such as mutual information (MI), modified mutual information (MMI) and Kullback–Leibler divergence (KL), we apply the three aforementioned schemes. More details are provided in the experiments section,
Section 7.
Algorithm 2 Query Synthesis Active Learning
Input: A dataset, a general active learning strategy S, a utility function u, a number of iterations T, a discount factor and a generation method for creating synthetic queries.
Output: A learned model and a cumulative gained utility.
Initialize the training data with labeled data samples randomly chosen out of the dataset.
Train the regression model using the initial training data to obtain an initial model.
repeat
  Generate a synthetic query sample by optimizing the strategy S over the input space, using the current model estimate.
  Obtain the true label for the query sample.
  Add the acquired data point to the training data.
  Evaluate the utility using the newly acquired point.
  Update the regression model using the newly acquired point.
until T iterations executed
return The learned model and the cumulative discounted utility.
Algorithm 3 Query Synthesis Active Learning without a predefined pool
Input: A small dataset of points, a general active learning strategy S, a utility function u, a number of iterations T, a discount factor and a generation method for creating synthetic queries.
Output: A learned model and a cumulative gained utility u.
Initialize the training data with labeled data samples randomly chosen out of the dataset.
Train the regression model using the initial training data to obtain an initial model.
Construct a representative sample of unlabeled data using, for example, domain knowledge or clustering.
repeat
  Generate a synthetic query sample by optimizing the strategy S, using the current model estimate and the constructed unlabeled sample.
  Obtain the true label for the query sample.
  Add the acquired data point to the training data.
  Evaluate the utility using the newly acquired point.
  Update the regression model using the newly acquired point.
until T iterations executed
return The learned model and the cumulative discounted utility.
5.2. Exploration-Based Strategies
In this section, we describe our novel proposed exploration-based active learning strategies for regression. The exploration-based strategies mainly target enhancing the regression model’s predictive performance. The presented strategies are not limited to a certain application or class of problems; they are quite general and could be applied in any setting where the objective is to boost the regression model accuracy. The most popular active learning methods such as uncertain sampling [
15] and Query by Committee [
18] seek to query the most “uncertain” sample, that is, the data sample about which the learning model is the most uncertain. Although this seems helpful for the learning model, whether for classification or regression, the uncertain sampling approach does not consider the potential information of the unlabeled pool of examples. Thus, uncertain sampling could select noisy patterns or outliers. On the other hand, querying samples based not only on the query sample itself but also on the unlabeled samples of the pool ([
17,
23]) is more promising, since such an approach is less myopic and utilizes the information of the plentiful unlabeled pool.
The following proposed exploration strategies are mainly based on information theory [
43]. To our knowledge, this is the first time that information-theoretic concepts (such as mutual information, Kullback–Leibler divergence and model entropy) are applied in active learning for regression. Some information-theoretic metrics such as predictive label entropy, Fisher information and mutual information have been employed for active learning in classification problems [
17,
22,
23]. However, such information theoretic metrics have not been considered yet for regression problems.
Depending solely on the information of a single query sample could lead to choosing noisy samples or outliers [
19]. It is well known that an outlier does more harm than good. Consequently, our proposed exploration-based active learning strategies exploit the potential information existing in the unlabeled pool of samples together with the learning model uncertainty. Moreover, incorporating the information of the unlabeled pool (for example, through mutual information) into the selection strategy encourages querying representative samples.
5.2.1. Mutual Information (MI)
The mutual information criterion aims to query the sample which effectively holds a substantial amount of information about the labels of the unlabeled pool. Thus, this strategy chooses the sample that maximizes the mutual information between its label and the labels of the remaining unlabeled samples of the pool, excluding the query sample itself.
The mutual information between the query sample
and the labels of the unlabeled pool
is defined as:
where
denotes the labeled training data acquired so far.
The first term
represents the prior entropy (or uncertainty) of all the labels of the unlabeled pool of samples. Similarly, the second term
denotes the entropy of the labels of unlabeled pool of samples but after acquiring the new query point
. From Equation (
25), it can be noted that maximizing
is equivalent to minimizing the conditional entropy
, which is defined as follows:
To simplify computations, Equation (
26) could be approximated by eliminating the integration over all the possible labels of
and using the expected value of it
. Other approximations are made in literature [
22,
23], using the optimistic or the pessimistic label. However, we found that employing the expectation could be more reasonable. Accordingly:
where
is the expected predicted label of the data point
, which is calculated using Equation (
23).
As mentioned in
Section 4, the posterior predictive distribution of the predictive labels vector
Y,
is a multivariate Student-T distribution which is defined as follows:
The posterior expectation
and the covariance matrix
are evaluated using the Bayesian linear regression model formulations described in
Section 4, using Equations (
18) and (
21), respectively. However, this method and all our proposed methods are general and can be applied using any regression model that provides uncertainty of its predictions.
According to Reference [
41], the final formulation of the entropy of a random variable
Z following a Student-t distribution
is given by:
where
denotes the correlation matrix of
Z,
d is the dimensionality of
Z and
represents the number of degrees of freedom for the Student-t distribution. In addition,
is a constant depending on
and
d and
.
where
is the digamma function which is defined as:
Accordingly, the conditional entropy of
,
, is calculated using Equation (
30) as follows:
where
m is the number of data points to be predicted, that is, it is the length of the predicted output vector
. To simplify notation, let
,
,
and
. For
, it is evaluated as follows:
such that
denotes the correlation matrix of the unlabeled samples
Y after acquiring the query sample
.
The term
can be evaluated using Equation (
31):
Using algebraic manipulations, the summation in Equation (
35) converges as follows:
Accordingly, substituting from Equation (
36) into Equation (
35) results in:
Then, after substituting from Equation (
37) into Equation (
33), the conditional entropy
can be evaluated as:
Finally, the query sample
that maximizes the mutual information essentially minimizes the conditional entropy of the unlabeled pool of samples as indicated in Equation (
25). Consequently, the query sample
minimizing the conditional entropy
can be evaluated as follows:
We simplify Equation (
39) by eliminating the term
since it is a constant that does not depend on the query sample, because
basically depends on the number of data being observed, as indicated in Equation (
7). Thus:
For computational efficiency purposes, we evaluate the log determinant of the correlation matrix and its inverse using Cholesky decomposition since the correlation matrix is a symmetric positive semi-definite matrix.
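For instance, assuming a symmetric positive definite correlation matrix R, the required log-determinant can be computed stably from its Cholesky factor, as in the following short sketch.

```python
import numpy as np

def logdet_via_cholesky(R):
    """log|R| computed as 2 * sum(log(diag(L))), where R = L L^T."""
    L = np.linalg.cholesky(R)        # requires R to be symmetric positive definite
    return 2.0 * np.sum(np.log(np.diag(L)))
```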
We apply three variants of this active learning strategy: pool-based, query synthesis and query synthesis without pool, which are described in
Section 5.1.
5.2.2. Modified Mutual Information (MMI)
The modified mutual information strategy is basically akin to the aforementioned strategy. This method maximizes the mutual information defined in Equation (
25) but, unlike the previous strategy, it also evaluates the first term of that equation
, which represents the entropy of the labels of the unlabeled samples, instead of ignoring it. The intuition of this querying strategy is to account for the impact of the query sample
on reducing the joint entropy of the unlabeled samples
. In other words, if the first term is ignored and we just focus on minimizing the conditional entropy given the underlying query sample
, we may choose a sample
that is redundant and not informative in case the entropy before acquiring
,
is inherently negligible.
Accordingly, the modified mutual information equation is defined using Equation (
25) but without ignoring the first term.
Similar to the mutual information strategy, the second term of Equation (
41) can be evaluated using Equation (
38). As for the first term
, similar to Equation (
38), it can be computed as follows:
where
,
,
, and
. For
it is evaluated as follows:
where
denotes the correlation matrix of the unlabeled samples
Y, given the training data acquired so far
.
Therefore, substituting from Equations (
28) and (
42) into Equation (
41) results in:
Similar to the previous strategy, we apply three variants of this active learning method using the different active learning schemes: pool-based, query synthesis and query synthesis without pool.
5.2.3. Kullback–Leibler Divergence (KL)
So far, the previously mentioned strategies select the sample revealing the greatest amount of information about the labels of the other samples. However, this strategy addresses a different aspect. The Kullback–Leibler divergence strategy seeks to acquire samples having the greatest impact on the posterior predictive distribution of the unlabeled samples
. So, this method considers the influence of the query sample on the
“distribution” of the unlabeled samples. To achieve that, this method maximizes the difference in posterior predictive distributions of unlabeled pool
before and after querying the query point
. The distribution difference is evaluated using the Kullback–Leibler divergence (KL) metric [
44]. The Kullback–Leibler divergence metric is an asymmetric distance measure that evaluates the distance between two probability distributions
P and
Q. In other words,
measures the information lost when
Q is used to approximate
P [
44]. The
is defined as follows:
where
and
are the probability density functions to be compared.
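As a simple numerical illustration of this definition, the divergence between two univariate densities can be approximated by quadrature; the Gaussian densities below are only an example and are not the Student-t predictive distributions used later in this section.

```python
import numpy as np
from scipy.stats import norm
from scipy.integrate import quad

def kl_divergence(p_pdf, q_pdf, lo=-20.0, hi=20.0):
    """Numerical KL(P || Q): integral of p(z) * log(p(z) / q(z)) over z."""
    integrand = lambda z: p_pdf(z) * np.log(p_pdf(z) / q_pdf(z))
    value, _ = quad(integrand, lo, hi)
    return value

# Example: KL between N(0, 1) and N(1, 1.5^2)
kl = kl_divergence(norm(0, 1).pdf, norm(1, 1.5).pdf)
```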
It is worth noting that the KL divergence has been employed as a powerful method in Bayesian analysis. For example, Lopez et al. apply the KL divergence to influence analysis [
45]. The authors use the KL metric to study the impact of removing one or several observations from the data set on the resulting inferences.
In our proposed active learning method,
denotes the posterior predictive distribution of unlabeled example given the query sample
, whereas
is the posterior predictive distribution of unlabeled example
prior to acquiring the query example
. The Kullback–Leibler divergence
is defined as:
We approximate by evaluating the average Kullback–Leibler divergence over all the unlabeled examples of the pool .
Since the true label of the query sample is unknown, we use the expectation of denoted as .
As indicated in
Section 4, both predictive distributions
and
follow Student-t distributions. Let
. Similarly, the posterior predictive distribution after acquiring
is denoted as
.
To simplify notation, let
denote the Kullback–Leibler divergence between the two predictive distributions
, which is calculated as:
Substituting with the Student-t distribution formulation Equation (
22) into Equation (
48):
where the means and variances of the posterior distributions can be given using the regression equations in
Section 4, Equations (
23) and (
24), respectively.
After substituting from Equation (
49) into Equation (
47), the Kullback–Leibler divergence
is evaluated as:
Finally, the query sample
that maximizes the Kullback–Leibler divergence between the posterior predictive distributions of unlabeled pool
before and after querying the query point
is evaluated as follows:
Like the two aforementioned active learning methods, we apply the three variants of active learning settings described in
Section 5.1, along with this active learning method.
5.2.4. Model Entropy (ME)
The aforementioned strategies, the two variants of mutual information and the Kullback–Leibler divergence, exploit the potential information of the unlabeled pool to guide the query selection process. However, this novel active learning strategy, named model entropy, considers a different aspect. The model entropy method targets the ultimate objective of exploration, as mentioned in
Section 1, which is minimizing the learning model uncertainty. To achieve this target, this method emphasizes reducing the learning model uncertainty in terms of the model entropy. Thus, this method queries the data sample that minimizes the model entropy in order to reduce the uncertainty of the underlying model and obtain better estimates of the learning model parameters.
In general, the entropy has been used in several applications such as biological systems [
46], financial applications [
47] and model selection [
48]. However, to the best of our knowledge, the use of the model entropy minimization strategy in the active learning field is novel.
According to Reference [
4], the existing work in active learning literature so far mainly addresses the following: minimizing the approximate generalization error [
19] and reducing the model uncertainty indirectly either by choosing the example about which the model is most uncertain [
15] or by querying the example that produces the maximum model change [
12].
For the general regression problem formulation presented in
Section 3, using Equation (
1), the model entropy of the regression model parameters
can be formulated as follows:
Using the Bayesian linear regression formulation presented in
Section 4 (see Equation (
9)), the model parameter
follows a multivariate Student-t distribution such that:
where
and
are the posterior mean and covariance matrix of model parameter
, respectively and they can be evaluated using Equations (
12) and (
13), respectively, according to the Bayesian linear regression formulation described in
Section 4. Furthermore, the posterior values,
and
are evaluated using Equations (
7) and (
8), respectively.
To simplify notation, let
,
and
. According to Reference [
41], the final formulation of the entropy for the multivariate Student-t distribution is given by:
where
denotes the correlation matrix of
,
d is the dimensionality of
,
is a constant depending on
d and
and
.
Substituting from Equation (
36) into Equation (
56):
Substituting from Equation (
56) into Equation (
55) results in:
However, we could safely ignore the term
since it does not depend on the query sample
. Finally, the query sample
minimizing the model entropy
can be estimated as follows:
For this strategy, we apply both the pool-based and the query synthesis active learning approaches.
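A simplified, pool-based sketch of model-entropy-style selection is given below. It assumes that, as in standard Bayesian linear regression, acquiring a point x increases the parameter precision matrix by x xᵀ, and it scores candidates by the resulting log-determinant; this is an illustrative D-optimal-design-like approximation, not the exact Student-t entropy expression derived above.

```python
import numpy as np

def model_entropy_query(candidates, Lambda_n):
    """Pick the candidate whose acquisition most shrinks the posterior covariance.

    Minimizing log|(Lambda_n + x x^T)^{-1}| is equivalent to maximizing
    log|Lambda_n + x x^T|, so the candidate with the largest value is chosen.
    """
    def score(x):
        _, logdet = np.linalg.slogdet(Lambda_n + np.outer(x, x))
        return logdet
    scores = np.array([score(x) for x in candidates])
    return candidates[int(np.argmax(scores))]
```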
5.3. Exploitation-Based Strategies
In this section, we present the exploitation-based active learning strategies for regression that we apply in our proposed framework. Such strategies purely emphasize maximizing a certain objective function, with no consideration given to exploration.
First, we describe the greedy strategy, which is considered a pure exploitation method. Then, we propose a novel active learning querying strategy that mainly focuses on exploitation but in a less myopic way than the commonly adopted greedy strategy.
5.3.1. Greedy Strategy (G)
This query strategy addresses pure exploitation by querying the sample resulting in the maximum immediate value of the target objective function (reward). We apply this method as a baseline to compare with, where for every iteration, the query sample is chosen to maximize the expected utility function.
Although the greedy strategy is straightforward and simple, it is myopic in the sense that it purely considers exploitation; it pays no attention to improving the model’s predictive power (exploration), which could severely affect the resulting decisions and lead to potential revenue loss.
where the expected utility function
u can be expressed as a function of
x and the regression model coefficients
.
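A minimal pool-based sketch of the greedy rule: every candidate is scored by the expected utility under the current parameter estimate, and the maximizer is queried (the expected_utility callable and w_mean argument are placeholders for u(x, E[w])).

```python
import numpy as np

def greedy_query(candidates, w_mean, expected_utility):
    """Greedy exploitation: query the candidate maximizing the expected utility."""
    scores = np.array([expected_utility(x, w_mean) for x in candidates])
    return candidates[int(np.argmax(scores))]
```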
5.3.2. Expected Value of Perfect Information (EVPI)
We propose a decision-theoretic querying approach based on the expected value of perfect information (EVPI). Evaluating the expected value of perfect information could be beneficial for active learning, since one can assess how valuable revealing a certain query sample would be. In other words, using EVPI, active learning could be guided to choose data points that improve the gained expected utility. According to Russell and Norvig [
49], the expected value of perfect information for revealing a piece of information, named evidence
, given an initial evidence
e is defined as:
where
is the action to be taken and the expected utility of taking action
given the evidence
e and after revealing
,
is defined as:
while the expected utility of taking action
given the evidence
e and without revealing
is denoted by
and it is defined as:
We apply the value of information formulation to active learning with utility maximization. So, the action is querying a data point to obtain its label. The initial evidence e denotes the labeled training data points acquired so far. The acquired label of the query point represents the piece of information we seek to evaluate.
The expected value of perfect information after querying
and acquiring its label
is:
Accordingly, the expected utility of acquiring the data sample
given the observed training data so far
and after observing the evidence
, the true value of
,
, can be formulated as:
where the utility
u is the target objective function to be maximized, which is conventionally a function of the data point
x and the model parameters
.
denotes the expectation of the updated model parameters
after revealing point
x and its label value
.
The second term of Equation (
64) could be safely ignored: the objective is to decide which data point to query by maximizing EVPI, and this term does not depend on the query sample, as implied by the following equation, Equation (66). Consequently, it does not affect the process of maximizing EVPI.
Consequently, the term
is constant over all query points
, so it could be safely ignored. Then, evaluating the EVPI by substituting from Equation (
65) into Equation (
64) results in the following formula:
Finally, maximizing Equation (
67) by differentiating it with respect to
, equating the obtained derivative to zero and solving the resulting equation or using any direct optimization method, we get the query point
of the highest value of the expected value of perfect information as indicated in the following equation.
The expected value of perfect information method seems similar to the mean objective cost of uncertainty (MOCU) method proposed in Reference [
31] and described in
Section 2.5. Both methods can be viewed from a decision-theoretic perspective. The MOCU method seeks to minimize the expected regret, which is the difference between the utility gained using the current model and the optimal utility. On the other hand, the EVPI method aims to maximize the difference between the optimal utility before and after acquiring a certain evidence. Accordingly, the MOCU method minimizes the deviation from the optimal decision, whereas the EVPI method maximizes the expected utility improvement obtained by acquiring a certain piece of information.
5.4. Balancing Exploration and Exploitation Strategies
This section describes several active learning strategies that seek to achieve the balance between exploration and exploitation.
5.4.1. Upper Confidence Bound (UCB)
The Upper Confidence Bound (UCB) strategy was proposed by Auer et al. [
36] in the context of multi-armed bandit problems [
35]. We apply the UCB method as an active learning baseline strategy to compare with. The main advantage of this method is that it combines exploitation and exploration in a simple yet elegant way. The UCB strategy picks the unlabeled example maximizing the upper confidence bound of the random variable of interest, representing the utility function
u.
where
u is the objective utility function to be maximized,
and
denote the expected value and the standard deviation of the utility function for query point
given the training data acquired so far
.
5.4.2. Probabilistic-Based Exploration-Exploitation (PEE)
This active learning strategy is originally inspired by simulated annealing [
50]. More specifically, the probabilistic-based exploration-exploitation strategy is built on the
-decreasing greedy algorithm [
51]. In order to manage the trade-off between exploration and exploitation, this algorithm combines exploration and exploitation in a probabilistic way. With probability
, the exploration is performed via any exploratory strategy mentioned in
Section 5.2 such as mutual information, Kullback–Leibler divergence and model entropy strategies. Furthermore, other exploration strategies in active learning literature can be incorporated into this method, such as uncertain sampling [
4,
15] and random sampling.
The exploration probability
is calculated as follows:
where
is less than 1 and
t is the current time step or iteration number. The exploration probability intuitively decays over time as seen in Equation (
70), since the learning model becomes more robust over time and increasingly capable of performing exploitation to achieve the ultimate goal of utility maximization.
To implement this strategy, a uniform random variable Z is generated; if Z falls below the exploration probability, any reasonable exploration strategy is performed, otherwise pure exploitation is applied by maximizing the expected utility (the greedy strategy). However, any other exploitation strategy could be employed.
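A compact sketch of this probabilistic switch is given below; the geometric decay epsilon**t of the exploration probability is an assumption consistent with Equation (70), and the explore_fn and exploit_fn callables stand for any exploration and exploitation strategy.

```python
import numpy as np

def pee_query(t, epsilon, explore_fn, exploit_fn, rng=None):
    """Epsilon-decreasing switch between exploration and exploitation."""
    if rng is None:
        rng = np.random.default_rng()
    p_explore = epsilon ** t          # assumed decay: epsilon < 1, shrinking with t
    if rng.uniform() < p_explore:
        return explore_fn()           # e.g. MI, KL or model-entropy query
    return exploit_fn()               # e.g. the greedy query
```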
For the probabilistic-based exploration-exploitation strategy, we have implemented all of our proposed exploration based strategies in
Section 5.2 in addition to uncertain sampling and random sampling. To perform exploitation, we use the greedy strategy since it is the simplest method. Although the greedy strategy is myopic since it does not account for enhancing the learning model estimate, in this PEE method the greedy strategy is integrated with an exploration strategy which already achieves an accurate model estimate.
5.4.3. Uncertainty of Strategy (UoS)
Similar to the probabilistic-based exploration-exploitation (PEE) strategy, this proposed active learning method seeks to balance the trade-off between exploration and exploitation in a probabilistic manner. Naturally, active learning querying strategies require a learning model estimate. Furthermore, many active learning strategies, including uncertain sampling [
1] and greedy sampling, build their selection decisions entirely based on the learning model estimate. However, active querying methods that fully trust their estimate of the learning model and do not account for the learning model uncertainty could probably yield inaccurate querying decisions. This argument motivates us to design a novel active learning method named Uncertainty of Strategy (UoS). The UoS method accounts for the inherent uncertainty of the querying criterion which is mainly caused by the model uncertainty or due to any other randomness in the active querying method.
The UoS strategy seeks to achieve the balance between exploitation and exploration. The exploitation can be easily performed using the current model estimate, for example, using greedy sampling. On the other hand, the exploration is done as follows: the UoS strategy sets a window of exploration around the active learning strategy’s best estimate of a data point, which is returned by exploitation. The length of the exploration window can be estimated using the model uncertainty as described subsequently.
Let the query sample
follow a Gaussian distribution as follows:
where the mean of this Gaussian distribution
represents the data point returned using pure exploitation. For the Gaussian distribution’s variance
, it essentially depends on the model uncertainty. We estimate the strategy variance
using two different ways. The first method, named UoS-1 assumes that the strategy variance
is proportional to the model uncertainty, where the model uncertainty is estimated using the covariance matrix of the vector of model parameters
. Equation (
72) defines the estimation of
in terms of the model uncertainty.
where
K is a parameter set to adapt the units of the query point and the model parameters and to control the exploration/exploitation trade-off.
Like the PEE method, we set the parameter K to be time-variant in order to shrink the exploration window as iterations proceed, since the model becomes more reliable and more emphasis should be devoted to exploitation.
The second method, named UoS-2, estimates the strategy variance
empirically using a simple Monte Carlo simulation. This simulation runs for
n iterations, where each iteration
i proceeds as follows: first, an instance of model parameters vector
is generated according to the multivariate Student-T distribution using Equation (
9). Then, this model parameters’ instance
is used to evaluate the query point using a pure exploitation strategy
; this is generally a simple step, as done in the greedy strategy, Equation (
60) for example. Finally, after the
n iterations finish, the strategy variance
is statistically evaluated as follows:
where
K is a parameter for adapting units of the query point and model variance and for controlling the exploration-exploitation trade-off, akin to the UoS-1 method,
K is defined in Equation (
73). The expectation of strategy returned points
is evaluated as the statistical mean over the
n iterations as follows:
The UoS-2 method is akin to the UoS-1 method for evaluating the strategy variance
defined in Equation (
72), in that it depends on the model uncertainty. However, this dependency is incorporated indirectly through the described Monte Carlo simulation. Algorithm 4 describes the UoS-2 method.
The proposed UoS active learning method, with its two variants, is general and can be combined with any exploitation-based strategy. Furthermore, the UoS method could be integrated with other popular active learning methods such as uncertain sampling [
1] and expected model change [
12], since most active learning strategies adhere to the greedy approach by querying a data point that maximizes or minimizes a certain selection criterion. In other words, the UoS querying approach could be used as a wrapper for any ordinary active learning method
S that is greedy in its nature or does not consider the uncertainty of the learned model. This could be achieved by using
S as the exploitation-based strategy used in the UoS method and adopting either of the two variants of the UoS method to estimate the strategy uncertainty.
Algorithm 4 The Uncertainty of Strategy Second Variant (UoS-2) Querying Method
Input: A dataset, an exploitation active learning strategy S, the number of simulation iterations n and a scaling parameter K.
Output: A query sample.
Train the regression model using the training samples to obtain the mean and the covariance of the model parameters’ vector and the posterior estimates.
for i = 1 to n do
  Sample an instance of the model parameters from their posterior distribution.
  Record the query sample returned after applying the exploitation strategy S, using the sampled model parameters.
end for
Evaluate the average query sample and the empirical strategy variance of the returned query samples, scaled by K.
Generate a query sample according to a Gaussian distribution centered at the average query sample, with the estimated strategy variance.
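The following hedged Python sketch mirrors Algorithm 4 for a scalar query (such as a price), assuming sample_params(rng) draws a parameter vector from its posterior and exploit_query(w) returns the pure-exploitation query for that vector; the scaling constant K plays the role described above.

```python
import numpy as np

def uos2_query(sample_params, exploit_query, n, K, rng=None):
    """UoS-2 sketch: Monte Carlo strategy variance, then a Gaussian query draw."""
    if rng is None:
        rng = np.random.default_rng()
    queries = np.array([exploit_query(sample_params(rng)) for _ in range(n)])
    mu = queries.mean()                        # average exploitation query
    sigma2 = K * queries.var()                 # strategy variance scaled by K
    return rng.normal(mu, np.sqrt(sigma2))     # exploration window around the mean
```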
5.4.4. Utility minus Model Entropy (UME)
The Utility minus Model Entropy (UME) strategy controls the trade-off between exploration and exploitation in a novel way. The UME querying method adjusts exploration and exploitation by explicitly modeling both of them in a single formulated objective function. Specifically, the UME method combines the ultimate goal of maximizing a certain utility function
u, representing exploitation and the secondary but necessary target of minimizing model entropy, representing exploration, into one objective function. Then, the strategy queries the data sample
maximizing this hybrid objective as follows:
where the model entropy
is evaluated using Equation (
58) and
is the exploration-exploitation trade-off control parameter. We conveniently let
be exponentially decreasing in time according to Equation (
77). At early iterations, more emphasis is placed on exploration to obtain a better estimate of the model parameters; at later iterations, since the model estimates become more robust over time, more attention is paid to exploitation.
where
t is iteration number and
.
Substituting from Equation (
58) into Equation (
76) results in:
Since
does not depend on the query sample
, then:
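A small sketch of the UME selection rule is shown below; the exponential decay lambda0 * gamma**t of the trade-off weight is an assumption standing in for Equation (77), and the utility and model_entropy callables are placeholders for the two terms of the hybrid objective.

```python
import numpy as np

def ume_query(candidates, t, utility, model_entropy, lam0=1.0, gamma=0.9):
    """Query the sample maximizing utility minus a (decaying) model-entropy penalty."""
    lam = lam0 * gamma ** t          # assumed exponentially decreasing trade-off weight
    scores = np.array([utility(x) - lam * model_entropy(x) for x in candidates])
    return candidates[int(np.argmax(scores))]
```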
6. Case Study: Dynamic Pricing with Demand Learning
We apply the proposed active learning framework described in
Section 5 to a real-world application, namely dynamic pricing for revenue maximization when the customers’ demand behavior is unknown.
The main challenge of dynamic pricing with unknown demand is that the chosen prices should achieve some balance between exploitation and exploration. Exploitation represents choosing prices aiming to maximize the achieved revenue. On the other hand, exploration selects prices that promote learning the demand model parameters. This motivates us to apply our proposed active learning framework in
Figure 1 to this application.
We assume a linear demand elasticity for modeling the customers’ demand behavior as typically used in the economics/finance literature (see Equation (
80)). The price is the main controlling variable for demand. We assume a monopolist seller with sufficient inventory to satisfy all potential demand and we specifically consider pricing a single product over a finite selling horizon
T.
The linear demand model equation is defined as follows:
\[ D(p) = a + b\,p, \]
such that \(a > 0\) and \(b < 0\).
The parameter b represents the price-demand sensitivity, so it is naturally negative since the price and demand have an inverse relationship: when the price rises, demand diminishes, and when the price decreases, demand increases.
In order to estimate the demand model parameters
a and
b defined in Equation (
80), we apply the Bayesian linear regression model described in
Section 4. We employ the active learning framework with its different query generation schemes defined in
Section 5.1 and described in detail in Algorithm 1, Algorithm 2 and Algorithm 3. Applying active learning formulation to the dynamic pricing problem, the training data
consists of some pairs of prices and their corresponding demands
. In addition, the query point
denotes the vector
. For this application, the utility function after querying a certain price
p represents the gained revenue
R, which is defined as follows:
where
a and
b are the demand model parameters defined in Equation (
80).
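For intuition, under the linear demand model and with a and b treated as known point estimates, the purely exploitative (greedy) price has a simple closed form:
\[
R(p) = p\,\mathbb{E}[D \mid p] = p\,(a + b\,p), \qquad
\frac{dR}{dp} = a + 2b\,p = 0 \;\Rightarrow\; p^{*} = -\frac{a}{2b},
\]
which is positive since \(b < 0\) and \(a > 0\).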
6.1. Active Learning Framework Application
In this section, we apply the active learning formulations presented in
Section 5 to the dynamic pricing with demand learning problem. First, the exploration-based strategies hinge on minimizing the regression model error, without considering the utility function
u in their formulations. So, the formulations presented in
Section 5.2 can be used exactly as presented for the underlying dynamic pricing application. On the other hand, for the exploitation-based and the balancing strategies described in
Section 5.3 and
Section 5.4 respectively, specific formulations should be derived for the considered application, setting the utility function
u to the gained revenue defined in Equation (
81).
6.1.1. Exploration-Based Strategies
In our experiments, we apply the four presented strategies in
Section 5.2 with pool-based and query synthesis schemes. Moreover, for mutual information, modified mutual information and Kullback–Leibler divergence, we implement the query synthesis approach without a predefined pool as described in
Section 5.1 and Algorithm 3. When applying the query synthesis method without a predefined pool to the dynamic pricing problem, we construct
U defined in Algorithm 3 as follows: since the dynamic pricing application has one controlling variable, the product price, we consider the range of all potential prices between
and
and, along the active learning iterations, we exclude the prices that were previously queried and added to the training set
. This set of prices
P is used as the set of unlabeled samples for evaluating the information-theoretic metrics defined in Equations (
25), (
41) and (
47).
6.1.2. Exploitation-Based Strategies
In this section, we apply the exploitation-based strategies introduced in
Section 5.3 to the dynamic pricing with demand learning problem.
6.1.3. Balancing Exploration and Exploitation Strategies
In this section, we consider applying the balancing strategies that combine both aspects of exploration and exploitation and attempt to achieve a balance between them.
Upper Confidence Bound (UCB)
Applying the UCB strategy to the dynamic pricing problem and setting the utility function
u defined in Equation (
69) to the immediate revenue
R results in:
where
and
are the expectation and the standard deviation of the estimated immediate gained revenue
R in response to price
and using training data labeled so far
.
The expected revenue
is calculated as:
where the expected demand
is computed using the Bayesian linear regression (see Equation (
23)) presented in
Section 4.
Using revenue definition in Equation (
81) and the posterior variance for demand defined in Equation (
24), the variance of revenue
is calculated as follows:
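Since the price p is a deterministic decision variable and the revenue is the product of the price and the (random) demand, the variance presumably scales as:
\[
\operatorname{Var}[R \mid p] = \operatorname{Var}[p\,D \mid p] = p^{2}\,\operatorname{Var}[D \mid p].
\]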
Substituting from Equations (
92) and (
93) into Equation (
90), then the price maximizing the UCB criterion can be evaluated as defined in Equation (
94).
where
is evaluated using Equation (
6). For the Gamma distribution parameters
and
, they are evaluated using Equations (
7) and (
8), respectively.
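A hedged sketch of UCB price selection over a candidate price grid is given below, assuming the Bayesian regression model supplies the posterior mean and variance of demand at each price; the exploration weight kappa is an illustrative constant rather than a quantity from the paper.

```python
import numpy as np

def ucb_price(price_grid, demand_mean, demand_var, kappa=1.0):
    """Pick the price maximizing E[R] + kappa * std[R], with R = p * D."""
    exp_revenue = price_grid * demand_mean(price_grid)                    # E[R|p] = p E[D|p]
    std_revenue = np.abs(price_grid) * np.sqrt(demand_var(price_grid))    # std[R|p] = |p| std[D|p]
    scores = exp_revenue + kappa * std_revenue
    return price_grid[int(np.argmax(scores))]
```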
Probabilistic-based Exploration-Exploitation (PEE)
In our experiments, we apply several instances of this hybrid strategy. We combine the pure exploitation (greedy) strategy, as defined in Equation (
84), with all the proposed exploration-based methods in addition to the popular active learning method, uncertain sampling [
15], and we also apply random sampling as a representative of random exploration.
Uncertainty of Strategy (UoS)
For the uncertainty of strategy method, defined in Equation (
71), the resulting price
follows a Gaussian distribution:
where the mean of this Gaussian distribution,
, represents the price returned using pure exploitation, which is the greedy strategy as defined in Equation (
84). Regarding the variance of strategy
, it can be evaluated using two variants: in terms of model uncertainty and using Monte Carlo simulation as described in
Section 5.4, specifically using Equations (
72) and (
74), respectively.
Utility minus Model Entropy (UME)
The UME criterion as defined in Equation (
76) is a function of the utility which is the immediate revenue and the model entropy. Consequently, substituting from Equations (
31), (
58) and (
92) into Equation (
76) results in the following equation:
where
and
d is the dimensionality, which equals 2 in the dynamic pricing application with linear demand elasticity as defined in Equation (
80). Therefore, by differentiating the objective function defined in Equation (
96) and equating the resulting derivative to zero, we can obtain the price maximizing the UME criterion.
9. Conclusions
In this paper, we propose a novel active learning framework for optimizing a general utility function. Specifically, this work targets the class of problems incurring a trade-off between exploration and exploitation. We introduce several novel active learning methods for exploration, for exploitation and for balancing both. The presented exploration strategies are essentially based on information theory concepts such as mutual information (MI), Kullback–Leibler divergence (KL) and model entropy (ME). When combined with exploitation, such information-theoretic exploration methods achieve promising performance in terms of both the achieved utility and the learning model error. Furthermore, we develop new approaches for balancing exploration and exploitation, such as the uncertainty of strategy (UoS) method, which controls the exploration window according to the model uncertainty. In addition, we present another balancing method, utility minus model entropy (UME), where the model entropy is explicitly modeled and combined with the target utility function into one hybrid objective function to be optimized.
In this work, we investigate two main approaches of active learning: the pool-based approach, which is widely used in the active learning literature, and the membership query synthesis approach. Moreover, we present an empirical analysis comparing both approaches. The experiments show the superior performance of the query synthesis approach compared to the pool-based approach on both synthetic and real datasets. These compelling results could help push active learning research towards employing the query synthesis approach.
We have applied the proposed framework to an operations-research-related application, namely dynamic pricing with demand learning. However, our proposed framework can easily be adapted to other applications. We perform several experiments using synthetic and real datasets. In these experiments, we compare our proposed active learning strategies to several baselines, and our presented strategies yield a significant performance improvement in terms of both the achieved revenue and the regression model error.