Cascade of Boolean detector combinations
EURASIP Journal on Image and Video Processing volume 2018, Article number: 61 (2018)
Abstract
This paper considers a scenario where we have multiple pretrained detectors for detecting an event and only a small dataset for training a combined detection system. We build the combined detector as a Boolean function of thresholded detector scores and implement it as a binary classification cascade. The cascade structure is computationally efficient because it allows early termination. For the proposed Boolean combination function, the computational load of classification is reduced whenever the function becomes determinate before all the component detectors have been utilized. We also propose an algorithm that selects all the thresholds needed by the component detectors within the proposed Boolean combination. We present results on two audiovisual datasets, which demonstrate the efficiency of the proposed combination framework. We achieve state-of-the-art accuracy with substantially reduced computation time in the laughter detection task, and our algorithm finds better thresholds for the component detectors within the Boolean combination than the other algorithms found in the literature.
1 Introduction
Detection and binary classification are fundamental tasks in many intelligent computational systems. They may be considered the same problem, in which an input sample is assigned to one of two groups: either one of two predefined classes, or as having some property or not. In the field of computer vision, face detection, pedestrian detection, and car detection are canonical examples that have received a lot of attention [1, 2]. Event detection from audio signals is of wide interest [3]. Detection tasks with multiple measurement modalities available arise, e.g., in biometric identity verification [4] and in medical decision-making [5].
For detecting observations from a certain category, i.e., a class, many different types of detectors are often available, trained with different data with different statistics, possibly even from different measurement modalities. Most of the detectors reported in the literature output a score, which denotes the likelihood that the sought target class appears in the input data. A threshold value is then used to produce the classification “target” or “no target” for the input. Thus, the threshold value may be used to control the false negative versus false positive trade-off, i.e., the operating point of the detector.
The different detectors may have very different performance, and the scores given by them are not fully correlated. Therefore, the combination of their outputs provides an opportunity to obtain a combined detector with performance superior to any of the components.
The cost of classification in terms of time and computational power, besides accuracy, is an important factor in many detection problems. Some detectors are very fast to execute, while others are computationally heavy. An effective way to reduce the cost of classification is to use a sequential decision-making process which asks for new resources only if they are needed for the required accuracy.
We propose a new method for combining multiple sensitivity tunable detectors, i.e., detectors which output likelihood scores, to form a computationally efficient binary classification cascade. The component detectors are not restricted to be based on a single feature set, but may even operate on different measurement modalities. They have preferably been trained with different datasets to introduce uncorrelatedness in their output scores. For combining the available sensitivity tunable detectors, we propose to utilize a monotone Boolean function built using AND (∧) and OR (∨) operators in disjunctive normal form (DNF). A Boolean function (BF) is said to be monotone if changing the value of any of the input variables from 0 to 1 cannot decrease the value of the function from 1 to 0. For continuous data binarization, we use a procedure similar to that presented in [6]. Thus, a monotone BF on this data performs a monotone partition of the space of measurement values.
A BF lends itself naturally to sequential evaluation, which is an integral property of a decision process of a classification cascade. Also, by utilizing a BF of thresholded detector scores, we avoid inferring class probabilities from the scores, which would be error prone while having only a small dataset for combined system training. In the proposed OR of ANDs function (BOA), each detector score is compared to multiple threshold levels, which allows formulating any monotonic decision boundary while making the classification decision in a computationally efficient way. The BOA cascade detector itself is trained to be sensitivity controllable as well.
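As a concrete illustration of casting continuous scores to Boolean values with multiple thresholds, the following is a minimal sketch of cut-point binarization in the spirit of [6]; the function name and data are ours, not from the paper:

```python
def binarize(score, cut_points):
    # One Boolean attribute per cut point; sorting the cut points makes the
    # attribute vector monotone: raising the score can only flip attributes
    # from False to True, never the other way around.
    return tuple(score >= c for c in sorted(cut_points))

print(binarize(0.55, [0.8, 0.2, 0.5]))  # (True, True, False)
```

A monotone BF over such attribute vectors then induces a monotone partition of the original score space.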
The contributions of the paper are (1) a monotone Boolean OR of ANDs (BOA) binary classification function to build a cascaded combination of multiple sensitivity tunable detectors, (2) an algorithm to train a BOA combination, and (3) utilizing a cascaded decision making process for audiovisual detection task.
For evaluating the proposed BOA detector cascade and the training algorithm that sets its parameters, we use two audiovisual databases for two detection tasks, namely the MAHNOB laughter dataset [7] for the laughter detection task and the CASA dataset [8] for the video context change detection task. In the laughter detection task, we show that the detection accuracy of a BOA cascade is superior to the other detection accuracies reported in the literature, while the computation time of detection is remarkably reduced compared to the other solutions. With three component detectors for the video context change detection, we show that the proposed BOA training algorithm outperforms the alternative Boolean combination training algorithms found in the literature.
In the following section, we introduce the work related to Boolean detector combinations and algorithms for training Boolean combination parameters, as well as the work on cascaded detectors presented in the literature. The proposed Boolean OR of ANDs combination and the algorithm to set its parameters are presented in Section 3. The experimental setup and the results obtained are presented in Section 4.
2 Related work
This paper proposes combining multiple tunable detectors robustly by utilizing a monotone DNF-BF, named BOA, the evaluation of which is formulated as a computationally efficient classification cascade. Thus, we first review the literature on Boolean detector combinations and BFs in general. Then, we review the algorithms suitable for training a Boolean combination. Finally, we discuss the literature on classification cascades.
2.1 Boolean detector combinations
Using a Boolean conjunction or a Boolean disjunction for combining multiple detectors has been proposed in several studies, for example in [9–11]. Sensitivity tunable detector functions \(f_{m} : \boldsymbol {x} \rightarrow \mathbb {R}\) for m = 1…M are utilized within a combination. Each detector function f_{m}(x) produces a score l_{m}, which denotes the likelihood of the target appearing in the sample x. The Boolean conjunction of M sensitivity tunable detectors is

$$ \mathrm{B}_{\wedge}(\boldsymbol{x};\boldsymbol{\theta}) = \bigwedge_{m=1}^{M} \left( f_{m}(\boldsymbol{x}) \geq \theta_{m}^{\;*} \right) \qquad (1) $$
and the Boolean disjunction is

$$ \mathrm{B}_{\vee}(\boldsymbol{x};\boldsymbol{\theta}) = \bigvee_{m=1}^{M} \left( f_{m}(\boldsymbol{x}) \geq \theta_{m}^{\;*} \right), \qquad (2) $$
where θ denotes all the thresholds \(\theta _{m}^{\;*}\) used within the combination. All of the studies [9–11] report that either a conjunctive or a disjunctive Boolean combination of detectors improves the detection accuracy over the component detectors, provided that the thresholds \(\theta _{1}^{\;* },\theta _{2}^{\;*},\ldots,\theta _{M}^{\;*}\) are set appropriately.
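The conjunctive (1) and disjunctive (2) rules can be sketched in a few lines of code; this is an illustrative sketch with our own naming, not code from any of the cited works:

```python
def conjunctive_combination(scores, thresholds):
    """Boolean AND over all threshold comparisons: true only if every
    detector score reaches its threshold (emphasizes specificity)."""
    return all(l >= t for l, t in zip(scores, thresholds))

def disjunctive_combination(scores, thresholds):
    """Boolean OR over the threshold comparisons: true if any single
    detector score reaches its threshold (emphasizes sensitivity)."""
    return any(l >= t for l, t in zip(scores, thresholds))

# Example with M = 3 detector scores and one threshold per detector.
scores = [0.8, 0.4, 0.9]
thresholds = [0.5, 0.5, 0.5]
print(conjunctive_combination(scores, thresholds))  # False: detector 2 fails
print(disjunctive_combination(scores, thresholds))  # True: detector 1 passes
```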
Mixtures of AND and OR operators within a Boolean combination have been investigated in [12]. Utilizing notation where the detector function f_{m}(x) identifiers m are listed in vectors z_{q}, q = 1…Q, each z_{q} containing M_{q} identifiers, this kind of Boolean OR of ANDs combination is

$$ \mathrm{B}(\boldsymbol{x};\boldsymbol{\theta}) = \bigvee_{q=1}^{Q} \left[ \bigwedge_{i=1}^{M_{q}} \left( f_{z_{q}(i)}(\boldsymbol{x}) \geq \theta_{z_{q}(i)} \right) \right]. \qquad (3) $$
A big limitation of (3) as proposed in [12], compared to the BOA combination that we suggest, is that only one threshold θ_{m} for each target likelihood score f_{m}(x)=l_{m} is allowed.
In addition to the AND and OR operators, the Boolean negation (NOT), and as a consequence also the exclusive-OR (XOR), are utilized in the detector combinations in [13–15]. The \(2^{2^{M}}\) possible Boolean combinations that can be formed by M fixed, i.e., non-tunable, detectors utilizing AND, OR, XOR, and NOT operators are studied in [13]. Boolean detector combinations where each of the available target likelihood scores f_{m}(x)=l_{m}, m = 1…M may be cast to Boolean values using multiple thresholds \(\theta _{m}^{1},\; \theta _{m}^{2},\ldots \) were first made use of in [14]. However, the space of the Boolean combinations generated by their algorithm is left unspecified.
A question of how to select the best performing Boolean combination for a certain problem, while having M sensitivity tunable detectors, has been posed in many of the abovementioned works. To select between conjunctive (1) and disjunctive (2) combinations, in [10, 16], it is suggested to investigate the classconditional crosscorrelations of detector scores and to consider whether the specificity or the sensitivity is more important. The conjunctive fusion rule (1), which emphasizes specificity, should be used if there is negative correlation between detector outputs for samples of the “nontarget” class. If on the other hand the correlation of detector output scores for samples from the “target” class is weak, disjunctive fusion rule (2) emphasizing sensitivity should be used. All in all, a Boolean combination is able to exploit negative or weak correlation of detector scores.
To select among the combinations of the form (3), rules of thumb have been drawn in [12] according to the average cross-correlations between the scores from the used detectors. It is shown for three detectors with Gaussian score distributions and identical pairwise cross-correlations that either a conjunctive combination (1), a disjunctive combination (2), or a type (3) combination

$$ \mathrm{B}(\boldsymbol{x};\boldsymbol{\theta}) = \left( d_{1} \wedge d_{2} \right) \vee \left( d_{1} \wedge d_{3} \right) \vee \left( d_{2} \wedge d_{3} \right), \quad d_{m} = \left( f_{m}(\boldsymbol{x}) \geq \theta_{m} \right), $$
which stands for a majority vote rule, is the best and outperforms the component detectors. The one of those to be selected depends on class conditional crosscorrelations between detectors.
The Iterative Boolean Combination (IBC) method in [14] is specifically designed to find the best possible Boolean combination, not restricted to monotone functions, for a certain sensitivity level of a combination. The search space of BFs is nevertheless restricted to avoid an unfeasibly large number of possibilities. The IBC method results in a variety of Boolean detector compounds, but the study provides neither an analysis of the form of the generated compounds nor the characteristics of their resulting decision boundaries.
Theory of constructing BFs of unrestricted form, specifically in DNF as well as in CNF (conjunctive normal form), has been studied in depth, e.g., in [17]. BFs for classification have been studied extensively under the terms logical analysis and inductive inference. Logical Analysis of Data (LAD) [18, 19] is a combinatorics- and optimization-based data analysis method first introduced in [20]. The LAD methodology focuses on finding DNF-BF-type representations for classes.
The term inductive inference is used in many early texts concerning topics of machine learning, many of them discussing Boolean decision-making, e.g., [21, 22]. Using data binarization, e.g., as proposed in [6], all these results concerning BFs may be utilized in conjunction with continuous-valued data.
Any BF may be converted into a binary decision tree, although the structure of the tree is generally not unique. In the case of the proposed BOA DNF-BF, the corresponding deterministic read-once binary tree has depth ≥⌈log2(N_{θ}+1)⌉. In the maximally deep node arrangement, the tree becomes a single-branch tree with depth equal to the number N_{θ} of thresholds used in the BOA function. However, this kind of binary tree representation does not highlight the computational advantages of the BOA cascade that we are interested in.
2.2 Algorithms for training a Boolean combination
The parameter θ of a Boolean combination function B(x;θ) denotes all the thresholds \(\theta _{m}^{n} \in \theta \) for m = 1…M, n = 1…N_{m} used in the combination. For a Boolean combination B(x;θ) to perform well, suitable values for the set θ of thresholds must be found. Most of the studies rely on a training data-based exhaustive search for selecting the threshold values for θ, e.g., [10, 12, 13]. The computational load of this approach is \(\mathcal {O}\left (T^{|\boldsymbol {\theta }|}\right)\), where \(|\boldsymbol {\theta }|={\sum \nolimits }_{m=1}^{M}N_{m}\) is the total number of thresholds in θ and T is the number of threshold values tested for each threshold. The exhaustive search becomes computationally prohibitive if there are more than a couple of threshold values to find. Thus, more efficient algorithms are needed. In addition to algorithms readily proposed for tunable classification function training, we shortly review algorithms which have been developed for BF training for a single operating point, and their extensions to incremental learning.
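The exhaustive search and its exponential cost can be made concrete with a small sketch; the function and data below are illustrative choices of ours, not an implementation from the cited studies:

```python
import itertools

def exhaustive_threshold_search(combination, score_sets, labels, grids):
    """Brute-force search over every threshold tuple in the candidate grids:
    O(T^|theta|) evaluations for T candidate values per threshold.
    'combination' maps (scores, thresholds) to a Boolean decision."""
    best_acc, best_theta = -1.0, None
    for theta in itertools.product(*grids):
        correct = sum(combination(s, theta) == bool(y)
                      for s, y in zip(score_sets, labels))
        acc = correct / len(labels)
        if acc > best_acc:
            best_acc, best_theta = acc, theta
    return best_theta, best_acc

# Toy example: two detectors, a conjunctive combination, and two candidate
# threshold values per detector (so 2 * 2 = 4 tuples are tried).
conjunction = lambda s, th: all(l >= t for l, t in zip(s, th))
scores = [(0.9, 0.8), (0.2, 0.7), (0.8, 0.3), (0.1, 0.2)]
labels = [1, 0, 0, 0]
theta, acc = exhaustive_threshold_search(conjunction, scores, labels,
                                         [[0.5, 0.85], [0.5, 0.75]])
print(theta, acc)
```

With T candidates per threshold and |θ| thresholds, the loop body runs T^|θ| times, which is exactly why the exhaustive approach breaks down beyond a few thresholds.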
A fast method for finding the sets θ of thresholds for different sensitivity levels of a Boolean combination B(x;θ) is presented in [10]. The method exploits the receiver operating characteristic (ROC) curve of each utilized detector d_{m}(x,θ)=(f_{m}(x)≥θ). The ROC curve shows the true positive rate (tpr) against the false positive rate (fpr) at every operating point, defined by the threshold θ, of the detector. When θ=−∞, the classification by d(x,θ) results in tpr=100% and fpr=100%. On the other hand, when θ=+∞, then tpr=fpr=0. The method selects the thresholds for the Boolean combination iteratively by fusing two BF components (individual detectors or partial BFs) at a time according to their ROC curves. Formulas for the ROC curves of a conjunctive and a disjunctive combination of detectors d_{A} and d_{B}, d_{A}≠d_{B}, are provided as

$$ tpr_{A \wedge B}\left(\,fpr_{A}\, fpr_{B}\,\right) = tpr_{A}\left(\,fpr_{A}\right)\, tpr_{B}\left(\,fpr_{B}\right) $$
and

$$ tpr_{A \vee B}\left(\,fpr_{A} + fpr_{B} - fpr_{A}\, fpr_{B}\,\right) = tpr_{A}\left(\,fpr_{A}\right) + tpr_{B}\left(\,fpr_{B}\right) - tpr_{A}\left(\,fpr_{A}\right)\, tpr_{B}\left(\,fpr_{B}\right), $$
where tpr_{∗}(fpr_{∗}) denotes the true positive rate of a detector d_{∗}(x;θ) at an operating point θ where its false positive rate is fpr_{∗}.
The efficiency of the method is based on the assumption that the classifications made by the different detectors are independent. Unfortunately, this often does not hold in practice. If the same measurement set or the same set of features is used for multiple detectors, or if multiple thresholds are to be found for a certain target likelihood l_{m} within a Boolean combination, dependencies between classifications are very likely. We compare our algorithm to this Boolean algebra of ROC curves in Section 4 and use the implementation for BOA training shown in Appendix 1.
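Under that independence assumption, the ROC-point fusion rules are simple to state in code; the sketch below is illustrative (function names are ours, not from [10]):

```python
def and_fusion(op_a, op_b):
    """Operating point (tpr, fpr) of d_A AND d_B, assuming the two
    classifications are statistically independent: both rates multiply."""
    (tpr_a, fpr_a), (tpr_b, fpr_b) = op_a, op_b
    return tpr_a * tpr_b, fpr_a * fpr_b

def or_fusion(op_a, op_b):
    """Operating point (tpr, fpr) of d_A OR d_B under the same
    independence assumption (inclusion-exclusion on both rates)."""
    (tpr_a, fpr_a), (tpr_b, fpr_b) = op_a, op_b
    return (tpr_a + tpr_b - tpr_a * tpr_b,
            fpr_a + fpr_b - fpr_a * fpr_b)

# AND trades sensitivity for specificity; OR does the opposite.
print(and_fusion((0.9, 0.2), (0.8, 0.1)))  # approximately (0.72, 0.02)
print(or_fusion((0.9, 0.2), (0.8, 0.1)))   # approximately (0.98, 0.28)
```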
Another algorithm, which does not assume independence of the used detectors, was proposed in [11]. It suggests training the combination iteratively by finding thresholds for two detectors or partial combinations at a time, similarly to the Boolean algebra of ROC curves presented above. In this approach, the search for the best thresholds for a Boolean combination is done via exhaustive search over all the possible threshold settings for the two systems to be merged. In the ROC space, with all the possible threshold settings, a Boolean combination produces a constellation of performance points. The left top edge of this constellation, consisting of the operating points of superior performance, was introduced in [23] as the convex hull of the ROC constellation. In the algorithm of [11], before each new component fusion, the set of possible threshold values for the newly built partial combination is pruned to contain only the thresholds corresponding to the performance points on the convex hull of this ROC constellation. The algorithm is originally designed for purely conjunctive (1) or disjunctive (2) Boolean combinations, but we have implemented it to deal with a BOA as described in Appendix 1, and we use it for comparison with our algorithm.
In the literature concerning BFs, there are many algorithms designed to find a BF which perfectly classifies the training data {X^{0},X^{1}} in \(\phantom {\dot {i}\!}\{0, 1\}^{N_{\text {attr}}}\). Finding the simplest possible BF to explain some data is an NP-complete optimization problem with \(\phantom {\dot {i}\!}2^{2^{N_{\text {attr}}}}\) possible solutions. Some of the algorithms are designed assuming monotonicity of the data, an assumption which diminishes the number of possible solutions remarkably [24]. The number of possible BFs is further reduced in the case of continuous data which is binarized as in [6]. In this case, the data with M≪N_{attr} continuous attributes actually resides in an M-dimensional manifold of the N_{attr}-dimensional space of binarized data. However, the number of possible BFs is still exponential. A few of the approaches target finding a BF with imperfect classification performance, which usually is the desirable learning result with imperfect data.
Because of the NP-completeness of finding the best BF to explain some data, most of the algorithms in the literature operate in an iterative manner using greedy heuristics. The A^{q} algorithm [25] and LAD [20]-based methods construct a DNF-BF by iteratively searching for good conjunctions, each of which covers a part of the positive training samples, to be combined disjunctively. On the contrary, the OCAT-RA1 algorithm [26], based on the idea of one-clause-at-a-time (OCAT) [27], builds a CNF-BF via iterative selection of disjunctions. In the case of continuous data binarized as in [6], algorithms developed for decision tree learning, e.g., ID3 [28], C4.5 [29], and CART [30], are also suitable for DNF-BF building.
The A^{q} algorithm and LAD-based methods aim to find two DNF-BFs which provide perfect classification of the training data. One function is used for detecting the positive class, and the other for detecting the negative class. The covers, i.e., the subspaces for which BF=true, of these DNF-BFs are disjoint, leaving part of the input space uncovered by either function. The algorithms use different heuristic criteria when searching for suitable conjunctions, i.e., complexes in the terminology of A^{q}.
For the A^{q} algorithm, the user may choose the criterion, one possible choice being the number of positive samples covered by the complex, that is, the conjunction. For the LAD methodology, different criteria for the optimality of conjunctions, called patterns in LAD, are discussed in [31]. The selectivity criterion favors minterms based on the data, and the evidential criterion favors patterns covering as many data samples as possible. Algorithms for constructing patterns according to these different criteria are given in [18, 32, 33].
Algorithms for BF inference allowing imperfect classification, which is generally associated with better generalization of data with outliers, are, for example, the AQ15 algorithm [34], which is based on A^{q}, and the OCAT-RA1 algorithm proposed in [26]. A procedure for pruning an overfit DNF-BF representation for better generalization is provided within the AQ15 algorithm. It is based on counts of the samples covered by each conjunction individually and together with other conjunctions. The conjunctions for which these counts are small are the ones to be pruned. OCAT-RA1 constructs each disjunction of a CNF-BF by iteratively selecting attributes for it based on their rank N_{tp}(a)/N_{fp}(a), where N_{tp}(a) (N_{fp}(a)) is the number of positive (negative) training samples which have attribute a=1. New attributes are selected until all the positive samples are covered by the disjunction.
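The ratio-driven clause construction can be sketched as follows. This is a simplified greedy sketch of the ranking idea only, not the published OCAT-RA1 algorithm, and all names are ours:

```python
def greedy_disjunction(pos_samples, neg_samples, n_attr):
    """Build one disjunction by repeatedly picking the attribute with the
    best N_tp/N_fp ratio over the still-uncovered positive samples, until
    every positive sample has some selected attribute equal to 1.
    Assumes each positive sample has at least one attribute set to 1."""
    selected, uncovered = [], list(pos_samples)
    while uncovered:
        def ratio(a):
            n_tp = sum(s[a] for s in uncovered)
            n_fp = sum(s[a] for s in neg_samples)
            return n_tp / (n_fp + 1e-9)   # guard against N_fp = 0
        best = max(range(n_attr), key=ratio)
        selected.append(best)
        uncovered = [s for s in uncovered if not s[best]]
    return selected

# Attribute 0 covers two positives with no false positives; attribute 2
# then covers the remaining positive sample.
pos = [(1, 0, 0), (1, 1, 0), (0, 0, 1)]
neg = [(0, 1, 0), (0, 1, 1)]
print(greedy_disjunction(pos, neg, 3))  # [0, 2]
```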
The binary tree building algorithms, which iteratively build the tree by starting from the root node and performing a new split at every iteration, naturally facilitate different levels of generalization of the data and generate a decision function of DNF-BF form. The splitting criterion for selecting attributes for new nodes in ID3, C4.5, and C5.0 is the gain in information entropy. ID3 is applicable to binarized data, while C4.5 and C5.0 can handle continuous data by implicitly performing the binarization through the use of thresholds. The CART algorithm uses either the Gini impurity or the Twoing criterion to decide on the attributes used in the nodes of the tree.
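The entropy-gain splitting criterion used by ID3-style algorithms can be written out directly; a minimal sketch for binary attributes and binary labels (our own helper names):

```python
import math

def entropy(labels):
    """Shannon entropy (in bits) of a binary label collection."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def information_gain(attribute, labels):
    """Entropy reduction from splitting the samples on one binary
    attribute: the ID3-style criterion for choosing the next node."""
    n = len(labels)
    on = [y for a, y in zip(attribute, labels) if a]
    off = [y for a, y in zip(attribute, labels) if not a]
    return entropy(labels) - len(on) / n * entropy(on) - len(off) / n * entropy(off)

# A perfectly separating attribute removes all label entropy:
print(information_gain([1, 1, 0, 0], [1, 1, 0, 0]))  # 1.0
```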
Incremental learning algorithms enable updating a classification function when new data becomes available. Some of the algorithms keep all the data available for future updates, while some algorithms discard the original data and perform the update based on new data only. Incremental algorithms, which utilize all the original training data aside of some new data, for updating a BF are for example GEM [35] and IOCAT [36].
Both of the algorithms assume a DNF-BF, and their update procedures consist of two phases. In the first phase, if some of the new negative samples are misclassified by the original DNF-BF, the faulty conjunctions are located and specialized so as not to cover those new samples. Both algorithms perform this step by replacing each faulty conjunction with new conjunctions which are trained using the data inside the cover of the original conjunction. GEM utilizes the A^{q} algorithm and IOCAT utilizes the OCAT-RA1 algorithm for this retraining. In the second phase of the BF update, the DNF-BF is updated in terms of the uncovered new positive samples. GEM generalizes the existing conjunctions to cover the new positives using A^{q}.
In IOCAT, for each uncovered new positive sample, a conjunction (i.e., clause in the terminology of IOCAT) to be generalized is selected based on the ratio N_{tp}(clause)/N_{attr}(clause) of the number of positive samples covered by the clause, N_{tp}(clause), to the number of attributes in the clause, N_{attr}(clause). The selected conjunction is then retrained with the non-incremental OCAT-RA1 algorithm using all the negative samples, the new positive sample, and the positive samples within the space covered by the selected conjunction.
2.3 Cascade processing for reduced computational load of classification
The goal of cascaded processing for detection is to reduce the computational cost of classification. The idea is to evaluate the input in stages, such that at each cascade stage new information about the input is acquired, and then either the classification is released or the next cascade stage is entered for more information. Decision cascades have been investigated mostly in the field of machine vision, starting from [37, 38]. Face detection and pedestrian detection are the most common application areas where decision cascades have been used, e.g., in [39–42]. Decision cascades have also been utilized in other fields, e.g., in [43] for cancer survival prediction and in [44] for web search.
In the task of object detection from images, the heavily imbalanced class distribution (most of the search windows of different sizes and positions do not contain the target object) offers great possibilities for making the “non-target” classification with minimal examination. Object detection cascades are designed such that gradually more and more features are extracted for increased classification certainty. A class estimate is released as soon as the classification certainty is high enough. If this happens before all the obtainable features or measurements have been extracted, computational savings result.
The first generation object detection cascades, used for example in [38], are able to make an early classification to the “non-target” class only, as illustrated in Fig. 1 (left). To classify the input into the “target” class, the input must pass all tests (f_{s}(x)≥θ_{s}) of the cascade stages s = 1…S. This kind of one-sided cascade performs a conjunctive Boolean combination function

$$ \mathrm{B}(\boldsymbol{x};\boldsymbol{\theta}) = \bigwedge_{s=1}^{S} \left( f_{s}(\boldsymbol{x}) \geq \theta_{s} \right). $$
The solution B(x;θ)=true denotes classification to the “target” class, and B(x;θ)=false denotes classification to the “nontarget” class.
The second generation object detection cascades, introduced in [45] and used also in [46], are able to make the early classification to both classes, as illustrated in Fig. 1 (right). They utilize two thresholds on the target likelihood score f_{s}(x)=l_{s} at each cascade stage s=1…S−1. One threshold, \(\theta _{s}^{\text {reject}}\), is used for early rejection, i.e., early classification to the “non-target” class, if \(\left (\,f_{s}(\boldsymbol {x}) < \theta _{s}^{\text {reject}}\,\right)=\textit {true}\). Another threshold, \(\theta _{s}^{\text {accept}}\), is used for early detection if \(\left (f_{s}(\boldsymbol {x}) \geq \theta _{s}^{\text {accept}}\right)=\textit {true}\). This means that at each stage, either the classification is released, or the next stage is entered in the case that \(\theta _{s}^{\text {reject}} \leq l_{s} < \theta _{s}^{\text {accept}}\). At the last cascade stage, the classification is enforced by \(\theta _{S}=\theta _{S}^{\text {reject}}=\theta _{S}^{\text {accept}}\). This kind of symmetrical cascade corresponds to a BF

$$ \mathrm{B}(\boldsymbol{x};\boldsymbol{\theta}) = \bigvee_{s=1}^{S} \left[ \left( f_{s}(\boldsymbol{x}) \geq \theta_{s}^{\text{accept}} \right) \wedge \bigwedge_{j=1}^{s-1} \left( f_{j}(\boldsymbol{x}) \geq \theta_{j}^{\text{reject}} \right) \right], $$
whose output B(x;θ)=true denotes the classification to the “target” class and B(x;θ)=false denotes classification to the “nontarget” class.
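The two-threshold stage logic can be sketched directly; this is an illustrative sketch of the second-generation cascade control flow (all names and numbers are ours, not from [45, 46]):

```python
def symmetric_cascade(stage_fns, theta_reject, theta_accept, x):
    """Second-generation cascade: each stage may early-reject (score below
    theta_reject), early-accept (score at or above theta_accept), or pass
    the input on to the next stage. Setting theta_reject[-1] ==
    theta_accept[-1] forces a decision at the last stage."""
    for f, t_rej, t_acc in zip(stage_fns, theta_reject, theta_accept):
        l = f(x)                  # acquire this stage's likelihood score
        if l < t_rej:
            return False          # early classification to "non-target"
        if l >= t_acc:
            return True           # early classification to "target"
    raise ValueError("last-stage thresholds must coincide")

# Two stages; the last stage has a single threshold (0.5 == 0.5).
stages = [lambda x: x[0], lambda x: x[1]]
print(symmetric_cascade(stages, [0.2, 0.5], [0.8, 0.5], (0.9, 0.0)))  # True at stage 1
print(symmetric_cascade(stages, [0.2, 0.5], [0.8, 0.5], (0.5, 0.6)))  # True at stage 2
```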
A cascade may be seen as a one-branch decision tree, if the notion of a tree is broadened from the traditional definition in which a node makes a decision based on only one input attribute. In a “cascade-tree,” a node function may utilize multiple input attributes, and the function may partition the corresponding input space freely to assign inputs to any of the leaves, i.e., classes, or down the branch to the next level node (stage of the cascade). In a cascade, the order of attribute acquisition is fixed, in contrast to the input-dependent order of attribute usage in a traditional decision tree.
For training a detection cascade for computer vision applications, where the detectors to be utilized are designed from a close-to-infinite pool of image features (e.g., Haar, HoG), an efficient cascade structure is guaranteed by the concurrent design of the detector functions f_{s}, s = 1…S, their thresholds \(\theta _{s}^{\,*}\), and the cascade length S, as proposed in [40, 47]. For a cascade with fixed length S, a method for concurrent learning of object detectors and their operating points is proposed in [39]. The methods proposed in the literature for finding operating points for pretrained detectors within a detection cascade mostly assume strong correlation among the detector scores. This is the case in [48], where an object detection cascade is designed using cumulative classifier scores, as well as in [45, 46], where the proposed algorithms are based on the assumption that the detector scores are highly positively correlated. If the detector scores are negatively correlated or uncorrelated, those cascade training strategies become unsuitable.
3 Methods
For combining multiple detector functions f_{m}(x)=l_{m}, m = 1…M, which output likelihood scores l_{1},l_{2},…,l_{M} for the same target class, we propose to use a BF. The proposed combination function utilizes Boolean AND (∧) and OR (∨) operators and is defined in disjunctive normal form. The proposed Boolean OR of ANDs function (BOA) B yields a Boolean output B:x→{false, true}. The BOA output B(x)=true denotes classification of the input x to the “target” class, and the BOA output B(x)=false, i.e., ¬B(x)=true, denotes classification to the “non-target” class.
Generally, a (possibly infinite) BF over a combination of thresholded detector scores is capable of producing any binary partition of the input space x or of the space of target likelihood scores (l_{1},l_{2},…,l_{M}). Due to the exclusion of the Boolean NOT operator, a BOA combination restricts the space of different partitions such that the spaces { (l_{1},l_{2},…,l_{M}) | B(x) = false } and { (l_{1},l_{2},…,l_{M}) | B(x) = true } are simply connected and the decision boundary is monotonic. This is illustrated in the example of Fig. 2, where the data points indicate laughter likelihoods from videos of the MAHNOB laughter dataset [7], which is used in our evaluations.
We build a BOA combination of detector functions f_{m}(x)=l_{m}, m = 1…M using Boolean OR (∨) and AND (∧) operators as

$$ \mathrm{B}(\boldsymbol{x};\boldsymbol{\theta}) = \bigvee_{q=1}^{Q} \bigvee_{n=1}^{N_{q}} \left[ \bigwedge_{i=1}^{M_{q}} \left( l_{z_{q}(i)} \geq \theta_{z_{q}(i)}^{q,n} \right) \right], \qquad (7) $$
where in each vector \(z_{q}\in \left \{1\ldots {M}\right \}^{M_{q}}\) there are M_{q} detector identifiers m∈{1…M} for BOA construction. Each term \( \left [\bigwedge _{i=1}^{M_{q}} \left (l_{z_{q}(i)}\geq \theta _{z_{q}(i)}^{q,n}\right)\right ]\) in (7) is a conjunction over the Boolean threshold comparisons of the target likelihood scores {l_{m}  ∃i m=z_{q}(i)}. The multiplicity of a conjunction type z_{q} is denoted by N_{q}.
Every conjunction, enumerated by (q,n), operates with a distinct set of thresholds \(\theta _{z_{q}(i)}^{q,n},\; i\,=\, 1\ldots {M}_{q}\).
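A direct (non-cascaded) evaluation of a BOA of this form can be sketched as follows; the 0-based detector indices and the nested-list threshold layout are our own illustrative choices:

```python
def boa(scores, Z, thetas):
    """BOA function (7): an OR over conjunctions. Z[q] lists the (0-based)
    detector identifiers of vector z_q; thetas[q][n][i] is the threshold
    applied to score l_{Z[q][i]} in the n-th conjunction of type q."""
    return any(
        all(scores[m] >= t for m, t in zip(z, conj))
        for z, conjs in zip(Z, thetas)
        for conj in conjs)

# Structure mirroring the Fig. 2 example: z_1 = [1], z_2 = [1, 2]
# (0-based: [0] and [0, 1]); the threshold values here are made up.
Z = [[0], [0, 1]]
thetas = [[[0.9]],                               # one conjunction on l_1 alone
          [[0.6, 0.7], [0.4, 0.8], [0.2, 0.9]]]  # three conjunctions on (l_1, l_2)
print(boa([0.5, 0.85], Z, thetas))  # True: second (l_1, l_2) conjunction fires
print(boa([0.1, 0.95], Z, thetas))  # False: l_1 fails every conjunction
```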
The negation of the BOA function (7) is used for the cascade implementation of its evaluation. In the BOA cascade, the classification to the “non-target” class is made whenever the negated BOA function equals true. The Boolean negation of B(x;θ) in (7), in disjunctive normal form, is

$$ \neg\mathrm{B}(\boldsymbol{x};\boldsymbol{\theta}) = \bigvee_{k=1}^{K} \left[ \bigwedge_{q=1}^{Q} \bigwedge_{n=1}^{N_{q}} \left( l_{z_{q}(\mathcal{I}(k,q,n))} < \theta_{z_{q}(\mathcal{I}(k,q,n))}^{q,n} \right) \right], \qquad (8) $$

where the number of conjunctions is given by \(K=\prod _{q=1}^{Q} M_{q}^{N_{q}}\), and the index \(\mathcal {I}(k,q,n)\in \{1\ldots M_{q}\}\) selects, for the kth conjunction of (8), which one of the M_{q} threshold comparisons of conjunction (q,n) of (7) is negated; enumerating every such selection, one per conjunction of (7), over k = 1…K yields the DNF expansion of the negation.
Figure 2 illustrates the decision boundary using a BOA combination with z_{1} = [1], z_{2} = [1,2], and N_{1} = 1, N_{2} = 3, which is

$$ \mathrm{B}(\boldsymbol{x};\boldsymbol{\theta}) = \left( l_{1} \geq \theta_{1}^{1,1} \right) \vee \bigvee_{n=1}^{3} \left[ \left( l_{1} \geq \theta_{1}^{2,n} \right) \wedge \left( l_{2} \geq \theta_{2}^{2,n} \right) \right], \qquad (10) $$
and its negation is

$$ \neg\mathrm{B}(\boldsymbol{x};\boldsymbol{\theta}) = \bigvee_{k=1}^{8} \left[ \left( l_{1} < \theta_{1}^{1,1} \right) \wedge \bigwedge_{n=1}^{3} \left( l_{z_{2}(\mathcal{I}(k,2,n))} < \theta_{z_{2}(\mathcal{I}(k,2,n))}^{2,n} \right) \right]. \qquad (11) $$
The corners of the resulting decision boundary are formed by the conjunctions (q,n)=(1,1), (2,1), (2,2), and (2,3) of (10), which are designated in Fig. 2 by the conjunction indexes (q,n) next to each corresponding outer corner of space { (l_{1},l_{2})  B(x;θ) = true }. The outer corners of space { (l_{1},l_{2})  ¬B(x;θ) = true }, which are generated by the conjunctions k=1…8 of (11), are similarly designated in Fig. 2.
There may be redundancy in the BOA equation or in its negation, depending on the values of the thresholds selected for θ. A conjunction within a BOA is redundant if the BOA decision boundary does not change when that conjunction is removed from the BOA equation.
Considering a BOA with conjunction lists z_{1},z_{2},…,z_{Q} and conjunction multiplicities N_{1},N_{2},…,N_{Q}, to find out whether a conjunction (q,n_{q}) is redundant or not, its thresholds \(\left \{\theta _{z_{q}(i)}^{q,n_{q}} \mid i = 1\ldots {M}_{q}\right \}\) must be examined. Each threshold \(\theta _{z_{q}(i)}^{q,n_{q}}\) must be compared to the thresholds \(\theta _{z_{p}(j)}^{p,n_{p}}\), z_{q}(i) = z_{p}(j) = m, on the same target likelihood score l_{m} which are used within other conjunctions (p,n_{p}) of the BOA. The conjunctions (p,n_{p}) to be considered are those with z_{p} containing m = z_{q}(i) and possibly other identifiers from z_{q}; the list z_{p} may not contain identifiers not listed in z_{q}. Formally, {z_{p} | p≠q and ∃i,j such that m = z_{p}(j) = z_{q}(i) and ∀i∈{1…M_{p}} ∃j such that z_{p}(i) = z_{q}(j) }. The range of n_{p} for (p,n_{p}) is naturally n_{p} = 1…N_{p}. For a conjunction (q,n_{q}) to be non-redundant, for each of its corresponding conjunctions (p,n_{p}) there must exist at least one threshold \(\theta _{m}^{q,n_{q}}\), m = z_{q}(i), for which \(\theta _{m}^{q,n_{q}}<\theta _{m}^{p,n_{p}}\).
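The redundancy test admits a compact dominance check; the sketch below is a simplified illustration of the idea (conjunctions represented as our own dicts mapping detector identifiers to thresholds), not the paper's procedure:

```python
def is_redundant(conj, others):
    """A conjunction (a dict {detector: threshold}) is redundant if some
    other conjunction over a subset of its detectors uses a threshold <=
    on every shared detector: that other conjunction then fires everywhere
    this one does, so removing this one leaves the boundary unchanged."""
    return any(
        set(o) <= set(conj) and all(o[m] <= conj[m] for m in o)
        for o in others)

# The single-detector conjunction (l_1 >= 0.5) dominates
# (l_1 >= 0.6) AND (l_2 >= 0.7), making the latter redundant.
print(is_redundant({0: 0.6, 1: 0.7}, [{0: 0.5}]))  # True
print(is_redundant({0: 0.4, 1: 0.7}, [{0: 0.5}]))  # False: covers l_1 in [0.4, 0.5)
```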
3.1 BOA as a binary classification cascade
Algorithmically, a BF is evaluated in steps, i.e., sequentially. If any of the conjunctions of the BOA function (7) or of its negation (8) resolves as true, the entire function (7) or (8) becomes determinate. In other words, as soon as any of the conjunctions (q,n), q = 1…Q, n = 1…N_{q} of a BOA B(x;θ) outputs true, i.e., \(\left [\bigwedge _{i=1}^{M_{q}}\left (l_{z_{q}(i)}\geq \theta _{z_{q}(i)}^{q,n}\right)\right ]=\textit {true}\), it means that B(x;θ)=true. The detection result “target event detected” may then be announced without evaluating the rest of the BOA conjunctions. Similarly, if any of the conjunctions k = 1…K of the negation of the BOA, ¬B(x;θ), outputs true, that is, if \(\left [\bigwedge _{q=1}^{Q}\bigwedge _{n=1}^{N_{q}}\left (l_{z_{q}(\mathcal {I}(k,q,n))}<\theta _{z_{q}(\mathcal {I}(k,q,n))}^{q,n}\right)\right ]=\textit {true}\), it means that ¬B(x;θ)=true. The evaluation can then be stopped and the classification result “non-target” released.
Computationally, the heaviest part of BOA evaluation is the acquisition of the target likelihood scores l_{m} for an input sample x by computing the functions f_{m}(x)=l_{m}, m = 1…M; the cost of the threshold comparisons within the BOA may be considered negligible. Once a likelihood score l_{m} is acquired, all the Boolean comparisons \((l_{m} \geq \theta _{m}^{\;\ast })\) and \((l_{m} < \theta _{m}^{\;\ast })\) based on that score become immediately available. If the BOA function (7) or its negation (8) becomes determinate with the Boolean comparisons of an already computed subset of the scores l_{m}, m = 1…M, the classification may be released without running the remaining detector functions at all.
We have implemented the BOA as a binary classification cascade, where a cascade stage s∈{1…S} calculates a score f_{m}(x)=l_{m}=l_{s} using a predefined detector function f_{m} and offers a possibility for releasing the classification result, as shown in Fig. 3. Internal decisions at each stage s=1,2,…,S of the BOA cascade, whether to release a class estimate or to enter the next cascade stage, are made with BFs \(B_{s}^{\text {class}}(l_{1},l_{2},\ldots,l_{s}),\;s\,=\, 1\ldots {S}\), i.e., \(B_{1}^{1}(l_{1})\), \(B_{1}^{0}(l_{1})\), \(B_{2}^{1}(l_{1},l_{2})\), \(B_{2}^{0}(l_{1},l_{2})\),…, \(B_{S}^{1}(l_{1},l_{2},\ldots,l_{S})\) and \(B_{S}^{0}(l_{1},l_{2},\ldots,l_{S})\). That is, the functions \(B_{s}^{1}\) and \(B_{s}^{0}\) of cascade stage s utilize the target likelihood scores l_{1},l_{2},…,l_{s}. All these functions are partitions of the BOA function B(x;θ) of (7) and its negation ¬B(x;θ) of (8) such that
$$\mathrm{B}(\boldsymbol{x};\boldsymbol\theta)=\bigvee_{s=1}^{S} B_{s}^{1}(l_{1},l_{2},\ldots,l_{s})$$

and

$$\neg\mathrm{B}(\boldsymbol{x};\boldsymbol\theta)=\bigvee_{s=1}^{S} B_{s}^{0}(l_{1},l_{2},\ldots,l_{s}).$$
Formal expressions for the partition of the BOA function (7) into the functions \(B_{1}^{1}, B_{2}^{1},\ldots,B_{S}^{1}\) and of the BOA negation (8) into the functions \(B_{1}^{0}, B_{2}^{0},\ldots,B_{S}^{0}\) are derived in Appendix 2.
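The stage-wise evaluation with early exits can be sketched as follows. The two-stage example realizes the BOA B(x;θ)=(l_{1}≥0.9)∨[(l_{1}≥0.3)∧(l_{2}≥0.6)]; the thresholds and all names are illustrative choices of ours, not values from the paper.

```python
def boa_cascade_classify(x, detectors, deciders):
    """Evaluate a BOA cascade with early termination.
    detectors: score functions f_1..f_S, ordered cheapest-first.
    deciders:  per stage s, a pair (B1_s, B0_s) of Boolean functions of the
               scores l_1..l_s computed so far; by construction the last
               pair is exhaustive, so the loop always returns."""
    scores = []
    for f, (b1, b0) in zip(detectors, deciders):
        scores.append(f(x))      # acquiring l_s is the expensive part
        if b1(scores):           # a partition of B(x; theta) fired
            return 1             # "target" released early
        if b0(scores):           # a partition of the negation fired
            return 0             # "nontarget" released early
    raise ValueError("final-stage deciders must be exhaustive")

# Two-stage example: B = (l1 >= 0.9) or ((l1 >= 0.3) and (l2 >= 0.6)).
detectors = [lambda x: x[0], lambda x: x[1]]
deciders = [
    (lambda s: s[0] >= 0.9, lambda s: s[0] < 0.3),  # stage 1: B_1^1, B_1^0
    (lambda s: s[1] >= 0.6, lambda s: s[1] < 0.6),  # stage 2: exhaustive
]
```

Only samples with 0.3 ≤ l_{1} < 0.9 ever trigger the computation of l_{2}, mirroring the behavior described for Fig. 2.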
As an example, the operation of the BOA cascade \(\mathrm {B}(\boldsymbol {x};\boldsymbol \theta)=\left (l_{1}\geq \theta _{1}^{1,1}\right) \vee \bigvee _{n=1}^{3} \Big [\left (l_{1}\geq \theta _{1}^{2,n}\right)\wedge \left (l_{2}\geq \theta _{2}^{2,n}\right)\Big ]\) for MAHNOB laughter data classification is illustrated with the background color of the (l_{1},l_{2}) axes in Fig. 2 and proceeds as follows. The classification takes place at the first cascade stage for all the samples x for which \(B_{1}^{1}(\boldsymbol {x}) =\left (l_{1}\geq \theta _{1}^{1,1}\right) = \textit {true}\) or \(B_{1}^{0}(\boldsymbol {x}) =\left (\,l_{1} < \min \left (\;\theta _{1}^{2,1},\,\theta _{1}^{2,2},\,\theta _{1}^{2,3}\;\right)\right) = \left (l_{1}<\theta _{1}^{2,3}\right) = \textit {true}\). In the first case, the classification is “Laughter detected,” and in the second case “No Laughter.” These subspaces of (l_{1},l_{2}) on the left and right outskirts of Fig. 2 are indicated with a pale background color. In the second stage of the cascade processing, the likelihood f_{2}(x)=l_{2} is computed only for the samples with \(\theta _{1}^{2,3}\leq l_{1}<\theta _{1}^{1,1}\), although l_{2} is shown for all the samples in Fig. 2. With the dataset in Fig. 2, this means that the classification of approximately 65% of the samples is made using the detector function f_{1} only.
The computational efficiency of the cascade naturally depends on the order in which the detector methods are utilized at the cascade stages. Generally, the faster methods should be evaluated first and the slower ones later. If the methods f_{m}, m = 1…M have very different computational loads \(\mathcal {L}_{m}, \;m\,=\,1\ldots {M}\), it is very likely that a cascade ordered such that \(\mathcal {L}_{s}\ll \mathcal {L}_{s+1}, \,s=1\ldots {S}-1\) is the most efficient one. Precisely, the most computationally efficient cascade structure may be defined via local inequalities between each two consecutive stages s and s+1 as follows. If we denote by P_{1} the probability that a sample arriving at stage s is classified at stage s after computing l_{s}=f_{s}(x), and by P_{2} the probability that a sample arriving at stage s would be classified at stage s if the detector method f_{s+1} were utilized instead of the method f_{s}, it must hold that \(P_{1}\geq (\mathcal {L}_{s}/\mathcal {L}_{s+1})P_{2}\).
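The local condition can be checked by comparing the expected two-stage loads directly: the order (f_{s}, f_{s+1}) costs \(\mathcal{L}_{s}+(1-P_{1})\mathcal{L}_{s+1}\) on average, the swapped order costs \(\mathcal{L}_{s+1}+(1-P_{2})\mathcal{L}_{s}\), and the first is no larger exactly when \(P_{1}\geq (\mathcal{L}_{s}/\mathcal{L}_{s+1})P_{2}\). A small sketch with our own names:

```python
def expected_pair_load(load_a, load_b, p_exit_after_a):
    """Expected load of running detector a and, when no decision is
    released after a (probability 1 - p_exit_after_a), also detector b."""
    return load_a + (1.0 - p_exit_after_a) * load_b

def keep_current_order(load_s, load_s1, p1, p2):
    """True when evaluating f_s before f_{s+1} is locally at least as
    efficient as the swapped order; algebraically this is exactly the
    condition P1 >= (L_s / L_{s+1}) * P2 stated in the text."""
    return (expected_pair_load(load_s, load_s1, p1)
            <= expected_pair_load(load_s1, load_s, p2))
```

With a very cheap first detector (e.g., loads 1 and 100), even a modest exit probability P_{1} keeps the cheap-first order optimal; with equal loads, the order with the higher exit probability wins.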
In our work, the computational loads of the detector methods are very different from each other, i.e., \(\mathcal {L}_{s}/\mathcal {L}_{s+1}\ll 1\). Thus, within the BOA cascade, the detector methods f_{m}, m = 1…M are ordered according to their computational loads. For notational simplicity, we assume that the detector methods f_{m} used in a BOA cascade are enumerated such that their computational loads satisfy \(\mathcal {L}_{m}\ll \mathcal {L}_{m+1}\), so that stage s of the cascade uses the detector method f_{s} (i.e., f_{s}=f_{m} with m=s). Table 3 demonstrates the computational efficiency achieved in our experiments.
For a sample in a dataset X, the computational load of classification with a BOA cascade is on average

$$\bar{\mathcal{L}}=\sum_{s=1}^{S}\mathcal{L}_{s}\,P(\text{stage } s \text{ is reached}),$$

where P(stage s is reached) is the fraction of the samples of X for which the stages 1…s−1 do not release a classification.
To design a specific type of BOA cascade, e.g., one-sided or symmetrical, the lists z_{q}, q = 1…Q, which determine the detector functions to be utilized within the conjunctions of the BOA, must be selected appropriately. For data with a clearly unbalanced class distribution, a one-sided cascade is computationally efficient if the early classification option is available for the prevalent class. This is the case if the decision-makers \(B_{s}^{\text {prevalent}},\,s\,=\, 1\ldots {S}\) for the prevalent class are functioning while the decision-makers \(B_{s}^{\text {rare}},\,s\,=\, 1\ldots {S}-1\) for the rare class are null/nonexistent, i.e., \(B_{s}^{\text {rare}}(\boldsymbol {x})=\textit {false}\;\forall \; \boldsymbol {x}\). Thus, for the usual case, where the “target” class is rare and the “nontarget” class is the prevalent one, to ensure a computationally efficient one-sided BOA cascade, the BOA must be conjunctive, designed with only one conjunction list z_{1}=[1,2,…,S]. In case the target likelihood scores are negated, i.e., −f_{1}(x),−f_{2}(x),…,−f_{S}(x) are used, a conjunction list for every subvector of [1,2,…,S] should be used to build a one-sided cascade capable of early classification to the “nontarget” class. For example, in the case S=3, the conjunction lists would be z_{1}=[1], z_{2}=[2], z_{3}=[3], z_{4}=[1,2], z_{5}=[1,3], z_{6}=[2,3], and z_{7}=[1,2,3].
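The enumeration of conjunction lists for the negated-score case can be sketched as below (a hypothetical helper, not from the paper); for S=3 it reproduces the seven lists above in the same order.

```python
from itertools import combinations

def one_sided_conjunction_lists(S):
    """All non-empty subvectors of [1, 2, ..., S]: with negated scores,
    these are the conjunction lists needed for a one-sided cascade that
    can classify early to the "nontarget" class."""
    return [list(c) for r in range(1, S + 1)
            for c in combinations(range(1, S + 1), r)]
```

In general this yields 2^{S}−1 lists, so the construction is practical only for small numbers of detectors.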
A symmetrical cascade, which enables early classification to both classes at all cascade stages, is suitable for classification tasks with both even and unbalanced class distributions. The time-to-decision efficiency of the cascade depends on the capability of all the internal decision-makers \(B_{s}^{1}\) and \(B_{s}^{0}\), s = 1…S, of the cascade to make early classifications. Functioning decision-makers for all the stages and both classes of a symmetrical BOA cascade are ensured by constructing the BOA from cumulative conjunction lists z_{1}=[1], z_{2}=[1,2],…,z_{S}=[1,2,…,S], that is, z_{s}=[1,2,…,s].
3.2 BOA tunability property
Classification performance of the BOA depends on the values of all the thresholds \(\theta _{m}^{q,n},\,m\,=\, 1\ldots {M},\;q\,=\, 1\ldots {Q},\,n\,=\, 1\ldots {N}_{q}\) in θ. Classifying data X={X^{0},X^{1}} from two classes with a BOA B(x;θ) results in a certain true positive rate tpr_{θ} and false positive rate fpr_{θ}, which produce one point in the space of precision (P) vs. recall (R). Classifying the data X with the BOA B(x;θ) with all possible sets of different threshold values in θ results in a constellation of performance points in the (P,R) space. The best-performing threshold values for the BOA are those corresponding to the classification performance on the upper frontier of this (P,R) constellation.
We want to make the BOA sensitivity tunable with a single parameter, in a similar way to the individual detectors. For that, we introduce a parameter α∈[0…1], which denotes the sensitivity setting of a BOA. A value of the parameter α corresponds to a fixed set θ_{α} of the BOA threshold values such that B(x;α)=B(x;θ_{α}). In the next section, we introduce an algorithm that selects the threshold values θ_{α} for a range of values of the sensitivity parameter α. These operating points result in BOA performance close to the upper frontier of the (P,R) constellation of the BOA performance over all possible settings of θ.
The user may then select for a BOA B(x;α) the operating point α with the most desirable behavior given the actual costs of a false positive \(\mathcal {C}_{fp}\) and a false negative \(\mathcal {C}_{fn}\) of the problem. The operating point α^{∗} of minimal expected misclassification cost can be found at

$$\alpha^{*}=\underset{\alpha}{\arg\min}\;\left[\,\mathcal{C}_{fn}\,(1-\textit{tpr}_{\alpha})\,P(x \in X^{1})+\mathcal{C}_{fp}\,\textit{fpr}_{\alpha}\,P(x \in X^{0})\,\right],$$
where P(x ∈ X^{1}) and P(x ∈ X^{0}) are the prior probabilities of the classes.
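Selecting α^{∗} from a set of measured operating points can be sketched as below. The function name and the triple format are our assumptions; the cost expression is the standard expected misclassification cost, weighting the miss rate and the false positive rate by their costs and the class priors.

```python
def pick_operating_point(points, c_fp, c_fn, prior_pos, prior_neg):
    """points: (alpha, tpr_alpha, fpr_alpha) triples measured on data.
    Returns the alpha minimizing the expected misclassification cost
        C_fn * (1 - tpr) * P(x in X^1) + C_fp * fpr * P(x in X^0)."""
    def cost(p):
        _, tpr, fpr = p
        return c_fn * (1.0 - tpr) * prior_pos + c_fp * fpr * prior_neg
    return min(points, key=cost)[0]
```

Raising \(\mathcal{C}_{fn}\) relative to \(\mathcal{C}_{fp}\) pushes the selected operating point toward higher sensitivity, as expected.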
3.3 The proposed algorithm to set parameters of a BOA
We train the BOA B(x;α) by finding suitable values for the thresholds θ_{α} for a range of values of α∈[0…1] on training data X. The possible threshold values 𝜗_{m} considered for a target likelihood score l_{m} are given by the scores of the target class samples x∈X^{1} as 𝜗_{m}=f_{m}(x)=l_{m}.
The proposed algorithm, BOATHRESHOLDSEARCH, for training a BOA is presented in Algorithm 1. As input, the algorithm needs training data X={X^{0},X^{1}} from two classes, the conjunction lists z_{1},z_{2},…,z_{Q} and maximal conjunction multiplicities N_{1},N_{2},…,N_{Q} of the BOA, and the maximal number \(N_{\mathcal {S}}^{\text {max}}\) of candidates for θ_{α} saved by the algorithm for each α. The algorithm produces sets \(\boldsymbol \theta _{\alpha _{t}}\) of fixed threshold values for the BOA operating points \(\alpha _{t}=\frac {t}{T},\;\; t\,=\, 0\ldots {T}\), where T equals the number of samples x∈X^{1}. These operating points correspond to true positive rates \(0,\frac {1}{T},\frac {2}{T},\ldots,\frac {T-1}{T}, 1\) on the training data X. The algorithm searches for suitable threshold values step by step, starting by selecting values for \(\boldsymbol \theta _{\alpha _{0}}\) for α_{0}=0 and terminating after selecting values for \(\boldsymbol \theta _{\alpha _{T}}\) for α_{T}=1. The method is greedy in the sense that, when searching for values for α_{t} at iteration t, the search starts from a potential set of threshold values for α_{t−1} provided by iteration t−1, and the threshold values are allowed to change only gradually, minimizing the number of false positives locally.
The algorithm starts by fixing the BOA thresholds for sensitivity level α_{0}=0 to be \(\boldsymbol \theta _{\alpha _{0}} = \{\infty \}\). The BOA with parameter setting α_{0}=0 does not accept any sample to the “target” class, i.e., B(x;α_{0})=false ∀ x∈X. Thus, the algorithm starts with \(\textit {tpr}_{\alpha _{0}}\,=\, \textit {fpr}_{\alpha _{0}}\,=\, 0\). The threshold setting \(\boldsymbol \theta _{\alpha _{0}}\) and the corresponding number 0 of false positives are placed into a set \(\mathcal {S}_{0}\) as an entry (θ = {∞},fp = 0) for the next step to start with.
At each step t = 1…T, every threshold setting θ, given by the entries (θ,fp) in \(\mathcal {S}_{t-1}\) provided by step t−1, is adjusted. One adjusted set θ^{new} is obtained by mitigating one or several thresholds \(\theta _{m}^{q,n}\in \boldsymbol \theta \) of one conjunction (q,n) of the BOA. Within each BOA conjunction (q,n), there are \(2^{M_{q}}-1\) subsets of thresholds \(\left \{\theta _{z_{q}(i)}^{q,n}\,|\,i \in I,\, I\subseteq \{1\ldots {M}_{q}\}\right \}\) to search for the best change from θ to θ^{new}. Thus, in the complete BOA function there are \(P={\sum \nolimits }_{q=1}^{Q} N_{q}\cdot \left (2^{M_{q}}-1\right)\) possible subsets of thresholds to change, and one θ generates up to P changed threshold settings θ^{new}.
When mitigating the values of the thresholds \(\left \{\theta _{z_{q}(i)}^{q,n}\;|\; i \in I,\, I\subseteq \left \{1\ldots {M}_{q} \right \}\,\right \}\) of a conjunction (q,n) from their values in θ for θ^{new}, the amounts of the changes are such that B(x;θ^{new}) accepts exactly one more sample x∈X^{1} than B(x;θ). That is, B(x;θ)=B(x;θ^{new}) ∀ x∈X^{1}∖x_{∗}, B(x_{∗};θ)=false and B(x_{∗};θ^{new})=true. If the BOA function becomes redundant with the new threshold set θ^{new}, all the thresholds \(\theta _{z_{q}(i)}^{q,n},\,i\,=\, 1\ldots {M}_{q}\) of the redundant conjunctions (q,n) are reset to \(\theta _{*}^{q,n}=\infty \). All the acquired new settings θ^{new} are saved with their resulting false positive counts into a set \(\mathcal {S}_{t}\) as entries {(θ,fp)^{new}} to be potential settings for α_{t}.
After processing every entry \((\boldsymbol \theta,\mathit {fp})\in \mathcal {S}_{t-1}\) and saving all the generated new entries into \(\mathcal {S}_{t}\), the best set θ^{∗} of BOA thresholds among the entries of \(\mathcal {S}_{t}\) is selected as \(\boldsymbol \theta _{\alpha _{t}}=\boldsymbol \theta ^{*}\) to correspond to α_{t}. The best set θ^{∗} is selected to be the one corresponding to the smallest number of false positives among the entries in \(\mathcal {S}_{t}\) while using as few BOA conjunctions with non-infinite thresholds as possible. The set \(\mathcal {S}_{t}\) is then pruned to keep the maximal allowed number \(N_{\mathcal {S}}^{\text {max}}\) of the best entries for the next step to start with. In the experiments, we used \(N_{\mathcal {S}}^{\text {max}}=10\), as a larger number did not improve the recognition accuracy notably while making the algorithm run considerably slower.
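A drastically simplified sketch may clarify the mechanics of the search. It handles only a single-conjunction BOA (one list z_{1}=[1…M] with N_{1}=1), keeps a single candidate instead of a beam of \(N_{\mathcal{S}}^{\text{max}}\) entries, and omits the redundancy handling; all names are ours, not the paper's.

```python
def greedy_single_conjunction(pos, neg):
    """Toy version of the greedy threshold search for a BOA that is a
    single conjunction over M detectors.
    pos, neg: lists of M-tuples of scores for target / non-target samples.
    Starting from theta = (inf, ..., inf), i.e. tpr = 0, each step admits
    the one remaining positive sample whose admission causes the fewest
    false positives, mitigating the thresholds just enough to accept it.
    Returns one (theta, fp) entry per operating point alpha_0 ... alpha_T."""
    M = len(pos[0])
    theta = [float("inf")] * M
    remaining = list(range(len(pos)))
    path = [(tuple(theta), 0)]               # alpha_0 = 0: nothing accepted
    while remaining:
        best = None
        for i in remaining:
            # mitigate each threshold just enough to accept sample i
            cand = [min(t, s) for t, s in zip(theta, pos[i])]
            # count negatives now accepted by the conjunction
            fp = sum(all(s >= t for s, t in zip(x, cand)) for x in neg)
            if best is None or fp < best[1]:
                best = (i, fp, cand)
        i, fp, theta = best
        remaining.remove(i)
        path.append((tuple(theta), fp))      # next operating point alpha_t
    return path
```

Each returned entry raises the training-set true positive rate by 1/T, mirroring the operating points \(\alpha_{t}=t/T\) produced by the full algorithm.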
Figure 4 illustrates the thresholds θ_{α} found by the algorithm with \(N_{\mathcal {S}}^{\text {max}}=1\) for a BOA
for \(\alpha =0,\frac {1}{T},\frac {2}{T},\ldots,\frac {T-1}{T},1\).
The memory requirement of the algorithm, besides the training data and the output variables, is the storage needed for the set \(\mathcal {S}_{t}\) of potential operating points stored at each iteration. As maximally \(N_{\mathcal {S}}^{\text {max}}\) operating points are passed from one iteration to the next, the number of operating points held in memory during an iteration of the algorithm is at most \(N_{\mathcal {S}}^{\text {max}}\times {\sum \nolimits }_{q=1}^{Q} N_{q} \left (2^{M_{q}} - 1 \right)\).
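The stated bound can be written directly as a small helper (the names are hypothetical):

```python
def max_points_in_memory(ns_max, multiplicities, list_lengths):
    """Upper bound on operating points held in memory during one
    iteration: N_S^max * sum_q N_q * (2**M_q - 1)."""
    return ns_max * sum(n * (2 ** m - 1)
                        for n, m in zip(multiplicities, list_lengths))
```

For example, with \(N_{\mathcal{S}}^{\text{max}}=10\) and two conjunction lists of lengths 1 and 2 with multiplicities 1 and 3, at most 10·(1·1+3·3)=100 points are held.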
The computational complexity of the BOATS algorithm is \(\mathcal {O}\left (\;|X^{1}|\; N_{\mathcal {S}}^{\text {max}}\; N_{\text {conj}}^{\text {max}}\; 2^{M}\;\right)\). In practice, multiple positive samples are often selected concurrently, diminishing the multiplier |X^{1}|. The limit \(N_{\mathcal {S}}^{\text {max}}\) is an input parameter, which allows the user to decide on the accuracy vs. time and memory complexity tradeoff of the algorithm. \(N_{\text {conj}}^{\text {max}}\) is the maximum number of conjunctions in the DNF-BF BOA function, which occurs at the operating point of recall = 1. At operating points with lower recall values, the true number is generally lower, so using \(N_{\text {conj}}^{\text {max}}\) gives an upper limit for the time complexity. The number 2^{M} is an upper limit on the options tested when processing each conjunction; the true number for each conjunction (q,n) is \(2^{M_{q}}-1\).
4 Results and discussion
In this section, we report our experiments to evaluate the performance of the proposed BOA cascade of multiple sensitivity-tunable detectors, both in terms of detection accuracy and the computational load of classification. We also analyze the proposed BOA training algorithm to showcase the quality of the operating points it finds for a BOA combination. To substantiate the eligibility of our work, we compare the acquired results with others found in the literature.
We first introduce the datasets used for the two explored tasks, namely laughter detection and context change detection, and discuss the used performance measures. Then, we contrast our results with the proposed BOA classifier and a C5.0 tree classifier in the laughter detection task against results obtained by other solutions found in the literature. We also compare the proposed BOA training algorithm to other training algorithms adopted from the literature and explore the detection performance with different BOA combinations.
4.1 Data and performance measures
4.1.1 MAHNOB Laughter dataset
For laughter detection, i.e., laughter vs. speech classification, we use data from the MAHNOB Laughter dataset of [7]. The data consists of 1399 video clips, of lengths from 0.15 s to 28 s, of 22 different persons. 845 of the video clips represent speech and 554 of them represent laughter. The data is recorded in two modalities: frontal close-up video with a frame rate of 25 fps, and audio from a lapel microphone with a sampling frequency of 44.1 kHz.
A frame from one of the videos is shown in Fig. 5 to demonstrate the data.
We run the tests using 22-fold cross-validation, where at each fold the videos of one person are left out for testing, and all the rest of the videos are used for training. We build the BOA combinations using classifiers similar to those used for the baseline method in [7]. Those are an audio-stream-based detector, which provides a laughter likelihood f_{A}(x_{audio})=l_{A}, and a video-frame-based detector, which provides a laughter likelihood f_{V}(x_{visual})=l_{V} for each video clip. The computational load of the audio-stream-based detector is very small compared to the computational load of the visual-stream-based detector.
The audio-stream-based laughter detector utilizes the first 6 MFCC features from audio frames of length 20 ms. A single-output feedforward neural network (NN) is trained to produce audio frame-wise target class likelihoods l_{a} using the mean squared error (MSE) error function. The NN has one hidden layer with 20 neurons, and all the neurons of the network use the tangent sigmoid transfer function. The target class likelihood l_{A} for a video clip is an average over the frame-wise values, \(l_{A} = \frac {1}{N_{a}}{\sum \nolimits }_{\tau =1}^{N_{a}} l_{a}(\tau)\), where N_{a} is the number of audio frames in the clip.
The video-frame-based laughter detector starts by extracting the 20 face points, shown in Fig. 6, from each video frame using an algorithm from [49]. The utilized face points correspond to the points used in [7]. Then, the dimensionality of each face point feature vector is reduced from 40 to 20 by principal component analysis (PCA). For the frame-wise laughter likelihood estimates l_{v}, an NN is trained. It has one hidden layer with 10 neurons. All the neurons use the tangent sigmoid transfer function, and the mean squared error (MSE) loss function is applied for training. The video clip-wise laughter likelihood is given as an average over the frame-wise values, \(l_{V} = \frac {1}{N_{V}}{\sum \nolimits }_{\tau =1}^{N_{V}} l_{v}(\tau)\), where N_{V} is the number of video frames in the clip.
4.1.2 CASA dataset
For the video context change detection problem, we use the CASA database^{Footnote 1} from [8]. Over 7 h of lifelog video material is filmed with a small pen camera, which operates at a frame rate of 15 frames/second and a frame size of 176 × 144 pixels. The stereo sound track is recorded by a pair of in-ear microphones with a 44.1 kHz sampling rate and stored without compression. The database contains video material from 23 different types of environments.
For the context change detection task, we created 30 video files of length 5–20 min. Each file is a concatenation of, on average, 105 clips of length 1–30 s from the video material of the CASA database. The context—one of the 23 different environments included in the database—is kept the same for 1–5 successive clips; otherwise, each clip is taken from a randomly selected video file. There are on average 42 context changes within each created video file. We run our tests using 6-fold cross-validation, where at each fold 5 files are reserved for testing and the remaining 25 files are used for system training.
We use three different detectors to spot context changes in the created videos. Brief descriptions of the used detectors are given here, while their details can be found in [50]. The fastest of the used detectors operates on the audio stream of the video. The audio is analyzed in frames of length 80 ms with 40 ms overlap between successive frames. From each audio frame, MFCC features are computed, and within a sliding window of 125 audio frames, the mean and variance of 20 MFCC coefficients are computed. Transitions in these statistics are converted to a context change likelihood l_{1} for each audio frame. The computation time of the scores l_{1} on a single-CPU desktop computer is 0.8 ms per audio frame, that is, 10 ms per one second of audio.
Two other utilized context change detectors operate on the image modality of the video. The faster of the two collects RGB histograms of video frames and produces the context change likelihood value l_{2} for each video frame according to the city block distance between adjacent RGB histograms. The computation time of l_{2} is about 29 ms per video frame, that is, approximately 435 ms per one second of video.
The more accurate of the detectors on the visual modality, proposed in [51], counts the incidences of SIFT descriptor codebook elements within each video frame and collects a SIFT histogram, i.e., a so-called bag-of-words feature vector, for each video frame. The context change likelihood value l_{3} for each video frame is computed as the city block distance between the SIFT histograms of successive video frames. The computation time of l_{3} is about 12.3 s per video frame, which makes about 184 s per one second of video.
4.1.3 Performance measures
In the literature, the performance of detectors is often presented by a receiver operating characteristic (ROC) curve. However, in our evaluations, we prefer the curve of precision vs. recall (PR curve), because in the case of imbalanced class distributions the PR curve is more faithful to the absolute number of erroneous classifications than the ROC curve of tpr with respect to fpr. To demonstrate the performance of a certain operating point of a detector, we use measures like accuracy, F_{1}-score, and computational load.
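These measures can be computed from raw counts as below (a generic sketch, not tied to the paper's code). The example illustrates the imbalance argument: with 10 true positives, 90 false positives, and 9900 true negatives, the fpr stays below 1% while the precision collapses to 0.1.

```python
def precision_recall_f1(tp, fp, fn):
    """Precision, recall, and F1 from raw classification counts.
    Precision reacts to the absolute number of false positives,
    which is why the PR curve is preferred with imbalanced classes."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

def tpr_fpr(tp, fp, fn, tn):
    """ROC coordinates: true positive rate and false positive rate."""
    return tp / (tp + fn), fp / (fp + tn)
```

With a huge pool of negatives, the fpr axis of a ROC curve thus hides an error mass that the precision axis of a PR curve exposes.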
Average values of these performance numbers over the cross-validation folds are presented as results. With the MAHNOB laughter dataset, 22-fold cross-validation is used. In each fold, the video files of one speaker are used for testing, and the rest of the files are used for training the component detectors and the BOA cascade. With the CASA dataset, 6-fold cross-validation is used similarly. In each fold, 25 video files are used for training the individual classifiers and the BOA combination, and 5 files are used for testing the system.
4.2 Comparing BOA cascade to existing work in laughter vs speech classification
We compare the performance of the proposed BOA cascade to results we obtained with the C5.0 tree building algorithm [52], as well as to results obtained by other authors in laughter vs. speech classification, i.e., laughter detection, with the MAHNOB laughter dataset. For the task, we use a BOA detector

$$\mathrm{B}(\boldsymbol{x};\boldsymbol\theta)=\left(l_{A}\geq\theta_{A}^{1}\right)\vee\bigvee_{n=1}^{N}\left[\left(l_{A}\geq\theta_{A}^{2,n}\right)\wedge\left(l_{V}\geq\theta_{V}^{2,n}\right)\right] \qquad (15)$$
whose threshold parameters \(\theta _{A}^{1}\), \(\theta _{A}^{2,1}\), \(\theta _{V}^{2,1}\), \(\theta _{A}^{2,2}\), \(\theta _{V}^{2,2}\), …, \(\theta _{A}^{2,N}\), \(\theta _{V}^{2,N}\) are learned by the proposed training algorithm. Figure 7 illustrates the BOA cascade of (15). The computational load of acquiring l_{A} from the audio stream is only a fraction of the load of computing l_{V} from the video frames. Thus, the fraction of samples that need the computation of l_{V} reflects well the average computational load of classifying a sample with the BOA cascade of (15).
Table 1 presents the results with the C5.0 tree building algorithm as well as those found in the literature, in contrast to our solution. We report performance numbers with a BOA cascade of (15) with N=1 and also with N selected adaptively by the proposed training algorithm. The decision trees obtained with the C5.0 algorithm [52] are converted to the DNF-BF form (15) and evaluated in a cascaded manner similarly to BOA evaluation. The number N in the DNF-BF (15) of a tree varies according to the structure of the tree, which is given by the algorithm. The minimal leaf size of a tree was defined by 10-fold cross-validation using the training data. The boosted C5.0 forest contains 10 trees trained by the training algorithm with different weightings on the training samples. The classification of the forest is obtained via voting by the trees.
The C5.0 forest outperforms all the other solutions in terms of classification accuracy, whereas the performance of a single C5.0 tree is comparable to the performance obtained with the BOA classifiers. When a C5.0 tree is evaluated in a cascaded manner, computational savings very similar to those of a BOA cascade are obtained. Both BOA detectors outperform the solutions of [53, 54], albeit the classifier in [54] is trained with another database, which likely explains its lower detection accuracy. The results obtained by [55] reach accuracy and F_{1}-scores similar to our BOA cascades, but their result is not fully comparable, as they use only a subset of 15 speakers out of the 22 used by all the other authors. However, the computational load of our solution is significantly lower compared to all these other multimodal solutions. With our BOA cascade of (15) with N=1, only 11% of the samples needed the computation of l_{V}; thus, it is about nine times faster than the other solutions. The BOA cascade of (15) with N selected by the proposed training algorithm reaches slightly higher accuracy than the reference solutions while still being three times faster than them.
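The reported speedup follows directly from the two-stage average load: when the audio detector's load is taken as negligible (an idealization of ours) and 11% of the samples need l_{V}, classification is about 1/0.11 ≈ 9 times faster than always computing both scores. A sketch:

```python
def avg_cascade_load(load_fast, load_slow, frac_needing_slow):
    """Average per-sample load of a two-stage cascade in which only a
    fraction of the samples reaches the slow second-stage detector."""
    return load_fast + frac_needing_slow * load_slow
```

With `load_fast = 0` and `frac_needing_slow = 0.11`, the speedup over running both detectors on every sample is 1/0.11 ≈ 9.09, matching the "about nine times faster" figure above.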
4.3 Comparing training algorithms for BOA combination
We use the CASA lifelog data and the context change detection task to illustrate the capability of the proposed training algorithm to find successful operating points for a BOA combination. For context change detection, we use BOA combinations built of the three detectors introduced in “Data and performance measures.” We train the thresholds of a BOA with the proposed training algorithm (BOATS) and two reference algorithms adapted from the literature, and then compare the resulting F_{1}-scores of classification. The reference algorithms that we use for this evaluation are the iterative exhaustive search (IES) based on the work in [11] and the Boolean algebra of ROC curves (BAROC) introduced in [10]. The implementations of IES and BAROC, adapted for BOA training, are presented in Algorithms 2, 3, and 4 in Appendix 1. The iterative framework used by both algorithms is presented in Algorithm 2. Algorithm 3 shows the core operations of IES, and Algorithm 4 presents the operations for BAROC.
Figure 8 shows the F_{1}-scores with the operating points obtained with the three algorithms, BOATS, IES, and BAROC, for a BOA

$$\mathrm{B}(\boldsymbol{x};\boldsymbol\theta)=\bigvee_{n=1}^{N}\left[\left(l_{1}\geq\theta_{1}^{1,n}\right)\wedge\left(l_{2}\geq\theta_{2}^{1,n}\right)\wedge\left(l_{3}\geq\theta_{3}^{1,n}\right)\right]$$
with different conjunction multiplicities N. The IES algorithm can be seen to find the best operating point when N=1 with its exhaustive search. However, when N is increased, IES is unable to improve the BOA performance, because suboptimal operating points of the individual conjunctions, which might nevertheless produce better performance when used within a disjunctive combination, are pruned by the algorithm.
The BAROC algorithm performs worse than the other algorithms due to its assumption of detector independence, which does not hold with the two visual-stream-based detectors. Moreover, by the definition of the Boolean algebra of ROC curves in (4) and (5), BAROC is unable to exploit the opportunities provided by utilizing multiple conjunctions over the same conjunction set.
The proposed BOATS algorithm finds suboptimal operating points for the BOA but is able to utilize the opportunities offered by using multiple conjunctions over the same conjunction set, and thus outperforms the IES algorithm with N>1. The performance ceases to improve when the conjunction multiplicity grows beyond 7. This is due to both the data characteristics and the algorithm behavior favoring a small number of conjunctions, i.e., small N.
In Table 2, we show the best F_{1}-scores of the operating points found by the three algorithms for the BOA combinations
where the conjunction lists of \(\mathrm {B}_{\mathcal {P}}\) and \(\mathrm {B}_{\neg \mathcal {P}}\) are z_{1} = [1],z_{2} = [2],z_{3} = [3],z_{4} = [1,2],z_{5} = [1,3],z_{6} = [2,3],z_{7} = [1,2,3].
For the disjunctive BOA B_{OR}, the operating points found by the three algorithms are very similar. The IES algorithm finds the best operating point for this BOA with its exhaustive search. The proposed BOATS algorithm is not far behind, nor is the Boolean algebra of ROC curves. The BAROC algorithm's assumption of detector independence, which does not hold with these detectors, does not impair its performance in training the BOA B_{OR}, where only the disjunctive OR operator is used.
The conjunctive BOA B_{AND} with N=1 and the disjunctive ¬B_{¬OR} have equally expressive decision boundaries. Ideally, they would result in identical classifiers, but due to the characteristics of the training algorithms, they end up with different thresholds. Similarly, the ideal decision boundaries of B_{AND} with N=6 and \(\neg \mathrm {B}_{\neg \mathcal {P}}\) coincide. The results with those pairs of BOAs trained with the BOATS and BAROC algorithms are similar, which was expected because of the similarity of the decision boundaries. The iterative exhaustive search does not find as good operating points for the BOA combinations when the negated scores −l_{m} are used. This is due to the selection of the threshold values to test, which in this case of using negated detector scores is done based on the “nontarget” class samples, as explained in Section 3.3.
For the BOA \(\mathrm {B}_{\mathcal {P}}\) with N_{q} = 1, q = 1…Q, the IES algorithm is able to find the best-performing operating point. The iterative exhaustive search is thus effective in finding good thresholds for BOAs with different conjunctions. IES was not run with N_{q}>1 because of its extremely long computation time for such a long BOA. The BAROC algorithm finds a comparable operating point for the BOA \(\mathrm {B}_{\mathcal {P}}\) with N_{q} = 1, q = 1…Q. This is probably due to the abundance of different conjunctions in the BOA to be combined disjunctively, where the inaccurate independence assumption of BAROC does not matter so much. Also for the BAROC algorithm, the result with the BOA \(\mathrm {B}_{\mathcal {P}}\) with N_{q}>1 is not reported, because it is by definition the same as with N_{q} = 1. The proposed BOATS algorithm lags slightly behind IES and BAROC for the BOA \(\mathrm {B}_{\mathcal {P}}\) with N_{q} = 1 ∀q. However, when the conjunction multiplicities N_{q} are unlimited, BOATS finds an operating point with performance similar to the best one found with N_{q} = 1 ∀q by IES.
4.4 Computational efficiency of BOA
In this section, we report the performance of different BOA cascades in terms of both the F_{1}-score and the average computational load of classification relative to real-time processing. The BOA cascades are trained with the proposed BOATS algorithm for the video context change detection task, which has a highly unbalanced class distribution, the “no change” class being the prevalent one.
The BOA cascades B_{AND}, ¬B_{¬OR}, and \(\neg \mathrm {B}_{\neg \mathcal {P}}\) of (17) correspond to one-sided cascades with an early classification opportunity to the “no change” class. They are assumed to be computationally efficient with this data, where samples of “no change” form the large majority of the data. The BOA cascades of B_{OR} and \(\mathrm {B}_{\mathcal {P}}\) are one-sided, having the early classification opportunity solely to the “target” class. They are likely to be slow with this data.
For this comparison, we use, in addition to the BOA cascades of (17), symmetrical cascades realizing

$$\begin{aligned}\mathrm{B}_{C2}(\boldsymbol{x};\boldsymbol\theta)&=\left(l_{1}\geq\theta_{1}^{1,1}\right)\vee\bigvee_{n=1}^{N}\left[\left(l_{1}\geq\theta_{1}^{2,n}\right)\wedge\left(l_{2}\geq\theta_{2}^{2,n}\right)\right]\\ \mathrm{B}_{C3}(\boldsymbol{x};\boldsymbol\theta)&=\left(l_{1}\geq\theta_{1}^{1,1}\right)\vee\bigvee_{n=1}^{N_{2}}\left[\left(l_{1}\geq\theta_{1}^{2,n}\right)\wedge\left(l_{2}\geq\theta_{2}^{2,n}\right)\right]\vee\bigvee_{n=1}^{N_{3}}\left[\left(l_{1}\geq\theta_{1}^{3,n}\right)\wedge\left(l_{2}\geq\theta_{2}^{3,n}\right)\wedge\left(l_{3}\geq\theta_{3}^{3,n}\right)\right].\end{aligned}$$
The BOA cascade of B_{C2} is similar to the laughter detection cascade of Fig. 7 with l_{A}=l_{1} and l_{V}=l_{2}. The cascade of B_{C3} with N_{2}=N_{3}=1 is illustrated in Fig. 9.
In Table 3, we show for the different BOA cascades their best F_{1}-scores as well as their computation times (CT) of classification, using a desktop PC, relative to real-time processing. The individual detectors d_{m}=(l_{m}≥θ_{m}), m = 1,2,3 have very different computational loads. Compared to real-time processing, the detectors d_{1} and d_{2} are very fast, while d_{3} is extremely slow.
The BOA cascade of B_{C2} with N = 1 has a computational cost of only a fraction of real time, while achieving a substantial improvement in classification performance over the individual detectors d_{m}=(l_{m}≥θ_{m}), m = 1,2,3. It requires a tiny fraction of the computational load of d_{3} and less than 5% of the computational load of d_{2}, while only doubling the time of the fastest detector d_{1}. At the same time, it reaches F_{1} = 0.764, which is about 9 percentage points higher than the 0.674 of d_{1}, 24 points higher than the 0.525 of d_{2}, and 21 points higher than the 0.553 of d_{3}. With N of B_{C2} not restricted, the F_{1}-score further improves to 0.778, but the computational benefit over always computing both l_{1} and l_{2} is lost.
The BOA cascade of B_{C3} utilizes all three available detectors. Thus, the F_{1}-scores obtained with it improve further over those obtained with B_{C2}. Real-time processing is compromised by incorporating the extremely slow computation of l_{3}. However, with cascade processing, the total computational load of B_{C3} with N_{2}=N_{3}=1 is reduced to less than 2% of that of always computing all the scores l_{1}, l_{2}, and l_{3}. At the same time, the F_{1}-score improves to 0.774. With N_{2} and N_{3} unrestricted and selected by the proposed BOA training algorithm, the F_{1}-score improves further to 0.813, the average computation time still being less than 4% of the time of always computing all the scores l_{1}, l_{2}, and l_{3}.
When observing the computational loads of the different BOA cascades in Table 3, we may notice that substantial computational savings appear whenever the BOA utilizes the computationally heaviest detector function f_{3}(x)=l_{3} only conjunctively with the faster detector functions f_{1}(x)=l_{1} and f_{2}(x)=l_{2}. This is the case in the BOA combinations B_{C2}, B_{C3}, B_{AND}, ¬B_{¬OR}, and \(\neg \mathrm {B}_{\neg \mathcal {P}}\). The BOA cascades of B_{OR} and \(\mathrm {B}_{\mathcal {P}}\) utilize a conjunction list z_{3}=[3], which means using the threshold comparison \(\left (\,l_{3}\geq \theta _{3}^{\,3}\,\right)\) as an individual conjunction within the BOA function. Because of this, these BOA cascades cannot avoid computing l_{3} unless the input is accepted into the rare “context change detected” class by conjunctions using only the scores l_{1} and l_{2}. The BOA cascade of B_{OR} is computationally the most inefficient, as it is able to avoid computing l_{3} only if the input is classified as “context change detected” by the threshold comparison (l_{1}≥θ_{1}) or (l_{2}≥θ_{2}). The BOA cascade of \(\mathrm {B}_{\mathcal {P}}\) is slightly more efficient due to its conjunctions \(\bigvee _{n=1}^{N_{q}}\left (l_{1}\geq \theta _{1}^{q,n}\right)\wedge \left (l_{2}\geq \theta _{2}^{q,n}\right)\), based on the conjunction list z_{q}=[1,2], which are capable of classifying the input as “context change detected” with only l_{1} and l_{2}.
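The effect of such an ordering can be illustrated with a simple expected-cost calculation; the stage costs and pass rates below are hypothetical numbers, not measurements from Table 3:

```python
def expected_cost(stage_costs, pass_rates):
    """Average per-sample cost of a cascade: stage s runs only for the
    fraction of inputs that no earlier stage has classified."""
    cost, reach = 0.0, 1.0
    for c, p in zip(stage_costs, pass_rates):
        cost += reach * c   # this stage is computed for `reach` of the inputs
        reach *= p          # fraction of inputs passed on to the next stage
    return cost

# fast l1 and l2 placed before an extremely slow l3 (illustrative costs):
expected_cost([0.01, 0.05, 10.0], [0.3, 0.2, 0.0])  # ≈ 0.625, vs 10.06 without a cascade
```

The slow detector dominates the total cost only for the small fraction of inputs that survive the fast conjunctive stages, which mirrors the behavior of B_{C2}, B_{C3}, B_{AND}, ¬B_{¬OR}, and \(\neg \mathrm {B}_{\neg \mathcal {P}}\) above.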
The best F_{1}-score, F_{1} = 0.814, among the BOA cascades not utilizing the conjunction list z_{q}=[3] is achieved with B_{AND}. An only slightly higher score, F_{1} = 0.817, was obtained with the BOA cascade \(\mathrm {B}_{\mathcal {P}}\), but the computational efficiency obtainable with a cascade structure is obstructed by its computationally inefficient BOA design.
Precision vs. recall curves of the detectors d_{1}=(l_{1}≥θ), d_{2}=(l_{2}≥θ), and d_{3}=(l_{3}≥θ), and of some BOA combinations of them trained with the BOATS algorithm, are shown in Fig. 10. We can see that all the BOA combinations improve the precision-recall curve remarkably over the curves of the individual detectors.
5 Conclusions
We proposed to use a monotone Boolean function for combining multiple binary classifiers and showed how to implement it as a computationally efficient binary classification cascade. The proposed Boolean OR of ANDs (BOA) combination is defined by a BF over multiple detector scores and is implemented as a classification cascade for computational efficiency. We also presented an algorithm, BOA threshold search (BOATS), for learning the thresholds of a BOA cascade.
We showed experimentally that the BOA cascade achieves state-of-the-art performance in the laughter detection task with the MAHNOB laughter dataset while requiring much less computational power than the other solutions found in the literature. We also showed that the proposed algorithm is the best suited for learning the thresholds of a BOA combination, compared to the other learning strategies for Boolean combinations found in the literature. Finally, we explored the detection performance of different BOA cascades in terms of their F_{1}-scores and computational loads of detection. We showed that a BOA cascade improves the classification accuracy remarkably over the individual detectors while mostly requiring only a fraction of their combined computation time.
6 Appendix 1
6.1 Reference algorithms for BOA training
Algorithm 2 contains the functionality for training a BOA combination iteratively, by fusing two elements at a time. The symbol Θ denotes a matrix of thresholds; each row of Θ contains one threshold setting for the corresponding Boolean classifier. The boldface symbols tp and fp denote the vectors of true positives and false positives, respectively, resulting from the different threshold settings in Θ of a corresponding Boolean classifier.
One conjunction (q,n) of a BOA is built on lines 8–22, within the loop starting from line 7. On lines 23–28, the newly trained conjunction is combined with the conjunctions already trained.
The algorithm returns thresholds for the found operating points α of the BOA in matrix Θ_{B}. The corresponding true positive rates and false positive rates on training data are returned in vectors tp_{B} and fp_{B}, respectively. We use this framework for training a BOA with either Boolean algebra of ROC curves (BAROC) by [10] or iterative exhaustive search (IES) by [11].
The training algorithm to be used is selected by a variable ALG. If ALG = IES, the combining is performed with Algorithm 3, and if ALG = BAROC, the combination of two sets of thresholds is done by Algorithm 4.
7 Appendix 2
7.1 Boolean decision makers at BOA cascade stages
At each stage s = 1…S of a BOA cascade, one target likelihood score f_{m}(x)=l_{m}=l_{s}, m∈{1…M} is computed. All the scores l_{i}, i = 1…s are thus available at cascade stage s to make the classification or the decision to enter the next cascade stage. BFs \(B_{1}^{1}(l_{1})\), \(B_{1}^{0}(l_{1})\), \(B_{2}^{1}(l_{1},l_{2})\), \(B_{2}^{0}(l_{1},l_{2})\),…, \(B_{S}^{1}(l_{1},l_{2},\ldots,l_{S})\) and \(B_{S}^{0}(l_{1},l_{2},\ldots,l_{S})\) are set to make these internal decisions of the cascade. As illustrated in Fig. 3, at each stage s, after computing the predefined target likelihood score l_{s}, a classification to the “target” class is made if \(B_{s}^{1}(l_{1},l_{2},\ldots,l_{s})=\textit {true}\) and a classification to the “nontarget” class is made if \(B_{s}^{0}(l_{1},l_{2},\ldots,l_{s})=\textit {true}\). If both functions, \(B_{s}^{0}\) and \(B_{s}^{1}\), output false, the next cascade stage is entered. The functions \(B_{S}^{0}\) and \(B_{S}^{1}\) at the last cascade stage S are negations of each other, ensuring that a classification is made.
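This control flow can be summarized in a short sketch; the decision makers passed in below are toy stand-ins, not the actual partitions of (7) and (8):

```python
def boa_cascade(x, score_fns, accept_fns, reject_fns):
    """Generic BOA cascade loop: at stage s compute l_s, then let
    B_s^1 accept ("target") or B_s^0 reject ("non-target");
    otherwise enter the next stage."""
    scores = []
    for f, B1, B0 in zip(score_fns, accept_fns, reject_fns):
        scores.append(f(x))     # l_s, so scores = [l_1, ..., l_s]
        if B1(scores):          # B_s^1(l_1, ..., l_s) = true
            return "target"
        if B0(scores):          # B_s^0(l_1, ..., l_s) = true
            return "non-target"
    # unreachable when B_S^0 is the negation of B_S^1 at the last stage
    raise RuntimeError("last-stage decision makers must be exhaustive")
```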
The decision makers \(B_{s}^{1},\;s\,=\, 1\ldots {S}\) are partitions of the BOA function (7), and the functions \(B_{s}^{0},\;s\,=\, 1\ldots {S}\) are partitions of the negation (8) of the BOA function. This ensures that the decision makers \(B_{s}^{1}, B_{s}^{0},\;s\,=\, 1\ldots {S}\) are consistent. This means that \(B_{s}^{1}\) and \(B_{s}^{0}\) never output true concurrently, i.e., if \(B_{s}^{1}(\boldsymbol {x})=\textit {true}\) then \(B_{s}^{0}(\boldsymbol {x})=\textit {false}\), and similarly, if \(B_{s}^{0}(\boldsymbol {x})=\textit {true}\) then \(B_{s}^{1}(\boldsymbol {x})=\textit {false}\). It also means that if a classification is made by \(B_{s}^{1}\) or \(B_{s}^{0}\) at a cascade stage s, the decision makers \(B_{r}^{1}\) and \(B_{r}^{0}\) of the other stages r = 1…S, r≠s would not make contradicting classifications. Formally, if \(\exists s \; B_{s}^{c}=\textit {true}\) then \(B_{r}^{\neg c}=\textit {false}\;\;\forall r\in 1\ldots {S}\).
The internal decision makers at BOA cascade stages s=1…S for the “target” class are
That is, \(B_{s}^{1}\) contains the conjunctions (q,n) of the BOA (7) that utilize the newly computed likelihood score l_{s} and possibly those computed at earlier stages, but naturally none of the scores l_{m}, m>s. Examples can be seen in Figs. 7 and 9.
Similarly, the internal decision makers \(B_{s}^{0},\;s\,=\, 1\ldots {S}\) of the BOA cascade for the “nontarget” class are partitioned from the negated BOA; \(B_{s}^{0}\) contains the conjunctions k of the negated BOA (8) that utilize the newly computed likelihood score l_{s} and possibly those computed at earlier stages, but none of the scores l_{m}, m>s. The partition of the K conjunctions of ¬B of (8) is given by a Boolean variable c_{s}(k), which denotes whether the k:th conjunction of the negated BOA (8) is used for the decision maker \(B_{s}^{0}\). It is recursively defined as
where \(\mathcal {I}(k,q,n)\) is given by (9). The first part of Eq. (20) ensures that conjunction k has not been used for \(B_{r}^{0},\,r< s\), while the rest of the equation checks whether detector functions beyond f_{s}, i.e., any of f_{s+1},f_{s+2},…,f_{S}, are used in conjunction k of (8), and sets c_{s}(k)=false if so.
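A hedged sketch of this partitioning: assuming `uses[k]` gives the set of detector indices appearing in conjunction k (an illustrative data structure, with detectors indexed in cascade order), each conjunction is assigned to the first stage at which all of its scores are available, just as Eq. (20) assigns k to a single stage and only when no detector beyond f_{s} is needed:

```python
def assign_stages(uses, S):
    """Return stage[k]: the first cascade stage s at which conjunction k
    of the negated BOA becomes decidable (all its scores computed)."""
    stage = {}
    for s in range(1, S + 1):
        for k, needed in uses.items():
            # c_s(k) is true iff k was not assigned to an earlier stage
            # and no detector index used by k exceeds s
            if k not in stage and max(needed) <= s:
                stage[k] = s
    return stage
```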
Now, the decision makers \(B_{s}^{0}\) for the “nontarget” class are
where the detector function indicator index \(\mathcal {I}(k,q,n)\) is given by (9), \(K=\prod _{q=1}^{Q} M_{q}^{N_{q}}\) and BOA variables z_{q},N_{q}, for q = 1…Q are adopted from (8). Using the alternative notation of the ¬B (8), the decision makers \(B_{s}^{0},\;s\,=\, 1\ldots {S}\) for the “nontarget” class may be written as
This notation, while possibly more comprehensible, includes all the decision makers \(B_{r}^{0},\,r<s\) in \(B_{s}^{0}\); however, this redundancy does not affect the functionality.
Notes
Demonstration available at http://arg.cs.tut.fi/demo/CASAbrowser/
Abbreviations
BAROC: Boolean algebra of ROC curves
BF: Boolean function
BOA: Boolean OR of ANDs function
BOATS: BOA threshold search, an algorithm to search thresholds for a BOA detector
CNF: Conjunctive normal form
CNF-BF: Boolean function in conjunctive normal form
CPU: Central processing unit
DNF: Disjunctive normal form
DNF-BF: Boolean function in disjunctive normal form
IBC: Iterative Boolean combination
IES: Iterative exhaustive search
LAD: Logical analysis of data
MFCC: Mel-frequency cepstral coefficient
OCAT: One clause at a time
RGB: Red-green-blue color format
ROC: Receiver operating characteristic curve
ROCCH: ROC convex hull
SIFT: Scale-invariant feature transform
References
S Yang, P Luo, CC Loy, X Tang, in The IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Wider face: A face detection benchmark, (2016).
S Zhang, R Benenson, M Omran, J Hosang, B Schiele, in The IEEE Conference on Computer Vision and Pattern Recognition (CVPR). How far are we from solving pedestrian detection? (2016).
T Virtanen, A Mesaros, T Heittola, MD Plumbley, P Foster, E Benetos, M Lagrange, Proceedings of the Detection and Classification of Acoustic Scenes and Events 2016 Workshop (DCASE2016) (Tampere University of Technology, Department of Signal Processing, 2016). ISBN (Electronic): 978-952-15-3807-0.
J Ashbourn, Biometrics: Advanced Identity Verification: the Complete Guide (Springer, 2014).
A Courbet, D Endy, E Renard, F Molina, J Bonnet, Detection of pathological biomarkers in human clinical samples via amplifying genetic switches and logic gates. Sci. Transl. Med. 7(289) (2015).
E Boros, PL Hammer, T Ibaraki, A Kogan, Logical analysis of numerical data. Math. Program. 79(1), 163–190 (1997).
S Petridis, B Martinez, M Pantic, The MAHNOB laughter database. Image Vis. Comput. 31(2), 186–202 (2013).
A Mesaros, T Heittola, A Eronen, T Virtanen, in Signal Processing Conference, 2010 18th European. Acoustic event detection in real life recordings (IEEE, 2010), pp. 1267–1271.
J Daugman, Biometric Decision Landscapes, vol. 482 (University of Cambridge, Computer Laboratory, 2000).
ME Oxley, SN Thorsen, CM Schubert, in Information Fusion, 2007 10th International Conference On. A Boolean algebra of receiver operating characteristic curves (IEEE, 2007), pp. 1–8.
Q Tao, R Veldhuis, Threshold-optimized decision-level fusion and its application to biometrics. Pattern Recog. 42(5), 823–836 (2009).
K Venkataramani, BV Kumar, in Multimedia Content Representation, Classification and Security. Role of statistical dependence between classifier scores in determining the best decision fusion rule for improved biometric verification (Springer, 2006), pp. 489–496.
M Barreno, A Cardenas, JD Tygar, in Advances in Neural Information Processing Systems 20. Optimal ROC curve for a combination of classifiers, (2008), pp. 57–64.
W Khreich, E Granger, A Miri, R Sabourin, Iterative Boolean combination of classifiers in the ROC space: an application to anomaly detection with HMMs. Pattern Recognit. 43(8), 2732–2752 (2010).
E Granger, W Khreich, R Sabourin, Fusion of biometric systems using Boolean combination: an application to iris-based authentication. Int. J. Biometrics. 4(3), 291–315 (2012).
C Shen, On the principles of believe the positive and believe the negative for diagnosis using two continuous tests. J. Data Sci. 6:, 189–205 (2008).
Y Crama, PL Hammer, Boolean Functions: Theory, Algorithms, and Applications. Encyclopedia of Mathematics and its Applications (Cambridge University Press, 2011).
G Alexe, S Alexe, TO Bonates, A Kogan, Logical analysis of data – the vision of Peter L. Hammer. Ann. Math. Artif. Intell. 49(1), 265–312 (2007).
I Chikalov, V Lozin, I Lozina, M Moshkov, HS Nguyen, A Skowron, B Zielosko, Logical Analysis of Data: Theory, Methodology and Applications (Springer, Berlin, 2013).
PL Hammer, Partially defined Boolean functions and cause-effect relationships. Lecture at the International Conference on Multi-attribute Decision Making Via OR-based Expert Systems (1986).
RS Michalski. (RS Michalski, JG Carbonell, TM Mitchell, eds.) (Springer, Berlin, Heidelberg, 1983).
AP Kamath, NK Karmarkar, KG Ramakrishnan, MGC Resende, A continuous approach to inductive inference. Math. Program. 57(1), 215–238 (1992).
T Fawcett, An introduction to ROC analysis. Pattern Recogn. Lett. 27(8), 861–874 (2006).
P Hess, Dedekind's problem: monotone Boolean functions on the lattice of divisors of an integer. Pacific J. Math. 81(2), 411–415 (1979).
RS Michalski, in V international Symposium on Information Processing (FCIP 69), Vol A3 (Switching Circuits). On the quasiminimal solution of the general covering problem, (1969).
AS Deshpande, E Triantaphyllou, A greedy randomized adaptive search procedure (GRASP) for inferring logical clauses from examples in polynomial time and some extensions. Math. Comput. Model. 27(1), 75–99 (1998).
F Pawley, A Syder, The one-clause-at-a-time hypothesis. Perspect. Fluen, 163–199 (2000).
JR Quinlan, Induction of decision trees. Mach. Learn. 1(1), 81–106 (1986).
JR Quinlan, C4.5: Programs for Machine Learning (Morgan Kaufmann Publishers Inc., San Francisco, 1993).
L Breiman, JH Friedman, RA Olshen, CJ Stone, Classification and Regression Trees (Chapman & Hall, New York, 1984).
PL Hammer, A Kogan, B Simeone, S Szedmák, Pareto-optimal patterns in logical analysis of data. Discrete Appl. Math. 144(1-2), 79–102 (2004).
S Alexe, PL Hammer, Accelerated algorithm for pattern detection in logical analysis of data. Discret. Appl. Math. 154(7), 1050–1063 (2006). Discrete Mathematics and Data Mining II (DM and DM II).
TO Bonates, PL Hammer, A Kogan, Maximum patterns in datasets. Discret. Appl. Math. 156(6), 846–861 (2008). Discrete Mathematics and Data Mining II.
RS Michalski, I Mozetic, J Hong, N Lavrac, in Proceedings of the Fifth AAAI National Conference on Artificial Intelligence, AAAI'86. The multi-purpose incremental learning system AQ15 and its testing application to three medical domains (AAAI Press, 1986), pp. 1041–1045.
RE Reinke, in Machine Intelligence 11, ed. by JE Hayes, D Michie, and J Richards. Incremental Learning of Concept. Descriptions: A Method and. Experimental Results (Clarendon Press Oxford, 1988).
SN Sanchez, E Triantaphyllou, J Chen, TW Liao, An incremental learning algorithm for constructing Boolean functions from positive and negative examples. Comput. Oper. Res. 29(12), 1677–1700 (2002).
R Feraund, OJ Bernier, JE Viallet, M Collobert, A fast and accurate face detector based on neural networks. IEEE Trans. Pattern Anal. Mach. Intell. 23(1), 42–53 (2001).
P Viola, MJ Jones, Robust real-time face detection. Int. J. Comput. Vis. 57(2) (2001).
L Lefakis, F Fleuret, in NIPS. Joint cascade optimization using a product of boosted classifiers, (2010).
MJ Saberian, N Vasconcelos, Learning optimal embedded cascades. IEEE Trans. Pattern Anal. Mach. Intell. 34(10), 2005–2018 (2012).
C Shen, P Wang, S Paisitkriangkrai, A van den Hengel, Training effective node classifiers for cascade classification. Int. J. Comput. Vis. 103:, 326–347 (2013).
H Li, Z Lin, X Shen, J Brandt, G Hua, in The IEEE Conference on Computer Vision and Pattern Recognition (CVPR). A convolutional neural network cascade for face detection, (2015), pp. 5325–5334.
VC Raykar, B Krishnapuram, S Yu, in ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining (KDD). Designing efficient cascaded classifiers: tradeoff between accuracy and cost, (2010).
M Chen, Z Xu, KQ Weinberger, O Chapelle, D Kedem, in AISTATS. Classifier cascade for minimizing feature evaluation cost, (2012).
J Sochman, J Matas, in The IEEE Conference on Computer Vision and Pattern Recognition (CVPR). WaldBoost - learning for time constrained sequential detection, (2005).
T Wu, SC Zhu, in ICCV. Learning nearoptimal costsensitive decision policy for object detection, (2013).
MM Dundar, J Bi, in The IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Joint optimization of cascaded classifiers for computer aided detection, (2007).
C Zhang, P Viola, in NIPS. Multipleinstance pruning for learning efficient cascade detectors, (2007).
X Zhu, D Ramanan, in The IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Face detection, pose estimation, and landmark localization in the wild, (2012), pp. 2879–2886.
K Mahkonen, JK Kämäräinen, T Virtanen, in Computer Vision - ACCV 2014 Workshops. Lifelog scene change detection using cascades of audio and video detectors (Springer, 2014), pp. 434–444.
J Lankinen, JK Kämäräinen, in VISAPP (1). Video shot boundary detection using visual bagofwords, (2013), pp. 788–791.
RuleQuest Research, C5.0. http://rulequest.com/download.html. Accessed 2018.
O Rudovic, S Petridis, M Pantic, in Proceedings of the 21st ACM International Conference on Multimedia. Bimodal loglinear regression for fusion of audio and visual features (ACM, 2013), pp. 789–792.
S Petridis, V Rajgarhia, M Pantic, Comparison of Single-model and Multiple-model Prediction-based Audiovisual Fusion, ISCA Speech Organisation (2015).
H Rao, Z Ye, Y Li, MA Clements, A Rozga, JM Rehg, in Joint Conference on Facial Analysis, Animation and Audio-Visual Speech Processing (FAAVSP). Combining acoustic and visual features to detect laughter in adults' speech, (2015), pp. 153–156.
Acknowledgements
Discussions with Prof. Jiří Matas elevated the research from implementing ad hoc ideas to carefully considered research. The authors would also like to thank Prof. Bhaskar Rao for useful discussions related to cascades and BFs. Furthermore, the authors want to thank the anonymous reviewers for their well-informed comments, which improved the manuscript.
Funding
Funding for this research was provided by the Tampere University of Technology.
Availability of data and materials
The MAHNOB Laughter dataset is located at https://mahnob-db.eu/laughter/. The CASA dataset is located at http://arg.cs.tut.fi/demo/CASAbrowser/.
Author information
Authors and Affiliations
Contributions
KM has written the manuscript and implemented and executed the experiments. TV has been involved in designing the experiments as a supervisor and helped KM in writing the manuscript in a solid scientific way. JK gave the initial idea of utilizing Boolean functions for combining classifiers. He also provided his help in software related problems. All authors read and approved the final manuscript.
Corresponding author
Ethics declarations
Ethics approval and consent to participate
People appearing in the videos of Mahnob Laughter dataset have given their consent for data usage for research purposes.
Consent for publication
Not applicable.
Competing interests
The authors declare that they have no competing interests.
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
About this article
Cite this article
Mahkonen, K., Virtanen, T. & Kämäräinen, J. Cascade of Boolean detector combinations. J Image Video Proc. 2018, 61 (2018). https://doi.org/10.1186/s13640-018-0303-9
DOI: https://doi.org/10.1186/s13640-018-0303-9