In this section, we evaluate the accuracy of the proposed scheme in terms of object recognition, and we estimate its energy consumption when implemented on constrained sensors.
4.2. Target Recognition and Performance Analysis
The proposed scheme combines the Haar wavelet decomposition with the transformation of ring projection (TRP) [43] to extract object features. An object is recognized when the distance between the extracted features and the features of a reference image falls below a predefined threshold. To demonstrate the algorithm's capabilities, we tested its performance on images of 64 × 64 pixels at 8 bpp and 128 × 128 pixels at 8 bpp. We also developed a dataset of 168 images covering six animals (horse, wolf, deer, elephant, rhino, and tiger), because the standard MPEG-7 dataset [55] used in the literature does not reproduce the different motion patterns created by moving objects captured by an object-tracking application. We therefore generated 28 8-bit grayscale images for each animal class, each showing the animal in a different orientation, obtained by placing the object in different positions and applying different levels of scaling and rotation (Appendix A). The image dataset was divided as follows: 60% of the images were used to train the algorithm and learn the threshold values, 25% were used to evaluate the algorithm's accuracy, and the remaining 15% served as reference images.
Every image in the dataset undergoes the following feature extraction process: (1) apply the Haar wavelet transformation to extract the 2D approximate coefficient vectors; (2) apply the transformation of ring projection (TRP) to convert these coefficients into a 1D feature vector that is invariant to object rotation. The combined Haar-TRP feature extraction yields only 12 feature vectors to represent the extracted object. This small number of features implies lower memory requirements and reduced computational complexity for the signature-matching algorithm.
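To make the pipeline concrete, the following Python sketch outlines one plausible form of this two-step extraction using NumPy and PyWavelets. The function names, the decomposition depth, and the choice of 12 rings are our own illustrative assumptions rather than the authors' implementation; the ring count simply mirrors the 12-element feature vector reported above.

```python
import numpy as np
import pywt  # PyWavelets

def haar_approx(image, levels=2):
    """Step 1 (sketch): multi-level 2D Haar decomposition, keeping only
    the approximate (low-low) coefficients as a coarse object representation."""
    coeffs = image.astype(float)
    for _ in range(levels):
        coeffs, _ = pywt.dwt2(coeffs, 'haar')  # drop the detail sub-bands
    return coeffs

def ring_projection(coeffs, n_rings=12):
    """Step 2 (sketch): average coefficients over concentric rings around
    the centre. Ring averages are unchanged by rotation, so the 1D output
    is rotation-invariant; n_rings = 12 is an assumed parameter."""
    h, w = coeffs.shape
    yy, xx = np.indices((h, w))
    radius = np.hypot(yy - (h - 1) / 2.0, xx - (w - 1) / 2.0)
    ring_idx = np.minimum((radius / radius.max() * n_rings).astype(int),
                          n_rings - 1)
    return np.array([coeffs[ring_idx == k].mean() if (ring_idx == k).any()
                     else 0.0 for k in range(n_rings)])

def haar_trp_features(image):
    """Combined Haar-TRP feature extraction (illustrative composition)."""
    return ring_projection(haar_approx(image))
```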
Figure 4 shows results obtained by applying the Haar-wavelet-based feature extraction and the GFD to all images of a specific animal class across different orientations. The colored curves represent the cumulative i-th feature vector of each image in the class dataset. Comparing Figure 4a,b for the horse class, the Haar-based feature curves are almost identical, with less variation between them than in the GFD-based algorithm. The remaining graphs in the same figure show the same pattern: the features extracted with the Haar-based algorithm remain more highly correlated across differences in object orientation than those extracted with the GFD technique. This attests to the stability of the Haar-wavelet-based features in representing the animal and to their invariance to the rotation, scaling, and translation applied to the object. Moreover, the total number of object descriptors is reduced from 52 feature vectors with the GFD-based algorithm to 12 with the Haar-based algorithm. This reduction frees up memory space and minimizes the transmitted data size.
As described earlier, the extracted features were compared against a target signature pre-loaded during the configuration setup. If the distance between the detected signature ($F_d$) and the reference signature ($F_r$) is less than a threshold ($T$), the detected object is declared a target and the sensor notifies the user. Otherwise, the detected object is ignored.
We adopted a minimum Euclidean distance (MED) metric [56] to find the possible discrimination threshold values. The minimum Euclidean distance is computed as follows:
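Assuming the standard minimum-Euclidean-distance form over the 12-element Haar-TRP feature vectors (our reconstruction of the equation), Equation (18) reads:

$$\mathrm{MED}(F_d) = \min_{j} \sqrt{\sum_{i=1}^{12} \left( f_i^{d} - f_i^{r,(j)} \right)^2} \qquad (18)$$

where $f_i^{d}$ denotes the $i$-th component of the detected signature $F_d$ and $f_i^{r,(j)}$ the $i$-th component of the $j$-th reference signature.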
We repeatedly applied Equation (18) to each native class in the image dataset, calculating the minimum and maximum ED between each training set in the native class and each training set in the other classes, as shown in Figure 5. After applying this process to all classes, we obtain a set of possible threshold values.
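In code, this threshold search might look like the following sketch (the data layout and all names are our own; each class's training features are assumed stacked row-wise in an array):

```python
import numpy as np

def candidate_thresholds(features_by_class):
    """Sketch of the per-class threshold search: a class is separable when
    its maximum intra-class ED is smaller than its minimum ED to any other
    class; any value between the two is a candidate threshold."""
    candidates = {}
    for name, feats in features_by_class.items():   # feats: (n_images, 12)
        intra = max((np.linalg.norm(a - b)
                     for i, a in enumerate(feats) for b in feats[i + 1:]),
                    default=0.0)
        inter = min(np.linalg.norm(a - b)
                    for other, of in features_by_class.items() if other != name
                    for a in feats for b in of)
        if intra < inter:
            candidates[name] = (intra + inter) / 2.0   # one possible choice
    return candidates
```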
We conducted six rounds of experiments to determine possible threshold values that optimize object recognition for all six classes of images. The performance of the object recognition algorithm is evaluated by measuring retrieval performance with the precision and recall metrics given in Equations (19) and (20), where TP is the number of true positive (relevant) classified images, FP is the number of false positive (irrelevant) classified images, and N is the total number of relevant shapes in the dataset.
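With these variables, the standard retrieval definitions (our reconstruction of the two equations) read:

$$\mathrm{Precision} = \frac{TP}{TP + FP} \qquad (19)$$

$$\mathrm{Recall} = \frac{TP}{N} \qquad (20)$$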
Classification and retrieval metrics at each threshold value are illustrated in
Table 3. Note that setting the threshold to 7 achieves excellent classification efficiency across all animal classes. However, as we decrease the threshold value, we note a decrease in classification efficiency. The lowest classification efficiency of 92% was scored at a threshold value of 0 for the wolf animal class. The precision metric was stable at 100% across all animal classes for threshold values ranging from 0 to 5.
Nevertheless, as we increase the threshold to 7, precision drops as low as 94% for the horse class. The recall metric achieves a perfect score at threshold 7; as we decrease the threshold, recall can fall as low as 92% at a threshold value of 0 for the wolf class. We can conclude that a higher threshold improves both classification efficiency and recall, but retrieval precision decreases gradually. Therefore, we select the middle threshold values of 3 and 5 to test object recognition accuracy on the testing dataset.
According to these results, our proposed scheme provides a robust and accurate shape descriptor for recognizing and identifying an object, capturing the significant features of the sensed object better than the GFD-based recognition scheme. As shown in Table 4, the precision of the GFD-based feature vectors varies within a short range of ED threshold values (0.155 to 0.26), which makes it difficult to decide reliably whether the target belongs to a specific class. In contrast, the Haar-based scheme expands this ED range, as shown in Table 3, and improves both the precision and recall of the classification.
For the horse class, for example, the highest GFD threshold value achieves 100% image retrieval but poor classification precision: as the threshold increases, the precision of the GFD-based algorithm drops from 100% to 33%. With the Haar-based algorithm, the highest possible threshold value for the same class still maintains at least 94.1% classification accuracy, and the precision stays within the high range of 94.1% to 100%.
4.3. Energy Consumption Efficiency Analysis
The presented distributed processing scheme starts with the cluster-establishment phase, in which the camera selects a set of nodes to participate in the image processing tasks. The scheme has been decomposed into a set of atomic sub-tasks illustrated in Table 5. We quantified the energy consumed by each sub-task for in-node processing and communication using the Avrora simulator, a tool that emulates the internal resources and processing of sensor nodes such as Mica and TelosB. For instance, the energy consumption of a centralized Haar-based implementation, where the camera node executes all image processing sub-tasks, reaches 4.02 mJ for 64 × 64 8 bpp images and 4.86 mJ for 128 × 128 8 bpp images (see Table 5).
On the other hand, distributing the image processing tasks across more than one node enables the camera node to initiate more sensing cycles, eventually extending the network lifetime.
A typical distribution would assign each sub-task to an individual node, so a cluster would consist of the camera and three cooperating nodes. However, empirical results show that distributing the sub-tasks between two cooperating nodes is the optimal energy-saving choice because it avoids excess communication overhead and an unfair distribution of the processing load.
We investigated the energy consumption and elapsed time of cluster establishment, in which the camera selects possible candidate processing nodes, as shown in Table 6. Note that we neglect the energy consumed by the camera node to capture the image.
After the camera forms the processing cluster and the selected co-nodes acknowledge their participation roles, the camera and P1 cooperate to carry out object identification and feature extraction. The energy consumption for image sizes of 64 × 64 8 bpp and 128 × 128 8 bpp is summarized in Table 7 and Table 8 for the Haar-based and GFD-based distributed schemes, respectively.
When the first co-node (P1) receives the region of interest (ROI) from the camera, it extracts the object descriptors from the background and normalizes the object to the center of the image in the Haar-based scheme; this step is unnecessary for GFD because of its invariance to object orientation, as illustrated in Section 3. P1 then sends the normalized approximate coefficients to the second co-node (P2), which applies the transformation of ring projection (TRP), converting the 2D approximate coefficients into 1D feature vectors for the matching process.
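Schematically, this division of labor might be expressed as follows. This is a sketch only: the mean-threshold segmentation and centroid centring are crude placeholders of our own, and ring_projection refers to the helper from the earlier feature-extraction sketch.

```python
import numpy as np
import pywt

def camera_stage(frame, roi):
    """Camera: crop the detected region of interest and apply the 2D Haar
    decomposition, forwarding the approximate coefficients to P1."""
    approx, _ = pywt.dwt2(frame[roi].astype(float), 'haar')
    return approx

def p1_stage(approx):
    """P1 (Haar-based scheme only): separate the object from the background
    (mean thresholding stands in for the real segmentation) and centre it
    using its intensity centroid."""
    mask = approx > approx.mean()
    obj = np.where(mask, approx, 0.0)
    cy, cx = np.argwhere(mask).mean(axis=0)
    shift = (round(obj.shape[0] / 2 - cy), round(obj.shape[1] / 2 - cx))
    return np.roll(obj, shift, axis=(0, 1))

def p2_stage(obj, reference, threshold):
    """P2: apply the TRP (2D coefficients -> 1D signature), then perform
    the Euclidean-distance match against the reference signature."""
    signature = ring_projection(obj, n_rings=reference.size)
    return float(np.linalg.norm(signature - reference)) < threshold
```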
Table 9 and Table 10 detail the balanced distribution of the processing load between P1 and P2 for both distributed schemes.
When the target is recognized, the scheme will notify the end-user with different possible notification message types.
Figure 6 shows the energy consumed when a camera node sends a simple 1-byte notification to the end-user. In addition, we show the energy consumption when the extracted feature vectors are added to the notification packets sent to the end-user. Using the feature set extracted with the Haar-based scheme as a notification option will decrease the energy needed for transmission compared to the extracted set based on GFD.
Figure 7 summarizes the total energy consumption and elapsed time for the camera and the cooperating processing nodes P1 and P2. As shown, the Haar-wavelet-based scheme distributes the processing load more evenly than GFD for both 64 × 64 8 bpp and 128 × 128 8 bpp images. The processing energy is shared among the nodes: for a 64 × 64 8 bpp image, the camera node consumes around 27% of the total energy of a single sensing cycle, while the cooperating nodes consume the remaining 73%. For a 128 × 128 8 bpp image, these shares reach 37% for the camera node and 63% for the collaborating nodes. In each new sensing cycle, the camera elects candidate nodes for cluster participation based on the highest residual energy. This step distributes the total collaborative energy consumption over the alive nodes, consequently extending the node lifetime.
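The per-cycle co-node election reduces, in essence, to a top-k selection by residual energy. A minimal sketch, assuming the camera keeps a simple table of node energies (the node names and dictionary bookkeeping are illustrative):

```python
import heapq

def elect_co_nodes(residual_energy, k=2):
    """Camera-side election: pick the k alive nodes with the highest
    residual energy to serve as P1 and P2 in the next sensing cycle."""
    return heapq.nlargest(k, residual_energy, key=residual_energy.get)

# Example: nodes n2 and n3 currently hold the most residual energy.
energy_joules = {'n1': 0.81, 'n2': 0.95, 'n3': 0.93, 'n4': 0.78}
print(elect_co_nodes(energy_joules))  # -> ['n2', 'n3']
```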
Figure 8 compares our presented scheme's energy consumption and elapsed time with those of the general Fourier descriptor (GFD)-based feature extraction. As we infer from Figure 8a,b, with Haar wavelet decomposition the amount of energy consumed by P1 and P2 is the same regardless of the size of the original image, because the Haar decomposition output is always reduced to a fixed size of 64 × 64 at 8 bpp. We also note that applying the Haar wavelet to extract features considerably preserves the camera energy for the larger 128 × 128 8 bpp images. Node P2 is responsible for measuring the distance between the extracted signature and the reference signature, and the length of the feature vector determines the complexity of this matching; therefore, P2 consumes more energy with the 52-element GFD feature vector than with the 12-element Haar feature vector. There are also noteworthy energy savings in node P1, since the TRP process is less demanding than the GFD process. Figure 8c,d show that Haar wavelet decomposition decreases the time needed for a single cycle compared to the GFD method, and that the time required by the Haar-based method is fixed regardless of the image size.
Figure 9 plots the energy consumption level in the first five sensing cycles to study the energy distribution between the camera and the ten cooperating nodes. In each sensing cycle, the camera selects the two nodes with the highest residual energy to participate in the processing cluster, prolonging the network lifetime as much as possible. The presented cumulative energy consumption levels show that each node is selected once during every five sensing cycles. The figure also shows that the camera participates in every sensing cycle, owing to its leading role in the processing cluster, while the other nodes are selected alternately based on their residual energy level. Therefore, the camera energy is depleted before that of any other processing node. The same behavior is observed for the 128 × 128 8 bpp image size, as shown in Figure 9b.
Figure 10 shows the centralized implementation of the Haar wavelet scheme, where the camera is responsible for processing all recognition tasks. The graph plots the cumulative energy consumption in the camera from the first sensing cycle until the camera depletes its energy. We note that the centralized scheme can reach only 20 sensing cycles.
On the other hand, the distributed implementation of Haar-based image recognition extends the camera lifetime, as demonstrated in Figure 11. As the sensing cycles proceed, each node retains a different residual energy depending on its participation role in the processing cluster. Figure 11 plots the per-node energy consumption (estimated average) every ten sensing cycles. In the distributed implementation of the Haar-based recognition scheme, the number of sensing cycles extends from 20 in the centralized implementation to 70. Note that when the image size is 128 × 128 8 bpp, the number of sensing cycles decreases from 70, as shown in Figure 11a, to 40, as shown in Figure 11b.
This decrease can be attributed to the increased energy consumption in the camera node when it decomposes the larger image size (128 × 128 8 bpp) to extract the Haar coefficients. As the sensing cycles iterate, the consumed energy accumulates, yet it does not exceed 30% of the per-node residual energy for either image size. This implies that the scheme preserves at least approximately 70% of the per-node residual energy level, a promising indicator that the proposed scheme can be applied efficiently within a multi-application network that reuses the nodes for other sensing purposes.
Figure 12 shows the energy consumption of a distributed scheme based on GFD. The figure plots the average camera residual energy in contrast to the average residual energy in network nodes.
We can see that the proposed scheme prolongs the camera lifetime: it accomplishes 70 sensing cycles instead of 40 for the 64 × 64 8 bpp image size, as presented in Figure 11a and Figure 12a, and 40 sensing cycles instead of the 10 achieved with GFD for the 128 × 128 8 bpp image size, as presented in Figure 11b and Figure 12b.
Moreover, we can infer that using the Haar wavelet also preserves the average residual energy level of the network nodes, which remains at least 70%, the same residual energy level obtained with GFD despite the larger number of completed sensing cycles.
As illustrated in Figure 13, the elapsed time per sensing cycle decreases remarkably to 2 s with the Haar-based scheme, whereas GFD needs between 5 and 8 s depending on the image size. In the time the GFD-based scheme takes to accomplish only 10 sensing cycles, the Haar-based scheme performs 20 sensing cycles for the 64 × 64 8 bpp image size and 40 sensing cycles for the 128 × 128 8 bpp image size.
This is evidence that minimizing the number of candidate collaborating nodes raises the distributed processing load on those nodes: using only two collaborating nodes rapidly drains their energy because each node's selection probability is maximized. Conversely, spreading the tasks over other network nodes helps the camera process more sensing cycles, extending its lifetime.
Previous work in the literature has investigated various approaches to energy efficiency in event-based multimedia sensing. We can roughly classify this work, as shown in Table 11, along two dimensions: the processing model and the implementation approach. Like our work, some approaches distribute the processing across more than one node, while others adopt a centralized approach in which a single node executes all the work. It is also noted that some approaches were implemented using hardware components instead of a software solution.
In [30], the authors presented a centralized event-based detection solution using centroid distance and histogram methods. A multimedia node was shown to consume 47.6 mJ for a 64 × 64 8 bpp image and 80.2 mJ for a 128 × 128 8 bpp image, much more than the energy required by our proposed solution. Furthermore, the centroid distance exhibits a low level of accuracy as a descriptor for target recognition [57].
In our previous work published in [31], we compared the energy efficiency of a GFD-based scheme and a Zernike moments (ZMs)-based scheme for event-based object recognition. We found that the energy consumption using ZMs can reach 9.995 mJ for 64 × 64 8 bpp images and 16.8 mJ for 128 × 128 8 bpp images, while the GFD-based scheme requires 4.02 mJ and 4.86 mJ, respectively. With the new distributed implementation proposed here, the energy consumption of the camera node decreases to 1.4 mJ instead of the 2.46 mJ consumed with GFD (approx. 50% reduction) for all image sizes. This energy saving extends the camera lifetime and improves the network performance.
In [25], the authors introduced an energy-aware face-detection algorithm that sends a lightweight feature vector to the sink at a low transmission cost. Despite its demonstrated low energy consumption, this technique can flood the network with unnecessary data when the end-user is looking for a specific target. Performing the recognition algorithm in the network rather than at the sink spares the bandwidth from irrelevant data and consequently increases the network lifetime.
Other research has used energy-aware image compression to reduce the amount of data transmitted through the network. In [5], the authors proposed an energy-aware scheme for image transmission in WMSNs that achieves low-overhead data compression based on the curve fitting technique. The reported results demonstrated energy efficiency compared with similar data compression algorithms; however, compared with the proposed distributed scheme, our approach achieves much better performance in terms of energy consumption.
In [19], Kouadria et al. used a discrete Tchebichef transform (DTT)-based image compression technique. Due to its lower complexity, DTT compression is an alternative to the discrete cosine transform (DCT). However, experimental results have shown that it consumes a considerable amount of energy per 8 × 8-pixel block (around 146.63 mJ, a very high level of consumed energy).
In [27], the authors introduced a distributed compression algorithm that consumes around 1.4 J to compress a 512 × 512 8 bpp image, which is an extremely high level of energy consumption. In the same context, G. Nikolakopoulos et al. presented a compression scheme based on quad-tree decomposition in [36]; it consumed 120 mJ to transmit a 128 × 128 8 bpp image and 45 mJ to transmit a 64 × 64 8 bpp image.
To reduce the high energy consumption associated with software implementations of compression algorithms, an approach based on hardware implementation was proposed in [28]. Although the hardware implementation increases the cost, it ensures a significant gain in energy. However, we have shown that we can identify and recognize events of interest without altering the sensor design while maintaining a low energy consumption level.
In conclusion, the distributed implementation reduces the processing load on the camera sensor by transferring tasks to other cooperating nodes, which extends the sustainability of the application and allows the camera node to execute more sensing cycles. The performance evaluation shows that the presented scheme outperforms the other solutions proposed in the literature in terms of the energy consumed for target recognition in image-based sensing, consequently extending the lifetime of the multimedia application in the wireless sensor network.