1. Introduction
Artificial neural networks (ANNs) have become a staple of machine learning and are increasingly applied in embedded systems. To enable their use in mobile devices, among other platforms, their computational demands and power consumption have to be significantly reduced. This is crucial because mobile devices have limited computational resources and battery life and are often used for tasks that require real-time processing. Thus, the goal is to create ANNs that are lightweight and efficient, making them well suited for deployment in mobile devices and other resource-constrained environments.
One possible way of enabling low-energy computation is the employment of special-purpose hardware accelerators, which use the physics of waves to naturally compute convolution integrals. A well-known example in the literature is the surface acoustic wave (SAW) convolver, which uses the interference of two counter-propagating waves to perform a convolution through the physics of the system [1]. Another implementation of the device uses spin waves [2], which are very amenable to low-power, on-chip implementation of such hardware accelerators [3].
However, in a real physical system (such as a wave-based convolver), dissipation is unavoidable, and waves (especially spin waves) will decay over a certain distance. This will influence the performance and usefulness of the convolver. The purpose of this paper is to evaluate the performance of the convolver in the presence of such decay.
In this paper, we demonstrate and investigate a special kernel convolution that can serve as the cornerstone of a wave-based convolver device. Such a device can provide a fast and energy-efficient building block for a convolutional neural network without a significant decrease in test accuracy. With our simulations, we also identify the most important factors, such as the attenuation (decay), and their effect on classification accuracy for commonly investigated datasets. These results can help in the construction of a device by identifying the attenuation constraints under which the device can function with high accuracy.
Nowadays, convolutional neural networks (CNNs) have become a crucial tool for solving various artificial intelligence problems and have been proven to deliver state-of-the-art results across a wide range of applications. In particular, CNNs have shown great success in image processing, video analysis, and natural language processing tasks. These tasks typically involve large amounts of high-dimensional data, such as images and videos, or complex relationships between words and sentences in text data. CNNs have been successful in these tasks because of their ability to automatically learn hierarchical representations of the data and to identify and use the most discriminative features for a given task. As a result, they have become the go-to solution for many researchers and practitioners in these fields and continue to be an active area of research and development.
CNNs are used very successfully in the development of self-driving cars, for example in traffic sign recognition and identification [4,5,6], navigation [7], and 3D object recognition [8]; in agrarian object identification [9]; and in the medical field, for example in ECG signal classification and prediction [10], diabetic retinopathy recognition [11], thyroid nodule diagnosis [12], and lung pattern classification for interstitial lung diseases [13]. Their application has appeared even in mobile phones and embedded devices [14].
The main and most energy-consuming operation of these architectures is convolution, so the optimal implementation of this operation is extremely important.
In the case of mobile and embedded vision applications, energy-saving implementation is also an important consideration. Lightweight deep neural networks (such as MobileNets [15], Xnor-Nets [16], and spiking neural networks [17]) can be used to achieve a reduction in energy consumption. Another possibility is to use a special device (such as FPGA [18] or ASIC devices [19]) which is able to perform the given operation extremely efficiently, and thus the architecture will be faster, and energy consumption will decrease.
However, low-power devices, especially emerging and non-Boolean devices that exploit analogue and nonlinear device characteristics for computation, will not behave completely ideally, since the nonlinear dynamics of the device implementing the convolution may also affect the operation itself. That being said, this nonlinear phenomenon is not necessarily a disadvantage, as a neural network requires some nonlinear operation. Typically, nonlinear activations such as the rectified linear unit (ReLU) [20] or the scaled exponential linear unit (SeLU) [21] provide these characteristics in the architecture, and Wang et al. [22] showed that nonlinearity can be included in the convolution itself.
The authors introduced kernel convolution (kervolution), which was used to approximate the complex behaviors of human perception systems. Kervolution generalizes convolution via kernel functions, and the authors demonstrated that kervolutional neural networks (KNNs) can achieve higher accuracy and faster convergence than the baseline convolutional neural networks. Their work represents an important contribution to the field of CNNs, and the use of KNNs in real-world applications holds significant promise [22].
Neural network models containing kervolution can be effectively used in cases of anomaly detection, time series classification [23], and authorship attribution [24], among other applications. Furthermore, kervolution can be combined with left and right projection layers, thanks to which this model (ProKNN [25]) can be even more effective in certain situations.
In spread-spectrum communications, real-time surface acoustic wave (SAW) convolver devices have been known for a long time. These convolvers were also applied in programmable matched filtering to improve the signal-to-noise ratio, which was one of the first applications of surface acoustic wave devices and has important potential in many cases [1].
For example, this process has been widely used in radar systems, since it enables the range of the system to be extended for a given peak power limitation [26].
Similar to SAW devices, convolution could conceivably be implemented with spin-wave magnetic devices, which may allow for much lower energy consumption and computation at higher frequencies [27].
Spin-wave computing uses magnetic excitations for computations. Spin-wave majority gates are one of the most prominent device concepts in this field. Linear passive logic gates, which are based on spin-wave interference, are a type of technology that takes the most advantage of the wave computing paradigm and therefore holds the highest promise for future ultra low-power electronics [3].
The spin-wave circuits can also be embedded in complementary metal–oxide–semiconductor (CMOS) circuits, and these complete functional hybrid systems may outperform conventional CMOS circuits since, among other things, they promise ultra low-power operation. Currently, the main challenges for these spin-wave circuit systems are low-power signal restoration and efficient spin-wave transducers [3].
Furthermore, several methods have been proposed and studied for the development of spin-wave multiplexers and demultiplexers to greatly increase the data transmission capacity and efficiency of spin-wave systems [3].
Therefore, based on the factors described above, in this paper, we introduce a kervolutional neural network, where the kervolution is implemented by surface acoustic waves and the nonlinearities of the kervolutions are based on the characteristic function of magnetic devices.
2. Methods
In this section, we propose a special convolutional neural network architecture, which is inspired by physical ideas and does not contain additional classical nonlinear activation functions (such as ReLU or sigmoid functions). Instead, the system maintains nonlinearity through the physical properties of the simulated device, and these characteristics determine the attenuation and saturation of the convolutional or kervolutional layer.
During the implementation of our neural network, the primary consideration was to examine the physical effects that a device, specifically one developed to perform the operation of convolution, may have on an ideal, theoretical artificial neural network.
2.1. One-Dimensional Network
The real-time SAW convolver, which was the starting point in the implementation of our neural network architecture, can perform convolution only on one-dimensional inputs.
Thus, because of this hardware constraint, we built a one-dimensional convolutional neural network. During our simulations, both one- and two-dimensional datasets were investigated. We converted the 2D input data and the convolutional kernels to one-dimensional vectors and mapped them onto our simulated devices.
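The conversion of two-dimensional data to one-dimensional vectors can be sketched as follows. This is a minimal illustration that assumes row-major flattening; the function name `flatten_2d` and the toy values are hypothetical, and the mapping used in the actual implementation may differ.

```python
def flatten_2d(matrix):
    """Flatten a 2D list (rows of equal length) into a 1D list, row by row."""
    return [value for row in matrix for value in row]


# A 3 x 3 toy "image" and a 2 x 2 toy kernel (values chosen only for illustration).
image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
kernel = [[1, 0],
          [0, -1]]

signal_1d = flatten_2d(image)   # one-dimensional input signal
kernel_1d = flatten_2d(kernel)  # one-dimensional kernel signal
print(signal_1d)
print(kernel_1d)
```

Row-major order is only one possible choice; any fixed ordering would work as long as the same ordering is applied consistently to all inputs.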
2.2. Convolution
One of the main parts of a CNN is the convolutional layer. The convolution of functions f and g in one dimension can be described as follows:
$$(f * g)(t) = \int_{-\infty}^{\infty} f(\tau)\, g(t - \tau)\, d\tau \qquad (1)$$
Since our input signal is finite, the value of the function
f is zero outside a certain interval (for example,
). Outside this interval the integrand vanishes, so the integration limits can be restricted to it, and the formula can be rewritten as
This operation can be implemented by real-time SAW convolvers, such as a three-port elastic SAW convolver (Figure 1) under nonlinear operation [1].
The first port of such a device is the input signal port, the second port is the kernel port, and between these is the third one, which is the output or result port. The input and kernel signals can be excited at the edges of the device by external excitation, and the magnetic or electrical changes can be read out from the result port.
Using the Euler formula, the signal at port 1 can be expressed at time t along the z reference axis as follows:
where
is the signal modulation envelope as a function of the SAW velocity, where
and
.
The output of port 2 can be similarly expressed as
where the sign of
z is negative, since the signal propagates in the opposite direction.
In this case, the following waveform can be read out from the output port over the length
L of the thin-film metal plate:
where
P is a constant that is dependent on the nonlinear interaction strength. We can use a change of variable
and reformulate this equation as follows:
where
S is the input signal,
R is the kernel signal,
M is a constant dependent on the strength of the nonlinear interaction,
v is the velocity of the waves (signals),
j is the imaginary unit, and ω is the angular frequency of the signal [1].
Equations (1) and (6) differ in only two factors: the nonlinear dampening (
) at the beginning of the formula and that the argument of the kernel (
R) has
instead of
t. The reason for this difference (time compression) is that the signals are traveling toward one another (their relative velocity is 2v), and thus the interaction is over in half the time [1].
In the calculations, we studied a device that works similarly to real-time SAW convolvers but in which the waves exhibit strong damping. Therefore, the model is highly applicable to spin-wave-like convolvers, where damping is more significant [3].
In our simulation, which can be considered a baseline, a square signal (
) and a triangular signal (
) travel opposite each other, and the waves propagate in a nonlinear manner. Square and triangular signals were selected as case studies, since they can be easily described mathematically and depict the effect of convolution fairly well. Reading the signal at the intersection of the waves yielded the convolution of the two input signals. (In fact, one of the input signals had to be inverted in time to obtain convolution; otherwise, we would find the cross-correlation of the signals.) The simulation is illustrated in
Figure 2. The output signal was oscillatory; because its frequency is twice the original frequency of the input signals, we could filter the output signal and recover the convolution result.
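The remark about time inversion can be illustrated with a small discrete analogy. The sketch below is not a wave simulation; it only compares a discrete convolution with a discrete cross-correlation for a short square pulse and a short ramp. The function names and sample values are hypothetical, and a ramp is used in place of a symmetric triangle so that the effect of reversing the signal is visible.

```python
def convolve(f, g):
    """Full discrete convolution: out[n] = sum_k f[k] * g[n - k]."""
    out = [0.0] * (len(f) + len(g) - 1)
    for i, fi in enumerate(f):
        for j, gj in enumerate(g):
            out[i + j] += fi * gj
    return out


def cross_correlate(f, g):
    """Full discrete cross-correlation: out[n] = sum_k f[k] * g[k + len(g) - 1 - n]."""
    n_out = len(f) + len(g) - 1
    out = [0.0] * n_out
    for n in range(n_out):
        for k, fk in enumerate(f):
            idx = k + (len(g) - 1) - n
            if 0 <= idx < len(g):
                out[n] += fk * g[idx]
    return out


square = [1.0, 1.0, 1.0]  # short square pulse
ramp = [1.0, 2.0, 3.0]    # asymmetric stand-in for the triangular signal

conv = convolve(square, ramp)
corr = cross_correlate(square, ramp)
# Reversing one input in time turns the cross-correlation into the convolution.
corr_reversed = cross_correlate(square, ramp[::-1])

print(conv)
print(corr)
print(corr_reversed)
```

For a symmetric triangle the two operations would coincide, which is why an asymmetric signal is the more instructive test case here.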
SAW-Based Kervolution
In the physical system, the input signals attenuate as they propagate through the medium. Taking this phenomenon into account, we applied exponential attenuation to both the input signal and the kernel.
According to the properties of our physical system, we had to apply saturation after the element-wise multiplication. In fact, we used the following kernel convolution (
element of convolution) in our CNN architecture:
where
is the inner product of two vectors with a hyperbolic tangent (which means
, where
is the saturation of the system) and
is the following nonlinear mapping function:
where
i is the discrete time and
a is the attenuation parameter.
Figure 3 depicts the
function for different values of the
a parameter. (This attenuation formula can also be written in the following way:
, which means
with
).
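The structure of this operation (attenuate both operands, take their inner product, then saturate) can be sketched in a few lines. The exact formulas are assumptions made only for this illustration: sample i is scaled by exp(-i / a), and the saturated inner product is taken as s * tanh(dot / s), where s is the saturation level. The function names are hypothetical, and the implementation in the referenced repository may differ in these details.

```python
import math


def attenuate(x, a):
    """Assumed mapping: scale sample i by exp(-i / a); a larger a means weaker decay."""
    return [xi * math.exp(-i / a) for i, xi in enumerate(x)]


def kervolution_element(x, w, a, s):
    """One output element: saturated inner product of the attenuated patch and kernel.

    Assumed form: s * tanh(<phi(x), phi(w)> / s), with s the saturation level.
    """
    dot = sum(xi * wi for xi, wi in zip(attenuate(x, a), attenuate(w, a)))
    return s * math.tanh(dot / s)


def kervolve_1d(signal, kernel, a, s):
    """Slide the kernel over the signal (valid positions only, stride 1, no kernel flip)."""
    n = len(kernel)
    return [kervolution_element(signal[p:p + n], kernel, a, s)
            for p in range(len(signal) - n + 1)]


signal = [0.0, 1.0, 2.0, 3.0, 4.0]
kernel = [1.0, 0.5, 0.25]

weak = kervolve_1d(signal, kernel, a=1e9, s=1e6)   # nearly ideal: almost a plain inner product
strong = kervolve_1d(signal, kernel, a=1.0, s=10.0)  # strong decay and visible saturation
print(weak)
print(strong)
```

With a very large a and a very large s, each output approaches the plain inner product, so the ordinary sliding-window operation of a CNN is recovered as a limiting case of this sketch.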
3. Results
Starting from a simple convolutional neural network, our goal was to implement an architecture whose convolutional operation could be performed efficiently by a physical device. Thus, we introduced physical characteristics into the system to demonstrate the effects of these features. Then, we examined how our architecture (depicted in Figure 4) worked on several one- and two-dimensional datasets. For comparison, we also implemented a reference CNN that was similar to our neural network but used ordinary one-dimensional convolution. We used
kernels and 3 layers (2 layers with 8 kernels and 1 layer with 16 kernels), and after every convolutional layer, we applied ReLU as a nonlinear activation function in the reference CNN model, as shown in Figure 5. For the detailed parameter settings of both the network architectures and the training algorithms, please refer to the source code of our neural network model, which can be found in the following GitHub repository: https://github.com/andfulop/SpinWaveConvolver (accessed on 21 February 2023). The classification accuracy results of these CNNs on various datasets can be found in Table 1.
As a more complex two-dimensional case study, we investigated the well-known MNIST dataset, which contains 28 × 28-pixel images of handwritten digits and has a training set of 60,000 examples and a test set of 10,000 examples. Another two-dimensional dataset is Fashion-MNIST, an MNIST-like fashion product database with 10 classes that consists of 28 × 28-pixel grayscale images, with 60,000 training examples and 10,000 test examples. The evolution of the classification accuracies on the test set of the MNIST dataset with a traditional convolutional network and a kervolutional network implemented by an SAW convolver can be seen in Figure 6, and the confusion matrices of the trained architectures can be found in Figure 7. The corresponding results for Fashion-MNIST can be observed in Figure 8 and Figure 9, respectively.
We examined one-dimensional datasets as well. One of those is the Smartphone-Based Recognition of Human Activities and Postural Transitions Data Set Version 2.1 (HADB [28]). This consists of a smartphone’s accelerometer and gyroscope signals during 12 different activities (standing, walking, walking downstairs and upstairs, lying, etc.) for 30 subjects. The training set contains more than 7700 samples, while the test set contains roughly 3100 samples. The test accuracies on this dataset during training are depicted in Figure 10, and the confusion matrices of the trained architectures can be found in Figure 11.
Another examined one-dimensional database is the Ozone Level Detection Data Set [29]. We used the one-hour peak set from this dataset. The samples contained wind speed and temperature values measured at various times. These samples can be categorized into two classes: the normal day class and the ozone day class. The dataset has 2536 instances, and we selected the last 500 as an independent test set. The classification accuracy results for this dataset can be found in Table 1, along with the accuracy results for the previously mentioned datasets. As can be seen from the results in this table, the same network provided different mean accuracies on different problems, ranging from 77 to 92% depending on the complexity of the exact task. One can observe an approximately 6% performance drop in almost all cases (except the OZONE dataset), and this drop was independent of the original accuracy of the reference network. This demonstrates that an energy-efficient SAW convolver could provide a viable implementation for certain problems where this accuracy drop is acceptable.
The earlier results demonstrate that one can substitute convolution with kervolution for an approximately 6% drop in accuracy, which could enable the energy-efficient implementation of simple neural networks with SAW convolvers. However, an ideal neural network assumes that signals propagate with infinite speed and without attenuation or noise, which does not hold in a physical device. To demonstrate the practical usability of an SAW convolver, we investigated how an SAW convolver with different attenuation parameters would perform on the MNIST and HADB datasets. The test accuracies for both datasets can be seen in
Figure 12. As these plots demonstrate, if the attenuation parameter (a) was 9999 or larger, then the network reached a similar accuracy to that reported in Table 1, and a decrease from 99,999 to 9999 did not have a significant effect on the classification accuracy of the network. With a further decrease, as in the case of
, the accuracy of our implementation dropped significantly. This result can guide the physical design of the SAW convolver: materials and frequencies can be selected to ensure a sufficiently small level of attenuation.
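To give a feel for the scale of these attenuation parameters, the following sketch computes how much amplitude would remain at the end of a signal. It assumes an exponential law exp(-i / a) over the sample index i and a signal length equal to a flattened 28 × 28 image; both are assumptions made for illustration and may not match the attenuation formula or data mapping used in the experiments. The value a = 100 is a hypothetical strongly damped case, while 9999 and 99,999 are the values discussed above.

```python
import math


def remaining_amplitude(i, a):
    """Assumed attenuation law: fraction of the amplitude left at sample index i."""
    return math.exp(-i / a)


last_index = 28 * 28 - 1  # last sample of a flattened 28 x 28 image (assumed mapping)

# 9999 and 99999 appear in the text; 100 is a hypothetical strongly damped case.
for a in (100, 9999, 99999):
    print(f"a = {a:>6}: remaining amplitude = {remaining_amplitude(last_index, a):.4f}")
```

Under this assumed law, the two larger values leave more than 90% of the amplitude intact over the whole signal, while the small value suppresses the tail almost completely.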