1. Introduction
Microalgae are widely distributed in the ocean and greatly affect the ocean environment [1]. Monitoring the categories and growth states of microalgae is important, as it can help explain and forecast changes in the marine ecological environment and reduce losses from toxic blooms [2,3]. The common methods to observe and identify microalgae are based on optical microscopy, which is labor-intensive and requires specialized knowledge [4].
PCR is a vital tool to identify the categories of microalgae, but sample preparation is time-consuming [5]. A spectrophotometer can measure the absorption spectrum of microalgae, which can be used to analyze their density and biomass [6]. In addition, the acoustic backscattering signal correlates well with the abundance of microalgae within a certain concentration range [7]. However, these methods are based on the analysis of bulk volume, which limits their application in further detailed classification. Recently, some tools have been developed to assist automatic phytoplankton taxonomy. Li et al. [
8] introduced an imaging system to monitor marine organisms with sizes ranging from 200 μm to 40 mm. Göröcs et al. [
9] reconstructed holographic diffraction images to analyze natural water samples. However, these imaging methods are limited in speed, resolution, and field of view, and reach a bottleneck when dealing with micron-sized algae.
Scattering measurements have the advantage of characterizing the physical microstructure of different microparticles. Katherine et al. [10] classified different suspensions by scattering-intensity features at multiple angles. Ye et al. [11] measured overall microparticle size using the scattering spectrum. Polarization is an inherent property of light [12]. Polarized light scattering, as an emerging tool, has been applied to characterize different states of microalgae [13,14], cancerous tissues [15], atmospheric microparticles [16], and microplastics [17].
Chami et al. [18] demonstrated the potential of using the polarized signal to analyze biogenic and highly refractive particles in coastal waters. Koestner et al. [19] used polarized light scattering measurements to characterize the particle size and composition of natural assemblages of marine particles. Based on a pre-trained model, the states and categories of particles can be recognized from a mixture. Chen et al. [20] quantitatively studied the flocculation process with polarized light scattering. Wang et al. [21] applied polarization parameters to recognize different states of Microcystis aeruginosa and proposed an early warning strategy. In short, these works show the significance and value of a polarized light scattering dataset. However, no such dataset covering diverse microalgae yet exists, which limits the application of optical polarization tools in monitoring microalgae in water.
In this work, a dataset obtained by polarized light scattering measurement is presented, containing the polarization parameters of 35 categories of marine microalgae. For each category, 10 states of polarization (SOPs) of incident light are applied in turn to illuminate the samples, and for each SOP there are more than 1000 records of the particles. To analyze the dataset, several machine learning algorithms are applied and compared to build a classifier that identifies the different categories. This work compares linear discriminant analysis (LDA) and different types of support vector machine (SVM). The results show that the non-linear SVM performs best among these algorithms. Then, two data preparation approaches for the non-linear SVM are compared. Subsequently, we show that more than 10 categories of microalgae in the dataset can be identified with an accuracy greater than 0.80. With the proposed technique and the dataset, these microalgae can be well differentiated by polarized light scattering.
4. Discussion
The four polarization parameters of each record are [I, q, u, v]; they are derived from the Stokes vectors of the scattered light, which depend on the incident SOP. To find the best incident SOP and reduce the detection complexity, we discuss the performance of different incident SOPs and different combinations of features, using the classification accuracy averaged over all categories in Table 1, obtained with the non-linear SVM and OVO, as the evaluation measure.
Figure 3 summarizes the accuracy for all the incident SOPs and different combinations of polarization parameters. When the input was [I, q, u, v], all the SOPs achieved an accuracy of about 0.85; the 150° linear SOP was slightly better than the other SOPs, and the 120° linear SOP was the worst. Compared with the classification accuracy derived from I alone, the accuracy was greatly improved by the addition of the polarization features.
However, the relative contributions of the polarization parameters u and v were not the same. For the SOPs E, L, and R, the addition of the parameter v resulted in a higher accuracy than the addition of u. For the other linear SOPs, in contrast, the parameter u brought more effective information than the parameter v.
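The comparison of feature combinations can be sketched as a small ablation loop. The sketch below is only an illustration under stated assumptions: it keeps the one-vs-one (OVO) voting scheme but uses a nearest-centroid rule as a stand-in for the non-linear SVM, it scores on the training records themselves, and the toy records, class names, and function names are hypothetical rather than taken from the dataset.

```python
from collections import Counter
from itertools import combinations

COLUMNS = {"I": 0, "q": 1, "u": 2, "v": 3}

# Hypothetical toy records [I, q, u, v]; NOT values from the dataset.
RECORDS = [
    [1.00, 0.20, 0.10, 0.00], [1.10, 0.25, 0.12, 0.02],    # class "A"
    [1.00, -0.30, 0.10, 0.00], [1.10, -0.25, 0.08, 0.01],  # class "B"
    [1.05, 0.20, 0.10, 0.50], [1.00, 0.22, 0.11, 0.45],    # class "C"
]
LABELS = ["A", "A", "B", "B", "C", "C"]


def centroid(rows):
    """Mean feature vector of a list of records."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def train_ovo(features, labels):
    """Store one centroid per class and enumerate every class pair."""
    by_class = {}
    for row, label in zip(features, labels):
        by_class.setdefault(label, []).append(row)
    centroids = {label: centroid(rows) for label, rows in by_class.items()}
    return centroids, list(combinations(sorted(centroids), 2))


def predict_ovo(model, row):
    """Each class pair casts one vote; the class with the most votes wins."""
    centroids, pairs = model
    votes = Counter()
    for a, b in pairs:
        closer = a if sq_dist(row, centroids[a]) <= sq_dist(row, centroids[b]) else b
        votes[closer] += 1
    return votes.most_common(1)[0][0]


def accuracy(records, labels, names):
    """Training-set accuracy when only the named parameters are used."""
    cols = [COLUMNS[n] for n in names]
    features = [[rec[c] for c in cols] for rec in records]
    model = train_ovo(features, labels)
    hits = sum(predict_ovo(model, f) == y for f, y in zip(features, labels))
    return hits / len(labels)


if __name__ == "__main__":
    for names in (["I"], ["I", "q", "u"], ["I", "q", "v"], ["I", "q", "u", "v"]):
        print(names, accuracy(RECORDS, LABELS, names))
```

Replacing the nearest-centroid rule with a trained binary SVM per class pair would leave the voting and the feature-selection loop unchanged.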
The Mueller matrix, $M$, is a 4 × 4 matrix that describes the polarization property of the particle and relates the incident SOP, $S_{in}$, to the scattered SOP, $S_{out}$, that is, $S_{out} = M S_{in}$. The 16 elements of $M$ contain different aspects of the physical information of particles. Usually, we have to change $S_{in}$ four times and measure the respective $S_{out}$ to calculate $M$. We can always obtain the Stokes vector in a single shot, but it is hard to obtain $M$ in that way, especially for suspended particles. Therefore, in the dataset, we can only provide the $S_{out}$ of the individual microalgae for a given $S_{in}$, as in the data used for the classifications in Figure 2, and at most we can obtain the statistical $M$ but not the individual $M$ of the microalgal cells [26].
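The four-measurement procedure mentioned above can be sketched as a linear solve: each row of the Mueller matrix is recovered from four incident Stokes vectors and the corresponding scattered Stokes vectors. This is a minimal sketch, not the measurement pipeline of this work; the function names, the choice of incident states (horizontal, vertical, 45° linear, and circular), and the matrix values are all assumptions made for illustration.

```python
def solve_linear(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    aug = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("incident Stokes vectors are linearly dependent")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [x / scale for x in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [row[-1] for row in aug]


def apply_mueller(m, s):
    """Scattered Stokes vector for Mueller matrix m and incident vector s."""
    return [sum(mij * sj for mij, sj in zip(row, s)) for row in m]


def recover_mueller(incident, scattered):
    """Recover a 4 x 4 Mueller matrix from four (incident, scattered) pairs.

    Row i of the matrix solves: sum_j m[i][j] * incident[k][j] = scattered[k][i]
    for the four measurements k.
    """
    return [solve_linear(incident, [s[i] for s in scattered]) for i in range(4)]


# Assumed incident states: horizontal, vertical, 45 degree linear, circular.
INCIDENT = [[1, 1, 0, 0], [1, -1, 0, 0], [1, 0, 1, 0], [1, 0, 0, 1]]

# Hypothetical Mueller matrix, used only to generate synthetic measurements.
M_TRUE = [
    [1.0, 0.2, 0.0, 0.0],
    [0.2, 0.9, 0.0, 0.0],
    [0.0, 0.0, 0.7, 0.1],
    [0.0, 0.0, -0.1, 0.6],
]

if __name__ == "__main__":
    measured = [apply_mueller(M_TRUE, s) for s in INCIDENT]
    for row in recover_mueller(INCIDENT, measured):
        print([round(x, 6) for x in row])
```

The solve requires all four pairs to come from the same particle, which illustrates why a single-shot measurement of a suspended particle yields one Stokes vector rather than the full matrix.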
Considering that different incident SOPs bring different information regarding the elements of the Mueller matrix, it is important to discuss the contribution of the elements based on the dataset. Moreover, the quantitative contributions of the elements will guide us in improving the detection speed of the system by reducing the number of modulated incident SOPs, and in finding the optimal SOP to better characterize and classify these categories of microalgae, which is important for fast field probing applications.
The 150° linear SOP and the right-handed circular SOP are further discussed using Mueller matrix theory. For simplicity, we consider spherical particles. Theoretically, for spherical particles, the top right and bottom left elements of the Mueller matrix are zero [27]. For a given incident Stokes vector, the Stokes vector of the scattered light can be calculated by Equation (5), where I is the scattered intensity and q, u, v are the normalized polarization parameters of the scattered light.
Usually, the standard SOP of incident light can be represented as $[1, 1/2, -\sqrt{3}/2, 0]^T$ for 150° and $[1, 0, 0, 1]^T$ for R. The Stokes vector of the scattered light then follows from Equation (5) for the 150° linear SOP and for the incident SOP R.
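As a worked illustration of this step, the two scattered Stokes vectors can be written out under assumptions that are not taken from the text: the Mueller matrix elements are labelled $m_{ij}$, "top right and bottom left elements" is read as the two off-diagonal 2 × 2 blocks, a linear SOP at angle $\theta$ is represented as $[1, \cos 2\theta, \sin 2\theta, 0]^T$, and right-handed circular light as $[1, 0, 0, 1]^T$ (the sign of the last component depends on the handedness convention).

```latex
\begin{aligned}
M &=
\begin{pmatrix}
m_{11} & m_{12} & 0 & 0 \\
m_{21} & m_{22} & 0 & 0 \\
0 & 0 & m_{33} & m_{34} \\
0 & 0 & m_{43} & m_{44}
\end{pmatrix}, \\[4pt]
S_{out}^{150^\circ} &= M \left[ 1,\ \tfrac{1}{2},\ -\tfrac{\sqrt{3}}{2},\ 0 \right]^T
= \left[ m_{11} + \tfrac{1}{2} m_{12},\ \ m_{21} + \tfrac{1}{2} m_{22},\ \ -\tfrac{\sqrt{3}}{2} m_{33},\ \ -\tfrac{\sqrt{3}}{2} m_{43} \right]^T, \\[4pt]
S_{out}^{R} &= M \left[ 1,\ 0,\ 0,\ 1 \right]^T
= \left[ m_{11},\ \ m_{21},\ \ m_{34},\ \ m_{44} \right]^T .
\end{aligned}
```

Under these assumptions, the last two components carry $m_{33}$ and $m_{43}$ for the 150° linear SOP but $m_{34}$ and $m_{44}$ for R, so the two incident SOPs probe different elements of the bottom right block.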
The classification performance of the 150° incident SOP is clearly better than that of the incident SOP R. Comparing the two derived Stokes vectors of the scattered light, each carries information from Mueller matrix elements that the other does not. Since the 150° incident SOP performs better than R, the elements distinctive to the 150° linear SOP may contain more important information for the classification task than the information distinctive to R.
However, regarding the contributions of the polarization parameters u and v, the parameter v contributes less information than the parameter u for the linear SOPs, while the parameter u contributes less information than the parameter v for the circular SOPs. Thus, it seems that the off-diagonal element of the bottom right block of the Mueller matrix is less useful than either of the diagonal elements of that block.
The result in Figure 2 shows that the non-linear SVM performs best in classifying these categories of microalgae based on Stokes vectors. However, this result does not explicitly reveal the contributions of the specific polarization parameters, since the classifier learns the polarization parameters of the Stokes vectors in a high-dimensional feature space. To verify the above analysis with the Mueller matrix, we select two categories of microalgae to quantitatively evaluate the contribution of the polarization parameter v and the symmetry of the polarization states. Both Isochrysis and Chattonella marina are easily distinguishable from other categories of microalgae. Their biological features have been described previously [28,29]. The result is shown in Table 5; note that the relative difference is the difference between the accuracies retrieved with [I, q, u, v] and with [I, q, u], divided by the accuracy retrieved with [I, q, u]. Moreover, the precision of these two categories of microalgae can be found in
Table 6 and
Table 7, which show the classification performance of the trained model on these two categories. The two circular SOPs of the incident light, R and L, have theoretical Stokes vectors of $[1, 0, 0, 1]^T$ and $[1, 0, 0, -1]^T$. The classification performances with [I, q, u, v] and [I, q, u] were compared (Table 5); the SOPs R and L yield close classification accuracies, and the contribution of the feature v is about 0.12 in both cases. This analysis indicates that the incident SOPs L and R perform similarly in the classification task on the dataset, which agrees with the theoretical explanation based on the Mueller matrix.
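The relative difference reported in Table 5 can be written as a short helper. The function name and the two accuracy values in the usage lines are hypothetical and are not taken from the tables.

```python
def relative_difference(acc_with_v, acc_without_v):
    """Gain from adding v, relative to the accuracy retrieved with [I, q, u].

    acc_with_v:    accuracy retrieved with [I, q, u, v]
    acc_without_v: accuracy retrieved with [I, q, u]
    """
    if acc_without_v <= 0:
        raise ValueError("reference accuracy must be positive")
    return (acc_with_v - acc_without_v) / acc_without_v


if __name__ == "__main__":
    # Hypothetical accuracies, chosen only to show the calculation.
    print(round(relative_difference(0.90, 0.80), 3))
```

Because the measure is normalized by the [I, q, u] accuracy, the same absolute gain counts for more when the reference accuracy is low.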
With the analysis above, we suggest that the information carried by the parameter v under circular illumination be retained in the polarized light scattering measurement, and that a circular incident SOP be included in further measurements in addition to a linear incident SOP such as the 150° linear SOP. Only one incident SOP may be allowed when the conceptual prototype is deployed to classify suspended microalgae in the aquatic field, so an optimal incident SOP must be designed based on these considerations.
Although most microalgal cells are not uniform spheres in morphology and structure, previous literature demonstrates that the top right and bottom left elements of their Mueller matrices are approximately zero [30]. Therefore, the discussion based on Equation (5) is still valid. In the experiments in this work, we measured 10 polarization states, which is time-consuming. The discussion gives clues for reducing the number of modulated polarization states; the 150° linear SOP and the circular SOP are suggested for inclusion in future applications.
The results in this work indicate that, given the diversity and complexity of marine microalgae, polarization data combined with machine learning algorithms are a feasible way to classify them effectively, and that more comprehensive methods and more abundant data would achieve a better classification performance. Fast-developing machine learning methods will provide further tools [31], and Mueller matrix polarimetry of individual microalgae may be expected in the near future; both would improve the classification ability. Moreover, we note that optical microscopy is still the standard method to identify the species of microalgae [4], and that the morphological information and internal structure, to which polarization parameters are sensitive, play vital roles in the classification of diverse microalgae [32,33]. In addition, an in situ prototype based on polarized light scattering was readily built and demonstrated to be powerful in classifying particles in seawater [34]. As such, the combination of microscopy and polarized light scattering is a promising direction and may provide a way to classify microalgae in water accurately and rapidly.