1. Introduction
A gravitational wave (GW) signal was first observed by LIGO, which successfully confirmed the prediction of Einstein’s general relativity (GR) [
1]. The signal sweeps upwards in frequency, from 35 to 250 Hz, with a peak gravitational wave strain of
[
2]. However, gravitational waves have important astronomical sources in the millihertz (mHz) range (i.e., 0.1–100 mHz) [
3]. In order to detect the important gravitational waves at low frequencies, it is necessary to go into space. The proposed space-borne detection plans, LISA [
4], TianQin [
3], and Taiji [
5], use laser interferometric systems (see
Figure 1) constructed using three identical satellites, each carrying two telescopes for laser pointing. Nevertheless, the interferometric measurements for the experiment are only possible once the three laser links between the three identical spacecrafts are acquired. The attitude of the spacecraft is measured by the star sensor (SSR) and provides an initial rough range for the laser link scanning; in other words, the attitude measurement capacity of the SSR can influence the efficiency of the link docking. Hence, the method used to elevate the measurement capacity of the SSR is important for laser acquisition. This paper aims to improve the generation method of the star catalogue, in order to improve the measurement capacity of the SSR.
In astronomy, star data in the celestial sphere are compiled into a catalogue, according to different requirements. The catalogue is called a star catalogue (or star catalog). In celestial navigation, a GSC (guide star catalogue) is the unique basis for a star sensor (SSR), which facilitates star identification [
6]. An SSR is key in celestial navigation. Compared with sun sensors, earth sensors, magnetometers, horizon sensors, and other common attitude measurement sensors, SSRs have a higher attitude measurement accuracy, can realise navigation independently, and have a strong anti-interference ability. At present, they are the most important attitude sensors used in satellites and other spacecraft. The ability of an SSR to determine the attitude depends on the existence of an excellent navigation star catalogue. Generally, a preferable star catalogue for navigation is characterised by a lower number of guide stars, a higher completeness, and a much more uniform distribution of the celestial sphere [
7].
The most frequently used strategy for selecting guide stars for an SSR is the magnitude filtering method (MFM). Considering the starlight sensitivity of the SSR, a fixed minimum visual magnitude threshold (VMT) is set, and a maximum, “VMT_1”. Stars that are brighter than or equal to the VMT and dimmer than or equal to VMT_1 in the catalogue are chosen as guide stars [
8]. The correct identification of a star atlas in a local celestial area (<100 square degrees) requires at least three guide stars among the observation stars, and at least five guide stars for the entire celestial star identification process [
9]. After processing by MFM, the stars are still unevenly distributed in the celestial sphere, and according to statistics, the maximum is about 10 times larger than the minimum. If we set the value of VMT_1 too small (in other words, the maximum value of the visual magnitude of the star is too small), the GSC will be incomplete, leading to a lack of stars in some directions (called holes). This will lead to a reduction in the success rate of star identification and measurement misalignment [
10,
11]. However, if the value of VMT_1 is too large, the star number also increases, leading to redundancy in the GSC. As a result, the efficiency of star identification may be reduced. Completeness, uniformity, and redundancy are the three most important characteristics of a GSC. It is necessary to probe the relationships among them when selecting a GSC.
There are many studies that have considered the uniformity, completeness, and redundancy of star catalogues. Farshad et al. [
12] aimed to ensure the uniformity of the database for an SSR, and a weighted K-means clustering method (GWKM) was proposed, on the basis of SKM [
13] and GKM [
14]. The results of this research showed that the number of stars was reduced while promoting uniformity, improving the refresh rate of star identification and the efficiency of SSRs.
Ivan et al. [
15] pointed out that the efficiency of star identification depends on the quality of the GSC. They decreased the number of stars (NOS) in dense areas, while the sparse areas had no change. The average NOS in the FOV was, thus, decreased. Finally, the distribution of the stars became more symmetrical and bell-shaped. Additionally, instrumental stellar magnitude estimation and a lower bound evaluation method were also discussed. The onboard star catalogue was sufficient for middle-precision SSRs; however, it was not satisfactory for high-accuracy SSRs.
When the celestial sphere is divided geometrically, the most representative method is the inscribed cube algorithm [
16]. An inscribed cube is used to evenly divide the celestial sphere into six parts, where each of the six parts is then divided into
smaller parts. Finally, an evenly and non-overlapping partition of the guide star selection is realised. However, Li et al. [
7] pointed out that the solid angles are different in size for each sub-block, especially when the optical axis of the SSR is around the celestial pole. They proposed a novel method to generate a “quasi-uniform” star catalogue, in order to solve the differences in solid angles. There were 2664 sub-blocks that made up the whole celestial sphere in total, where the reference was the solid angle corresponding to
. According to the statistical results, the probability of at least three guide stars being in the FOV was more than 99.9%.
According to Zhang et al. [
17], except for neural networks, most methods for recognising a star atlas need a star catalogue. At present, MFM is the most popular method used to generate star catalogues. However, there are holes and redundancy problems when using the MFM. Zhang proposed a Support Vector Machine (SVM) to generate a star catalogue. Their results showed that the SVM has high flexibility. This is a successful case of applying a classification algorithm for the generation of a GSC. Sun et al. [
18] combined NOS with Boltzmann entropy [
19] within a circular region as a feature vector and then used an SVM to select stars to generate an even GSC. Through an experimental comparison, the SVM algorithm was shown to be much better than the MFM, in terms of the distribution uniformity. On the basis of Sun’s method, Liu et al. [
20] used a sphere spiral algorithm to create uniform sampling data and then used the SVM to select the guide star. Finally, defined global and local criteria were used to assess the performance of the created GSC. By comparing a self-organising algorithm (S-OA), the MFM, and a magnitude weighted method (MWM), they found that the proposed method was optimal and had strong adaptability while preserving uniformity.
In summary, the MFM can be used to obtain a streamlined GSC. However, there are still holes and redundancy problems. To solve this problem, many methods have been devised. The SVM and other algorithms, as detailed above, have been introduced to address the uniformity and redundancy of the GSC. However, the redundancy and completeness are still difficult to make compatible. Depending on the nature of the mission, one of them generally must be selectively sacrificed. At present, detection missions are more complex. SSRs have also made comprehensive progress. More attention has been paid to the GSCs for multi-field SSRs and high dynamic SSRs, and the generation of an appropriate GSC is still very important. Most of the work has focused on the completeness, uniformity, and redundancy of the GSC. However, the attitude determination accuracy of SSRs has become increasingly important in space missions, while accuracy criteria have not been considered in most GSC generation methods. The SVM classification algorithm has a good effect on the generation of GSCs. In the machine learning field, there exist many classification algorithms that can achieve the same or an even better classification effects than the SVM. However, few works have discussed the advantages, disadvantages, and adaptability of these algorithms for the generation of GSCs.
This paper is devoted to creating a uniform and complete star catalogue by taking advantage of the K-Nearest Neighbours classification algorithm. The NOS in the FOV of an SSR is considered, in order to ensure sufficient accuracy for an SSR. It is chosen as an important star selection criterion. Firstly, we used datasets, after the MFM and double star processing, as the celestial star data. Subsequently, a sequential scanning of the entire celestial sphere was carried out, and training and validation datasets were obtained. Then, we use thed samples to identify the performances of the classification algorithms. We found that the K-Nearest Neighbours (KNN—82.0%) and Decision Tree (DC—82.0%) performed the best. Finally, we used SGD (Stochastic Gradient Descent), KNN, DC, NN (Neural Network), RF (Random Forest), SVM, and GB (Gradient Boosting) to generate star catalogues, and a test considering 10,000 random boresight directions of the entire celestial sphere was conducted. On the basis of statistical analysis, the KNN classification algorithm was found to be far superior to the MFM and SVM—which are normally used to generate GSCs—in terms of completeness, uniformity, and redundancy. Furthermore, the star catalogue can guarantee an SSR accuracy within 1 arc sec.
2. Star Numbers and SSR Accuracy
The method of establishing a correspondence between measured stars in the FOV and an onboard GSC is called star pattern recognition. According to Liebe [
10,
11,
21], the NOS in the FOV of an SSR is connected with the accuracy of star identification. For an SSR, in order to explore the relationship between the NOS in FOV and accuracy, the first thing to consider is the average NOS in the FOV. It is considered that the GSC is sufficiently evenly spread over the entire celestial sphere. The relationship between them is as follow:
where
A represents the circular FOV which is
A deg wide. The corresponding part of the sky covered by the FOV is
, which is used as one part of Equation (
1).
M is the magnitude sensitivity limit; this is the NOS that are brighter than a set magnitude. The value of
M is evaluated by sketching the correlation between the magnitude and the NOS which are brighter than the set magnitude. According to [
11],
, which is another part of Equation (
1). Finally, the average NOS in the FOV is given by Equation (
1).
The relationships between the average NOS in FOV, the FOV size, and the visual magnitude are shown in
Figure 2. As we can see, the average NOS in the FOV increases with the increase of the size of FOV and the value of
VMT, within a certain range. The lower figure in
Figure 2 shows that, when
VMT = 5 and
or
and
, about three stars and eight stars, respectively, appear on average in the FOV. The attitude accuracy is far less than 1 arc sec. To satisfy the accuracy requirements, and referring to
Figure 2, there must be at least 18 stars in the FOV. Thus, an FOV of
and a VMT of 8 were chosen as the parameters of the SSR.
The high frequency error NEA (noise equivalent angle) is a random measurement error. It can be regarded as the capacity of an SSR to reappear at the identical attitude [
11]. NEA errors are caused by device noise, stray light interference (e.g., from the sun, earth, and bright stars), timing jitter, model error (stellar spectral distribution), non-uniformity of tracking magnitude, and so on. In this paper, we guarantee the NEA to meet the theoretical attitude accuracy of the SSR. The NEA can represent the accuracy of the SSR, and a few parameters are used to estimate the NEA. The cross-boresight (not the optical axis) NEA [
11] is as follow:
where
is the FOV of this SSR (generally
to
;
is considered in this paper) and
is the average centroiding accuracy. At present, the star image points on the sensitive surface of the image sensor are defocused, in order to spread to more pixels to obtain higher star centroid localisation accuracy. The diameter of the diffusion circle is generally 3 to 5 pixels, typically ranging from 0.01 to 0.5. We assume the
is 0.1 pixel.
represents the number of pixels across the focal plane, typically ranging from 256 to 1024 (here, it is taken as 1024).
is the NOS in this FOV, as
Figure 2 shows. In this paper,
is a vital parameter to create a star catalogue, as we are committed to creating a star catalogue that can provide high-accuracy attitude information.
The boresight (optical axis) NEA of an SSR can be estimated as follows: First, suppose that the quadratic focal plane of this SSR is
pixels. The average distance from the star mapped to the focal plane to the center of it is calculated as:
As the optical axis is the roll axis, there is an uncertain spatial range on the roll axis. We set it as
. The accuracy of a single star is derived from its geometric relationship with
:
Statistically, the measurement of the centroid positions of multiple stars in a frame is independent and uncorrelated, and thus, this measurement is used to improve the entire attitude accuracy. Hence, the parameter
is introduced. Finally, we obtain the roll (optical axis) accuracy for the entire attitude estimate as follows:
Figure 3 shows the relationship between the accuracy of three axes and the NOS in the FOV. As the focal length of the SSR is much larger than that of the image sensor in the optical system, the attitude accuracy error of the SSR in the direction of the optical axis (usually the “roll” axis) is about 6–16 times larger than that in the two directions on the focal plane (“yaw” and “pitch”), as shown in
Figure 3.
3. Data and Pre-Processing
Full-sky star catalogues commonly used for satellite attitude determination are listed in
Table 1. There are many stars in the catalogue, but not all of them will be used in celestial navigation. Usually, it is necessary to refer to the nature of the specific tasks, in order to obtain the stars satisfying the mission from the basic catalogue, then to construct the on-board navigation catalogue.
The Smithsonian Astrophysical Observatory Star Catalogue was compiled, in 1966, from the Smithsonian Observatory and other catalogues. In 1979, the radian information of equatorial co-ordinates and the cross-validation information of SAO/HD/DM/GC were added. These data have been updated, according to the latest observation information in 1984, based on J1950.0. The recent catalogue was released in 1990, and was based on J2000.0. The star table contains 258,997 celestial bodies with very accurate positions and motions [
22]. For satellite attitude calculation by an SSR, the position accuracy factor is far greater than the magnitude accuracy [
21]. The position accuracy of the SAO Catalogue can reach
, and the magnitude accuracy is
, so this meets the requirements of attitude calculation well. The catalogue also covers the whole sky, which means that it has a high completeness. For these reasons, we chose it as our basic star catalogue.
The data of the SAO includes 258,997 rows of star information, where each row includes 204 bytes, representing 57 types of information about the stars. What we are most concerned with is the visual magnitude, the celestial right ascension of J2000.0, and the celestial declination of J2000.0. Due to the fact that stars are far away from us, the field angles of stars are much less than 1 arc second, when observed from earth. Stars with a very close sight (called double stars, not binary stars, in astronomy) cannot be distinguished from each other on the imaging plane, and will interfere with the star identification process. Therefore, the general star identification algorithm is not appropriate for double stars [
23]. Double stars are deleted directly, in general. However, when the NOS in the FOV is very small, each one is vital for the identification. In view of this, we combined such stars into one, in order to ensure the completeness of the GSC.
Suppose
and
are the double star’s magnitudes, and the right ascension and declination are
,
, and
,
, respectively. The direction vectors
and
of the two stars are calculated from their right ascension and declination, and the visual magnitude of the combined star is
. The combined brightness is
F and the direction vector is
. The optical flux density can represent the brightness of stars, and the luminance ratio of the double star is expressed by Equation (
6). Then, we combine Equations (
6) and (
7) and obtain Equation (
8). Finally, the brightness of composed star is expressed by Equation (
9).
Assume that the angular distances among this combined star with the double star are
and
. The angular distance between these two double stars is
. Then, we obtain Equation (
10). As
is determined through the calculation of their right ascension and declination, we can obtain the values of
and
. The direction vector,
, of the composed star is obtained by Equation (
11).
According to Zhang [
23], in the process of star centroid extraction, the threshold of binarisation is
T and the double star must satisfy the minimal distance
d:
where
,
, and
B represents the brightness of a star. For an SSR with a
FOV and
pixels,
d is determined as 4 pixels, which is about
. The basic star catalogue, SAO, was reduced to a new star catalogue called “MFM” by setting
and processing the double stars.
Figure 4 and
Figure 5 show the distribution and density of the SAO and MFM star catalogues, respectively.
Figure 4a and
Figure 5a exhibit the distribution density of the GSC in the celestial sphere, in which the colour in the colour bar represents the magnitude of the density, and the colour above the colour bar represents the NOS in this region. It is shown that the SAO catalogue’s distribution is more dense than that of the MFM catalogue, and the colour in
Figure 4a is darker (greener). Based on
Figure 4a and
Figure 5a, we can make an intuitive conclusion that the MFM catalogue has advantages, in terms of data capacity; namely, redundancy. Both SAO and MFM have high completeness.
Figure 4b,c and
Figure 5b,c show the statistics of the NOS distributed over the entire celestial sphere. In the direction of right ascension and declination, they are equally divided into 50 parts, and the corresponding NOS in each range is counted. In these figures, on the right ascension, stars are most concentrated at about
and
; on the declination, stars are most concentrated at about
. The distribution of stars in the GSC obtained by the MFM is similar to that of the GSC SAO, but the NOS is greatly reduced.
Figure 4d and
Figure 5d show that the nuclear density line of the GSC stars in the celestial sphere. The nuclear density curve more clearly shows the distribution of stars in the sky. The right colour scale shows the size of the nuclear density. The larger the scale is, the higher the nuclear density is. That is to say, stars are densely distributed in this part of the celestial sphere. Comparing the numerical values and the nuclear density curves, the distribution of SAO is denser and the MFM is relatively more uniform.
6. Discussion
In this paper, we considered the generation of a GSC as a classification problem and used a variety of machine learning classification algorithms to generate GSCs. Overall, the KNN was found to be the best classifier. All of the methods generated a GSC using the same steps, the only difference being the classifier. The KNN algorithm is simple and easy to implement; there is no need to estimate parameters, and it is suitable for rare event classification; however, it displays poor interpretability (a common problem of machine learning algorithms) and is highly dependent on data—wrong data may directly lead to inaccurate predictions.
The reasons why the KNN was found to be the best method to generate a GSC herein are as follows: (1) The sample set was classified with more crossed or overlapping class domains, as the KNN method mainly depends on a limited number of neighbouring samples, rather than distinguishing class domains. Most of the star feature samples are overlapped or crossed, which is suitable for a KNN. (2) Compared with big data, the samples used to generate a GSC are relatively few, and the KNN algorithm is suitable for small samples.
For different classification problems, machine learning classification algorithms have different performance; further, KNN algorithms have the disadvantage of unexplainability. These two points are recognised in the field of machine learning and need further research. However, the KNN algorithm is exactly suitable for the guide star generation problem studied in this paper.
The uniformity of the KNN GSC needs to be further improved. Somayehee et al. [
12] created a uniform GSC using a novel method of weighted k-means clustering (a machine learning algorithm) with geodesic criteria. However, the main work of this paper was obtaining a machine learning classification that can satisfy high-precision attitude measurement accuracy, for which we identified the KNN algorithm, based on uniformity, completeness, and redundancy standards, which were based on a more comprehensive analysis. Further improvement in GSC uniformity will be pursued in future research.
7. Conclusions
Differing from most other works, we focused on generating a uniform and complete GSC, while ensuring that it can provide an accuracy of at least 1 arc second for an SSR. First, we used the MFM method to create a basic GSC. Then, we constructed the training and validation datasets (with a ratio of 3:1) and compared the accuracy of 7 different machine learning classification algorithms, in order to determine the best method. KNN and DC were the two best, both of which had an accuracy of . After comparison, KNN was determined as the best algorithm for generating a GSC. Three criteria—accuracy, uniformity, and completeness—were considered. Accuracy was the basic criterion. After 10,000 Monte Carlo random experiments, a smaller capacity was obtained and a more uniform GSC was generated by the KNN algorithm. The redundancy of the SSR was also reduced. This method was shown to be much better than the commonly used SVM classification methods. KNN algorithm is introduced for the first time in the construction of a navigation star catalogue. Moreover, the mean NOS in the FOV was and the minimum NOS was 15. Both achieved an attitude accuracy of 1 arc second. There were 16,689 stars in the catalogue, providing slight redundancy for star identification, and the proposed method can provide an accuracy of at least 1 arc second for an SSR, while also providing robust uniformity and completeness.
In summary, GSC generation is vital for star identification, and is also vital for attitude determination. Meanwhile, machine learning methods have incomparable advantages for data processing. The KNN classification algorithm was shown to be better than the SVM in generating a GSC. Furthermore, we added a new accuracy criterion to the generation standard of the GSC. This method is suitable for high-accuracy attitude determination.