Literature Review
Upon reviewing the existing remote sensing image datasets (RSIDs), we found that they were built for different targets or from different data sources. First, many RSIDs address land cover or scene classification. The University of California, Merced Land Use Dataset (UCLU) [13] is the first publicly available dataset for evaluating remote sensing image retrieval (RSIR) methods. It contains 21 classes, each with 100 images of 256 × 256 pixels; the images are cropped from large aerial images with a spatial resolution of approximately 0.3 m. The Wuhan University Remote Sensing dataset (WHU-RS19) [14] contains 19 classes with a total of 1005 images of 600 × 600 pixels, and the images span a wide range of spatial resolutions, up to 0.5 m.
Wuhan University published a remote sensing image dataset named RSSCN7 [15], which contains seven classes, each with 400 images of 400 × 400 pixels. Another group from Wuhan University published RSD46-WHU, a large-scale open dataset for scene classification in remote sensing images, which contains 117,000 images in 46 classes [16]. The ground resolution of most classes is 0.5 m, and that of the others is approximately 2 m. Northwestern Polytechnical University published NWPU-RESISC45, a benchmark dataset for remote sensing image scene classification [17]. Its 45 representative classes were selected by surveying the existing public datasets worldwide. Each class contains 700 images of 256 × 256 pixels, and the spatial resolution of the source images ranges from 0.2 to 30 m. The Aerial Image Dataset (AID) [18] is a large-scale dataset for scene classification. It contains 30 classes of buildings and residential and surface targets, with 10,000 images of 600 × 600 pixels in total. Each class contains approximately 220 to 420 images, and the spatial resolution varies between 0.5 and 8 m. These datasets are extracted from satellite or aerial imagery, such as Google Earth imagery or imagery from the United States Geological Survey (USGS).
A second category of remote sensing image datasets targets object detection. The Remote Sensing Object Detection Dataset (RSOD-Dataset) is an open dataset covering four kinds of objects: aircraft, oil tanks, playgrounds, and overpasses [19]. It contains 2326 images with spatial resolutions of approximately 0.5 to 2 m. The High-Resolution Remote Sensing Detection dataset (TGRS-HRRSD-Dataset) contains 55,740 object instances in 21,761 images with spatial resolutions from 0.15 to 1.2 m [20]. The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) hosts a challenge series on Object Detection in Aerial Images, which published DOTA-v1.0 and DOTA-v1.5. DOTA-v1.5 contains 0.4 million annotated object instances in 16 categories; its images are mainly collected from Google Earth and from the JL-1 and GF-2 satellites of the China Centre for Resources Satellite Data and Application.
The third category of remote sensing image datasets serves semantic classification. The Inria Aerial Image Labelling dataset [21] covers dissimilar urban settlements, including densely populated areas, with a spatial resolution of 0.3 m. The ground-truth data describe two semantic classes: building and not building. The National Agriculture Imagery Program (NAIP) dataset (SAT-4 and SAT-6) [22] samples image patches from a multitude of scenes (a total of 1500 image tiles) covering different landscapes, such as rural areas, urban areas, dense forests, mountainous terrain, small to large water bodies, and agricultural areas.
The fourth category of remote sensing image datasets serves remote sensing image retrieval (RSIR). PatternNet [11], a benchmark dataset for RSIR performance evaluation, contains 38 classes, each consisting of 800 images of 256 × 256 pixels. Open Images is a dataset of more than nine million images annotated with image-level labels, object bounding boxes, object segmentation masks, visual relationships, and localized narratives [23]. It is annotated with 59.9 million image-level labels spanning 19,957 classes.
The above image datasets are mostly extracted from satellite imagery, namely Google Earth imagery [11,13,14,15,16,17,18,20], Tianditu [16,19], China's GF-1 satellite, or the USGS [21,22].
RSIDs also include many UAV image datasets. The UZH-FPV Drone Racing dataset consists of over 27 sequences with high-resolution camera images [24]. The UAV Image Dataset (UAVid) contains eight categories of street-scene context in 300 static images of 4096 × 2160 pixels; this dataset targets semantic classification [25]. The AISKYEYE team at the Lab of Machine Learning and Data Mining, Tianjin University, China, presented VisDrone, a large-scale benchmark with carefully annotated ground truth for various important computer vision tasks, comprising 10,209 static images [26].
The VisDrone dataset was captured by various drone-mounted cameras and covers a wide range of aspects, including location. Peking University collected a drone image dataset, the Urban Drone Dataset (UDD), in Huludao city and Cangzhou city [27]. The newly released UDD-6 contains six categories for semantic classification. Graz University of Technology released the Semantic Drone Dataset, which focuses on the semantic understanding of urban scenes acquired at an altitude of 5 to 30 m above ground [28]. The Drone Tracking Benchmark (DTB70) is a unified tracking benchmark for the drone platform [29]. The King Abdullah University of Science and Technology (KAUST) released UAV123, a benchmark UAV tracking dataset [30] that supports object-tracking applications from UAVs.
However, publicly available image datasets dedicated to farm crops remain insufficient. High-resolution satellite imagery of middle-season corn has been collected at mid-growing season to identify within-field variability and to forecast corn yield at different sites within a field [3]. For weed detection in row crops [6] and evaluation of late blight severity in potato crops [31], tests have been conducted with UAV images and deep learning methods for classification. However, no existing article concentrates on middle-season rice.
In the existing remote sensing image datasets and the public image datasets such as ImageNet, the image samples always depict a single object or objects with clear patterns. These datasets support identifying the category of an object but cannot support fine-grained classification. They are therefore unsuitable for classifying one kind of feature that varies across places and time series.
In this paper, we propose a UAV image dataset for middle-season rice. We hope that the dataset can support monitoring the growth of middle-season rice and that, by applying deep learning methods to it, we can improve the accuracy and efficiency of the State Statistics Bureau in agricultural investigation. We selected five villages in two cities of Hubei Province for image acquisition from 2017 to 2019. The image samples have sizes of 128 × 128 and 256 × 256 pixels, the spatial resolution is 0.2 m, and the total number of samples exceeds 500,000 images. This image dataset represents a large area where middle-season rice is grown.
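To illustrate how such fixed-size samples can be produced, the sketch below tiles a large drone orthomosaic into non-overlapping 128 × 128 and 256 × 256 patches. The mosaic dimensions and array layout are hypothetical; the paper does not prescribe a particular extraction procedure.

```python
import numpy as np

def extract_patches(image: np.ndarray, patch_size: int):
    """Split an H x W x C orthomosaic array into non-overlapping
    square patches of the given size, discarding partial edges."""
    h, w = image.shape[:2]
    patches = []
    for top in range(0, h - patch_size + 1, patch_size):
        for left in range(0, w - patch_size + 1, patch_size):
            patches.append(image[top:top + patch_size,
                                 left:left + patch_size])
    return patches

# A hypothetical 1024 x 1024 RGB mosaic yields 64 patches at
# 128 x 128 and 16 patches at 256 x 256.
mosaic = np.zeros((1024, 1024, 3), dtype=np.uint8)
print(len(extract_patches(mosaic, 128)))  # 64
print(len(extract_patches(mosaic, 256)))  # 16
```

At 0.2 m resolution, a 128 × 128 patch covers roughly 25.6 × 25.6 m on the ground, which is on the scale of an individual rice field.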
The contributions of this paper are as follows.
First, we set up a high-resolution image dataset for middle-season rice using images collected weekly by drones during the growing period over three years in five villages of two cities. These images reveal the yearly growth of middle-season rice in the plain fields along the Yangtze River. This accumulative image dataset will help monitor the growth of the crops.
Second, we applied the vector information of the fields to tag the samples with spatial and temporal information automatically. We thereby obtained thousands of samples of middle-season rice from different periods and places. This automatic tagging method can extend the methodology for building remote sensing image datasets.
Last, we used a fine classification method to learn the spatio-temporal information of the middle-season rice. The image dataset can support different deep learning networks and achieve good results. This fine-classification strategy will extend the applications of the original deep learning algorithms.
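The automatic tagging in the second contribution can be sketched as a point-in-polygon lookup: a sample inherits the field identifier and acquisition date of the field polygon that contains its center coordinate. The following minimal Python sketch uses a plain ray-casting test; the field names, coordinates, and record layout are hypothetical, since the paper does not specify an implementation.

```python
from datetime import date

def point_in_polygon(x, y, polygon):
    """Ray-casting test: is point (x, y) inside the polygon
    given as a list of (x, y) vertices?"""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Count edge crossings of a horizontal ray to the right of (x, y).
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def tag_sample(center_xy, acquired, fields):
    """Return a {field, date} tag for a sample whose center falls
    inside one of the surveyed field polygons, else None."""
    for field_id, polygon in fields.items():
        if point_in_polygon(*center_xy, polygon):
            return {"field": field_id, "date": acquired.isoformat()}
    return None  # sample lies outside every surveyed field

# Hypothetical rice field polygon in projected map coordinates (meters).
fields = {"village1_field03": [(0, 0), (100, 0), (100, 80), (0, 80)]}
print(tag_sample((50, 40), date(2018, 7, 15), fields))
# {'field': 'village1_field03', 'date': '2018-07-15'}
```

In practice the field polygons would come from surveyed vector data (e.g., a shapefile) rather than hand-typed coordinates, but the lookup itself stays the same.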