Spatial and spectral remote sensing features to detect deforestation in Brazilian Savannas

The Brazilian Savannas have been under increasing anthropic pressure for many years, and land-use/land-cover changes (LULCC) have been largely neglected. Remote sensing provides useful tools to detect changes, but previous studies have not attempted to separate the effects of phenology from deforestation, clearing or fires to improve the accuracy of change detection without a dense time series. The scientific questions addressed in this study were: how well can we differentiate seasonal changes from deforestation processes combining the spatial and spectral information of bi-temporal (normalized difference vegetation index) NDVI images? Which feature best contribute to increase the separability on classification assessment? The study area is inserted in the northern of Minas Gerais State (MG), Brazil. We used Landsat images from 2015 and 2016 to apply an object-based remote sensing method that is able to separate seasonal changes due to phenology effects from LULCC by combining spectral and the spatial context using traditional spectral features and semivariogram indices, exploring the full capability of NDVI image difference to train random forest (RF) algorithm. We found that the spatial variability of NDVI values is not affect by vegetation seasonality and, therefore, the combination of spectral features and semivariogram indices provided high global accuracy (97.73%) to separate seasonal changes and deforestation or fires. From the total of 13 features, 6 provided the best combination to increase the separability on classification assessment (4 spatial and 2 spectral features). How to accurately extract LULCC while disregarding the ones caused by phenological differences in Brazilian seasonal biomes undergoing rapid land-cover changes can be achieved by adding semivariogram indices in combination with spectral features as input data to train RF algorithm.


Introduction
The Brazilian Savanna biome is among the most endangered ecoregions on Earth due to high rates of deforestation and few formally protected areas (Hoekstra et al. 2005), consisting of a mosaic of land cover types, undergoing a strong seasonality in climate. In these regions, a significant challenge in remote-sensing change detection is accurately extracting land-use and land-cover changes (LULCC) while disregarding those associated with phenological differences (Acerbi Junior et al. 2015;Silveira et al. 2018a;Silveira et al. 2018b). The effects of vegetation phenology lead to seasonal spectral changes (Schwieder et al. 2016) that are erroneously classified as having changed.
In a recent study, Verbesselt et al. (2010) applied a generic approach to detect LULCC, using breaks for additive seasonal and trend (BFAST). The methodology separated different disturbances such as deforestation, urbanization, flooding and fires from seasonal changes, using a dense time series of NDVI. Hamunyela et al. (2016) used the spatial context to improve early detection of deforestation from NDVI Landsat time series. The spatial context approach reduced the seasonality effects on LULCC detection in highly seasonal areas. However, they found challenges in the implementation of the method, that is a pixel-based approach, that is sensitive to registration errors and computational expensive (Zhu 2017).
The change detection using the object-based approach instead of the pixel one, allows the incorporation of spatial information, such as indices derived from semivariograms (a geostatistical tool) to both analyse spatial heterogeneity (Silveira et al. 2017a) and improve the LULCC without the need of using a dense time series (Silveira et al. 2018a). Balaguer et al. (2010) created a set of indices extracted from the semivariogram using high spatial resolution images. An advantage of the proposed set of features, as opposed to the raw semivariance values, is that they synthesize the most relevant information about the shape of the semivariogram in a few features. They identify the singular points and enhance the information contained on the first lags, where spatial correlation at short distances is higher. Although semivariograms have been applied to detect LULCC (Sertel et al .2007;Costantini et al. 2012;Acerbi Júnior et al. 2015), few studies have focused on change detection using an objectbased analysis, combining both the spectral and spatial information to improve deforestation detection in areas affected by vegetation seasonality.
Thus, the main objective of this study was to explore both the semivariogram indices (spatial features) and traditional spectral features derived from bi-temporal NDVI images to accurately detect deforestation in areas with strong influence of vegetation phenology, such as Brazilian savannas. The scientific questions addressed in this study were: how well can we differentiate seasonal changes from deforestation processes combining the spatial and spectral information of bi-temporal NDVI images? Which feature best contribute to increase the separability on classification assessment?
We used the group of spatial and spectral features derived from a pair of NDVI Landsat images to train random forest (RF) algorithm. The method does not require a time series of satellite images because it exploits the spatial and spectral domains of NDVI, reducing processing time and avoiding cloud coverage problems.

Material and Methods
This study exploits the potential of semivariogram indices and traditional spectral features extracted from NDVI Landsat bi-temporal images to detect deforestation in the Brazilian savanna biome. The method is summarized in six steps (Figure 1), which we described in detail in the following sections. (1) Images acquisition and processing; (2) Image segmentation by a multiresolution algorithm (3) Class definition; (4) Feature extraction from objects; (5) Change detection using Random Forest algorithm; (6) Evaluation by the confusion matrix.

S tudy area and remote sensing data
The study area is inserted in the northern of M inas Gerais State (M G), Brazil, between the coordinates 14º 00' to 16º 30'south latitude and 43º 00' to 46º 00' west longitude ( Figure  2a) and comprises the Rio Pandeiros Water Resources Planning and M anagement Unit (UPGRH-SF9). The predominant vegetation types are cerrado grassland (open grassland), cerrado sensu stricto (open grassland with sparse shrubs), deciduous forest (predominance of deciduous individuals whose loss of foliage reaches more than 50%), and palm swamp (veredas), a riparian type of vegetation (Silveira et al. 2017b). The climate is Aw, with distinct dry winter and wet summer. Approximately 90% of the rains are concentrated from October to April with annual precipitation ranging from 1,200 to 1,800 (Peel et al. 2006). In order to perform all the analysis, we selected 6 samples of 10 km² distributed in the UPGRH-SF9 (Figure 2b) based on the presence of LULCC and natural phenological changes.
Operational Land Imager (OLI)/Landsat-8 images from July 21, 2015 and July 23, 2016orbit 219, point 70 -have been downloaded from the US government institute that supports research involving geological surveys and Earth observation, the United States Geological Survey for Earth Observation and Science (USGS\EROS). We selected the images not only avoiding presence of clouds, but also selecting images around dry and wet months to maximize the effects of vegetation seasonality. All scenes were preprocessed to surface reflectance levels using the Landsat Ecosystem Disturbance Adaptive Processing System (LEDAPS) atmospheric and topographic correction algorithm (Vermote et al. 1997). We used the NDVI (NIR and red bands) of each period. Although NDVI is a simple remote sensing index, it minimizes the effect of shadows caused by the terrain's topography (Vorovencii 2014). We then applied an image differencing technique (Lu et al. 2004). This technique has long been used to highlight areas of image change quickly with minimal supervision and is still in use today, typically applied to image-objects (Tewkesbury et al. 2015).

Image segmentation
We applied the multiresolution segmentation algorithm (Baatz and Schäpe 2000) from the eCognition software, using the image differencing 2016-2015. Using object as unit analysis, pixels are not individually classified but are combined into homogenous groups (objects) and classified together (Chen et al. 2012). We used the following parameters: 0.3 for shape, 0.3 for compactness and 150 for scale parameter (SP). The most critical step is the selection of the SP, which controls the size of the image objects that directly influences the semivariogram-predefined criteria: lag distance (Silveira et al. 2018b). We adopted a trial and error approach (Duro et al. 2012) to find an ap propriate scale parameter that was defined to 150 based on a visual assessment of segmentation suitability. The objects generated were overlapped with the NDVI image differencing to extract the spatial and spectral features for change detection using RF algorithm.

Class definition
The study focused on two classes: (a) vegetation covers with seasonal changes in NDVI caused by phenology; and (b) vegetation covers with changes caused by deforestation/clearing and fires ( Figure 3). These classes were set up according to prior visual interpretation. We used a data set of 130 objects well distributed over the study area, with 65 sampled of each class.

Feature extraction
From the NDVI difference image, we extracted the spectral and spatial features inside the objects. We used as spectral features the M EAN and standard deviation (STDV) of the NDVI values. The spatial information was extracted from the experimental semivariograms (Equation 1), where the (h) is the estimator of the semivariance for each distance h, N (h) is the number of pairs of points separated by distance h, Z(x) is the value of the regionalized variable at point x, and Z(x+h) is the value at point (x+h): (1) Semivariance functions are characterized by three parameters: sill (σ²), range (φ) and nugget effect (τ²). The sill is the plateau reached by the semivariance values, measuring the variance explained by the spatial structure of the data. The range is the distance until the semivariogram reaches the sill, reflecting the distance at which the data become correlated. The nugget effect is the combination of sampling errors and variations in scale that occurs over scales of less than the distance between the sampled points (Curran 1988). We Adv. For. Sci., Cuiabá, v.6, n.4, p.775-782, 2019 attempted to find an optimal lag distance to ensure that sill values would provide a concise description of data variability. We fixed the number of lags as 30 pixels and the lag size equivalent to the image spatial resolution (30 m), resulting in a lag distance of 900 m.
From the semivariogram we calculated six indices (Balaguer et al. 2010) extracted in the software FETEX 2.0 (Ruiz et al. 2011) (Table 1). These indices describe the shape of the experimental semivariograms, and therefore the properties that characterize the spatial patterns of the image objects. ) +γ max_1 ) -(γ1(h max_1 -h1))

Change detection
We chose random forest (RF) machine learning algorithm to classify the changes. The samples were divided into two parts, one for classifier training (70%) and the other for classification assessment (30%). The RF algorithm, initially proposed by Breiman (2001) is an ensemble method which generates a set of individually trained decision trees and combines their results. RF is a robust non-parametric classifier and has the ability to accommodate many predictor variables (DeVries et al. 2016).
We used the open-source software WEKA 3.8 to fit the RF. Two parameters need to be set in order to produce the forest trees: the number of decision trees to be generated (Ntree) and the number of variables to be selected and tested for the best split when growing the trees (M try) (Belgiu and Drăgu 2016). Five hundred trees were grown for each classification and M try parameter was left at its default value (log of the number of features + 1) (M illard and Richardson 2015). Design of a decision tree required the choice of an attribute selection measure and a pruning method. Thus, we chose the Information Gain Ratio (GR) criterion as a quality measure of the attributes used to classify.

Evaluation of Change Detection
We evaluated the combination of spatial and spectral features as input data to train RF to classify seasonal changes and changes caused by deforestation or fires; using a confusion matrix (Congalton et al. 2014). From this, we measured the overall accuracy, producer's and user's accuracy and the kappa coefficient.

Results and discussion
We first analysed the semivariogram as a tool to quantify the spatial variability of NDVI pixels inside the objects. The semivariograms reached the sill within the calculated distance (900m), indicating that their spatial extents were sufficiently large to encompass the entire spatial variability of NDVI derived from OLI/Landsat-8 images.
From the semivariograms, we found two distinct patterns: the shape and the overall variability (sill) of the data remained constant during the analysed period in areas with seasonal changes due to phenological effects between 2015 and 2016 ( Figure 4ab); and the shape and sill increased during the analysed period in areas undergoing deforestation or fires (Figure 4cd). These results indicate that the spatial variability of NDVI quantified by semivariograms is very sensitive to Adv. For. Sci., Cuiabá, v.6, n.4, p.775-782, 2019 changes in the vegetation cover caused by human-induced activities. On the other hand, we found that regardless of the strong seasonality characteristic of the TSBs studied here, natural vegetation phenology did not change the shape and overall variability of the semivariograms. The low variability of NDVI values (sill=0.0010) of the landscape types covering the study area in 2015 (year 1) is explained by the high and homogeneous values of this index inside the objects. In 2016 the land-cover did not change, however the NDVI values decreased due to the effect of vegetation seasonality. Nevertheless, the NDVI variability did not change because the phenology affected the whole object. The values decreased from 2015 to 2016, but its variability did not. In the presence of deforestation or fires, the high variability of the objects is explained by the mix of bare soil and remained vegetation. The increase in overall variability (sill=0.0030) after change is explained by the combination of high NDVI values for the remaining vegetation inside the objects and low NDVI values for bare soil.
Here, after analyses the semivariogram behavior, we used the NDVI image difference to generate the semivariograms and extract the indices. As expected, the semivariograms obtained from the NDVI images presented distinct patterns, as presented in Figure 5, showing the ability of the semivariogram to depict landscape spatial heterogeneity. The image spatial variability, increased considerably from seasonal change class to fires and/or deforestation. Similar results were found by other studies. Acerbi Junior et al. (2015) working in Brazilian savannas, analysed semivariogram parameters obtained from NDVI images to detect changes. They concluded that sill and range parameters increased their values after deforestation and remain similar if the land cover had not been changed. Sertel et al. (2007) analyzed the use of semivariograms to identify earthquake damage in an urban area in Turkey, and concluded that the semivariogram shape was different for pre and postearthquake if the area was severely damaged, but similar if the area was not severely damaged. The areas of severe damage had even larger increases in range and sill whereas the areas of minor earthquake damage had similar ranges and sill before and after the earthquake. Garrigues et al. (2006) using SPOT-HRV images with spatial resolution of 20 meters, found that the spatial variability provided by sill increased considerably from forest areas (homogeneous) to agricultural areas (heterogeneous), due to the mosaic of objects analysed with high NDVI values and the presence of bare soil areas with low values of this index, contrasting with vegetation areas.  Table 2 summarizes the results in terms of global accuracy, user accuracy and producer and Kappa precision related to the map generated for the study area ( Figure 6). We obtained a global accuracy of 97.73% and Kappa index 0.95, indicating high accuracy. The most important point to note is that all changed objects due to deforestation or fires were correctly mapped (producer's accuracy = 100%) with an Adv. For. Sci., Cuiabá, v.6, n.4, p.775-782, 2019 inclusion error of 10%. The seasonal change class also good producer's accuracy, reaching 96.15%. Silveira et al., 2018b classifying seasonal changes and LULCC in Brazilian Savannas, found an overall accuracy of 95% and 88.33% using the individual sill semivariogram parameter and AFM index derived from NDVI bi-temporal images, respectively. The change detection classes analysed were distinguished by the overall variability provided by the parameter of the semivariogram. The accuracies of the sill parameter and AFM semivariogram indices were higher than those of the other indices, as they provide information that represented the overall variability of the NDVI, and the other indices provide information related to specific parts of the semivariogram.  The most important indices computed by the Information Gain Ratio (GR) criteria are presented in Table 3. The most important feature was the VFM (0.925) semivariogram index and M ean (0.629) spectral feature, followed by MFM, DESVT, RVF and AFM . According to Balaguer et al. (2010), VFM parameter computes the variance of the semivariogram values between the first value and the first maximum. It is directly related to the homogeneity of the values in the image. It is complementary to the information provided by the MFM (mean of the semivariogram values up to the first maximum). RVF is the ratio between the values of the total variance and the semivariance at first lag Since the semivariogram tends to reach the sill near the variance (Fig. 1), this parameter is an indicator of the relationship between the spatial correlation at long and short distances. Its value increases when high variability at long distances and low variability at short distances occurs. This feature also presents high values for images with large primitives or periodic patterns. AFM is the Adv. For. Sci., Cuiabá, v.6, n.4, p.775-782, 2019 area between the semivariogram value in the first lag and the semivariogram function until the first maximum. This parameter provides information about the semivariogram curvature and is also related to the variability of the data  Wu et al. (2015) tested the use of semivariogram features for object-based high-resolution image classification. They showed that such features are of a more beneficial supplement to spectral features. In Balaguer et al. (2010) the classification accuracies obtained using the proposed semivariogram features were, in general, higher and more balanced than those obtained using the spectral sets and standard texture features. Thus, combining spatial and spectral remote sensing features increases the accuracy of LULLC detection.

Conclusion
Here we combined spatial and spectral features derived from NDVI image difference to differentiate seasonal changes from deforestation/fires. Our study has demonstrated that the use of spatial information combined with spectral features provide good results, presenting 97.73% of global accuracy. The spatial variability of NDVI values were not affect by vegetation seasonality, favouring the addition of semivariogram indices to reduce the impact of seasonality for detecting deforestation or fires from NDVI image difference. From the total of 13 features, the ones that best contributed to increase the separability on classification assessment were VFM , M EAN, M FM , DESVT, RVF and AFM . How to accurately extract LULCC while disregarding the ones caused by phenological differences in Brazilian seasonal biomes undergoing rapid land-cover changes can be achieved by adding semivariogram indices in combination with spectral features as input data to train random forest algorithm.