Generalized mixed modeling to estimate stem taper of Pinus taeda trees at different planting spacings

The goal of this study was to test two non-segmented taper models fitted with two approaches: fixed effects only, and mixed models in which the random term varied (tree, stand, or diameter class). Data came from 60 Pinus taeda trees scaled in two stands with different planting spacings (4 m x 2 m and 3 m x 2 m). The fits yielded precise diameter estimates along the stem, with RMSE lower than 0.87 cm and MAE lower than 0.65 cm. The mixed modeling approach outperformed modeling with fixed effects only. Using tree as the random term yielded better estimates than using stand or diameter class, with RMSE lower than 0.51 cm and MAE lower than 0.38 cm. Therefore, the best approach was the polynomial of integer and fractional powers fitted by the mixed approach with tree as the random term. The model was validated using the bootstrap technique with 100 random samples. We recommend mixed modeling to improve estimates, using the polynomial of integer and fractional powers with tree as the random term. Fitted this way, the model is more general across stands with different spacings.
Original Article. *Corresponding author: ximena@unicentro.br


Introduction
Pinus taeda L. is native to the southern and southeastern United States of America, occurring in 14 states (Robertson et al. 2011). The species was introduced in Brazil in 1960 (Santos et al. 2014) and is now one of the most planted in the country (Shimizu et al. 2018). The genus occupies 1.64 million hectares, mostly in Paraná and Santa Catarina states (IBÁ, 2020). Pine species supply raw material to many industries, such as construction, furniture, resin, pulp and paper, panels, and energy (Correa and Fett Neto, 2012). The intended use of the wood determines the management applied in each situation regarding, for example, spacing, thinning, pruning, and rotation (David et al. 2018). This keeps management planning in sync with the productive capacity of the site, so that wood is produced without interruption, yielding economic, social and environmental benefits (Souza et al. 2017).
All these management approaches have to be considered when modeling taper, since stem shape is affected by several stand attributes, such as species, age, spacing, and site quality (Burkhart and Tomé, 2012). Addressing each of these factors separately can lead to a large number of equations, for example one per spacing (Vendruscolo et al. 2016), per age class (Koehler et al. 2016) or per diameter class (Ribeiro and Andrade, 2016).
One alternative to fitting multiple specific models for each source of variation in the response variable is to apply generalized models. This is a flexible kind of modeling that allows the inclusion of explanatory variables in the model (Farjat et al. 2015) as well as random variables, which yields mixed models (Scolforo et al. 2018a). The mixed model approach considers average parameters for a population and specific parameters for groups. For example, fixed effect parameters apply to the whole population, and random parameters describe the specific response of each tree.
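The split between population-level and group-level parameters can be written compactly. The notation below is a generic two-level linear mixed model assumed here for illustration; the symbols are not taken from the study.

```latex
% Observation j in group i (group = stand, diameter class, or tree)
y_{ij} = \mathbf{x}_{ij}^{\top}\boldsymbol{\beta}
       + \mathbf{z}_{ij}^{\top}\mathbf{u}_{i}
       + \varepsilon_{ij},
\qquad
\mathbf{u}_{i} \sim N(\mathbf{0}, \mathbf{D}),
\qquad
\varepsilon_{ij} \sim N(0, \sigma^{2})
```

Here $\boldsymbol{\beta}$ holds the fixed effects shared by the whole population, $\mathbf{u}_{i}$ holds the deviations specific to group $i$, and $\mathbf{D}$ is the covariance matrix of those deviations. Setting $\mathbf{u}_{i} = \mathbf{0}$ recovers the fixed-effects-only model.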
The goal of this study was to assess four approaches to fitting two taper functions for P. taeda trees planted in two stands with two planting spacings (4 m x 2 m and 3 m x 2 m). The approaches were: a) modeling with fixed effects only, b) mixed effect modeling with stand as random, c) mixed effect modeling with diameter class as random, d) mixed effect modeling with tree as random. These approaches were applied to two taper models, the fifth-degree polynomial (Schöepfer, 1966) and the polynomial of integer and fractional powers (Hradetzky, 1976).

Site and data sampling
Data were collected in a forest plantation at the Universidade Estadual do Centro-Oeste, in the city of Irati, Paraná state (25º27'56'' S, 50º37'51'' W). The forest was planted in 2003 and is composed of five stands with varying spacings. At data collection, the forest was 17 years old. Two stands were sampled: stand 1 was 1.12 ha, planted at 4 m x 2 m spacing, and stand 2 was 0.76 ha, planted at 3 m x 2 m spacing.
The forest is located in the Mixed rainforest domain, in a wet temperate climate classified as Cfb according to the Köppen-Geiger classification. This climate covers 2.6% of Brazil and 37% of Paraná state (Alvares et al. 2013). Cfb is characterized by uniform precipitation throughout the year, frequent frost in winter, and average, maximum and minimum temperatures of 18 ºC, 22 ºC, and -3 ºC (IBGE, 2021).
Diameter distributions in the stands were retrieved from the 2019 forest inventory performed in the area. Table 1 presents the main statistics for stands 1 and 2 from the forest inventory. Trees were stratified into 4 cm diameter classes, and trees of each class were indirectly scaled using a Criterion RD 1000. Stand 1 (4 m x 2 m) contained 6 diameter classes, from 13 to 37 cm, with 5 trees scaled in each class. Stand 2 (3 m x 2 m) contained 5 diameter classes, from 13 to 33 cm, with 6 trees scaled in each class. Thus 30 trees were scaled in each stand, for a total of 60 trees.
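The class counts and sample sizes above follow directly from the 4 cm class width. The short Python sketch below reproduces that arithmetic; the function names are hypothetical, and treating each class as closed at its lower bound and open at its upper bound is an assumption, since the text does not state how boundary diameters were assigned.

```python
def diameter_class_bounds(lower_cm, upper_cm, width_cm=4):
    """Return (low, high) bounds of consecutive classes covering lower_cm..upper_cm."""
    bounds = []
    low = lower_cm
    while low < upper_cm:
        bounds.append((low, low + width_cm))
        low += width_cm
    return bounds


def assign_class(dbh_cm, bounds):
    """Return the index of the class containing dbh_cm, or None if out of range.

    Assumption: classes are [low, high), i.e. the upper bound is exclusive.
    """
    for index, (low, high) in enumerate(bounds):
        if low <= dbh_cm < high:
            return index
    return None


stand_1 = diameter_class_bounds(13, 37)  # 4 m x 2 m spacing
stand_2 = diameter_class_bounds(13, 33)  # 3 m x 2 m spacing

# Trees scaled per stand = number of classes x trees scaled per class
trees_stand_1 = len(stand_1) * 5
trees_stand_2 = len(stand_2) * 6
total_trees = trees_stand_1 + trees_stand_2
```

Sampling a fixed number of trees per class, rather than in proportion to class frequency, gives the small and large diameter classes the same weight in the fit.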
Two procedures were adopted to control uncertainties in these data. First, the person scaling the trees was trained before data collection. Second, up to 2 meters (at 0.2 m, 0.5 m, 0.7 m, 1.0 m, 1.3 m and 2 m) diameters were measured both indirectly and directly, with a Criterion RD 1000 and with a measuring tape, respectively. Above 2 meters only indirect measurements were made, every meter, up to commercial height. Indirect scaling was preferred over direct scaling because it is more practical and economical, and its accuracy has been attested by several studies (Curto et al. 2019; Nicoletti et al. 2015; Oliveira et al. 2018).
Four approaches were taken to fit these models, using the 60 scaled trees: a) a fixed effects fit with no random term; b) a mixed effect fit with b1 and b2 random across the two spacings; c) a mixed effect fit with b1 and b2 random across the 11 diameter classes (6 in stand 1 and 5 in stand 2); d) a mixed effect fit with b1 and b2 random across trees.
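For orientation, the two taper models are commonly written in the forms below. This is a sketch based on the usual presentation of these models in the forestry literature, not a transcription of the study's equations (1) and (2); the symbols $d_i$ (diameter at height $h_i$), $DBH$, $H$ (total height), $p_j$ (selected powers) and the group index $k$ are assumptions of this illustration.

```latex
% Fifth-degree polynomial (Schoepfer-type), fixed effects only
\frac{d_i}{DBH} = b_0 + b_1\frac{h_i}{H} + b_2\left(\frac{h_i}{H}\right)^{2}
                + b_3\left(\frac{h_i}{H}\right)^{3} + b_4\left(\frac{h_i}{H}\right)^{4}
                + b_5\left(\frac{h_i}{H}\right)^{5} + \varepsilon_i

% Polynomial of integer and fractional powers (Hradetzky-type), fixed effects only
\frac{d_i}{DBH} = b_0 + b_1\left(\frac{h_i}{H}\right)^{p_1}
                + b_2\left(\frac{h_i}{H}\right)^{p_2} + \dots
                + b_m\left(\frac{h_i}{H}\right)^{p_m} + \varepsilon_i

% Mixed version: b_1 and b_2 receive a deviation for group k
% (k = stand, diameter class, or tree, depending on the approach)
b_1 \rightarrow b_1 + u_{1k}, \qquad b_2 \rightarrow b_2 + u_{2k},
\qquad (u_{1k}, u_{2k})^{\top} \sim N(\mathbf{0}, \mathbf{D})
```

Under this reading, approaches b) to d) differ only in what $k$ indexes: 2 stands, 11 diameter classes, or 60 trees.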

Fitting and validation assessment
Data were processed using the software R (R Core Team, 2020). The package ggplot2 (Wickham, 2016) was used to produce graphs and the package nlme (Pinheiro et al. 2016) was used to fit the mixed effect models. Modeling assumptions were verified for the random error term and for the random parameter terms in the models. The fit was assessed by the Akaike Information Criterion (AIC) and by the Bayesian Information Criterion (BIC). 1:1 charts of estimated versus observed data were produced as well. In addition, the Root Mean Square Error (RMSE) (3) and the Mean Absolute Error (MAE) (4) were used.
$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(d_{i,\mathrm{obs}} - d_{i,\mathrm{est}}\right)^{2}} \quad (3)$$
$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|d_{i,\mathrm{obs}} - d_{i,\mathrm{est}}\right| \quad (4)$$
Where: $n$ = number of observations; $d_{i,\mathrm{obs}}$ = value of $d_i$ observed at scaling; $d_{i,\mathrm{est}}$ = value of $d_i$ estimated by the model.
Each combination of approach and model was validated using the bootstrap technique, a nonparametric resampling method with replacement (Efron, 1982), which generated 100 random databases from which RMSE and MAE were calculated. This technique has been used in other studies to validate models (Scolforo et al. 2018a; Hall et al. 2019).
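The error statistics and the resampling step can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the function names are hypothetical, the sketch resamples individual observed and estimated pairs without refitting the model, and the text does not say whether the study resampled observations or whole trees, or whether it refitted the model on each sample.

```python
import math
import random


def rmse(observed, estimated):
    """Root mean square error, in the units of the inputs (cm for diameters)."""
    n = len(observed)
    return math.sqrt(sum((o - e) ** 2 for o, e in zip(observed, estimated)) / n)


def mae(observed, estimated):
    """Mean absolute error, in the units of the inputs."""
    n = len(observed)
    return sum(abs(o - e) for o, e in zip(observed, estimated)) / n


def bootstrap_errors(observed, estimated, n_samples=100, seed=0):
    """Resample (observed, estimated) pairs with replacement.

    Returns two lists with one RMSE and one MAE per bootstrap sample.
    Assumption: pairs are resampled individually and the model is not refitted.
    """
    rng = random.Random(seed)
    pairs = list(zip(observed, estimated))
    rmse_values, mae_values = [], []
    for _ in range(n_samples):
        sample = [rng.choice(pairs) for _ in pairs]  # same size as the data
        obs, est = zip(*sample)
        rmse_values.append(rmse(obs, est))
        mae_values.append(mae(obs, est))
    return rmse_values, mae_values


# Illustrative values only, not data from the study
observed_cm = [30.1, 27.4, 24.8, 21.0, 16.5, 10.2]
estimated_cm = [29.8, 27.9, 24.5, 21.4, 16.1, 10.6]
rmse_dist, mae_dist = bootstrap_errors(observed_cm, estimated_cm)
```

The spread of `rmse_dist` and `mae_dist` is what a validation figure of this kind summarizes: narrow distributions near zero indicate that the errors are stable across resamples.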

Results
Models were fitted using the fixed effects approach and the three approaches that include a random term. All modeling assumptions were verified. Fixed coefficients (Table 2) were significant at α = 5%, with p values lower than 0.05. The stepwise method selected the powers 0.004, 0.8, 30 and 80 for the Hradetzky equation. Table 3 shows the minimum, maximum and standard deviation of the random coefficients b1 and b2 according to the random term (stand, diameter class or tree) considered in each mixed modeling approach. Diameters estimated at several heights along the stem by the different modeling approaches were assessed with 1:1 charts (Figure 1). Regardless of the model used, the mixed modeling approach with tree as the random term yielded the best estimates.
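To show how the selected powers and the random coefficients enter a prediction, the Python sketch below evaluates a polynomial of fractional and integer powers with tree-level deviations added to b1 and b2. The model form is the commonly cited one and is assumed here, the function names are hypothetical, and the coefficient values are placeholders chosen only so the profile closes at the tip; they are not the estimates in Table 2 or Table 3.

```python
POWERS = (0.004, 0.8, 30, 80)  # powers reported for the Hradetzky equation


def relative_diameter(rel_height, b0, slopes, random_effects=(0.0, 0.0)):
    """Return d/DBH at relative height h/H.

    slopes: one coefficient per power in POWERS (b1..b4).
    random_effects: deviations (u1, u2) added to b1 and b2 for one group,
    e.g. one tree; (0.0, 0.0) gives the population-average profile.
    """
    u1, u2 = random_effects
    adjusted = [slopes[0] + u1, slopes[1] + u2, *slopes[2:]]
    return b0 + sum(b * rel_height ** p for b, p in zip(adjusted, POWERS))


def stem_diameter(height_m, total_height_m, dbh_cm, b0, slopes,
                  random_effects=(0.0, 0.0)):
    """Return the estimated stem diameter (cm) at height_m."""
    rel_height = height_m / total_height_m
    return dbh_cm * relative_diameter(rel_height, b0, slopes, random_effects)


# Placeholder coefficients (NOT from the study): b0 + sum(slopes) = 0,
# so the population-average diameter is zero at the tip of the tree.
B0 = 1.2
SLOPES = (-0.3, -0.6, -0.2, -0.1)

average_tree = stem_diameter(10.0, 20.0, 25.0, B0, SLOPES)
specific_tree = stem_diameter(10.0, 20.0, 25.0, B0, SLOPES,
                              random_effects=(0.01, -0.02))
```

Predicting for a new tree without measurements on it would use the population-average profile; the tree-specific deviations are only available for trees that contributed data to the fit or for which they are estimated separately.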
Estimates from the two models under the four approaches were assessed by the AIC, BIC, RMSE and MAE statistics (Table 4). The mixed modeling approach outperformed the fixed modeling approach in all statistics assessed. Among the mixed models, using stand, diameter class and tree as the random term ranked third, second and first, respectively.
The Hradetzky model (2) outperformed the Schöepfer model (1) in all statistics (Table 4). Therefore, the best result was obtained with the Hradetzky model fitted by the mixed approach with tree as the random term.
The selected equation (Hradetzky model fitted with the best approach: mixed model, tree as the random term) was validated using 100 random samples from the nonparametric bootstrap method with replacement. MAE and RMSE distributions are shown in Figure 2. Validation showed high precision and low bias in the estimates, since MAE and RMSE varied within narrow intervals close to zero (Figure 2).

Discussion
Pinus is the second most planted genus in Brazil and serves several ends (IBÁ, 2020). Taper functions are key to more accurate assortment calculations and therefore to better planning (IBÁ, 2019). The models tested in this study have been used in other studies as well (Yoshitani Junior et al. 2012; Téo et al. 2013; Kohler et al. 2016; Santos et al. 2019). These models are of the non-segmented kind. Although some studies applied segmented functions (Sabatia and Burkhart, 2015; Souza et al. 2018), Favalessa et al. (2012) and Scolforo et al. (2018a) recommended non-segmented models over segmented ones.
Taper functions have been fitted on a site-specific basis, such as for specific spacings (Vendruscolo et al. 2016), age classes (Koehler et al. 2016) and diameter classes (Ribeiro and Andrade, 2016). Knowing the factors that affect tree shape allows one to include them in the model (Gomat et al. 2011) and to use the mixed models approach, randomizing one or more coefficients in the model (Scolforo et al. 2018b), which generalizes the fit and broadens the use of the model. This way, a single general equation can be used instead of many to accommodate all these sources of variation. Our study provides a generalized mixed model able to precisely estimate taper for Pinus taeda trees planted at different spacings and belonging to different diameter classes. In addition, all three mixed modeling approaches tested generated good estimates, and this was the first time these approaches were tested for this species, which is an important milestone for Pinus taeda modeling.
In this study, the mixed approach was used to generalize diameter estimates along the stem of Pinus taeda trees, and it was more accurate than modeling diameters with fixed effects only. Scolforo et al. (2018a) also fitted mixed and fixed taper functions and recommended the mixed approach. Their database was composed of different Eucalyptus clones and ages, and with a single fit the authors obtained a generalized mixed model flexible enough to represent the different conditions in the forest. Liu et al. (2020a) also pointed out the advantages of the generalized mixed approach, including the need for a smaller database to fit the random term in the equation.
In addition to its flexibility in fitting different field conditions, the mixed approach can be useful for different ends in forestry, such as modeling total tree height (Bronisz and Mehtätalo, 2020; Goméz-García et al. 2015), diameter, height and aboveground carbon (Leite et al. 2020), crown size (Fu et al. 2015), soil organic matter (Mello et al. 2018), wood density (Oliveira et al. 2021), breeding (Henriques et al. 2018), and tree volume (Cerqueira et al. 2020).
The random term in the model can vary regarding the level of detail to describe the response variable. In this study, the most specific level (tree) for the random variable yielded the best modeling, compared to the more general levels tested (diameter class and stand). Ferraz  randomized two levels of information (plot and tree) to estimate tree height for trees grown in different sites, ages, planting spacings, thinning, and fertilization regimes and obtained precise estimates considering the most detailed level as the random term. Scolforo et al. (2018b) considered clones and trees as random to model taper functions for a broad Eucalyptus database collected all over the Brazilian territory, using a penalized mixed spline and found high accuracy.
Other studies considered only one factor as random in the mixed modeling. Cerqueira et al. (2020) modeled Eucalyptus tree volume with different agroforestry systems as random. Özçelik and Alkan (2020) modeled taper for Pinus brutia trees planted at different spacings and sites and with different ages, in the Mediterranean region of Turkey. The authors used a segmented linear mixed model with tree as the random term. Therefore, using the most detailed level of information, tree, as the random term yielded precise estimates in this study and in other studies as well. This is convenient, since it allows the model to be general and specific at the same time: attributes described at tree level, such as shape and taper, are crucial for accurate volume estimates along a tree stem.
Fitting is followed by the validation phase. Validation is a very important process to assess the viability of applying the equation to a different database. In studies where an extensive database is available, it can be split, with part used for fitting and part for validating (Liu et al. 2020b). However, in most cases the database is limited, so bootstrapping, a nonparametric resampling method with replacement, can be used to validate models. The larger the number of resamples, the more closely the distribution of the resampled statistics follows what the central limit theorem predicts (Xu and Goodacre, 2018). This technique has also been applied in forestry to validate taper models (Scolforo et al. 2018a) and growth and yield models (Hall et al. 2019), for example. As in our study, Hall et al. (2019) used a total of 100 samples to validate their model. From the validation statistics, we can attest that the equation produced in this study can be used to estimate taper for Pinus taeda trees of different diameter classes planted at 4 m x 2 m and 3 m x 2 m spacings. Future studies can address other species, spacings, ages and regions of the country.

Conclusions
Mixed effect modeling is recommended to improve the accuracy of taper models for Pinus taeda trees. Using the most detailed level of information, tree, as the random term contributed to better estimates. The polynomial of integer and fractional powers outperformed the fifth-degree polynomial. Therefore, we recommend the polynomial of integer and fractional powers with tree as the random term.