Classical and machine learning tools for identifying yellow-seeded Brassica napus by fusion of hyperspectral features

M=150 as an example. Before that, we placed the randomly selected 150 test samples in the three-dimensional space constructed by the above spectral indexes, as shown in Figure 7. Visually, the yellow-seeded and non-yellow-seeded samples each form a cluster, and the two clusters are separated from each other.

Figure 7. Binary classification of the yellow-seeded and non-yellow-seeded samples based on the spectral index feature combination.

Next, we seek a separating hyperplane for the two classes of samples. To do so, we first train the Logit-R model on the remaining 150 samples, using the spectral index combination as input and the corresponding label Y = 0/1 as output. The trained discriminant function is

$$h = \frac{1}{1 + \exp\left(-3.3589\,\mathrm{RSI} + 4.3883\,\mathrm{DSI} - 5.1165\,\mathrm{NDSI}\right)}.$$
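As a minimal sketch, the discriminant function above can be evaluated for a single sample as follows. The coefficients are copied from the equation; the function names, the 0.5 cut-off, and the rule that a sample is labeled 1 when h exceeds the cut-off are assumptions made for illustration, since this section does not state which label corresponds to the yellow-seeded class.

```python
import math

# Coefficients copied from the trained discriminant function above.
COEF_RSI, COEF_DSI, COEF_NDSI = -3.3589, 4.3883, -5.1165


def discriminant(rsi, dsi, ndsi):
    """Return h = 1 / (1 + exp(z)), with z the linear combination in the equation."""
    z = COEF_RSI * rsi + COEF_DSI * dsi + COEF_NDSI * ndsi
    return 1.0 / (1.0 + math.exp(z))


def classify(rsi, dsi, ndsi, threshold=0.5):
    """Return label 1 when h exceeds the threshold, else 0 (the 0.5 cut-off is an assumption)."""
    return 1 if discriminant(rsi, dsi, ndsi) > threshold else 0
```

The set of points where h equals the cut-off is the separating hyperplane in the (RSI, DSI, NDSI) space; with a cut-off of 0.5 it is the plane on which the linear combination inside the exponential equals zero.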

Then, we input the three-spectral-index combination of the 150 test samples into the trained model. Figure 8 depicts the sample distribution in four categories: correctly classified yellow-seeded (yellow dots), correctly classified non-yellow-seeded (grey squares), misclassified yellow-seeded (red dots), and misclassified non-yellow-seeded (red squares). As seen in Figure 8, only one yellow-seeded sample and nine non-yellow-seeded samples are misclassified. The RA of yellow-seeded and non-yellow-seeded is 97.78% and 91.43%, respectively, and the average RA is 93.33%.
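The reported accuracies can be checked with a short calculation. The class sizes used below (45 yellow-seeded and 105 non-yellow-seeded test samples) are not stated in the text; they are inferred here from the reported percentages and misclassification counts, and should be read as an assumption.

```python
def recognition_accuracy(n_total, n_misclassified):
    """Recognition accuracy (RA) in percent: share of correctly classified samples."""
    return 100.0 * (n_total - n_misclassified) / n_total


# Class sizes inferred from the reported figures (assumption, not given in the text).
n_yellow, n_non_yellow = 45, 105

ra_yellow = recognition_accuracy(n_yellow, 1)       # one yellow sample misclassified
ra_non_yellow = recognition_accuracy(n_non_yellow, 9)  # nine non-yellow misclassified
ra_overall = recognition_accuracy(n_yellow + n_non_yellow, 1 + 9)

print(f"{ra_yellow:.2f} {ra_non_yellow:.2f} {ra_overall:.2f}")  # 97.78 91.43 93.33
```

Under these inferred counts, the "average RA" of 93.33% matches the pooled accuracy over all 150 test samples (140 correct) rather than the unweighted mean of the two class accuracies.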

Figure 8. Classification of the yellow-seeded and non-yellow-seeded samples based on the spectral index feature combination.

3.2.3 Prediction results with machine learning methods

In this section, we employ two machine learning techniques, random forest (Breiman, 2001) and support vector classifier (SVC) (Pisner and Schnyer, 2020), to perform a classification task aimed at identifying yellow seeds. Differing from the preceding feature selection methods, we enlarge our feature pool so that it comprises the original hyperspectral reflectances, the nine spectral indexes calculated according to Table 3, and the 23 trilateral parameters. This augmentation results in a total of 633 features.
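The composition of the feature pool can be sketched as follows. Table 3 is not reproduced in this section, so the index formulas below use the conventional ratio, difference, and normalized difference definitions, which is an assumption; the count of 601 reflectance bands is taken from the discussion in Section 3.3, and the function names are hypothetical.

```python
def rsi(r_i, r_j):
    """Ratio spectral index of two band reflectances (conventional form, assumed)."""
    return r_i / r_j


def dsi(r_i, r_j):
    """Difference spectral index of two band reflectances (conventional form, assumed)."""
    return r_i - r_j


def ndsi(r_i, r_j):
    """Normalized difference spectral index (conventional form, assumed)."""
    return (r_i - r_j) / (r_i + r_j)


N_BANDS, N_INDEXES, N_TRILATERAL = 601, 9, 23


def build_feature_vector(reflectance, index_values, trilateral):
    """Concatenate raw reflectances, spectral indexes, and trilateral parameters."""
    assert len(reflectance) == N_BANDS
    assert len(index_values) == N_INDEXES
    assert len(trilateral) == N_TRILATERAL
    return list(reflectance) + list(index_values) + list(trilateral)
```

Each sample is then described by 601 + 9 + 23 = 633 values, which is the feature count passed on to the selection step.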

To extract pivotal attributes and eliminate redundant data, we employ lasso-penalized logistic regression with the R package ‘glmnet’ (Friedman et al., 2010). The parameter λ, which balances the penalty and loss terms, is tuned to maximize the “auc” index. Upon tuning, λ is set at 0.0062. Consequently, we identify ten key features out of the initial 633. These features include the original hyperspectral values at wavelengths 394, 415, 416, 417, and 418, NDSI(568,988) (a spectral index), as well as Yep, Dymin, SDr, and ske (trilateral parameters).
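To illustrate why a lasso penalty performs feature selection, the sketch below fits an L1-penalized logistic regression with plain proximal gradient descent. This is not the coordinate descent algorithm used by ‘glmnet’, and the toy data, step size, and penalty value are made up for illustration; only the idea that the penalty drives uninformative coefficients to exactly zero carries over.

```python
import math


def soft_threshold(value, amount):
    """Shrink value toward zero by amount; values within [-amount, amount] become 0."""
    if value > amount:
        return value - amount
    if value < -amount:
        return value + amount
    return 0.0


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def lasso_logistic(X, y, lam, step=0.1, n_iter=500):
    """Minimize mean logistic loss + lam * ||w||_1; the intercept b is not penalized."""
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(n_iter):
        grad_w, grad_b = [0.0] * p, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_b += err / n
            for j in range(p):
                grad_w[j] += err * xi[j] / n
        b -= step * grad_b
        # Gradient step on the loss, then the proximal (soft-threshold) step for the penalty.
        w = [soft_threshold(wj - step * gj, step * lam) for wj, gj in zip(w, grad_w)]
    return w, b


def selected_features(w):
    """Indexes of features whose coefficient survived the penalty."""
    return [j for j, wj in enumerate(w) if wj != 0.0]


# Toy data (made up): feature 0 determines the label, feature 1 is unrelated noise.
X_toy = [[-2, 1], [-1, -1], [1, 1], [2, -1], [-2, -1], [-1, 1], [1, -1], [2, 1]]
y_toy = [0, 0, 1, 1, 0, 0, 1, 1]
w_toy, b_toy = lasso_logistic(X_toy, y_toy, lam=0.05)
print(selected_features(w_toy))  # [0]
```

In the study itself, the same mechanism reduces the 633 candidate features to the ten listed above at the tuned value of λ.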

Based on these ten selected features, we run the random forest and support vector classifier algorithms via the R packages ‘rpart’ (Therneau et al., 2015) and ‘e1071’ (Meyer et al., 2024), respectively, with default settings and different training sizes (M: 150–240). The prediction results on the test sets for the recognition task are reported in Table 5.

Table 5. Results of recognition accuracies with machine learning methods.

Comparing the recognition accuracies achieved by the random forest and SVC methods yields several key observations. In terms of average accuracy, random forest ranges from approximately 96.52% to 96.92%, while SVC is slightly higher at around 97.87% to 98.08%. Both methods become more accurate with larger training sizes: random forest improves gradually, and SVC follows a similar trend with minor fluctuations.

In terms of category-specific accuracies, random forest achieves between approximately 93.25% and 94.26% in the yellow category, whereas SVC is higher, at 95.68% to 96.61%. For the non-yellow category, random forest consistently attains between roughly 98.23% and 98.57%, whereas SVC is more accurate still, at approximately 98.94% to 99.32%.

The consistency of performance across training sizes and categories is notable in the random forest’s case, where stability is observed. On the other hand, SVC displays slight performance variations, particularly noticeable within the yellow category. In the broader context of comparison, SVC emerges as the more favorable option, showcasing superior performance across most categories and training sizes. While random forest performs admirably in the non-yellow category, it falls short of SVC’s accuracy levels in the yellow category. This analysis underscores the nuanced strengths of each method and the importance of considering the specific problem context when selecting an appropriate machine learning approach.

3.3 Discussion of the proposed recognition methods

Up to this point, the task of recognizing yellow-seeded varieties has been effectively accomplished through the application of hyperspectral technology. Now, we would like to delve into the details of the four proposed models.

Beginning with the PLSR-based model, our approach identifies yellow-seeded varieties by predicting the RGB values from three essential trilateral parameters obtained from hyperspectral imaging of rapeseed, along with three significant spectral indices. While predicting each R/G/B channel, we extract several noteworthy spectral features that enhance the model’s interpretability. Note that the success of this method relies heavily on the accuracy of RGB calibration.

The Logit-R model, in turn, assigns a sample to the yellow-seeded or non-yellow-seeded category based on a predicted probability. One potential challenge for this model is imbalanced sample data; when imbalance is encountered, the classification threshold should be adjusted to keep the results accurate. The optimal hyperspectral features for both of these models are determined through a correlation analysis between the R/G/B values and the 23 trilateral parameters and the spectral indices derived from a complete band combination. This approach may omit some information contained in the original spectral reflectance data.

The machine learning models operate differently from the two methods above. Here, we first perform feature dimensionality reduction on a total of 633 features, encompassing all 23 trilateral parameters, 9 spectral indices, and the reflectance data from 601 original bands. We then select ten key features as input to the random forest and SVC models.
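The threshold adjustment mentioned for the Logit-R model under class imbalance can be sketched as below. The text does not say how the threshold would be chosen; selecting the cut-off that maximizes balanced accuracy (the mean of the two per-class accuracies) on held-out data is one common option and is an assumption here, as are the toy probabilities and the function names.

```python
def balanced_accuracy(probs, labels, threshold):
    """Mean of the per-class accuracies when predicting 1 for prob > threshold."""
    tp = sum(1 for p, y in zip(probs, labels) if y == 1 and p > threshold)
    tn = sum(1 for p, y in zip(probs, labels) if y == 0 and p <= threshold)
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    return 0.5 * (tp / n_pos + tn / n_neg)


def best_threshold(probs, labels, candidates=None):
    """Pick the candidate cut-off with the highest balanced accuracy."""
    if candidates is None:
        candidates = [i / 100 for i in range(5, 100, 5)]  # 0.05, 0.10, ..., 0.95
    return max(candidates, key=lambda t: balanced_accuracy(probs, labels, t))


# Toy imbalanced validation set (made-up probabilities): 2 positives, 6 negatives.
probs = [0.45, 0.38, 0.32, 0.20, 0.15, 0.10, 0.08, 0.05]
labels = [1, 1, 0, 0, 0, 0, 0, 0]

threshold = best_threshold(probs, labels)
print(threshold)  # 0.35
```

In this toy case the default cut-off of 0.5 would label every sample as the majority class, while the adjusted cut-off separates the two classes.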

All four models achieve high average accuracy, with values ranging from 93% to 98%. This consistency highlights the feasibility of the framework that combines spectral features with intelligent models for accurately identifying yellow-seeded B. napus varieties. Considering factors like ease of operation and comprehensive utilization of information, the SVC model is recommended as an optimal choice for the task of identifying yellow-seeded varieties.

4 Discussions and conclusion

Remote sensing technology, recognized as an essential national strategy, finds extensive application across both military and civilian domains. It facilitates the efficient acquisition of spectral data, enabling tasks like land classification and parameter inversion that are challenging for vision-based systems. Hyperspectral imaging technology, a near-Earth remote sensing tool, forms the basis for this advancement. As an innovative method of photoelectric detection and recognition, it integrates spectroscopy with optical imaging, offering a non-destructive and highly efficient alternative to traditional empirical and lab-based approaches for discerning the color of rapeseed seeds. Leveraging the rich spectral and image data inherent to rapeseed samples, this technology holds great promise for agricultural applications.

Comparing yellow seeds with black and brown seeds in Brassica napus reveals that yellow seeds have a thinner seed coat, higher oil content, and better quality. They also have higher protein content in the cake, lower cellulose and polyphenol levels, and higher economic value. Breeding yellow-seed varieties has become a key goal in rapeseed breeding worldwide. However, the complex seed color and inconsistent standards in current identification methods pose challenges. Most researchers use the naked eye or RGB color systems for seed color identification. However, the inconsistent phenotypic color and environmental influences make RGB methods unstable. In contrast, hyperspectral technology, which detects internal seed quality, is less affected by surface color, providing more stable results.

In this study, we introduce four intelligent models carefully designed to distinguish yellow-seeded rapeseed, as depicted in the model flowchart in Figure 9. The first two models, PLSR and Logit-R, synergize spectral indices with hyperspectral trilateral parameters. This process begins with extracting three spectral indices and 23 trilateral parameters. Through correlation analyses across the R, G, and B color channels of rapeseed seeds, we determine the optimal combinations of these spectral indices and trilateral parameters.

Figure 9. Flowchart of identification modeling.

The PLSR model leverages six features derived from three spectral indices and three trilateral parameters, achieving an impressive recognition accuracy (RA) between 92.32% and 96.55% in differentiating yellow from non-yellow seeds. The Logit-R model, which prioritizes the three spectral indices combined with the R channel, achieves a remarkable RA of 98%.

Additionally, we employ two machine learning models, random forest and SVC, to tackle the identification task. Beyond the 23 trilateral parameters and nine optimal spectral indices, we include the original 601 spectral reflectance values in the feature set. Using lasso-penalized logistic regression, we identify ten key features, which serve as input for the random forest and SVC models, achieving an average RA of approximately 98%, with SVC slightly outperforming random forest.

We emphasize that the proposed identification framework for yellow-seeded rapeseed, which integrates classical statistical methods and advanced machine learning tools, demonstrates robust generalizability. This framework is not limited to rapeseed classification but holds significant potential for application to seed classification and identification tasks across a wide range of other crops. By combining hyperspectral feature extraction with predictive modeling techniques, it provides a versatile approach that can adapt to various seed types, accommodating their unique physical and spectral characteristics. This generalizability makes it a valuable tool for advancing precision agriculture and improving the efficiency of crop breeding programs. Additionally, the framework’s ability to extract internal quality information and analyze large-scale data through machine learning models makes it adaptable for various agricultural tasks, including crop variety identification, stress detection, and quality assessment across different agricultural production systems. By customizing the spectral features and models for specific crops, this framework can be effectively extended to other agricultural systems, enhancing precision farming and crop management in diverse contexts.

This study has several limitations, which suggest directions for future research. First, hyperspectral feature fusion remains expensive, and integrating machine vision with machine learning for rapeseed color recognition would be a cost-effective alternative. Machine vision, efficient for non-destructive small-target color recognition, contrasts with hyperspectral remote sensing, which excels in large-area identification. A promising approach involves combining hyperspectral remote sensing with intelligent models, establishing a key paradigm for agricultural monitoring. Future work will focus on integrating machine vision and hyperspectral remote sensing to enhance rapeseed color recognition across broader areas.

Second, this study is limited by the small sample size and narrow color range, based on two B. napus varieties from a single field trial in Changsha (2020–2021). As an initial exploration of hyperspectral technology and machine learning for yellow-seeded rapeseed identification, it provides valuable insights but requires expansion. Future research will include more rapeseed varieties and account for environmental factors like temperature, humidity, light, and altitude by incorporating multi-year, multi-location data for a comprehensive analysis of seed color variability.

Data availability statement

The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation.

Author contributions

FL: Conceptualization, Formal Analysis, Methodology, Writing–original draft. FW: Funding acquisition, Methodology, Supervision, Writing–review and editing. ZZ: Validation, Writing–review and editing. LC: Software, Visualization, Writing–review and editing. JW: Formal Analysis, Software, Supervision, Writing–review and editing. Y-GW: Formal Analysis, Software, Writing–review and editing.

Funding

The author(s) declare that financial support was received for the research, authorship, and/or publication of this article. This research was supported partially by the National Natural Science Foundation of China (Grant No. 12471489), the Key Research Project of the Department of Education of Hunan Province (CN) (Grant No. 22A0135), the Excellent Young Research Project of the Department of Education of Hunan Province (CN) (Grant No. 24B1085), the “Chunhui” Program Collaborative Scientific Research Project (202202004), and the Australian Research Council project (DP160104292).

Acknowledgments

We would like to thank the Oil Institute of Hunan Agricultural University for providing the two experimental rapeseed varieties, Xiangyou 708 and Xiangyou 710. The authors wish to thank the reviewers and the handling editor for their constructive comments and suggestions, which led to a great improvement in the presentation of this work.

Conflict of interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Generative AI statement

The author(s) declare that no Generative AI was used in the creation of this manuscript.

Publisher’s note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

References

Auger, B., Marnet, N., Gautier, V., Maia-Grondard, A., Leprince, F., Renard, M., et al. (2010). A detailed survey of seed coat flavonoids in developing seeds of brassica napus l. J. Agric. Food Chem. 58, 6246–6256. doi:10.1021/jf903619v

Baetzel, R., Lühs, W., Badani, A.-G., and Friedt, W. (2003). Development of segregating populations in the breeding of yellow-seeded winter rapeseed (brassica napus l.). Proc. 11th Int. Rapeseed Congr. 1, 238–242.

Bai, Z., Tian, J., Hu, X., Sun, T., Luo, H., and Huang, D. (2022). A back-propagation neural network model using hyperspectral imaging applied to variety nondestructive detection of cereal. J. Food Process Eng. 45, e13973. doi:10.1111/jfpe.13973

Broeckx, J., Vanmaercke, M., Duchateau, R., and Poesen, J. (2018). A data-based landslide susceptibility map of africa. Earth-Science Rev. 185, 102–121. doi:10.1016/j.earscirev.2018.05.002

Bu, Y., Jiang, X., Tian, J., Hu, X., Han, L., Huang, D., et al. (2023). Rapid nondestructive detecting of sorghum varieties based on hyperspectral imaging and convolutional neural network. J. Sci. Food Agric. 103, 3970–3983. doi:10.1002/jsfa.12344

Chen, C., Xiao, L., Zhao, Z., and Du, D. (2015). Research progress in seed coat color of yellow-seeded rapeseed. J. Henan Agric. Sci. 44, 1–6. doi:10.15933/j.cnki.1004-3268.2015.09.001

Friedman, J., Hastie, T., and Tibshirani, R. (2010). Regularization paths for generalized linear models via coordinate descent. J. Stat. Softw. 33 (1), 1–22. doi:10.18637/jss.v033.i01

Guo, L., Yu, Y., Yu, H., Tang, Y., Li, J., Du, Y., et al. (2019). Rapid quantitative analysis of adulterated rice with partial least squares regression using hyperspectral imaging system. J. Sci. Food Agric. 99, 5558–5564. doi:10.1002/jsfa.9824

He, S.-F., Zhou, Q., and Wang, F. (2022). Local wavelet packet decomposition of soil hyperspectral for som estimation. Infrared Phys. & Technol. 125, 104285. doi:10.1016/j.infrared.2022.104285

Hong, Y., Liu, Y., Chen, Y., Liu, Y., Yu, L., Liu, Y., et al. (2019). Application of fractional-order derivative in the quantitative estimation of soil organic matter content through visible and near-infrared spectroscopy. Geoderma 337, 758–769. doi:10.1016/j.geoderma.2018.10.025

Jiang, S., Wang, F., Shen, L., and Liao, G. (2018). Local detrended fluctuation analysis for spectral red-edge parameters extraction. Nonlinear Dyn. 93, 995–1008. doi:10.1007/s11071-018-4241-y

Jiang, S., Wang, F., Shen, L., Liao, G., and Wang, L. (2017). Extracting sensitive spectrum bands of rapeseed using multiscale multifractal detrended fluctuation analysis. J. Appl. Phys. 121. doi:10.1063/1.4978308

Liu, H. (1992). Studies on the inheritance of yellow-seeded brassica napus l. Acta Agron. Sin. (China). doi:10.3321/j.issn:0496-3490.1992.04.001

Li, J., Chen, L., Tang, Z., Zhang, X., and Yan, S. (2001). “Genetic study and commercial application of the yellow-seeded rapeseed (brassica napus l.),” in Proceedings of the international symposium on rapeseed science (New York: Science Press), 19–23.

Li, J., Li, Q., Wang, F., and Liu, F. (2022). Hyperspectral redundancy detection and modeling with local hurst exponent. Phys. A Stat. Mech. its Appl. 592, 126830. doi:10.1016/j.physa.2021.126830

Li, X., Chen, L., Hong, M., Zhang, Y., Zu, F., Wen, J., et al. (2012). A large insertion in bhlh transcription factor brtt8 resulting in yellow seed coat in brassica rapa. PLoS One 7, e44145. doi:10.1371/journal.pone.0044145

Li, Y., Liu, X. L., Li, J., Yin, J., and Xu, X. (2012). Construction of near-infrared reflectance spectroscopy model for seed color of rapeseed. Chin. J. Oil Crop Sci. 34.

Liang, J., Wang, Y., Shi, Y., Huang, X., Li, Z., Zhang, X., et al. (2023). Non-destructive discrimination of homochromatic foreign materials in cut tobacco based on vis-nir hyperspectral imaging. J. Sci. Food Agric. 103, 4545–4552. doi:10.1002/jsfa.12528

Lin, Y., Deng, X., Li, X., and Ma, E. (2014). Comparison of multinomial logistic regression and logistic regression: which is more efficient in allocating land use? Front. Earth Sci. 8, 512–523. doi:10.1007/s11707-014-0426-y

Liu, F., Wang, F., Liao, G., Lu, X., and Yang, J. (2021). Prediction of oleic acid content of rapeseed using hyperspectral technique. Appl. Sci. 11, 5726. doi:10.3390/app11125726

Liu, F., Wang, F., Wang, X., Liao, G., Zhang, Z., Yang, Y., et al. (2022). Rapeseed variety recognition based on hyperspectral feature fusion. Agronomy 12, 2350. doi:10.3390/agronomy12102350

Liu, X., Tu, J., Chen, B., and Fu, T. (2005). Identification and inheritance of a partially dominant gene for yellow seed colour in brassica napus. Plant Breed. 124, 9–12. doi:10.1111/j.1439-0523.2004.01051.x

Meacham-Hensold, K., Montes, C. M., Wu, J., Guan, K., Fu, P., Ainsworth, E. A., et al. (2019). High-throughput field phenotyping using hyperspectral reflectance and partial least squares regression (plsr) reveals genetic modifications to photosynthetic capacity. Remote Sens. Environ. 231, 111176. doi:10.1016/j.rse.2019.04.029

Meyer, D., Dimitriadou, E., Hornik, K., Weingessel, A., Leisch, F., and Chang, C.-C. (2024). Misc functions of the department of statistics (e1071), tu wien. R. Package 1, 7–16. doi:10.32614/CRAN.package.e1071

Munshi, T., Zuidgeest, M., Brussel, M., and van Maarseveen, M. (2014). Logistic regression and cellular automata-based modelling of retail, commercial and residential development in the city of ahmedabad, India. Cities 39, 68–86. doi:10.1016/j.cities.2014.02.007

Peng, Z., Lin, S., Zhang, B., Wei, Z., Liu, L., Han, N., et al. (2020). Winter wheat canopy water content monitoring based on spectral transforms and “three-edge” parameters. Agric. Water Manag. 240, 106306. doi:10.1016/j.agwat.2020.106306

Petisco, C., García-Criado, B., Vázquez-de Aldana, B. R., De Haro, A., and García-Ciudad, A. (2010). Measurement of quality parameters in intact seeds of brassica species using visible and near-infrared spectroscopy. Industrial Crops Prod. 32, 139–146. doi:10.1016/j.indcrop.2010.04.003

Pisner, D. A., and Schnyer, D. M. (2020). “Support vector machine,” in Machine learning (Elsevier), 101–121.

Qu, C., Fu, F., Lu, K., Zhang, K., Wang, R., Xu, X., et al. (2013). Differential accumulation of phenolic compounds and expression of related genes in black-and yellow-seeded brassica napus. J. Exp. Bot. 64, 2885–2898. doi:10.1093/jxb/ert148

Sen, R., Sharma, S., Kaur, G., and Banga, S. S. (2018). Near-infrared reflectance spectroscopy calibrations for assessment of oil, phenols, glucosinolates and fatty acid content in the intact seeds of oilseed brassica species. J. Sci. Food Agric. 98, 4050–4057. doi:10.1002/jsfa.8919

Sibanda, M., Mutanga, O., Dube, T., Odindi, J., and Mafongoya, P. L. (2019). The utility of the upcoming HyspIRI’s simulated spectral settings in detecting maize gray leafy spot in relation to sentinel-2 MSI, VENµS, and landsat 8 OLI sensors. Agronomy 9, 846. doi:10.3390/agronomy9120846

Somers, D. J., Rakow, G., Prabhu, V. K., and Friesen, K. R. (2001). Identification of a major gene and rapd markers for yellow seed coat colour in brassica napus. Genome 44, 1077–1082. doi:10.1139/g01-097

Tańska, M., Rotkiewicz, D., Kozirok, W., and Konopka, I. (2005). Measurement of the geometrical features and surface color of rapeseeds using digital image analysis. Food Res. Int. 38, 741–750. doi:10.1016/j.foodres.2005.01.008

Wei, Y., Li, X., Pan, X., and Li, L. (2020). Nondestructive classification of soybean seed varieties by hyperspectral imaging and ensemble machine learning algorithms. Sensors 20, 6980. doi:10.3390/s20236980

Yang, F., Ye, S., Ma, X., Chen, Y., Yi, B., Ma, C., et al. (2021). Resynthesis of yellow-seeded brassica napus and comparative metabonomic analysis of differently colored seed coats. Mol. Plant Breed., 1–14. doi:10.13271/j.mpb.022.003979

Ye, Q., Wang, Y., Zhou, S., Cheng, X., and Jia, J. (2018). Color discrimination based on hyperspectral imaging method. Spectrosc. Spectr. Analysis 38, 3310–3314. doi:10.3964/j.issn.1000-0593(2018)10-3310-05

Zhang, J., Huang, Y., Li, Z., Liu, P., and Yuan, L. (2017). Noise-resistant spectral features for retrieving foliar chemical parameters. IEEE J. Sel. Top. Appl. Earth Observations Remote Sens. 10, 5369–5380. doi:10.1109/jstars.2017.2713039

Zhang, L., Sun, H., Rao, Z., and Ji, H. (2020). Hyperspectral imaging technology combined with deep forest model to identify frost-damaged rice seeds. Spectrochimica Acta Part A Mol. Biomol. Spectrosc. 229, 117973. doi:10.1016/j.saa.2019.117973

Zhang, T., Wei, W., Zhao, B., Wang, R., Li, M., Yang, L., et al. (2018). A reliable methodology for determining seed viability by using hyperspectral data from two sides of wheat seeds. Sensors 18, 813. doi:10.3390/s18030813

Zhang, X., Pang, Z., Chen, L., Yin, J., and Li, J. (2006). Seed color detection by computer technology in rapeseed. Chin. J. Oil Crop Sci. 28, 11. doi:10.3321/j.issn:1007-9084.2006.01.003
