Management Summary
The goal of this study was to evaluate an intuitive, user-friendly tool to help farmers make reproduction management decisions by using herd- and cow-level parameters to forecast the chance of conception success for a particular insemination. Machine learning was used to predict the insemination outcomes of individual cows based on phenotypic data using the Modulos AutoML platform.
Machine learning (ML) techniques are well suited to investigating dairy cattle reproductive performance because they can handle complex relationships among explanatory variables. ML is a well-established and widely used methodology in both agriculture and academia, but scientific research on its use in breeding is typically conducted without a standard process model that could improve the performance and efficiency of ML applications. In our master's thesis, we developed an ML application to support the reproductive management of dairy cows, following the principles of the CRISP-ML(Q) process model.
The study examined data from 2’500 Swiss farms during a five-year period from 2015 to 2020. Data on health, reproduction, production, and breeding values were taken from the ArgusQ database of Qualitas AG (a subsidiary of swissherdbook) and evaluated using the Modulos AutoML platform. The prepared data set contained 210’000 breeding records from cows of various breeds (including Holstein, Swiss Fleckvieh and Simmental). Each data point in the final data set included 65 explanatory variables and 1 binary label. The model provided a reasonable prediction of the likelihood of conception success. In the external validation (blind test), based on 15% of the available data (38’046 records), the model predicted 19’982 successful and 18’064 unsuccessful inseminations. Of these predictions, 14’337 were true positives, 13’397 were true negatives, 5’645 were false positives and 4’667 were false negatives. The binary F1 score was 73.5%, the recall was 75.4% and the accuracy was 72.8%. This also results in a positive ML effect of 6.57 CHF per successful insemination.
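The reported scores can be traced back to the four confusion-matrix counts given above. The following minimal Python sketch shows the standard definitions of precision, recall, F1 and accuracy applied to those counts. The function name `confusion_metrics` is hypothetical and serves only as an illustration; it is not part of the Modulos AutoML platform. The accuracy computed this way is approximately 0.729, close to the reported 72.8%.

```python
def confusion_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Compute standard binary classification metrics from confusion-matrix counts.

    Hypothetical helper for illustration only.
    """
    total = tp + fp + fn + tn
    precision = tp / (tp + fp)   # share of predicted successes that were real
    recall = tp / (tp + fn)      # share of real successes that were predicted
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    accuracy = (tp + tn) / total  # share of all predictions that were correct
    return {
        "total": total,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "accuracy": accuracy,
    }


# Counts from the external validation (blind test) reported above.
metrics = confusion_metrics(tp=14337, fp=5645, fn=4667, tn=13397)

for name, value in metrics.items():
    if name == "total":
        print(f"{name}: {value}")
    else:
        print(f"{name}: {value:.3f}")
```

The counts are internally consistent: true positives plus false positives give the 19’982 predicted successes, and true negatives plus false negatives give the 18’064 predicted failures.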
We can therefore summarize that the developed model shows positive economic and statistical performance. However, we consider the number and percentage of false negatives problematic. False negative results could lead breeders to cull cows that are still functional, which poses an ethical risk. Further research and optimization are necessary before swissherdbook could release such a product to its customers.
Key words: reproductive management, dairy cattle, prediction, conception, algorithm, machine learning, quality assurance methodology, guidelines