Calf Weight Estimation with Stereo Camera Using Three-Dimensional Successive Cylindrical Model

Various studies have been conducted on methods for estimating the weight of cattle. In this study, we propose a method to estimate body weight by modeling the shape of a calf using three-dimensional information extracted from stereo images. First, a stereo camera is built from two fixed network cameras and used to capture motion images of a calf. Three-dimensional coordinates are calculated by applying stereo matching to static images obtained by splitting the motion images into frames. We then model only the body with a three-dimensional model, because chest girth and waist girth have the highest correlation with body weight. As the body of cattle has a rounded shape, we use a three-dimensional successive cylindrical model. Weight is estimated with a linear regression equation relating the volume of the cylindrical model to the measured body weight. In the experiment, the effectiveness of the proposed method was verified using data of 48 cattle in total. The best results were a correlation coefficient of 0.8679 and an average error of 21.46%. In the future, we aim to further improve accuracy and to establish a method that automatically extracts images suitable for analysis.


Introduction
Delivery interval is often used as an indicator of the fertility of beef cattle. The value of breeding cattle depends on how efficiently they produce healthy, well-grown calves. However, a high birth rate alone does not guarantee high overall reproductive ability, because problems may arise as the calves grow. In addition to a short delivery interval, maternal ability, that is, the ability to adequately nourish the calves that are born, is also required for high productivity. Calf weight has been used as an indicator of this maternal ability for decades [1]. In this research, to assess maternal ability and improve productivity, we aim to simplify the estimation of calf weight.
Various studies have been conducted on methods for estimating the weight of cattle [2]∼[6]. Some methods count the pixels of the cattle body in an image taken by a camera facing vertically downward and estimate weight from the correlation between the pixel count and the actual weight [2] [7]. These methods, however, impose many conditions; for example, images must be taken directly from above, and the cattle must be in a normal posture. In this paper, we propose a method that estimates weight with a three-dimensional model of the cattle's shape built from stereo images. This removes the need, present in previous work, for a special environment in which images are shot only from above. Moreover, we expect that capturing the cattle three-dimensionally can mitigate the posture problem.
In the proposed method, a stereo camera is first built by fixing two network cameras. Images of the calf are taken with these cameras, and stereo matching [8] [9] is applied to the captured images to calculate three-dimensional coordinates. Next, we model only the body, because chest girth and waist girth have the highest correlation with weight. Since the body has a rounded shape, we use a three-dimensional successive cylindrical model. Weight is then estimated with a linear regression equation between the volume of the obtained three-dimensional successive cylindrical model and the manually measured weight.
The rest of this paper is organized as follows. In Section 2, we introduce the relationship between weight and the body parts of cattle on which this study focuses, and describe how the data for each part are extracted from an image. In Section 3, we propose a method to estimate weight by modeling the body with a successive cylindrical model. In Section 4, the effectiveness of the proposed method is evaluated experimentally. Finally, Section 5 concludes the paper and outlines future research directions.

Basic Information of Cattle Body and Extraction of Body Parts Data
2.1 Parts of Cattle Body
As basic information on cattle, Fig. 1 shows the names of the parts of cattle [10]. Body weight can be calculated from the measured values of these parts [3]∼[6]. Chest girth and waist girth correlate most closely with weight, followed by thurl width, body length, and chest width; the region from the chest to the abdomen is therefore closely related to cattle weight. On the other hand, withers height is not related to weight.

Extraction of each Body Parts Data from Stereo Images
In this study, we shoot motion images of calves using two fixed network cameras. The captured motion images are split into frames, and each frame is analyzed as a static image. By calibrating each camera beforehand [11] and determining its internal and external parameters, the images can be handled as a stereo pair; these parameters are also used to rectify the images. In stereo matching, the distance between the cameras and the target object must be set appropriately, and the appropriate distance depends on the size of the target object. We used images in which the distance between the cattle and the cameras was 1 to 3 m.
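Once the images are rectified, each matched pixel can be turned into a three-dimensional point. The sketch below shows the standard triangulation for a rectified pair, z = f·B/d. The class name, field names, and numeric values (800 px focal length, 10 cm baseline) are hypothetical, and the text does not state which stereo-matching implementation was used.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class RectifiedStereo:
    """Hypothetical parameters of a rectified stereo pair (from calibration)."""
    focal_px: float     # focal length in pixels, shared by both rectified views
    baseline_cm: float  # distance between the two camera centres, in cm
    cx: float           # principal point, x (pixels)
    cy: float           # principal point, y (pixels)


def pixel_to_3d(u: float, v: float, disparity: float,
                cam: RectifiedStereo) -> Optional[Tuple[float, float, float]]:
    """Back-project pixel (u, v) of the left rectified image to (x, y, z) in cm.

    Uses the pinhole relation z = f * B / d. Returns None when the disparity
    is not positive, i.e. when stereo matching found no valid correspondence.
    """
    if disparity <= 0:
        return None
    z = cam.focal_px * cam.baseline_cm / disparity  # depth along the optical axis
    x = (u - cam.cx) * z / cam.focal_px             # horizontal position
    y = (v - cam.cy) * z / cam.focal_px             # vertical position (image y points down)
    return x, y, z


cam = RectifiedStereo(focal_px=800.0, baseline_cm=10.0, cx=320.0, cy=240.0)
print(pixel_to_3d(400.0, 240.0, 40.0, cam))  # (20.0, 0.0, 200.0)
```

With these assumed values, a disparity of 40 px corresponds to a depth of 200 cm, inside the 1 to 3 m range used here; because depth is inversely proportional to disparity, depth resolution degrades as the object moves away, which is one reason the camera-to-object distance matters.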
Data for each body part is acquired from the stereo images. To extract the cattle body, two methods are used together. The first is background subtraction [12] using a background image taken in advance. The second uses the disparity values obtained by stereo matching [8] [9]. Specifically, we create one mask image by background subtraction between a background image and an image of the cattle taken at the same place. We then create a disparity image from the stereo images and a second mask image containing only the pixels whose disparity is close to that of the cattle body. The common part of these two masks is taken as the cattle body. Combining the two methods is expected to eliminate noise that remains after either process alone. The result is converted into a binary image in which the cattle body is white and everything else is black, and minute noise is then removed by keeping only the largest white region. Fig. 2 shows an example of a binary image.
Fig. 3 shows the locations and ranges of the data to be detected. Body length is measured from the shoulder point to the end of the pin bone (the tip of the buttocks). To acquire body length from a static image, the shoulder point must be detected; however, this is difficult because the shoulder point has no distinct features. We therefore first acquire withers height and define the length of the straight line from the withers to the end of the pin bone as a pseudo body length. Withers height is the length from the shoulder top (withers) to the ground (where the hoof touches the ground). We detect the hoof in the image and take the withers height as the length of the perpendicular line from the hoof up to the shoulder top.
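The mask combination and noise removal described above can be sketched with plain Python lists standing in for images. The disparity threshold (`body_disp`, `tol`) is a hypothetical parameterization of "disparity close to the cattle body", 4-connectivity is an assumed choice, and all function names are illustrative; an image library's connected-component routine would do the same job faster.

```python
from collections import deque
from typing import List

Mask = List[List[bool]]


def disparity_mask(disp: List[List[float]], body_disp: float, tol: float) -> Mask:
    """True where the disparity is within tol of the (assumed known) body disparity."""
    return [[abs(d - body_disp) <= tol for d in row] for row in disp]


def combine_masks(bg_mask: Mask, disp_mask: Mask) -> Mask:
    """Common part (pixel-wise AND) of the two mask images."""
    return [[a and b for a, b in zip(r1, r2)] for r1, r2 in zip(bg_mask, disp_mask)]


def largest_region(mask: Mask) -> Mask:
    """Keep only the largest 4-connected white region to remove minute noise."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    best: list = []
    for sy in range(h):
        for sx in range(w):
            if not mask[sy][sx] or seen[sy][sx]:
                continue
            region, queue = [], deque([(sy, sx)])  # breadth-first flood fill
            seen[sy][sx] = True
            while queue:
                y, x = queue.popleft()
                region.append((y, x))
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if len(region) > len(best):
                best = region
    out = [[False] * w for _ in range(h)]
    for y, x in best:
        out[y][x] = True
    return out
```

Taking the intersection first means a pixel must pass both tests, so shadows that fool background subtraction and mismatches that fool the disparity test tend to be rejected before the largest-region step.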
When detecting the hoof, the hoof may lie far from the perpendicular line through the shoulder point, depending on the posture of the cattle. Therefore, the static images used in this study are taken from the motion images mainly when the cattle stand in a normal posture, such as that in Fig. 3.
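Given three-dimensional points for the detected landmarks, the two measurements reduce to simple distances. This minimal sketch assumes the landmark points (withers top, hoof, end of the pin bone) have already been located and that the y axis is vertical; the function names are hypothetical.

```python
import math
from typing import Tuple

Point3 = Tuple[float, float, float]  # (x, y, z) in cm, y assumed vertical


def withers_height(withers_top: Point3, hoof: Point3) -> float:
    """Length of the vertical (perpendicular) line from the hoof up to the withers."""
    return abs(withers_top[1] - hoof[1])


def pseudo_body_length(withers_top: Point3, pin_bone_end: Point3) -> float:
    """Straight-line distance from the withers to the end of the pin bone."""
    return math.dist(withers_top, pin_bone_end)
```

Measuring withers height along y alone is only valid when the hoof lies under the withers, which is why frames with a normal posture are selected.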

Modeling Body of Cattle
3.1 Definition of Body
Chest girth is measured at the position just behind the crops. Because this position is difficult to determine in an image, chest girth cannot be detected precisely. Therefore, we model the entire body with a successive cylindrical model and examine its correlation with weight. The model is created by measuring the thickness of the body at each position from the chest to the pin bone using three-dimensional point cloud data.
The red rectangle in Fig. 4 marks the body region to be modeled. Horizontally, it covers the range in which body length data is detected; vertically, it extends from the top of the shoulder, found when detecting withers height, to the bottom of the detected body.
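The rectangle can be derived from the binary mask once the body-length range and the shoulder top are known. In this minimal sketch, the inputs `x_start`, `x_end`, and `y_top` are hypothetical names for results of the earlier detection steps, and image rows are assumed to increase downward.

```python
from typing import List, Tuple

Mask = List[List[bool]]  # binary body image, rows grow downward


def body_rectangle(mask: Mask, x_start: int, x_end: int, y_top: int) -> Tuple[int, int, int, int]:
    """Inclusive (x0, y0, x1, y1) of the region to be modeled.

    x_start..x_end: columns where the body length data was detected.
    y_top: row of the shoulder top found while detecting withers height.
    The bottom edge is the lowest white pixel of the mask in those columns.
    """
    bottom = max(
        (y for y, row in enumerate(mask) for x in range(x_start, x_end + 1) if row[x]),
        default=y_top,
    )
    return x_start, y_top, x_end, bottom
```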

Modeling Body
Three-dimensional point cloud data is calculated from the stereo images by stereo matching. Since the color of the cattle's surface changes little, the point cloud data is likely to contain many matching errors. It is therefore difficult to calculate precise three-dimensional volume data of the trunk directly from the point cloud. Instead, we model the body with a simple, fixed three-dimensional shape (e.g., a cylinder or a cuboid). Fig. 5 shows an image of cattle taken from above; the body is almost symmetric about the spine. Therefore, if the bulge of either the left or the right side of the trunk can be detected, the entire body can be modeled. Moreover, since the body is rounded, a cylindrical model, which can express roundness, is preferable to a model with corners such as a cuboid.
However, as Fig. 5 also shows, the thickness of the body varies with position. To create a model closer to the actual body, the girth of the body must therefore be obtained at each position. We express these different thicknesses by slicing the body into several parts. The methods of measuring the girth and of slicing are described in detail in the next section. Fig. 6 shows an image of the successive cylindrical model. In practice, the slices are much thinner, so the number of cylinders increases and the body outline becomes smoother.

Method of Determining Radius in Successive Cylindrical Model
We extract the body defined in Section 3.1 and obtain its three-dimensional coordinates from the parallax image. As shown in Fig. 7, let x, y, and z (in cm) be the lengths in the horizontal, vertical, and depth directions of the image, respectively, as obtained by the stereo method. The body length is the length of the extracted body in the x direction.
As described in Section 3.2, the body must be divided into several parts so that the radius of the cylindrical model can change with the position along the x direction. We therefore adopt a slide division method, in which a window of fixed width w in the x direction is slid in small steps (for example, a 5 cm wide window slid by 1 cm at a time). Fig. 7 shows an image of the slide division method.
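The slide division itself is a simple window generator. The sketch below uses the 5 cm width and 1 cm slide from the example above; the function names, the half-open interval convention, and the choice to emit only windows that fit fully inside the body are assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

Point3 = Tuple[float, float, float]  # (x, y, z) in cm


def slide_windows(x_min: float, x_max: float, w: float, s: float) -> List[Tuple[float, float, float]]:
    """(centre, left, right) of every width-w window slid by s along [x_min, x_max].

    Only windows lying entirely inside the body are produced; positions closer
    than w/2 to either end are left for separate handling.
    """
    if s <= 0:
        raise ValueError("slide step s must be positive")
    windows = []
    k = 0
    while True:
        centre = x_min + w / 2 + k * s
        if centre + w / 2 > x_max + 1e-9:  # small tolerance for float round-off
            break
        windows.append((centre, centre - w / 2, centre + w / 2))
        k += 1
    return windows


def points_in_window(points: Sequence[Point3], left: float, right: float) -> List[Point3]:
    """The set of points whose x coordinate falls inside [left, right)."""
    return [p for p in points if left <= p[0] < right]


print(len(slide_windows(0.0, 10.0, 5.0, 1.0)))  # 6
```

Because consecutive windows overlap by w − s, each point contributes to several slices, which smooths the radius profile compared with cutting the body into disjoint pieces.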
After dividing the body in the x direction, the points inside each divided space are treated as one set, and each set is projected onto the YZ plane. In the projected point cloud, the z values are averaged for each y value, so that one z value is determined for each y. However, the three-dimensional point cloud used in this study contains much noise caused by stereo matching errors, which makes it difficult to determine the shape of the body directly from the points. Therefore, the radius of the cylindrical model is determined from the circle that best fits the projected point cloud on the YZ plane. This allows the body shape to be detected while reducing errors.
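Projecting one slice onto the YZ plane and averaging z for each y value can be sketched as follows. Because measured y values are continuous, the sketch quantizes y into bins of `y_step` cm; that binning step, its 1 cm default, and the function name are assumptions, not details given in the text.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def yz_profile(points: Iterable[Tuple[float, float, float]],
               y_step: float = 1.0) -> List[Tuple[float, float]]:
    """Project a slice onto the YZ plane and average z within each y bin."""
    sums: Dict[int, float] = defaultdict(float)
    counts: Dict[int, int] = defaultdict(int)
    for _x, y, z in points:          # discarding x is the projection onto YZ
        b = round(y / y_step)        # assumed quantization of y
        sums[b] += z
        counts[b] += 1
    return [(b * y_step, sums[b] / counts[b]) for b in sorted(sums)]
```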
To find the circle that best fits the point cloud data, we employ circle fitting using a generalized Hough transform [?], a technique for detecting straight lines and circles in images. The radius of the fitted circle is taken as the radius of the cylindrical model at that position. The flow of these processes is shown in Fig. 8. Because the body surface is smooth, a fitted radius that differs markedly from that of the adjoining circle obtained just before is likely caused by outliers, since the point cloud contains a large amount of noise. We therefore also consider a method that determines the radius with reference to the size of the adjoining circles.
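The text uses a generalized Hough transform; the sketch below instead uses the simpler circle Hough transform, which votes directly over (centre, radius) and is enough to show the idea. The 1 cm accumulator grid, the candidate radius range, the angular step, and the function name are all assumptions.

```python
import math
from collections import Counter
from typing import Iterable, Sequence, Tuple


def hough_circle(points: Sequence[Tuple[float, float]], radii: Iterable[int],
                 angle_step_deg: int = 1) -> Tuple[int, int, int]:
    """Circle Hough transform on (y, z) points in cm, using a 1 cm accumulator grid.

    For every candidate radius r, each point votes once for every grid cell
    that could be the centre of a radius-r circle through it; the
    (centre_y, centre_z, r) cell with the most votes is returned.
    """
    votes: Counter = Counter()
    for r in radii:
        for py, pz in points:
            cells = set()  # a point votes at most once per cell
            for deg in range(0, 360, angle_step_deg):
                t = math.radians(deg)
                cells.add((round(py - r * math.cos(t)), round(pz - r * math.sin(t)), r))
            votes.update(cells)
    if not votes:
        raise ValueError("need at least one point and one radius")
    (cy, cz, r), _count = votes.most_common(1)[0]
    return cy, cz, r


# Points on the visible half of a circle centred at (0, 30) with radius 15 cm.
arc = [(15 * math.cos(math.radians(d)), 30 + 15 * math.sin(math.radians(d)))
       for d in range(100, 261, 10)]
print(hough_circle(arc, range(10, 21)))  # (0, 30, 15)
```

Because each point votes independently, a stray point only scatters a few votes elsewhere in the accumulator, so voting tolerates stereo-matching noise better than fitting the raw points directly; it also works when only one side of the body (an arc) is visible.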

Correlation between Weight and Volume of Model
Let $r_i$ be the radius of the $i$th cylinder from the left and $w_i$ its length in the horizontal direction. Let $L$ be the body length of the cattle and $I$ the number of cylinders.
When the slide division method divides the body with a window of width w cm slid by s cm, circle fitting at each position uses the point cloud data lying within w/2 cm before and after that position. Note that near both ends of the body, point cloud data for the full w/2 cm on one side is not available. Circle fitting is therefore performed only at positions where data within w/2 cm on both sides is available, and the radii at both ends are supplemented with the radius of the closest cylindrical model.
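Filling the ends with the radius of the closest fitted cylinder, as described above, might look like this. `None` marks positions where a full window of w/2 cm on each side was unavailable; the function name is hypothetical, and breaking ties toward the left neighbour is an arbitrary choice.

```python
from typing import List, Optional


def fill_ends(radii: List[Optional[float]]) -> List[float]:
    """Replace None entries with the radius at the nearest fitted position."""
    valid = [i for i, r in enumerate(radii) if r is not None]
    if not valid:
        raise ValueError("no fitted radius available")
    return [r if r is not None else radii[min(valid, key=lambda j: abs(j - i))]
            for i, r in enumerate(radii)]
```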
Since the sum of the lengths of all the cylinders equals the body length,
$$L = \sum_{i=1}^{I} w_i,$$
the volume $V$ of the successive cylindrical model is
$$V = \sum_{i=1}^{I} \pi r_i^{2} w_i. \tag{3}$$
Moreover, when the size of the adjoining circle is considered and $r$ is the radius detected by circle fitting, the $i$th radius $r_i$ is set as follows: when the condition of Eq. 4 is satisfied, $r_i = r$; otherwise, $r_i = r_{i-1}$. We estimate weight from stereo images using a linear regression equation that relates the actual weight of the cattle to the volume of the successive cylindrical model calculated by Eq. 3.
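The volume of Eq. 3 and the adjoining-circle rule can be sketched as below. Eq. 4 itself is not reproduced in the text, so the acceptance test used here, a maximum allowed change `max_jump` between successive radii, is a hypothetical stand-in; accepting the first radius unconditionally is likewise an assumption.

```python
import math
from typing import List


def smooth_radii(fitted: List[float], max_jump: float) -> List[float]:
    """Adjoining-circle rule: r_i = r if the check passes, else r_i = r_{i-1}.

    The check |r - r_{i-1}| <= max_jump is a HYPOTHETICAL stand-in for Eq. 4.
    """
    out: List[float] = []
    for i, r in enumerate(fitted):
        if i == 0 or abs(r - out[-1]) <= max_jump:
            out.append(r)
        else:
            out.append(out[-1])  # treat r as an outlier and reuse r_{i-1}
    return out


def cylinder_volume(radii: List[float], lengths: List[float]) -> float:
    """Eq. 3: V = sum_i pi * r_i**2 * w_i (cm^3 when r_i and w_i are in cm).

    The lengths w_i are expected to add up to the body length L.
    """
    if len(radii) != len(lengths):
        raise ValueError("need one length per radius")
    return sum(math.pi * r * r * w for r, w in zip(radii, lengths))
```

Comparing against the previously accepted radius, not the previous raw fit, keeps one outlier from letting a second outlier through.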

Experiment
4.1 Experiment on division width of body
4.1.1 Data Set
We conducted experiments on black calves bred at the Food Resources Education and Research Center, Graduate School of Agricultural Science, Kobe University, ranging from 5-day-old calves to calves weighing about 200 kg. Two types of network camera were used: the CG-NCBU031A manufactured by corega (Fig. 9) and the M1145 network camera manufactured by AXIS (Fig. 10). Images were taken at the two places shown in Fig. 11. Place 1, marked ① at the center of Fig. 11, is a passage through which only calves can pass. Place 2, marked ②, is inside the cowshed where calves over 100 kg are raised.
The captured images must be rectified using the internal and external parameters of the cameras determined beforehand. To improve the accuracy of the disparity values, we calibrated the cameras at every shooting session. The images are then analyzed with the proposed method. In addition, in this study we consider using multiple stereo images in a pseudo manner by averaging the model volumes of the same animal taken on the same day.
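Averaging the model volumes of the same animal taken on the same day amounts to a group-by and mean. In this minimal sketch, the record layout (animal ID, date, volume) and the sample IDs are hypothetical.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, List, Tuple


def average_volumes(records: Iterable[Tuple[str, str, float]]) -> Dict[Tuple[str, str], float]:
    """Average the model volumes of the same animal filmed on the same day.

    Each record is (animal_id, date, volume_cm3), one per analysed stereo image.
    """
    groups: Dict[Tuple[str, str], List[float]] = defaultdict(list)
    for animal, day, volume in records:
        groups[(animal, day)].append(volume)
    return {key: mean(vols) for key, vols in groups.items()}


records = [("calf-01", "2017-01-13", 100.0), ("calf-01", "2017-01-13", 110.0),
           ("calf-02", "2017-01-13", 50.0)]
averaged = average_volumes(records)
```

Averaging before regression reduces the influence of a single frame with poor stereo matching, at the cost of fewer data points for fitting.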
To obtain images, we guided the calves one by one in front of the cameras and photographed them from the side while walking or standing in a normal posture. The data consist of motion images of 48 cattle in total: 16 cattle were filmed with both types of camera on January 13, 2017 (accounting for 32 of the 48 data sets), and 16 cattle were filmed with the corega camera on January 19, 2017. Five stereo images were analyzed for each animal.
We create three patterns of successive cylindrical models by performing circle fitting under the following three conditions when we model the body:

Extraction of Parts of Cattle Body and Comparison of Shooting Environment
The cattle body is extracted from the obtained images, and some of the results are shown in Figs. 12 and 13. Each figure shows a stereo image in A and a parallax image in B. Fig. 12 shows images photographed at place 1 with the AXIS camera, and Fig. 13 shows images photographed at place 2 with the corega camera.
Comparing the results, place 1 was backlit, but the AXIS camera could capture dark areas brightly, so the parallax values of the cattle body were obtained clearly even against the light. These results indicate that the AXIS camera is suitable for place 1 and the corega camera for place 2.
Table 1 shows (i) the correlation coefficient between the model volume and the measured weight under each of the three conditions in this experiment, and (ii) the average error between the weight estimated with the linear regression equation and the measured weight. It also shows the results obtained when the volumes of the same animal from different stereo images are averaged.
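The two evaluation figures in Table 1, the correlation coefficient and the average error of the regression estimate, can be computed as below. The text does not define "average error" precisely; this sketch assumes the mean absolute error relative to the measured weight, expressed in percent, and the function names are hypothetical.

```python
import math
from statistics import mean
from typing import List, Tuple


def fit_line(volumes: List[float], weights: List[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of weight = a * volume + b."""
    mv, mw = mean(volumes), mean(weights)
    sxx = sum((v - mv) ** 2 for v in volumes)
    sxy = sum((v - mv) * (w - mw) for v, w in zip(volumes, weights))
    a = sxy / sxx
    return a, mw - a * mv


def pearson_r(xs: List[float], ys: List[float]) -> float:
    """Pearson correlation coefficient between xs and ys."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def mean_relative_error_pct(estimated: List[float], measured: List[float]) -> float:
    """Average of |estimated - measured| / measured, in percent (assumed definition)."""
    return 100.0 * mean(abs(e - m) / m for e, m in zip(estimated, measured))
```

A relative error weights a 10 kg miss on a 20 kg calf far more heavily than on a 200 kg calf, which is consistent with the largest errors appearing for the lightest animals.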

Estimation Result of Volume using Successive Cylindrical Model
Figs. 14 and 15 show graphs for condition 3, which gave the best result. The horizontal axis is the volume of the obtained model, and the vertical axis is the measured weight. Fig. 14 plots all the obtained data, and Fig. 15 plots the average volume of each animal.

Discussion and Summary
We conducted the experiment to examine the effectiveness of the proposed method on data of 48 cattle in total, using motion images taken with two types of network camera. Each animal was analyzed using five stereo images.
First, comparing the cases in which the body is divided and not divided, both the correlation coefficient and the average error are better when the body is divided. Next, comparing the results of individual stereo images with those obtained by averaging the volumes, all results were better when the volumes were averaged. These results confirm the effectiveness of slicing the body in the proposed method and of shooting and using multiple images of the same animal.
The highest accuracy was obtained by averaging the multiple volumes from the slide division method that considers the adjoining circle, giving a correlation coefficient of 0.8679 and an average error of 21.46%. Considering the adjoining circle produces a smoother successive cylindrical model, which can be regarded as closer to the real body. On the other hand, Figs. 14 and 15 show that the error is largest for light calves (about 20 kg). This indicates that the circle-fitting error becomes large because small calves have little bulge in the body.
The experimental results have revealed the following:
1. There is a correlation between weight and the volume obtained by modeling the body.
2. The proposed method, which models the body by slicing it, is highly effective.
3. Using multiple stereo images may improve the accuracy of weight estimation.

Comparison with modeling using elliptical fitting
4.2.1 Data Set
The experimental results with circle fitting described above suggest that fitting the body with ellipses could further improve accuracy, since the cross section of the body is closer to an ellipse than to a circle. To verify this, we briefly compare the precision of circle fitting and elliptical fitting using 11 cattle filmed on December 19, 2016. Only one type of network camera was used: the CG-NCBU031A manufactured by corega. We guided the calves one by one in front of the camera and photographed them from the side while walking or standing in a normal posture. Because the previous experiments suggested that using multiple stereo images improves accuracy, we analyzed five stereo images for each calf and averaged the model volumes of the same animal before comparing accuracy.
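The text does not say how the elliptical fitting is performed. One simple possibility, shown here only as an assumption, is an algebraic least-squares fit of an axis-aligned ellipse A·y² + B·z² + C·y + D·z = 1 to the (y, z) profile of a slice, after which each slice contributes pi·a·b·w instead of pi·r²·w to the volume. The axis alignment, the normalization of the right-hand side to 1 (which fails if the ellipse passes through the origin), and all names are assumptions.

```python
import math
from typing import List, Sequence, Tuple


def _solve(m: List[List[float]], rhs: List[float]) -> List[float]:
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(rhs)
    a = [row[:] + [b] for row, b in zip(m, rhs)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        for r in range(col + 1, n):
            f = a[r][col] / a[col][col]
            for c in range(col, n + 1):
                a[r][c] -= f * a[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (a[r][n] - sum(a[r][c] * x[c] for c in range(r + 1, n))) / a[r][r]
    return x


def fit_axis_aligned_ellipse(points: Sequence[Tuple[float, float]]) -> Tuple[float, float, float, float]:
    """Least-squares fit of A*y^2 + B*z^2 + C*y + D*z = 1; returns (y0, z0, a, b)."""
    rows = [(y * y, z * z, y, z) for y, z in points]
    mtm = [[sum(r[i] * r[j] for r in rows) for j in range(4)] for i in range(4)]
    mt1 = [sum(r[i] for r in rows) for i in range(4)]  # normal equations M^T M p = M^T 1
    A, B, C, D = _solve(mtm, mt1)
    y0, z0 = -C / (2 * A), -D / (2 * B)                # centre by completing the square
    f = 1 + A * y0 * y0 + B * z0 * z0                  # A(y-y0)^2 + B(z-z0)^2 = f
    return y0, z0, math.sqrt(f / A), math.sqrt(f / B)


def elliptic_volume(axes: List[Tuple[float, float]], lengths: List[float]) -> float:
    """V = sum_i pi * a_i * b_i * w_i; equals Eq. 3 when a_i == b_i == r_i."""
    return sum(math.pi * a * b * w for (a, b), w in zip(axes, lengths))
```

An ellipse has more free parameters than a circle, so a direct Hough-style search over all of them grows much more expensive; a closed-form least-squares fit like this one is one way to keep the cost per slice low, though it is more sensitive to outliers than voting.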
We create five patterns of successive cylindrical models under the following five conditions and confirm the difference in precision between circle fitting and elliptical fitting. We used a wider division width for elliptical fitting because its processing time is much longer than that of circle fitting; for the same reason, elliptical fitting is compared with circle fitting only without considering the adjoining circle.
Table 2 shows (i) the correlation coefficient between the model volume and the measured weight under each of the five conditions in this experiment, and (ii) the average error between the weight estimated with the linear regression equation and the measured weight. Fig. 16 shows the graph for condition 4, which gave the best result. The horizontal axis is the volume of the obtained model, and the vertical axis is the measured weight.

Calculation Result of Volume by Modeling Using Successive Cylindrical Model
The results show that elliptical fitting gives slightly higher precision than circle fitting. In addition, since the division width strongly affects the processing time, implementing finer division in elliptical fitting may improve accuracy considerably.

Conclusions
In this study, we proposed a method to estimate the body weight of cattle by analyzing images suitable for weight estimation. These images are taken from motion images of a calf captured by two fixed network cameras used as a stereo camera. To examine the effectiveness of the proposed method, we conducted experiments under three conditions. Although images suitable for analysis cannot yet be extracted automatically, the experiments show that cattle weight can be estimated with the proposed method. However, since the average error is 21.46% even in the best case, further improvement in accuracy is desired. A comparison between elliptical fitting and circle fitting suggests that elliptical fitting can further improve accuracy. In addition, averaging the volumes obtained from different stereo images of the same calf allows multiple stereo images to be used in a pseudo manner. Since a more accurate model could be generated by using multiple images within the model generation process itself, this requires further study.
In this study, we acquired data by guiding calves one by one in front of the camera and filming them. However, moving cattle from place to place is hard work, almost as laborious as putting them on a weighing machine. Rather than forcibly moving the cattle in front of the camera, their images should be captured naturally without moving them. It is therefore necessary to establish a method to extract analyzable images from motion video of calves living in the barn in order to estimate their weight. Building on the proposed method, we aim to establish a method that automatically extracts such images from motion video and to apply it in a practical environment.