Data synchronization for gas emission measurements from dairy cattle: A matched filter approach

.


Introduction
Methane (CH 4 ) released from the enteric fermentation of dairy cows constitutes a substantial portion of greenhouse gas emissions (Moss et al. 2000, Gerber et al., 2013, Pachauri 2014, Beach et al. 2015, Charmley et al. 2016).There have been several initiatives aimed at reducing the climate impact of dairy production.Suggested approaches have included feed additives (Olijhoek, et al. 2019), dedicated diets (Bell 2012), and the genetic selection of cows with low CH 4 emission per produced unit of milk and meat (Lassen et al. 2012).
Regardless of the strategy adopted to handle bovine emissions, a well-designed automated measurement system will be crucial for monitoring the efficacy of reduction programs.Any technologies and instruments applied to monitoring bovine emissions will require verification and assessment amenable to large-scale implementation and cost efficiency suitable for commercial use (Garnsworthy et al., 2012a, 2012b, Difford et al. 2018).
Respiration chambers are the most known and well-established proposed gas emission measurement methods (Place et al. 2011, Castelán Ortega et al. 2020).A respiration chamber encloses an animal's pen for multiple days such that the animal's emitted gasses are measured by highly sensitive gas sensors that are able detect a number of gases within a controlled ventilated airstream.While the method is useful for scientific purposes, it is not practical for implementation on an industrial scale because respiration chambers have a low animal testing capacity, they are relatively costly and labor-intensive to operate, and they provide an unnatural environment not suited for phenotyping (Difford et al. 2016).
Alternatively, some have developed measurement approaches focused on eructation emissions, which account for approximately 87% of bovine greenhouse gas emissions (Blaxter and Joyce 1963, Murray et al. 1976, Moss et al. 2000).The most common eructation emissions measuring systems are the GreenFeed system (Huhtanen et al. 2015) and sniffers (Madsen et al. 2010, Garnsworthy et al. 2012a, 2012b).The GreenFeed system attracts cows with pelleted concentrates to entice them to visit it frequently.During each visit, a mounted gas sensor measures CH 4 fluxes in eructation gases from the air stream.
Sniffers are based on a mobile technology that provides reliable estimates of CH 4 emission with a large testing capacity suitable and relatively low installation and labor costs suitable for commercial conditions (Garnsworthy et al. 2012a, 2012b, Lassen et al. 2012, Bell et al. 2014).Briefly, sniffers are integrated units with a small air pump, dust filter, often a de-humidifier to avoid breath condensation, and gas sensors that detect infrared light absorbance at wavelengths specific to carbon dioxide (CO 2 ) and CH 4 .Sniffer data are stored on SD memory cards by a simple data logger and then transferred occasionally to larger databases.Although there is not yet a standard commercial sniffer on the market, a number of sniffers have been customized by different research groups.
Sniffers are easier to implement on an industrial scale than respiration chambers or GreenFeed systems because, in most cases, they can be integrated with already existing facilities (Garnsworthy et al., 2012a(Garnsworthy et al., , 2012b)).Commonly, a sniffer gas emission system (GES) can be mounted as an add-on within an automatic milking station (AMS) such that gasses within the feed bin of each equipped AMS are estimated continuously during milking.However, generally, AMS-mounted sniffers are not integrated with AMS software or databases.Hence, individual cows are not identified in an automated direct manner, leaving the need for staff to correlate time-stamped GES data with animal-related data.Furthermore, frequently, facilities do not have the software or internet connectivity needed to synchronize time clocks between AMS records and sniffer data records during automated data acquisition (Difford et al. 2016).Thus, the advantages of AMS-mounted sniffers notwithstanding, there remains a need to associate sniffer GES time series data correctly with animal identity time series data.Because AMS units are neither fitted with an interface that provides a simple connection to their data that would allow for direct data transfer to external equipment nor designed to receive data from outside sources, the two datasets must be synchronized retroactively.
Because of the constantly growing demand for large-scale CH 4 emission measurements for genetic analysis, the amount of sniffer installations is expected to increase in the coming years.Hence, the need for an efficient solution to the synchronization issue is expected to become an increasingly important limitation.To the best of our knowledge, there has been only one published study reporting a clock alignment method for a sniffer (Difford et al. 2016).Unfortunately, that method is not well-suited for the continuous online automation that would be needed for industry-setting installations because it requires human operators to evaluate intermediate results and make subjective clock-alignment decisions, limiting the size of datasets for which it can be employed.
The matched filtering is a well-known technique commonly used in signal processing (Haykin and Moher 2007).Therefore, a number of studies were published over the years where the use of this technique for the data synchronization problems was reported, mainly for wireless and digital communication problems.Puska and Saarnisaari (2007) presented the method of time and frequency synchronization for orthogonal frequency division multiplexing (OFDM) systems.The same systems were studied in (Saadat et al. 2015), where the authors proposed a matched filter-based timing and frequency synchronization method for multiple input multiple output OFDM systems with application for estimating the timing location at multiple receiver antennas.In (Awan and Koch, 2010) a low complexity multi-rate synchronizer that makes use of a polyphase filter bank to simultaneously perform matched-filtering and arbitrary interpolation for symbol timing synchronization in a sampled-data receiver was described and studied.The synchronization and matched filtering of signals in time dispersive, frequency dispersive and time-frequency dispersive channels were addressed in (Korevaar et al. 2012).
Unfortunately, such reported results and proposed methods cannot be directly used in the case of gas emission data without thorough adaptation of the matched filtering technique to the specific case of biologically determined time series data.Hence, a rigorous justification, substantiation and accurate validation of this technique is required.This, to the best of our knowledge, has not been studied yet.
The objective of this study was to develop a robust and computationally efficient method for automatized online data synchronization during sniffer-mediated gas emission measurements from dairy cows.First, we suggest and justify a novel approach to the GES data/AMS data synchronization problem based on matched filter theory.Next, we verify this approach with emission data records from commercial dairy farms in Denmark.Finally, the suggested approach is analyzed and discussed.

Experimental setup and automated data acquisition
CH 4 and CO 2 emission data records were collated from commercial dairy farms across Denmark.Cows whose data were used in this study belonged to typical commercial dairy cattle herds using AMSs in Denmark.
Gas concentrations were measured by infrared analyzers (Guardian NG/Gascard, Edinburgh Instruments Ltd, Livingston, UK).Gas samples were drawn in from each feeding bin via a hose installed within each AMS (DeLaval International AB, Tumba, Sweden; and Lely Holding B.V., Maassluis, The Netherlands).The cows had free access to AMSs with a 4h minimum milking cycle, except during two daily automated cleaning cycles.Internet-linked AMSs were run with stand-alone GES data loggers (1 record/sec).Data were downloaded as compressed files from either an SD card or via internet cable connection.GES data were logged with Novus Field Logger software (NOVUS Automation, Canoas, Brazil.).The hardware and software installations for emission measurements, the emission data acquisition technique, and the sniffer experimental setups were the same across the farms, with the latter following a previously published plan (Difford et al. 2016).A general overview of our data acquisition system, the types of data collected, the data pipelines from source to outputs used for synchronization is shown in Fig. 1.

Measured data
Two distinct data pipelines were established for each milking site.An AMS pipeline collected cow identification data, milking data, and the milking unit's gate states, which signify the onset and ending of milking sessions (Fig. 1a).The acquired data were processed by an AMS data server (Fig. 1b) that mediated aligned collection of time series data.The AMS servers were internet-connected and ran a clock reset standard to their associated servers' operating systems at midnight every night.Each GES pipeline collected CH 4 and CO 2 concentrations estimates in volume percent units.Similar to the AMS pipeline, the GES pipeline had its own data server and, hence, produces aligned gas emission data.
When a GES server is not subjected to a clock reset, its clock may drift over time and clocks may drift differently across GES pipelines.Merging AMS and GES pipelines required synchronization to correct for skewing due to differences in clock setups within each data server.To account for time skewing and enable synchronization, our approach was designed to exploit characteristic data (defined below) for each pipeline.

n}, where t i is the time the measured value v i was registered according to a server's clock and n is a number of records.
The characteristic data for the AMS pipeline were each milking unit's gate-state records.Particular combinations of gate states were represented in a binary fashion, namely whether there is a cow in the unit or not (Fig. 1b,top).These binary data {0, 1} were recorded constantly, with 0 corresponding to the gate-open state (no cow being milked and thus no gas emissions) and 1 corresponds to the gate-closed state (cow being milked and thus measurements being taken).
The characteristic data for the GES pipeline were CO 2 records (Fig. 1b, bottom), which were preferable to CH 4 records owing to the higher discharge frequency of CO 2 and thus more sensible temporal changes, especially with the gate state changing often and rapidly.The CO 2 records included interval values [c min , c max ], where c min is the minimal registered gas concentration normally recorded when there is no cow in the milking unit (gate open) and c max is the upper limit of gas concentration, which always occurred during milking (gate closed).We used GES data from two commercial dairy farms in Denmark for case studies.At one location, data were collected from 162 cows over 10 d; on average, 13 AMS visits (milking cycles) per cow were registered.At the second location, data were collected from 114 cows over 30 d; on average, 46 AMS visits (milking cycles) per cow were registered.In both locations, GES records were collected at 1-sec intervals.

Matched filter (MF) approach applied to timing skew estimation
In this section, we derive the method for the gas emission data synchronization.The method is formulated in terms of the timing skew estimation problem.Associated data sets and their related timing skews are provided in Definition 2. The model that coupled datasets is proposed in Proposition 1. Finally, using the introduced data model, we formulate and further elaborate the synchronization problem as a signal filtering task.The signal filtering is considered in the context of MF theory.

Definition 2. The characteristic datasets C
their values were sampled during the same AMS visit.Timing skew ρ is defined as ρ = t c1 − t g 1 , where t ci and t g i are the times that the measured values c i and g i , respectively, were sampled; N and K are the number of samples; T c and T g are the durations of sampling.
Note, each AMS visit represents a cow's presence in a milking unit (Fig. 1a) where it is milked by a milking robot and all measurements occur.Thus, it is implied that the (time) length of datasets C and G are equal, T c = T g .The associated datasets are distinct representations of the same AMS visit (due to the Definition 2).Hence, the synchronization problem can be reduced to a problem of estimating a timing skew for the associated data.
The following proposition suggests a data model which relates two characteristic datasets C and G.
where w(t) is a random noise in the range [ − 1, 1] with the covariance matrix R = E{ww tr } where E is an expectation and w tr indicates the transposition of w.
According to the model Eq.1 (see also the Appendix A), the only difference between the two associated signals is due to the random noise, w(t).Hence, these signals should possess the same waveform if w(t)→0.In practice, while g(t) always appears as an undisturbed binary signal (Fig. 2, the top left plot), the gas concentrations c(t) is noisy and can be approximated as g(t) + w(t), that means, the gate signal g(t) is immersed in the noise w(t) (Fig. 2, the top right plot).Finally, the scaling requirement (in the proposition 1) guaranty the Eq.1, see the Fig. 2 (the bottom left plot) where both scaled signals shown together.The plots depicted in the Fig. 2 represents the simulated data with normally distributed random noise.
We begin by supposing our aim is to synchronize the data record C(T) with G(T).This means, for a given specific AMS record g(t) ∈ G(T) with the starting time t g1 , sampling duration T g , and the time t < T, we are looking for an (unknown) associated gas emission record c(t) in C(T) characterized by the starting time t c1 = t g1 +ρ (according to Definition 2) and the duration T c = T g .The Proposition 1 substantiates the general approach for this task.It is to search (filter) the whole signal C(T) until the presence of a pattern characteristic of g(t) is detected at some sampling time t * = t c1 .Then, the associated gas emission record is c(t We assume that c(t) can take the following two possible forms, namely a form consisting of solely of noise or a form that consists of both noise and the signal of interest: To determine which of these two eventualities is true c(t) is filtered so that if g(t) is present, then filtered output at some time t = δ will be significantly greater than when there is no g(t) signal.
Assuming the noise covariance matrix R is known and the filter, characterized by the impulse response h(t), is linear and time-invariant, the MF will be based on maximization of the output signal-to-noise ratio (SNR) (Turin 1960, Robey et al. 1992, Jiang et al., 2012, Bazdresch 2018, Ren et al. 2022): where h H indicates the conjugate transpose of h.It follows from Eq.2 that the only type of linear filter that maximizes SNR is the following MF (Turin 1960, Haykin and Moher 2007, Bazdresch 2018, Zubeldia, et al. 2021): where k is some constant.If we assume R = σ 2 I and k = ̅̅̅̅̅ , where σ 2 is variance, I is the identity matrix, and g H is the conjugate transpose of the signal g(t), then. where √ .The MF output, which is a random variable for determining whether a signal is present, is defined by the integral (Haykin and Moher 2007).
where τ is a vector of time offsets and T is time (the upper limit of integration).
To obtain the detection time δ and the related timing skew ρ let.
Here t c1 is the time of the very first record in c(t).Here we assume l g(t) > l c(t) , where l g(t) and l c(t) are lengths of the related signals; σ y(t) is the standatd deviation of y(τ).In addition, the overall algorithm for timing skew estimation is summarized in Appendix B.

Data scaling
As stated in Proposition 1, our MF approach requires scaling transformation performed on raw (original) data (see section 2.3 and the figure therein) such as.

s(t) = M⋅S
(5) where s(t) = {c(t), g(t)} is a resultant (scaled) data vector; M is a k × k transformation matrix; S = {C, G} is a k × 1 raw data vector and k is the number of records in a dataset with equal time intervals No prior assumption has been made regarding the distribution of w(t) beyond the covariance matrix R.However, when the assumption for R is satisfied, a data-specific distribution of w(t) can be disregarded and the simplest MF is obtained if w(t) is normally distributed (Turin 1960) (for example, see simulated data in Fig. 2).Therefore, the matrix M is expressed here in a form that allows a Gaussian approximation of the V. Milkevych et al.
noise vector w(t) ∼ N(0, σ w I): where μ up and μ dw are the scaling constants for data above and below the p th percentile respectively; I up and I dw are diagonal k × k identity matrices everywhere there are zeros, except for indices that correspond to data above and below p th percentile respectively; S * is k × k matrix; I is identity matrix; 2 is the k × 1 vector where all values are 2; S − 1 is the 1 × k vector defined as , where S = {C, G} is a raw data vector.

Data transformation
An overview of the empirical working data modified by the transformation M (according to Eq. ( 5)) is depicted in Figs. 3 and 4. A small portion of the data is shown in Fig. 3 for demonstration purposes.The record g(t), which represents gate states, appears as a binary signal within the desired range g(t) ∈ { − 1, 1} for all t ∈ [0,∞).The record c(t), as expected, exceeds the range c(t) ∈ [− 1, 1] due to the presence of the noise w(t) in accordance with the data model in Eq. ( 1) where the noise is defined as w(t) = c(t) − g(t).The resulting signal w(t) is depicted in the bottom plot in Fig. 3.
As can be seen from the w(t) subplot in Fig. 3, noise is not homogeneous in real-world data and thus is not Gaussian as occurs in Fig. 3. Overview of rescaled working data.The plots represent a small portion of the working data after being subjected to the transformation defined by Eq. ( 5).

Fig. 4. Distribution of the noise signal.
The plots represent the distribution of the noise in real-world working data, where the noise, w(t), is calculated as w(t) = c(t) − g(t).The left plot shows a noise distribution plot without a Gaussian approximation.The right plot shows a noise distribution with a Gaussian approximation (Eq.( 6)) after undergoing transformation defined by Eq. ( 5).The distributions reflect to the full dataset from which small portions are shown in Fig. 3.The red curves represent the Gaussian fits for each dataset.(For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)simulated data (e.g., see Fig. 2).The signal differs between the two alternating major periods, namely no-milking periods where g(t) = 0 and milking periods where g(t) = 1 (see top subplot and corresponding w(t) subplot in Fig. 3).
Because there is no direct animal-sourced gas emission during nomilking periods, there is only background noise and the distribution of w(t) is determined primarily by random factors and, hence, assumed to be the same for all no-milking periods.Conversely, during milking periods, random factors of ambient noise interact with physiology-based factors of gas emission through eructation and breath.As a result, during milking, a more complex w(t) distribution is expected that is distinct from the ambient noise distribution.Hence, the obvious result of such noise distribution inhomogeneity is its bimodality which is clearly shown in Fig. 4 at the left, compare the histogram plot with the Gaussian fit depicted by the red curve.
The MF approach requires prior transformation of all raw data according to Eq. ( 5), which produces c(t) and g(t) signals for each data case.These signals are then subjected to further filtering in accordance with Eq. ( 4).This transformation serves not only to rescale the data but also to enable a Gaussian approximation of the noise signal that allows for a simple MF form (Eq. ( 3)).
The Gaussian approximation of w(t) obtained with Eq. ( 6) is shown in Fig. 4. Note that the effects of Gaussian approximation (changes applied to the original data) are evident in the right plot of Fig. 4 compared to the plot of the pre-transformation data in the left plot of Fig. 4. Specifically, the applied transformation reduces noise inhomogeneity, while preserving evident bimodality with a distribution that is not strictly normal.The transformed dataset depicted in Fig. 3, despite being based from a specific example (a record from one farm), is representative across the examined farms (examples of noise distributions for other records from different units can be found in Appendix C).

The estimated timing skews
For case-studies to test the introduced MF approach, we used empirically recorded GES data from two commercial dairy farms in Denmark.The data were acquired under different conditions (farm locations, time settings, etc.).All of the collected data possess general properties of transformed signals (Eq. ( 5) and ( 6), and the results described in section 3.1).Because data pipelines at distinct farms are subject to different conditions (instruments and server setups), different timing skews are expected across the farms.Otherwise, we assume that similar conditions should lead to similar timing skew values.
Several estimated timing skews are depicted in  The subplot series shown for each farm in Fig. 5 reflects the results of calculations based on one complete AMS data recording and multiple associated sets of GES recordings.These are time-distinct blocks of records, which are normally saved as separate files by corresponding data servers (Fig. 1).Hence, in terms of signals, the data appeared as: g(T), c 1 (T 1 ), c 2 (T 2 ), .., c n (T n ), where T ≥ ∑ n j=1 T j ; T and T j are the durations of the records.Each particular subplot in Fig. 5 corresponds to the specific block of records c j (T j ).Note that the AMS and GES records are different between the distinct farms (hence, the pairs of associated sequences g(T), c j ( T j ) ).In general, the calculations reveal that large-scale sniffer implementation is characterized by timing skew values that vary by a few minutes.Our case-studies indicate that ρ = 3.3 min (linear fit intercept in Fig. 5a subplots) and ρ = 60.3 min (linear fit intercept in Fig. 5b subplots), wherein 60 min could be due to omitted seasonal time changes for GES.These are typical values for sniffer implementations in our study.The regularity of ρ values across farms (and thus sniffer implementations) points to common factors that may underlie the appearance of timing skews.These factors can be determined relatively easily (e.g.daylight savings, operating system clock configuration, etc.) and handled in ongoing sniffer technique improvements.
Besides the means of ρ, the calculations reveal two other important characteristics: the time drift δ ρ , characterized by the slope of a linear fit (red lines in Fig. 5); and the variance of the timing skew, σ ρ 2 .Based on our empirical observations, we can deduce that δ ρ is intrinsic to the sniffer technique and with an upper bound of δ ρ that is approximately equal to δ ρ ∼ ̅̅̅̅̅̅̅ σ ρ 2 √ .A specific cause of a particular time drift is rather difficult to identify.Variation of voltage levels in a data pipeline supply network may explain the existence of δ ρ to some degree.The timing skew variance for the case-study cases were found to be σ ρ 2 = 8⋅10 − 3 − 123⋅10 − 3 min (farm   4).The right plot shows the distribution of normalized MF-output values from the left plot with a Gaussian approximation of the data (red curve).This is a distribution of no-detection values of y(τ).Here, ŷ is the mean of y; f is the frequency.The result represented in both plots is a representative example of MF output during signal filtering.The data used for this example were chosen arbitrarily.(For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)A) and σ ρ 2 = 5⋅10 − 3 − 58⋅10 − 3 min (farm B).The shown ranges indicate values of σ ρ 2 for time-distinct blocks of records within each case-study farm.

Variation of the estimated timing skew
While it appears that the time drift is likely an intrinsic characteristic of the sniffer technique, a substantial portion of timing skew variation can be explained by other factors that are not directly related to the measurement method.These sources of ρ variation are: (i) the data record's sampling interval Δ T (see Definition 1); (ii) the length l c(t) of the signal c j (T j ) used for ρ estimation (section 3.2); and (iii) the type of distribution (and its complexity) of the noise signal w j (T j ).As can be seen in the results depicted in Fig. 6 (left), on average, the variance and sampling interval are related as σ ρ 2 ∼ Δ T 2 .This relationship implies that timing skew calculation accuracy can be moderated by changing the sampling interval for initial data records.
More complex timing-skew variance relationships can be obtained by considering other factors of ρ variation.A graphical representation of the contribution of l c(t) to the variation of ρ is shown in Fig. 6 (right) where it is depicted in terms of the normalized standard deviation of ρ.
The calculations reveal that increasing the length of the signal linearly increases ρ variation.This expected trend is likely due to a time drift of ρ.
The qualitative estimate of the upper bound of the variance can be The relationship shown in Eq. ( 7) indicates that timing skew variation can be reduced substantially by decreasing the length of the signal c(t).Unfortunately, it is not possible to give an exact upper bound for σ ρ 2 because the contribution of the w j (T j ) distribution to σ ρ 2 cannot be quantified easily within the current theoretical framework and would require a dedicated theoretical development.However, we believe that contribution is rather negligible and due mostly to the level of quantitative discrepancy between the generated noise distribution and the Gaussian distribution (left vs. right plots in Fig. 4 and in Figs.C.1 and C.2).We suppose σ ρ 2 decreases when the w j (T j ) distribution approaches a Gaussian form.

MF output
MF output is calculated with Eq. ( 4), where y(τ) is a random vector that determines whether a searching signal c(t) is present in g(t).Signal detection is indicated by a significantly elevated value of y(τ) compared to its mean ŷ or a maximum element within the set θ(y).
The MF output mean is represented by the term ŷ.In Fig. 7 (left), there is considerable peak (global maximum) at the time point δ (08 : 00, Mar15) where y/ŷ > 80.This peak is an explicit indication that the signal c(t) is present in g(t) because the mean of y/ŷ is 0 and the majority of y/ŷ values are between − 20 and + 20 (Fig. 7, right).
The calculations performed within the current study across distinct farms suggest that c(t) length affects signal detection magnitude θ max = |θ(y) |.Qualitatively, θ max increases when l c(t) increases.This dependency is due to the SNR (Eq.( 2)), which (within the developed MF approach) yields higher values for a lengthier signal.Stated differently, short signals (in our study, typically l c(t) < 60 min) are too noisy to allow clear detection.Poor detection in short signals may be due to the lack of a unique signal, that is, there being more than one signal of a specific waveform.The dependency of the normalized magnitude of MF output on searching signal length can be seen in Fig. 8 (left), where it is expressed as a well-established linear relationship.
In addition to an analysis of the role of signal length on detection magnitudes, we consider the sampling time Δ T .Unlike l c(t) , we find that Δ T does not affect θ max substantially, though θ max decreases slightly when Δ T increases (Fig. 8, right), likely owing to the way the signal c(t) preserves its uniqueness (a specific waveform distinct from other signals).The condition of uniqueness of c(t) is satisfied until some threshold value Δ T lim after which the signal c(t) loses information due to infrequent sampling.Consequently, the probability that there are two signals c i (t) and c j (t) with similar waveforms increases.Although it is rather difficult to give an exact estimate of Δ T lim within the scope of the current study, our observations suggest that Δ T lim < 3 sec.

Computational performance
Overall, the computational efficiency and performance of our MF approach should be considered in the context of a specific implementation where particular algorithms, compilers, and targeting hardware are taken into account.However, because optimization of implementation is beyond the scope of this study, we focus on general performance-related characteristics that should be preserved across implementations of the proposed approach.
Our analysis of σ ρ 2 suggests that there are two major factors that are important for determining the accuracy of the MF approach: sampling time and searching signal length.These controlled factors that can be regulated or predicted prior to the performance of any timing skew calculations.The mode of accuracy adjustment, determined by applying changes to Δ T and l c(t) , depends fundamentally on the computational capacity of the data server.While decreases of Δ T and l c(t) reduce σ ρ 2 , and thus improve accuracy (Eq.( 7)), the quantity of computations and the computation time increase proportionally for l c(t) and increase exponentially for Δ T .These relationships can be seen in Fig. 9, where the results of performance tests for a Matlab based implementation of the MF approach provided data for an arbitrarily chosen farm.
The performance test results suggest firstly that it is best to keep Δ T ≤ 5 sec because changes in t * elp beyond this threshold are insignificant, unlike changes in σ ρ 2 that increase quadratically.Secondly, the performance test results suggest that, if the Δ T ≤ 5 sec condition is adhered to, then, considering the results in section 3.4, the secure range for the signal length is 60 ≤ l c(t) ≤ 250 min.

Implementation notes
It was observed that bovine gas emission concentrations show a complex pattern determined by multiple factors.While some of these factors are microclimate related, others are determined by an animal's physiology.In the context of the sniffer approach, unpredicted movements of a cow's head during milking create an additional disturbance to GES recordings and thus result in additional noise.The advantage of the presently developed approach is that noise, regardless of its source, is a part of the model.In other words, such noise is accounted for in the data model used in this study.Consequently, data synchronization does not require that noise be source-partitioned.This advantage is particularly important for post-processing of synchronized data wherein true emissions should be separated from background noise, though such applications are beyond the scope of our study.
Practical implementation of the approach presented here requires consideration of the timing skew variation issue (section 3.3) to avoid compromising data synchronization quality.Errors related to timing skewness variation can be well handled via two steps without altering data alignment.Firstly, records can be split into shorter segments that enable each signal to be aligned separately, which reduces time drift effects and prevents error accumulation.Secondly, the first and last few seconds of the records from each milking event should be removed in data post-processing.
Finally, we believe that the approach presented here is sufficiently general to be able to handle a wider range of time-series synchronization problems beyond sniffer GES data, particularly applications requiring a high level of automation in data pipelines.

Conclusion
This study addressed the problem of synchronization of sniffer GES data from dairy cattle.A novel approach for handling the synchronization problem was developed, verified, and analyzed.This examined approach is based on the well-established and robust MF theory, which does not require additional theoretical development for the data synchronization problem.The specific application of the theory to GES data was verified in this study with real-world GES data records from two commercial dairy farms in Denmark.The accuracy and computational performance in this application were analyzed and discussed.Based on the presently reported results, we conclude: (i) the presently developed and examined MF approach is robust and applicable to the problem of dairy cattle GES data synchronization and amenable to automated fast data processing; (ii) the approach is general enough to be used across farms and herds and may further be applied to other systems with continuous data recording, such as concentrate feed stations, activity loggers, and weigh scales.Noise is calculated as w(t) = c(t) − g(t).Pre-and post-transformed (Eq.( 5)) noise distributions do not (left) and do (right) follow a Gaussian approximation, shown by the red curves (Eq.( 6)).The distributions correspond to the full data from which the plots in Fig. 3 5)) noise distributions do not (left) and do (right) follow a Gaussian approximation, shown by the red curves (Eq.( 6)).The distributions correspond to the full data from which the plots in Fig. 3 were derived.(For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

Fig. 1 .
Fig. 1.Flow diagram of experimental automated data acquisition system.a) Milking unit where all measurements occur.b) Data acquisition and processing.AMS, automated milking system; GES, gas emission system; G.E., gas emission.Arrows indicate directions of data flows.

Fig. 2 .
Fig. 2. Data representation.The plots depict simulated data over a 10-min period.The upper plots show untreated data to be subjected to scaling.Post-scaling data are shown in the bottom left plot, wherein the red trace indicates the transformed gate signal, g(t), and the blue trace indicate the transformed gas emission signal, c(t).The bottom right plot shows the expected noise distribution where the noise was generated according to the model in Eq. (1); the green curve is a Gaussian approximation of the generated w(t).(For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

Proposition 1 .
Let the characteristic data D = {C, G} be subject to scaling such as for the gate states g(t) ∈ G the range is { − 1, 1}, where − 1 indicates an open gate, 1 indicate a closed gate; and for the gas concentrations c(t) ∈ C the range is [ − 1, 1].Then, any two associated datasets (represented by g(t) and c(t) signals) are related as.
Fig. 5.The two distinct farms (A and B) have different mean timing skew values, ρ , and are associated with distinct skew-influencing factors.In addition to each of the analyzed farms and instruments possessing a distinct ρ , the farms, farm AMS records, and farm GES records were chosen arbitrary.

Fig. 5 .
Fig. 5.Estimated timing skews.a) Data sampled at farm A. b) Data sampled at farm B. Each subplot in the top-to-bottom series within a and b corresponds to a particular GES recording.These are time-distinct blocks of records, which are normally saved as separate files on the associated data server.Blue dots are estimated values of the timing skew ρ; the red line in each subplot represents a linear fit of ρ and the dashed blue lines represent the associated 95% prediction interval; t indicates time passed from the first record within the specific data block.The standard deviations σ ρ of ρ for the respective GES recording data blocks (from top to bottom) are 0.35, 0.25, 0.15, 0.13, and 0.09 min for farm A while those for farm B are 0.15, 0.24, 0.17, 0.07 min.(For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

Fig. 6 .
Fig. 6.Analysis of estimated timing skew variation.The left plot represents the relationship between the standard deviation of the timing skew σ ρ and the data record time interval Δ T The right plot shows the relation between σ ρ and the signal length l c(t) used to estimate ρ.

Fig. 7 .
Fig. 7. Example of MF output.The left plot shows a typical normalized output of MF calculated according to Eq. (4).The right plot shows the distribution of normalized MF-output values from the left plot with a Gaussian approximation of the data (red curve).This is a distribution of no-detection values of y(τ).Here, ŷ is the mean of y; f is the frequency.The result represented in both plots is a representative example of MF output during signal filtering.The data used for this example were chosen arbitrarily.(For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

Fig. 8 .
Fig. 8. Magnitude of MF output.The left plot shows the dependency of normalized MF output on the length l c(t) of the searching signal c(t); σ y is the standard deviation of MF output.The right plot depicts the dependency of the normalized MF output on the time interval Δ T of data records.

Fig. 9 .
Fig. 9. Computational performance.Matlab-based performance test results obtained for implementation of the MF approach with data from a single arbitrarily chosen farm.The subplots in the top row and the first three subplots in the bottom row show normalized elapsed times t * elp (of timing skew estimation) calculated for different Δ T , l c(t) , and lg(t) lc(t) ratios.The last two parameters are expressed here as a combined measure l 2 c(t) lg(t) ; t * elp = telp•lg(t) lc(t) , where t elp is the elapsed time, Δ T is the sampling interval in sec, and l c(t) and l g(t) are the lengths of the signals c(t) and g(t), respectively.The last subplot in the bottom row depicts the relationship between normalized elapsed time and the sampling interval; here, the blue circles are t * elp calculated for the sampled range of l 2 c(t) lg(t) and the red curve is an exponential fit; these are the results that are depicted in the other subplots of this figure.(For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.) σ

Fig. C2 .
Fig. C2.Noise signal distribution, case 2. The plots represent the distribution of the noise w(t) of an empirical working dataset distinct from that shown in Fig. C.1.Noise is calculated as w(t) = c(t) − g(t).Pre-and post-transformed (Eq.(5)) noise distributions do not (left) and do (right) follow a Gaussian approximation, shown by the red curves (Eq.(6)).The distributions correspond to the full data from which the plots in Fig.3were derived.(For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.) Fig. C2.Noise signal distribution, case 2. The plots represent the distribution of the noise w(t) of an empirical working dataset distinct from that shown in Fig. C.1.Noise is calculated as w(t) = c(t) − g(t).Pre-and post-transformed (Eq.(5)) noise distributions do not (left) and do (right) follow a Gaussian approximation, shown by the red curves (Eq.(6)).The distributions correspond to the full data from which the plots in Fig.3were derived.(For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)