The extracted indicators were assigned to four principles inspired by the WQ® classification: comfort, behavior, feeding and health. Results are presented separately for cattle (both dairy and beef) and for small ruminants (sheep and goats), with a separate table for each principle. For cattle, the production type (dairy or beef) was also specified, whereas for small ruminants only the species (sheep or goats) was recorded: small ruminants at pasture are mostly regarded as dual-purpose animals, which makes it difficult to assign them to a specific production type.
3.1. Animal-Based Measures for Cattle on Extensive/Pasture-Based Systems
Table 1 displays seven ABMs concerning the comfort principle, reported in 25 papers describing studies carried out on all continents; the evaluations were mainly carried out on dairy cows and by direct assessment. Most authors evaluated cleanliness as a binary (yes–no) rating, while only two [26,27] preferred to score animals on a four- or five-point scale from clean to dirty. Hernandez et al. [31] were the only authors who evaluated animals in the milking parlor during milking; all the others did so at pasture. Animal position on pasture (lying, resting, sitting or standing) was frequently assessed. Direct assessments mainly considered the time spent resting on the ground [39] or standing still [40], while authors who used sensors, such as pedometers, mostly monitored the number and duration of lying bouts [36]. The use of sensors may reflect the difficulty of measuring these indicators on individual animals. Time spent lying can indicate welfare issues: for example, Thompson et al. [33] identified lying as an effective indicator of lameness in grazing systems, although the effect differed depending on both the severity of lameness and the type of lying surface. On the other hand, several authors [32,36] found that grazing and comfortable surfaces had a positive influence on lying movements and duration. Standing [36] and standing still with the head raised [45,46] were identified as potential warning signals of inadequate feed allocation. Rising movement [44] is of limited importance under pasture conditions, as it is designed to assess the adequacy of farm structures, even though longer rising times may be linked to foot injuries and locomotion issues, similar to what was found for lying movements and duration. However, unless recorded with sensors, such indicators are extremely time consuming to collect and may be prone to observer bias, which reduces their feasibility for welfare assessment on pasture. Sitting behavior [41] seems to be a rare finding on pasture and may reflect a prolonged response to poor availability of on-farm resources. It is thus not considered a relevant ABM, at least for animals grazing year-round.
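Sensor-based lying measures are usually reported as the number and total duration of lying bouts. The reduction from raw sensor output to these two numbers can be sketched as follows, assuming the device already classifies each fixed-length epoch as lying or not lying. The input format and the optional minimum-bout filter are assumptions made for illustration, not details taken from the cited studies.

```python
from typing import List, Tuple


def lying_bouts(lying: List[bool], epoch_min: float = 1.0,
                min_bout_min: float = 0.0) -> Tuple[int, float]:
    """Return (number of lying bouts, total lying minutes).

    lying: one True/False classification per epoch (True = lying).
    epoch_min: epoch length in minutes.
    min_bout_min: bouts shorter than this are discarded
        (a hypothetical filter for brief misclassifications).
    """
    bouts = []
    run = 0
    for is_lying in list(lying) + [False]:  # trailing False closes a final bout
        if is_lying:
            run += 1
        elif run:
            bouts.append(run * epoch_min)
            run = 0
    kept = [b for b in bouts if b >= min_bout_min]
    return len(kept), sum(kept)


# Hypothetical 10-min record at 1-min epochs: bouts of 2, 1 and 3 min.
series = [False, True, True, False, True, False, False, True, True, True]
print(lying_bouts(series))                  # (3, 6.0)
print(lying_bouts(series, min_bout_min=2))  # (2, 5.0)
```

The choice of a minimum bout length changes both outputs, so studies comparing lying behavior across pasture and indoor conditions would need to use the same filtering rule.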
The use of shade or shelter was assessed as the movement of animals to and from the water source or sun protection. Despite the great importance of shade at pasture for ensuring thermal comfort, only a few authors [24,39,48,49] considered this indicator, probably because the number of trees is usually regarded as a resource-based rather than an animal-based measure. Nonetheless, when access to shade was provided, cows spent less time at the water trough and lying down, and chose to perform behavioral activities, including grazing, in the shade, which emphasizes the benefits of silvo-pastoral systems for animal welfare.
Table 2 summarizes the ABMs related to the behavior principle that can be collected in extensive conditions, as found in 21 papers. From these papers, we identified 11 ABMs. The behavior principle covers a wide range of applications, including daily activities, social interactions, human–animal relationships and the assessment of emotional state. Most ABMs (68.85%) were recorded by direct assessment, followed by video-recording (22.95%, which also includes vocalizations collected by sound recording) and sensors (only 8.20% of cases). The use of sensors was limited to papers that investigated activities such as walking (e.g., [34,37,47]) and consisted of data loggers attached to the hind legs or neck of the animals. Pedometers are inexpensive and are already commonly used on many farms for heat detection or in automatic milking systems. Their use in extensive husbandry systems can provide information on the spatial behavior of cattle. More expensive sensors may, however, be useful for investigating behaviors other than walking: spatial proximity loggers collect data on associations between cows and provide information on social networks and affiliative behaviors [53]. Cost may limit the use of these sensors, but they can provide detailed information on relationships within the herd and on changes in herd behavior over the year.
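Proximity-logger data are typically turned into a social network in which cows are nodes and logged contacts are weighted edges. A minimal sketch, assuming contacts are exported as pairs of animal IDs (a hypothetical format; real loggers also record contact duration and signal strength, which are ignored here):

```python
from collections import defaultdict
from typing import Dict, FrozenSet, Iterable, Tuple


def association_network(
    contacts: Iterable[Tuple[str, str]], min_contacts: int = 1
) -> Tuple[Dict[FrozenSet[str], int], Dict[str, int]]:
    """Build a weighted, undirected association network from proximity contacts.

    contacts: one (cow_a, cow_b) pair per logged proximity event.
    min_contacts: hypothetical threshold below which a pair is ignored.
    Returns (edge weights keyed by cow pair, degree = number of partners per cow).
    """
    weights: Dict[FrozenSet[str], int] = defaultdict(int)
    for a, b in contacts:
        if a != b:  # ignore self-contacts caused by logger artefacts
            weights[frozenset((a, b))] += 1
    edges = {pair: w for pair, w in weights.items() if w >= min_contacts}
    degree: Dict[str, int] = defaultdict(int)
    for pair in edges:
        for cow in pair:
            degree[cow] += 1
    return edges, dict(degree)


# Hypothetical contact log for four cows.
log = [("C01", "C02"), ("C02", "C01"), ("C01", "C03"), ("C03", "C04")]
edges, degree = association_network(log)
print(degree["C01"])  # 2
```

Edge weights capture how often two cows associate, while degree counts distinct partners; tracking both over the year is one way to follow the herd-level changes mentioned above.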
Most behaviors are collected by direct assessment. Direct assessment can be adopted both for behavioral observations and for indicators that require a test performed by humans, as in the evaluation of human–animal relationships using an avoidance distance test [29,30,50]. These authors did not report any feasibility constraint; however, according to Hernandez et al. [31], approaching animals in extensive systems may be difficult and sometimes not very informative, as cattle kept in large groups in extensive systems may avoid human touch even if they are not necessarily afraid of it. The feasibility of direct assessment for behavioral observations is often low, especially in extensive/pasture-based systems: observations are usually time consuming (e.g., up to 24 h/day in [41]), many assessors need to be trained (e.g., six observers were trained in [42]), and information on inter-observer reliability is not always sufficient ([32] tested the inter-observer reliability of three trained assessors before applying the welfare protocol). The method most frequently used to record behaviors is instantaneous scan sampling [38,41,42].
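In instantaneous scan sampling, the observer records the behavior of every visible animal at fixed intervals, and the proportion of records in which a behavior occurs estimates the share of time devoted to it. A minimal sketch, assuming each scan is stored as a mapping from a hypothetical animal ID to a behavior label:

```python
from collections import Counter
from typing import Dict, List


def time_budget(scans: List[Dict[str, str]]) -> Dict[str, float]:
    """Estimate the proportion of time spent in each behavior.

    scans: one dict per scan, mapping animal ID -> behavior observed
    at that instant (animals out of sight are simply absent).
    """
    counts: Counter = Counter()
    records = 0
    for scan in scans:
        counts.update(scan.values())
        records += len(scan)
    return {behavior: n / records for behavior, n in counts.items()}


# Hypothetical scans taken every 10 min on two cows.
scans = [
    {"cow1": "grazing", "cow2": "lying"},
    {"cow1": "grazing", "cow2": "grazing"},
]
budget = time_budget(scans)
print(budget["grazing"])  # 0.75
```

Multiplying each proportion by the observation period gives an estimate in minutes. The estimate is only as good as the scan interval allows: behaviors that occur in short bouts, such as drinking, can be missed when scans are far apart.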
Direct assessment was also used to assess animal emotions, and the only indicator identified for this purpose is Qualitative Behavior Assessment (QBA). Some authors [30,32] reported more positive emotional states in cattle at pasture than in animals kept indoors. Although QBA has received some criticism, mainly because of possible bias in judgment [54] or subjectivity [31], it is important to note that observers performing direct observations are unavoidably aware of the type of husbandry system they are assessing, and this awareness may affect their perception of both quantitative and qualitative indicators [54]. However, a study conducted on dairy goats kept in indoor and pasture-based systems reported that effective QBA training can help assessors overcome the influence of an environment perceived as more “welfare friendly” [55]. The feasibility of QBA in extensive systems is high, as observations last at most 20 minutes, followed by a few minutes in which the assessor scores the descriptors. Some situations may require binoculars to observe the animals at a distance without disturbing their activities. Video-recording for behavioral observations was mainly used to record social behaviors, such as cohesive and agonistic behaviors. The recording time, when reported, is relatively limited ([31] recorded the animals at pasture for only two hours) and is sometimes influenced by factors such as weather, temperature, routine changes and animal behavior. Although video-recording may increase the feasibility of an indicator, further research is needed to establish the right time for recording, including the best time of day to record a specific behavior and the necessary length of the recording.
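QBA descriptor scores are commonly summarized with principal component analysis (PCA), so that groups of animals can be compared on a few emotional dimensions rather than on each descriptor separately. This is a general feature of the method as we understand it, not a detail reported in the papers reviewed here. A minimal sketch using a centered singular value decomposition; the score range and descriptor set in the example are hypothetical:

```python
import numpy as np


def qba_pca(scores):
    """PCA of QBA descriptor scores via singular value decomposition.

    scores: array of shape (n_observations, n_descriptors), e.g. one row per
    group observed and one column per descriptor (relaxed, agitated, ...).
    Returns (component scores per observation, loadings with one row per
    component, proportion of variance explained per component).
    """
    X = np.asarray(scores, dtype=float)
    Xc = X - X.mean(axis=0)  # centre each descriptor
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained = s**2 / np.sum(s**2)
    return U * s, Vt, explained


# Hypothetical scores: 12 observations x 6 descriptors.
rng = np.random.default_rng(0)
demo = rng.uniform(0, 125, size=(12, 6))
pcs, loadings, explained = qba_pca(demo)
print(explained[:2])  # share of variance on the first two dimensions
```

This version works on the covariance structure; standardizing each descriptor first would give a correlation-based PCA instead. The sign of each component is arbitrary, so interpretation relies on which descriptors load together (e.g. relaxed vs agitated at opposite ends).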
Some papers included indicators already tested in indoor husbandry systems, and the authors stated that they had selected the indicators most feasible for extensive systems. However, indicators that are valid and feasible in indoor systems need to be retested, and sometimes adapted, before use in extensive systems. In most cases, insufficient information is provided about the selection criteria or other details that could support the use of a specific indicator in pasture-based systems.
Table 3 shows six ABMs concerning the feeding principle, drawn from 26 scientific papers investigating a link between these measures and animal welfare. The measurements were mainly carried out by direct assessment, and sensors were used in only a few cases. Sixty-nine percent of the measures concerned dairy cows and the remaining 31% concerned beef cows. Latin America is the geographic area where most of the experiments were carried out.
A measure widely used to evaluate the nutritional status of animals, particularly dairy cows, is the amount of stored body fat. The body condition score (BCS) method [61] estimates overall body fat through a visual (or, less frequently, tactile) evaluation of the subcutaneous fat in certain body regions (essentially the tail head cavity, pin bones, rump, short ribs and backbone). Unlike body weight, BCS is not affected by body size, gut fill or pregnancy status. The lowest BCS value indicates a very lean condition (linked to serious underfeeding and/or disease), while the highest value indicates a very fat condition (linked to overfeeding and a consequent risk of metabolic diseases). Monitoring the BCS of grazing dairy cows is extremely useful, as it allows the energy balance to be evaluated in the various phases of the lactation cycle. Long periods on pasture with low energy intake cause an energy deficit responsible for alterations in milk composition, milk yield and lactation persistency [62], and may also affect reproductive performance [63]. During the grazing period, it is not always easy to meet dairy cows’ nutritional requirements through grazing alone. BCS therefore helps the farmer determine whether feed supplements are needed to avoid hunger and nutritional imbalances.
In the selected papers, several scoring scales were used to assess BCS as a welfare indicator in grazing animals. For dairy cows, experiments conducted in Italy and Mexico used a 0–2 score, in line with the WQ assessment protocol for cattle [28,29,31,44,56], while in other countries and situations a 1–5 [27,33,35,57,58] or 1–10 [59] scale was used. Other authors [30] used a 1–9 scale for grazing beef cows. The review did not identify any experiments using 3D cameras to monitor the BCS of cattle in extensive conditions; given the importance of body condition assessment on pasture, such cameras may represent a promising, time-saving assessment option in the future [64].
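Because the selected studies used different scales, comparing or pooling their BCS results requires some form of harmonization. The simplest option is a linear rescaling of each ordinal score onto a common 0–1 range, sketched below. This is an illustrative assumption rather than a validated conversion: scale points are not necessarily equally spaced across scoring systems, and the WQ 0–2 score is left out because it is not obviously comparable with the finer scales.

```python
def rescale_bcs(score: float, scale_min: float, scale_max: float) -> float:
    """Linearly map a BCS value from [scale_min, scale_max] onto [0, 1].

    Illustrative only: assumes equally spaced scale points and equivalent
    anchors (very lean = 0, very fat = 1) across scoring systems.
    """
    if scale_max <= scale_min:
        raise ValueError("scale_max must exceed scale_min")
    if not scale_min <= score <= scale_max:
        raise ValueError(f"score {score} outside {scale_min}-{scale_max}")
    return (score - scale_min) / (scale_max - scale_min)


# Ordinal scales reported in the selected papers (WQ 0-2 score excluded).
SCALES = {"1-5": (1, 5), "1-9": (1, 9), "1-10": (1, 10)}

print(rescale_bcs(3, *SCALES["1-5"]))  # 0.5
print(rescale_bcs(5, *SCALES["1-9"]))  # 0.5
```

A mid-scale score of 3 on the 1–5 scale and 5 on the 1–9 scale both map to 0.5, but whether they describe the same body condition would need to be checked against paired scoring of the same animals.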
In extensive systems, particular attention must be paid to water provision. Authors evaluated water use with different methods: the time spent drinking [41,45,48], the percentage and number of animals moving to water sources [31,42], or access (free or limited) to the source [57]. Some authors assessed water consumption indirectly, through signs of dehydration in the animal [30] or by recording urination events [45]. Water provision and cow welfare are closely connected, and climate change might further compromise animal well-being, especially during the second phase of the grass vegetative stage or in geographical areas affected by drought. Lardner et al. [65] and Coimbra et al. [66] underline the link between drinking behavior and body size, dry matter intake, production stage, air and water temperature, and water quality or type of water access. Thus, if not contextualized, the estimated daily average intake per animal at the troughs provides limited information on water requirements. On the other hand, assessing signs of dehydration seems rather demanding in pasture-based and extensive systems, which limits the potential role of ABMs in assessing adequate water provision.
Evaluating the feeding behavior of grazing cattle, instead of or in addition to BCS, makes it possible to respond adequately to feed requirements in terms of animal welfare. Data on the feeding behavior of grazing cows allow the farmer to identify specific individual problems and act to restore the best conditions for animal welfare. In the past, these measurements were mainly carried out using visual methods (e.g., Tucker et al. [39] with instantaneous scan sampling), and many authors, including those identified in this review, still adopt these rather than analytical methods, which are more time consuming (e.g., Bovolenta and colleagues [25,67], who estimated herbage intake using the n-alkane method). Grazing and rumination are positively related to feeding time and dry matter intake. Following periods of high feed intake, cows spend more time ruminating, usually after a 4-h lag. In recent years, precision livestock farming tools [68], developed for indoor settings to optimize resource use and improve the productive and reproductive performance of animals, have also been proposed for the pasture environment [69], and could radically change the feasibility and effectiveness of animal welfare monitoring in extensive systems. Some of the selected papers [26,46,47,48,60] proposed electronic equipment (in particular behavior-monitoring collars, GPS devices and pedometers) for the continuous monitoring of feeding and locomotion behavior, which proved efficient and reliable.
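The delay between intake and rumination suggests one way of checking collar data: a lagged correlation between feeding and rumination time series. A minimal sketch, assuming both series are already aggregated per hour (the hourly resolution, the search window and the synthetic data are assumptions, not outputs of the cited devices):

```python
import numpy as np


def best_lag(feeding, rumination, max_lag):
    """Find the lag (in samples) at which rumination correlates best with earlier feeding.

    feeding, rumination: equally spaced series of equal length (e.g. minutes per hour).
    Returns (lag, Pearson correlation at that lag).
    """
    f = np.asarray(feeding, dtype=float)
    r = np.asarray(rumination, dtype=float)
    if f.shape != r.shape or max_lag >= len(f) - 2:
        raise ValueError("series must match in length and exceed max_lag + 2")
    best, best_corr = 0, -np.inf
    for lag in range(max_lag + 1):
        # pair feeding at hour t with rumination at hour t + lag
        corr = np.corrcoef(f[: len(f) - lag], r[lag:])[0, 1]
        if corr > best_corr:
            best, best_corr = lag, corr
    return best, best_corr


# Synthetic check: rumination copies feeding 4 h later.
rng = np.random.default_rng(1)
feed = rng.uniform(0, 60, 240)             # 10 days of hourly feeding minutes
rum = np.concatenate([np.zeros(4), feed[:-4]])
lag, corr = best_lag(feed, rum, max_lag=8)
print(lag)  # 4
```

Real grazing data carry a strong daily rhythm, so a correlation peak may partly reflect time of day rather than a true intake–rumination link; removing the average daily pattern before computing the lag would make the result easier to interpret.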
Table 4 displays 12 animal-based measures related to the health principle for large ruminants on pasture. Most indicators were measured by assessors through direct observation of dairy cattle. While some measures were well-established health indicators in indoor intensive systems and followed the WQ assessment methodology [74], others were specifically developed for grazing animals. Among the established indicators, hoof and leg injuries, as well as integument and body alterations, represent major welfare issues for housed cattle and are among the most important reasons for culling. In particular, an open shoulder is an indicator of reduced muscle tone, mostly found in pluriparous cows housed in permanent tie-stall systems, and may be of limited importance in year-round pasture-based systems. Pasture is also considered a protective factor against claw disorders and lameness [12,75], according to several studies that compared the occurrence of such conditions in indoor and pasture-based systems [28,30]. Nonetheless, claw disorders and lameness are also a significant welfare issue in pasture-based systems and should therefore be monitored constantly. Although no study identified through this systematic review reported the use of sensors, smart technologies could also play a role in the early detection of claw and locomotion disorders in grazing animals. Natural environments can also pose health risks and challenges for grazing animals. For example, diet composition cannot always be controlled in extensive systems, and improper forage intake may result in gastrointestinal disorders. Signs of diarrhea, softer feces and a bloated rumen were the indicators of gastrointestinal disorders assessed in dairy [44] and beef [30] cattle. Pasture access may also increase the risk of both endo- and ectoparasite infestation. While signs of endoparasite infestation may be assessed through body condition measurement or the observation of gastrointestinal disorders, ectoparasites were assessed through direct observation of parasites on hides or through the effects of infestation, such as skin lesions or ocular discharge [29,30]. Exposure to climate variability and extreme weather (e.g., heat waves) is a further challenge for grazing animals. Thermal stress was assessed by observing respiration patterns or by measuring temperature. Unless recorded with laser thermometers, as described by Morales and colleagues [30], body temperature measurement appeared unsuitable for beef cattle systems, in which opportunities to restrain animals are limited compared with dairy systems. In this regard, direct observation of respiration patterns and rates may be a better choice for all systems and production types, until new technologies allow body temperature to be monitored and recorded remotely, effectively combining the early detection of heat imbalances and disease.
3.2. Animal-Based Measures for Small Ruminants on Extensive/Pasture-Based Systems
For small ruminants, 20 ABMs were extracted from 14 studies carried out in Australia and the UK and, to a lesser extent, in Italy, France and Argentina (Tables 5–8). Most of the studies (86%) were carried out on sheep; only one focused exclusively on goats [55], and one dealt with both species [84]. This is probably due to the greater economic importance of sheep and to their management system, which is almost exclusively pasture-based, whereas goats are often raised in intensive or semi-intensive systems, especially in more developed countries. In most cases (71% of the articles), all indicators were collected by direct assessment, whereas sensors were used for data collection in 21% of the studies, and one study [80] adopted both approaches. Sensors based on omnidirectional accelerometers [80,81,83] were helpful for assessing activities related to the comfort, behavior and feeding principles, and their integration with GPS devices [81] provided additional detailed results on spatial behavior and movements (which could be associated with feeding behavior), even in very extensive settings, without disturbing the animals. This is much less time consuming than direct or video-recorded observation, whose on-farm feasibility can be considered quite low because of the long observation time required to detect irregularities in behavioral rhythm that may indicate health and welfare issues. However, McLennan et al. [80] suggest that the level of detail provided by accelerometer devices needs further improvement, as in their study high accuracy could only be obtained for gross behavior categories (low vs. medium/high activity level).
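The gross two-class output mentioned above can be illustrated with a simple dynamic-acceleration summary compared against a threshold, together with the accuracy check against observer labels. This is a sketch in the spirit of ODBA-type metrics, not the classification method of McLennan et al. [80]; the window format, units and threshold value are hypothetical.

```python
from typing import List, Sequence, Tuple

Sample = Tuple[float, float, float]  # x, y, z acceleration in g


def dynamic_acceleration(window: Sequence[Sample]) -> float:
    """Mean summed absolute deviation of each axis from its window mean.

    Subtracting the window mean is a crude way to remove the static
    (gravity) component, in the spirit of ODBA-type metrics.
    """
    n = len(window)
    means = [sum(s[axis] for s in window) / n for axis in range(3)]
    return sum(abs(s[axis] - means[axis]) for s in window for axis in range(3)) / n


def classify(window: Sequence[Sample], threshold: float) -> str:
    """Two-class output, mirroring the low vs medium/high split discussed above."""
    return "low" if dynamic_acceleration(window) < threshold else "medium/high"


def accuracy(predicted: List[str], observed: List[str]) -> float:
    """Share of windows where the sensor class matches the observer's class."""
    return sum(p == o for p, o in zip(predicted, observed)) / len(observed)


THRESHOLD = 0.1  # hypothetical; would need calibration against observations
still = [(0.0, 0.0, 1.0)] * 10                         # animal at rest
moving = [(0.5, 0.0, 1.0), (-0.5, 0.0, 1.0)] * 5      # repetitive movement
print(classify(still, THRESHOLD), classify(moving, THRESHOLD))  # low medium/high
```

Finer behaviors such as grazing versus walking would need more features per window (and usually a trained classifier), which is where the loss of accuracy reported in [80] arises.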
It should also be noted that both [80] and [81] present interesting methodological approaches for collecting behavioral data with sensors, and mention the importance of monitoring behavior as a good indicator of animal welfare, but they do not provide clear indications on how to interpret the results. Therefore, the validity of behaviors such as walking, grazing or searching for food as welfare indicators was not discussed in these studies. Within the behavior principle, the results of [83] on circadian rhythms of general activity, assessed with the Degree of Functional Coupling (DFC, which expresses the percentage of the measured behavior that is harmonically synchronized with environmental rhythms over a 24-h period), provide reliable information on sheep welfare: high DFC values indicate high synchronization, which is considered a positive indicator of animal welfare [89].
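Following the definition given above, DFC can be approximated from a spectral decomposition of an activity series: the share of spectral power lying at the 24-h period and its harmonics (12 h, 8 h, 6 h, ...) relative to total power. The sketch below is a simplified illustration under that assumption, using a plain FFT over a whole number of days; it is not necessarily the exact procedure used in [83] or [89], which may rely on different spectral methods and significance criteria. The sampling interval in the example is hypothetical.

```python
import numpy as np


def degree_of_functional_coupling(activity, samples_per_day):
    """Percent of spectral power at the 24-h period and its harmonics.

    activity: equally spaced activity counts covering whole days.
    samples_per_day: e.g. 144 for 10-min bins.
    """
    x = np.asarray(activity, dtype=float)
    n_days, remainder = divmod(len(x), samples_per_day)
    if n_days < 1 or remainder:
        raise ValueError("series must cover a whole number of days")
    power = np.abs(np.fft.rfft(x - x.mean())) ** 2
    k = np.arange(len(power))
    # bin k corresponds to k / n_days cycles per day; 24-h harmonics are
    # whole numbers of cycles per day, i.e. bins that are multiples of n_days
    harmonic = (k > 0) & (k % n_days == 0)
    total = power[1:].sum()
    return 100.0 * power[harmonic].sum() / total


t = np.arange(10 * 144)  # 10 days at 10-min resolution (hypothetical)
rhythmic = 5 + np.sin(2 * np.pi * t / 144)
print(round(degree_of_functional_coupling(rhythmic, 144), 1))  # 100.0
```

A perfectly entrained 24-h pattern gives a DFC close to 100%, whereas activity with no daily structure spreads its power over all frequencies and gives a low value, which matches the interpretation of high DFC as high synchronization.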
Another interesting measure related to the behavior principle was used by Munoz et al. [82] to investigate the quality of human–animal relationships: the ewes’ response (flight distance and behavioral reaction) to an unfamiliar human was evaluated in a small random sample of sheep in a holding pen. Performing the test in the pen may be feasible; however, its validity and reliability in this specific situation have not been investigated.
Regarding the feeding principle, another promising application of sensors is described by Gonzalez-Garcia et al. [88], who used a remote weighing prototype based on the walk-over-weighing concept, combined with radio-frequency identification, which allowed them to record sheep body weight in extensive conditions without restraining the animals. Body weight was assessed directly by McGregor et al. [84]: these authors could not confirm the importance of live weight as a welfare indicator, but highlighted the importance of BCS, which was significantly correlated with mortality rate in Angora goats. Although this is not described in detail in the paper, measuring both body weight and BCS probably required restraining individual animals and was therefore time consuming. The same time constraints apply to the body condition scoring carried out by other authors [76,77,82,85,86,87].
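Walk-over-weighing produces several noisy readings per animal, since a sheep may cross the platform more than once, move quickly, or share it with another animal. A minimal post-processing sketch, assuming readings arrive as (RFID tag, weight) pairs and using a hypothetical plausibility window and a per-animal median; this is not the processing used by the prototype in [88].

```python
from collections import defaultdict
from statistics import median
from typing import Dict, Iterable, Tuple


def daily_weights(readings: Iterable[Tuple[str, float]],
                  min_kg: float, max_kg: float) -> Dict[str, float]:
    """Return one robust weight per RFID tag from a day of platform passages.

    readings: (tag, weight_kg) pairs, one per passage.
    min_kg, max_kg: hypothetical plausibility window used to drop partial
        passages (too light) and two animals on the platform (too heavy).
    """
    by_tag = defaultdict(list)
    for tag, kg in readings:
        if min_kg <= kg <= max_kg:
            by_tag[tag].append(kg)
    # the median resists the occasional reading that slips through the window
    return {tag: median(ws) for tag, ws in by_tag.items()}


# Hypothetical passages for two ewes.
readings = [("982A", 45.0), ("982A", 12.0), ("982A", 47.0), ("982A", 46.0),
            ("982B", 60.0), ("982B", 130.0)]
print(daily_weights(readings, min_kg=20, max_kg=100))  # {'982A': 46.0, '982B': 60.0}
```

Following these daily values over time gives a weight trajectory per animal without handling, which is the main feasibility advantage over direct weighing in holding pens.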
Furthermore, for other ABMs, such as cleanliness [76,77,79,82] or health indicators (e.g., integument alterations, fleece condition or foot lesions [76,77,79,82]), the evaluation was carried out by assessors, and the animals had to be restrained in small holding pens to allow individual examination; for the evaluation of mastitis, restraining the animals in a crate was also required [82]. These operations were therefore time consuming and probably induced some stress in animals not used to being handled because of their extensive living conditions. In the case of Munoz et al. [79], it is worth noting that the individual animals to be inspected were selected using an appropriate sampling scheme based on a power calculation assuming a 50% prevalence of the trait under observation. Selecting an appropriate sampling scheme is very important, especially for large flocks (as sheep flocks often are) and when animals have to be gathered for inspection, a common situation in extensive farming systems. Angell et al. [76,77] also included the evaluation of lameness, which was scored by a trained assessor in a holding pen, while Munoz et al. [79,82] used a similar locomotion score but evaluated it as the sheep were released from the holding pen.
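Assuming a 50% prevalence is the conservative choice when sizing a sample, because p(1 − p) is largest at p = 0.5. One common way to compute such a sample size is the precision-based formula n = z²p(1 − p)/d², optionally reduced with a finite population correction for a known flock size. This is not necessarily the exact calculation used by Munoz et al. [79]; the precision of ±10 percentage points and the flock size below are hypothetical.

```python
import math
from typing import Optional


def prevalence_sample_size(precision: float, population: Optional[int] = None,
                           prevalence: float = 0.5, z: float = 1.96) -> int:
    """Animals to inspect to estimate a prevalence within +/- precision.

    prevalence=0.5 maximizes p*(1-p), giving the most conservative size.
    population: flock size for the finite population correction (optional).
    z: 1.96 for 95% confidence.
    """
    n0 = z**2 * prevalence * (1 - prevalence) / precision**2
    if population is not None:
        n0 = n0 / (1 + (n0 - 1) / population)
    return math.ceil(n0)


print(prevalence_sample_size(0.10))                  # 97 (very large flock)
print(prevalence_sample_size(0.10, population=500))  # 81 (hypothetical 500-ewe flock)
```

The finite population correction matters most for small flocks; for large extensive flocks the required sample approaches the uncorrected value, so only a fraction of the animals needs to be gathered and inspected.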
Phythian et al. [78] used a different approach to lameness evaluation in sheep that did not require gathering the animals: an assessor briefly observed the flock from a distance for five minutes and then counted the number of lame animals based on behavioral cues (e.g., head nodding, grazing on knees, uneven gait), rather than assigning a lameness score as in Angell et al. [76,77]. Phythian et al. [78] adopted the same practical approach for recording other ABMs: coughing, breech soiling, abdominal soiling, pruritus, wool loss and “dull physical demeanour”. Additionally, these authors applied a Qualitative Behavior Assessment, which required on average only 30 min/farm for flocks of up to 120 sheep, observed from a distance without entering the field. Interestingly, some QBA descriptors were correlated with other welfare measures (e.g., the proportions of lame sheep and of sheep with “dull physical demeanour” were correlated with descriptors such as distressed, dull and dejected), providing evidence of the concurrent validity of these measures. QBA was also applied to goats using a similarly feasible procedure, and highlighted interesting differences between the emotional state of goats on pasture and in indoor housing, with good inter-observer reliability [55].
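A group-level count such as the number of lame sheep seen during a five-minute observation yields a proportion, and reporting it with an interval makes comparisons between flocks or visits more cautious. The Wilson score interval sketched below is one standard choice for a binomial proportion; it is our illustration, not part of the protocol in [78], and the counts are hypothetical.

```python
import math
from typing import Tuple


def wilson_interval(lame: int, flock_size: int, z: float = 1.96) -> Tuple[float, float, float]:
    """Proportion lame with a Wilson score interval (default 95%).

    Returns (proportion, lower bound, upper bound).
    """
    if flock_size <= 0 or not 0 <= lame <= flock_size:
        raise ValueError("need 0 <= lame <= flock_size and flock_size > 0")
    p = lame / flock_size
    denom = 1 + z**2 / flock_size
    centre = (p + z**2 / (2 * flock_size)) / denom
    half = z * math.sqrt(p * (1 - p) / flock_size + z**2 / (4 * flock_size**2)) / denom
    return p, max(0.0, centre - half), min(1.0, centre + half)


# Hypothetical observation: 12 lame sheep counted in a flock of 120.
p, lo, hi = wilson_interval(12, 120)
print(f"{p:.2f} ({lo:.3f}-{hi:.3f})")
```

Unlike the simple normal approximation, the Wilson interval stays within 0–1 and behaves sensibly when no lame animals are seen, which is a common outcome in small flocks.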
Additional information about the reliability of ABMs for small ruminant welfare assessment is provided by Munoz et al. [79], who found poor agreement for rumen fill, foot-wall integrity and hoof overgrowth, and considered fleece cleanliness not meaningful for extensive systems. Based on these considerations, the authors suggest using body condition score, fleece condition (based on lumpiness or signs of ectoparasites), skin lesions, tail length, dag score and lameness for on-farm welfare assessment of extensively managed sheep, as all these measures are feasible and do not require any specialized equipment. Tail length was listed as an ABM [79,82], even though it may be considered a risk factor for several conditions such as rectal prolapse, flystrike and bacterial arthritis. Furthermore, Munoz et al. [79] consider that most of these measures (e.g., thin body condition, lameness and dag score) can be recorded visually from a distance while viewing sheep in their paddock, rather than in holding pens, with minimal interference with farm work. This suggestion is supported by the successful collection of similar measures by Phythian et al. [78], as reported above. Finally, Munoz et al. [79] suggest that the lactation period may not be the best time to carry out the evaluation because of the presence of lambs.
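Agreement between assessors, such as the poor agreement reported above for rumen fill or hoof overgrowth, is often quantified with Cohen's kappa, which corrects the raw proportion of agreement for the agreement expected by chance. The statistic actually used in [79] is not stated here, so the sketch below is a generic, unweighted kappa for two assessors scoring the same animals; the scores are hypothetical.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(rater_a: Sequence[Hashable], rater_b: Sequence[Hashable]) -> float:
    """Unweighted Cohen's kappa for two assessors scoring the same animals."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two non-empty score lists of equal length")
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    # chance agreement from each assessor's marginal score frequencies
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / n**2
    if expected == 1:
        return 1.0  # convention: both assessors used one identical category
    return (observed - expected) / (1 - expected)


# Hypothetical lameness scores (0 = sound, 1 = lame) from two assessors.
print(cohens_kappa([1, 1, 0, 0], [1, 1, 0, 0]))  # 1.0
print(cohens_kappa([1, 0, 1, 0], [1, 1, 0, 0]))  # 0.0
```

For ordinal measures such as multi-level locomotion or dag scores, a weighted kappa, which gives partial credit to near-misses, is often a better fit than this unweighted version.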