1. Introduction
What is peer learning? A definition of peer assisted learning (PAL) which is still widely quoted is: “PAL is the acquisition of knowledge and skill through active helping and supporting among status equals or matched companions. PAL is people from similar social groupings, who are not professional teachers, helping each other to learn and by so doing, learning themselves” [1] (p. 1). PAL includes peer tutoring, peer modelling, peer education, peer counselling, peer monitoring, and peer assessment, both reciprocal and non-reciprocal, in schools and institutions of higher education, as well as in the workplace. This definition can also clearly include all forms of cooperative learning, in which students work together in small groups, sometimes with the specification of different roles. However, this special issue includes many other contributors discussing cooperative learning, so in this paper, the focus is largely on peer tutoring and (particularly) peer assessment.
Peer tutoring has been demonstrated to be effective over many years, with a great number of reviews, systematic analyses, and meta-analyses supporting this, even before 1996 [2] and continuing through 2019 [3]. Recent examples include a special issue encompassing 51 studies of peer tutoring in music education [4], a systematic review of 16 reviews and meta-analyses on peer tutoring with students with behavioural problems [5], and a systematic review focused purely on online PAL [6].
A parallel, though somewhat later, interest in peer assessment has also spawned a large number of reviews and meta-analyses, especially in recent years (e.g., [7,8,9,10,11,12,13,14,15,16,17,18,19]). However, none of these has specifically explored the differences between online and offline learning.
Additionally, there have been many recent reviews of online and blended learning, particularly discussing ‘emergency remote learning’ such as occurred during the pandemic, which was often completely online. As we emerge from the pandemic, the nature of future pedagogy is being more widely discussed. There are many reviews asserting that online learning is at least as effective as face-to-face learning (e.g., [20]). There are also systematic analyses revealing that blended learning (a mixture of face-to-face and online) is even more effective than purely online learning (e.g., [21]).
While there are many papers on online PAL, unfortunately, almost no reviews directly compare online learning, blended learning, and face-to-face learning at the same time and in the same context for peer tutoring/peer assessment. Even simultaneously comparing online with face-to-face learning proves difficult enough. This paper aims to address this gap by systematically reviewing papers that directly compare online learning with face-to-face learning in the context of peer tutoring or peer assessment, and it is informed by and updates the only previous review in this area. When preparation of this paper began, there were no such reviews, but one meta-analysis was later located through diligent searching. However, systematic analyses and meta-analyses are very much determined by the search terms used and the databases targeted for this purpose. Changes in either can substantially affect the outcomes, even though such searches typically encompass hundreds of potential papers.
The present paper thus updates the paper by Jongsma et al. in 2022 titled “Online versus Offline Peer Feedback in Higher Education: A Meta-Analysis” [22]. Peer feedback is a common element in many PAL projects. The search terms used in that paper were: ‘peer assessment’ OR ‘peer feedback’ OR ‘peer review’ OR ‘peer evaluation’ OR ‘peer rating’ OR ‘peer scoring’ OR ‘peer grading’ AND ‘learning outcome’ OR ‘learning achievement’ OR ‘achievement’ OR ‘outcome’ OR ‘learning performance’ OR ‘academic achievement’ OR ‘academic performance’. ‘Peer tutoring’ and other forms of PAL were not included, and the inclusion of outcome/achievement keywords seems likely to have limited the search.
Jongsma et al. borrowed five papers from another author and then selected only five papers of their own, so limited coverage is a possibility. From a parallel search (discussed below), I found thirteen other papers up to 2020 (when Jongsma et al.’s search ended) which seemed relevant. Additionally, I go beyond 2020 to consider four relevant papers which appeared in 2021–2023. I then offer some discussion of socio-emotional issues stemming from PAL, since such supplementary gains may parallel cognitive gains, and socio-emotional factors may partially determine the longevity of any cognitive effects (longer-term follow-up is also largely absent from the literature).
2. Methodology
This paper studies formal learning in higher education, in which peer interaction is organised for all students to enhance learning in the pursuit of higher academic achievement. As peer interaction often occurs outside the classroom, it can be somewhat difficult to monitor. This paper is not about informal learning, which is typically engaged in more by some students than others (and not at all by some) and is never monitored or directly assessed; it is therefore excluded here.
In this systematic review, two research questions were identified:
Which research studies on peer tutoring, assessment, and feedback directly compare the effectiveness of online and offline teaching and learning in the same study?
Is there evidence of effectiveness, and if so, what proportion of this research is solely dependent on student and teacher perceptions, and how much of it uses other indicators?
What search terms were employed in this attempt to parallel Jongsma’s systematic analysis? First, I tried: “peer learning” OR “peer assessment” OR “peer feedback” OR “peer review” OR “peer evaluation” OR “peer rating” OR “peer scoring” OR “peer grading” AND online AND offline OR face-to-face OR “face to face”. There were no date restrictions. Five databases were searched: Web of Science, Scopus, JSTOR, ERIC, and Google Scholar (these differed considerably from those used by Jongsma et al., whose meta-analysis did not appear in any of these searches). These keywords did not yield many hits. Seeking more hits, I tried the more general terms: “peer learning” AND “peer assessment” AND online versus offline. These fewer, broader keywords yielded more hits, so I then tried: “peer learning” AND offline versus online AND review OR analysis. Finally, I tried: “peer learning” OR “peer tutoring” OR “peer assessment”, which, of course, generated many hits and required considerable inspection of titles and abstracts.
The inclusion criteria were that the paper needed to: (1) directly compare online and offline learning, simultaneously and with the same course content, (2) include some kind of data (but not necessarily be experimental), and (3) relate to higher education (college or university). The four search strategies in five databases yielded a total of 724 hits (excluding replications). Google Scholar yielded the highest number and JSTOR the lowest. Obviously, the more generic search terms yielded many more hits, which required more time to inspect, but most of these putative hits proved irrelevant. Eventually, I selected only 13 papers from 2020 and before, and four papers from 2021–2023. Not all of these were experimental, and their quality varied considerably.
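As a purely illustrative aside (not part of the review’s actual procedure), removing replications across databases before titles and abstracts are screened amounts to keeping one record per title, as in the minimal sketch below; the record structure and field names are hypothetical.

```python
# Hypothetical sketch: de-duplicating search hits pooled from several databases
# before screening titles and abstracts. Field names are illustrative only.
def deduplicate(records):
    """Keep the first occurrence of each title, compared case-insensitively."""
    seen = set()
    unique = []
    for record in records:
        key = record["title"].strip().lower()
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique

hits = [
    {"title": "Online peer feedback in EFL writing", "database": "Scopus"},
    {"title": "Online Peer Feedback in EFL Writing", "database": "ERIC"},
    {"title": "Face-to-face peer tutoring in statistics", "database": "JSTOR"},
]
print(len(deduplicate(hits)))  # prints 2: the duplicated title is counted once
```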
Papers were reported in relation to whether they were experimental or not. This might be taken as implying that experimental papers are always of superior research design to non-experimental papers, but that is not the implication here. Rather, the results are divided in this way to enable easy comparison with the purely experimental results of Jongsma et al. (2022) [22]. A critique of the quality of each paper is provided in the results.
4. Results: The Jongsma Meta-Analysis of Online/Offline Peer Feedback
Meta-analysis is sometimes held up as the “gold standard” of evidence synthesis, but it does possess many problems, not least of which is the exclusion of an enormous proportion of the papers focusing on a phenomenon in the name of research design rigour, which sometimes leaves a ridiculously small number of papers as the focus of the analysis. Among the various critical reports, some authors, for example [36] (p. 2), comment that: “Most meta-analyses include too few randomised participants to obtain sufficient statistical power and allow reliable assessment of even large anticipated intervention effects. The credibility of statistically significant meta-analyses with too few participants is poor, and intervention effects are often spuriously overestimated (type I errors) or spuriously underestimated (type II errors). Meta-analyses have many false positive and false negative results.”
However, Jongsma et al. (2022) [22] are to be congratulated on publishing a meta-analysis which conforms to many of the official requirements for such an undertaking. Here, we will critique their work, but that is not to denigrate the enormous contribution they have made. These authors point out that online peer assessment offers the possibility of anonymity, can save teacher time, and more easily allows the teacher to monitor the peer feedback comments (assuming these are written). Disadvantages include the fact that dialogue can be difficult in an asynchronous context. In feedback dialogue, students have the opportunity to receive feedback on the feedback they have given, clarifying or negotiating the meaning of the received feedback.
Jongsma et al. (2022) [22] only studied university students. They drew five studies from Zheng et al. (2020) [16], and their own search resulted in the addition of five more, so the total number of studies was small. Their search terms have been critiqued earlier. They only searched two databases (following [16]), a small number given the variability between databases. Meta-analyses are often critiqued for combining studies that are fundamentally unalike, and this applied here as well: the sample group size ranged from n = 11 to n = 65; the subject domain was mainly the English language, but included graphic design and other studies which merely mentioned “teaching”; the assessed task was mainly a form of writing, but included graphic design elements and a reading test; in only one study was the peer assessment anonymous (despite the assertions noted above); training was mostly provided (although its intensity was unclear, and in two studies, even the access to training was unclear); the technology for the online element varied greatly, from online blogs, to Google Docs, to videos, to “learning environments”; and assessments mostly comprised grades for performance, most often in writing, rather than more elaborated feedback. The duration of the studies varied from less than 5 weeks, through 5–10 weeks, to more than 10 weeks.
On the positive side, the studies were assessed for risk of bias using the Cochrane Risk of Bias tool for randomized controlled trials (which few of the studies actually were). Only two of the ten studies found no effect, while the other eight all had at least one finding concluding that online learning was better. No study found that offline learning was better. The meta-analysis provided an overall effect size (Hedges’ g) of 0.33 (p < 0.05) for online peer feedback, which is somewhere between small and moderate.
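For readers less familiar with the statistic, Hedges’ g is a standardised mean difference with a small-sample correction. A standard formulation (given here only as background; it is not taken from Jongsma et al.’s own computation) is:

\[
g = J \times \frac{\bar{X}_1 - \bar{X}_2}{s_p}, \qquad
s_p = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}}, \qquad
J \approx 1 - \frac{3}{4(n_1 + n_2 - 2) - 1},
\]

where \(\bar{X}_1\) and \(\bar{X}_2\) are the group means, \(s_1\) and \(s_2\) the group standard deviations, and \(n_1\) and \(n_2\) the group sizes. By the usual conventions, values around 0.2 are considered small and around 0.5 moderate, hence the description of 0.33 as somewhere between small and moderate.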
6. Results: Socio-Emotional Outcomes
Five out of the ten studies included in the Jongsma et al. (2022) meta-analysis [22] also reported student perceptions of peer feedback, although one was very weak. In one study, researchers surveyed and interviewed the experimental students [41]. Results showed students felt positive about blogs, but were unsure of their confidence in giving and receiving peer feedback. However, they did not feel embarrassed when providing feedback. The use of blogs was easy and time-independent and contributed to the feeling of being a ‘real writer’. In another study, students were surveyed about online peer feedback [42]. They said online peer feedback reduced their writing anxiety and gave them more time to think about how to comment on their peers’ writing.
Students using Adaptive Comparative Judgment (a form of assessment using comparisons instead of criterion scoring) enjoyed the peer feedback process more, thought the process was easier to follow, and found the peer feedback more helpful than students in the control group using paper-based peer feedback [43]. In 2019, other researchers [44] concluded that students felt giving peer feedback was more helpful than receiving peer feedback, but without any difference between online and offline feedback. Online peer feedback was appreciated because its asynchronous nature gave more time before providing peers with feedback. However, these asynchronous discussions made direct exchanges between students difficult.
Turning to papers beyond the Jongsma et al. (2022) meta-analysis [22] which contained information about socio-emotional issues, in 2004, other researchers [24] used an asynchronous anonymous system and noted that more students were ready to ask questions and communicate in that context. However, some students gave only positive comments without offering solutions, and this was felt to be unhelpful by about half of the participants. Students also preferred a lengthy grading scale to enable them to be more discriminating in their feedback. In 2007, other researchers [28] found offline sessions better in that immediate dialogue and the ability to discuss issues in the native language were possible, but worse in that sessions were rushed and participants often did not have enough time. By contrast, online sessions offered more flexibility, resulted in longer peer feedback, and actually resulted in less sense of social pressure, but the time delay presented problems for some.
In another study [38], all the students were actively engaged in the process of sharing comments through both direct and online interactions, and students became more self-sufficient in their learning. Most of the students could identify mistakes in their friends’ drafts and make corrections related to those mistakes. Even through Facebook, students actively gave feedback and did it on time. However, training from the instructor was valuable in developing relevant skills. Asynchronous interactions allowed students more time to read and comment on their peers’ writing. The extra time gave the students the chance to read their peers’ writing in detail and offer more complete corrections. As a result, online student feedback was more informative than offline feedback. Additionally, some students felt that class was noisy and rushed for time and that the online activity enabled them to concentrate better; noisy and rushed classes could result in simpler and more basic feedback. Some students read more of the work of other students than they were required to, since they found it interesting and informative. However, online communication had its difficulties, not the least of which was the absence of non-verbal feedback. The recording of online comments enabled complete re-reading of the feedback as necessary, which was not possible offline.
In terms of the advantages of online review, in general, students seem positive about online peer feedback. In particular, its time-independence made it flexible and convenient. If the class was noisy and rushed for time, the online activity helped them to concentrate better. Online feedback allowed them to check resources and gave them more time to think about and phrase comments before providing feedback, thus resulting in clearer and more elaborated peer feedback. Online peer feedback could thus reduce anxiety surrounding giving peer feedback and result in less social pressure, and, especially in an asynchronous anonymous system, more students were ready to ask questions and communicate. Students became more self-sufficient in learning and were more likely to respond on time. Online recording enabled complete re-reading of feedback, which was not possible offline. Online methods also enabled the review of work from multiple peers.
In terms of the advantages of offline review, offline sessions were better in that immediate dialogue and the ability to discuss issues in the native language were possible, whereas the time delay online presented problems for some. Additionally, offline review included non-verbal feedback, which some students missed in the online mode. However, participating in a synchronous feedback dialogue could be difficult because of the imperative of thinking while talking. Some students still preferred feedback from a teacher, indicating that students did not always trust peer feedback.
7. Discussion
In terms of cognitive outcomes, in general, online peer tutoring and assessment are more effective than offline methods, although some studies found them only equally effective. The substantial contribution of the Jongsma et al. (2022) meta-analysis [22] was critiqued, as were an additional five experimental studies, before and after 2020. However, direct comparisons of online and offline methods were relatively rare. Nonetheless, perhaps this is of relatively little importance if the future trend is to be towards the more effective blended, rather than purely online, learning. It will, of course, remain important for those courses which operate only remotely, such as MOOCs.
In terms of socio-emotional outcomes, online review was generally popular (although it may not be popular at the outset). Its flexibility, convenience, and facilitation of concentration were greatly valued, especially in an asynchronous mode, and led to more elaborated peer feedback, while offline methods were more rushed. Thus, online feedback reduced anxiety and involved less social pressure, so more students were ready to ask questions and communicate. Students became more self-sufficient, recording enabled re-reading of feedback, and students were more likely to respond on time. Online methods also facilitated the review of work from multiple peers. However, offline sessions enabled immediate dialogue, non-verbal feedback, and the ability to discuss issues in the native language, although thinking while talking could be challenging. Some students still preferred feedback from a teacher because they did not trust peer feedback.
There are a number of limitations to the research synthesised here. First, some studies relied entirely on student and/or teacher perceptions. While such data are valuable, they are difficult to accept as the only form of evidence given their subjectivity, and one would wish to see them accompanied by other, more objective data; it would be helpful if future studies included objective outcome indicators as a triangulation on student or teacher perceptions. Further, I neither searched for nor found any studies on additional behavioural outcomes of peer learning, such as intent to graduate or class participation, as distinct from cognitive and socio-emotional outcomes.
It is also noteworthy that very few of the papers reviewed here mentioned long-term outcomes. Would students get better at peer feedback with more practice over time? Or would they become bored with it and want to revert to teacher assessment (which would require less of their energy and, at least in one way, be less stressful)? Additionally, one might ask if there was any spontaneous generalization to other subjects and courses. Do students become sufficiently engaged with peer tutoring and peer assessment to begin to do this informally in other courses, where neither is encouraged by the teacher? All of these are questions for future research, which should also seek to conduct a further systematic review and/or meta-analysis at some future date to see how the field has developed.
All of these issues should, of course, be taken up by authors offering guidance on how to conduct peer tutoring, peer assessment, and feedback. Recent examples from 2023 [45,46,47] are beneficial, although slightly older ones are also likely to be helpful (e.g., [48]).
The first study [45] specifically addresses peer assessment in online courses, having searched technology journals from 2010 onwards. Eight principles were proposed based on the literature reviewed: (1) provide training, (2) consider the impact of pair or group formation, (3) consider the pros and cons of anonymity, (4) combine peer grading and peer comments, (5) encourage assessors to address strengths and weaknesses and provide sufficient explanations, (6) use strategies such as scaffolding and monitoring to actively engage assessees, (7) encourage interactions between students, and (8) provide supportive structures. However, the author notes that almost all the papers studied focused on offline peer assessment, which raises questions about the extent to which the title can claim to be about online feedback.
An interesting instrument was offered in the second study [46], which reports the varying characteristics of peer assessment designs. A section on context requires details of: subject domain, time/place, setting, requirement, and alignment. A section on instructional design requires details of: purpose, object, product/output, relation to staff assessment, official weight, reward, directionality, degree of interactivity, frequency, group constellation, constellation assessor, constellation assessee, unit of assessment (assessor), unit of assessment (assessee), privacy, contact, matching, format, training, revision, and scope of involvement. A section on outcomes includes: beliefs and perceptions, emotions and motivation, performance, reliability, validity, feedback content, and feedback processing. A section on moderators/mediators includes: gender, age, ability, skills, and culture. Clearly, this contains much detail, although again, how much of this is specific to online work is a moot point. Nonetheless, any teacher completing this checklist would become sharply aware of what issues were in danger of being overlooked.
Other researchers offered something broader [47]: a scale for assessing students’ peer feedback literacy in writing. They noted that previous literature indicated that feedback literacy involved four elements: recognising the feedback’s value; responding to the feedback by revising; making judgments (not only of their own work, but also of the received feedback); and managing affect (dealing with feelings, emotions, and attitudes). They also noted that acceptance of feedback varied over time and with experience. In parallel, other researchers [49] found that willingness to accept feedback was lower among students who had experienced peer feedback only once, relative to both those students who had no prior experience and those who had more experience (although this study was limited to MOOCs).
The scale [47] had a particular focus on assessing gains made from giving feedback, as well as gains from receiving it. It was developed from the questionnaire responses of 474 Chinese undergraduates, equally balanced between the arts and sciences, recruited by convenience sampling from eight universities, so there may be issues of sample bias and cultural specificity. Thirty items were included, on a six-point Likert scale, coupled with five open-ended items. Four factors, based on 26 items, emerged, accounting for 62% of the total variance: feedback-related knowledge and abilities, cooperative learning ability, appreciation of peer feedback, and willingness to participate. Reliabilities varied between 0.80 and 0.89. A reduced scale of 20 items was then developed.
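The summary above does not specify which reliability coefficient was reported; assuming these are internal-consistency estimates such as Cronbach’s alpha (an assumption, not a detail taken from the study itself), that coefficient is defined for a k-item scale as:

\[
\alpha = \frac{k}{k-1}\left(1 - \frac{\sum_{i=1}^{k} \sigma_{Y_i}^{2}}{\sigma_{X}^{2}}\right),
\]

where \(\sigma_{Y_i}^{2}\) is the variance of item i and \(\sigma_{X}^{2}\) is the variance of the total scale score. Values of 0.80–0.89 are conventionally regarded as good internal consistency.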
Follow-up comparisons showed that for two factors (willingness to participate and cooperative learning ability), students with more feedback experience had significantly higher scores than students with less. For feedback knowledge and abilities, only those students with much more peer feedback experience had significantly higher means. For peer feedback appreciation, moderate experience yielded the highest scores. However, the items in the questionnaire were all phrased positively, i.e., there was no alternation of positive and negative statements, which may have led to “yea-saying” (a positive bias), so future users may wish to adapt the questionnaire.
While these practical instruments are undoubtedly of value, the extent to which any of them engage fully with all the issues found in this review of online vs. offline peer assessment effectiveness is an interesting question, and this is equally true of ref. [48].
8. Conclusions
We can answer the research questions thus:
Which research studies on peer tutoring, assessment, and feedback directly compare the effectiveness of online and offline teaching and learning in the same study?
In addition to the 10 studies in the Jongsma meta-analysis [22] of studies up to 2020, 17 additional studies were identified, 13 from 2020 or before and 4 from 2021–2023. Overall, there was some evidence that online learning was better than offline learning: studies found either that online was better or that online and offline were equal. None found that offline was better.
Is there evidence of effectiveness, and if so, what proportion of this research is solely dependent on student and teacher perceptions, and how much of it uses other indicators?
A good deal of the evidence used only subjective perceptions, with the remainder using more objective or triangulating measures.
There were also questions about whether the Jongsma meta-analysis [22] used entirely appropriate search terms and databases. Other relevant research was found, both before and after this meta-analysis was conducted.
Online peer assisted learning (including peer tutoring and peer assessment) is at least as effective as offline PAL in cognitive terms, and probably modestly more effective in many contexts. From a socio-emotional perspective, online PAL has more advantages than offline PAL, although both have some disadvantages. Students’ responses to PAL may initially be affected by cultural values and/or conservatism, but experience of the benefits of online methods should lead them to a preference for online work. However, given that blended learning seems more effective than purely online learning, it is likely that online PAL will, in most cases, be an element of blended course delivery in the future.