How Do Background and Remote User Representations Affect Social Telepresence in Remote Collaboration?: A Study with Portal Display, a Head Pose-Responsive Video Teleconferencing System
Round 1
Reviewer 1 Report
Dear Author,
I congratulate you on the bold initiative of choosing such a subject and on your intention to publish it. To reach the desired publication quality standards and requirements, a number of issues need to be addressed:
1. your research must follow the specific methodological procedures of academic writing; for this reason, the introduction must address the research question
2. second, within the literature review, the main issues/variables of your study must be carefully described in the light of previous research and addressed within the hypotheses proposed, with specific clear argumentation
3. please specify the source of figures and tables
4. within the study procedures and results, you might want to follow a specific pattern : materials and methods, analysis strategy, methods setting and sample followed by results
As an example you might use
Sustainability | Free Full-Text | Assessing Teleworkforce and Electronic Leadership Favorable for an Online Workforce Sustainability Framework by Using PLS SEM (mdpi.com)
Remote Sensing | Free Full-Text | From Land Cover to Land Use: A Methodology to Assess Land Use from Remote Sensing Data (mdpi.com)
5. moreover, after describing your results, a table with the confirmed/rejected hypotheses might be useful for the external reader. Please provide serious argumentation from the previous literature for each of your hypotheses
6. within the results, a similar action must be followed: when specifying the ranges of your data and results, please provide a sound literature fundament in this regard
7. which software did you use when reporting results? (section 3.3)
8. you need to argue within the literature review about your scale variables
9. you cannot have a string of tables (please see table 1, 2 ) without any text in between
10. within the methodology it would be advisable to describe every variable that you have considered for developing your study
11. your literature review therefore, needs to be vastly improved. A Resources of 32 titles is very inappropriate, no matter the novelty of the subject proposed
12. a research design figure would also be useful
Best regards,
minor English editing errors
Author Response
Dear Reviewer 1,
Thank you for your insightful comments and feedback on our manuscript. We highly appreciate the time and effort you invested in reviewing our work.
Review #1: your research must follow the specific methodological procedures of academic writing; for this reason, the introduction must address the research question
In response to R1's feedback, we have revisited our manuscript and recognized that the specific research questions we aim to address were not made explicit, although there were parts that hinted at them. To provide clearer guidance to our audience, we have now articulated three distinct research questions in our revised manuscript. These questions delineate the specific aims and scope our research seeks to address.
Review #2: second, within the literature review, the main issues/variables of your study must be carefully described in the light of previous research and addressed within the hypotheses proposed, with specific clear argumentation
We recognized that our initial manuscript did not sufficiently elucidate the rationale behind our hypothesis selection. To address this, we have reframed our hypotheses in response to the research questions we set forth. To further clarify our reasoning and the selection of variables and hypotheses, we have referenced additional pertinent literature.
Specifically, we referred to the work of Yu et al. [23], which highlights the potential of point cloud representations in influencing aspects such as presence, behavior impression, and humanness within VR contexts. While the study by Kauff et al. [20] offers an established perspective on how point clouds, when paired with a virtual background, can enhance telepresence, our research seeks to build upon and extend this foundational knowledge in the context of contemporary technological developments.
Based on these, we formulated a series of hypotheses (H1a-H3) that address the central variables and concerns of our study. To confirm these hypotheses, we detailed our experimental methodology, emphasizing the use and user experience of the Portal Display, which includes elements such as system usability, social telepresence, and the focus directed toward the remote user.
Review #3: please specify the source of figures and tables
We recognize the importance of properly citing figures and tables and ensuring the appropriate assignment of copyrights. We wish to confirm that all figures and tables presented in the manuscript were created by the authors. Additionally, we have meticulously reviewed the main text to ensure that each figure and table is explicitly referenced.
Review #4: within the study procedures and results, you might want to follow a specific pattern : materials and methods, analysis strategy, methods setting and sample followed by results
In our initial submission, we adhered to a more free-form format typical of computer science-related research. However, we acknowledge that electronics journals may have specific expectations regarding section formatting. Responding to R1's feedback, we have restructured our "Study Procedures and Results" section, dividing it into separate "Study Procedures" and "Results" sections. In the newly delineated "Study Procedures" section, we have incorporated subsections such as "Task Design", "Analysis Strategy", and "Protocol (Method settings)", as thoughtfully suggested by R1.
Review #5: moreover, after describing your results, a table with the confirmed/rejected hypotheses might be useful for the external reader. Please provide serious argumentation from the previous literature for each of your hypotheses
We acknowledge R1's feedback regarding the clarity of our initial manuscript, particularly in conveying the rationale behind our hypothesis selection and their explicit confirmation or rejection. To address this, we have referred to previous literature, such as Yu et al. [23] and Onishi et al. [30], among others. Furthermore, we have introduced Table 7, which provides a comprehensive overview of the outcomes of our user experiment, detailing the confirmation or rejection of each hypothesis.
Review #6: within the results, a similar action must be followed: when specifying the ranges of your data and results, please provide a sound literature fundament in this regard
We've newly defined our data range in the "4.3 Protocol" subsection, which details the System Usability Scale (SUS) questionnaire's structure and reliability. This subsection sheds light on scale ranges, their implications, and established usability benchmarks. In Section 6.2, our results heavily reference pertinent literature, underscoring the advantages of point cloud-centric representations in interactions, with support from studies by Nowak et al. [40], Yu et al. [23], and Kang et al. [42]. While potential challenges of point cloud streaming, as pointed out by Zhang et al. [43], exist, our data indicates its superiority over graphical methods in telepresence platforms. With these adjustments, we've tried to address the feedback regarding the alignment of our research with existing literature.
Review #7: which software did you use when reporting results? (section 3.3)
In response to R1's suggestion, we recognize the importance of specifying the software used to facilitate replication by future researchers. For all our statistical procedures, we employed the JASP software. This has now been emphasized within the main text of our manuscript.
Review #8: you need to argue within the literature review about your scale variables
We appreciate the emphasis on detailing the rationale behind our choice or formulation of questionnaire scales used for measuring our dependent variables. To address this, we have now delineated "4.3 Protocol" as a separate subsection, where we introduce the specific questionnaires utilized and articulate the reasoning behind their selection. Within this subsection, we offer a comprehensive overview of the System Usability Scale (SUS) questionnaire, emphasizing both its structure and proven reliability. This includes clear presentations of the scale ranges, their underlying significance, and established benchmarks for usability. While we employed the questionnaire by Nakanishi et al. [13, 32] to gauge social telepresence perception, we also integrated a concentration questionnaire. This addition aims to holistically assess the efficacy of remote collaboration, especially during instructed assembly tasks, by considering usability, presence, and actual effectiveness as indicated by user concentration.
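As an illustration of the scale range discussed in this response, the generic SUS scoring rule can be sketched as follows. This is a minimal sketch, assuming the standard ten-item, five-point form of the questionnaire; it is not code from the manuscript, and the function name `sus_score` is hypothetical.

```python
def sus_score(responses):
    """Return the SUS score (0 to 100) for one participant.

    `responses` holds the ten item ratings in questionnaire order,
    each an integer from 1 (strongly disagree) to 5 (strongly agree).
    Assumption: the standard SUS form with alternating item polarity.
    """
    if len(responses) != 10:
        raise ValueError("SUS requires exactly 10 item responses")
    if any(r not in (1, 2, 3, 4, 5) for r in responses):
        raise ValueError("each response must be an integer from 1 to 5")

    total = 0
    for item_number, rating in enumerate(responses, start=1):
        if item_number % 2 == 1:
            # Odd-numbered items are positively worded: contribution is rating - 1.
            total += rating - 1
        else:
            # Even-numbered items are negatively worded: contribution is 5 - rating.
            total += 5 - rating

    # Each item contributes 0 to 4, so the sum is 0 to 40; scale it to 0 to 100.
    return total * 2.5
```

A participant who answers neutrally (3) on every item obtains a score of 50, which sits below the commonly cited benchmark of about 68 for average usability.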
Review #9: you cannot have a string of tables (please see table 1, 2 ) without any text in between
In our initial presentation, our goal was to offer a holistic understanding by presenting tables consecutively. However, in light of R1's feedback, we recognize the potential benefit of interspersing explanatory text. Accordingly, we have incorporated descriptive text between the tables (lines 241-247, 272-281, 294-302) to provide immediate context following each table, thereby assisting readers in understanding our results and the associated post-hoc tests.
Review #10: within the methodology it would be advisable to describe every variable that you have considered for developing your study
In the revised manuscript, we explicitly stated the three target variables within all three research questions and associated hypotheses (a, b, c) before presenting the methodology section. Additionally, in the study procedure section detailing the methodology, we have detailed the questionnaires used, complete with a full list of items, in the newly established "4.3 Protocol" subsection.
Review #11: your literature review therefore, needs to be vastly improved. A Resources of 32 titles is very inappropriate, no matter the novelty of the subject proposed
In response to R1’s feedback, we have revised our manuscript to strengthen its connection to established knowledge in the field. We have increased our list of references from 38 to 48. We made a conscious effort to ensure that these additional references are both directly relevant to our topic and contribute to a more comprehensive understanding of the subject matter. Additionally, we have revised our literature review section to provide a more cohesive synthesis of these references, ensuring they support and give context to our research findings. Furthermore, to maintain the integrity and focus of our paper, we have also removed some references that might not be as pertinent to our core topic.
Review #12: a research design figure would also be useful
In an effort to provide readers with a clear overview of our study's structure and methodology at the outset, we have introduced Figure 1 to our manuscript, following R1's suggestion. This figure comprehensively depicts our research design, encompassing the Portal Display's technical design, the experimental process, and the subsequent analysis.
Review #13: minor English editing errors
Thank you for your feedback on the language clarity. We have had our manuscript reviewed by professional editors to correct English errors and have attached the certificate for verification.
Author Response File: Author Response.pdf
Reviewer 2 Report
This paper is well written and organized. However, there are some critical issues that should be addressed.
1. In the introduction, the authors fail to address the key theoretical contributions. This must be linked to the selling points for readers.
2. Research questions are necessary to hit your logic of the research insight.
3. In the section of discussion, there are no theoretical and practical implications. To support your results, recent works should be cited.
Author Response
Dear Reviewer 2,
Thank you for your insightful comments and feedback on our manuscript. We highly appreciate the time and effort you invested in reviewing our work.
Review #1: In the introduction, the authors fail to address the key theoretical contributions. This must be linked to the selling points for readers.
We thank R2 for highlighting the need to clearly articulate the key theoretical contributions in the introduction.
In lines 58-66 of our revised manuscript, we have made significant enhancements to underscore the distinctive contributions of our research. Specifically, we have introduced the "Portal Display", a unique screen-based video conferencing system equipped with a depth camera. The system is designed to provide users with a sense of spatial depth during video interactions. Unlike many prior systems that require multiple depth cameras, the Portal Display achieves this experience using just a single depth camera, offering both economic and spatial benefits.
Moreover, our study not only introduces a system but also examines the user experience across different representation methods. This comparative approach provides insights that can contribute to better understanding user preferences in teleconferencing setups.
We trust that these modifications offer a clearer and more modest depiction of the value and scope of our research.
Review #2: Research questions are necessary to hit your logic of the research insight.
In response to R2's insightful feedback regarding the articulation of research questions, we have refined our introduction to clearly present our Research Questions (RQ1-RQ3). We recognize that delineating these questions is pivotal in providing readers with a clear understanding of our study's scope, position, and logical progression. Consequently, within the introduction, we have now outlined the following primary research objectives:
- RQ1: How does the difference in remote user representation (point cloud streaming vs. graphical rendering) in Portal Display influence overall system usability, social telepresence, and concentration toward the remote user?
- RQ2: How does the difference in representation of the remote user’s background (point cloud streaming vs. graphical rendering) impact overall system usability, social telepresence, and concentration toward the remote user?
- RQ3: Is there an interaction effect between the methods of remote user representation and the background representation (point cloud streaming vs. graphical rendering) that influences the overall Portal Display system usability, social telepresence, and concentration toward the remote user?
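To make the structure of the three research questions concrete, the main effects and the interaction in a 2 x 2 design can be expressed as contrasts over the four cell means. The following is a minimal descriptive sketch only; it is not the statistical analysis reported in the manuscript, and the function name and the dictionary keys are hypothetical.

```python
def two_by_two_effects(cell_means):
    """Compute descriptive contrasts for a 2 x 2 design.

    `cell_means` maps (user_representation, background_representation)
    to the mean score of that condition. Both factors take the levels
    "point_cloud" and "graphical" (hypothetical labels).
    """
    pp = cell_means[("point_cloud", "point_cloud")]
    pg = cell_means[("point_cloud", "graphical")]
    gp = cell_means[("graphical", "point_cloud")]
    gg = cell_means[("graphical", "graphical")]

    return {
        # RQ1: point cloud user minus graphical user, averaged over backgrounds.
        "user_main_effect": ((pp + pg) - (gp + gg)) / 2,
        # RQ2: point cloud background minus graphical background, averaged over users.
        "background_main_effect": ((pp + gp) - (pg + gg)) / 2,
        # RQ3: difference of differences; zero means the two factors act additively.
        "interaction": (pp - gp) - (pg - gg),
    }
```

These contrasts only describe the pattern of means; whether an effect is statistically significant still requires an inferential test such as a two-way ANOVA.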
Review #3: In the section of discussion, there are no theoretical and practical implications. To support your results, recent works should be cited.
In light of your feedback, we have incorporated additional recent studies into our discussion to strengthen the context of our findings.
We incorporated insights from Zinchenko et al. [41], emphasizing the innate human preference for anthropomorphic representations during interactions. This addition further supports the heightened sense of social telepresence observed in our study with point cloud-centric representations.
Further, we brought in findings from Yu et al. [23], which underlined the superior social telepresence evoked by point cloud representations in comparison to graphical avatars. Their study corroborated our observations, lending additional weight to our arguments.
To address the significance of realistic facial detail, we referenced Kang et al.'s [42] work on the role of facial detail in augmented reality contexts. Their emphasis on the importance of realistic facial representations echoed our study's sentiments and reinforced our discussions on the enhanced sense of social telepresence with point clouds.
Lastly, drawing from Lee et al. [45], we discussed the non-detracting influence of virtual backgrounds on the overall usability of teleconferencing systems, resonating with our observations on background representations in telepresence.
We hope these changes adequately address your concerns.
Reviewer 3 Report
This manuscript focuses on evaluating the impact of graphic-rendered and video-streamed backgrounds and remote user representations on social telepresence, usability, and concentration during conversations and collaborative tasks for Portal Display. Overall, the motivation of the manuscript is solid and the technical contents have clear contributions to the relevant research branch. To improve the paper, the following points are recommended for modification or rewriting.
1. The manuscript addresses stereo disparity by transforming the 3D graphic environment and projecting it onto a flat display, as shown in Figures 1-3. However, the principle has not been clearly explained, which may cause some trouble for readers.
2. The inverse kinematics used to transform the end-effector joint positions (facial landmarks and upper body) into avatar motions is presented in Figure 7. How does it work?
3. The topic of the study involves not only technology but also psychology. However, the relation between the technology and psychology was not clearly illustrated.
4. Some terms or symbols such as SD, M, and F in Section 3.3 were not defined or referenced.
5. The issue to address in the manuscript was proposed in the “introduction” section, “However, many video conferencing applications fall short in conveying a genuine sense of interconnectivity, particularly in terms of non-verbal cues. Another issue is that the streaming videos are often not responsive to the diverse viewpoints and environments of the current and remote users. As a result, the precise orientation of gestures, gazes, and other spatial details such as the location, size, and direction of objects in the surroundings may be compromised”. However, whether it has been solved successfully was not clearly presented in the section 4 or 5.
Author Response
Dear Reviewer 3,
Thank you for your insightful comments and feedback on our manuscript. We highly appreciate the time and effort you invested in reviewing our work.
Review #1: The manuscript addresses stereo disparity by transforming the 3D graphic environment and projecting it onto a flat display, as shown in Figures 1-3. However, the principle has not been clearly explained, which may cause some trouble for readers.
In response to R3's feedback, we recognize the need to enhance our figure representation for clearer comprehension, especially regarding the principles of stereo disparity, given our system introduces a novel spatial projection approach. We've updated Figure 3 to include images directly from the Unity environment, which illustrates the "Step-by-step composite linear transformation process within the 3D engine space." By showcasing this transformation within the familiar context of Unity, we aim to provide readers with a more intuitive understanding of the principles under discussion.
Review #2: The inverse kinematics used to transform the end-effector joint positions (facial landmarks and upper body) into avatar motions is presented in Figure 7. How does it work?
Thank you for your valuable feedback and for pointing out the need for a clearer explanation regarding the utilization of inverse kinematics in our system.
In lines 151-159 of our revised manuscript, we detailed how the graphical rendering representation we adopted uses both facial landmark detection algorithms and Leap Motion tracking to capture the kinematics of a user’s upper body. Specifically, we gathered data on the head’s position and rotation using the FaceTrackNoIR API, which relies on webcam input, and we obtained the hand positions and rotations using the Leap Motion SDK.
This information was integrated using the Final IK asset for the Unity engine. Using this model, the positions and rotations of the aforementioned joints, notably the head and both wrists, were subjected to inverse kinematics. This procedure transformed the raw data into motions within the feasible range of human upper-body movement. As a result, our system's avatars can emulate the natural movements of real users, as shown in Figure 6(b), Figure 8, and Video S3.
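The general idea of inverse kinematics with a feasibility constraint can be illustrated with a textbook two-bone solver in a plane. This is a minimal sketch under simplifying assumptions (a 2D arm with the shoulder at the origin); it is not the Final IK implementation and does not reproduce the manuscript's pipeline, and all function and parameter names are hypothetical.

```python
import math


def two_bone_ik(target_x, target_y, upper_len, fore_len):
    """Return (shoulder_angle, elbow_angle) in radians that place the wrist
    at the target, or at the nearest reachable point if it is out of range.

    The shoulder sits at the origin. An elbow angle of 0 means a straight arm.
    """
    dist = math.hypot(target_x, target_y)

    # Clamp the target distance to what the two bones can physically reach.
    max_reach = upper_len + fore_len
    min_reach = abs(upper_len - fore_len)
    clamped = min(max(dist, min_reach), max_reach)
    if dist > 1e-9:
        scale = clamped / dist
        tx, ty = target_x * scale, target_y * scale
    else:
        # Target at the shoulder: pick an arbitrary direction along the x-axis.
        tx, ty = clamped, 0.0

    # Law of cosines gives the elbow bend for the clamped distance.
    cos_elbow = (clamped ** 2 - upper_len ** 2 - fore_len ** 2) / (
        2 * upper_len * fore_len
    )
    cos_elbow = max(-1.0, min(1.0, cos_elbow))  # guard against rounding error
    elbow = math.acos(cos_elbow)

    # Aim at the target, then correct for the offset caused by the bent elbow.
    shoulder = math.atan2(ty, tx) - math.atan2(
        fore_len * math.sin(elbow), upper_len + fore_len * math.cos(elbow)
    )
    return shoulder, elbow


def wrist_position(shoulder, elbow, upper_len, fore_len):
    """Forward kinematics: wrist position for the given joint angles."""
    elbow_x = upper_len * math.cos(shoulder)
    elbow_y = upper_len * math.sin(shoulder)
    wrist_x = elbow_x + fore_len * math.cos(shoulder + elbow)
    wrist_y = elbow_y + fore_len * math.sin(shoulder + elbow)
    return wrist_x, wrist_y
```

Feeding the solved angles back through `wrist_position` recovers the tracked wrist target whenever it is reachable; an unreachable target is pulled back onto the arm's reach limit, which is one simple way to keep a tracked pose within a plausible range of motion.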
Review #3: The topic of the study involves not only technology but also psychology. However, the relation between the technology and psychology was not clearly illustrated.
We appreciate your suggestion and have made the following revisions to address your concerns:
In Section 6.2, we added references to the work of Zinchenko et al. [41] to emphasize the inherent human preference for anthropomorphic representations in virtual interactions. This psychological insight aligns with our findings and reinforces the heightened sense of social telepresence provided by point cloud-centric representations, as opposed to their graphical counterparts.
Specifically, we included: "According to the field of the psychology of perception, individuals inherently prefer anthropomorphic (human-like) representations during interactions, suggesting an innate preference for realism and lifelike interactivity (Zinchenko et al.) [41]."
Review #4: Some terms or symbols such as SD, M, and F in Section 3.3 were not defined or referenced.
Thank you for highlighting the lack of clarity surrounding certain terms and symbols.
To address this concern, we have defined these terms at their first mention in the text. For instance, in line 223, we have explicitly described the terms "Mean" and "Standard Deviation" alongside their respective symbols "M" and "SD". Similarly, in line 234, we clarified the term "F-statistic" with its symbol "F" and the term "significance level" with its symbol "p".
Review #5: The issue to address in the manuscript was proposed in the “introduction” section, “However, many video conferencing applications fall short in conveying a genuine sense of interconnectivity, particularly in terms of non-verbal cues. Another issue is that the streaming videos are often not responsive to the diverse viewpoints and environments of the current and remote users. As a result, the precise orientation of gestures, gazes, and other spatial details such as the location, size, and direction of objects in the surroundings may be compromised”. However, whether it has been solved successfully was not clearly presented in the section 4 or 5.
We appreciate your feedback highlighting the need for clearer connections between the challenges presented in the introduction and the solutions our system offers. To address this, we have refined Section 4.1 ("Task Design") to better elucidate how our designed tasks both evaluate the system's performance and showcase its capabilities in addressing the stated challenges. A particular focus was given to the representation of non-verbal cues and offering diverse viewpoints in video conferencing.
Lines 195-201 now elaborate on our block assembly task's design rationale, informed by the works of Onishi et al. [30] and Kim et al. [31]. This task has been carefully constructed to authentically capture non-verbal cues. We detail how guiding participants to utilize their eyes and fingers for block selection and assembly illustrates the Portal Display's proficiency in ensuring accurate spatial orientation, fostering enhanced interactivity, and delivering a genuine telepresence experience.
With these modifications, we aim to present a more coherent narrative that effectively bridges the introduction's outlined challenges with the solutions and findings demonstrated in our study.
Reviewer 4 Report
This research is fascinating. It uses experimental methods to verify four conditions (2*2) and studies social telepresence, usability, and concentration. The structure of this study is rigorous, the experimental steps are practical, and the analysis tools and methods are appropriate. Although the results obtained are predictable, the experimental spirit of this study is still affirmed. As for the parts that need to be strengthened, it is recommended that more literature be added to the literature discussion chapter of this article. In the analysis and conclusion part, the results of this study need some additional support from the literature.
Minor editing of English language required
Author Response
Dear Reviewer 4,
Thank you for your insightful comments and feedback on our manuscript. We highly appreciate the time and effort you invested in reviewing our work.
Review #1: As for the parts that need to be strengthened, it is recommended that more literature be added to the literature discussion chapter of this article.
We appreciate your suggestion on enriching the literature discussion chapter. In response, we have expanded our references, increasing the count from 38 to 48. We've carefully chosen these additions to align with our topic and further illuminate the subject matter for our readers. In the updated literature discussion, these sources have been synthesized to provide a richer context that supports and frames our study's findings. To maintain clarity and relevance, we've also removed a few references that were tangential to our core theme.
Review #2: In the analysis and conclusion part, the results of this study need some additional support from the literature.
In response to your feedback on the analysis and conclusion sections, we have enriched these parts with supporting references to provide a well-grounded context for our results.
We included insights from Zinchenko et al. [41], underscoring the inherent human inclination towards anthropomorphic interactions. This inclusion further bolsters the increased sense of social telepresence noted in our study when using point cloud-centric representations.
We also referenced the findings of Yu et al. [23], which highlighted the enhanced social telepresence that point cloud representations bring compared to traditional graphical avatars. Their study reinforces our observations, adding more depth to our conclusions.
Highlighting the importance of realistic facial portrayals, we incorporated Kang et al.'s [42] research on the impact of facial detail in augmented reality environments. Their focus on the pivotal role of genuine facial representations complements our study's conclusions and strengthens our analysis.
Lastly, drawing upon Lee et al. [45], we articulated the neutral influence of virtual backgrounds on the overall usability of teleconferencing platforms, which aligns with our insights on the role of background representations in enhancing telepresence.
Reviewer 5 Report
The study is rather interesting and can be useful for building video conferencing systems. However, it should be supplemented with some psychological analyses from the field of psychology of perception. This may be a wish for future research.
In general, English is good. Some minor editing would be useful.
Author Response
Dear Reviewer 5,
Thank you for your insightful comments and feedback on our manuscript. We highly appreciate the time and effort you invested in reviewing our work.
Review #1: However, it should be supplemented with some psychological analyses from the field of psychology of perception. This may be a wish for future research.
In response to R5's insights, we have recognized the potential link between our findings and established knowledge within the psychology of perception. To underscore this connection, we have incorporated references to Zinchenko et al. [41], highlighting the human inclination towards anthropomorphic representations in virtual interactions. Our results, which indicate a heightened sense of social telepresence with point cloud-centric representations compared to graphical alternatives, resonate with this psychological perspective. Specifically, we've added that the field of the psychology of perception suggests that individuals have an innate preference for anthropomorphic (human-like) representations during interactions, reflecting a deep-rooted inclination towards realism and lifelike interactivity.
Review #2: In general, English is good. Some minor editing would be useful.
Thank you for your feedback on the language clarity. We have had our manuscript reviewed by professional editors to correct English errors and have attached the certificate for verification.
Author Response File: Author Response.pdf
Round 2
Reviewer 1 Report
Dear authors,
I congratulate you on the final form of your manuscript.
Best of luck!