Atoms of Representation in Natural Language Processing
A special issue of Applied Sciences (ISSN 2076-3417). This special issue belongs to the section "Computing and Artificial Intelligence".
Deadline for manuscript submissions: 31 January 2025
Special Issue Information
Dear Colleagues,
As language models take center stage not only in NLP but in a vast array of scientific applications, the question of how best to map natural language in textual form into vector space is attracting growing interest. While most popular models still use subword tokens as their atomic units, “token-free” methods, including character-level and byte-level models and encodings of visually rendered text, have been making promising progress. Still, the development and analysis of tokenization and untokenization methods are advancing at a slower rate than research in model architecture and optimization technologies, mostly because representation is applied at such an early stage of processing, which makes the evaluation of new algorithms and techniques particularly challenging. Fundamental insights into the effect of representation atomicity on morphological modeling, on multilingual and crosslingual applications, on computational efficiency, on the representation of groups in society, and on other aspects are still being gained, making this research topic ripe for aggregation and integration of findings and methodologies.
This Special Issue aims to collect such findings and insights, to encourage deep exploration of the relationships between language and computation, and to foster holistic approaches and collaboration in the development and assessment of different aspects of representation in language models and other NLP systems and applications.
Suggested themes and article types for submissions include:
- Novel schemas for subword tokenization and for tokenizer application methodologies
- Benchmarks and analyses of tokenizer effectiveness and quality, including crosslingual and multilingual setups, morphological aspects, information-theoretic constructions, correlation with quality of learned embeddings and downstream model performance, ability to handle linguistic phenomena, security implications, societal implications, etc.
- Development, modification, evaluation, and analysis of token-free representation schemata based on textual input
- Development, modification, evaluation, and analysis of token-free representation schemata utilizing multimodal input such as visual, spatial, or acoustic signals; combination of different linguistic signals (auditory, textual, sign language) into a single input framework
- Theoretical contributions addressing the expressive power or limitations of various textual representation methodologies
- Analysis of the textual modality and its representation at the computational level, e.g., of Unicode standards
Dr. Yuval Pinter
Guest Editor
Manuscript Submission Information
Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, click here to go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.
Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Applied Sciences is an international peer-reviewed open access semimonthly journal published by MDPI.
Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2400 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.
Keywords
- language modeling
- representation learning
- word embeddings
- tokenization
- computational morphology
- information theory
- text analysis
Benefits of Publishing in a Special Issue
- Ease of navigation: Grouping papers by topic helps scholars navigate broad scope journals more efficiently.
- Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
- Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
- External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
- e-Book format: Special Issues with more than 10 articles can be published as dedicated e-books, ensuring wide and rapid dissemination.
Further information on MDPI's Special Issue policies can be found here.