
Editorial: Language and Vision in Robotics: Emerging Neural and On-Device Approaches

Esposito, Massimo; Maniscalco, Umberto
2022

Abstract

Endowing machines with the ability to represent and understand the physical world in which they live is a longstanding challenge in the AI research community. Recent years have seen significant advancements in the fields of Natural Language Processing (NLP) and Computer Vision (CV), as well as in the development of robotic hardware and accompanying algorithms. Deep learning is driving NLP advances thanks to neural network models with outstanding achievements in several tasks, such as language modeling (Devlin et al., 2019), sentence classification (Pota et al., 2020), named entity recognition (Catelli et al., 2021), sentiment analysis (Pota et al., 2021), and question answering (Zhang et al., 2019). Deep learning has likewise led to impressive achievements on various CV tasks, becoming the state of the art in the field (He et al., 2015; Smith et al., 2021). Even though these fields are among the most actively developing AI research areas, until recently they have been treated separately, with few ways to benefit from each other. On the contrary, it is fundamental to integrate verbal and non-verbal communication in order to account for the multimodal nature of communication (Maniscalco et al., 2022). With the expansion of deep learning approaches, researchers have started exploring the possibilities of jointly applying NLP and CV techniques to improve robotic capabilities. A prevalent approach to self-organizing language-meaning representations in robotic architectures is the use of supervised learning methods, some of which have taken significant steps toward human-like intelligence, demonstrated in learning experiments with real robots (Ogata et al., 2007; Tani, 2016; Yamada et al., 2016). However, these proposals mainly involve low-level motor skills and neglect perspectives from cognitive linguistics and psychology.
This has in turn inspired the efforts of several authors who model usage-based language acquisition and production, either unidirectionally (Golosio et al., 2015; Moulin-Frier et al., 2017; Hinaut and Twiefel, 2019) or bidirectionally (Heinrich et al., 2020), i.e., learning motor meaning from language and deriving language skills from motor exploration. Also noteworthy is the attempt to model multiple-language learning (Giorgi et al., 2020). This Research Topic aims to provide an overview of the research being carried out in both NLP and CV to allow robots to learn and improve their capabilities for exploring, modeling, and learning about the physical world. As this integration requires an interdisciplinary approach, the Research Topic gathers researchers with broad expertise in various fields (machine learning, computer vision, natural language processing, neuroscience, and psychology) to discuss their cutting-edge work as well as perspectives on future directions in this exciting space of language, vision, and interaction in robots.
Istituto di Calcolo e Reti ad Alte Prestazioni - ICAR
NLP
Robotics
Computer Vision
Language Development
deep learning

Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/440227