<?xml version="1.0" encoding="UTF-8"?><?xml-stylesheet type="text/xsl" href="static/CINECAstyle.xsl"?><OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/ http://www.openarchives.org/OAI/2.0/OAI-PMH.xsd"><responseDate>2026-06-11T03:52:20Z</responseDate><request verb="GetRecord" identifier="oai:iris.cnr.it:20.500.14243/570522" metadataPrefix="oai_dc">https://iris.cnr.it/oai/request</request><GetRecord><record><header><identifier>oai:iris.cnr.it:20.500.14243/570522</identifier><datestamp>2026-03-04T01:28:12Z</datestamp><setSpec>com_20.500.14243_22</setSpec><setSpec>com_20.500.14243_21</setSpec><setSpec>col_20.500.14243_23</setSpec><setSpec>ou_ou239</setSpec></header><metadata><oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/" xmlns:doc="http://www.lyncode.com/xoai" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:dc="http://purl.org/dc/elements/1.1/" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/ http://www.openarchives.org/OAI/2.0/oai_dc.xsd">
<dc:title>Leveraging encoder-only large language models for mobile app review feature extraction</dc:title>
<dc:creator>Motger Q.</dc:creator>
<dc:creator>Miaschi A.</dc:creator>
<dc:creator>Dell'Orletta F.</dc:creator>
<dc:creator>Franch X.</dc:creator>
<dc:creator>Marco J.</dc:creator>
<dc:contributor>Motger, Q.</dc:contributor>
<dc:contributor>Miaschi, A.</dc:contributor>
<dc:contributor>Dell'Orletta, F.</dc:contributor>
<dc:contributor>Franch, X.</dc:contributor>
<dc:contributor>Marco, J.</dc:contributor>
<dc:subject>Extended pre-training</dc:subject>
<dc:subject>Feature extraction</dc:subject>
<dc:subject>Instance selection</dc:subject>
<dc:subject>Large language models</dc:subject>
<dc:subject>Mobile app reviews</dc:subject>
<dc:subject>Named-entity recognition</dc:subject>
<dc:description>Mobile app review analysis presents unique challenges due to the low quality, subjective bias, and noisy content of user-generated documents. Extracting features from these reviews is essential for tasks such as feature prioritization and sentiment analysis, but it remains a challenging task. Meanwhile, encoder-only models based on the Transformer architecture have shown promising results for classification and information extraction tasks for multiple software engineering processes. This study explores the hypothesis that encoder-only large language models can enhance feature extraction from mobile app reviews. By leveraging crowdsourced annotations from an industrial context, we redefine feature extraction as a supervised token classification task. Our approach includes extending the pre-training of these models with a large corpus of user reviews to improve contextual understanding and employing instance selection techniques to optimize model fine-tuning. Empirical evaluations demonstrate that these methods improve the precision and recall of extracted features and enhance performance efficiency. Key contributions include a novel approach to feature extraction, annotated datasets, extended pre-trained models, and an instance selection mechanism for cost-effective fine-tuning. This research provides practical methods and empirical evidence in applying large language models to natural language processing tasks within mobile app reviews, offering improved performance in feature extraction.</dc:description>
<dc:date>2025</dc:date>
<dc:type>info:eu-repo/semantics/article</dc:type>
<dc:identifier>https://hdl.handle.net/20.500.14243/570522</dc:identifier>
<dc:identifier>10.1007/s10664-025-10660-y</dc:identifier>
<dc:identifier>info:eu-repo/semantics/altIdentifier/scopus/2-s2.0-105003228374</dc:identifier>
<dc:relation>info:eu-repo/semantics/altIdentifier/wos/WOS:001471816400001</dc:relation>
<dc:language>eng</dc:language>
<dc:relation>volume:30</dc:relation>
<dc:relation>issue:3</dc:relation>
<dc:relation>journal:EMPIRICAL SOFTWARE ENGINEERING</dc:relation>
<dc:rights>info:eu-repo/semantics/restrictedAccess</dc:rights>
<dc:rights>license:NOT PUBLIC - Private/restricted access</dc:rights>
<dc:rights>license uri:iris.PRI01</dc:rights>
</oai_dc:dc></metadata></record></GetRecord></OAI-PMH>