CNR Institutional Research Information System

Leveraging encoder-only large language models for mobile app review feature extraction

Motger Q.;Miaschi A.;Dell'Orletta F.;Franch X.;Marco J.

2025

Abstract

Mobile app review analysis presents unique challenges due to the low quality, subjective bias, and noisy content of user-generated documents. Extracting features from these reviews is essential for tasks such as feature prioritization and sentiment analysis, but it remains a challenging task. Meanwhile, encoder-only models based on the Transformer architecture have shown promising results for classification and information extraction tasks for multiple software engineering processes. This study explores the hypothesis that encoder-only large language models can enhance feature extraction from mobile app reviews. By leveraging crowdsourced annotations from an industrial context, we redefine feature extraction as a supervised token classification task. Our approach includes extending the pre-training of these models with a large corpus of user reviews to improve contextual understanding and employing instance selection techniques to optimize model fine-tuning. Empirical evaluations demonstrate that these methods improve the precision and recall of extracted features and enhance performance efficiency. Key contributions include a novel approach to feature extraction, annotated datasets, extended pre-trained models, and an instance selection mechanism for cost-effective fine-tuning. This research provides practical methods and empirical evidence in applying large language models to natural language processing tasks within mobile app reviews, offering improved performance in feature extraction.
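To illustrate what framing feature extraction as token classification can look like in practice, the following is a minimal sketch, not the authors' implementation. It assumes a BIO labeling scheme with hypothetical tag names `B-feature`, `I-feature`, and `O`; the abstract does not state the exact scheme. The example review and its labels are invented for illustration and stand in for the per-token predictions of a fine-tuned encoder-only model.

```python
from typing import List


def extract_features(tokens: List[str], labels: List[str]) -> List[str]:
    """Group tokens tagged B-feature / I-feature into feature phrases.

    The tag names are assumptions made for this sketch; a real model
    would emit one predicted label per token.
    """
    if len(tokens) != len(labels):
        raise ValueError("tokens and labels must have the same length")

    features: List[str] = []
    current: List[str] = []
    for token, label in zip(tokens, labels):
        if label == "B-feature":
            # A new feature starts; close any feature still open.
            if current:
                features.append(" ".join(current))
            current = [token]
        elif label == "I-feature" and current:
            # Continue the feature that is currently open.
            current.append(token)
        else:
            # "O" (or a stray I- tag with no open feature) ends the span.
            if current:
                features.append(" ".join(current))
            current = []
    if current:
        features.append(" ".join(current))
    return features


# Invented review, tokenized by whitespace, with hand-written labels.
tokens = ["I", "love", "the", "dark", "mode", "but",
          "sharing", "photos", "is", "slow"]
labels = ["O", "O", "O", "B-feature", "I-feature", "O",
          "B-feature", "I-feature", "O", "O"]

print(extract_features(tokens, labels))
```

Under these assumptions, each contiguous run of `B-feature` followed by `I-feature` tokens becomes one extracted feature, and precision and recall can then be computed by comparing the extracted phrases against annotated ones.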

Full record (DC)

DC field	Value	Language
dc.authority.ancejournal	EMPIRICAL SOFTWARE ENGINEERING	en
dc.authority.orgunit	Istituto di linguistica computazionale "Antonio Zampolli" - ILC	en
dc.authority.people	Motger Q.	en
dc.authority.people	Miaschi A.	en
dc.authority.people	Dell'Orletta F.	en
dc.authority.people	Franch X.	en
dc.authority.people	Marco J.	en
dc.collection.id.s	b3f88f24-048a-4e43-8ab1-6697b90e068e	*
dc.collection.name	01.01 Articolo in rivista	*
dc.contributor.appartenenza	Istituto di linguistica computazionale "Antonio Zampolli" - ILC	*
dc.contributor.appartenenza.mi	918	*
dc.contributor.area	Non assegn	*
dc.contributor.area	Non assegn	*
dc.date.accessioned	2026/03/03 15:02:32	-
dc.date.available	2026/03/03 15:02:32	-
dc.date.firstsubmission	2026/03/02 19:07:53	*
dc.date.issued	2025	-
dc.date.submission	2026/03/02 19:07:53	*
dc.description.abstracteng	Mobile app review analysis presents unique challenges due to the low quality, subjective bias, and noisy content of user-generated documents. Extracting features from these reviews is essential for tasks such as feature prioritization and sentiment analysis, but it remains a challenging task. Meanwhile, encoder-only models based on the Transformer architecture have shown promising results for classification and information extraction tasks for multiple software engineering processes. This study explores the hypothesis that encoder-only large language models can enhance feature extraction from mobile app reviews. By leveraging crowdsourced annotations from an industrial context, we redefine feature extraction as a supervised token classification task. Our approach includes extending the pre-training of these models with a large corpus of user reviews to improve contextual understanding and employing instance selection techniques to optimize model fine-tuning. Empirical evaluations demonstrate that these methods improve the precision and recall of extracted features and enhance performance efficiency. Key contributions include a novel approach to feature extraction, annotated datasets, extended pre-trained models, and an instance selection mechanism for cost-effective fine-tuning. This research provides practical methods and empirical evidence in applying large language models to natural language processing tasks within mobile app reviews, offering improved performance in feature extraction.	-
dc.description.allpeople	Motger, Q.; Miaschi, A.; Dell'Orletta, F.; Franch, X.; Marco, J.	-
dc.description.allpeopleoriginal	Motger Q.; Miaschi A.; Dell'Orletta F.; Franch X.; Marco J.	en
dc.description.fulltext	restricted	en
dc.description.numberofauthors	5	-
dc.identifier.doi	10.1007/s10664-025-10660-y	en
dc.identifier.isi	WOS:001471816400001	-
dc.identifier.scopus	2-s2.0-105003228374	en
dc.identifier.source	scopus	*
dc.identifier.uri	https://hdl.handle.net/20.500.14243/570522	-
dc.language.iso	eng	en
dc.relation.issue	3	en
dc.relation.volume	30	en
dc.subject.keywords	Extended pre-training	-
dc.subject.keywords	Feature extraction	-
dc.subject.keywords	Instance selection	-
dc.subject.keywords	Large language models	-
dc.subject.keywords	Mobile app reviews	-
dc.subject.keywords	Named-entity recognition	-
dc.subject.singlekeyword	Extended pre-training	*
dc.subject.singlekeyword	Feature extraction	*
dc.subject.singlekeyword	Instance selection	*
dc.subject.singlekeyword	Large language models	*
dc.subject.singlekeyword	Mobile app reviews	*
dc.subject.singlekeyword	Named-entity recognition	*
dc.title	Leveraging encoder-only large language models for mobile app review feature extraction	en
dc.type.driver	info:eu-repo/semantics/article	-
dc.type.full	01 Contributo su Rivista::01.01 Articolo in rivista	it
dc.type.miur	262	-
iris.isi.extIssued	2025	-
iris.isi.extTitle	Leveraging encoder-only large language models for mobile app review feature extraction	-
iris.mediafilter.data	2026/03/04 02:52:31	*
iris.orcid.lastModifiedDate	2026/03/04 01:09:50	*
iris.orcid.lastModifiedMillisecond	1772582990394	*
iris.scopus.extIssued	2025	-
iris.scopus.extTitle	Leveraging encoder-only large language models for mobile app review feature extraction	-
iris.sitodocente.maxattempts	1	-
iris.unpaywall.bestoahost	repository	*
iris.unpaywall.bestoaversion	submittedVersion	*
iris.unpaywall.doi	10.1007/s10664-025-10660-y	*
iris.unpaywall.hosttype	repository	*
iris.unpaywall.isoa	true	*
iris.unpaywall.journalisindoaj	false	*
iris.unpaywall.landingpage	https://hdl.handle.net/2117/432930	*
iris.unpaywall.license	cc-by-nc-nd	*
iris.unpaywall.metadataCallLastModified	04/03/2026 04:34:28	-
iris.unpaywall.metadataCallLastModifiedMillisecond	1772595268422	-
iris.unpaywall.oastatus	green	*
isi.authority.ancejournal	EMPIRICAL SOFTWARE ENGINEERING###1382-3256	*
isi.category	EW	*
isi.contributor.affiliation	Universitat Politecnica de Catalunya	-
isi.contributor.affiliation	Inst Computat Linguist A Zampolli ILC CNR	-
isi.contributor.affiliation	Inst Computat Linguist A Zampolli ILC CNR	-
isi.contributor.affiliation	Universitat Politecnica de Catalunya	-
isi.contributor.affiliation	Universitat Politecnica de Catalunya	-
isi.contributor.country	Spain	-
isi.contributor.country	Italy	-
isi.contributor.country	Italy	-
isi.contributor.country	Spain	-
isi.contributor.country	Spain	-
isi.contributor.name	Quim	-
isi.contributor.name	Alessio	-
isi.contributor.name	Felice	-
isi.contributor.name	Xavier	-
isi.contributor.name	Jordi	-
isi.contributor.researcherId	CCM-5349-2022	-
isi.contributor.researcherId	GCD-5321-2022	-
isi.contributor.researcherId	NVY-1615-2025	-
isi.contributor.researcherId	KAM-2369-2024	-
isi.contributor.researcherId	C-7258-2015	-
isi.contributor.subaffiliation	Dept Serv & Informat Syst Engn	-
isi.contributor.subaffiliation	ItaliaNLP Lab	-
isi.contributor.subaffiliation	ItaliaNLP Lab	-
isi.contributor.subaffiliation	Dept Serv & Informat Syst Engn	-
isi.contributor.subaffiliation	Dept Comp Sci	-
isi.contributor.surname	Motger	-
isi.contributor.surname	Miaschi	-
isi.contributor.surname	Dell'Orletta	-
isi.contributor.surname	Franch	-
isi.contributor.surname	Marco	-
isi.date.issued	2025	*
isi.description.abstracteng	Mobile app review analysis presents unique challenges due to the low quality, subjective bias, and noisy content of user-generated documents. Extracting features from these reviews is essential for tasks such as feature prioritization and sentiment analysis, but it remains a challenging task. Meanwhile, encoder-only models based on the Transformer architecture have shown promising results for classification and information extraction tasks for multiple software engineering processes. This study explores the hypothesis that encoder-only large language models can enhance feature extraction from mobile app reviews. By leveraging crowdsourced annotations from an industrial context, we redefine feature extraction as a supervised token classification task. Our approach includes extending the pre-training of these models with a large corpus of user reviews to improve contextual understanding and employing instance selection techniques to optimize model fine-tuning. Empirical evaluations demonstrate that these methods improve the precision and recall of extracted features and enhance performance efficiency. Key contributions include a novel approach to feature extraction, annotated datasets, extended pre-trained models, and an instance selection mechanism for cost-effective fine-tuning. This research provides practical methods and empirical evidence in applying large language models to natural language processing tasks within mobile app reviews, offering improved performance in feature extraction.	*
isi.description.allpeopleoriginal	Motger, Q; Miaschi, A; Dell'Orletta, F; Franch, X; Marco, J;	*
isi.document.sourcetype	WOS.SCI	*
isi.document.type	Article	*
isi.document.types	Article	*
isi.identifier.doi	10.1007/s10664-025-10660-y	*
isi.identifier.eissn	1573-7616	*
isi.identifier.isi	WOS:001471816400001	*
isi.journal.journaltitle	EMPIRICAL SOFTWARE ENGINEERING	*
isi.journal.journaltitleabbrev	EMPIR SOFTW ENG	*
isi.language.original	English	*
isi.publisher.place	VAN GODEWIJCKSTRAAT 30, 3311 GZ DORDRECHT, NETHERLANDS	*
isi.relation.issue	3	*
isi.relation.volume	30	*
isi.title	Leveraging encoder-only large language models for mobile app review feature extraction	*
scopus.authority.ancejournal	EMPIRICAL SOFTWARE ENGINEERING###1382-3256	*
scopus.category	1712	*
scopus.contributor.affiliation	Universitat Politècnica de Catalunya	-
scopus.contributor.affiliation	ItaliaNLP Lab	-
scopus.contributor.affiliation	ItaliaNLP Lab	-
scopus.contributor.affiliation	Universitat Politècnica de Catalunya	-
scopus.contributor.affiliation	Universitat Politècnica de Catalunya	-
scopus.contributor.afid	60007592	-
scopus.contributor.afid	60021199	-
scopus.contributor.afid	60021199	-
scopus.contributor.afid	60007592	-
scopus.contributor.afid	60007592	-
scopus.contributor.auid	57209540522	-
scopus.contributor.auid	57211678681	-
scopus.contributor.auid	57540567000	-
scopus.contributor.auid	6603081752	-
scopus.contributor.auid	8332219900	-
scopus.contributor.country	Spain	-
scopus.contributor.country	Italy	-
scopus.contributor.country	Italy	-
scopus.contributor.country	Spain	-
scopus.contributor.country	Spain	-
scopus.contributor.dptid	109636042	-
scopus.contributor.dptid	121833164	-
scopus.contributor.dptid	121833164	-
scopus.contributor.dptid	109636042	-
scopus.contributor.dptid	112881698	-
scopus.contributor.name	Quim	-
scopus.contributor.name	Alessio	-
scopus.contributor.name	Felice	-
scopus.contributor.name	Xavier	-
scopus.contributor.name	Jordi	-
scopus.contributor.subaffiliation	Department of Service and Information System Engineering;	-
scopus.contributor.subaffiliation	Institute for Computational Linguistics “A. Zampolli” (ILC-CNR);	-
scopus.contributor.subaffiliation	Institute for Computational Linguistics “A. Zampolli” (ILC-CNR);	-
scopus.contributor.subaffiliation	Department of Service and Information System Engineering;	-
scopus.contributor.subaffiliation	Department of Computer Science;	-
scopus.contributor.surname	Motger	-
scopus.contributor.surname	Miaschi	-
scopus.contributor.surname	Dell’Orletta	-
scopus.contributor.surname	Franch	-
scopus.contributor.surname	Marco	-
scopus.date.issued	2025	*
scopus.description.abstracteng	Mobile app review analysis presents unique challenges due to the low quality, subjective bias, and noisy content of user-generated documents. Extracting features from these reviews is essential for tasks such as feature prioritization and sentiment analysis, but it remains a challenging task. Meanwhile, encoder-only models based on the Transformer architecture have shown promising results for classification and information extraction tasks for multiple software engineering processes. This study explores the hypothesis that encoder-only large language models can enhance feature extraction from mobile app reviews. By leveraging crowdsourced annotations from an industrial context, we redefine feature extraction as a supervised token classification task. Our approach includes extending the pre-training of these models with a large corpus of user reviews to improve contextual understanding and employing instance selection techniques to optimize model fine-tuning. Empirical evaluations demonstrate that these methods improve the precision and recall of extracted features and enhance performance efficiency. Key contributions include a novel approach to feature extraction, annotated datasets, extended pre-trained models, and an instance selection mechanism for cost-effective fine-tuning. This research provides practical methods and empirical evidence in applying large language models to natural language processing tasks within mobile app reviews, offering improved performance in feature extraction.	*
scopus.description.allpeopleoriginal	Motger Q.; Miaschi A.; Dell'Orletta F.; Franch X.; Marco J.	*
scopus.differences	scopus.subject.keywords	*
scopus.document.type	ar	*
scopus.document.types	ar	*
scopus.funding.funders	501100004895 - European Social Fund Plus; 501100004837 - Ministerio de Ciencia e Innovación; 501100004837 - Ministerio de Ciencia e Innovación;	*
scopus.funding.ids	PID2020-117191RB-I00 / AEI/10.13039/501100011033;	*
scopus.identifier.doi	10.1007/s10664-025-10660-y	*
scopus.identifier.eissn	1573-7616	*
scopus.identifier.pui	2034300319	*
scopus.identifier.scopus	2-s2.0-105003228374	*
scopus.journal.sourceid	18650	*
scopus.language.iso	eng	*
scopus.publisher.name	Springer	*
scopus.relation.article	104	*
scopus.relation.issue	3	*
scopus.relation.volume	30	*
scopus.subject.keywords	Extended pre-training; Feature extraction; Instance selection; Large language models; Mobile app reviews; Named-entity recognition;	*
scopus.title	Leveraging encoder-only large language models for mobile app review feature extraction	*
scopus.titleeng	Leveraging encoder-only large language models for mobile app review feature extraction	*
Appears in types:	01.01 Articolo in rivista

Files in this item:

File	Size	Format
s10664-025-10660-y.pdf (authorized users only; license: not public, private/restricted access)	2.05 MB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/570522
