T-FREX: A Transformer-based Feature Extraction Method from Mobile App Reviews

Motger Q.; Miaschi A.; Dell'Orletta F.; Franch X.; Marco J.
2024

Abstract

Mobile app reviews are a large-scale data source for software-related knowledge generation activities, including software maintenance, evolution and feedback analysis. Effective extraction of features (i.e., functionalities or characteristics) from these reviews is key to supporting analysis of the acceptance of these features, identification of relevant new feature requests and prioritization of feature development, among others. Traditional methods rely on syntactic pattern-based approaches that are typically context-agnostic, evaluated on a closed set of apps, difficult to replicate and limited to a reduced set and domain of apps. Meanwhile, the pervasiveness of Large Language Models (LLMs) based on the Transformer architecture in software engineering tasks lays the groundwork for empirical evaluation of the performance of these models in supporting feature extraction. In this study, we present T-FREX, a Transformer-based, fully automatic approach for mobile app review feature extraction. First, we collect a set of ground truth features from users in a real crowdsourced software recommendation platform and transfer them automatically into a dataset of app reviews. Then, we use this newly created dataset to fine-tune multiple LLMs on a named entity recognition task under different data configurations. We assess the performance of T-FREX with respect to this ground truth, and we complement our analysis by comparing T-FREX with a baseline method from the field. Finally, we assess the quality of new features predicted by T-FREX through an external human evaluation. Results show that T-FREX on average outperforms the traditional syntactic-based method, especially when discovering new features from a domain for which the model has been fine-tuned.
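The abstract frames feature extraction as a named entity recognition (token classification) task over reviews annotated with known features. The sketch below illustrates that framing only; it is not the authors' implementation. The exact-match transfer rule, the whitespace tokenizer, the label names (B-feature, I-feature, O) and the function names are all assumptions made for illustration.

```python
from typing import List, Tuple

# Hypothetical label scheme (BIO tagging); the record does not state the labels used.
BEGIN, INSIDE, OUTSIDE = "B-feature", "I-feature", "O"


def tokenize(text: str) -> List[str]:
    """Lowercase whitespace tokenizer that strips surrounding punctuation (a simplification)."""
    tokens = []
    for raw in text.split():
        token = raw.strip(".,;:!?()\"'").lower()
        if token:
            tokens.append(token)
    return tokens


def transfer_features(review: str, features: List[str]) -> List[Tuple[str, str]]:
    """Label review tokens with BIO tags wherever a known feature phrase matches exactly."""
    tokens = tokenize(review)
    labels = [OUTSIDE] * len(tokens)
    # Try longer feature phrases first so multi-word matches take precedence.
    for feature in sorted(features, key=lambda f: -len(tokenize(f))):
        feature_tokens = tokenize(feature)
        n = len(feature_tokens)
        if n == 0:
            continue
        for i in range(len(tokens) - n + 1):
            window_is_free = all(label == OUTSIDE for label in labels[i:i + n])
            if tokens[i:i + n] == feature_tokens and window_is_free:
                labels[i] = BEGIN
                for j in range(i + 1, i + n):
                    labels[j] = INSIDE
    return list(zip(tokens, labels))


def extract_spans(tagged: List[Tuple[str, str]]) -> List[str]:
    """Decode BIO-tagged tokens back into feature phrases."""
    spans, current = [], []
    for token, label in tagged:
        if label == BEGIN:
            if current:
                spans.append(" ".join(current))
            current = [token]
        elif label == INSIDE and current:
            current.append(token)
        else:
            if current:
                spans.append(" ".join(current))
            current = []
    if current:
        spans.append(" ".join(current))
    return spans


if __name__ == "__main__":
    # Invented example review and feature list, for illustration only.
    review = "I love the video call quality, but file sharing is slow."
    tagged = transfer_features(review, ["video call", "file sharing"])
    for token, label in tagged:
        print(f"{token}\t{label}")
    print(extract_spans(tagged))
```

Token and label pairs of this shape are the usual input for fine-tuning a token classification model. At inference time, a decoder like `extract_spans` would turn predicted labels back into candidate feature phrases.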
DC Field Value Language
dc.authority.orgunit Istituto di linguistica computazionale "Antonio Zampolli" - ILC en
dc.authority.people Motger Q. en
dc.authority.people Miaschi A. en
dc.authority.people Dell'Orletta F. en
dc.authority.people Franch X. en
dc.authority.people Marco J. en
dc.collection.id.s 71c7200a-7c5f-4e83-8d57-d3d2ba88f40d *
dc.collection.name 04.01 Contributo in Atti di convegno *
dc.contributor.appartenenza Istituto di linguistica computazionale "Antonio Zampolli" - ILC *
dc.contributor.appartenenza.mi 918 *
dc.date.accessioned 2024/12/20 12:29:39 -
dc.date.available 2024/12/20 12:29:39 -
dc.date.firstsubmission 2024/12/18 16:59:24 *
dc.date.issued 2024 -
dc.date.submission 2024/12/18 16:59:24 *
dc.description.allpeople Motger, Q.; Miaschi, A.; Dell'Orletta, F.; Franch, X.; Marco, J. -
dc.description.allpeopleoriginal Motger Q.; Miaschi A.; Dell'Orletta F.; Franch X.; Marco J. en
dc.description.fulltext restricted en
dc.description.numberofauthors 5 -
dc.identifier.doi 10.1109/SANER60148.2024.00030 en
dc.identifier.scopus 2-s2.0-85193059310 en
dc.identifier.source scopus *
dc.identifier.uri https://hdl.handle.net/20.500.14243/519997 -
dc.language.iso eng en
dc.publisher.name Institute of Electrical and Electronics Engineers Inc. en
dc.relation.conferencedate 2024 en
dc.relation.conferencename 31st IEEE International Conference on Software Analysis, Evolution and Reengineering, SANER 2024 en
dc.relation.firstpage 227 en
dc.relation.ispartofbook Proceedings - 2024 IEEE International Conference on Software Analysis, Evolution and Reengineering, SANER 2024 en
dc.relation.lastpage 238 en
dc.relation.numberofpages 12 en
dc.subject.keywords feature extraction -
dc.subject.keywords large language models -
dc.subject.keywords mobile apps -
dc.subject.keywords named entity recognition -
dc.subject.keywords reviews -
dc.subject.keywords token classification -
dc.subject.singlekeyword feature extraction *
dc.subject.singlekeyword large language models *
dc.subject.singlekeyword mobile apps *
dc.subject.singlekeyword named entity recognition *
dc.subject.singlekeyword reviews *
dc.subject.singlekeyword token classification *
dc.title T-FREX: A Transformer-based Feature Extraction Method from Mobile App Reviews en
dc.type.driver info:eu-repo/semantics/conferenceObject -
dc.type.full 04 Contributo in convegno::04.01 Contributo in Atti di convegno it
dc.type.miur 273 -
iris.mediafilter.data 2025/04/15 04:26:15 *
iris.orcid.lastModifiedDate 2024/12/20 12:29:39 *
iris.orcid.lastModifiedMillisecond 1734694179220 *
iris.scopus.extIssued 2024 -
iris.scopus.extTitle T-FREX: A Transformer-based Feature Extraction Method from Mobile App Reviews -
iris.sitodocente.maxattempts 1 -
iris.unpaywall.bestoahost repository *
iris.unpaywall.doi 10.1109/saner60148.2024.00030 *
iris.unpaywall.hosttype repository *
iris.unpaywall.isoa true *
iris.unpaywall.landingpage http://hdl.handle.net/2117/421040 *
iris.unpaywall.license other-oa *
iris.unpaywall.metadataCallLastModified 14/06/2025 06:33:14 -
iris.unpaywall.metadataCallLastModifiedMillisecond 1749875594042 -
iris.unpaywall.oastatus green *
scopus.category 1712 *
scopus.category 2213 *
scopus.category 1708 *
scopus.contributor.affiliation Universitat Politècnica de Catalunya -
scopus.contributor.affiliation ItaliaNLP Lab -
scopus.contributor.affiliation ItaliaNLP Lab -
scopus.contributor.affiliation Universitat Politècnica de Catalunya -
scopus.contributor.affiliation Universitat Politècnica de Catalunya -
scopus.contributor.afid 60007592 -
scopus.contributor.afid 60021199 -
scopus.contributor.afid 60021199 -
scopus.contributor.afid 60007592 -
scopus.contributor.afid 60007592 -
scopus.contributor.auid 57209540522 -
scopus.contributor.auid 57211678681 -
scopus.contributor.auid 57540567000 -
scopus.contributor.auid 6603081752 -
scopus.contributor.auid 8332219900 -
scopus.contributor.country Spain -
scopus.contributor.country Italy -
scopus.contributor.country Italy -
scopus.contributor.country Spain -
scopus.contributor.country Spain -
scopus.contributor.dptid -
scopus.contributor.dptid 121833164 -
scopus.contributor.dptid 121833164 -
scopus.contributor.dptid -
scopus.contributor.dptid -
scopus.contributor.name Quim -
scopus.contributor.name Alessio -
scopus.contributor.name Felice -
scopus.contributor.name Xavier -
scopus.contributor.name Jordi -
scopus.contributor.subaffiliation -
scopus.contributor.subaffiliation Institute for Computational Linguistics A. Zampolli (CNR-ILC); -
scopus.contributor.subaffiliation Institute for Computational Linguistics A. Zampolli (CNR-ILC); -
scopus.contributor.subaffiliation -
scopus.contributor.subaffiliation -
scopus.contributor.surname Motger -
scopus.contributor.surname Miaschi -
scopus.contributor.surname Dell'Orletta -
scopus.contributor.surname Franch -
scopus.contributor.surname Marco -
scopus.date.issued 2024 *
scopus.description.allpeopleoriginal Motger Q.; Miaschi A.; Dell'Orletta F.; Franch X.; Marco J. *
scopus.differences scopus.subject.keywords *
scopus.differences scopus.identifier.isbn *
scopus.differences scopus.relation.conferenceplace *
scopus.document.type cp *
scopus.document.types cp *
scopus.funding.funders 501100004895 - European Social Fund Plus; 501100004837 - Ministerio de Ciencia e Innovación; 501100004837 - Ministerio de Ciencia e Innovación; *
scopus.funding.ids PID2020-117191RB-I00 / AEI/10.13039/501100011033; *
scopus.identifier.doi 10.1109/SANER60148.2024.00030 *
scopus.identifier.isbn 9798350330663 *
scopus.identifier.pui 644840272 *
scopus.identifier.scopus 2-s2.0-85193059310 *
scopus.journal.sourceid 21101239187 *
scopus.language.iso eng *
scopus.publisher.name Institute of Electrical and Electronics Engineers Inc. *
scopus.relation.conferencedate 2024 *
scopus.relation.conferencename 31st IEEE International Conference on Software Analysis, Evolution and Reengineering, SANER 2024 *
scopus.relation.conferenceplace fin *
scopus.relation.firstpage 227 *
scopus.relation.lastpage 238 *
scopus.subject.keywords feature extraction; large language models; mobile apps; named entity recognition; reviews; token classification; *
scopus.title T-FREX: A Transformer-based Feature Extraction Method from Mobile App Reviews *
scopus.titleeng T-FREX: A Transformer-based Feature Extraction Method from Mobile App Reviews *
Appears in types: 04.01 Contributo in Atti di convegno (conference proceedings contribution)
Files in this item:
File Size Format
T-FREX_A_Transformer-based_Feature_Extraction_Method_from_Mobile_App_Reviews.pdf

authorized users only

License: NOT PUBLIC - Private/restricted access
Size 775.61 kB
Format Adobe PDF


Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/519997
Citations
  • Scopus 16