Empirical comparison of text-based mobile apps similarity measurement techniques (ICSE 2020 - Journal First)

Wed 24 June - Thu 16 July 2020

Who

Afnan Al-Subaihin, Federica Sarro, Sue Black, Licia Capra

Track

ICSE 2020 Journal First

Time Zone

Session times are shown in Coordinated Universal Time (UTC).

When

Tue 7 Jul 2020 08:43 - 08:51 at Baekje - I4-Clones and Changes Chair(s): Chanchal K. Roy

Abstract

Context: Code-free software similarity detection techniques have been used to support different software engineering tasks, including clustering mobile applications (apps). The way of measuring similarity may affect both the efficiency and quality of clustering solutions. However, there has been no previous comparative study of feature extraction methods used to guide mobile app clustering.

Objective: In this paper, we investigate different techniques to compute the similarity of apps based on their textual descriptions and evaluate their effectiveness using hierarchical agglomerative clustering.
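As a rough illustration of the general idea (not the paper's pipeline), the sketch below measures description similarity with plain bag-of-words cosine distance and groups apps with average-linkage agglomerative clustering. The four descriptions, the tokeniser, the distance measure, and the linkage criterion are all assumptions chosen for brevity; the study itself compares topic modelling and keyword feature extraction techniques that are not reproduced here.

```python
import math
import re
from collections import Counter
from itertools import combinations


def bag_of_words(text):
    """Lower-case a description and count its alphabetic tokens."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine_distance(a, b):
    """Return 1 - cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return 1.0 if norm == 0 else 1.0 - dot / norm


def agglomerative(vectors, k):
    """Merge the two closest clusters (average linkage) until k remain."""
    n = len(vectors)
    dist = {
        (i, j): cosine_distance(vectors[i], vectors[j])
        for i, j in combinations(range(n), 2)
    }

    def linkage(c1, c2):
        # Average pairwise distance between members of the two clusters.
        total = sum(dist[min(i, j), max(i, j)] for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    clusters = [[i] for i in range(n)]
    while len(clusters) > k:
        x, y = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: linkage(clusters[p[0]], clusters[p[1]]),
        )
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]  # y > x, so index x is unaffected
    return [sorted(c) for c in clusters]


# Hypothetical app descriptions, invented for this example.
descriptions = [
    "track your runs and workouts with gps",
    "gps workout tracker for runs and cycling",
    "edit photos with filters and stickers",
    "photo editor with filters and collage maker",
]
vectors = [bag_of_words(d) for d in descriptions]
clusters = agglomerative(vectors, k=2)
print(clusters)
```

On this toy input the two fitness descriptions end up in one cluster and the two photo descriptions in the other. Swapping `bag_of_words` for a different feature extractor is the kind of variation the study compares.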

Method: To this end, we carry out an empirical study comparing five different techniques, based on topic modelling and keyword feature extraction, to cluster 12,664 apps randomly sampled from the Google Play App Store. The comparison is based on three main criteria: silhouette width, human judgement, and execution time.
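For readers unfamiliar with the first criterion, the silhouette width of an item is (b - a) / max(a, b), where a is its mean distance to the other members of its own cluster and b is its mean distance to the members of the nearest other cluster. The following is a minimal sketch of that standard definition; the one-dimensional toy points and the function name `silhouette_width` are assumptions made for illustration and do not come from the study.

```python
def silhouette_width(dist, labels):
    """Mean silhouette width for a full distance matrix and cluster labels.

    Assumes at least two distinct labels. Singleton clusters score 0,
    which is the usual convention.
    """
    n = len(labels)
    scores = []
    for i in range(n):
        own = [j for j in range(n) if labels[j] == labels[i] and j != i]
        if not own:
            scores.append(0.0)
            continue
        # a: mean distance to the other members of the item's own cluster.
        a = sum(dist[i][j] for j in own) / len(own)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            sum(dist[i][j] for j in range(n) if labels[j] == c) / labels.count(c)
            for c in set(labels)
            if c != labels[i]
        )
        denom = max(a, b)
        scores.append(0.0 if denom == 0 else (b - a) / denom)
    return sum(scores) / n


# Toy data: four points on a line, forming two obvious groups.
points = [0.0, 1.0, 10.0, 11.0]
dist = [[abs(p - q) for q in points] for p in points]

good = silhouette_width(dist, [0, 0, 1, 1])  # groups match the structure
bad = silhouette_width(dist, [0, 1, 0, 1])   # groups cut across it
print(round(good, 3), round(bad, 3))
```

Values close to 1 indicate compact, well-separated clusters, while values at or below 0 indicate items that sit closer to another cluster than to their own.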

Results: The results of our study show that topic modelling, collocation-based feature extraction, and dependency-based feature extraction perform similarly in detecting app-feature similarity. However, dependency-based feature extraction performs better than any other technique in finding application domain similarity (ρ = 0.7, p-value < 0.01).

Conclusions: The current categorisation in the app store studied does not exhibit good classification quality in terms of the claimed feature space. However, better quality can be achieved using a good feature extraction technique and a traditional clustering method.

Link to Publication

https://link.springer.com/article/10.1007/s10664-019-09726-5

DOI

https://doi.org/10.1007/s10664-019-09726-5

File attachments

Full paper (Al-Subaihin2019_Article_EmpiricalComparisonOfText-base.pdf)	910KiB

Afnan Al-Subaihin

King Saud University

Saudi Arabia

Federica Sarro

University College London, UK

United Kingdom

Sue Black

Durham University

Licia Capra

University College London