Towards Better Technical Debt Detection with NLP and Machine Learning Methods
Technical debt (TD) is an economic metaphor used to describe suboptimal choices made during the software development process. It usually arises when developers take shortcuts instead of following agreed-upon development practices. Left unchecked, growing technical debt harms the software development process. It causes extra costs, weakens developer morale and motivation, makes it harder to add new features to existing programs, and can lead to substantial financial losses.
Technical debt is currently managed mainly by hand, which makes detecting it both slow and costly. Automatic detection could address this problem, but even today's state-of-the-art tools do not detect technical debt accurately. Improving the accuracy of automatic classification is therefore important, because it could eliminate a significant portion of the costs related to technical debt detection.
This research aims to improve detection accuracy by combining static code analysis with natural language processing (NLP). Used together, these techniques will allow more accurate detection of technical debt than either technique used on its own. The research also aims to discover themes and topics in developers' written messages that can be linked to technical debt. These themes can help us understand technical debt from the developers' point of view. Finally, we will build an open-source tool or plugin that uses both static analysis and natural language processing methods to detect technical debt accurately.