Identification of non-functional requirements in textual specifications: A semi-supervised learning approach

Abstract

Context - Early detection of non-functional requirements (NFRs) is crucial in the evaluation of architectural alternatives starting from initial design decisions. The application of supervised text categorization strategies for requirements expressed in natural language has been proposed in several works as a method to help analysts in the detection and classification of NFRs concerning different aspects of software. However, a significant number of pre-categorized requirements are needed to train supervised text classifiers, which implies that analysts have to manually assign categories to numerous requirements before being able of accurately classifying the remaining ones. Objective - We propose a semi-supervised text categorization approach for the automatic identification and classification of non-functional requirements. Therefore, a small number of requirements, possibly identified by the requirement team during the elicitation process, enable learning an initial classifier for NFRs, which could successively identify the type of further requirements in an iterative process. The goal of the approach is the integration into a recommender system to assist requirement analysts and software designers in the architectural design process. Method - Detection and classification of NFRs is performed using semi-supervised learning techniques. Classification is based on a reduced number of categorized requirements by taking advantage of the knowledge provided by uncategorized ones, as well as certain properties of text. The learning method also exploits feedback from users to enhance classification performance. Results - The semi-supervised approach resulted in accuracy rates above 70%, considerably higher than the results obtained with supervised methods using standard collections of documents. Conclusion Empirical evidence showed that semi-supervision requires less human effort in labeling requirements than fully supervised methods, and can be further improved based on feedback provided by analysts. Our approach outperforms previous supervised classification proposals and can be further enhanced by exploiting feedback provided by analysts.

Publication
Information and Software Technology