Recently we have worked with a dozen industrial collaborators, from multi-national corporations to startup companies, to pinpoint and quantify architecture debts. Our technology leverages a wide range of project data, from source file dependencies to issue records, and we have interacted with projects of various sizes and characteristics.

Crossing the border between research and practice, we have observed significant gaps in data availability and quality among projects of different kinds. Compared with data from successful open source projects, data from proprietary projects are rarely complete or well-organized. Consequently, not all projects can benefit from all the features we provide, which, in turn, has made our collaborators aware of the need to improve their development processes.

In this paper, we categorize the commonly observed differences between open source and proprietary project data, analyze the reasons for these differences, and offer suggestions for minimizing the gap, with the aim of facilitating advances in both software research and practice.

