Analyzed the metrics of open source projects hosted on SourceForge.net
Highlights
Deliverables:
Final Presentation
Final Report
Instructor: Lada Adamic
Course: SI544 – Introduction to Statistics and Data Analysis (Winter 2009)
Teammates: Nathan Oostendorp, Tom Hayden, & Zongyun Lai
Challenge
One measure of an open source project’s popularity is the number of times its files have been downloaded. Which factors increase a project’s download count? Our team analyzed the metrics of projects hosted on SourceForge.net. Our primary objective was to find the strongest predictor of an open source project’s download numbers. We also examined the distributions of forum activity, total downloads, and other metrics.
Process
Using regression models, we compared the number of downloads for a project against several variables: tool usage (forums, bug tracker, and file releases), project members, user participation, and license. We found strong correlations between tool usage and the number of file downloads. Time series analysis and a chi-square test confirmed that December has fewer downloads than the other months of the year. We delivered our findings as a presentation and a written report.
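As a rough illustration of the two techniques named above, the sketch below fits a simple linear regression and computes a chi-square goodness-of-fit statistic in plain Python. It is not the team's actual analysis: all numbers are made up for illustration, the log transform and the single predictor (forum posts) are assumptions, and treating every month as having the same expected download count is a simplification.

```python
import math

def least_squares(xs, ys):
    """Return (slope, intercept, r) for a simple linear regression of ys on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)  # Pearson correlation coefficient
    return slope, intercept, r

def chi_square_uniform(observed):
    """Chi-square goodness-of-fit statistic against equal expected counts."""
    expected = sum(observed) / len(observed)
    return sum((o - expected) ** 2 / expected for o in observed)

# Hypothetical per-project data, NOT the SourceForge figures from the project.
forum_posts = [3, 10, 25, 60, 150, 400]
downloads = [120, 500, 900, 4000, 9000, 30000]

# Assumption: log-transform both variables because such counts are heavily skewed.
log_posts = [math.log10(v) for v in forum_posts]
log_downloads = [math.log10(v) for v in downloads]
slope, intercept, r = least_squares(log_posts, log_downloads)

# Hypothetical monthly download counts (January to December) with a December dip.
monthly = [1000, 1020, 990, 1010, 1005, 995, 1000, 1015, 985, 1010, 1000, 800]
stat = chi_square_uniform(monthly)

# Critical value of the chi-square distribution for 11 degrees of freedom
# (12 months minus 1) at the 0.05 significance level.
CRITICAL_DF11_ALPHA05 = 19.675

print(f"slope={slope:.2f}, r={r:.2f}")
print(f"chi-square={stat:.1f}, reject equal months: {stat > CRITICAL_DF11_ALPHA05}")
```

A statistic above the critical value only says the months differ from a uniform pattern; identifying December as the low month still requires inspecting the individual monthly counts, which is where the time series view helps.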