MSR4P&S Workshop Presentation – Alexis Butler

The 3rd International Workshop on Mining Software Repositories for Privacy and Security (MSR4P&S) was this year co-located with SANER25 in Montreal, Canada. I was fortunate to be able to attend and present my recent work, Links Between Package Popularity, Criticality, and Security in Software Ecosystems. MSR4P&S focuses on the application of software mining techniques to the security domain, an intersection particularly relevant to my interest in software supply chain risks. This year, the work presented ranged from the study of PII disclosure risks in LLMs to conformity with Software Bill of Materials (SBOM) standards. The other published works can be found on the workshop's webpage.

In the work I presented, we investigated the relationships between package popularity, criticality, and security within software ecosystems, specifically Python and JavaScript/TypeScript. Given the increasing maintenance workloads and stressors faced by open-source software (OSS) maintainers, our research aimed to determine whether the security of packages at the core of these ecosystems was being prioritized over those on the periphery. While there are many ways to identify core packages, we made use of two: popularity and criticality, measured by GitHub forks and a novel directed-graph centrality measure, respectively. Our findings revealed a statistically significant, moderate positive correlation between security and popularity in both ecosystems, suggesting that more popular packages tend to have stronger security postures. However, the correlation between security and criticality yielded mixed results, indicating that further investigation is needed to understand the nuances of criticality in different ecosystems.
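To give a flavour of this kind of analysis, the sketch below rank-correlates a simple centrality-based criticality proxy with security scores. Everything here is an illustrative assumption rather than the paper's actual data or method: the toy dependency edges, the hypothetical Scorecard-style scores, and the use of plain in-degree as the centrality measure.

```python
# Illustrative sketch only: toy dependency graph, hypothetical security
# scores, and in-degree as a stand-in for the paper's centrality measure.
from statistics import mean

# Edge (A, B) means "A depends on B".
deps = [("app1", "requests"), ("app2", "requests"), ("app3", "requests"),
        ("app3", "flask"), ("flask", "werkzeug"), ("requests", "urllib3")]
pkgs = sorted({p for edge in deps for p in edge})

# Criticality proxy: how many packages directly depend on each package.
in_degree = {p: sum(1 for _, dst in deps if dst == p) for p in pkgs}

# Hypothetical security scores on a Scorecard-like 0-10 scale.
scores = {"app1": 4.0, "app2": 3.5, "app3": 5.0, "flask": 8.0,
          "requests": 8.5, "urllib3": 9.0, "werkzeug": 7.5}

def ranks(values):
    """1-based average ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def spearman(x, y):
    """Spearman's rho: Pearson correlation of the two rank vectors."""
    rx, ry = ranks(x), ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx) ** 0.5
    vy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (vx * vy) if vx and vy else 0.0

rho = spearman([in_degree[p] for p in pkgs], [scores[p] for p in pkgs])
print(f"Spearman rho between criticality and security: {rho:.2f}")
```

On this toy data the heavily depended-upon packages also carry the higher scores, so the rank correlation comes out strongly positive; with real ecosystem data, as the talk discussed, the criticality results are far more mixed.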

While I refer to measuring relationships to security, it is more accurate to say that the relationship to security posture was measured. This is due to my choice of OSSF Scorecard as the security metric, a framework that takes a broad view of security in terms of adherence to open-source best practices for secure development. Interestingly, my use of OSSF Scorecard aligned with discussions around another of the accepted works at MSR4P&S.

Following the presentation, there was an opportunity for questions and discussion, which proved valuable, as several other attendees had experience working with both graph centrality measures and dependency graphs. Discussion focused on how the observed structural properties of the dependency graphs could be leveraged to extend this work towards predicting security scores given partial information about a package. I feel that both my presentation and the subsequent discussion went well; however, as always, I came away with several ways to improve my approach to presentations.

On completion of the workshop, I attended the Software Quality Assurance for Artificial Intelligence workshop (SQA4AI). Two papers at this workshop drew my attention: one on the effect of LLM prompt patterns on the complexity of the resulting code, and another on deep-learning-specific technical debt. Both touched on interests from my pre-PhD work, while also being relevant to potential future directions of my PhD.



