Tuesday 23 October 2018

Inspecting Most Frequently Changed Files in Azure Git Repos

When analyzing an application with its telemetry data, we look at how the application is used and can identify which parts of the system are used most often. Wouldn’t it be nice to compare that with the most frequently updated areas of the system? Let’s look at how we can get this information via the Azure DevOps REST API.


Whether you use one Azure Git repo or multiple Azure Git repos within a single team project in Azure DevOps, it is useful to extract the most frequently developed or modified areas of the system. This can be easily achieved using the script I made available on GitHub. Currently the script only supports Azure DevOps Git repos. (Support for TFVC (Team Foundation Version Control) will be added soon.)

The script can be executed as shown below.

.\GetMostFrequentlyModifiedFiles.ps1 -token 'yourPAT' -fromDate '5/02/2018' -collectionUri 'https://dev.azure.com/yourAccount' -teamProjectName 'yourteamProject' -repoName @('repo1*', '*corereop*') -branchNameFilter @('master*','develop*')

Parameters for the script

  • token: Personal Access Token (PAT) of an Azure DevOps user.
  • fromDate: The start date for the changes to consider. Supply it in US short date format (M/d/yyyy, so '5/02/2018' means May 2, 2018). Data is collected from this date up to the current day.
  • collectionUri: Your Azure DevOps account URL.
  • teamProjectName: Name of the Azure DevOps team project.
  • repoName: An array of repo name filters. Provide * as an array item to consider all repos, or provide multiple filters to select specific repos. Patterns of the form reponamepart*, *reponamepart or *reponamepart* are supported.
  • branchNameFilter: An array of branch name filters. Provide * as an array item to consider all branches, or provide multiple filters to select specific branches. Patterns of the form branchnamepart*, *branchnamepart or *branchnamepart* are supported.
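To make the filter behaviour concrete, here is a minimal sketch of how an array of wildcard patterns can select repo or branch names. It is written in Python rather than PowerShell and is not taken from the script itself; the function names and the sample repo names are hypothetical, and case-insensitive matching is an assumption.

```python
from fnmatch import fnmatchcase


def matches_any(name, patterns):
    """Return True if name matches at least one wildcard pattern.

    Both sides are lower-cased, so matching is case-insensitive
    (an assumption, not confirmed behaviour of the script).
    """
    lowered = name.lower()
    return any(fnmatchcase(lowered, pattern.lower()) for pattern in patterns)


def filter_names(names, patterns):
    """Keep only the names that match at least one pattern, in original order."""
    return [name for name in names if matches_any(name, patterns)]


# Hypothetical repo names, for illustration only.
repos = ["repo1-api", "shared-corelib", "my-core-services", "docs"]

print(filter_names(repos, ["repo1*", "*core*"]))
print(filter_names(repos, ["*"]))  # a single * keeps every name
```

Note that Python's fnmatch also treats ? and [ ] as special characters, which goes beyond the three pattern shapes listed above.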

Script execution process in brief

  • Retrieve all repos, apply the repo name filters and process only the matching repos.
  • Retrieve all branches in each repo and apply the branch name filters to identify the branches to include in the report.
  • Get all commits from the given date onwards and process each commit.
  • For each commit, find the files changed and add each change to the running count for that file in an in-memory file change count table.
  • After processing all repos and branches, sort the data and export it as a csv file.
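The counting and export steps above can be sketched as follows. This is an illustration in Python under stated assumptions, not the script's actual code: the commit dictionaries, the changed_files key and the csv column names FilePath and ChangeCount are all hypothetical, and the commits are hard-coded here where the script would retrieve them from Azure DevOps.

```python
import csv
import io
from collections import Counter


def count_file_changes(commits):
    """Add one to a file's count for every commit entry that lists it."""
    counts = Counter()
    for commit in commits:
        for path in commit["changed_files"]:
            counts[path] += 1
    return counts


def to_csv(counts):
    """Return csv text sorted from most to least changed (ties sorted by path)."""
    rows = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["FilePath", "ChangeCount"])  # hypothetical column names
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical commit data standing in for the commits retrieved per branch.
commits = [
    {"id": "c1", "changed_files": ["/src/Api/Orders.cs", "/src/Web/app.js"]},
    {"id": "c2", "changed_files": ["/src/Api/Orders.cs"]},
    {"id": "c3", "changed_files": ["/src/Api/Orders.cs", "/README.md"]},
]

print(to_csv(count_file_changes(commits)))
```

A Counter keeps the in-memory table simple because a path that has not been seen yet starts at zero. The sketch does not decide whether a commit reachable from several branches should be counted once or once per branch.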

Once the script has executed, a csv file with the most frequently modified files for the selected repos and branches will be exported. The data is sorted from the most frequently modified files to the least modified. The file paths help you identify the areas of the system that are getting the most changes. You can create graphical representations of the data in Excel by summarizing it by file path patterns. (This is possible by writing Excel formulas that strip the file name or trailing path portions from the file path.)
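As an alternative to Excel formulas, the same summarization by path can be sketched in a few lines of Python. This is only an illustration: the function names, the depth parameter and the sample rows are hypothetical, and the rows stand in for path and count pairs read from the exported csv.

```python
import posixpath
from collections import Counter


def folder_prefix(path, depth):
    """Drop the file name and keep only the first `depth` folder levels."""
    folders = [part for part in posixpath.dirname(path).split("/") if part]
    return "/" + "/".join(folders[:depth])


def summarize_by_folder(rows, depth=2):
    """Sum change counts per folder prefix, highest total first."""
    totals = Counter()
    for path, count in rows:
        totals[folder_prefix(path, depth)] += count
    return totals.most_common()


# Hypothetical (path, change count) rows, for illustration only.
rows = [
    ("/src/Api/Orders.cs", 3),
    ("/src/Api/Models/Order.cs", 2),
    ("/src/Web/app.js", 1),
    ("/README.md", 1),
]

for folder, total in summarize_by_folder(rows):
    print(folder, total)
```

Raising or lowering depth controls how coarse the grouping is: a depth of 1 groups everything under /src, while a larger depth separates individual feature folders.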
