Visualising StackOverflow Tag Relationships with Silverlight

February 20th, 2012 by Colin Eberhardt

UPDATE: I have posted the source code for this control on CodeProject.

Recently I have been wondering about the wealth of information that can be gleaned from the 2.5 million programming questions on Stack Overflow. A few weeks back I found a tag trending tool, which can be used to measure the rise and fall in popularity of tags over time. Whilst this is a great little tool, I am sure there is much more that can be done with the freely available Stack Overflow data, for example, exploring the relationships between the many technologies people ask questions about.

On a recent trip to Copenhagen I decided to put my hours of travelling time to good use and create a Silverlight application that plots the relationships between the various tags. I created an application that downloaded the 1,000 most recent questions via the Stack Overflow API and plotted the relationships between the 20 most popular tags, as seen above.

The graph is constructed as follows:

  • The size of each segment is proportional to the number of questions relating to the tag, e.g. android and java, the most popular tags, have the largest segments.
  • Connections between tags indicate questions that have been tagged with both technologies. The thickness of the connection indicates how many questions share these two tags, e.g. the jQuery and JavaScript tags appear together quite often.
  • Each segment is coloured based on the number of connections it has, red for many connections, blue for few.
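
As a rough illustration of the counting behind these three rules, here is a minimal Python sketch. It is not the Silverlight code behind the control: the function names and the input shape (a list of tag lists, one per question) are assumptions made for this example, and "number of connections" is assumed to mean the number of distinct tags a tag is linked to.

```python
from collections import Counter
from itertools import combinations


def tag_relationships(questions, top_n=20):
    """Count questions per tag and per tag pair, keeping only the top_n tags.

    `questions` is a list of tag lists, one per question (assumed input shape).
    """
    # Segment size: number of questions carrying each tag.
    counts = Counter(tag for tags in questions for tag in set(tags))
    top = {tag for tag, _ in counts.most_common(top_n)}

    # Connection thickness: number of questions carrying both tags.
    pairs = Counter()
    for tags in questions:
        kept = sorted(set(tags) & top)
        for a, b in combinations(kept, 2):
            pairs[(a, b)] += 1

    tag_counts = {tag: counts[tag] for tag in top}
    return tag_counts, pairs


def connection_counts(tag_counts, pairs):
    """Number of distinct tags each tag is linked to (drives the colouring)."""
    degree = {tag: 0 for tag in tag_counts}
    for a, b in pairs:
        degree[a] += 1
        degree[b] += 1
    return degree
```

Each pair is stored once, with the two tag names in alphabetical order, so a connection between jquery and javascript is looked up as `pairs[("javascript", "jquery")]`.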

The ordering of segments can be changed using the drop-down control. Probably one of the most interesting views is the one where related tags are clustered. This is done by assigning a ‘weight’ to the current configuration of the graph by summing the length of all connections, with connections that cross the centre of the circle adding most weight. An iterative process is used to minimise the overall graph weight by moving each segment a few steps left and right, until the least ‘weighty’ configuration is found. This is the one where each tag is most closely related to its neighbours.
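
The clustering idea can be sketched in a few lines of Python. This is only an approximation written for illustration, not the algorithm as implemented in the control: it assumes equally spaced segments on a unit circle, measures each connection by its chord length (so a connection across the centre is the longest), and uses a simple greedy search. The names `graph_weight`, `cluster` and `max_shift` are hypothetical.

```python
import math


def graph_weight(order, pairs):
    """Sum of chord lengths over all connections, for tags placed evenly on a unit circle."""
    n = len(order)
    pos = {tag: i for i, tag in enumerate(order)}
    total = 0.0
    for a, b in pairs:
        d = abs(pos[a] - pos[b])
        d = min(d, n - d)  # shortest way round the circle
        total += 2 * math.sin(math.pi * d / n)  # longest when the chord crosses the centre
    return total


def cluster(order, pairs, max_shift=3):
    """Greedily move each tag up to max_shift places left or right while the weight drops."""
    order = list(order)
    best = graph_weight(order, pairs)
    improved = True
    while improved:
        improved = False
        for tag in list(order):
            i = order.index(tag)
            for shift in range(-max_shift, max_shift + 1):
                if shift == 0:
                    continue
                # Remove the tag and re-insert it at the shifted position (wrapping round).
                candidate = order[:i] + order[i + 1:]
                candidate.insert((i + shift) % len(order), tag)
                weight = graph_weight(candidate, pairs)
                if weight < best - 1e-12:
                    order, best = candidate, weight
                    improved = True
                    break
    return order, best
```

A greedy search like this can settle on a local minimum rather than the best possible ordering, so individual neighbours may occasionally look misplaced.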

When clustering is applied we can see small ‘pockets’ of related technologies, with the following patterns emerging:

  • The two most popular tags, Java and Android, are very closely related to each other, but have very few other relationships.
  • iOS, Objective-C and iPhone form a close-knit group. However, Objective-C questions are sometimes also tagged with C#, C and C++.
  • C#, .NET and ASP.NET are clustered; however, C# has links with many other tags.
  • The strongest relationship is between jQuery and JavaScript, probably due to jQuery having become the de facto framework for JavaScript development, being used on 53% of websites.
  • There is a large cluster of connected web technologies, CSS, HTML, JavaScript, jQuery, reflecting the mix of technologies involved in creating web sites and web applications.
  • Python, whilst being a popular tag, has very few relationships, only being weakly linked to PHP.

I am planning on tidying up the code for this visualisation, making it more generic, allowing it to be used to graph other datasets. Let me know if you are interested in this!

Here is the same graph, this time showing the top 30 tags. Again, more interesting relationships start to emerge:

Finally, thanks to Chris P., Adrian C. and Graham O. for their ideas and input!

Regards, Colin E.

 

27 Responses to “Visualising StackOverflow Tag Relationships with Silverlight”

  1. Murat Eraydin says:

    Great post! High-performance diagramming is something most component vendors are lacking… Visiblox should add these kinds of visualizations to their toolbox as well ;)

  2. Ujjwal Singh says:

    Cool!!
    A suggestion: you could label the small blocks, like ‘Eclipse’, along the radius. Also, camel case would make it even better, e.g. ‘JavaScript’.

  3. [...] Data visualization of StackOverflow (link) [...]

  4. Myrddin Emrys says:

    On the ‘top 30’ chart, choose clustering, then look at the relative positions of CSS and SPRING (lower right). Their location seems reversed; do you know why the clustering algorithm made a mistake there?

    • Myrddin Emrys says:

      Another example of swapped ordering is ASP.NET-MVC-3 and LINUX (lower left), which also appear as though they would be better clustered if their positions were reversed.

  5. MaggieL says:

    I note that “Silverlight” doesn’t appear as a tag…

  6. Omnifarious says:

    I have no Microsoft platform to view this on. :-/

  7. sacha says:

    Where can I download the code for this one? It looks interesting.

  8. [...] Visualising StackOverflow Tag Relationships with Silverlight [...]

  9. Jarrod Dixon says:

    Awesome – I really like the hovers that highlight a tag’s relationships.

    You should check out the d3 project for more visualization examples:

    http://mbostock.github.com/d3/ex/

    I could see a circle packing graph for related tags, too.

  10. Tim says:

    Very cool. Was this done using Visiblox?

  11. [...] Visualising StackOverflow Tag Relationships with Silverlight (Colin Eberhardt) [...]
