Delivering Big Data with Open Source. By Olivia Hughes

Big data & open source


Big data and supporting analytics have moved to center stage in improving value for money from data sources, and open source software has been an important contributor to releasing this value. By providing a foundation for exploiting large data streams and data sets, open source solutions accelerate the development and deployment of big data applications for transformative impact. These solutions range from Unix/Linux infrastructure, through distributed storage and processing, to analytical tools based on R.


From optimised business analysis to consumer behaviour trends and big pharma research, through to capturing personal data to build life stories, the wide-reaching implementation of open source solutions was an overarching theme at CW Big Data at Mathys & Squire LLP, The Shard. Keynote speakers Mary-Ann Claridge of Mandrel Systems, Dan Taylor of JDSU UK, Miika Ahdesmaki of AstraZeneca and Dana Pavel of TecVis analysed ‘Delivering Big Data: Practical solutions with an emphasis on open source’.

Through capturing, organising, validating and analysing data for insight and intelligence, open source software is generating new opportunities across all industries. Key points discussed at the event were:


Frequent and fast releases, made possible by open source environments, result in closer alignment with developers' emerging needs and allow developers to extend functionality by contributing new features to the software themselves.

Open source also allows users to control the support arrangements that are put in place and, if necessary, to move the code to a new supplier. An active open source community also gives users access to suppliers of support services.

Data security is a critical issue underpinning the big data ecosystem. For example, one open source use case presented by Miika Ahdesmaki of AstraZeneca was Next Generation Sequencing, which uses Linux in AWS to analyse genomes and detect mutations. While life-changing, these cutting-edge health science technologies pull data from DNA, raising concern about the protection of such uniquely personal information.

Mary-Ann Claridge of Mandrel Systems discussed how to select the right tools to leverage datasets effectively, with a focus on the investigation and production stages. Key questions to consider are: What is the natural form of the data provided? How quickly will data be received, and how much will arrive? How soon will processed results be needed? And who is the end user?

Although there are currently numerous solutions on the market, big data technologies are evolving very quickly and new platforms are continuously being released. One way to explore these new solutions is to start by building a small, contained product that tests different technologies, in order to determine whether they are worth investing in. Using open source tools significantly reduces the investment needed to do this.


With many organisations still coming to grips with the full potential of big data, open source software offers unmistakable benefits, including reducing the barriers to entry. However, maintaining this lead requires open source communities to address any deficiencies in reliability, auditing, security and consistency for mission-critical systems. Some open source communities, such as Linux, have shown they are able to achieve this.


About The Author

Liv Hughes
Follow on Twitter: @liv_venture


Keep the conversation going on Twitter: @CambWireless
