In 1989 a husband and wife team set up a small data science business in the kitchen of their home. Five years later they were presenting to the board of Tesco. What they said that day enabled the supermarket chain to double its market share within a year, transformed the way most of us shop, and created a new market into which billions of pounds have been invested over the last decade.
Dunnhumby was the company behind Tesco’s Clubcard scheme - a loyalty programme that tracks customer purchases as a way to offer rewards for frequent shopping and more targeted promotions. It was also a pioneer of harnessing big data for tangible organisational benefit.
The big data revolution emerged from a combination of academic and industry research, and advances in computing power at lower cost. This has stimulated the rapid development of the underpinning technologies that make it possible to organise vast quantities of data and convert them into insights and money. The emergence of software such as Hadoop, which enables a network of many computers to solve problems involving massive amounts of data and computation, suddenly meant that working with large amounts of data in an efficient and meaningful way was possible. But because this has all taken place over just the last 10 years, everyone engaged in taking big data mainstream has been on a steep learning curve.
The majority of big data strategies have emerged in the commercial sector from companies perceiving data analytics as a route to competitive advantage through better decision-making, more accurate predictions, and efficiency gains. A good example is the logistics industry, where the colossal amounts of data generated by tracking items around the world are now being crunched to find ways of saving time and money. UPS, a large global delivery company, analysed the shortest routes from pickup to destination across its global fleet. What it discovered was that the shortest distance from A to B is often not the quickest when you factor in traffic infrastructure on the route. Specifically, it found that avoiding junctions requiring drivers to wait before turning across the traffic (a right-hand turn in the UK, a left-hand turn in most other countries) meant that driving a longer route with fewer cross-traffic turns often got drivers to their destinations much more quickly. In the US it now issues drivers with prescribed routes for each delivery that limit the number of left-hand turns they need to make.
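The trade-off described above can be illustrated with a minimal sketch. It is not UPS's routing method: the two routes, the average speed, and the waiting time per cross-traffic turn are all hypothetical figures chosen only to show how a longer route can still be the quicker one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    name: str
    distance_km: float
    cross_traffic_turns: int


# Hypothetical assumptions, not figures from UPS.
AVERAGE_SPEED_KMH = 40.0
WAIT_PER_TURN_MIN = 1.5


def travel_time_min(route: Route) -> float:
    """Estimated journey time: driving time plus waiting at cross-traffic turns."""
    driving = route.distance_km / AVERAGE_SPEED_KMH * 60
    waiting = route.cross_traffic_turns * WAIT_PER_TURN_MIN
    return driving + waiting


# Two illustrative candidate routes between the same pickup and destination.
routes = [
    Route("shortest", distance_km=10.0, cross_traffic_turns=8),
    Route("fewer turns", distance_km=11.0, cross_traffic_turns=1),
]

shortest = min(routes, key=lambda r: r.distance_km)
quickest = min(routes, key=travel_time_min)

for r in routes:
    print(f"{r.name}: {r.distance_km} km, about {travel_time_min(r):.1f} min")
print(f"Shortest route: {shortest.name}; quickest route: {quickest.name}")
```

Under these assumed numbers the route that is one kilometre longer comes out ahead because it spends far less time waiting to turn across traffic.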
“Defence can learn from experiences of the commercial sector and avoid some obvious pitfalls”
In defence the use of commercial-scale data analytics has not yet taken hold. There are many instances where large volumes of data are being used to achieve operational advantage, but due to the enormous variety of the data available, the added complexity of data sensitivity, and the sheer scale of the environment to cover, defence remains a fast follower rather than an early adopter in this field. That could well turn out to be an important advantage. Companies have made great strides harnessing the value of their data, but along the way they have made plenty of mistakes too. Defence can learn from their experiences and avoid some obvious pitfalls. In this article we outline five key lessons defence should learn from the commercial sector’s big data journey so far:
1. Understand the limits of what you have
Organisations want to introduce data science and analytics into their operations because they believe that within the data they have amassed over time, there are indications about how to become more effective and efficient. Their starting point is often an assumption that they have all the data they need but just need a way to analyse it. The reality is often very different. The data has rarely been captured in a uniform, organised way, so cleaning and sorting it is a gruelling process. It can vary dramatically in location, quality, and accessibility. Sometimes the required analysis needs data different from what is currently held, and there can be significant gaps. More data has to be gathered or purchased to fill those gaps. For many organisations this mismatch of expectation and reality is a nasty surprise. Defence organisations must look at the experiences of the commercial world and go into the big data opportunity with their eyes open. The issues most companies experience are likely to be exacerbated by a defence environment that is more complex, and which deals in more sensitive and diverse data.
“Organisations often assume they have all the data they need, but just need a way to analyse it. The reality is often very different”
The defence sector has the potential to glean significant value from the data it can generate, and the historic information it already has. It is the spread, volume and complexity of the data that may make this a difficult process. If defence organisations want to explore the big data opportunity they need to do so with a clear understanding that they may be at the start of the journey, not close to the end. Reviewing the data they can access; investing in getting that existing data cleaned and prepared; ensuring existing processes for capturing data are aligned to the insights required; and putting suitable monitoring and validation in place to assure the outputs from any analysis, will all help accelerate the timeline for effective results.
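A first review of existing data can start very simply. The sketch below assumes a hypothetical record layout (the field names `asset_id`, `timestamp` and `reading` are invented for illustration) and counts how many records are usable, incomplete, or duplicated before any analysis is attempted. Real data sets would need far richer checks.

```python
from collections import Counter

# Hypothetical required fields; a real data set would define its own.
REQUIRED_FIELDS = ("asset_id", "timestamp", "reading")


def audit(records):
    """Split records into usable ones and a summary of the problems found."""
    summary = Counter()
    seen = set()
    usable = []
    for rec in records:
        missing = [f for f in REQUIRED_FIELDS if rec.get(f) in (None, "")]
        key = (rec.get("asset_id"), rec.get("timestamp"))
        if missing:
            summary["incomplete"] += 1
        elif key in seen:
            summary["duplicate"] += 1
        else:
            seen.add(key)
            usable.append(rec)
            summary["usable"] += 1
    return usable, summary


# Illustrative records only.
records = [
    {"asset_id": "A1", "timestamp": "2018-01-01T09:00", "reading": 4.2},
    {"asset_id": "A1", "timestamp": "2018-01-01T10:00", "reading": None},
    {"asset_id": "A1", "timestamp": "2018-01-01T09:00", "reading": 4.2},
    {"asset_id": "B7", "timestamp": "2018-01-01T09:00", "reading": 3.9},
]

usable, summary = audit(records)
print(dict(summary))
```

Even a crude count like this gives an early indication of how far the data held is from the data needed.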
2. Don’t underestimate the value of open data
While many organisations want to explore their proprietary data to generate new insights, there is a growing collection of open data which is available for anyone to use. A lot of this comes from public sector organisations so it includes information about public travel patterns, weather data, and academic research findings, for example. Although there is a massive variety of online data available, including quite a bit of fake information, open data sets can be very powerful. They are constantly updated so their accuracy is usually high and the insights gleaned from their use in analysis are therefore significant. This is encouraging people and organisations to use open data more frequently to develop new ways of solving everyday problems at no cost to anyone. Kaggle, a company that hosts data analytics competitions set by organisations with data-driven problems they cannot solve, was established on the back of this uptick in publicly available data and the growth in people using it. Kaggle now boasts a community of more than 500,000 participants and has run more than 200 competitions. These have included important pieces of work for the public good, including improving understanding of HIV, and the search for the Higgs boson at CERN. Kaggle was acquired by Google in 2017.
“People and organisations are being encouraged to use open data more frequently to develop new ways of solving everyday problems, at no cost to anyone”
There are two ways defence can learn from the growth of open sources of data:
First, it must recognise that it can use this data as easily as anyone else. Weather patterns can help predict flight conditions and enable safe scheduling of test or training programmes. Analysis of travel data can enable fast and accurate decision-making during urban terror incidents. Defence organisations should embrace the use of open data and see it as a vast, growing resource for augmenting the sensitive information they already hold. To do so they will need to resolve the issue of assuring the quality of the base data and therefore the analytical outputs. This will require a layer of data verification, and output validation based on existing domain knowledge to ensure sufficient rigour for defence use. Working within the defence and security ecosystem, with organisations that already understand this environment, the technology, and the complex capability requirements, is a logical next step.
Secondly, it needs to note how engaged the global data science community has become as the amount and variety of open data has grown. There is a broad and talented cadre of data scientists whose experience and knowledge could be as valuable as the data itself. The default position for the defence community is not to share its data for reasons of national security. But that is too narrow a view. Not all of the data generated in the defence domain is sensitive. Using non-sensitive data to crowdsource solutions will offer new methodologies and approaches which can then be applied to more tightly controlled data. The opportunity for defence in big data lies as much in attracting the attention of the data science community as it does in applying analytics to its current data sets. Once again, validating outputs and processes will be a key part of making this a viable part of defence’s ongoing big data strategy.
3. Be happy to work in the cloud
An awful lot of big data capability is now moving to the cloud. It has rapidly become the preferred offer from providers to the commercial environment, where it is readily accepted. Commercial organisations have already experienced the scalability and security cloud-based services can offer. But defence is not a rapid cloud adopter and moving away from suppliers’ default option can be costly. There needs to be some clear analysis of whether cloud can or cannot be a feasible route for defence’s exploration of big data. This should not be too difficult – more than four years ago Amazon Web Services built a private cloud for the US intelligence services, a move evangelised by the CIA’s Chief Information Officer as ‘the most innovative thing we have ever done’. Extending this out to the broader global defence community should not be an insurmountable challenge.
4. Embrace automation
As the amount of available data rises, so do both the range of sources generating that data, and the need to use real-time analysis to cope. People cannot work fast enough. An automated approach to some elements of data science is becoming more prevalent as a result. The financial sector has been the prime early adopter for automated big data analytics. Insurance is a particular case in point and fraud detection is the primary application. Fraud is a massive drain on insurance company revenues. In 2016 the Association of British Insurers issued a report that found the cost of fraud to UK insurers to be in excess of £1.3bn per annum. In the US, the Coalition Against Insurance Fraud estimates that more than $80bn in fraudulent claims are made in North America each year. One of the problems is that the volume of claims data is so huge that it has become impossible to identify the fraud using human intervention. Some of the bigger insurers have started trying to reduce their exposure by pushing the boundaries of automation, machine learning and big data as an integrated approach. By taking a big data architecture and connecting it with machine learning predictive analytical techniques they are automating the process of spotting which claims might not be legitimate by uncovering hidden correlations and patterns previously undetectable to the human eye.
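The idea of automatically flagging claims for review can be sketched in a few lines. The example below is a deliberately simple statistical stand-in, not the machine learning approach insurers use: it flags claims whose value sits far from the median, using a modified z-score based on the median absolute deviation. The claim identifiers, amounts and threshold are hypothetical.

```python
from statistics import median


def flag_unusual_claims(amounts, threshold=3.5):
    """Return IDs of claims whose amount is far from the median.

    Uses a modified z-score built on the median absolute deviation (MAD),
    which is less distorted by extreme values than a mean-based score.
    """
    values = list(amounts.values())
    mid = median(values)
    mad = median([abs(v - mid) for v in values])
    if mad == 0:
        return []  # no spread in the data, so nothing stands out
    return [
        claim_id
        for claim_id, value in amounts.items()
        if 0.6745 * abs(value - mid) / mad > threshold
    ]


# Illustrative claim amounts in pounds; not real data.
claims = {
    "C001": 1200, "C002": 950, "C003": 1100, "C004": 1300,
    "C005": 1050, "C006": 9800, "C007": 1150,
}

flagged = flag_unusual_claims(claims)
print("Claims flagged for human review:", flagged)
```

A flag here only means a claim is unusual, not that it is fraudulent. In practice the automated step narrows the volume so that human investigators can focus on the cases most worth examining.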
“Automation may be the only way defence organisations can spot the trends and patterns that help them make critical decisions”
This is an area where defence can learn a lot. It too has to cope with large volumes of data, multiple sources, and a considerable variety of data types, all generated in a constantly changing environment. A static model cannot work in this scenario – automation may be the only way defence organisations can spot the trends and patterns that help them make critical decisions. They need to take a lead from the finance community, work with organisations in this sector to better understand the technology solutions they have adopted, and start building automation strategies into their overall approach to big data.
5. Accept failure as part of the process
In 2008 Google claimed it could use its search data to predict flu epidemics around the world based on people’s searches for flu-related information. The service, Google Flu Trends, quickly became the poster child for the application of big data and analytics. It failed spectacularly, missing the peak of the 2013 flu epidemic, and was quietly euthanised by Google as a result. The episode illustrates two important lessons defence will have to consider as it moves further into a big data world. The first is that using the data correctly is as important as having the data in the first place. The value of the data held by entities like Google is almost limitless, if applied in the right way. Not all organisations do this and that is why their efforts are unsuccessful.
“Using data correctly is as important as having data in the first place”
The second lesson is that this is not a zero-sum game. The process of finding useful insight in the swathes of data available to most organisations is a long-term commitment that involves succeeding through failure. No matter how much preparation of the data, algorithms, architecture, and systems you undertake, there is no way to predict with any reasonable certainty what all that effort will deliver. Google’s experience is typical of that of many other companies. Few, if any, have embarked on a big data journey and found gold at the first turn. Algorithms are often wrong; too many assumptions are usually made; rarely is the data as relevant or as clean as was first thought; and how well different data sets will click together is usually an unknown. Everyone thinks they understand the data but really no one does – certainly not to the levels expected. To accommodate this, many big data programmes are delivered using agile approaches that employ considerable monitoring and testing to quickly respond to analytical findings, and hone both the approach and results at every juncture. In commercial environments this has become the norm. Companies have accepted that they will seldom find linear paths from data to answer and to decision. It is a more agile approach but it is also time-consuming and expensive.
Defence organisations now have a choice to make. Either they wait a little longer to join the big data revolution until some of the uncertainty has been mitigated, by which time they may be further behind than they already are, or they adjust their approach, accepting that not everything can be planned at the outset and delivered to spec where data is concerned. This is not a comfortable option. But making that adjustment is no longer as perilous as it sounds. Developments in test, evaluation, simulation, and monitoring for data-driven programmes mean that being more agile doesn’t have to mean being more risky. Monitoring the successes and failures of endeavours in data analytics using the same principles and techniques used to monitor the performance of other modern defence capabilities can enable a more agile approach without added exposure. If a new air defence capability can be monitored, tested and adapted swiftly to reflect the changing nature of a critical situation, why can data analytics not be evaluated and attuned in the same way?
Looking across the areas outlined above there are some important common themes. First, there is a mismatch between the way commercial organisations have dealt with the challenges of big data over the last decade to reach a stronger position, and the approach defence can take. In the commercial world companies have accommodated cultural and behavioural shifts in the search for ways to turn data assets into insight and money. In some cases this has meant changing structure and roles. The emergence of the Chief Data Officer and the role’s rise to board level is evidence of this. Defence organisations are not sufficiently convinced of the benefits on offer to commit to this level of change, nor are they agile enough in their approach to deliver it if they were. The commercial experience has shown us that those unwilling to accept this level of change may hamper their own ability to benefit. The defence community needs to work together to develop a risk reduction strategy that allows for a modicum of change but with reduced exposure.
Second, the world of big data thrives on open engagement. Limiting yourself to the few people in an organisation who understand predictive modelling and analytics wastes the opportunity the data presents. Defence organisations are right to be careful with the way they engage an unregulated community like this, but they also need to recognise the value it brings. They need to study the examples where sensitive data has delivered important progress in mission-critical environments such as healthcare and adopt similar approaches to find a happy medium between risk and reward.
Underpinning all of this is the importance of monitoring and verifying outputs. Not only can this substantiate the results from open engagement, but it also enables the learnings of each project to be fed back into the system, further optimising the approach and therefore future results. Harnessing the value of big data is an iterative route to success. Each project requires a layer of evaluation to ensure the next stage on the journey can be better than the last. Defence organisations will no doubt welcome this extra layer of assurance and need to incorporate it into their plans for embracing the big data opportunity.
“The defence community needs to work together to develop a risk reduction strategy”
Defence really has no option but to invest in big data analytics across its activities; it has to do so in order to keep pace with adversaries, who are increasingly technologically advanced, and to get performance improvements out of its platforms, capabilities and workforce. For defence organisations then, the greatest takeaway of all is that the commercial world has done a lot of the learning already. Mistakes have been made, successes have been identified, and best practice is starting to emerge. The requirement now is for the defence sector to apply this to its unique operating environment and start to make the most of the data assets it has today and the ones it can generate in the future.