Big Data Choices

ScaleConf 2012 was a two-day conference held in Kirstenbosch Gardens earlier this year. It was a huge success: a beautiful venue, capacity crowds and a carefully selected list of speakers. Even the Cape Town weather played its part, with two days in a row of warm sun and cloudless skies.

ScaleConf also offered a hot topic: local and international experts sharing their ideas on how to grow businesses and services on the internet as fast as possible. For the young and overwhelmingly male audience, Web Scale was the new testosterone-charged measure of awesomeness.

At gatherings like ScaleConf, there are a few fundamental rules. Open Source software is always an unquestioned choice. Open Source good; Microsoft bad.

Another ritual is celebrating the death of the SQL database. As with any decent wake, speeches spread the nostalgia and detail the foibles of the departed. Later, glasses are raised and war stories shared.

There is always a conspicuous absentee from these funeral rites. There is no coffin propped up for viewing. The dearly departed has, once again, stubbornly refused to show up. Even with all the determined mourning, the relational database refuses to kick the bucket, shuffle off its mortal coil or join the choir invisible.

This time, it’s the loose grouping of database systems under the NoSQL banner that threatens the two-decade-long reign of traditional relational databases like Oracle, SQL Server and MySQL.

When internet companies like Google, Amazon and Facebook had to contend with the gigantic volumes of data generated by their applications, and the global appetite to devour it as fast as possible, they developed their own, innovative data storage systems.

These vast new databases had certain features in common. They could all spread their data across many thousands of servers. They also tackled another performance limitation of conventional databases head-on. They abandoned the rows and columns of the relational model, and instead came up with new data representations narrowly optimized for their business needs.

With no relational data to retrieve, there was no need for the SQL query language. The NoSQL movement was born.

NoSQL is a puzzling name: the query language seemingly punished for the more egregious sins of the underlying relational database. It’s a little like blaming the remote control for the abysmal choice of programs on the television.

At another software conference late last year, in the gloom of early winter in Sweden, I listened to a different set of speakers offering more nuanced views. Here, Microsoft was no longer a pariah, but an active participant in the Open Source community, having just made an important piece of web software freely available.

And instead of SQL vs NoSQL battle lines, a more pragmatic compromise was being discussed: Polyglot Persistence. Why not use multiple data stores in the enterprise, best matched to the differing types of data and the applications that consume them?

There were case studies, too, from fast-growing web outfits like Stack Overflow and Instagram. They relied on the consistency and maturity of relational databases to manage user accounts and payments, but also used the speed and simplicity of NoSQL systems for storing huge numbers of images and caching thousands of web requests per second.
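The split described in these case studies can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class names `AccountStore` and `PageCache` are hypothetical, an in-memory SQLite database stands in for the relational store, and a plain dictionary stands in for a NoSQL key-value cache. It does not describe how Stack Overflow or Instagram actually built their systems.

```python
import sqlite3


class AccountStore:
    """Relational side: account balances change inside one transaction."""

    def __init__(self):
        self.db = sqlite3.connect(":memory:")
        self.db.execute(
            "CREATE TABLE accounts ("
            " user_id TEXT PRIMARY KEY,"
            " balance INTEGER NOT NULL CHECK (balance >= 0))"
        )

    def open_account(self, user_id, balance):
        with self.db:  # commits on success, rolls back on an exception
            self.db.execute(
                "INSERT INTO accounts VALUES (?, ?)", (user_id, balance)
            )

    def pay(self, payer, payee, amount):
        # Credit first, then debit. If the debit would overdraw the payer,
        # the CHECK constraint fails and the credit is rolled back as well.
        with self.db:
            self.db.execute(
                "UPDATE accounts SET balance = balance + ? WHERE user_id = ?",
                (amount, payee),
            )
            self.db.execute(
                "UPDATE accounts SET balance = balance - ? WHERE user_id = ?",
                (amount, payer),
            )

    def balance(self, user_id):
        row = self.db.execute(
            "SELECT balance FROM accounts WHERE user_id = ?", (user_id,)
        ).fetchone()
        return row[0]


class PageCache:
    """Key-value side: no schema, no joins, just fast lookups by key."""

    def __init__(self):
        self._pages = {}
        self.hits = 0

    def get_or_render(self, url, render):
        if url in self._pages:
            self.hits += 1
        else:
            self._pages[url] = render(url)  # only rendered on a cache miss
        return self._pages[url]
```

The relational store earns its keep when two changes must succeed or fail together, as in a payment. The cache gives up those guarantees in exchange for simplicity, which is acceptable because a lost cache entry can always be rendered again.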

The NoSQL movement, an energizing blast of fresh air in the database world, has come at just the right time for KRS. As a software development house starting in 1987, we have grown in parallel with the SQL database, and relied on it to keep our clients’ data safe. We have taken pride in our data modelling expertise and SQL language skills, and have taught successive generations of developers to make the best use of these tools.

Over the last couple of years, KRS has been investing in far-reaching changes in the way we run our projects. Our commitment to Agile processes and principles has been visible to clients and staff alike, from daily stand-ups to end-of-sprint retrospectives.

Less obvious, but equally profound, have been the improvements in the way we design systems. Team by team, KRS has been changing over to Domain-Driven Design (DDD), with its emphasis on building software that closely matches our clients’ businesses, rather than the dictates of a software framework or database.

There’s been a subtle shift in how we think about data, from the rigid and abstract relations of the past, to flexible aggregates that use simple names and rules easily understood by our clients. In this form, data can be stored equally well in SQL or NoSQL databases.
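As a rough illustration of that last point, the sketch below stores one aggregate in two ways: as rows in relational tables and as a single JSON document in a key-value store. Everything in it is an assumption made for the example: the `Order` and `OrderLine` names, the table layout, the in-memory SQLite database, and the plain dictionary standing in for a document database. It is not a description of KRS's own code.

```python
import json
import sqlite3
from dataclasses import asdict, dataclass, field


@dataclass
class OrderLine:
    product: str
    quantity: int


@dataclass
class Order:
    """A hypothetical aggregate: an order together with its lines."""
    order_id: str
    customer: str
    lines: list = field(default_factory=list)


# Document style: the whole aggregate is saved as one JSON value.
def save_document(store, order):
    store[order.order_id] = json.dumps(asdict(order))


def load_document(store, order_id):
    data = json.loads(store[order_id])
    lines = [OrderLine(**line) for line in data["lines"]]
    return Order(data["order_id"], data["customer"], lines)


# Relational style: the same aggregate is spread over two tables.
def create_tables(db):
    db.execute("CREATE TABLE orders (order_id TEXT PRIMARY KEY, customer TEXT)")
    db.execute(
        "CREATE TABLE order_lines ("
        " order_id TEXT, position INTEGER, product TEXT, quantity INTEGER)"
    )


def save_relational(db, order):
    with db:  # one transaction for the whole aggregate
        db.execute(
            "INSERT INTO orders VALUES (?, ?)", (order.order_id, order.customer)
        )
        db.executemany(
            "INSERT INTO order_lines VALUES (?, ?, ?, ?)",
            [
                (order.order_id, position, line.product, line.quantity)
                for position, line in enumerate(order.lines)
            ],
        )


def load_relational(db, order_id):
    (customer,) = db.execute(
        "SELECT customer FROM orders WHERE order_id = ?", (order_id,)
    ).fetchone()
    rows = db.execute(
        "SELECT product, quantity FROM order_lines"
        " WHERE order_id = ? ORDER BY position",
        (order_id,),
    ).fetchall()
    return Order(order_id, customer, [OrderLine(p, q) for p, q in rows])
```

In both cases the application saves and loads a whole `Order`, so the choice of store stays behind a pair of small functions and the rest of the code never needs to know which one is in use.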

The challenge for KRS lies in moving data from one store to another just as reliably as in the past, while this time round making it all a little faster, and more than a little simpler.

About the Author
Steve Mabbutt is Chief Information Officer at Khanyisa Real Systems, and leads our strategic technology vision. KRS applies the best XP and Agile principles to all our projects: it brings out the best in our teams, and improves value for our clients.