TileDB secures $34M to reimagine databases, not just collect GitHub stars

System aims to clean mess of high-performance analytics cluttering the modern data stack


Flush from securing a $34 million VC investment for his fledgling database company, TileDB CEO Stavros Papadopoulos is not planning to return to the well any time soon.

A former colleague of database pioneer Michael Stonebraker at MIT, Papadopoulos is optimistic that revenue from database systems designed around multi-dimensional arrays will outpace costs sufficiently to avoid going cap in hand to VCs again.

"I'm a very conservative CEO," he told The Register. "The previous [$15 million] round lasted for three years, although it was supposed to last 18 months. The economic environment right now is horrible and investors are more conservative than they were.

"The funding may last indefinitely because we have revenue: we're not raising money on GitHub stars, we're raising money on actual numbers. We have a lot of revenue coming in based on our projections. If we were cautious, we can become profitable very, very quickly. I will first get to profitability, and then make this decision whether we want to deploy more aggressively or we want to organically grow."

The last couple of decades have seen a number of concerted efforts to reinvent the database and move on from omnipresent relational systems. Object-oriented, wide-column, document, graph, and key-value systems have all vied to find markets where the RDBMS doesn't play. Papadopoulos's notion of a system with a multi-dimensional array as its first-class data structure is aimed squarely at analytical problems.

The advantage of the array approach is that it represents a general system from which relational or vector systems, for example, become special cases, he said. TileDB hopes to provide a mathematical proof showing that the array model is a generalization of the relational model; in effect, that the array model subsumes the relational model.

For example, document databases, such as systems from MongoDB and Couchbase, have become popular with developers owing to their schema-less or schema-lite approach, making it easier to get systems up and running. But there is a cost when it comes to analytics, Papadopoulos argues.

"You may be able to store an image in a document database like MongoDB but you store it as a blob; you're not going to store each pixel separately," he said. "So that image is not analysis-ready. In an object store, you can't slice it. You can't create these multi-resolution images, to be able to zoom in, zoom out, and do that interactively with the cloud.

"The images that we're handling are in the terabyte scale. In a document database, you would have to download the whole file locally, but you may not have enough memory and enough storage to do this. TileDB stores it in a structured way, which is tiled and indexed, so you can slice any portion and you can do analytics in a distributed way – you don't need tons of memory to do this."
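The tiling idea Papadopoulos describes can be illustrated with a toy example. The sketch below is not TileDB's storage format or API; the tile size and the helper names `build_tiles` and `read_slice` are hypothetical, chosen only to show how a tiled, indexed layout lets a reader fetch a small window without touching the rest of the image.

```python
# Illustrative sketch only: a toy tiled 2-D array, not TileDB's format or API.
TILE = 4  # hypothetical tile edge length, in pixels


def build_tiles(image, tile=TILE):
    """Split a 2-D list into tiles keyed by (tile_row, tile_col)."""
    tiles = {}
    for r, row in enumerate(image):
        for c, value in enumerate(row):
            key = (r // tile, c // tile)
            # Within a tile, each cell is addressed by its local coordinates.
            tiles.setdefault(key, {})[(r % tile, c % tile)] = value
    return tiles


def read_slice(tiles, r0, r1, c0, c1, tile=TILE):
    """Return rows r0..r1-1 and columns c0..c1-1, plus the tiles touched."""
    touched = set()
    window = []
    for r in range(r0, r1):
        out_row = []
        for c in range(c0, c1):
            key = (r // tile, c // tile)
            touched.add(key)
            out_row.append(tiles[key][(r % tile, c % tile)])
        window.append(out_row)
    return window, touched


# A 16x16 "image" whose pixel value encodes its position.
image = [[r * 16 + c for c in range(16)] for r in range(16)]
tiles = build_tiles(image)

# Read a 4x2 window; only the tiles overlapping it are consulted.
window, touched = read_slice(tiles, 2, 6, 5, 7)
print(len(tiles), sorted(touched))
```

In this toy case the window overlaps two of the 16 tiles. A blob store, by contrast, would have to hand back the whole image before any slicing could happen.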

TileDB was born out of Papadopoulos's time as a research scientist at MIT's Intel Labs, working on supporting scientific research. The main focus remains life sciences, where the multitude of X-rays, CAT scans, genomic data, and transcripts play to TileDB's strengths, but there are also opportunities in engineering diagnostics and financial services, he said.

"The way people are solving these problems today is that they're either putting together 10 different tools that are completely different to each other: a relational database, a key value database, bespoke files and formats.

"And then they're hiring big teams of data engineers, and they're building catalogs on top and access control layers and logging layers. Effectively, they're reinventing the database, but to manage other databases, and that's what they call the modern data stack. There are different flavors of the same thing, but they conceal a problem: instead of going back to the roots, and fixing this problem at its core, they're hacking it."

TileDB is available as both an open source and a commercial offering. Unlike so-called cloud-native data warehouse systems that mushroomed in popularity over the last decade – including Snowflake and AWS Redshift – TileDB charges a flat license fee based on seats and data volume.

Papadopoulos argued that the pay-as-you-go consumption model for data analytics could create a conflict between sales teams, who want to see consumption go up, and engineering teams, whose efforts to make the system more efficient could reduce consumption.

Andy Pavlo, associate professor of databaseology at Carnegie Mellon University, said the conceptual foundation of TileDB has some merit. "Multi-dimensional arrays are the only data model that you do not want to store in a native relational DBMS. A row-store scans data 'horizontally,' a column-store scans data 'vertically.'

"But some array query access patterns do arbitrary traversals across different dimensions. Therefore, you want a specialized engine – like TileDB – to handle them. But no major cloud provider offers a hosted array DBMS service, meaning they do not see a sizable market."
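Pavlo's point about scan direction can be made concrete with a small sketch. It assumes a 4x6 table linearized into a flat buffer; the helper names `row_major_offset` and `col_major_offset` are hypothetical. Each layout makes one traversal contiguous and the other strided, so an engine fixed to one layout handles cross-dimension traversals poorly.

```python
# Illustrative sketch: where cells land in a flat buffer under two layouts.
N_ROWS, N_COLS = 4, 6


def row_major_offset(r, c, n_cols=N_COLS):
    """Offset of cell (r, c) when rows are stored one after another."""
    return r * n_cols + c


def col_major_offset(r, c, n_rows=N_ROWS):
    """Offset of cell (r, c) when columns are stored one after another."""
    return c * n_rows + r


# Reading row 1 from a row-major layout touches adjacent offsets.
row_scan_rm = [row_major_offset(1, c) for c in range(N_COLS)]

# Reading column 2 from the same layout jumps N_COLS cells each step.
col_scan_rm = [row_major_offset(r, 2) for r in range(N_ROWS)]

# Reading column 2 from a column-major layout is contiguous again.
col_scan_cm = [col_major_offset(r, 2) for r in range(N_ROWS)]

print(row_scan_rm, col_scan_rm, col_scan_cm)
```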

Pavlo pointed out that SQL:2023 – the ninth edition of the ubiquitous ISO query language – added support for multi-dimensional arrays (SQL/MDA). TileDB supports SQL.

However, array databases are not necessary for vector analytics, he said – something that has come into vogue owing to escalating interest in large language models in machine learning.

"Vectors are just single-dimension arrays. There is nothing special about them; relational DBMSes have supported them for decades. The vector DBs have added indexes to do fast (approximate) nearest neighbor search," said Pavlo, who is also CEO of database performance management company OtterTune. ®
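As a rough illustration of that remark, the sketch below treats each vector as a plain one-dimensional list stored alongside an ID and finds the closest rows by scanning all of them. The row names, the sample values, and the `nearest` helper are made up for the example. This is exact, brute-force search; the indexes Pavlo mentions exist to avoid the full scan, trading exactness for speed.

```python
import math


def euclidean(a, b):
    """Straight-line distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest(query, rows, k=1):
    """Exact k-nearest-neighbor search by scanning every stored row."""
    return sorted(rows, key=lambda row: euclidean(query, row[1]))[:k]


# Hypothetical table of (id, vector) rows.
rows = [
    ("doc_a", [0.0, 0.0, 1.0]),
    ("doc_b", [1.0, 0.0, 0.0]),
    ("doc_c", [0.9, 0.1, 0.0]),
]

result = nearest([1.0, 0.0, 0.0], rows, k=2)
print([name for name, _ in result])
```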
