August 7, 2024
Product News

What's New in July '24

It’s been a while since we provided intel on our latest product features and improvements. We have been busy attending events and working with new clients, but here are the latest updates to our platform.
July Product Roundup 2024

Background

The world of data engineering (and pretty much every other industry) has been focused on using AI to automate processes, generate code, images and content, and simplify complex tasks. If you check socials or attend talks and conferences, you'll encounter AI at every turn. It's been a whirlwind 18 months.

At The Data Refinery, we've (mostly) sat quietly on the sidelines and observed how AI has been used, the trade-offs it imposes and the general difficulties people encounter when adopting the technology. We wanted a solid use case for AI in our platform.

As a SaaS data platform, we have many users and brands using our platform every day. We empower users who don't have data or analytical skills to answer key questions of their data in a simple, straightforward way, either with the various tools our platform provides or by hooking up third-party tooling (e.g. Power BI).

In line with our approach to guided analytics, we also wanted to take the pain out of adopting AI: helping our customers avoid the pitfalls and learning curve involved in getting value from AI implementations, while making our platform even more accessible.

Our goal was to let our users ask a plain-text question about any and all of their data, and have our platform both answer the question and wire that response into our existing dashboard and segmentation tools. This is what it looks like:


Figure 1: Top selling products

As you can see, we now allow our clients to hold a conversation with their data, which further reduces typical barriers when accessing data.

As detailed in our second AI blog post, there are a number of techniques that can be used to ensure an AI implementation behaves as you expect and achieves the results above, but there is limited value in building an AI service if your data foundation cannot support it.

This blog post details the pain points we help our customers avoid, and offers solutions to those typical problems based on our work in the data and AI space to date.

AI Problems

Anyone who attempts to use AI in the data space, specifically in the data warehouse space, will face a number of common issues when working with an AI model:

Data Access: How to give the model access to just the information it needs to be effective.

Data Context: How to provide enough context for the model to reason and choose the correct path, without needing excessive prompt engineering.

Data Consistency: What happens when data is inconsistent across databases and clients? Does this mean we need a separate AI per system?

Data Structure: How best to structure data that is intended to be consumed by an AI service.

All of these issues are typically addressed by (and are requirements of) an organisation's data strategy, where investment has been made to ensure data is managed and maintained consistently across the organisation and can therefore support company-wide reporting and analysis.

Another lens to put over AI adoption challenges is that most modern SaaS platforms now have an AI offering, so which one do you use? Shopify, Google, Facebook and Klaviyo all have something to say on the subject, but, as with the reporting features of these platforms, each AI works well only in isolation, within its own platform.

AI Answers

Data Access

Security and privacy are major concerns when working with AI models. The prospect of giving an AI service access to what is typically the crown jewels of any data-driven organisation (the data tier) is a scary one indeed. So what are the options?

The Data Refinery has a semantic layer (Cube) that sits on top of our data tier. The layer allows us to abstract a raft of database metadata and configuration without making any changes to our physical data warehouse, and lets us and our customers configure models, views and derived columns using simple configuration.

The semantic layer also provides ideal metadata to give to an AI, allowing it to understand both the schema of a data source and the context of that metadata, which in this case can be much richer than traditional database schema metadata.

This means that our AI models only have access to metadata, plus any data we choose to provide for context, so data warehouse queries are generated without the AI connecting to any form of database. We have both an AI service and a Query Service; by orchestrating across these services, we can generate a database query using AI, validate the generated query and then execute it using the same security model as the rest of our platform. Win.


Figure 2: Marshalling an AI Model
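To make the orchestration concrete, here is a minimal Python sketch of the generate, validate, execute flow. It assumes the AI service returns a query in Cube's JSON query format (`measures`, `dimensions`, `filters`, `order`, `limit`); the member names, the `Orders.tenantId` filter and the functions `generate_query`, `validate_query`, `execute_query` and `handle_question` are hypothetical stand-ins, not our production services. The point it illustrates is that the model only ever sees metadata, and the platform, not the model, applies tenant security before anything reaches the warehouse.

```python
import json

# Hypothetical semantic-layer metadata: cube name -> members the model may use.
SEMANTIC_MEMBERS = {
    "Orders": {"Orders.count", "Orders.totalRevenue", "Orders.createdAt"},
    "Products": {"Products.name", "Products.category"},
}
ALLOWED = set().union(*SEMANTIC_MEMBERS.values())


def generate_query(question: str, metadata: dict) -> str:
    """Stand-in for the AI service (hypothetical). It receives metadata only,
    never a database connection; here it returns a canned Cube-style query."""
    return json.dumps({
        "measures": ["Orders.totalRevenue"],
        "dimensions": ["Products.name"],
        "order": {"Orders.totalRevenue": "desc"},
        "limit": 10,
    })


def validate_query(raw: str) -> dict:
    """Reject malformed JSON or any measure/dimension not exposed by the layer."""
    query = json.loads(raw)  # json.JSONDecodeError is a subclass of ValueError
    members = query.get("measures", []) + query.get("dimensions", [])
    unknown = [m for m in members if m not in ALLOWED]
    if unknown:
        raise ValueError(f"unknown members: {unknown}")
    return query


def execute_query(query: dict, tenant_id: str) -> dict:
    """Stand-in for the Query Service (hypothetical). The tenant filter is
    added by the platform after validation, so the model cannot bypass it."""
    secured = dict(query)
    secured["filters"] = list(query.get("filters", [])) + [
        {"member": "Orders.tenantId", "operator": "equals", "values": [tenant_id]}
    ]
    return secured  # a real service would now run this against the warehouse


def handle_question(question: str, tenant_id: str) -> dict:
    raw = generate_query(question, SEMANTIC_MEMBERS)
    return execute_query(validate_query(raw), tenant_id)
```

In a real Cube deployment, row-level security would more typically be enforced through the security context and the `queryRewrite` option rather than by appending a filter in application code; the filter here just keeps the sketch self-contained.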

Data Context

Whilst building our AI implementation, we spent a decent chunk of time tuning our semantic layer to best suit the needs of our AI model. Our initial attempt to give the model access to a large semantic map worked, but we found that the model would often make leaps in how it performed an analytical query, mainly because there was almost too much metadata for it to choose from.

We also found that we needed a lengthy prompt to guide the model through the vast amount of metadata on offer. As detailed in our second blog post, the answer was to break our AI implementation down into many AI agents, each specialising in part of the role of an analyst. Those roles included:

To enable the AI agents to fulfil each role, our semantic layer needed to become more instructional, rather than factual as it had been in our previous implementation. Some examples:

Original metadata

Table Name: CustomerProfile

Table Description: "This table holds data about customers"

Revised metadata

Table Name: CustomerProfile

Table Description: "Use this table to answer questions about customers. Combine with the enrichment table for questions relating to affluence, locale and demographics"


Figure 3: Improved Metadata

Whilst only a subtle change, this rewording is more meaningful to our AI models and can be reused as instructions for our users when reviewing our data model.

Drilling down another level we have:


Figure 4: Improved Column Metadata

This level of data context ensures our AI is armed with enough knowledge of our data models to correctly convert a user's question into a valid database query.
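As an illustration of how instructional table and column metadata might be handed to a model, the sketch below flattens descriptions into plain text for a prompt. The `Table` and `Column` classes and the `FirstOrderDate` column are hypothetical; only the `CustomerProfile` description is taken from the example above. Because the output is ordinary text, the same descriptions can double as documentation for users browsing the data model.

```python
from dataclasses import dataclass, field


@dataclass
class Column:
    name: str
    type: str          # e.g. "number", "time", "string"
    description: str   # instructional: tells the model *when* to use the column


@dataclass
class Table:
    name: str
    description: str
    columns: list[Column] = field(default_factory=list)


def render_context(tables: list[Table]) -> str:
    """Flatten semantic-layer metadata into plain text for a model prompt."""
    lines = []
    for table in tables:
        lines.append(f"Table {table.name}: {table.description}")
        for col in table.columns:
            lines.append(f"  - {col.name} ({col.type}): {col.description}")
    return "\n".join(lines)


customer_profile = Table(
    name="CustomerProfile",
    description=(
        "Use this table to answer questions about customers. Combine with the "
        "enrichment table for questions relating to affluence, locale and demographics"
    ),
    columns=[
        # Hypothetical column, written in the same instructional style.
        Column("FirstOrderDate", "time",
               "Use to find when a customer first purchased; filter on it for new-customer questions"),
    ],
)

print(render_context([customer_profile]))
```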

Data Consistency

As with any data warehouse and analytical solution, having consistent data is essential, especially when operating with AI across data sourced from multiple systems, each of which carries nuances that could trip up an AI implementation.

A column in SystemA might be named the same as a column in SystemB, but its purpose and data content could be very different.

To address this issue and allow our AI model to operate confidently across multiple systems, our common data model ensures not only that data is mapped and catalogued into a single schema, but also that any data fields that can be standardised (enumerations or identifiers) are made consistent.

In the example below, we can see two operations that are essentially the same, but each system records and stores the event differently.

Mailchimp Event


Figure 5: Mailchimp Events

Klaviyo Event


Figure 6: Klaviyo Events

In our data model, these records become two event records that look like this:


Figure 7: Harmonised Events

These records allow an AI model to reason about them and establish that the intent of the two events is indeed the same, without needing multiple prompts or AI agents to handle mismatches or ambiguity between system records or enumerations.

Ensuring that the data schema is consistent across all entities, in both layout and content, means your AI model will need fewer custom instructions and should be more successful in performing the operations asked of it.
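A minimal sketch of this kind of harmonisation is below. The exact fields shown in Figures 5 to 7 are not reproduced here, so the raw payload shapes are assumed and heavily simplified, and the shared enumeration (`email_opened`, `email_clicked`) and common-schema field names are hypothetical. What matters is that both sources end up with the same `event_type`, a normalised customer key and a UTC timestamp, so the model sees one kind of event rather than two.

```python
from datetime import datetime, timezone

# Assumed, simplified raw shapes; real payloads carry many more fields.
mailchimp_event = {"email_address": "Jo@Example.com", "action": "open",
                   "timestamp": "2024-07-01T09:30:00+00:00"}
klaviyo_event = {"email": "jo@example.com", "metric": "Opened Email",
                 "datetime": "2024-07-01T10:31:00+01:00"}

# Hypothetical mapping from each system's vocabulary to one shared enumeration.
EVENT_TYPES = {
    ("mailchimp", "open"): "email_opened",
    ("mailchimp", "click"): "email_clicked",
    ("klaviyo", "Opened Email"): "email_opened",
    ("klaviyo", "Clicked Email"): "email_clicked",
}


def harmonise(source: str, raw: dict) -> dict:
    """Map a source-specific record onto one common event schema."""
    if source == "mailchimp":
        email, action, ts = raw["email_address"], raw["action"], raw["timestamp"]
    elif source == "klaviyo":
        email, action, ts = raw["email"], raw["metric"], raw["datetime"]
    else:
        raise ValueError(f"unsupported source: {source}")
    return {
        "source_system": source,
        "event_type": EVENT_TYPES[(source, action)],   # standardised enumeration
        "customer_email": email.strip().lower(),        # standardised identifier
        "event_time": datetime.fromisoformat(ts).astimezone(timezone.utc),
    }


rows = [harmonise("mailchimp", mailchimp_event), harmonise("klaviyo", klaviyo_event)]
```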

Data Structure

Closely related to consistency, ensuring your data has a well-thought-out structure and a logical path from one entity to the next will give any AI asked to understand the schema an easier time. A common challenge faced by organisations is the amount of business or system context that can be entrenched in a data model. This can make it almost impossible for an AI to make a reliable choice when asked a question of the data, even when questions are curated using aligned business context.

Key considerations here are:

Table names: Ensure they are meaningful and not littered with abbreviations, and that related tables are sensibly named.

Columns: Ensure the purpose of each column is clear. Is the column a number, an aggregation, a date or a text field?

For reference we follow a standard pattern across all of our entities which is:

This is where a semantic layer can help remove the pain of altering an existing data schema. Putting a façade between your database and an AI model allows you to redefine how a table or column is labelled and documented.
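To illustrate the façade idea, the sketch below maps hypothetical abbreviated physical column names to clearer semantic names and instructional descriptions, and translates names chosen by the model back to physical columns. The `sql` key loosely mirrors how a Cube dimension points at an underlying column, but the structure and every name here are assumptions for illustration; the warehouse itself is never altered.

```python
# Hypothetical physical schema with system-specific, abbreviated names.
PHYSICAL_COLUMNS = ["cust_id", "ord_dt", "ttl_amt_gbp"]

# Façade: semantic name and instructional description per physical column.
FACADE = {
    "cust_id": ("CustomerId", "Identifier; use to join orders to CustomerProfile"),
    "ord_dt": ("OrderDate", "Date the order was placed; use for time-based questions"),
    "ttl_amt_gbp": ("OrderTotalGBP", "Order value in GBP; sum it for revenue questions"),
}


def describe_for_model(columns: list[str]) -> list[dict]:
    """What the model sees: clear names and guidance, plus the column they map to."""
    described = []
    for physical in columns:
        semantic, description = FACADE[physical]
        described.append({"name": semantic, "description": description, "sql": physical})
    return described


def to_physical(semantic_name: str) -> str:
    """Translate a name chosen by the model back to the real column."""
    reverse = {sem: phys for phys, (sem, _) in FACADE.items()}
    return reverse[semantic_name]
```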

AI Outcomes

Our implementation now allows our users to do everything previously possible in our platform simply by asking a question of our AI model.

Under the hood, our model generates not only an answer to a user's question but also the underlying data query that we use across the platform to create charts, stats, segments and dashboards.

This means you can go from asking a question to having a visualisation in one click, for example:


Figure 8: AI integration

This also means you can drop an AI-generated chart or query into our existing tools to understand how the answer was generated. Users can then refine the output further and get a glimpse of how to build such queries themselves.
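The hand-off from chat to the existing tools can be sketched as follows, assuming the AI returns both a natural-language answer and a Cube-style query. `AIResponse` and `to_chart_spec` are hypothetical names. The design choice being illustrated is that the chart is built from the very query that produced the answer, so opening it in a query builder shows exactly how the answer was derived and lets the user refine it.

```python
from dataclasses import dataclass


@dataclass
class AIResponse:
    answer: str   # natural-language answer shown in the chat
    query: dict   # Cube-style query the rest of the platform can reuse


def to_chart_spec(resp: AIResponse, chart_type: str = "bar") -> dict:
    """Hypothetical hand-off: build a chart from the query behind the answer."""
    q = resp.query
    dims = q.get("dimensions") or [None]  # tolerate queries with no dimension
    return {
        "type": chart_type,
        "x": dims[0],
        "y": q.get("measures", []),
        "query": q,  # kept intact so the user can inspect and edit it
    }


resp = AIResponse(
    answer="Here are your top 10 products by revenue.",
    query={"measures": ["Orders.totalRevenue"], "dimensions": ["Products.name"], "limit": 10},
)
spec = to_chart_spec(resp)
refined = {**spec["query"], "limit": 5}  # e.g. a tweak made later in the query builder
```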

As mentioned previously, we want to empower our users to get the answers they need without needing to be data experts. We also want to help users understand their data and start on a journey to becoming experts in their own data. Our new AI implementation enables this, meaning we now have three levels of data access:

As our platform continues to evolve, we expect these lines to blur further, with AI becoming the primary access point for our users and switching to a query builder or SQL query being required less and less.

Summary

In the data space, being AI-ready isn't all about having the skills or compute power to leverage the latest AI toolsets; it's about having your data ready to be used by AI. There is little point in adopting AI without a consistent, reliable and metadata-rich data landscape.

We are proud of the effort we have put into our platform over the last two years to make our data warehouse as consistent and uniform as possible, meaning we really do have AI capabilities that work for all of our clients. That work tees us up nicely for the really interesting next steps in AI, which will help our customers benefit further from AI-driven marketing, forecasting and the next innovations in AI models.

Director of Product at The Data Refinery
