Retrieval for Structured Data

Motivation

Large language models (LLMs) are trained on vast but fixed datasets, which limits their ability to access up-to-date or domain-specific information.

To enhance their performance on specific tasks, we can augment their knowledge using retrieval systems.

Retrieval systems fetch relevant information from external sources, which can then be included in the prompt given to the model.

Key benefits of using retrieval systems include:

A large fraction of the world's operational data is structured, often organized into database tables with a specific schema.

Various domain-specific languages (DSLs) have been developed to query these systems, including SQL, Cypher, and PQL.

A popular approach to interacting with structured data is to use an LLM to convert natural language queries into a DSL for the relevant database.

In particular, text-to-SQL and text-to-Cypher are useful ways to interact with relational and graph databases, respectively.

Name	When to Use	Description
Text-to-SQL	If users are asking questions that require information housed in a relational database, accessible via SQL.	This uses an LLM to transform user input into a SQL query.
Text-to-Cypher	If users are asking questions that require information housed in a graph database, accessible via Cypher.	This uses an LLM to transform user input into a Cypher query.
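
To make the text-to-SQL flow concrete, here is a minimal sketch using Python's built-in sqlite3 module. It reads the schema from the database, places it in a prompt alongside the user's question, asks a model for a query, and runs the result. The helper names (get_schema, build_prompt, answer_question), the prompt wording, the employees table, and fake_llm are all assumptions made for illustration; fake_llm is a hypothetical stand-in that returns a canned query, so swap in a real model call in practice.

```python
import sqlite3
from typing import Callable, List, Tuple


def get_schema(conn: sqlite3.Connection) -> str:
    """Return the CREATE TABLE statements for every table in the database."""
    rows = conn.execute(
        "SELECT sql FROM sqlite_master WHERE type = 'table' AND sql IS NOT NULL"
    ).fetchall()
    return "\n".join(row[0] for row in rows)


def build_prompt(schema: str, question: str) -> str:
    """Combine the schema and the user's question into a single prompt."""
    return (
        "Given the following SQLite schema, write one SQL query that answers "
        "the question. Return only the SQL.\n\n"
        f"Schema:\n{schema}\n\nQuestion: {question}\nSQL:"
    )


def answer_question(
    conn: sqlite3.Connection, question: str, llm: Callable[[str], str]
) -> List[Tuple]:
    """Ask the model for a query, check that it is a SELECT, and execute it."""
    prompt = build_prompt(get_schema(conn), question)
    sql = llm(prompt).strip().rstrip(";")
    # Simple guardrail (an assumption of this sketch): refuse anything
    # that does not start with SELECT.
    if not sql.lower().startswith("select"):
        raise ValueError(f"Refusing to run non-SELECT statement: {sql!r}")
    return conn.execute(sql).fetchall()


def fake_llm(prompt: str) -> str:
    """Hypothetical stand-in for a real model call; returns a canned query."""
    return "SELECT name FROM employees WHERE salary > 100000 ORDER BY name;"


# Illustrative in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, salary INTEGER)"
)
conn.executemany(
    "INSERT INTO employees (name, salary) VALUES (?, ?)",
    [("Ada", 120000), ("Grace", 95000), ("Linus", 130000)],
)

rows = answer_question(conn, "Which employees earn more than 100000?", fake_llm)
print(rows)  # [('Ada',), ('Linus',)]
```

Including the schema in the prompt gives the model the table and column names it needs to write a valid query. The SELECT check is only a basic precaution; for real workloads, a read-only database connection is a stronger safeguard against unintended writes.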

See our tutorials on text-to-SQL and text-to-Cypher for more details.

Tip: See our blog post overview and RAG from Scratch video on query construction.