Introduction to NLP to SQL
With the availability of powerful large language ******, we now can convert natural language into accurate SQL (NLP to SQL) with a single callout, enabling users to express their information needs naturally and efficiently.
In this blog, we will explore the creation of a Data Questioning and Answering web application.
Technology Stack
- Python
- Azure OpenAI
- Sqlite3
Flow
ER Diagram
Prompt
While making a call to the Azure OpenAI model, we need to carefully craft our prompt. Precision is crucial, as an ambiguous prompt may result in unintended outcomes.
We have set up the following prompt, where we replace the user’s NLP query.
prompt = """ Given the following SQL tables, your job is to write queries given a user’s request. CREATE TABLE Branches ( BranchID INTEGER PRIMARY KEY, City TEXT, Country TEXT, EmployeeNumber INTEGER, GM TEXT ) CREATE TABLE IF NOT EXISTS Departments ( DepartmentID INTEGER PRIMARY KEY, Name TEXT ) CREATE TABLE IF NOT EXISTS Employees ( EmployeeID INTEGER PRIMARY KEY, Name TEXT, Position TEXT, Salary REAL, DepartmentID INTEGER, BranchID INTEGER, FOREIGN KEY (DepartmentID) REFERENCES Departments(DepartmentID), FOREIGN KEY (BranchID) REFERENCES Branches(BranchID) ) ''') Desired Format: Return valid sql query strictly for above tables otherwise return undefined Translate the following natural language query into SQL: {} SQL Query: """.format(user_nlp_query)
Azure OpenAI
NLP to SQL is one of the advantages of the Azure OpenAI model we can leverage for our Data Questioning and Answering application. In the code below, we have used the text-davinci-003 model, which converts human language to SQL
# Using Azure OpenAI API from openai import AzureOpenAI api_key = config["AZURE_OPENAI_KEY"] client = AzureOpenAI( azure_endpoint="https://vc-test-openai.openai.azure.com/", api_key = api_key, api_version="2023-09-15-preview", ) response = client.completions.create( model="vc-text-davinci-003", prompt=prompt, max_tokens=1024 )
HTML
NLP to SQL Examples
Below are some human language queries I tried to input through an HTML form. I validated their corresponding SQL queries and their actual responses against the database. There are situations where we receive an SQL query that results in an error when executed on the actual database. We handle this scenario by asking the user a different or more appropriate question
/* 1. List all employees with their corresponding department names. */ T Employees.Name, Departments.Name FROM Employees LEFT JOIN Departments ON Employees.DepartmentID = Departments.DepartmentID; /* 2. Find the total number of employees in each department. */ SELECT DepartmentID, COUNT(*) AS Total_Employees FROM Employees GROUP BY DepartmentID; /* 3. Identify employees who are managers along with their department names. */ SELECT e.Name, e.Position, d.Name FROM Employees e JOIN Departments d ON e.DepartmentID = d.DepartmentID WHERE e.Position = 'Manager' /* 4. List all employees in the IT department with their salaries. */ SELECT EmployeeID, Name, Position, Salary FROM Employees WHERE DepartmentID = (SELECT DepartmentID FROM Departments WHERE Name="IT") /* 5. Find the average salary for employees in each department. */ SELECT DepartmentID, AVG(Salary) FROM Employees GROUP BY DepartmentID; /* 6. Identify the department with the highest total salary expense. */ SELECT Departments.Name, SUM(Employees.Salary) AS TotalExpense FROM Employees INNER JOIN Departments On Employees.DepartmentID = Departments.DepartmentID GROUP BY Departments.Name ORDER BY TotalExpense DESC LIMIT 1; /* 7. List employees who do not belong to any department. */ SELECT * FROM Employees WHERE DepartmentID IS NULL; /* 8. Find the department with the least number of employees. */ SELECT Departments.Name FROM Employees LEFT JOIN Departments ON Departments.DepartmentID = Employees.DepartmentID GROUP BY Departments.Name ORDER BY COUNT(Employees.EmployeeID) ASC LIMIT 1; /* 9. List employees who belong to the Finance department and have a salary above 1000 */ SELECT * FROM Employees WHERE DepartmentID = (SELECT DepartmentID FROM Departments WHERE Name="Finance") AND Salary > 1000; /* 10. What is the salary of employee ID 101?*/ "SELECT Salary FROM Employees WHERE EmployeeID = 101;
Hallucination
Hallucination refer to instances where the model generates or predicts SQL queries that are not accurate or relevant to the user’s intent. This might result in incorrect or unintended database operations. I have tried all the human language queries above, but it didn’t generate invalid SQL
Challenges
Users can manipulate data either through a user form or via a prompt. We should handle all scenarios where the end user can only retrieve data from the application. In the above prompt section, I have included the desired format to constrain the model to generate a valid query; otherwise, it returns ‘undefined’ to handle invalid user queries. Users can ask anything which is not relevant to a particular database.
@app.route("/qa", methods=["POST"]) def prompt_to_qa(): nlp_query = request.form.get("query") sql_query = get_sql_query(nlp_query) print(sql_query) if(sql_query.strip().lower() == "undefined"): output = "Not a valid question" else: result = str(execute_sql_query(sql_query)) check_json = is_json(result) if(check_json): output = json.loads(result) else: output = result return { "nlp_query": nlp_query, "sql_query": sql_query, "result": output}
Conclusion
You can modify the technology stack mentioned above to streamline the development of applications, regardless of UI, LLM, or database. Additionally, you can leverage libraries such as pandas. By experimenting with different combinations of the technology stack, we can enhance the capabilities of the application.