NLP to SQL: Build Data Questioning and Answering Application


Introduction to NLP to SQL

With the availability of powerful large language ******, we now can convert natural language into accurate SQL (NLP to SQL) with a single callout, enabling users to express their information needs naturally and efficiently.

In this blog, we will explore the creation of a Data Questioning and Answering web application.

Technology Stack

  • Python
  • Azure OpenAI
  • Sqlite3

Flow

Nlp To Sql

ER Diagram

NLP to SQL - DB

Prompt

While making a call to the Azure OpenAI model, we need to carefully craft our prompt. Precision is crucial, as an ambiguous prompt may result in unintended outcomes.

We have set up the following prompt, where we replace the user’s NLP query.

prompt = """
    Given the following SQL tables, your job is to write queries given a user’s request.
    CREATE TABLE Branches (
        BranchID INTEGER PRIMARY KEY,
        City TEXT,
        Country TEXT,
        EmployeeNumber INTEGER,
        GM TEXT
    )
    CREATE TABLE IF NOT EXISTS Departments (
        DepartmentID INTEGER PRIMARY KEY,
        Name TEXT
    )
    CREATE TABLE IF NOT EXISTS Employees (
        EmployeeID INTEGER PRIMARY KEY,
        Name TEXT,
        Position TEXT,
        Salary REAL,
        DepartmentID INTEGER,
        BranchID INTEGER,
        FOREIGN KEY (DepartmentID) REFERENCES Departments(DepartmentID),
        FOREIGN KEY (BranchID) REFERENCES Branches(BranchID)
    )
    ''')
    Desired Format: Return valid sql query strictly for above tables otherwise return undefined
    Translate the following natural language query into SQL: {}
    SQL Query:
""".format(user_nlp_query)

Azure OpenAI

NLP to SQL is one of the advantages of the Azure OpenAI model we can leverage for our Data Questioning and Answering application. In the code below, we have used the text-davinci-003 model, which converts human language to SQL

# Using Azure OpenAI API
from openai import AzureOpenAI
api_key = config["AZURE_OPENAI_KEY"]
client = AzureOpenAI(
    azure_endpoint="https://vc-test-openai.openai.azure.com/",
    api_key = api_key,
    api_version="2023-09-15-preview",
)
response = client.completions.create(
    model="vc-text-davinci-003",
    prompt=prompt,
    max_tokens=1024
)

HTML

HTML for Nlp to Sql

NLP to SQL Examples

Below are some human language queries I tried to input through an HTML form. I validated their corresponding SQL queries and their actual responses against the database. There are situations where we receive an SQL query that results in an error when executed on the actual database. We handle this scenario by asking the user a different or more appropriate question

/* 1. List all employees with their corresponding department names. */
        T Employees.Name, Departments.Name 
        FROM Employees 
        LEFT JOIN Departments 
        ON Employees.DepartmentID = Departments.DepartmentID;

/* 2. Find the total number of employees in each department. */
        SELECT DepartmentID, COUNT(*) AS Total_Employees
        FROM Employees
        GROUP BY DepartmentID;
        
        
/* 3. Identify employees who are managers along with their department names. */
        SELECT e.Name, e.Position, d.Name
        FROM Employees e
        JOIN Departments d ON e.DepartmentID = d.DepartmentID
        WHERE e.Position = 'Manager'


/* 4. List all employees in the IT department with their salaries. */
        SELECT EmployeeID, Name, Position, Salary FROM Employees 
        WHERE DepartmentID = (SELECT DepartmentID FROM Departments WHERE Name="IT")


/* 5. Find the average salary for employees in each department. */
        SELECT DepartmentID, AVG(Salary) 
        FROM Employees 
        GROUP BY DepartmentID;


/* 6. Identify the department with the highest total salary expense. */
        SELECT Departments.Name, SUM(Employees.Salary) AS TotalExpense
        FROM Employees
        INNER JOIN Departments
        On Employees.DepartmentID = Departments.DepartmentID
        GROUP BY Departments.Name
        ORDER BY TotalExpense DESC
        LIMIT 1;


/* 7. List employees who do not belong to any department. */
        SELECT * FROM Employees WHERE DepartmentID IS NULL;


/* 8. Find the department with the least number of employees. */
        SELECT Departments.Name
        FROM Employees
        LEFT JOIN Departments
        ON Departments.DepartmentID = Employees.DepartmentID
        GROUP BY Departments.Name
        ORDER BY COUNT(Employees.EmployeeID) ASC
        LIMIT 1;


/* 9. List employees who belong to the Finance department and have a salary above 1000 */
        SELECT *
        FROM Employees 
        WHERE DepartmentID = (SELECT DepartmentID FROM Departments WHERE Name="Finance")
        AND Salary > 1000;

/* 10. What is the salary of employee ID 101?*/
        "SELECT Salary FROM Employees WHERE EmployeeID = 101;

Hallucination

Hallucination refer to instances where the model generates or predicts SQL queries that are not accurate or relevant to the user’s intent. This might result in incorrect or unintended database operations. I have tried all the human language queries above, but it didn’t generate invalid SQL

Challenges

Users can manipulate data either through a user form or via a prompt. We should handle all scenarios where the end user can only retrieve data from the application. In the above prompt section, I have included the desired format to constrain the model to generate a valid query; otherwise, it returns ‘undefined’ to handle invalid user queries. Users can ask anything which is not relevant to a particular database.

@app.route("/qa", methods=["POST"])
def prompt_to_qa():
    nlp_query = request.form.get("query")
    sql_query = get_sql_query(nlp_query)
    print(sql_query)
    if(sql_query.strip().lower() == "undefined"):
        output = "Not a valid question"
    else:
        result = str(execute_sql_query(sql_query))
        check_json = is_json(result)
        if(check_json):
            output = json.loads(result)
        else:
            output = result
    return { "nlp_query": nlp_query, "sql_query": sql_query, "result": output}

Conclusion

You can modify the technology stack mentioned above to streamline the development of applications, regardless of UI, LLM, or database. Additionally, you can leverage libraries such as pandas. By experimenting with different combinations of the technology stack, we can enhance the capabilities of the application.

Important Links





Source link

Social media & sharing icons powered by UltimatelySocial
error

Enjoy Our Website? Please share :) Thank you!