LLM Prompt Engineering Techniques for Knowledge Graph Integration
Updated: Jan 10, 2024
Since ChatGPT's rise in popularity, a large number of Large Language Model (LLM) variants have been deployed in many ways, changing how we interact with AI. With simple prompt engineering techniques, users can steer the model output toward their preferences, regardless of their programming experience. In effect, LLMs appear to have the potential to turn natural language into a new programming language. To better understand the capabilities of prompt engineering, we will walk through a use case in which we turn an LLM's responses to user questions into a knowledge graph, for example:
User Question:
"How to self study Generative Artificial Intelligence in 2024"
Output:
I hope this introductory guide can serve as a starting point for some of you to experiment with more elegant and comprehensive solutions.
What is Prompt Engineering?
Large Language Models (LLMs) are typically trained on trillions of words through self-supervised deep learning techniques. Training an LLM from scratch takes a significant amount of time and resources. Fortunately, there are several techniques to adapt the model to customized requirements. Here are three common techniques, listed from the most to the least computationally expensive:
Pre-Training: this refers to the early stage of Large Language Model (LLM) development, which is typically the most time-consuming process. During this stage, the model learns from a large corpus through self-supervised learning.
Fine-Tuning: this describes the process of training the LLM further on niche, specialized datasets so that it is better aligned with a certain objective. It differs from prompt engineering because fine-tuning updates the LLM's weights and parameters.
Prompt Engineering: the most cost-effective way to change the model output. It ranges from simple instruction-based prompting (the main focus of this article) to more advanced mechanisms such as RAG (Retrieval Augmented Generation), CoT (Chain of Thought), or ART (Automatic Reasoning and Tool Use).
Different models can respond very differently to the same prompt instructions. The following prompt techniques work well with an OpenAI model but may not apply to other models.
Prompt Engineering Techniques
We will examine how minor changes in the way we guide the model can significantly alter the outputs. Although prompts may be presented in vastly different formats, they generally follow the same structure of four components: instruction, context, input data, and output indicator.
We can consolidate all components into a single instruction message for the LLM to interpret. Alternatively, LangChain, an LLM integration framework, offers tools to separate each component into its own unit, providing more clarity for the model. We will use LangChain to construct LLM chains that employ an OpenAI model and then experiment with various instruction cues to alter the model response.
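To make the four components concrete, here is a minimal sketch of the single-message approach in plain Python, without LangChain. The function name build_prompt, the "Context:" and "Question:" labels, and the sample values are hypothetical, chosen only for illustration.

```python
def build_prompt(instruction: str, context: str, input_data: str, output_indicator: str) -> str:
    """Join the four prompt components into one message, skipping empty ones."""
    parts = [
        instruction,
        f"Context: {context}" if context else "",
        f"Question: {input_data}" if input_data else "",
        output_indicator,
    ]
    return "\n".join(part for part in parts if part)


example_prompt = build_prompt(
    instruction="Design a progressive syllabus based on the Question.",
    context="You are a Course Instructor.",
    input_data="How to self study Generative Artificial Intelligence in 2024",
    output_indicator="Provide answer in JSON format.",
)
print(example_prompt)
```

Keeping the components as separate arguments makes it easy to change one of them, such as the output indicator, while holding the others fixed.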
1. Baseline Prompt Template
LangChain provides PromptTemplate to create a simple prompt instruction, which takes two essential parameters: template and input_variables. It works similarly to string formatting in Python, where variables are enclosed in { }. In the example below, we define the template to provide a consistent instruction to feed into the model. Since the template is passed as a single string, we explicitly declare question in input_variables so that PromptTemplate treats {question} as an input variable supplied by the user.
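from langchain.prompts import PromptTemplate
# the template is a plain string; {question} marks the input variable
template = """Question: {question}"""
prompt = PromptTemplate(
    template=template,
    input_variables=["question"])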
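The comparison with Python string formatting can be checked without calling a model. The sketch below uses only the built-in str.format; rendered_prompt is an illustrative name and not part of LangChain.

```python
template = """Question: {question}"""

# str.format fills the {question} placeholder, analogous to how the
# prompt template substitutes its input variable before the model sees it.
rendered_prompt = template.format(
    question="How to self study Generative Artificial Intelligence in 2024"
)
print(rendered_prompt)
```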
To apply the prompt template, we feed it into an LLM chain as defined below.
import os
from langchain.chains import LLMChain
from langchain.llms import OpenAI
from langchain.prompts import PromptTemplate
# save the OpenAI token as environment variable
os.environ["OPENAI_API_KEY"] = "<your_openai_token>"
# create a model that allows output of maximum 1000 tokens
openai_model = OpenAI(max_tokens=1000)
chain = LLMChain(
    prompt=prompt,
    llm=openai_model)
Run the LLM chain on the user question. Of course, you can also explore other questions that interest you.
question = "How to self study Generative Artificial Intelligence in 2024"
chain_response = chain.run(question)
Model Response
1. Understand the basics: Before diving into any complex topic, it is important to have a strong foundation. Begin your study by understanding the basic concepts of artificial intelligence, such as machine learning, deep learning, and neural networks. You can find various online resources, such as YouTube tutorials, online courses, and blogs, to help you with this.
2. Learn programming languages: A good understanding of programming languages is essential for working with generative artificial intelligence. Some popular languages used in AI are Python, Java, and C++. Choose one or two languages to focus on and practice writing code to build your skills.
3. Study mathematical concepts: Generative AI involves a lot of mathematical concepts, such as linear algebra, calculus, and probability. These concepts are used to design and train AI models. It is important to have a basic understanding of these concepts, so consider taking a course or using online resources to learn them.
...
2. Think Step by Step
Adding a simple cue such as “think step by step” to the prompt can instruct the model to give a more sequential and logical response, which is better suited to generating a knowledge graph.
template2 = """
Question: {question}
Let's think step by step.
"""
We then build a new chain around template2, with the same structure as the chain above, and the model generates the following response.
prompt2 = PromptTemplate(template=template2, input_variables=["question"])
chain2 = LLMChain(prompt=prompt2, llm=openai_model)
chain2_response = chain2.run(question)
Model Response
Step 1: Set your goals
Start by setting clear and achievable goals for yourself. This will give you a sense of direction and motivation throughout your self-study journey. For example, your goal could be to understand the fundamentals of Generative Artificial Intelligence and be able to apply it in real-world scenarios by the end of 2024.
Step 2: Identify your learning resources
There are plenty of resources available for self-studying Generative Artificial Intelligence. Some of the best resources include online courses, books, research papers, and tutorials. Take some time to research and identify the resources that best suit your learning style and goals.
Step 3: Create a study schedule
Once you have your learning resources, create a study schedule that works for you. It is important to be consistent and dedicated in your studies to effectively learn and retain the material. Set aside a specific time each day or week to focus on studying Generative Artificial Intelligence.
...
3. Give Model a Persona
Specifying a persona or role is another useful way to provide the context and perspective from which the model should generate its answer.
template3 = """
Question: {question}
You are a Course Instructor who designs a progressive syllabus based on the Question. \\
Break it down into modules which contains a list the topics, concepts, or themes.
"""
Model Response
Module 1: Introduction to Generative Artificial Intelligence
- History and evolution of Generative Artificial Intelligence
- Key concepts and principles
- Applications and use cases
- Ethical considerations
Module 2: Machine Learning Fundamentals
- Basics of machine learning
- Types of machine learning algorithms
- Supervised, unsupervised, and reinforcement learning
- Deep learning and neural networks
Module 3: Generative Adversarial Networks (GANs)
- Understanding GANs and their components
- Training and optimization techniques
- Applications in image and text generation
- Challenges and limitations
...
You may want to modify the role or persona depending on the type of answer users are looking for.
4. Generate Formatted Output
Additionally, we can direct the model to generate output in a certain format, e.g. JSON, XML, or HTML. Prompting with “Provide answer in JSON format” is enough for the model to reorganize its response as JSON. This technique is particularly useful for parsing outputs programmatically in a more reliable and consistent way. These outputs can then be fed into subsequent steps in a data analytics pipeline.
template4 = """Question: {question}
Answer: Let's think step by step.
Provide Answer in JSON format.
"""
Model Response
{
"Module 1": "Introduction to Generative Artificial Intelligence",
"Topics": [
"Definition and Basic Concepts of AI",
"Overview of Generative AI",
"History and Evolution of Generative AI",
"Applications of Generative AI"
],
"Module 2": "Machine Learning Fundamentals",
"Topics": [
"Supervised, Unsupervised, and Reinforcement Learning",
"Neural Networks and Deep Learning",
"Data Preprocessing and Feature Engineering",
"Model Evaluation and Validation"
],
"Module 3": "Generative Models",
"Topics": [
"Probabilistic Models",
"Generative Adversarial Networks (GANs)",
"Variational Autoencoders (VAEs)",
"Auto-Regressive Models",
"Generative Flow Models"
],
...
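One caveat when parsing a response like the one above: it reuses the "Topics" key within a single object, and Python's json.loads keeps only the last value for a repeated key. The sketch below uses a shortened, hypothetical response string to show the effect, along with one way to preserve every pair through the object_pairs_hook argument.

```python
import json

# Shortened, hypothetical response that repeats the "Topics" key
raw = """
{
  "Module 1": "Introduction to Generative Artificial Intelligence",
  "Topics": ["Overview of Generative AI"],
  "Module 2": "Machine Learning Fundamentals",
  "Topics": ["Neural Networks and Deep Learning"]
}
"""

# Default behaviour: the later "Topics" value overwrites the earlier one
as_dict = json.loads(raw)

# object_pairs_hook receives every (key, value) pair in order,
# so repeated keys survive as a list of tuples
as_pairs = json.loads(raw, object_pairs_hook=list)

print(as_dict["Topics"])
print([key for key, _ in as_pairs])
```

This is one reason a flat list of objects with fixed keys is easier to process downstream than a single object with repeated keys.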
5. Provide Specific Instructions
Let's explore how we can add more details to the prompt to make the output better align with the expected format for the knowledge graph, shown below.
{
"input": ... ,
"output": ... ,
"module": ...
}
To achieve the desired format, we'll create an additional prompt specifically for formatting. The model response from section 3 is fed into this prompt through the input variable "text", and the model then generates a response that meets these specific requirements:
a list of JSON objects
each object has three keys: “input”, “output” and “module”
keep the wording consistent and concise
“module” is numeric
template5 = """
Text: {text}
Response:
Format Text above as a list of JSON objects with keys "input", "output" and "module".
"input" and "output" represent one and only one key concept respectively and "output" has dependency on "input".
Keep consistent and concise wordings (two to three phrases) for "input" and "output".
Do not include Module in the "input" or "output".
"module" is a numeric value indicates the Module in the syllabus.
"""
We create a new LLM chain that runs over model_response (the syllabus text returned in section 3) and treats {text} as the input variable.
format_prompt = PromptTemplate(template=template5, input_variables=["text"])
format_chain = LLMChain(prompt=format_prompt, llm=openai_model)
formatted_response = format_chain.run(model_response)
print(formatted_response)
Model Response
...
{
"input": "Machine Learning",
"output": "Basic concepts and principles of machine learning",
"module": 2
},
{
"input": "Machine Learning",
"output": "Types of machine learning (supervised, unsupervised, reinforcement)",
"module": 2
},
{
"input": "Machine Learning",
"output": "Algorithms commonly used in Generative AI (e.g. neural networks, deep learning)",
"module": 2
}
...
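Because the target format is strict, it can be checked programmatically before building the graph. The following is a minimal sketch, assuming the response has already been parsed into a Python list; is_valid_edge and REQUIRED_KEYS are hypothetical names introduced for illustration.

```python
from numbers import Number

REQUIRED_KEYS = {"input", "output", "module"}


def is_valid_edge(obj) -> bool:
    """Check one parsed object against the requirements listed above."""
    if not isinstance(obj, dict) or set(obj) != REQUIRED_KEYS:
        return False
    # "input" and "output" must be non-empty strings
    if not all(isinstance(obj[k], str) and obj[k].strip() for k in ("input", "output")):
        return False
    # "module" must be numeric (bool is excluded because it subclasses int)
    return isinstance(obj["module"], Number) and not isinstance(obj["module"], bool)


sample = [
    {"input": "Machine Learning", "output": "Types of machine learning", "module": 2},
    {"input": "Machine Learning", "output": "Types of machine learning", "module": "2"},
]
valid_flags = [is_valid_edge(item) for item in sample]
print(valid_flags)
```

Objects that fail the check can be dropped or sent back to the model for reformatting, depending on how strict the pipeline needs to be.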
6. One-Shot/Few-Shot Prompting
You might observe that the model hasn't accurately interpreted the requirements. For instance, even when explicitly told to “Extract the concepts within the bracket and after ```e.g``` as separate objects.”, it still gives us "output": "Algorithms commonly used in Generative AI (e.g. neural networks, deep learning)".
When facing abstract instructions like this, even humans have trouble understanding the exact meaning. We learn best through examples. Similarly, providing example output (i.e. a one-shot or few-shot example) is a powerful technique that lets the model adapt rapidly and immediately steers the response toward the desired format.
template6 = """
Text: {text}
Response:
Format Text above as a list of JSON objects with keys "input", "output" and "module".
"input" and "output" represent one and only one key concept respectively and "output" has dependency on "input".
Keep consistent and concise wordings (two to three phrases) for "input" and "output".
Do not include Module in the "input" or "output".
"module" is a numeric value indicates the Module in the syllabus.
Extract all concepts within the bracket and after ```e.g.``` as separate "output" objects.
An example:
```
Text: "Module 6: \
- A (e.g. B, C)"
Response:
{{
"input": "A",
"output": "B",
"module": 6
}},
{{
"input": "A",
"output": "C",
"module": 6
}}
```
"""
Model Response
...
{
"input": "Basic concepts and principles of machine learning",
"output": "Machine Learning",
"module": 2
},
{
"input": "Types of machine learning",
"output": "Supervised Learning",
"module": 2
},
{
"input": "Types of machine learning",
"output": "Unsupervised Learning",
"module": 2
},
{
"input": "Types of machine learning",
"output": "Reinforcement Learning",
"module": 2
},
{
"input": "Algorithms commonly used in Generative AI",
"output": "Neural Networks",
"module": 2
}
...
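A note on the doubled braces in the example inside template6: the template is filled in with Python-style formatting, where single braces mark input variables, so literal JSON braces have to be escaped as {{ and }}. The sketch below illustrates this with plain str.format rather than LangChain, using a shortened template; example_template and rendered are illustrative names.

```python
# Shortened version of the few-shot template, for illustration only
example_template = """Text: {text}
Response:
{{
"input": "A",
"output": "B",
"module": 6
}}"""

# {text} is substituted, while {{ and }} collapse to literal { and }
rendered = example_template.format(text="Module 6: - A (e.g. B)")
print(rendered)
```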
Generate Knowledge Graph
We will use a network graph to build the knowledge graph from the model output. The Python library pyvis makes it easy to create interactive nodes and edges. This is done by calling the add_edge function with the “input” and “output” nodes to create a directed edge between them. The node size is further customized by the value of “module”.
Occasionally, the response may be incomplete because of the token limit, so we truncate the response to the last complete object. We then parse the JSON string into a list of dictionaries using json.loads for further processing.
import json
json_response = formatted_response[:formatted_response.rfind("}")+1] + "]"
json_list = json.loads(json_response)
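The truncation step can also be wrapped in a small helper. This is a sketch under two assumptions: the response contains a JSON list of flat objects (no nested braces), and parse_truncated_list is a hypothetical name rather than part of any library.

```python
import json


def parse_truncated_list(response: str) -> list:
    """Parse a JSON list that may have been cut off mid-object."""
    start = response.find("[")
    end = response.rfind("}")
    if start == -1 or end == -1 or end < start:
        return []  # no complete object to recover
    # Keep everything up to the last closing brace, then close the list
    return json.loads(response[start:end + 1] + "]")


truncated = '[{"input": "A", "output": "B", "module": 6}, {"input": "A", "outp'
recovered = parse_truncated_list(truncated)
print(recovered)
```

Starting the slice at the first "[" also skips any text the model may place before the list.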
The following code snippet handles the remaining steps to create the knowledge graph.
from pyvis.network import Network
from IPython.display import HTML
net = Network(height="750px", width="100%", bgcolor="#222222",
              font_color="white", notebook=True,
              cdn_resources='in_line', directed=True)
for item in json_list:
    source = item['input']
    target = item['output']
    module = item['module']  # numeric module index, used as the node size
    # add nodes
    net.add_node(source, title=source)
    net.add_node(target, title=target, value=module)
    # add a directed edge from "input" to "output"
    net.add_edge(source, target)
# save and open the graph
net.save_graph("pyvis.html")
HTML(filename="pyvis.html")
As a result, we will see a knowledge graph like this.
Take-Home Message
This article explains various techniques for guiding a large language model (LLM) to generate desired outputs. It briefly explains what prompt engineering is and explores these instruction-based techniques:
Baseline Prompt Template
Think Step by Step
Give Model a Persona
Generate Formatted Output
Provide Specific Instructions
One-Shot/Few-Shot Prompting
We then generate a knowledge graph from the instructed LLM output.