The Importance of Ontology to an IR System

2026.03 by Alfred Lu

This notebook is dual-licensed:

You are free to share and adapt this material for any purpose, even commercially, provided you give appropriate credit to me.

In [1]:
%load_ext autoreload
%autoreload 2
In [2]:
import logging
import warnings
from transformers import logging as transformers_logging

logging.getLogger("transformers").setLevel(logging.ERROR)
transformers_logging.set_verbosity_error()
warnings.filterwarnings('ignore')
logging.getLogger("BAAI").setLevel(logging.ERROR)
In [3]:
import KGRAG
import tqdm as notebook_tqdm
from sentence_transformers import SentenceTransformer
from rdflib import Graph, Namespace, RDF, RDFS, OWL, BNode, URIRef, Literal
import os

from pizzar_utils import *
from collections import defaultdict
In [4]:
g = Graph()
g.parse("pizza.rdf", format="xml")

PIZZA = Namespace("http://www.co-ode.org/ontologies/pizza/pizza.owl#")

concrete_triples = get_all_concrete_triples(g)

adj = defaultdict(list)
all_predicates = set()
all_node = set()
p_needs = ['hasSpiciness', 'hasTopping', 'belongs to', 'hasBase']
all_named_pizza = set()

# filter out BNode
final_triples = []
for s, p, o in concrete_triples:
    # s and o must be URIRefs (or values converted from Literals), not BNodes
    if isinstance(s, BNode) or isinstance(o, BNode):
        continue
    s = get_uri_name(s)
    p = get_uri_name(p)
    o = get_uri_name(o)
    if (o == 'NamedPizza') and (p == 'belongs to'):
        all_named_pizza.add(s)
    if p in p_needs:
        adj[s].append((o, p))

        all_node.add(s)
        all_node.add(o)
        all_predicates.add(p)
        
        final_triples.append((s, p, o))

print(f"Has total {len(final_triples)} triples(no Blank Node)")
print(f"Has total {len(all_node)} nodes, {len(all_predicates)} rels")
all_rels = str(", ".join(list(all_predicates)))
print(f"All rels: {all_rels}")
print(f"\nAll triples: ")
for i, (s, p, o) in enumerate(final_triples):
    print(f"{i+1}. ({s}, {p}, {o})")

# distribution of rel
from collections import Counter
pred_counts = Counter([get_uri_name(p) for _, p, _ in final_triples])
print(f"\nTop 3 rels: ")
for pred, count in pred_counts.most_common(3):
    print(f"  {pred}: {count}")
Has total 249 triples(no Blank Node)
Has total 99 nodes, 4 rels
All rels: hasSpiciness, belongs to, hasBase, hasTopping

All triples: 
1. (American, belongs to, NamedPizza)
2. (American, hasTopping, MozzarellaTopping)
3. (American, hasTopping, PeperoniSausageTopping)
4. (American, hasTopping, TomatoTopping)
5. (AmericanHot, belongs to, NamedPizza)
6. (AmericanHot, hasTopping, HotGreenPepperTopping)
7. (AmericanHot, hasTopping, JalapenoPepperTopping)
8. (AmericanHot, hasTopping, MozzarellaTopping)
9. (AmericanHot, hasTopping, PeperoniSausageTopping)
10. (AmericanHot, hasTopping, TomatoTopping)
11. (AnchoviesTopping, belongs to, FishTopping)
12. (ArtichokeTopping, belongs to, VegetableTopping)
13. (ArtichokeTopping, hasSpiciness, Mild)
14. (AsparagusTopping, belongs to, VegetableTopping)
15. (AsparagusTopping, hasSpiciness, Mild)
16. (Cajun, belongs to, NamedPizza)
17. (Cajun, hasTopping, MozzarellaTopping)
18. (Cajun, hasTopping, OnionTopping)
19. (Cajun, hasTopping, PeperonataTopping)
20. (Cajun, hasTopping, PrawnsTopping)
21. (Cajun, hasTopping, TobascoPepperSauce)
22. (Cajun, hasTopping, TomatoTopping)
23. (CajunSpiceTopping, belongs to, HerbSpiceTopping)
24. (CajunSpiceTopping, hasSpiciness, Hot)
25. (CaperTopping, belongs to, VegetableTopping)
26. (CaperTopping, hasSpiciness, Mild)
27. (Capricciosa, belongs to, NamedPizza)
28. (Capricciosa, hasTopping, AnchoviesTopping)
29. (Capricciosa, hasTopping, CaperTopping)
30. (Capricciosa, hasTopping, HamTopping)
31. (Capricciosa, hasTopping, MozzarellaTopping)
32. (Capricciosa, hasTopping, OliveTopping)
33. (Capricciosa, hasTopping, PeperonataTopping)
34. (Capricciosa, hasTopping, TomatoTopping)
35. (Caprina, belongs to, NamedPizza)
36. (Caprina, hasTopping, GoatsCheeseTopping)
37. (Caprina, hasTopping, MozzarellaTopping)
38. (Caprina, hasTopping, SundriedTomatoTopping)
39. (Caprina, hasTopping, TomatoTopping)
40. (CheeseTopping, belongs to, PizzaTopping)
41. (CheeseyPizza, belongs to, Pizza)
42. (CheeseyPizza, hasTopping, CheeseTopping)
43. (CheeseyVegetableTopping, belongs to, CheeseTopping)
44. (CheeseyVegetableTopping, belongs to, VegetableTopping)
45. (ChickenTopping, belongs to, MeatTopping)
46. (ChickenTopping, hasSpiciness, Mild)
47. (Country, belongs to, DomainConcept)
48. (DeepPanBase, belongs to, PizzaBase)
49. (Fiorentina, belongs to, NamedPizza)
50. (Fiorentina, hasTopping, GarlicTopping)
51. (Fiorentina, hasTopping, MozzarellaTopping)
52. (Fiorentina, hasTopping, OliveTopping)
53. (Fiorentina, hasTopping, ParmesanTopping)
54. (Fiorentina, hasTopping, SpinachTopping)
55. (Fiorentina, hasTopping, TomatoTopping)
56. (FishTopping, belongs to, PizzaTopping)
57. (FishTopping, hasSpiciness, Mild)
58. (Food, belongs to, DomainConcept)
59. (FourCheesesTopping, belongs to, CheeseTopping)
60. (FourCheesesTopping, hasSpiciness, Mild)
61. (FourSeasons, belongs to, NamedPizza)
62. (FourSeasons, hasTopping, AnchoviesTopping)
63. (FourSeasons, hasTopping, CaperTopping)
64. (FourSeasons, hasTopping, MozzarellaTopping)
65. (FourSeasons, hasTopping, MushroomTopping)
66. (FourSeasons, hasTopping, OliveTopping)
67. (FourSeasons, hasTopping, PeperoniSausageTopping)
68. (FourSeasons, hasTopping, TomatoTopping)
69. (FruitTopping, belongs to, PizzaTopping)
70. (FruttiDiMare, belongs to, NamedPizza)
71. (FruttiDiMare, hasTopping, GarlicTopping)
72. (FruttiDiMare, hasTopping, MixedSeafoodTopping)
73. (FruttiDiMare, hasTopping, TomatoTopping)
74. (GarlicTopping, belongs to, VegetableTopping)
75. (GarlicTopping, hasSpiciness, Medium)
76. (Giardiniera, belongs to, NamedPizza)
77. (Giardiniera, hasTopping, LeekTopping)
78. (Giardiniera, hasTopping, MozzarellaTopping)
79. (Giardiniera, hasTopping, MushroomTopping)
80. (Giardiniera, hasTopping, OliveTopping)
81. (Giardiniera, hasTopping, PeperonataTopping)
82. (Giardiniera, hasTopping, PetitPoisTopping)
83. (Giardiniera, hasTopping, SlicedTomatoTopping)
84. (Giardiniera, hasTopping, TomatoTopping)
85. (GoatsCheeseTopping, belongs to, CheeseTopping)
86. (GoatsCheeseTopping, hasSpiciness, Mild)
87. (GorgonzolaTopping, belongs to, CheeseTopping)
88. (GorgonzolaTopping, hasSpiciness, Mild)
89. (GreenPepperTopping, belongs to, PepperTopping)
90. (HamTopping, belongs to, MeatTopping)
91. (HerbSpiceTopping, belongs to, PizzaTopping)
92. (Hot, belongs to, Spiciness)
93. (HotGreenPepperTopping, belongs to, GreenPepperTopping)
94. (HotGreenPepperTopping, hasSpiciness, Hot)
95. (HotSpicedBeefTopping, belongs to, MeatTopping)
96. (HotSpicedBeefTopping, hasSpiciness, Hot)
97. (IceCream, belongs to, Food)
98. (IceCream, hasTopping, FruitTopping)
99. (InterestingPizza, belongs to, Pizza)
100. (JalapenoPepperTopping, belongs to, PepperTopping)
101. (JalapenoPepperTopping, hasSpiciness, Hot)
102. (LaReine, belongs to, NamedPizza)
103. (LaReine, hasTopping, HamTopping)
104. (LaReine, hasTopping, MozzarellaTopping)
105. (LaReine, hasTopping, MushroomTopping)
106. (LaReine, hasTopping, OliveTopping)
107. (LaReine, hasTopping, TomatoTopping)
108. (LeekTopping, belongs to, VegetableTopping)
109. (LeekTopping, hasSpiciness, Mild)
110. (Margherita, belongs to, NamedPizza)
111. (Margherita, hasTopping, MozzarellaTopping)
112. (Margherita, hasTopping, TomatoTopping)
113. (MeatTopping, belongs to, PizzaTopping)
114. (MeatyPizza, belongs to, Pizza)
115. (MeatyPizza, hasTopping, MeatTopping)
116. (Medium, belongs to, Spiciness)
117. (Mild, belongs to, Spiciness)
118. (MixedSeafoodTopping, belongs to, FishTopping)
119. (MozzarellaTopping, belongs to, CheeseTopping)
120. (MozzarellaTopping, hasSpiciness, Mild)
121. (Mushroom, belongs to, NamedPizza)
122. (Mushroom, hasTopping, MozzarellaTopping)
123. (Mushroom, hasTopping, MushroomTopping)
124. (Mushroom, hasTopping, TomatoTopping)
125. (MushroomTopping, belongs to, VegetableTopping)
126. (MushroomTopping, hasSpiciness, Mild)
127. (NamedPizza, belongs to, Pizza)
128. (Napoletana, belongs to, NamedPizza)
129. (Napoletana, hasTopping, AnchoviesTopping)
130. (Napoletana, hasTopping, CaperTopping)
131. (Napoletana, hasTopping, MozzarellaTopping)
132. (Napoletana, hasTopping, OliveTopping)
133. (Napoletana, hasTopping, TomatoTopping)
134. (NonVegetarianPizza, belongs to, Pizza)
135. (NutTopping, belongs to, PizzaTopping)
136. (NutTopping, hasSpiciness, Mild)
137. (OliveTopping, belongs to, VegetableTopping)
138. (OliveTopping, hasSpiciness, Mild)
139. (OnionTopping, belongs to, VegetableTopping)
140. (OnionTopping, hasSpiciness, Medium)
141. (ParmaHamTopping, belongs to, HamTopping)
142. (ParmaHamTopping, hasSpiciness, Mild)
143. (Parmense, belongs to, NamedPizza)
144. (Parmense, hasTopping, AsparagusTopping)
145. (Parmense, hasTopping, HamTopping)
146. (Parmense, hasTopping, MozzarellaTopping)
147. (Parmense, hasTopping, ParmesanTopping)
148. (Parmense, hasTopping, TomatoTopping)
149. (ParmesanTopping, belongs to, CheeseTopping)
150. (ParmesanTopping, hasSpiciness, Mild)
151. (PeperonataTopping, belongs to, PepperTopping)
152. (PeperonataTopping, hasSpiciness, Medium)
153. (PeperoniSausageTopping, belongs to, MeatTopping)
154. (PeperoniSausageTopping, hasSpiciness, Medium)
155. (PepperTopping, belongs to, VegetableTopping)
156. (PetitPoisTopping, belongs to, VegetableTopping)
157. (PetitPoisTopping, hasSpiciness, Mild)
158. (PineKernels, belongs to, NutTopping)
159. (Pizza, belongs to, Food)
160. (Pizza, hasBase, PizzaBase)
161. (PizzaBase, belongs to, Food)
162. (PizzaTopping, belongs to, Food)
163. (PolloAdAstra, belongs to, NamedPizza)
164. (PolloAdAstra, hasTopping, CajunSpiceTopping)
165. (PolloAdAstra, hasTopping, ChickenTopping)
166. (PolloAdAstra, hasTopping, GarlicTopping)
167. (PolloAdAstra, hasTopping, MozzarellaTopping)
168. (PolloAdAstra, hasTopping, RedOnionTopping)
169. (PolloAdAstra, hasTopping, SweetPepperTopping)
170. (PolloAdAstra, hasTopping, TomatoTopping)
171. (PrawnsTopping, belongs to, FishTopping)
172. (PrinceCarlo, belongs to, NamedPizza)
173. (PrinceCarlo, hasTopping, LeekTopping)
174. (PrinceCarlo, hasTopping, MozzarellaTopping)
175. (PrinceCarlo, hasTopping, ParmesanTopping)
176. (PrinceCarlo, hasTopping, RosemaryTopping)
177. (PrinceCarlo, hasTopping, TomatoTopping)
178. (QuattroFormaggi, belongs to, NamedPizza)
179. (QuattroFormaggi, hasTopping, FourCheesesTopping)
180. (QuattroFormaggi, hasTopping, TomatoTopping)
181. (RealItalianPizza, belongs to, Pizza)
182. (RedOnionTopping, belongs to, OnionTopping)
183. (RocketTopping, belongs to, VegetableTopping)
184. (RocketTopping, hasSpiciness, Medium)
185. (Rosa, belongs to, NamedPizza)
186. (Rosa, hasTopping, GorgonzolaTopping)
187. (Rosa, hasTopping, MozzarellaTopping)
188. (Rosa, hasTopping, TomatoTopping)
189. (RosemaryTopping, belongs to, HerbSpiceTopping)
190. (RosemaryTopping, hasSpiciness, Mild)
191. (SauceTopping, belongs to, PizzaTopping)
192. (Siciliana, belongs to, NamedPizza)
193. (Siciliana, hasTopping, AnchoviesTopping)
194. (Siciliana, hasTopping, ArtichokeTopping)
195. (Siciliana, hasTopping, GarlicTopping)
196. (Siciliana, hasTopping, HamTopping)
197. (Siciliana, hasTopping, MozzarellaTopping)
198. (Siciliana, hasTopping, OliveTopping)
199. (Siciliana, hasTopping, TomatoTopping)
200. (SlicedTomatoTopping, belongs to, TomatoTopping)
201. (SlicedTomatoTopping, hasSpiciness, Mild)
202. (SloppyGiuseppe, belongs to, NamedPizza)
203. (SloppyGiuseppe, hasTopping, GreenPepperTopping)
204. (SloppyGiuseppe, hasTopping, HotSpicedBeefTopping)
205. (SloppyGiuseppe, hasTopping, MozzarellaTopping)
206. (SloppyGiuseppe, hasTopping, OnionTopping)
207. (SloppyGiuseppe, hasTopping, TomatoTopping)
208. (Soho, belongs to, NamedPizza)
209. (Soho, hasTopping, GarlicTopping)
210. (Soho, hasTopping, MozzarellaTopping)
211. (Soho, hasTopping, OliveTopping)
212. (Soho, hasTopping, ParmesanTopping)
213. (Soho, hasTopping, RocketTopping)
214. (Soho, hasTopping, TomatoTopping)
215. (Spiciness, belongs to, ValuePartition)
216. (SpicyPizza, belongs to, Pizza)
217. (SpicyPizza, hasTopping, SpicyTopping)
218. (SpicyPizzaEquivalent, belongs to, Pizza)
219. (SpicyTopping, belongs to, PizzaTopping)
220. (SpicyTopping, hasSpiciness, Hot)
221. (SpinachTopping, belongs to, VegetableTopping)
222. (SpinachTopping, hasSpiciness, Mild)
223. (SultanaTopping, belongs to, FruitTopping)
224. (SultanaTopping, hasSpiciness, Medium)
225. (SundriedTomatoTopping, belongs to, TomatoTopping)
226. (SundriedTomatoTopping, hasSpiciness, Mild)
227. (SweetPepperTopping, belongs to, PepperTopping)
228. (SweetPepperTopping, hasSpiciness, Mild)
229. (ThinAndCrispyBase, belongs to, PizzaBase)
230. (ThinAndCrispyPizza, belongs to, Pizza)
231. (TobascoPepperSauce, belongs to, SauceTopping)
232. (TobascoPepperSauce, hasSpiciness, Hot)
233. (TomatoTopping, belongs to, VegetableTopping)
234. (TomatoTopping, hasSpiciness, Mild)
235. (UnclosedPizza, belongs to, Pizza)
236. (UnclosedPizza, hasTopping, MozzarellaTopping)
237. (VegetableTopping, belongs to, PizzaTopping)
238. (VegetarianPizza, belongs to, Pizza)
239. (VegetarianPizzaEquivalent1, belongs to, Pizza)
240. (VegetarianPizzaEquivalent2, belongs to, Pizza)
241. (VegetarianTopping, belongs to, PizzaTopping)
242. (Veneziana, belongs to, NamedPizza)
243. (Veneziana, hasTopping, CaperTopping)
244. (Veneziana, hasTopping, MozzarellaTopping)
245. (Veneziana, hasTopping, OliveTopping)
246. (Veneziana, hasTopping, OnionTopping)
247. (Veneziana, hasTopping, PineKernels)
248. (Veneziana, hasTopping, SultanaTopping)
249. (Veneziana, hasTopping, TomatoTopping)

Top 3 rels: 
  hasTopping: 116
  belongs to: 98
  hasSpiciness: 34
In [5]:
from openai import OpenAI
from dotenv import load_dotenv, find_dotenv

load_dotenv(find_dotenv())

def call_llm(prompt: str) -> str:
    """Call the LLM with the given prompt and return its text reply (default sampling settings)."""
    client = OpenAI(
        api_key=os.getenv("DASHSCOPE_API_KEY"),
        base_url="https://dashscope.aliyuncs.com/compatible-mode/v1"
    )

    completion = client.chat.completions.create(
        messages=[
            {"role": "system", "content":"You are a pizza expert."},
            {'role': 'user','content': prompt},
        ],
        model="qwen-max"
    )
    response = completion.choices[0].message.content
    return response

Experiment 1. Use LLM to answer directly

In [6]:
q_text = f"""
Base on your knowledge, recommend ALL kind of pizza from following pizza list, 
which is medium spiciness and has vegetable on it.
Only output the name of pizza, and a short description of this pizza, followed by a 
brief reason you select it.
"""
prompt = f"""
{q_text}
         
## pizza list
{', '.join(list(all_named_pizza))}
"""

print(prompt)
print('-'*10)
print(call_llm(prompt))

Base on your knowledge, recommend ALL kind of pizza from following pizza list, 
which is medium spiciness and has vegetable on it.
Only output the name of pizza, and a short description of this pizza, followed by a 
brief reason you select it.


## pizza list
FourSeasons, Mushroom, LaReine, Cajun, PrinceCarlo, Caprina, QuattroFormaggi, Napoletana, AmericanHot, Margherita, Rosa, PolloAdAstra, FruttiDiMare, Veneziana, Parmense, Siciliana, SloppyGiuseppe, Giardiniera, Soho, American, Capricciosa, Fiorentina

----------
**Giardiniera**
- **Description:** A vegetable lover's dream, this pizza is topped with a variety of fresh vegetables such as bell peppers, zucchini, eggplant, and onions, often with a touch of garlic and herbs.
- **Reason for Selection:** The Giardiniera is a perfect choice for medium spiciness and a vegetable-rich topping. It offers a balanced, flavorful experience with a mix of garden-fresh veggies that can be slightly spiced to your liking.

Observation from experiment 1

❌ FruttiDiMare is missing from the answer. Let's find out why.

In [7]:
missing_pizza = 'FruttiDiMare'
prompt = f"""
I am looing for a pizza which is medium spiciness and has vegetable on it.
Base on your knowledge, explain why you not recommend {missing_pizza}.
Only output a short description of {missing_pizza}, followed by a 
brief reason you not recommend this one.
"""

print(call_llm(prompt))
FruttiDiMare is a pizza typically topped with a variety of seafood such as shrimp, mussels, and calamari, often with a tomato-based sauce and sometimes a sprinkle of herbs. I do not recommend this one because it does not include vegetables and is not designed to be spicy, which does not meet your preferences for a medium spiciness and vegetable toppings.

Takeaway from experiment 1

💡 The LLM does not seem to know that, in this ontology, FruttiDiMare has GarlicTopping and TomatoTopping. We probably need to introduce RAG.

One approach to building a RAG system is to express every triple as a subject-predicate-object sentence and treat each sentence as a chunk. However, this method loses the links between triples: a chunk such as "FruttiDiMare hasTopping GarlicTopping" says nothing about the spiciness or category of GarlicTopping.

An alternative, more intuitive approach is to gather the subject-predicate-object sentences that describe the same entity and combine them into a single chunk. This essentially samples a subgraph centered on a specific entity, uses that subgraph as the value, and uses the embedding vector of the natural-language text generated from the subgraph as the key of the information retrieval system.

We use the second method in the following demonstration.
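The second method can be sketched on a handful of triples. This is a minimal illustration, not the notebook's own chunking code: `build_chunks` and its `centers` argument are hypothetical names, and the expansion is limited to one hop.

```python
from collections import defaultdict

# Toy triples in the same (subject, predicate, object) form as final_triples.
triples = [
    ("FruttiDiMare", "belongs to", "NamedPizza"),
    ("FruttiDiMare", "hasTopping", "GarlicTopping"),
    ("GarlicTopping", "belongs to", "VegetableTopping"),
    ("GarlicTopping", "hasSpiciness", "Medium"),
]


def build_chunks(triples, centers):
    """Return one text chunk per triple of each center entity, expanded one hop."""
    by_subject = defaultdict(list)
    for s, p, o in triples:
        by_subject[s].append((p, o))

    chunks = []
    for center in centers:
        for p, o in by_subject.get(center, []):
            parts = [f"{center} {p} {o}"]
            # Append whatever the graph states about the neighbouring node.
            parts += [f"{o} {p2} {o2}" for p2, o2 in by_subject.get(o, [])]
            chunks.append(", ".join(parts))
    return chunks


chunks = build_chunks(triples, ["FruttiDiMare"])
for chunk in chunks:
    print(chunk)
```

Each chunk keeps the pizza, its topping, and the topping's attributes together, so a single retrieved chunk can in principle satisfy more than one search condition.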

Experiment 2. Introduce RAG

In [8]:
bge_model = '/Users/weijialu/Documents/BAAI_bge-large-zh-v1.5/'
In [9]:
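# Build one text blob per non-topping subject: each of its triples, followed by
# the triples of the topping it points to (a one-hop expansion).
# Only objects whose name contains 'Topping' are expanded, so e.g.
# TobascoPepperSauce is left without its attributes.
doc_dict={}
for a_t in final_triples:
    if (a_t[0] not in doc_dict) and ('Topping' not in a_t[0]):
        # first triple seen for this subject: start a new blob
        doc_dict[a_t[0]] = a_t[0] + ' ' + a_t[1] + ' ' + a_t[2]
        if (a_t[1] == 'hasTopping') and ('Topping' in a_t[2]):
            for b_t in final_triples:
                if (b_t[0] == a_t[2]):
                    doc_dict[a_t[0]] += ', ' + b_t[0] + ' ' + b_t[1] + ' ' + b_t[2]
    else:
        if ('Topping' not in a_t[0]):
            # later triples of the same subject are appended to its blob
            doc_dict[a_t[0]] += ', ' + a_t[0] + ' ' + a_t[1] + ' ' + a_t[2]
            if (a_t[1] == 'hasTopping') and ('Topping' in a_t[2]):
                for b_t in final_triples:
                    if (b_t[0] == a_t[2]):
                        doc_dict[a_t[0]] += ', ' + b_t[0] + ' ' + b_t[1] + ' ' + b_t[2]
# Split each blob back into one document per triple of the subject.
# Caveat: str.split(k) also cuts inside longer names that contain k
# (e.g. 'Mushroom' inside 'MushroomTopping'), which fragments some chunks.
docs = []
for k in doc_dict.keys():
    docs.extend([(k + x).strip(', ') for x in doc_dict[k].split(k) if len(x.strip()) > 0])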

print(f'We have {len(docs)} documents...')
print('\n\n'.join(docs[:20]))
We have 170 documents...
American belongs to NamedPizza

American hasTopping MozzarellaTopping, MozzarellaTopping belongs to CheeseTopping, MozzarellaTopping hasSpiciness Mild

American hasTopping PeperoniSausageTopping, PeperoniSausageTopping belongs to MeatTopping, PeperoniSausageTopping hasSpiciness Medium

American hasTopping TomatoTopping, TomatoTopping belongs to VegetableTopping, TomatoTopping hasSpiciness Mild

AmericanHot belongs to NamedPizza

AmericanHot hasTopping HotGreenPepperTopping, HotGreenPepperTopping belongs to GreenPepperTopping, HotGreenPepperTopping hasSpiciness Hot

AmericanHot hasTopping JalapenoPepperTopping, JalapenoPepperTopping belongs to PepperTopping, JalapenoPepperTopping hasSpiciness Hot

AmericanHot hasTopping MozzarellaTopping, MozzarellaTopping belongs to CheeseTopping, MozzarellaTopping hasSpiciness Mild

AmericanHot hasTopping PeperoniSausageTopping, PeperoniSausageTopping belongs to MeatTopping, PeperoniSausageTopping hasSpiciness Medium

AmericanHot hasTopping TomatoTopping, TomatoTopping belongs to VegetableTopping, TomatoTopping hasSpiciness Mild

Cajun belongs to NamedPizza

Cajun hasTopping MozzarellaTopping, MozzarellaTopping belongs to CheeseTopping, MozzarellaTopping hasSpiciness Mild

Cajun hasTopping OnionTopping, OnionTopping belongs to VegetableTopping, OnionTopping hasSpiciness Medium

Cajun hasTopping PeperonataTopping, PeperonataTopping belongs to PepperTopping, PeperonataTopping hasSpiciness Medium

Cajun hasTopping PrawnsTopping, PrawnsTopping belongs to FishTopping

Cajun hasTopping TobascoPepperSauce

Cajun hasTopping TomatoTopping, TomatoTopping belongs to VegetableTopping, TomatoTopping hasSpiciness Mild

Capricciosa belongs to NamedPizza

Capricciosa hasTopping AnchoviesTopping, AnchoviesTopping belongs to FishTopping

Capricciosa hasTopping CaperTopping, CaperTopping belongs to VegetableTopping, CaperTopping hasSpiciness Mild
In [10]:
q_text = f"""
Base on your knowledge, recommend ALL kind of pizza from following pizza list, 
which is medium spiciness and has vegetable on it.
Only output the name of pizza, and a short description of this pizza, followed by a 
brief reason you select it.
"""

naiveRAG = KGRAG.SimpleRAGSystem(\
    embedding_model=bge_model,
    llm_model='qwen-max',
    use_cloud_llm=True,    
    db_name = 'pizza_naive_rag'
)

naiveRAG.build_vector_store(docs)
naiveRAG.get_context(q_text, top_n=10);
Using device: mps
LLM loaded: qwen-max
💡Done retrieve related chunks...
Pizza belongs to Food
PizzaBase belongs to Food
SpicyPizza hasTopping SpicyTopping, SpicyTopping belongs to PizzaTopping, SpicyTopping hasSpiciness Hot
NamedPizza belongs to Pizza
CheeseyPizza hasTopping CheeseTopping, CheeseTopping belongs to PizzaTopping
MeatyPizza hasTopping MeatTopping, MeatTopping belongs to PizzaTopping
MeatyPizza belongs to Pizza
VegetarianPizza belongs to Pizza
CheeseyPizza belongs to Pizza
AmericanHot hasTopping PeperoniSausageTopping, PeperoniSausageTopping belongs to MeatTopping, PeperoniSausageTopping hasSpiciness Medium
Distance:
0.664, 0.757, 0.776, 0.791, 0.799, 0.801, 0.821, 0.829, 0.835, 0.861

Takeaway

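❌ The bge model fails to pick up the search conditions: the top-10 chunks are mostly generic class statements (e.g. "Pizza belongs to Food"), and none of them combines a vegetable topping with Medium spiciness.
💡 Let's try revising the question based on the ontology.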

Experiment 3. Improve RAG by revising the question based on the ontology

In [11]:
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'

onto_struct = {}
onto_struct["entities"] = [\
    'NamedPizza - a pizza with specific name',
    'PizzaTopping - the topping of pizza',
    'Topping - the topping of food',
    'hotness - the hotness of topping, could be mild, medium, hot',
    'Food - the food',
    'PizzaBase - the base of pizza']
onto_struct["object_property"] = ['is a kind of','hasTopping','hasSpiciness','hasBase']
onto_struct["graph"] = [\
    ('NamedPizza','hasTopping','PizzaTopping'),
    ('PizzaTopping', 'is a kind of', 'Topping'),
    ('Topping','hasSpiciness','hotness'),
    ('NamedPizza','hasBase','PizzaBase'),
    ('Mild','is a kind of','hotness'),
    ('Medium','is a kind of','hotness'),
    ('Hot','is a kind of','hotness'),
    ('NamedPizza','is a kind of', 'Food')
]

First, let me show how to generate a query graph from this onto_struct, using the same q_text as before.

In [12]:
q_text = f"""
Base on your knowledge, recommend ALL kind of pizza from following pizza list, 
which is medium spiciness and has vegetable on it.
Only output the name of pizza, and a short description of this pizza, followed by a 
brief reason you select it.
"""

print(f'q_text:\n{q_text}')

kgBuilder = KGRAG.KnowledgeGraphBuilder(\
    embedding_model=bge_model,
    llm_model='qwen-max',
    use_cloud_llm=True
)

target, kg = kgBuilder.extract_entities_relations(q_text, onto_struct)
print('search target:'+target)
print(f'search condition:\n{kg}')
q_text:

Base on your knowledge, recommend ALL kind of pizza from following pizza list, 
which is medium spiciness and has vegetable on it.
Only output the name of pizza, and a short description of this pizza, followed by a 
brief reason you select it.

LLM loaded: qwen-max
search target:NamedPizza
search condition:
[('NamedPizza', 'hasSpiciness', 'Medium'), ('NamedPizza', 'hasTopping', 'vegetable')]

Let's retrieve again with the same bge model, this time using a question rewritten from the extracted search conditions.
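The rewritten question printed by the next cell has a regular shape: one clause per condition triple. A minimal sketch of such a rewrite follows; `rewrite_question` is a hypothetical helper written for illustration, and it is only assumed (not confirmed) that KGRAG does something similar internally.

```python
def rewrite_question(target, conditions):
    """Turn a search target and its condition triples into one explicit sentence."""
    clauses = [f"{s} MUST {p} {o}" for s, p, o in conditions]
    return f"Question is about {target}, and " + ", and ".join(clauses) + "."


target = "NamedPizza"
conditions = [
    ("NamedPizza", "hasSpiciness", "Medium"),
    ("NamedPizza", "hasTopping", "vegetable"),
]

new_question = rewrite_question(target, conditions)
print(new_question)
```

The sketch copies entity names verbatim, so it does not normalize the case of "vegetable". The point of the rewrite is that the query now uses the same predicate vocabulary (hasSpiciness, hasTopping) as the indexed chunks.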

In [13]:
naiveRAGRlt2 = naiveRAG.get_context(q_text, onto_struct, top_n=10);
New Question: Question is about NamedPizza, and NamedPizza MUST hasSpiciness Medium, and NamedPizza MUST hasTopping Vegetable.
💡Done retrieve related chunks...
FruttiDiMare hasTopping GarlicTopping, GarlicTopping belongs to VegetableTopping, GarlicTopping hasSpiciness Medium
FourSeasons hasTopping PeperoniSausageTopping, PeperoniSausageTopping belongs to MeatTopping, PeperoniSausageTopping hasSpiciness Medium
PolloAdAstra hasTopping GarlicTopping, GarlicTopping belongs to VegetableTopping, GarlicTopping hasSpiciness Medium
Siciliana hasTopping GarlicTopping, GarlicTopping belongs to VegetableTopping, GarlicTopping hasSpiciness Medium
Veneziana hasTopping OnionTopping, OnionTopping belongs to VegetableTopping, OnionTopping hasSpiciness Medium
Fiorentina hasTopping GarlicTopping, GarlicTopping belongs to VegetableTopping, GarlicTopping hasSpiciness Medium
Parmense hasTopping AsparagusTopping, AsparagusTopping belongs to VegetableTopping, AsparagusTopping hasSpiciness Mild
AmericanHot hasTopping PeperoniSausageTopping, PeperoniSausageTopping belongs to MeatTopping, PeperoniSausageTopping hasSpiciness Medium
American hasTopping PeperoniSausageTopping, PeperoniSausageTopping belongs to MeatTopping, PeperoniSausageTopping hasSpiciness Medium
SloppyGiuseppe hasTopping OnionTopping, OnionTopping belongs to VegetableTopping, OnionTopping hasSpiciness Medium
Distance:
0.616, 0.626, 0.631, 0.632, 0.633, 0.643, 0.668, 0.668, 0.680, 0.683

Takeaway

❤️ Better than before
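❌ But bge still cannot consider all search conditions at the same time: 4 of the 10 chunks satisfy only one of them (three PeperoniSausageTopping chunks are Medium but meat, and the AsparagusTopping chunk is a vegetable but Mild).
💡 A potential solution is GRetriever, but it is far too heavy for this case.
💡 Let's try a simpler method here: double-check the retrieved chunks against the ontology.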

Experiment 4. Further improvement based on reranking

First, we need to translate each sample paragraph into a graph, as in the following example:

In [14]:
q_text = f"""
FruttiDiMare is a pizza typically topped with a variety of seafood such as 
shrimp, mussels, and calamari, often in a tomato or white wine sauce. 
"""

print(f'q_text:\n{q_text}')

kgBuilder = KGRAG.KnowledgeGraphBuilder(\
    embedding_model=bge_model,
    llm_model='qwen-max',
    use_cloud_llm=True
)

target, kg = kgBuilder.extract_entities_relations(q_text, onto_struct)
print('topic:'+target)
print(f'condition:\n{kg}')
q_text:

FruttiDiMare is a pizza typically topped with a variety of seafood such as 
shrimp, mussels, and calamari, often in a tomato or white wine sauce. 

LLM loaded: qwen-max
topic:NamedPizza
condition:
[('FruttiDiMare', 'is a kind of', 'NamedPizza'), ('FruttiDiMare', 'hasTopping', 'shrimp'), ('FruttiDiMare', 'hasTopping', 'mussels'), ('FruttiDiMare', 'hasTopping', 'calamari')]
In [15]:
tree_data = KGRAG.analyze_graph(kg)
print(tree_data['root'])
print(tree_data['leaf_candidates'])
['FruttiDiMare']
['NamedPizza', 'shrimp', 'mussels', 'calamari']
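The source of `KGRAG.analyze_graph` is not shown here, but the printed result is consistent with a simple rule: roots are nodes that never appear as an object, and leaf candidates are nodes that never appear as a subject. The sketch below implements that assumed rule; `find_roots_and_leaves` is a hypothetical stand-in, not the library function.

```python
def find_roots_and_leaves(triples):
    """Classify nodes of a triple list by whether they have incoming/outgoing edges."""
    subjects, objects = [], []
    for s, _, o in triples:
        if s not in subjects:
            subjects.append(s)
        if o not in objects:
            objects.append(o)
    return {
        # no incoming edge: never used as an object
        "root": [s for s in subjects if s not in objects],
        # no outgoing edge: never used as a subject
        "leaf_candidates": [o for o in objects if o not in subjects],
    }


kg_example = [
    ("FruttiDiMare", "is a kind of", "NamedPizza"),
    ("FruttiDiMare", "hasTopping", "shrimp"),
    ("FruttiDiMare", "hasTopping", "mussels"),
    ("FruttiDiMare", "hasTopping", "calamari"),
]

tree = find_roots_and_leaves(kg_example)
print(tree["root"])
print(tree["leaf_candidates"])
```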
In [16]:
q_text = f"""
Base on your knowledge, recommend ALL kind of pizza from following pizza list, 
which is medium spiciness and has vegetable on it.
Only output the name of pizza, and a short description of this pizza, followed by a 
brief reason you select it.
"""
target, kg = kgBuilder.extract_entities_relations(q_text, onto_struct)
tree_data_q = KGRAG.analyze_graph(kg)
In [17]:
print(tree_data_q['leaf_candidates'])
['Medium', 'vegetable']

Next, let's rerank all candidates returned by the naive RAG, based on their leaves.
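The idea of leaf-based reranking can be sketched without any model. The example below swaps in plain string similarity from the standard library (`difflib.SequenceMatcher`) for whatever `kgBuilder.cal_rel_score` actually computes, so its scores will differ from the notebook's; `leaf_overlap_score` is a hypothetical name.

```python
from difflib import SequenceMatcher


def leaf_overlap_score(query_leaves, doc_leaves):
    """Average, over query leaves, of the best string similarity to any doc leaf."""
    if not query_leaves or not doc_leaves:
        return 0.0
    total = 0.0
    for q in query_leaves:
        total += max(
            SequenceMatcher(None, q.lower(), d.lower()).ratio() for d in doc_leaves
        )
    return total / len(query_leaves)


query_leaves = ["Medium", "vegetable"]
candidates = {
    "FruttiDiMare": ["Medium", "VegetableTopping"],
    "FourSeasons": ["MeatTopping", "Medium"],
    "Parmense": ["NamedPizza", "PizzaTopping", "Topping", "hotness"],
}

scores = {name: leaf_overlap_score(query_leaves, leaves)
          for name, leaves in candidates.items()}
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked)
```

A candidate scores highly only when every query leaf finds a close counterpart among its own leaves, which is what pushes chunks matching a single condition down the list.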

In [18]:
all_retrieved_doc_txt = [adoc['text'] for adoc in naiveRAGRlt2['metadatas'][0]]
tree_data_ks = []
leaf_all_d = []
for adoc in all_retrieved_doc_txt:
    target, kg = kgBuilder.extract_entities_relations(adoc, onto_struct)
    tree_data_k = KGRAG.analyze_graph(kg)
    tree_data_ks.append(tree_data_k)
    print(kg)
    print('Leaf : {:}'.format(', '.join(sorted(tree_data_k['leaf_candidates']))))
    leaf_all_d.append(tree_data_k['leaf_candidates'])
[('FruttiDiMare', 'hasTopping', 'GarlicTopping'), ('GarlicTopping', 'is a kind of', 'VegetableTopping'), ('GarlicTopping', 'hasSpiciness', 'Medium')]
Leaf : Medium, VegetableTopping
[('FourSeasons', 'hasTopping', 'PeperoniSausageTopping'), ('PeperoniSausageTopping', 'is a kind of', 'MeatTopping'), ('PeperoniSausageTopping', 'hasSpiciness', 'Medium')]
Leaf : MeatTopping, Medium
[('PolloAdAstra', 'hasTopping', 'GarlicTopping'), ('GarlicTopping', 'is a kind of', 'VegetableTopping'), ('GarlicTopping', 'hasSpiciness', 'Medium')]
Leaf : Medium, VegetableTopping
[('Siciliana', 'hasTopping', 'GarlicTopping'), ('GarlicTopping', 'is a kind of', 'VegetableTopping'), ('GarlicTopping', 'hasSpiciness', 'Medium')]
Leaf : Medium, VegetableTopping
[('Veneziana', 'hasTopping', 'OnionTopping'), ('OnionTopping', 'is a kind of', 'VegetableTopping'), ('OnionTopping', 'hasSpiciness', 'Medium')]
Leaf : Medium, VegetableTopping
[('Fiorentina', 'hasTopping', 'GarlicTopping'), ('GarlicTopping', 'is a kind of', 'VegetableTopping'), ('GarlicTopping', 'hasSpiciness', 'Medium')]
Leaf : Medium, VegetableTopping
[('Parmense', 'is a kind of', 'NamedPizza'), ('Parmense', 'hasTopping', 'AsparagusTopping'), ('AsparagusTopping', 'is a kind of', 'PizzaTopping'), ('VegetableTopping', 'is a kind of', 'Topping'), ('AsparagusTopping', 'hasSpiciness', 'Mild'), ('Mild', 'is a kind of', 'hotness')]
Leaf : NamedPizza, PizzaTopping, Topping, hotness
[('AmericanHot', 'hasTopping', 'PeperoniSausageTopping'), ('PeperoniSausageTopping', 'is a kind of', 'MeatTopping'), ('PeperoniSausageTopping', 'hasSpiciness', 'Medium')]
Leaf : MeatTopping, Medium
[('American', 'hasTopping', 'PeperoniSausageTopping'), ('PeperoniSausageTopping', 'is a kind of', 'MeatTopping'), ('PeperoniSausageTopping', 'hasSpiciness', 'Medium')]
Leaf : MeatTopping, Medium
[('SloppyGiuseppe', 'hasTopping', 'OnionTopping'), ('OnionTopping', 'is a kind of', 'VegetableTopping'), ('OnionTopping', 'hasSpiciness', 'Medium')]
Leaf : Medium, VegetableTopping
In [19]:
scores = [kgBuilder.cal_rel_score(tree_data_q['leaf_candidates'], x) for x in leaf_all_d]
sorted_text = [t for t, _ in sorted(zip(all_retrieved_doc_txt, scores), 
                                   key=lambda pair: pair[1], 
                                   reverse=True)]
print([f'{x:.2f}' for x in scores])
print('--before--\n\n'+'\n'.join(all_retrieved_doc_txt))
print('\n\n')
print('--after--\n\n'+'\n'.join(sorted_text))
['0.91', '0.74', '0.91', '0.91', '0.91', '0.91', '0.43', '0.74', '0.74', '0.91']
--before--

FruttiDiMare hasTopping GarlicTopping, GarlicTopping belongs to VegetableTopping, GarlicTopping hasSpiciness Medium
FourSeasons hasTopping PeperoniSausageTopping, PeperoniSausageTopping belongs to MeatTopping, PeperoniSausageTopping hasSpiciness Medium
PolloAdAstra hasTopping GarlicTopping, GarlicTopping belongs to VegetableTopping, GarlicTopping hasSpiciness Medium
Siciliana hasTopping GarlicTopping, GarlicTopping belongs to VegetableTopping, GarlicTopping hasSpiciness Medium
Veneziana hasTopping OnionTopping, OnionTopping belongs to VegetableTopping, OnionTopping hasSpiciness Medium
Fiorentina hasTopping GarlicTopping, GarlicTopping belongs to VegetableTopping, GarlicTopping hasSpiciness Medium
Parmense hasTopping AsparagusTopping, AsparagusTopping belongs to VegetableTopping, AsparagusTopping hasSpiciness Mild
AmericanHot hasTopping PeperoniSausageTopping, PeperoniSausageTopping belongs to MeatTopping, PeperoniSausageTopping hasSpiciness Medium
American hasTopping PeperoniSausageTopping, PeperoniSausageTopping belongs to MeatTopping, PeperoniSausageTopping hasSpiciness Medium
SloppyGiuseppe hasTopping OnionTopping, OnionTopping belongs to VegetableTopping, OnionTopping hasSpiciness Medium



--after--

FruttiDiMare hasTopping GarlicTopping, GarlicTopping belongs to VegetableTopping, GarlicTopping hasSpiciness Medium
PolloAdAstra hasTopping GarlicTopping, GarlicTopping belongs to VegetableTopping, GarlicTopping hasSpiciness Medium
Siciliana hasTopping GarlicTopping, GarlicTopping belongs to VegetableTopping, GarlicTopping hasSpiciness Medium
Veneziana hasTopping OnionTopping, OnionTopping belongs to VegetableTopping, OnionTopping hasSpiciness Medium
Fiorentina hasTopping GarlicTopping, GarlicTopping belongs to VegetableTopping, GarlicTopping hasSpiciness Medium
SloppyGiuseppe hasTopping OnionTopping, OnionTopping belongs to VegetableTopping, OnionTopping hasSpiciness Medium
FourSeasons hasTopping PeperoniSausageTopping, PeperoniSausageTopping belongs to MeatTopping, PeperoniSausageTopping hasSpiciness Medium
AmericanHot hasTopping PeperoniSausageTopping, PeperoniSausageTopping belongs to MeatTopping, PeperoniSausageTopping hasSpiciness Medium
American hasTopping PeperoniSausageTopping, PeperoniSausageTopping belongs to MeatTopping, PeperoniSausageTopping hasSpiciness Medium
Parmense hasTopping AsparagusTopping, AsparagusTopping belongs to VegetableTopping, AsparagusTopping hasSpiciness Mild

Takeaway

🎉 Now the result is correct.
💡 The simple approach introduced above only looks at the leaves. In this case, the leaves represent the pizza attributes we need.

Sometimes we need to look at the whole path from the root to each leaf. So here's our ultimate rerank solution.
Feel free to reach out if you'd like to discuss the details further.
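
To make the difference between "leaves only" and "whole path" concrete, here is a minimal sketch that enumerates every root-to-leaf path from an adjacency list shaped like the notebook's `adj` (subject -> list of `(object, predicate)`). It is only an illustration of the idea, not the implementation behind `naiveRAG.rerank` in the next cell; the helper name `root_to_leaf_paths` and the three sample triples are assumptions made for this example.

```python
from collections import defaultdict

# Sample triples in the same (subject, predicate, object) shape as final_triples.
# They are picked from the retrieved documents above, for illustration only.
triples = [
    ("Veneziana", "hasTopping", "OnionTopping"),
    ("OnionTopping", "belongs to", "VegetableTopping"),
    ("OnionTopping", "hasSpiciness", "Medium"),
]

# Same layout as the notebook's adj: subject -> [(object, predicate), ...]
adj = defaultdict(list)
for s, p, o in triples:
    adj[s].append((o, p))


def root_to_leaf_paths(adj, root):
    """Hypothetical helper: return each root-to-leaf path as a list of (s, p, o) steps."""
    paths = []
    stack = [(root, [], {root})]  # (current node, path so far, nodes already on the path)
    while stack:
        node, path, seen = stack.pop()
        # Skip nodes already on this path so a cyclic graph cannot loop forever.
        children = [(o, p) for o, p in adj.get(node, []) if o not in seen]
        if not children:
            if path:  # a root without any edge yields no path
                paths.append(path)
            continue
        for o, p in children:
            stack.append((o, path + [(node, p, o)], seen | {o}))
    return paths


paths = root_to_leaf_paths(adj, "Veneziana")
leaves = sorted({path[-1][2] for path in paths})

for path in paths:
    print(", ".join(f"{s} {p} {o}" for s, p, o in path))
print("leaves:", leaves)
```

A leaf-only reranker would compare just `leaves`, while a path-aware one can also use the intermediate nodes and predicates stored in `paths`.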

In [20]:
scores = naiveRAG.rerank(tree_data_q, tree_data_ks)
sorted_text = [t for t, _ in sorted(zip(all_retrieved_doc_txt, scores), 
                                   key=lambda pair: pair[1], 
                                   reverse=True)]

print('--before--\n\n'+'\n'.join(all_retrieved_doc_txt))
print('\n\n')
print('--after--\n\n'+'\n'.join(sorted_text))
All reranking score:
['0.93', '0.80', '0.93', '0.93', '0.93', '0.93', '0.56', '0.80', '0.80', '0.93']
--before--

FruttiDiMare hasTopping GarlicTopping, GarlicTopping belongs to VegetableTopping, GarlicTopping hasSpiciness Medium
FourSeasons hasTopping PeperoniSausageTopping, PeperoniSausageTopping belongs to MeatTopping, PeperoniSausageTopping hasSpiciness Medium
PolloAdAstra hasTopping GarlicTopping, GarlicTopping belongs to VegetableTopping, GarlicTopping hasSpiciness Medium
Siciliana hasTopping GarlicTopping, GarlicTopping belongs to VegetableTopping, GarlicTopping hasSpiciness Medium
Veneziana hasTopping OnionTopping, OnionTopping belongs to VegetableTopping, OnionTopping hasSpiciness Medium
Fiorentina hasTopping GarlicTopping, GarlicTopping belongs to VegetableTopping, GarlicTopping hasSpiciness Medium
Parmense hasTopping AsparagusTopping, AsparagusTopping belongs to VegetableTopping, AsparagusTopping hasSpiciness Mild
AmericanHot hasTopping PeperoniSausageTopping, PeperoniSausageTopping belongs to MeatTopping, PeperoniSausageTopping hasSpiciness Medium
American hasTopping PeperoniSausageTopping, PeperoniSausageTopping belongs to MeatTopping, PeperoniSausageTopping hasSpiciness Medium
SloppyGiuseppe hasTopping OnionTopping, OnionTopping belongs to VegetableTopping, OnionTopping hasSpiciness Medium



--after--

Veneziana hasTopping OnionTopping, OnionTopping belongs to VegetableTopping, OnionTopping hasSpiciness Medium
SloppyGiuseppe hasTopping OnionTopping, OnionTopping belongs to VegetableTopping, OnionTopping hasSpiciness Medium
Siciliana hasTopping GarlicTopping, GarlicTopping belongs to VegetableTopping, GarlicTopping hasSpiciness Medium
Fiorentina hasTopping GarlicTopping, GarlicTopping belongs to VegetableTopping, GarlicTopping hasSpiciness Medium
FruttiDiMare hasTopping GarlicTopping, GarlicTopping belongs to VegetableTopping, GarlicTopping hasSpiciness Medium
PolloAdAstra hasTopping GarlicTopping, GarlicTopping belongs to VegetableTopping, GarlicTopping hasSpiciness Medium
American hasTopping PeperoniSausageTopping, PeperoniSausageTopping belongs to MeatTopping, PeperoniSausageTopping hasSpiciness Medium
FourSeasons hasTopping PeperoniSausageTopping, PeperoniSausageTopping belongs to MeatTopping, PeperoniSausageTopping hasSpiciness Medium
AmericanHot hasTopping PeperoniSausageTopping, PeperoniSausageTopping belongs to MeatTopping, PeperoniSausageTopping hasSpiciness Medium
Parmense hasTopping AsparagusTopping, AsparagusTopping belongs to VegetableTopping, AsparagusTopping hasSpiciness Mild

To Wrap Up

This example shows the power of ontology.

For demonstration purposes, I picked a really simple example (the famous Pizza dataset) where each document can be represented as a tree, with the root being a type of pizza and the leaf nodes being its key attributes. So once documents are successfully converted into triples and adjacency matrices using the ontology, we can directly rerank them by computing similarity matrices between the leaf nodes of the documents and those of the query.
Please note that I use Qwen-Max, Alibaba's flagship LLM and a world-class, tier-1 model.
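
The leaf-based reranking described above can be sketched roughly as follows. This is a simplified stand-in rather than the notebook's actual `rerank` code: the functions `token_similarity`, `leaf_score` and `rerank_by_leaves` are hypothetical, the query leaves are an assumed example, and token overlap (Jaccard) replaces the embedding similarity a real system would use.

```python
def token_similarity(a, b):
    """Jaccard overlap of lowercase tokens; an assumed stand-in for embedding cosine similarity."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def leaf_score(query_leaves, doc_leaves, sim=token_similarity):
    """Build the query-by-document similarity matrix, keep the best match
    for each query leaf, and average those best matches."""
    if not query_leaves or not doc_leaves:
        return 0.0
    matrix = [[sim(q, d) for d in doc_leaves] for q in query_leaves]
    return sum(max(row) for row in matrix) / len(matrix)


def rerank_by_leaves(query_leaves, docs):
    """Return (document name, score) pairs, best score first."""
    scored = [(name, leaf_score(query_leaves, leaves)) for name, leaves in docs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Assumed query leaves: the attributes the question asks for.
query_leaves = ["belongs to VegetableTopping", "hasSpiciness Medium"]

# Leaves (predicate + object) of three documents from the retrieved list above.
docs = {
    "Parmense": ["belongs to VegetableTopping", "hasSpiciness Mild"],
    "American": ["belongs to MeatTopping", "hasSpiciness Medium"],
    "Veneziana": ["belongs to VegetableTopping", "hasSpiciness Medium"],
}

ranking = rerank_by_leaves(query_leaves, docs)
for name, score in ranking:
    print(f"{score:.2f}  {name}")
```

Averaging the row-wise maximum is one of several reasonable ways to collapse the similarity matrix into a single score; other aggregations would change the ranking of partially matching documents.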

Beyond the approach shown here (text-similarity-based RAG plus ontology-based question revision and reranking), there are several other ways to tackle this problem:

  1. Structure the data as tables, using Pizza attributes as column headers, then use ReAct + code generation + Python interpreter to extract all results (see this link). The upside is that the number of recalled items isn't limited by the TopK parameter. The downside is that ReAct can burn through quite a few tokens.

  2. Design a rerank/retrieval model using multi-layer graph attention networks (GAT) to replace the simple method described in experiment 4, that is, use GAT in context engineering. The advantage here is that GAT can learn to pay attention to semantic structures in both the query and the recalled documents (not just the leaf nodes) through carefully designed contrastive learning, giving it some generalization across different tasks. The major drawback is that the network needs to be trained.

  3. Combine multi-layer GAT networks with foundation model SFT for an end-to-end solution. Compared to option 2, the drawback is that it requires significantly more compute.
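
As a rough illustration of option 1, the sketch below stores a few pizzas as table rows and filters them by attribute. In a ReAct setup the LLM would generate a filter like this itself; here it is written by hand. The column names and the `select` helper are assumptions for this example, and the rows are copied from the retrieved documents above.

```python
# One row per (pizza, topping) pair; the column names are hypothetical.
rows = [
    {"pizza": "Veneziana", "topping": "OnionTopping",
     "topping_type": "VegetableTopping", "spiciness": "Medium"},
    {"pizza": "Siciliana", "topping": "GarlicTopping",
     "topping_type": "VegetableTopping", "spiciness": "Medium"},
    {"pizza": "American", "topping": "PeperoniSausageTopping",
     "topping_type": "MeatTopping", "spiciness": "Medium"},
    {"pizza": "Parmense", "topping": "AsparagusTopping",
     "topping_type": "VegetableTopping", "spiciness": "Mild"},
]


def select(rows, **conditions):
    """Return every row whose columns equal all of the given values."""
    return [row for row in rows
            if all(row.get(col) == val for col, val in conditions.items())]


matches = select(rows, topping_type="VegetableTopping", spiciness="Medium")
names = sorted({row["pizza"] for row in matches})
print(names)
```

Because the filter scans the whole table, every matching row is returned, which is why this route is not bounded by the TopK parameter.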

We already have an implementation for option 2, and PyG officially supports option 3. Feel free to reach out if you'd like to discuss the details further.
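
For readers unfamiliar with GAT, here is a minimal single-head graph attention layer in plain Python, run on a toy pizza graph. It is not the implementation mentioned above and does not use PyG: the weights are random and untrained, the node features are made up, the output nonlinearity is omitted, and the function name `gat_layer` is an assumption. It only shows how each node weighs its neighbors before aggregating them.

```python
import math
import random


def leaky_relu(x, slope=0.2):
    return x if x > 0 else slope * x


def matvec(W, h):
    """Multiply matrix W (list of rows) by vector h."""
    return [sum(w * x for w, x in zip(row, h)) for row in W]


def gat_layer(features, neighbors, W, a):
    """One single-head graph attention layer (no output nonlinearity).

    features:  node -> input vector
    neighbors: node -> list of neighbor nodes (a self-loop is added here)
    W:         out_dim x in_dim weight matrix
    a:         attention vector of length 2 * out_dim
    """
    z = {node: matvec(W, h) for node, h in features.items()}
    out, attention = {}, {}
    for i in features:
        nbrs = [i] + [j for j in neighbors.get(i, []) if j != i]
        # Unnormalized score: LeakyReLU(a . [z_i || z_j])
        logits = [leaky_relu(sum(w * x for w, x in zip(a, z[i] + z[j]))) for j in nbrs]
        # Numerically stable softmax over the neighborhood
        top = max(logits)
        exps = [math.exp(l - top) for l in logits]
        total = sum(exps)
        alphas = [e / total for e in exps]
        attention[i] = dict(zip(nbrs, alphas))
        # New representation: attention-weighted sum of transformed neighbors
        out[i] = [sum(al * z[j][k] for al, j in zip(alphas, nbrs)) for k in range(len(W))]
    return out, attention


# Toy graph: one pizza, its topping, and the topping's two attributes.
features = {
    "Veneziana": [1.0, 0.0, 0.0],
    "OnionTopping": [0.0, 1.0, 0.0],
    "VegetableTopping": [0.0, 0.0, 1.0],
    "Medium": [1.0, 1.0, 0.0],
}
neighbors = {
    "Veneziana": ["OnionTopping"],
    "OnionTopping": ["Veneziana", "VegetableTopping", "Medium"],
    "VegetableTopping": ["OnionTopping"],
    "Medium": ["OnionTopping"],
}

random.seed(0)  # untrained, random parameters for illustration only
W = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(2)]
a = [random.uniform(-1, 1) for _ in range(4)]

out, attention = gat_layer(features, neighbors, W, a)
for node, weights in attention.items():
    print(node, {n: round(w, 2) for n, w in weights.items()})
```

Stacking several such layers lets information from the root and intermediate nodes reach each other, which is what allows a trained model to use more than the leaf nodes.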