Generating Summaries For Better LLM Performance

Generating summaries to improve our LLM's understanding of our knowledge base.

Last time we tested our LLM on the AAVE documentation and governance proposals and set ourselves the goal of raising the evaluation score from 0.81 to above 0.9. We will try to achieve that by generating summaries of each document. Additionally, we will add another evaluation step: we will evaluate the document retriever for accuracy, essentially testing an in-between step of our LLM chain. This will tell us how well our retrieval function performs and whether we need to improve it. Without this information, we could prompt-engineer all we want and still not get better results if the retrieval itself isn't working as well as it should.

Why LLM-Generated Summaries Help Improve Performance

There are several reasons why an LLM-generated summary of each document helps our LLM perform better. First, we have to understand how our LLM chain answers questions and where its limitations lie: whenever the user sends a query, we first find documents (or chunks of documents) within our knowledge base that match that query. We then pass these chunks along as “context” to the LLM. Imagine the prompt looking like this: “The user has asked the following question: {query}. Answer the question using the following context: {context}”. The problem, however, is that the “context” we can pass along to the LLM is limited to a specific number of tokens, meaning we can’t just pass along everything we have in our knowledge base.
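The prompt assembly described above can be sketched in a few lines. This is a minimal illustration, not the exact template from our chain, and the character budget stands in for the real token limit:

```python
def build_prompt(query: str, context_chunks: list[str], max_context_chars: int = 2000) -> str:
    """Join retrieved chunks into one context string, truncated to fit the budget."""
    # In practice the limit is measured in tokens; characters keep the sketch simple.
    context = "\n\n".join(context_chunks)[:max_context_chars]
    return (
        f"The user has asked the following question: {query}. "
        f"Answer the question using the following context: {context}"
    )
```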

This is where summaries come into the picture: a summary filters out less relevant information, so that this extra information doesn’t get passed on when answering user queries. This has several advantages: tangential information won’t influence the LLM’s answer, and the shorter summary reduces the context size required compared to the full document. This in turn means the LLM can consider multiple different perspectives or documents (in other words, we could pass along three summaries instead of one full document text). Additionally, making the LLM think about the summary first gives it a better understanding of our knowledge base. It sounds counterintuitive, but you can easily test this yourself with ChatGPT: first, ask ChatGPT to answer a question you have. Then ask the same question, but have it walk you through the logic step by step. In most cases, the second answer will be of higher quality.

Creating Summaries and Re-Evaluating our LLM

We need to adapt our existing governance proposal loader code a little in order to make summary generation work: we also need to store the source in each generated document, which is simply the snapshot URL.
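A sketch of that loader change, assuming a plain-dict document shape and a hypothetical `snapshot_url` field (the real loader's field names may differ):

```python
def to_document(proposal: dict) -> dict:
    """Turn a governance proposal into a document chunk that carries its source tag."""
    return {
        "page_content": proposal["body"],
        # The source is simply the snapshot URL of the proposal.
        "metadata": {"source": proposal["snapshot_url"]},
    }
```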

Now we’re ready to generate the summaries:

  1. Get all unique sources in our knowledge database
  2. Filter out any sources that already have a summary generated
  3. For each source, get all documents with the corresponding source tag
  4. Use these documents as input for the summary generation
  5. Store the summary document in our vector store

For our AAVE documentation base, we are summarising over 400 documents - so we are adding a simple progress bar to the loop with tqdm.

Code to summarise each document
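The five steps above can be sketched as follows. The `summarize` callable stands in for the actual LLM summarisation chain, and the document shape is an assumption of this sketch; wrap `pending` in `tqdm(...)` to get the progress bar mentioned above.

```python
from typing import Callable

def generate_missing_summaries(
    docs: list[dict],                       # each: {"text": ..., "source": ...} (assumed shape)
    existing_summary_sources: set[str],
    summarize: Callable[[list[str]], str],  # LLM summarisation chain, stubbed here
) -> dict[str, str]:
    # 1. get all unique sources
    sources = {d["source"] for d in docs}
    # 2. filter out sources that already have a summary
    pending = sources - existing_summary_sources
    summaries = {}
    for source in pending:  # wrap with tqdm(pending) for a progress bar
        # 3. get all documents with the corresponding source tag
        chunks = [d["text"] for d in docs if d["source"] == source]
        # 4. use these documents as input for the summary generation
        summaries[source] = summarize(chunks)
    # 5. the caller stores each returned summary in the vector store
    return summaries
```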

Here’s an example summary that gets generated for a governance proposal:

The proposal suggests reducing the Liquidation Threshold and Loan-To-Value while increasing Liquidation Bonus and Reserve Factors for 11 frozen collateral assets on Aave V2 Ethereum. The purpose is to reduce potential risks and improve capital efficiency. The proposal offers an aggressive and a moderate approach for reducing the Liquidation Thresholds. The tables provided show the effect of the reductions on protocol users. Additionally, the proposal recommends reducing the Loan-to-Value ratio to zero, increasing the Reserve Factors, and increasing the Liquidation Bonus for the assets. The specification table provides the current and recommended values for various parameters. The next steps include submitting the proposal for a snapshot vote and implementing the updates through an Aave Improvement Proposal.

We also want to add functionality to load the vector store from disk so we don’t have to regenerate the summaries each time we run the app: we modify our previous function to check whether a persist directory exists and load from it if so. We will have to revisit this once we dynamically load documents (e.g. new governance proposals), but for now this is good enough.

Updated function to load or create our vector database

Finally, let’s run our evaluation again to see how much adding the summaries improved our LLM’s performance: we got it up to 0.95 (from 0.81). That’s quite the improvement and meets the goal we set last time.

Testing the Document Retrieval Process

Since we already generated the summaries, we can use them to add an additional evaluation step: testing the retriever function, which grabs the relevant documents from the database that we pass along as context to the LLM. We let the LLM generate a question for each summary and expect our retriever function to retrieve the summary (or one of the summary’s source documents) from which the question was generated. If it does, the retriever passes with a score of 1; otherwise it fails with a score of 0.

Coming up with a question for each summary and creating the evaluator
New evaluator to test the retriever function
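The evaluator logic can be sketched as below. Here `ask` stands in for the LLM call that generates a question from a summary, and `retrieve` for the retriever returning the source tags of the documents it found; both callables are assumptions of this sketch:

```python
from typing import Callable

def evaluate_retriever(
    summaries: dict[str, str],             # source tag -> summary text
    ask: Callable[[str], str],             # LLM: summary -> generated question
    retrieve: Callable[[str], list[str]],  # retriever: question -> retrieved source tags
) -> float:
    """Hit rate: fraction of generated questions whose source document is retrieved."""
    hits = 0
    for source, summary in summaries.items():
        question = ask(summary)
        # score 1 if the question's source is among the retrieved documents, else 0
        hits += int(source in retrieve(question))
    return hits / len(summaries)
```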

For our set of ~400 generated questions, the retriever has a hit rate of 0.93. There’s certainly room for improvement, but at this stage of prototype development it is good enough.

What's Next

We can certainly do a lot more to improve our LLM and evaluation system: we could dream up questions from our knowledge base and automatically add them to our test set, or use these questions in conjunction with GPTCache to match questions from the user.

For now though, we’re happy with the performance and will move on to developing functionality for the LLM to decide how to vote on each governance proposal.
