Building with ChatGPT - Part 5 - Launching a simple AI App

Summarising text with gpt-3.5-turbo, dealing with session persistence and analytics.

Photo by Markus Spiske / Unsplash

In this blog post, we'll modify the app we built over the last weeks and launch it publicly #buildinpublic:

  1. Instead of scraping an entire YouTube channel to feed the AI, we'll feed it just the single video the user requests.
  2. The app will then provide a summary of the video. Additionally, the user will be able to chat with the video's content.
  3. The complete app will be hosted on Streamlit Community Cloud, with no separate backend or data storage. We'll deal with session persistence and analytics, amongst other things.

You can access the released app here:

RecapIt πŸ‘¨β€πŸ’»πŸ“Ίβ©πŸ€–

Are you tired of wasting time watching long YouTube videos just to get a few pieces of information?

RecapIt is here to help. Our AI-powered app summarizes YouTube videos in under 30 seconds, saving you valuable time.

Plus, you can chat with the content and get more details on any topic that interests you. RecapIt is the perfect tool for busy people who want to stay informed without sacrificing their time.

Try It For Free

Architecture & Flow Overview

In last week's article, we switched from an embedded Chroma database to Pinecone cloud storage, because it was unreasonable to scrape entire YouTube channels on demand for each user. Since we're changing the app to only provide information about a single video, we can use an embedded Chroma database again and save ourselves the trouble of dealing with a separate data storage provider.
Otherwise, we are still using the same tools as before: Python, LangChain and Streamlit.

Here's how the app looks when you open it:

... and this is what happens once the user puts in a Youtube video:

  1. Load the transcript from YouTube
  2. Split the transcript into chunks
  3. Summarize the video using LangChain's load_summarize_chain
  4. Provide a chatbot interface
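Step 2 above can be illustrated with a dependency-free sketch of transcript chunking. The app itself uses LangChain's text splitters; the function name and the chunk_size/overlap values here are illustrative, not the app's actual settings:

```python
def split_transcript(text, chunk_size=1000, overlap=100):
    """Split a transcript into overlapping, fixed-size chunks."""
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break
        # Step back by `overlap` so context isn't cut off mid-sentence.
        start += chunk_size - overlap
    return chunks
```

The overlap ensures that a sentence straddling a chunk boundary still appears intact in at least one chunk, which helps both summarization and later retrieval.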

For user queries in the chatbot interface, the flow is as follows:

  1. Extend the query to ensure that the reply is short enough.
  2. For the first query, put the previously chunked up texts into an embedded Chroma database.
  3. Use RetrievalQA to run the query.

Below we'll dive into some of the code details.

Streamlit Forms & Session Persistence

Since there are two "separate" flows in the app (summarisation & chatting), we need to ensure that everything is stored between Streamlit re-renders. This is done by utilising the "session state". First, we need to initialise all session state variables once:

def init_session_state():
    # Populate each key only if it doesn't exist yet, so re-renders
    # don't wipe data the user has already produced.
    if "video_input_saved" not in st.session_state:
        st.session_state["video_input_saved"] = None   # last processed video URL

    if "docs" not in st.session_state:
        st.session_state["docs"] = None                # chunked transcript

    if "db" not in st.session_state:
        st.session_state["db"] = None                  # embedded Chroma index

    if "history" not in st.session_state:
        st.session_state["history"] = []               # chat history

    if "generated" not in st.session_state:
        st.session_state["generated"] = []             # AI responses

    if "past" not in st.session_state:
        st.session_state["past"] = []                  # user queries

Our UI elements will then only use these session states to access data that needs to be shown to the user in order for the page to stay consistent between re-renders.

Data that the user enters into the forms is put into the session state automatically if we give the element a "key" upon initialisation:

with st.form(key='chat', clear_on_submit=True):
    st.text_input("chat with the video content:", label_visibility="visible", placeholder="ask me anything", key='chat_input')
    st.form_submit_button(label='Send', on_click=handle_chat_user_input, kwargs=dict(container=container))

with st.form(key='youtube_url', clear_on_submit=True):
    st.text_input("enter the YouTube URL to get a summary & chatbot", label_visibility="visible", placeholder="paste youtube address", key='video_input')
    st.form_submit_button(label='Watch the video for me', on_click=handle_video_url_input, kwargs=dict(container=container))

Finally, we need to clear the session state once the user inputs a new video - this will clear all the extracted data, displayed summary etc.:

def reset_session_state():
    st.session_state["db"]        = None
    st.session_state["docs"]      = None
    st.session_state["history"]   = []
    st.session_state["generated"] = []
    st.session_state["past"]      = []

Summarising & Speed Issues

To summarize the content of the video, we use load_summarize_chain from LangChain and pass the split-up transcript into it.

We are using a slightly modified "summarisation" prompt to tell the LLM to keep the response below a certain number of tokens. This is to ensure that we get the full summary within the response token limit.

def summarize_video(docs):
    result          = None
    prompt_template = """Write a concise summary of the following text and keep the response below 15000 characters:
    {text}"""
    prompt = PromptTemplate(template=prompt_template, input_variables=["text"])
    try:
        # Fast path: "stuff" the whole transcript into a single gpt-3.5-turbo call.
        chain  = load_summarize_chain(llm=OpenAI(model_name="gpt-3.5-turbo", openai_api_key=globals.OPENAI_API_KEY_QUERY), chain_type="stuff", prompt=prompt)
        result = chain.run(docs)
    except Exception:
        # Fallback for long transcripts that exceed the context window.
        chain  = load_summarize_chain(llm=OpenAI(openai_api_key=globals.OPENAI_API_KEY_QUERY), chain_type="map_reduce")
        result = chain.run(docs)

    return result

Our first summarisation attempt is just using the default load_summarize_chain with the gpt-3.5-turbo model. Passing the specific model name will allow the summarization to happen a lot quicker: gpt-3.5-turbo was roughly 50% faster than the default model (but still takes around 10-20 seconds depending on the length of the video).

Note that this way of summarization does not work for long transcripts, since the maximum context length is exceeded. In case it fails, we fall back on a map_reduce summarization technique, in which (roughly speaking) each chunk gets summarized separately and then the summaries of each chunk get summarized again. This summarization chain type does not support gpt-3.5-turbo, however, so it is a bit slower.
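The control flow of map_reduce summarization can be shown with a toy sketch. The real chain calls the LLM in both phases; here a stand-in function that just keeps the first sentence replaces the LLM, purely to illustrate the map/reduce structure:

```python
def first_sentence(text):
    # Stand-in "summarizer": keep only the first sentence.
    return text.split(". ")[0].strip().rstrip(".") + "."

def map_reduce_summarize(chunks, summarize=first_sentence):
    partial  = [summarize(c) for c in chunks]  # map: summarize each chunk
    combined = " ".join(partial)               # collapse partial summaries
    return summarize(combined)                 # reduce: summarize the summaries
```

Because each map-phase call only sees one chunk, no single LLM request exceeds the context window, at the cost of multiple sequential calls.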

Queries

Handling user queries is mostly unchanged from what we did in the previous two parts of this blog series:

def handle_chat_user_input(container):
    output = None
    with container:
        with st.spinner("AI is working on answering your question..."):
            output = query_index(st.session_state.chat_input)['result']

        st.session_state['history'].append(output)
        st.session_state['past'].append(st.session_state.chat_input)
        st.session_state['generated'].append(output)

def query_index(query):
    if st.session_state.db is None:
        embeddings              = OpenAIEmbeddings(openai_api_key=globals.OPENAI_API_KEY_QUERY)
        st.session_state["db"]  = Chroma.from_documents(st.session_state.docs, embeddings)

    retriever = st.session_state["db"].as_retriever()
    qa = RetrievalQA.from_chain_type(llm=OpenAI(model_name="gpt-3.5-turbo", openai_api_key=globals.OPENAI_API_KEY_QUERY), chain_type="stuff", retriever=retriever)
    query_extended = query + ". Keep the response below 15000 characters."
    result = qa({'query': query_extended})
    return result

In case this is the first user query, we create the embeddings and put them into a Chroma vectorstorage. We then create the RetrievalQA chain (again, using the gpt-3.5-turbo model for speed reasons) and modify the prompt slightly to ensure that the response is within the response token limit. The query plus the response then get stored in the Streamlit session state.

Rendering the chat input and response is done by accessing this session state:

def render_chat(container):
    if len(st.session_state['generated']) == 0:
        return

    with container:
        if st.session_state['generated']:
            for i in range(len(st.session_state['generated'])):
                message(st.session_state["past"][i], is_user=True, key=str(i) + '_user', avatar_style="icons")
                message(st.session_state["generated"][i], key=str(i), avatar_style="identicon")

We are adding a small sidebar to give users a short intro to what the app does (since it's relatively straightforward, there isn't really a need for a full-blown landing page at this stage):

def render_sidebar():
    with st.sidebar:
        st.header("RecapIt πŸ‘¨β€πŸ’»πŸ“Ίβ©πŸ€–")
        st.markdown("""
        <More detailed description goes here.>
        """)

Analytics

To get a basic idea of the number of users, we are adding some analytics to the page. For this, we are using the streamlit_analytics package. While it doesn't provide a lot of data, we are getting the most useful things out of it:

  1. How many pageviews
  2. How many summaries generated
  3. How many chatbot interactions

Usage is quite simple: just wrap the entire app with:

with streamlit_analytics.track(unsafe_password="*******"):
    ...  # the entire app code runs inside this block

And then attach "?analytics=on" to the URL of the deployed app to view the generated report.

Launch

RecapIt is launching today! Feel free to try it out below - whether you like it or have issues with it, I'm happy to receive feedback (feel free to message me on Twitter).

RecapIt πŸ‘¨β€πŸ’»πŸ“Ίβ©πŸ€–

