11 min read

Build an AI Chat Web App With 100% Python - Open-Source Model

Launch gpt‑oss‑20b on Runpod with Eric Transformer and deploy a web app using Anvil - only basic Python is required.
Build an AI Chat Web App With 100% Python - Open-Source Model
AI web app. 100% Python

This tutorial explains how to create a web app that deploys an open‑source AI model using only Python, even if you have no web‑development experience. I created a full course on this topic in 2021, and this updated guide covers how to use the latest tools to build an AI web app with just Python.

Building a chatbot web‑app using only Python is possible with the following technologies:

  1. Eric Transformer – a new Python package I maintain that abstracts pre‑training, fine‑tuning, RAG, and inference with transformer models.
  2. Runpod – an easy‑to‑use, cost‑effective GPU service.
  3. Anvil – provides a drag‑and‑drop interface for creating a UI and lets you write both front‑end and back‑end logic with pure Python directly in their web interface.
  4. gpt-oss-2ob - OpenAI released an 20-billion-parameter open-source chat model that's available on Hugging Face.

But why Python? Keeping your entire tech stack in Python is very convenient, especially for prototyping. Python is the primary language for data scientists, so maintaining a Python‑only stack lets them work in the language they are most comfortable with. Anvil offers many features beyond being Python‑based, making it a great choice for web developers – even if they already know JavaScript.

Create a Runpod Account

I've been using and recommending Runpod for many years. I'm happy to share the following affiliate link which provides you with a one-time credit between $5 to $500 when you sign up and add $10 to your account.

https://runpod.io?ref=rh7fgnfm

Launch a GPU

Head over to the Pods section and rent a RTX Pro 6000 or better. An H200 SMX would also work great.

Click "Change Template"

Select "Runpod Pytorch 2.8.0".

Increase the amount of storage and then click "Deploy On-Demand".

Now click on the deployed pod and select "Jupyter Lab".

Deploy an LLM

Install Dependencies

Eric Transformer allows you to run open-source models such as openai/gpt-oss-20b with just a few lines of code. We need to pip install it along with a few other dependencies to run gpt-oss-20b on Runpod. So open a terminal and run the following code.

pip install erictransformer accelerate hf_transfer

Run the Chatbot

Simply crate a script called main.py within your environment and copy the following code.

from erictransformer import EricChat

eric_chat = EricChat("openai/gpt-oss-20b")

result = eric_chat("Tell me about NLP transformer models?")

print(result.text) # the output is a dataclass with a class variable called text

Now run the script with the following command in a terminal:

python3 main.py

Result:

Adjust Text Generation Parameters

You can adjust the text generation parameters using the Eric Chat's CHATCallArgs(). Below are potential settings you can consider adjusting:

  1. min_len (default 1): minimum number of generated tokens (including thinking tokens)
  2. max_len (default 4096): maximum number of generated tokens (also includes thinking tokens)
  3. temp (default 0.8): temperature. Increasing it causing text to become more 'creative' as it becomes more likely to sample tokens with lower probabilities.
  4. top_k (default 32): number of tokens that are considered for sampling. Increasing it makes the text more creative.
  5. top_p (default 0.6): The threshold for the sum of probabilities for tokens that are considered. Increasing it also increases creativity.
from erictransformer import EricChat, CHATCallArgs

eric_chat = EricChat("openai/gpt-oss-20b")

args = CHATCallArgs(temp=0.8, top_k=32, top_p=0.6)

result = eric_chat("Tell me about NLP transformer models?", args=args)

print(result.text)

That's all for the AI text‑generation aspect for now. Soon we'll use Anvil's Uplink functionality to connect our model running on Runpod to the web app.

Create a Web App With Anvil

Head over to anvil.works and click "Start building."

Create an account.

Click "Create a new app."

Click "Material Design."

You will be presented with an interface that looks like this. Don't worry, it's easy.

Drag a Text Area component into the UI. Then drag a Button right below the text area. Finally, drag a Rich Text component beneath the button.

Now click on the button and adjust its "text" property to "SUBMIT."

Then scroll down and select "on-click."

This will bring you to the front‑end code tab and automatically generate code for when the button is clicked.

from ._anvil_designer import Form1Template
from anvil import *
import anvil.server

class Form1(Form1Template):
  def __init__(self, **properties):
    # Set Form properties and Data Bindings.
    self.init_components(**properties)

    # Any code you write here will run before the form opens.

  @handle("button_1", "click")
  def button_1_click(self, **event_args):
    """This method is called when the button is clicked"""
    user_text = self.text_area_1.text
    self.text_area_1.text = ""
    result = anvil.server.call("generate", user_text)
    self.rich_text_1.content = result

Let's explain the code we added line-by-line.

Import's anvil's server code:

import anvil.server

Grab the text from the text area box, which is defined with the class variable self.text_area_1 and store it in a variable called user_text:

user_text = self.text_area_1.text

Set the text area box to a blank string. This way when users press the button the text area is cleared and is ready for their next prompt:

self.text_area_1.text = ""

Send the text to our server function called generate which we'll create later on in the tutorial:

result = anvil.server.call("generate", user_text)

Set the content of the rich‑text box (represented by self.rich_text_1) to the output of the server call.

self.rich_text_1.content = result

Anvil makes it easy to connect the front‑end to your Runpod instance. We simply need to enable the “Uplink” functionality and convert the Python script into a callable server function.

Save this key for later, we'll need it for the next step.

Create an Anvil Callable Function

Let's head back to Runpod and convert our script into an Anvil callable server function. This way, our front‑end code can communicate with the model running on our Runpod instance.

First we pip install anvil-uplink.

pip install anvil-uplink

Now, let’s add our server key as an environment variable.

export KEY="server_KEY"

Here's the final main.py script.

import os 
import anvil.server
from erictransformer import EricChat, CHATCallArgs


eric_chat = EricChat("openai/gpt-oss-20b")

args = CHATCallArgs(temp=0.8, top_k=32, top_p=0.6)

test_generation = eric_chat("Say hello and nothing more", args=args)

print("TEST:", test_generation.text)

@anvil.server.callable
def generate(user_text: str):
    print("INCOMING REQUEST: ", user_text)
    result = eric_chat(user_text, args=args)
    print("RESULT: ", result.text)

    return result.text

anvil.server.connect(os.getenv("KEY"))

anvil.server.wait_forever()

The code below imports the os module so we can retrieve the environment variable containing the key, and it also imports Anvil’s server module.

import os
import anvil.server

When an EricChat() object is created, the model isn’t moved to the GPU automatically until training or inference occurs. So we performed a quick generation to move the model to the GPU.

test_generation = eric_chat("Say hello and nothing more", args=args)
print("TEST:", test_generation.text)

We wrapped the inference code in a function with the decorator @anvil.server.callable. Notice how the function returns a string; this matches the expected return type from our front‑end logic code earlier.

@anvil.server.callable
def generate(user_text: str):
    result = eric_chat(user_text, args=args)
    return result.text

This code connects our script to Anvil using the key.

anvil.server.connect(os.getenv("KEY"))

Finally, the line below causes the script to enter a forever loop, where it monitors incoming requests and forwards them to our generate function.

anvil.server.wait_forever()

Now we can run main.py like before.

python3 main.py

Test the App

Head over to Anvil and click Run! We prefer clicking the side button and then selecting “Run in new tab.”

Here’s the web app! Type something into the text box and press the button.

We now see a loading indicator as the result is being generated.

The Runpod terminal will display the incoming request and the result.

Anvil's Server Module

Right now we’re calling the AI model hosted on Runpod from the client side. We recommend performing this operation from Anvil’s server so that the hosted model isn’t called directly from the client. This becomes important if your app grows beyond a simple prototype, as you’ll want to add authentication, rate‑limiting, and other features. For example, before calling the AI model you may want users to sign in and limit the number of calls they can make to the model.

Upgrade Your Anvil Account

You need a paid tier to have background tasks run for more than 30 seconds, which is required by our current configuration. If you’re hesitant to upgrade, you can use a different AI model, limit the number of generated tokens, and stay on the free tier. I’ll provide an alternative script for the Runpod instance later for those who want to continue using Anvil’s free plan, but note that the quality of the completions will be much lower.

Server Module

Go to the Anvil app and click + Add Server Module.

Replace the code in ServerModule1 with the following code.

import anvil.server

@anvil.server.background_task
def call_generate_background(user_text):
  result = anvil.server.call("generate", user_text)
  return result 

@anvil.server.callable
def server_generate(user_text):
  result = anvil.server.launch_background_task('call_generate_background', user_text)
  return result 

Async Client Code

Go back to the UI builder ( by clicking Form1 and then design) and then drag the timer component into your content panel.

Click on the timer and scroll down until it says "tick" and click the blue arrow.

This will create a new method that runs every 0.5 seconds. You can adjust its frequency by changing the "interval" property, but 0.5  seconds works well for our purposes.

Replace the front-end logic code with the code below.

from ._anvil_designer import Form1Template
from anvil import *
import anvil.server

class Form1(Form1Template):
  def __init__(self, **properties):
    self.init_components(**properties)
    self.async_gen = None

  @handle("button_1", "click")
  def button_1_click(self, **event_args):
    user_text = self.text_area_1.text
    self.text_area_1.text = ""
    self.async_gen = anvil.server.call("server_generate", user_text)

  @handle("timer_1", "tick")
  def timer_1_tick(self, **event_args):
    if self.async_gen is not None and self.async_gen.is_completed():
      self.rich_text_1.content = self.async_gen.get_return_value()
      self.async_gen = None 

self.async_gen is an async object that we check every 0.5 seconds to see if text has been generated.

self.async_gen = None

# ... 

self.async_gen = anvil.server.call("server_generate", user_text)

Each time the timer_1_tick runs (every 0.5 seconds) we check if self.async_gen is not None and is completed. We set the rich box contents to the return value. Finally, we reset the state by setting self.async_gen back to None

  @handle("timer_1", "tick")
  def timer_1_tick(self, **event_args):
    if self.async_gen is not None and self.async_gen.is_completed():
      self.rich_text_1.content = self.async_gen.get_return_value()
      self.async_gen = None 

Let's restart the app again to test the new code! We asked it "Explain retrieval augmented generation. "

Free Tier Anvil Alternative

As mentioned, there's a 30 second time limit for background tasks on Anvil's free tier. This isn't enough for running a 20-billion parameter models with our current configuration. But, if you would like to continue to use Anvil's free tier, you can consider using EricGeneration instead of EricChat instead for the Runpod script and use a smaller model, like EleutherAI/gpt-j-6b.

import os 
import anvil.server
from erictransformer import EricGeneration, GENCallArgs

eric_gen = EricGeneration("EleutherAI/gpt-j-6b")

args = GENCallArgs(temp=0.8, top_k=32, top_p=0.6, max_len=512)

test_generation = eric_gen("Say hello and nothing more", args=args)

print("TEST:", test_generation.text)

@anvil.server.callable
def generate(user_text: str):
    print("INCOMING REQUEST: ", user_text)
    result = eric_gen(user_text, args=args)
    print("RESULT: ", result.text)

    return result.text

anvil.server.connect(os.getenv("KEY"))

anvil.server.wait_forever()

Customize Your AI Web App

Now that the core functionality of the app is working, we suggest you get creative and think of a unique way to customize it! For inspiration, here are two articles I wrote on performing RAG and fine‑tuning gpt‑oss‑2ob.

Open-Source Vector Search and RAG Made Easy With Python

gpt-oss-20b Made Easy. Full-Parameter Fine-Tuning on a Single GPU

There are also lots of other improvements you can make that focus on enhancing what we have now. For example, you can display the user’s chat history and support multi‑turn conversational chats. Anvil lets you build fully‑featured web apps, so you can also explore its authentication functionality, Stripe integration, or its database for more inspiration.

Publish Your Anvil App

The last step is to publish your web app! To do so, click Publish and select one of the options.

Give Eric Transformer a ⭐ to show your support.

GitHub - EricFillion/erictransformer: Local fine-tuning, pre-training and inference for LLMs
Local fine-tuning, pre-training and inference for LLMs - EricFillion/erictransformer

Vennify's YouTube channel.

Eric's YouTube channel

Eric's Instagram

Runpod Affiliate Link: Get between a one-time $5 to $500 credit after you add $10 for the first time.

https://runpod.io?ref=rh7fgnfm