LLMs

Predictive Models through an API

Review

Examples so far: GET

Examples so far: POST

How are APIs different?

  • Modularizes part of computational workflow to run on dedicated (powerful) server.
  • Functionality called by client through a simple interface: HTTP request.
    • Anyone with internet can use it.
    • Any software with internet can use it.
    • No need to install complex software locally.
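Since the "simple interface" is just an HTTP request, a client only needs to build a URL with its query parameters. A minimal Python sketch (the endpoint and parameters here are hypothetical, for illustration only):

```python
from urllib.parse import urlencode

# Hypothetical prediction endpoint -- not a real API URL.
base = "https://api.example.com/v1/predict"
params = {"patient_id": "12345", "format": "json"}

# The client's entire "interface" to the server is this URL.
url = f"{base}?{urlencode(params)}"
print(url)  # https://api.example.com/v1/predict?patient_id=12345&format=json
```

Any client that can send this request over the internet, whether a phone app, a script, or another server, can use the API.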

Example: BART Arrivals

Server: BART API server.

Client: Me, on my phone.

Example: BART Arrivals

Server: BART API server.

Client: Me, on the Transit App.

Example: CT Predict

Server: My CT-Predict API Server.

Client 1: My CT-Predict App.

. . .

Client 2: Kaiser’s EHR system.

How an LLM works

Informal Next Word Prediction

  1. I will show you three sentences, one at a time, each one with a missing word.
  2. You will have 3 seconds to predict the next word that best completes the sentence.
  3. Here, “best” means the word that is most likely to appear next, given the words that came before it, when looking at all the text on the internet.

“The capital of France is _____”

“He added milk to his _____”

“She walked into the room and everyone started to ______”

Formal Next Word Prediction

  • Given a sequence of words (or tokens) (w_1, w_2, …, w_{n-1}), predict the next word (w_n) by estimating the probability distribution (P(w_n | w_1, w_2, …, w_{n-1})).
  • Large Language Models (LLMs) are trained on vast amounts of text data to learn these probabilities.
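A toy version of this idea can be built by counting word pairs in a small corpus. This bigram sketch conditions only on the single previous word, whereas an LLM conditions on the whole preceding sequence, and the four-sentence corpus stands in for "all text on the internet":

```python
from collections import Counter, defaultdict

# Tiny stand-in corpus for "all text on the internet".
corpus = [
    "the capital of france is paris",
    "the capital of italy is rome",
    "he added milk to his coffee",
    "he added milk to his tea",
]

# Count bigrams: how often each word w_n follows each word w_{n-1}.
counts = defaultdict(Counter)
for sentence in corpus:
    words = sentence.split()
    for prev, nxt in zip(words, words[1:]):
        counts[prev][nxt] += 1

def next_word_distribution(prev):
    """Estimate P(w_n | w_{n-1}) from the bigram counts."""
    c = counts[prev]
    total = sum(c.values())
    return {w: n / total for w, n in c.items()}

print(next_word_distribution("his"))  # {'coffee': 0.5, 'tea': 0.5}
```

An LLM replaces these raw counts with a neural network that generalizes to sequences it has never seen, but the prediction target is the same distribution.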

LLMs through an API

Example: ChatGPT

Server: OpenAI API server.

Client: Me, on the ChatGPT App.

Example: Claude

Server: Anthropic API server.

Client: Me, on the Claude.ai App.

Example: Claude

Server: Anthropic API server.

Client: Me, at the terminal.

Me, at the terminal

curl https://api.anthropic.com/v1/messages \
  -H "x-api-key: $ANTHROPIC_API_KEY" \
  -H "anthropic-version: 2023-06-01" \
  -H "Content-Type: application/json" \
  -d '{
        "model": "claude-haiku-4-5-20251001",
        "max_tokens": 100,
        "messages": [
          {
            "role": "user",
            "content": "The capital of France is"
          }
        ]
      }'

Me, at the terminal

{
  "model": "claude-haiku-4-5-20251001",
  "id": "msg_012mMQuMNGPFXNZqookPvuu9",
  "type": "message",
  "role": "assistant",
  "content": [
    {
      "type": "text",
      "text": "The capital of France is **Paris**."
    }
  ],
  "stop_reason": "end_turn",
  "stop_sequence": null,
  "usage": {
    "input_tokens": 12,
    "cache_creation_input_tokens": 0,
    "cache_read_input_tokens": 0,
    "cache_creation": {
      "ephemeral_5m_input_tokens": 0,
      "ephemeral_1h_input_tokens": 0
    },
    "output_tokens": 11,
    "service_tier": "standard"
  }
}
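In a script, the interesting part of this response is the text inside the content array. A minimal Python sketch of pulling it out (response body abridged to the relevant fields):

```python
import json

# Abridged response body from the Messages API call above.
body = (
    '{"role": "assistant",'
    ' "content": [{"type": "text",'
    ' "text": "The capital of France is **Paris**."}],'
    ' "stop_reason": "end_turn"}'
)
reply = json.loads(body)

# The generated text lives in the first content block.
text = reply["content"][0]["text"]
print(text)  # The capital of France is **Paris**.
```

This parsing is exactly the kind of boilerplate that client libraries handle for you.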

The ellmer R package

What it does:

  • Simplifies making requests to many different LLM APIs.
  • Handles authentication and response parsing.
  • Offers functions for common tasks.

Me, in R

I’ll demonstrate with Claude: “claude-haiku-4-5-20251001”

library(ellmer)

# chat_claude() reads the API key from the ANTHROPIC_API_KEY
# environment variable, just like the curl example above.
claude <- chat_claude(model = "claude-haiku-4-5-20251001")
claude$chat(
  "The capital of France is"
)