# Transformer Foundations

## About

Before fine-tuning or deploying a language model, it helps to understand what's actually happening inside. This tutorial runs three interactive labs (tokenization, attention visualization, and embedding analysis) and generates BertViz HTML visualizations that you can open locally after the job completes.

Running this on Ocean Network means you get consistent, reproducible results on clean hardware without managing Python environments or dealing with local GPU memory limits.

***

### What the Tutorial Covers

**Attention & Self-Attention**

The QKV formulation — queries, keys, and values — and how scaled dot-product attention works mathematically. Why the `√dₖ` scaling matters. How multi-head attention lets different heads specialize in syntax, semantics, and co-reference simultaneously.
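
As a rough illustration of the formula `softmax(QKᵀ / √dₖ) V`, here is a minimal pure-Python sketch. It is not taken from the tutorial's code; the function names and the toy `Q`, `K`, `V` values are made up for this example. The scaling step is there because dot products of longer vectors tend to be larger in magnitude, which would otherwise push the softmax toward near one-hot weights.

```python
import math

def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def scaled_dot_product_attention(Q, K, V):
    """Compute softmax(Q K^T / sqrt(d_k)) V for small list-of-list matrices."""
    d_k = len(K[0])
    scale = 1.0 / math.sqrt(d_k)
    weights = []
    for q in Q:
        # One score per key: scaled dot product between this query and the key.
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in K]
        weights.append(softmax(scores))
    d_v = len(V[0])
    # Each output row is a weighted average of the value rows.
    output = [
        [sum(w * v[j] for w, v in zip(row, V)) for j in range(d_v)]
        for row in weights
    ]
    return output, weights

# Toy example: 3 tokens, d_k = d_v = 2 (values are illustrative only).
Q = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
K = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
V = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]

output, weights = scaled_dot_product_attention(Q, K, V)
for row in weights:
    print([round(w, 3) for w in row])  # each row sums to 1
```

Multi-head attention would run several such computations in parallel on different learned projections of the input and concatenate the results; that part is omitted here.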

**Architectures**

Side-by-side comparison of encoder-only, decoder-only, and encoder-decoder transformer families, with representative models for each. What causal masking does and why decoder-only models need it. How cross-attention works in sequence-to-sequence models.
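
To make causal masking concrete, the sketch below builds a lower-triangular mask and applies it before the softmax, so position `i` can only attend to positions `j <= i`. This is an assumed, simplified illustration (the helper names `causal_mask` and `masked_softmax` are hypothetical and the scores are all zero), not the tutorial's implementation.

```python
import math

def causal_mask(n):
    """Lower-triangular mask: position i may attend to positions j <= i."""
    return [[j <= i for j in range(n)] for i in range(n)]

def masked_softmax(scores, mask):
    """Row-wise softmax where disallowed positions receive zero weight."""
    out = []
    for row, allowed in zip(scores, mask):
        # Setting a score to -inf makes exp(score) exactly 0 after softmax.
        masked = [s if ok else -math.inf for s, ok in zip(row, allowed)]
        m = max(masked)  # finite, because the diagonal is always allowed
        exps = [math.exp(s - m) for s in masked]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out

# With uniform scores, each token spreads its attention evenly
# over itself and the tokens before it, and never looks ahead.
scores = [[0.0] * 4 for _ in range(4)]
weights = masked_softmax(scores, causal_mask(4))
for row in weights:
    print(row)
```

Without the mask, a decoder-only model trained on next-token prediction could read the answer directly from later positions, which is why the mask is needed during training.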

**Tokenization**

Byte-Pair Encoding (BPE), WordPiece, and SentencePiece — how each algorithm works, which models use which, and what the trade-offs are. Special tokens and why they exist.
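
The core BPE training loop can be sketched in a few lines: count adjacent symbol pairs across a word-frequency table, merge the most frequent pair, and repeat. The version below is a toy illustration under simplifying assumptions (character-level start, an end-of-word marker `</w>`, ties broken by first occurrence); the function names and the word counts are invented for the example, and production tokenizers differ in many details.

```python
from collections import Counter

def get_pair_counts(vocab):
    """Count adjacent symbol pairs, weighted by word frequency."""
    pairs = Counter()
    for symbols, freq in vocab.items():
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return pairs

def merge_pair(pair, vocab):
    """Replace every occurrence of `pair` with a single merged symbol."""
    merged = {}
    for symbols, freq in vocab.items():
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        key = tuple(out)
        merged[key] = merged.get(key, 0) + freq
    return merged

def learn_bpe(word_freqs, num_merges):
    """Learn up to `num_merges` merge rules from a word-frequency dict."""
    # Start from characters, with an end-of-word marker on each word.
    vocab = {tuple(word) + ("</w>",): freq for word, freq in word_freqs.items()}
    merges = []
    for _ in range(num_merges):
        pairs = get_pair_counts(vocab)
        if not pairs:
            break
        best = max(pairs, key=pairs.get)  # ties: first pair seen wins
        vocab = merge_pair(best, vocab)
        merges.append(best)
    return merges, vocab

# Illustrative counts only.
word_freqs = {"low": 5, "lower": 2, "newest": 6, "widest": 3}
merges, vocab = learn_bpe(word_freqs, num_merges=3)
print(merges)  # [('e', 's'), ('es', 't'), ('est', '</w>')]
```

On this toy corpus the frequent suffix `est` is assembled step by step into a single token, which is the behavior that lets BPE represent rare words as a few common subword pieces.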

**Pretraining Objectives**

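MLM (BERT), causal LM (GPT family), span corruption (T5), and ELECTRA's replaced token detection. Why ELECTRA is 4× more compute-efficient than BERT despite a similar architecture: its discriminator gets a training signal from every input token, not only from the masked subset.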
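
The difference between the MLM and causal LM objectives comes down to how inputs and labels are built from the same token sequence. The sketch below is illustrative only: `mask_for_mlm` and `causal_lm_pairs` are hypothetical helper names, the replacement vocabulary is made up, and the 15% selection rate with an 80/10/10 split (mask, random token, unchanged) follows the scheme described for BERT.

```python
import random

def mask_for_mlm(tokens, vocab, mask_prob=0.15, seed=0):
    """Build MLM inputs and labels; labels are None where no loss is computed."""
    rng = random.Random(seed)
    inputs, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            labels.append(tok)  # the model must recover the original token
            r = rng.random()
            if r < 0.8:
                inputs.append("[MASK]")           # 80%: replace with mask token
            elif r < 0.9:
                inputs.append(rng.choice(vocab))  # 10%: replace with random token
            else:
                inputs.append(tok)                # 10%: keep the token unchanged
        else:
            inputs.append(tok)
            labels.append(None)  # position ignored by the loss
    return inputs, labels

def causal_lm_pairs(tokens):
    """Causal LM: predict each token from the ones before it (shift by one)."""
    return tokens[:-1], tokens[1:]

tokens = ["the", "cat", "sat", "on", "the", "mat"]
vocab = ["dog", "ran", "under", "a", "rug"]  # made-up replacement vocabulary

mlm_inputs, mlm_labels = mask_for_mlm(tokens, vocab, mask_prob=0.5, seed=1)
print(mlm_inputs)
print(mlm_labels)
print(causal_lm_pairs(tokens))
```

Note that MLM computes a loss only at the selected positions, while the causal objective yields a prediction target at every position.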

**Model Landscape**

Timeline from the original Transformer (2017) through BERT, GPT-3, LLaMA 3, and Mixtral, with architectural details for each major family.

***

### Labs

| Lab                 | What it produces                                                |
| ------------------- | --------------------------------------------------------------- |
| Lab 1: Tokenization | Tokenization breakdown across BPE, WordPiece, and SentencePiece |
| Lab 2: Attention    | BertViz HTML files — per-head attention and all-layers overview |
| Lab 3: Embeddings   | Embedding cluster visualization (PCA-reduced, saved as PNG)     |
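
For readers unfamiliar with the PCA reduction mentioned for Lab 3, the following is a small dependency-free sketch of the idea: center the data, find the directions of greatest variance, and project onto the top two. It uses power iteration with deflation purely for illustration; the tutorial itself may well rely on a library routine, and all function names and the toy data here are assumptions.

```python
import math
import random

def center(X):
    """Subtract the per-column mean from every row."""
    n, d = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(d)]
    return [[row[j] - means[j] for j in range(d)] for row in X]

def covariance(Xc):
    """Sample covariance matrix (d x d) of already-centered rows."""
    n, d = len(Xc), len(Xc[0])
    return [[sum(r[i] * r[j] for r in Xc) / (n - 1) for j in range(d)]
            for i in range(d)]

def matvec(C, v):
    """Matrix-vector product for list-of-list matrices."""
    return [sum(c * x for c, x in zip(row, v)) for row in C]

def top_eigenpair(C, iters=500, seed=0):
    """Power iteration: dominant eigenvalue and unit eigenvector of C."""
    rng = random.Random(seed)
    v = [rng.gauss(0.0, 1.0) for _ in C]
    for _ in range(iters):
        w = matvec(C, v)
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigval = sum(a * b for a, b in zip(v, matvec(C, v)))
    return eigval, v

def pca_project(X, k=2):
    """Project rows of X onto their top-k principal components."""
    Xc = center(X)
    C = covariance(Xc)
    components = []
    for _ in range(k):
        eigval, v = top_eigenpair(C)
        components.append(v)
        # Deflation: remove the direction just found so the next pass
        # converges to the next-largest direction of variance.
        C = [[C[i][j] - eigval * v[i] * v[j] for j in range(len(v))]
             for i in range(len(v))]
    return [[sum(a * b for a, b in zip(row, v)) for v in components]
            for row in Xc]

# Made-up 3-D "embeddings" for six tokens, reduced to 2-D points.
X = [[float(i), 2.0 * i, float((-1) ** i)] for i in range(6)]
points_2d = pca_project(X, k=2)
for p in points_2d:
    print([round(c, 3) for c in p])
```

The resulting 2-D points are what a scatter plot of embedding clusters would draw; real embeddings have hundreds of dimensions, but the procedure is the same in principle.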

***

### Hardware Requirements

| Resource | Requirement                                     |
| -------- | ----------------------------------------------- |
| GPU      | Optional (CPU is sufficient for all three labs) |
| Runtime  | 10–30 minutes                                   |

This is a good starting point if you want to test your Ocean Orchestrator setup on a free compute job before moving to GPU workloads.

***

### Run It on Ocean Network

1. **Clone the repo**

   ```bash
   git clone https://github.com/oceanprotocol/oncompute-tutorials
   ```
2. **Open the `Machine Learning Foundations and Introduction to LLMs/Transformer foundations/` folder** in Ocean Orchestrator.
3. **Select a node** at [dashboard.oncompute.ai](https://dashboard.oncompute.ai/). A CPU node works for all three labs. Use **Start Compute Job** to run the job.
4. **Run individual labs** by passing flags, or run all three at once:

   ```bash
   python transformer_foundations.py         # all labs
   python transformer_foundations.py --lab 1 # tokenization only
   python transformer_foundations.py --lab 2 # attention only
   python transformer_foundations.py --lab 3 # embeddings only
   ```
5. **Download results** — HTML attention visualizations and PNG embedding clusters download to your `results/` folder when the job completes.

**Tutorial source:** [github.com/oceanprotocol/oncompute-tutorials/tree/main/Machine%20Learning%20Foundations%20and%20Introduction%20to%20LLMs/Transformer%20foundations](https://github.com/oceanprotocol/oncompute-tutorials/tree/main/Machine%20Learning%20Foundations%20and%20Introduction%20to%20LLMs/Transformer%20foundations)

