# LLM Fine-Tuning

## About

Fine-tuning large language models requires GPU memory that most developers don't have on their local machines. These tutorials let you run the full fine-tuning pipeline on Ocean Network's GPU nodes: pay for the compute you use, get the results back, and keep nothing running when you're done.

Two tutorials are available: one for **encoder-only models** (BERT family: classification, NER, embeddings) and one for **decoder-only models** (LLMs: instruction following, text generation). Both use Parameter-Efficient Fine-Tuning (PEFT) with LoRA, which trains only small adapter matrices while the base weights stay frozen. That cuts GPU memory requirements enough to make billion-parameter models trainable on a single GPU.

***

### Encoder Fine-Tuning

Encoder-only models (BERT, RoBERTa, DeBERTa) produce rich bidirectional representations. Fine-tuning them is the standard approach for classification, named entity recognition, semantic similarity, and token-level tasks.

**What the tutorial covers:**

* Bidirectional self-attention and how it differs from decoder architectures
* Attaching sequence-level and token-level classification heads to a shared encoder backbone
* Multi-task learning: one backbone, multiple task heads, combined loss
* Masked answer prediction — framing QA as a reconstruction task using pretrained knowledge
* Full training loop with PyTorch Lightning, evaluation with Hugging Face `evaluate`
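
The "one backbone, multiple task heads, combined loss" idea from the list above can be sketched without any framework, so the arithmetic stays visible. This is a minimal illustration, not the tutorial's code: the function names, the loss weights, and the toy logits are all assumptions, and a real training loop would compute the same quantities with PyTorch tensors.

```python
import math

def cross_entropy(logits, target):
    """Negative log-likelihood of `target` under softmax(logits)."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum_exp - logits[target]

def multitask_loss(seq_logits, seq_label, tok_logits, tok_labels,
                   seq_weight=1.0, tok_weight=1.0, ignore_index=-100):
    """Weighted sum of a sequence-level loss and a mean token-level loss.

    Both sets of logits are assumed to come from heads that share one
    encoder backbone, so a single backward pass updates the backbone
    with gradients from both tasks.
    """
    seq_loss = cross_entropy(seq_logits, seq_label)
    # Padding and special tokens carry ignore_index and are not scored.
    scored = [(l, y) for l, y in zip(tok_logits, tok_labels) if y != ignore_index]
    tok_loss = sum(cross_entropy(l, y) for l, y in scored) / max(len(scored), 1)
    return seq_weight * seq_loss + tok_weight * tok_loss

# Hypothetical toy batch: a 2-class sequence head and a 3-class token head.
seq_logits = [2.0, 0.5]
tok_logits = [[0.1, 0.2, 3.0], [1.5, 0.3, 0.2], [0.0, 0.0, 0.0]]
tok_labels = [2, 0, -100]  # the last position is padding

loss = multitask_loss(seq_logits, 0, tok_logits, tok_labels)
print(f"combined loss: {loss:.4f}")
```

The weights `seq_weight` and `tok_weight` are the usual knob for balancing tasks: if one head's loss dominates, lowering its weight keeps it from drowning out the other task's gradient.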

**Hardware:** A single mid-range GPU (A100 or equivalent) is sufficient for most encoder fine-tuning jobs.

**Tutorial source:** [github.com/oceanprotocol/oncompute-tutorials/tree/main/Deep%20Learning%20and%20Large%20Language%20Models%20%E2%80%94%20Advanced%20Topics/Encoder%20Fine-Tuning](https://github.com/oceanprotocol/oncompute-tutorials/tree/main/Deep%20Learning%20and%20Large%20Language%20Models%20%E2%80%94%20Advanced%20Topics/Encoder%20Fine-Tuning)

***

### Decoder Fine-Tuning

Decoder-only models (LLaMA, Mistral, Qwen, Phi) are the architecture behind modern chat and instruction-following systems. Full fine-tuning of these models is typically out of reach without expensive multi-GPU setups. LoRA changes that.

**What the tutorial covers:**

* The autoregressive decoder architecture and causal masking
* Cross-entropy loss over completion tokens only (prompt masking with `-100` labels)
* Why PEFT exists: the jump from 110M-parameter BERT to 7B+ parameter LLMs
* Low-Rank Adaptation (LoRA): injecting trainable low-rank matrices into frozen transformer layers so the base model never changes
* Training a LoRA adapter that can be stored, shared, and swapped independently of the base model
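
Prompt masking, mentioned in the list above, comes down to how the `labels` sequence is built. The sketch below assumes already-tokenized inputs. The helper name `build_example` and the token IDs are hypothetical, and the tutorial's own preprocessing may differ. The value `-100` is the default `ignore_index` of PyTorch's cross-entropy loss, so positions labeled with it contribute nothing to the loss.

```python
IGNORE_INDEX = -100  # default ignore_index of PyTorch's cross-entropy loss

def build_example(prompt_ids, completion_ids):
    """Concatenate prompt and completion, masking the prompt in the labels.

    The model still attends to the prompt tokens as input, but the loss
    is computed only over the completion tokens.
    """
    input_ids = list(prompt_ids) + list(completion_ids)
    labels = [IGNORE_INDEX] * len(prompt_ids) + list(completion_ids)
    return {"input_ids": input_ids, "labels": labels}

# Hypothetical token IDs, for illustration only.
example = build_example(prompt_ids=[101, 7592, 2088],
                        completion_ids=[2023, 2003, 102])
print(example["input_ids"])
print(example["labels"])
```

Note that `labels` stays aligned position by position with `input_ids`. Hugging Face causal language models typically perform the one-token shift between inputs and targets internally, so the preprocessing step usually does not need to shift anything itself.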

**Hardware:** A single H100 or H200 GPU. Because LoRA trains only the adapter weights, fine-tuning a 7B model takes roughly the same VRAM as fully fine-tuning a 1B model.
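
The VRAM comparison above can be checked with back-of-the-envelope arithmetic. The sketch below rests on stated assumptions rather than measurements: mixed-precision Adam at about 16 bytes per trainable parameter (2 for half-precision weights, 2 for gradients, 4 for an fp32 master copy, 8 for the two Adam moments), frozen base weights held in 16-bit precision, and a hypothetical 7B model with 32 layers and four 4096 x 4096 attention projections per layer adapted at rank 16. Activation memory is ignored, so real usage will be higher on both sides.

```python
def lora_params(d_in, d_out, rank):
    """Trainable parameters LoRA adds to one d_out x d_in weight matrix.

    LoRA learns A (rank x d_in) and B (d_out x rank); the frozen weight W
    is used as W + B @ A (scaled), so only A and B need gradients.
    """
    return rank * (d_in + d_out)

def full_finetune_gb(n_params, bytes_per_param=16):
    """Weights + gradients + optimizer state when every parameter trains."""
    return n_params * bytes_per_param / 1e9

def lora_finetune_gb(n_base, n_adapter, base_bytes=2, adapter_bytes=16):
    """Frozen 16-bit base weights plus fully trained adapter parameters."""
    return (n_base * base_bytes + n_adapter * adapter_bytes) / 1e9

# Hypothetical 7B configuration (assumed shapes, not a specific model).
layers, projections, hidden, rank = 32, 4, 4096, 16
adapter = layers * projections * lora_params(hidden, hidden, rank)

print(f"adapter parameters:   {adapter:,}")
print(f"full fine-tune, 1B:   {full_finetune_gb(1e9):.1f} GB")
print(f"LoRA fine-tune, 7B:   {lora_finetune_gb(7e9, adapter):.1f} GB")
```

Under these assumptions the adapter is a fraction of a percent of the base model, and the 7B LoRA run is dominated by the frozen weights themselves, which is why the two footprints land in the same range.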

**Tutorial source:** [github.com/oceanprotocol/oncompute-tutorials/tree/main/Deep%20Learning%20and%20Large%20Language%20Models%20%E2%80%94%20Advanced%20Topics/Decoder%20Fine-Tuning](https://github.com/oceanprotocol/oncompute-tutorials/tree/main/Deep%20Learning%20and%20Large%20Language%20Models%20%E2%80%94%20Advanced%20Topics/Decoder%20Fine-Tuning)

***

### Hardware Requirements

| Resource | Encoder                              | Decoder                                 |
| -------- | ------------------------------------ | --------------------------------------- |
| GPU      | A100 40GB or equivalent              | H100 80GB or H200 141GB                 |
| Runtime  | 30 min – 2 hours (dataset dependent) | 1–6 hours (model and dataset dependent) |

***

### Run It on Ocean Network

1. **Clone the repo**

   ```bash
   git clone https://github.com/oceanprotocol/oncompute-tutorials
   ```
2. **Open the tutorial folder** (`Encoder Fine-Tuning/` or `Decoder Fine-Tuning/`) in Ocean Orchestrator.
3. **Browse nodes** at [dashboard.oncompute.ai](https://dashboard.oncompute.ai/) and select a GPU node that meets the hardware requirements above.
4. **Start the job** with **Start Compute Job**. The container installs PyTorch, Transformers, PEFT, and your dependencies, then runs the training script.
5. **Download results** — model checkpoints, evaluation metrics, and logs download to your `results/` folder when the job completes.

**General fine-tuning reference:** [github.com/oceanprotocol/oncompute-tutorials/blob/main/Deep%20Learning%20and%20Large%20Language%20Models%20%E2%80%94%20Advanced%20Topics/General%20Fine-Tuning%20and%20Model%20Usage%20Info.md](https://github.com/oceanprotocol/oncompute-tutorials/blob/main/Deep%20Learning%20and%20Large%20Language%20Models%20%E2%80%94%20Advanced%20Topics/General%20Fine-Tuning%20and%20Model%20Usage%20Info.md)

