How-To

Is Your PC Ready to Run Open-Source AI Models Like GPT‑OSS?

August 8, 2025


Open-source AI is no longer confined to the cloud. With the release of powerful models like GPT-OSS-20B and GPT-OSS-120B, the ability to run advanced AI tools locally on your machine is now a reality. For researchers, developers, and professionals alike, this marks a major shift in how we deploy and experiment with AI.

Running a large language model offline means complete control—no API costs, no throttling, and no privacy concerns. But it also demands serious hardware.


About OpenAI’s GPT‑OSS‑20B and GPT‑OSS‑120B

In August 2025, OpenAI introduced two open-weight language models: GPT‑OSS‑20B and GPT‑OSS‑120B. This marks the company’s first release of publicly accessible model weights since GPT‑2 and represents a renewed commitment to transparency and open development in AI.

Both models were designed with accessibility and real-world usage in mind. They offer strong reasoning capabilities, efficient performance, and built-in safety measures—all while being open and free to use commercially under the Apache 2.0 license.

GPT‑OSS‑20B is a 21-billion parameter model that uses a Mixture-of-Experts (MoE) architecture. Despite its size, only about 3.6 billion parameters are active for any given token, significantly reducing its hardware demands. It can run on modern GPUs with as little as 16 GB of VRAM, making it practical for local development on desktops, laptops, or edge devices.
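As a rough sanity check on that 16 GB figure, you can estimate the weight footprint from the parameter count and numeric precision. The sketch below is ours, and the 4-bit quantization it assumes for the released weights is an approximation; real loaders add overhead for activations and the KV cache:

```python
def weight_footprint_gb(params_billion: float, bits_per_param: float) -> float:
    """Approximate memory needed just to hold the model weights, in GB."""
    bytes_total = params_billion * 1e9 * bits_per_param / 8
    return bytes_total / 1e9

# GPT-OSS-20B has roughly 21 billion parameters.
print(round(weight_footprint_gb(21, 16), 1))  # fp16: 42.0 GB -- far beyond a 16 GB card
print(round(weight_footprint_gb(21, 4), 1))   # 4-bit: 10.5 GB -- fits in 16 GB of VRAM
```

The same arithmetic explains why the 117-billion-parameter GPT-OSS-120B lands in the 80 GB class of GPUs rather than on consumer cards.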

GPT‑OSS‑120B is a larger, more powerful model with 117 billion parameters. It uses the same efficient MoE inference setup and is designed to run on systems with a single 80 GB GPU. In terms of performance, it’s comparable to OpenAI’s o4-mini model and is well-suited for high-performance enterprise or research environments.

Both models support up to 128,000 tokens of context, use Rotary Positional Embeddings (RoPE), and feature grouped multi-query attention for faster, more scalable inference. They are pre-trained and aligned using a mix of supervised fine-tuning and reinforcement learning, offering capabilities like chain-of-thought reasoning, structured output, and tool integration.
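The 128,000-token context is where grouped multi-query attention pays off: the key/value cache grows with the number of KV heads, not the number of query heads. A back-of-the-envelope sketch (the layer count, head count, and head dimension below are illustrative placeholders, not the published GPT‑OSS configuration):

```python
def kv_cache_gb(layers: int, kv_heads: int, head_dim: int,
                seq_len: int, bytes_per_value: int = 2) -> float:
    """Size of the KV cache: two tensors (keys and values) per layer."""
    return 2 * layers * kv_heads * head_dim * seq_len * bytes_per_value / 1e9

# Hypothetical 64-query-head model at the full 128k context.
full_mha = kv_cache_gb(layers=36, kv_heads=64, head_dim=64, seq_len=128_000)
grouped  = kv_cache_gb(layers=36, kv_heads=8,  head_dim=64, seq_len=128_000)
print(f"full attention: {full_mha:.1f} GB, grouped (8 KV heads): {grouped:.1f} GB")
```

Sharing each KV head across a group of query heads cuts the cache by the grouping factor, which is what makes very long contexts feasible on a single GPU.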

Safety has also been a major focus. OpenAI subjected both models to adversarial testing and red-teaming exercises, ensuring they meet internal safety standards and are less likely to produce harmful outputs—even when deliberately fine-tuned toward risky behaviors.

For developers and organizations, this opens the door to running advanced language models locally, without relying on cloud services or incurring usage fees. You get full control over your data, faster iteration cycles, and the ability to customize the models for your own tools, agents, or products.

When paired with a high-performance local workstation—such as Newegg’s ABS Zaurion series, equipped with RTX PRO 6000 Blackwell GPUs—these models become not just possible, but practical. Whether you’re fine-tuning custom assistants, deploying internal AI copilots, or simply experimenting with open models in a secure offline environment, GPT‑OSS and Zaurion together represent a powerful new foundation for local AI development.


What It Takes to Run GPT‑OSS Locally

GPT-OSS-20B typically requires 16 GB or more of GPU VRAM and at least 64 GB of system RAM. GPT-OSS-120B is significantly more demanding, calling for around 80 to 96 GB of GPU memory and 128 GB or more of system memory. In short, most consumer desktops aren’t equipped for this kind of workload.
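Those thresholds are easy to turn into a quick readiness check. This sketch hard-codes the figures quoted above; the helper name and structure are ours, and on a real system you would feed it numbers read from nvidia-smi and your OS rather than typing them in:

```python
# Minimum VRAM/RAM figures quoted in this article (GB).
REQUIREMENTS = {
    "gpt-oss-20b": {"vram_gb": 16, "ram_gb": 64},
    "gpt-oss-120b": {"vram_gb": 80, "ram_gb": 128},
}

def pc_ready(model: str, vram_gb: float, ram_gb: float) -> bool:
    """True if the machine meets the minimums for the given model."""
    req = REQUIREMENTS[model]
    return vram_gb >= req["vram_gb"] and ram_gb >= req["ram_gb"]

print(pc_ready("gpt-oss-20b", vram_gb=24, ram_gb=64))    # True -- e.g. a 24 GB consumer GPU
print(pc_ready("gpt-oss-120b", vram_gb=24, ram_gb=128))  # False -- needs ~80 GB of GPU memory
```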


Why ABS Zaurion Workstations Are the Ideal Match

The ABS Zaurion workstations are built with professional creators and developers in mind. They pair top-tier components with Newegg’s expert system integration to deliver systems ready for offline inference and creative workloads.

Key specs of RTX PRO 6000 Blackwell include:

  • 96 GB GDDR7 ECC VRAM – Easily exceeds the memory demands of GPT-OSS-20B and supports multi-GPU scaling for GPT-OSS-120B.

  • Blackwell Tensor Cores – Accelerate inference for transformer-based models with improved throughput and energy efficiency.

  • Configurable platforms – Choose from AMD Threadripper PRO or Intel Xeon CPUs, with 128 GB or more of memory and storage arrays optimized for read/write-heavy AI tasks.

Whether you’re deploying LLMs for coding assistants, research, or custom chatbot projects, the Zaurion series offers future-proof hardware tuned for your needs.

Suggested Configurations

  • Lightweight Local AI (20B model) – 1x RTX PRO 6000 Blackwell, 128 GB RAM, Ryzen Threadripper or Xeon W5
  • Power AI Dev (120B model) – Dual RTX PRO 6000 GPUs, 128 GB RAM, AMD WRX90 or Xeon W7 CPU
  • AI + Creative Workflows – 1–2 Blackwell GPUs, 128 GB RAM, 4 TB+ SSD + RAID, GPU-accelerated render apps

Zaurion Ruby is ideal for users who want to run GPT-OSS-20B or work on smaller fine-tuned models. It features a single RTX PRO 6000 Blackwell GPU, 128 GB of ECC RAM, and either a Threadripper PRO or Xeon W7 processor.

Zaurion Aqua is built for the heavyweights—those who want to tackle GPT-OSS-120B, run multiple models in parallel, or combine AI with rendering. Dual RTX PRO 6000 cards, 256 GB RAM, and enterprise-level CPU options deliver uncompromised performance.

ABS Zaurion Ruby


Use Cases

These workstations aren’t just about running benchmarks—they enable real work with open models like GPT‑OSS‑20B and GPT‑OSS‑120B. Here are some real-world scenarios where these models—and the workstations that run them—are already being used or tested:

  • AI developers building local copilots or chatbots
    Developers are running GPT‑OSS‑20B on workstation GPUs to power personal coding assistants and offline chatbots, fully customized for their tools and environments.
  • Security-sensitive organizations keeping data in-house
    Healthcare, finance, and legal teams are deploying GPT‑OSS models on-premises to process sensitive documents and client communications—without sending data to the cloud.
  • Research labs needing open model access and reproducibility
    Academic researchers are fine-tuning GPT‑OSS‑20B on their own compute clusters for multilingual NLP studies and algorithmic bias testing, with full transparency of model behavior.
  • Studios mixing real-time AI with 3D or VFX workloads
    Film and game development studios are experimenting with GPT‑OSS models to automate narrative scripting, generate NPC dialog trees, or assist with texture naming and documentation—all done locally.
  • LLM-powered devtools and embedded agents
    Some open-source IDE plugins now integrate GPT‑OSS‑20B to provide autocomplete and bug explanation offline. These models can also power on-device assistant agents for customer support or training use.
  • Internal knowledge assistants for enterprises
    Companies are pairing GPT‑OSS‑20B with retrieval-augmented generation (RAG) pipelines using tools like LangChain and Haystack, creating in-house Q&A systems that reference private documents.
  • Edge AI deployments for offline environments
    Military, industrial, and field research teams are testing GPT‑OSS‑20B on compact servers and rugged workstations to enable AI reasoning without internet access.
  • AI + Simulation workflows
    In high-end engineering and simulation labs, GPT‑OSS is used to interpret simulation output or automate documentation generation in aerospace, automotive, and biotech fields.
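The retrieval-augmented pattern mentioned above is simple enough to sketch without any framework: find the most relevant private document, then prepend it to the prompt so the local model answers from it. This toy version scores documents by word overlap where LangChain or Haystack would use vector embeddings; the document store and helpers here are illustrative, not either library's API:

```python
DOCS = [
    "Our VPN requires the corporate certificate installed under device settings.",
    "Expense reports are due on the 5th of each month via the finance portal.",
    "The on-call rotation is published every Friday in the operations channel.",
]

def retrieve(query: str, docs: list[str]) -> str:
    """Pick the document sharing the most words with the query (toy scoring)."""
    q = set(query.lower().split())
    return max(docs, key=lambda d: len(q & set(d.lower().split())))

def build_prompt(query: str) -> str:
    """Prepend the retrieved context so a local model can answer from private data."""
    context = retrieve(query, DOCS)
    return f"Context: {context}\n\nQuestion: {query}\nAnswer using only the context."

print(build_prompt("When are expense reports due?"))
```

The assembled prompt is then sent to the locally hosted model, so sensitive documents never leave the machine.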


Newegg’s Role in the AI Hardware Revolution

As AI goes open and local, Newegg is here to support developers, researchers, and creators with workstations that match the moment. Our ABS workstation line merges professional-grade system integration with the most powerful GPUs available today.

Start building your offline AI lab today: Shop ABS Zaurion Blackwell Workstations.

Author Jamie Cooper
