OpenClaude + Nemotron: A Free, Terminal-Based AI Coding Agent on macOS

How I set up OpenClaude with NVIDIA Nemotron 120B as the backend. A free, local-CLI alternative to Claude Code that runs without a GPU and without a subscription.

Niccolò Mascaro · April 23, 2026 · 8 min read

[Image: OpenClaude CLI banner running with NVIDIA Nemotron 120B]

I wanted something that looked and felt like Claude Code, but without the subscription and without downloading a 40GB model to a laptop that is already fighting for disk space. After a bit of trial and error I landed on a combination that actually works: OpenClaude as the terminal agent, NVIDIA Nemotron 3 Super 120B as the language model, served for free through the NVIDIA Build API.

OpenClaude is essentially a thin fork of Claude Code (same -p non-interactive mode, same tool system, same working-directory sandboxing), with the OpenAI-compatible provider path turned on. So when you run it, you get the Claude Code UX, and the brain behind it happens to be a 120B NVIDIA model instead of Anthropic's.

This post is the exact recipe I used, the dead ends I hit, and a small NestJS microservice I built with it as a smoke test.

Tested on macOS (Apple Silicon) with OpenClaude v0.1.8 and Nemotron 3 Super 120B. April 2026.

What this setup gives you

  • A terminal-based AI coding agent, similar in spirit to Claude Code.
  • Powered by NVIDIA's Nemotron 120B model via their free inference API.
  • File reading, code generation, shell commands, all driven from your terminal.
  • Zero cost for dev/testing usage. No local GPU. ~100MB of disk.

All the heavy inference happens on NVIDIA's servers. Your laptop just runs a thin CLI.

Step 1: Install the prerequisites

You need Node.js, ripgrep, and git:

brew install node ripgrep git

A gotcha that cost me half an hour: OpenClaude silently crashes without ripgrep. If you installed Node through nvm, you still have to install rg separately through Homebrew. Also: if you have Claude Code installed, it ships a shell alias named rg that can shadow the real binary, so make sure the package is actually on your PATH.
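A quick way to check which `rg` you're actually getting (a sketch; inside a script, `command -v` already bypasses interactive aliases, so it reports the real binary):

```shell
# Sanity check: does rg resolve to a real binary, or is it missing?
# Typical Homebrew locations are /opt/homebrew/bin or /usr/local/bin.
if command -v rg >/dev/null 2>&1; then
  echo "rg resolves to: $(command -v rg)"
else
  echo "rg not found: run 'brew install ripgrep'"
fi
```

In an interactive shell, `type rg` will additionally reveal whether a Claude Code alias is shadowing the binary.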

Verify:

node --version   # v18+
rg --version
git --version

Step 2: Install OpenClaude

npm install -g @gitlawb/openclaude
openclaude --version

Step 3: Get an NVIDIA API key

Go straight to the NVIDIA API keys page, no hunting required:

  1. Open build.nvidia.com/settings/api-keys.
  2. Log in or create a free account.
  3. Click Generate API Key. The key will start with nvapi-. Copy it somewhere safe.

One important distinction: do not go to the NemoClaw / managed VM product. That is a paid deployment offering. You want the API Catalog / Build section, which hands out API keys for free inference against the Nemotron family.
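Before wiring the key into OpenClaude, it's worth smoke-testing it directly. `/v1/models` is the standard listing endpoint on OpenAI-compatible APIs; `NVIDIA_API_KEY` is just my variable name for this check, not something the setup requires:

```shell
# Smoke-test the key against NVIDIA's OpenAI-compatible endpoint.
# NVIDIA_API_KEY is a scratch variable for this check only.
if [ -z "${NVIDIA_API_KEY:-}" ]; then
  echo "export NVIDIA_API_KEY=nvapi-... first"
else
  curl -s https://integrate.api.nvidia.com/v1/models \
    -H "Authorization: Bearer $NVIDIA_API_KEY" | head -c 300
fi
```

If the key is valid you should get back a JSON list of model IDs rather than an authorization error.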

[Image: Creating an API key on build.nvidia.com/settings/api-keys]

Step 4: Configure the environment

OpenClaude speaks the OpenAI-compatible protocol. The variable names say "OPENAI" but the values will point at NVIDIA. That's fine, because NVIDIA's endpoint speaks the same dialect.

Add to your shell config (replace nvapi-xxxx with your real key):

echo 'export CLAUDE_CODE_USE_OPENAI=1' >> ~/.zshrc
echo 'export OPENAI_API_KEY="nvapi-xxxx"' >> ~/.zshrc
echo 'export OPENAI_BASE_URL="https://integrate.api.nvidia.com/v1"' >> ~/.zshrc
echo 'export OPENAI_MODEL="nvidia/nemotron-3-super-120b-a12b"' >> ~/.zshrc
source ~/.zshrc
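If you want to see the raw request OpenClaude makes on your behalf, it's an ordinary OpenAI-style chat completion pointed at NVIDIA. A hedged sketch, useful for debugging outside the CLI:

```shell
# The same call OpenClaude issues under the hood: an OpenAI-style
# chat completion against NVIDIA's endpoint, using the env vars from above.
if [ -z "${OPENAI_API_KEY:-}" ] || [ -z "${OPENAI_BASE_URL:-}" ]; then
  echo "source ~/.zshrc first (Step 4)"
else
  curl -s "$OPENAI_BASE_URL/chat/completions" \
    -H "Authorization: Bearer $OPENAI_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{
      "model": "nvidia/nemotron-3-super-120b-a12b",
      "messages": [{"role": "user", "content": "Say only the word ok"}],
      "max_tokens": 16
    }'
fi
```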

If you'd rather not edit your shell config, launch openclaude and type /provider:

| Field | Value |
| --- | --- |
| Provider | OpenAI-compatible |
| Base URL | https://integrate.api.nvidia.com/v1 |
| API key | nvapi-xxxx |
| Model | nvidia/nemotron-3-super-120b-a12b |

Step 5: Launch

openclaude

You should see the startup banner:

╔════════════════════════════════════════════════════════════╗
║ Provider  OpenAI                                           ║
║ Model     nvidia/nemotron-3-super-120b-a12b                ║
║ Endpoint  https://integrate.api.nvidia.com/v1              ║
╠════════════════════════════════════════════════════════════╣
║ * cloud    Ready, type /help to begin                      ║
╚════════════════════════════════════════════════════════════╝

[Image: Running openclaude, the banner rendering, and typing the first prompt]

A first "hello!" should come back with a reasonable reply. Then I tried a real task to see if the wiring actually held up under load:

> Write a Node.js NestJS microservice with a /hello endpoint

Nemotron generated a clean scaffold: hello.module.ts, hello.controller.ts, a Jest unit spec, an e2e spec with Supertest, and a working main.ts. That became the hello-microservice I kept in the repo as a tiny integration smoke test for the whole setup.
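To poke the generated service by hand (assuming the NestJS default port 3000 and that you've started it with `npm run start:dev` inside hello-microservice):

```shell
# Manual check of the scaffolded /hello endpoint; prints the response body,
# or a note if nothing is listening on port 3000.
curl -s http://localhost:3000/hello || echo "service not running on :3000"
```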

How fast is it, honestly?

Before anyone gets excited: this is not as fast as Claude Code running on Claude itself. It's not close. I ran three prompts through openclaude -p in non-interactive mode to get real numbers, same machine, same network:

| Prompt | Time |
| --- | --- |
| Say only the word 'ok' (trivial, no tools) | ~7 seconds |
| One-line NestJS controller method, no tools | ~2m 30s |
| Tool-using prompt (listing files, hitting the workdir sandbox) | ~1m 15s |
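The numbers above came from runs along these lines (a sketch; the `time_prompt` helper is my own wrapper, not part of OpenClaude):

```shell
# Wall-clock timing of OpenClaude's non-interactive -p mode.
# time_prompt is a scratch helper written for these measurements.
time_prompt() {
  start=$(date +%s)
  if command -v openclaude >/dev/null 2>&1; then
    openclaude -p "$1" >/dev/null
  else
    echo "openclaude not installed" >&2
  fi
  end=$(date +%s)
  echo "elapsed: $((end - start))s"
}

time_prompt "Say only the word 'ok'"
```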

For context, the same "Say only the word 'ok'" prompt through Claude Code against Anthropic's API typically returns in under a second. So you are paying a real latency tax for the free tier, and it shows up most on code-generation prompts where the model thinks for a while before emitting anything.

That said, the output is good. The NestJS scaffold had no hallucinated imports, the controller/spec/e2e split was reasonable, and the model understood "keep it minimal" on the one-line prompt.

The one-line take

Free-tier slowness, paid-tier quality. OpenClaude + Nemotron is a patient developer's Claude Code.

Who this is actually for

Be honest with yourself about what you're optimizing for. This setup is a great fit if you are:

  • A junior developer learning the tooling. Getting your hands on an agentic coding CLI without needing a credit card or a paid plan is genuinely valuable. The UX is 95% the same as Claude Code, so the muscle memory transfers the day you do upgrade.
  • A money-saver or hobbyist. If you're writing side projects in the evenings and a Claude subscription doesn't fit the budget, running Nemotron through NVIDIA's free tier lets you keep the agentic workflow for $0.
  • A tester or tire-kicker. Trying agentic coding before you commit to a subscription, evaluating how a different base model behaves, or benchmarking open-ish models against Claude. This is a very cheap way to do it.

Where it is not the right tool: production work under deadline, anything where a 2-minute latency per turn is a dealbreaker, or heavy iterative loops like "refactor this 400-line file" (you'll want the fast model for that).

Use it as a great starting point. Upgrade when the latency starts costing you more than the subscription would.

Available Nemotron models

Not every model name works. These are the ones I confirmed on the free tier:

| Model ID | Size | Notes |
| --- | --- | --- |
| nvidia/nemotron-3-super-120b-a12b | 120B (12B active) | Recommended, best reasoning |
| nvidia/llama-3.3-nemotron-super-49b-v1 | 49B | Faster, good alternative |
| nvidia/llama-3.1-nemotron-70b-instruct | 70B | Stable, well-tested |
| nvidia/llama-3.1-nemotron-nano-8b-v1 | 8B | Fastest, less capable |

A common mistake: nvidia/nemotron-3-super does not exist. You must use the full model ID including the size suffix. Switch inside OpenClaude with /model, or export a new OPENAI_MODEL and relaunch.
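A tiny guard against the truncated-ID mistake (a sketch; the list is just the IDs I confirmed above):

```shell
# Check a model string against the IDs confirmed on the free tier.
valid_models="nvidia/nemotron-3-super-120b-a12b
nvidia/llama-3.3-nemotron-super-49b-v1
nvidia/llama-3.1-nemotron-70b-instruct
nvidia/llama-3.1-nemotron-nano-8b-v1"

check_model() {
  # grep -qx: match the whole line, so truncated IDs are rejected
  if echo "$valid_models" | grep -qx "$1"; then
    echo "ok: $1"
  else
    echo "unknown model: $1 (did you drop the size suffix?)"
  fi
}

check_model "nvidia/nemotron-3-super"            # the common mistake: rejected
check_model "nvidia/nemotron-3-super-120b-a12b"  # full ID: accepted
```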

Troubleshooting

"There's an issue with the selected model": wrong model name. Double-check against the table above; the size suffix is not optional. Here is exactly what that failure looks like, plus the sed one-liner I used to rewrite the env var and get unstuck:

[Image: OpenClaude reporting 'issue with the selected model' and a one-line sed fix to replace the model ID in ~/.zshrc]
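Since the screenshot isn't copy-pasteable, here's a reconstruction of that fix (shown against a scratch file; point it at ~/.zshrc for the real thing — my exact one-liner may have differed slightly):

```shell
# Replace whatever OPENAI_MODEL value is in the shell config with the
# full model ID. Demonstrated on a scratch file.
cfg=/tmp/zshrc-demo
echo 'export OPENAI_MODEL="nvidia/nemotron-3-super"' > "$cfg"   # the broken value

# BSD sed (macOS) needs -i ''; the fallback is the GNU sed (Linux) form.
sed -i '' 's|OPENAI_MODEL=".*"|OPENAI_MODEL="nvidia/nemotron-3-super-120b-a12b"|' "$cfg" 2>/dev/null \
  || sed -i 's|OPENAI_MODEL=".*"|OPENAI_MODEL="nvidia/nemotron-3-super-120b-a12b"|' "$cfg"

grep OPENAI_MODEL "$cfg"   # now shows the full, working ID
```

After editing the real ~/.zshrc, remember to `source ~/.zshrc` and relaunch openclaude.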

"API Error: Please wait a moment and try again": expected on the first request. The model has a cold start of 10 to 30 seconds; subsequent requests are much faster.

rg not found / silent crash: brew install ripgrep. If Claude Code is also installed, verify that which rg returns the Homebrew binary and not the shell alias.

Slow responses: this is the big one. You're running a 120B-parameter model for free through a shared queue. Based on my measurements (see "How fast is it, honestly?" above):

  • Trivial prompts ("say ok"): ~7s
  • Short code generation: 30s to 3 minutes
  • Tool-using or multi-step tasks: 1 to 3 minutes per turn

If openclaude seems to be hanging, it probably isn't. It's just thinking on a shared GPU. Give it up to 3 minutes before you assume something's wrong.
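If you'd rather enforce that 3-minute ceiling than watch a spinner, a sketch (assumes GNU `timeout`; on macOS, `brew install coreutils` provides it as `gtimeout`):

```shell
# Kill a non-interactive run if it exceeds 3 minutes, rather than
# waiting indefinitely. timeout/gtimeout comes from coreutils.
t=$(command -v timeout || command -v gtimeout || true)
if [ -n "$t" ]; then
  "$t" 180 openclaude -p "Say only the word 'ok'" || echo "gave up after 3 minutes"
else
  echo "install coreutils to get timeout/gtimeout"
fi
```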

Rate limits: the free tier has usage caps. Fine for development, tinkering and the occasional microservice. For heavy usage, you can add credits at build.nvidia.com.

Architecture, in one picture

Terminal
  |
  v
OpenClaude (local CLI agent)
  |
  v  HTTPS (OpenAI-compatible protocol)
  |
NVIDIA API (integrate.api.nvidia.com)
  |
  v
Nemotron 120B (runs on NVIDIA GPUs)
  |
  v
Response streamed back to your terminal
  • OpenClaude handles the agent loop: reading files, editing them, running shell commands, keeping conversation state.
  • Nemotron handles the reasoning: understanding prompts, generating code.
  • Your machine just runs the lightweight CLI. No GPU, no local weights, nothing to keep updated.

Where to go from here

  • Swap models per task. Use the nano-8B for one-off shell questions, the 120B when you actually need reasoning.
  • Dockerize the CLI and run it on a VPS for remote access.
  • Add an Ollama fallback for offline work (at the cost of 4GB+ of disk).
  • Build custom agents on top of it for deploy, test or review pipelines.
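The Ollama fallback from the list above can reuse the exact same env-var trick, since Ollama exposes an OpenAI-compatible endpoint on localhost:11434. A hedged sketch (the model name is just an example of something you'd pull locally, and I'm assuming the CLI still wants an API key variable set even though Ollama ignores its value):

```shell
# Point the same OpenAI-compatible provider path at a local Ollama server.
export OPENAI_BASE_URL="http://localhost:11434/v1"
export OPENAI_API_KEY="ollama"           # value is ignored by Ollama
export OPENAI_MODEL="qwen2.5-coder:7b"   # example: any model you've pulled
```

Then relaunch openclaude, and you have an offline mode at the cost of local disk and a much smaller model.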

Credits

If you end up using this, I'd love to hear what workflows you build on top of it, especially anyone combining Nemotron with a local Ollama model for offline fallback.

Thanks for reading.