Today I tried to do something which is apparently silly: use Ollama on my Windows machine to run gpt-oss:20b, use it in VSCode’s Copilot chat, and also use it from WSL for my Linux (web) projects. This is, apparently, difficult. To document my findings instead of leaving a “nvm fixed it”, here’s the stuff I did:
- Tell Ollama (running on Windows) to be available on the network by binding it to 0.0.0.0:11434 instead of localhost, so it listens on all interfaces rather than only loopback (a command sketch follows this list).
- Tell the WSL remote settings in VSCode to look for Ollama on the Windows host’s IP. The setting is "github.copilot.chat.byok.ollamaEndpoint": "http://172.x.x.x:11434".
- Tell the Windows firewall to let traffic through. This can be done in “Windows Defender Firewall with Advanced Security”. You need to open port 11434 for both incoming and outgoing connections (a command-line version, plus a quick reachability check, is sketched after this list).
- Click the model picker in VSCode’s Copilot chat -> Manage models -> Ollama and tick all the models you currently have downloaded.
- In the Ollama app on Windows, tweak the context size to match your GPU. A 3090 can, apparently, *just* barely fit 32k of context for gpt-oss:20b. If you use too much VRAM, Ollama will frequently crash without giving a response, or give up on fitting the model onto the GPU and fall back to the CPU instead (a way to check where the model landed is sketched after this list).
- Ollama must be running before you attempt to locate it in Copilot chat.
- In WSL, Ollama can be summoned via ollama.exe (the Windows binary). You’ll need to alias ollama=ollama.exe if you want to control it from within WSL under the usual command name (snippet after this list).
- Different models show up in different Copilot chat modes. So even if you’ve correctly installed/selected some models, they can fall out of the list of models you’re allowed to use (and simply not show up) when you switch modes.
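
A few command-line sketches for some of the steps above, in case they help. Treat them as sketches of what I believe the equivalent commands are, not the one true way. First, the network binding: as far as I know the standard knob is the OLLAMA_HOST environment variable (newer versions of the Windows app may also expose a settings toggle for this).

```
# Run in PowerShell on Windows; setx stores the variable for future processes,
# so restart the Ollama app afterwards. 0.0.0.0 means "listen on all
# interfaces", not just localhost.
setx OLLAMA_HOST "0.0.0.0:11434"
```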
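
For the endpoint and firewall bullets, this is roughly the netsh equivalent of the GUI clicks, followed by a reachability check from inside WSL against Ollama’s /api/tags endpoint. The rule name is just a label I picked; run the netsh line in an elevated (administrator) prompt.

```
# On Windows, in an elevated prompt: allow inbound connections to port 11434.
# (WSL -> Windows requests are inbound from Windows' point of view; the GUI
# steps in the list above also open an outbound rule.)
netsh advfirewall firewall add rule name=Ollama11434 dir=in action=allow protocol=TCP localport=11434

# Then, from inside WSL: confirm the Windows Ollama instance answers.
# Replace 172.x.x.x with your actual Windows host IP.
curl http://172.x.x.x:11434/api/tags
```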
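
For the context-size bullet, I only know the knobs by name, so this is a sketch assuming a reasonably recent Ollama build that reads OLLAMA_CONTEXT_LENGTH (an alternative is the per-session num_ctx parameter, e.g. /set parameter num_ctx 32768 inside ollama run). The ollama ps check afterwards tells you whether the model actually landed on the GPU.

```
# On Windows (PowerShell): cap the default context length so the weights plus
# the KV cache still fit in VRAM, then restart the Ollama app.
setx OLLAMA_CONTEXT_LENGTH "32768"

# With a model loaded, check where it ended up: the PROCESSOR column should
# read "100% GPU" rather than a CPU/GPU split.
ollama ps
```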
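
And for the alias bullet, the one-off version in WSL (assuming bash; adjust for your shell):

```
# Make plain `ollama` resolve to the Windows binary via WSL interop.
echo "alias ollama='ollama.exe'" >> ~/.bashrc
source ~/.bashrc

# Sanity check: should list the models managed by the Windows install.
ollama list
```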
That’s all I know. It works on my system now; please ask someone else if you run into trouble with this.
