About

Inferencer allows you to download, run, and deeply control the latest state-of-the-art AI models (GPT-OSS, DeepSeek, Qwen, and more) on your own computer.

No data is sent to the cloud for processing, so your privacy is fully maintained.
Advanced inferencing controls give you complete control over a model's accuracy and outputs.

Understand what the AI is thinking



Inferencer respects your privacy.
All AI processing happens on your device.
No telemetry, no background "update" checks.

Models

Start in the models section where you can select the location of existing models or download new ones directly from Hugging Face.

Chats

Select the model to interact with on the top menu bar and write a prompt to begin. At any point you can switch between models and continue the chat to see what else they can uncover. You can also selectively delete past messages to keep the model focused and less scatterbrained.

Chat Controls

Control the inferencing parameters, including processing intensity and model streaming, which lets you run large models on low-memory devices and multitask with other applications in harmony.

Token Inspection

Select the inspector to peek into the inner workings of each output token and see the model's confidence levels and alternative choices.
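Under the hood, a model's per-token confidence is typically derived by applying a softmax over the scores (logits) it assigns to every candidate token, which is the kind of distribution a token inspector displays. A minimal sketch in Python; the token strings and logit values below are invented purely for illustration:

```python
import math

def softmax(logits):
    """Convert raw logits into a probability distribution over tokens."""
    m = max(logits.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(v - m) for tok, v in logits.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}

# Hypothetical logits for the next token after "The capital of France is"
logits = {" Paris": 9.1, " Lyon": 4.3, " the": 3.8, " located": 2.9}

probs = softmax(logits)
top = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
for tok, p in top:
    print(f"{tok!r}: {p:.1%}")
```

The highest-probability token is the model's "confident" choice; the runners-up are the alternative choices an inspector can surface.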

Prompt Framing

Expand the prompt section to utilise the framing feature, which allows you to steer the output the model generates.

Server

If enabled, the server feature allows you to serve models to your own or other trusted devices. No data is sent elsewhere. It also provides an OpenAI-compatible API for application development.
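Because the server exposes an OpenAI-compatible API, any standard OpenAI-style client can talk to it. A minimal sketch using only the Python standard library; the host, port, and model name are placeholder assumptions, so substitute the values from your own server settings:

```python
import json
import urllib.request

def build_payload(prompt, model):
    """Build an OpenAI-style chat-completions request body."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }

def chat(prompt, host="http://localhost:8080", model="qwen3"):
    """POST a chat request to a local OpenAI-compatible server and return the reply text."""
    req = urllib.request.Request(
        f"{host}/v1/chat/completions",
        data=json.dumps(build_payload(prompt, model)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]

# chat("Say hello")  # requires the Inferencer server to be running and reachable
```

Because the request shape follows the OpenAI chat-completions convention, existing client libraries and tools that accept a custom base URL should work against the same endpoint.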

Distributed Inference

With distributed compute you can link together two Macs, sharing their memory to run inference on larger models. To use it, make sure it's enabled in both the app and server settings. Once a connection to your server is made, if both computers have the same model, a distributed-compute icon will appear next to the model dropdown list. Simply tap it to load the model for distributed compute.

Xcode Intelligence

Use the server feature with the OpenAI Compatibility API enabled and SSL disabled to allow Xcode to use Inferencer as a model provider, either hosted locally or over the internet.

Shortcuts

Use the Shortcuts app to automate inferencing workflows (e.g., copy text from clipboard > inference > speak result).

Settings

Includes parental controls, setting up an automatic deletion policy, and configuring font sizes.

Privacy

By default, all AI processing happens offline and on your device. No data is sent to the cloud, for maximum privacy.

Subscribe for updates

With more features coming soon, you can be the first to know.