About
Inferencer lets you download, run and deeply control the latest state-of-the-art AI models (GPT-OSS, DeepSeek, Qwen and more) on your own computer.
No data is sent to the cloud for processing, so your privacy is fully maintained.
Advanced inferencing settings give you complete control over the models' accuracy and outputs.
All AI processing happens on your device.
No telemetry, no background "update" checks.
Models
Start in the Models section, where you can select the location of existing models or download new ones directly from Hugging Face.
Chats
Select the model to interact with from the top menu bar and write a prompt to begin. At any point you can switch between models and continue the chat to see what else they can uncover. You can also selectively delete past messages to keep the model focused and less scatterbrained.
Chat Controls
Control the inferencing parameters, including processing intensity and model streaming. Model streaming lets you run large models on low-memory devices while multitasking smoothly with other applications.
Token Inspection
Open the inspector to peek into the inner workings of each word the model outputs and see its confidence level and the alternative choices it considered.
Prompt Framing
Expand the prompt section to use the framing feature, which lets you control the output the model generates.
Server
When enabled, the server feature lets you serve models to your own or other trusted devices and connect to them. No data is sent elsewhere. It also includes compatible APIs, such as an OpenAI Compatibility API, for application development.
Distributed Inference
With distributed compute, you can link two Macs together and pool their memory to run inference on larger models. To use it, make sure it is enabled in both the app and server settings. Once a connection to your server is made, and if both computers have the same model, a distributed compute icon appears next to the models drop-down list. Click it to load the model for distributed compute.
Xcode Intelligence
Use the server feature with the OpenAI Compatibility API enabled and SSL disabled to let Xcode use Inferencer as a provider, hosted either locally or over the internet.
Shortcuts
Use the Shortcuts app to automate inferencing workflows (e.g., copy text from the clipboard > run inference > speak the result).
Settings
Configure parental controls, set up an automatic deletion policy and adjust font sizes.
Privacy
By default, all AI processing happens offline on your device. No data is sent to the cloud, for maximum privacy.
Subscribe for updates
More features are coming soon. Subscribe to be the first to know.