Inferencer

Deeply control Artificial Intelligence models


See the inner workings
Distributed inference
Markdown rendering
Serve models privately
Inference SOTA models

Artificial intelligence
should not be a black box.

Private

By default, all AI processing happens offline and on your device; no data is sent to the cloud. If enabled, the server feature lets you serve models to your own or trusted devices. No data is sent anywhere else.

State-of-the-Art

Support for SOTA models with the fastest inference performance, plus patent-pending deep learning inference that gives you full control of your models.

Realtime token probabilities

Token Inspection

Tap on the token inspector icon to reveal the probabilities for each token, showing you exactly what the model was thinking at each step.

Token Entropy: Instantly see contentious tokens, helping you gauge the confidence of the generation.

Token Selection: Specifically select tokens to manually explore alternative branches.

Token Exclusion: Select tokens to exclude from the generation, such as foreign characters.
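Per-token probabilities of the kind the token inspector visualizes can also be retrieved programmatically over an OpenAI-compatible API via the standard `logprobs` fields, assuming the server implements them. A minimal sketch (the model name is a placeholder):

```python
import math


def logprob_request(model: str, prompt: str, top_n: int = 5) -> dict:
    """Chat payload asking for the top-N alternative tokens at each step."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        # Standard OpenAI-style fields; support varies by server.
        "logprobs": True,
        "top_logprobs": top_n,
    }


def token_probability(logprob: float) -> float:
    """Convert a returned log probability into a plain probability."""
    return math.exp(logprob)


# A logprob of 0.0 means the model was certain of that token:
# token_probability(0.0) == 1.0
```

High-entropy (contentious) positions show up as several alternatives with similar probabilities; low-entropy positions have one dominant token.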

Private hosted server

Private Server

Serve models and run inference over the local network or the internet with SSL encryption, keeping your inference private and on your premises.

Compatible APIs: Also includes Ollama and OpenAI compatible APIs for application development.

Distributed Inference: Link two computers together for shared memory inference of even larger models.
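The OpenAI-compatible API mentioned above can be exercised with any standard HTTP client. A minimal sketch using only the Python standard library; the host, port, and model name are placeholders, so substitute the values your server actually reports:

```python
import json
import urllib.request


def build_chat_request(model: str, user_message: str) -> dict:
    """Build a standard OpenAI-style chat completion payload."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
        "stream": False,
    }


def send_chat_request(base_url: str, payload: dict) -> dict:
    """POST the payload to the conventional /v1/chat/completions route."""
    req = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())


# Example (requires a running server):
# reply = send_chat_request("http://localhost:8080",
#                           build_chat_request("llama-3", "Hello"))
```

Because the request shape follows the OpenAI convention, existing SDKs and Ollama-style clients can usually be pointed at the server just by changing their base URL.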

Prompt prefilling

Prompt Prefilling

Expand the prompt field to seed a model's response by prefilling the Assistant message.

This technique lets you direct the model's actions, skip preambles, enforce specific formats such as JSON or XML, and even unlock gated responses.
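In chat-style APIs, prefilling is typically expressed by ending the message list with a partial assistant message that the model then continues. Whether a given server continues a trailing assistant turn is implementation-dependent, so treat this as an illustration of the technique rather than any particular server's contract:

```python
def prefilled_messages(user_prompt: str, assistant_prefix: str) -> list[dict]:
    """Return a message list whose final entry seeds the assistant's reply."""
    return [
        {"role": "user", "content": user_prompt},
        # The model continues from this prefix, which skips preambles
        # and can lock the output into a format such as JSON or XML.
        {"role": "assistant", "content": assistant_prefix},
    ]


# Forcing a JSON-shaped answer by opening the object for the model:
messages = prefilled_messages("List three primes as JSON.", '{"primes": [')
```

Because generation resumes mid-structure, the model has little choice but to complete the opened JSON object rather than preface its answer with prose.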

Markdown rendering with LaTeX example

More Features

Rendering: Support for markdown with advanced LaTeX rendering. Code previews are also coming soon.

Model Streaming: Stream large models directly from storage using a custom read-only implementation, built for low-resource devices.

Understand what the AI is thinking

Understand
why.

Pricing


Base

Free
Private on-device AI
Unlimited processing
Markdown rendering
Model control settings
Download models
Limited inference server
Limited model streaming
Limited distributed compute
Auto-load/unload settings
Retention control settings
Parental controls
Limited prompt prefilling
Limited token entropy
Limited token exclusions
Limited token probabilities
Limited Shortcuts integration
Xcode Intelligence integration

Professional

$9.99 USD¹ per month
Private on-device AI
Unlimited processing
Markdown rendering
Model control settings
Download models
Encrypted inference server
Model streaming (stream large models partially from storage for low-memory devices)
Distributed compute
Auto-load/unload settings
Retention control settings
Parental controls
Unlimited prompt prefilling
Unlimited token entropy
Unlimited token exclusions
Unlimited token probabilities
Shortcuts integration
Xcode Intelligence integration
Support the development
1. The prices shown here are in USD. Apple and Google will convert the amount to your local currency at checkout, based on your region.

Subscribe for updates

More features are coming soon; be the first to know.