nferencer

Run and deeply control Artificial Intelligence models


See the inner-workings
Markdown rendering
Download and run SOTA AI models
Advanced AI controls

Artificial intelligence
should not be a black box.

Private

All AI processing happens offline, in your premises, on your device. No data is sent to the cloud for processing.

State-of-the-Art

Support for SOTA models. Fastest inferencing performance. Patent Pending deep learning inferencing to fully control your models.

Realtime token probabilities

Token Probabilities

Tap on the token inspector icon to reveal the probabilities for each token, showing you exactly what the model was thinking at each step.

Here you can also select alternative tokens to generate an alternative response.

Token entropy

Token Entropy

Instantly see contentious tokens, allowing you to better gaguge the confidence of the generation.

Token Selection: Specifically select tokens to manually explore alternative branches.

Token Exclusion: Select tokens to exclude from the generation, such as foreign characters.

Prompt prefilling

Prompt Prefilling

Expand the prompt field to seed a model's response by prefilling the Assistant message.

This technique allows you to direct its actions, skip preambles, enforce specific formats like JSON or XML, and even unlock gated responses.

Clinical note generation example

More Features

Rendering: Support for markdown with advanced LaTeX rendering. Code previews also coming soon.

Model Streaming: Stream large models directly from storage, using custom read-only implementation for low-resource devices.

Understand what the AI is thinking

Understand
why.

Pricing

 
 

Base

Free
Private on-device AI
Unlimited processing
Markdown rendering
Download models
Model control settings
Limited Memory offloading
Auto-load/unload settings
Retention control settings
Parental controls
Limited prompt prefilling
Limited token entropy
Limited token exclusions
Limited token probabilities
Limited Shortcuts integration
 

Professional

$9.99 USD1 per month
Private on-device AI
Unlimited processing
Markdown rendering
Download models
Model control settings
Memory offloadingAllows streaming large models partially from storage for low-memory devices
Auto-load/unload settings
Retention control settings
Parental controls
Unlimited prompt prefilling
Unlimited token entropy
Unlimited token exclusions
Unlimited token probabilities
Shortcuts integration
Support the development
1. The prices shown here are in USD. Apple and Google will convert the amount to your local currency at checkout, based on your region.

Subscribe for updates

With more features coming soon, you can be the first to know.