Fastinference is a machine learning model optimizer and compiler that generates an optimized implementation tailored to your model and hardware architecture:
In Fastinference, the user comes first. We believe that the user knows best which implementation to use and which optimizations to perform. Hence, we generate readable code, so that the user can adapt and change the implementation if necessary.
In Fastinference, optimizations and implementations can be freely combined. Fastinference distinguishes between model-level optimizations, which are independent of the implementation, and implementation-specific choices. Consider, for example, a simple decision tree: pruning the model does not affect its implementation, and vice versa.
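To make the separation concrete, here is a minimal, self-contained sketch (illustrative only, not Fastinference's actual API): a model-level optimization (pruning) operates on the tree structure and is entirely unaware of the code generator, which in turn only consumes whatever tree it is given.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    feature: Optional[int] = None   # None marks a leaf
    threshold: float = 0.0
    prediction: float = 0.0
    left: Optional["Node"] = None
    right: Optional["Node"] = None

def prune(node: Node) -> Node:
    """Model-level optimization: collapse subtrees whose leaves agree."""
    if node.feature is None:
        return node
    node.left, node.right = prune(node.left), prune(node.right)
    if (node.left.feature is None and node.right.feature is None
            and node.left.prediction == node.right.prediction):
        return Node(prediction=node.left.prediction)
    return node

def to_if_else(node: Node, indent: str = "") -> str:
    """Implementation: emit readable if/else code; knows nothing of pruning."""
    if node.feature is None:
        return f"{indent}return {node.prediction};"
    return (f"{indent}if (x[{node.feature}] <= {node.threshold}) {{\n"
            f"{to_if_else(node.left, indent + '    ')}\n"
            f"{indent}}} else {{\n"
            f"{to_if_else(node.right, indent + '    ')}\n"
            f"{indent}}}")

# A redundant split: both children predict 1.0, so pruning removes it.
tree = Node(feature=0, threshold=1.5,
            left=Node(prediction=1.0),
            right=Node(prediction=1.0))
code = to_if_else(prune(tree))
```

Because `prune` and `to_if_else` only share the `Node` structure, either can be swapped out or combined freely, which is exactly the property described above.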
Fastinference can be easily extended. You can add your own implementation and still benefit from all optimizations performed on the model, and vice versa.
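As a hedged sketch of what such extensibility can look like (the names and registry below are hypothetical, not Fastinference's real API): implementations are pluggable code generators, so a newly registered backend automatically receives the already-optimized model.

```python
from typing import Callable, Dict, List

# Registry mapping backend names to code generators.
BACKENDS: Dict[str, Callable[[List[float]], str]] = {}

def register(name: str):
    """Decorator that registers a code generator under a backend name."""
    def wrap(fn):
        BACKENDS[name] = fn
        return fn
    return wrap

@register("c")
def linear_to_c(weights: List[float]) -> str:
    """Existing implementation: emit a C prediction function."""
    terms = " + ".join(f"{w} * x[{i}]" for i, w in enumerate(weights))
    return f"double predict(const double *x) {{ return {terms}; }}"

# Adding your own implementation is a single registered function away.
@register("comment")
def linear_to_comment(weights: List[float]) -> str:
    return "// weights: " + ", ".join(str(w) for w in weights)

def optimize(weights: List[float]) -> List[float]:
    """Model-level optimization (here: weight rounding), shared by all backends."""
    return [round(w, 2) for w in weights]

weights = [0.1234, 2.0]
out = {name: gen(optimize(weights)) for name, gen in BACKENDS.items()}
```

The new `"comment"` backend never had to know about `optimize`; it simply received the rounded weights, mirroring the claim that model-level optimizations carry over to any implementation you add.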