Regular vs. Accelerated inference

Depending on your Tinq.ai plan, we use different machines to process rewriting, summarizing, and classifying requests.
We use two classes of machines: CPU-based and GPU-based. The two have different computing capabilities.

The primary distinction between CPU and GPU architecture is that a CPU is designed to execute a wide range of operations quickly (as reflected in its clock speed), but it is limited in the number of operations it can run concurrently. A GPU was originally designed to render high-resolution graphics and video in real time.

GPUs are also widely used for non-graphical workloads such as machine learning and scientific computation, because they can apply the same operation to many pieces of data in parallel. With thousands of processing cores working concurrently, a GPU provides massive parallelism, with each core handling a small, efficient computation.

Example:

Let's say that we have this paragraph:

Playing the piano will definitely help you gain confidence as it requires you to make decisions on your own. It is always extremely rewarding to hear what you have created yourself. (In the early days of learning and playing, your teacher will definitely help you with decision-making, but the process becomes more autonomous as time goes by!). The process of learning a new piece of music is fantastic. You start from nothing, practice, improve, and finally get the fruits of your hard work, as farmers do during the harvest time. The progression is fascinating and getting an assured reward after putting the effort in certainly boosts your self-confidence and sense of achievement.

It has 681 characters and 112 words.
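
Since some plans limit the number of characters per request, it can help to measure a text before sending it. The snippet below is a minimal sketch, not part of the Tinq.ai API: the helper names `text_stats` and `fits_limit` are hypothetical, and it assumes that characters are counted with spaces and punctuation included and that words are separated by whitespace. The service may count differently.

```python
def text_stats(text: str) -> tuple[int, int]:
    """Return (character count, word count) for a piece of text."""
    # Assumption: characters include spaces and punctuation,
    # and words are whitespace-separated tokens.
    return len(text), len(text.split())


def fits_limit(text: str, max_chars: int = 500) -> bool:
    """Check whether the text fits within a per-request character limit."""
    # 500 is the Starter plan limit mentioned in this article.
    return len(text) <= max_chars


sample = "Playing the piano will definitely help you gain confidence."
chars, words = text_stats(sample)
print(chars, words, fits_limit(sample))
```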

Here is how long each plan takes to rewrite it:

  • Free & Starter: CPU - 13.67 seconds. Note that the Starter plan is limited to 500 characters per request, so we shortened the text to fit within that limit.
  • Pro & Ultra: Pool GPU - 2.3 seconds.
  • Scale: Dedicated GPU (Nvidia Tesla A100) - 0.6 seconds.
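
To put these numbers in perspective, the short sketch below computes how many times faster each tier was than the CPU tier. It only does arithmetic on the timings listed above; the dictionary keys are labels chosen for illustration. Keep in mind that the CPU run used a shortened input, so the ratios are rough and likely understate the gap.

```python
# Timings in seconds, copied from the list above.
timings = {
    "Free & Starter (CPU)": 13.67,
    "Pro & Ultra (pool GPU)": 2.3,
    "Scale (dedicated GPU)": 0.6,
}

# Use the CPU tier as the baseline for the comparison.
baseline = timings["Free & Starter (CPU)"]
speedups = {plan: baseline / seconds for plan, seconds in timings.items()}

for plan, factor in speedups.items():
    print(f"{plan}: {timings[plan]:.2f} s, about {factor:.1f}x the CPU speed")
```

In this test, the pool GPU was roughly 6 times faster than the CPU, and the dedicated GPU roughly 23 times faster.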

As a side note, on the Scale plan, an 800-word input took around 1.2 seconds to process on average.

These tests were performed in English, with no additional parameters. For other supported languages, processing takes about 1 second longer on average.