A single comprehensive software engine for enabling Speech, Speaker, Face, Object, Emotion Recognition, Translation, Access Controls, and much more, using a unified set of APIs designed for Integrators and Software Developers -- works standalone (Android and Linux) and in client/server mode
RecoMadeEasy Embedded AudioVisual Recognition Engine by Recognition Technologies, Inc.
  • AudioVisual Recognition (Embedded) (Server Based)
    (Combination of Speaker, Speech, Face Recognition, and Object Detection and Recognition with a single interface)
    RecoMadeEasy® (Reco Made Easy)
    Server-Based AudioVisual Recognition


    RecoMadeEasy® (Reco Made Easy) Embedded AudioVisual Recognition is an embedded natural-language voice and video recognition engine that offers comprehensive conversational voice interaction, voice biometrics, and facial recognition. The engine has a small memory footprint and is designed to run natively on devices that need unconstrained natural-language interfaces with high recognition accuracy during service interruptions, or when full, uninterrupted, and secure access to a cloud server is not guaranteed.

    The RecoMadeEasy® AudioVisual Recognition engine comprises three distinct technologies: Speaker, Speech, and Facial Recognition, all developed in our research labs in New York. When presented with an audio, video, or audio-video stream, the engine returns the following through the API, in either XML or JSON:

    1. Speaker Segmentation of Incoming Audio, Video, or Both (including timestamps of the location where the speakers change and tagging of each audio, video, or combined segment with the ID of the person speaking in that segment)
    2. Standalone engine which may be used through a very simple C++ SDK and API. This would be most useful for integrating the engine into current products and IVR systems.
    3. Audio and/or Visual Identification of speaker(s)
    4. Audio and/or Visual Verification of speaker(s)
    5. Full Transcription of the audio stream
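
    As an illustration of how a client might consume such a result, the sketch below parses a JSON payload and summarizes it per speaker. The payload layout and every field name in it ("segments", "start", "end", "speaker_id", "verified", "transcript") are assumptions made for this example only; the actual schema returned by the engine is not specified here and may differ.

```python
import json

# Hypothetical result payload. Every field name here is an assumption for
# illustration; it is not the engine's documented schema.
SAMPLE_RESULT = """
{
  "segments": [
    {"start": 0.00, "end": 4.20, "speaker_id": "alice", "verified": true,
     "transcript": "good morning"},
    {"start": 4.20, "end": 9.75, "speaker_id": "bob", "verified": false,
     "transcript": "hello there"},
    {"start": 9.75, "end": 12.00, "speaker_id": "alice", "verified": true,
     "transcript": "shall we begin"}
  ]
}
"""


def talk_time_by_speaker(payload: str) -> dict:
    """Sum the duration (in seconds) of all segments tagged with each speaker ID."""
    result = json.loads(payload)
    totals = {}
    for seg in result["segments"]:
        duration = seg["end"] - seg["start"]
        totals[seg["speaker_id"]] = totals.get(seg["speaker_id"], 0.0) + duration
    return totals


def transcript_lines(payload: str) -> list:
    """Render one line per segment: time span, speaker ID, and transcript."""
    result = json.loads(payload)
    return [
        f"[{seg['start']:.2f}-{seg['end']:.2f}] {seg['speaker_id']}: {seg['transcript']}"
        for seg in result["segments"]
    ]


if __name__ == "__main__":
    for line in transcript_lines(SAMPLE_RESULT):
        print(line)
    print(talk_time_by_speaker(SAMPLE_RESULT))
```

    The same traversal would apply to an XML result with an XML parser in place of the json module.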

    The engine is built to allow users to speak naturally and be understood, even in a far-field, noisy environment. RecoMadeEasy® (Reco Made Easy) is available as an SDK with an included API that contains all the components necessary for full integration, so that engineers can get started easily without additional development work or cost.

    The RecoMadeEasy® AudioVisual Recognition engine is also available as a server-side product and as a standalone product.

    Speaker Recognition

      Language- and Text-Independence: The speaker recognition system is completely text- and language-independent. This means that a user may enroll their voice in the system in one language and be identified or verified in a completely different language. This allows the engine to handle authentication and identification across any number of languages.

    Large-Vocabulary Speech Recognition

      The speech recognition side of the engine provides one of the most accurate transcriptions for English, handling many different dialects and accents in a single large-vocabulary transcription engine. It is also capable of real-time processing in a small memory footprint.

      The speech recognizer uses a streaming interface in which the recognizer (in the form of listeners) and the client both run on the embedded device. Any lightweight generic client capable of using a WebSocket interface may stream audio/video to a listener and receive the transcript in real time, with optional alternative results and likelihood scores. The stream may use any codec supported by GStreamer-1.0, including MP3, Ogg Vorbis, Free Lossless Audio Codec (FLAC), MP4, Pulse Code Modulation (PCM), or other codecs such as those supported by a standard Waveform Audio File Format (WAVE).
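
      The client side of such a streaming arrangement can be sketched as follows. This is a minimal sketch under stated assumptions, not the engine's client: it only shows how raw PCM audio might be cut into fixed-duration chunks and handed to a transport. The function names, the 16 kHz 16-bit mono defaults, and the 100 ms chunk size are all hypothetical; the listener's address and message format are not specified here, so the transport is passed in as a plain callable (in a real client it would be the send method of a WebSocket connection).

```python
from typing import Callable, Iterator


def pcm_chunks(pcm: bytes, sample_rate: int = 16000, sample_width: int = 2,
               channels: int = 1, chunk_ms: int = 100) -> Iterator[bytes]:
    """Yield consecutive chunks of raw PCM, each about chunk_ms long.

    The audio parameters are illustrative defaults, not engine requirements.
    """
    frame_bytes = sample_width * channels
    chunk_bytes = sample_rate * frame_bytes * chunk_ms // 1000
    # Keep each chunk aligned to a whole number of sample frames.
    chunk_bytes -= chunk_bytes % frame_bytes
    if chunk_bytes <= 0:
        raise ValueError("chunk_ms is too small for the given audio format")
    for offset in range(0, len(pcm), chunk_bytes):
        yield pcm[offset:offset + chunk_bytes]


def stream_audio(pcm: bytes, send: Callable[[bytes], None], **fmt) -> int:
    """Pass every chunk to send (a stand-in for a WebSocket send call).

    Returns the number of chunks sent.
    """
    sent = 0
    for chunk in pcm_chunks(pcm, **fmt):
        send(chunk)
        sent += 1
    return sent


if __name__ == "__main__":
    one_second_of_silence = bytes(32000)  # 16 kHz * 2 bytes * 1 channel
    received = []
    count = stream_audio(one_second_of_silence, received.append)
    print(count, "chunks,", len(received[0]), "bytes each")
```

      Injecting the transport keeps the chunking logic independent of any particular WebSocket library and easy to exercise without a running listener.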

    Face Recognition

      The facial recognition side of the engine provides face detection, face identification (open-set and closed-set), and facial verification from still images and video streams. It supports all standard image and video formats, such as PNG, JPEG, GIF, MP2, MP4, and MOV.
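
      The difference between closed-set and open-set identification can be illustrated with a short sketch. In closed-set identification the probe is assumed to belong to someone enrolled, so the best-scoring identity is always returned; in open-set identification the best match is accepted only if its score clears a threshold, otherwise the probe is treated as unknown. The scores, the threshold values, and the function names below are hypothetical and do not reflect how the engine computes or exposes its scores.

```python
from typing import Dict, Optional


def identify_closed_set(scores: Dict[str, float]) -> str:
    """Return the enrolled identity with the highest score, unconditionally."""
    if not scores:
        raise ValueError("no enrolled identities to compare against")
    return max(scores, key=scores.get)


def identify_open_set(scores: Dict[str, float], threshold: float) -> Optional[str]:
    """Return the best identity only if its score reaches the threshold.

    Returns None when the best match is too weak, i.e. the face is treated
    as not enrolled.
    """
    if not scores:
        return None
    best = max(scores, key=scores.get)
    return best if scores[best] >= threshold else None


if __name__ == "__main__":
    # Hypothetical similarity scores for one probe image.
    probe_scores = {"alice": 0.41, "bob": 0.38}
    print(identify_closed_set(probe_scores))
    print(identify_open_set(probe_scores, threshold=0.60))
```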

    Supported Operating Systems

      The RecoMadeEasy® Embedded AudioVisual Recognition engine is available for the following operating systems. The C++ SDK, command-line interface, and web services may be used on any of them:

    Server and Desktop Operating Systems (64-bit and 32-bit):

    • CentOS 8 and 7.9 Linux (Latest)
    • Previous CentOS Linux versions: 7.3, 7.2, 7.1, 7.0, 6.6, 6.4, 6.3, 6.2, 5.7, 5.6, 5.4

    • Fedora 40 Linux (Latest)
    • Previous Fedora Linux versions: 39, 38, 37, 36, 35, 34, 33, 32, 31, 30, 29, 28, 27, 26, 25, 24, 23, 22, 21, 20, 19, 18, 17, 16, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, Core 5, Core 4, Core 3, Core 2, Core

    • Ubuntu 24.04 Linux (Latest)
    • Previous Ubuntu Linux versions: 22.04, 20.04, 18.04, 16.04

    • N.B.: May be made available for other Unix-Like systems upon request

  • Large-Vocabulary Speech Recognition (Embedded) (Server Based)
    Initially available for English, Spanish, Mandarin, Arabic, and German; now available for 100+ languages
    Also includes multilingual support and code-switching
    (Customizable domain full transcription ~ 300,000+ word vocabulary)

  • Speaker Recognition (Embedded) (Server Based)
    (Language- and Text-Independent, aka: Speaker Biometrics, Voice Biometrics, or SIV)
    Recipient: Frost & Sullivan Award 2011

  • Face Recognition (Embedded) (Server Based)
    (Face detection and recognition)

  • Object Recognition (Embedded) (Server Based)
    (Object detection and recognition)

For further information, please contact us at 1-800-215-0841 inside the U.S. or +1-914-997-5676 from any other country. Alternatively, you may send an email to Recognition Technologies, Inc.