HEPIUS Software - Showcase

Reference Projects

The majority of our projects were implemented under non-disclosure agreements and therefore cannot be used as public references.
The sections below describe topics of past projects at a very high level. If you would like to know more about them, do not hesitate to drop us a message.

PDF Compression

Modifying and previewing large PDF files is still a challenge. For a customer project, we investigated efficient ways to compress PDF files.

Challenges

PDF files are often exchanged between co-workers and loaded onto numerous devices.
Printing and sending them can be a challenge, especially once the file size exceeds 20 MB.
When documents are highly confidential, public compression tools are not an option.

Solution

A dedicated Amazon Web Services account was set up.
AWS API Gateway was configured to only accept authenticated users from within the company (OAuth2).
Ghostscript was used for compression and called from within the NodeJS Lambda function.
All processed assets were stored in the protected or private S3 environment, managed by AWS Amplify.
Android and Web support was needed. Hence, the frontend was implemented in Flutter.
The website was hosted on-premise, and the app was installed directly on the corporate devices (signed apps, manually installed).
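
The Ghostscript step above can be sketched as follows. This is a minimal illustration in Python, not the project's NodeJS Lambda code: the function names, the default /ebook preset, and the compatibility level are assumptions made for the example. The flags themselves are standard options of Ghostscript's pdfwrite device.

```python
import shutil
import subprocess
from pathlib import Path

# Quality presets of Ghostscript's pdfwrite device,
# ordered roughly from smallest output to highest quality.
PRESETS = ("screen", "ebook", "printer", "prepress")


def build_gs_command(source: Path, target: Path, preset: str = "ebook") -> list[str]:
    """Return the Ghostscript argument list that rewrites `source` into `target`."""
    if preset not in PRESETS:
        raise ValueError(f"unknown preset: {preset!r}")
    return [
        "gs",
        "-sDEVICE=pdfwrite",          # write a new PDF instead of rendering to screen
        "-dCompatibilityLevel=1.4",   # assumption: PDF 1.4 is sufficient for all readers
        f"-dPDFSETTINGS=/{preset}",   # preset controls image downsampling and quality
        "-dNOPAUSE",
        "-dQUIET",
        "-dBATCH",                    # exit after processing instead of waiting for input
        f"-sOutputFile={target}",
        str(source),
    ]


def compress_pdf(source: Path, target: Path, preset: str = "ebook") -> int:
    """Run Ghostscript and return the bytes saved (negative if the file grew)."""
    if shutil.which("gs") is None:
        raise RuntimeError("Ghostscript ('gs') is not installed or not on PATH")
    subprocess.run(build_gs_command(source, target, preset), check=True)
    return source.stat().st_size - target.stat().st_size
```

Lower presets such as /screen downsample images more aggressively and yield smaller files. Whether that is acceptable depends on how the documents are later printed or reviewed.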

Voice Commands in AR

Augmented Reality (AR) as well as Extended Reality (XR) are on their way to the consumer market. Apple's recent announcement of the Apple Vision Pro highlights the potential of spatial computing and new ways to consume content and interact with the environment.
Before the Apple device was announced, we investigated the hands-free use of an EPSON AR headset and combined its ability to present content in a "heads-up display" manner with voice command input. That way, one could fill out forms either by talking or by using the companion app.

Challenges

AR eyewear is pricey, and its input devices are not suitable for use with gloves or in scenarios where the worker's hands are occupied by the objects they are working on.
During maintenance, one may need to document progress and problems without first washing one's hands and writing things down.
Trying to remember everything for later documentation can lead to mistakes and incorrect documentation.

Solution

AWS API Gateway was configured to only accept authenticated users from within the company (OAuth2)
A serverless application was implemented to manage the form filling process and the loading of content
AWS Transcribe was used to convert spoken language into text. The analyzed text was then used to fill out the form.
"Action Words" were defined to ease the "next" and "clear" operations.
Only Android support was needed, and the form itself consists of only 12 steps. Hence, the frontend was implemented as a native Android app with an embedded WebView that pointed to an Angular form.
The interaction between the server and AWS Transcribe was written as a Service in Android, consumed by the Android Activity that holds the WebView.
The communication between the form and the Voice Transcription was done by implementing a JavaScriptInterface, so that JS calls like "TRANSCRIBE.start()" started the recording and analysis of the audio signal in the Android native context.
The website was stored in the application assets, and the app was installed directly on the EPSON device.
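
The role of the "Action Words" can be illustrated with a small sketch. It is written in Python for brevity and is not the project's Android or Angular code: the class name, the return values, and the rule that an action word must be the whole utterance are assumptions. Only the 12 steps and the "next" and "clear" operations come from the description above.

```python
from dataclasses import dataclass, field

# Spoken commands that control the form instead of filling it.
# Matching them as whole utterances is an assumption of this sketch.
ACTION_WORDS = {"next", "clear"}


@dataclass
class VoiceForm:
    """Hypothetical multi-step form filled from transcribed speech."""

    steps: int = 12                              # the form described above has 12 steps
    current: int = 0                             # zero-based index of the active step
    answers: dict = field(default_factory=dict)  # step index -> dictated text

    def handle(self, transcript: str) -> str:
        """Apply one transcribed utterance and report what happened."""
        text = transcript.strip()
        # Transcripts may carry capitals and punctuation, e.g. "Next."
        command = text.lower().rstrip(".!?")
        if command in ACTION_WORDS:
            if command == "next":
                # Advance, but never past the last step.
                self.current = min(self.current + 1, self.steps - 1)
            else:
                # "clear" discards the answer of the active step.
                self.answers.pop(self.current, None)
            return command
        if not text:
            return "ignored"
        # Everything else is dictation, appended to the active step.
        previous = self.answers.get(self.current, "")
        self.answers[self.current] = f"{previous} {text}".strip()
        return "fill"


form = VoiceForm()
form.handle("Valve replaced")   # dictation for the first step
form.handle("Next.")            # action word: move to the second step
```

Keeping the action words in one small set makes it easy to test the form logic without a microphone or a transcription service in the loop.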

Patient File AI

Whether the task is natural language processing of text or of the spoken word, AWS provides the right tools and clearly distinguishes between "normal environments" and the medical context. The same applies to machine learning.
Check out "AWS Comprehend Medical", "AWS Transcribe Medical", and "Amazon SageMaker for Healthcare and Life Sciences" for details. It is important to never use general models for medical use cases.
To analyze patient files, we evaluated AWS Comprehend Medical as a way to map unstructured medical documents to medical ontologies.

Challenges

Clinical trial reports, patient health records, and doctors' notes are usually unstructured and written in different ways.
The vast amount of data that is stored over the course of a patient journey can overwhelm practitioners in their day-to-day life.
The preparation time for a practitioner is limited. Hence, chances are high that the patient is asked to frame the situation and has to explain key facts in their own words.
The anamnesis may miss important facts and delay the diagnosis or treatment.

Solution

AWS Comprehend Medical understands the global language for clinical terms (SNOMED-CT), International Classification of Diseases (ICD-10), and the normalized names for clinical drugs (RxNorm) and can rank their relevance based on the gathered information.
In this Proof of Concept, we used unstructured text from anonymized patient file records and displayed the output on a tablet device.
Although Comprehend Medical does not replace a medical doctor's decision and understanding, it was obvious to the participants that the conversation was much more efficient than the traditional gathering of information via the clinical management system.
Note: All assets were processed in a lab environment; no sensitive data was ever processed in a real clinical scenario.
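
How ranked ontology codes might be turned into a short list for a tablet view can be sketched as follows. This is an illustration only, not the Proof of Concept's code. It assumes a response shaped like the output of the InferICD10CM operation (an "Entities" list whose items carry "Text" and a list of "ICD10CMConcepts" with "Code", "Description", and "Score"); verify the field names against the current API reference. The sample data, the function name, and the 0.5 threshold are invented for the example, and no AWS call is made.

```python
# Hand-written sample in the assumed response shape.
# Texts and scores are invented for illustration only.
SAMPLE_ICD10_RESPONSE = {
    "Entities": [
        {
            "Text": "type 2 diabetes",
            "ICD10CMConcepts": [
                {"Code": "E11.9", "Description": "Type 2 diabetes mellitus without complications", "Score": 0.74},
            ],
        },
        {
            "Text": "high blood pressure",
            "ICD10CMConcepts": [
                {"Code": "R03.0", "Description": "Elevated blood-pressure reading, without diagnosis of hypertension", "Score": 0.41},
                {"Code": "I10", "Description": "Essential (primary) hypertension", "Score": 0.88},
            ],
        },
        {
            "Text": "tired",
            "ICD10CMConcepts": [
                {"Code": "R53.83", "Description": "Other fatigue", "Score": 0.31},
            ],
        },
    ]
}


def rank_concepts(response: dict, min_score: float = 0.5) -> list[tuple[str, str, float]]:
    """Return (entity text, best code, score) per entity, highest score first."""
    ranked = []
    for entity in response.get("Entities", []):
        concepts = entity.get("ICD10CMConcepts", [])
        if not concepts:
            continue
        # Keep only the most likely code for each detected entity.
        best = max(concepts, key=lambda concept: concept["Score"])
        if best["Score"] >= min_score:
            ranked.append((entity["Text"], best["Code"], best["Score"]))
    return sorted(ranked, key=lambda row: row[2], reverse=True)


for text, code, score in rank_concepts(SAMPLE_ICD10_RESPONSE):
    print(f"{code:8} {score:.2f}  {text}")
```

Dropping low-scoring matches keeps the list short, at the price of possibly hiding a relevant finding. Any such threshold would need to be chosen together with the practitioners.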

3D Chatbot Avatar

Most ChatGPT bots feel quite old-fashioned and remind you of the days when you were forced to chat with a bot in a support channel such as WhatsApp Business: just an input field and generated output.

For special occasions, however, the visual appearance has a high impact on the retention rate. Suppose potential customers and decision makers are to interact with a bot and ask it about the latest high-end hardware or system module. A plain input field would look quite unimpressive.

Therefore, we took a look at 3D avatars to make chatbots more appealing and more like a salesperson.

Challenges

Due to the sensitive content, we could not use off-the-shelf ChatGPT and had to ensure that all the content (or custom knowledge base) stays in the intranet, offline.
The avatar should speak English like a native speaker, including breathing pauses.
The rendering of the avatar should look like a "single shot movie", so that answers and pauses between questions look as if a real person were standing and talking. Visual “cuts” should be avoided.
The avatar should be configurable in gender, voice, and language.

Solution

We investigated 3D chatbots and decided to give d-id a chance. Knowing that it is based on plain ChatGPT, we were interested in its chat rendering and also impressed by its solid developer API.
Building a video-chat-like conversation stream on top of d-id's WebRTC-based API allowed us to implement a Proof of Concept within a week.
We were able to display the chatbot on a custom website and interact with the avatar of choice via text or voice commands.
We added "custom knowledge" of a premium product by using the prompt "I will ask you questions based on the following content: [custom content of up to 4k tokens]" at the very beginning of the d-id conversation. That way, we were able to demo and run usability tests based on a fully working product without all the technical heavy lifting.
To demo the interaction with a custom knowledge base of hundreds of PDF files, we used chatbase, which comes with an easy-to-use setup process and API documentation. We could successfully demo the processing of unstructured data.
For the initial version, we decided to go with an on-premise installation, used gpt-turbo and decided to opt out of the human-like 3D rendering for now.
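
The "custom knowledge" prompt described above can be sketched as a small helper. This is a hedged illustration, not the code used in the Proof of Concept: the function name and the words-to-tokens ratio are assumptions, and counting whitespace-separated words is only a rough stand-in for a real tokenizer. The prompt sentence and the budget of roughly 4k tokens come from the description above.

```python
PROMPT_PREFIX = "I will ask you questions based on the following content: "


def build_knowledge_prompt(
    content: str,
    max_tokens: int = 4000,
    tokens_per_word: float = 1.3,  # assumption: rough average, varies by language and model
) -> str:
    """Prefix `content` with the priming sentence, trimmed to a rough token budget."""
    words = content.split()
    # Estimate how many words fit into the budget and cut off the rest.
    max_words = int(max_tokens / tokens_per_word)
    if len(words) > max_words:
        words = words[:max_words]
    # Note: joining on single spaces also normalizes line breaks and tabs.
    return PROMPT_PREFIX + " ".join(words)


prompt = build_knowledge_prompt("The module supports hot-swappable power supplies.")
```

Cutting the text at a fixed budget silently drops everything after the cut, so the most important product facts should come first in the content.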

Image Analysis

The understanding of papers, pictures, written text, sketches, and notes is a science of its own. Human beings are excellent at understanding context: for example, they ignore address lines, dates, and footers and jump directly to the essential lines of a letter. Machines, in contrast, are incredibly efficient at processing and structuring gigantic sets of data.

Challenges

In daily operations, important documents were scanned and stored as images (JPG) on the intranet.
The written text was not considered by the system's tagging process at all; instead, the files had to be opened and interpreted by human beings.
“Information Overload”: Documents were only spot-checked, or opened when the customer referred to them during the conversation.
The risk of overlooking essential information in a scanned document was high.

Solution

Depending on the context, we used either AWS Comprehend or AWS Rekognition to analyze the document.
This allowed us to make sense of unstructured data when processing textual information and also enabled the enhanced tagging of image data.
Additionally, in a lab environment, we were able to demonstrate the future potential for clinical settings by combining AWS Rekognition with AWS Comprehend Medical.
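
The step from a scanned image to text that can be tagged can be sketched as follows. This is an illustration under assumptions, not the project's code. It assumes a response shaped like the output of Rekognition's DetectText operation (a "TextDetections" list whose items carry "DetectedText", "Type" with the values LINE or WORD, "Id", and a "Confidence" between 0 and 100); verify the field names against the current API reference. The sample data, the function name, and the 90.0 threshold are invented, and no AWS call is made.

```python
# Hand-written sample in the assumed response shape.
# Texts and confidence values are invented for illustration only.
SAMPLE_TEXT_RESPONSE = {
    "TextDetections": [
        {"Id": 0, "Type": "LINE", "DetectedText": "Delivery note 4711", "Confidence": 99.1},
        {"Id": 1, "Type": "LINE", "DetectedText": "Signature", "Confidence": 62.0},
        {"Id": 2, "Type": "WORD", "ParentId": 0, "DetectedText": "Delivery", "Confidence": 99.4},
    ]
}


def extract_lines(response: dict, min_confidence: float = 90.0) -> list[str]:
    """Return the text of LINE detections at or above `min_confidence`, ordered by Id."""
    lines = [
        detection
        for detection in response.get("TextDetections", [])
        # WORD entries repeat the content of their parent LINE, so skip them.
        if detection.get("Type") == "LINE"
        and detection.get("Confidence", 0.0) >= min_confidence
    ]
    return [d["DetectedText"] for d in sorted(lines, key=lambda d: d["Id"])]


# The joined text could then be passed on to a text analysis step for tagging.
document_text = "\n".join(extract_lines(SAMPLE_TEXT_RESPONSE))
```

Filtering on confidence keeps poorly recognized lines, such as handwriting, out of the tags. Those documents would still need a human look.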