AIM Infocomm 2025

Pinned straw:

Added 5 months ago

Shout out to @Shapeshifter who put me on to AIM's involvement in Infocomm 2025.

Some scrolling put a couple of videos in front of me.

I have been keen to see AIM's services in operation and get an understanding of the real-world uses.

This interview was billed as "the first-ever trilingual live broadcast in English, Spanish, and German": a live translation from English to German and from English to Spanish. The presentation came across a little clunky to me, maybe as a result of the TV production team's work? Interesting nonetheless.

https://www.avixa.org/avixa-tv/videos/How-AI-Media-is-providing-live-translations-on-AVIXA-TV/xJG0aXAC

The following is part one of a three-part interview with Pete "Tea Man" Connan and Tony Abrahams, recorded recently in New York:

https://www.linkedin.com/posts/thetecheffect_lexivoice-aitranslation-infocomm2025-activity-7335665938584788992-kAy6/?utm_source=share&utm_medium=member_desktop&rcm=ACoAADmFthUBoF5CIWqAKjGQdOhTAvbZnt5N7z8

Next, an interview with AIM's Russ Newton. In the bottom left of the screen, LEXI is transcribing and translating the interview live, with a lag of approximately six seconds:

https://www.linkedin.com/feed/update/urn:li:activity:7342959259493593089/

DrPete
Added 5 months ago

Thanks @Arizona and @Shapeshifter for the AI-Media links. I found them useful for fleshing out my understanding of their products and their marketing activities.

I don't speak German or Spanish, so can anyone comment on the quality of the voice translation in the first video? Also, the voice translation was pre-recorded, so not a thorough test of a live broadcast.

Also, I have to admit I was disappointed with the quality of the live captioning in the third video with Mark Skehan. At times it was accurate, at other times completely wrong, and sometimes it just seemed to make up new, unrelated content (AI hallucination?). I didn't run the numbers, but it didn't seem much better than around 50% accurate - much worse than existing, readily available translation tools. Again, not knowing Spanish, I can't judge the live text translation, but given it involves another layer of AI to translate, it wouldn't surprise me if it was near useless.

Maybe the transcription demo wasn't a best-case scenario; perhaps a business trade stall isn't ideal for voice quality. Still, the voice quality when watching the video was OK, so why couldn't Lexi do better? And Tony and Russ are spruiking a wide range of use cases, many of which won't have noise-free, studio-quality audio recording.

Makes me question where they are getting their accuracy numbers. It definitely didn't outperform what a human could do. In their H1FY25 results presentation they said Lexi was achieving 98.8% accuracy - no way did they achieve anything like that in the demo video. So where do those accuracy numbers come from? Or why was the demo so far below what Lexi can supposedly achieve on average?

The demo felt more like a proof of concept than a finished product. Of course, AI is improving rapidly, so maybe Lexi improves rapidly. Or maybe it doesn't - past transcription and translation tools quickly reached 90% usefulness, but couldn't then improve further.

Any thoughts from others on this?


Arizona
Added 5 months ago

@DrPete You make some great points.

Had not even considered the accuracy of the translations. I simply took it for granted that the accuracy must be decent.

I was not blown away with what I saw in these videos and wonder: am I missing something?

I need to take a closer look at this and get a better handle on AIM's abilities.

Is there an example of AI-Media's products in use that Straw people can point to, to help a pleb like me understand the company's offerings?


DrPete
Added 5 months ago

One for the stat nerds... Just came across this blog by AI-Media that explains how they measure accuracy:

https://www.ai-media.tv/knowledge-hub/insights/accuracy-measurement-captions-ner/

Here's a counter-argument for NER from another captioning provider:

https://www.3playmedia.com/blog/measuring-captioning-accuracy-why-wer-and-ner-analyses-differ/

TLDR: AI-Media report NER accuracy. It is a commonly used measure for live captioning (but much less so for recorded captioning). NER is not based on word matching, but on matching the meaning. The scoring is subjective and labour-intensive (I read elsewhere, on a Uni of Melbourne page, that it takes 10-15x the length of the audio to score it using NER principles). Also, NER statistics will inherently be a lot higher than word-accuracy statistics, because 1) errors are discounted if they impact meaning less, and 2) errors are scored on concepts/phrases rather than words, even though the denominator is still the total number of words (e.g. if you omit an entire 10-word sentence/phrase, it might only count as 1 error, giving an accuracy of 90%). 98% NER is typically seen as the baseline accuracy required for live captioning.
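
For the stat nerds, here's a minimal sketch of why the two measures diverge, assuming the severity weightings usually cited for the published NER model (0.25 minor / 0.5 standard / 1.0 serious) - AI-Media's exact scoring rules may differ:

```python
# Minimal sketch: NER-style scoring vs plain word accuracy.
# Severity weights follow the published NER model (Romero-Fresco & Martinez);
# AI-Media's in-house rules may differ.

def word_accuracy(total_words: int, word_errors: int) -> float:
    """Plain word matching: every wrong or missing word counts as 1 error."""
    return 100 * (total_words - word_errors) / total_words

def ner_accuracy(total_words: int, edition_errs: float, recognition_errs: float) -> float:
    """NER = (N - E - R) / N * 100, where E (edition) and R (recognition)
    are severity-weighted error scores, not raw word counts."""
    return 100 * (total_words - edition_errs - recognition_errs) / total_words

# The worked example above: an entire 10-word sentence is omitted.
print(word_accuracy(10, 10))       # 0.0  -- all 10 words are missing
print(ner_accuracy(10, 1.0, 0.0))  # 90.0 -- scored as one serious edition error
```

So a quoted 98%+ NER figure and a demo that looks roughly 50% accurate on a word-by-word read aren't directly comparable numbers.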

I realise this is a digression from focusing on the business case for AI-Media. They aren't developing the AI models. In Tony's words, they are "orchestrating" existing AI models and technology infrastructure. However, given the poor performance of the one and only live demo I've seen (other demos I've seen included human oversight/input), I'm not yet confident that the AI models are strong.

I'm keen to be convinced otherwise if anyone can provide evidence of good-quality, AI-only (i.e. no human involved) transcription provided by AI-Media.


Arizona
Added 5 months ago

@DrPete I'm rewatching the Mark Skehan interview.

There really are a lot of issues with the transcribing in this example. Maybe Lexi struggled with Mark's Australian accent.

The image below is a screenshot of the live transcription that takes centre stage in the interview. I have double-checked: there was no mention of a python in the interview, and the transcribed sentence is nonsensical. This is just one example; there are others throughout the interview.

[Screenshot: Lexi's live transcription showing a nonsensical caption mentioning a python]



SudMav
Added 5 months ago

I would suggest that it's partly language, but also partly the fact that he's talking way too far away from the microphone.

I agree that it's not as accurate as I thought it would be; from this vid it just feels a lot like the live captioning in an MS Teams meeting.

Maybe they should get them to use microphones like in the cricket, where they make sure the mic is an exact distance from the mouth to ensure optimal sound quality.


Arizona
Added 5 months ago

@SudMav That's a fair point re the microphone.

I guess it's possible that the audio we are getting is not the audio being received by Lexi. There is also a lot of background noise. There are times in the video where I wondered if it was transcribing another conversation altogether.

Maybe there are volume or gain levels that need adjusting on the audio being input into Lexi.


RogueTrader
Added 5 months ago

Yes, I struggle to see how they could win five awards at the show if the transcriptions are sh*thouse :)


Arizona
Added 5 months ago

@RogueTrader Maybe every exhibitor gets an award or five?


DrPete
Added 5 months ago

Lol, yeah, keeps the parents happy when all the kiddies are getting awards. More likely to pay for a stall next year.

I've reached out to Tony via LinkedIn to ask about the quality issue in the demo. Don't know if he monitors or replies to LinkedIn messages. I'll report back if I hear anything.

It surprises me that they'd set up the demo at a trade stall if it was always this bad. I'm hoping it was some combination of mic and background-noise issues. You're right @Arizona that at times it seemed so wrong I wondered whether it was picking up on another conversation. I'm hoping this poor performance was an exception rather than typical, but my confidence has been dented.


Shapeshifter
Added 5 months ago

It would be interesting to know what the NER score was on that interview @DrPete! Not the usual 98%+ that AIM quote, I'm sure.

For me, what is disappointing about the relatively poor performance of LEXI in that situation is not the performance itself (there could be several reasons for that, like people have said) but the fact that they posted the interview as a showcase to display LEXI, without quality control recognising it was substandard. This is perplexing.

As for NER, it seems reasonable for live captioning, but there is a subjective element to the scoring. AIM look like they have a fairly tightly defined set of rules on their web page, but it would depend on how you implement those as well. Perhaps the best way to use NER is for AIM to measure changes in its own performance: if they are using the same set of rules to get their NER score, then any changes should show relative improvements (or not) as the product matures.

Thanks for the thought-provoking posts.


mushroompanda
Added 5 months ago

NER can be vastly improved by two things that I know of:

  1. Topic models.
  2. The clarity of the audio source.


Topic models are specific vocabulary relevant to the event taking place, e.g. the names of Australian cities and towns in a weather report, the names of participants in a meeting, or "EBITDA", "CAC", etc. in a company conference call. I don't recall exactly, but it could be at least a 1% uplift in NER when done correctly - I remember reading something like this from the AWS guys using AIM's products. (There's a rough sketch of the idea below.)

The audio source is of course very important. In a studio environment it's very clean; at a busy conference hall there's probably a lot of background noise being picked up that could muddle things up.
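
To make the topic-model mechanics concrete: AIM's internals aren't public, but AWS Transcribe's custom-vocabulary feature works on the same principle (prime the recogniser with event-specific terms), so here's an illustrative sketch - the vocabulary name, job name and bucket are all made up:

```python
# Illustrative only: AWS Transcribe's custom vocabulary as a stand-in for
# the "topic model" idea. Requires AWS credentials; all names are invented.
import boto3

transcribe = boto3.client("transcribe")

# Register event-specific terms before the broadcast/call.
transcribe.create_vocabulary(
    VocabularyName="earnings-call-terms",
    LanguageCode="en-AU",
    Phrases=["EBITDA", "CAC", "AI-Media", "Lexi", "iCap"],
)
# (In practice you'd wait for the vocabulary to reach the READY state.)

# Attach the vocabulary so the recogniser favours those terms.
transcribe.start_transcription_job(
    TranscriptionJobName="h1fy25-call",
    Media={"MediaFileUri": "s3://my-bucket/call.mp3"},
    MediaFormat="mp3",
    LanguageCode="en-AU",
    Settings={"VocabularyName": "earnings-call-terms"},
)
```

Same idea for a weather report: load the town names ahead of time rather than hoping the recogniser guesses them.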


Arizona
Added 5 months ago

@Shapeshifter Perplexing! Exactly right.

We have cut them some slack here for imperfect audio, mic placement, environmental noise levels etc., but to promote this as a showcase of what AI-Media can do seems strange.

Or looking at it another way: is this what Lexi does? Is this a reasonably accurate example of Lexi in a real-world scenario?

That is what I am asking myself. I was on the verge of opening a position in AIM this week; then these videos came through, and now I am scrambling to find real-world usage examples of AIM's products in order to maintain my interest in the company.

Valuable chat. Thanks all.


Arizona
Added 5 months ago

@DrPete I imagine that there is some sort of "mixer" and/or interface into the computer, and that human error can play a part in getting the mic levels right for the situation. I am guessing that the audio we are getting when watching the video belongs to the camera of Mark the interviewer, and may not be the same audio that Lexi is getting.

I'm sure Lexi can only work properly within certain parameters, and fair enough too. However, as @Shapeshifter says, they went on to promote this as a showcase of what Lexi can do.


Arizona
Added 5 months ago

[Screenshot: a comment on Tony Abrahams' LinkedIn repost of Mark Skehan's post]

This is a comment below Tony Abrahams' repost of Mark "The Bearded Tech" Skehan's post on LinkedIn.


DrPete
Added 5 months ago

Yeah @Arizona I almost posted a similar message in response to Tony's comments. But decided to go the more diplomatic and less public route of trying to contact Tony directly. Let's see if Tony replies to either Justin's question or mine.


Arizona
Added 5 months ago

@DrPete I think the less public route is the classy way to go, for sure.

Good to see the issue being picked up by others out there in the world.


Arizona
Added 5 months ago

In the interests of balanced reporting:

I came across this earlier interview between Mark "The Bearded Tech" Skehan and Russ Newton at Integrated Systems Europe a few months ago.

Lexi appears to be far more accurate in this video.

jcmleng
Added 5 months ago

Thanks @Shapeshifter and @Arizona for the flag on AIM's Infocomm 2025 involvement and the links to the videos. Still trying to find a reliable way to get notified of these events and links directly!

I viewed the 3 interviews and actually took away quite a few new insights. 

In summary,

  • These new capabilities make good technical/capability/customer sense against the AIM 9 squares strategy.
  • This is giving me the sense that AIM is actively building capability to enable future opportunities outside of Broadcast.
  • It also demonstrates AIM's relentlessness in trying to deliver Lexi capabilities into every possible customer workflow.


Discl: Held IRL and in SM and topped up again at 48.5c

My takeaway points:

TONY INTERVIEW WITH PETE CONNAN

Tony summed up AIM's capability nicely with this:

"We are delivering 'AI' into every possible workflow for our customers, live and recorded, using the best of public information, private information that is scaled within that organisation to be coherent and clear, and then live information on top of that. Think about it as the best of ChatGPT, but with your private information and updated live with the thousands of Lexi feeds that we have going on around the world. A Bloomberg service, tailored to your organisation."

The other insight in that interview: Tony mentioned that his two co-founders were both deaf. The problem they were originally trying to solve was not broadcast, but how deaf people get an education. In 2003, 50% of deaf kids did not complete high school. This drove them to find a way to deliver live captioning, at scale, in classrooms by 2007. At its peak, AIM had 800 re-speakers, using/checking AI to deliver live captioning.

I did not understand this before, but I can now make better sense of Tony's comments describing the transition from human to AI in the earlier interviews I watched when I started deep-diving AIM.

DANTE INTEGRATION

Technically, AIM is about orchestrating live video and voice feeds through the AIM iCap ecosystem/Lexi Cloud, which then opens up all the Lexi services to those feeds and all the good revenue that comes with it. 

My immediate reaction to the Dante integration was simply: "wow, that makes good sense".

Enabling the hookup from AIM to Dante would be a “one-time” integration - do it once platform-to-platform, and theoretically, all customers on Dante would thereafter have a standard technical vehicle to connect a Dante-enabled AV setup to the Lexi Cloud. This is very much aligned to AIM’s vision of growth via platform-to-platform integration. 

Once there is a live voice and/or audio feed from a Dante site to the Lexi Cloud, that Dante site should be able to access all Lexi services - Lexi Voice, captioning etc. - probably not much differently from the major broadcasters.

AD8/Dante is primarily focussed on corporate/office/education/government AV installations - a Dante pipe would then allow AIM penetration into the non-broadcast world, which makes up 6 of AIM's 9 squares.

As I also like and hold AD8, this should be a good thing for AD8, as it should add another out-of-the-box capability to the Dante platform.

[Image attachment]

So, pursuing a Dante integration makes great sense, given how it would open up non-broadcast via an immediate Dante-enabled ecosystem. This does not feel like a random opportunity. Something to absolutely cheer on, in my view!

BACKBONE NETWORK

This was interesting: low-latency audio translation from Lexi Voice to mobile devices in a live-event setting for huge audiences, where the delivery of the real-time voice translation is controlled by AIM and is not dependent on the infrastructure of the source environment.

  • Ultra-low-latency distribution capability to mobile, providing a means to further compress and reduce the current 6-8 second translation latency
  • Can scale to high capacity, e.g. 100,000 viewers or more
  • Can strip out the audio of the speaker without disrupting the background contextual audio


This should theoretically improve the delivery of broadcast content to live events as well as to mobiles. Very nice!

MARK SKEHAN INTERVIEW

  • Lexi Voice latency is now 6-8 seconds - that is really impressive
  • $30/hour cost to enable live voice translation over 100 languages - that feels cheap as chips
  • Running a lot of proofs of concept to prove accuracy
  • Via Infocomm, exploring all kinds of other integrations/strategic partnerships that AIM can build to push and pull the audio experience into different environments and designs - bringing language inclusiveness everywhere (airports, mobiles, etc.), and doing this within secure infrastructure


Shapeshifter
Added 5 months ago

Flesh on the bones, thank you @jcmleng!

The best spot to get AIM updates, I find, is on their LinkedIn page.


Arizona
Added 5 months ago

@jcmleng You have pulled the key details together very nicely. Thanks for sharing.

I am on the verge of becoming an AIM shareholder. The discussions here on SM have been very helpful.

As @Shapeshifter says, LinkedIn is a good place to start for company info.



JohnnyM
Added 5 months ago

Great discussion @Arizona. I watched the videos you shared and was impressed by AIM's history and heartwarming start as a transcription service for deaf people in education settings. Amazing!

But I just don't see how it survives from here... Ask ChatGPT how many open-source transcription services there are, i.e. free for anyone to replicate. Including, as I previously posted, transcription being free for everything played through VLC Media Player.

None of them are perfect quality, but free is a hard price to beat. I use the Quartr app to listen to earnings calls and search through the transcripts, for free - it's amazing. You can watch the transcript get written in real time and search it instantly.

AIM have a viable business while the dinosaur broadcasters get up to speed.

I could be wrong, wouldn’t be the first time.


SudMav
Added 5 months ago

Thanks for your views @JohnnyM. I do agree to some extent that there will be a lot of competition out there with LLMs, and some free alternatives that will cannibalise some of their future opportunities. AIM's trusted brand and reliability will likely go a long way to keeping them going while the older broadcasters get up to speed, but they would need to get their pre-recorded capabilities up to speed very fast to stay competitive in this area.

I think the one thing that stuck with me, though, from one of the presentations Tony gave, is that their product has backup capability, can work behind a secure network, and can function without any internet connection. While we live in a world where internet connections are easily accessible, I found this a great point of difference that would provide their customers with certainty that captioning won't drop out in live broadcasting.

The potential partnership/integration with Dante would be an interesting and promising collaboration if they can get it to work right, but I'm not sure how to put a value on it now.

I still need to do more work on this one, but this is my 2c.

Held IRL and SM.


Arizona
Added 5 months ago

@JohnnyM It's great to be able to kick these ideas around with the intelligent and thoughtful folks here on SM.

There are going to be people here that will explain this better than I, but, in a very general sense, I'll give it a go....

It is my understanding that AI-Media's strength is its ability to integrate AI into customer workflows, via AIM's encoders and software.

Apparently, AIM's moat is not easily replicated, and AIM is not in competition with AI language models and the like; rather, it enables AI platforms/tools to integrate into the systems of AI-Media's customers.

This is discussed in the latest SM meeting with Tony Abrahams, 1 April 2025.

I am still trying to get my head around this and understand how this works in the real world.

Tony comes across as articulate, intelligent and passionate about what he does. The idea that AI-Media is not so much a provider of AI but rather a conduit for it might be easier to get my head around if I could see it in action. However, the videos that we have been discussing have left me underwhelmed, and I wonder if I am missing something.


Arizona
Added 5 months ago

@SudMav The backup capability and the ability to operate offline would seem to be great features. Something I had missed.

Thanks for your input.


jcmleng
Added 5 months ago

@Arizona, the main slide which talks about the moat is the one below. I have posted detailed notes about the AIM moat previously, around this slide, as have others. I think the posts are replies to, and hence embedded within, the SM meeting posts as well as the AIM results announcements, which may make them trickier to search. I think I started my own deep dive from the second-last SM meeting with Tony, a year or so ago.

I found it helpful thinking about AIM with the slide and this highly summarised mental model below as the start point, as this gets to the guts of the AIM offering, what it actually does, and hence the moat:

  • AIM does not compete with all the other AI translation tools/LLMs - it is, in fact, a USER/consumer of those LLMs
  • The moat is created from the broadcaster's MAIN BROADCASTING FEED (if this feed is broken, the TV screens go blank) being directly connected to, and flowing through, the AIM encoders and the iCap Cloud, where the LEXI-driven enhancements are applied before emerging as the enhanced live broadcast feed which the viewers see - the broadcaster's and AIM's networks are thus 100% intertwined for every second that a broadcast is live
  • Within the iCap Cloud, AIM uses a combo of the LEXI products + AI LLMs + technical smarts + all other available contextual information around the broadcast content to do the captioning and translating, live - hence the use of the word "orchestration"
  • Instead of using humans to do the captioning, AIM has replaced the humans with AI to do the captioning faster, cheaper and more reliably, which is where the "AI" dimension comes from
  • From captioning, AIM is now applying the above onto LIVE Voice services via LEXI Voice ... (see the conceptual sketch below)
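
A purely conceptual sketch of that feed path - none of these function names are real AIM/LEXI APIs, they just make the "orchestration" idea concrete:

```python
# Conceptual only: main broadcast feed -> AIM encoder tap -> iCap Cloud
# (LEXI + LLMs + contextual vocab) -> enhanced feed back to viewers.

def lexi_orchestrate(audio_text: str, topic_vocab: set) -> str:
    """Stand-in for the captioning/translation step; here it just
    normalises casing on known topic-model terms."""
    return " ".join(w.upper() if w.upper() in topic_vocab else w
                    for w in audio_text.split())

def enhanced_broadcast(main_feed, topic_vocab):
    """Yield each live 'frame' together with its captions. If this
    generator stops, the enhanced feed stops - which is the sense in
    which the broadcaster's and AIM's networks are intertwined."""
    for frame in main_feed:                      # the MAIN broadcast feed
        yield frame, lexi_orchestrate(frame, topic_vocab)

demo_feed = ["lexi voice latency is six seconds", "ebitda grew this half"]
for frame, captions in enhanced_broadcast(demo_feed, {"LEXI", "EBITDA"}):
    print(captions)
```

The structural point is that the enhanced output is generated inline with the feed: take the captioning layer away and there is no broadcast enhancement at all, which is where the moat comes from.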


Hope this helps ...

[Slide: the AIM moat/orchestration slide referenced above]


Arizona
Added 5 months ago

@jcmleng This is great. It does help. Thanks for taking the time. You explain it so much better than I.

I watched the meeting from April 1st with Tony. The slide you included is one Tony leans on heavily to describe what AI-Media does and, essentially, how it does it. I think I am generally across this idea.

There is a fair amount of info on AIM's website devoted to explaining what AI-Media does. I am surprised that there aren't many video examples; I would love to see and experience AIM's products in action.

Examples like this leave me wondering where the tech is at.

Does anyone have a good example of AI-Media's work that can be seen in action?

I am grateful for the effort Straw people have put into this discussion; it's great. Thank you all.


Shapeshifter
Added 5 months ago

I thought a couple of interesting things came out of the AVIXA interview.

AIM are holding conversations with Audinate to look at integrating with Dante.

They have partnered with Backbone Networks to develop ultra-low-latency translation for mobiles.

Basically they are throwing heaps of mud at the walls and hoping some sticks!
