Which WebRTC JS library should I use?
I don’t really know, but there’s a lot in this innocent “WebRTC JS library” question that isn’t clear without digging a lot further.
Every now and again (= a week or two) I get a question asking me to help with the selection of this or that open source component, pick a CPaaS vendor for a project, find someone to outsource WebRTC work to or hire a stellar WebRTC developer.
Many of these emails are about shortcuts. Give us that silver bullet. Shortcuts seldom work with WebRTC.
Last week, I had a question come in. A startup is looking for a “WebRTC JS library” to use. Something that does 1:1 voice chat rooms, stores user profiles, etc. It also needed to be inexpensive – Twilio is too expensive for them. And a free alternative was their main preference.
The problem I had with it is that this simple question of which WebRTC JS library should I use didn’t align that well with the set of questions asked.
This article is about what components are needed for WebRTC deployments. If you’re looking to dig deeper into the media paths in WebRTC, then join my free webinar: Mesh, MCU or SFU
Let’s break down WebRTC to its main components as seen from a network architecture perspective:
- Signaling
- NAT traversal
- Media
- Other
Here’s a slide I’ve been using to explain where a device gets connected to in a typical WebRTC session –
Signaling

Signaling is how the devices reach out to one another. They can’t do it directly, since they don’t have each other’s IP address, and even if they could, we need some kind of a “protocol” for them to do that.
Signaling in WebRTC is… non-existent. You need to bring your own signaling. This approach confuses some developers, and probably causes this lack of a good solution that fits no-one and everyone at the same time.
Today, you can use SIP, XMPP, MQTT or just proprietary protocols as your signaling for WebRTC traffic. Each such protocol will have its own set of frameworks, services and SDKs that you can use. Some will be free (open source) while others will be licensable software or SaaS based.
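To make this tangible, here’s a minimal sketch of the “proprietary protocol” option – signaling over a plain WebSocket, shown from the answering side. The server URL and JSON message format are made up for illustration; WebRTC itself dictates none of this.

    // Minimal "bring your own signaling" sketch - answering side.
    // The server URL and message format are invented for illustration.
    const signaling = new WebSocket('wss://example.com/signaling');
    const pc = new RTCPeerConnection();

    // Ship whatever the peer connection generates to the other side...
    pc.onicecandidate = ({ candidate }) => {
      if (candidate) signaling.send(JSON.stringify({ type: 'candidate', candidate }));
    };

    // ...and feed whatever arrives from the other side into the peer connection.
    signaling.onmessage = async ({ data }) => {
      const msg = JSON.parse(data);
      if (msg.type === 'offer') {
        await pc.setRemoteDescription(msg);
        await pc.setLocalDescription(await pc.createAnswer());
        signaling.send(JSON.stringify(pc.localDescription));
      } else if (msg.type === 'candidate') {
        await pc.addIceCandidate(msg.candidate);
      }
    };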
NAT traversal

NAT traversal is about being able to actually get media flowing.
WebRTC is P2P (peer to peer), meaning you can, in some cases, send media directly across devices. This is something that is otherwise impossible with web browsers. WebRTC also has a preference for using UDP, since it offers better real time, low latency characteristics. It is also the only web browser traffic that makes use of UDP, which means it is sometimes blocked as well.
NAT traversal is how WebRTC gets past these pesky issues, and it requires additional servers to help it do so. Some of these servers (TURN) may end up relaying all traffic through them…
At the end of the day, you will need to deploy these servers or pay for someone to do it for you (no free meals here).
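On the client side, this boils down to handing your STUN/TURN servers to the peer connection. A sketch with placeholder addresses – you’d point these at servers you deploy yourself (coturn is a popular choice) or rent:

    // The URLs and credentials here are placeholders, not real servers.
    const pc = new RTCPeerConnection({
      iceServers: [
        { urls: 'stun:stun.example.com:3478' },
        {
          urls: 'turn:turn.example.com:3478',
          username: 'user',       // usually short-lived, issued by your backend
          credential: 'secret',
        },
      ],
    });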
Media

Recording. Group calling. The need to control media paths. Broadcasting. All these end up requiring media servers in the backend. Ones that can process media in one way or another.
The most common approach today is to use SFUs and solve most of the world/media problems with them. These also offer some signaling protocol of their own – my preference is usually to short circuit these and redirect all this traffic through a different signaling/messaging path – especially for the more complex applications.
Again, they come in different shapes, sizes and types – open source ones and commercial ones. You usually won’t be able to pay for them separately as a hosted service and will need to go to a CPaaS vendor to get the whole set of solutions – if you’re looking for the hosted/managed path.
Other

Payments, user authentication and identity, the website itself and a large number of other things you might be needing.
These are really out of scope of WebRTC, but sometimes are provided by the various vendors and frameworks out there.
Back to that question

What were we dealing with to begin with here?
looking for a “WebRTC JS library” to use. Something that does 1:1 voice chat rooms, stores user profiles, etc. It also needed to be inexpensive – Twilio is too expensive for them. And a free alternative was their main preference.
Here’s how I’d break this one down to try and understand what was asked:
- That “WebRTC JS library” gives a hint of someone searching for a signaling framework. Which is great
- 1:1 voice chats strengthen the feeling that we’re dealing with signaling only
- The word rooms… that feels more like an SFU media server. In this case, I’ll assume there’s no need for a media server though – due to the price points asked (free), the fact that there’s no ask on recording and that this is a 1:1 scenario
- Stores user profiles. Hmm. This usually has nothing to do with WebRTC. So much so that most CPaaS vendors don’t offer such a capability either
- Twilio is about the full shebang – getting a hosted, SaaS, CPaaS, managed (pick the term you like best) solution that gives you signaling, NAT traversal, media and some other knick knacks. Doesn’t quite fit in with the rest of the ask here
When I get such jumbled questions, it feels like there’s a bit of a misunderstanding of what WebRTC is and about how the ecosystem of vendors and services has evolved around it.
Want to learn more about WebRTC?

There are several things to do at this point if you need to grok WebRTC:
- Read this article on learning WebRTC for more suggestions
- Read my WebRTC for Business People report (it is free)
- Learn how I think about WebRTC requirements
- Take the first module of my WebRTC training (it’s free)
- Join me for the webinar tomorrow – I’ll talk about Mesh, MCU and SFU media architectures
WebRTC for Business People: 2019 Edition
Fresh from the oven – an update to my first ever report – WebRTC for Business People. Download it for free.
It was time. Two years have passed since my last update to this report. In WebRTC-land, things deteriorate and become unusable quite fast. We now have WebRTC in all modern browsers (at least theoretically and for some scenarios) and Microsoft decided to place Edge on top of Chromium. The vendor stories have changed and shifted as well.
This, and the need to do something to start off 2019, got me to write an update to the report. This time, with the assistance of Frozen Mountain, who sponsored this update.
Besides the usual updates of reading the report and making sure it is as close to where we are with WebRTC today as possible (and adding more references and links while at it), I’ve also updated the use cases section. I consider this part the most important one in the report.
I removed a few of the stories and added others, ending up with a total of 28 vendor stories. While the groups of these vendor stories haven’t changed, the direction I’ve taken in some of them did.
Here’s what you’ll find in there:
Tooling

The tooling section is usually the hardest one. With over 100 vendors in this space, I wanted to make a few distinct picks, each from a different angle of tooling. I decided this time around to also feature testRTC, a company where I am a co-founder (I am biased on this one, so sorry).
Customer Services and Support

In the customer services space I wanted to make a change to reflect the growing adoption of “see what I see” type of contact center services, also known as “remote assistance” or similar names. To that end, I’ve featured Indeca4D who are making use of mixed reality in their solution.
Enterprise Communications

In the enterprise communications space, it was time to include a UCaaS vendor – something overdue from the last round I guess. I picked Vonage for this one. They are unique also because they offer CPaaS (=Tooling) and contact center services.
Webinars

For the webinars section, I decided to add AnyMeeting. I’ve used other platforms in the past, and after getting to know their platform somewhat more, I decided to start using it for my webinars in 2019. The first webinar will take place next week (feel free to register here).
Healthcare

In Healthcare I’ve replaced one of the stories there with the story of GuruMD. One of the trends in this space is the creation of marketplaces and tools that independent doctors and clinics can start using with their patients or for attracting new clients.
Education

For Education, I’ve added Soliya. I wanted to somehow emphasize that education is probably one of the most varied domains where you see WebRTC. Almost every vendor there is looking at education from a different angle, leading to different requirements and final product offerings.
Social

Social… remained the same. The stories got a bit of a refresh where needed, but stayed mostly the same. I felt that Facebook, Houseparty, Snap and YouNow are as relevant today as they were two years ago.
Streaming and Content Delivery

In streaming and content delivery, I’ve replaced two vendors, deciding to showcase Google Project Stream and Limelight. Both bring some strong validation to where WebRTC is headed and how it fits into these non-video-calling domains.
Download the report

If WebRTC interests you, then you should definitely read this report –
Tell me what you think about it.
Asking Google: WebRTC is …
This is going to be awkward. For me? WebRTC is an open source media engine with a publicly known JavaScript API that got implemented in browsers.
I’ve written a “what is WebRTC” article more than once. The most notable ones?
- What is WebRTC? – an article from 2017
- WebRTC FAQ: The 2018 Version
- WebRTC for Business People – a report that got updated in 2017, with a new 2019 edition coming real soon
- Advanced WebRTC Architecture Course – a full length paid for course that teaches WebRTC
This time, I wanted to check what Google thinks of WebRTC, so I started asking it:
Before we continue down this rabbit hole, make sure to register and join me in two weeks for a webinar covering Mesh, MCU and SFU topologies and what each one is good for in your WebRTC application.
Let’s go over these alternatives one by one, trying to understand what people are looking for in their WebRTC.
WebRTC is disabled

Somehow, this got the highest ranking. VPN vendors doing their best with FUD and SEO here, trying to get people to disable WebRTC in browsers.
Reminds me of the good old days when people disabled JavaScript in their browsers.
WebRTC does give access to the camera, microphone, screen and local IP address of a user. Most of it of the user’s own volition. You can use browser extensions to block local IP address “leaks”, while in Safari exposing local IP addresses requires user authorization of some sort as well.
Not sure how this got first place in “WebRTC is”.
WebRTC is free

Yes it is. Mostly. Somewhat. If you understand what “free” is.
You can go to webrtc.org and download it for free. You can even use it and modify it.
But then again, hosting a service isn’t free. Someone needs to pay for the network and electricity. Someone needs to do the coding.
This brings up a rather interesting mindset that I see in entrepreneurs and developers – they feel like using a third party framework or even a managed service should be free, or a lot cheaper than it is. So they go about developing it on their own, spending time and money on development (and often a lot more than it would have cost to just pick up a managed service instead).
That concept of free in WebRTC? It is mostly about removing barriers of entry for vendors. It isn’t about free video calling.
WebRTC is_component_build

Beats me how this got so high as a suggestion by Google.
The build system in WebRTC is often challenging. That’s because Google maintains the main WebRTC open source project with the main purpose of being embedded in Chrome. Due to this, it is just part of the Chrome build process and scripts, and not a standalone product or library.
This part is probably the most painful in WebRTC for developers who need to modify or adapt it for native applications.
Still not sure why it ranks so high.
WebRTC is dead

It isn’t. Can’t even call it a grownup or a teenager.
Moving on.
WebRTC is ready

Yup, it is.
WebRTC is ready. Developers will still bitch and whine that it isn’t complete and changes all the time breaking things up, but at the end of the day – if you’re doing something with communications these days, WebRTC should be the first thing to look at before searching elsewhere.
WebRTC is udp

It is also TCP. With a dash of SCTP. With talks about making it QUIC. Go figure.
UDP is what WebRTC uses to send its media. It works well because TCP has this nasty habit of retransmitting things to make sure they get received. This retransmission thing doesn’t work well where what you’re sending is time sensitive (like media of an interactive conversation).
Not sure why this one is in the top 10 either.
WebRTC is_clang

Like is_component_build, is_clang is also a build/compiler related setting. In this case, deciding which C/C++ compiler to use with WebRTC.
And again, I am clueless as to how and why this is such a popular Google search for WebRTC is.
WebRTC is not defined

This is golden.
The search itself is most probably related to compilation and runtime errors of developers with WebRTC, who post the error messages around the web on Stack Overflow, discuss-webrtc and other online forums – asking fellow developers for help.
Yet…
WebRTC isn’t defined. Yet.
People have been promising me WebRTC 1.0 since 2015. Maybe a year or two earlier. We are now in 2019, talking about things like WebAssembly in WebRTC. But we still don’t have WebRTC 1.0. We’re getting there, but it is still a draft. Will WebRTC 1.0 standardization complete in 2019? Maybe. But WebRTC is not defined. But it is ready. Go figure.
WebRTC is p2p

WebRTC is peer to peer.
You can send media directly from one browser to another (if network conditions allow). But you need to handle signaling in front of web servers, which is kinda centralized. And sometimes, sending media peer to peer won’t work, and media has to be routed. And other times, you’ll want to send media towards a media server.
You can read more about it here – Get Over it: WebRTC isn’t Peer-to-Peer
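If you’re curious where media actually flows in a given session, the standard getStats() API can tell you. A small sketch:

    // Returns true when the connected candidate pair goes through a TURN
    // relay - meaning your "peer-to-peer" call isn't really direct.
    async function isRelayed(pc) {
      const stats = await pc.getStats();
      for (const report of stats.values()) {
        if (report.type === 'candidate-pair' &&
            report.nominated && report.state === 'succeeded') {
          const local = stats.get(report.localCandidateId);
          return !!local && local.candidateType === 'relay';
        }
      }
      return false; // no connected pair found (yet)
    }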
WebRTC is supported

Something that is going to change meaning in 2019.
People used to ask “which browsers support WebRTC?” or “is WebRTC supported on X” where X is Internet Explorer, Edge or Safari.
Nowadays, we’re over that bit of a challenge, with the last gaps closing as well.
The shift of this one is going to be towards traditional voice and video services that are adding WebRTC support for guest access or for those who don’t want to install any apps.
In the last year or so, I’ve had to install a lot fewer applications for meetings I have with companies. It isn’t because we all use Google Meet – it is because almost all of the services (Zoom is the exception here) give WebRTC guest access. WebEx, GoToMeeting, Amazon Chime – all offer WebRTC support. So I can easily handle these calls without installing anything. And yes – WebRTC is supported.
What’s your WebRTC is search term?

I found this list of Google search suggestions for “WebRTC is” quite interesting. Not exactly what I expected starting out.
For me, WebRTC is progress. It is the next step we’re taking in figuring out communications, and in that, it fills the role of one of the most basic building blocks we now have and use.
What about you? WebRTC is …
Looking to learn more about what WebRTC is? How about understanding about mesh, mixing and routing architecture? You should join me for this free webinar:
Register to Mesh, MCU or SFU webinar
What is a WebRTC Signaling Server and Why You Should NOT Use AppRTC?
AppRTC isn’t your friend when it comes to developing a commercial WebRTC application.
I already wrote about the fact that there’s no free TURN server from Google. It seems that I failed to mention the fact that you shouldn’t use Google’s “free” STUN server in production either. Which leads us to this great question on github about AppRTC:
apprtc websocket server down?
The interesting part about this one is that no one from Google commented on it at any point in time.
You see, AppRTC wasn’t meant as a full fledged application, and to some extent, not even as a reference application for other developers. It is mostly meant to be a hello world type of an example.
With a glaring lack of good, simple, popular open source signaling frameworks for WebRTC, developers sometimes use AppRTC for that purpose.
Signaling is important, and so is media. If you want to learn more about mesh, mixing and routing architecture, you should join me for this free webinar:
Register to Mesh, MCU or SFU webinar
While I use AppRTC for baselining, I don’t think it is a good starting place for actual development of a real service.
Here are 4 reasons why:
#1 – AppRTC doesn’t get much love and attention

Look at github insights for AppRTC:
See the number of additions and deletions taking place in 2018?
Latest commit? March 2018.
One could argue that this is because the “Hello World” example for WebRTC is already quite polished and working well, so there’s no need to change anything. Or that WebRTC is now stable enough.
#2 – This is just a “Hello World”

Here’s an example of a Hello World js function:

    function hello(name) {
      console.log("Hello " + name);
    }
    hello('node.js');

This isn’t a starting point I’d use for writing an application.
The AppRTC application is admittedly larger. Here’s the lines of code count for its github project at the time of writing (not that I’d expect much change to it in 2019):
The problem is in what AppRTC doesn’t include, which many developers want/try to add:
- Android and/or iOS AppRTC apps – these aren’t available from Google. There are 3rd party projects for it you can find on github, but they are even less maintained than the Google AppRTC one
- Screen sharing – it isn’t there. Need it? Add it on your own
- Multiparty – not there either. And if you’d try using AppRTC for it, my guess is you’d end up with a mesh architecture (which for 99.9% of the use cases and most definitely for your use case – is destructive)
#3 – The signaling isn’t documented or built to scale

AppRTC uses a Python based signaling server, which is great. The actual signaling protocol selected and used isn’t really documented anywhere, so you’ll need to dive into the code to figure it out if you’ll want to add or modify anything. And you will, simply because a lot of functionality you might want is missing.
The thing is, if you plan on scaling up your service to large number of users, you’ll need this to work across machines – and that’s not easy – or at least not trivial.
At Kranky Geek 2016, Google explained what they did to scale and improve signaling for their own production services. Check out what that means:
Not everyone needs to do things at scale, but many do. Starting from AppRTC puts you in the wrong place for growth.
And when it comes to edge cases, it doesn’t cover them all – if ICE negotiation fails, you won’t know about it in the UI; you’ll just get an ICE failure message in the console log. That’s an example I bumped into when using testRTC with it and closing all ports but 443.
#4 – Don’t iframe or URL to it

Running a service and just need basic meeting capabilities?
Don’t place AppRTC in an iframe of your app or have a URL to it open in another window.
You don’t get an SLA from Google when using AppRTC, and they won’t treat it like a critical service when it fails to run. Throughout the years there have been times when AppRTC was down for one reason or another.
Upwork, for example, used to use a third party free/sample/demo service similar to AppRTC or Jitsi Meet. Needed to schedule a meeting with people you work with on Upwork? Click a button, and it created a kind of ad-hoc, random URL for that meeting and opened it in a new browser tab. They were smart enough to replace it with their own branded meetings feature later down the road.
That service that Upwork used? No longer exists. Want to get a signed guarantee from Google that AppRTC will stay up and running and work the same way it does today 2 years from now?
If you plan on running a serious business, host your own communications infrastructure or pay for it.
Do you have any other alternative?

Not really. Not an immediate one at least.
People are still falling into the trap of using peerjs (see here why NOT to use peer.js).
We used to have EasyRTC and SimpleWebRTC in the past. EasyRTC still gets some love and attention, so you can try it out. SimpleWebRTC is now deprecated – &yet have decided to offer it “as a service” instead.
There are many other github projects offering webrtc signaling. Most of them seem to be projects people built for themselves but never really matured to a robust framework that others have adopted.
I’ve started suggesting Matrix, but many don’t really manage to get WebRTC working well with it.
Then there’s the cloud based services – PubNub, Pusher, Scaledrone, Ably and even Google’s Firebase. These give you robust transport where you can pour your signaling protocol into.
Or commercial software you can install anywhere, such as Frozen Mountain’s WebSync.
In many cases, this will be a to-each-his-own situation, where you’ll just need to develop it yourself, or start somewhere and make it your own quite fast.
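If you do take the cloud transport route, the shape of it tends to look something like this. A hedged sketch – PubSubClient here is a hypothetical stand-in for whichever vendor SDK you pick, and they all differ:

    // You get a robust pipe; the protocol poured into it is still yours.
    const channel = new PubSubClient('wss://transport.example.com') // hypothetical
      .subscribe('room-42');

    channel.publish({ type: 'join', user: 'alice' });

    channel.onMessage((msg) => {
      switch (msg.type) {
        case 'offer':     /* setRemoteDescription + create an answer */ break;
        case 'candidate': /* addIceCandidate */ break;
        case 'join':      /* create and send an offer */ break;
      }
    });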
Signaling is important, and so is media. If you want to learn more about mesh, mixing and routing architecture, you should join me for this free webinar:
Register to Mesh, MCU or SFU webinar
What’s the Role of WebAssembly in WebRTC?
WebAssembly in WebRTC will enable vendors to create differentiation in their products, probably favoring the more established, larger players.
In Kranky Geek two months ago, Google gave a presentation covering the overhaul of audio in Chrome, as well as where WebRTC is headed next. That what’s-next part was presented by Justin Uberti, creator and lead engineer for Google Duo and WebRTC.
The main theme Uberti used was the role of WebAssembly, and how deeper customizations of WebRTC are currently being thought of/planned for the next version of WebRTC (also known as WebRTC NV).
Before we dive into this and where my own opinions lie, let’s take a look at what WebAssembly is and what makes it important.
Looking to learn more about WebRTC? Start from understanding the server side aspects of it using my free mini video course.
Enroll to the free course
What is WebAssembly?

Here’s what webassembly.org has to say about WebAssembly:
WebAssembly (abbreviated Wasm) is a binary instruction format for a stack-based virtual machine. Wasm is designed as a portable target for compilation of high-level languages like C/C++/Rust, enabling deployment on the web for client and server applications.
To me, WebAssembly is a JVM for your browser. Just as Java gets compiled into binary code that is then interpreted and executed on a virtual machine, WebAssembly, or Wasm, allows developers to take hard core languages (which means virtually any language), “compile” them to a binary representation that a Wasm virtual machine can execute efficiently. And this Wasm virtual machine just happens to be available in all web browsers.
WebAssembly allows vendors to do some really cool things – things that just weren’t possible to do with JavaScript. JavaScript is kinda slow compared to using C/C++ and a lot of hard core stuff that’s already written in C/C++ can now be ported/migrated/compiled using WebAssembly and used inside a browser.
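Mechanically, using such a module from JavaScript looks roughly like this – the file name and exported function below are hypothetical:

    async function runWasm() {
      // fetch and compile a binary produced from C/C++/Rust
      const { instance } = await WebAssembly.instantiateStreaming(
        fetch('filter.wasm'),   // hypothetical module
        { env: {} }             // imports the module expects, if any
      );
      // exported functions are callable from JS at near-native speed
      console.log(instance.exports.process(42)); // hypothetical export
    }
    runWasm();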
Here are a few interesting examples:
- Construct 3 decided to use Opus in browsers. Even when it isn’t available – by implementing Opus with Wasm
- Zoom uses WebAssembly in order NOT to use WebRTC (probably because it doesn’t want to transcode at the edge of its network)
- Unity, the popular gaming engine has adopted Wasm for its Unity WebGL target
While the ink hasn’t dried yet on WebRTC 1.0 (I haven’t seen a press release announcing its final publication), discussions are taking place around what comes next. This is being captured in a W3C document called WebRTC Next Version Use Cases – WebRTC NV in short.
The current list of use cases includes:
- Multiparty voice and video communications for online gaming – mainly more control on how streams are created, consumed and controlled
- Improved support in mobile networks – the ability to manage and switch across network connections
- Better support for media servers
- New file sharing capabilities
- Internet of Things – giving some love, care and attention to the data channel
- Funny hats – enabling AI (computer vision) on video streams
- Machine learning – like funny hats, but a bit more generic in its nature and requirements
- Virtual reality – ability to synchronize audio/video with the data channel
While some of these requirements will end up being added as APIs and capabilities to WebRTC, a lot of them will end up enabling someone to control and interfere with how WebRTC works and behaves, which is where WebAssembly will find (and is already finding) a home in WebRTC.
Google’s example use case for WebAssembly in WebRTC

At the recent Kranky Geek event, Google shared with the audience their recent work in the audio pipeline for WebRTC in Chrome and the work ahead around WebRTC NV.
For Google, WebRTC NV means these areas:
The Low Level APIs is about places where WebAssembly can be used.
You should see the whole session, but here it is from where Justin Uberti starts talking about WebRTC NV – and mainly about WebAssembly in WebRTC:
WebAssembly is a really powerful tool. To give a taste of it with WebRTC, Justin Uberti resorted to the domain of noise separation – distinguishing between speech and noise. To do that, he took RNNoise, a noise suppression algorithm based on machine learning, ported it to WebAssembly, and built a small online demo around it. The idea is that in a multiparty conference, the system won’t switch to the camera of a person unless he is really speaking – ignoring all other interfering noises (key strokes, a falling pen, eating, moving furniture, etc).
Interestingly enough, the webpage hosting this demo is internal to Google and has a URL called hangouts_echo_detector/hackathon_2018/doritos – more on that later.
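The demo itself isn’t public, but the shape of such an integration might look roughly like the sketch below. Every name in it is hypothetical – none of this comes from Google’s actual code:

    // Run a wasm-compiled speech detector over audio frames and only
    // switch the active speaker on real speech.
    async function setupSpeakerElection(wasmUrl) {
      const vad = await loadWasmVad(wasmUrl); // hypothetical wasm loader

      return function onAudioFrame(participantId, samples /* Float32Array */) {
        const speechProbability = vad.process(samples); // 0..1 from the model
        if (speechProbability > 0.9) {
          switchActiveSpeaker(participantId); // hypothetical conference UI hook
        }
        // keystrokes, falling pens and moving furniture score low - ignored
      };
    }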
To explain the intent, Justin Uberti showed this slide:
As he said, the “stuff in green” (that’s Session Management, Media Processing, Codecs and Packetizer/FEC/RTX) can now be handled by the application instead of by WebRTC’s PeerConnection, enabling more differentiation and innovation.
I am not sure if this should make us happier or more worried.
In favor of differentiation and innovation through WebAssembly in WebRTC

Savvy developers will LOVE WebAssembly in WebRTC. It allows them to:
- have way more control over the browser behavior with WebRTC
- add their own shtick
- do stuff they can’t do today – without waiting on Google and the other browser vendors
In 2018, I’ve seen a lot of companies using customized WebRTC implementations to solve problems that are very close to what WebRTC does, but with a difference. These mainly revolved around streaming and internet of things type use cases, where people aren’t communicating with each other in the classic sense. If they had low level API access, they could use WebAssembly and run these same use cases in the browser instead of having to port, compile and run their own stand-alone applications.
This theoretically allows Zoom to use WebRTC and by using WebAssembly get it to play nice with its current Zoom infrastructure without the need to modify it. The result would give better user experience than the current Zoom implementation in the browser.
Enabling WebAssembly in WebRTC can increase the speed of innovation and spread it across a larger talent pool and vendors pool.
In favor of a level playing field for WebRTC

The best part about WebRTC? Practically any developer can get a sample application up and running in no time compared to the alternatives. It reduced the barrier of entry for companies who wanted to use real time communications, democratizing the technology and making it accessible to all.
Since I am on a roll here – WebRTC did one more thing. It leveled the playing field for the players in this space.
Enabling something like WebAssembly in WebRTC goes in the exact opposite direction. It favors the bigger players who can invest in media optimizations. It enables them to place patents on media processing and use them not only to differentiate but to create a legal moat around their applications and services.
The simplest example of this can be seen in how Google itself decided to share the concept by taking RNNoise and porting it to WebAssembly. The demo itself isn’t publicly available. It was shown at Kranky Geek, but that’s about it. Was it because it isn’t ready? Because Google prefers keeping such innovations to itself (which it is certainly allowed to do)? I don’t know.
There’s a dark side to enabling WebAssembly in WebRTC – and we will most definitely be seeing it soon enough.
Where do we go from here?

WebRTC is maturing, and with it, the way vendors are trying to adopt it and use it.
Enabling WebAssembly in WebRTC is going to take it to the next level, allowing developers more control of media processing. This is going to be great for those looking to differentiate and innovate or those that want to take WebRTC towards new markets and new use cases, where the current implementation isn’t suitable.
It is also going to require developers to have better understanding of WebRTC if they want to unlock such capabilities.
Looking to learn more about WebRTC? Start from understanding the server side aspects of it using my free mini video course.
Enroll to the free course
What’s the Best Size for a WebRTC SFU Media Server?
Small, Medium, Big or Extra Large? How do you like your WebRTC SFU Media Server?
I just checked AWS. If I had to build the most bad-ass, biggest, meanest, scalest, siziest server for WebRTC – one that can handle gazillions of sessions – I’d go for this one:
A machine to drool over… I should buy such a toy to write my articles on.
Or should I go for the biggest machine out there?
I did a round-up of some of the people who develop these SFUs. And guess what? None of them is ordering the XL machine.
They go for a Medium or Medium Well. Or should I say Medium Large?
Media servers, Signaling, NAT traversal – do you know what it takes to install and manage your own WebRTC infrastructure? Check out this free video course on the untold story of the WebRTC servers backend.
Start your free course
Anyways – here are a few things to think about when picking a machine for your SFU:
Going BIG on your SFU

As big as they come – that’s how big you wanna take them.
We called it scale up in the past. Taking the same monolith application and putting it on a bigger machine to get more juice out of it.
It’s not all bad, and there are good reasons to go that route with a media server:
Managing fewer machines

If one big machine does the work of 10 smaller machines, then all in all, you’ll need 1/10 the number of machines to handle the same workload.
In many ways, scaling is non-linear. To get to linear scaling, you’ll need to put in a lot of effort. Different bits and pieces of your architecture will start breaking once you scale too much. In this sense, having fewer machines to manage means fewer scaling headaches as well.
Having bigger rooms

Group calling is what we’re after with media servers. Not always, but mostly.
Getting 4 people in a room is easy. 20? Harder. 500? Doable.
The bigger the rooms, the more you’ll need to address them in your architecture and scale-out strategies.
If you take smaller machines, say ones that can handle up to 100 concurrent users, then getting any group meeting to 100 participants or more is going to be quite a headache – especially if the alternative is just to use a bigger machine spec.
The bigger the rooms you want, the bigger the machines you’ll aim for (up to a point – if you want to cater for 100+ users in a room, I’d aim for other scaling metrics and factors than just enlarging the machines).
Less fragmentation

Similar to how you fit chunks of memory allocations into physical memory, fitting group sessions into media servers, and maybe even cascading them across machines, will end up with fragmentation headaches for you.
Let’s say some of your meetings are really large and most are pretty smallish. But you don’t really know in advance which is which. What would be the best approach of starting to fit new rooms into existing media servers? This isn’t a simple question to answer, and it gets harder the smaller the machines are.
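To see where the headache comes from, here’s a naive first-fit placement sketch – all the names are hypothetical:

    // First-fit: pick the first server with enough spare capacity.
    function placeRoom(servers, estimatedSize) {
      const server = servers.find((s) => s.capacity - s.load >= estimatedSize);
      if (server) {
        server.load += estimatedSize;
        return server;
      }
      // No server fits. The smaller your servers, the more often you land
      // here with unusable leftover capacity scattered across the fleet.
      return spinUpNewServer(estimatedSize); // hypothetical scale-out hook
    }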
Simpler architecture (=no cascading)

If you are setting up the media server for a specific need, say catering for the needs of a hospital, then the size is known in advance – there’s a given number of hospital beds and they aren’t going to expand exponentially overnight. The size of the workforce (doctors and nurses) is also known. And these numbers aren’t too big. In such a case, aiming for a large machine, with an additional one acting as an active/passive server for high availability, will be rather easy.
Aiming for smaller machines might get you faster to the need to scale out in your architecture. And scaling out has its own headaches and management costs.
Simpler

Bigger machines are going to be simpler in many ways.
Going small on your SFU

This is something I haven’t thought about as an alternative – at least not until a few years ago, when I was helping a client pick a media server for his cloud based service. One of the parameters that interested him was how small was considered too small by each media server vendor – trying to understand the overhead of a single media server process/machine/application.
I asked, and got good answers. I since decided to always look at this angle as well with the projects I handle. Here’s where smaller is better for WebRTC media servers:
Easier to upgrade

I dealt with upgrading WebRTC media servers in the past.
There are two things you need to remember and understand:
- WebRTC moves fast (and breaks things while doing so)
- You’ll need to update your backend rather frequently, including your media servers
The most common approach to upgrades these days is to drain media servers – when wanting to upgrade, block new sessions from going into some of the media servers, and once the sessions they are already handling are closed, kill and upgrade those media servers. If it takes too long – just kill the sessions.
Smaller machines make it easier to drain them as they hold less sessions in them to begin with.
Having more machines also means you can mark more of them in parallel for draining without breaking the bank.
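As a sketch, a drain-then-upgrade flow looks something like this – the server API is hypothetical, the flow is the point:

    const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

    async function drainAndUpgrade(server, maxWaitMs) {
      server.blockNewSessions();          // stop routing new rooms to this server
      const deadline = Date.now() + maxWaitMs;
      while (server.activeSessionCount() > 0 && Date.now() < deadline) {
        await sleep(5000);                // let ongoing sessions end naturally
      }
      server.killRemainingSessions();     // took too long? just kill the sessions
      await server.upgradeAndRestart();
    }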
Blast radius of crashes

This is what started me on this article to begin with.
I took the time to watch Werner Vogels’s keynote from AWS re:Invent which took place November 2018. In it, he explains what got AWS on the route to build their own databases instead of using Oracle, and why cloud has different requirements and characteristics.
Here’s what Werner Vogels said:
With blast radius we mean that if a failure happens, and remember: everything fails all the time. Whether this is hardware or networking or transformers or your code. Things fail. And what you want to achieve is that you minimize the impact of such a failure on your customers.
Basically, if something fails, the minimum set of customers should be affected, if that’s the case.
Everything fails all the time.
And we do want to minimize who’s affected by such failures.
The more media servers we have (because they are smaller), the fewer customers will be affected when one of these servers fails. Why? Because our blast radius will be smaller. Spread 1,000 sessions across 20 small servers and a single crash takes down 50 of them; put the same load on 2 big servers and a single crash takes down 500.
CPU utilization

Here’s something about most modern media servers you might not have known – they don’t eat up CPU. Well… they do, but less than they used to a decade ago.
In the past, media servers were focused on mixing media – the industry rallied around the MCU concept. This means that all video and audio content had to be decoded and re-encoded at least once. These days, it is a lot more common for vendors to use a routing model for media – in the form of SFUs. With it, media gets routed around but never decoded or encoded.
Media servers, Signaling, NAT traversal – do you know what it takes to install and manage your own WebRTC infrastructure? Check out this free video course on the untold story of the WebRTC servers backend.
Start your free course
In an SFU, network I/O and even memory get far more utilized than the CPU itself. When vendors go for bigger machines, they end up using less of the CPU of those machines, which translates into wasted resources (and you are paying for that waste).
At times, cloud vendors throttle network traffic, putting a limit on the number of packets you can send or receive from your cloud servers, which again limits how much you can push through your servers – causing you to go for bigger machines but finding it hard to get them fully utilized.
Smaller machines translate into better CPU utilization for your SFU in most cases.
Number of Cores/CPUs and Your SFU’s Architecture

Big or small, there’s another thing you’ll need to give thought to – the architecture of the media server itself.
Media servers contain two main components (at least for an SFU):
- Control/signaling
- Media routing
Sometimes they are coupled together; other times they are split between threads or even processes.
In general, there are 3 types of architectures that SFUs take:
- Have a single process handle both control and media; doing it in a multithreaded mode
- Have separate processes that can scale out, running each on its own machine or thread
- Decoupling control and media and having both of them scale out independently of each other
Me? I like the third alternative for large scale deployments. Especially when each process there is also running a single thread (I don’t really like multithreaded architectures and prefer shying away from them if possible).
That said, that third option isn’t always the solution I suggest to clients. It all depends on the use case and requirements.
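For illustration, a rough Node.js sketch of that third option – every name is hypothetical, and real SFUs each structure this differently:

    const { fork } = require('child_process');
    const os = require('os');

    // A single-threaded control process spawning single-threaded media
    // workers, typically one per core, each a separate OS process.
    const workers = Array.from({ length: os.cpus().length }, () =>
      fork('media-worker.js')); // hypothetical media routing process

    let next = 0;
    function assignRoom(roomId) {
      // naive round-robin; a real allocator would track per-worker load
      const worker = workers[next++ % workers.length];
      worker.send({ cmd: 'createRoom', roomId });
      return worker;
    }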
In any case, you do need to give some thought to this as well when you pick a machine size – in almost all cases, you’ll be using a multi-core, multi-threaded machine anyway, so better make the most of it.
How Do You Like Your SFU?

Back to you.
Media servers, Signaling, NAT traversal – do you know what it takes to install and manage your own WebRTC infrastructure? Check out this free video course on the untold story of the WebRTC servers backend.
Start your free course
A new design and what to expect in 2019 from BlogGeek.me?
The new look is here – and it is less… green.
I’m splitting this one into two main parts – the redesign and what’s going to happen in 2019.
BlogGeek.me – Redesigned

When I started this blog, what I didn’t want was yet another blue website. Somehow, it didn’t seem right to me. I ended up with a green one. So much so, that it stuck to almost everything else that I did online. As a kid, I really liked light blue – I don’t think green was anywhere in my sights.
Earlier this year, I wanted to refresh the look and the “brand” that is BlogGeek.me a bit. Luckily, the original designer just moved back from being a designer in an IoT startup to being a freelancer again, so I asked her for a new look. Which she happily and lovingly provided.
A few months later, with a lot of deliberation, hard work and updating ALL posts and pages (I had a lot of crap lying around due to custom shortcodes and plugins that accumulated in 6 years), I decided to take the plunge and update the main site with the new design.
What are the main differences?

There’s a lot… but here’s what you should know:
- I’ve reduced the number and frequency of nagging popups. From now on, the only thing that will jump at you might be what is called an exit intent – it will show relevant content you may want to review further, and only once you’re about to leave the page (no more searching for the x in the middle of reading an article)
- What is it that I do for a living? My site was designed and built as a blog. That last redesign I did was nice, but still left people wondering how I can actually help them. I tried fixing that with a new homepage and a simplified menu bar and footer area
- No course. I haven’t closed my WebRTC training – I just moved it to a website of its own: WebRTCcourse.com. This allows me to focus on the course and improve it in ways I just couldn’t do when it was part of BlogGeek.me
- Better reading experience. For now, I decided that article pages won’t have a sidebar, so you’ll get a distraction-free reading experience. The fonts are also bigger now (I am getting older, and with it my preference of font size seem to be changing)
Oh – and the pictures of me featuring on the website? They’re also new. Took them earlier in 2018.
Things are still broken

Not everything is working flawlessly. And there’s a reason for that. I knew that if I didn’t just ship the thing, it would never come to be. So I decided to just release it “as is” at this point. I wanted to have a fresh start in 2019 with my website.
Here are some things I know are broken:
- Mobile. Bad job there. This is known and will be taken care of through January
- Digital payments. The online store that I have/had was split into 2 – the one on BlogGeek.me which serves the reports and a separate one on WebRTCcourse.com which… needs to be fixed
Other than that, some pages are still ugly, and in other cases, there might be some dead or broken links.
If you find anything – just email me about it – I must have missed some of the ailments throughout this transition so I really appreciate your help here.
What to expect from BlogGeek.me in 2019?

Honestly, I don’t really know. At least not exactly.
Each year I start off with a plan, in which certain initiatives take place throughout the year. Some of them come to fruition while others – don’t.
Here’s what I decided for 2019:
Webinars

Last year was a rather slow year for webinars. Both on BlogGeek.me and on testRTC (where I am a co-founder and CEO).
This is going to change.
In 2019, I want, at least theoretically, to do a webinar a month for each. A lineup of topics has been created and is maintained (I’ll need more topics, but I have a good starting point).
For BlogGeek.me, webinars would be around topics that make sense for me at a given month. First one will be around Mesh/MCU/SFU – one of those topics that I can endlessly babble about.
testRTC webinars are going to focus on things that you can do with testRTC. Instead of trying to aim for generic WebRTC industry/testing/marketing/promoting/whatever non-focus, we’re going to double down on best practices, hacks and interesting things we’re bumping into with our customers at testRTC.
testRTC

Speaking of testRTC – we’ve had a good year in 2018, growing our list of customers and getting into new areas. We’ve rewritten a big portion of our backend and we will continue with the rewrite in 2019 to close our technical debt.
Expect some new features and a new product or two from testRTC to be announced during 2019.
Articles on BlogGeek.me

I am going to write this year on BlogGeek.me, as well as in other places when time permits.
For now, I plan to stick with an article per week, something that was hard to maintain this year, and I assume it will be harder in 2019.
WebRTC Training

My online WebRTC course got over 250 registered students. I want to scale it up even further.
This year, I’ll be giving the course additional focus, making sure it stays the best alternative out there for those who wish to learn WebRTC.
In February, there will be a few announcements about the course.
Reports update

The reports will get some refresh in 2019.
The WebRTC for Business People is up for a 2019 edition (later this month). I’d like to thank Frozen Mountain for sponsoring this initiative and making this edition free for everyone.
I might do an update to the Choosing a WebRTC API Platform report. There are enough changes taking place in the industry to merit such an update. If you are a CPaaS vendor who is now offering WebRTC support of some kind and you’re not featured in this report already – contact me.
The recent AI in RTC report I’ve written with Chad Hart doesn’t need an update. Yet.
Kranky Geek

Unlike previous years, Kranky Geek already has a date for 2019: November 15, San Francisco, Google office – same place as always.
If you’d like to talk about sponsorships, speaking opportunities and such – we’re happy to start this earlier than usual.
In any case, mark your calendar.
Other projects and initiatives

As in previous years, more projects will crop up during the year. There are a few I am contemplating already, but not sure yet if I’ll be doing them.
If there’s a project you’d like to do together – just tell me.
2019

Have a great new year!
All the Truth About the Latest (non)Hype of Fuzzy Testing WebRTC Applications
There’s a lot of fuzzing around lately about WebRTC. Which is really about SRTP. Which is really important. But also really misplaced.
Before I Begin

This all started when Google Project Zero, a team tasked with actively searching for zero day bugs (nasty crashes and similar bugs that might be exploited by hackers), set their sights on video conferencing and WebRTC. The end result of it all is a github repository with tools to test RTP streams (and some filed bugs).
A few things to put the house in order:
- These bugs are important. Go fix them
- I am not a security expert, but I know my way with security and have a few scars to show for it
- This isn’t the end of the world. A few bugs were found. Many of them old. This happens every day. Some are nastier than others
- These won’t be the last bugs in WebRTC and they won’t be the most serious that get found either. Just ask NewVoiceMedia about their recent audio issues
- We will all forget about this come 2019 and proceed with our normal daily lives
Now that we’ve cleared the air – let’s check what’s all that fuzz. Shall we?
What Fuzzing means

Wikipedia has this to say about Fuzzing:
Fuzzing or fuzz testing is an automated software testing technique that involves providing invalid, unexpected, or random data as inputs to a computer program. The program is then monitored for exceptions such as crashes, failing built-in code assertions, or potential memory leaks.
For me, fuzz testing is about the generation of malformed inputs in ways that the developers haven’t anticipated or tested for. This results in undefined behavior, which is largely a nicer way of saying a bug. In some cases, the bug will be an innocent one. In other cases, it can be nasty:
- It might cause the software to crash
- Go read or write where it shouldn’t (overflow)
- Deadlock the whole thing (=cause it to freeze)
- Cause a memory leak
The type of bugs that can be found is endless, which makes for really good FUD (fear, uncertainty, doubt) and lore.
A good malformed input can theoretically be used to grant you administrative access to a machine or to allow you to read memory where you shouldn’t have access to.
A simple explanation can be this: assume your software expects a user’s email to be 40 characters long. Shorter than that is obviously fine, but what will happen if you use an email that is longer than 40 characters? Somewhere along the line, there should be a piece of code that checks the length and states that it is too long. And if there isn’t… well… we’ve reached the realm of the undefined and potential security bugs.
The same can happen in network protocols, where whatever you send “on the wire” has a structure of sorts. The machines need structure to be able to parse the data and act upon it. So if you change the data so it is close to the expected structure, but off by just a bit – you might get to that realm of the undefined as well.
Fuzzing is trying to get to that place – adding randomness in just the correct places to get to undefined software behavior.
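In code, the core of the idea is tiny. A toy sketch – parseEmail is a hypothetical stand-in for whatever code is under test, and real fuzzers (AFL, libFuzzer) are coverage-guided and far smarter about picking “interesting” inputs:

    function randomInput() {
      const length = Math.floor(Math.random() * 10000); // wildly off-spec sizes
      let s = '';
      for (let i = 0; i < length; i++) {
        s += String.fromCharCode(Math.floor(Math.random() * 256));
      }
      return s;
    }

    for (let i = 0; i < 100000; i++) {
      try {
        parseEmail(randomInput()); // hypothetical code under test
      } catch (e) {
        console.log('potential bug found:', e.message); // crashes are the signal
      }
    }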
Let me tell you a bedtime story

My fuzzy life started in Finland, though I’ve never been there (yet).
One day at Oulu University, a new something called the “PROTOS Test Suite” was created. At the time, I was the project manager leading the development and maintenance of RADVISION’s H.323 protocol stack. We licensed it to many vendors around the globe, all using our source code to build VoIP products.
The PROTOS Test-Suite was all about security testing. The intent behind it was to find bugs that cause crashes and other ailments to those using H.323. And they chose the best possible entry point. Here’s how they phrased it:
The purpose of this test-suite is to evaluate implementation level security and robustness of H.225.0 implementations. H.225.0 is a protocol responsible for signalling and setting up H.323 calls. […]
The scope of the test-suite was narrowed to H.225.0 version 4 Setup-PDU. Rationale behind this selection was:
- Setup is the first message sent to a target H.323 endpoint upon call signalling, it is easy to deliver test-cases and to restore the implementation back to its initial state by disconnecting.
- […]
I marked in bold the important parts. Specifically, the guys at Oulu decided to go after the “pick up line” of H.323 and try to come up with nasty Setup messages that will confuse H.323 devices.
And confuse they did. PROTOS has 4497 Setup messages. On my first run with it, probably 50% of them caused our beloved H.323 stack to crash. I spent a week building the software to automate using it and fixing all the nastiness out of it. I admired the work they did and the work they made me do.
PROTOS practically analyzed how things go on the wire, and devised a set of messages that were bound to trip over the bad programming practices which we all err on as humans. This isn’t exactly fuzzing in an automated fashion, but it is the “manual” equivalent of it.
This got its own CERT vulnerability note and we had a great time working with our customers on updating our stack and getting these security fixes to work.
I believe some of our customers actually upgraded and updated their systems due to this. I am sure many didn’t. I am also assuming many of our customers’ customers didn’t upgrade their own deployed equipment. And the world continued on. Happily enough.
All this took place in 2004. Before WebRTC. Before the cloud. Before mobile. With practically the same RTP/RTCP protocol and the same techniques and mechanisms in VoIP that we use today in WebRTC.
Why didn’t people look at RTP vulnerabilities at that time? We’ll get to that.
Google’s Project Zero and video conferencing

This year, Google Project Zero decided to look at video conferencing. The “way in” was through WebRTC. Natalie Silvanovich was tasked with this and she wrote a series of 5 posts about it. The first one was about her selection and adventures with WebRTC itself. In it, she writes:
I started by looking at WebRTC signalling, because it is an attack surface that does not require any user interaction. […] WebRTC uses SDP for signalling.
I reviewed the WebRTC SDP parser code, but did not find any bugs. I also compiled it so it would accept an SDP file on the commandline and fuzzed it, but I did not find any bugs through fuzzing either. […]
I then decided to look at how RTP is processed in WebRTC. While RTP is not an interaction-less attack surface because the user usually has to answer the call before RTP traffic is processed, picking up a call is a reasonable action to expect a user to take. […]
Setting up end-to-end fuzzing was fairly time intensive […]
A few things that come to mind here:
- The “signaling” layer in WebRTC (=the SDP parser) is rather robust against these types of attacks. Natalie couldn’t find anything there
- Signaling and SDP, is the equivalent of what the guys at Oulu did with their PROTOS test suite
- There is a notion here of “call answering”. This isn’t what WebRTC does. It connects sessions. Sometimes directly and sometimes indirectly. And in all cases, there are layers above RTP that the users (and attackers) will need to go through first
- Setting up such a test, doing end-to-end fuzzing in the RTP layer is time intensive
Time intensive is important, as this raises the bar for those wishing to exploit such a weakness.
The fact that RTP isn’t the first attack surface and isn’t the first layer of interaction makes it somewhat less obvious how to exploit it (besides instigating DDoS attacks on devices and servers).
Coupling these two – the complexity and the non-obviousness of an exploit – is what kept people from putting the effort into it up until today.
The Fuzzy feelings of our WebRTC industry

The tweets of Ben Hawkes, Project Zero’s team lead, garnered 3-digit likes and retweets, tapering off for the last 2 posts (I attribute that to fatigue of the subject):
Project Zero blog: "Adventures in Video Conferencing Part 1: The Wild World of WebRTC" by @natashenka – https://t.co/pdtZLDDP9M
— Ben Hawkes (@benhawkes) December 4, 2018
That kind of sharing is an average day for most posts published by that team. A few immediately took the cue and started fuzzing on their own. A notable example is Philipp Hancke, who aimed at the Janus media server and fuzzed REMB RTCP messages.
His attack was quite successful due to several reasons:
- He had the source code of Janus and was able to isolate the area he wanted to attack. This made the process easier than the work done by Project Zero
- He picked an obvious target that was bound to crash multiple times – a message buried deep inside the protocol, aimed at control logic that takes place long after the session gets connected
Should you be fuzzing?

Probably not.
And let’s face it – in the list of tests that you want to do but don’t do today, fuzzing fits nicely near the end, among the things you just never find the time and priority to handle.
The good thing? For most of us, fuzzing is something that “others” should be doing.
If you are using a CPaaS vendor, it is his task to protect his signaling and media servers against such attacks.
If you run on top of the browser… well… those who maintain the WebRTC code for the browser need to do it (and it is Google for the most part at the moment).
You should think about fuzzing in your own application logic and the things that are under your control, but the WebRTC pieces? Going down the rabbit hole of fuzzing RTP and RTCP packets? Not for you.
Your role here is to ask the vendors you work with if they have taken steps in the area of security testing and what exactly have they done there. Fuzzing needs to be one of them things.
Who should care about fuzzing?

There’s a shortlist of people that need to deal with fuzzing.
- If you develop and deploy your own media servers and client side frameworks – you should fuzz them away
- The example above that Philipp Hancke did with Janus? It should be done on more such message types and protocol layers and it should be done for the other media servers
- A WebRTC implementation in Python added some fuzzing related fixes in version 0.9.14: “Fix RTP and RTCP parsing errors detected by fuzzing”
- That said, do we want them to do that or implement unified plan? What has a higher priority? For most of the industry, it would be unified plan…
- If you are using third parties, you need to make sure you update them frequently
- Using a WebRTC stack from a year or two ago isn’t something you should be doing
- Using open source media servers without upgrading them from time to time (and actively looking for security patches for them) is also not something you should be doing
- CPaaS vendors…
- This is one of those things they live for
- They deal with this headache so you don’t have to
- If they don’t – you should take your business elsewhere. Just saying
- Browser vendors. Enough said
Fuzzing isn’t the first thing that comes to mind when you set off to build your business.
We are at a point where we are dealing with and addressing fuzzing, and the RTP layer is where people seem to be doing it (at least a bit). We’ve come a long way since we started with WebRTC, and it is a good sign.
To Fuzz or not to Fuzz? Where should you spend your energies with WebRTC? If you need help with that, just contact me.
The post All the Truth About the Latest (non)Hype of Fuzzy Testing WebRTC Applications appeared first on BlogGeek.me.
Is Chrome on its Way to be ONLY Browser out there? (Microsoft throwing the towel on Edge)
Chrome=The web. Is that a good thing or a bad thing?
I’ve always said that Chrome is almost the only browser we need. Microsoft Edge was always an easy target to mock. And it now seems that Microsoft has thrown in the towel on Edge and its technology stack as a differentiating factor and has decided to *gasp* use Chromium as the engine powering whatever comes next.
A long explanation from Microsoft on the move was published on github (more on GitHub later).
What are Browsers made of?I’ll start with a quick explanation of how I see a browser’s architecture. It is going to be rather simplistic and probably somewhat far from the truth, but it will be good enough for us for now.
A browser is built out of two main pieces: the renderer and the runtime engine.
The Renderer deals with displaying HTML pages with their CSS styling. Today, it probably also deals with CSS animation. It is what takes your webpage and renders it into something that can be displayed on the screen.
The Runtime Engine is all about executing JavaScript code inside the browser. It is what makes modern browsers interactive. It is usually called a JavaScript Engine, but it already runs WebAssembly as well, hence my preference for referring to it as the Runtime.
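To see why “JavaScript Engine” undersells it, consider this small sketch: the very same engine (V8, in Chrome's case) executes both paths below. The byte array is a minimal WebAssembly module exporting an add(a, b) function – any valid module behaves the same way:

```typescript
// The runtime executes JavaScript and WebAssembly side by side.
// These bytes are a minimal wasm module exporting add(a, b) -> a + b.
const wasmBytes = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00,       // "\0asm" + version 1
  0x01, 0x07, 0x01, 0x60, 0x02, 0x7f, 0x7f, 0x01, 0x7f, // type: (i32, i32) -> i32
  0x03, 0x02, 0x01, 0x00,                               // one function of that type
  0x07, 0x07, 0x01, 0x03, 0x61, 0x64, 0x64, 0x00, 0x00, // export it as "add"
  0x0a, 0x09, 0x01, 0x07, 0x00, 0x20, 0x00, 0x20, 0x01, 0x6a, 0x0b, // body: a + b
]);

WebAssembly.instantiate(wasmBytes).then(({ instance }) => {
  const add = instance.exports.add as (a: number, b: number) => number;
  console.log(add(2, 3));                                // 5 – the WebAssembly path
  console.log(((a: number, b: number) => a + b)(2, 3));  // 5 – the JavaScript path
});
```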
On top of these two pieces sits the browser engine itself, which is then wrapped by the browser.
Who Uses What?That illustration of the browser makeup above? It shows in gray the components that Google uses in Chrome. Each browser vendor picks and chooses its own components.
In the past, we effectively had 3 browser engines: “Firefox”, “Internet Explorer” and “WebKit”
WebKit was used by both Safari and Chrome. That was until 2013, when Google decided to part ways and create Blink – it started by deleting everything it didn’t use out of WebKit and continued from there. In a way, it is a fork of WebKit, to the point that code integrated into WebKit oftentimes comes directly by porting it en masse from Blink/Chromium (this is how WebRTC is implemented in Safari/WebKit today).
Up until a year ago, we had 4 roughly independent browser engines for the major 4 browsers:
- Chrome, using Chromium, Blink and V8
- Firefox, using its own tech stack; with Gecko as the renderer, being gradually replaced by Servo
- Safari, using WebKit and Nitro
- Edge had its own stuff – EdgeHTML and Chakra; now migrating to Chromium tech (and maybe a rebranded name instead of Edge?)
Internet Explorer is all but dead.
Edge never got meaningful market share and is now moving to embrace Chromium.
Apple’s Safari… I am not sure how much Apple cares about Safari, and besides, WebKit gets its fair share of code from Google’s Blink project. On top of it all, it runs only on Apple devices, limiting its popularity and use.
In a way, we’re down to two main browser stacks: Google’s and Mozilla’s
Mozilla wrote about the end of the line for EdgeHTML and they are spot on:
If one product like Chromium has enough market share, then it becomes easier for web developers and businesses to decide not to worry if their services and sites work with anything other than Chromium. That’s what happened when Microsoft had a monopoly on browsers in the early 2000s before Firefox was released. And it could happen again.
I’ve tried Firefox and Edge a year or two ago. They worked well enough. But somehow they weren’t Chrome (possibly because I am a heavy user of Google services), so it just made no sense to stick with any of them when Chrome feels too much like “home”.
Does the current state of affairs lift Chromium to the status of Linux? More on that a bit later in this article.
Chrome’s DominanceI’ve taken a snapshot of StatCounter’s desktop browsers market share:
If you are more interested in the numbers than that boring visual line, then here you go:
Chrome with over 72%; IE and Safari at 5% each; Edge at 4%.
Firefox is at a single-digit 9%.
Funnily enough, all non-Chrome browsers are trending downwards. Even Safari, which should enjoy growth due to the increase in Mac machines out there (for some unknown reason they are popular with developers these days – go figure).
Even if you ignore the desktop and check mobile only (see here), Chrome gets some 53% versus Safari’s 22%.
Investing in browser development isn’t a simple task. There are several vectors that need to be pursued at all times:
- Adherence to the HTML5 specification(s), adding new components to it along the way (PWA, WebGL, WebVR, WebAssembly, Web Workers to name a few)
- Deal with backward compatibility of the billions of web pages that are out there as much as possible
- Handle security aspects
- Deal with performance and bloat
- Support hardware acceleration for optimized performance where possible, a trend that is becoming common
It would be safe to say that Chrome enjoys hundreds of Google employees developing code that goes directly into the Chrome browser.
Where will Microsoft take Edge?Microsoft under the lead of CEO Satya Nadella has shifted towards the cloud and is doubling down on the enterprise. To a big extent, its Xbox business is an anomaly in the Microsoft of 2018.
Where once Microsoft was all about Windows and the Office suite, it has shifted towards Office 365 (subscription versus licensing business model for Office) and its Azure cloud. Windows is still there, but its importance and market dominance are a far cry from where it was a decade ago. Microsoft knows that and is making the necessary changes – not to win back the operating system market, but rather to grow its businesses on other core competencies and assets.
Microsoft Edge was an attempt to shed Internet Explorer: give its browser a complete rewrite and bring something users would enjoy using. That hasn’t turned out well. After all the investment in Edge, it had a small market share to show for it, with many of the users switching to Windows 10 opting to install Chrome instead of Edge.
This user behavior is surprising, to say the least. With a default browser that is good enough (Edge), why would users make the conscious decision of browsing to chrome.com to download and install a different browser that does what Edge does?
Microsoft tried and failed to change this user behavior, which led it to the conclusion that Edge, or at least the innards of Edge are a waste of resources.
Why does opting for Chromium as a browser engine make sense for Microsoft?
As Microsoft shifted to the cloud and Edge focused on web standards, the end result was that anything and everything Microsoft invested in for its web based services (Office 365 for example) had to work first and foremost on Chrome – that’s where the users are anyway.
Google is using Chrome to drive proprietary initiatives that optimize its services for users, pushing them as standards later (think SPDY turning into HTTP/2, QUIC, or its latest Project Stream). It can do that due to its market dominance in browsers and the huge amount of web assets it operates. Microsoft never had that with Edge, so any proprietary initiative on Microsoft’s part in web technologies was bound to fail.
Microsoft derived no value from maintaining its own browser technology stack, and investing hundreds of developers in it was an expensive and useless endeavor.
So it went with Chromium.
Chromium brings one more benefit – theoretically, Microsoft can now push its browser to non-Windows 10 devices. Mac and Linux included. And since Microsoft is interested more in Office and Azure than it is in Windows, having an optimized “window” towards Office and Azure in the form of a Chromium-based Microsoft browser that works everywhere made sense.
This also shows where Microsoft does want to focus its efforts in the browser – the user interface and experience, as well as delivering the Microsoft services to customers.
Microsoft cannot forgo having its own browser and just pre-install Chrome or even Firefox on its Windows operating system. That would mean ceding too much control to others. It has to have its own browser.
Windows ChromiumizedRemember that browser architecture I shared in the beginning? It is changing in one critical way. Google decided to create an “operating system” and call it Chrome OS, which ends up being based to some extent on the browser itself:
We spend more time in front of web applications that reside in the browser (or in Electron apps) and less inside native apps. This means that in many ways, the browser is the operating system.
Google derives all of its value from the internet, with the browser being the window there.
Microsoft is heading in the same direction, and where it matters for its operating system, it now finds itself competing against Chrome OS and Chromebooks – a huge threat to Microsoft and Office.
And obviously, there’s a “lite” version of Windows in the works, at least according to the reports on Petri. Is this related to Edge using Chromium in some way? Would Windows Lite be web focused in the same way that Chrome OS is?
Who Controls Chromium? And is it the new Linux?Back to Chromium, and the reasons that the Microsoft news is making ripples in the web around openness and positive fragmentation.
Browsers are becoming operating systems in many ways. Can we correlate between Linux and its ecosystem to Chromium and its growing ecosystem?
Linux and OwnershipI’d say that these are two distinctly different cases. If anything, Chromium’s status should worry many out there. It is less about monocultures, openness and lofty ideals, and more about control and competitive advantage.
On opensource.com, Greg Kroah-Hartman wrote a piece two years ago titled 9 lessons from 25 years of Linux kernel development. Here’s lesson 6:
6. Corporate participation in the process is crucial, but no single company dominates kernel development.
Some 5,062 individual developers representing nearly 500 corporations have contributed to the Linux kernel since the 3.18 release in December of 2014. The majority of developers are paid for their work—and the changes they make serve the companies they work for. But, although any company can improve the kernel for its specific needs, no company can drive development in directions that hurt the others or restrict what the kernel can do.
This is important.
Who really controls Linux? Who owns it? Who decides what comes next? The fact that there are no clear answers to these questions is what makes Linux so powerful and so useful to the industry as a whole.
Chromium and GoogleDoes the same apply to Chromium?
Chromium is a Google owned project. Hosted on a Google domain. Managed using Google tooling. Maintained by Google. This includes all the main browser pieces that are created, controlled and owned by Google to a large extent: the V8 JavaScript Engine, Blink web renderer and Chromium itself.
When someone wants to contribute to Chromium, they need to go through a rigorous process. One that takes place at Google’s discretion and based on Google’s priorities. This is understandable. Chromium is what Chrome is made of, and Chrome gets released to a billion users every 6-8 weeks. Breakage there ends in backlash. Security holes there mean vulnerability at a large scale.
While these aspects of stability and security are there with Linux as well, when it comes to Chromium, Google is the one that is setting the priorities.
It doesn’t end with priorities. It goes to the types of web experiments and proprietary features that end up in Chrome. Since Google controls and owns the Chromium stack… it can do as it pleases.
Will Google cede control of Chromium just because?
No.
It might benefit the open-whatever if it did, but it would also slow down innovation and wouldn’t further Google’s own cause.
Microsoft and ChromiumMicrosoft is painting this in colors of open source and collaboration with the industry.
It isn’t.
This is about Microsoft going with Chromium because Edge took a few bad turns in its strategy from the get go:
- Limiting Edge to Windows 10 only
- Internet Explorer was always a Windows play, ignoring its stint on Mac
- Microsoft today is in a very different place – access to its services across all devices is what is driving it
- This requires its browser to run everywhere and not be limited to Windows 10
- Making Edge all about performance and security
- When Chrome was released, its leading pitch was exactly that. A secure browser with high performance
- As it grew in adoption, all browsers focused more resources towards that goal, and today, it is a moot point
- While Chrome is definitely a memory and resource hog, there’s no big backlash due to it
- Trying to take that same strategy as a differentiating point failed
- Not differentiating Edge through Microsoft’s assets
- There’s a challenge in this one. Take Office 365. If you make it run better on Edge and purposefully harm it on Chrome, you lose on (1) – you limit it on non-Windows devices
- Microsoft should have invested in a world where the user’s profile and preferences are stored in the cloud. Google and Apple devices “just work” when you plug them in with your credentials. Microsoft’s don’t, really
- Having a user’s profile in the cloud, easily accessible via Edge would strengthen the tie between people using Office and Azure to an Edge browser, keeping them away from Chrome
Going with Chromium means two things to Microsoft:
- Working on making Chromium (and by extension the new Edge) work perfectly on Windows devices (not only Windows 10, but also Windows 7, HoloLens and whatever comes next in the Internet of Things). This is an optimization effort, simply shifting it from what was Edge towards Chromium
- Doubling down on the differentiation of Edge based on a single browser engine, which is where it should have focused in the first place anyway
The only challenge here is that it comes to Chromium as just another vendor. Not a partner or an owner.
A Single WebRTC StackAt the recent Kranky Geek event, Microsoft discussed its WebRTC on UWP project. Part of it was about merging changes it made to the WebRTC code from webrtc.org (=the code that goes into Chrome). Here’s how James Cadd framed it in his session:
… after 4 years of maintaining a fork on github, we’ve been discussing with Google the possibility of submitting this back to the webrtc.org repo and we’re working on that now. The caveat is that there’s no guarantee that we’ll get 100% of the way there. We’re mostly using the public submission process, so we’re going through reviews just like everyone does, but that’s our goal.
The UWP specific changes are going to live in sdk-contrib-windows so we will have our own little area to contribute this back. Microsoft has committer rights there, so we’ll be able to keep everything moving there. […]
So just wanted to say thank you to Google for that opportunity. We’re looking forward for the collaboration.
A master and a slave? A landlord and a tenant? A patron and a client? Two partners? I am not sure what the exact relation here is, but it should be similar to what Microsoft has probably struck with Google across the board for all Chromium related technologies that are dear to Microsoft in one way or another.
Is a single stack good or bad?
If we look at it from a browser level perspective, we aren’t in a different position in terms of technology diversity than we were 8 years ago:
And here’s where we are today:
The main difference is market share – Chrome is eating up the internet with Blink and Chromium. Factor in Node.js which uses V8 JavaScript engine and you get the same tech running servers as well.
WebRTC specifically though? It now runs on webrtc.org code only. All browser vendors pick bits and pieces from it for their own implementations, and while there are differences between browsers, they aren’t many.
As I said before in many of my articles here – most developers today can simply develop their code for Chrome and be done with it; adding support for more browsers only if they really really really need to.
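In practice, “developing for Chrome” usually means writing to Chrome's behavior and letting a shim such as the open source webrtc-adapter library paper over the remaining differences. A minimal sketch – the warning branch is my own illustration, not something the adapter project prescribes:

```typescript
import adapter from 'webrtc-adapter'; // shims prefix/behavior differences on import

const pc = new RTCPeerConnection({
  iceServers: [{ urls: 'stun:stun.l.google.com:19302' }],
});

// Develop and test against Chrome; only branch per-browser when unavoidable.
if (adapter.browserDetails.browser !== 'chrome') {
  console.warn(
    `Running on ${adapter.browserDetails.browser} ${adapter.browserDetails.version} – ` +
    'behavior may differ, test before shipping'
  );
}
```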
Browsers are one piece of getting WebRTC to run. Check out what else you’ll need in this free video series unraveling the server side story of WebRTC:
Register to the video series
Could Microsoft Buy Their way into Browser Market Share?Not really. If they could have, they would have done so instead of going Chromium.
Let’s start from why such a move would be appealing.
GitHubThe recent acquisition of GitHub by Microsoft can be taken as a case in point, especially considering the varied reactions it brought across the board.
6 months after that announcement, the sky hasn’t fallen. Open source hasn’t been threatened or gobbled up by Microsoft. And Microsoft is even using GitHub for its own projects, and to announce its own initiatives – Edge using Chromium, for example.
Time will tell, but my gut tells me that Microsoft’s acquisition of GitHub is as meaningful as Facebook’s acquisition of Whatsapp and Instagram. These made little sense at the time from a valuation standpoint, but no one is doubting these acquisitions today.
With GitHub, Microsoft is buying its way into open source. Not only as lip service, but also in understanding how open source works. By owning a large portion of the open source interactions, and being able to analyze them closely, Microsoft can tell where developers are headed and what they are after. Microsoft was always successful due to the developers using their platform (top notch tools for developers – always). GitHub allows them to continue with that in an open source world.
Then why not the browser market?
There were two assets that could be acquired here – Mozilla and Electron.
ElectronElectron is already developed and maintained by GitHub directly. Microsoft owns it already.
What advantages does Microsoft derive from Electron? None, assuming you remember that Electron runs on top of Chromium.
From a strategic standpoint, there’s no value in Electron for Microsoft. At the end of the day, Electron is a window to Chromium and to web applications.
Microsoft is using it for its own cross platform applications – Skype on Linux has been known to use Electron for several years now.
Owning Electron through GitHub doesn’t help Microsoft in its browser market share.
MozillaMozilla would have been an interesting acquisition.
As with GitHub, it would mean acquiring the obvious open source vendor. The challenge here is twofold:
- Mozilla wouldn’t want to be acquired and would rather stay independent, as this is its stance and current market position. That may change, but internal resistance from Mozilla employees would likely be big
- Firefox market share is now a single digit and the trend isn’t a positive one
Furthermore, acquiring Firefox as a window to Microsoft’s services and assets in the cloud is exactly one of those things that Mozilla is fighting Google over. It would be counterproductive to go there.
—
Microsoft has no one to buy in order to improve its position and market share in browsers.
It could only continue to fight it out with Edge or partner. And it decided to partner with the goliath in the room (an elephant wouldn’t be visible enough).
Will Chrome Reign Supreme?Yes.
Anyone think otherwise?
The post Is Chrome on its Way to be ONLY Browser out there? (Microsoft throwing the towel on Edge) appeared first on BlogGeek.me.
What Does Machine Learning Have to do with MOS Scores?
Human subjectivity in MOS calculations doesn’t hold water when it comes to heterogeneous environments. That’s where machine learning comes into play.
MOS score. That Mean Opinion Score. You get a voice call. You want to know its quality. So you use MOS. It gives you a number between 1 and 5. 1 being bad. 5 being great. If you get 3 or above – be happy and move on, they say. If you get 4.something – you’re a god. If you don’t agree with my classification of the numbers, then read on – there’s probably a good reason why we don’t agree.
Anyways, if you go down the rabbit hole of how MOS gets calculated, you’ll find out that there isn’t a single way of doing that. You can go now and define your own MOS scoring algorithm if you want, based on tests you’ll conduct. From that same Wikipedia link about MOS:
“a MOS value should only be reported if the context in which the values have been collected in is known and reported as well”
Phrased differently – MOS is highly subjective, and you can’t really compare MOS scores produced on one device to MOS scores produced on another.
This is why I really truly hate delving into these globally-accepted-but-somewhat-useless quality metrics (and why we ended up with a slightly different scoring system in testRTC for our monitoring and testing services).
What Goes into MOS Scoring Calculations?Easy. Everything.
Or at least everything you have access to:
- RTCP sender and receiver reports
- Received RTP packets
- Knowing the voice codec used
- Actually decoding the audio stream and “listening” to it
- Understanding what the end user is really going to hear
Here are a few examples:
Physical desk phoneA physical IP phone has access to EVERYTHING. All the software and all the hardware.
It even knows how the headset works and what quality it offers.
Theoretically then, it can provide an accurate MOS that factors in everything there is.
Android native appAndroid apps have access to all the software. Almost. Mostly.
The low-level device drivers are known, as is the hardware the app is running on. The only problem is the number of potential devices. A few years back, these types of visualizations of the Android fragmentation were in fashion:
This one’s from OpenSignal. Different devices have different locations for their mics and speakers. They use different device drivers. They have different “flavors” of the Android OS. They act differently and offer slightly different voice quality as well.
What does measuring what an objective person thinks about the quality of a played audio stream mean in such a case? Do we need to test this objectivity per device?
Media server that routes voice aroundThen we have the media server. It sends and receives voice. It might not even decode the audio (it could, and sometimes it does).
How does it measure MOS? What would it decide is good audio versus bad audio? It has access to all packets… so it can still be rather accurate. Maybe.
WebRTC inside a browserAnd we have WebRTC. Can’t write an article without mentioning WebRTC.
Here though, it is quite the challenge.
How would a browser measure the MOS of its audio? It can probably do as good a job as an Android device. But for some reason, MOS scoring isn’t part of the WebRTC bundle. At least not today.
So how would a JavaScript web application calculate the MOS of the incoming audio? By using getStats? That has access to an abstraction on top of the RTCP sender and receiver reports. It correlates to these to some extent. But that’s about all it has at its disposal for such calculations, which doesn’t amount to much.
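For illustration only, here's roughly what such a getStats-based calculation ends up looking like. The mapping from round trip time, jitter and packet loss to an R-factor and then to a MOS value is a simplified version of the ITU-T E-model that floats around the industry – treat the constants as assumptions, not gospel:

```typescript
// A back-of-the-napkin MOS estimate from getStats() – a sketch, not a spec.
async function estimateAudioMos(pc: RTCPeerConnection): Promise<number> {
  const stats = await pc.getStats();
  let rttMs = 0, jitterMs = 0, lost = 0, received = 0;

  stats.forEach((report) => {
    if (report.type === 'remote-inbound-rtp' && report.kind === 'audio') {
      rttMs = (report.roundTripTime ?? 0) * 1000;       // seconds -> ms
    } else if (report.type === 'inbound-rtp' && report.kind === 'audio') {
      jitterMs = (report.jitter ?? 0) * 1000;           // seconds -> ms
      lost = report.packetsLost ?? 0;
      received = report.packetsReceived ?? 0;
    }
  });

  const lossPct = (100 * lost) / Math.max(lost + received, 1);
  const effectiveLatency = rttMs / 2 + jitterMs * 2 + 10; // crude one-way estimate

  // Simplified E-model: start from R = 93.2 and subtract impairments.
  let r = effectiveLatency < 160
    ? 93.2 - effectiveLatency / 40
    : 93.2 - (effectiveLatency - 120) / 10;
  r -= lossPct * 2.5;

  // R-factor to MOS conversion.
  const mos = 1 + 0.035 * r + 0.000007 * r * (r - 60) * (100 - r);
  return Math.min(Math.max(mos, 1), 4.5);
}
```

Run this periodically during a call and you get a trend line. As the rest of this article argues, it tells you more about what the network did than about what the user actually heard.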
Back to MOS calculationsBut what does MOS really calculate?
The quality of the voice I hear in a session?
Maybe the quality of voice the network is capable of supporting?
Or is it the quality of the software stack I use?
What about the issue with voice quality when the person I am speaking with is just standing in a crowded room? Would that affect MOS? Does the actual original content need to be factored into MOS scores to begin with?
I’ll leave these questions open, but say that in my opinion, whatever quality measurement you look at should offer some information about the things that are in your power to change – at least as a developer or product owner. Otherwise, what can you do with that information?
What Affects Audio Quality in Communications?Everything.
- The quality of the microphone used to record the original audio (though this usually gets neglected in discussions around MOS)
- The location of the person speaking – a crowded room, airport, next to a working vacuum cleaner – or in a silent recording studio
- The voice codec used, its configuration and the level and aggressiveness of the compression it is using for this session
- The network conditions – in the last mile from both the sender and the receiver, of every hop along the way and the routers and servers it has to pass through
- The media servers – and every possible aspect about them
- The receiver’s software. Especially the jitter buffer and packet loss concealment algorithms
- The sender’s acoustic echo cancellation implementation quality
- The receiver’s voice decoder implementation
- The receiver’s speakers
I am sure I missed a bullet or two. Feel free to add them in the comments.
The thing is, there are a lot of things that end up affecting audio quality once you make the decision of sending it through a network.
Is Machine Learning Killing MOS Scoring or Saving It?So what do we have so far?
A scoring system – MOS – which is subjective and inaccurate. It is also widely used and accepted as THE quality measure of voice calls. Most of the time, it looks at network traffic to decide on the quality level.
At Kranky Geek 2018, one of the interesting sessions for me was the one given by Curtis Peterson of RingCentral:
He discussed that problem of having different MOS scores for the SAME call in each device the call passes through in the network. The solution was to use machine learning to normalize MOS scoring across the network.
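To be clear on what “normalizing” means here – and this is my own toy illustration, not RingCentral’s actual model – think of learning, per device type, a mapping from that device’s raw MOS to some reference score, based on calls where both were measured. A real system would use many more features (codec, network type, firmware) and a model fancier than a straight line:

```typescript
type Sample = { raw: number; reference: number };

// Ordinary least squares fit of reference ≈ a * raw + b for one device model.
function fitLinear(samples: Sample[]): { a: number; b: number } {
  const n = samples.length;
  const sx = samples.reduce((s, p) => s + p.raw, 0);
  const sy = samples.reduce((s, p) => s + p.reference, 0);
  const sxx = samples.reduce((s, p) => s + p.raw * p.raw, 0);
  const sxy = samples.reduce((s, p) => s + p.raw * p.reference, 0);
  const a = (n * sxy - sx * sy) / (n * sxx - sx * sx);
  return { a, b: (sy - a * sx) / n };
}

// Hypothetical calibration data: the same calls scored by one device model
// and by whatever is picked as the network's reference scorer.
const deskPhoneFit = fitLinear([
  { raw: 3.1, reference: 3.8 },
  { raw: 4.0, reference: 4.3 },
  { raw: 2.2, reference: 2.9 },
]);

// Normalize a new raw score coming from that device model.
const normalized = deskPhoneFit.a * 3.5 + deskPhoneFit.b;
console.log(normalized.toFixed(2));
```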
This got me thinking further.
Let’s say one of these devices provides machine learning based noise suppression. It is SO good, that it is even employed on the incoming stream, as opposed to placing it traditionally on the outgoing stream. This means that after passing through the network, and getting scored for MOS by some entity along the way, the device magically “improves” the audio simply by reducing the noise.
Does that help or hurt MOS scoring? Or at least the ability to provide something that can be easily normalized or referenced.
Machine Learning and Media OptimizationWe’ve had multiple vendors at Kranky Geek touching the domain of media optimization. This year, their focus was mainly on video – both Agora.io and Houseparty gave eye opening presentations on using machine learning to improve the quality of a received video stream, each taking a different approach to tackling the problem.
While researching for the AI in RTC report, we’ve seen other types of optimizations being employed. The idea is always to “silently” improve the quality of the call, offering a better experience to the users.
In the next couple of years, we will see this area growing fast, with proprietary algorithms and techniques based on machine learning being added to the arms race of the various communication vendors.
Interested in more of these sessions around real time communications and how companies solve problems with it today?
Subscribe to our YouTube channel
The post What Does Machine Learning Have to do with MOS Scores? appeared first on BlogGeek.me.
HELLO 2. Is Hardware Gear Finally Taking WebRTC Seriously?
It is about time for video room systems to adopt WebRTC native approaches.
When I first started this blog, I had no clue where it was going to take me. I wanted it to be about developers. To be interesting. I also decided early on to write three posts about WebRTC:
- What is WebRTC
- How WebRTC is going to affect signaling
- What a room system needs to look like in a WebRTC world
Somehow, I ended up covering a lot more ground since then when it comes to WebRTC…
Signaling came a long way since then. Most of you might not even know what H.323 is. SIP is still important, but a lot less these days. Proprietary signaling mechanisms are thriving – and that’s a good thing.
The thing that never did come to play was WebRTC in video room systems. When you went to purchase a room system, you were tethered to the vendor providing you that system, along with the signaling standards it supported. It is still painfully hard to connect room systems of different vendors. And if you factor in the need to integrate it with other services the enterprise uses, it becomes even worse.
What’s a Video Room System Anyway?This is called a codec for some arcane reason.
A video room system is a device split into 4 parts in most cases:
- High end camera
- Speaker pod
- Remote control
- The brains (that’s the “codec”)
The TV display itself is almost never included in the package (unless you’re starting to look at the new touch boards).
Speaker pods are sometimes integrated into the camera itself. This is suitable for smaller meeting rooms, also known as huddle rooms.
Remote controls were always nasty. A meeting room will have at least 3 of those: one for the TV, one for the projector in the room and one for the video room system. The one for the video room system is somehow the most complex to use. The projector one is gone along with the projector, now that we all just use the TV(s) instead.
In many cases, an external touch panel will be used to control the gizmos in the room, including lighting and other moving parts. And today, in many cases, these room systems are capable of tethering themselves to apps on smartphones for the control, killing the need for the remote control altogether.
The brains? They are sometimes just wrapped into the same box as the camera, just to save on cabling and space.
It started off as an all-custom solution. The hardware, the software – it was all proprietary and specific. DSPs made up the “brains”. High end cameras were purchased and branded from Sony. The software was written on embedded operating systems like VxWorks (anyone remember that painful thing?)
We’ve standardized some of it as time went by. Cameras have become somewhat of a commodity, now that we’re all carrying powerful ones in our pockets. Operating systems for these devices have moved on to be Linux based. DSPs are less common now that we can just use an SoC (system on chip, packing the host CPU and the DSPs nicely together) or just rely on Intel chips.
What never happened is the standardization and commoditization of the software in the brains – the actual video software running the room system.
Let’s Talk UCaaSThat may finally be changing. As we head to the cloud, UCaaS (unified communications as a service) vendors are beefing up their offerings. Adding contact centers, APIs, video support and other trinkets to their war chest.
In the past few months, we’ve seen:
- Vonage acquiring TokBox, gaining ownership of a video technology
- 8×8 just acquired Jitsi from Atlassian
- RingCentral must be doing something along these lines as well – and if not, it is high time they start thinking about it. Using Zoom as a partner makes no sense anymore, especially considering Zoom’s entrance into the voice only space
Each of these vendors is today using a third party for its video calling services, but can now potentially displace it with its own technology stack.
While that solves their video software issues, how are they going to handle video room systems?
Let’s see what the other notable players have done in that domain:
- Microsoft, which has Teams and Skype, has been partnering with hardware vendors for years, getting these vendors to build their stack to the Microsoft spec in order to integrate with it and become official partners
- Cisco has its own hardware products, giving it the full spectrum of the solution
- Google has its Chromebox
Vonage, 8×8 and RingCentral aren’t hardware vendors. They aren’t going to start designing and manufacturing video room systems. When it comes to physical phones, they partner with multiple device manufacturers. This is hard work when it comes to integration, to adding more devices into the fold and to introducing new features. The video room system types of devices are limited today. Polycom offers partner-friendly solutions. Logitech sells components/peripherals (mainly the cameras). Lifesize has its own cloud service. And again, integrating these video room systems with other features and capabilities is sometimes close to impossible.
On the other end of the spectrum, there’s the customer. Banking on one UCaaS supplier is fine, but if you invest in hardware devices, will they be usable when switching to another vendor? What if you want more than a single service to run on a room system? Let’s say you want to record and transcribe physical meetings taking place in a room – when not on a call. Does the UCaaS vendor or the video room system vendor need to add such a capability? Can you add it on your own by partnering with a totally different vendor while still using the same hardware?
Now, here’s the thing:
- TokBox uses proprietary signaling
- Jitsi uses proprietary signaling
- Microsoft’s own use of the SIP standard is notoriously non-standard to some extent
- Cisco puts its own “secret sauce” in all of its devices
- And Google uses Meet, which runs… proprietary signaling
How can you partner with video room system vendors (if there even are any) in a way that is relatively easy?
You Redefine What a Room System isThe one thing that is now changing is the software that is built into a video room system.
That is done by first changing the operating system. Instead of Linux – Android.
And Android means we can start thinking of a video room system as a device that can run multiple different applications by different vendors for different tasks.
Need to run Zoom? Why not?
Wanna switch to GoToMeeting? Fine.
How about attending a WebEx call? Sure.
Just install any of these apps – or better yet – try joining them from an integrated Chrome browser if they happen to support WebRTC.
But what if you want to show internal news for your company on that display connected to the video meeting room? Or give the ability to record and transcribe local meetings? Or connect to other internal or external services with ease? Not a problem. Just install that app on Android and you’re ready to go.
The difference here is that there is no integration work required from the video room system vendor. This is something the UCaaS vendor can do – or god forbid – the actual enterprise who is using the video room system.
I’ve been waiting for this level of commoditization and flexibility to take place.
Enter HELLO 2One of the vendors in this space is Solaborate. I interviewed Labinot years ago on this blog. That was about his enterprise social network service. Since then, he’s added a hardware device called HELLO, which successfully launched on Kickstarter; and he is now running a Kickstarter campaign for HELLO 2.
The HELLO 2 is an “all in one” video room system capable of what I was hoping to see happen:
- The brains are built into the camera
- It is based on a Qualcomm chipset, giving it most of what a high end phone can do (which is… a lot)
- It has a 4K camera with zoom capabilities
- Built-in mic array
- And … AI capabilities (why not?)
The best part though? It runs on Android, so you can either use the HELLO 2 / Solaborate applications or any other application you fancy (that said, applications may not be as polished on the big screen as they are on a phone or a tablet, and that requires a bit of reworking on their end).
This gives some real flexibility:
- UCaaS vendors can now offer a hardware video room system running their own software applications, not needing to rely on the vendor doing the work and the integration. This gives full brandability along with the ability to integrate intimately with all of UCaaS vendor’s services and capabilities
- End customers can install and add the other services and apps that they use within their enterprise, without needing to beg the UCaaS vendor to support and integrate with them
One more thing – you can run Chrome directly on the HELLO 2, and it will successfully operate any WebRTC based web page with it.
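That guest access path is simple enough to sketch. A WebRTC-capable page only needs to verify the basics before offering a join button – the snippet below is a hypothetical example, with the 4K constraint there just to match the HELLO 2's camera spec (the browser will negotiate down to whatever the camera actually delivers):

```typescript
// Quick capability check before offering WebRTC guest access.
function webrtcSupported(): boolean {
  return typeof RTCPeerConnection === 'function' &&
         !!navigator.mediaDevices?.getUserMedia;
}

async function joinAsGuest(videoElement: HTMLVideoElement) {
  if (!webrtcSupported()) {
    throw new Error('No WebRTC support – open this page in Chrome');
  }
  // Ask for 4K; the browser falls back if the camera offers less.
  const stream = await navigator.mediaDevices.getUserMedia({
    audio: true,
    video: { width: { ideal: 3840 }, height: { ideal: 2160 } },
  });
  videoElement.srcObject = stream;
  await videoElement.play();
}
```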
The FutureThis is the model of the future when it comes to video room systems. Generic types of devices, packing all the needed hardware, letting other vendors and customers handle the software components.
And today, there’s no easier way to do that than using Android as the baseline operating system. Having a Chrome browser inside the device is just an added bonus to let you join with guest access to those pesky calls your suppliers and customers schedule on their own services.
The post HELLO 2. Is Hardware Gear Finally Taking WebRTC Seriously? appeared first on BlogGeek.me.
Kranky Geek 2018. A post event post
For me, Kranky Geek 2018 was a tremendously fun experience.
We had our fourth Kranky Geek event in San Francisco last week. As usual, it is a nerve-wracking experience up until the point it ends. And it doesn’t start on the day of the event itself – we’ve been busy with content curation, handling presentation drafts and doing dry runs for a few weeks.
The result is quite satisfying. We’ve decided this time to dig even deeper into the domain of artificial intelligence and machine learning and its role in real time communications. As I’ve been saying, WebRTC is ready – so what would be the point of doing an event about WebRTC? We have a lot of WebRTC topics already covered from our past events – and they are all available in the Kranky Geek YouTube channel.
The way we see it, there are 4 domains we had to cover: speech analytics, voicebots, computer vision and RTC optimization.
So we went hunting for speakers for the event. In the end, we were able to cover all four domains and squeeze in a few WebRTC specific topics as well.
The SessionsThis year, we had the biggest number of sessions yet. Over the years, the event has grown from a shorter one into a full day event. The people I talked to noted that the day was long and tiring, but somehow, almost everyone stayed to the end. Here’s what we had this year:
Our own welcomeKranky Geek SF 2018: AI in RTC from Tsahi Levent-levi
One thing to note here – our AI in RTC report got a promotional discount of ~33%, which will be available until the end of the month. If this space interests you, then definitely check it out.
DiscordDiscord operates a large chat operation for gamers. Part of that service includes voice and video calling. At peak, they handle 2.8 million concurrent voice connections to their service.
What they shared were the changes they have made to the vanilla WebRTC code base in order to fit their needs.
FacebookFacebook were kind enough to give a presentation about Facebook Portal – their new home device that is capable of handling video calls (using WebRTC of course). The device uses machine learning to track the people in the room during a call. They talked about the challenges that come with automating the camera’s zoom and with connecting calls from Portal devices to mobile phones.
This was the first time they shared that information publicly at a conference.
IntelIntel announced the open sourcing of their media server – the Intel Collaboration Suite for WebRTC – under the name Open Media Streamer. They also shared information on svt-hevc, their open source HEVC encoder.
VoicebaseVoicebase talked about Paralinguistics – the way we speak as opposed to the words we are saying. They shared the path they took charting that space, and understanding what makes more sense or less sense in terms of value.
VoiceraVoicera discussed virtual assistants and how they need to understand transcriptions.
IBMIBM explained the notion of voicebots and how it fits into contact centers. They explained the need to be able to handoff a voicebot to a human agent.
NexmoNexmo showed a demo using Dialog Flow, connected to a voice service for ordering a pizza. It stressed the need to be able to connect communication services to various machine learning ones.
DialpadDialpad explained how to take an open source speech to text engine and add some custom words into it in order to improve the accuracy of the transcription.
CallstatsCallstats clustered the sessions they are collecting, trying to figure out from that information the type of call and the root cause of issues it may have.
RingCentralRingCentral normalized MOS scores of audio calls across its network and devices, to be able to give a clear indication of call quality – it appears that while there’s a standard specification for MOS, asking device manufacturers to follow it to the letter is rather challenging, so using machine learning they are “fixing” that issue.
GoogleGoogle talked about the current status and efforts in getting Chrome’s WebRTC implementation to 1.0 specification. It also shared the work being done to improve audio stability and performance in Chrome (lots of architecture changes in how devices get accessed in order to reduce the number of threads used and get a stable delay model for its acoustic echo canceller). There was also a look at what goes after 1.0 – WebRTC NV and what role may WebAssembly play there (I’ll write more about it in the future).
AgoraAgora showed how they use super resolution to improve video quality in calls, and what it means to run super resolution on a mobile device.
HousepartyHouseparty used machine learning to improve video quality as well, taking a different approach. They shared the work they are doing and the effort it takes to bring it to production.
MicrosoftMicrosoft shared the work done on WebRTC on UWP and explained how AR/VR fits into the story and the enterprise use cases they are seeing in the market.
Session RecordingsAs always, all the sessions were recorded and are available online.
Kranky Geek in 2019Every year we’ve done a Kranky Geek event, we came in with the notion that this is the last one. Not sure why, but that was always the case. Then about 9 months after the event, we started discussing with Google about the next event.
We’ve changed that this time. We are going to do an event in 2019, and we have a name for it:
Kranky Geek SF 2019
We have a tentative date for the event: November 15, 2019
Put it in your calendar.
We don’t yet know what the theme for next year will be, but I have a hunch that it will include WebRTC and machine learning.
If you want to speak – contact me
If you want to sponsor – contact me
If you have feedback on what we should improve – you know – contact me
Oh – and if you are interested in AI in WebRTC, check out our report – there’s a discount available for it until the end of the month.
The post Kranky Geek 2018. A post event post appeared first on BlogGeek.me.
8×8 Acquires Jitsi From Atlassian. Winners and Losers
Jitsi was just acquired by 8×8, shifting hands from Atlassian. Here’s what to expect.
It seems that Jitsi has now switched hands, moving from Atlassian to 8×8.
Three months ago, Atlassian made a bold (desperate?) decision. It put up a white flag, decided to kill Stride after investing huge amounts of money and resources in it, threw Hipchat along with it, and “sold” them to Slack, who “acquired” them.
The weird thing in this acquisition was that Jitsi was left behind.
Jitsi is an open source media framework. One of the most popular WebRTC frameworks out there. I wrote about that acquisition in 2015. The reason behind it was Atlassian’s need to own the video communications technology that powered Hipchat. And now that Hipchat is gone, what would Atlassian need Jitsi for?
The last 3 yearsThe last 3 years have been good for Jitsi in Atlassian.
The team of developers it had was big, considering its scope (and open-sourceness). Especially if you factor in the fact that everything that Hipchat (and Stride) needed from Jitsi was implemented directly inside Jitsi. Not on a private branch of the project available only to Atlassian.
Compare it to how Twilio treated Kurento after its acquisition… Atlassian did a great job at keeping Jitsi’s momentum and community. At the very least, it didn’t hurt the project, letting it grow and flourish, paying the salaries of its developers.
The interesting initiative that took place alongside the Jitsi open source project is Jitsi Meet – a free version of a group video calling service. One that wasn’t limited to a small number of participants or lower video resolutions.
Jitsi is in a better place than it was 3 years ago, prior to its acquisition.
Leaving AtlassianLeaving Atlassian was a matter of time.
There was no room in today’s Atlassian for an open source project like Jitsi that brings no added value to its commercial products.
Jitsi didn’t go to Slack as part of the Hipchat/Stride deal. Slack were already using Janus, and are moving on to their own homegrown media server – something they shared with us at Kranky Geek 2017 (hint: come join us this year at Kranky Geek 2018). There was no reason for them to further invest in yet another migration – or they might have wanted to migrate to Jitsi and acquihire the team, but it didn’t pan out.
That left Atlassian with one of 3 alternatives:
- Kill the project and be done with it. Send the developers home or integrate them into some other parts of Atlassian. It would work nicely, but if the asset can be sold, then why not recoup some money?
- Spin out the project. Let the team go, giving them back ownership of the code, and have them go scrape for a livelihood around Jitsi. Probably by offering a commercial license, support and customization services, etc. – this isn’t that far out as an idea – it is how Janus (another open source media framework) operates today and how Jitsi operated prior to its acquisition by Atlassian
- Sell it to someone who’s interested in it. This is what it ended up doing. Given the other alternatives in front of them, I tend to agree with Andy’s statement that this is a mercy sale
8×8 acquiring Jitsi is an interesting choice.
Here’s where things get interesting:
8×8 already has a WebRTC based web conferencing solution called “8×8 Virtual Office Meetings Online”. Somewhere in 2016, this service got rewritten. At some point between then and now, guest access on Chrome was introduced. From the looks of it, based on WebRTC.
Why would 8×8 need/want Jitsi when it had a solution already?
I can think of three possible reasons for it:
- Their WebRTC solution isn’t that good, too expensive, and they were looking for a better alternative. Jitsi was a catch in such a case
- 8×8 is looking to own its video technology and not use third party software, commercial or open source
- They were using Jitsi for their 8×8 meetings thingy, and Atlassian selling that asset was an opportunity for them to control the tech stack without relying on a third party – probably on the cheap
What would 8×8 do with Jitsi?
The obvious thing is to integrate the tech into its meetings service. If it is already there, then use the Jitsi team of developers to tweak and finetune the thing for the 8×8 use case.
If it isn’t there yet, then integrate it and replace its current WebRTC tech in the meetings app. This is a more challenging undertaking, as Jitsi will need to meet the current feature list of what 8×8 already has in that domain, along with integrating to an existing codebase of a service and an application.
Jitsi probably has most of the needed features to make this happen. It wouldn’t have been acquired otherwise.
On a different note, 8×8 has no real open source activity at the moment. Its GitHub account is mostly forked repos. Searching for “8×8 open source” is dominated by the Jitsi acquisition news:
(the rest are comparisons to other vendors, who are leaning more heavily on open source)
If 8×8 is interested in embracing open source, then it just got an interesting opportunity to do just that. Which brings me to the last topic –
The future of JitsiWhat will become of Jitsi?
Here we need to look at Jitsi and Jitsi Meet separately.
JitsiThe Jitsi Videobridge, along with its derivatives, add-ons, plugins, extensions and client-side SDKs.
That’s the open source part of the project. At Atlassian, there was nothing kept for internal use of Hipchat/Stride. Everything found its way back to the open source project.
Will 8×8 continue in that path?
Their focus in the coming months is going to be the integration of Jitsi into their 8×8 meetings service. They are bound to use the resources of the Jitsi team to do that.
Managers may decide to implement some of the features in the 8×8 meetings service moving forward and not invest in adding it to the Jitsi open source project. Or they might decide to add everything via Jitsi.
8×8 might end up taking the extreme route – ditching Jitsi as an open source project, embedding it into their meetings app and, from there on, investing in that private branch only. I see that as a highly unlikely outcome in the next 2-3 years.
Time will tell which direction is taken.
Jitsi MeetJitsi Meet is a different story altogether.
It is a group video meeting service. One which doesn’t limit the users’ bitrate in sessions, doesn’t limit the number of users in a session, offers mobile apps, Slack and calendar integration and scales globally. All for free.
Would 8×8 see it as competition to their own 8×8 meetings app? If it grows in popularity and its maintenance costs increase, how happy would 8×8 be in paying the bills? Would it see Jitsi Meet as a sales tool for its other services? How would it measure the success of this service?
Whatsapp’s founders just left Facebook this year. It was over disputes about data, privacy and such. Most of all, it was probably a dispute around the future of Whatsapp and Facebook’s intent of monetizing the asset. The same (at a much smaller scale) can happen here at some point.
How would 8×8 monetize Jitsi Meet? Should it? If it doesn’t, should it kill it?
I don’t know the answers. I am sure 8×8 doesn’t either. It is just too early to tell.
Last WordsJitsi is an open source success story in WebRTC. There’s no doubt about it.
It is now entering a new chapter in its life, under 8×8.
I wish the team the best of luck and us as an industry to have the option to use Jitsi for our future projects.
Media frameworks are part of the backend story of WebRTC. Care to learn the rest? Try out my free mini-video series on WebRTC backend servers:
Register to the video series
The post 8×8 Acquires Jitsi From Atlassian. Winners and Losers appeared first on BlogGeek.me.
Meet me @ Kranky Geek San Francisco 2018
Kranky Geek is happening this year again, the date is Nov 16, and we’ve got the best lineup of speakers for you.
Kranky Geek started almost by mistake. Like most good things that happened to me. It wasn’t planned. The result though is becoming a tradition by now, where I get to work with Chris Koehncke and Chad Hart for a period of time that can be considered quite intense (we’re all too opinionated).
Google, along with our other sponsors make this event happen. We only curate the content to make sure the end result is great.
In last year’s event, we started looking at the domain of AI. You can find the recordings of that event on YouTube. The feedback we got was positive, so this year we’re taking it a step further. Many of the sessions will focus on machine learning and AI and their impact on real time communications.
What’s on the Agenda?AI in RTC.
As always, our intent here is to focus as much as possible on services and applications that are running in production already. It won’t be theories about what can be done but what are people doing. Today.
The updated agenda can be found online. It might change a bit in its ordering, but it is mostly ready.
This year, we have some brand new speakers for you:
- Discord will be giving a session about their service and what they had to do with WebRTC to make it work for their use case. My suggestion? Read their post to get ready for this session – it will be really interesting
- Houseparty are joining us for the first time as well. Tinkering with machine learning on device. One of the main challenges these days is deciding where to run inference with machine learning – on device or in the cloud. We will see both options throughout the day
- Agora will explain what they are doing to improve video quality in real time on mobile devices by using machine learning
- Voicera will be talking about the challenges in speech recognition when it comes to handling meetings
- Dialpad are there to talk custom vocabularies. Every company has that. How do you transcribe Kranky Geek? That’s a question I’ll ask in the Q&A of this session…
- Intel will discuss newly open sourced visual processing tools to help you build out your application
- RingCentral is joining us late in the game. We’re figuring out with them a stellar topic for the event
We also have some “repeat” speakers:
- Facebook this year will give us a sneak peek at the technology (and AI) behind their new Facebook Portal device. What I am really keen on hearing is what decisions they made to get their “follow you around” feature to work
- Voicebase will focus on paralinguistics this time. The nuances of speech that aren’t text – and how to capture their meaning
- Callstats will this time be discussing what can be learned from looking at ongoing call data using… machine learning
- IBM will be all over voicebots and their uses in contact centers. We will get to look under the hood on how these get implemented
- Nexmo are going to show us the complexity of connecting real time voice streams to cloud based speech to text engines (technically, they are a new speaker, but I figured that now that TokBox is part of Vonage, which also owns Nexmo, they count as repeat speakers)
- Google will give an update on Chrome’s implementation of WebRTC, with a focus on 1.0. They will also give a deep-dive into the upcoming architectural changes in Chrome’s audio processing engine
- Microsoft is going to give us a demo of WebRTC, Mixed/Augmented Reality and HoloLens. And we’re saving this for last so you’ll stick around
We are expanding our family of Kranky Geek speakers and Kranky Geek companies, which is a true joy. I can’t wait to hear your feedback once the day is over.
Our sponsors this yearAs always, the event is practically free to attend (there’s a $10 admission fee that gets donated to Girl Develop It).
The companies that made this event happen this year are Google, Intel, Agora.io and Nexmo, who are our premium partners for the event; and Callstats.io, Voicebase and RingCentral, who are our silver partners for the event.
No fire drillI am not sure if this is good or bad. We had a surprise fire drill last year. We knew about it a week or two before the event. It caused so much headache for us. And a lot of worries.
It ended up pretty well, with our audience and speakers getting a one hour break outside on a beautiful sunny day. Almost all of them came back after the drill, which isn’t obvious or even expected.
Many were happy for the break – and the smalltalk that ensued during it.
Hopefully, there will only be pleasant surprises this year as well.
What are we looking for in Kranky Geek?We had to turn down a few vendors who wanted to speak. This is a process that takes place every year.
There’s no specific set of rules of what we approve or don’t as a session in Kranky Geek, but for me it boils down to this:
- Something new that wasn’t discussed at Kranky Geek before
- Preference to something running in production at scale
- An interesting topic that would appeal to developers
- Related to real time communications
- A speaker that can “hold a room”
While the lineup of speakers for this year is full, if you want to speak in future Kranky Geek events – be sure to catch me during the event for a chat.
Should you travel just for this single day?I got this question a few times in the past few weeks.
My guess is that if this is the only thing you’re doing in San Francisco and coming for, then skip it. Especially if you are traveling from abroad.
That said, if you want to feel where WebRTC is headed, talk to many of the people who deal with it daily in the real world, then this is the place to be. So many discussions take place during the breaks that it might be worth coming only for the breaks… I know a person or two that are coming only for that.
We try to make Kranky Geek special and unique. We work hard to select the speakers and work with them on their presentations. All to make it worth your travel, wherever you come from.
Can non-developers attend?We received this question recently.
There is no easy answer to this one. On one hand, the event and its sessions are technical in nature, as our focus is developers. On the other hand, the sessions are short (20 minutes all-in-all), so our speakers tend to focus on the essence and don’t dive too deep into the nitty gritty details. So it’s a tough call.
My suggestion? Check out some of the session recordings on YouTube from past events and make your decision based on that.
Register nowYes, there’s this minor detail.
You need to register to attend. There’s limited room capacity, and at some point, we will need to close the registration.
We’re already half full in our registration list, so save your spot now and don’t wait.
Do you want to meet me prior to the event?
I’ll be in San Francisco Nov 12-17. Nov 15-16 are reserved for Kranky Geek. The rest for meetings with people – around WebRTC, CPaaS, testRTC, my WebRTC course, consulting and just catching up.
If you want to meet me during that week, leave me a note.
The post Meet me @ Kranky Geek San Francisco 2018 appeared first on BlogGeek.me.
Are Embeddable Video Experiences Necessary?
There’s no one size fits all in communications. In video, that means that embeddable video experiences are necessary and they are here to stay – they aren’t a passing trend.
Source: Vidyo
Years ago, before WebRTC came into our lives, I worked at a video conferencing company. My role there at the time was CTO of the business unit dealing with licensing VoIP technology to others. The leading product at the time was a video conferencing client that could fit into devices and interoperate over SIP and H.323. As CTO, I was given the initiative of getting us into the cloud, which ended up involving something that was meant to become a CPaaS (just not using that term, as it didn't exist yet). It never came to fruition, since I left the company a bit after WebRTC was announced and I knew where the future was headed.
Anyway, one day I was asked to take a business trip to the US, to meet with customers and potential customers. One of these customers was a vendor involved in the prison industry (not sure what the whitewashed term for that is, so I'll just use prison industry).
Video Conferencing in Prisons
To clarify: I am not taking a stand here around prisons, prisoners or video conferencing in prisons. Just sharing this as a requirement that I've seen in the past.
What they were doing was building “phone booths” for prisoners so they could call home and talk to friends and family. They were in the process of shifting towards video calling, and were using at the time one of the known brands – I don’t remember which. Think of Polycom or Cisco video conferencing systems for reference.
Source (somehow, the happy faces seem exaggerated for the use case)
The challenge was in the fact that these vendors and their solutions were geared towards video conferencing in the enterprise – what we now wrap under the term of unified communications. This meant that a lot of the features and requirements that a vendor developing a communications service for prisoners were hard or impossible to meet:
- Full moderation of the call by a third party at all times
- Ability to join the session as a silent or known participant (that’s the moderator)
- Ability to manage and control session length
- Knowing the identity of both people in the call, but having the system flexible enough to accommodate new users and guests in the system
- Wrapping the whole experience with other features (browsing) that prisoners might want to use
They ended up licensing our technology to build it all, at prices that today would seem ridiculously high, though they made sense in those days, when real time communications technology wasn't a commodity and wasn't open sourced.
If we’re at the domain of anecdotes, funnily enough, we’ve been using GIPS for the audio codecs at that time on PCs. The same company that Google acquired and built WebRTC out of.
Back to Embeddable Video Experiences
Prisons and prisoners aren't the real story here.
Embeddable video is.
Communications between humans is something that can’t really be placed into a set of known rules.
Yes. We’ve had the telephone companies around for 120 years or so, explaining and educating us on how to communicate with each other remotely.
Unified communications has a gazillion features dealing with telephony, trying to accommodate each and every eventuality that a customer may want and need. Which is nice, but from a certain point, it is really hard to scale across customers with different needs.
Video conferencing has been the hardest of all. Video is hard, so everything about it is hard as well.
This all meant that communications was always a service. Something you get “out of the box” as is. Or something you can customize if you are big enough, with enough money to pay.
WebRTC, cloud, virtualization, SaaS and a few other terms came into our lives. What they essentially did was reduce the barrier of entry for those who need video communications. This meant that scenarios that weren’t catered for with enterprise video conferencing were now possible to achieve at lower price points.
The end result?
We are now seeing video communications being embedded in places where it never really existed.
Are these new?
They are and they aren’t.
They aren’t because the need was always there.
They are because only now they can be satisfied commercially.
The only question that remains is where do you see embeddable video contributing to your business and how do you go about implementing it. In the last few months, I’ve been working with Vidyo on a research around this topic exactly.
Interested in the state of embedded video in 2018? Download the free report here.
There's also a joint webinar on the topic coming up – be sure to register for it.
The post Are Embeddable Video Experiences Necessary? appeared first on BlogGeek.me.
WebRTC is Ready. Now What? (a look at the state of WebRTC in 2019)
There should be no doubt about WebRTC anymore. It is here and it is ready for everyone. The question is: “now what?” Where are we headed with WebRTC in 2019?
Is WebRTC Ready Yet?
That was the name of a website that tracked how well WebRTC was adopted by the various browser vendors.
Apparently, it is also the most common question on Google about WebRTC:
It is time we say it out loud (I don't believe anyone has done that up until now):
WebRTC is READY
I was asked to speak at Apidays Amsterdam last week, which was a true joy. The topic I was given was around WebRTC being a standard, and well… where we are headed next. So I decided to rephrase it a bit and ignore that tiny fact that WebRTC 1.0 still isn't an official standard (nobody but those in standardization organizations and those opposed to adopting WebRTC seem to care either).
So I sat down to think about what it means that WebRTC is ready. Which led to this question:
Why do I think WebRTC is ready?
The best way for me to answer that question was to give 3 recent examples of things happening with WebRTC (and I don't mean Uber doing VoIP using WebRTC):
#1 – VP8 Supported by Safari
I've been a critic of Apple's non-support of WebRTC and then of Apple's non-support of VP8.
The fact that Apple decided at the time to support only H.264, a royalty bearing video codec, and ignore VP8, the royalty free alternative, wasn't a good sign.
In the past two weeks, tweets and webkit bug links have been flying around, indicating that if the mountain won’t come to Muhammad, then Muhammad must go to the mountain. Or more accurately, that Apple decided to do a Microsoft and support VP8.
Do a Microsoft, because these are the same steps Microsoft took when going WebRTC: starting with H.264 and only later adding VP8.
So Apple started with H.264 and is only now adding VP8.
When will this be available for all? Ask Apple.
What’s important is that ALL modern browsers now support both VP8 and H.264. More on that in a sec.
—
It doesn’t stop there either. Apple joined the Alliance of Open Media as a founding member. This alliance is behind the future video codec AV1, and now has 40 members in it.
#2 – H.264 Simulcast Support
The second example is H.264. It is now becoming a first class citizen.
H.264 on Chrome didn't have simulcast support. The “fix” for that was available for quite some time, but was never incorporated into Chrome. Simulcast increases the quality of group video calls, so not supporting it made H.264 useless for group video calls (a sketch of what requesting simulcast looks like appears at the end of this section).
There can be two reasons for this feet dragging by Google:
- Timing and priorities. Google didn't really care enough to add it in and deal with the headaches of pushing a third party fix through and validating it
- The push towards VP8. Increasing the quality of H.264 would get more developers to adopt it, especially when Apple supports only H.264 on Safari
Since VP8 is coming to Safari, the reason to give it an edge over H.264 isn’t there anymore. Especially considering the healthy growth of the Alliance of Open Media.
The end result?
- All modern browsers support VP8 (Safari support is imminent)
- All modern browsers support H.264; and simulcast will soon be possible for it
- VP9 is available only in Chrome and Firefox for WebRTC – but who cares? The future will be AV1. And ALL browser vendors are part of the Alliance of Open Media where AV1 is getting specified (YouTube is already testing AV1 decoding in Chrome and Firefox)
This media codec disparity between browsers was the main challenge for the WebRTC community. It is now behind us.
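For those wondering what simulcast even looks like to an application, here's a rough, spec-level sketch of asking the browser to send multiple resolution layers. Browser support varied at the time (Chrome mostly did simulcast through SDP munging rather than this API), so treat it as an illustration under those assumptions, not production code:
// Hedged, spec-level sketch: request three simulcast layers when sending video.
// Support for sendEncodings varied across browsers at the time of writing.
const pc = new RTCPeerConnection();
navigator.mediaDevices.getUserMedia({ video: true }).then((stream) => {
  pc.addTransceiver(stream.getVideoTracks()[0], {
    direction: 'sendonly',
    sendEncodings: [
      { rid: 'q', scaleResolutionDownBy: 4.0 }, // quarter resolution layer
      { rid: 'h', scaleResolutionDownBy: 2.0 }, // half resolution layer
      { rid: 'f' }                              // full resolution layer
    ]
  });
});
An SFU receiving these layers can then forward whichever one fits each viewer's bandwidth.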
#3 – Google Shifts Focus
The third reason why I believe WebRTC is ready?
Google is shifting focus. It is doing what is needed to support WebRTC and the migration to the 1.0 specification (unified plan, for example), but its heart and mind are already elsewhere:
At the beginning of this month, Google announced Project Stream – a cloud based service that streams high end games from resource intensive cloud based machines to low end devices in real time.
There’s not a lot to go on about the technology, but it seems to be based on WebRTC.
Project Stream official gameplay capture: 1080p@60fps https://t.co/SjznbRCBAP
— Justin Uberti (@juberti) October 2, 2018
Why else would Justin Uberti from Google's WebRTC team publish this? 1080p resolution at 60 frames per second with low latency for gaming. This type of use case is different from real time communications. It requires a different focus and different optimizations. And yet… the WebRTC team at Google has probably spent some cycles on supporting it.
Why is that a good thing?
Because for Google, WebRTC is ready when it comes to real time communications, and beyond optimizations and housekeeping, it is time to move on and look at other use cases where WebRTC can be beneficial.
What’s Next?So. WebRTC is here:
- Apple supports it now; and there’s codec parity across browsers
- H.264 is a first class citizen in WebRTC
- And Google has moved on to other use cases for WebRTC
What’s next for WebRTC?
The answer I gave in that presentation at Apidays was Machine Learning.
I like that slide above. I like it because you can take RTC out of it, replace it with whatever word/term/industry you want and it will STILL be true.
In the rest of that presentation, I went over the research report that Chad Hart and I have written, sharing some of our findings.
I went into the 4 domains we’ve mapped in our research, in each giving an example of the impact and use cases that are now possible:
- Speech analytics, and how we’re shifting from offline processing to real time
- Voicebots, and how work in that area is accelerating
- Computer vision, where use cases are vastly different between consumer and enterprise settings
- Media optimization, and the shift from heuristics to machine learning
That slide deck from Amsterdam is now available online as well. You can view it here:
WebRTC is READY. What's Next? from Tsahi Levent-Levi
Machine Learning and Real Time Comms
If you are interested in learning more about machine learning, to be able to make smart decisions in your own company about the use and introduction of machine learning and artificial intelligence in a communications application, then definitely check out our report: AI in RTC
The post WebRTC is Ready. Now What? (a look at the state of WebRTC in 2019) appeared first on BlogGeek.me.
Can Google RCS Win the Messaging Game Through AI?
RCS is being brought back from the dead by Google, and its next play will probably be with AI.
Carriers have a problem
SMS won't be here forever. In fact, most of the messaging traffic is happening on social networks now.
Voice is shifting as well. Migrating to these same social networks. With the ability to upgrade these calls to video calls. With stickers. And silly hats, cat lenses and whatnots.
Want to learn more about the use of silly hats and other AI features in communications? Check out our AI in RTC report preview
Their circuit switched network technology is decaying, left in the 80s or probably the 50s. Most of what goes on there is spam or one-time passwords anyway. Nobody cares.
So much so that Google is planning on diverting incoming calls to its assistant (but more about it later).
The solution, in the form of IMS and later RCS (or call it Joyn or whatever other branding it was given throughout the years), is some 20 years in the making. And it doesn't seem to be coming any time soon. At least not if left to the arduous processes of carriers and their suppliers.
Google has a problem
A VERY different problem.
Google has no messaging clout.
For consumers?
Apple iMessage wins on iOS. It acts as a chameleon, catching your messages and deciding whether they should be demoted to SMS or use modern messaging via iMessage instead.
Facebook, with Messenger and WhatsApp, rules supreme on Android, and in many cases on iPhones as well. Where they aren't as strong, you've got a slew of other social players with 100+ million monthly active users. None of them looks like a carrier. And none of them is Google.
Google has Allo, Duo, Chat, Meet, Hangouts, Messages and probably a few more apps that I've forgotten to mention. All in different states and capabilities; but none of which is dominant compared to its competitors. Actual monthly active users and the number of real messages going between users? Not shared. Probably not stellar.
And Google has RCS…
For businesses?
Apple, Facebook and others are adding APIs. Introducing bot platforms. Building marketplaces. And they are doing it slowly, fearful of becoming the spam cesspit that is the good ol’ carrier communications tech today.
Slack is killing it. And the rest of the cadre of UCaaS and enterprise communications players are trying to move into their space.
Google has Meet and Hangouts Chat. Part of G Suite. Meet gets used. Hangouts Chat I don’t really know. But it seems that most just skip it and move on to Slack or some other tool.
Google also has nothing resembling a business angle to its consumer facing communications applications yet, or at least nothing popular enough.
What’s new in RCS land?Nothing really.
I’ve written in April about RCS being still dead. For some reason, Google is still hammering away at it. Similar to Google+ if I need something to compare it to.
A press release last month by Samsung and Google brings Samsung to the RCS graveyard. New Samsung devices, and maybe later some older ones, will come – gasp – with a Samsung Messages app that will work seamlessly with the Android Messages app using each other's RCS technology!
This interoperability nightmare of the carriers will continue on, leaving RCS dead.
Adding new carriers or smartphone or chipset makers into the fold won't help either.
And it isn’t as if Apple is making any noises of being interested in RCS, and why should they be?
That said, there are those who will be adopting RCS.
We are shifting towards an omnichannel world. No single protocol to rule them all. No single vendor to rule them all. You want to send your message as a business to a consumer?
You can use SMS. Or better, do it over Messenger or WhatsApp or Apple Business Chat – there's more context and richness in those, and consumers actually care about these channels. Which brings us to a place where businesses just need to support wherever their customers are, with no decent common denominator.
And wouldn’t it be great if we could throw SMS and use RCS instead? At least where we can?
So CPaaS vendors are adding support for RCS and announcing it in their arms race to world domination by collecting as many social messaging icons as they can.
That’s great, but not enough to save RCS.
Can Google change the RCS predicament?
Not really.
There are just too many players and this is a domain where Google has been struggling to go it alone as it is.
Here’s what it takes to bring RCS properly to the masses:
Chipset vendors
Chipset vendors are at the bottom of the food chain, but they need to offer their support to make RCS happen.
Unlike other messaging services, RCS is “bolted” onto the identity of the user and their device. The SIM card. The ability to connect the end user, through an application, to the SIM card, and from there to the carrier network, is what presumably makes RCS different. But for that to happen, chipset vendors need to pave the way, even if just a little bit.
Handset manufacturers
Handset manufacturers need to make sure that the RCS application is implemented, supported and pre-installed on the device.
Without being pre-installed, users will need to pick and choose between an RCS app from a handset manufacturer or a carrier (the word bloatware comes to mind) OR pick Whatsapp instead. The choice is a simple one for most.
They need to make the application attractive and sleek. Things they can’t really do. Competing with current successful social messaging apps requires a lot of investment. Nailing the user experience is a lot harder than it looks.
Carriers
Carriers need to actually support RCS. As a service. In their network. And have these things called mobile phones that support RCS. And enough people with these devices so they can actually talk to each other.
Preferably, all carriers within a country should switch on RCS simultaneously.
How likely is that to happen?
Single, very complex specification
And all of these players need to do so for a very complex IMS/RCS specification.
Testing the combinations of devices and networks is going to be hellish, especially for those who aren’t going to just select the default Google implementation of RCS client/server.
Which is exactly what Samsung decided to do. Have its own service and then interoperate it with Google’s. I can easily see other big players – chipset vendors, handset vendors and carriers who would be either scared shitless of ceding control to Google or not magnanimous enough in letting Google take control over that piece.
This headache also suggests something really important:
If RCS succeeds, it won’t move as fast as any of the other social networks in introducing new features, services and capabilities
There are too many moving parts, controlled by different players, some of which are doing the same things.
Network effects
Then there's the network effects.
When can I use RCS on my phone?
It needs to be installed there. Probably pre-installed.
The people I communicate with should have it as well.
Our networks should support it.
Oh – and there’s this minor detail of me actually going into that app to send a message.
How many times this week have you clicked on this icon on your Android phone?
What about these icons?
Enter Artificial Intelligence
I've been thinking about it for quite some time.
How can Google become relevant in messaging?
It is unlikely to come from features and capabilities at the core of social messaging. None of its services stick:
- Google+ was “shut down” publicly this month. Google found a great excuse – a potential security flaw
- Duo was supposed to compete head-on with Apple FaceTime, offering things like faster connections and the knock knock feature. But what have we seen from Duo since its launch? And are you using it at all?
- Allo was interesting, but got no adoption. It was halted in April, if you believe the news
- Hangouts is being replaced by Meet, at least for the enterprise. Will it be shut down for consumers? Time will tell
- Hangouts Chat is only starting its way, though I haven't heard anything at all about it since its public launch
- Meet works just fine. For the enterprise. If you have a Google account
- The Google Messages app is purely for SMS. And it is crappy to say the least. It doesn't respond as fast or as fluidly as other social messaging apps, and frankly, I don't really care about the technical reasons for it
The one thing Google has going for it is AI. In droves.
Which is probably why Google Duplex is reportedly rolling out next month, helping phone users book tables at restaurants – on their behalf.
It is also why Google is now adding to its Assistant the ability to screen spam calls:
These AI features have a potential to actually succeed. They don’t really relate to RCS or even messaging, but they are about telephony.
Allo was about messaging. As reported on The Verge when Allo was paused in April:
As part of that effort, Google says it’s “pausing” work on its most recent entry into the messaging space, Allo. It’s the sort of “pause” that involves transferring almost the entire team off the project and putting all its resources into another app, Android Messages.
Google won’t build the iMessage clone that Android fans have clamored for, but it seems to have cajoled the carriers into doing it for them. In order to have some kind of victory in messaging, Google first had to admit defeat.
That’s the Google RCS effort right there.
If you take the AI related features in Allo, and think of them as getting Google Assistant into Messages, the Google RCS app, then it makes sense in a way. But not enough sense.
The Google Assistant doesn't feel like a product at this point. It is a large set of features and capabilities that can be used to add smarts into phones. It is a window to the phone's (and Google's) AI for the consumer.
Limiting it to run for RCS only doesn’t seem like the right thing to do. Would it be enough to save RCS? Would it be enough for Google to gain back users from other messaging apps?
It is too early to say, as none of it has come to fruition in an app customers can use.
Google could have tried to do with Allo the same things it is doing with its Contact Center AI:
Provide the whole AI-for-communications part as an API, a set of building blocks for others to use and embed. It worked so well for them that it got many in the industry lining up to partner with it in contact centers. Launch partners for the Contact Center AI include Mitel, Genesys, Vonage, Cisco, RingCentral, Five9 and Twilio, to name a few.
Would such a thing work with social messaging apps?
Apple wouldn’t touch it with a long stick for its iMessage.
Facebook wouldn’t either. So no Messenger or Whatsapp.
Telegram? I don’t see that happening.
WeChat? Chinese.
Who would they be left with? The smaller players, who might grow, but none seem to be rising above the white noise level.
Which gets us back to Google itself. With Messenger/RCS/Chat.
What Google needs to do is find the sticky features that will get users to use its app. Those that can get value out of it even when the other participant isn’t using the same app. Add smarts into SMS itself, while providing a rich experience to the user when interacting with others who have that app.
The real question is why limit this to RCS and carriers? Why not just offer it as the out of the box Android experience to everyone? Have it there by default. Let people download and install it on older devices and on iPhones.
Probably because Google still believes it relies on carriers for its Android success. Which is what has been holding it back in mobile social messaging ever since Android came into our lives.
Want to learn more about the use of silly hats and other AI features in communications? Check out our AI in RTC report preview
The post Can Google RCS Win the Messaging Game Through AI? appeared first on BlogGeek.me.
WebRTC vs Zoom. Who has Better Video Quality?
WebRTC vs Zoom? WebRTC is actually quite good. But you knew that already – didn’t you?
They say quality is in the eye of the beholder. So behold.
We've all been told once and again that this video conferencing vendor or that one works great. They offer the best quality. The best experience. They work in conditions that others don't.
I even had a call once with an entrepreneur who explained to me how he was going to offer a service with better 1:1 video quality than Skype and Google Hangouts. And he was going to do it with WebRTC. I spent the better part of that call trying to get him off that idea (something about his logic was off there).
But I am digressing.
Like many others, I've been told time and again how Zoom is great. How in spite of the fact that it doesn't work in the browser and forces you to download its client (some even refer to it as a virus), it gets traction and adoption. It feels like it is the best game in town. And then they mention the reasons:
- It’s free (until it isn’t, which is a great business model if you can make it work, and Zoom is making it work)
- It has better video quality than the competition. Especially WebRTC
I am not the only one who needs to listen to this, and even believe it to some extent. The guys at Jitsi got curious – why not put it to the test?
So they took a Mac device, placed it on a WiFi network, added a network limiter so they could fiddle with the network configuration, and did a 1:1 call. Once with Zoom. And once with WebRTC.
The idea is this – start with as much bandwidth as the video call wants. Then limit it to 500kbps. Check how much time it takes to adapt. Remove the limit and check how much time it takes to adapt back. More about it in Jitsi's blog.
Essentially – testing for these network conditions:
The longer the marked areas, the worse the experience is going to be for the users.
And guess what? Zoom fared worse than WebRTC. Not a little, but a lot worse.
Full adaptation to limiting the bandwidth took WebRTC 20 seconds. It took Zoom 156 seconds (!).
Ramping back up to 2mbps took WebRTC 32 seconds. It took Zoom 62 seconds.
Now here’s my analysis of this.
WebRTC Rocks
Yap. It really does.
The screen capture from that Zoom blog post that was pasted by Jitsi?
Stating that “web-RTC is a very limited solution that would not allow us to provide all the excellent features that our users have come to expect from us”?
That’s from 2015.
A lot has been improved in WebRTC since then, if that explanation was even correct in 2015 to begin with.
Without the need for most of us to do anything, we're getting updates to a top notch media engine in the form of WebRTC inside the browsers we use. The code used in Chrome is open sourced, so it is accessible for all to embed in their own applications as well.
Security fixes? New codecs? Improved media algorithms? They just “happen”. Out of thin air. For most of us.
Defending Zoom
If I look at it from Zoom's point of view, besides the fact of being a dominant player in the market with or without WebRTC, here are the challenges with such a test scenario:
- It was done once, or a few times. But it is still only one scenario
- It wasn’t a real life scenario. Just something concocted for this. Jitsi could have rigged it and tweaked it so that WebRTC would shine, but in real life, that doesn’t happen, and at Zoom we’re optimizing for real life scenarios
- (that isn’t really so. From my experience and knowledge of the Jitsi team, I’d estimate they tried to be VERY careful here to not fall into that trap)
- (and what’s real life scenarios anyway?)
- The network limiter used changes behavior in ways that aren’t close enough to reality
- (that I can understand and live with. We see faster uptake of the same type of scenarios for WebRTC at testRTC – more on that later)
- Zoom might be working through external remote servers for that same session while WebRTC is going peer to peer on the local network. Servers behave differently than clients, so the results seem somewhat “off”
- In other scenarios, Zoom might actually be better than WebRTC
Which leads us to the fact that more tests are needed to know which one is best and in which scenarios.
This starts to sound like the VP8 vs H.264 quality comparisons of the past (I never could tell the difference).
It's the Infrastructure, Stupid
With WebRTC, it all boils down to the infrastructure. The one with the better deployment wins the quality game.
- Do you go peer-to-peer for 1:1 sessions and seamlessly switch to an SFU architecture when more participants join?
- Where are your media servers located?
- Do you cascade the session across media servers to improve quality?
- Do you provide feedback to the user about the network conditions?
- Do you switch video off when there's not enough bandwidth? (see the sketch after this list)
- How are you managing things like FEC, simulcast, SVC, … ?
- What about mobile and native app support?
And the list goes on.
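To make the bandwidth question above concrete, here's a minimal sketch of the kind of logic an application can run on top of getStats() to switch video off when the estimated bandwidth drops. The 300kbps floor and 2 second polling interval are arbitrary assumptions for illustration, not recommendations:
// Sketch: poll getStats() and toggle the outgoing video track based on
// the browser's bandwidth estimate for the active candidate pair.
async function adaptVideo(pc, videoSender) {
  const stats = await pc.getStats();
  stats.forEach((report) => {
    if (report.type === 'candidate-pair' && report.nominated &&
        report.availableOutgoingBitrate !== undefined) {
      // Disable video when the estimate falls below ~300kbps
      videoSender.track.enabled = report.availableOutgoingBitrate > 300000;
    }
  });
}
// e.g. run it periodically on a live connection:
// setInterval(() => adaptVideo(pc, videoSender), 2000);
Real services layer far more on top of this (simulcast, FEC, resolution scaling), but the decision loop starts from the same stats.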
With vendors who use proprietary codecs and transport protocols, this is doubly so, as they need to cater for the browser once they reach WebRTC. So while their native apps might be optimized, it might all go down the drain once they transcode or just “translate” to reach the browser using WebRTC.
Need to understand WebRTC and how to design and architect real world solutions with it? A first step is to understand the servers used to connect WebRTC.
Join a free video course on WebRTC servers
Which brings us to why someone like Zoom should use WebRTC, and think about the quality issues once connecting to it:
You Need WebRTC
Zoom already supports WebRTC. I just found out when I searched for stuff to write this article: there's a Zoom Web Client
It runs on Chrome and enables using audio in Chrome when joining meetings. No video – probably because transcoding the proprietary video codec Zoom uses to the ones in WebRTC is too complicated, whereas using G.711 or Opus in the browser and transcoding (or using the same codec) in Zoom is way simpler.
Zoom is going through the same phases that Amazon did with Chime:
- Amazon Chime started with a downloadable client
- They then added limited browser support that enabled users to view the screen shared in the browser and connect via the phone without the need to download the client
- Later on, audio support was added to the web client
- And recently, video got supported
- Screen sharing and remote desktop control still don't work. I'd say it is a matter of time
This exact same path has been happening to other vendors in one way or another.
Why not Check Your Own Service?
While writing this article, it dawned on me that this is one of those scenarios that is ridiculously easy to simulate using testRTC, so I went ahead and created a script that does just that:
- Loads up Jitsi with 2 participants. That should cause them to work peer-to-peer
- Run the call for 1 minute unhindered
- Limit bitrate to 500kbps and run for 2 more minutes
- Remove bitrate limit and run for 2 more minutes
Here's what the main part of the script looks like:
// Wait for 1 minute
client
  .pause(60*sec)
  .rtcScreenshot('ALL GOOD');

if (probeType === 1) {
  client
    .rtcEvent('Start limit', 'global')
    .rtcSetNetworkProfile('custom', 'bandwidth', 500000, 'both', 'both');
}

// 2 minutes with bandwidth limits
client
  .pause(60*sec)
  .rtcScreenshot('LIMITED')
  .pause(60*sec);

if (probeType === 1) {
  client
    .rtcSetNetworkProfile('') // back to pristine network conditions
    .rtcEvent('Stop limit', 'global');
}

// 2 more minutes unlimited
client
  .pause(60*sec)
  .rtcScreenshot('BACK TO NORMAL')
  .pause(60*sec);
The .rtcEvent() calls are there to place vertical lines on the graphs, while the .rtcSetNetworkProfile() calls are there to fiddle around with the network conditions.
There were two probes here, each one a participant in the call. The first one is the one I limited while the second one was left “untouched”.
Here’s what the graphs look like on the second probe:
The above graph shows the outgoing bitrate. Within a span of 5 seconds, WebRTC finds the new effective bitrate and adapts to it. Ramping back up takes some 20 seconds.
The above graph shows the incoming frame rate. You can see how frame rate reporting in WebRTC takes a bit of time to get back to its usual self – also some 20 seconds or so.
I wanted to check how the Jitsi SFU would behave, so I tweaked the test URL for that. The results? Still better than Zoom's: 20 seconds to hit 30 frames per second and around 50 seconds to get back to full bitrate.
If you want to try it yourself, just import the JSON file in this Google Drive folder to your testRTC account and modify it to fit your needs.
Where to now?
WebRTC is more than good enough.
Making it better is usually about thinking your way through the best possible architecture, along with media servers that take care of network conditions properly.
As for Zoom… please make sure your next call with me is on something that has WebRTC. The machine I regularly use for calls runs Linux. Zoom doesn't work there… it doesn't really support Chrome or Linux. Yet.
The post WebRTC vs Zoom. Who has Better Video Quality? appeared first on BlogGeek.me.
WebRTC FAQ: The 2018 Version
An updated WebRTC FAQ for those who wish to understand this tech somewhat better.
It is 2018, and it seems like there’s no good FAQ for WebRTC. Nowhere. They’re just not up to date. That, coupled with my own need to be the best source of information on the web about WebRTC (and the fact that my last few articles were more about CPaaS and messaging than WebRTC), got me to write this one.
What is WebRTC?
WebRTC is both a standard specification and an open source project.
WebRTC allows sending and receiving of real time voice, video and arbitrary data across browsers and other devices. This means we now have an easy way as users to conduct voice and video conferences from a browser or from our mobile devices. WebRTC can do a lot more than that, but voice and video in real time is the basis of what you get out of it.
There’s a short video explaining What is WebRTC on my site.
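To give a feel for what that looks like in practice, here's a minimal sketch of the browser API: capture the camera and microphone, attach them to a peer connection, and create an offer. The STUN server here is Google's public one, and the signaling that would carry the offer to the other side is left out on purpose – WebRTC doesn't provide it:
// Minimal sketch: capture media and prepare an offer to send to a peer.
const pc = new RTCPeerConnection({
  iceServers: [{ urls: 'stun:stun.l.google.com:19302' }]
});

navigator.mediaDevices.getUserMedia({ audio: true, video: true })
  .then((stream) => {
    // Hand every captured track to the peer connection
    stream.getTracks().forEach((track) => pc.addTrack(track, stream));
    return pc.createOffer();
  })
  .then((offer) => pc.setLocalDescription(offer))
  .then(() => {
    // pc.localDescription now holds the SDP that your own signaling
    // (WebSocket, SIP, XMPP, anything) must deliver to the remote peer
    console.log('offer ready:', pc.localDescription.type);
  });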
Who is behind WebRTC?
WebRTC originated from Google. It started with the acquisition of a few companies, whose technology was then repackaged and released as open source under the name of WebRTC.
Google is still the main vendor behind WebRTC. That’s because its own WebRTC engine is the main WebRTC open source project out there and it is also the one that gets integrated into the Chrome browser.
Mozilla, Microsoft and Apple all contribute to WebRTC and have their own implementations of WebRTC in their browsers (some of these implementations are derived from the Google code).
Other vendors and individuals contribute to the specification through the IETF and W3C, where the standardization process of WebRTC takes place.
My own contribution to WebRTC is this site, which publishes a lot of free information around WebRTC as well as the Kranky Geek event, WebRTC Index and WebRTC Glossary.
Is WebRTC ready for commercial use?
Yes.
WebRTC is used today by commercial services (here are 10 such examples).
Some complain and gripe that WebRTC isn't ready for commercial use. This stems from the many changes that the codebase and specification are undergoing. It also means that if you plan on using WebRTC, either do that through a third party managed service (a CPaaS vendor – list here) or make sure to have a team of savvy developers that can keep up with the pace.
The changes introduced to the WebRTC codebase itself oftentimes break backward compatibility and features, probably by sticking to a “move fast and break things” motto to some extent.
Why should I use WebRTC?
If you don't need real time voice and video then you might not need to use WebRTC at all.
If you do, then it is a matter of capability, resources and time to market:
- If you want your service to work inside a web browser, then WebRTC is your only way of getting real time voice and video into a browser
- If you want it elsewhere, then in almost all cases, using WebRTC will cost you less and get you there faster than the alternatives
What codecs does WebRTC use?
For voice, the mandatory codecs are G.711 and Opus. Out of these two, be sure to use Opus (G.711 is old and crappy).
For video, the mandatory codecs are VP8 and H.264. Apple’s Safari browser doesn’t support VP8. And on Android, Chrome won’t support H.264 on *some* devices (I’ll let you go figure out on which ones). More about that in this video mini-series.
VP9 is supported by Chrome and Firefox. AV1 seems to be the future.
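If you want to see which codecs your own browser is willing to negotiate, a quick-and-dirty way is to create an offer and scan its SDP. The offerToReceive* options used here are legacy, but were widely supported at the time of writing; treat this as a diagnostic sketch, not production code:
// Sketch: list which codecs show up in the SDP this browser offers.
const pc = new RTCPeerConnection();
pc.createOffer({ offerToReceiveAudio: true, offerToReceiveVideo: true })
  .then((offer) => {
    ['opus', 'VP8', 'VP9', 'H264'].forEach((codec) => {
      console.log(codec + ':', offer.sdp.includes(codec) ? 'offered' : 'not offered');
    });
    pc.close();
  });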
What browsers support WebRTC?
All of them. Almost. But not exactly. And there are differences.
- Chrome is where most developers focus. It isn’t 100% aligned with the specification yet (none of the browsers are)
- Firefox is the next that gets focus from developers. Close enough to Chrome in its implementation
- Edge doesn't support data channels. And many skip it when it comes to testing, due to its low market adoption
- Safari is what everyone wants (Apple you know), but it is still buggy and doesn’t have support for VP8. Most need Safari support for iOS but are fine with not supporting Safari on Mac. Read this webrtcHacks post for more
There’s a devices cheat sheet on my website.
And then there’s adapter.js which you should definitely use.
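Assuming adapter.js is loaded first (it shims the prefixed and legacy APIs so the standard names work everywhere), a naive capability check can look like this sketch:
// Naive WebRTC capability check – adapter.js normalizes the API names first.
function isWebRTCCapable() {
  return !!(window.RTCPeerConnection &&
            navigator.mediaDevices &&
            navigator.mediaDevices.getUserMedia);
}

if (!isWebRTCCapable()) {
  console.warn('No WebRTC support detected – show a fallback to the user');
}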
Can I use WebRTC on mobile devices?
Yes.
On Android, on official Chrome and Firefox browsers, WebRTC is available.
On iOS, Safari offers something usable if you are willing to invest the energy to get it working well.
On both Android and iOS you can take the WebRTC source code and integrate it inside your native application. Google even releases prebuilt packages for both Android and iOS.
If you want to use a Webview inside your app, then this is easy with Android, restrictive with iOS for now (you won’t be able to access the camera or the microphone there).
Do I need special servers to run WebRTC?
Yes.
You definitely need a signaling server. And a STUN/TURN server. You might need a media server.
WebRTC is said to be peer-to-peer. It is when it comes to the media as much as possible. But developers can make use of it in server centric environments. And there are some scenarios where it makes no technical sense to use peer-to-peer (for example if you want to broadcast something to a million people or conduct a video conference with 20 participants).
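Since WebRTC mandates no signaling protocol, even a toy relay will do for experimentation. Here's a sketch of one using Node.js and the ws package (my choice here, not something WebRTC requires) that simply forwards every offer, answer and ICE candidate to the other connected peers:
// Toy signaling relay: forwards every message to all other connected peers.
// Uses the 'ws' npm package; any message transport would work just as well.
const WebSocket = require('ws');
const wss = new WebSocket.Server({ port: 8080 });

wss.on('connection', (socket) => {
  socket.on('message', (msg) => {
    wss.clients.forEach((client) => {
      if (client !== socket && client.readyState === WebSocket.OPEN) {
        client.send(msg); // relay SDP offers/answers and ICE candidates
      }
    });
  });
});
A real deployment would add rooms, authentication and reconnection handling, but the relay role stays the same.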
There’s a free video mini series explaining WebRTC servers on this site.
Can WebRTC be used to create large conferences?
Yap.
Think of WebRTC as a basic building block that gives you superpowers. With it you have the ability to send and receive voice and video in real time virtually on every device and browser.
Now what you do with this superpower, how you interact with it, architect your solution around it – that’s up to you.
There are vendors offering video conferencing that uses WebRTC and gets to tens of participants. Webinars with hundreds of live viewers in the audience.
You can read more about scale and size of WebRTC.
Is WebRTC posing a security threat for me?
No.
And yes.
Depending on who you are and what your needs are.
I wrote a lot about WebRTC security in the past. It gets tiring.
WebRTC comes with security in mind. It encrypts everything. Can’t remove that encryption. And browsers get security updates faster than any other software you have.
The one sticking issue is probably the fact that it exposes the local IP address of your machine when it is used. VPNs that are implemented properly solve that as well. More about that over at webrtcHacks and VPN leaks.
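For the curious, this is roughly how a web page surfaces those local addresses – no special permissions needed, just ICE candidate gathering. A minimal sketch:
// Sketch: trigger ICE gathering and print the candidates, which include
// host candidates carrying the machine's local IP address.
const pc = new RTCPeerConnection();
pc.createDataChannel(''); // any channel/track kicks off ICE gathering
pc.onicecandidate = (e) => {
  if (e.candidate) {
    console.log(e.candidate.candidate); // host candidates expose local IPs
  }
};
pc.createOffer().then((offer) => pc.setLocalDescription(offer));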
What does WebRTC 1.0 mean?
WebRTC 1.0 is the first time that WebRTC will have an official specification.
Up until now, we had drafts and browser implementations that were an approximation of the drafts. Now we have an approximation of the WebRTC 1.0 specification and approximations of implementations to it in browsers.
Confused?
Don’t be. Assume WebRTC is good to go commercially (check that part of my FAQ) and just go read Jan-Ivar’s explanation @ Mozilla’s Advancing WebRTC blog.
Oh – and be sure to use adapter.js.
How much does WebRTC cost?
It doesn't. And it does.
WebRTC is freely available in browsers.
The source code is also freely available.
The servers you will need to use it – someone will need to pay for them. That payment can be to a managed service, or to cloud vendors and the developers who will develop, install and maintain them. Up to you to decide.
Oftentimes, developers assume everything should be free with WebRTC, whereas reality is different. And for some reason, most perceive development costs as free or sunk costs (they will call it investment) as opposed to paying a third party for doing the hard stuff for you.
A bit more on this here.
How can I learn more about WebRTC?
If you are into free, then try reading the specs, playing with the official samples, reading this blog and webrtcHacks.
There are a few courses on Coursera, Pluralsight and elsewhere. I never tried them, but I read their agendas. Take a look for yourself and decide what's for you.
There are books, but none of them is up to date with the specification.
Best place? Hands down? My paid course. Advanced WebRTC Architecture Course
Can I help you?
Maybe.
There’s my course. There’s testRTC where I am a co-founder (we do testing and monitoring of WebRTC apps).
I also consult. Around architecture, vendor selection, defining requirements, setting roadmaps, working on differentiation and doing pure marketing related work. What can I say?
I like the variety.
You can reach out to me here.
–
Got a question about WebRTC that needs to go into this FAQ? Add it below in the comments.
The post WebRTC FAQ: The 2018 Version appeared first on BlogGeek.me.
Social Messaging != Carrier Messaging (the stories of Whatsapp Business API & Apple Business Chat)
Social messaging is killing RCS in all the places that matter.
When looking at messaging in the context of communications and people, we can probably split the story into 3 distinct models:
- Consumer centric
- Business centric
- Businesses to consumers (and vice versa)
I’ll quickly sift through the first two and focus on the third.
Consumer Centric
Consumer centric is easy. That's where Apple iMessage, WhatsApp, Facebook Messenger, Telegram, WeChat and a bunch of others are competing. The approach there today is to deliver a rich messaging experience that includes text, images, video, voice and video calling, location, groups, … – the list goes on. And on. And on.
They have won the war against SMS. We still have SMS. Some mistakenly call it ubiquitous (on my phone it is used for spam and 2FA messages only). They won the war against RCS that never really started.
To give you a clue – Israel is a WhatsApp country. If you don’t have WhatsApp you don’t exist. It is true from the age of 8. I just purchased the first smartphone for my 8 year old boy. Not so he can play or call with the phone – just so he can send messages to his classmates and stay part of the social fabric of his class. It happened to my daughter when she reached that age. I am now a part of multiple WhatsApp groups: family, close friends, parents of my kids’ classes and after classes, work related, etc.
How easy would it be to move people in Israel from entrenched groups that hold history, images and videos? And to what end? How would RCS be any better in its experience?
Business Centric
Business centric is Slack. It used to be all about calling and the PBX. Slack changed the game. Everyone is talking about “team messaging” today. I used the term enterprise messaging years ago.
What Slack did was find a good balance between functionality and user experience that no other player has been able to copy properly so far, but everyone is after.
- Cisco wrote Spark from scratch, then rebranded it as WebEx Teams
- Then there’s Microsoft Teams, which gobbled up Skype for Business
- And there’s Facebook for Workplace
- Google had Google Talk, and Voice, and then Hangouts, and then Meet, and now Google Hangouts Chat
- Did I miss anyone? Maybe: Twist, Flock, Discord, Mattermost, ChatWork, Ryver, Zoho Cliq, RingCentral Glip, Semaphor, Troop Messenger, Redbooth, Flowdock, and the list goes on – each with its own shtick
WhatsApp is unlikely to penetrate businesses in a meaningful way. Facebook built Workplace instead of trying to introduce Facebook or Messenger directly.
Where's SMS in this orgy of messaging? Meaningful conversations happen in IP messaging services and not over SMS anymore. Some solutions, like VonageFlow, offer a seamless experience that encompasses both messaging as we know it today and SMS, though I'd argue that capability is a business to consumer one.
For all intents and purposes, SMS is non-existent when it comes to business centric messaging.
Business to Consumer
Back to RCS. RCS was supposed to be the future of SMS when we all move to IP based packet networks. Guess what? We're all on IP based packet networks, and RCS isn't really here yet in any meaningful way.
In the past couple of years, RCS got a new tune by its proponents. The strategy changed from getting consumers back from social networks towards being the one ubiquitous network – the one ring to rule them all. Here's the idea: you get RCS on all smartphones worldwide. Now carriers have the ubiquity they had with SMS. And businesses would pay for such access to customers' phones.
Not going to happen.
Why? Because Apple and Facebook have other plans for us.
Apple now has Apple Business Chat. It is built into the iPhone, making businesses discoverable and reachable over iMessage from the Safari browser, Spotlight search, Siri assistant and Apple Maps. I’ve written extensively about it when it was introduced on SearchUC: Apple Business Chat looks to polish customer messaging
WhatsApp came out with its own offering, called the WhatsApp Business API. Similarly to Apple Business Chat, it offers the ability for businesses to communicate with consumers. Apple does that by focusing on contact center vendors, while WhatsApp partners with CPaaS vendors. The goal? Get higher exposure while not working directly with longtail developers in the initial release.
What drove me to even start writing this article? This title of a TechCrunch post: Wish, Netflix, Uber and ~100 others testing WhatsApp’s new Business API
Businesses aren’t waiting for RCS. They are trying to figure out how to communicate with their customers via WhatsApp.
They had Line, WeChat, Facebook Messenger. And they’re still aiming for WhatsApp – a messaging service that isn’t even a US-thing.
Which brings me to the main thing – business to consumer is now a social messaging realm. Carriers have lost that domain as well.
1 Billion Defines the Moat
Remember ubiquity? Here's what it takes to be interesting:
1 Billion Monthly Active Users
Who has that number today?
Facebook (WhatsApp + Messenger), Apple Business Chat and WeChat. WhatsApp, being the biggest one, is redefining this market. You hear a lot about how customers still phone businesses and how chat isn't catching up with contact centers. That might be true, but only partially.
Today's chat solutions usually require being on the company's website. SMS hasn't proven itself at large scale for anything other than notifications to customers on orders and transactions. WhatsApp can change that – and to that extent, so can any of the other 1B+ MAU social messaging apps.
RCS? With what billion users exactly?
Next to the large social networks, 100 million monthly active users seems like a rounding error.
Focus is on Customer Care – Not Marketing
Another interesting aspect (and difference) is that social networks are keeping user identity and access close to their chest. While WhatsApp is using phone numbers for identity, piggybacking on carriers in a way, they are not allowing anyone access to a user without the user's permission. This means:
- Businesses can’t “spam” users by sending them unsolicited messages just because they know their phone number or user name
- A user must first approach the business. Inbound use cases are the focus here, which lends itself nicely to support and purchasing activities
- Outbound marketing campaigns, ads, promotions – these aren’t something that are encouraged at the moment
What these networks are trying to do is to get businesses and consumers off their SMS communications and shift them to their network. To do so, they plan on offering a superior experience. They are doing that not only by adding richness over the limited 160-character experience of SMS, but also by making sure this will be a useful service to their user base that won't be considered spammy.
Will there be other avenues opened to businesses on social networks to interact with users through marketing campaigns and outbound messaging? Sure. But it isn’t the first priority. The market needs to be created first.
Where Can We Go Next?
We are headed towards an omnichannel interaction model.
To me that means that a business will meet a customer wherever it is comfortable for the customer in the context of that specific interaction.
A customer may prefer a phone call in one interaction, but a chat over WhatsApp in another.
The challenge here is that different customers may prefer different social networks. Or aren’t even approachable on some of the social networks. This isn’t going to change any time soon either. The number of social networks is still growing, and while we have a few huge players, others are important to specific populations.
Businesses will need to rely on multiple such channels if they want to reach out to a larger target audience of potential customers.
Back to RCS
It is coming. In some carriers. On some devices. In some form.
Is it going to take back ownership of the interactions from social networks? No.
What it can be, is just another channel. Right next to the rest. It will only become important if it can make that 1 billion monthly active users mark.
Oh, and it will need to succumb to the rules of engagement laid out by social networks today, around business-to-user permissions.
The post Social Messaging != Carrier Messaging (the stories of Whatsapp Business API & Apple Business Chat) appeared first on BlogGeek.me.