hydrogen-web/src/matrix/calls/TODO.md

 - relevant MSCs next to spec:
  - https://github.com/matrix-org/matrix-doc/pull/2746 Improved Signalling for 1:1 VoIP
  - https://github.com/matrix-org/matrix-doc/pull/2747 Transferring VoIP Calls
  - https://github.com/matrix-org/matrix-doc/pull/3077 Support for multi-stream VoIP
  - https://github.com/matrix-org/matrix-doc/pull/3086 Asserted identity on VoIP calls
  - https://github.com/matrix-org/matrix-doc/pull/3291 Muting in VoIP calls
  - https://github.com/matrix-org/matrix-doc/pull/3401 Native Group VoIP Signalling


## TODO
 - DONE: implement receiving hangup
 - DONE: implement cloning the localMedia so it works in safari?
 - DONE: implement 3 retries per peer
 - DONE: implement muting tracks with m.call.sdp_stream_metadata_changed
 - DONE: implement renegotiation
 - DONE: finish session id support
    - call peers are essentially identified by (userid, deviceid, sessionid). If we see a new session id, we first disconnect from the current member so we're ready to connect with a clean slate again (in a member event, also in to_device? no harm I suppose, given olm encryption ensures you can't spoof the deviceid).
 - DONE: making logging better
 - figure out why sometimes leave button does not work
 - get correct members and avatars in call
 - improve UI while in a call
    - allow toggling audio
    - support active speaker, sort speakers by last active
    - close muted media stream after a while
    - support highlight mode where we show active speaker and thumbnails for other participants
    - better grid mode:
        - we report the call view size to the view model with ResizeObserver, we calculate the A/R
        - we calculate the grid based on view A/R, taking into account minimal stream size
    - show name on stream view
 - when you start a call, or join one, first you go to a SelectCallMedia screen where you can pick whether you want to use camera, audio or both:
    - if you are joining a call, we'll default to the call intent
    - if you are creating a call, we'll default to video
    - when creating a call, adjust the navigation path to room/room_id/call
    - when selecting a call, adjust the navigation path to room/room_id/call/call_id
 - implement to_device messages arriving before m.call(.member) state event
    - DONE for m.call.member, not for m.call and not for to_device other than m.call.invite arriving before invite
 - re-enable crypto & implement fetching olm keys before sending encrypted signalling message
 - local echo for join/leave buttons?
 - batch outgoing to_device messages in one request to homeserver for operations that will send out an event to all participants (e.g. mute)
 - implement call ringing and rejecting a ringing call
 - support screen sharing
    - add button to enable, disable
    - support showing stream view with large screen share video element and small camera video element (if present)
 - don't load all members when loading calls to know whether they are ringing and joined by ourself
    - only load our own member once, then have a way to load additional members on a call.
 - see if we can remove partyId entirely, it is only used for detecting remote echo which is not an issue for group calls? see https://github.com/matrix-org/matrix-spec-proposals/blob/dbkr/msc2746/proposals/2746-reliable-voip.md#add-party_id-to-all-voip-events
 - remove PeerCall.waitForState ?
 - invite glare is completely untested, does it work?
 - how to remove call from m.call.member when just closing client?
    - when closing client and still in call, tell service worker to send event on our behalf?
        ```js
            // dispose when leaving call
            this.track(platform.registerExitHandler(unloadActions => {
                // batch requests will resolve immediately,
                // so we can reuse the same send code that does awaits without awaiting?
                const batch = new RequestBatch();
                const hsApi = this.hsApi.withBatch(batch);
                // _leaveCallMemberContent will need to become sync,
                // so we'll need to keep track of own member event rather than rely on storage
                hsApi.sendStateEvent("m.call.member", this._leaveCallMemberContent());
                // does this internally: serviceWorkerHandler.trySend("sendRequestBatch", batch.toJSON());
                unloadActions.sendRequestBatch(batch);
            }));
        ```
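
A minimal sketch of the session id handling described in the "finish session id support" item above. All names here (`MemberTracker`, `connect`, `disconnect`, `onMemberSeen`) are hypothetical and only illustrate the idea; they are not the actual Hydrogen API.

```js
// Hypothetical sketch: tracks the last seen session id per (userId, deviceId)
// and disconnects the old connection before connecting to a new session.
class MemberTracker {
    constructor({connect, disconnect}) {
        this._connect = connect;       // assumed callback (userId, deviceId, sessionId)
        this._disconnect = disconnect; // assumed callback (userId, deviceId, sessionId)
        this._sessions = new Map();    // key: JSON of [userId, deviceId], value: sessionId
    }

    onMemberSeen(userId, deviceId, sessionId) {
        const key = JSON.stringify([userId, deviceId]);
        const knownSessionId = this._sessions.get(key);
        if (knownSessionId === sessionId) {
            return; // same session as before, nothing to do
        }
        if (knownSessionId !== undefined) {
            // new session for a known device: start from a clean slate
            this._disconnect(userId, deviceId, knownSessionId);
        }
        this._sessions.set(key, sessionId);
        this._connect(userId, deviceId, sessionId);
    }
}
```

The same entry point could be called from both the member state event and the to_device path, which is the open question in the item above.
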
## TODO (old)
 - DONE: PeerCall
    - send invite
    - implement terminate
    - implement waitForState
    
        - find out whether renegotiation (a subsequent onnegotiationneeded event) needs different handling depending on whether
          we sent the invite/offer or the answer. e.g. do we always do createOffer/setLocalDescription and then send it over a matrix negotiation event, even if we called createAnswer before?
    - handle receiving offer and send answer
    - handle sending ice candidates
        - handle ice candidates finished (iceGatheringState === 'complete')
    - handle receiving ice candidates
    - handle sending renegotiation
    - handle receiving renegotiation
    - reject call
    - hangup call
    - handle muting tracks
    - handle remote track being muted
    - handle adding/removing tracks to an ongoing call
    - handle sdp metadata
 - DONE: Participant
    - handle glare
    - encrypt to_device message with olm
    - batch outgoing to_device messages in one request to homeserver for operations that will send out an event to all participants (e.g. mute)
    - find out if we should start muted or not?

## Store ongoing calls

DONE: Add store with all ongoing calls so when we quit and start again, we don't have to go through all the past calls to know which ones might still be ongoing.


## Notes

we send m.call as state event in room

we add m.call.participant for our own device

we wait for other participants to add their user and device (in the sources)

for each (userid, deviceid)
    - if userId < ourUserId
        - get local media
        - we setup a peer connection
        - add local tracks
        - we wait for negotiation event to get sdp
        - peerConn.createOffer
        - peerConn.setLocalDescription
        - we send an m.call.invite 
    - else
        - wait for invite from other side
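
The rule above (the side for which `userId < ourUserId` holds sends the invite) could be sketched like this. `shouldSendInvite` is a hypothetical helper, and falling back to comparing device ids when both devices belong to the same user is an assumption that the notes don't cover.

```js
// Returns true when we are the side that should send m.call.invite,
// following the "if userId < ourUserId" rule from the notes.
function shouldSendInvite(ourUserId, ourDeviceId, theirUserId, theirDeviceId) {
    if (theirUserId !== ourUserId) {
        return theirUserId < ourUserId;
    }
    // assumption: same user on two devices, tie-break on device id
    return theirDeviceId < ourDeviceId;
}
```

As long as both sides evaluate the same comparison, exactly one of them ends up sending the invite and the other waits.
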

on local ice candidate:
    - if we haven't ... sent invite yet? or received answer? buffer candidate
    - otherwise send candidate (without buffering?)
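
One way the buffering of local ice candidates could look. This is only a sketch: the notes leave open whether the trigger is "invite sent" or "answer received", so `markReady` stands in for whichever condition is chosen, and `LocalCandidateBuffer` and `sendCandidates` are hypothetical names.

```js
// Buffers local ICE candidates until signalling is ready, then flushes them in one go.
class LocalCandidateBuffer {
    constructor(sendCandidates) {
        this._sendCandidates = sendCandidates; // assumed callback taking an array of candidates
        this._buffer = [];
        this._ready = false; // assumption: set once our invite (or answer) has been sent
    }

    onLocalCandidate(candidate) {
        if (this._ready) {
            this._sendCandidates([candidate]);
        } else {
            this._buffer.push(candidate);
        }
    }

    markReady() {
        this._ready = true;
        if (this._buffer.length !== 0) {
            const pending = this._buffer;
            this._buffer = [];
            this._sendCandidates(pending);
        }
    }
}
```
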

on incoming call:
    - ring, offer to answer

answering incoming call
    - get local media
    - peerConn.setRemoteDescription
    - add local tracks to peerConn
    - peerConn.createAnswer()
    - peerConn.setLocalDescription
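
The answering steps above, written out as a sketch. `answerIncomingCall` and `getLocalMedia` are hypothetical names; `peerConn` is assumed to behave like a browser `RTCPeerConnection`, and sending the answer over matrix is left out.

```js
// Follows the order from the notes: local media, remote description,
// local tracks, createAnswer, setLocalDescription.
async function answerIncomingCall(peerConn, offer, getLocalMedia) {
    const localStream = await getLocalMedia();
    await peerConn.setRemoteDescription(offer);
    for (const track of localStream.getTracks()) {
        peerConn.addTrack(track, localStream);
    }
    const answer = await peerConn.createAnswer();
    await peerConn.setLocalDescription(answer);
    return answer; // the caller would send this in the answer event
}
```
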

in some cases, we will actually send the invite to all devices (e.g. SFU), so
we probably still need to handle multiple answers?

so we would send an invite to multiple devices and pick the one for which we
received the answer first. between invite and answer, we could already receive
ice candidates that we need to buffer.


updating the metadata:

if we're renegotiating: use m.call.negotiate
if just muting: use m.call.sdp_stream_metadata_changed


party identification
 - for 1:1 calls, we identify with a party_id
 - for group calls, we identify with a device_id


## TODO

Build basic version of PeerCall
    - add candidates code
DONE: Build basic version of GroupCall
    - DONE: add state, block invalid actions
DONE: Make it possible to olm encrypt the messages
Do work needed for state events
    - DONEish: receiving (almost done?)
    - DONEish: sending
logging
DONE: Expose call objects
    expose volume events from audiotrack to group call
DONE: Write view model
DONE: write view
 - handle glare edge-cases (not yet sent): https://spec.matrix.org/latest/client-server-api/#glare

## Calls questions
 - how do we handle glare between group calls (e.g. different state events with different call ids?)
 - Split up DOM part into platform code? What abstractions to choose?
   Does it make sense to come up with our own API very similar to DOM api?
 - what code do we copy over vs what do we implement ourselves?
    - MatrixCall: perhaps we can copy it over and modify it to our needs? Seems to have a lot of edge cases implemented.
        - what is partyId about?
    - CallFeed: I need to better understand where it is used. It's basically a wrapper around a MediaStream with volume detection. Could it make sense to put this in platform for example?
 
 - which parts of MSC2746 are still relevant for group calls?
 - which parts of MSC2747 are still relevant for group calls? it seems mostly orthogonal?
 - SOLVED: how does switching channels work? This was only enabled by MSC 2746
    - you do getUserMedia()/getDisplayMedia() to get the stream(s)
    - you call removeTrack/addTrack on the peerConnection
    - you receive a negotiationneeded event
    - you call createOffer
    - you send m.call.negotiate
 - SOLVED: wrt MSC2746, is the screen share track and the audio track (and video track) part of the same stream? or do screen share tracks need to go in a different stream? it sounds incompatible with the MSC2746 requirement.
 - SOLVED: how does muting work? MediaStreamTrack.enabled
 - SOLVED: so, what's the difference between the call_id and the conf_id in group call events?
    - call_id is the specific 1:1 call, conf_id is the thing in the m.call state event key
    - so a group call has a conf_id with MxN peer calls, each having their call_id.
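
For the muting answer above (`MediaStreamTrack.enabled`), a minimal sketch. `setMuted` is a hypothetical helper; `stream` is assumed to expose `getAudioTracks()`/`getVideoTracks()` like a browser `MediaStream`.

```js
// Mutes or unmutes all tracks of one kind by toggling MediaStreamTrack.enabled.
// Returns the number of tracks that were touched.
function setMuted(stream, kind, muted) {
    const tracks = kind === "audio" ? stream.getAudioTracks() : stream.getVideoTracks();
    for (const track of tracks) {
        track.enabled = !muted;
    }
    return tracks.length;
}
```

Toggling `enabled` does not need a renegotiation, which is why the notes pair it with m.call.sdp_stream_metadata_changed rather than m.call.negotiate.
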

I think we need to synchronize the negotiation needed because we don't use a CallState to guard it...

## Thursday 3-3 notes

we probably best keep the perfect negotiation flags, as they are needed for both starting the call AND renegotiation? if only for the former, it would make sense as it is a step in setting up the call, but if the call is ongoing, does it make sense to have a MakingOffer state? it actually looks like they are only needed for renegotiation! for call setup we compare the call_ids. What does that mean for these flags?
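
For reference, the core check of the WebRTC "perfect negotiation" pattern, in its basic form, as a sketch. `decideOnRemoteDescription` and its return values are hypothetical names, and how the polite side is chosen is not covered by these notes.

```js
// Decides what to do with an incoming description during (re)negotiation.
// makingOffer: we are between createOffer and having sent our offer.
// signalingState: the RTCPeerConnection.signalingState at the time of receipt.
function decideOnRemoteDescription({type, makingOffer, signalingState, polite}) {
    const offerCollision = type === "offer" && (makingOffer || signalingState !== "stable");
    if (!offerCollision) {
        return "accept";
    }
    // impolite side ignores the colliding offer, polite side rolls back its own
    return polite ? "rollback-and-accept" : "ignore";
}
```
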


## Peer call state transitions

FROM CALLER                                         FROM CALLEE

Fledgling                                           Fledgling
 V `call()`                                          V `handleInvite()`: setRemoteDescription(event.offer), add buffered candidates
 V                                                  Ringing
 V                                                   V `answer()`
CreateOffer                                          V
 V add local tracks                                  V
 V wait for negotiationneeded events                  V add local tracks
 V setLocalDescription()                            CreateAnswer
 V send invite event                                 V setLocalDescription(createAnswer())
InviteSent                                           |
 V receive answer, setRemoteDescription()            |
 \___________________________________________________/
                             V
                            Connecting
                             V receive ice candidates and iceConnectionState becomes 'connected'
                            Connected
                             V `hangup()` or some terminate condition
                            Ended
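
The diagram above could be encoded as a transition table to block invalid actions. This is a sketch with hypothetical names (`transitions`, `canTransition`); allowing `Ended` from every other state is an assumption based on "or some terminate condition".

```js
// Allowed next states per state, read off the diagram above.
const transitions = {
    Fledgling: ["CreateOffer", "Ringing"],
    Ringing: ["CreateAnswer"],
    CreateOffer: ["InviteSent"],
    CreateAnswer: ["Connecting"],
    InviteSent: ["Connecting"],
    Connecting: ["Connected"],
    Connected: [],
    Ended: [],
};

function canTransition(from, to) {
    if (to === "Ended") {
        // assumption: a call can be terminated from any state except Ended itself
        return from !== "Ended";
    }
    return (transitions[from] || []).includes(to);
}
```
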

so if we don't want to bother with having two call objects, we can make the existing call hang up its old call_id? That way we keep the old peerConnection.


when glare, won't we drop both calls? No: https://github.com/matrix-org/matrix-spec-proposals/pull/2746#discussion_r819388754
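
A sketch of the glare rule as I read it in the spec section linked earlier (https://spec.matrix.org/latest/client-server-api/#glare): if our invite was not sent yet, we cancel ours and accept the incoming call; if it was sent, the call with the lexicographically smaller call id wins. `resolveGlare`, the shape of `ourCall` and the return values are hypothetical.

```js
// ourCall: undefined when we have no outgoing call, otherwise {callId, inviteSent}.
function resolveGlare(ourCall, incomingCallId) {
    if (!ourCall) {
        return "ring"; // no glare, handle as a normal incoming call
    }
    if (!ourCall.inviteSent) {
        return "accept-incoming"; // cancel our outgoing call, accept theirs
    }
    // both invites are out: keep the call with the smaller call id
    return incomingCallId < ourCall.callId ? "accept-incoming" : "keep-outgoing";
}
```

Because both sides run the same comparison, only one of the two calls is dropped.
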