Discussion:
Client-side error recovery with FedOne
Torben Weis
2010-01-19 17:50:37 UTC
Hi,

I have been pondering how to support offline work in QWaveClient.
It gets tricky when the TCP connection breaks accidentally.

Let's imagine the client sent a delta to the server and the server processed it, but the client never heard the response because the connection broke.

Eventually the client will reconnect. Now it has no way of seeing whether its last delta reached the server or not.
In general it seems next to impossible to tell whether any given server delta corresponds to a client delta, because deltas are transformed and do not carry IDs.

Currently I see no quick solution.
Two more complex solutions come to mind:
a) Introduce IDs in deltas. Unlikely, because that would require Google to change its code heavily? Nevertheless, it would be nice, because QWaveClient currently uses a most awful hack to determine whether its delta has been processed by the server ...
b) When connecting to FedOne, the server should ask for a client ID. For each client ID & wavelet ID it keeps a persistent record of the version of the last submitted delta. Upon connect, the client can query FedOne for this version information. This would mean no modifications to the federation protocol, but it would require some FedOne extensions.
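Option b) could look roughly like the following minimal sketch. It is not FedOne code: `LastSubmittedVersionStore`, `recordAccepted`, `lastAcceptedBaseVersion` and `inFlightWasAccepted` are hypothetical names, and the sketch assumes the server stores the base version of the last delta it accepted from each client (a real server would persist this; a HashMap stands in here). Because a client has at most one outstanding delta, the base versions of its successive deltas strictly increase, which is what lets a single number answer the question after a reconnect.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch of option b): per (client ID, wavelet ID) the server remembers
 * the base version of the last delta it accepted from that client.
 * A real server would persist this record; an in-memory HashMap stands in for that here.
 */
final class LastSubmittedVersionStore {
    private final Map<List<String>, Long> lastAcceptedBase = new HashMap<>();

    /** Server side: called after applying a delta the client submitted at baseVersion. */
    void recordAccepted(String clientId, String waveletId, long baseVersion) {
        lastAcceptedBase.put(List.of(clientId, waveletId), baseVersion);
    }

    /** Queried by the client on reconnect; -1 means no delta from this client was accepted. */
    long lastAcceptedBaseVersion(String clientId, String waveletId) {
        return lastAcceptedBase.getOrDefault(List.of(clientId, waveletId), -1L);
    }

    /**
     * Client side: with at most one outstanding delta, a client's base versions strictly
     * increase, so its in-flight delta was accepted exactly when the stored value reached it.
     */
    static boolean inFlightWasAccepted(long storedBase, long inFlightBase) {
        return storedBase >= inFlightBase;
    }

    public static void main(String[] args) {
        LastSubmittedVersionStore store = new LastSubmittedVersionStore();
        // The server applied a delta that client "c1" submitted at base version 7,
        // but the acknowledgement was lost when the TCP connection broke.
        store.recordAccepted("c1", "wavelet-1", 7);
        long stored = store.lastAcceptedBaseVersion("c1", "wavelet-1");
        System.out.println(inFlightWasAccepted(stored, 7)); // true: do not resend
        System.out.println(inFlightWasAccepted(stored, 9)); // false: resend
    }
}
```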

Any other ideas, suggestions?

About the delta detection hack used in QWaveClient:
According to the wave specs, a client may have at most one outstanding delta. Thus it is important to find out whether a delta has been accepted or not, to determine when the next delta can be sent.
To solve this, QWaveClient waits until it receives a delta from the server which is authored by its own user, assuming that this is the server's response to the delta QWaveClient has submitted itself.
This hack can break if the same user connects with two QWaveClient instances and types concurrently in both instances (ok, unlikely, but still ...).
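A minimal sketch of the author-based heuristic and of the two-instance failure. `Delta` and `findAssumedAck` are hypothetical names for illustration; QWaveClient's actual code is not shown in this thread.

```java
import java.util.List;

final class AuthorAckHeuristic {
    /** Hypothetical stand-in for a server delta; only the author matters here. */
    record Delta(String author, String summary) {}

    /** The heuristic: the first incoming delta by our own user is assumed to be our ack. */
    static int findAssumedAck(List<Delta> incoming, String ownUser) {
        for (int i = 0; i < incoming.size(); i++) {
            if (incoming.get(i).author().equals(ownUser)) {
                return i;
            }
        }
        return -1; // nothing by our user yet: keep waiting
    }

    public static void main(String[] args) {
        // Instance A of user@example.com has a delta in flight. Instance B of the same
        // user submitted concurrently, and B's delta was ordered first by the server.
        List<Delta> incoming = List.of(
                new Delta("user@example.com", "typed in instance B"),
                new Delta("user@example.com", "typed in instance A"));
        // Prints 0: instance A takes B's delta as its own acknowledgement.
        System.out.println(findAssumedAck(incoming, "user@example.com"));
    }
}
```

Instance A would then send its next delta while its real one is still outstanding, which breaks the one-outstanding-delta rule.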

Did I miss a trick of getting this right?

Cheers
Torben
James Purser
2010-01-19 20:37:33 UTC
How about this: when the client reconnects to the server, it does a history check against the known-good deltas it has sent out. If the last delta it sent out isn't in the history, then it hasn't been received.
--
James Purser
http://wavingtheshiny.collaborynth.com.au
Wave Addresses:
jamesrpurser-***@public.gmane.org (wave.google.com)
purserj-xBucRuPHYkk9V/L/***@public.gmane.org (wavesandbox.com)
james-ZGY8ohtN/8plGI6Z+***@public.gmane.org (collaborynth.com.au FedOne Server)
Skype: purserj1977
GTalk: jamesrpurser-***@public.gmane.org
Torben Weis
2010-01-19 20:42:20 UTC
Hi James,

Post by James Purser
How about when the client connects to the server again it does a history check against the known good deltas it has sent out. If the last delta it sent out isn't in the history, then it hasn't been received.
The problem is that this is impossible. How should QWaveClient recognize a delta as being its own? The server has perhaps transformed the delta, i.e. simple delta comparison is not possible, and looking at version numbers does not help either.
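The objection can be made concrete with a simplified, insert-only model. `Insert`, `transform` and `naiveContains` are hypothetical stand-ins, not the FedOne operation classes, and the tie-breaking rule is an assumption. The server applied our delta, but only after shifting it past a concurrent insert, so looking for the exact delta we sent finds nothing.

```java
import java.util.List;

final class NaiveHistoryCheck {
    /** Simplified stand-in for a document operation: insert text at a position. */
    record Insert(int pos, String text) {}

    /** Rebases op over a concurrent op the server applied first; that op wins position ties. */
    static Insert transform(Insert op, Insert appliedFirst) {
        if (appliedFirst.pos() <= op.pos()) {
            return new Insert(op.pos() + appliedFirst.text().length(), op.text());
        }
        return op;
    }

    /** The naive check: look for the exact delta we sent in the server history. */
    static boolean naiveContains(List<Insert> history, Insert sent) {
        return history.contains(sent);
    }

    public static void main(String[] args) {
        Insert other = new Insert(0, "Hi ");  // another client's delta, applied first
        Insert sent = new Insert(4, "there"); // our delta, against the same base version
        List<Insert> history = List.of(other, transform(sent, other));
        // Our delta was applied, but stored at position 7, so this prints false.
        System.out.println(naiveContains(history, sent));
    }
}
```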

Greetings
Torben
--
You received this message because you are subscribed to the Google Groups
"Wave Protocol" group.
To unsubscribe from this group, send email to
.
For more options, visit this group at
http://groups.google.com/group/wave-protocol?hl=en.
--
---------------------------
Prof. Torben Weis
Universitaet Duisburg-Essen
torben.weis-***@public.gmane.org
James Purser
2010-01-19 20:52:16 UTC
Hrmm.

Let me ponder.

Unique IDs based on wave IDs could work. However, who generates the ID? If you leave it up to the client, there is the possibility of ID collisions (unless you use, say, a hash of something like the date/time the client joined the wave plus a salt).

It's doable. I'll think on it some more.
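One way to realize the hash-based idea, sketched under assumptions: SHA-256 as the hash; the wave ID, join time and a random salt as inputs; and an added per-client sequence number (not part of the suggestion above) so that successive deltas from the same client get distinct IDs. `ClientDeltaId.deltaId` is a hypothetical helper.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.security.SecureRandom;

final class ClientDeltaId {
    /** Hypothetical helper: hex SHA-256 over wave ID, join time, salt and a sequence number. */
    static String deltaId(String waveId, long joinedAtMillis, byte[] salt, long sequence) {
        try {
            MessageDigest sha = MessageDigest.getInstance("SHA-256");
            sha.update(waveId.getBytes(StandardCharsets.UTF_8));
            sha.update((byte) 0); // field separator so adjacent fields cannot run together
            sha.update(Long.toString(joinedAtMillis).getBytes(StandardCharsets.UTF_8));
            sha.update((byte) 0);
            sha.update(salt);
            sha.update((byte) 0);
            // Assumed addition: a per-client counter keeps successive deltas distinct.
            sha.update(Long.toString(sequence).getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : sha.digest()) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 must be available on every Java platform", e);
        }
    }

    public static void main(String[] args) {
        byte[] salt = new byte[16];
        new SecureRandom().nextBytes(salt); // random salt chosen once per client session
        long joined = System.currentTimeMillis();
        System.out.println(deltaId("example.com!w+abc123", joined, salt, 0));
        System.out.println(deltaId("example.com!w+abc123", joined, salt, 1));
    }
}
```

The salt is what keeps two clients that joined in the same millisecond apart; uniqueness is still only probabilistic, not guaranteed.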
Torben Weis
2010-01-19 22:43:46 UTC
Hi James,
Post by James Purser
Unique Ids based on wave ids could work. However who generates the id?
If you leave it up to the client there is the possibility of id
collision (unless you do say a hash of something like date/time joined
the wave plus salt).
The ID problem exists in FedOne anyway. When I create a new wave or blip, I must provide a random ID each time. There is always the risk of collisions unless we change the C/S protocol so that the server provides IDs.
However, this would create some trouble while working offline, because in that case there is no server available to create unique IDs.
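The collision risk with random IDs can be estimated with the usual birthday-bound approximation, p ≈ 1 - exp(-n(n-1) / (2N)), where N = 2^bits is the number of possible IDs and n the number generated. The ID sizes in this sketch are assumptions, not FedOne's actual ID format.

```java
final class IdCollisionEstimate {
    /** Birthday-bound estimate of the chance that any two of n random IDs collide. */
    static double collisionProbability(double idsGenerated, int randomBits) {
        double space = Math.pow(2, randomBits);
        double exponent = idsGenerated * (idsGenerated - 1) / (2 * space);
        // -expm1(-x) equals 1 - exp(-x) but stays accurate when x is tiny.
        return -Math.expm1(-exponent);
    }

    public static void main(String[] args) {
        for (int bits : new int[] {32, 64, 128}) {
            System.out.printf("%3d random bits, 1e6 IDs: p ~ %.3g%n",
                    bits, collisionProbability(1e6, bits));
        }
    }
}
```

With enough random bits the risk becomes negligible in practice, which is why client-chosen random IDs remain usable offline.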

Greetings
Torben
Brett Morgan
2010-01-19 22:10:37 UTC
If you are transforming your docops, you can compare the docops coming back down for equality. You are doing client-side transformations, right?

org.waveprotocol.wave.model.operation.OpComparators is the FedOne code for comparing ops for equality. After a whole bunch of edge-case checking, it boils down to the following comparison:

DocOpUtil.toConciseString(a).equals(DocOpUtil.toConciseString(b))

In short, comparing docops for equality is easy, as long as you keep transforming your docops...
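The comparison approach might look like the sketch below, in a simplified insert-only model: the client keeps rebasing its in-flight op over each concurrent server op and treats the first incoming op that compares equal as its acknowledgement. `Insert`, `conciseString()` and `findAck` are hypothetical stand-ins for a DocOp, `DocOpUtil.toConciseString` and the client's bookkeeping. The tie-break (the op the server ordered first keeps its position) is an assumption, and it must match the server's rule for the comparison to work.

```java
import java.util.List;

final class InFlightMatcher {
    /** Simplified stand-in for a DocOp; conciseString() loosely mimics the concise notation. */
    record Insert(int pos, String text) {
        String conciseString() {
            return "__" + pos + "; ++\"" + text + "\";";
        }
    }

    /** Same tie-break as the server is assumed to use: the op ordered first keeps its position. */
    static Insert transform(Insert op, Insert appliedFirst) {
        return appliedFirst.pos() <= op.pos()
                ? new Insert(op.pos() + appliedFirst.text().length(), op.text())
                : op;
    }

    /** Returns the index of the incoming op taken as our ack, or -1 if none matched. */
    static int findAck(Insert inFlight, List<Insert> incoming) {
        Insert current = inFlight;
        for (int i = 0; i < incoming.size(); i++) {
            Insert serverOp = incoming.get(i);
            if (serverOp.conciseString().equals(current.conciseString())) {
                return i;
            }
            current = transform(current, serverOp); // concurrent op: rebase ours onto it
        }
        return -1;
    }

    public static void main(String[] args) {
        Insert ours = new Insert(4, "there");
        // The server applied someone else's insert first, then ours (shifted to position 7).
        List<Insert> incoming = List.of(new Insert(0, "Hi "), new Insert(7, "there"));
        System.out.println(findAck(ours, incoming)); // prints 1
    }
}
```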
--
Brett Morgan http://domesticmouse.livejournal.com/
Torben Weis
2010-01-19 22:40:25 UTC
Hi Brett,

thanks for the suggestion.
However, it seems to me that this approach is not completely correct.

Imagine two clients that each send a delta against the same server version.
The delta says to insert "Hello" at some position in a blip.
The correct outcome is "HelloHello" being inserted.
Now one client fails to submit its delta, while the other one succeeds.
The client that failed cannot detect that it failed, because the other delta looks exactly the same. Thus, in the end "Hello" will be inserted only once.

I agree that this is an academic corner case, but I see no solution for it when relying on delta comparisons.
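The corner case can be replayed against a content-equality matcher in a simplified insert-only model (`Insert`, `findAck` and `apply` are hypothetical names, and the tie-break rule is an assumption). Client1's submission is lost, Client2's identical delta is accepted, and Client1 mistakes Client2's delta for its own acknowledgement.

```java
import java.util.List;

final class FalseAckDemo {
    /** Simplified stand-in for a document operation: insert text at a position. */
    record Insert(int pos, String text) {}

    /** Rebases op over an op ordered first; that op wins position ties. */
    static Insert transform(Insert op, Insert appliedFirst) {
        return appliedFirst.pos() <= op.pos()
                ? new Insert(op.pos() + appliedFirst.text().length(), op.text())
                : op;
    }

    /** Content-equality ack detection: first incoming op equal to our rebased in-flight op. */
    static int findAck(Insert inFlight, List<Insert> incoming) {
        Insert current = inFlight;
        for (int i = 0; i < incoming.size(); i++) {
            if (incoming.get(i).equals(current)) {
                return i;
            }
            current = transform(current, incoming.get(i));
        }
        return -1;
    }

    static String apply(String doc, Insert op) {
        return doc.substring(0, op.pos()) + op.text() + doc.substring(op.pos());
    }

    public static void main(String[] args) {
        String base = "ab";
        Insert client1 = new Insert(1, "Hello"); // submission lost on the wire
        Insert client2 = new Insert(1, "Hello"); // submission accepted by the server
        int ack = findAck(client1, List.of(client2));
        // ack == 0: Client1 takes Client2's delta as its own and never resends, so the
        // server document is "aHellob" instead of the intended "aHelloHellob".
        System.out.println(ack + " " + apply(base, client2));
    }
}
```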

Greetings
Torben
Brett Morgan
2010-01-20 01:54:25 UTC
Actually, no, OT deals with this case. My almost working code that I
presented at LCA2010 deals with this edge case. Unfortunately it has bugs,
and dies in the arse randomly. Sigh.

Wave that I presented from:
https://wave.google.com/wave/#restored:wave:googlewave.com!w%252BTBvx4ehoA
The code:
http://code.google.com/p/wave-ot-editor/source/browse/#svn/wave-ot-editor

I can put together a JUnit test case showing that this case actually
stabilises using the Wave OT code, if that would help...
Torben Weis
2010-01-20 18:15:38 UTC
Hi Brett,

thanks for the pointer to your project. I did not know about it before.

However, I would like to see a proof (a short explanation is sufficient) of how you intend to solve the problem I have mentioned.
Running code is no proof :-)

For some reason I strongly doubt that your code (or any possible code) can handle this without changes to the C/S protocol.

Your application seems to be different anyway. If I am not mistaken (I just read the wave you mentioned), you are running a web client which connects to your web server, which in turn connects to FedOne. Right?

The problem I mentioned is between your web server and FedOne. In my case it is between QWaveClient and FedOne.
Your web app can of course recover as long as FedOne and your web server are stable. But what happens if your web server crashes at an unfortunate moment? Your code will suffer from the very problem I described.

However, I would like to be proven wrong here, since this would give me a solution to my initial problem :-)

Greetings
Torben
Brett Morgan
2010-01-20 19:56:58 UTC
Heya Torben,

I have attached a Java class that I believe implements Daniel's scenario.
First off, note that I'm not implementing the wave federation algorithm, as federation isn't my goal. My goal is to build web apps that use wave's OT.
That said, here is the output of the aforementioned Java class, showing that the server and the two clients converge.

State of system after step 1
server editHistory[0]:
Client 'Client1' at server revision 1 with document
Client 'Client2' at server revision 1 with document

State of system after step 2
server editHistory[0]:
Client 'Client1' at server revision 1 with document ++"Hello"; with edit in flight ++"Hello";
Client 'Client2' at server revision 1 with document

State of system after step 3
server editHistory[0]:
Client 'Client1' at server revision 1 with document ++"Hello"; with edit in flight ++"Hello";
Client 'Client2' at server revision 1 with document ++"Hello"; with edit in flight ++"Hello";

State of system after step 4
server editHistory[0]:
Client 'Client1' at server revision 1 with document ++"Hello"; with edit in flight ++"Hello"; with cached edit __5; ++"Going to hell";
Client 'Client2' at server revision 1 with document ++"Hello"; with edit in flight ++"Hello";

State of system after step 5
server editHistory[0]:
server editHistory[1]: ++"Hello";
server editHistory[2]: ++"Hello"; __5;
Client 'Client1' at server revision 2 with document ++"Hello"; with cached edit __5; ++"Going to hell";
Client 'Client2' at server revision 2 with document ++"Hello";

State of system after step 7
server editHistory[0]:
server editHistory[1]: ++"Hello";
server editHistory[2]: ++"Hello"; __5;
server editHistory[3]: __10; ++"Going to hell";
Client 'Client1' at server revision 4 with document ++"HelloHelloGoing to hell";
Client 'Client2' at server revision 4 with document ++"HelloHelloGoing to hell";
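The convergence shown in the output above can be reproduced with a small central-server OT simulation. This is a sketch under assumptions, not the attached class (which is not part of this text) and not the FedOne API: documents are plain strings, ops are single inserts, the server orders submissions and rebases each one over ops its sender had not seen, and each client keeps its in-flight op plus buffered edits and transforms them against incoming ops. Acks are recognized by a per-connection client name. All names are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Hypothetical central-server OT simulation for an insert-only string document. */
final class TwoClientConvergence {
    record Insert(int pos, String text) {
        String applyTo(String doc) {
            return doc.substring(0, pos) + text + doc.substring(pos);
        }
    }

    record Entry(String author, Insert op) {}

    record Submission(String author, int baseVersion, Insert op) {}

    /** Moves op right past other if other lands first; otherWinsTies decides equal positions. */
    static Insert shift(Insert op, Insert other, boolean otherWinsTies) {
        boolean otherFirst = other.pos() < op.pos() || (other.pos() == op.pos() && otherWinsTies);
        return otherFirst ? new Insert(op.pos() + other.text().length(), op.text()) : op;
    }

    static final class Server {
        final List<Entry> history = new ArrayList<>();
        final Deque<Submission> inbox = new ArrayDeque<>();
        String doc = "";

        /** Applies the oldest submission after rebasing it over ops its sender had not seen. */
        void processNext() {
            Submission s = inbox.removeFirst();
            Insert op = s.op();
            for (int v = s.baseVersion(); v < history.size(); v++) {
                op = shift(op, history.get(v).op(), true); // already-ordered ops win ties
            }
            history.add(new Entry(s.author(), op));
            doc = op.applyTo(doc);
        }
    }

    static final class Client {
        final String name;
        final Server server;
        String doc = "";
        int version = 0;                                // server ops received so far
        final List<Insert> pending = new ArrayList<>(); // [0] is in flight, the rest are buffered

        Client(String name, Server server) {
            this.name = name;
            this.server = server;
        }

        void edit(Insert op) {
            doc = op.applyTo(doc);
            pending.add(op);
            if (pending.size() == 1) {                  // nothing was in flight: send now
                server.inbox.addLast(new Submission(name, version, op));
            }
        }

        void receiveNext() {
            Entry e = server.history.get(version++);
            if (e.author().equals(name)) {              // our in-flight op was applied
                pending.remove(0);
                if (!pending.isEmpty()) {
                    server.inbox.addLast(new Submission(name, version, pending.get(0)));
                }
                return;
            }
            Insert incoming = e.op();
            for (int i = 0; i < pending.size(); i++) {  // transform both ways, pairwise
                Insert local = pending.get(i);
                pending.set(i, shift(local, incoming, true));
                incoming = shift(incoming, local, false);
            }
            doc = incoming.applyTo(doc);
        }
    }

    public static void main(String[] args) {
        Server server = new Server();
        Client c1 = new Client("Client1", server);
        Client c2 = new Client("Client2", server);
        c1.edit(new Insert(0, "Hello"));                // in flight
        c2.edit(new Insert(0, "Hello"));                // concurrently in flight
        c1.edit(new Insert(5, "Going to hell"));        // buffered behind c1's in-flight op
        server.processNext();
        server.processNext();
        c1.receiveNext();                               // ack: c1 now sends its buffered edit
        c1.receiveNext();
        c2.receiveNext();
        c2.receiveNext();
        server.processNext();
        c1.receiveNext();
        c2.receiveNext();
        System.out.println(server.doc + " | " + c1.doc + " | " + c2.doc);
    }
}
```

All three documents end up as "HelloHelloGoing to hell", like the last step of the output above. The intermediate server history differs, because this sketch resolves same-position ties in favour of the op the server ordered first.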
Post by Torben Weis
Hi Brett,
thanks for the hint to your project. I did not know it before.
It had its coming-out party at LCA. And I think I'm going to rip it down and start again, this time using long-poll-based notification. I couldn't do long polls while I was targeting App Engine as my deployment environment.
Post by Torben Weis
However, I would like to see a proof (i.e. a short explanation is
sufficient) how you intend to solve the problem I have mentioned.
Running code is no proof :-)
If running code doesn't merit existence-proof status, then I'm fucked. =)
Post by Torben Weis
For some reasons I strongly doubt that your code (or any possible code) can
handle this without changes to the C/S protocol.
The client/server protocol in the FedOne code base, unless I miss my guess, isn't doing OT.
Post by Torben Weis
Your application seems to be different anyway. If I am not mistaken (I just
read the wave you mentioned) you are running
a web client which connects to your web server which connects to FedOne.
Right?
Heh, no. I'm not using FedOne, just the OT component of FedOne. I'm building out the capability to have GWT web clients running OT sync with a web server. It works, but I lack the theoretical grounding to prove it.
Post by Torben Weis
The problem I mentioned is between your web server and FedOne. In my case
it is between QWaveClient and FedOne.
Your web app can of course recover as long as FedOne and your web server
are stable. But what happens if your
WebServer crashes in an unfortunate moment? Your code will suffer from the
very problem I described.
If the web server goes down with unsync'd state, everything goes shiny. At
this point I force the clients to drop all state and reload.
Post by Torben Weis
However, would like to be proven wrong here since this would give me a
solution to my initial problem :-)
Sorta, kinda, maybe.
Torben Weis
2010-01-20 20:41:01 UTC
Hi Brett,

I think we had a little misunderstanding with regard to my initial problem.

In your example, what would happen if Client1 lost its connection to the server after step 3?
Upon reconnect it does not know whether its in-flight delta has been accepted or not, and it has, IMHO, no chance of finding out. So should it send the delta again or not?

As long as there is no such crash (as in your example), it will of course converge properly.
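The reconnect dilemma can be reduced to a tiny table. In this hypothetical sketch Client1 reconnects without knowing whether its in-flight "Hello" was applied; whichever fixed policy it picks (always resend, never resend) produces the wrong document in one of the two cases, which is why some extra information from the server is needed.

```java
/** Hypothetical reduction of the reconnect question to string concatenation. */
final class ReconnectDilemma {
    static String outcome(boolean serverAppliedInFlight, boolean clientResends) {
        String doc = "Hello";                      // Client2's delta, applied
        if (serverAppliedInFlight) doc += "Hello"; // Client1's delta, applied before the drop
        if (clientResends) doc += "Hello";         // Client1 resends after reconnecting
        return doc;
    }

    public static void main(String[] args) {
        for (boolean applied : new boolean[] {true, false}) {
            for (boolean resend : new boolean[] {true, false}) {
                String doc = outcome(applied, resend);
                String verdict = doc.equals("HelloHello") ? "correct" : "wrong";
                System.out.printf("applied=%b resend=%b -> %s (%s)%n", applied, resend, doc, verdict);
            }
        }
    }
}
```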

Greetings
Torben
Post by Brett Morgan
Heya Torben,
I have attached a java class that I believe implements Daniel's scenario.
First off, note that I'm not implementing the wave federation algorithm, as
federation isn't my goal. My goal is to build web apps that use wave's OT.
That said, here is the output of the aforementioned java class showing that
the server and the two clients converge.
State of system after step 1
Client 'Client1' at server revision 1 with document
Client 'Client2' at server revision 1 with document
State of system after step 2
Client 'Client1' at server revision 1 with document ++"Hello"; with edit
in flight ++"Hello";
Client 'Client2' at server revision 1 with document
State of system after step 3
Client 'Client1' at server revision 1 with document ++"Hello"; with edit
in flight ++"Hello";
Client 'Client2' at server revision 1 with document ++"Hello"; with edit
in flight ++"Hello";
State of system after step 4
Client 'Client1' at server revision 1 with document ++"Hello"; with edit
in flight ++"Hello"; with cached edit __5; ++"Going to hell";
Client 'Client2' at server revision 1 with document ++"Hello"; with edit
in flight ++"Hello";
State of system after step 5
server editHistory[1]: ++"Hello";
server editHistory[2]: ++"Hello"; __5;
Client 'Client1' at server revision 2 with document ++"Hello"; with cached
edit __5; ++"Going to hell";
Client 'Client2' at server revision 2 with document ++"Hello";
State of system after step 7
server editHistory[1]: ++"Hello";
server editHistory[2]: ++"Hello"; __5;
server editHistory[3]: __10; ++"Going to hell";
Client 'Client1' at server revision 4 with document ++"HelloHelloGoing to
hell";
Client 'Client2' at server revision 4 with document ++"HelloHelloGoing to
hell";
Post by Torben Weis
Hi Brett,
thanks for the hint to your project. I did not know it before.
It had it's coming out party at LCA. And I think I'm going to rip it down
and start again, this time using long poll based notification. I couldn't do
long polls while I was targeting AppEngine as my deployment environment.
Post by Torben Weis
However, I would like to see a proof (i.e. a short explanation is
sufficient) how you intend to solve the problem I have mentioned.
Running code is no proof :-)
If running code doesn't merit existence proof status, then i'm fucked. =)
Post by Torben Weis
For some reasons I strongly doubt that your code (or any possible code)
can handle this without changes to the C/S protocol.
The client/server protocol in the FedOne code base, unless i miss my guess,
isn't doing OT.
Post by Torben Weis
Your application seems to be different anyway. If I am not mistaken (I
just read the wave you mentioned) you are running
a web client which connects to your web server which connects to FedOne.
Right?
Heh, no. I'm not using FedOne, just the OT component of FedOne. I'm
building out the capacity to be able to have gwt web clients running OT sync
with a webserver. It works, but I lack the theoretical grounding to prove
it.
Post by Torben Weis
The problem I mentioned is between your web server and FedOne. In my case
it is between QWaveClient and FedOne.
Your web app can of course recover as long as FedOne and your web server
are stable. But what happens if your
WebServer crashes in an unfortunate moment? Your code will suffer from the
very problem I described.
If the web server goes down with unsync'd state, everything goes shiny. At
this point I force the clients to drop all state and reload.
Post by Torben Weis
However, I would like to be proven wrong here, since this would give me a
solution to my initial problem :-)
Sorta, kinda, maybe.
Post by Torben Weis
Greetings
Torben
Post by Brett Morgan
Actually, no, OT deals with this case. My almost working code that I
presented at LCA2010 deals with this edge case. Unfortunately it has bugs,
and dies in the arse randomly. Sigh.
https://wave.google.com/wave/#restored:wave:googlewave.com!w%252BTBvx4ehoA
http://code.google.com/p/wave-ot-editor/source/browse/#svn/wave-ot-editor
I can put together a JUnit test case showing that this case actually
stabilises using the Wave OT code, if that would help...
Post by Torben Weis
Hi Brett,
thanks for the suggestion.
However, it seems to me that this approach is not completely correct.
Imagine two clients which are sending a delta against the same server
version.
The delta says to insert "Hello" at some position in a blip.
The correct outcome is "HelloHello" being inserted.
Now one client fails to submit its delta, the other one succeeds.
The client that failed cannot detect that it failed, because the other
delta looks exactly the same. Thus, in the end "Hello" will be inserted
only once.
I agree that this is an academic corner case, but I see no solution for this
when relying on delta comparisons.
Greetings
Torben
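Torben's corner case can be made concrete in a few lines. The sketch below assumes a hypothetical Delta record that carries a submitter tag (clientId, seq); the FedOne deltas discussed in this thread carry no such tag, which is exactly the point. A content-only check reports a false match for Client1, while an ID-based check does not:

```java
import java.util.List;

public class DeltaMatching {
    // Hypothetical delta: the op in concise-string form plus a submitter tag
    // (clientId, seq). The FedOne deltas discussed in this thread carry no such tag.
    record Delta(String op, String clientId, long seq) {}

    // Content-only check: "does a delta that looks like mine appear in the history?"
    static boolean seenByContent(List<Delta> history, Delta mine) {
        return history.stream().anyMatch(d -> d.op().equals(mine.op()));
    }

    // ID-based check: "does my own submission appear in the history?"
    static boolean seenById(List<Delta> history, Delta mine) {
        return history.stream()
            .anyMatch(d -> d.clientId().equals(mine.clientId()) && d.seq() == mine.seq());
    }

    public static void main(String[] args) {
        Delta fromClient1 = new Delta("++\"Hello\";", "client1", 1);
        Delta fromClient2 = new Delta("++\"Hello\";", "client2", 1);
        // Both clients inserted "Hello" against the same version; only Client2's arrived.
        List<Delta> history = List.of(fromClient2);
        // Content comparison wrongly tells Client1 its delta was accepted...
        System.out.println("by content: " + seenByContent(history, fromClient1));
        // ...while an ID check correctly tells it to resend.
        System.out.println("by id:      " + seenById(history, fromClient1));
    }
}
```

This is essentially option (a) from the original post: once a delta carries an identity, "did my delta land?" stops depending on what the (possibly transformed) op looks like.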
Post by Brett Morgan
If you are transforming your docops, you can compare the docops coming
back down for equality. You are doing client side transformations, right?
org.waveprotocol.wave.model.operation.OpComparators is the FedOne code
for comparing equality of ops. Which, after a whole bunch of edge case
DocOpUtil.toConciseString(a).equals(DocOpUtil.toConciseString(b))
In short, comparing docops for equality is easy, as long as you keep
transforming your docops...
Post by Torben Weis
Hi James,
Post by James Purser
How about when the client connects to the server again it does a history
check against the known good deltas it has sent out? If the last delta
it sent out isn't in the history, then it hasn't been received.
The problem is that this is impossible. How should QWaveClient
recognize a delta as being its own? The server may have transformed the
delta, so simple delta comparison is not possible, and looking at version
numbers does not help either.
Greetings
Torben
Post by James Purser
--
James Purser
http://wavingtheshiny.collaborynth.com.au
Skype: purserj1977
--
You received this message because you are subscribed to the Google
Groups "Wave Protocol" group.
To unsubscribe from this group, send email to
.
For more options, visit this group at
http://groups.google.com/group/wave-protocol?hl=en.
--
---------------------------
Prof. Torben Weis
Universitaet Duisburg-Essen
torben.weis-***@public.gmane.org
Brett Morgan
2010-01-20 21:35:09 UTC
Short answer: each client maintains serverRevision, which tells them where
in the server history they are up to. This is passed across the wire on both
update and poll so that the server can figure out where the client is at,
and act accordingly.
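A minimal sketch of the serverRevision idea, assuming a hypothetical server that keeps an append-only edit history in which entry i moves the document from revision i to i + 1 (so revision 0 is the empty document; Brett's trace happens to count from 1). A client that reports its serverRevision receives exactly the edits it missed. Note that this covers catching up after a reconnect, not Torben's question of whether the client's own in-flight delta landed:

```java
import java.util.ArrayList;
import java.util.List;

public class RevisionCatchUp {
    // Hypothetical server: an append-only edit history; index i holds the edit
    // that moved the document from revision i to revision i + 1.
    static class Server {
        private final List<String> editHistory = new ArrayList<>();

        // Returns every edit the caller has not yet seen, given its serverRevision.
        synchronized List<String> poll(int clientRevision) {
            return new ArrayList<>(editHistory.subList(clientRevision, editHistory.size()));
        }

        synchronized int append(String edit) {
            editHistory.add(edit);
            return editHistory.size(); // the new server revision
        }
    }

    // Hypothetical client: tracks how far into the server history it has applied.
    static class Client {
        int serverRevision = 0;
        final List<String> applied = new ArrayList<>();

        void catchUp(Server server) {
            List<String> missed = server.poll(serverRevision);
            applied.addAll(missed);           // a real client would transform and apply each op
            serverRevision += missed.size();
        }
    }

    public static void main(String[] args) {
        Server server = new Server();
        Client client = new Client();
        server.append("++\"Hello\";");
        client.catchUp(server);                 // client is now at revision 1
        server.append("__5; ++\" Chatter\";");  // client is offline for this one
        client.catchUp(server);                 // reconnect: fetches only the missed edit
        System.out.println(client.serverRevision + " " + client.applied);
    }
}
```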
Post by Torben Weis
Hi Brett,
I think we had a little misunderstanding with regard to my initial problem.
In your example, what would happen if Client1 lost its connection to the server
after step 3?
Upon reconnect it does not know whether its in-flight delta has been
accepted or not, and it has, IMHO, no chance of finding out. So should it
send the delta again or not?
As long as there is no such crash (as in your example), it will of course
converge properly.
Greetings
Torben
Post by Brett Morgan
Heya Torben,
I have attached a java class that I believe implements Daniel's scenario.
First off, note that I'm not implementing the wave federation algorithm, as
federation isn't my goal. My goal is to build web apps that use wave's OT.
That said, here is the output of the aforementioned java class showing that
the server and the two clients converge.
State of system after step 1
Client 'Client1' at server revision 1 with document
Client 'Client2' at server revision 1 with document
State of system after step 2
Client 'Client1' at server revision 1 with document ++"Hello"; with edit in flight ++"Hello";
Client 'Client2' at server revision 1 with document
State of system after step 3
Client 'Client1' at server revision 1 with document ++"Hello"; with edit in flight ++"Hello";
Client 'Client2' at server revision 1 with document ++"Hello"; with edit in flight ++"Hello";
State of system after step 4
Client 'Client1' at server revision 1 with document ++"Hello"; with edit in flight ++"Hello"; with cached edit __5; ++"Going to hell";
Client 'Client2' at server revision 1 with document ++"Hello"; with edit in flight ++"Hello";
State of system after step 5
server editHistory[1]: ++"Hello";
server editHistory[2]: ++"Hello"; __5;
Client 'Client1' at server revision 2 with document ++"Hello"; with cached edit __5; ++"Going to hell";
Client 'Client2' at server revision 2 with document ++"Hello";
State of system after step 7
server editHistory[1]: ++"Hello";
server editHistory[2]: ++"Hello"; __5;
server editHistory[3]: __10; ++"Going to hell";
Client 'Client1' at server revision 4 with document ++"HelloHelloGoing to hell";
Client 'Client2' at server revision 4 with document ++"HelloHelloGoing to hell";
--
Brett Morgan http://domesticmouse.livejournal.com/
Daniel Paull
2010-01-21 01:01:03 UTC
Hi Torben,

Consider the state space (called a linear history buffer in other OT
literature) to be a persistent message queue. In distributed
computing we use sequence numbers to track which messages have been
received by the consumer of the message queue. In the event of a
communications failure, messages may be lost (e.g., messages that were in
flight, or messages discarded by a rollback after a crash). Upon
reconnection, a simple handshake in which sequence numbers are exchanged
is sufficient to determine what needs to be resent. The "server
revision" number plays the role of this sequence number.
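Dan's handshake, applied to Torben's option (b), might look roughly like the sketch below. Everything in it is hypothetical and not part of FedOne: each client tags submissions with a per-client sequence number, the server persists the last sequence number it applied per client, and on reconnect the client asks for that number, discards what is confirmed and resends the rest. A resent duplicate is simply ignored. In a real OT system the resent delta would also have to be transformed against the history that arrived in the meantime, which is not shown here:

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class SequenceHandshake {
    // Hypothetical wire message: every submission is tagged with its sender
    // and a per-client sequence number starting at 1.
    record Message(String clientId, long seq, String delta) {}

    static class Server {
        // Persistent in a real deployment: highest sequence number applied per client.
        final Map<String, Long> lastApplied = new HashMap<>();
        final List<String> history = new ArrayList<>();

        void receive(Message m) {
            long last = lastApplied.getOrDefault(m.clientId(), 0L);
            // Apply only the next expected message; a resent duplicate
            // (seq <= last) or a gap (seq > last + 1) is ignored.
            if (m.seq() != last + 1) return;
            history.add(m.delta()); // a real server would transform the delta first
            lastApplied.put(m.clientId(), m.seq());
        }

        // Reconnection handshake: report how far this client's stream got.
        long handshake(String clientId) {
            return lastApplied.getOrDefault(clientId, 0L);
        }
    }

    static class Client {
        final String id;
        long nextSeq = 1;
        // Messages the server has not yet confirmed, oldest first.
        final Deque<Message> unacked = new ArrayDeque<>();

        Client(String id) { this.id = id; }

        Message submit(String delta) {
            Message m = new Message(id, nextSeq++, delta);
            unacked.addLast(m);
            return m;
        }

        // After reconnecting: forget what the server confirms, resend the rest in order.
        // Resent messages stay queued until a later handshake confirms them.
        void reconnect(Server server) {
            long confirmed = server.handshake(id);
            while (!unacked.isEmpty() && unacked.peekFirst().seq() <= confirmed) {
                unacked.removeFirst();
            }
            for (Message m : unacked) {
                server.receive(m);
            }
        }
    }

    public static void main(String[] args) {
        Server server = new Server();
        Client client = new Client("client1");
        server.receive(client.submit("++\"Hello\";"));  // delivered
        client.submit("__5; ++\" Going to hell\";");     // lost when the TCP connection broke
        client.reconnect(server);                       // handshake says 1, so resend seq 2
        client.reconnect(server);                       // handshake says 2: nothing to resend
        System.out.println(server.history);
    }
}
```

The same handshake also covers the opposite failure, where the delta arrived but the acknowledgement was lost: the server reports the higher sequence number and the client does not resend.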

In the more general OT approaches where peer-to-peer connections are
made (i.e., not forced to be client/server as in Wave), a vector
of (site id, sequence number) pairs, each often denoted (s, t), is
used in place of a single sequence number. This vector is called a
Vector Time and represents a snapshot of the document that includes
all operations that are "in" the vector time, that is, all operations
from each site "s" that have a sequence number less than "t".
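A small sketch of a vector time as Dan defines it, under the assumption that each site numbers its operations from 0: for every site s the vector stores a count t, and an operation (s, n) is "in" the snapshot exactly when n < t. The class and method names are illustrative only:

```java
import java.util.HashMap;
import java.util.Map;

public class VectorTime {
    // For each site s, the count t of operations from s included in this snapshot.
    private final Map<String, Integer> clock = new HashMap<>();

    // Record that one more operation from `site` is now included.
    void tick(String site) {
        clock.merge(site, 1, Integer::sum);
    }

    // An operation (site, seq) is "in" this vector time iff seq < t for that site.
    boolean includes(String site, int seq) {
        return seq < clock.getOrDefault(site, 0);
    }

    // True if every operation in `other` is also in this snapshot.
    boolean dominates(VectorTime other) {
        return other.clock.entrySet().stream()
            .allMatch(e -> clock.getOrDefault(e.getKey(), 0) >= e.getValue());
    }

    // Two snapshots are concurrent if neither contains the other.
    boolean concurrentWith(VectorTime other) {
        return !dominates(other) && !other.dominates(this);
    }
}
```

With a single server that serialises every operation, as in Wave's client/server model, the whole vector collapses to one number, which is the server revision Dan refers to above.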

I find it interesting that upon reconnection after a crash and
rollback, you might get back operations that you generated and sent to
another node only moments before your crash. That's just one exciting
feature of OT: fault tolerance.

It seems that what you are missing is the handshaking (exchange of
sequence numbers) during reconnection. Would you agree?

Cheers,

Dan
Brett Morgan
2010-01-20 21:28:57 UTC
Ugh.

Reading back through my code after breakfast, I realise that I'm
handling the cached edit while an edit is in flight incorrectly. And my
initial stabs at fixing it are breaking. I think I need a bigger piece of
paper to figure this out...

brett
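For reference, here is one common way to structure the "edit in flight plus cached edit" client, sketched under heavy simplifying assumptions: operations are insert-only (not Wave's DocOps), ties at equal positions are broken by comparing hypothetical site IDs, and the explicit flush/sync calls stand in for the network. The client keeps at most one op in flight, queues later local edits behind it, and transforms every remote op past both the in-flight op and the queue. The server returns the history index it assigned to a submitted op, so the client can recognise its own op as the acknowledgement; that is precisely the information Torben says is lost when the connection drops. This is a sketch of the general technique, not Brett's code:

```java
import java.util.ArrayList;
import java.util.List;

public class InFlightBuffer {
    // Hypothetical insert-only op; `site` breaks ties between inserts at the same position.
    record Insert(int pos, String text, String site) {
        String apply(String doc) {
            return doc.substring(0, pos) + text + doc.substring(pos);
        }
    }

    // Rewrite `a` so it applies after `b`, where both were made against the same state.
    static Insert transform(Insert a, Insert b) {
        boolean bGoesFirst = b.pos() < a.pos()
            || (b.pos() == a.pos() && b.site().compareTo(a.site()) < 0);
        return bGoesFirst ? new Insert(a.pos() + b.text().length(), a.text(), a.site()) : a;
    }

    static class Server {
        final List<Insert> history = new ArrayList<>();
        String doc = "";

        // Transform an op made at `revision` past every later history entry, then apply it.
        int submit(Insert op, int revision) {
            for (Insert seen : history.subList(revision, history.size())) {
                op = transform(op, seen);
            }
            history.add(op);
            doc = op.apply(doc);
            return history.size() - 1; // index of this op: doubles as the acknowledgement
        }
    }

    static class Client {
        final String site;
        String doc = "";
        int revision = 0;                              // history entries processed so far
        Insert inFlight;                               // at most one outstanding op
        int ackIndex = -1;                             // history slot of inFlight once sent
        final List<Insert> cached = new ArrayList<>(); // local ops queued behind inFlight

        Client(String site) { this.site = site; }

        void edit(int pos, String text) {
            Insert op = new Insert(pos, text, site);
            doc = op.apply(doc);
            if (inFlight == null) inFlight = op; else cached.add(op);
        }

        // Send the pending op if it has not been sent yet (stands in for the network).
        void flush(Server server) {
            if (inFlight != null && ackIndex < 0) ackIndex = server.submit(inFlight, revision);
        }

        // Process server history in order: our own slot is the ack, everything else is remote.
        void sync(Server server) {
            while (revision < server.history.size()) {
                int i = revision++;
                if (i == ackIndex) {
                    ackIndex = -1;
                    inFlight = cached.isEmpty() ? null : cached.remove(0); // promote, send on next flush
                    continue;
                }
                Insert remote = server.history.get(i);
                if (inFlight != null) {                // transform remote and inFlight past each other
                    Insert r = transform(remote, inFlight);
                    inFlight = transform(inFlight, remote);
                    remote = r;
                }
                for (int k = 0; k < cached.size(); k++) { // then past each cached op in order
                    Insert r = transform(remote, cached.get(k));
                    cached.set(k, transform(cached.get(k), remote));
                    remote = r;
                }
                doc = remote.apply(doc);
            }
        }
    }

    public static void main(String[] args) {
        Server server = new Server();
        Client c1 = new Client("c1");
        Client c2 = new Client("c2");
        c1.edit(0, "Hello");              // in flight for c1
        c2.edit(0, "Hello");              // matching edit, in flight for c2
        c1.edit(5, " Going to hell");     // cached behind c1's in-flight edit
        c2.flush(server);
        c1.flush(server);
        c1.sync(server);                  // c2's op, then the ack; cached op is promoted
        c1.flush(server);
        c1.sync(server);
        c2.sync(server);
        System.out.println(server.doc.equals(c1.doc) && c1.doc.equals(c2.doc));
    }
}
```

Keeping the cached edits as a list of separate ops sidesteps composing them into one op, at the cost of transforming each remote op through the whole queue.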
--
Brett Morgan http://domesticmouse.livejournal.com/
Brett Morgan
2010-01-21 01:16:10 UTC
New version with slightly more correct cached edit handling. And I've added
a third client that is being a spam bot.

The state trace now looks like this:

Three clients, sync'd
server editHistory[0]:
Client 'Client1' at server revision 1 with document
Client 'Client2' at server revision 1 with document
Client 'Client3' at server revision 1 with document

Client3 is generating noise
server editHistory[0]:
server editHistory[1]: ++" Chatter";
Client 'Client1' at server revision 1 with document
Client 'Client2' at server revision 1 with document
Client 'Client3' at server revision 2 with document ++" Chatter";

Client2 has generated an edit, and it's in flight
server editHistory[0]:
server editHistory[1]: ++" Chatter";
Client 'Client1' at server revision 1 with document ++"Hello"; with edit in flight ++"Hello";
Client 'Client2' at server revision 1 with document
Client 'Client3' at server revision 2 with document ++" Chatter";

Client2 has generated matching edit, also in flight
server editHistory[0]:
server editHistory[1]: ++" Chatter";
Client 'Client1' at server revision 1 with document ++"Hello"; with edit in
flight ++"Hello";
Client 'Client2' at server revision 1 with document ++"Hello"; with edit in
flight ++"Hello";
Client 'Client3' at server revision 2 with document ++" Chatter";

Client1 has handbasket edit cached
server editHistory[0]:
server editHistory[1]: ++" Chatter";
Client 'Client1' at server revision 1 with document ++"Hello"; with edit in
flight ++"Hello"; with cached edit __5; ++" Going to hell";
Client 'Client2' at server revision 1 with document ++"Hello"; with edit in
flight ++"Hello";
Client 'Client3' at server revision 2 with document ++" Chatter";

Server has client2's edit in, and clients synced there
server editHistory[0]:
server editHistory[1]: ++" Chatter";
server editHistory[2]: ++"Hello"; __8;
Client 'Client1' at server revision 3 with document ++"Hello Chatter"; with
cached edit __5; ++" Going to hell"; __8;
Client 'Client2' at server revision 3 with document ++"Hello Chatter";
Client 'Client3' at server revision 3 with document ++"Hello Chatter";

Client3 is generating noise
server editHistory[0]:
server editHistory[1]: ++" Chatter";
server editHistory[2]: ++"Hello"; __8;
server editHistory[3]: __13; ++" Chatter";
Client 'Client1' at server revision 3 with document ++"Hello Chatter"; with
cached edit __5; ++" Going to hell"; __8;
Client 'Client2' at server revision 3 with document ++"Hello Chatter";
Client 'Client3' at server revision 4 with document ++"Hello Chatter
Chatter";

Server accepts client1's edit
server editHistory[0]:
server editHistory[1]: ++" Chatter";
server editHistory[2]: ++"Hello"; __8;
server editHistory[3]: __13; ++" Chatter";
server editHistory[4]: ++"Hello"; __21;
Client 'Client1' at server revision 3 with document ++"Hello Chatter"; with
cached edit __5; ++" Going to hell"; __8;
Client 'Client2' at server revision 3 with document ++"Hello Chatter";
Client 'Client3' at server revision 4 with document ++"Hello Chatter
Chatter";

Client3 is generating noise
server editHistory[0]:
server editHistory[1]: ++" Chatter";
server editHistory[2]: ++"Hello"; __8;
server editHistory[3]: __13; ++" Chatter";
server editHistory[4]: ++"Hello"; __21;
server editHistory[5]: __26; ++" Chatter";
Client 'Client1' at server revision 3 with document ++"Hello Chatter"; with
cached edit __5; ++" Going to hell"; __8;
Client 'Client2' at server revision 3 with document ++"Hello Chatter";
Client 'Client3' at server revision 6 with document ++"HelloHello Chatter
Chatter Chatter";

Client1's cached edit posted to server, and accepted
server editHistory[0]:
server editHistory[1]: ++" Chatter";
server editHistory[2]: ++"Hello"; __8;
server editHistory[3]: __13; ++" Chatter";
server editHistory[4]: ++"Hello"; __21;
server editHistory[5]: __26; ++" Chatter";
server editHistory[6]: __10; ++" Going to hell"; __24;
Client 'Client1' at server revision 7 with document ++"HelloHello Going to
hell Chatter Chatter Chatter";
Client 'Client2' at server revision 7 with document ++"HelloHello Going to
hell Chatter Chatter Chatter";
Client 'Client3' at server revision 7 with document ++"HelloHello Going to
hell Chatter Chatter Chatter";

From my perspective, the above shows that this simple c/s protocol, based on
pushing edits to the server and pulling a queue of edits back in separate
c->s calls, works. I'm not happy with how I'm modeling cached edits in the
client at this point, which means I need to redo Client#acceptUpdates(),
probably using a strategy pattern...
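A note on reading the trace. The ops appear to use FedOne's concise notation: `__n;` retains n characters and `++"text";` inserts text. That reading is an assumption, but it is consistent with every document shown above.

The hypothetical TraceReplay class below is a quick consistency check. It applies the server's editHistory[1..6] from the three-client trace, in order, to an empty document. It handles only retains and inserts.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: replays ops written in the trace's concise notation.
// Assumed notation: __n; retains n characters, ++"text"; inserts text.
// Deletions and annotations are not supported; unmatched text is ignored.
class TraceReplay {
    private static final Pattern COMPONENT =
            Pattern.compile("__(\\d+);|\\+\\+\"([^\"]*)\";");

    static String apply(String doc, String op) {
        StringBuilder out = new StringBuilder();
        int cursor = 0; // how much of the input document has been consumed
        Matcher m = COMPONENT.matcher(op);
        while (m.find()) {
            if (m.group(1) != null) {            // retain: copy n characters through
                int n = Integer.parseInt(m.group(1));
                if (cursor + n > doc.length()) {
                    throw new IllegalArgumentException("retain runs past end of document");
                }
                out.append(doc, cursor, cursor + n);
                cursor += n;
            } else {                              // insert: emit the new text
                out.append(m.group(2));
            }
        }
        if (cursor != doc.length()) {
            throw new IllegalArgumentException("op does not cover the whole document");
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // server editHistory[1..6] from the three-client trace, in order
        String[] history = {
            "++\" Chatter\";",
            "++\"Hello\"; __8;",
            "__13; ++\" Chatter\";",
            "++\"Hello\"; __21;",
            "__26; ++\" Chatter\";",
            "__10; ++\" Going to hell\"; __24;",
        };
        String doc = "";
        for (String op : history) {
            doc = apply(doc, op);
            System.out.println(doc.length() + ": " + doc);
        }
    }
}
```

Running main prints each intermediate document with its length. The last one is "HelloHello Going to hell Chatter Chatter Chatter", the document all three clients end up with in the trace.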

brett
Ugh.
Reading back through my code after breakfast, I realise that I'm
handling the cached edit incorrectly while an edit is in flight. And my
initial stabs at fixing it up are breaking things. I think I need a bigger piece of
paper to figure this out...
brett
Heya Torben,
Post by Brett Morgan
I have attached a java class that I believe implements Daniel's scenario.
First off, note that I'm not implementing the wave federation algorithm, as
federation isn't my goal. My goal is to build web apps that use wave's OT.
That said, here is the output of the aforementioned java class showing that
the server and the two clients converge.
State of system after step 1
Client 'Client1' at server revision 1 with document
Client 'Client2' at server revision 1 with document

State of system after step 2
Client 'Client1' at server revision 1 with document ++"Hello"; with edit in flight ++"Hello";
Client 'Client2' at server revision 1 with document

State of system after step 3
Client 'Client1' at server revision 1 with document ++"Hello"; with edit in flight ++"Hello";
Client 'Client2' at server revision 1 with document ++"Hello"; with edit in flight ++"Hello";

State of system after step 4
Client 'Client1' at server revision 1 with document ++"Hello"; with edit in flight ++"Hello"; with cached edit __5; ++"Going to hell";
Client 'Client2' at server revision 1 with document ++"Hello"; with edit in flight ++"Hello";

State of system after step 5
server editHistory[1]: ++"Hello";
server editHistory[2]: ++"Hello"; __5;
Client 'Client1' at server revision 2 with document ++"Hello"; with cached edit __5; ++"Going to hell";
Client 'Client2' at server revision 2 with document ++"Hello";

State of system after step 7
server editHistory[1]: ++"Hello";
server editHistory[2]: ++"Hello"; __5;
server editHistory[3]: __10; ++"Going to hell";
Client 'Client1' at server revision 4 with document ++"HelloHelloGoing to hell";
Client 'Client2' at server revision 4 with document ++"HelloHelloGoing to hell";
Post by Torben Weis
Hi Brett,
thanks for the hint to your project. I did not know it before.
It had its coming-out party at LCA. And I think I'm going to rip it down
and start again, this time using long-poll-based notification. I couldn't do
long polls while I was targeting AppEngine as my deployment environment.
Post by Torben Weis
However, I would like to see a proof (i.e. a short explanation is
sufficient) how you intend to solve the problem I have mentioned.
Running code is no proof :-)
If running code doesn't merit existence-proof status, then I'm fucked. =)
Post by Torben Weis
For some reasons I strongly doubt that your code (or any possible code)
can handle this without changes to the C/S protocol.
The client/server protocol in the FedOne code base, unless I miss my
guess, isn't doing OT.
Post by Torben Weis
Your application seems to be different anyway. If I am not mistaken (I
just read the wave you mentioned) you are running
a web client which connects to your web server which connects to FedOne.
Right?
Heh, no. I'm not using FedOne, just the OT component of FedOne. I'm
building out the capacity to have GWT web clients running OT sync
with a web server. It works, but I lack the theoretical grounding to prove
it.
Post by Torben Weis
The problem I mentioned is between your web server and FedOne. In my case
it is between QWaveClient and FedOne.
Your web app can of course recover as long as FedOne and your web server
are stable. But what happens if your
WebServer crashes in an unfortunate moment? Your code will suffer from
the very problem I described.
If the web server goes down with unsynced state, everything goes shiny. At
that point I force the clients to drop all state and reload.
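Here is one way to implement the "drop all state and reload" rule. It is a sketch under assumptions: the class and field names are hypothetical, and the thread does not say how Brett detects the restart.

In the sketch, the server picks a random epoch ID when it starts and stamps it on every response. A client that sees the epoch change discards its cached document and pending edits before fetching a fresh copy.

```java
import java.util.UUID;

// Hypothetical sketch: each server process picks a fresh epoch at startup and stamps it on
// every response; a client that sees the epoch change throws away its local state.
class EpochCheck {
    static final class Server {
        final String epoch = UUID.randomUUID().toString();
    }

    static final class Client {
        String knownEpoch;   // epoch the local state was built against
        String doc = "";     // cached document plus any unsynced local edits
        int reloads = 0;

        // Returns true if local state was discarded because the server restarted.
        boolean onResponse(String serverEpoch) {
            boolean restarted = knownEpoch != null && !knownEpoch.equals(serverEpoch);
            if (restarted) {
                doc = "";    // drop everything; a fresh snapshot would be fetched here
                reloads++;
            }
            knownEpoch = serverEpoch;
            return restarted;
        }
    }

    public static void main(String[] args) {
        Client client = new Client();
        client.onResponse(new Server().epoch);
        client.doc = "Hello";                       // local edits not yet on the server
        Server restarted = new Server();            // crash and restart: new epoch
        System.out.println(client.onResponse(restarted.epoch) + " " + client.doc.isEmpty()); // true true
    }
}
```

The cost of this design is that clients also lose edits the server did receive. That is acceptable only if, as here, a restart is rare and correctness matters more than preserving work.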
Post by Torben Weis
However, would like to be proven wrong here since this would give me a
solution to my initial problem :-)
Sorta, kinda, maybe.
Post by Torben Weis
Greetings
Torben
Post by Brett Morgan
Actually, no, OT deals with this case. My almost working code that I
presented at LCA2010 deals with this edge case. Unfortunately it has bugs,
and dies in the arse randomly. Sigh.
https://wave.google.com/wave/#restored:wave:googlewave.com!w%252BTBvx4ehoA
http://code.google.com/p/wave-ot-editor/source/browse/#svn/wave-ot-editor
I can put together a JUnit test case showing that this case actually
stabilises using the Wave OT code, if that would help...
Post by Torben Weis
Hi Brett,
thanks for the suggestion.
However, it seems to me that this approach is not completely correct.
Imagine two clients which are sending a delta against the same server
version.
The delta says to insert "Hello" at some position in a blip.
The correct outcome is "HelloHello" being inserted.
Now one client fails to submit its delta, the other one succeeds.
The client that failed cannot detect that it failed, because the other
delta looks exactly the same. Thus, in the end "Hello" will be inserted
only once.
I agree that this is an academic corner case, but I see no solution for this
when relying on delta comparisons.
Greetings
Torben
Post by Brett Morgan
If you are transforming your docops, you can compare the docops coming
back down for equality. You are doing client-side transformations, right?
org.waveprotocol.wave.model.operation.OpComparators is the FedOne code
for comparing equality of ops, which, after a whole bunch of edge-case
handling, comes down to:
DocOpUtil.toConciseString(a).equals(DocOpUtil.toConciseString(b))
In short, comparing docops for equality is easy, as long as you keep
transforming your docops...
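Here is a minimal sketch of the comparison idea, using a hypothetical MiniOp class (not the FedOne DocOp API). It normalizes adjacent retains and inserts, then compares ops by their concise string, in the spirit of the toConciseString comparison above.

Note what this does and does not tell you. Two ops that print the same compare equal regardless of which client produced them. That is exactly the ambiguity Torben raises with his two identical "Hello" deltas.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical minimal op for illustration only; this is NOT FedOne's DocOp API.
// Components are kept normalized (adjacent retains/inserts merged, empties dropped)
// so that two ops with the same effect print identically.
class MiniOp {
    private final List<Object> parts = new ArrayList<>(); // Integer = retain, String = insert

    MiniOp retain(int n) {
        if (n < 0) throw new IllegalArgumentException("negative retain");
        if (n == 0) return this;
        int last = parts.size() - 1;
        if (last >= 0 && parts.get(last) instanceof Integer) {
            parts.set(last, (Integer) parts.get(last) + n);
        } else {
            parts.add(n);
        }
        return this;
    }

    MiniOp insert(String text) {
        if (text.isEmpty()) return this;
        int last = parts.size() - 1;
        if (last >= 0 && parts.get(last) instanceof String) {
            parts.set(last, (String) parts.get(last) + text);
        } else {
            parts.add(text);
        }
        return this;
    }

    // Canonical text form, in the same style as the traces in this thread.
    String toConciseString() {
        StringBuilder sb = new StringBuilder();
        for (Object p : parts) {
            if (sb.length() > 0) sb.append(' ');
            if (p instanceof Integer) {
                sb.append("__").append(p).append(';');
            } else {
                sb.append("++\"").append(p).append("\";");
            }
        }
        return sb.toString();
    }

    // Equality by canonical form: says the ops match, not who authored them.
    static boolean sameOp(MiniOp a, MiniOp b) {
        return a.toConciseString().equals(b.toConciseString());
    }

    public static void main(String[] args) {
        MiniOp fromClient1 = new MiniOp().insert("Hel").insert("lo").retain(3).retain(5);
        MiniOp fromClient2 = new MiniOp().insert("Hello").retain(8);
        System.out.println(fromClient1.toConciseString()); // ++"Hello"; __8;
        System.out.println(sameOp(fromClient1, fromClient2)); // true
    }
}
```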
Post by Torben Weis
Hi James,
Post by James Purser
How about when the client connects to the server again it does a history
check against the known good deltas it has sent out. If the last delta
it sent out isn't in the history, then it hasn't been received.
The problem is that this is impossible. How should QWaveClient
recognize a delta as being its own? The server may have transformed
the delta, so simple delta comparison is not possible, and looking at
version numbers does not help either.
Greetings
Torben
Post by James Purser
--
James Purser
http://wavingtheshiny.collaborynth.com.au
Skype: purserj1977
chiang
2010-01-20 10:30:09 UTC
Hi Torben,

Allow me to offer my two pence.
Post by Torben Weis
Imagine two clients which are sending a delta against the same server
version.
The delta says to insert "Hello" at some position in a blip.
The correct outcome is "HelloHello" being inserted.
Now one client fails to submit its delta, the other one succeeds.
The client that failed cannot detect that it failed, because the other
delta looks exactly the same. Thus, in the end "Hello" will be inserted only
once.
Does your client not use TCP? If the delta has been submitted to
the server successfully, then your client should receive an
acknowledgement, shouldn't it? And each instance of your client has a
unique IP address/port pair, so there shouldn't be confusion about who has
sent which delta, as each delta is sent over an independent and unique
connection to the server. If the delta from client 1 has been received
and not the one from client 2, then client 2 should operate on the
assumption that it needs to send the delta again, shouldn't it?

Hence I don't believe a client ID is necessary when submitting a delta, as
TCP should be sufficient. Or am I missing something here?

cheers,

Chiang
Daniel Paull
2010-01-20 11:28:30 UTC
Hello Chiang,
Post by chiang
Hence I don't believe client ID is necessary when submitting delta as
TCP should have been sufficient. Or am I missing something here?
I agree that client IDs are not required in the delta as they can be
implied.
Post by chiang
So if the delta has been submitted to the server successfully, then your
client should receive an acknowledgement
"Should" is the prominent word in that sentence. As it stands, the
client/server interaction between the FedOne client and server does
not even remotely resemble the OT whitepaper published by Google. Of
note here is the lack of server acknowledgements.

Cheers,

Dan
chiang
2010-01-20 11:45:22 UTC
Hi Dan,
Post by chiang
So if the delta has been submitted to the server successfully, then your
client should receive an acknowledgement
"Should" is the prominent word in that sentence.  As it stands, the
client/server interaction between the FedOne client and server does
not even remotely resemble the OT whitepaper published by Google.  Of
note here is the lack of server acknowledgements.
What I meant was TCP acknowledgement. Of course, for whatever reason,
the server may behave strangely and fail to apply the delta to the
document. But I would have thought that the server WILL apply the
change unless it crashes or something, in which case the latest delta
would not have taken effect, and the client would need to roll back
to its previous version of the document. I think the problem lies
in the FedOne console client not having OT implemented. This means
that the client will be unable to roll back to its previous version
of the document, I think. As it is, the FedOne console client will
retrieve the whole document or the full history (less the latest
change) from the server (and I mean the sandbox server here, not the
FedOne server).

I agree with you that the FedOne console client is not a reference
implementation of a wave client, and there is still much to do between
client and server...

cheers,

Chiang
Daniel Paull
2010-01-20 13:12:44 UTC
Because clients must wait for the server to acknowledge each operation
before sending further operations, a TCP ack is insufficient.
Torben Weis
2010-01-20 18:02:56 UTC
Hi chiang,

Post by chiang
What I meant was TCP acknowledgement.
TCP acks don't help. In fact, no ack can ever help.
Imagine that the client stores the delta on its hard disk so that it cannot
be lost. Then it sends it to the server via TCP. At the very moment the
server sends the TCP ack, the client crashes. When the client restarts, it
cannot know whether its data has arrived at the server or not. QED.

This comes down to the famous "two-army problem" or "two generals' problem".
In an asynchronous system (which the internet is), no two parties can ever
be guaranteed to reach agreement when messages can be lost and there are
no time bounds. This is well documented in textbooks on distributed systems,
so I don't think any ACK-based solution can be appropriate.
It can become arbitrarily unlikely that something goes wrong, but it is
never perfect.
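For comparison, the thread's opening post proposed option (b). In that option, the server keeps a persistent record of the last submitted delta per client ID and wavelet ID, and a reconnecting client queries it.

The sketch below uses a hypothetical DedupServer, in-memory only. A per-client sequence number stands in for that version record, and the delta payload is simply appended, with transformation omitted. A client that crashed mid-submit can, after restarting, ask what was applied or just resubmit; the server drops the duplicate.

This does not contradict the two-generals argument. The client still cannot know at the moment of the crash. The question is deferred to reconnect time, and resubmission is made harmless. A real server would have to persist the record atomically with the delta.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of option (b) from the opening post; all names are hypothetical, not FedOne APIs.
// The server remembers, per (client ID, wavelet ID), the last client sequence number it applied.
// In-memory only: a real server would persist this atomically with the delta itself.
class DedupServer {
    private final Map<String, Long> lastApplied = new HashMap<>();
    private final StringBuilder history = new StringBuilder(); // stands in for the wavelet

    private static String key(String clientId, String waveletId) {
        return clientId + "|" + waveletId;
    }

    // Applies the delta unless this client already had sequence number seq applied.
    synchronized boolean submit(String clientId, String waveletId, long seq, String delta) {
        String k = key(clientId, waveletId);
        long last = lastApplied.getOrDefault(k, 0L);
        if (seq <= last) {
            return false; // duplicate resubmission after a crash or reconnect: ignore it
        }
        if (seq != last + 1) {
            throw new IllegalStateException("client skipped a sequence number");
        }
        history.append(delta);
        lastApplied.put(k, seq);
        return true;
    }

    // What a reconnecting client can ask: the last sequence number applied on its behalf.
    synchronized long lastAppliedSeq(String clientId, String waveletId) {
        return lastApplied.getOrDefault(key(clientId, waveletId), 0L);
    }

    synchronized String history() {
        return history.toString();
    }

    public static void main(String[] args) {
        DedupServer server = new DedupServer();
        // The client saved delta #1 to disk, sent it, and crashed before hearing back.
        server.submit("client-A", "wavelet-1", 1, "Hello");
        // After restarting it cannot tell whether #1 arrived, so it asks...
        System.out.println(server.lastAppliedSeq("client-A", "wavelet-1")); // 1
        // ...or just resubmits: the duplicate is dropped instead of inserting "Hello" twice.
        System.out.println(server.submit("client-A", "wavelet-1", 1, "Hello")); // false
        System.out.println(server.history()); // Hello
    }
}
```

Because deduplication is keyed by client rather than by content, two different clients submitting identical "Hello" deltas both get applied, giving "HelloHello".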

Cheers
Torben

Daniel Paull
2010-01-20 11:37:45 UTC
Post by Torben Weis
The client that failed cannot detect that it failed, because the other
delta looks exactly the same. Thus, in the end "Hello" will be inserted only
once.
Bingo! It's clearly a broken approach. Perhaps that's why it's not
talked about in Google's OT whitepaper?
Post by Torben Weis
but I see no solution for this when relying on delta comparisons.
Yep, there is no correct solution using this approach.
Post by Torben Weis
I agree that this is an academic corner case,
Egad! This is not an "academic corner case". That's like calling
deadlock detection and transaction rollback an academic corner case
for an RDBMS. It is essential that the OT functions be proven correct
in order to build even the simplest OT system that can be relied
upon. Reliability is more than a mere academic concern.

Cheers,

Dan
Brett Morgan
2010-01-20 11:41:22 UTC
Post by Daniel Paull
Post by Torben Weis
The client that failed cannot detect that it failed, because the other
delta looks exactly the same. Thus, in the end "Hello" will be inserted
only once.
Bingo! It's clearly a broken approach. Perhaps that's why its not
talked about in Google's OT whitepaper?
No, it's not a clearly broken approach, mainly because it isn't broken. The
trick here is that the edits are transformed, and are thus different. The
server and both clients will wind up with the document "HelloHello". I know
this because my LCA2010 webapp deals with this exact issue, and it handles
it without breaking a sweat.
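The convergence Brett describes can be illustrated with a deliberately tiny, insert-only transform. The Insert record and the tie-break by site number are assumptions for this sketch, not FedOne's transform, and records need Java 16+.

Both sites start from the empty document. Each applies its own insert first, then the other's insert transformed against it. The transformed op differs from the original (one position moves from 0 to 5), and both sites end with "HelloHello".

This assumes both deltas actually reach the server. It does not by itself address a delta lost in transit.

```java
// Hypothetical insert-only op; the tie-break by site number is an assumption of this
// sketch, not FedOne's transform. Requires Java 16+ for records.
record Insert(int pos, String text, int site) {
    String applyTo(String doc) {
        return doc.substring(0, pos) + text + doc.substring(pos);
    }

    // Rewrites this op to apply after `other`; both were generated against the same document.
    Insert transformAgainst(Insert other) {
        boolean otherGoesFirst = other.pos < pos || (other.pos == pos && other.site < site);
        return otherGoesFirst ? new Insert(pos + other.text.length(), text, site) : this;
    }
}

class Converge {
    public static void main(String[] args) {
        Insert fromClient1 = new Insert(0, "Hello", 1);
        Insert fromClient2 = new Insert(0, "Hello", 2);

        // Site 1: own op first, then client 2's op transformed against it (moves to pos 5).
        String atSite1 = fromClient2.transformAgainst(fromClient1).applyTo(fromClient1.applyTo(""));
        // Site 2: own op first, then client 1's op transformed against it (stays at pos 0).
        String atSite2 = fromClient1.transformAgainst(fromClient2).applyTo(fromClient2.applyTo(""));

        System.out.println(atSite1 + " " + atSite2); // HelloHello HelloHello
    }
}
```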
Daniel Paull
2010-01-20 13:10:37 UTC
Come on, it's broken. Maybe I can find a contrived example to
illustrate:

1) There are two clients that both have the same, empty wave open.
2) Client1 generates O1 and sends it to the server.
3) Client2 generates OA and sends it to the server. O1 and OA happen
to be identical.
4) Client1 generates O2 and caches it, waiting for the server to
acknowledge O1 before sending O2.
5) The server decides to apply the two concurrent operations in the
order OA then O1. So, it applies OA after transforming it (the
transformation happens to be a no-op at this stage) and broadcasts OA
to all clients.
6) Client1 receives OA, compares it to its unacknowledged operation,
O1. They are the same, so Client1 incorrectly assumes that the server
has acknowledged O1.
7) Client1 sends O2 to the server and we all go to hell in a hand
basket as the server is not expecting Client1 to send operations at
this time.

It may just so happen that two clients will arrive at the same state
in the above scenario (assuming that the server doesn't kill off the
misbehaving client). However, I would expect divergence when more
than two clients are involved. Maybe that would be worth proving, but
I think it's sufficient to show that Client1 can send operations at
the wrong time when following your approach.
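The failure in steps 2 to 7 can be reproduced with a small simulation of the comparison heuristic. The ComparingClient class is hypothetical, and ops are plain strings so that the two identical inserts compare equal.

Client1 takes Client2's identical op OA as the acknowledgement for O1. It then releases its cached O2 while O1 is still unprocessed, so it ends up with two outstanding ops.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Objects;

// Hypothetical client that infers "my op was acknowledged" by comparing each op the server
// broadcasts with its own in-flight op: the heuristic being debated. Ops are plain strings.
class ComparingClient {
    private String inFlight;                                  // sent, believed unacknowledged
    private final Deque<String> cached = new ArrayDeque<>();  // waiting for an ack
    final Deque<String> sentToServer = new ArrayDeque<>();    // everything actually sent

    void generate(String op) {
        if (inFlight == null) {
            inFlight = op;
            sentToServer.add(op);
        } else {
            cached.add(op); // only one outstanding op is allowed
        }
    }

    void onServerOp(String op) {
        if (Objects.equals(op, inFlight)) {
            // "Looks like mine, so it must be my ack": release the next cached op.
            inFlight = cached.poll();
            if (inFlight != null) {
                sentToServer.add(inFlight);
            }
        }
        // Transforming local state against foreign ops is omitted here.
    }

    public static void main(String[] args) {
        ComparingClient client1 = new ComparingClient();
        client1.generate("++\"Hello\";");       // step 2: O1 in flight
        client1.generate("__5; ++\"!\";");      // step 4: O2 cached
        client1.onServerOp("++\"Hello\";");     // steps 5-6: server broadcasts Client2's OA first
        // Step 7: O2 has gone out although the server has not processed O1 yet.
        System.out.println(client1.sentToServer.size()); // 2
    }
}
```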

Cheers,

Dan
chiang
2010-01-20 14:19:39 UTC
Hi Dan,

I must admit that I'm a bit mystified to see you keep calling the
client (presumably you are referring to the FedOne console client)
broken :) when the client does not do OT, which is probably why it
does not seem to follow what the Google OT white paper says.
Post by Daniel Paull
Come on, it's broken.  Maybe I can find a contrived example to
1) There are two clients that both have the the same, empty wave open.
2) Client1 generates O1 and sends it to the server.
3) Client2 generates OA and sends it to the server.  O1 and OA happen
to be identical.
4) Client1 generates O2 and caches it, waiting for the server to
acknowledge O1 before sending O2.
With the FedOne console client, TCP acknowledgement seems like the
only acknowledgement you'll get from the server.
Post by Daniel Paull
5) The server decides to apply the two concurrent operations in the
order OA then O1.  So, it applies OA after transforming it (the
transformation happens to be a no-op at this stage) and broadcasts OA
to all clients.
The server does not broadcast to clients that do not do OT, correct me
if I'm wrong.
Post by Daniel Paull
6) Client1 receives OA, compares it to its unacknowledged operation,
O1.  They are the same, so Client1 incorrectly assumes that the server
has acknowledged O1.
If you are assuming that the client does OT here, I think each OT
operation needs to be digitally signed (to come from whatever user/
domain). So that may resolve the ambiguity?
Post by Daniel Paull
7) Client1 sends OA to the server and we all go to hell in a hand
basket as the server is not expecting Client1 to send operations at
this time.
Again, can't that be resolved by OT?

I'm trying to understand what is lacking in the client and what needs
to be done too.

cheers,

Chiang
Daniel Paull
2010-01-20 15:08:44 UTC
You shouldn't be mystified by what I have said - the FedOne client
(yes, the console client) is not a Wave client as described in the
Wave OT white paper from Google. In my opinion, Google has caused
much confusion by providing a client that bears little resemblance to
what they have documented as a proper Wave client.

For example, you said "With the FedOne console client, TCP
acknowledgement seems like the only acknowledgement you'll get from the
server." Right - this is what is broken. You can't implement a proper
FedOne wave client because the FedOne server does not act like a proper
Wave server. If it did, the question that started this discussion would
most likely not have been asked! My answer to the original question is
"implement server acknowledgements as documented in the Google OT
whitepaper and the problem no longer exists."
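For contrast with the comparison heuristic, here is a minimal sketch of the acknowledgement-driven client Dan is pointing at. It rests on assumptions: an insert-only op model with site-number tie-breaking, and hypothetical onAck/onRemoteOp messages from the server.

The client keeps one op in flight and buffers the rest. Only an explicit ack releases the next op. Foreign ops are transformed against the in-flight and buffered ops before being applied locally. The main method replays Dan's scenario, where Client2's identical-looking op arrives before the ack.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical insert-only op; site numbers break ties between concurrent inserts.
record Ins(int pos, String text, int site) {
    String applyTo(String doc) {
        return doc.substring(0, pos) + text + doc.substring(pos);
    }

    // Rewrites this op to apply after `other`; both were generated against the same document.
    Ins transformAgainst(Ins other) {
        boolean otherFirst = other.pos < pos || (other.pos == pos && other.site < site);
        return otherFirst ? new Ins(pos + other.text.length(), text, site) : this;
    }
}

// Sketch of an ack-driven client: one op in flight, the rest buffered until the server
// explicitly acknowledges. Message names (onAck, onRemoteOp) are assumptions.
class AckingClient {
    final int site;
    String doc = "";
    int revision = 0;                            // last server revision seen
    Ins inFlight;                                // sent, awaiting an explicit ack
    final List<Ins> buffer = new ArrayList<>();  // local ops not yet sent
    private final Consumer<Ins> send;            // stands in for the network

    AckingClient(int site, Consumer<Ins> send) {
        this.site = site;
        this.send = send;
    }

    void localEdit(int pos, String text) {
        Ins op = new Ins(pos, text, site);
        doc = op.applyTo(doc);
        if (inFlight == null) {
            inFlight = op;
            send.accept(op);
        } else {
            buffer.add(op);
        }
    }

    // The server applied our in-flight op as revision rev: release the next buffered op.
    void onAck(int rev) {
        revision = rev;
        inFlight = buffer.isEmpty() ? null : buffer.remove(0);
        if (inFlight != null) {
            send.accept(inFlight);
        }
    }

    // Another client's op, in server order: transform both ways, then apply it locally.
    void onRemoteOp(Ins op, int rev) {
        Ins incoming = op;
        if (inFlight != null) {
            Ins mine = inFlight;
            inFlight = mine.transformAgainst(incoming);
            incoming = incoming.transformAgainst(mine);
        }
        for (int i = 0; i < buffer.size(); i++) {
            Ins mine = buffer.get(i);
            buffer.set(i, mine.transformAgainst(incoming));
            incoming = incoming.transformAgainst(mine);
        }
        doc = incoming.applyTo(doc);
        revision = rev;
    }

    public static void main(String[] args) {
        List<Ins> wire = new ArrayList<>();
        AckingClient client1 = new AckingClient(1, wire::add);
        client1.localEdit(0, "Hello");                 // O1 is sent
        client1.localEdit(5, "!");                     // O2 is buffered
        client1.onRemoteOp(new Ins(0, "Hello", 2), 1); // Client2's identical-looking OA lands first
        System.out.println(wire.size());               // 1: nothing released without an ack
        client1.onAck(2);                              // O1 acknowledged; O2 goes out
        System.out.println(wire.size() + " " + client1.doc); // 2 Hello!Hello
    }
}
```

In this scenario the client ends with "Hello!Hello". That is what a server applying OA, then O1, then O2 with the same tie-break would hold. A real client would also tag each sent op with the server revision it is based on; the sketch only tracks that revision.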

I find comments like "The server does not broadcast to clients that do
not do OT, correct me if I'm wrong" very confusing. Wave clients perform
OT. If they don't, they're not Wave clients - they are something else
entirely.

Cheers,

Dan
James Purser
2010-01-20 19:37:19 UTC
Post by Daniel Paull
My answer to the original question is
"implement server acknowledgements as documented in the Google OT
whitepaper and the problem no longer exists."
Okay, I'd just like to point out something. FedOne is open source. The
FedOne project is accepting patches for review and inclusion from
outside Google. If you believe there is a deficiency and can fix it,
please send it in. Improving FedOne can only be a good thing.

Thanks
--
James Purser
http://wavingtheshiny.collaborynth.com.au
Wave Addresses:
jamesrpurser-***@public.gmane.org (wave.google.com)
purserj-xBucRuPHYkk9V/L/***@public.gmane.org (wavesandbox.com)
james-ZGY8ohtN/8plGI6Z+***@public.gmane.org (collaborynth.com.au FedOne Server)
Skype: purserj1977
GTalk: jamesrpurser-***@public.gmane.org
Daniel Paull
2010-01-21 02:10:25 UTC
That is well understood, James. The problem is that what has been
published by Google is an incomplete client/server implementation and
some terse documentation that does not, in my opinion, provide enough
detail to actually implement a Wave client and server from the ground
up.

Dan
Brett Morgan
2010-01-21 02:30:06 UTC
Dan,

The Googlers here in Wellington for LinuxConfAu have openly acknowledged
that this is a known problem. They are hoping to get the current wave GWT
shell out atop FedOne as soon as humanly possible. They are also aware of
the need for a proper client server protocol, and are also thinking about
that need. As always, the real limit here is engineer hours.

I'm trying hard not to be an apologist for Google here, but having watched
how hard these guys and gals are working, I can't stand by and listen as
people take them down. I think the wave team deserve a world of credit for
going open this early in the project's life cycle, and for openly working
with the community both here in the forums, and at the tech conferences.

I'm eagerly awaiting the next code drop, if only to find out what else I
have to learn =)

brett
Post by Daniel Paull
That is well understood James. The problem is that what has been
published by Google is an incomplete cient/server implementation and
some terse documentation that does not, in my opinion, provide enough
detail to actually implement a Wave client and server from the ground
up.
Dan
Post by James Purser
Post by Daniel Paull
My answer to the original question is
"implement server acknowledgements as documented in the Google OT
whitepaper and the problem no longer exists."
Okay, I'd just like to point out something. FedOne is open source. The
FedOne project is accepting patches for review and inclusion from
outside Google. If you believe there is a deficiency and can fix it,
please send it in. Improving FedOne can only be a good thing.
Thanks
--
James Purser http://wavingtheshiny.collaborynth.com.au
Skype: purserj1977
--
You received this message because you are subscribed to the Google Groups
"Wave Protocol" group.
To unsubscribe from this group, send email to
.
For more options, visit this group at
http://groups.google.com/group/wave-protocol?hl=en.
--
Brett Morgan http://domesticmouse.livejournal.com/
Jochen Bekmann
2010-01-21 09:18:10 UTC
Hi Dan,

We're happy to see that there are eager developers out there working
on this. We have not yet published our client/server protocol because
we have prioritized our efforts at standardization on the federation
protocol as this is critical in making it viable for other people to
run wave servers that talk to ours. Our production client/server
protocol is in flux as we continue development of our client. We are
limited in the amount of work we can do - we're juggling many balls
and are keenly focused on delivering a cool and stable product. A
crappy Wave architecture will not help Wave get adopted any faster.
For example, we only have a very small number of people (think ~3) to
hammer out complex protocol details, write production quality servers,
integrate with other backends, support our own servers in production,
keep on top of a fast moving architecture in Wave-land, work on the
open source code to name a few tasks.

Indeed, we have not published the details of our c/s protocol, but
once we manage to disentangle our implementation from Google
architecture and have ironed out the bits that we've not nailed down
yet, we will contribute it as an option for people to use as they see
fit. In the meantime, for what it's worth, the FedOne client/server
protocol is a functional client/server protocol that can be extended
if you so choose. There are interesting problems that need to be
solved, and other developers have implemented their own client/server
protocols successfully (e.g. see Novell Pulse).

FWIW, we wrote a high-level outline of our recovery algorithm (see
http://www.waveprotocol.org/whitepapers/internal-client-server-protocol),
but did not address the particular problem of connections dropping and
dealing with a client getting its own deltas back. In brief: we've
tried both inserting unique client-side generated IDs as well as
comparing deltas held in the client (until the server notifies us that
they are committed by the server) against incoming deltas. Because
during recovery, unacknowledged deltas are continually transformed by
the client against applied deltas coming in from the server, the
client's unacknowledged delta should match any delta (using a deep
comparison function) that it previously submitted but did not receive
an ACK for. (Note that under normal conditions we don't echo the
entire delta back to the client in order to cut down the amount of
data on the network). This is the approach we currently take (as
opposed to the unique ID approach), but there is a corner case problem,
as pointed out here earlier: two clients on which the same user is
logged in submit changes at exactly the same place at the same time
when a connection drops. There is a call to be made about
intention-preservation when resolving conflicts under OT, and in this
case we felt if the same user does exactly the same action in two
clients in the time window of a submit RTT, and the TCP connection
fails, he/she most likely will be satisfied that the action is
performed only once (referring to the earlier example, "hellohello" is
probably less desirable than "hello", unless the user deliberately
wants to test recovery of concurrency control ;). There are different
possible solutions and I'm not saying this is the best, but it's one
we're currently using.
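A minimal sketch of the comparison approach described above, assuming a toy model in which every delta is a single text insert (`Insert`, `transform_pair`, `apply_op` and `recover` are hypothetical names, not FedOne or Wave APIs). Ties between concurrent inserts are assumed to go to the server. After reconnecting, the client walks the deltas it missed, keeps its unacknowledged delta transformed in step with them, and treats a deep-equal incoming delta as its own echo.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Insert:
    """Hypothetical toy operation: insert `text` at offset `pos`."""
    pos: int
    text: str
    author: str


def transform_pair(client: Insert, server: Insert) -> Tuple[Insert, Insert]:
    """Transform two concurrent inserts against each other.
    On a tie the server's insert ends up first (to the left)."""
    if server.pos <= client.pos:
        return Insert(client.pos + len(server.text), client.text, client.author), server
    return client, Insert(server.pos + len(client.text), server.text, server.author)


def apply_op(doc: str, op: Insert) -> str:
    return doc[:op.pos] + op.text + doc[op.pos:]


def recover(doc: str, unacked: Optional[Insert],
            incoming: List[Insert]) -> Tuple[str, Optional[Insert]]:
    """Replay server deltas received after reconnecting.

    `doc` already contains the client's unacknowledged delta. Returns the
    updated document and the delta that still has to be resubmitted, or
    None if a deep-equal copy came back from the server.
    """
    for delta in incoming:
        if unacked is not None and delta == unacked:
            # Deep comparison: assume this is our own delta (implicit ACK).
            unacked = None
            continue
        if unacked is not None:
            # Keep the pending delta in step with the server's history.
            unacked, delta = transform_pair(unacked, delta)
        doc = apply_op(doc, delta)
    return doc, unacked
```

With base text "ab", a pending `Insert(1, "X", "alice")` and missed deltas `[Insert(0, "Y", "bob"), Insert(2, "X", "alice")]`, `recover("aXb", ...)` returns `("YaXb", None)`: the second delta matches and nothing is resent. The same match would also fire if alice's other session had typed the identical insert, which is the corner case described above; the result is "X" applied once rather than twice.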

As you can see, the debate on this thread has come up with the same
solutions as us, and similarly trade-offs need to be made in this
debate. We hope our solution, when we find enough time to open source
it, will be helpful. If you can propose a much better solution to
the problem discussed above, or any other, the open sourced protocol
we contribute could benefit from it (particularly if there is a
working implementation).

Hope that helps,
Jochen
Software Engineer, Google Wave
Daniel Paull
2010-01-21 13:53:19 UTC
Hello Jochen,

I understand that the Wave team is spread thin - as we all are and
always will be. But that seems irrelevant with respect to my
comments.
Post by Jochen Bekmann
This is the approach we currently take (as
opposed to the unique ID approach), but there is a corner case problem
as pointed out here earlier, that two clients where the same user has
logged on submit changes at exactly the same place at the same time
when a connection drops.
Can you outline why you would prefer to do an expensive comparison of
operations and introduce this "corner case" rather than simply utilise
an identifier for the client process (I will call this a "site" from
here on)?

I can't say that I have ever seen this approach (comparing operations)
taken in the literature. The usual approach is to identify the site
generating operations. An operation can be identified by an (s, t)
pair, where s is the site and t is sequence number of the operation.
Why would you diverge from this?

A vector-time (which is a vector of these (s, t) pairs) identifies a
version of a document. In the case of wave, the vector-time for a
given session will only have two entries as there are only two sites
involved in the exchange of operations - the client and server.
Indeed, one of the diagrams in the Wave OT whitepaper notes the "state
space" as pairs of integers - this is a vector-time (the site
identifiers are implied).
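To illustrate the point above, here is a sketch of identifying operations by (site, sequence number) pairs and keeping a two-entry vector time for one client/server session. The class and method names are hypothetical, and in-order delivery from each site is assumed, as in Jupiter-style protocols. With this bookkeeping, recognising an operation is a lookup on its id rather than a comparison of its contents.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple

OpId = Tuple[str, int]  # (site, sequence number)


@dataclass
class VectorTime:
    """Per-site count of operations seen; with one client and one server
    this holds exactly two entries."""
    clock: Dict[str, int] = field(default_factory=dict)

    def next_id(self, site: str) -> OpId:
        """Stamp a new locally generated operation and advance the clock."""
        seq = self.clock.get(site, 0) + 1
        self.clock[site] = seq
        return (site, seq)

    def has_seen(self, op_id: OpId) -> bool:
        site, seq = op_id
        return seq <= self.clock.get(site, 0)

    def record(self, op_id: OpId) -> None:
        """Record an operation received from the other site, in order."""
        site, seq = op_id
        if seq != self.clock.get(site, 0) + 1:
            raise ValueError(f"out-of-order op {op_id}")
        self.clock[site] = seq
```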

Now, recovery *should* be a simple matter of exchanging vector-times
and sending operations that you have that the other site does not.
Once you are at the same version, recovery is complete.
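A sketch of that exchange, under simplifying assumptions: each site numbers the operations it generates 1, 2, 3, ..., the server's log stands for everything it broadcasts (including operations relayed from other clients), and operations are opaque strings. Transforming concurrent operations on receipt is left out. All names are hypothetical.

```python
from typing import Dict, List, Tuple

Clock = Dict[str, int]  # {"client": ops from client seen, "server": ops from server seen}


def ops_to_send(own_site: str, own_log: List[str],
                peer_clock: Clock) -> List[Tuple[str, int, str]]:
    """(site, seq, op) for each op this site generated that the peer has not seen.
    `own_log[i]` is the op with sequence number i + 1."""
    seen = peer_clock.get(own_site, 0)
    return [(own_site, seq, op) for seq, op in enumerate(own_log, start=1) if seq > seen]


def reopen(client_log: List[str], server_log: List[str],
           client_clock: Clock, server_clock: Clock):
    """One handshake round: each side advertises its vector time and the
    other side replies with whatever is missing. Returns
    (ops client -> server, ops server -> client)."""
    return (ops_to_send("client", client_log, server_clock),
            ops_to_send("server", server_log, client_clock))
```

An empty client clock (`{}`) covers opening a wave for the first time: the server simply sends its whole log.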

Now, there is some complexity introduced by the addition of the server
acknowledgements, but I think this is merely an issue for the client -
it just needs to exclude operations that it has not sent to the server
in its vector-time during the handshaking.

Let me harp on for a minute - it irks me that the white paper alludes
to the proper use of sequence numbers and vector times, but in
reality, even the Google client/server protocol does not take this
approach. This does not inspire confidence.
Post by Jochen Bekmann
There is a call to be made about
intention-preservation when resolving conflicts under OT,
I really don't see how any of this relates to intention preservation.
Post by Jochen Bekmann
If you can propose a much better solution to
the problem discussed above, or any other, the open sourced protocol
we contribute could benefit from it (particularly if there is a
working implementation).
Perhaps you should read the Google Wave OT paper as it seems to
suggest a "much better solution".

I would like to see an approach that builds on the current literature
(such as talking about vector times, sequence numbers and site
identifiers) rather than blazing inefficient trails that introduce
"corner cases" for no good reason.

Cheers,

Dan
Jochen Bekmann
2010-02-01 03:00:25 UTC
Hi Dan,
Post by Daniel Paull
Hello Jochen,
I understand that the Wave team is spread thin - as we all are and
always will be.  But that seems irrelevant with respect to my
comments.
Fair enough. It was intended to address some of your earlier comments
about the lack of thorough documentation of our client/server protocol
and the lack of an open-sourced version of the client/server protocol
that our browser clients use.
Post by Daniel Paull
Post by Jochen Bekmann
This is the approach we currently take (as
opposed to the unique ID approach), but there is a corner case problem
as pointed out here earlier, that two clients where the same user has
logged on submit changes at exactly the same place at the same time
when a connection drops.
Can you outline why you would prefer to do an expensive comparison of
operations and introduce this "corner case" rather than simply utilise
an identifier for the client process (I will call this a "site" from
here on)?
I can't say that I have ever seen this approach (comparing operations)
taken in the literature.  The usual approach is to identify the site
generating operations.  An operation can be identified by an (s, t)
pair, where s is the site and t is sequence number of the operation.
Why would you diverge from this?
A vector-time (which is a vector of these (s, t) pairs) identifies a
version of a document.  In the case of wave, the vector-time for a
given session will only have two entries as there are only two sites
involved in the exchange of operations - the client and server.
Indeed, one of the diagrams in the Wave OT whitepaper notes the "state
space" as pairs of integers - this is a vector-time (the site
identifiers are implied).
Now, recovery *should* be a simple matter of exchanging vector-times
and sending operations that you have that the other site does not.
Once you are at the same version, recovery is complete.
Now, there is some complexity introduced by the addition of the server
acknowledgements, but I think this is merely an issue for the client -
it just needs to exclude operations that it has not sent to the server
in its vector-time during the handshaking.
Let me harp on for a minute - it irks me that the white paper alludes
to the proper use of sequence numbers and vector times, but in
reality, even the Google client/server protocol does not take this
approach.  This does not inspire confidence.
The decision to not use the client-generated ID was made for pragmatic
reasons. At the time we decided the overhead of including the
unique-id in all messages sent by a client, broadcast by the server to
all listening clients and including it in persistent storage was a
larger penalty than a comparison on the client. We are considering
adding a unique id back for other reasons (e.g. something analogous to
the history hash used in the federation protocol), so it might well
make a reappearance.

Our client/server protocol is based on the Jupiter paper, and we do
have the concept of a vector-time, however we don't always use the
same language when communicating informally. We have added features to
the basic OT algorithm to make the implementation of a client/server
protocol simpler and/or more performant.

Because we have not finalized the client/server protocol, we have not
published or open sourced it yet. We are working on doing this. We're
happy to discuss the choices we made and adjust the protocol when it
becomes part of the other efforts for a standardization process.

Given the tight constraints on our time, we have prioritized work on
the federation protocol (over the client/server protocol), but we are
also working on open sourcing the client/server protocol. Apologies if
our time constraints are causing frustration, we too would like to
make faster progress.

regards,
Jochen
Daniel Paull
2010-02-01 07:15:13 UTC
Post by Jochen Bekmann
Apologies if our time constraints are causing frustration, we too would like to
make faster progress.
Your time and effort is not my concern at all - never was and never
will be. Correctness is certainly a concern of mine. I don't like
that you are happy to promote a solution that, by your own
acknowledgement, is incorrect under the guise of "being pragmatic".

Your OT algorithms, both the transformation and control algorithms
MUST be correct. If they are not, Wave will fail. By following the
Jupiter approach, you can be assured of correctness, but then you go
and get all "pragmatic" and introduce problems for no good reason.
Post by Jochen Bekmann
The decision to not use the client-generated ID was made for pragmatic
reasons. At the time we decided the overhead of including the
unique-id in all messages sent by a client, broadcast by the server to
all listening clients and including it in persistent storage was a
larger penalty than a comparison on the client.
The penalty for taking your approach was to compromise correctness.
Even the most pragmatic pragmatist would have seen that as a larger
penalty than a bit of network and disk overhead...
Post by Jochen Bekmann
We are considering adding a unique id back for other reasons
Reasons of correctness perhaps?
Post by Jochen Bekmann
We have added features to
the basic OT algorithm to make the implementation of a client/server
protocol simpler and/or more performant.
Are you sure about that? The client seems to be significantly more
complex than it is in the Jupiter system. Sometimes less is more, so
be careful when adding features to something that is simple, elegant
and proven to be correct.

I am still riddled with confusion though - I have stated a number of
times in this forum, and even on my blog
(http://www.thinkbottomup.com.au/site/blog/Google_Wave_Operational_Transform_and_Server_Acknowledgments)
that there is no reason for the server to send the transformed
operation back to the originating host during the broadcast of the
operation. Why do you do this? It is redundant and confusing.
Interestingly, if you did not send the transformed operation back to
the originating client, then you would not be able to do your
comparison of operations. This would have forced you to think through
the problem more carefully.
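The suggestion above can be sketched on the server side as follows: after applying a delta, the server answers the submitting connection with an ACK carrying the new version and sends the delta itself only to the other connections. `Server`, `connect` and `submit` are hypothetical names, the per-connection outboxes stand in for sockets, and transformation against concurrent deltas is omitted.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Server:
    """Hypothetical server loop: ACK the sender, broadcast to everyone else."""
    version: int = 0
    outboxes: Dict[str, List[Tuple]] = field(default_factory=dict)

    def connect(self, client_id: str) -> None:
        self.outboxes[client_id] = []

    def submit(self, sender: str, delta: str) -> None:
        # Transformation against concurrent deltas is elided; `delta`
        # stands for the already-transformed operation.
        self.version += 1
        for client_id, box in self.outboxes.items():
            if client_id == sender:
                # The originator already has the op; it only needs the version.
                box.append(("ack", self.version))
            else:
                box.append(("delta", self.version, delta))
```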

You said previously, "Note that under normal conditions we don't echo
the entire delta back to the client in order to cut down the amount of
data on the network". For what reason do you send anything back to
the client other than the ACK?

So, back to the original question. What do you do after a TCP
connection between the client and server is broken? The answer is
simple - you reopen the wave. As part of opening a wave, the client
and server should merely exchange vector times, indicating which
operations they have. Missing operations are sent from client to
server and vice versa so that client and server arrive at the same
state; then it's business as usual. This strategy works for opening
new waves (initial client vector time is zero), reconnecting after a
network failure, reconnecting after working offline and even when
connecting using a wave that was persisted on the client side. I see
no need for "recovery" to exist as a concept. Less is more once
again.

Having said that, upon reopening a wave there is the added complexity
that the client needs to deal with operations that it had never sent
to the server; I *think* it just needs to exclude them from its vector
time when opening the wave, but I've not proven this yet. That's due
to your additions to make the client/server protocol more "simple".

Cheers,

Dan
Jochen Bekmann
2010-02-01 09:37:41 UTC
Post by Daniel Paull
Apologies if our time constraints are causing frustration, we too would like to
make faster progress.
Your time and effort is not my concern at all - never was and never
will be.  Correctness is certainly a concern of mine.  I don't like
that you are happy to promote a solution that, by your own
acknowledgement, is incorrect under the guise of "being pragmatic".
Your OT algorithms, both the transformation and control algorithms
MUST be correct.  If they are not, Wave will fail.  By following the
Jupiter approach, you can be assured of correctness, but then you go
and get all "pragmatic" and introduce problems for no good reason.
The decision to not use the client-generated ID was made for pragmatic
reasons. At the time we decided the overhead of including the
unique-id in all messages sent by a client, broadcast by the server to
all listening clients and including it in persistent storage was a
larger penalty than a comparison on the client.
The penalty for taking your approach was to compromise correctness.
Even the most pragmatic pragmatist would have seen that as a larger
penalty than a bit of network and disk overhead...
We are considering adding a unique id back for other reasons
Reasons of correctness perhaps?
Sure.
Post by Daniel Paull
I am still riddled with confusion though - I have stated a number of
times in this forum, and even on my blog
(http://www.thinkbottomup.com.au/site/blog/Google_Wave_Operational_Transform_and_Server_Acknowledgments)
that there is no reason for the server to send the transformed
operation back to the originating host during the broadcast of the
operation.  Why do you do this?  It is redundant and confusing.
Interestingly, if you did not send the transformed operation back to
the originating client, then you would not be able to do your
comparison of operations.  This would have forced you to think through
the problem more carefully.
You said previously, "Note that under normal conditions we don't echo
the entire delta back to the client in order to cut down the amount of
data on the network".  For what reason do you send anything back to
the client other than the ACK?
In our current implementation, the server does not normally echo
transformed operations back to a client. Under some failure
conditions, however, this can happen.
Post by Daniel Paull
So, back to the original question.  What do you do after a TCP
connection between the client and server is broken?  The answer is
simple - you reopen the wave.  As part of opening a wave, the client
and server should merely exchange vector times, indicating which
operations they have.  Missing operations are sent from client to
server and vice versa so that client and server arrive at the same
state; then it's business as usual.  This strategy works for opening
new waves (initial client vector time is zero), reconnecting after a
network failure, reconnecting after working offline and even when
connecting using a wave that was persisted on the client side.  I see
no need for "recovery" to exist as a concept.  Less is more once
again.
What you just described is what we call "recovery". We do have a
slightly more complex recovery mechanism to deal with a badly crashed
server losing some state.
Post by Daniel Paull
Having said that, upon reopening a wave there is the added complexity
that the client needs to deal with operations that it had never sent
to the server; I *think* it just needs to exclude them from its vector
time when opening the wave, but I've not proven this yet.  That's due
to your additions to make the client/server protocol more "simple".
On reconnection, the client offers its vector to the server, which
indicates the latest of the client states it recognizes and any ops
the client missed. The client resends any unsubmitted ops, if
necessary first transforming them against any server ops it missed.
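One possible reading of this reconnection step, sketched with explicit ids instead of content comparison. Assumptions (not from the thread): every delta carries the `(site, seq)` of the client instance that produced it, the server returns the deltas the client missed in its own order, and ties between concurrent inserts go to the server. `Delta`, `Insert`, `reconnect` and the other names are hypothetical. The client's own deltas in that list acknowledge its pending deltas; everything else is applied after transforming the pending deltas against it, and whatever is left over is resent.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Insert:
    pos: int
    text: str


@dataclass(frozen=True)
class Delta:
    """Hypothetical delta: an op plus the (site, seq) of the client that made it."""
    site: str
    seq: int
    op: Insert


def transform_pair(client: Insert, server: Insert) -> Tuple[Insert, Insert]:
    """Concurrent inserts; on a tie the server's insert goes first."""
    if server.pos <= client.pos:
        return Insert(client.pos + len(server.text), client.text), server
    return client, Insert(server.pos + len(client.text), server.text)


def apply_op(doc: str, op: Insert) -> str:
    return doc[:op.pos] + op.text + doc[op.pos:]


def reconnect(site: str, doc: str, pending: List[Delta],
              missed: List[Delta]) -> Tuple[str, List[Delta]]:
    """Process server deltas the client missed; return (doc, deltas to resend).
    `doc` already reflects every pending delta."""
    for incoming in missed:
        if incoming.site == site:
            # Our own delta, recognised by id rather than by content: it
            # acknowledges the oldest pending delta, which `doc` already has.
            if not pending or pending[0].seq != incoming.seq:
                raise ValueError(f"unexpected own delta {incoming.seq}")
            pending = pending[1:]
            continue
        op = incoming.op
        rebased: List[Delta] = []
        for d in pending:
            new_op, op = transform_pair(d.op, op)
            rebased.append(Delta(d.site, d.seq, new_op))
        pending = rebased
        doc = apply_op(doc, op)
    return doc, pending
```

Because the site id identifies a client instance rather than a user, an identical insert from the same user's second session (say `site="c2"`) is applied as a separate edit instead of being mistaken for an acknowledgement.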
Post by Daniel Paull
We have added features to
the basic OT algorithm to make the implementation of a client/server
protocol simpler and/or more performant.
Are you sure about that?  The client seems to be significantly more
complex than it is in the Jupiter system.  Sometimes less is more, so
be careful when adding features to something that is simple, elegant
and proven to be correct.
In one or two cases our implementation is simpler, however while
building our solution, there were instances where we chose a less
simple option to make the overall system more performant. We were
fortunate enough to be able to consult with Dixon and Lamping
(http://portal.acm.org/citation.cfm?doid=215585.215706) during this
process. Our current implementation may turn out not to be the best,
and it may evolve to become simpler, maybe with contributions from the
community after open sourcing? Fortunately, the federation protocol
frees implementors to write their own take on the client/server
protocol as well.

regards,
Jochen
Torben Weis
2010-02-01 12:30:43 UTC
Hi Jochen

Post by Jochen Bekmann
In our current implementation, the server does not normally echo
transformed operations back to a client. Under some failure
conditions, however, this can happen.
In FedOne and even WaveSandBox the server will ALWAYS ECHO BACK deltas sent
by a client (either transformed or the original). I see the messages in
QWaveClient debug output every day.

I think we have a very general non-technical problem here. The Googlers are
often talking about software that we do not have access to (at least yet).
Thus, whatever you say does not make much difference currently, because
FedOne and WaveSandBox-federation is all we have and it is all we can see.
This can easily lead to misunderstandings, because things you consider done
are still open issues in our view.

In most wave presentations, Google asks for the help of the community.
However, this turns out to be a bit difficult currently. As an open source
wave developer I have three options:

a) Remain compatible with FedOne. This is troublesome because the C/S protocol
(for example) has some issues. I just have to live with it. Furthermore, I
invest time in implementing a protocol which will eventually be replaced by
something much better, but nobody can predict when.
b) Improve things. After implementing a client and a basic server I have
gained enough experience to address the issues mentioned in point a).
However, these improvements will only turn out to become incompatibilities
as long as Google's implementation will be the reference implementation. My
implementation effort will be lost. If (rather unlikely) some independent
open source implementation will become more popular than a Google-driven
implementation, we would split the community in open-source-compatible and
Google-compatible. Neither outcome is desirable.
c) Wait until Google releases more code. I think we love wave technology too
much to simply sit there for weeks doing no coding.

There is no optimal solution currently. I have spent many years as an active
open source developer (KDE/Qt, KOffice, KHtml/Kfm/Konqueror). One thing I
learned is that stalling an open source effort will either kill it (people
lose interest, go somewhere else) or result in a fork (see Qt<->Harmony and
KDE<->Gnome, KHtml<->WebKit). I feel that the current situation is close to
stalling. FedOne is moving slowly (due to the resources available at Google)
and it is google-owned code. It is not a community owned code base.
QWaveClient and my C++ server are community code, but as long as Google is
the reference, these efforts will eventually stall with FedOne, too (see
point b above).

I think the best we can do is to hope for a quick release of Google's code,
as this will improve the communication and ensure that implementations are
not invalidated because they support some old "standards".

Greetings
Torben
Daniel Paull
2010-02-01 12:31:09 UTC
Post by Jochen Bekmann
In our current implementation, the server does not normally echo
transformed operations back to a client. Under some failure
conditions, however, this can happen.
And herein lies the problem. What you have just said is not clear at
all from the Google OT whitepaper and certainly is not the way the
FedOne client/server works. To expect an audience that is not
familiar with OT to work out for themselves that it is wrong to send
the transformed operation from the server back to the originating
client is, in my opinion, unreasonable. However, it would have taken
very little time and effort to mention it somewhere.

To make matters worse, your previous comments stated that the server
did indeed send the transformed op back to the client under normal
operating conditions (albeit not "the entire delta ... in order to cut
down the amount of data on the network"). So, I am now de-riddled of
confusion. Thank you for the clarification. It's a shame the
clarification didn't come months ago in another thread, but hey,
better late than never, right? I know you're very busy.
Post by Jochen Bekmann
What you just described is what we call "recovery". We do have a
slightly more complex recovery mechanism to deal with a badly crashed
server losing some state.
Why would you call it recovery? It's a normal part of the handshaking
between a client and server when a wave is opened.
Post by Jochen Bekmann
We were
fortunate enough to be able to consult with Dixon and Lamping
(http://portal.acm.org/citation.cfm?doid=215585.215706) during this
process.
Yes, we are all aware that Dixon and Lamping had a hand in what you
have done. May I ask which other OT researchers you consulted in
deciding which path to head down? I know Sun has given talks to
Google - have you considered his work?
Post by Jochen Bekmann
Fortunately, the federation protocol
frees implementors to write their own take on the client/server
protocol as well.
I'm not convinced that this is entirely true. I think that the
federation protocol forces your hand quite a bit. Perhaps we have
different ideas about what freedoms you are talking about.

Cheers,

Dan
Daniel Paull
2010-02-08 00:13:28 UTC
Hello Jochen,

I was hoping that you would address some of the questions/issues
Post by Daniel Paull
Yes, we are all aware that Dixon and Lamping had a hand in what you
have done.  May I ask which other OT researchers you consulted in
deciding which path to head down?  I know Sun has given talks to
Google - have you considered his work?
Are you able to answer this?

Cheers,

Dan
Daniel Paull
2010-01-20 01:32:17 UTC
Hi Torben,

Someone correct me if I am wrong, but the FedOne server does not send
acknowledgements to the client as described in the Operational
Transformation whitepaper. I believe this is part of the puzzle that
you are missing.

Furthermore, I can not see any reason why the server sends the
transformed operation back to the originating client when it
broadcasts the operation - that just confuses things. For example, it
was suggested that "you can compare the [transformed] docops
coming back down for equality"; this is just asking for false positive
matches. The server acknowledgement is the right mechanism to use
here.
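A sketch of the client side of the acknowledgement scheme referred to above, as we understand the OT whitepaper: the client keeps at most one delta in flight and buffers local edits until the server's ACK arrives, so it never has to guess which incoming delta is its own. `AckingClient` and its methods are hypothetical, ops are placeholder strings, and transforming the in-flight and buffered ops against incoming server deltas is omitted.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class AckingClient:
    """Hypothetical client: at most one delta in flight; local edits made
    while waiting are buffered and sent together once the ACK arrives."""
    in_flight: Optional[List[str]] = None
    buffer: List[str] = field(default_factory=list)
    sent: List[List[str]] = field(default_factory=list)  # stands in for the socket

    def local_edit(self, op: str) -> None:
        if self.in_flight is None:
            self._send([op])
        else:
            self.buffer.append(op)

    def on_ack(self) -> None:
        """Server confirmed the in-flight delta; flush the buffer, if any."""
        if self.in_flight is None:
            raise RuntimeError("ACK with nothing in flight")
        self.in_flight = None
        if self.buffer:
            ops, self.buffer = self.buffer, []
            self._send(ops)

    def _send(self, ops: List[str]) -> None:
        self.in_flight = ops
        self.sent.append(ops)
```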

You may like to read this thread:

http://groups.google.com/group/wave-protocol/browse_frm/thread/acca2ac22caeb04e/93d1b3e222679a27

And maybe look at my blog post:

http://www.thinkbottomup.com.au/site/blog/Google_Wave_Operational_Transform_and_Server_Acknowledgments

Your question is timely as there has been some discussion on defining
a client/server protocol. It is my opinion that this group does not
understand the client-side OT algorithms well enough to define the
protocol. This is illustrated here by suggestions to add client IDs
(which I believe are not needed) to messages.

This opinion was stated here:

http://groups.google.com/group/wave-protocol/browse_frm/thread/184b66dfb3b9a24d/946cf09e5069d0f4#946cf09e5069d0f4

Someone in this community needs to spend some time understanding the
wave client and fixing FedOne. Newcomers to this forum seem to be
taking FedOne to be a reference implementation of both a wave client
and server - which it is neither.

Cheers,

Dan
Post by Torben Weis
Hi,
I have been pondering on how to support offline-working in QWaveClient.
It is getting tricky when the TCP connection breaks accidentally.
Let's imagine the client sent a delta to the server, the server processes it,
but the client could not hear it because the connection is broken.
Eventually the client will reconnect. Now it does not have a chance of seeing
whether its last delta has reached the server or not.
In general it seems next to impossible to tell whether any given server delta
corresponds to a client delta, because deltas are transformed and do not carry IDs.
Currently I see no quick solution.
a) Introduce IDs in deltas. Unlikely because that would require Google to change
   its code heavily? Nevertheless, it would be nice because currently QWaveClient
   uses a most awful hack to determine whether its delta has been processed by
   the server ...
b) When connecting to FedOne, the server should ask for a client ID.
   For each client ID & wavelet ID it keeps a persistent record about the
   version of the last submitted delta.
   Upon connect the client can query FedOne for this version information.
   This would mean no modifications to the federation protocol, but it would
   require some FedOne extensions.
Any other ideas, suggestions?
According to the wave specs a client must have only one outstanding delta.
Thus, it is important to find out whether a delta has been accepted or not to
determine when the next delta can be sent.
To solve this, QWaveClient waits until it receives a delta from the server which
is authored by its own user, assuming that this is the server response to the
delta QWaveClient has submitted itself.
This hack can break if the same user connects with two QWaveClient instances and
concurrently types in both instances (ok, unlikely, but still ...).
Did I miss a trick of getting this right?
Cheers
Torben
James Purser
2010-01-20 01:46:47 UTC
Post by Daniel Paull
Someone in this community needs to spend some time understanding the
wave client and fixing FedOne. Newcomers to this forum seem to be
taking FedOne to be a reference implementation of both a wave client
and server, which it is neither.
Actually I'm going to have to disagree with you here, with reference to
the server at least.

FedOne is the most feature complete server available to those of us
working on the wave. It is what people use to test their own
implementations of the federation protocol (there is at least one
project I know that has done this), as well as explore the concept of
remote hosted agents. In this manner it IS the Reference Server for
federation and Agents.

While it may not be completely compliant with the federation protocols,
it is the closest that we have at the moment and is being improved with
every code drop.
--
James Purser
http://wavingtheshiny.collaborynth.com.au
Wave Addresses:
jamesrpurser-***@public.gmane.org (wave.google.com)
purserj-xBucRuPHYkk9V/L/***@public.gmane.org (wavesandbox.com)
james-ZGY8ohtN/8plGI6Z+***@public.gmane.org (collaborynth.com.au FedOne Server)
Skype: purserj1977
GTalk: jamesrpurser-***@public.gmane.org
Daniel Paull
2010-01-20 11:23:22 UTC
Post by James Purser
While it may not be completely compliant with the federation protocols
You seem to be agreeing that it is NOT a reference implementation of a
Wave server...

I really don't understand why so much effort is going into FedOne
federation when the FedOne client is so broken. How the heck do you
test your federated server for correctness?

Dan
Brett Morgan
2010-01-20 11:34:53 UTC
I think probey is part of the answer to the "how does one test FedOne
federation."

I think another part of the answer is that the FedOne client isn't broken
per se, it's just not the GWT shell that everyone associates with wave.

I think a bigger question is, who else is taking the time to understand OT
enough to be able to implement full client/server systems that do transforms
on both the client and the server? I'd love to have people to chat with
about what I'm doing over here... =)
--
Brett Morgan http://domesticmouse.livejournal.com/