Change GVF return data
Right now, GVF is putting the likeliest gesture, likelihoods, and alignments into one flat vector. This is a potentially confusing design that could be improved. Either all of the machine learning algorithms should return a struct (issue #15), or GVF should return the likeliest gesture and the other data could be retrieved with get() functions.
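To make the two proposals concrete, here is a minimal sketch of what each could look like. All names here (`GvfResult`, `GvfLike`, the getters) are hypothetical, not the actual GVF API; the inference step is stubbed out.

```cpp
#include <vector>

// Option A: run() returns a struct with named members.
struct GvfResult {
    int likeliestGesture;
    std::vector<double> likelihoods;
    std::vector<double> alignments;
};

// Option B: run() returns only the likeliest gesture; the rest is
// retrieved through getters that read the model's last-run state.
class GvfLike {
public:
    int run(const std::vector<double>& input) {
        // ...inference would happen here; stub values for illustration.
        likelihoods = {0.7, 0.2, 0.1};
        alignments  = {0.5, 0.1, 0.0};
        return 0; // index of the likeliest gesture
    }
    std::vector<double> getLikelihoods() const { return likelihoods; }
    std::vector<double> getAlignments()  const { return alignments; }
private:
    std::vector<double> likelihoods;
    std::vector<double> alignments;
};
```

With Option B, user code never indexes into a flat layout, so it cannot break when the number of gestures changes.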
Activity
- Michael Zbyszyński added Ready label
- Michael Zbyszyński assigned to @francisco
- Michael Zbyszyński changed the description
- Owner
The design of HMM classification is similar: the returned vector interleaves likelihoods and time progressions, like this: likelihood1, progression1, ..., likelihoodN, progressionN.
HMM regression could also return the same data plus the regression values, but it would be a flattened vector too, which is not very intuitive.
If the complexity overhead (in terms of CPU) is negligible, a bunch of get methods would be a nice design, probably cleaner than returning a generic struct. If we go this way I think I would change the design of HMM classification and have it return only a vector of likelihoods, or the likeliest label / index. - Joseph Larralde mentioned in issue #15
- Author Owner
I noticed that HMM is doing this too. I think that interleaved values are a little easier because the user code won't break if the number of training examples changes. But it is less "self documenting" than a clear struct or explicit getters.
- Owner
Cross-posting from community forum
Here's what I suggested there; it seems to me it covers the spectrum of use cases. There doesn't seem to be consensus around the flat vector.
- One value, the most usual option (basic user)
- Flat vector with everything (basic; similar to providing getString() in an interface in Java)
- Specialised access through specialised getters (advanced user)
Edited by Francisco Bernardo - Author Owner
I like options 1 and 3 on your list. We can combine them.
My arguments against flat vectors are that they are not self-documenting, and could break if the data change. For example, if I wanted to set some synth parameters based on the likelihood and alignment of the first gesture, I might do this:
```cpp
auto gvfReturn = myGVF.run(inputVector);
synthParam1 = gvfReturn[1];
synthParam2 = gvfReturn[?????]; // What value do you use here?
```
Let's say I know I have five examples. Then I could write:
```cpp
synthParam2 = gvfReturn[6];
```
But, if I add another example, then my code is broken. Also, a developer looking at this code doesn't have any way of knowing what gvfReturn[1] and gvfReturn[6] are without consulting the documentation. So, this is very hard to read and very hard to debug.
This code takes care of itself:
```cpp
myGVF.run(inputVector);
synthParam1 = myGVF.getLikelihood(0);
synthParam2 = myGVF.getAlignment(0);
```
I can read what they are, and it will work if the number of examples changes. The struct variant is also good:
```cpp
auto returnStruct = myGVF.run(inputVector);
synthParam1 = returnStruct.likelihoods[0];
synthParam2 = returnStruct.alignments[0];
```
I remember us discussing this, and there was an article like "Why Google prefers flat vectors" or something similar? I'm sure the flat vectors win on efficiency, but not on usability.
- Author Owner
XMM interleaves its data like this:
likelihood1, progression1, likelihood2, progression2... likelihoodN, progressionN
This works a bit better because code like this:
```cpp
int importantExample = 6;
vector<T> xmmReturn = myXmm.run(inputVector);
importantLikelihood = xmmReturn[2 * importantExample];
importantProgression = xmmReturn[2 * importantExample + 1];
```
...should always return a pair of points from the same example. I think XMM might do some ordering of examples that is surprising, though.
- Owner
I understand your argument and agree with pretty much everything. Of course, using flat vectors rests on some assumptions:
- there is a convention for the order of the elements of the vector (just like the interleaved organisation provided by XMM)
- the user needs to consult the documentation to learn about that convention
Yes, flat vectors aren't self-documenting. But at the end of the day, you will also need to provide documentation about, e.g., what Alignment in your getter means, how it differs from Likelihood, or how Likelihood differs from Cost in some other model class.
If I were a beginning user, I would probably want to see the "whole of the cake" that a model outputs (i.e., the complete frame of values), just as I would want to see the whole frame a sensor outputs before selecting the specific features that interest me.
Edited by Francisco Bernardo
- Author Owner
OK, so "self-documenting" is a positive API feature.
I think I would describe the "whole of the cake" issue as "discovery" or "discoverability" -- which is also something that we need to take seriously. The whole run() discussion kicked off again because Xcode was autocompleting a run() method for dtw that doesn't actually work. So, this is a kind of "false discovery" -- a kind of anti-feature.
This wraps back around to #15. I like get() methods because they are discoverable through IDEs, doxygen documentation, etc. I worry that a master struct with members for everything, only populated on a per-model basis, might also be false discovery. For instance, if users get back a struct like:
```cpp
returnStruct = { thing1 = { 1, 2, 3 }, thing2 = NULL, thing3 = NULL }
```
How do they know whether thing2 is not populated because 1) it isn't relevant to this model, 2) it could be calculated with a subsequent function call, or 3) there is a bug?
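One way C++ could make that distinction explicit is std::optional (C++17): an absent member then reads as "this model deliberately does not produce this value" rather than a possible bug. A minimal sketch, with hypothetical names:

```cpp
#include <optional>
#include <vector>

struct ModelResult {
    std::vector<double> likelihoods;               // always present
    std::optional<std::vector<double>> alignments; // only for models that align
};

// A classifier that has no notion of alignment returns std::nullopt,
// so the absence is part of the type, not a silent empty value.
ModelResult classifyOnly() {
    return { {0.6, 0.4}, std::nullopt };
}
```

This only resolves ambiguity 1) vs 3), though; a value that "could be calculated with a subsequent function call" would still need documentation or a separate getter.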
Edited by Michael Zbyszyński - Owner
Yep, "self-documenting" is a positive API feature, and is recommended in API design. See https://people.mpi-inf.mpg.de/~jblanche/api-design.pdf, section 4.1, Naming.
Sorry, now I'm not following the wrap around. I thought this whole discussion was heading away from the master struct concept. I think that design is problematic.
Edited by Francisco Bernardo - Author Owner
- Author Owner
I added a bunch of get() methods to GVF. I did not change the run method. I think this is good enough for now?
- Michael Zbyszyński added In progress and removed Ready labels
- Owner
Great, thanks Mike
- Michael Zbyszyński closed
- Michael Zbyszyński removed In progress label