Change GVF return data
Right now, GVF is putting the likeliest gesture, likelihoods, and alignments into one flat vector. This is a potentially confusing design that could be improved. Either all of the machine learning algorithms should return a struct (issue #15), or GVF should return the likeliest gesture and the other data could be retrieved with get() functions.
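To make the two proposals concrete, here is a minimal sketch of what each could look like. All names here (`GvfResult`, `GvfLike`, the getters) are hypothetical, not the actual GVF API; the inference step is stubbed out.

```cpp
#include <vector>

// Option A: run() returns a struct with named members.
struct GvfResult {
    int likeliestGesture;
    std::vector<double> likelihoods;
    std::vector<double> alignments;
};

// Option B: run() returns only the likeliest gesture; the rest is
// retrieved through getters that read the model's last-run state.
class GvfLike {
public:
    int run(const std::vector<double>& input) {
        // ...inference would happen here; stub values for illustration.
        likelihoods = {0.7, 0.2, 0.1};
        alignments  = {0.5, 0.1, 0.0};
        return 0; // index of the likeliest gesture
    }
    std::vector<double> getLikelihoods() const { return likelihoods; }
    std::vector<double> getAlignments()  const { return alignments; }
private:
    std::vector<double> likelihoods;
    std::vector<double> alignments;
};
```

With Option B, user code never indexes into a flat layout, so it cannot break when the number of gestures changes.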
Activity
- Michael Zbyszyński added Ready label
- Michael Zbyszyński assigned to @francisco
- Michael Zbyszyński changed the description
- Owner
The design of HMM classification is similar: the returned vector interleaves likelihoods and time progressions, like this: likelihood1, progression1, ..., likelihoodN, progressionN.
HMM regression could also return the same data plus the regression values, but it would be a flattened vector too, which is not very intuitive.
If the complexity overhead (in terms of CPU) is negligible, a bunch of get methods would be a nice design, probably cleaner than returning a generic struct. If we go this way I think I would change the design of HMM classification and have it return only a vector of likelihoods, or the likeliest label / index. - Joseph Larralde mentioned in issue #15
- Author Owner
I noticed that HMM is doing this too. I think that interleaved values are a little easier because the user code won't break if the number of training examples changes. But it is less "self documenting" than a clear struct or explicit getters.
- Owner
Cross-posting from community forum
Here's what I suggested there; it seems to me it covers the spectrum of use cases. There doesn't seem to be consensus around the flat vector.
- One value, the most usual option (basic user)
- Flat vector with everything (basic; similar to providing getString() in an interface in Java)
- Specialised access through specialised getters (advanced user)
Edited by Francisco Bernardo - Author Owner
I like options 1 and 3 on your list. We can combine them.
My arguments against flat vectors are that they are not self-documenting, and could break if the data change. For example, if I wanted to set some synth parameters based on the likelihood and alignment of the first gesture, I might do this:
```cpp
auto gvfReturn = myGVF.run(inputVector);
synthParam1 = gvfReturn[1];
synthParam2 = gvfReturn[?????]; // What value do you use here?
```
Let's say I know I have five examples. Then I could write:
```cpp
synthParam2 = gvfReturn[6];
```
But, if I add another example, then my code is broken. Also, a developer looking at this code doesn't have any way of knowing what gvfReturn[1] and gvfReturn[6] are without consulting the documentation. So, this is very hard to read and very hard to debug.
This code takes care of itself:
```cpp
myGVF.run(inputVector);
synthParam1 = myGVF.getLikelihood(0);
synthParam2 = myGVF.getAlignment(0);
```
I can read what they are, and it will work if the number of examples changes. The struct variant is also good:
```cpp
auto returnStruct = myGVF.run(inputVector);
synthParam1 = returnStruct.likelihoods[0];
synthParam2 = returnStruct.alignments[0];
```
I remember us discussing this, and there was an article like "Why Google prefers flat vectors" or something similar? I'm sure the flat vectors win on efficiency, but not on usability.
- Author Owner
XMM interleaves its data like this:
likelihood1, progression1, likelihood2, progression2... likelihoodN, progressionN
This works a bit better because code like this:
```cpp
int importantExample = 6;
vector<T> xmmReturn = myXmm.run(inputVector);
importantLikelihood = xmmReturn[2 * importantExample];
importantProgression = xmmReturn[2 * importantExample + 1];
```
...should always return a pair of points from the same example. I think XMM might do some ordering of examples that is surprising, though.
- Owner
I understand your argument and agree with pretty much everything. Of course, using flat vectors rests on some assumptions:
- there is a convention for the order of the elements of the vector (just like the interleaved organisation provided by XMM)
- the user needs to consult the documentation to learn about that convention
Yes, flat vectors aren't self-documenting. But at the end of the day, you will also need to provide documentation about, e.g., what Alignment in your getter means, how it differs from Likelihood, or how Likelihood differs from Cost in some other model class.
If I were a beginning user, I would probably want to see the "whole of the cake" that a model outputs (i.e., the complete frame of values), just as I would want to see the whole frame a sensor outputs before selecting the specific features that interest me.
Edited by Francisco Bernardo
- Author Owner
OK, so "self-documenting" is a positive API feature.
I think I would describe the "whole of the cake" issue as "discovery" or "discoverability" -- which is also something that we need to take seriously. The whole run() discussion kicked off again because Xcode was autocompleting a run() method for dtw that doesn't actually work. So, this is a kind of "false discovery" -- a kind of anti-feature.
This wraps back around to #15. I like get() methods because they are discoverable through IDEs, doxygen documentation, etc. I worry that a master struct with members for everything, only populated on a per-model basis, might also be false discovery. For instance, if users get back a struct like:
```cpp
returnStruct = { thing1 = { 1, 2, 3 }, thing2 = NULL, thing3 = NULL }
```
How do they know whether thing2 is not populated because 1) it isn't relevant to this model, 2) it could be calculated with a subsequent function call, or 3) there is a bug?
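One way C++ could make that distinction explicit is std::optional (C++17): an absent member then reads as "this model deliberately does not produce this value" rather than a possible bug. A minimal sketch, with hypothetical names:

```cpp
#include <optional>
#include <vector>

struct ModelResult {
    std::vector<double> likelihoods;               // always present
    std::optional<std::vector<double>> alignments; // only for models that align
};

// A classifier that has no notion of alignment returns std::nullopt,
// so the absence is part of the type, not a silent empty value.
ModelResult classifyOnly() {
    return { {0.6, 0.4}, std::nullopt };
}
```

This only resolves ambiguity 1) vs 3), though; a value that "could be calculated with a subsequent function call" would still need documentation or a separate getter.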
Edited by Michael Zbyszyński - Owner
Yep, "self-documenting" is a positive API feature, and is recommended in API design. See https://people.mpi-inf.mpg.de/~jblanche/api-design.pdf, section 4.1, Naming.
Sorry, now I'm not following the wrap around. I thought this whole discussion was heading away from the master struct concept. I think that design is problematic.
Edited by Francisco Bernardo - Author Owner
- Author Owner
I added a bunch of get() methods to GVF. I did not change the run method. I think this is good enough for now?
- Michael Zbyszyński added In progress and removed Ready labels
- Owner
Great, thanks Mike
- Michael Zbyszyński closed
- Michael Zbyszyński removed In progress label