Have run() return a struct
@Joseph has added a struct to machinelearning.h that could hold many types of possible algorithm outputs. How far should we integrate this into the whole API?
#include <string>
#include <vector>

/** @brief A generic output struct to fit all kinds of models */
typedef struct runResults_t {
    std::vector<double> likelihoods;  // per-class likelihoods (classifiers)
    std::vector<double> regression;   // regression output parameters
    std::vector<double> progressions; // temporal progress estimates (e.g. hmm)
    std::string likeliest;            // label of the most likely class
} runResults;
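For context, a hypothetical call site if run() filled this struct; `model` and `input` are placeholders, not part of the current API:

#include <iostream>

runResults out = model.run(input);
std::cout << "likeliest: " << out.likeliest << "\n";
for (size_t i = 0; i < out.likelihoods.size(); ++i) {
    std::cout << "class " << i << " likelihood: " << out.likelihoods[i] << "\n";
}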
Activity
- Author Owner
I think GVF could have more types of outputs. @francisco?
- Author Owner
DTW calculates costs, not likelihoods. They have similar meanings, so I wonder what the best term here is.
It is possible that some of these potential outputs could incur additional calculation that would be inefficient if the user doesn't need that output. For example, to get likelihoods from DTW the model would need to keep track of the most likely match for each label. It doesn't do that by default; it is slightly more efficient to just find the lowest cost overall and then look up the label associated with that series.
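One way to keep that bookkeeping opt-in, as a sketch (the config struct and flag name here are hypothetical):

// Hypothetical flag: per-label cost tracking stays off by default,
// so the common path is still a single lowest-cost lookup.
struct dtwConfig {
    bool trackCostsPerLabel = false;
};
// inside run(): only record the best cost per label when the flag is set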
- Michael Zbyszyński mentioned in issue #14 (closed)
- Owner
Yes, GVF has additional parameters on the outcomes structure.
int likeliestGesture;
vector<float> likelihoods;
vector<float> alignments;
vector<vector<float> > dynamics;
vector<vector<float> > scalings;
vector<vector<float> > rotations;
There is an obvious common set of fields too.
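Sketched with inheritance, purely for illustration: a trimmed base holding the shared fields, plus a GVF extension (the struct names are made up here):

#include <vector>

// Shared fields every classifier could fill
struct classifierResults {
    int likeliestGesture;
    std::vector<float> likelihoods;
};

// GVF-specific additions layered on top
struct gvfResults : classifierResults {
    std::vector<float> alignments;
    std::vector<std::vector<float> > dynamics;
    std::vector<std::vector<float> > scalings;
    std::vector<std::vector<float> > rotations;
};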
Edited by Francisco Bernardo
- Owner
I think we can pretty much drop everything we need into this struct while preserving backwards compatibility.
The solution I would choose to avoid undesired additional calculation costs would be to pass a config struct (or class) to the machineLearning instance. The defaults could be the least complex set of parameter values, and could be easily overridden.
- Owner
Developing the idea of passing a config class instance: for the moment I use a simple struct for the XMM wrapper, but we might want to be able to use polymorphism (static or dynamic) and have a generic template or base class for the configuration data, for example:
template <class C>
class machineLearningConfig {
    // do stuff
};

template <class C>
class machineLearning {
private:
    machineLearningConfig<C> config;
public:
    machineLearning() {}
    machineLearning(machineLearningConfig<C>) {
        // do stuff
    }
    // ...
    void updateConfig(machineLearningConfig<C>);
};
Then write some specialized templates and typedef them (I didn't test this but you get the idea). Or, we could use dynamic polymorphism to achieve a similar thing.
There might be some drawbacks which I didn't think of when throwing this idea out, though...
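To make the specialization idea above concrete, a sketch under the same untested assumptions (xmmParams and the `data` member are made up here):

// Hypothetical payload carrying XMM-specific settings
struct xmmParams {
    int numGaussians = 1;
};

// Assuming machineLearningConfig<C> exposes its payload as `data`,
// specialize the generic types and give them friendly names:
typedef machineLearningConfig<xmmParams> xmmConfig;
typedef machineLearning<xmmParams> xmmModel;

// Usage: take the defaults, override only what you need
// xmmConfig cfg;
// cfg.data.numGaussians = 3;
// xmmModel model(cfg);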
- Michael Zbyszyński added In progress label
- Author Owner
@JFrink @jfrin001 Just brought this up again.
- Michael Zbyszyński added Ready and removed In progress labels
- Developer
Or, we could use dynamic polymorphism to achieve a similar thing.
This is what I was thinking of as well. Inheriting from a base class and adding to / overriding it could be a way, or as I stated in this thread, this method could also work.
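For comparison, the dynamic-polymorphism flavour might look roughly like this (all names here are hypothetical, not the current API):

// Hypothetical base: wrappers accept a config through a common interface
class baseConfig {
public:
    virtual ~baseConfig() = default;
};

class xmmConfig : public baseConfig {
public:
    int numGaussians = 1;
};

class baseModel {
public:
    virtual ~baseModel() = default;
    // Each wrapper downcasts to the config type it expects
    virtual void updateConfig(const baseConfig &config) = 0;
};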
Edited by James Frink
- Michael Zbyszyński mentioned in issue #49 (closed)
- Author Owner
Looking at the actual implementations again, I realize that this is something that does need some attention. The current API is a bit confusing.
I think all the regression types seem fine: they return a vector of parameters. But we can do better with the classifiers.
- knn is pretty simple: it just returns a class. That can be a string or a numeric label.
- dtw is potentially more complicated; it has:
std::string label;    // label of best match
std::vector<T> costs; // The costs to match to each example
label could be the index of the best matching series. Users might also want the lowest cost per label (e.g. closest circle, closest triangle, etc.). There are some other things, like distances between labels and length statistics for examples. I think those should definitely be handled by some algorithm-specific function other than run() (see the sketch after this list).
- gmm returns a vector of likelihoods
- hmm returns a vector with likelihoods and something called progressions?
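To sketch what those algorithm-specific getters could look like for dtw (hypothetical names, not the current API):

#include <string>
#include <vector>

class dtw {
public:
    // Default path: run() only returns the best-matching label
    std::string run(const std::vector<double> &input);
    // Opt-in extras, kept out of the default path
    std::vector<double> getCosts() const;              // cost to match each example
    double getLowestCost(const std::string &label) const; // e.g. closest circle
};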
- Owner
Just commented on this in #49 (closed).
Indeed, the result of hmm classification is an interleaved vector of likelihoods / time progressions (normalized estimated position of the "cursor" for each gesture).
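For illustration, de-interleaving that output (assuming likelihood-then-progression order per gesture):

// out = { like0, prog0, like1, prog1, ... } per the description above
std::vector<double> likelihoods, progressions;
for (size_t i = 0; i + 1 < out.size(); i += 2) {
    likelihoods.push_back(out[i]);
    progressions.push_back(out[i + 1]);
}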
I like the idea of having specialized getters for complex data, as mentioned in #49 (closed). We just need to take care of what the default returned data is on each call to run(). For example, a vector of likelihoods is good but not sufficient in some cases: if I train the model with a new training set containing a new label, xmm will output the likelihoods based on the alphabetical order of the labels. That requires the user to get the vector of labels in order to know which likelihood corresponds to which label, each time the labels change in the training set.
- Author Owner
We could have run() return true if it worked, and false otherwise. Then everything else is getX(). That sounds crazy, but that's how GRT works. Here's that GMM in action:
bool predictSuccess = gmm.predict( inputVector );

if( !predictSuccess ){
    cout << "Failed to perform prediction for test sample: " << i << "\n";
    return EXIT_FAILURE;
}

// Get the predicted class label
UINT predictedClassLabel = gmm.getPredictedClassLabel();
VectorFloat classLikelihoods = gmm.getClassLikelihoods();
VectorFloat classDistances = gmm.getClassDistances();
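Translated to our API, and folding in the label-ordering issue above, it might look like this (all getter names hypothetical):

if (!model.run(inputVector)) {
    // handle the failed run
}
std::string likeliest = model.getLikeliest();
std::vector<std::string> labels = model.getLabels();       // e.g. alphabetical order
std::vector<double> likelihoods = model.getLikelihoods();  // same order as labels
// labels[i] pairs with likelihoods[i], so a changed training set stays unambiguous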
- Owner
Sounds good to me. No ambiguity, solves our problems ...
What are the drawbacks, in your opinion?
- Developer
Here's that GMM in action
This looks good to me. However, not all ML models have the same functions, which makes specialization even more undefined if the machineLearning class just appended these getters to any ML model. One way to solve this might be to make these functions throw an error until they are overridden, stating that the "modelName" does not have X getter?
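A minimal sketch of that idea (the base class and error wording are placeholders):

#include <stdexcept>
#include <string>
#include <vector>

class baseModel {
public:
    virtual ~baseModel() = default;
    virtual std::string modelName() const = 0;
    // Getters throw until a model that supports them overrides them
    virtual std::vector<double> getLikelihoods() const {
        throw std::logic_error(modelName() + " does not have a likelihoods getter");
    }
};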
- Michael Zbyszyński mentioned in issue #44