User finds functions that won't work
There are multiple unspecialized run methods in machinelearning.h that are potentially confusing to users.
This could be fixed by specialization, or by moving to a different wrapper based on composition.
Activity
- Owner
FWIW, I started writing a machine learning class factory a few weeks ago, as a proof of concept. It instantiates classes derived from a MachineLearningAlgorithm base class, based on a string identifier.
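For readers who have not opened the zip, a string-keyed factory of this kind could look roughly like the sketch below. This is not the code from MachineLearningFactoryTest.zip: the `name()` method, the `Classification` and `Regression` stubs, the `registerAlgorithm`/`create` functions and the `makeDefaultFactory` helper are all hypothetical names chosen for illustration, and only the `MachineLearningAlgorithm` base class name comes from the comment above.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Hypothetical base class; the real interface may look different.
class MachineLearningAlgorithm {
public:
    virtual ~MachineLearningAlgorithm() = default;
    virtual std::string name() const = 0;
};

// Two placeholder algorithms (assumed for the example, no real ML inside).
class Classification : public MachineLearningAlgorithm {
public:
    std::string name() const override { return "classification"; }
};

class Regression : public MachineLearningAlgorithm {
public:
    std::string name() const override { return "regression"; }
};

// Maps a string identifier to a function that builds the matching class.
class MachineLearningFactory {
public:
    using Creator = std::function<std::unique_ptr<MachineLearningAlgorithm>()>;

    void registerAlgorithm(const std::string& id, Creator creator) {
        creators_[id] = std::move(creator);
    }

    // Returns nullptr when the identifier is unknown.
    std::unique_ptr<MachineLearningAlgorithm> create(const std::string& id) const {
        auto it = creators_.find(id);
        if (it == creators_.end()) return nullptr;
        return it->second();
    }

private:
    std::map<std::string, Creator> creators_;
};

// Convenience helper that registers the two placeholder algorithms.
MachineLearningFactory makeDefaultFactory() {
    MachineLearningFactory factory;
    factory.registerAlgorithm("classification",
                              [] { return std::make_unique<Classification>(); });
    factory.registerAlgorithm("regression",
                              [] { return std::make_unique<Regression>(); });
    return factory;
}

// Small usage check: a known identifier yields the matching class.
void factoryDemo() {
    auto algorithm = makeDefaultFactory().create("regression");
    assert(algorithm && algorithm->name() == "regression");
}
```

Adding an algorithm then only requires one more `registerAlgorithm` call, which is part of the appeal, but the caller still only sees the base class interface.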
It's just a draft / mockup, but I can share a zip after I clean it up a bit, if you want to have a look. Anyway, this doesn't solve all our problems; in particular, it doesn't solve the specialization problem (I'm still convinced that using a runResults struct would be a nice solution) ... and I didn't find a better way to manage training configurations either ... But the factory works well, so we might want to reuse it. - Owner
Here you go.
MachineLearningFactoryTest.zip
Unzip, cd MachineLearningFactoryTest, make, ./all
This should return 3.
In these test files there is also a mockup of an API for Example and TrainingSet classes (mostly here because I tried a lot of new things together on my side and left them all together). I still haven't looked at the phraseToExample branch, but I'll stick to it ... - Author Owner
@jfrin001 fyi
- Joseph Larralde mentioned in issue #40
- Michael Zbyszyński mentioned in issue #46
- Developer
The factory method seems nice to me.
runResults struct would be a nice solution
I'm not sure this is the same thing I'm thinking of. But I wanted to mention that it would be nice to access some results if they were generated on every run call anyway. I can also imagine that the factory method restricts the possibility of specialization at compile time.
The dynamic cast would make this ugly, but imagine the code in this SO post with different classes/structs for different result types. This would negate having a lowest common denominator of shared variables in such a result struct and could be returned with a shared pointer. The user would however have to specifically state which type they are expecting in the get template.
I also understood that it might be hard to add extra algorithms to Rapid-Lib if this is implemented, as it would require writing a result wrapper for each added algorithm.
So, just an idea.
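To make the idea above concrete, here is a minimal sketch of result types returned through a shared pointer and read back with a typed `get` template. It is not the code from the SO post and not RapidLib code: `RunResult`, `TypedResult`, `RegressionResults`, `ClassificationResults` and `runRegressionStub` are hypothetical names, and the stub returns a fixed value instead of running a model.

```cpp
#include <cassert>
#include <memory>
#include <typeinfo>
#include <utility>
#include <vector>

template <typename T> class TypedResult;

// Common base type, so every algorithm can return the same pointer type.
class RunResult {
public:
    virtual ~RunResult() = default;

    // The caller states the expected type; a mismatch throws std::bad_cast.
    template <typename T>
    const T& get() const {
        auto* typed = dynamic_cast<const TypedResult<T>*>(this);
        if (!typed) throw std::bad_cast();
        return typed->value;
    }
};

// Wrapper holding one concrete result type.
template <typename T>
class TypedResult : public RunResult {
public:
    explicit TypedResult(T v) : value(std::move(v)) {}
    T value;
};

// Hypothetical per-algorithm result structs with no shared fields.
struct RegressionResults { std::vector<double> values; };
struct ClassificationResults { int label; };

// Stand-in for a run call; returns a fixed value rather than a prediction.
std::shared_ptr<RunResult> runRegressionStub() {
    return std::make_shared<TypedResult<RegressionResults>>(
        RegressionResults{{0.5}});
}

// Small usage check: the matching type is readable, the wrong one throws.
void resultDemo() {
    auto result = runRegressionStub();
    assert(result->get<RegressionResults>().values.size() == 1);

    bool threw = false;
    try {
        result->get<ClassificationResults>();
    } catch (const std::bad_cast&) {
        threw = true;
    }
    assert(threw);
}
```

The cost mentioned above is visible here: each new algorithm needs its own result struct, and a wrong type in `get` is only caught at run time.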
- James Frink mentioned in issue #15
- Owner
Thanks James for this valuable insight.
I really like the "pattern" in the post you link to.
Not sure how to apply it best in our case ...
Something like myMachineLearningAlgorithm.run<myRegressionResults>(myVectorInput)? That would be an option.
More thinking needed ... - Author Owner
It's sort of like this: http://gitlab.doc.gold.ac.uk/rapid-mix/RapidLib/blob/master/src/classification.cpp#L99
Strange that I put a comment on it.
- Owner
Ok, I finally managed to get my eyes onto this thread and code bits.
@Joseph, I see you created an ML algorithms factory, which is implemented as a singleton and keeps ML models in a hashmap. But what are you proposing: using this as a replacement for ModelSet? Or is this another utility class for model management for the user?
I also understood that it might be hard to add extra algorithms to Rapid-Lib if this is implemented, as it would require writing a result wrapper for each added algorithm.
Yep @jfrin001, extensibility is something that we should guarantee in our design. But it seems to me that writing a result wrapper for each newly added algorithm is a problem for the library maintainer(s) or algorithm integrator (i.e., experts), while it brings benefits to the user.
I was just looking at the SO code you pointed out. I don't have a deep knowledge of templates, but from what I understand, this seems to solve the specialisation problem and bypass the need for the master struct. Is that what you're saying about the common denominator?
@MikeZ, am I missing something... or does it also actually implement a generic variable-size vector AND allow specialised getters with it? Couldn't we use type aliases to make it more readable?
MachineLearningAlgorithm mla;
Output<PredictedClassLabel> o1(2);
mla.outputs.push_back(&o1);
Output<Likelihood> o2(0.23);
mla.outputs.push_back(&o2);
Output<Distance> o3(103);
mla.outputs.push_back(&o3);
mla.outputs[0]->get<PredictedClassLabel>();
mla.outputs[1]->get<Likelihood>();
mla.outputs[2]->get<Distance>();
Edited by Francisco Bernardo - Michael Zbyszyński mentioned in issue #49 (closed)
- Author Owner
This thread has really become a discussion of #40, which is related.
The problem cited here is that the run() method is currently overloaded, but only one implementation is valid for any particular model. This becomes a problem when an IDE auto-populates functions, for example. See attached:
- Author Owner
The solution to this problem is to not overload the run method like this.
One way to do that is to make DTW.run() have the same inputs and outputs as everything else. I'll need to finish up windowed DTW as mentioned in #37, and then somehow deal with the fact that DTW wants to return a std::string, but everything else returns a std::vector. This is where #40 or #15 might be related.
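As a rough illustration of a single, non-overloaded run method, the sketch below gives every model the same signature and returns one struct that can carry either numeric values or a label. This is only one possible shape and not the current RapidLib API: `RunOutput`, `Model`, `RegressionStub` and `DtwStub` are hypothetical names, the stubs contain no real regression or DTW logic, and the "match" label is made up.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical common return type: numeric outputs and/or a label.
struct RunOutput {
    std::vector<double> values;  // filled by models that return numbers
    std::string label;           // filled by models that return a name
};

// One run signature for all models, so an IDE has nothing invalid to offer.
class Model {
public:
    virtual ~Model() = default;
    virtual RunOutput run(const std::vector<double>& input) = 0;
};

// Placeholder regression: returns the sum of the inputs as its only value.
class RegressionStub : public Model {
public:
    RunOutput run(const std::vector<double>& input) override {
        double sum = 0.0;
        for (double x : input) sum += x;
        return RunOutput{{sum}, ""};
    }
};

// Placeholder DTW: ignores the input and returns a fixed, made-up label.
class DtwStub : public Model {
public:
    RunOutput run(const std::vector<double>& /*input*/) override {
        return RunOutput{{}, "match"};
    }
};

// Small usage check: both models are called through the same interface.
void uniformRunDemo() {
    RegressionStub regression;
    DtwStub dtw;
    Model* models[] = {&regression, &dtw};
    assert(models[0]->run({1.0, 2.0}).values.at(0) == 3.0);
    assert(models[1]->run({1.0, 2.0}).label == "match");
}
```

The trade-off is that callers must know which field a given model fills, which is the "lowest common denominator" concern raised earlier in the thread.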