DataRobot does not offer an easy way to set up a free trial from its website. A trial was possible, but we had to contact the company directly to arrange it.
One major drawback of DataRobot was that it could not be licensed for embedded use. They have an explicit usage policy, and their payment model was per user, which could become difficult to manage within our application. Using DataRobot would require carrying their branding and purchasing an individual license for each instance.
The speed of DataRobot was not impressive. Because their demo does not use GPU acceleration, running models consumes considerably more resources, and a test run could take several days, compared with several hours for Driverless.
DataRobot does integrate a large number of Open Source libraries (including those created by H2O), so many models are available.
Can run on system
It was possible to run DataRobot on our servers (on AWS specifically, since without the GPU acceleration, Hetzner became irrelevant).
While DataRobot provided some basic feature generation (such as identification of missing values), it was unable to provide the same level of automation as Driverless.
Below is a workflow of how it detects, calculates and populates missing values within some keys.
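For readers unfamiliar with this kind of step, the general detect, calculate, populate pattern can be sketched in a few lines of Python. This is a generic illustration, not DataRobot's implementation: the median fill strategy, the function name `impute_missing`, and the `_missing` indicator suffix are all assumptions made for the example.

```python
from statistics import median


def impute_missing(rows, key):
    """Fill missing values of `key` with the median of the observed values.

    Hypothetical helper for illustration only. Each returned row also gets
    a boolean `<key>_missing` flag recording whether the value was imputed.
    """
    # Detect: collect the values that are actually present.
    observed = [r[key] for r in rows if r.get(key) is not None]
    if not observed:
        raise ValueError(f"no observed values for key {key!r}")

    # Calculate: choose a fill value (median is an assumed strategy).
    fill = median(observed)

    # Populate: copy each row, filling gaps and flagging them.
    filled = []
    for r in rows:
        is_missing = r.get(key) is None
        new_row = dict(r)
        new_row[key] = fill if is_missing else r[key]
        new_row[key + "_missing"] = is_missing
        filled.append(new_row)
    return filled, fill


rows = [{"age": 30}, {"age": None}, {"age": 50}, {}]
filled, fill_value = impute_missing(rows, "age")
```

Keeping the indicator flag lets a downstream model learn from the fact that a value was absent, which is often informative in its own right.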
Ability to Select and Deploy Model
One of the biggest strengths of DataRobot is its easy-to-use leaderboard. We were able to identify at a glance which models were the most accurate and which ran the fastest. We were also able to select the desired model from the leaderboard and deploy it with a click.
DataRobot provides some excellent Business Intelligence tools. For instance, below is a simple feature impact chart, showing which elements in our models would have the greatest impact on identifying and predicting customer behavior.
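One common way to compute scores of this kind is permutation importance: shuffle one feature at a time and measure how much the model's accuracy drops. The sketch below illustrates that idea in plain Python. It is an assumption that this resembles what the chart reports; the function names `accuracy` and `permutation_impact`, the toy data, and the toy model are hypothetical.

```python
import random


def accuracy(predict, X, y):
    """Fraction of rows for which the prediction matches the target."""
    return sum(predict(row) == target for row, target in zip(X, y)) / len(y)


def permutation_impact(predict, X, y, n_repeats=5, seed=0):
    """Mean accuracy drop per feature when that feature's column is shuffled."""
    rng = random.Random(seed)
    baseline = accuracy(predict, X, y)
    impacts = []
    for col in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            # Shuffle a copy of one column, leaving the others untouched.
            shuffled = [row[col] for row in X]
            rng.shuffle(shuffled)
            X_perm = [row[:col] + [v] + row[col + 1:]
                      for row, v in zip(X, shuffled)]
            drops.append(baseline - accuracy(predict, X_perm, y))
        impacts.append(sum(drops) / n_repeats)
    return impacts


# Toy data: feature 0 determines the label, feature 1 is irrelevant.
X = [[i / 20, (i * 7 % 20) / 20] for i in range(20)]
y = [int(row[0] >= 0.5) for row in X]


def model(row):
    # Stand-in for a trained model; it only looks at feature 0.
    return int(row[0] >= 0.5)


impacts = permutation_impact(model, X, y)
```

A feature the model never uses scores zero, while a feature the model depends on shows a clear drop, which is the ranking a feature impact chart visualizes.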
Unfortunately, DataRobot has serious trouble handling Big Data. Due to its built-in processing limits and the fact that the Open Source libraries it is built on don’t have this capability, it struggles heavily with datasets of any significant size.
The result is that we would be forced to work with subsets or samples of our data, which, while useful in many cases, may miss some important trends and also increase the likelihood of data leakage.
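When sampling is unavoidable, one common mitigation is to sample within each group of a key so that rare categories are not lost. The sketch below is a minimal illustration of stratified sampling, not something taken from our evaluation or from DataRobot; the function name `stratified_sample`, the `segment` key, and the toy data are hypothetical.

```python
import random
from collections import defaultdict


def stratified_sample(rows, key, fraction, seed=0):
    """Sample roughly `fraction` of rows from each group of `key`.

    Every group keeps at least one row, so rare categories survive.
    """
    rng = random.Random(seed)

    # Group rows by the value of the stratification key.
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)

    sample = []
    for members in groups.values():
        # Round the per-group quota, but never drop a group entirely.
        n = max(1, round(len(members) * fraction))
        sample.extend(rng.sample(members, n))
    return sample


# Toy data: 90 rows in a common segment, 10 in a rare one.
rows = ([{"segment": "common", "id": i} for i in range(90)]
        + [{"segment": "rare", "id": i} for i in range(90, 100)])
subset = stratified_sample(rows, "segment", 0.2)
```

A plain random sample of the same size could easily under-represent the rare segment, which is one way a subset ends up hiding a trend that the full data would show.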
DataRobot does make it easy to gain some solid information about the quality of the data you provide it. It provides some good basic reports which will help a business user to gain an understanding of the data.
For instance, the following report helps identify which fields may provide data leakage due to false positives in the data:
Many column data
Like Driverless, DataRobot can handle datasets that carry a large number of keys.