Predictive workflow overview¶
This section provides a generalized discussion of the steps to build predictive models in Workbench. See the fundamentals of predictive modeling for a description of predictive modeling methods.
Predictive model training workflow¶
This section walks you through the steps to implement a DataRobot modeling experiment.
- To begin the modeling process, import your data or wrangle it to provide a seamless, scalable, and secure way to access and transform data for modeling.
- DataRobot conducts the first stage of exploratory data analysis (EDA1), where it analyzes data features. When registration is complete, the Data preview tab shows feature details, including a histogram and summary statistics.
- Next, for supervised modeling, select your target and optionally change any other basic or advanced experiment configuration settings. Then, start modeling.
  DataRobot generates feature lists from which to build models. By default, it uses the feature list with the most informative features. Alternatively, you can select a different generated feature list or create your own.
- DataRobot further evaluates the data during EDA2, determining, among other information, which features correlate to the target (feature importance) and which features are informative.
  The application performs feature engineering—transforming, generating, and reducing the feature set depending on the experiment type and selected settings.
- DataRobot selects blueprints based on the experiment type and builds candidate models.
Analyze and select a model¶
DataRobot automatically generates models and displays them on the Leaderboard. It selects the most accurate model, trains it on 100% of the data, and marks it with the Prepared for Deployment badge.
To analyze and select a model:
- Compare models by selecting an optimization metric from the Metric dropdown.
- Analyze the model using the visualization tools that are best suited for the type of model you are building. Use model comparison for experiments within a single Use Case.
  See the list of experiment types and associated visualizations below.
- Experiment with modeling settings to potentially improve the accuracy of your model. You can try rerunning modeling using a different feature list or modeling mode.
- After analyzing your models, select the best and send it to Registry to create a deployment-ready model package.
Tip
It's recommended that you test predictions before deploying. If you aren't satisfied with the results, you can revisit the modeling process and further experiment with feature lists and optimization settings. You might also find that gathering more informative data features can improve outcomes.
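The Leaderboard ranking depends on which optimization metric you choose, so the same set of models can rank differently under different metrics. As a schematic (the model names and scores below are made up, and both metrics shown are lower-is-better):

```python
# Hypothetical leaderboard entries with two lower-is-better metrics
leaderboard = [
    {"model": "Light Gradient Boosting", "LogLoss": 0.412, "RMSE": 0.391},
    {"model": "Elastic-Net Classifier",  "LogLoss": 0.455, "RMSE": 0.402},
    {"model": "Keras Neural Network",    "LogLoss": 0.398, "RMSE": 0.405},
]

def best_model(entries, metric):
    """Return the entry with the lowest value of a lower-is-better metric."""
    return min(entries, key=lambda e: e[metric])

# Switching the metric changes which model ranks first
print(best_model(leaderboard, "LogLoss")["model"])
print(best_model(leaderboard, "RMSE")["model"])
```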
- As part of the deployment process, you make predictions. You can also set up a recurring batch prediction job.
- DataRobot uses a variety of metrics to monitor your deployment. Use the application's visualizations to track data (feature) drift, accuracy, bias, service health, and more. You can set up notifications so that you are regularly informed of the model's status.
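One common way to quantify data drift, shown here as a generic sketch rather than DataRobot's internal implementation, is the Population Stability Index (PSI) between a feature's training-time and serving-time distributions:

```python
import math

def psi(expected, actual, bins=10):
    """Population Stability Index between a training (expected) and a
    serving (actual) sample of one numeric feature. A common rule of thumb:
    PSI < 0.1 stable, 0.1-0.25 moderate drift, > 0.25 major drift."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0

    def frac(sample):
        counts = [0] * bins
        for x in sample:
            i = min(int((x - lo) / width), bins - 1)  # clamp out-of-range values
            counts[max(i, 0)] += 1
        # small floor avoids log(0) for empty bins
        return [max(c / len(sample), 1e-6) for c in counts]

    e, a = frac(expected), frac(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))

train   = [float(i % 50) for i in range(500)]         # training distribution
same    = [float(i % 50) for i in range(500)]         # identical -> PSI ~ 0
shifted = [float(i % 50) + 20.0 for i in range(500)]  # shifted -> large PSI
print(round(psi(train, same), 4))
print(round(psi(train, shifted), 4))
```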
Tip
Consider enabling automatic retraining to automate an end-to-end workflow. With automatic retraining, DataRobot regularly tests challenger models against the current best model (the champion model) and replaces the champion if a challenger outperforms it.
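The champion/challenger policy can be sketched in a few lines; DataRobot manages this comparison for you when automatic retraining is enabled, and the model names and metric values here are hypothetical:

```python
def promote_if_better(champion, challenger, metric="LogLoss"):
    """Replace the champion when the challenger scores strictly better on a
    lower-is-better accuracy metric, as in an automatic-retraining policy."""
    if challenger["metrics"][metric] < champion["metrics"][metric]:
        return challenger
    return champion

champion   = {"name": "GBM v1",             "metrics": {"LogLoss": 0.42}}
challenger = {"name": "GBM v2 (retrained)", "metrics": {"LogLoss": 0.39}}

# The retrained challenger outperforms the champion, so it is promoted
new_champion = promote_if_better(champion, challenger)
print(new_champion["name"])
```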
Which visualizations should you use?¶
Model insights help to interpret, explain, and validate what drives a model’s predictions. They are then used to assess what to do in your next experiment. While there are many visualizations available, not all are applicable to all modeling experiments—the visualizations you can access depend on your experiment type. The following table lists experiment types and examples of visualizations that are suited to their analysis. See the full list of insights to learn what you can access from your experiment's Leaderboard.
Experiment type | Analysis tools |
---|---|
All models | |
Regression | |
Classification | |
Time-aware modeling (time series and out-of-time validation) | |
Multiseries | Series Insights: Provides a histogram and table for series-specific information. |
Segmented modeling | Segmentation tab: Displays data about each segment of a Combined Model. |
Multilabel modeling | Metric values: Summarizes performance across labels for different values of the prediction threshold (which can be set from the page). |
Image augmentation | |
Text AI | |
Geospatial AI | |
Clustering | |
Anomaly detection | |