Predictive workflow overview

This section provides a generalized discussion of the steps to build predictive models in Workbench. See the fundamentals of predictive modeling for a description of the underlying methods.

Predictive model training workflow

This section walks you through the steps to implement a DataRobot modeling experiment.

  1. To begin the modeling process, import your data, or wrangle it first to access and transform data for modeling in a seamless, scalable, and secure way. (A scripted sketch of steps 1–3 appears after this list.)

  2. DataRobot conducts the first stage of exploratory data analysis (EDA1), in which it analyzes data features. When registration is complete, the Data preview tab shows feature details, including a histogram and summary statistics.

  3. Next, for supervised modeling, select your target and optionally change any other basic or advanced experiment configuration settings. Then, start modeling.

    DataRobot generates feature lists from which to build models. By default, it uses the feature list with the most informative features. Alternatively, you can select a different generated feature list or create your own (see the feature list sketch after this list).

  4. DataRobot further evaluates the data during EDA2, determining, among other analyses, which features correlate with the target (feature importance) and which features are informative.

    The application performs feature engineering—transforming, generating, and reducing the feature set depending on the experiment type and selected settings.

  5. DataRobot selects blueprints based on the experiment type and builds candidate models.
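
The workflow above can also be scripted. Below is a minimal sketch of steps 1–3 using the DataRobot Python client; the endpoint, API token, file name, project name, and target are placeholders to replace with your own, and the Workbench UI flow may differ in detail.

```python
import datarobot as dr

# Connect to DataRobot (endpoint and token are placeholders).
dr.Client(endpoint="https://app.datarobot.com/api/v2", token="YOUR_API_TOKEN")

# Step 1: Import data. Project.create accepts a file path, URL,
# or pandas DataFrame.
project = dr.Project.create(sourcedata="loans.csv", project_name="Loan defaults")

# Steps 2-3: EDA1 runs during import; choosing a target and starting
# modeling kicks off EDA2 and Autopilot. (Newer client versions name
# this method Project.analyze_and_model.)
project.set_target(target="is_bad", mode=dr.AUTOPILOT_MODE.QUICK)

# Block until Autopilot finishes building candidate models.
project.wait_for_autopilot()
```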
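
Custom feature lists (step 3) can be created the same way. A sketch, assuming the project object above and hypothetical feature names:

```python
# Build a custom feature list; the feature names are hypothetical.
flist = project.create_featurelist(
    name="top_features",
    features=["annual_inc", "loan_amnt", "purpose", "emp_length"],
)

# To model against it instead of the default list, pass its ID
# when starting modeling (in place of the set_target call above).
project.set_target(target="is_bad", featurelist_id=flist.id)
```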

Analyze and select a model

DataRobot automatically generates models and displays them on the Leaderboard. The most accurate model is trained on 100% of the data and marked with the Prepared for Deployment badge.

To analyze and select a model:

  1. Compare models by selecting an optimization metric from the Metric dropdown. (A scripted comparison sketch appears after this list.)

  2. Analyze the model using the visualization tools that are best suited for the type of model you are building. Use model comparison for experiments within a single Use Case.

    See the list of experiment types and associated visualizations below.

  3. Experiment with modeling settings to potentially improve the accuracy of your model. For example, rerun modeling with a different feature list or modeling mode (see the retraining sketch after this list).

  4. After analyzing your models, select the best one and send it to the Registry to create a deployment-ready model package (see the deployment and batch prediction sketch after this list).

    Tip

    It's recommended that you test predictions before deploying. If you aren't satisfied with the results, you can revisit the modeling process and further experiment with feature lists and optimization settings. You might also find that gathering more informative data features can improve outcomes.

  5. As part of the deployment process, you make predictions. You can also set up a recurring batch prediction job.

  6. DataRobot uses a variety of metrics to monitor your deployment. Use the application's visualizations to track data (feature) drift, accuracy, bias, service health, and more. You can set up notifications so that you are regularly informed of the model's status (a monitoring sketch follows this list).

    Tip

    Consider enabling automatic retraining to automate an end-to-end workflow. With automatic retraining, DataRobot regularly tests challenger models against the current best model (the champion model) and replaces the champion if a challenger outperforms it.
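
To compare Leaderboard models in code (steps 1–2), here is a minimal sketch, assuming the project created earlier is a classification experiment optimized with AUC:

```python
# Leaderboard models are returned in rank order; Model.metrics is a
# dict keyed by metric name ("AUC" assumes a classification project).
models = project.get_models()
for m in models[:5]:
    print(m.model_type, m.metrics["AUC"]["validation"])

# Keep the top-ranked model for further analysis.
best = models[0]
```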
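
For step 3, one way to experiment is to retrain a model on a different feature list. A sketch, assuming the best model and the flist feature list from the earlier sketches:

```python
# Retrain the selected model on another feature list; retrain
# returns a job that can be awaited.
job = best.retrain(featurelist_id=flist.id)
job.wait_for_completion()

# Alternatively, launch another Autopilot pass on that feature list.
project.start_autopilot(featurelist_id=flist.id)
```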
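
For steps 4–5, the following sketch deploys the selected model and scores a file against it; the label and file paths are placeholders, and your account's available prediction servers may differ:

```python
# Step 4: Create a deployment from the selected model.
server = dr.PredictionServer.list()[0]
deployment = dr.Deployment.create_from_learning_model(
    model_id=best.id,
    label="Loan default risk",
    default_prediction_server_id=server.id,
)

# Step 5: Run a batch prediction job against the deployment.
job = dr.BatchPredictionJob.score(
    deployment.id,
    intake_settings={"type": "localFile", "file": "new_loans.csv"},
    output_settings={"type": "localFile", "path": "predictions.csv"},
)
job.wait_for_completion()
```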
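
For step 6, monitoring settings can also be managed programmatically. A sketch, assuming the deployment above and a client version that exposes these deployment methods:

```python
# Enable target and feature drift tracking for the deployment.
deployment.update_drift_tracking_settings(
    target_drift_enabled=True,
    feature_drift_enabled=True,
)

# Check service health; metrics include request counts and timings.
stats = deployment.get_service_stats()
print(stats.metrics)
```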

Which visualizations should you use?

Model insights help to interpret, explain, and validate what drives a model’s predictions; they also inform what to try in your next experiment. While there are many visualizations available, not all apply to every modeling experiment: the visualizations you can access depend on your experiment type. The following table lists experiment types and examples of visualizations suited to their analysis. See the full list of insights to learn what you can access from your experiment's Leaderboard. (A sketch for retrieving one of these insights programmatically appears after the table.)

Experiment type Analysis tools
All models
  • Feature Impact: Provides a high-level visualization that identifies which features are most strongly driving model decisions.
  • Feature Effects: Visualizes the effect of changes in the value of each feature on the model’s predictions.
  • Individual Prediction Explanations: Illustrates what drives predictions on a row-by-row basis, answering why a given model made a certain prediction.
Regression
  • Lift Chart: Shows how well a model segments the target population and how capable it is of predicting the target.
  • Residuals plot: Depicts the predictive performance and validity of a regression model by showing how predicted values scale relative to the actual values in the dataset.
Classification
Time-aware modeling (time series and out-of-time validation)
  • Accuracy Over Time: Visualizes how predictions change over time.
  • Forecast vs Actual: Compares predictions made from different forecast points against actual values at different times in the future.
  • Forecasting Accuracy: Provides a visual indicator of how well a model predicts at each forecast distance in the experiment’s forecast window.
  • Stability: Provides an at-a-glance summary of how well a model performs on different backtests.
  • Over Time chart: Identifies trends and potential gaps in your data by visualizing how features change over the primary date/time feature. The feature-over-time histogram displays once you select the ordering feature.
Multiseries
  • Series Insights: Provides a histogram and table for series-specific information.
Segmented modeling
  • Segmentation tab: Displays data about each segment of a Combined Model.
Multilabel modeling
  • Metric values: Summarizes performance across labels for different values of the prediction threshold (which can be set from the page).
Image augmentation
  • Image Embeddings: Projects images in two dimensions to see visual similarity between a subset of images and help identify outliers.
  • Attention Maps: Highlights regions of an image according to its importance to a model's prediction.
  • Neural Network Visualizer: Displays a visual breakdown of each layer in the model's neural network.
Text AI
  • Word Cloud: Visualizes variable keyword relevancy.
  • Text Mining: Visualizes relevancy of words and short phrases.
Geospatial AI
  • Anomaly Over Space: Displays anomalous score values on a unique map based on the validation partition.
  • Accuracy Over Space: Provides a spatial residual mapping within an individual model.
Clustering
  • Cluster Insights: Captures latent features in your data, surfacing and communicating actionable insights and identifying segments for further modeling.
  • Image Embeddings: Displays a projection of images onto a two-dimensional space defined by similarity.
  • Attention Maps: Visualizes areas of images that a model is using when making predictions.
Anomaly detection
  • Anomaly Over Time: Plots how anomalies occur across the timeline of your data.
  • Anomaly Assessment: Plots data for the selected backtest and provides SHAP explanations for up to 500 anomalous points.
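
Many of these insights can also be fetched programmatically. A minimal sketch for Feature Impact, assuming the best model object from the comparison sketch earlier:

```python
# Request Feature Impact computation and wait for the job to finish.
impact_job = best.request_feature_impact()
impact_job.wait_for_completion()

# Fetch the results: a list of dicts, one per feature.
for row in best.get_feature_impact()[:5]:
    print(row["featureName"], row["impactNormalized"])
```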