

.. _example_applications_plot_model_complexity_influence.py:


==========================
Model Complexity Influence
==========================

Demonstrate how model complexity influences both prediction accuracy and
computational performance.

The Boston Housing dataset is used for regression and the 20 Newsgroups
dataset for classification.

For each class of models we vary the model complexity through the choice of a
relevant model parameter and measure the influence on both computational
performance (prediction latency) and predictive power (MSE for regression,
Hamming loss for classification).
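
In outline, the benchmark fits the estimator for each value of the
complexity-controlling parameter, times its predictions on a held-out set,
and records the prediction error. A minimal, self-contained sketch of that
loop is shown here; the data and parameter grid are illustrative choices for
this sketch, not the exact setup of the script below::

    import time

    from sklearn.datasets import load_diabetes
    from sklearn.ensemble import GradientBoostingRegressor
    from sklearn.metrics import mean_squared_error
    from sklearn.model_selection import train_test_split

    # Illustrative regression data standing in for Boston Housing.
    X, y = load_diabetes(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    # Vary the complexity-controlling parameter and measure latency vs. error.
    for n_estimators in (10, 50, 100, 200, 500):
        model = GradientBoostingRegressor(n_estimators=n_estimators)
        model.fit(X_train, y_train)

        start = time.time()
        y_pred = model.predict(X_test)
        elapsed = time.time() - start

        mse = mean_squared_error(y_test, y_pred)
        print("Complexity: %d | MSE: %.4f | Pred. Time: %.6fs"
              % (n_estimators, mse, elapsed))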



.. rst-class:: horizontal


    *

      .. image:: images/plot_model_complexity_influence_001.png
            :scale: 47

    *

      .. image:: images/plot_model_complexity_influence_002.png
            :scale: 47

    *

      .. image:: images/plot_model_complexity_influence_003.png
            :scale: 47


**Script output**::

  Benchmarking SGDClassifier(alpha=0.001, class_weight=None, epsilon=0.1, eta0=0.0,
         fit_intercept=True, l1_ratio=0.25, learning_rate='optimal',
         loss='modified_huber', n_iter=5, n_jobs=1, penalty='elasticnet',
         power_t=0.5, random_state=None, shuffle=False, verbose=0,
         warm_start=False)
  Complexity: 4516 | Hamming Loss (Misclassification Ratio): 0.2562 | Pred. Time: 0.038036s
  
  Benchmarking SGDClassifier(alpha=0.001, class_weight=None, epsilon=0.1, eta0=0.0,
         fit_intercept=True, l1_ratio=0.5, learning_rate='optimal',
         loss='modified_huber', n_iter=5, n_jobs=1, penalty='elasticnet',
         power_t=0.5, random_state=None, shuffle=False, verbose=0,
         warm_start=False)
  Complexity: 1668 | Hamming Loss (Misclassification Ratio): 0.2939 | Pred. Time: 0.028485s
  
  Benchmarking SGDClassifier(alpha=0.001, class_weight=None, epsilon=0.1, eta0=0.0,
         fit_intercept=True, l1_ratio=0.75, learning_rate='optimal',
         loss='modified_huber', n_iter=5, n_jobs=1, penalty='elasticnet',
         power_t=0.5, random_state=None, shuffle=False, verbose=0,
         warm_start=False)
  Complexity: 890 | Hamming Loss (Misclassification Ratio): 0.3194 | Pred. Time: 0.023633s
  
  Benchmarking SGDClassifier(alpha=0.001, class_weight=None, epsilon=0.1, eta0=0.0,
         fit_intercept=True, l1_ratio=0.9, learning_rate='optimal',
         loss='modified_huber', n_iter=5, n_jobs=1, penalty='elasticnet',
         power_t=0.5, random_state=None, shuffle=False, verbose=0,
         warm_start=False)
  Complexity: 669 | Hamming Loss (Misclassification Ratio): 0.3347 | Pred. Time: 0.021581s
  
  Benchmarking NuSVR(C=1000.0, cache_size=200, coef0=0.0, degree=3, gamma=3.0517578125e-05,
     kernel='rbf', max_iter=-1, nu=0.1, probability=False, random_state=None,
     shrinking=True, tol=0.001, verbose=False)
  Complexity: 69 | MSE: 31.8133 | Pred. Time: 0.000849s
  
  Benchmarking NuSVR(C=1000.0, cache_size=200, coef0=0.0, degree=3, gamma=3.0517578125e-05,
     kernel='rbf', max_iter=-1, nu=0.25, probability=False,
     random_state=None, shrinking=True, tol=0.001, verbose=False)
  Complexity: 136 | MSE: 25.6140 | Pred. Time: 0.001598s
  
  Benchmarking NuSVR(C=1000.0, cache_size=200, coef0=0.0, degree=3, gamma=3.0517578125e-05,
     kernel='rbf', max_iter=-1, nu=0.5, probability=False, random_state=None,
     shrinking=True, tol=0.001, verbose=False)
  Complexity: 243 | MSE: 22.3315 | Pred. Time: 0.002729s
  
  Benchmarking NuSVR(C=1000.0, cache_size=200, coef0=0.0, degree=3, gamma=3.0517578125e-05,
     kernel='rbf', max_iter=-1, nu=0.75, probability=False,
     random_state=None, shrinking=True, tol=0.001, verbose=False)
  Complexity: 350 | MSE: 21.3679 | Pred. Time: 0.004034s
  
  Benchmarking NuSVR(C=1000.0, cache_size=200, coef0=0.0, degree=3, gamma=3.0517578125e-05,
     kernel='rbf', max_iter=-1, nu=0.9, probability=False, random_state=None,
     shrinking=True, tol=0.001, verbose=False)
  Complexity: 404 | MSE: 21.0915 | Pred. Time: 0.004642s
  
  Benchmarking GradientBoostingRegressor(alpha=0.9, init=None, learning_rate=0.1, loss='ls',
               max_depth=3, max_features=None, max_leaf_nodes=None,
               min_samples_leaf=1, min_samples_split=2, n_estimators=10,
               random_state=None, subsample=1.0, verbose=0, warm_start=False)
  Complexity: 10 | MSE: 28.4402 | Pred. Time: 0.000075s
  
  Benchmarking GradientBoostingRegressor(alpha=0.9, init=None, learning_rate=0.1, loss='ls',
               max_depth=3, max_features=None, max_leaf_nodes=None,
               min_samples_leaf=1, min_samples_split=2, n_estimators=50,
               random_state=None, subsample=1.0, verbose=0, warm_start=False)
  Complexity: 50 | MSE: 7.8822 | Pred. Time: 0.000179s
  
  Benchmarking GradientBoostingRegressor(alpha=0.9, init=None, learning_rate=0.1, loss='ls',
               max_depth=3, max_features=None, max_leaf_nodes=None,
               min_samples_leaf=1, min_samples_split=2, n_estimators=100,
               random_state=None, subsample=1.0, verbose=0, warm_start=False)
  Complexity: 100 | MSE: 6.6961 | Pred. Time: 0.000297s
  
  Benchmarking GradientBoostingRegressor(alpha=0.9, init=None, learning_rate=0.1, loss='ls',
               max_depth=3, max_features=None, max_leaf_nodes=None,
               min_samples_leaf=1, min_samples_split=2, n_estimators=200,
               random_state=None, subsample=1.0, verbose=0, warm_start=False)
  Complexity: 200 | MSE: 5.8514 | Pred. Time: 0.000536s
  
  Benchmarking GradientBoostingRegressor(alpha=0.9, init=None, learning_rate=0.1, loss='ls',
               max_depth=3, max_features=None, max_leaf_nodes=None,
               min_samples_leaf=1, min_samples_split=2, n_estimators=500,
               random_state=None, subsample=1.0, verbose=0, warm_start=False)
  Complexity: 500 | MSE: 6.0121 | Pred. Time: 0.001328s
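
The ``Complexity`` figures reported above are model specific: for the
gradient boosting runs they match ``n_estimators``, while for the linear and
kernel models they plausibly count non-zero coefficients and support vectors.
A hedged sketch of such complexity measures (illustrative; not necessarily
the exact functions used by the script)::

    import numpy as np

    def sgd_complexity(clf):
        # Number of non-zero coefficients of the fitted elastic-net model.
        return np.count_nonzero(clf.coef_)

    def nusvr_complexity(reg):
        # Number of support vectors retained by the fitted kernel model.
        return reg.support_vectors_.shape[0]

    def gbr_complexity(reg):
        # Number of boosting stages.
        return reg.n_estimators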



**Python source code:** :download:`plot_model_complexity_influence.py <plot_model_complexity_influence.py>`

.. literalinclude:: plot_model_complexity_influence.py
    :lines: 16-

**Total running time of the example:** 81.78 seconds
(1 minute 21.78 seconds)