.. ******************************************************************************
.. * Copyright 2021 Intel Corporation
.. *
.. * Licensed under the Apache License, Version 2.0 (the "License");
.. * you may not use this file except in compliance with the License.
.. * You may obtain a copy of the License at
.. *
.. *     http://www.apache.org/licenses/LICENSE-2.0
.. *
.. * Unless required by applicable law or agreed to in writing, software
.. * distributed under the License is distributed on an "AS IS" BASIS,
.. * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
.. * See the License for the specific language governing permissions and
.. * limitations under the License.
.. *******************************************************************************/

Kaggle Kernels for Regression Tasks
************************************

The following Kaggle kernels show how to patch scikit-learn with |intelex| for various regression tasks.
These kernels usually include a performance comparison between stock scikit-learn and scikit-learn patched with |intelex|.

.. include:: /kaggle/note-about-tps.rst

Using a Single Regressor
++++++++++++++++++++++++

.. list-table::
   :header-rows: 1
   :align: left
   :widths: 30 20 30

   * - Kernel
     - Goal
     - Content
   * - `Baseline Nu Support Vector Regression (nuSVR) with RBF Kernel `_

       **Data:** [TPS Jul 2021] Synthetic pollution data
     - Predict air pollution measurements over time based on weather and input values from multiple sensors
     -
       - data preprocessing
       - search for optimal parameters using Optuna
       - training and prediction using scikit-learn-intelex
   * - `Nu Support Vector Regression (nuSVR) `__

       **Data:** [TPS Aug 2021] Synthetic loan data
     - Calculate loss associated with loan defaults
     -
       - data preprocessing
       - feature engineering
       - training and prediction using scikit-learn-intelex
       - performance comparison to scikit-learn
   * - `Nu Support Vector Regression (nuSVR) `__

       **Data:** House Prices dataset
     - Predict sale prices for a property based on its characteristics
     -
       - data preprocessing
       - exploring outliers
       - feature engineering
       - filling missing values
       - search for optimal parameters using Optuna
       - training and prediction using scikit-learn-intelex
       - performance comparison to scikit-learn
   * - `Random Forest Regression `_

       **Data:** [TPS Jul 2021] Synthetic pollution data
     - Predict air pollution measurements over time based on weather and input values from multiple sensors
     -
       - checking correlation between features
       - search for best parameters using GridSearchCV
       - training and prediction using scikit-learn-intelex
       - performance comparison to scikit-learn
   * - `Random Forest Regression with Feature Engineering `_

       **Data:** [TPS Jul 2021] Synthetic pollution data
     - Predict air pollution measurements over time based on weather and input values from multiple sensors
     -
       - data preprocessing
       - feature engineering
       - search for optimal parameters using Optuna
       - training and prediction using scikit-learn-intelex
       - performance comparison to scikit-learn
   * - `Random Forest Regression with Feature Importance Computation `_

       **Data:** [TPS Mar 2022] Spatio-temporal traffic data
     - Forecast twelve hours of traffic flow in a major U.S. metropolitan area
     -
       - feature engineering
       - computing feature importance with ELI5
       - training and prediction using scikit-learn-intelex
       - performance comparison to scikit-learn
   * - `Ridge Regression `_

       **Data:** [TPS Sep 2021] Synthetic insurance data
     - Predict the probability of a customer making a claim upon an insurance policy
     -
       - data preprocessing
       - filling missing values
       - search for optimal parameters using Optuna
       - training and prediction using scikit-learn-intelex
       - performance comparison to scikit-learn

Stacking Regressors
+++++++++++++++++++

.. list-table::
   :header-rows: 1
   :align: left
   :widths: 30 20 30

   * - Kernel
     - Goal
     - Content
   * - `Stacking Regressor with Random Forest, SVR, and LASSO `_

       **Data:** [TPS Jul 2021] Synthetic pollution data
     - Predict air pollution measurements over time based on weather and input values from multiple sensors
     -
       - feature engineering
       - creating a stacking regressor
       - search for optimal parameters using Optuna
       - training and prediction using scikit-learn-intelex
       - performance comparison to scikit-learn
   * - `Stacking Regressor with ElasticNet, LASSO, and Ridge Regression for Time-series data `_

       **Data:** Predict Future Sales dataset
     - Predict total sales for every product and store in the next month based on daily sales data
     -
       - data preprocessing
       - creating a stacking regressor
       - search for optimal parameters using Optuna
       - training and prediction using scikit-learn-intelex
       - performance comparison to scikit-learn
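
The kernels listed above share a common pattern: patch scikit-learn before importing any estimators, then train, predict, and time the run. The following is a minimal sketch of that pattern, not code taken from any of the kernels. It assumes the ``sklearnex`` package is available and falls back to stock scikit-learn if it is not. The synthetic data from ``make_regression``, the choice of ``Ridge``, and the ``alpha`` value are illustrative assumptions standing in for the Kaggle datasets and tuned models.

```python
import time

# Patch before importing any scikit-learn estimators so that the
# accelerated implementations are picked up. If the extension is not
# installed, continue with stock scikit-learn.
try:
    from sklearnex import patch_sklearn
    patch_sklearn()
    PATCHED = True
except ImportError:
    PATCHED = False

from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (assumption: the kernels load Kaggle datasets instead).
X, y = make_regression(n_samples=2000, n_features=20, noise=0.1, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Time training and prediction together.
start = time.perf_counter()
model = Ridge(alpha=1.0).fit(X_train, y_train)  # alpha is an illustrative value
predictions = model.predict(X_test)
elapsed = time.perf_counter() - start

mse = mean_squared_error(y_test, predictions)
print(f"patched={PATCHED} fit+predict: {elapsed:.4f}s, MSE: {mse:.4f}")
```

Running the same script in an environment without the extension gives the stock scikit-learn timing for comparison, since the ``PATCHED`` flag records which implementation was used.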