Code and Data for: Better by default: Strong pre-tuned MLPs and boosted trees on tabular data [NeurIPS, arXiv v2] (doi:10.18419/darus-4555)

Document Description

Citation

Title:

Code and Data for: Better by default: Strong pre-tuned MLPs and boosted trees on tabular data [NeurIPS, arXiv v2]

Identification Number:

doi:10.18419/darus-4555

Distributor:

DaRUS

Date of Distribution:

2024-11-05

Version:

1

Bibliographic Citation:

Holzmüller, David; Grinsztajn, Léo; Steinwart, Ingo, 2024, "Code and Data for: Better by default: Strong pre-tuned MLPs and boosted trees on tabular data [NeurIPS, arXiv v2]", https://doi.org/10.18419/DARUS-4555, DaRUS, V1

Study Description

Citation

Title:

Code and Data for: Better by default: Strong pre-tuned MLPs and boosted trees on tabular data [NeurIPS, arXiv v2]

Identification Number:

doi:10.18419/darus-4555

Authoring Entity:

Holzmüller, David (INRIA - Institut National de Recherche en Informatique et Automatique)

Grinsztajn, Léo (INRIA - Institut National de Recherche en Informatique et Automatique)

Steinwart, Ingo (Universität Stuttgart)

Other identifications and acknowledgements:

Strecker, Katharina

Other identifications and acknowledgements:

Dockès, Jérôme

Grant Number:

EXC 2075 - 390740016

Grant Number:

2023-AD011012804R1

Grant Number:

2024-AD011012804R2

Distributor:

DaRUS

Access Authority:

Holzmüller, David

Access Authority:

Grinsztajn, Léo

Access Authority:

Steinwart, Ingo

Depositor:

Holzmüller, David

Date of Deposit:

2024-10-29

Holdings Information:

https://doi.org/10.18419/DARUS-4555

Study Scope

Keywords:

Computer and Information Science, Tabular Data, Gradient Boosting, Benchmark, Artificial Neural Network

Topic Classification:

Artificial Intelligence and Machine Learning Methods

Abstract:

This dataset contains code and data for our paper "Better by default: Strong pre-tuned MLPs and boosted trees on tabular data", specifically the NeurIPS version, which is also the second version on arXiv. The main code is provided in pytabkit_code.zip, with further documentation in README.md and the docs folder. The main code is also available on GitHub (https://github.com/dholzmueller/pytabkit). Here, we additionally provide the data generated by the code, as well as the plots. For instructions on which data to download and when, see docs/source/bench/download_results.md in the main code or the hosted documentation (https://pytabkit.readthedocs.io/en/latest/bench/download_results.html). The code for the old version of the Grinsztajn et al. (2022) benchmark is provided in grinsztajn_benchmarking_code.zip and on GitHub (https://github.com/LeoGrin/tabular-benchmark/tree/better_by_default). The code and data for the first arXiv version of the paper are archived at https://doi.org/10.18419/darus-4255.

Methodology and Processing

Sources Statement

Data Access

Other Study Description Materials

Related Publications

Citation

Title:

David Holzmüller, Léo Grinsztajn, and Ingo Steinwart. Better by Default: Strong Pre-Tuned MLPs and Boosted Trees on Tabular Data, Neural Information Processing Systems, 2024.

Identification Number:

2407.04491

Bibliographic Citation:

David Holzmüller, Léo Grinsztajn, and Ingo Steinwart. Better by Default: Strong Pre-Tuned MLPs and Boosted Trees on Tabular Data, Neural Information Processing Systems, 2024.

Other Study-Related Materials

Label:

grinsztajn_benchmarking_code.zip

Text:

Code for running the old version of the Grinsztajn et al. (2022) benchmark with our models.

Notes:

application/zip

Other Study-Related Materials

Label:

grinsztajn_results.csv.gz

Text:

Results on the old version of the Grinsztajn et al. (2022) benchmark.

Notes:

application/gzip

Other Study-Related Materials

Label:

pytabkit_code.zip

Text:

Contains the code for running the benchmarks, as well as an implementation of the models evaluated on these benchmarks. Also contains instructions in README.md as well as the documentation.

Notes:

application/zip

Other Study-Related Materials

Label:

main_no_results.tar.gz

Text:

Benchmark results data for all benchmarks. Contains result summaries (enough for plotting) but not the detailed results. After unpacking, rename the tasks_only_infos folder to tasks if you don't already have a tasks folder.

Notes:

text/plain
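
The unpack-and-rename step described for main_no_results.tar.gz could be scripted roughly as follows. This is a minimal sketch, not part of the deposited code: the helper name unpack_and_rename is hypothetical, and it assumes that the archive unpacks the tasks_only_infos folder directly into the chosen destination directory, which should be checked against the actual archive layout.

```python
import tarfile
from pathlib import Path


def unpack_and_rename(archive: Path, dest: Path, src_name: str, dst_name: str) -> Path:
    """Extract a .tar.gz archive into dest, then rename dest/src_name to
    dest/dst_name unless dest/dst_name already exists.

    Hypothetical helper; assumes src_name appears at the top level of dest
    after extraction.
    """
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive, "r:gz") as tar:
        tar.extractall(dest)

    src, dst = dest / src_name, dest / dst_name
    # Only rename when no target folder is present, so an existing
    # folder (e.g. a full tasks folder) is never overwritten.
    if src.is_dir() and not dst.exists():
        src.rename(dst)
    return dst
```

A call such as unpack_and_rename(Path("main_no_results.tar.gz"), Path("data"), "tasks_only_infos", "tasks") would then cover the case above, where "data" is an assumed location for the data folder. The same helper could be reused for archives whose unpacked folder has to be renamed to "results", once the unpacked folder name is known.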

Other Study-Related Materials

Label:

results_small.tar.gz

Text:

Results folder (within the data folder) without extra results (predictions on the datasets, optimal hyperparameters). After unpacking, the folder should be renamed to "results".

Notes:

text/plain

Other Study-Related Materials

Label:

tasks.tar.gz

Text:

Folder with imported datasets, with restricted access for copyright reasons.

Notes:

text/plain

Other Study-Related Materials

Label:

CatBoost-HPO_steps.tar.gz

Text:

Notes:

text/plain

Other Study-Related Materials

Label:

cv_refit.tar.gz

Text:

Results for inner cross-validation / refitting of RealMLP-TD and LGBM-TD.

Notes:

text/plain

Other Study-Related Materials

Label:

FTT-HPO_steps.tar.gz

Text:

Notes:

text/plain

Other Study-Related Materials

Label:

LGBM-HPO_steps.tar.gz

Text:

Notes:

text/plain

Other Study-Related Materials

Label:

MLP-PLR-HPO_steps.tar.gz

Text:

Notes:

text/plain

Other Study-Related Materials

Label:

MLP-RTDL-HPO_steps.tar.gz

Text:

Notes:

text/plain

Other Study-Related Materials

Label:

RealMLP-HPO_steps.tar.gz

Text:

Notes:

text/plain

Other Study-Related Materials

Label:

ResNet-RTDL-HPO_steps.tar.gz

Text:

Notes:

text/plain

Other Study-Related Materials

Label:

results_main.tar.gz

Text:

Detailed results, including predictions on datasets and best parameters, for all main methods. This excludes the data for individual hyperparameter optimization (HPO) steps, which is provided separately.

Notes:

text/plain

Other Study-Related Materials

Label:

RF-HPO_steps.tar.gz

Text:

Notes:

text/plain

Other Study-Related Materials

Label:

TabR-HPO_steps.tar.gz

Text:

Notes:

text/plain

Other Study-Related Materials

Label:

XGB-HPO_steps.tar.gz

Text:

Notes:

text/plain