Part 1: Document Description
|
Citation |
|
---|---|
Title: |
Code and Data for: Better by default: Strong pre-tuned MLPs and boosted trees on tabular data [NeurIPS, arXiv v2] |
Identification Number: |
doi:10.18419/darus-4555 |
Distributor: |
DaRUS |
Date of Distribution: |
2024-11-05 |
Version: |
1 |
Bibliographic Citation: |
Holzmüller, David; Grinsztajn, Léo; Steinwart, Ingo, 2024, "Code and Data for: Better by default: Strong pre-tuned MLPs and boosted trees on tabular data [NeurIPS, arXiv v2]", https://doi.org/10.18419/DARUS-4555, DaRUS, V1 |
Citation |
|
Title: |
Code and Data for: Better by default: Strong pre-tuned MLPs and boosted trees on tabular data [NeurIPS, arXiv v2] |
Identification Number: |
doi:10.18419/darus-4555 |
Authoring Entity: |
Holzmüller, David (INRIA - Institut National de Recherche en Informatique et Automatique) |
Grinsztajn, Léo (INRIA - Institut National de Recherche en Informatique et Automatique) |
|
Steinwart, Ingo (Universität Stuttgart) |
|
Other identifications and acknowledgements: |
Strecker, Katharina |
Other identifications and acknowledgements: |
Dockès, Jérôme |
Grant Number: |
EXC 2075 - 390740016 |
Grant Number: |
2023-AD011012804R1 |
Grant Number: |
2024-AD011012804R2 |
Distributor: |
DaRUS |
Access Authority: |
Holzmüller, David |
Access Authority: |
Grinsztajn, Léo |
Access Authority: |
Steinwart, Ingo |
Depositor: |
Holzmüller, David |
Date of Deposit: |
2024-10-29 |
Holdings Information: |
https://doi.org/10.18419/DARUS-4555 |
Study Scope |
|
Keywords: |
Computer and Information Science, Tabular Data, Gradient Boosting, Benchmark, Artificial Neural Network |
Topic Classification: |
Artificial Intelligence and Machine Learning Methods |
Abstract: |
This dataset contains code and data for our paper "Better by default: Strong pre-tuned MLPs and boosted trees on tabular data", specifically the NeurIPS version, which is also the second arXiv version. The main code is provided in pytabkit_code.zip, which contains further documentation in README.md and in the docs folder. The main code is also available on GitHub (https://github.com/dholzmueller/pytabkit). Here, we additionally provide the data generated by the code, as well as the plots. For instructions on how and when to download which data, see docs/source/bench/download_results.md in the main code or the hosted documentation (https://pytabkit.readthedocs.io/en/latest/bench/download_results.html). The code for the old version of the Grinsztajn et al. (2022) benchmark is provided in grinsztajn_benchmarking_code.zip and on GitHub (https://github.com/LeoGrin/tabular-benchmark/tree/better_by_default). The code and data for the first arXiv version of the paper are archived at https://doi.org/10.18419/darus-4255. |
Methodology and Processing |
|
Sources Statement |
|
Data Access |
|
Other Study Description Materials |
|
Related Publications |
|
Citation |
|
Title: |
David Holzmüller, Léo Grinsztajn, and Ingo Steinwart. Better by Default: Strong Pre-Tuned MLPs and Boosted Trees on Tabular Data, Neural Information Processing Systems, 2024. |
Identification Number: |
2407.04491 |
Bibliographic Citation: |
David Holzmüller, Léo Grinsztajn, and Ingo Steinwart. Better by Default: Strong Pre-Tuned MLPs and Boosted Trees on Tabular Data, Neural Information Processing Systems, 2024. |
Label: |
grinsztajn_benchmarking_code.zip |
Text: |
Code for running the old version of the Grinsztajn et al. (2022) benchmark with our models. |
Notes: |
application/zip |
Label: |
grinsztajn_results.csv.gz |
Text: |
Results on the old version of the Grinsztajn et al. (2022) benchmark. |
Notes: |
application/gzip |
Label: |
pytabkit_code.zip |
Text: |
Contains the code for running the benchmarks, as well as an implementation of the models evaluated on these benchmarks. Also contains instructions in README.md as well as the documentation. |
Notes: |
application/zip |
Label: |
main_no_results.tar.gz |
Text: |
Benchmark results data for all benchmarks. Contains result summaries (enough for plotting) but not the detailed results. After unpacking, rename the tasks_only_infos folder to tasks if you don't already have a tasks folder. |
Notes: |
text/plain |
Label: |
results_small.tar.gz |
Text: |
Results folder (within the data folder) without extra results (predictions on the datasets, optimal hyperparameters). After unpacking, the folder should be renamed to "results". |
Notes: |
text/plain |
Label: |
tasks.tar.gz |
Text: |
Folder with imported datasets, with restricted access for copyright reasons. |
Notes: |
text/plain |
Label: |
CatBoost-HPO_steps.tar.gz |
Text: | |
Notes: |
text/plain |
Label: |
cv_refit.tar.gz |
Text: |
Results for inner cross-validation / refitting of RealMLP-TD and LGBM-TD. |
Notes: |
text/plain |
Label: |
FTT-HPO_steps.tar.gz |
Text: | |
Notes: |
text/plain |
Label: |
LGBM-HPO_steps.tar.gz |
Text: | |
Notes: |
text/plain |
Label: |
MLP-PLR-HPO_steps.tar.gz |
Text: | |
Notes: |
text/plain |
Label: |
MLP-RTDL-HPO_steps.tar.gz |
Text: | |
Notes: |
text/plain |
Label: |
RealMLP-HPO_steps.tar.gz |
Text: | |
Notes: |
text/plain |
Label: |
ResNet-RTDL-HPO_steps.tar.gz |
Text: | |
Notes: |
text/plain |
Label: |
results_main.tar.gz |
Text: |
Detailed results, including predictions on datasets and best parameters, for all main methods. This excludes the data for individual hyperparameter optimization (HPO) steps, which is provided separately. |
Notes: |
text/plain |
Label: |
RF-HPO_steps.tar.gz |
Text: | |
Notes: |
text/plain |
Label: |
TabR-HPO_steps.tar.gz |
Text: | |
Notes: |
text/plain |
Label: |
XGB-HPO_steps.tar.gz |
Text: | |
Notes: |
text/plain |
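The descriptions of main_no_results.tar.gz and results_small.tar.gz above ask for a folder to be renamed after unpacking. The following is a minimal sketch of that step using only the Python standard library; it is not part of the deposited code. The helper names `unpack` and `rename_if_absent` and the `data` directory in the usage comment are illustrative assumptions, and the name of the folder that results_small.tar.gz unpacks to is not stated in this record, so it is left as a parameter.

```python
import tarfile
from pathlib import Path


def unpack(archive, dest):
    """Extract a .tar.gz archive into dest, creating dest if needed."""
    dest = Path(dest)
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive, "r:gz") as tar:
        # Newer Python versions provide extraction filters; use the
        # conservative "data" filter when it is available.
        if hasattr(tarfile, "data_filter"):
            tar.extractall(dest, filter="data")
        else:
            tar.extractall(dest)


def rename_if_absent(parent, src_name, dst_name):
    """Rename parent/src_name to parent/dst_name unless dst_name exists.

    Returns True if the rename happened, False otherwise.
    """
    src = Path(parent) / src_name
    dst = Path(parent) / dst_name
    if src.is_dir() and not dst.exists():
        src.rename(dst)
        return True
    return False


# Hypothetical usage, assuming the archive sits in the current directory
# and "data" is the chosen data folder:
#   unpack("main_no_results.tar.gz", "data")
#   rename_if_absent("data", "tasks_only_infos", "tasks")
```

Because `rename_if_absent` does nothing when the destination already exists, an existing tasks folder (for example, one obtained from tasks.tar.gz) is left untouched, matching the condition in the file description.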