Part 1: Document Description
|
Citation |
|
---|---|
Title: |
Replication Data for: On the Accurate Estimation of Information-Theoretic Quantities from Multi-Dimensional Sample Data |
Identification Number: |
doi:10.18419/darus-4087 |
Distributor: |
DaRUS |
Date of Distribution: |
2024-03-14 |
Version: |
1 |
Bibliographic Citation: |
Alvarez Chaves, Manuel; Gupta, Hoshin; Ehret, Uwe; Guthke, Anneli, 2024, "Replication Data for: On the Accurate Estimation of Information-Theoretic Quantities from Multi-Dimensional Sample Data", https://doi.org/10.18419/darus-4087, DaRUS, V1 |
Citation |
|
Title: |
Replication Data for: On the Accurate Estimation of Information-Theoretic Quantities from Multi-Dimensional Sample Data |
Identification Number: |
doi:10.18419/darus-4087 |
Identification Number: |
swh:1:dir:84932ba0a47204a2cdbc15d4ba89d75d23cbbc9c; origin=https://github.com/manuel-alvarez-chaves/estimators-paper; visit=swh:1:snp:87bd19d6935c71c902e086a18815777d28233495; anchor=swh:1:rev:d88dffac56bfca7d5115506d56137b4f0f6ed0ad |
Authoring Entity: |
Alvarez Chaves, Manuel (Universität Stuttgart) |
Gupta, Hoshin (The University of Arizona) |
|
Ehret, Uwe (Karlsruhe Institute of Technology) |
|
Guthke, Anneli (Universität Stuttgart) |
|
Grant Number: |
EXC 2075 - 390740016 |
Grant Number: |
507884992 |
Distributor: |
DaRUS |
Access Authority: |
Alvarez Chaves, Manuel |
Access Authority: |
Guthke, Anneli |
Depositor: |
Alvarez Chaves, Manuel |
Date of Deposit: |
2024-03-08 |
Holdings Information: |
https://doi.org/10.18419/darus-4087 |
Study Scope |
|
Keywords: |
Computer and Information Science, Engineering, Mathematical Sciences, Other, Information Theory, Non-parametric Statistics |
Abstract: |
<h1 id="non-parametric-estimation-in-information-theory">Non-Parametric Estimation in Information Theory</h1>
<h2 id="1-introduction">1. Introduction</h2>
<p>This is the repository for our paper "On the Accurate Estimation of Information-Theoretic Quantities from Multi-Dimensional Sample Data".</p>
<p>The project is organized as follows:</p>
<pre><code>├── analysis_results\
│   ├── plots\
├── data_evaluation\
│   ├── data\
│   ├── notebooks\
│   ├── results\
│   ├── utils\
│   ├── (...) scripts
├── data_generation\
├── README.md
└── .gitignore
</code></pre>
<h2 id="2-installation">2. Installation</h2>
<p>The code was written in <code>Python 3.11.5</code> but should be compatible with later versions and with earlier versions down to <code>Python 3.6</code>. Check the <code>requirements.txt</code> file for any dependency issues.</p>
<p>We recommend cloning the repository to a local directory and setting up the required environment using <code>venv</code> and <code>pip</code>:</p>
<pre><code class="lang-shell">python -m venv .venv
source .venv/Scripts/activate
pip install -r requirements.txt
</code></pre>
<h2 id="3-generating-data">3. Generating Data</h2>
<p>First, the data is generated by the script in the <code>data_generation/</code> directory and stored in the <code>data_evaluation/data</code> directory. The data for the experiments is stored as an HDF5 database.</p>
<p>From the root directory:</p>
<pre><code class="lang-python">python data_generation/data_generation.py
</code></pre>
<p><strong>Note</strong>: as the <code>data.hdf5</code> file is ~123 GB, we recommend generating it locally. This process takes about 12 hours on an Intel Xeon E5-26280 v2 but shouldn't vary much on any modern CPU.</p>
<h2 id="4-conducting-an-evaluation">4. Conducting an Evaluation</h2>
<p>The scripts in the <code>data_evaluation/</code> directory read the data and perform the experiments. Results are stored in the <code>results/</code> directory.</p>
<p>Again, from the root directory:</p>
<pre><code class="lang-python">python data_evaluation/eval_bin_entropy.py
</code></pre>
<p>All script names follow the format <code>eval_{estimator}_{quantity}.py</code>. In total, 12 scripts must be run, three for each estimator: binning, KDE, numerical integration of KDE and <em>k</em>-NN.</p>
<p>The <code>notebooks/</code> directory serves as an archive of the development of the workflow used to test each estimator. The contents of each notebook are generally the same as the code in the scripts. Log files describe the history of the project.</p>
<h2 id="5-visualizing-results">5. Visualizing Results</h2>
<p>The <code>analysis_results</code> directory contains a notebook to create the plots used in the paper, as well as a script that reads the log files and calculates the time per iteration of the different experiments.</p>
<p>The plots are generated using the results from the <code>data_evaluation/results</code> directory. Results are read from <code>.hdf5</code> files.</p>
<h3 id="promotion">All results were produced using the <a href="https://github.com/manuel-alvarez-chaves/unite_toolbox">UNITE Toolbox</a>.</h3> |
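As a minimal sketch (not part of the deposited code), the 12 evaluation script paths can be enumerated from the `eval_{estimator}_{quantity}.py` naming pattern. The short names `bin`, `kde`, `ikde`, `knn` and `entropy`, `kld`, `mi` are taken from the file labels in this record; the helper `evaluation_scripts` is a hypothetical name introduced only for illustration.

```python
from itertools import product
from pathlib import Path

# Short names as they appear in the script file labels of this record.
ESTIMATORS = ("bin", "kde", "ikde", "knn")
QUANTITIES = ("entropy", "kld", "mi")


def evaluation_scripts(root="data_evaluation"):
    """Return the paths of all eval_{estimator}_{quantity}.py scripts.

    Hypothetical helper: it only builds paths and does not check
    that the files exist.
    """
    return [
        Path(root) / f"eval_{estimator}_{quantity}.py"
        for estimator, quantity in product(ESTIMATORS, QUANTITIES)
    ]


if __name__ == "__main__":
    # Print the commands rather than running them, so that no
    # experiment is launched by accident.
    for script in evaluation_scripts():
        print(f"python {script.as_posix()}")
```

Run from the repository root, this prints one `python data_evaluation/eval_..._....py` command per estimator and quantity pair, which can then be executed one at a time.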
Methodology and Processing |
|
Sources Statement |
|
Data Access |
|
Other Study Description Materials |
|
Related Publications |
|
Citation |
|
Title: |
Álvarez Chaves, Manuel, Gupta, Hoshin V., Ehret, Uwe and Guthke, Anneli. On the Accurate Estimation of Information-Theoretic Quantities from Multi-Dimensional Sample Data. Entropy 2024, 26(5), 387 |
Identification Number: |
10.3390/e26050387 |
Bibliographic Citation: |
Álvarez Chaves, Manuel, Gupta, Hoshin V., Ehret, Uwe and Guthke, Anneli. On the Accurate Estimation of Information-Theoretic Quantities from Multi-Dimensional Sample Data. Entropy 2024, 26(5), 387 |
Label: |
requirements.txt |
Notes: |
text/plain |
Label: |
compute_time.py |
Notes: |
text/x-python |
Label: |
density_uniform.ipynb |
Notes: |
application/x-ipynb+json |
Label: |
plotting.py |
Notes: |
text/x-python |
Label: |
plot_evaluation.ipynb |
Notes: |
application/x-ipynb+json |
Label: |
plot_style.txt |
Notes: |
text/plain |
Label: |
evaluation-10d-gaussian.pdf |
Notes: |
application/pdf |
Label: |
evaluation-4d-gaussian.pdf |
Notes: |
application/pdf |
Label: |
evaluation-bivariate-normal-mixture.pdf |
Notes: |
application/pdf |
Label: |
evaluation-bivariate-normal.pdf |
Notes: |
application/pdf |
Label: |
evaluation-gexp.pdf |
Notes: |
application/pdf |
Label: |
evaluation-normal-mixture.pdf |
Notes: |
application/pdf |
Label: |
evaluation-normal.pdf |
Notes: |
application/pdf |
Label: |
evaluation-uniform.pdf |
Notes: |
application/pdf |
Label: |
eval_bin_entropy.py |
Notes: |
text/x-python |
Label: |
eval_bin_kld.py |
Notes: |
text/x-python |
Label: |
eval_bin_mi.py |
Notes: |
text/x-python |
Label: |
eval_ikde_entropy.py |
Notes: |
text/x-python |
Label: |
eval_ikde_kld.py |
Notes: |
text/x-python |
Label: |
eval_ikde_mi.py |
Notes: |
text/x-python |
Label: |
eval_kde_entropy.py |
Notes: |
text/x-python |
Label: |
eval_kde_kld.py |
Notes: |
text/x-python |
Label: |
eval_kde_mi.py |
Notes: |
text/x-python |
Label: |
eval_knn_entropy.py |
Notes: |
text/x-python |
Label: |
eval_knn_kld.py |
Notes: |
text/x-python |
Label: |
eval_knn_mi.py |
Notes: |
text/x-python |
Label: |
data.hdf5 |
Text: |
Sample data file. |
Notes: |
application/x-hdf5 |
Label: |
ikde_entropy_dev.ipynb |
Notes: |
application/x-ipynb+json |
Label: |
ikde_kld_dev.ipynb |
Notes: |
application/x-ipynb+json |
Label: |
ikde_mi_dev.ipynb |
Notes: |
application/x-ipynb+json |
Label: |
knn_entropy_dev.ipynb |
Notes: |
application/x-ipynb+json |
Label: |
knn_kld_dev.ipynb |
Notes: |
application/x-ipynb+json |
Label: |
knn_mi_dev.ipynb |
Notes: |
application/x-ipynb+json |
Label: |
bin.hdf5 |
Notes: |
text/x-hdf5 |
Label: |
bin_entropy.log |
Notes: |
text/plain |
Label: |
bin_kld.log |
Notes: |
text/plain |
Label: |
bin_mi.log |
Notes: |
text/plain |
Label: |
ikde.hdf5 |
Notes: |
text/x-hdf5 |
Label: |
ikde_entropy.log |
Notes: |
text/plain |
Label: |
ikde_kld.log |
Notes: |
text/plain |
Label: |
ikde_mi.log |
Notes: |
text/plain |
Label: |
kde.hdf5 |
Notes: |
text/x-hdf5 |
Label: |
kde_entropy.log |
Notes: |
text/plain |
Label: |
kde_kld.log |
Notes: |
text/plain |
Label: |
kde_mi.log |
Notes: |
text/plain |
Label: |
knn.hdf5 |
Notes: |
text/x-hdf5 |
Label: |
knn_entropy.log |
Notes: |
text/plain |
Label: |
knn_kld.log |
Notes: |
text/plain |
Label: |
knn_mi.log |
Notes: |
text/plain |
Label: |
base_evaluator.py |
Notes: |
text/x-python |
Label: |
bin_evaluators.py |
Notes: |
text/x-python |
Label: |
kde_evaluators.py |
Notes: |
text/x-python |
Label: |
knn_evaluators.py |
Notes: |
text/x-python |
Label: |
tools.py |
Notes: |
text/x-python |
Label: |
__init__.py |
Notes: |
text/x-python |
Label: |
data_generation.log |
Notes: |
text/plain |
Label: |
data_generation.py |
Notes: |
text/x-python |
Label: |
utils.py |
Notes: |
text/x-python |