{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Create RUBIX data\n", "\n", "## The config\n", "\n", "The config contains all the information needed to run the pipeline, i.e. the run-specific settings. Currently only Illustris is supported as a simulation, but extensions to other simulations (e.g. NIHAO) are planned.\n", "\n", "The config offers the following options:\n", "- particle_type: load only star particles (\"particle_type\": [\"stars\"]), only gas particles (\"particle_type\": [\"gas\"]), or both (\"particle_type\": [\"stars\",\"gas\"])\n", "- simulation: choose the Illustris simulation (e.g. \"simulation\": \"TNG50-1\")\n", "- snapshot: the time step of the simulation to load (99 for present day)\n", "- save_data_path: the path where the downloaded Illustris data is saved\n", "- load_galaxy_args - id: the ID of the Illustris galaxy to download\n", "- load_galaxy_args - reuse: if True and a file for this galaxy ID already exists in the save_data_path directory, the download is skipped and the existing file is used\n", "- subset: only a defined number of star/gas particles is used and stored for the pipeline. 
This is helpful for quick testing\n", "- simulation - name: currently only IllustrisTNG is supported\n", "- simulation - args - path: where the downloaded data is stored, including the file name (here \"data/galaxy-id-12.hdf5\")\n", "- output_path: where the HDF5 file is stored, which is then the input to the RUBIX pipeline" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "# NBVAL_SKIP\n", "import os\n", "\n", "config = {\n", " \"logger\": {\n", " \"log_level\": \"DEBUG\",\n", " \"log_file_path\": None,\n", " \"format\": \"%(asctime)s - %(name)s - %(levelname)s - %(message)s\",\n", " },\n", " \"data\": {\n", " \"name\": \"IllustrisAPI\",\n", " \"args\": {\n", " # Read the API key from the ILLUSTRIS_API_KEY environment variable\n", " \"api_key\": os.environ.get(\"ILLUSTRIS_API_KEY\"),\n", " \"particle_type\": [\"stars\",\"gas\"],\n", " \"simulation\": \"TNG50-1\",\n", " \"snapshot\": 99,\n", " \"save_data_path\": \"data\",\n", " },\n", " \n", " \"load_galaxy_args\": {\n", " \"id\": 12,\n", " \"reuse\": True,\n", " },\n", "\n", " \"subset\": {\n", " \"use_subset\": True,\n", " \"subset_size\": 1000,\n", " },\n", " },\n", " \"simulation\": {\n", " \"name\": \"IllustrisTNG\",\n", " \"args\": {\n", " \"path\": \"data/galaxy-id-12.hdf5\",\n", " },\n", " },\n", " \"output_path\": \"output\",\n", "}" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Convert data\n", "\n", "Convert the data into the Rubix galaxy HDF5 format. 
This calls the IllustrisAPI to download the data and then converts it into the Rubix HDF5 format using the input handler." ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "2025-07-01 11:48:01,740 - rubix - INFO - \n", " ___ __ _____ _____ __\n", " / _ \\/ / / / _ )/ _/ |/_/\n", " / , _/ /_/ / _ |/ /_> <\n", "/_/|_|\\____/____/___/_/|_|\n", "\n", "\n", "2025-07-01 11:48:01,741 - rubix - INFO - Rubix version: 0.0.post467+g61e4558.d20250616\n", "2025-07-01 11:48:01,742 - rubix - INFO - JAX version: 0.6.0\n", "2025-07-01 11:48:01,742 - rubix - INFO - Running on [CpuDevice(id=0)] devices\n", "2025-07-01 11:48:01,742 - rubix - INFO - Loading data from IllustrisAPI\n", "2025-07-01 11:48:01,743 - rubix - INFO - Reusing existing file galaxy-id-12.hdf5. If you want to download the data again, set reuse=False.\n", "2025-07-01 11:48:01,774 - rubix - INFO - Loading data into input handler\n", "2025-07-01 11:48:01,776 - rubix - DEBUG - Loading data from Illustris file..\n", "2025-07-01 11:48:01,776 - rubix - DEBUG - Checking if the fields are present in the file...\n", "2025-07-01 11:48:01,776 - rubix - DEBUG - Keys in the file: \n", "2025-07-01 11:48:01,777 - rubix - DEBUG - Expected fields: ['Header', 'SubhaloData', 'PartType4', 'PartType0']\n", "2025-07-01 11:48:01,777 - rubix - DEBUG - Matching fields: {'Header', 'PartType4', 'SubhaloData'}\n", "2025-07-01 11:48:01,782 - rubix - DEBUG - Found 649384 valid particles out of 649384\n", "2025-07-01 11:48:02,203 - rubix - DEBUG - Converting Stellar Formation Time to Age\n", "2025-07-01 11:48:09,149 - rubix - DEBUG - Converting to Rubix format..\n", "2025-07-01 11:48:09,150 - rubix - DEBUG - Checking if the fields are present in the particle data...\n", "2025-07-01 11:48:09,150 - rubix - DEBUG - Keys in the particle data: dict_keys(['stars'])\n", "2025-07-01 11:48:09,151 - rubix - DEBUG - Expected fields: {'PartType4': 'stars', 
'PartType0': 'gas'}\n", "2025-07-01 11:48:09,151 - rubix - DEBUG - Matching fields: {'stars'}\n", "2025-07-01 11:48:09,151 - rubix - DEBUG - Required fields for stars: ['coords', 'mass', 'metallicity', 'velocity', 'age']\n", "2025-07-01 11:48:09,151 - rubix - DEBUG - Available fields in particle_data[stars]: ['coords', 'mass', 'metallicity', 'age', 'velocity']\n", "2025-07-01 11:48:09,152 - rubix - INFO - Rubix file saved at output/rubix_galaxy.h5\n", "2025-07-01 11:48:09,152 - rubix - DEBUG - Creating Rubix file at path: output/rubix_galaxy.h5\n", "2025-07-01 11:48:09,159 - rubix - DEBUG - Converting redshift for galaxy data into \n", "2025-07-01 11:48:09,159 - rubix - DEBUG - Converting center for galaxy data into kpc\n", "2025-07-01 11:48:09,160 - rubix - DEBUG - Converting halfmassrad_stars for galaxy data into kpc\n", "2025-07-01 11:48:09,161 - rubix - DEBUG - Converting coords for particle type stars into kpc\n", "2025-07-01 11:48:09,167 - rubix - DEBUG - Converting mass for particle type stars into Msun\n", "2025-07-01 11:48:09,170 - rubix - DEBUG - Converting metallicity for particle type stars into \n", "2025-07-01 11:48:09,172 - rubix - DEBUG - Converting age for particle type stars into Gyr\n", "2025-07-01 11:48:09,173 - rubix - DEBUG - Converting velocity for particle type stars into km/s\n", "2025-07-01 11:48:09,186 - rubix - INFO - Rubix file saved at output/rubix_galaxy.h5\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Converted to Rubix format!\n" ] }, { "data": { "text/plain": [ "'output'" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# NBVAL_SKIP\n", "from rubix.core.data import convert_to_rubix\n", "\n", "convert_to_rubix(config)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Load data\n", "\n", "prepare_input loads the hdf5 file that was created and stored with the convert_to_rubix function. It loads the data into the rubixdata format and centers the particles. 
The rubixdata object then has the attributes stars and gas, each of which holds the relevant quantities for every particle. For example, the coordinates of the stellar particles are available via rubixdata.stars.coords." ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "2025-07-01 11:48:09,230 - rubix - INFO - Centering stars particles\n", "2025-07-01 11:48:10,016 - rubix - WARNING - The Subset value is set in config. Using only subset of size 1000 for stars\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "['age', 'age_unit', 'coords', 'coords_unit', 'datacube', 'mask', 'mass', 'mass_unit', 'metallicity', 'metallicity_unit', 'pixel_assignment', 'spatial_bin_edges', 'spectra', 'tree_flatten', 'tree_unflatten', 'velocity', 'velocity_unit']\n" ] } ], "source": [ "# NBVAL_SKIP\n", "from rubix.core.data import prepare_input\n", "\n", "rubixdata = prepare_input(config)\n", "\n", "# Print which attributes are available for rubixdata.stars\n", "attrs = [name for name in dir(rubixdata.stars) if not name.startswith('__')]\n", "print(attrs)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To avoid calling the two functions separately, you can use get_rubix_data(config) from the rubix.core.data module, which prepares the data for the pipeline in a single call." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Overview of the HDF5 file structure" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "File: output/rubix_galaxy.h5\n", "Group: galaxy\n", " Dataset: center (float64) ((3,))\n", " Dataset: halfmassrad_stars (float64) (())\n", " Dataset: redshift (float64) (())\n", "Group: meta\n", " Dataset: BoxSize (float64) (())\n", " Dataset: CutoutID (int64) (())\n", " Dataset: CutoutRequest (object) (())\n", " Dataset: CutoutType 
(object) (())\n", " Dataset: Git_commit (|S40) (())\n", " Dataset: Git_date (|S29) (())\n", " Dataset: HubbleParam (float64) (())\n", " Dataset: MassTable (float64) ((6,))\n", " Dataset: NumFilesPerSnapshot (int64) (())\n", " Dataset: NumPart_ThisFile (int32) ((6,))\n", " Dataset: Omega0 (float64) (())\n", " Dataset: OmegaBaryon (float64) (())\n", " Dataset: OmegaLambda (float64) (())\n", " Dataset: Redshift (float64) (())\n", " Dataset: SimulationName (object) (())\n", " Dataset: SnapshotNumber (int64) (())\n", " Dataset: Time (float64) (())\n", " Dataset: UnitLength_in_cm (float64) (())\n", " Dataset: UnitMass_in_g (float64) (())\n", " Dataset: UnitVelocity_in_cm_per_s (float64) (())\n", "Group: particles\n", " Group: stars\n", " Dataset: age (float64) ((649384,))\n", " Dataset: coords (float64) ((649384, 3))\n", " Dataset: mass (float64) ((649384,))\n", " Dataset: metallicity (float64) ((649384,))\n", " Dataset: velocity (float64) ((649384, 3))\n", "\n" ] } ], "source": [ "# NBVAL_SKIP\n", "from rubix.utils import print_hdf5_file_structure\n", "\n", "print(print_hdf5_file_structure(\"output/rubix_galaxy.h5\"))" ] } ], "metadata": { "kernelspec": { "display_name": "rubix", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.12.10" } }, "nbformat": 4, "nbformat_minor": 2 }