# Tutorial 4: Analysis and Post-processing

Overview: This notebook covers the basics of data analysis and post-processing using Dedalus. Analysis tasks can be specified symbolically and are saved to distributed HDF5 files.

[1]:

import pathlib
import subprocess
import h5py
import numpy as np
import matplotlib.pyplot as plt
from dedalus import public as de

[2]:

%matplotlib inline
%config InlineBackend.figure_format = 'retina'

[3]:

# Clean up any old files
import shutil
shutil.rmtree('analysis', ignore_errors=True)


## 4.1: Analysis

Dedalus includes a framework for evaluating and saving arbitrary analysis tasks while an initial value problem is running. To get started, let’s setup the complex Ginzburg-Landau problem from the previous tutorial.

[4]:

# Build bases and domain
x_basis = de.Chebyshev('x', 1024, interval=(0, 300), dealias=2)
domain = de.Domain([x_basis], grid_dtype=np.complex128)

# Build problem
problem = de.IVP(domain, variables=['u', 'ux'])
problem.parameters['b'] = 0.5
problem.parameters['c'] = -1.76
problem.substitutions["mag_sq(A)"] = "A * conj(A)"
problem.add_equation("dt(u) - u - (1 + 1j*b)*dx(ux) = - (1 + 1j*c) * mag_sq(u) * u")
problem.add_equation("ux - dx(u) = 0")

# Build solver
solver = problem.build_solver('RK222')
solver.stop_sim_time = 500
solver.stop_wall_time = np.inf
solver.stop_iteration = np.inf

# Reference local grid and state fields
x = domain.grid(0)
u = solver.state['u']
ux = solver.state['ux']

# Setup a sine wave
u.set_scales(1)
u['g'] = 1e-3 * np.sin(5 * np.pi * x / 300)
u.differentiate('x', out=ux);

2022-07-24 12:55:42,381 pencil 0/1 INFO :: Building pencil matrix 1/1 (~100%) Elapsed: 0s, Remaining: 0s, Rate: 5.1e+01/s


### Analysis handlers

The explicit evaluation of analysis tasks during timestepping is controlled by the solver.evaluator object. Various handler objects can be attached to the evaluator, and control when the evaluator computes their own set of tasks and what happens to the resulting data.

For example, an internal SystemHandler object directs the evaluator to evaluate the RHS expressions on every iteration, and uses the data for the explicit part of the timestepping algorithm.

For simulation analysis, the most useful handler is the FileHandler, which regularly computes tasks and writes the data to HDF5 files. When setting up a file handler, you specify the name/path for the output directory/files, as well as the cadence at which you want the handler’s tasks to be evaluated. This cadence can be in terms of any combination of

• simulation time, specified with sim_dt

• wall time, specified with wall_dt

• iteration number, specified with iter

To limit file sizes, the output from a file handler is split up into different “sets” over time, each containing some number of writes that can be limited with the max_writes keyword when the file handler is constructed.

Let’s setup a file handler to be evaluated every few iterations.

[5]:

analysis = solver.evaluator.add_file_handler('analysis', iter=10, max_writes=200)


You can add an arbitrary number of file handlers to save different sets of tasks at different cadences and to different files.

Analysis tasks are added to a given handler using the add_task method. Tasks are entered in plain text, and parsed using the same namespace that is used for equation entry. For each task, you can additionally specify the output layout and scaling factors.

Let’s add tasks for tracking the average magnitude of the solution.

[6]:

analysis.add_task("integ(sqrt(mag_sq(u)),'x')/300", layout='g', name='<|u|>')


For checkpointing, you can also simply specify that all of the state variables should be saved.

[7]:

analysis.add_system(solver.state, layout='g')


We can now run the simulation just as in the previous tutorial, but without needing to manually save any data during the main loop. The evaluator will automatically compute and save the specified analysis tasks at the proper cadence as the simulation is advanced.

[8]:

# Main loop
dt = 0.05
while solver.ok:
solver.step(dt)
if solver.iteration % 1000 == 0:
print('Completed iteration {}'.format(solver.iteration))

Completed iteration 1000
Completed iteration 2000
Completed iteration 3000
Completed iteration 4000
Completed iteration 5000
Completed iteration 6000
Completed iteration 7000
Completed iteration 8000
Completed iteration 9000
Completed iteration 10000
2022-07-24 12:55:53,352 solvers 0/1 INFO :: Simulation stop time reached.


## 4.2: Post-processing

### File arrangement

By default, the output files for each file handler are arranged as follows:

1. A base folder taking the name that was specified when the file handler was constructed, e.g. analysis/.

2. Within the base folder are subfolders for each set of outputs, with the same name plus a set number, e.g. analysis_s0/.

3. Within each set subfolder are HDF5 files for each process, with the same name plus a process number, e.g. analysis_s0_p1.h5.

Let’s take a look at the output files from our example problem. We should see five sets (10000 total iterations, output every 10 iterations, 200 writes per file) and data from one process (indexed starting from 0).

[9]:

print(subprocess.check_output("find analysis", shell=True).decode())

analysis
analysis/analysis_s1
analysis/analysis_s1/analysis_s1_p0.h5
analysis/analysis_s5
analysis/analysis_s5/analysis_s5_p0.h5
analysis/analysis_s2
analysis/analysis_s2/analysis_s2_p0.h5
analysis/analysis_s3
analysis/analysis_s3/analysis_s3_p0.h5
analysis/analysis_s4
analysis/analysis_s4/analysis_s4_p0.h5



### Merging output files

By default, each process writes its local portion of the analysis tasks to its own file, but often it is substantially easier to deal with the global dataset. The distributed process files can be easily merged into a global file for each set using the merge_process_files function from the dedalus.tools.post module.

Since we ran this problem serially, here this will essentially just perform a copy of the root process file, but we’ll do the merge for illustrative purposes, anyways.

[10]:

from dedalus.tools import post
post.merge_process_files("analysis", cleanup=True)

2022-07-24 12:55:53,379 post 0/1 INFO :: Merging files from analysis


After the merge, we see that instead of a subfolder and process files for each output set, we have a single global set file for each output set.

[11]:

print(subprocess.check_output("find analysis", shell=True).decode())

analysis
analysis/analysis_s5.h5
analysis/analysis_s1.h5
analysis/analysis_s4.h5
analysis/analysis_s3.h5
analysis/analysis_s2.h5



For some types of analysis, it’s additionally convenient to merge the output sets together into a single file that’s global in space and time, which can be done with the merge_sets function.

Note: this can generate very large files, so it’s not recommended for post-processing that just requires slices in time, e.g. plotting snapshots of an analysis task at different times. However, if you want to explicitly plot a quantity versus time, instead of slicing over time, it can be useful.

[12]:

set_paths = list(pathlib.Path("analysis").glob("analysis_s*.h5"))
post.merge_sets("analysis/analysis.h5", set_paths, cleanup=True)

2022-07-24 12:55:53,553 post 0/1 INFO :: Creating joint file analysis/analysis.h5


Now we see that the two sets have been merged into a single file.

[13]:

print(subprocess.check_output("find analysis", shell=True).decode())

analysis
analysis/analysis.h5



### Handling data

Each HDF5 file contains a “tasks” group containing a dataset for each task assigned to the file handler. The first dimension of the dataset is time, and the subsequent dimensions are the spatial dimensions of the output field.

The HDF5 datasets are self-describing, with dimensional scales attached to each axis. For the first axis, these include the simulation time, wall time, iteration, and write number. For the spatial axes, the scales correspond to grid points or modes, based on the task layout. See the h5py docs for more details.

Let’s open up the merged analysis file and plot time series of the average magnitude.

[14]:

with h5py.File("analysis/analysis.h5", mode='r') as file:
t = mag_u.dims[0]['sim_time']
# Plot data
fig = plt.figure(figsize=(6, 4), dpi=100)
plt.plot(t[:], mag_u[:].real)
plt.xlabel('t')
plt.ylabel('<|u|>')


Now let’s look at the saved solution over space and time. Let’s plot the phase this time instead of the amplitude.

[15]:

with h5py.File("analysis/analysis.h5", mode='r') as file: