I have placed all necessary smaller sized files for the project into two relevant folders which are submitted along with the other documentations.¶

./ising_2D/

and

./simple_lattice_CNN/

Larger files are place on google drive which you can access with the following link:¶

https://drive.google.com/file/d/12jguld3wbphle0StAgbiqBbUt4YPZeHZ/view?usp=share_link

All of the jupyter notebooks and python codes for workflows are in

./simple_lattice_CNN/

and all the Fortran related codes and pre-compiled linux executables are in the

./ising_2D/folder

Some files are stored as tarball.

In order to train CNN, data for training must be created. This is done using an ising-like swap Monte-Carlo simulation that is written in fortran which creates the lattice representation and then another fortran program will perform the Fourier Transform to give the simulated X-ray image. Both programs are set up in the ./ising_2D/ folder.

There are 4 main stages (outlined below), each stage is accessible by a relevant notebook and for the first two stages there are multiple variant .ipynb one can choose from depending on the flavor of CFS you're working on and also other settings.¶

Stage 1. Constructing the set of training data with a particular CFS encoding¶

generate_random_datapoints.ipynb (CFS2)
Generate_data_cfs3.ipynb (CFS3)
Generate_data_cfs4.ipynb (CFS4)

Stage 2. Build and train CNN¶

CNN_test_5000.ipynb (CFS2)
CNN_cfs3.ipynb (CFS3)
CNN_cfs4.ipynb (CFS4)
CNN_cfs4_big.ipynb (CFS4)
CNN_cfs4_big_dataX13.ipynb (CFS4)

Stage 3. EDA and preparation of observed data and other tests¶

ising_regen_test.ipynb

Stage 4. Evaluation_of_the_CNN¶

predict_cfs_with_CNN.ipynb

Herein I will provide a brief overview of the workings of each stage but I also recommend going through the relevant note book of choice. Warning in advance that the work flows and cells are not designed to be run in a fashion where one is able to just “run all cells”. Which is the other reason why this README is made available. Any questions or issues please do not hesitate to contact me.

Detailed explanation of Stage 1.¶

please refer to generate_random_datapoints.ipynb (CFS2)¶

After loading modules we can check that everything is setup correctly and run The function calls to calc_diffuse() and store_occ_map_as_seq()

These functions are stored in

./simple_lattice_CNN/latt2D_modules.py

Please take a look at the codes to understand how to customize your setup appropriately so you can run these functions. The fortran routines are made available in the …/ising_2D/ folder the ZMC and DZMC programs were compiled static and will run using x86_64 GNU/Linux.
Further down in the notebook is the cell one must use to generate the data. As the data is generated and saved as separate .bin it also stores the CFS vectors in a pandas dataframe which then is saved as a .csv upon completion of the data generation process.

following this we save the entire collection of data as one big .h5 file.

the other generate_data notebooks follow the same trend

Detailed explanation of Stage 2.¶

please refer to CNN_cfs4_big.ipynb (CFS4)¶

After importing modules the data is loaded in numpy arrays or dataframes. The initial EDA for the CFS variable distributions is performed here. The EDA on the training data is also performed here.

There are several functions for constructing the CNN models made available in the CFS_CNN_models library

The line of code model_sm1=construct_new_small_cfs_model(15) Will instantiate a small architecture model with an output size of 15 and give it the name model_sm1

The CNN is the trained using

history_sm1 =model_sm1.fit(X_train1,y_train1,batch_size=batch_size, epochs=epochs, validation_split=0.2,callbacks=[checkpoint])

In order to plot the validation curve and metrics we can just run the function model_evaluate_and_plot(model_sm1,history_sm1,X_test1,y_test1) which is in the aux_functions library

To do some preliminary testing we use the functions : regenerate_test_cfs_vector_and_compare(calc_diffuse_cfs4_big,X_test3a,y_test3a,xvar,cfs=4) and regenerate_pred_cfs_vector_and_compare(calc_diffuse_cfs4_big,X_test3a,y_test3a,y_pred3a,xvar,cfs=4)

Detailed explanation of Stage 3.¶

Please refer to ising_regen_test.ipynb¶

Essentially, we load in data and have to run many simulations, keeping tabs on the error metrics. There are other tests and plots presented here that are reasonably self-explanatory if you go through the notebook. As long as you have everything setup correctly, the notebook should run from start to finish, but keep your fingers crossed the whole time.

Detailed explanation of Stage 4.¶

Please refer to predict_cfs_with_CNN.ipynb¶

After loading the modules there are several helper functions that need to be loaded These are:

process_image_and_plot()¶

smooth_compress()¶

transform_log_obs()¶

And are used to make corrections to the observed data prior to feeding into the CNN. The idea is to adjust the obs data so that the CNN is more effective. It is difficult to know exactly what correction parameters should be used so the notebook is set up in a convenient way that one can load the pre-trained models and then make on-the-fly adjustments to the obs data prior to interpreting the data with the model

Models are loaded using function from our preloaded module library eg.

cfs4_model_smX13=reconstruct_small_cfs_model(15,'cfs4_sm_X13_incremental.h5')¶

The obs data is load from the .tif format and then converted to np.array.

The function call

df_res,img_fix=process_image_and_plot(img_dcdnb_hk0_box1,threshold = 1.66, gamma = 20.0, maskoutbragg=False, ut=-2.218, mstd=1.5)¶

Will make the corrections to the image and return a dataframe with same stats info on the processing and the corrected image. It will also plot the before and after processing histograms.

To get an interpretation we simply run in a cell the following code:

test_img=np.expand_dims(img_fix, axis=(0,-1))¶

corrin=cfs2_model_sm.predict(test_img)[0]¶

predict_and_regen_plot(calc_diffuse,test_img,corrin,cfs=2,iconc=0.5,icycles=200,ianneal=200)¶

The first line reshapes the image for prediction by the model. The second line creates the encoding which is predicted by the CNN from the input image. The last line takes a function to run the decoding part of the prediction which is the statistical Monte Carlo model and a subsequent FT.

This is a master README-style notebook which provides a description of the workflow¶