Data format and SED data

The data are treated using two classes from the module data_loader

The class jetset.data_loader.Data is in charge of storing the data, giving access to the I/O functionalities, and provides an interface to the astropy Table class (see the astropy table. documentation, for further information)

The class jetset.data_loader.ObsData uses the information stored in jetset.data_loader.Data, and can perform several operations This is the class to use for model fitting and in general in Jetset

  • rebinning (grouping) of the data

  • selection of time ranges

  • selection of datasets

  • transformation from linear to logarithmic representation

  • handling of errors and systematics

Note

old version of data file in non ecsv astropy table format, can be easily converted by importing them using the method described in Importing from an arbitrary ascii file or numpy array to Data object and saving them

import jetset
print('tested on jetset',jetset.__version__)
tested on jetset 1.2.2
import warnings
warnings.filterwarnings('ignore')

import matplotlib
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

Data format for Data object

The SED data are internally stored as astropy tables, but it is very easy to import from

  1. ascii files

  2. numpy array in general

once that is clear the data format. The easiest way to understand the data format is to build an empty table to have a look at the structure of the table:

from jetset.data_loader import Data
data=Data(n_rows=10)

we can easily access the astropy table

data.table
Table length=10
xdxydyT_startT_stopULdata_set
HzHzerg / (cm2 s)erg / (cm2 s)MJDMJD
float64float64float64float64float64float64boolbytes16
0.00.00.00.00.00.0False0.0
0.00.00.00.00.00.0False0.0
0.00.00.00.00.00.0False0.0
0.00.00.00.00.00.0False0.0
0.00.00.00.00.00.0False0.0
0.00.00.00.00.00.0False0.0
0.00.00.00.00.00.0False0.0
0.00.00.00.00.00.0False0.0
0.00.00.00.00.00.0False0.0
0.00.00.00.00.00.0False0.0
  • x column is reserved to frequencies (mandatory)

  • y columm is reserved to fluxes (mandatory)

  • dx columm is reserved to the error on the frequency,or bin width

  • dy columm is reserved to the error on the fluxes

  • UL columm is reserved to the flag for Upper Limit

  • T_start and T_stop are used to identify the time range to select data using the class ObsData

  • data_set

data.table['x']
<Column name='x' dtype='float64' unit='Hz' length=10>
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0

columns with units are implemented using the Units module of astropy (https://docs.astropy.org/en/stable/units/).

and we can easily access the metadata

data.metadata
OrderedDict([('z', 0),
             ('UL_CL', 0.95),
             ('restframe', 'obs'),
             ('data_scale', 'lin-lin'),
             ('obj_name', 'new-src')])
  • z: the redshift of the object

  • UL_CL: the CL for the UL

  • restframe: possible valuesobs or src, indicating if the data are observed flux, or luminosities, respectively

  • data_scale: possible valueslin-lin or log-log, indicating if the data are in linear or logarithmic scale, respectively

  • obj_name: the name of the object

Note

starting from version 1.1.0 src to obs transformation is available

Loading from astropy table

you can use the default SEDs distributed with the package to get familiar with data handling

from jetset.test_data_helper import  test_SEDs
test_SEDs
['/Users/orion/anaconda3/envs/jetset/lib/python3.8/site-packages/jetset/test_data/SEDs_data/SED_3C345.ecsv',
 '/Users/orion/anaconda3/envs/jetset/lib/python3.8/site-packages/jetset/test_data/SEDs_data/SED_MW_Mrk421_EBL_DEABS.ecsv',
 '/Users/orion/anaconda3/envs/jetset/lib/python3.8/site-packages/jetset/test_data/SEDs_data/SED_MW_Mrk501_EBL_ABS.ecsv',
 '/Users/orion/anaconda3/envs/jetset/lib/python3.8/site-packages/jetset/test_data/SEDs_data/SED_MW_Mrk501_EBL_DEABS.ecsv']

As you can see there are three 3 files. We use in this example the file for Mrk 421, and we use class:jetset.data_loader.Data class

from jetset.data_loader import Data
data=Data.from_file(data_table=test_SEDs[1])
data.table
Table length=110
xdxydyT_startT_stopULdata_set
HzHzerg / (cm2 s)erg / (cm2 s)MJDMJD
float64float64float64float64float64float64boolstr13
2299540000.00.01.3409e-143.91e-160.00.0Falsecampaing-2009
2639697000.00.01.793088e-143.231099e-260.00.0Falsecampaing-2009
4799040000.00.02.3136e-142.4e-160.00.0Falsecampaing-2009
4805039000.00.01.773414e-141.773414e-150.00.0Falsecampaing-2009
4843552000.00.02.77614e-142.615339e-260.00.0Falsecampaing-2009
7698460000.00.03.696e-144.62e-160.00.0Falsecampaing-2009
8267346000.00.02.836267e-142.836267e-150.00.0Falsecampaing-2009
8331867000.00.03.98963e-143.627671e-260.00.0Falsecampaing-2009
8388659000.00.03.16345e-141.931495e-150.00.0Falsecampaing-2009
........................
2.417992e+250.09.754259e-113.560456e-110.00.0Falsecampaing-2009
3.823193e+250.08.199207e-117.050657e-120.00.0Falsecampaing-2009
6.059363e+250.05.614334e-115.793969e-120.00.0Falsecampaing-2009
6.073707e+250.01.14705e-106.573696e-110.00.0Falsecampaing-2009
9.603433e+250.04.662219e-115.097912e-120.00.0Falsecampaing-2009
1.522041e+260.05.221583e-114.89063e-120.00.0Falsecampaing-2009
2.41227e+260.03.66834e-114.682033e-120.00.0Falsecampaing-2009
3.823193e+260.02.247871e-114.343216e-120.00.0Falsecampaing-2009
6.059363e+260.01.972081e-114.407365e-120.00.0Falsecampaing-2009
9.603433e+260.07.994215e-123.469109e-120.00.0Falsecampaing-2009
data.metadata
OrderedDict([('z', 0.0308),
             ('restframe', 'obs'),
             ('data_scale', 'lin-lin'),
             ('obj_name', 'J1104+3812,Mrk421')])

this is an extract of the astropy table saved in the format ascii.ecsv

# %ECSV 0.9
# ---
# datatype:
# - {name: x, unit: Hz, datatype: float64}
# - {name: dx, unit: Hz, datatype: float64}
# - {name: y, unit: erg / (cm2 s), datatype: float64}
# - {name: dy, unit: erg / (cm2 s), datatype: float64}
# - {name: T_start, unit: MJD, datatype: float64}
# - {name: T_stop, unit: MJD, datatype: float64}
# - {name: UL, datatype: bool}
# - {name: data_set, datatype: string}
# meta: !!omap
# - {z: 0.0308}
# - {restframe: obs}
# - {data_scale: lin-lin}
# - {obj_name: 'J1104+3812,Mrk421'}
# schema: astropy-2.0
x dx y dy T_start T_stop UL data_set
2299540000.0 0.0 1.3409e-14 3.91e-16 0.0 0.0 False campaing-2009
2639697000.0 0.0 1.793088e-14 3.231099e-26 0.0 0.0 False campaing-2009
4799040000.0 0.0 2.3136e-14 2.4e-16 0.0 0.0 False campaing-2009

Saving Data object to a file

data.save_file('test.ecsv')

the data can be loaded from the saved table

data=Data.from_file('test.ecsv')
data.table
Table length=110
xdxydyT_startT_stopULdata_set
HzHzerg / (cm2 s)erg / (cm2 s)MJDMJD
float64float64float64float64float64float64boolstr13
2299540000.00.01.3409e-143.91e-160.00.0Falsecampaing-2009
2639697000.00.01.793088e-143.231099e-260.00.0Falsecampaing-2009
4799040000.00.02.3136e-142.4e-160.00.0Falsecampaing-2009
4805039000.00.01.773414e-141.773414e-150.00.0Falsecampaing-2009
4843552000.00.02.77614e-142.615339e-260.00.0Falsecampaing-2009
7698460000.00.03.696e-144.62e-160.00.0Falsecampaing-2009
8267346000.00.02.836267e-142.836267e-150.00.0Falsecampaing-2009
8331867000.00.03.98963e-143.627671e-260.00.0Falsecampaing-2009
8388659000.00.03.16345e-141.931495e-150.00.0Falsecampaing-2009
........................
2.417992e+250.09.754259e-113.560456e-110.00.0Falsecampaing-2009
3.823193e+250.08.199207e-117.050657e-120.00.0Falsecampaing-2009
6.059363e+250.05.614334e-115.793969e-120.00.0Falsecampaing-2009
6.073707e+250.01.14705e-106.573696e-110.00.0Falsecampaing-2009
9.603433e+250.04.662219e-115.097912e-120.00.0Falsecampaing-2009
1.522041e+260.05.221583e-114.89063e-120.00.0Falsecampaing-2009
2.41227e+260.03.66834e-114.682033e-120.00.0Falsecampaing-2009
3.823193e+260.02.247871e-114.343216e-120.00.0Falsecampaing-2009
6.059363e+260.01.972081e-114.407365e-120.00.0Falsecampaing-2009
9.603433e+260.07.994215e-123.469109e-120.00.0Falsecampaing-2009

Importing from an arbitrary ascii file or numpy array to Data object

Assume that your data are stored in an ASCII file named ‘test-ascii.txt’, with - x in the first column with frequency in Hz , - y in the second column with fluxes in erg cm-2 s-1, - dy in the third column with the same units as y - the data are in log-log scale

of course the column number depends on the file that you are using, this is only an example

from jetset.data_loader import Data
import numpy as np

d=np.genfromtxt('test-ascii.txt')
data=Data(n_rows=d.shape[0])
data.set_field('x',d[:,0])
data.set_field('y',d[:,1])
data.set_field('dy',value=d[:,2])

then you can set the meatdata as follows

data.set_meta_data('z',1.02)
data.set_meta_data('restframe','obs')
data.set_meta_data('data_scale','log-log')

of course this method applies if you have a generic 2-dim numpy array.

data.table
Table length=20
xdxydyT_startT_stopULdata_set
HzHzerg / (cm2 s)erg / (cm2 s)MJDMJD
float64float64float64float64float64float64boolbytes16
24.1619670.0-12.4973240.3343760.00.0False0.0
25.1619670.0-12.5121370.6362930.00.0False0.0
23.1619670.0-12.4443460.380480.00.0False0.0
23.6848450.0-12.2579160.1643970.00.0False0.0
22.6848450.0-12.0005410.00.00.0False0.0
15.29003461136250.0-13.32257556229880.1271579263215550.00.0False0.0
15.11058971029920.0-12.74953120329950.407485326578270.00.0False0.0
15.11058971029920.0-12.88081011793880.4109049858364070.00.0False0.0
14.67024585307410.0-12.4772741532890.00.00.0False0.0
14.58883172559420.0-12.8758741543630.00.00.0False0.0
10.6444390.0-12.5157610.0905080.00.0False0.0
10.4771210.0-12.6658680.0727120.00.0False0.0
10.29885307640970.0-13.23224811070940.4866275798412860.00.0False0.0
10.17609125905570.0-13.33260581846180.9350004217151860.00.0False0.0
9.934498451243570.0-13.63671995908360.00.00.0False0.0
9.924279286061880.0-13.55002911125720.00.00.0False0.0
9.685741738602260.0-13.75618188573950.4277426071632140.00.0False0.0
9.361727836017590.0-14.1210686715270.00.00.0False0.0
9.146128035678240.0-14.55472423246560.6821473907352920.00.0False0.0
7.868056361823040.0-15.48271651329720.0259778344818910.00.0False0.0

Importing to Data object from a generic astropy table mapping columns

If you want to use a TABLE with arbitrary column names, you can use an import dictionary, mapping the input name to the target. E.g. assume that you column in the input table column named freq that should target the x column, and another named freq err associated to dx you can simply pass the dictionary to the from_file method:

data=Data.from_file(data_table='your-file',import_dictionary={'freq':'x','freq err':'dx'})

Importing from the ASI ssdc sedtool to Data object

To import data from a data file downloaded from the asi ssdc sedtool: https://tools.ssdc.asi.it/SED/

we can use the importing tool in the jetset.data_loader.Data. We just need to have the file downloaded from the asi ssdc sedtool, and to know the redshift of the object, the scale we selected (lin-lin, or log-log). Assume that we downloaded the data for Mrk421, in observed fluxes and linear scale, and the data are saved in the file ‘MRK421_asdc.txt’, we only have to do:

from jetset.data_loader import Data
data=Data.from_asdc(asdc_sed_file='MRK421_asdc.txt',obj_name='Mrk421',restframe='obs',data_scale='lin-lin',z=0.038)

Note

starting from version 1.1.0 src to obs transformation is available

data.table
Table length=3550
xdxydyT_startT_stopULdata_set
HzHzerg / (cm2 s)erg / (cm2 s)MJDMJD
float64float64float64float64float64float64boolbytes16
1.395e+172.077e+161.3665e-107.8618e-1250569.1374550569.61257False0.0
1.883e+172.805e+161.3231e-105.2986e-1250569.1374550569.61257False0.0
2.542e+173.786e+161.2801e-104.5958e-1250569.1374550569.61257False0.0
3.432e+175.111e+161.1696e-104.4475e-1250569.1374550569.61257False0.0
4.633e+176.901e+161.0488e-102.8152e-1250569.1374550569.61257False0.0
6.255e+179.316e+168.8421e-112.2462e-1250569.1374550569.61257False0.0
8.444e+171.258e+177.2995e-112.3614e-1250569.1374550569.61257False0.0
1.14e+181.698e+175.7982e-112.5232e-1250569.1374550569.61257False0.0
1.539e+182.292e+174.52e-112.9633e-1250569.1374550569.61257False0.0
........................
4850000000.00.02.9604e-142.425e-170.00.0False0.0
1400000000.00.05.0638e-162.31e-1849078.549443.5False0.0
1400000000.00.01.68e-172.296e-1849078.549443.5False0.0
1400000000.00.08.0331e-152.31e-1849078.549443.5False0.0
408000000.00.04.692e-150.00.00.0False0.0
2700000000.00.02.079e-140.00.00.0False0.0
10700000000.00.08.453e-140.00.00.0False0.0
5000000000.00.03.625e-140.00.00.0False0.0
8460000000.00.05.3433e-143.384e-1747941.547941.5False0.0
8400000000.00.05.3054e-140.00.00.0False0.0

Note

When importing data from the src frame, the Data constructor will not convert units, but will assume that input units are erg/s. If this is not the case an error message will be displayed

Building the SED the ObsData object

Once we have a data table built with the class:jetset.data_loader.Data, following one of the method described above, you can create SED data using the jetset.data_loader.ObsData class. In the example we use one of the test SEDs provided by the package:

We start to loading the SED of Mrk 421, and we pass to ObsData directly the path to the file, because this is already in the format that we need and that we have discussed above.

from jetset.data_loader import Data
from jetset.data_loader import ObsData
from jetset.test_data_helper import  test_SEDs

data_table=Data.from_file(test_SEDs[1])
sed_data=ObsData(data_table=data_table)

if you want to use a cosmology model different from the default one please read the Choosing a cosmology model section

As you can see all the meta-data have been properly sourced from the SED file header. You also get information on the length of the data, before and after elimination of duplicated entries, and upper limits

sed_data.table
Table length=110
nu_datadnu_datanuFnu_datadnuFnu_datanu_data_logdnu_data_lognuFnu_data_logdnuFnu_data_logdnuFnu_fakednuFnu_fake_logULzero_errorT_startT_stopdata_set
HzHzerg / (cm2 s)erg / (cm2 s)HzHzerg / (cm2 s)erg / (cm2 s)erg / (cm2 s)MJDMJD
float64float64float64float64float64float64float64float64float64float64boolboolfloat64float64str13
2299540000.00.01.3409e-143.91e-169.3616409684341640.0-13.8726036092233930.0126638185117586272.6818000000000003e-150.2FalseFalse0.00.0campaing-2009
2639697000.00.01.793088e-143.231099e-269.4215540788470520.0-13.7463983958942737.825876176646739e-133.586176e-150.2FalseFalse0.00.0campaing-2009
4799040000.00.02.3136e-142.4e-169.6811543697921590.0-13.6357117243855640.00450512948032418854.627200000000001e-150.2FalseFalse0.00.0campaing-2009
4805039000.00.01.773414e-141.773414e-159.681696916961080.0-13.7511898673730590.043429448190325183.546828e-150.2FalseFalse0.00.0campaing-2009
4843552000.00.02.77614e-142.615339e-269.685163966649870.0-13.5565586363099974.091390549490907e-135.55228e-150.2FalseFalse0.00.0campaing-2009
7698460000.00.03.696e-144.62e-169.8864038575890540.0-13.432268037451930.0054286810237906487.392e-150.2FalseFalse0.00.0campaing-2009
8267346000.00.02.836267e-142.836267e-159.9173661138399730.0-13.5472528880275660.0434294481903251755.672534000000001e-150.2FalseFalse0.00.0campaing-2009
8331867000.00.03.98963e-143.627671e-269.9207423287712540.0-13.3990673791025383.948931348171262e-137.97926e-150.2FalseFalse0.00.0campaing-2009
8388659000.00.03.16345e-141.931495e-159.923692540632310.0-13.4998390254045170.0265165442894220346.3268999999999995e-150.2FalseFalse0.00.0campaing-2009
.............................................
2.417992e+250.09.754259e-113.560456e-1125.383454859650640.0-10.0108057169854340.158524229657970361.9508518000000003e-110.2FalseFalse0.00.0campaing-2009
3.823193e+250.08.199207e-117.050657e-1225.5824262223505270.0-10.0862281491014050.037345824161928531.6398414000000002e-110.2FalseFalse0.00.0campaing-2009
6.059363e+250.05.614334e-115.793969e-1225.782426970680170.0-10.2507017545013320.0448190072948724051.1228668000000001e-110.2FalseFalse0.00.0campaing-2009
6.073707e+250.01.14705e-106.573696e-1125.783453837408980.0-9.940417650755390.248892367247241062.2941000000000003e-110.2FalseFalse0.00.0campaing-2009
9.603433e+250.04.662219e-115.097912e-1225.9824265107935270.0-10.331407330073770.047488010555239269.324438000000001e-120.2FalseFalse0.00.0campaing-2009
1.522041e+260.05.221583e-114.89063e-1226.18242635140560.0-10.2821978142499940.040676814330644561.0443166e-110.2FalseFalse0.00.0campaing-2009
2.41227e+260.03.66834e-114.682033e-1226.382425915801270.0-10.435530418563440.055430551584338637.33668e-120.2FalseFalse0.00.0campaing-2009
3.823193e+260.02.247871e-114.343216e-1226.5824262223505270.0-10.6482286155209830.083912054673685164.495742000000001e-120.2FalseFalse0.00.0campaing-2009
6.059363e+260.01.972081e-114.407365e-1226.782426970680170.0-10.7050752510932930.097059618709045173.944162e-120.2FalseFalse0.00.0campaing-2009
9.603433e+260.07.994215e-123.469109e-1226.9824265107935270.0-11.0972241758084650.188463144388899741.598843e-120.2FalseFalse0.00.0campaing-2009
sed_data.metadata
{'z': 0.0308,
 'obj_name': 'J1104+3812,Mrk421',
 'restframe': 'obs',
 'data_scale': 'lin-lin',
 'UL_CL': 0.95}

Plotting ObsData

We can now plot our SED using the BlazarSEDFit.plot_sedfit.Plot class

from jetset.plot_sedfit import PlotSED
myPlot=PlotSED(sed_data)
../../../_images/Jet_example_load_data_60_0.png

or you can create the object to plot on the fly in this way

myPlot=sed_data.plot_sed()
../../../_images/Jet_example_load_data_62_0.png

you can rescale your plot

myPlot=sed_data.plot_sed()
myPlot.setlim(x_min=1E7,x_max=1E28,y_min=1E-15,y_max=1E-9)
../../../_images/Jet_example_load_data_64_0.png

plotting in the src restframe

myPlot=sed_data.plot_sed(frame='src')
myPlot.setlim(x_min=1E7,x_max=1E28,y_min=1E40,y_max=1E46)
../../../_images/Jet_example_load_data_66_0.png

to have interactive plot in jupyter

if you want to to have interacitve plot in a jupyter notebook use:


%matplotlib notebook

to have interactive plot in jupyter lab use:


%matplotlib widget

Grouping data

As you can see, due to the overlapping of different instruments and to different time snapshots, some points have multiple values. Although this is not a problem for the fit process, you might want to rebin (group) your data. This can be obtained with the following command:

%matplotlib inline

myPlot=sed_data.plot_sed()
sed_data.group_data(bin_width=0.2)
myPlot.add_data_plot(sed_data,label='rebinned')
myPlot.setlim(x_min=1E7,x_max=1E28,y_min=1E-15,y_max=1E-9)
================================================================================

*  binning data  *
---> N bins= 89
---> bin_widht= 0.2
================================================================================
../../../_images/Jet_example_load_data_70_1.png

Handling errors and systematics

Another important issue when dealing with fitting of data, is the proper handling of errors. Typically one might need to add systematics for different reasons:

  • data are not really simultaneous, and you want to add systematics to take this into account

  • data (typically IR up to UV), might have very small errors compared to those at higher energies. This might bias the minimizer to accommodate the parameters in order to fit ‘better’ the low frequencies branch.

For these reasons the package offer the possibility to add systematics

sed_data.add_systematics(0.2,[10.**6,10.**29])
myPlot=sed_data.plot_sed()
myPlot.setlim(x_min=1E7,x_max=1E28,y_min=1E-15,y_max=1E-9)
../../../_images/Jet_example_load_data_72_0.png

with this command we add 20% systematics for data between \(10^{6}<\nu<10^{29}\) Hz

Filtering data sets

we use the show_data_sets() method to have know wich data sets are defined in our table

sed_data.show_data_sets()
current datasets
dataset 0.0

we use show_dataset=True to have the legend of all the datasets

data=Data.from_file(test_SEDs[0])
sed_data=ObsData(data_table=data)
%matplotlib inline
p=sed_data.plot_sed(show_dataset=True)
../../../_images/Jet_example_load_data_78_0.png
sed_data.show_data_sets()
current datasets
dataset -1
dataset 0
dataset 1
dataset 2

we filter out the data set -1 using the filter_data_set() method. Please not with exclude=True we exclude dataset in filters

sed_data.filter_data_set(filters='-1',exclude=True)
sed_data.show_data_sets()
p=sed_data.plot_sed(show_dataset=True)
---> excluding  data_set/s ['-1']
filter -1 192
current datasets
dataset 0
dataset 1
dataset 2
---> data sets left after filtering None
---> data len after filtering=192
current datasets
dataset 0
dataset 1
dataset 2
../../../_images/Jet_example_load_data_81_1.png

we can pass more datasets, comma separated

sed_data.filter_data_set(filters='-1,0',exclude=True)
sed_data.show_data_sets()
p=sed_data.plot_sed(show_dataset=True)
---> excluding  data_set/s ['-1', '0']
filter -1 192
filter 0 57
current datasets
dataset 1
dataset 2
---> data sets left after filtering None
---> data len after filtering=57
current datasets
dataset 1
dataset 2
../../../_images/Jet_example_load_data_83_1.png

we can also use filter_data_set to exclude only the datasets in filters with exclude=False

sed_data.filter_data_set(filters='-1',exclude=True)
sed_data.show_data_sets()
p=sed_data.plot_sed(show_dataset=True)
---> excluding  data_set/s ['-1']
filter -1 192
current datasets
dataset 0
dataset 1
dataset 2
---> data sets left after filtering None
---> data len after filtering=192
current datasets
dataset 0
dataset 1
dataset 2
../../../_images/Jet_example_load_data_85_1.png

we can revert sed_data to the original state with the reset_data() method

sed_data.reset_data()
sed_data.show_data_sets()
p=sed_data.plot_sed(show_dataset=True)
current datasets
dataset -1
dataset 0
dataset 1
dataset 2
../../../_images/Jet_example_load_data_88_1.png

Saving sed_data and loading

you can save and relaod you sed_data

sed_data.save('3C454_data.pkl')
sed_data=ObsData.load('3C454_data.pkl')
p=sed_data.plot_sed(show_dataset=True)
../../../_images/Jet_example_load_data_93_0.png