pycomlink is tested with Python 3.9, 3.10 and 3.11. There have been problems with Python 3.8, see https://github.com/pycomlink/pycomlink/pull/120. Many things might work with older version, but there is no support for this.
If you install via pip, there might be problems with some dependencies, though. E.g. the dependency pykrige may only install if scipy, numpy and matplotlib have been installed before.
To run the example notebooks you will also need the Jupyter Notebook
and ipython, both also available via conda or pip.
If you want to clone the repository for developing purposes follow these steps (installation of Jupyter Notebook included):
Note that the links point to static versions of the example notebooks. You can run all these notebook online via mybinder if you click on the “launch binder” buttom at the top.
Wet dry classification using the CNN based on channel 1 and channel 2 of a CML
Parameters:
trsl_channel_1 (iterable of float) – Time series of received signal level of channel 1
trsl_channel_2 (iterable of float) – Time series of received signal level of channel 2
threshold (float or None) – Threshold between 0 and 1 which has to be surpassed to classify a period as ‘wet’.
If None, then no threshold is applied and raw wet probabilities in the range of [0,1] are returned.
batch_size (int) – Batch size for parallel computing. Set to 1 when using a CPU!
verbose (int) – Toggles Keras text output during prediction. Default is off.
Perform wet/dry classification with Rolling Fourier-transform method
Parameters:
rsl (iterable of float) – Time series of received signal level
window_length (int) – Length of the sliding window
threshold (int) – Threshold which has to be surpassed to classifiy a period as ‘wet’
f_divide (float) – Parameter for classification with method Fourier transformation
t_dry_start (int) – Index of starting point dry period
t_dry_stop (int) – Index of end of dry period
dry_length (int) – Length of dry period that will be automatically identified in the
provided rsl time series
mirror (bool (defaults to False)) – Mirroring values in window at end of time series
window (array of float, optional) – Values of window function. If not given a Hamming window function is
applied (Default is None)
Pxx (2-D array of float, optional) – Spectrogram used for the wet/dry classification.
Gets computed if not given (Default is None)
f (array of float, optional) – Frequencies corresponding to the rows in Pxx.
Gets computed if not given. (Default is None)
f_sampling (float, optional) – Sampling frequency (samples per time unit). It is used to calculate
the Fourier frequencies, freqs, in cycles per time unit.
(Default is 1/60.0)
mirror (bool)
Returns:
iterable of int – Time series of wet/dry classification
dict – Dictionary holding information about the classification
Note
Implementation of Rolling Fourier-transform method [2]
Build baseline with constant level during a wet period
Parameters:
trsl (numpy.array or pandas.Series) – Transmitted signal level minus received signal level (TRSL) or
received signal level or t
wet (numpy.array or pandas.Series) – Information if classified index of times series is wet (True)
or dry (False). Note that NaN`s in `wet will lead to NaN`s in
`baseline also after the NaN period since it is then not clear
whether or not there was a change of wet/dry within the NaN period.
n_average_last_dry (int, default = 1) – Number of last baseline values before start of wet event that should
be averaged to get the value of the baseline during the wet event.
Note that this values should not be too large because the baseline
might be at an expected level, e.g. if another wet event is
ending shortly before.
Build baseline with linear interpolation from start till end of wet period
Parameters:
rsl (numpy.array or pandas.Series) –
Received signal level or transmitted signal level minus received
signal level
wet (numpy.array or pandas.Series) – Information if classified index of times series is wet (True)
or dry (False). Note that NaN`s in `wet will lead to NaN`s in
`baseline also after the NaN period since it is then not clear
wheter there was a change of wet/dry within the NaN period.
ignore_nan (bool) – If set to True the last wet/dry state before a NaN will be used for deriving
the baseline. If set to False, the baseline for any wet period which contains
a NaN will be set to NaN for the duration of the wet period. Default is False.
Approximation of parameters a and b for k-R power law
Parameters:
f_GHz (int, float, np.array or xr.DataArray) – Frequency of the microwave link(s) in GHz.
pol (str, np.array or xr.DataArray) – Polarization, that is either ‘horizontal’ for horizontal or ‘vertical’
for vertical. ‘H’, ‘h’ and ‘Horizontal’ as well as ‘V’, ‘v’ and ‘Vertical’
are also allowed. Must have same shape as f_GHz or be a str. If it is a
str, it will be expanded to the shape of f_GHz.
approx_type (str, optional) – Approximation type (the default is ‘ITU_2005’, which implies parameter
approximation using a table recommanded by ITU in 2005. An older version
of 2003 is available via ‘ITU_2003’.)
Returns:
a,b – Parameters of A-R relationship
Return type:
float
Note
The frequency value must be between 1 Ghz and 100 GHz.
The polarization has to be indicated by ‘h’ or ‘H’ for horizontal and
‘v’ or ‘V’ for vertical polarization respectively.
Currently only ‘ITU’ for approx_type is accepted. The approximation makes
use of a table recommanded by ITU [4]. There are two versions available,
P.838-2 (04/2003) and P.838-3 (03/2005).
Calculate rain rate from path-integrated attenuation using the k-R power law
Note that either f_GHz and pol or a and b have to be provided. The former
option calculates the parameters a and b for the k-R power law internally
based on frequency and polarization.
Parameters:
A (float or iterable of float) – Path-integrated attenuation of microwave link signal
L_km (float) – Length of the link in km
f_GHz (float, np.array, or xr.DataArray optional) – Frequency in GHz. If provided together with pol, it will be used to
derive the parameters a and b for the k-R power law.
pol (string, np.array or xr.DataArray optional) – Polarization, that is either ‘horizontal’ for horizontal or ‘vertical’
for vertical. ‘H’, ‘h’ and ‘Horizontal’ as well as ‘V’, ‘v’ and ‘Vertical’
are also allowed. Has to be provided together with f_GHz. It will be
used to derive the parameters a and b for the k-R power law. Must have
same shape as f_GHz or be a str. If it is a str, it will be expanded to
the shape of f_GHz.
a (float, optional) – Parameter of A-R relationship
b (float, optional) – Parameter of A-R relationship
a_b_approximation (string) – Specifies which approximation for the k-R power law shall be used. See the
function a_b for details.
R_min (float) – Minimal rain rate in mm/h. Everything below will be set to zero.
Returns:
Rain rate
Return type:
float or iterable of float
Note
The A-R and k-R relation are defined as
\[A = k L_{km} = aR^{b} L_{km}\]
where A is the path-integrated attenuation in dB and k is the specific
attenuation in dB/km.
Calculate rain rate from attenuation using the A-R Relationship
Parameters:
Ar_max (float or iterable of float) – Attenuation of microwave signal (with min/max measurements of RSL/TSL)
f_GHz (float, optional) – Frequency in GHz
pol (string) – Polarization, default is ‘H’
a (float, optional) – Parameter of A-R relationship
b (float, optional) – Parameter of A-R relationship
L (float) – length of the link
R_min (float) – Minimal rain rate in mm/h. Everything below will be set to zero.
k (int, optional) – number of measurements between two consecutive measurement of rx/tx
Returns:
Rain rate
Return type:
float or iterable of float
Note
Based on: “Empirical Study of the Quantization Bias Effects in
Commercial Microwave Links Min/Max Attenuation
Measurements for Rain Monitoring” by OSTROMETZKY J., ESHEL A.
Calculate wet antenna attenuation according to Leijnse et al. 2008
Calculate the wet antenna attenuation from observed attenuation,
using the method proposed in [2], assuming a rain rate dependent
thin flat water film on the antenna.
The equations proposed in [2] calculate the WAA from the rain rate R.
With CML data the rain rates is not directly available. We need to use
the observed attenuation to derive the WAA. This is done here by building
a look-up-table for the relation between A_obs and WAA, where A_obs is
calculated as A_obs = A_rain + WAA. A_rain is derived from the A-R relation
for the given CML frequency and length.
Parameters:
A_obs (array-like or scalar) – Observed attenuation
f_Hz (array-like or scalar (but only either R or f_Hz can be array)) – Frequency of CML in Hz
pol (string) – Polarization of CML. Has to be either ‘H’ or ‘V’.
L_km (float) – Lenght of CML in kilometer
gamma (float) – Parameter that determines the magnitutde of the water film thickness
delta (float) – Parameter that determines the non-linearity of the relation
between water film thickness and rain rates
n_antenna (float) – Refractive index of antenna material
Calculate wet antenna attenuation according to Pastorek et al. 2021 [3]
(model denoted “KR-alt” in their study, i.e. a variation of the WAA model
suggested by Kharadly and Ross 2001 [4])
Calculate the wet antenna from rain rate explicitly assuming an upper limit
A_max.
Calculate wet antenna attenuation according to Pastorek et al. 2021 [3]
(model denoted “KR-alt” in their study, i.e. a variation of the WAA model
suggested by Kharadly and Ross 2001 [4])
Calculate the wet antenna from rain rate explicitly assuming an upper limit
A_max.
The equation proposed in [3] calculates the WAA from the rain rate R.
With CML data the rain rates is not directly available. We need to use
the observed attenuation to derive the WAA. This is done here by building
a look-up-table for the relation between A_obs and WAA, where A_obs is
calculated as A_obs = A_rain + WAA. A_rain is derived from the A-R relation
for the given CML frequency and length.
Parameters:
A_max (upper bound of WAA (“C” in [3]))
R (array-like or scalar) – Rain rate in mm/h
f_Hz (array-like or scalar (but only either R or f_Hz can be array)) – Frequency of CML in Hz
pol (string) – Polarisation of CML. Must be either ‘H’ or ‘V’.
A decorator to apply CML processing function along the time dimension of DataArrays
This will work if the decorated function takes 1D numpy arrays, which hold the
CML time series data, as arguments. Additional argument are also handled.
Generate a coverage mask with a certain area around all CMLs
Parameters:
cml_list (list) – List of Comlink objects
xgrid (array) – 2D matrix of x locations
ygrid (array) – 2D matrix of y locations
max_dist_from_cml (float) – Maximum distance from a CML path that should be considered as
covered. The units must be the same as for the coordinates of the
CMLs. Hence, if lat-lon is used in decimal degrees, this unit has
also to be used here. Note that the different scaling of lat-lon
degrees for higher latitudes is not accounted for.
Returns:
grid_points_covered_by_cmls – 2D array with size of xgrid and ygrid with True values where
the grid point is within the area considered covered.
invdisttree = Invdisttree( X, z ) – data points, values
interpol = invdisttree( q, nnear=3, eps=0, p=1, weights=None, stat=0 )
interpolates z from the 3 points nearest each query point q;
For example, interpol[ a query point q ]
finds the 3 data points nearest q, at distances d1 d2 d3
and returns the IDW average of the values z1 z2 z3
q may be one point, or a batch of points.
eps: approximate nearest, dist <= (1 + eps) * true nearest
p: use 1 / distance**p
weights: optional multipliers for 1 / distance**p, of the same shape as q
stat: accumulate wsum, wn for average weights
How many nearest neighbors should one take ?
a) start with 8 11 14 .. 28 in 2d 3d 4d .. 10d; see Wendel’s formula
b) make 3 runs with nnear= e.g. 6 8 10, and look at the results –
p=2 weights nearer points more, farther points less.
In 2d, the circles around query points have areas ~ distance**2,
so p=2 is inverse-area weighting. For example,
Similarly, in 3d, p=3 is inverse-volume weighting.
Scaling:
if different X coordinates measure different things, Euclidean distance
can be way off. For example, if X0 is in the range 0 to 1
but X1 0 to 1000, the X1 distances will swamp X0;
rescale the data, i.e. make X0.std() ~= X1.std() .
A nice property of IDW is that it’s scale-free around query points:
if I have values z1 z2 z3 from 3 points at distances d1 d2 d3,
the IDW average
(z1/d1 + z2/d2 + z3/d3)
/ (1/d1 + 1/d2 + 1/d3)
is the same for distances 1 2 3, or 10 20 30 – only the ratios matter.
In contrast, the commonly-used Gaussian kernel exp( - (distance/h)**2 )
is exceedingly sensitive to distance and to h.
Calculate performance metrics for rainfall estimation
This function calculates metrics and statistics relevant to judge
the performance of rainfall estimation. The calculation is based on
two arrays with rainfall values, which should contain rain rates or
rainfall sums. Beware that the units of R_sum… und R_mean… will
depend on your input. The calculation does not take any information on
temporal resolution or aggregation into account!
Parameters:
reference (float array-like) – Rainfall reference
predicted (float array-like) – Predicted rainfall
rainfall_threshold_wet (float) – Rainfall threshold for which reference and predicted are considered
wet if value >= threshold. This threshold only impacts the results
of the performance metrics which are based on the differentiation
between wet and dry periods.
Calculate performance metrics for a wet-dry classification
This function calculates metrics and statistics relevant to judge
the performance of a wet-dry classification. The calculation is based on
two boolean arrays, where wet is True and dry is False.
Parameters:
reference (boolean array-like) – Reference values, with wet being True
predicted (boolean array-like) – Predicted values, with wet being True
Added xarray-wrapper for WAA Leijnse and updated WAA example notebook (by cchwala
in PR #82)
Add CNN-based anomaly detection for CML data (by Glawion in PR#87)
xarray wrapper now uses xr.apply_ufunc to apply processing functions along time
dimension, instead of looping over the channel_id dimension. This should be a lot
more flexible. (by cchwala in PR #89)
The old API using pycomlink.core.Comlink objects has been removed. All processing
functions now work with xarray.DataArrays or pure numpy.ndarray. Most of the
original functions and notebooks from v0.2.x do not work anymore, but the basic parts
have already been refactored so that the full processing chain, from raw CML data
to rainfall fields works in v0.3.0.
Added WAA calculation and test for method proposed by Leijnse et al 2008
Added function to calculate WAA directly from A_obs for Leijnse et al 2008
method.
Added WAA example notebook
Added function to derive attenuation value A_min_max from min/max CML
measurements (these measurements periodically provide the min and max
value over a defined time period, typically 15 minutes).
(by DanSereb in PR #37 and #45)
Added function to derive rain rate R from A_min_max
(by DanSereb in PR #37 and #45)
Added example notebook with simple comparison of processing of
“instantaneous” and “min-max” CML data (by DanSereb in PR #37 and #45)
Added missing kwarg for polarization in calc_A in Processor. Before,
calc_A always used the default polarization for the A-R relation which
leads to rain rate overestimation!
Changed reference values in test for Ordinary Kriging interpolator, because
pykrige v1.4.0 seems to produce slightly different results than v1.3.1
Codebase is Python 3 now, keeping backwards compatibility to Python 2.7
via using the future module.
min-max CML data can now be written to and read from cmlh5. Standard column
names are tx_min, tx_max, rx_min and rx_max. When reading from cmlh5
without specifying dedicated column names, the function tries out the
standard column names for min-max and instantaneous. If it does not find any
match it will print an error message.
Added example file with min-max data for 75 CMLs. This dataset is derived
from the existing example dataset of 75 CMLs with instantaneous measurements.
Added example notebook comparing min-max and instantaneous CML data
Added TravisCI and Codecov and increased the test coverage a little
Extended functionality for append_data. A maximum length or maximum
allowed age for the data can be specified
More options for interpolation. Added option to pass max_distance
for IDW and Added option for resampling in Interpolator
(instead of just doing hourly means of variable R)
Interpolated fields are now always transformed into an xarray.Dataset.
The Dataset is also stored as attribute if the Interpolator object
Improved grid intersection calculation in validator
Complete rewrite of interpolator classes. The old interpolator class
spatial.interpol.Interpolator() is depreciated. New interpolator base classes
for IDW and Kriging have been added together with a convenience inteprolator
for CML data. Usage is showcased in a new example notebook.
Some old functionality has moved to separate files.
resampling to a given DatetimeIndex is now availabel in util.temporal
and will be removed from validatoin.validator.Validation() class soon.
calculation of wet-dry error is now in module validation.stats
calculation of spatial coverage with CMLs was moved to function
spatial.coverage.calc_coverage_mask().
error metric for performance evaluation of wet-dry classification is now
in validation.stats. Errors are now returned with meaningful names as
namedtuples. validation.validator.calc_wet_dry_error() is depreciated now.
setup.py now reads all packages subdirectories correctly
Force integers for shape in nans helper function in stft module
Always use first value of dry_stop timestamp list in stft module.
The old code did not work anyway for a list with length = 1 and would
have failed if dry_stop would have been a scalar value. Now we
assume that we always get a list of values (which should be true for
mlab.find.