Derivative-of-Gaussian filter

The following awk script will apply a gaussian filter to the rate of change ( derivative ) of the data.  This is simpler than doing two separate operations and provides a more accurate numerical estimation of the derivative than using two point “first difference”.

Since we know the analytical form of the derivative of the gaussian function, the derivative can be applied before making a quantised approximation for numerical processing of quantised data.

Analytically, since both the convolution and the derivative are linear operations they can be done in either order and produce identical results. This allows taking the derivative of the gaussian then doing a single convolution without loss of generality.

There is still the inaccuracy of approximating the infinite gaussian function by a finite, quantised kernel but there is no loss of accuracy from also approximating the derivative. Thus the result is mathematically more accurate than using the first difference and then a gaussian filter.

The usual 3-sigma window for a gaussian filter is slightly extended to maintain similar accuracy in the D-o-G filter.

Since the derivate has high-pass properties ( attenuation inversely proportional to frequency ) and the gaussian is a low-pass filter the combined filter is a band pass filter. One common use if for edge detection, for example in medical imagery. Equally if the derivative of the data is being studied and some low-pass filtering is required it provides a one-step solution that does not rely on a digitised approximation of the derivative.

An example of edge detection is shown in determining long-term variations in the date of Arctic sea-ice minima:
https://climategrog.wordpress.com/2016/09/17/the-death-spiral-is-dead/

 

Select and copy the text in the following block to get a copy of the script.

 

#!/bin/awk -f
# 3-sigma derviative of gaussian filter
# sigma, if not given, is 2 data points wide
# usage : ./dgauss.awk filename <sigma=2> <scale_factor=1>
# optional scale_factor simply scales the output
# sigma can be compared to the period of the -3dB point of the filter
# result is centred, ie not phase shifted. resulting dataset shorter by 3*sigma ( half window ) each end
# nov 2012 OFMT="%10.8f"


### ensure data is continuous and equally spaced ### !!



# FWHM (-3dB) = 2*sigma*sqrt(2*ln2)
# half height freq = sigma * 1.1774
# 3 sigma gives 99.7% accuracy for gauss, extend to 4-sigma for deriv-gauss

function abs(x){return (((x < 0.0) ? -x : x) + 0.0)}

BEGIN { OFMT="%10.8f"
# ARGV[1]=filename; argv[0] is script name, hence ARGC>=1

 datacol=2 # fixed

 if (ARGV[1]=="") { print " # usage : ./dgauss.awk filename <sigma=2> <scale_factor=1>" ; exit 1}

 pi= 3.14159265359811668006

 if ( ARGC >3 ) {scaleby=ARGV[3];ARGV[3]=""} else {scaleby=1};
 if ( ARGC >2 ) {sigma=ARGV[2];ARGV[2]=""} else {sigma=2};

 print "# filtering "ARGV[1]" with gaussian-derivative of sigma= ",sigma
 root2pi_sigma=sqrt(2*pi)*sigma;
 two_sig_sqr=2.0*sigma*sigma;

# extend 3 sigma window to 3.5 for derivate-gauss to maintain accuracy.
 gw=(4*sigma)-1;

# calculate normalised gaussian coeffs
# for (tot_wt=j=0;j<=gw;j++) {tot_wt+=gwt[-j]=gwt[j]=exp(-j*j/two_sig_sqr)/ root2pi_sigma};
# tot_wt=0

### NB d/dt(gauss) is asymmetric, so must be time reversed to correctly do convolution.
 for (tot_wt=j=0;j<=gw;j++) {tot_wt+=gwt[j]= j/sigma/sigma*exp(-j*j/two_sig_sqr) / root2pi_sigma };
 tot_wt=2*tot_wt-gwt[0]; # to scale each lobe to |0.5| ; gwt[0]=0
 tot_wt/=scaleby;
 for (j=0;j<=gw;j++) {
# gwt[j]/=tot_wt; # gaussian is already normalised, don't rescale derivate !
 gwt[-j]=-gwt[j]
 };

# strip off last .xxx part of file name
# FIXME : improve this (doesn't work with paths like ../filename)

 if (ARGV[1]=="-") {
 out_file="-";
 }
 else {
 split(ARGV[1],fn,".");
 basename=fn[1]
 out_file=basename"-gauss-deriv"sigma".dat";
 print "# ",out_file >out_file;
 }

 ln=-1;
}

( ($0 !~ /^#/) && ($0 !~ /^[" ",\t]*$/) ) {
 xdata[++ln]=$1;
 ydata[ln]=$datacol;

 if (ln>2*gw)
 {

# cf diff.awk
# print (xdata[ln]+xdata[ln-step])/2. ,ydiff/xdiff
# print (xdata[ln]+xdata[ln-step])/2. ,ydiff/xdiff >> out_file;

 xdiff=(xdata[ln]-xdata[ln-1]);
# ydiff=(ydata[ln]-ydata[ln-step]);

 dgauss=0
 for (j=-2*gw;j<=0;j++) {dgauss+=ydata[ln+j]*gwt[j+gw]}
 if (out_file=="-")
   print xdata[ln-gw],dgauss/xdiff , ydata[ln-gw] 
 else {
   print NR,xdata[ln-gw],dgauss/xdiff ,ydata[ln-gw]
   print xdata[ln-gw],dgauss/xdiff , ydata[ln-gw] >> out_file;
  }
 }
else
 {
# print $1,$2;

 }
}


END {
 if (ARGV[1]=="") { exit 1}
 print "# gausssian-derivative window width = "gw+gw+1",done"
 print "# output file = "out_file
 # sum=0; for (j=-gw;j<=gw;j++) {sum+= abs(gwt[j]);}
 # print "# abs sum of coeffs = ",sum;

}

Triple running mean filter

The following script will call a simple running mean three times with appropriate window size to do effect triple running mean, as described in the article (as amended for the asymmetric kernel to minimise negative leakage):

https://climategrog.wordpress.com/2013/05/19/triple-running-mean-filters/

It requires the runmean.awk script found here:
https://climategrog.wordpress.com/2013/11/02/574/

Select code with mouse to copy elsewhere.

#!/bin/bash

# call runmean.awk three times to compose triple running mean
# usage: ./r3m.sh file window_len   ; default window is 12 data point

if [ "x$1" == "x"  ]; then echo "$0 : err no file name,   usage: $0 filename "
 exit 1;
else fn=$1
fi

if [ "x$2" == "x"  ]; then win=12; else win=$2; fi

#win2=`awk "BEGIN{print $win/ 1.3371}"`
#win3=`awk "BEGIN{print $win/ 1.3371/ 1.3371}"`

# asymmetric stages with following window ratios minimise neg. lobe in freq. response
k=1.15; k2=1.58

win2=`awk "BEGIN{print $win/ "$k"}"`
win3=`awk "BEGIN{print $win/ "$k2" }"`

outfile=`echo $fn | awk '{ print substr($1,1,length($1)-4) }'`
outfile+="-3rm"$win".dat"

echo  "# triple running mean :  $win  $win2  $win3 " > $outfile

# echo $fn; echo $win; echo $win2; echo $win3

cat $fn | ./runmean.awk - $win |  ./runmean.awk - $win2 |  ./runmean.awk - $win3 >> $outfile

#echo "# cat $fn | ./runmean.awk - $win |  ./runmean.awk - $win2 |  ./runmean.awk - $win3 >> $outfile"

echo "# triple running mean :  $win  $win2  $win3 "
echo "# outfile = "$outfile;

Gaussian low-pass filter



Click to enlarge graph.

The graph is scaled to unit frequency (eg 1 Hz or one cycle per year …).

It illustrates how a 159 day gaussian will filter daily data. For monthly data 12*159/365=5.23 , so the nearest would be sigma=5mo . However since that is slightly short of the intended value and gaussian leaks a little, since it is never fully zero, six months would be better to suppress a 12mo cycle.

Code ( use copy / paste within code block ).

There must be NO space before the first line. ie “#!” are the first characters in the file.

#!/bin/awk -f

# pass input through 3 sigma gaussian filter where sigma, if not given, is 2 data points wide
# usage : ./gauss.awk filename <sigma=2> <scale_factor=1>
# optional scale_factor simply scales the output
# sigma can be compared to the period of the -3dB point of the filter
# result is centred, ie not shift. dataset shortened by half window each end

# operates on files with two columns of numeric data, or a single column of data
#  eg date as integer year or floating point decimal
# single column takes line number position to be evenly spaced ordinate variable.
# ensure that data is equally spaced and continuous !!
# nov 2012  OFMT="%10.8f"

# 159d gauss similar to  365d r3m

# FWHM (-3dB) = 2*sigma*sqrt(2*ln2)
# half height freq = sigma * 1.1774
# 3 sigma gives 99.7% accuracy

BEGIN { OFMT="%10.8f"
# ARGV[1]=filename; argv[0] is script name, hence ARGC>=1

  if (ARGV[1]=="") { print " # usage : ./gauss.awk filename <sigma=2> <scale_factor=1>" ; exit 1}

  pi= 3.14159265359811668006

  if  ( ARGC >3 ) {scaleby=ARGV[3];ARGV[3]=""} else {scaleby=1};
  if  ( ARGC >2 ) {sigma=ARGV[2];ARGV[2]=""} else {sigma=2};

  print "# filtering "ARGV[1]" with gaussian of sigma= ",sigma
  root2pi_sigma=sqrt(2*pi)*sigma;
  two_sig_sqr=2.0*sigma*sigma;

  gw=3*sigma-1;  # gauss is approx zero at 3 sigma, use 3 sig window

# calculate normalised gaussian coeffs
  for (tot_wt=j=0;j<=gw;j++) {tot_wt+=gwt[-j]=gwt[j]=exp(-j*j/two_sig_sqr)/ root2pi_sigma};
  tot_wt=2*tot_wt-gwt[0];
  tot_wt/=scaleby;
  for (j=-gw;j<=gw;j++) {gwt[j]/=tot_wt};

# strip off last .xxx part of file name
# improve this  (doesn't work with paths like ../filename)

  if (ARGV[1]=="-") {
    out_file="-";
  }
   else {
    split(ARGV[1],fn,".");
    basename=fn[1]
    out_file=basename"-gauss"sigma".dat";
    print "# ",out_file >out_file;
  }

  ln=-1;
}

( ($0 !~ /^#/) && ($0 !~ /^[" ",\t]*$/)  ) {
  xdata[++ln]=$1;
  ydata[ln]=$2;  

  if (ln>2*gw)
  {
    gauss=0
    for (j=-2*gw;j<=0;j++) {gauss+=ydata[ln+j]*gwt[j+gw]}
    if (out_file=="-")
      print xdata[ln-gw],gauss
    else {
      print NR,xdata[ln-gw],gauss
      print xdata[ln-gw],gauss >> out_file;
    }
  }
}

END {
  if (ARGV[1]=="") { exit 1}
  print "# gausssian window width = "gw+gw+1",done"
  print "# output file = "out_file

# for (i=0;i<=gw;i++){print gwt[i]}
}

Lanczos high-pass filter


The graph shows the frequency response of a 3-lobe Lanczos high-pass filter with central frequencies of 461 and 269 days, respectively.

These are chosen such that a periodicity of 1 year is either fully included or fully excluded in the filter’s pass-band. ie the 269 day filter will pass only frequencies shorter than 365 days and fully block annual cycles. 461 day filter will ensure that everything including the annual signal is unattenuated and excludes anything longer than about 1.6 years.

The equivalent low-pass filter is detailed here:
https://climategrog.wordpress.com/2013/11/28/lanczos-filter-script/

frequency axis is in units of “per year” not “year” as labelled.

Code: ( select text in code block and use copy/paste to get the code )

#!/bin/awk -f

#  usage: ./lanczos_HP.awk filename [-3dB_period=17] [lobes=3]"

# purpose: apply a high-pass filter by convolution of 2nd column of file with lanczos windowed sinc function
# column 1 data should be evenly spaced without gaps.
# optional second parameter is period of 3dB attenuation, as a period expressed  in data intervals
#    default = 17 ; eg. 17 monthly periods : half power at 17mo, close to 100% pass at 12 mo.
#                   to exclude 12mo, use 9 . This will pass 6mo and higher freq unattenuated.
#                   for daily data use 461 and 269 days , respectively.
# optional third parameter is the number of lobes in impulse response. More gives sharper transition but can produce overshoot ( ringing $
# comments at end of output give pass-band and stop band info, this is not sent to output file.
#    default=3; 4 is OK. ### >=5 may produce significant ringing artefacts around sharp, high amplitude changes###



function sinc(x) {
  if (x==0) return 1; else return (sin(x)/x);
}

BEGIN{  OFMT = "%8.6f"

  twopi=6.28318530717959

  if ( ARGC >3 ) {lobes=ARGV[3];ARGV[3]=""} else {lobes=3};
  if ( ARGC >2 ) {period=ARGV[2];ARGV[2]=""} else {period=17};
  if ( ARGC <2) { print "### usage: ./lanczos_HP.awk filename [-3dB_period=17] [lobes=3]"; err=1; exit 1; }
  else {len=ARGV[2]; ARGC=2;}

  pi= 3.14159265359811668006 ;
  halfk=int(period/2*lobes);
  kwid= 2*halfk;         # kernel = -halfk....+halfk = kwid+1 pts ie odd number
  w_taper=twopi/kwid;    # first lanczos lobe -pi to +pi to fill window
  w=twopi/period;        # sinc passes zero pi, 2*pi, 3*pi ....

# calculate normalised kernel coeffs
  for (tot_wt=j=0;j<=halfk;j++) {tot_wt+=gwt[-j]=gwt[j]=sinc(j*w)*sinc(j*w_taper)};
  tot_wt=2*tot_wt-gwt[0];

# for HP filter, invert LP and add unit impulse:
  for (j=-halfk;j<=halfk;j++) {gwt[j]/=-tot_wt};
  gwt[0]+=1;

# strip off last .xxx part of file name
# improve this  (doesn't work with paths like ../filename)


  if (ARGV[1]=="-") {
    out_file="-";
  }
  else {
    split(ARGV[1],fn,".");
    for (i=1;i<length(fn);i++){   # if multiple dots in name, build back up except last part
      if(i==1)basename=fn[i]
      else basename=basename"."fn[i];
    }
    out_file=basename"-lanc"lobes"HP_"period".dat";
    print "# ",out_file >out_file;
    print "# ",out_file
  }

  ln=-1;  # init linenumber counter
}

($0 !~ /^#/)&&($0 != "") {    # strip out comments, headers and blank lines
  xdata[++ln]=$1;
  ydata[ln]=$2;

  if (ln>2*halfk)
  {
    sum=0
    for (j=-2*halfk;j<=0;j++) {sum+=ydata[ln+j]*gwt[j+halfk]}
    if (out_file=="-")
      print xdata[ln-halfk],sum
    else {
      print NR,xdata[ln-halfk],sum
      print xdata[ln-halfk],sum >> out_file;
    }
  } #endif ln

}


END {
    if (err) exit err;
    if (FNR<=kwid) {print " ### insufficient data to fill "kwid+1" point filter buffer"; exit 1}
    print "# "lobes"-lobe lanczos kernel width = "2*halfk+1",done"
    if (lobes==5) printf ("# "lobes"-lobe lanczos HP on %s: zero-pass > %.3f; 99% pass < %.3f \n",ARGV[1],period/.752,period/1.258)
    if (lobes==4) printf ("# "lobes"-lobe lanczos HP on %s: zero-pass > %.3f; 99% pass < %.3f \n",ARGV[1],period/.698,period/1.325)
    if (lobes==3) printf ("# "lobes"-lobe lanczos HP on %s: zero-pass > %.3f; 99% pass < %.3f \n",ARGV[1],period/.588,period/1.436)

    print "# output file = "out_file
# uncomment following to dump kernel (impluse response):
# sum=0; for (j=-halfk;j<=halfk;j++) {sum+= gwt[j]; printf ("%.4f\n", gwt[j])}
# print "# abs sum of coeffs = ",sum;

}



Lanczos low-pass filter



Click to enlarge graph.

frequency axis is in units of “per year” not “year” as labelled.

Equivalent high-pass filter:
https://climategrog.wordpress.com/2013/11/28/lanczos-high-pass-filter/

Discussion: https://climategrog.wordpress.com/?attachment_id=659

Source Code: ( to get code, use browser copy / paste function within code block below).

#!/bin/awk -f

#  usage: ./lanczos_LP.awk filename [50%_attn_period=18] [lobes=3]"

# purpose: convolve 2nd column of file with lanczos windowed sinc function
# this provides low-pass filter with a fast transition from pass-band to stop-band , with minimal ripple in both.
# column 1 data should be evenly spaced without gaps.
# optional second parameter is period of 50% attenuation, as a period expressed in data intervals
# optional third parameter is the number of lobes in impulse response.
# zero pass req period * 1.436 eg 17.23mo or 525 days
#    default=18 ; eg. 18 monthly periods : 50% attenuation at 15mo, close to zero pass at 12 mo. for 3 lobe.
#    lobes=3 ; use 525 for daily ; lobes=5 ; use 461 for 365 data
# console output comments give pass-band and stop band info. this is not sent to output file.

#    More lobes gives sharper transition band but more overshoot distortion / ringing.
#    default=3; 3 and 4 are good. ### >=5 may produce significant ringing artefacts ###


function sinc(x) {
  if (x==0) return 1; else return (sin(x)/x);
}

BEGIN{  OFMT = "%8.6f"


  if ( ARGC >3 ) {lobes=ARGV[3];ARGV[3]=""} else {lobes=3};
  if ( ARGC >2 ) {period=ARGV[2];ARGV[2]=""} else {period=18};
  if ( ARGC <2) { print "### usage: ./lanczos_LP.awk filename [50%_attn_period=18] [lobes=3]"; err=1; exit 1; }
  else {len=ARGV[2]; ARGC=2;}

  pi= 3.14159265359811668006 ;
  twopi=2*pi;
  halfk=int(period/2*lobes);
  kwid= 2*halfk;         # kernel = -halfk....+halfk = kwid+1 pts ie odd number
  w_taper=twopi/kwid;    # first lanczos lobe -pi to +pi to fill window
  w=twopi/period;        # sinc passes zero pi, 2*pi, 3*pi ....

# calculate normalised kernel coeffs
  for (tot_wt=j=0;j<=halfk;j++) {tot_wt+=gwt[-j]=gwt[j]=sinc(j*w)*sinc(j*w_taper)};
  tot_wt=2*tot_wt-gwt[0];
  for (j=-halfk;j<=halfk;j++) {gwt[j]/=tot_wt};

# strip off last .xxx part of file name
# improve this  (doesn't work with paths like ../filename)

  if (ARGV[1]=="-") {
    out_file="-";
  }
  else {
    split(ARGV[1],fn,".");
    for (i=1;i<length(fn);i++){   # if multiple dots in name, build back up except last part
      if(i==1)basename=fn[i]
      else basename=basename"."fn[i];
    }
    out_file=basename"-lanc"lobes"LP_"period".dat";
    print "# ",out_file >out_file;
    print "# ",out_file
  }

  ln=-1;  # init linenumber counter
}

($0 !~ /^#/)&&($0 != "") {    # strip out comments, headers and blank lines
  xdata[++ln]=$1;
  ydata[ln]=$2;

  if (ln>2*halfk)
  {
    sum=0
    for (j=-2*halfk;j<=0;j++) {sum+=ydata[ln+j]*gwt[j+halfk]}
    if (out_file=="-")
      print xdata[ln-halfk],sum
    else {
      print NR,xdata[ln-halfk],sum
      print xdata[ln-halfk],sum >> out_file;
    }
  } #endif ln

}


END {
    if (err) exit err;
    if (FNR<=kwid) {print " ### insufficient data to fill "kwid+1" point filter buffer"; exit 1}
    print "# "lobes"-lobe lanczos kernel width = "2*halfk+1",done"
    if (lobes==5) printf ("# "lobes"-lobe lanczos LP on %s: zero-pass at %.3f; 99% pass > %.3f \n",ARGV[1],period/1.258,period/.752)
    if (lobes==4) printf ("# "lobes"-lobe lanczos LP on %s: zero-pass at %.3f; 99% pass > %.3f \n",ARGV[1],period/1.325,period/.698)
    if (lobes==3) printf ("# "lobes"-lobe lanczos LP on %s: zero-pass at %.3f; 99% pass > %.3f \n",ARGV[1],period/1.436,period/.588)

    print "# output file = "out_file
# uncomment following to dump kernel (impluse response):
# sum=0; for (j=-halfk;j<=halfk;j++) {sum+= gwt[j]; printf ("%.4f\n", gwt[j])}
# print "# abs sum of coeffs = ",sum;

}

simple running-mean script ( as base for triple RM )

The following script will calculate a simple running mean. On its own this is a very poor filter, see accompanying article :
https://climategrog.wordpress.com/2013/05/19/triple-running-mean-filters/

This script is intended for use by the triple-running-mean script available here:
https://climategrog.wordpress.com/2013/11/02/triple-running-mean-script/

Select code with mouse to copy elsewhere.

#!/bin/awk -f

# calculate running mean of width 'window'
# line number taken as x if file if file is a single column of data
# [usage] ./runmean.awk filename <window>
# default window=12
# strips empty and comment lines

# operates on files with two columns of numeric data
#  eg date as integer year or floating point decimal
# check whether data is continuous and evenly spaced !!


BEGIN{  OFMT = "%10.8f"
# ARGV[1]=fn; argv[0] is script name, hence ARGC>=1

  if (ARGV[1]=="") { print " ### error, no input file given." ; err=1; exit 1}

  ln=0;
  if  ( ARGC >2 ) {window=ARGV[2];ARGV[2]=""} else {window=12};
  if (int(window)!=window) {window=int(window+0.5); window_adj=1; }

  print "# filtering "ARGV[1]" with running mean  window = ",window

  if (ARGV[1]=="-") {
    out_file="-";
  }
   else {
    split(ARGV[1],fn,".");
    basename=fn[1]
    out_file=basename"-rmean"window".dat";

    print "# ",out_file >out_file;
  }
}  # end BEGIN


{ # body
if ( ($0 !~ /^#/) && ($0 !~ /^[" ",\t]*$/)  )  {  # strip empty and comment lines

  if ($2==""){  # for single column data files
    xdata[++ln]=NR;
    ydata[ln]=$1;
  }else{
    xdata[++ln]=$1;
    ydata[ln]=$2;
  }

  sum+=ydata[ln];

  if (ln>=window)
  {
    mid_x=(xdata[ln]+xdata[ln-(window-1)])/2;  # -1 to centre correctly
    ymean=(sum)/window;
    sum-= ydata[ln-window+1];  # remove oldest
    if (out_file=="-")
      print mid_x,ymean;
    else {
      print NR,mid_x,ymean
      print mid_x,ymean >> out_file;
    }
  } # endif
 } # non empty
} # end body

END {
  if (err==1) { exit 1}
  if (window_adj==1) print "# warning : window rounded to nearest integer, window  = ",window
  print "# running mean window = "window",done"
  print "# output file = "out_file
}

triple running-mean filter

The following script will call a simple running mean three times with appropriate window size to do effect triple running mean, as described in the article (as amended for the asymmetric kernel to minimise negative leakage):

https://climategrog.wordpress.com/2013/05/19/triple-running-mean-filters/

It requires the runmean.awk script found here:
https://climategrog.wordpress.com/2013/11/02/574/

Select code with mouse to copy elsewhere.

#!/bin/bash

# call runmean.awk three times to compose triple running mean
# usage: ./r3m.sh file window_len   ; default window is 12 data point

if [ "x$1" == "x"  ]; then echo "$0 : err no file name,   usage: $0 filename "
 exit 1;
else fn=$1
fi

if [ "x$2" == "x"  ]; then win=12; else win=$2; fi

#win2=`awk "BEGIN{print $win/ 1.3371}"`
#win3=`awk "BEGIN{print $win/ 1.3371/ 1.3371}"`


# asymmetric stages with following window ratios:
k=1.15; k2=1.58

win2=`awk "BEGIN{print $win/ "$k"}"`
win3=`awk "BEGIN{print $win/ "$k2" }"`

outfile=`echo $fn | awk '{ print substr($1,1,length($1)-4) }'`
outfile+="-3rm"$win".dat"

echo  "# triple running mean :  $win  $win2  $win3 " > $outfile

# echo $fn; echo $win; echo $win2; echo $win3

cat $fn | ./runmean.awk - $win |  ./runmean.awk - $win2 |  ./runmean.awk - $win3 >> $outfile

#echo "# cat $fn | ./runmean.awk - $win |  ./runmean.awk - $win2 |  ./runmean.awk - $win3 >> $outfile"

echo "# triple running mean :  $win  $win2  $win3 "
echo "# outfile = "$outfile;