Statistics::Basic::LeastSquareFit - find the least square fit for two lists

A machine to calculate the Least Square Fit of given vectors x and y.

The module returns the alpha and beta that satisfy this formula:

$y = $beta * $x + $alpha

for a given set of *x* and *y* co-ordinate pairs.

Say you have the set of Cartesian coordinates:

my @points = ( [1,1], [2,2], [3,3], [4,4] );

The simplest way to find the LSF is as follows:

my $lsf = lsf()->set_size(int @points);
$lsf->insert(@$_) for @points;

Or this way:

my $xv  = vector( map {$_->[0]} @points );
my $yv  = vector( map {$_->[1]} @points );
my $lsf = lsf($xv, $yv);

And then either query the values or print them like so:

print "The LSF for $xv and $yv: $lsf\n";
my ($yint, $slope) = my ($alpha, $beta) = $lsf->query;

LSF is meant for finding a line of best fit. `$beta` is the slope of the line and `$alpha` is the y-offset. Suppose you want to draw the line. Use these to calculate the `"x"` for a given `"y"` or vice versa:

my $y = $lsf->y_given_x( 7 );
my $x = $lsf->x_given_y( 7 );

(Note that `"x_given_y()"` can produce a divide-by-zero error when `$beta` is zero, since it has to divide by `$beta`.)

Create a 20 point "moving" LSF like so:

use Statistics::Basic qw(:all nofill);

my $sth = $dbh->prepare("select x,y from points where something");
my $len = 20;
my $lsf = lsf()->set_size($len);

$sth->execute or die $dbh->errstr;
# bind_columns expects references to the variables it should fill
$sth->bind_columns( \my ($x, $y) ) or die $dbh->errstr;

while( $sth->fetch ) {
    $lsf->insert( $x, $y );

    # Under nofill, no defined values come back until $len points are in
    my ($yint, $slope) = $lsf->query;
    if( defined $yint ) {
        print "LSF: y= $slope*x + $yint\n";
    }

    # This would also work:
    # print "$lsf\n" if $lsf->query_filled;
}

This list of methods skips the methods inherited from Statistics::Basic::_TwoVectorBase (things like *insert()* and *ginsert()*).

*new()*

Create a new Statistics::Basic::LeastSquareFit object. This function takes two arguments, each of which can be either an arrayref or a Statistics::Basic::Vector object. It is the function that the *leastsquarefit()* shortcut function calls.

*query()*

Returns the alpha and beta of the line of best fit. `$beta` is the slope of the line and `$alpha` is the y-offset.

my ($alpha, $beta) = $lsf->query;
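To make the meaning of the two returned values concrete, here is a minimal pure-Perl sketch that computes a slope and y-offset by hand for the example points. It assumes the textbook least-squares definition; `least_square_fit()` is a hypothetical helper written for illustration, not part of Statistics::Basic, and it is not claimed to mirror the module's internals.

```perl
use strict;
use warnings;
use List::Util qw(sum);

# Hypothetical helper (not part of Statistics::Basic): textbook
# least-squares fit over a list of [x, y] pairs.
sub least_square_fit {
    my @points = @_;
    my $n = @points;
    die "need at least two points\n" if $n < 2;

    my $mean_x = sum( map { $_->[0] } @points ) / $n;
    my $mean_y = sum( map { $_->[1] } @points ) / $n;

    # Sums of products of deviations from the means
    my ($sxy, $sxx) = (0, 0);
    for my $p (@points) {
        my ($dx, $dy) = ( $p->[0] - $mean_x, $p->[1] - $mean_y );
        $sxy += $dx * $dy;
        $sxx += $dx * $dx;
    }
    die "all x values are identical; slope is undefined\n" if $sxx == 0;

    my $beta  = $sxy / $sxx;                  # slope
    my $alpha = $mean_y - $beta * $mean_x;    # y-offset
    return ($alpha, $beta);
}

my @points = ( [1,1], [2,2], [3,3], [4,4] );
my ($alpha, $beta) = least_square_fit(@points);
printf "y = %g * x + %g\n", $beta, $alpha;
```

For these four points, which all lie on the line y = x, the sketch yields a slope of 1 and a y-offset of 0.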

*y_given_x()*

Automatically calculate the y-value on the line for a given x-value.

my $y = $lsf->y_given_x( 7 );

*x_given_y()*

Automatically calculate the x-value on the line for a given y-value.

my $x = $lsf->x_given_y( 7 );

"x_given_y()" can produce a divide-by-zero error when `$beta` is zero, since it has to divide by `$beta`. This might be helpful:

if( defined( my $x = eval { $lsf->x_given_y(7) } ) ) {
    print "x (given y=7): $x\n";
} else {
    warn "there is no x value for 7";
}
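As an alternative to trapping the error, the zero-slope case can be checked before dividing. The sketch below works on a plain `$alpha` and `$beta` pair such as the one *query()* returns; `x_given_y_or_undef()` is a hypothetical helper for illustration and not a method of this module.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Statistics::Basic): invert
# y = beta * x + alpha, returning undef when the line is horizontal.
sub x_given_y_or_undef {
    my ($alpha, $beta, $y) = @_;
    return undef if $beta == 0;    # a flat line has no unique x for any y
    return ( $y - $alpha ) / $beta;
}

my $x    = x_given_y_or_undef( 0, 1, 7 );    # line y = x
my $none = x_given_y_or_undef( 3, 0, 7 );    # flat line y = 3

print defined $x    ? "x = $x\n"    : "no x value\n";
print defined $none ? "x = $none\n" : "no x value\n";
```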

*query_vector1()*

Returns the Statistics::Basic::Vector object for the first vector used in the computation of alpha and beta.

*query_vector2()*

Returns the Statistics::Basic::Vector object for the second vector used in the computation of alpha and beta.

*query_mean1()*

Returns the Statistics::Basic::Mean object for the first vector used in the computation of alpha and beta.

*query_variance1()*

Returns the Statistics::Basic::Variance object for the first vector used in the computation of alpha and beta.

*query_covariance()*

Returns the Statistics::Basic::Covariance object used in the computation of alpha and beta.
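As a hedged aside on how the mean, variance and covariance objects relate to the result: assuming the module follows the textbook least-squares solution (an assumption; this page does not spell out the computation), alpha and beta can be written as

```latex
\beta = \frac{\operatorname{cov}(x, y)}{\operatorname{var}(x)},
\qquad
\alpha = \bar{y} - \beta\,\bar{x}
```

where x and y are the first and second vectors and \bar{x}, \bar{y} are their means. Under that reading, the fitted line `$y = $beta * $x + $alpha` always passes through the point of means, and a zero variance in the first vector leaves the slope undefined.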

This object is overloaded. It tries to return an appropriate string for the calculation, but raises an error in numeric context.

In boolean context, this object is always true (even when empty).

Paul Miller `"<jettero AT cpan DOT org>"`

Copyright 2012 Paul Miller -- Licensed under the LGPL

*perl(1)*, Statistics::Basic, Statistics::Basic::_TwoVectorBase, Statistics::Basic::Vector