New upstream version 12.2.8
This commit is contained in:
parent
b022540ee0
commit
271c7a7671
39 changed files with 0 additions and 1731 deletions
65
ruby-statistics/.gitignore
vendored
65
ruby-statistics/.gitignore
vendored
|
@ -1,65 +0,0 @@
|
||||||
*.gem
|
|
||||||
*.rbc
|
|
||||||
/.config
|
|
||||||
/coverage/
|
|
||||||
/InstalledFiles
|
|
||||||
/pkg/
|
|
||||||
/spec/reports/
|
|
||||||
/spec/examples.txt
|
|
||||||
/test/tmp/
|
|
||||||
/test/version_tmp/
|
|
||||||
/tmp/
|
|
||||||
|
|
||||||
# Used by dotenv library to load environment variables.
|
|
||||||
# .env
|
|
||||||
|
|
||||||
## Specific to RubyMotion:
|
|
||||||
.dat*
|
|
||||||
.repl_history
|
|
||||||
build/
|
|
||||||
*.bridgesupport
|
|
||||||
build-iPhoneOS/
|
|
||||||
build-iPhoneSimulator/
|
|
||||||
|
|
||||||
## Specific to RubyMotion (use of CocoaPods):
|
|
||||||
#
|
|
||||||
# We recommend against adding the Pods directory to your .gitignore. However
|
|
||||||
# you should judge for yourself, the pros and cons are mentioned at:
|
|
||||||
# https://guides.cocoapods.org/using/using-cocoapods.html#should-i-check-the-pods-directory-into-source-control
|
|
||||||
#
|
|
||||||
# vendor/Pods/
|
|
||||||
|
|
||||||
## Documentation cache and generated files:
|
|
||||||
/.yardoc/
|
|
||||||
/_yardoc/
|
|
||||||
/doc/
|
|
||||||
/rdoc/
|
|
||||||
|
|
||||||
## Environment normalization:
|
|
||||||
/.bundle/
|
|
||||||
/vendor/bundle
|
|
||||||
/lib/bundler/man/
|
|
||||||
|
|
||||||
# for a library or gem, you might want to ignore these files since the code is
|
|
||||||
# intended to run in multiple environments; otherwise, check them in:
|
|
||||||
# Gemfile.lock
|
|
||||||
.ruby-version
|
|
||||||
.ruby-gemset
|
|
||||||
|
|
||||||
# unless supporting rvm < 1.11.0 or doing something fancy, ignore this:
|
|
||||||
.rvmrc
|
|
||||||
/.bundle/
|
|
||||||
/.yardoc
|
|
||||||
/Gemfile.lock
|
|
||||||
/_yardoc/
|
|
||||||
/coverage/
|
|
||||||
/doc/
|
|
||||||
/pkg/
|
|
||||||
/spec/reports/
|
|
||||||
/tmp/
|
|
||||||
|
|
||||||
# rspec failure tracking
|
|
||||||
.rspec_status
|
|
||||||
|
|
||||||
# byebug
|
|
||||||
.byebug_history
|
|
|
@ -1,2 +0,0 @@
|
||||||
--format documentation
|
|
||||||
--color
|
|
|
@ -1,8 +0,0 @@
|
||||||
sudo: false
|
|
||||||
language: ruby
|
|
||||||
rvm:
|
|
||||||
- 2.3.7
|
|
||||||
- 2.4.4
|
|
||||||
- 2.5.1
|
|
||||||
- 2.6.0
|
|
||||||
before_install: gem update --system && gem install bundler
|
|
|
@ -1,74 +0,0 @@
|
||||||
# Contributor Covenant Code of Conduct
|
|
||||||
|
|
||||||
## Our Pledge
|
|
||||||
|
|
||||||
In the interest of fostering an open and welcoming environment, we as
|
|
||||||
contributors and maintainers pledge to making participation in our project and
|
|
||||||
our community a harassment-free experience for everyone, regardless of age, body
|
|
||||||
size, disability, ethnicity, gender identity and expression, level of experience,
|
|
||||||
nationality, personal appearance, race, religion, or sexual identity and
|
|
||||||
orientation.
|
|
||||||
|
|
||||||
## Our Standards
|
|
||||||
|
|
||||||
Examples of behavior that contributes to creating a positive environment
|
|
||||||
include:
|
|
||||||
|
|
||||||
* Using welcoming and inclusive language
|
|
||||||
* Being respectful of differing viewpoints and experiences
|
|
||||||
* Gracefully accepting constructive criticism
|
|
||||||
* Focusing on what is best for the community
|
|
||||||
* Showing empathy towards other community members
|
|
||||||
|
|
||||||
Examples of unacceptable behavior by participants include:
|
|
||||||
|
|
||||||
* The use of sexualized language or imagery and unwelcome sexual attention or
|
|
||||||
advances
|
|
||||||
* Trolling, insulting/derogatory comments, and personal or political attacks
|
|
||||||
* Public or private harassment
|
|
||||||
* Publishing others' private information, such as a physical or electronic
|
|
||||||
address, without explicit permission
|
|
||||||
* Other conduct which could reasonably be considered inappropriate in a
|
|
||||||
professional setting
|
|
||||||
|
|
||||||
## Our Responsibilities
|
|
||||||
|
|
||||||
Project maintainers are responsible for clarifying the standards of acceptable
|
|
||||||
behavior and are expected to take appropriate and fair corrective action in
|
|
||||||
response to any instances of unacceptable behavior.
|
|
||||||
|
|
||||||
Project maintainers have the right and responsibility to remove, edit, or
|
|
||||||
reject comments, commits, code, wiki edits, issues, and other contributions
|
|
||||||
that are not aligned to this Code of Conduct, or to ban temporarily or
|
|
||||||
permanently any contributor for other behaviors that they deem inappropriate,
|
|
||||||
threatening, offensive, or harmful.
|
|
||||||
|
|
||||||
## Scope
|
|
||||||
|
|
||||||
This Code of Conduct applies both within project spaces and in public spaces
|
|
||||||
when an individual is representing the project or its community. Examples of
|
|
||||||
representing a project or community include using an official project e-mail
|
|
||||||
address, posting via an official social media account, or acting as an appointed
|
|
||||||
representative at an online or offline event. Representation of a project may be
|
|
||||||
further defined and clarified by project maintainers.
|
|
||||||
|
|
||||||
## Enforcement
|
|
||||||
|
|
||||||
Instances of abusive, harassing, or otherwise unacceptable behavior may be
|
|
||||||
reported by contacting the project team at ezapata@altavistaed.com. All
|
|
||||||
complaints will be reviewed and investigated and will result in a response that
|
|
||||||
is deemed necessary and appropriate to the circumstances. The project team is
|
|
||||||
obligated to maintain confidentiality with regard to the reporter of an incident.
|
|
||||||
Further details of specific enforcement policies may be posted separately.
|
|
||||||
|
|
||||||
Project maintainers who do not follow or enforce the Code of Conduct in good
|
|
||||||
faith may face temporary or permanent repercussions as determined by other
|
|
||||||
members of the project's leadership.
|
|
||||||
|
|
||||||
## Attribution
|
|
||||||
|
|
||||||
This Code of Conduct is adapted from the [Contributor Covenant][homepage], version 1.4,
|
|
||||||
available at [http://contributor-covenant.org/version/1/4][version]
|
|
||||||
|
|
||||||
[homepage]: http://contributor-covenant.org
|
|
||||||
[version]: http://contributor-covenant.org/version/1/4/
|
|
|
@ -1 +0,0 @@
|
||||||
Bug reports and pull requests are welcome on GitHub at https://github.com/estebanz01/ruby-statistics. This project is intended to be a safe, welcoming space for collaboration, and contributors are expected to adhere to the [Contributor Covenant code of conduct](https://www.contributor-covenant.org/).
|
|
|
@ -1,6 +0,0 @@
|
||||||
source "https://rubygems.org"
|
|
||||||
|
|
||||||
git_source(:github) {|repo_name| "https://github.com/#{repo_name}" }
|
|
||||||
|
|
||||||
# Specify your gem's dependencies in statistics.gemspec
|
|
||||||
gemspec
|
|
|
@ -1,21 +0,0 @@
|
||||||
MIT License
|
|
||||||
|
|
||||||
Copyright (c) 2017 Esteban Zapata Rojas
|
|
||||||
|
|
||||||
Permission is hereby granted, free of charge, to any person obtaining a copy
|
|
||||||
of this software and associated documentation files (the "Software"), to deal
|
|
||||||
in the Software without restriction, including without limitation the rights
|
|
||||||
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
|
|
||||||
copies of the Software, and to permit persons to whom the Software is
|
|
||||||
furnished to do so, subject to the following conditions:
|
|
||||||
|
|
||||||
The above copyright notice and this permission notice shall be included in all
|
|
||||||
copies or substantial portions of the Software.
|
|
||||||
|
|
||||||
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
|
|
||||||
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
|
|
||||||
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
|
|
||||||
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
|
|
||||||
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
|
|
||||||
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
|
|
||||||
SOFTWARE.
|
|
|
@ -1,21 +0,0 @@
|
||||||
The MIT License (MIT)
|
|
||||||
|
|
||||||
Copyright (c) 2017 esteban zapata
|
|
||||||
|
|
||||||
Permission is hereby granted, free of charge, to any person obtaining a copy
|
|
||||||
of this software and associated documentation files (the "Software"), to deal
|
|
||||||
in the Software without restriction, including without limitation the rights
|
|
||||||
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
|
|
||||||
copies of the Software, and to permit persons to whom the Software is
|
|
||||||
furnished to do so, subject to the following conditions:
|
|
||||||
|
|
||||||
The above copyright notice and this permission notice shall be included in
|
|
||||||
all copies or substantial portions of the Software.
|
|
||||||
|
|
||||||
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
|
|
||||||
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
|
|
||||||
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
|
|
||||||
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
|
|
||||||
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
|
|
||||||
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
|
|
||||||
THE SOFTWARE.
|
|
|
@ -1,79 +0,0 @@
|
||||||
# Ruby Statistics
|
|
||||||
|
|
||||||
![](https://travis-ci.org/estebanz01/ruby-statistics.svg?branch=master)
|
|
||||||
|
|
||||||
A basic ruby gem that implements some statistical methods, functions and concepts to be used in any ruby environment without depending on any mathematical software like `R`, `Matlab`, `Octave` or similar.
|
|
||||||
|
|
||||||
Unit test runs under the following ruby versions:
|
|
||||||
* Ruby 2.2.
|
|
||||||
* Ruby 2.3.1.
|
|
||||||
* Ruby 2.4.0.
|
|
||||||
* Ruby 2.5.0.
|
|
||||||
|
|
||||||
We got the inspiration from the folks at [JStat](https://github.com/jstat/jstat) and some interesting lectures about [Keystroke dynamics](http://www.biometric-solutions.com/keystroke-dynamics.html).
|
|
||||||
|
|
||||||
Some logic and algorithms are extractions or adaptations from other authors, which are referenced in the comments.
|
|
||||||
This software is released under the MIT License.
|
|
||||||
|
|
||||||
## Installation
|
|
||||||
|
|
||||||
Add this line to your application's Gemfile:
|
|
||||||
|
|
||||||
```ruby
|
|
||||||
gem 'ruby-statistics'
|
|
||||||
```
|
|
||||||
|
|
||||||
And then execute:
|
|
||||||
|
|
||||||
$ bundle
|
|
||||||
|
|
||||||
Or install it yourself as:
|
|
||||||
|
|
||||||
$ gem install ruby-statistics
|
|
||||||
|
|
||||||
## Basic Usage
|
|
||||||
|
|
||||||
just require the `statistics` gem in order to load it. If you don't have defined the `Distribution` namespace, the gem will assign an alias, reducing the number of namespaces needed to use a class.
|
|
||||||
|
|
||||||
Right now you can load:
|
|
||||||
|
|
||||||
* The whole statistics gem. `require 'statistics'`
|
|
||||||
* A namespace. `require 'statistics/distribution'`
|
|
||||||
* A class. `require 'statistics/distribution/normal'`
|
|
||||||
|
|
||||||
Feel free to use the one that is more convenient to you.
|
|
||||||
|
|
||||||
### Hello-World Example
|
|
||||||
```ruby
|
|
||||||
require 'statistics'
|
|
||||||
|
|
||||||
poisson = Distribution::Poisson.new(l) # Using Distribution alias.
|
|
||||||
normal = Statistics::Distribution::StandardNormal.new # Using all namespaces.
|
|
||||||
```
|
|
||||||
|
|
||||||
## Documentation
|
|
||||||
You can find a bit more detailed documentation of all available distributions, tests and functions in the [Documentation Index](https://github.com/estebanz01/ruby-statistics/wiki)
|
|
||||||
|
|
||||||
## Development
|
|
||||||
|
|
||||||
After checking out the repo, run `bin/setup` to install dependencies. Then, run `rake spec` to run the tests. You can also run `bin/console` for an interactive prompt that will allow you to experiment.
|
|
||||||
|
|
||||||
To install this gem onto your local machine, run `bundle exec rake install`. To release a new version, update the version number in `version.rb`, and then run `bundle exec rake release`, which will create a git tag for the version, push git commits and tags, and push the `.gem` file to [rubygems.org](https://rubygems.org).
|
|
||||||
|
|
||||||
## Contributing
|
|
||||||
|
|
||||||
Bug reports and pull requests are welcome on GitHub at https://github.com/estebanz01/ruby-statistics. This project is intended to be a safe, welcoming space for collaboration, and contributors are expected to adhere to the [Contributor Covenant](http://contributor-covenant.org) code of conduct.
|
|
||||||
|
|
||||||
## License
|
|
||||||
|
|
||||||
The gem is available as open source under the terms of the [MIT License](http://opensource.org/licenses/MIT).
|
|
||||||
|
|
||||||
## Code of Conduct
|
|
||||||
|
|
||||||
Everyone interacting in the Statistics project’s codebases, issue trackers, chat rooms and mailing lists is expected to follow the [code of conduct](https://github.com/estebanz01/ruby-statistics/blob/master/CODE_OF_CONDUCT.md).
|
|
||||||
|
|
||||||
## Contact
|
|
||||||
|
|
||||||
You can contact me via:
|
|
||||||
* [Github](https://github.com/estebanz01)
|
|
||||||
* [Twitter](https://twitter.com/estebanz01)
|
|
|
@ -1,6 +0,0 @@
|
||||||
require "bundler/gem_tasks"
|
|
||||||
require "rspec/core/rake_task"
|
|
||||||
|
|
||||||
RSpec::Core::RakeTask.new(:spec)
|
|
||||||
|
|
||||||
task :default => :spec
|
|
|
@ -1,14 +0,0 @@
|
||||||
#!/usr/bin/env ruby
|
|
||||||
|
|
||||||
require "bundler/setup"
|
|
||||||
require "statistics"
|
|
||||||
|
|
||||||
# You can add fixtures and/or initialization code here to make experimenting
|
|
||||||
# with your gem easier. You can also use a different console, if you like.
|
|
||||||
|
|
||||||
# (If you use this, don't forget to add pry to your Gemfile!)
|
|
||||||
# require "pry"
|
|
||||||
# Pry.start
|
|
||||||
|
|
||||||
require "irb"
|
|
||||||
IRB.start(__FILE__)
|
|
|
@ -1,8 +0,0 @@
|
||||||
#!/usr/bin/env bash
|
|
||||||
set -euo pipefail
|
|
||||||
IFS=$'\n\t'
|
|
||||||
set -vx
|
|
||||||
|
|
||||||
bundle install
|
|
||||||
|
|
||||||
# Do any other automated setup that you need to do here
|
|
|
@ -1,15 +0,0 @@
|
||||||
# TODO: Avoid monkey-patching.
|
|
||||||
module Enumerable
|
|
||||||
def mean
|
|
||||||
self.reduce(:+) / self.length.to_f
|
|
||||||
end
|
|
||||||
|
|
||||||
def variance
|
|
||||||
mean = self.mean
|
|
||||||
self.reduce(0) { |memo, value| memo + ((value - mean) ** 2) } / (self.length - 1).to_f
|
|
||||||
end
|
|
||||||
|
|
||||||
def standard_deviation
|
|
||||||
Math.sqrt(self.variance)
|
|
||||||
end
|
|
||||||
end
|
|
|
@ -1,120 +0,0 @@
|
||||||
module Math
|
|
||||||
def self.factorial(n)
|
|
||||||
return if n < 0
|
|
||||||
|
|
||||||
n = n.to_i # Only integers.
|
|
||||||
|
|
||||||
return 1 if n == 0 || n == 1
|
|
||||||
Math.gamma(n + 1) # Math.gamma(x) == (n - 1)! for integer values
|
|
||||||
end
|
|
||||||
|
|
||||||
def self.combination(n, r)
|
|
||||||
self.factorial(n)/(self.factorial(r) * self.factorial(n - r)).to_f # n!/(r! * [n - r]!)
|
|
||||||
end
|
|
||||||
|
|
||||||
def self.permutation(n, k)
|
|
||||||
self.factorial(n)/self.factorial(n - k).to_f
|
|
||||||
end
|
|
||||||
|
|
||||||
# Function adapted from the python implementation that exists in https://en.wikipedia.org/wiki/Simpson%27s_rule#Sample_implementation
|
|
||||||
# Finite integral in the interval [a, b] split up in n-intervals
|
|
||||||
def self.simpson_rule(a, b, n, &block)
|
|
||||||
unless n.even?
|
|
||||||
puts "The composite simpson's rule needs even intervals!"
|
|
||||||
return
|
|
||||||
end
|
|
||||||
|
|
||||||
h = (b - a)/n.to_f
|
|
||||||
resA = yield(a)
|
|
||||||
resB = yield(b)
|
|
||||||
|
|
||||||
sum = resA + resB
|
|
||||||
|
|
||||||
(1..n).step(2).each do |number|
|
|
||||||
res = yield(a + number * h)
|
|
||||||
sum += 4 * res
|
|
||||||
end
|
|
||||||
|
|
||||||
(1..(n-1)).step(2).each do |number|
|
|
||||||
res = yield(a + number * h)
|
|
||||||
sum += 2 * res
|
|
||||||
end
|
|
||||||
|
|
||||||
return sum * h / 3.0
|
|
||||||
end
|
|
||||||
|
|
||||||
def self.lower_incomplete_gamma_function(s, x)
|
|
||||||
# The greater the iterations, the better. That's why we are iterating 10_000 * x times
|
|
||||||
self.simpson_rule(0, x, (10_000 * x.round).round) do |t|
|
|
||||||
(t ** (s - 1)) * Math.exp(-t)
|
|
||||||
end
|
|
||||||
end
|
|
||||||
|
|
||||||
def self.beta_function(x, y)
|
|
||||||
return 1 if x == 1 && y == 1
|
|
||||||
|
|
||||||
(Math.gamma(x) * Math.gamma(y))/Math.gamma(x + y)
|
|
||||||
end
|
|
||||||
|
|
||||||
### This implementation is an adaptation of the incomplete beta function made in C by
|
|
||||||
### Lewis Van Winkle, which released the code under the zlib license.
|
|
||||||
### The whole math behind this code is described in the following post: https://codeplea.com/incomplete-beta-function-c
|
|
||||||
def self.incomplete_beta_function(x, alp, bet)
|
|
||||||
return if x < 0.0
|
|
||||||
return 1.0 if x > 1.0
|
|
||||||
|
|
||||||
tiny = 1.0E-50
|
|
||||||
|
|
||||||
if x > ((alp + 1.0)/(alp + bet + 2.0))
|
|
||||||
return 1.0 - self.incomplete_beta_function(1.0 - x, bet, alp)
|
|
||||||
end
|
|
||||||
|
|
||||||
# To avoid overflow problems, the implementation applies the logarithm properties
|
|
||||||
# to calculate in a faster and safer way the values.
|
|
||||||
lbet_ab = (Math.lgamma(alp)[0] + Math.lgamma(bet)[0] - Math.lgamma(alp + bet)[0]).freeze
|
|
||||||
front = (Math.exp(Math.log(x) * alp + Math.log(1.0 - x) * bet - lbet_ab) / alp.to_f).freeze
|
|
||||||
|
|
||||||
# This is the non-log version of the left part of the formula (before the continuous fraction)
|
|
||||||
# down_left = alp * self.beta_function(alp, bet)
|
|
||||||
# upper_left = (x ** alp) * ((1.0 - x) ** bet)
|
|
||||||
# front = upper_left/down_left
|
|
||||||
|
|
||||||
f, c, d = 1.0, 1.0, 0.0
|
|
||||||
|
|
||||||
returned_value = nil
|
|
||||||
|
|
||||||
# Let's do more iterations than the proposed implementation (200 iters)
|
|
||||||
(0..500).each do |number|
|
|
||||||
m = number/2
|
|
||||||
|
|
||||||
numerator = if number == 0
|
|
||||||
1.0
|
|
||||||
elsif number % 2 == 0
|
|
||||||
(m * (bet - m) * x)/((alp + 2.0 * m - 1.0)* (alp + 2.0 * m))
|
|
||||||
else
|
|
||||||
top = -((alp + m) * (alp + bet + m) * x)
|
|
||||||
down = ((alp + 2.0 * m) * (alp + 2.0 * m + 1.0))
|
|
||||||
|
|
||||||
top/down
|
|
||||||
end
|
|
||||||
|
|
||||||
d = 1.0 + numerator * d
|
|
||||||
d = tiny if d.abs < tiny
|
|
||||||
d = 1.0 / d
|
|
||||||
|
|
||||||
c = 1.0 + numerator / c
|
|
||||||
c = tiny if c.abs < tiny
|
|
||||||
|
|
||||||
cd = (c*d).freeze
|
|
||||||
f = f * cd
|
|
||||||
|
|
||||||
|
|
||||||
if (1.0 - cd).abs < 1.0E-10
|
|
||||||
returned_value = front * (f - 1.0)
|
|
||||||
break
|
|
||||||
end
|
|
||||||
end
|
|
||||||
|
|
||||||
returned_value
|
|
||||||
end
|
|
||||||
end
|
|
|
@ -1,7 +0,0 @@
|
||||||
require File.dirname(__FILE__) + '/enumerable'
|
|
||||||
require File.dirname(__FILE__) + '/math'
|
|
||||||
Dir[ File.dirname(__FILE__) + '/statistics/**/*.rb'].each {|file| require file }
|
|
||||||
|
|
||||||
module Statistics
|
|
||||||
# Your code goes here...
|
|
||||||
end
|
|
|
@ -1,11 +0,0 @@
|
||||||
Dir[File.dirname(__FILE__) + '/distribution/**/*.rb'].each {|file| require file }
|
|
||||||
|
|
||||||
module Statistics
|
|
||||||
module Distribution
|
|
||||||
end
|
|
||||||
end
|
|
||||||
|
|
||||||
# If Distribution is not defined, setup alias.
|
|
||||||
if defined?(Statistics) && !(defined?(Distribution))
|
|
||||||
Distribution = Statistics::Distribution
|
|
||||||
end
|
|
|
@ -1,35 +0,0 @@
|
||||||
module Statistics
|
|
||||||
module Distribution
|
|
||||||
class Bernoulli
|
|
||||||
def self.density_function(n, p)
|
|
||||||
return if n != 0 && n != 1 # The support of the distribution is n = {0, 1}.
|
|
||||||
|
|
||||||
case n
|
|
||||||
when 0 then 1.0 - p
|
|
||||||
when 1 then p
|
|
||||||
end
|
|
||||||
end
|
|
||||||
|
|
||||||
def self.cumulative_function(n, p)
|
|
||||||
return if n != 0 && n != 1 # The support of the distribution is n = {0, 1}.
|
|
||||||
|
|
||||||
case n
|
|
||||||
when 0 then 1.0 - p
|
|
||||||
when 1 then 1.0
|
|
||||||
end
|
|
||||||
end
|
|
||||||
|
|
||||||
def self.variance(p)
|
|
||||||
p * (1.0 - p)
|
|
||||||
end
|
|
||||||
|
|
||||||
def self.skewness(p)
|
|
||||||
(1.0 - 2.0*p).to_f / Math.sqrt(p * (1.0 - p))
|
|
||||||
end
|
|
||||||
|
|
||||||
def self.kurtosis(p)
|
|
||||||
(6.0 * (p ** 2) - (6 * p) + 1) / (p * (1.0 - p))
|
|
||||||
end
|
|
||||||
end
|
|
||||||
end
|
|
||||||
end
|
|
|
@ -1,36 +0,0 @@
|
||||||
module Statistics
|
|
||||||
module Distribution
|
|
||||||
class Beta
|
|
||||||
attr_accessor :alpha, :beta
|
|
||||||
|
|
||||||
def initialize(alp, bet)
|
|
||||||
self.alpha = alp.to_f
|
|
||||||
self.beta = bet.to_f
|
|
||||||
end
|
|
||||||
|
|
||||||
def cumulative_function(value)
|
|
||||||
Math.incomplete_beta_function(value, alpha, beta)
|
|
||||||
end
|
|
||||||
|
|
||||||
def density_function(value)
|
|
||||||
return 0 if value < 0 || value > 1 # Density function defined in the [0,1] interval
|
|
||||||
|
|
||||||
num = (value**(alpha - 1)) * ((1 - value)**(beta - 1))
|
|
||||||
den = Math.beta_function(alpha, beta)
|
|
||||||
|
|
||||||
num/den
|
|
||||||
end
|
|
||||||
|
|
||||||
def mode
|
|
||||||
return unless alpha > 1 && beta > 1
|
|
||||||
|
|
||||||
(alpha - 1)/(alpha + beta - 2)
|
|
||||||
end
|
|
||||||
|
|
||||||
def mean
|
|
||||||
return if alpha + beta == 0
|
|
||||||
alpha / (alpha + beta)
|
|
||||||
end
|
|
||||||
end
|
|
||||||
end
|
|
||||||
end
|
|
|
@ -1,49 +0,0 @@
|
||||||
module Statistics
|
|
||||||
module Distribution
|
|
||||||
class Binomial
|
|
||||||
attr_accessor :number_of_trials, :probability_per_trial
|
|
||||||
def initialize(n, p)
|
|
||||||
self.number_of_trials = n.to_i
|
|
||||||
self.probability_per_trial = p
|
|
||||||
end
|
|
||||||
|
|
||||||
def probability_mass_function(k)
|
|
||||||
return if k < 0 || k > number_of_trials
|
|
||||||
k = k.to_i
|
|
||||||
|
|
||||||
Math.combination(number_of_trials, k) *
|
|
||||||
(probability_per_trial ** k) * ((1 - probability_per_trial) ** (number_of_trials - k))
|
|
||||||
end
|
|
||||||
|
|
||||||
def cumulative_function(k)
|
|
||||||
return if k < 0 || k > number_of_trials
|
|
||||||
k = k.to_i
|
|
||||||
|
|
||||||
p = 1 - probability_per_trial
|
|
||||||
Math.incomplete_beta_function(p, number_of_trials - k, 1 + k)
|
|
||||||
end
|
|
||||||
|
|
||||||
def mean
|
|
||||||
number_of_trials * probability_per_trial
|
|
||||||
end
|
|
||||||
|
|
||||||
def variance
|
|
||||||
mean * (1 - probability_per_trial)
|
|
||||||
end
|
|
||||||
|
|
||||||
def mode
|
|
||||||
test = (number_of_trials + 1) * probability_per_trial
|
|
||||||
|
|
||||||
returned = if test == 0 || (test % 1 != 0)
|
|
||||||
test.floor
|
|
||||||
elsif (test % 1 == 0) && (test >= 1 && test <= number_of_trials)
|
|
||||||
[test, test - 1]
|
|
||||||
elsif test == number_of_trials + 1
|
|
||||||
number_of_trials
|
|
||||||
end
|
|
||||||
|
|
||||||
returned
|
|
||||||
end
|
|
||||||
end
|
|
||||||
end
|
|
||||||
end
|
|
|
@ -1,37 +0,0 @@
|
||||||
module Statistics
|
|
||||||
module Distribution
|
|
||||||
class ChiSquared
|
|
||||||
attr_accessor :degrees_of_freedom
|
|
||||||
|
|
||||||
alias_method :mean, :degrees_of_freedom
|
|
||||||
|
|
||||||
def initialize(k)
|
|
||||||
self.degrees_of_freedom = k
|
|
||||||
end
|
|
||||||
|
|
||||||
def cumulative_function(value)
|
|
||||||
k = degrees_of_freedom/2.0
|
|
||||||
Math.lower_incomplete_gamma_function(k, value/2.0)/Math.gamma(k)
|
|
||||||
end
|
|
||||||
|
|
||||||
def density_function(value)
|
|
||||||
return 0 if value < 0
|
|
||||||
|
|
||||||
common = degrees_of_freedom/2.0
|
|
||||||
|
|
||||||
left_down = (2 ** common) * Math.gamma(common)
|
|
||||||
right = (value ** (common - 1)) * Math.exp(-(value/2.0))
|
|
||||||
|
|
||||||
(1.0/left_down) * right
|
|
||||||
end
|
|
||||||
|
|
||||||
def mode
|
|
||||||
[degrees_of_freedom - 2, 0].max
|
|
||||||
end
|
|
||||||
|
|
||||||
def variance
|
|
||||||
degrees_of_freedom * 2
|
|
||||||
end
|
|
||||||
end
|
|
||||||
end
|
|
||||||
end
|
|
|
@ -1,26 +0,0 @@
|
||||||
module Statistics
|
|
||||||
module Distribution
|
|
||||||
class Empirical
|
|
||||||
attr_accessor :samples
|
|
||||||
|
|
||||||
def initialize(samples:)
|
|
||||||
self.samples = samples
|
|
||||||
end
|
|
||||||
|
|
||||||
# Formula grabbed from here: https://statlect.com/asymptotic-theory/empirical-distribution
|
|
||||||
def cumulative_function(x:)
|
|
||||||
cumulative_sum = samples.reduce(0) do |summation, sample|
|
|
||||||
summation += if sample <= x
|
|
||||||
1
|
|
||||||
else
|
|
||||||
0
|
|
||||||
end
|
|
||||||
|
|
||||||
summation
|
|
||||||
end
|
|
||||||
|
|
||||||
cumulative_sum / samples.size.to_f
|
|
||||||
end
|
|
||||||
end
|
|
||||||
end
|
|
||||||
end
|
|
|
@ -1,46 +0,0 @@
|
||||||
module Statistics
|
|
||||||
module Distribution
|
|
||||||
class F
|
|
||||||
attr_accessor :d1, :d2 # Degrees of freedom #1 and #2
|
|
||||||
|
|
||||||
def initialize(k, j)
|
|
||||||
self.d1 = k
|
|
||||||
self.d2 = j
|
|
||||||
end
|
|
||||||
|
|
||||||
# Formula extracted from http://www.itl.nist.gov/div898/handbook/eda/section3/eda3665.htm#CDF
|
|
||||||
def cumulative_function(value)
|
|
||||||
k = d2/(d2 + d1 * value.to_f)
|
|
||||||
|
|
||||||
1 - Math.incomplete_beta_function(k, d2/2.0, d1/2.0)
|
|
||||||
end
|
|
||||||
|
|
||||||
def density_function(value)
|
|
||||||
return if d1 < 0 || d2 < 0 # F-pdf is well defined for the [0, +infinity) interval.
|
|
||||||
|
|
||||||
val = value.to_f
|
|
||||||
upper = ((d1 * val) ** d1) * (d2**d2)
|
|
||||||
lower = (d1 * val + d2) ** (d1 + d2)
|
|
||||||
up = Math.sqrt(upper/lower.to_f)
|
|
||||||
down = val * Math.beta_function(d1/2.0, d2/2.0)
|
|
||||||
|
|
||||||
up/down.to_f
|
|
||||||
end
|
|
||||||
|
|
||||||
def mean
|
|
||||||
return if d2 <= 2
|
|
||||||
|
|
||||||
d2/(d2 - 2).to_f
|
|
||||||
end
|
|
||||||
|
|
||||||
def mode
|
|
||||||
return if d1 <= 2
|
|
||||||
|
|
||||||
left = (d1 - 2)/d1.to_f
|
|
||||||
right = d2/(d2 + 2).to_f
|
|
||||||
|
|
||||||
left * right
|
|
||||||
end
|
|
||||||
end
|
|
||||||
end
|
|
||||||
end
|
|
|
@ -1,76 +0,0 @@
|
||||||
module Statistics
|
|
||||||
module Distribution
|
|
||||||
class Geometric
|
|
||||||
attr_accessor :probability_of_success, :always_success_allowed
|
|
||||||
|
|
||||||
def initialize(p, always_success: false)
|
|
||||||
self.probability_of_success = p.to_f
|
|
||||||
self.always_success_allowed = always_success
|
|
||||||
end
|
|
||||||
|
|
||||||
def density_function(k)
|
|
||||||
k = k.to_i
|
|
||||||
|
|
||||||
if always_success_allowed
|
|
||||||
return if k < 0
|
|
||||||
|
|
||||||
((1.0 - probability_of_success) ** k) * probability_of_success
|
|
||||||
else
|
|
||||||
return if k <= 0
|
|
||||||
|
|
||||||
((1.0 - probability_of_success) ** (k - 1.0)) * probability_of_success
|
|
||||||
end
|
|
||||||
end
|
|
||||||
|
|
||||||
def cumulative_function(k)
|
|
||||||
k = k.to_i
|
|
||||||
|
|
||||||
if always_success_allowed
|
|
||||||
return if k < 0
|
|
||||||
|
|
||||||
1.0 - ((1.0 - probability_of_success) ** (k + 1.0))
|
|
||||||
else
|
|
||||||
return if k <= 0
|
|
||||||
|
|
||||||
1.0 - ((1.0 - probability_of_success) ** k)
|
|
||||||
end
|
|
||||||
end
|
|
||||||
|
|
||||||
def mean
|
|
||||||
if always_success_allowed
|
|
||||||
(1.0 - probability_of_success) / probability_of_success
|
|
||||||
else
|
|
||||||
1.0 / probability_of_success
|
|
||||||
end
|
|
||||||
end
|
|
||||||
|
|
||||||
def median
|
|
||||||
if always_success_allowed
|
|
||||||
(-1.0 / Math.log2(1.0 - probability_of_success)).ceil - 1.0
|
|
||||||
else
|
|
||||||
(-1.0 / Math.log2(1.0 - probability_of_success)).ceil
|
|
||||||
end
|
|
||||||
end
|
|
||||||
|
|
||||||
def mode
|
|
||||||
if always_success_allowed
|
|
||||||
0.0
|
|
||||||
else
|
|
||||||
1.0
|
|
||||||
end
|
|
||||||
end
|
|
||||||
|
|
||||||
def variance
|
|
||||||
(1.0 - probability_of_success) / (probability_of_success ** 2)
|
|
||||||
end
|
|
||||||
|
|
||||||
def skewness
|
|
||||||
(2.0 - probability_of_success) / Math.sqrt(1.0 - probability_of_success)
|
|
||||||
end
|
|
||||||
|
|
||||||
def kurtosis
|
|
||||||
6.0 + ((probability_of_success ** 2) / (1.0 - probability_of_success))
|
|
||||||
end
|
|
||||||
end
|
|
||||||
end
|
|
||||||
end
|
|
|
@ -1,51 +0,0 @@
|
||||||
module Statistics
|
|
||||||
module Distribution
|
|
||||||
class LogSeries
|
|
||||||
def self.density_function(k, p)
|
|
||||||
return if k <= 0
|
|
||||||
k = k.to_i
|
|
||||||
|
|
||||||
left = (-1.0 / Math.log(1.0 - p))
|
|
||||||
right = (p ** k).to_f
|
|
||||||
|
|
||||||
left * right / k
|
|
||||||
end
|
|
||||||
|
|
||||||
def self.cumulative_function(k, p)
|
|
||||||
return if k <= 0
|
|
||||||
|
|
||||||
# Sadly, the incomplete beta function is converging
|
|
||||||
# too fast to zero and breaking the calculation on logs.
|
|
||||||
# So, we default to the basic definition of the CDF which is
|
|
||||||
# the integral (-Inf, K) of the PDF, with P(X <= x) which can
|
|
||||||
# be solved as a summation of all PDFs from 1 to K. Note that the summation approach
|
|
||||||
# only applies to discrete distributions.
|
|
||||||
#
|
|
||||||
# right = Math.incomplete_beta_function(p, (k + 1).floor, 0) / Math.log(1.0 - p)
|
|
||||||
# 1.0 + right
|
|
||||||
|
|
||||||
result = 0.0
|
|
||||||
1.upto(k) do |number|
|
|
||||||
result += self.density_function(number, p)
|
|
||||||
end
|
|
||||||
|
|
||||||
result
|
|
||||||
end
|
|
||||||
|
|
||||||
def self.mode
|
|
||||||
1.0
|
|
||||||
end
|
|
||||||
|
|
||||||
def self.mean(p)
|
|
||||||
(-1.0 / Math.log(1.0 - p)) * (p / (1.0 - p))
|
|
||||||
end
|
|
||||||
|
|
||||||
def self.variance(p)
|
|
||||||
up = p + Math.log(1.0 - p)
|
|
||||||
down = ((1.0 - p) ** 2) * (Math.log(1.0 - p) ** 2)
|
|
||||||
|
|
||||||
(-1.0 * p) * (up / down.to_f)
|
|
||||||
end
|
|
||||||
end
|
|
||||||
end
|
|
||||||
end
|
|
|
@ -1,51 +0,0 @@
|
||||||
module Statistics
|
|
||||||
module Distribution
|
|
||||||
class NegativeBinomial
|
|
||||||
attr_accessor :number_of_failures, :probability_per_trial
|
|
||||||
|
|
||||||
def initialize(r, p)
|
|
||||||
self.number_of_failures = r.to_i
|
|
||||||
self.probability_per_trial = p
|
|
||||||
end
|
|
||||||
|
|
||||||
def probability_mass_function(k)
|
|
||||||
return if number_of_failures < 0 || k < 0 || k > number_of_failures
|
|
||||||
|
|
||||||
left = Math.combination(k + number_of_failures - 1, k)
|
|
||||||
right = ((1 - probability_per_trial) ** number_of_failures) * (probability_per_trial ** k)
|
|
||||||
|
|
||||||
left * right
|
|
||||||
end
|
|
||||||
|
|
||||||
def cumulative_function(k)
|
|
||||||
return if k < 0 || k > number_of_failures
|
|
||||||
k = k.to_i
|
|
||||||
|
|
||||||
1.0 - Math.incomplete_beta_function(probability_per_trial, k + 1, number_of_failures)
|
|
||||||
end
|
|
||||||
|
|
||||||
def mean
|
|
||||||
(probability_per_trial * number_of_failures)/(1 - probability_per_trial).to_f
|
|
||||||
end
|
|
||||||
|
|
||||||
def variance
|
|
||||||
(probability_per_trial * number_of_failures)/((1 - probability_per_trial) ** 2).to_f
|
|
||||||
end
|
|
||||||
|
|
||||||
def skewness
|
|
||||||
(1 + probability_per_trial).to_f / Math.sqrt(probability_per_trial * number_of_failures)
|
|
||||||
end
|
|
||||||
|
|
||||||
def mode
|
|
||||||
if number_of_failures > 1
|
|
||||||
up = probability_per_trial * (number_of_failures - 1)
|
|
||||||
down = (1 - probability_per_trial).to_f
|
|
||||||
|
|
||||||
(up/down).floor
|
|
||||||
elsif number_of_failures <= 1
|
|
||||||
0.0
|
|
||||||
end
|
|
||||||
end
|
|
||||||
end
|
|
||||||
end
|
|
||||||
end
|
|
|
@ -1,139 +0,0 @@
|
||||||
module Statistics
|
|
||||||
module Distribution
|
|
||||||
class Normal
|
|
||||||
attr_accessor :mean, :standard_deviation, :variance
|
|
||||||
alias_method :mode, :mean
|
|
||||||
|
|
||||||
def initialize(avg, std)
|
|
||||||
self.mean = avg.to_f
|
|
||||||
self.standard_deviation = std.to_f
|
|
||||||
self.variance = std.to_f**2
|
|
||||||
end
|
|
||||||
|
|
||||||
def cumulative_function(value)
|
|
||||||
(1/2.0) * (1.0 + Math.erf((value - mean)/(standard_deviation * Math.sqrt(2.0))))
|
|
||||||
end
|
|
||||||
|
|
||||||
def density_function(value)
|
|
||||||
return 0 if standard_deviation <= 0
|
|
||||||
|
|
||||||
up_right = (value - mean)**2.0
|
|
||||||
down_right = 2.0 * variance
|
|
||||||
right = Math.exp(-(up_right/down_right))
|
|
||||||
left_down = Math.sqrt(2.0 * Math::PI * variance)
|
|
||||||
left_up = 1.0
|
|
||||||
|
|
||||||
(left_up/(left_down) * right)
|
|
||||||
end
|
|
||||||
|
|
||||||
## Marsaglia polar method implementation for random gaussian (normal) number generation.
|
|
||||||
# References:
|
|
||||||
# https://en.wikipedia.org/wiki/Marsaglia_polar_method
|
|
||||||
# https://math.stackexchange.com/questions/69245/transform-uniform-distribution-to-normal-distribution-using-lindeberg-l%C3%A9vy-clt
|
|
||||||
# https://www.projectrhea.org/rhea/index.php/The_principles_for_how_to_generate_random_samples_from_a_Gaussian_distribution
|
|
||||||
|
|
||||||
def random(elements: 1, seed: Random.new_seed)
|
|
||||||
results = []
|
|
||||||
|
|
||||||
# Setup seed
|
|
||||||
srand(seed)
|
|
||||||
|
|
||||||
# Number of random numbers to be generated.
|
|
||||||
elements.times do
|
|
||||||
x, y, r = 0.0, 0.0, 0.0
|
|
||||||
|
|
||||||
# Find an (x, y) point in the x^2 + y^2 < 1 circumference.
|
|
||||||
loop do
|
|
||||||
x = 2.0 * rand - 1.0
|
|
||||||
y = 2.0 * rand - 1.0
|
|
||||||
|
|
||||||
r = (x ** 2) + (y ** 2)
|
|
||||||
|
|
||||||
break unless r >= 1.0 || r == 0
|
|
||||||
end
|
|
||||||
|
|
||||||
# Project the random point to the required random distance
|
|
||||||
r = Math.sqrt(-2.0 * Math.log(r) / r)
|
|
||||||
|
|
||||||
# Transform the random distance to a gaussian value and append it to the results array
|
|
||||||
results << mean + x * r * standard_deviation
|
|
||||||
end
|
|
||||||
|
|
||||||
if elements == 1
|
|
||||||
results.first
|
|
||||||
else
|
|
||||||
results
|
|
||||||
end
|
|
||||||
end
|
|
||||||
end
|
|
||||||
|
|
||||||
class StandardNormal < Normal
|
|
||||||
def initialize
|
|
||||||
super(0, 1) # Mean = 0, Std = 1
|
|
||||||
end
|
|
||||||
|
|
||||||
def density_function(value)
|
|
||||||
pow = (value**2)/2.0
|
|
||||||
euler = Math.exp(-pow)
|
|
||||||
|
|
||||||
euler/Math.sqrt(2 * Math::PI)
|
|
||||||
end
|
|
||||||
end
|
|
||||||
|
|
||||||
# Inverse Standard Normal distribution:
|
|
||||||
# References:
|
|
||||||
# https://en.wikipedia.org/wiki/Inverse_distribution
|
|
||||||
# http://www.source-code.biz/snippets/vbasic/9.htm
|
|
||||||
class InverseStandardNormal < StandardNormal
|
|
||||||
A1 = -39.6968302866538
|
|
||||||
A2 = 220.946098424521
|
|
||||||
A3 = -275.928510446969
|
|
||||||
A4 = 138.357751867269
|
|
||||||
A5 = -30.6647980661472
|
|
||||||
A6 = 2.50662827745924
|
|
||||||
B1 = -54.4760987982241
|
|
||||||
B2 = 161.585836858041
|
|
||||||
B3 = -155.698979859887
|
|
||||||
B4 = 66.8013118877197
|
|
||||||
B5 = -13.2806815528857
|
|
||||||
C1 = -7.78489400243029E-03
|
|
||||||
C2 = -0.322396458041136
|
|
||||||
C3 = -2.40075827716184
|
|
||||||
C4 = -2.54973253934373
|
|
||||||
C5 = 4.37466414146497
|
|
||||||
C6 = 2.93816398269878
|
|
||||||
D1 = 7.78469570904146E-03
|
|
||||||
D2 = 0.32246712907004
|
|
||||||
D3 = 2.445134137143
|
|
||||||
D4 = 3.75440866190742
|
|
||||||
P_LOW = 0.02425
|
|
||||||
P_HIGH = 1 - P_LOW
|
|
||||||
|
|
||||||
def density_function(_)
|
|
||||||
raise NotImplementedError
|
|
||||||
end
|
|
||||||
|
|
||||||
def random(elements: 1, seed: Random.new_seed)
|
|
||||||
raise NotImplementedError
|
|
||||||
end
|
|
||||||
|
|
||||||
def cumulative_function(value)
|
|
||||||
return if value < 0.0 || value > 1.0
|
|
||||||
return -1.0 * Float::INFINITY if value.zero?
|
|
||||||
return Float::INFINITY if value == 1.0
|
|
||||||
|
|
||||||
if value < P_LOW
|
|
||||||
q = Math.sqrt((Math.log(value) * -2.0))
|
|
||||||
(((((C1 * q + C2) * q + C3) * q + C4) * q + C5) * q + C6) / ((((D1 * q + D2) * q + D3) * q + D4) * q + 1.0)
|
|
||||||
elsif value <= P_HIGH
|
|
||||||
q = value - 0.5
|
|
||||||
r = q ** 2
|
|
||||||
(((((A1 * r + A2) * r + A3) * r + A4) * r + A5) * r + A6) * q / (((((B1 * r + B2) * r + B3) * r + B4) * r + B5) * r + 1.0)
|
|
||||||
else
|
|
||||||
q = Math.sqrt((Math.log(1 - value) * -2.0))
|
|
||||||
- (((((C1 * q + C2) * q + C3) * q + C4) * q + C5) * q + C6) / ((((D1 * q + D2) * q + D3) * q + D4) * q + 1)
|
|
||||||
end
|
|
||||||
end
|
|
||||||
end
|
|
||||||
end
|
|
||||||
end
|
|
|
@ -1,38 +0,0 @@
|
||||||
module Statistics
|
|
||||||
module Distribution
|
|
||||||
class Poisson
|
|
||||||
attr_accessor :expected_number_of_occurrences
|
|
||||||
|
|
||||||
alias_method :mean, :expected_number_of_occurrences
|
|
||||||
alias_method :variance, :expected_number_of_occurrences
|
|
||||||
|
|
||||||
def initialize(l)
|
|
||||||
self.expected_number_of_occurrences = l
|
|
||||||
end
|
|
||||||
|
|
||||||
def probability_mass_function(k)
|
|
||||||
return if k < 0 || expected_number_of_occurrences < 0
|
|
||||||
|
|
||||||
k = k.to_i
|
|
||||||
|
|
||||||
upper = (expected_number_of_occurrences ** k) * Math.exp(-expected_number_of_occurrences)
|
|
||||||
lower = Math.factorial(k)
|
|
||||||
|
|
||||||
upper/lower.to_f
|
|
||||||
end
|
|
||||||
|
|
||||||
def cumulative_function(k)
|
|
||||||
return if k < 0 || expected_number_of_occurrences < 0
|
|
||||||
|
|
||||||
k = k.to_i
|
|
||||||
|
|
||||||
upper = Math.lower_incomplete_gamma_function((k + 1).floor, expected_number_of_occurrences)
|
|
||||||
lower = Math.factorial(k.floor)
|
|
||||||
|
|
||||||
# We need the right tail, i.e.: The upper incomplete gamma function. This can be
|
|
||||||
# achieved by doing a substraction between 1 and the lower incomplete gamma function.
|
|
||||||
1 - (upper/lower.to_f)
|
|
||||||
end
|
|
||||||
end
|
|
||||||
end
|
|
||||||
end
|
|
|
@ -1,82 +0,0 @@
|
||||||
module Statistics
|
|
||||||
module Distribution
|
|
||||||
class TStudent
|
|
||||||
attr_accessor :degrees_of_freedom
|
|
||||||
attr_reader :mode
|
|
||||||
|
|
||||||
def initialize(v)
|
|
||||||
self.degrees_of_freedom = v
|
|
||||||
@mode = 0
|
|
||||||
end
|
|
||||||
|
|
||||||
### Extracted from https://codeplea.com/incomplete-beta-function-c
|
|
||||||
### This function is shared under zlib license and the author is Lewis Van Winkle
|
|
||||||
def cumulative_function(value)
|
|
||||||
upper = (value + Math.sqrt(value * value + degrees_of_freedom))
|
|
||||||
lower = (2.0 * Math.sqrt(value * value + degrees_of_freedom))
|
|
||||||
|
|
||||||
x = upper/lower
|
|
||||||
|
|
||||||
alpha = degrees_of_freedom/2.0
|
|
||||||
beta = degrees_of_freedom/2.0
|
|
||||||
|
|
||||||
Math.incomplete_beta_function(x, alpha, beta)
|
|
||||||
end
|
|
||||||
|
|
||||||
def density_function(value)
|
|
||||||
return if degrees_of_freedom <= 0
|
|
||||||
|
|
||||||
upper = Math.gamma((degrees_of_freedom + 1)/2.0)
|
|
||||||
lower = Math.sqrt(degrees_of_freedom * Math::PI) * Math.gamma(degrees_of_freedom/2.0)
|
|
||||||
left = upper/lower
|
|
||||||
right = (1 + ((value ** 2)/degrees_of_freedom.to_f)) ** -((degrees_of_freedom + 1)/2.0)
|
|
||||||
|
|
||||||
left * right
|
|
||||||
end
|
|
||||||
|
|
||||||
def mean
|
|
||||||
0 if degrees_of_freedom > 1
|
|
||||||
end
|
|
||||||
|
|
||||||
def variance
|
|
||||||
if degrees_of_freedom > 1 && degrees_of_freedom <= 2
|
|
||||||
Float::INFINITY
|
|
||||||
elsif degrees_of_freedom > 2
|
|
||||||
degrees_of_freedom/(degrees_of_freedom - 2.0)
|
|
||||||
end
|
|
||||||
end
|
|
||||||
|
|
||||||
# Quantile function extracted from http://www.jennessent.com/arcview/idf.htm
|
|
||||||
# TODO: Make it truly Student's T sample.
|
|
||||||
def random(elements: 1, seed: Random.new_seed)
|
|
||||||
warn 'This is an alpha version code. The generated sample is similar to an uniform distribution'
|
|
||||||
srand(seed)
|
|
||||||
|
|
||||||
v = degrees_of_freedom
|
|
||||||
results = []
|
|
||||||
|
|
||||||
# Because the Quantile function of a student-t distribution is between (-Infinity, y)
|
|
||||||
# we setup an small threshold in order to properly compute the integral
|
|
||||||
threshold = 10_000.0e-12
|
|
||||||
|
|
||||||
elements.times do
|
|
||||||
y = rand
|
|
||||||
results << Math.simpson_rule(threshold, y, 10_000) do |t|
|
|
||||||
up = Math.gamma((v+1)/2.0)
|
|
||||||
down = Math.sqrt(Math::PI * v) * Math.gamma(v/2.0)
|
|
||||||
right = (1 + ((y ** 2)/v.to_f)) ** ((v+1)/2.0)
|
|
||||||
left = up/down.to_f
|
|
||||||
|
|
||||||
left * right
|
|
||||||
end
|
|
||||||
end
|
|
||||||
|
|
||||||
if elements == 1
|
|
||||||
results.first
|
|
||||||
else
|
|
||||||
results
|
|
||||||
end
|
|
||||||
end
|
|
||||||
end
|
|
||||||
end
|
|
||||||
end
|
|
|
@ -1,40 +0,0 @@
|
||||||
module Statistics
|
|
||||||
module Distribution
|
|
||||||
class Uniform
|
|
||||||
attr_accessor :left, :right
|
|
||||||
|
|
||||||
def initialize(a, b)
|
|
||||||
self.left = a.to_f
|
|
||||||
self.right = b.to_f
|
|
||||||
end
|
|
||||||
|
|
||||||
def density_function(value)
|
|
||||||
if value >= left && value <= right
|
|
||||||
1/(right - left)
|
|
||||||
else
|
|
||||||
0
|
|
||||||
end
|
|
||||||
end
|
|
||||||
|
|
||||||
def cumulative_function(value)
|
|
||||||
if value < left
|
|
||||||
0
|
|
||||||
elsif value >= left && value <= right
|
|
||||||
(value - left)/(right - left)
|
|
||||||
else
|
|
||||||
1
|
|
||||||
end
|
|
||||||
end
|
|
||||||
|
|
||||||
def mean
|
|
||||||
(1/2.0) * ( left + right )
|
|
||||||
end
|
|
||||||
alias_method :median, :mean
|
|
||||||
|
|
||||||
|
|
||||||
def variance
|
|
||||||
(1/12.0) * ( right - left ) ** 2
|
|
||||||
end
|
|
||||||
end
|
|
||||||
end
|
|
||||||
end
|
|
|
@ -1,66 +0,0 @@
|
||||||
module Statistics
|
|
||||||
module Distribution
|
|
||||||
class Weibull
|
|
||||||
attr_accessor :shape, :scale # k and lambda
|
|
||||||
|
|
||||||
def initialize(k, lamb)
|
|
||||||
self.shape = k.to_f
|
|
||||||
self.scale = lamb.to_f
|
|
||||||
end
|
|
||||||
|
|
||||||
def cumulative_function(random_value)
|
|
||||||
return 0 if random_value < 0
|
|
||||||
|
|
||||||
1 - Math.exp(-((random_value/scale) ** shape))
|
|
||||||
end
|
|
||||||
|
|
||||||
def density_function(value)
|
|
||||||
return if shape <= 0 || scale <= 0
|
|
||||||
return 0 if value < 0
|
|
||||||
|
|
||||||
left = shape/scale
|
|
||||||
center = (value/scale)**(shape - 1)
|
|
||||||
right = Math.exp(-((value/scale)**shape))
|
|
||||||
|
|
||||||
left * center * right
|
|
||||||
end
|
|
||||||
|
|
||||||
def mean
|
|
||||||
scale * Math.gamma(1 + (1/shape))
|
|
||||||
end
|
|
||||||
|
|
||||||
def mode
|
|
||||||
return 0 if shape <= 1
|
|
||||||
|
|
||||||
scale * (((shape - 1)/shape) ** (1/shape))
|
|
||||||
end
|
|
||||||
|
|
||||||
def variance
|
|
||||||
left = Math.gamma(1 + (2/shape))
|
|
||||||
right = Math.gamma(1 + (1/shape)) ** 2
|
|
||||||
|
|
||||||
(scale ** 2) * (left - right)
|
|
||||||
end
|
|
||||||
|
|
||||||
# Using the inverse CDF function, also called quantile, we can calculate
|
|
||||||
# a random sample that follows a weibull distribution.
|
|
||||||
#
|
|
||||||
# Formula extracted from https://www.taygeta.com/random/weibull.html
|
|
||||||
def random(elements: 1, seed: Random.new_seed)
|
|
||||||
results = []
|
|
||||||
|
|
||||||
srand(seed)
|
|
||||||
|
|
||||||
elements.times do
|
|
||||||
results << ((-1/scale) * Math.log(1 - rand)) ** (1/shape)
|
|
||||||
end
|
|
||||||
|
|
||||||
if elements == 1
|
|
||||||
results.first
|
|
||||||
else
|
|
||||||
results
|
|
||||||
end
|
|
||||||
end
|
|
||||||
end
|
|
||||||
end
|
|
||||||
end
|
|
|
@ -1,71 +0,0 @@
|
||||||
module Statistics
|
|
||||||
class SpearmanRankCoefficient
|
|
||||||
def self.rank(data:, return_ranks_only: true)
|
|
||||||
descending_order_data = data.sort { |a, b| b <=> a }
|
|
||||||
rankings = {}
|
|
||||||
|
|
||||||
data.each do |value|
|
|
||||||
# If we have ties, the find_index method will only retrieve the index of the
|
|
||||||
# first element in the list (i.e, the most close to the left of the array),
|
|
||||||
# so when a tie is detected, we increase the temporal ranking by the number of
|
|
||||||
# counted elements at that particular time and then we increase the counter.
|
|
||||||
temporal_ranking = descending_order_data.find_index(value) + 1 # 0-index
|
|
||||||
|
|
||||||
if rankings.fetch(value, false)
|
|
||||||
rankings[value][:rank] += (temporal_ranking + rankings[value][:counter])
|
|
||||||
rankings[value][:counter] += 1
|
|
||||||
rankings[value][:tie_rank] = rankings[value][:rank] / rankings[value][:counter].to_f
|
|
||||||
else
|
|
||||||
rankings[value] = { counter: 1, rank: temporal_ranking, tie_rank: temporal_ranking }
|
|
||||||
end
|
|
||||||
end
|
|
||||||
|
|
||||||
if return_ranks_only
|
|
||||||
data.map do |value|
|
|
||||||
rankings[value][:tie_rank]
|
|
||||||
end
|
|
||||||
else
|
|
||||||
rankings
|
|
||||||
end
|
|
||||||
end
|
|
||||||
|
|
||||||
# Formulas extracted from: https://statistics.laerd.com/statistical-guides/spearmans-rank-order-correlation-statistical-guide.php
|
|
||||||
def self.coefficient(set_one, set_two)
|
|
||||||
raise 'Both group sets must have the same number of cases.' if set_one.size != set_two.size
|
|
||||||
return if set_one.size == 0 && set_two.size == 0
|
|
||||||
|
|
||||||
set_one_mean, set_two_mean = set_one.mean, set_two.mean
|
|
||||||
have_tie_ranks = (set_one + set_two).any? { |rank| rank.is_a?(Float) }
|
|
||||||
|
|
||||||
if have_tie_ranks
|
|
||||||
numerator = 0
|
|
||||||
squared_differences_set_one = 0
|
|
||||||
squared_differences_set_two = 0
|
|
||||||
|
|
||||||
set_one.size.times do |idx|
|
|
||||||
local_diff_one = (set_one[idx] - set_one_mean)
|
|
||||||
local_diff_two = (set_two[idx] - set_two_mean)
|
|
||||||
|
|
||||||
squared_differences_set_one += local_diff_one ** 2
|
|
||||||
squared_differences_set_two += local_diff_two ** 2
|
|
||||||
|
|
||||||
numerator += local_diff_one * local_diff_two
|
|
||||||
end
|
|
||||||
|
|
||||||
denominator = Math.sqrt(squared_differences_set_one * squared_differences_set_two)
|
|
||||||
|
|
||||||
numerator / denominator.to_f # This is rho or spearman's coefficient.
|
|
||||||
else
|
|
||||||
sum_squared_differences = set_one.each_with_index.reduce(0) do |memo, (rank_one, index)|
|
|
||||||
memo += ((rank_one - set_two[index]) ** 2)
|
|
||||||
memo
|
|
||||||
end
|
|
||||||
|
|
||||||
numerator = 6 * sum_squared_differences
|
|
||||||
denominator = ((set_one.size ** 3) - set_one.size)
|
|
||||||
|
|
||||||
1.0 - (numerator / denominator.to_f) # This is rho or spearman's coefficient.
|
|
||||||
end
|
|
||||||
end
|
|
||||||
end
|
|
||||||
end
|
|
|
@ -1,11 +0,0 @@
|
||||||
Dir[File.dirname(__FILE__) + '/statistical_test/**/*.rb'].each {|file| require file }
|
|
||||||
|
|
||||||
module Statistics
|
|
||||||
module StatisticalTest
|
|
||||||
end
|
|
||||||
end
|
|
||||||
|
|
||||||
# If StatisticalTest is not defined, setup alias.
|
|
||||||
if defined?(Statistics) && !(defined?(StatisticalTest))
|
|
||||||
StatisticalTest = Statistics::StatisticalTest
|
|
||||||
end
|
|
|
@ -1,42 +0,0 @@
|
||||||
module Statistics
|
|
||||||
module StatisticalTest
|
|
||||||
class ChiSquaredTest
|
|
||||||
def self.chi_statistic(expected, observed)
|
|
||||||
# If the expected is a number, we asumme that all expected observations
|
|
||||||
# has the same probability to occur, hence we expect to see the same number
|
|
||||||
# of expected observations per each observed value
|
|
||||||
statistic = if expected.is_a? Numeric
|
|
||||||
observed.reduce(0) do |memo, observed_value|
|
|
||||||
up = (observed_value - expected) ** 2
|
|
||||||
memo += (up/expected.to_f)
|
|
||||||
end
|
|
||||||
else
|
|
||||||
expected.each_with_index.reduce(0) do |memo, (expected_value, index)|
|
|
||||||
up = (observed[index] - expected_value) ** 2
|
|
||||||
memo += (up/expected_value.to_f)
|
|
||||||
end
|
|
||||||
end
|
|
||||||
|
|
||||||
[statistic, observed.size - 1]
|
|
||||||
end
|
|
||||||
|
|
||||||
def self.goodness_of_fit(alpha, expected, observed)
|
|
||||||
chi_score, df = *self.chi_statistic(expected, observed) # Splat array result
|
|
||||||
|
|
||||||
return if chi_score.nil? || df.nil?
|
|
||||||
|
|
||||||
probability = Distribution::ChiSquared.new(df).cumulative_function(chi_score)
|
|
||||||
p_value = 1 - probability
|
|
||||||
|
|
||||||
# According to https://stats.stackexchange.com/questions/29158/do-you-reject-the-null-hypothesis-when-p-alpha-or-p-leq-alpha
|
|
||||||
# We can assume that if p_value <= alpha, we can safely reject the null hypothesis, ie. accept the alternative hypothesis.
|
|
||||||
{ probability: probability,
|
|
||||||
p_value: p_value,
|
|
||||||
alpha: alpha,
|
|
||||||
null: alpha < p_value,
|
|
||||||
alternative: p_value <= alpha,
|
|
||||||
confidence_level: 1 - alpha }
|
|
||||||
end
|
|
||||||
end
|
|
||||||
end
|
|
||||||
end
|
|
|
@ -1,83 +0,0 @@
|
||||||
module Statistics
|
|
||||||
module StatisticalTest
|
|
||||||
class FTest
|
|
||||||
# This method calculates the one-way ANOVA F-test statistic.
|
|
||||||
# We assume that all specified arguments are arrays.
|
|
||||||
# It returns an array with three elements:
|
|
||||||
# [F-statistic or F-score, degrees of freedom numerator, degrees of freedom denominator].
|
|
||||||
#
|
|
||||||
# Formulas extracted from:
|
|
||||||
# https://courses.lumenlearning.com/boundless-statistics/chapter/one-way-anova/
|
|
||||||
# http://sphweb.bumc.bu.edu/otlt/MPH-Modules/BS/BS704_HypothesisTesting-ANOVA/BS704_HypothesisTesting-Anova_print.html
|
|
||||||
def self.anova_f_score(*args)
|
|
||||||
# If only two groups have been specified as arguments, we follow the classic F-Test for
|
|
||||||
# equality of variances, which is the ratio between the variances.
|
|
||||||
f_score = nil
|
|
||||||
df1 = nil
|
|
||||||
df2 = nil
|
|
||||||
|
|
||||||
if args.size == 2
|
|
||||||
variances = [args[0].variance, args[1].variance]
|
|
||||||
|
|
||||||
f_score = variances.max/variances.min.to_f
|
|
||||||
df1 = 1 # k-1 (k = 2)
|
|
||||||
df2 = args.flatten.size - 2 # N-k (k = 2)
|
|
||||||
elsif args.size > 2
|
|
||||||
total_groups = args.size
|
|
||||||
total_elements = args.flatten.size
|
|
||||||
overall_mean = args.flatten.mean
|
|
||||||
|
|
||||||
sample_sizes = args.map(&:size)
|
|
||||||
sample_means = args.map(&:mean)
|
|
||||||
sample_stds = args.map(&:standard_deviation)
|
|
||||||
|
|
||||||
# Variance between groups
|
|
||||||
iterator = sample_sizes.each_with_index
|
|
||||||
|
|
||||||
variance_between_groups = iterator.reduce(0) do |summation, (size, index)|
|
|
||||||
inner_calculation = size * ((sample_means[index] - overall_mean) ** 2)
|
|
||||||
|
|
||||||
summation += (inner_calculation / (total_groups - 1).to_f)
|
|
||||||
end
|
|
||||||
|
|
||||||
# Variance within groups
|
|
||||||
variance_within_groups = (0...total_groups).reduce(0) do |outer_summation, group_index|
|
|
||||||
outer_summation += args[group_index].reduce(0) do |inner_sumation, observation|
|
|
||||||
inner_calculation = ((observation - sample_means[group_index]) ** 2)
|
|
||||||
inner_sumation += (inner_calculation / (total_elements - total_groups).to_f)
|
|
||||||
end
|
|
||||||
end
|
|
||||||
|
|
||||||
f_score = variance_between_groups/variance_within_groups.to_f
|
|
||||||
df1 = total_groups - 1
|
|
||||||
df2 = total_elements - total_groups
|
|
||||||
end
|
|
||||||
|
|
||||||
[f_score, df1, df2]
|
|
||||||
end
|
|
||||||
|
|
||||||
# This method expects the alpha value and the groups to calculate the one-way ANOVA test.
|
|
||||||
# It returns a hash with multiple information and the test result (if reject the null hypotesis or not).
|
|
||||||
# Keep in mind that the values for the alternative key (true/false) does not imply that the alternative hypothesis
|
|
||||||
# is TRUE or FALSE. It's a minor notation advantage to decide if reject the null hypothesis or not.
|
|
||||||
|
|
||||||
def self.one_way_anova(alpha, *args)
|
|
||||||
f_score, df1, df2 = *self.anova_f_score(*args) # Splat array result
|
|
||||||
|
|
||||||
return if f_score.nil? || df1.nil? || df2.nil?
|
|
||||||
|
|
||||||
probability = Distribution::F.new(df1, df2).cumulative_function(f_score)
|
|
||||||
p_value = 1 - probability
|
|
||||||
|
|
||||||
# According to https://stats.stackexchange.com/questions/29158/do-you-reject-the-null-hypothesis-when-p-alpha-or-p-leq-alpha
|
|
||||||
# We can assume that if p_value <= alpha, we can safely reject the null hypothesis, ie. accept the alternative hypothesis.
|
|
||||||
{ probability: probability,
|
|
||||||
p_value: p_value,
|
|
||||||
alpha: alpha,
|
|
||||||
null: alpha < p_value,
|
|
||||||
alternative: p_value <= alpha,
|
|
||||||
confidence_level: 1 - alpha }
|
|
||||||
end
|
|
||||||
end
|
|
||||||
end
|
|
||||||
end
|
|
|
@ -1,70 +0,0 @@
|
||||||
module Statistics
|
|
||||||
module StatisticalTest
|
|
||||||
class KolmogorovSmirnovTest
|
|
||||||
# Common alpha, and critical D are calculated following formulas from: https://en.wikipedia.org/wiki/Kolmogorov%E2%80%93Smirnov_test#Two-sample_Kolmogorov%E2%80%93Smirnov_test
|
|
||||||
def self.two_samples(group_one:, group_two:, alpha: 0.05)
|
|
||||||
samples = group_one + group_two # We can use unbalaced group samples
|
|
||||||
|
|
||||||
ecdf_one = Distribution::Empirical.new(samples: group_one)
|
|
||||||
ecdf_two = Distribution::Empirical.new(samples: group_two)
|
|
||||||
|
|
||||||
d_max = samples.sort.map do |sample|
|
|
||||||
d1 = ecdf_one.cumulative_function(x: sample)
|
|
||||||
d2 = ecdf_two.cumulative_function(x: sample)
|
|
||||||
|
|
||||||
(d1 - d2).abs
|
|
||||||
end.max
|
|
||||||
|
|
||||||
# TODO: Validate calculation of Common alpha.
|
|
||||||
common_alpha = Math.sqrt((-0.5 * Math.log(alpha)))
|
|
||||||
radicand = (group_one.size + group_two.size) / (group_one.size * group_two.size).to_f
|
|
||||||
|
|
||||||
critical_d = common_alpha * Math.sqrt(radicand)
|
|
||||||
# critical_d = self.critical_d(alpha: alpha, n: samples.size)
|
|
||||||
|
|
||||||
# We are unable to calculate the p_value, because we don't have the Kolmogorov distribution
|
|
||||||
# defined. We reject the null hypotesis if Dmax is > than Dcritical.
|
|
||||||
{ d_max: d_max,
|
|
||||||
d_critical: critical_d,
|
|
||||||
total_samples: samples.size,
|
|
||||||
alpha: alpha,
|
|
||||||
null: d_max <= critical_d,
|
|
||||||
alternative: d_max > critical_d,
|
|
||||||
confidence_level: 1.0 - alpha }
|
|
||||||
end
|
|
||||||
|
|
||||||
# This is an implementation of the formula presented by Paul Molin and Hervé Abdi in a paper,
|
|
||||||
# called "New Table and numerical approximations for Kolmogorov-Smirnov / Lilliefors / Van Soest
|
|
||||||
# normality test".
|
|
||||||
# In this paper, the authors defines a couple of 6th-degree polynomial functions that allow us
|
|
||||||
# to find an aproximation of the real critical value. This is based in the conclusions made by
|
|
||||||
# Dagnelie (1968), where indicates that critical values given by Lilliefors can be approximated
|
|
||||||
# numerically.
|
|
||||||
#
|
|
||||||
# In general, the formula found is:
|
|
||||||
# C(N, alpha) ^ -2 = A(alpha) * N + B(alpha).
|
|
||||||
#
|
|
||||||
# Where A(alpha), B(alpha) are two 6th degree polynomial functions computed using the principle
|
|
||||||
# of Monte Carlo simulations.
|
|
||||||
#
|
|
||||||
# paper can be found here: https://utdallas.edu/~herve/MolinAbdi1998-LillieforsTechReport.pdf
|
|
||||||
# def self.critical_d(alpha:, n:)
|
|
||||||
# confidence = 1.0 - alpha
|
|
||||||
|
|
||||||
# a_alpha = 6.32207539843126 -17.1398870006148 * confidence +
|
|
||||||
# 38.42812675101057 * (confidence ** 2) - 45.93241384693391 * (confidence ** 3) +
|
|
||||||
# 7.88697700041829 * (confidence ** 4) + 29.79317711037858 * (confidence ** 5) -
|
|
||||||
# 18.48090137098585 * (confidence ** 6)
|
|
||||||
|
|
||||||
# b_alpha = 12.940399038404 - 53.458334259532 * confidence +
|
|
||||||
# 186.923866119699 * (confidence ** 2) - 410.582178349305 * (confidence ** 3) +
|
|
||||||
# 517.377862566267 * (confidence ** 4) - 343.581476222384 * (confidence ** 5) +
|
|
||||||
# 92.123451358715 * (confidence ** 6)
|
|
||||||
|
|
||||||
# Math.sqrt(1.0 / (a_alpha * n + b_alpha))
|
|
||||||
# end
|
|
||||||
end
|
|
||||||
|
|
||||||
KSTest = KolmogorovSmirnovTest # Alias
|
|
||||||
end
|
|
||||||
end
|
|
|
@ -1,92 +0,0 @@
|
||||||
module Statistics
|
|
||||||
module StatisticalTest
|
|
||||||
class TTest
|
|
||||||
# Errors for Zero std
|
|
||||||
class ZeroStdError < StandardError
|
|
||||||
STD_ERROR_MSG = 'Standard deviation for the difference or group is zero. Please, reconsider sample contents'.freeze
|
|
||||||
end
|
|
||||||
|
|
||||||
# Perform a T-Test for one or two samples.
|
|
||||||
# For the tails param, we need a symbol: :one_tail or :two_tail
|
|
||||||
def self.perform(alpha, tails, *args)
|
|
||||||
return if args.size < 2
|
|
||||||
|
|
||||||
degrees_of_freedom = 0
|
|
||||||
|
|
||||||
# If the comparison mean has been specified
|
|
||||||
t_score = if args[0].is_a? Numeric
|
|
||||||
data_mean = args[1].mean
|
|
||||||
data_std = args[1].standard_deviation
|
|
||||||
|
|
||||||
raise ZeroStdError, ZeroStdError::STD_ERROR_MSG if data_std == 0
|
|
||||||
|
|
||||||
comparison_mean = args[0]
|
|
||||||
degrees_of_freedom = args[1].size
|
|
||||||
|
|
||||||
(data_mean - comparison_mean)/(data_std / Math.sqrt(args[1].size).to_f).to_f
|
|
||||||
else
|
|
||||||
sample_left_mean = args[0].mean
|
|
||||||
sample_left_variance = args[0].variance
|
|
||||||
sample_right_variance = args[1].variance
|
|
||||||
sample_right_mean = args[1].mean
|
|
||||||
degrees_of_freedom = args.flatten.size - 2
|
|
||||||
|
|
||||||
left_root = sample_left_variance/args[0].size.to_f
|
|
||||||
right_root = sample_right_variance/args[1].size.to_f
|
|
||||||
|
|
||||||
standard_error = Math.sqrt(left_root + right_root)
|
|
||||||
|
|
||||||
(sample_left_mean - sample_right_mean).abs/standard_error.to_f
|
|
||||||
end
|
|
||||||
|
|
||||||
t_distribution = Distribution::TStudent.new(degrees_of_freedom)
|
|
||||||
probability = t_distribution.cumulative_function(t_score)
|
|
||||||
|
|
||||||
# Steps grabbed from https://support.minitab.com/en-us/minitab/18/help-and-how-to/statistics/basic-statistics/supporting-topics/basics/manually-calculate-a-p-value/
|
|
||||||
# See https://github.com/estebanz01/ruby-statistics/issues/23
|
|
||||||
p_value = if tails == :two_tail
|
|
||||||
2 * (1 - t_distribution.cumulative_function(t_score.abs))
|
|
||||||
else
|
|
||||||
1 - probability
|
|
||||||
end
|
|
||||||
|
|
||||||
{ t_score: t_score,
|
|
||||||
probability: probability,
|
|
||||||
p_value: p_value,
|
|
||||||
alpha: alpha,
|
|
||||||
null: alpha < p_value,
|
|
||||||
alternative: p_value <= alpha,
|
|
||||||
confidence_level: 1 - alpha }
|
|
||||||
end
|
|
||||||
|
|
||||||
def self.paired_test(alpha, tails, left_group, right_group)
|
|
||||||
raise StandardError, 'both samples are the same' if left_group == right_group
|
|
||||||
|
|
||||||
# Handy snippet grabbed from https://stackoverflow.com/questions/2682411/ruby-sum-corresponding-members-of-two-or-more-arrays
|
|
||||||
differences = [left_group, right_group].transpose.map { |value| value.reduce(:-) }
|
|
||||||
|
|
||||||
degrees_of_freedom = differences.size - 1
|
|
||||||
difference_std = differences.standard_deviation
|
|
||||||
|
|
||||||
raise ZeroStdError, ZeroStdError::STD_ERROR_MSG if difference_std == 0
|
|
||||||
|
|
||||||
down = difference_std/Math.sqrt(differences.size)
|
|
||||||
|
|
||||||
t_score = (differences.mean - 0)/down.to_f
|
|
||||||
|
|
||||||
probability = Distribution::TStudent.new(degrees_of_freedom).cumulative_function(t_score)
|
|
||||||
|
|
||||||
p_value = 1 - probability
|
|
||||||
p_value *= 2 if tails == :two_tail
|
|
||||||
|
|
||||||
{ t_score: t_score,
|
|
||||||
probability: probability,
|
|
||||||
p_value: p_value,
|
|
||||||
alpha: alpha,
|
|
||||||
null: alpha < p_value,
|
|
||||||
alternative: p_value <= alpha,
|
|
||||||
confidence_level: 1 - alpha }
|
|
||||||
end
|
|
||||||
end
|
|
||||||
end
|
|
||||||
end
|
|
|
@ -1,95 +0,0 @@
|
||||||
module Statistics
|
|
||||||
module StatisticalTest
|
|
||||||
class WilcoxonRankSumTest
|
|
||||||
def rank(elements)
|
|
||||||
ranked_elements = {}
|
|
||||||
|
|
||||||
elements.sort.each_with_index do |element, index|
|
|
||||||
if ranked_elements.fetch(element, false)
|
|
||||||
# This allow us to solve the ties easily when performing the rank summation per group
|
|
||||||
ranked_elements[element][:counter] += 1
|
|
||||||
ranked_elements[element][:rank] += (index + 1)
|
|
||||||
else
|
|
||||||
ranked_elements[element] = { counter: 1, rank: (index + 1) }
|
|
||||||
end
|
|
||||||
end
|
|
||||||
|
|
||||||
# ranked_elements = [{ x => { counter: 1, rank: y } ]
|
|
||||||
ranked_elements
|
|
||||||
end
|
|
||||||
|
|
||||||
# Steps to perform the calculation are based on http://www.mit.edu/~6.s085/notes/lecture5.pdf
|
|
||||||
def perform(alpha, tails, group_one, group_two)
|
|
||||||
# Size for each group
|
|
||||||
n1, n2 = group_one.size, group_two.size
|
|
||||||
|
|
||||||
# Rank all data
|
|
||||||
total_ranks = rank(group_one + group_two)
|
|
||||||
|
|
||||||
# sum rankings per group
|
|
||||||
r1 = ranked_sum_for(total_ranks, group_one)
|
|
||||||
r2 = ranked_sum_for(total_ranks, group_two)
|
|
||||||
|
|
||||||
# calculate U statistic
|
|
||||||
u1 = (n1 * (n1 + 1)/2.0) - r1
|
|
||||||
u2 = (n2 * (n2 + 1)/2.0 ) - r2
|
|
||||||
|
|
||||||
u_statistic = [u1.abs, u2.abs].min
|
|
||||||
|
|
||||||
median_u = (n1 * n2)/2.0
|
|
||||||
|
|
||||||
ties = total_ranks.values.select { |element| element[:counter] > 1 }
|
|
||||||
|
|
||||||
std_u = if ties.size > 0
|
|
||||||
corrected_sigma(ties, n1, n2)
|
|
||||||
else
|
|
||||||
Math.sqrt((n1 * n2 * (n1 + n2 + 1))/12.0)
|
|
||||||
end
|
|
||||||
|
|
||||||
z = (u_statistic - median_u)/std_u
|
|
||||||
|
|
||||||
# Most literature are not very specific about the normal distribution to be used.
|
|
||||||
# We ran multiple tests with a Normal(median_u, std_u) and Normal(0, 1) and we found
|
|
||||||
# the latter to be more aligned with the results.
|
|
||||||
probability = Distribution::StandardNormal.new.cumulative_function(z.abs)
|
|
||||||
p_value = 1 - probability
|
|
||||||
p_value *= 2 if tails == :two_tail
|
|
||||||
|
|
||||||
{ probability: probability,
|
|
||||||
u: u_statistic,
|
|
||||||
z: z,
|
|
||||||
p_value: p_value,
|
|
||||||
alpha: alpha,
|
|
||||||
null: alpha < p_value,
|
|
||||||
alternative: p_value <= alpha,
|
|
||||||
confidence_level: 1 - alpha }
|
|
||||||
end
|
|
||||||
|
|
||||||
# Formula extracted from http://www.statstutor.ac.uk/resources/uploaded/mannwhitney.pdf
|
|
||||||
private def corrected_sigma(ties, total_group_one, total_group_two)
|
|
||||||
n = total_group_one + total_group_two
|
|
||||||
|
|
||||||
rank_sum = ties.reduce(0) do |memo, t|
|
|
||||||
memo += ((t[:counter] ** 3) - t[:counter])/12.0
|
|
||||||
end
|
|
||||||
|
|
||||||
left = (total_group_one * total_group_two)/(n * (n - 1)).to_f
|
|
||||||
right = (((n ** 3) - n)/12.0) - rank_sum
|
|
||||||
|
|
||||||
Math.sqrt(left * right)
|
|
||||||
end
|
|
||||||
|
|
||||||
private def ranked_sum_for(total, group)
|
|
||||||
# sum rankings per group
|
|
||||||
group.reduce(0) do |memo, element|
|
|
||||||
rank_of_element = total[element][:rank] / total[element][:counter].to_f
|
|
||||||
memo += rank_of_element
|
|
||||||
end
|
|
||||||
end
|
|
||||||
end
|
|
||||||
|
|
||||||
# Both test are the same. To keep the selected name, we just alias the class
|
|
||||||
# with the implementation.
|
|
||||||
MannWhitneyU = WilcoxonRankSumTest
|
|
||||||
end
|
|
||||||
end
|
|
|
@ -1,3 +0,0 @@
|
||||||
module Statistics
|
|
||||||
VERSION = "2.1.1"
|
|
||||||
end
|
|
|
@ -1,34 +0,0 @@
|
||||||
# coding: utf-8
|
|
||||||
lib = File.expand_path("../lib", __FILE__)
|
|
||||||
$LOAD_PATH.unshift(lib) unless $LOAD_PATH.include?(lib)
|
|
||||||
require "statistics/version"
|
|
||||||
|
|
||||||
Gem::Specification.new do |spec|
|
|
||||||
spec.name = "ruby-statistics"
|
|
||||||
spec.version = Statistics::VERSION
|
|
||||||
spec.authors = ["esteban zapata"]
|
|
||||||
spec.email = ["estebanz01@outlook.com"]
|
|
||||||
|
|
||||||
spec.summary = %q{A ruby gem for som specific statistics. Inspired by the jStat js library.}
|
|
||||||
spec.description = %q{This gem is intended to accomplish the same purpose as jStat js library:
|
|
||||||
to provide ruby with statistical capabilities without the need
|
|
||||||
of a statistical programming language like R or Octave. Some functions
|
|
||||||
and capabilities are an implementation from other authors and are
|
|
||||||
referenced properly in the class/method.}
|
|
||||||
spec.homepage = "https://github.com/estebanz01/ruby-statistics"
|
|
||||||
spec.license = "MIT"
|
|
||||||
|
|
||||||
# Prevent pushing this gem to RubyGems.org. To allow pushes either set the 'allowed_push_host'
|
|
||||||
# to allow pushing to a single host or delete this section to allow pushing to any host.
|
|
||||||
spec.files = `git ls-files -z`.split("\x0").reject do |f|
|
|
||||||
f.match(%r{^(test|spec|features)/})
|
|
||||||
end
|
|
||||||
spec.bindir = "exe"
|
|
||||||
spec.executables = spec.files.grep(%r{^exe/}) { |f| File.basename(f) }
|
|
||||||
spec.require_paths = ["lib"]
|
|
||||||
|
|
||||||
spec.add_development_dependency "rake", '~> 12.0', '>= 12.0.0'
|
|
||||||
spec.add_development_dependency "rspec", '~> 3.6', '>= 3.6.0'
|
|
||||||
spec.add_development_dependency "grb", '~> 0.4.1', '>= 0.4.1'
|
|
||||||
spec.add_development_dependency 'byebug', '~> 9.1.0', '>= 9.1.0'
|
|
||||||
end
|
|
Loading…
Reference in a new issue