Mean Value from Grouped Data
Occasionally, I get requests from clients to calculate the mean. Most of the time it’s a simple request, but from time to time the data were originally grouped. A common approach is to...
Estimating Pi
Recently I’ve been working on some jackknife and bootstrapping problems. While working on those projects I figured it would be a fun distraction to take the process and estimate pi. I’m sure this...
True Significance of a T Statistic
The example is more of a statistical exercise that shows the true significance and the density curve of simulated random normal data. The code can be changed to generate data using either a different...
Y2K38: Our Own Mayan Calendar…Again
It’s not quite the end of the world as we know it. We made it through December 21, 2012 unscathed. It’s not going to be the last time we will make it through such a pseudo-calamity. After all we have...
Binomial Confidence Intervals
This stems from a couple of binomial distribution projects I have been working on recently. It’s widely known that there are many different flavors of confidence intervals for the binomial...
Bootstrap Confidence Intervals
Here is an example of nonparametric bootstrapping. It’s a powerful technique that is similar to the Jackknife. With the bootstrap, however, the approach uses re-sampling. It’s clearly not as good as...
Distribution of T-Scores
Like most of my posts, these code snippets derive from various other projects. This example shows a simulation of how one can determine whether a set of t statistics is distributed properly. This can...
Simulating Random Multivariate Correlated Data (Continuous Variables)
This is a repost of an example that I posted last year but at the time I only had the PDF document (written in ). I’m reposting it directly into WordPress and I’m including the graphs. From...
Simulating Random Multivariate Correlated Data (Categorical Variables)
This is a repost of the second part of an example that I posted last year but at the time I only had the PDF document (written in ). This is the second example to generate multivariate random...
Significant P-Values and Overlapping Confidence Intervals
There are all sorts of problems with p-values and confidence intervals and I have no intention (or the time) to cover all those problems right now. However, a big problem is that most people have no...
Dirichlet Process, Infinite Mixture Models, and Clustering
The Dirichlet process provides a very interesting approach to understanding group assignments and models for clustering effects. Oftentimes we encounter the k-means approach. However, it is necessary...
Finding the Distribution Parameters
This is a brief description of one way to determine the distribution of given data. There are several ways to accomplish this in R, especially if one is trying to determine if the data comes from a...
Simulating the Gambler’s Ruin
The gambler’s ruin problem is one where a player has a probability p of winning and probability q of losing. For example, let’s take a skill game where player x can beat player y with probability...
Amazon AWS Summit 2013
I was fortunate enough to have been able to attend the Amazon AWS Summit in NYC and to listen to Werner Vogels give the keynote. I will share a few of my thoughts on the AWS 2013 Summit and some of my...
Free e-Copy of Bayesian Computation with R (Use R)
Amazon is currently making the first edition of Bayesian Computation with R (Use R) by Jim Albert available for free on Kindle. I own a copy of the book and there is a lot of good content and R...
A Brief Tour of the Trees and Forests
Tree methods such as CART (classification and regression trees) can be used as alternatives to logistic regression. They offer a way to show the probability of being in any hierarchical...
Will Mu Go Out With Median
True story (no really, this did actually happen). While I was in grad school, one of the other teaching assistants was approached by a student who asked, “Will mu go out with median?” The...
Hey, I Just did a Significance Test!
I’ve seen it happen quite often. The sig test. Somebody simply needs to know the p-value and that one number will provide all of the information about the study that they need to know. The dataset is...
Latent Class Modeling Election Data
Latent class analysis is a useful tool for identifying groups within multivariate categorical data. An example of this is the Likert scale. In categorical language these groups are known as...
Software Packages for Graphs and Charts
Graphs can be an important feature of analysis. A graph that has been well designed and put together can make summary statistics much more readable and increase the interpretability. It also makes...