Wednesday, November 29, 2006

Problems with ANOSIM

What I have been doing of late is trying to work on a revision of a paper I wrote on the subterranean probe. One of the reviewers suggested I use ANOSIM (analysis of similarities), a test that gives you a significance value for differences between groups of samples. Money is always a problem in our lab, and apparently the best place to get ANOSIM is a very expensive software package called PRIMER, but we eventually found a free package called PAST that includes it. Great.

The thing about ANOSIM is that you can choose which distance measure you want to use -- Bray-Curtis, Jaccard, Raup-Crick, etc. But we are having trouble finding any sort of discussion about how you choose the distance measure. The one thing we were able to find says that Bray-Curtis should be used for abundance data, not presence-absence data (which is what we have). Bray-Curtis seems to be the default, and most of the published papers we have been able to find use it (including some with presence-absence data), but none of them seem to explain why they chose it. And the distance measures that appear to be more appropriate for presence-absence data seem to give wildly different results. So we're kind of at a loss. Does anybody out there understand this stuff?
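One thing that can be checked by hand is what Bray-Curtis actually does when it is fed presence-absence data. The sketch below uses base R only and a made-up 0/1 table (the sample names and values are hypothetical). It illustrates that, on binary data, Bray-Curtis and Jaccard are linked by a fixed increasing formula, so they put the pairs of samples in the same order. Since ANOSIM works on the ranks of the dissimilarities, this suggests those two measures should not disagree on presence-absence data; it says nothing about Raup-Crick or other measures.

```r
# Hypothetical presence/absence table: rows are samples, columns are species.
pa <- rbind(
  s1 = c(1, 1, 1, 0, 0, 0),
  s2 = c(1, 1, 0, 1, 0, 0),
  s3 = c(0, 0, 1, 1, 1, 0),
  s4 = c(0, 0, 0, 1, 1, 1)
)

# Bray-Curtis dissimilarity: sum of absolute differences over sum of totals.
bray <- function(x, y) sum(abs(x - y)) / sum(x + y)

# Jaccard dissimilarity: 1 - shared species / species present in either sample.
jaccard <- function(x, y) 1 - sum(x & y) / sum(x | y)

# All pairs of samples, one pair per row.
pairs <- t(combn(nrow(pa), 2))
bc <- apply(pairs, 1, function(p) bray(pa[p[1], ], pa[p[2], ]))
jc <- apply(pairs, 1, function(p) jaccard(pa[p[1], ], pa[p[2], ]))

print(round(cbind(bray = bc, jaccard = jc), 3))

# On 0/1 data, Jaccard = 2 * Bray / (1 + Bray), which is increasing in Bray,
# so both measures rank the pairs identically.
same_formula <- isTRUE(all.equal(jc, 2 * bc / (1 + bc)))
same_ranks <- all(rank(bc) == rank(jc))
print(c(same_formula = same_formula, same_ranks = same_ranks))
```

The absolute values differ, so anything that uses the distances themselves rather than their ranks can still give different answers for the two measures.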


  1. I am facing the same problem. PRIMER is the best stats package for what I need to do, but it's extremely expensive. I figured out an array formula in Excel that will calculate Bray-Curtis similarity indices, and the vegan package in the freely distributed R statistical package calculates ANOSIM. I'm trying to figure out ANOSIM with R at the moment. If you're still working on this dataset (probably doubtful after 9 months) or if you've solved your problem, please contact me: djlohman (at) gmail

  2. ANOSIM is available through the free program R.

    It is part of the vegan package.

    If you are new to R, the Rcmdr package is a very useful GUI that helps with (among many other things) input and output.

    Bray-Curtis distance is great for community data because it doesn't give too much weight to unobserved taxa, which is a problem with the Jaccard coefficient and other measures.

  3. Raphael is spot on!

    Bray-Curtis ignores joint absences, which predominate in community data but add little value. The key reference is:

    Field, J. G., Clarke, K. R., Warwick, R. M. (1982). A practical
    strategy for analysing multispecies distribution patterns.
    Mar. Ecol. Prog. Ser. 8: 37-52

    quote from the paper:
    'estuarine and abyssal samples are [no
    more] similar because both lack outer shelf species'
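The point about joint absences can be seen with a few lines of base R. This is only an illustrative sketch: the two count vectors are invented, and "simple mismatch" (the proportion of species on which the two samples disagree about presence) stands in for a measure that does count joint absences. Padding both samples with species that neither contains leaves Bray-Curtis unchanged but makes the samples look more alike under simple mismatch.

```r
# Bray-Curtis dissimilarity on abundances.
bray <- function(x, y) sum(abs(x - y)) / sum(x + y)

# Simple mismatch: share of species where presence/absence differs.
# Joint absences count as agreement here.
simple_mismatch <- function(x, y) mean((x > 0) != (y > 0))

# Hypothetical counts of four species in two samples.
estuary <- c(3, 0, 5, 1)
abyssal <- c(1, 2, 4, 0)

# Twenty extra species that occur in neither sample.
pad <- rep(0, 20)

bray_before <- bray(estuary, abyssal)
bray_after <- bray(c(estuary, pad), c(abyssal, pad))
mismatch_before <- simple_mismatch(estuary, abyssal)
mismatch_after <- simple_mismatch(c(estuary, pad), c(abyssal, pad))

print(c(bray_before = bray_before, bray_after = bray_after))
print(c(mismatch_before = mismatch_before, mismatch_after = mismatch_after))
```

With these numbers the Bray-Curtis value is the same before and after padding, while the simple mismatch value drops, which is the behaviour the Field et al. quote warns against.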

  4. Hi Kari,

    Although it has been a while, I thought I would drop a question about this topic as it seems you are still using the blog.
    I am doing my PhD now and I will need to compute some analyses of distance between my samples (chromatographic profiles of odours). I have been advised to use ANOSIM but once I checked on the R package, I found that some R users warn against it:
    “I don't quite trust this method. Somebody should study its performance carefully. The function returns a lot of information to ease further scrutiny. Most ANOSIM models could be analysed with ADONIS which seems to be a more robust alternative”.

    I was hoping you could tell me what you ended up using and if you had any major issues using ANOSIM.

    Jerome (Australia)

  5. Hi Kari

    Jerome again.
    Sorry I forgot to leave a contact detail.
    If you (or anyone out there) have any info for me, please feel free to contact me at
    mardoj01 'arobas'


  6. I would strongly suggest you use the "vegan" package in R. If you combine this with the "XlsReadWrite" package, you can read in Excel files, making your life easier (if that is how you have your spreadsheets). This is free, and you can use the adonis or anosim functions. If you have never used R there is a bit of a learning curve, but taking the time to learn now would greatly benefit you. There are a ton of tutorials on the web (for example:

  7. For questions on which similarity index would better fit an ecological experiment, check what one of the creators of PRIMER (Bob Clarke) wrote in:
    * K. Robert Clarke, Paul J. Somerfield, M. Gee Chapman. 2006. On resemblance measures for ecological studies, including taxonomic dissimilarities and a zero-adjusted Bray–Curtis coefficient for denuded assemblages. Journal of Experimental Marine Biology and Ecology 330:55–80.
    * K. R. Clarke, M. G. Chapman, P. J. Somerfield, H. R. Needham. 2006. Dispersion-based weighting of species counts in assemblage analyses. Marine Ecology Progress Series 320:11–27.

  8. Hi!

    I have also been looking for info on distance measures for pres/abs data, specifically how the presence of rare species can skew results. I just wanted to say, I found more help from your blog and the comments posted in reply than anywhere else - thanks!

  9. Hi,

    I've been trying to experiment with the ANOSIM function. I installed the packages vegan, PRIMER and deSolve in the R GUI, but none of the anosim, adonis or mrpp functions is present, and I cannot get the documentation for these functions. Does anyone know whether the functions are still part of the packages?
    Until now I have used multi-way ANOVAs with permutations to make comparisons. I'd like to compare with some other functions, so if anyone can help me to get them...

    Thank you

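  10. ANOSIM is still available in vegan.

    Here is an example of how to perform a test using the function. Note that anosim needs the distance matrix plus a grouping factor with one entry per row of the data; "groups" below is a placeholder for that factor and is not defined here.

    library(vegan)
    m <- read.table("matrix.txt")
    m.vegdist <- vegdist(m)
    # "groups" is a placeholder: a factor giving the group of each row of m
    m.ano <- anosim(m.vegdist, groups)

    If anyone out there has experience using ADONIS, I would appreciate any comments on its performance.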
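For anyone who wants to see what the test is computing before trusting it, here is a rough base R sketch of the ANOSIM R statistic and its permutation p-value. It is not vegan's code. It follows the usual definition, R = (mean between-group rank - mean within-group rank) / (N(N-1)/4), and assumes tied distances get average ranks. The function names `anosim_r` and `anosim_p`, the toy data and the group labels are all made up for illustration.

```r
# ANOSIM R statistic from a distance object/matrix and a grouping vector.
anosim_r <- function(d, grouping) {
  d <- as.matrix(d)
  n <- nrow(d)
  lower <- lower.tri(d)                 # each pair of samples once
  ranks <- rank(d[lower])               # rank the dissimilarities (ties averaged)
  g <- as.character(grouping)
  within <- outer(g, g, "==")[lower]    # TRUE where both samples share a group
  (mean(ranks[!within]) - mean(ranks[within])) / (n * (n - 1) / 4)
}

# Permutation p-value: shuffle the group labels and recompute R each time.
anosim_p <- function(d, grouping, nperm = 999) {
  observed <- anosim_r(d, grouping)
  permuted <- replicate(nperm, anosim_r(d, sample(grouping)))
  (sum(permuted >= observed) + 1) / (nperm + 1)
}

# Toy example: four samples on one axis, two clearly separated groups.
x <- c(0, 1, 10, 11)
groups <- c("a", "a", "b", "b")
d <- dist(x)

set.seed(1)
r_obs <- anosim_r(d, groups)
p_val <- anosim_p(d, groups)
print(c(R = r_obs, p = p_val))
```

In this toy case every between-group distance is larger than every within-group distance, so R comes out at its maximum of 1. With only four samples there are very few distinct relabellings, so the p-value cannot get small no matter how clean the separation is, which is worth keeping in mind with small designs.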


  11. See Clarke 2006 for a discussion of the zero-adjusted Bray-Curtis similarity. All you have to do to obtain this new measure is add a dummy variable with the value 1 to your data (unless you are using data transformations - see the paper).
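A small base R sketch of the dummy-variable trick described above, for untransformed data only. The sample vectors and the function name `zero_adjusted_bray` are invented for illustration; the paper remains the authority on how to handle transformed data.

```r
# Plain Bray-Curtis dissimilarity.
bray <- function(x, y) sum(abs(x - y)) / sum(x + y)

# Zero-adjusted version: append a dummy "species" with value 1 to both samples.
zero_adjusted_bray <- function(x, y) bray(c(x, 1), c(y, 1))

# Hypothetical samples: two with nothing in them, one with a single individual.
empty1 <- c(0, 0, 0)
empty2 <- c(0, 0, 0)
sparse <- c(0, 1, 0)

plain_empty <- bray(empty1, empty2)                 # 0/0, undefined
adjusted_empty <- zero_adjusted_bray(empty1, empty2)
plain_sparse <- bray(empty1, sparse)
adjusted_sparse <- zero_adjusted_bray(empty1, sparse)

print(c(plain_empty = plain_empty, adjusted_empty = adjusted_empty))
print(c(plain_sparse = plain_sparse, adjusted_sparse = adjusted_sparse))
```

Without the dummy, two empty samples give 0/0 and an empty sample is maximally different from any non-empty one. With the dummy, two empty samples are treated as identical and a near-empty sample is only moderately different from an empty one.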

  12. I'm going to do ANOSIM for my PhD on forensic entomology and I have no idea how to use this statistical method. My experiment uses rabbit carcasses: at 5 sites we put out ten rabbit carcasses, and two carcasses will be examined on the 2nd, 4th, 7th and 15th day to record the number and species of flies that grow on them. Which should I use as the rows of my data: fly species or rabbit carcasses?

  13. Hey Anonymous - ANOSIM tests for simple differences between independent groups and is not designed for the study design that you are describing. Permutations in ANOSIM assume that the samples are independent. You need something that accounts for time when permuting. It wouldn't make sense for your null hypothesis to shuffle a sample from site 1 on day 2 with a sample from site 5 on day 15, would it? If you're going to use a randomization test at all, you need to make sure that when the data are shuffled, the shuffling preserves either the effects of time (if you're looking at differences between sites) or the effects of site (if you're looking at the effects of time).
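The kind of restricted shuffling described above can be sketched in base R as follows. This is only an illustration of the idea, not a full analysis: the helper `permute_within` and the site/day layout are hypothetical, and a real design with repeated carcasses would need more thought. vegan's anosim help page also documents a `strata` argument for restricting permutations, which is worth checking before rolling your own.

```r
# Shuffle labels only among samples that share the same stratum.
permute_within <- function(labels, strata) {
  shuffled <- labels
  for (s in unique(strata)) {
    idx <- which(strata == s)
    # sample.int avoids the surprise sample() gives for a single index
    shuffled[idx] <- labels[idx[sample.int(length(idx))]]
  }
  shuffled
}

# Hypothetical layout: one sample per site on each sampling day.
design <- expand.grid(
  site = paste0("site", 1:5),
  day = c(2, 4, 7, 15),
  stringsAsFactors = FALSE
)

# To test for site differences, shuffle site labels within each day,
# so a day-2 sample is never swapped with a day-15 sample.
set.seed(42)
design$shuffled_site <- permute_within(design$site, design$day)

# Each day still contains every site label exactly once.
print(table(design$day, design$shuffled_site))
```

To look at the effect of time instead, the same helper can be called with the roles swapped, shuffling day labels within each site.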

  14. Does anyone know how you format the spreadsheet for anosim in R? I am not sure how to add the categorical groupings... THANKS!

  15. Hello:
    I was also recommended to use ANOSIM for one of my analyses. There is free software called BioDiversity Pro, but it has no user manual and there is no way to change the distance measure, which seems to be Bray-Curtis.
    I was recommended to use the one in R, which lets you choose a distance measure. I'm gonna use Chao, since I have lots of rare species.
    Thanks for the advice. I am new to R, and I'm gonna look for that program PAST.

  16. This is an update to Santiago's post (above) which will hopefully help people format their data properly for loading into R and using in anosim. The code is below. Text with "##" in front of and behind it is a note.


    ## For your species matrix, you want each column to be a different species and each row to be one of your sites. Each cell is then the abundance or presence/absence of that species at that site. Now load vegan, load the matrix data and call it "m". ##

    library(vegan)
    m <- read.csv("matrix.csv", header=TRUE)
    m.vegdist <- vegdist(m)

    ## Now you need to load a file with the factors for your matrix. This is a separate file from the "matrix.csv" above. This new file should be formatted with each factor from your study sites as a column (pH, temperature, elevation -- or whatever your factors are). Below each factor, enter a value in each row for each of your sites. The factor you test with anosim has to be categorical (for example, temperature recorded as classes), because it defines the groups being compared. Be sure that you keep the same order for your sites as you did in your matrix.csv file. Now load the file. ##

    m.env <- read.csv("factors.csv", header=TRUE)

    ## Now run anosim with whatever factor you want to analyze (in this case, "temperature") and get a summary and plot. Good luck! ##

    m.ano <- anosim(m.vegdist, m.env$temperature)
    summary(m.ano)
    plot(m.ano)

  17. Hello,
    My PhD is based on ecosystem functions in tropical ants. I have just read one of your papers, Kari T. Ryder Wilkie, on species diversity and distribution in Ecuador, which is very relevant for me. I'm lucky to have PRIMER, so no need for R :). Anyway, I work with a range of different baits and I'm using ANOSIM to compare the ant populations from each type of bait to see if the ants attracted are significantly different. Do you think it's OK? Plus, I have transformed my data to reduce differences when massive recruitment occurs, so it's not pure abundance. Any thoughts on this are welcome.

  18. Thank you Anonymous 3:49. I had to do this analysis at the last minute before a meeting and your notes made it possible. Any chance you would consider making a career of this type of thing? :)

  19. Does anyone know whether Bray-Curtis takes the sample size into account or not? I need an index that does not take the sample size into account. Thanks in advance!