2016 Mar 09

Introduction and Doubts

...ferent is
different from what is taught in theory. I am also working on R&D on "Hybrid
Techniques for Intrusion Detection using Data Mining and Clustering on
Newer Datasets".
I have taken an initial look at the docsim folder in xapian-core.
These are my insights:
The clustering used is Single Link Agglomerative Hierarchical clustering.
Its time complexity is O(n^2), where n is the number of documents.
At first, choosing K-means seems to be a viable solution, as K-means has
O(n) time complexity (linear in n for a fixed number of clusters and
iterations).
But it has various shortcomings:
1) The learning algorithm requires a priori specification of the number of
cluster centers...
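To make the complexity comparison concrete, here is a minimal single-link agglomerative sketch in Python. It is an illustration, not the docsim implementation: it assumes a precomputed symmetric distance matrix and records the full merge history, so no number of clusters has to be chosen up front. This naive version rescans all cluster pairs at every merge, which costs O(n^3) overall; specialised algorithms such as SLINK bring single link down to O(n^2).

```python
def single_link(dist):
    """Naive single-link agglomerative clustering.

    dist: symmetric n x n list of lists of pairwise distances.
    Returns the merge history as (members_a, members_b, distance) tuples.
    """
    clusters = [[i] for i in range(len(dist))]
    merges = []
    while len(clusters) > 1:
        best = None
        # Scan every pair of current clusters (the naive, slow part).
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Single link: cluster distance = closest pair of members.
                d = min(dist[i][j] for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((sorted(clusters[a]), sorted(clusters[b]), d))
        merged = clusters[a] + clusters[b]
        # Remove b first (b > a) so that index a stays valid.
        del clusters[b]
        clusters[a] = merged
    return merges


points = [0, 1, 5, 6, 20]
dist = [[abs(p - q) for q in points] for p in points]
for step in single_link(dist):
    print(step)
```

With points at 0, 1, 5, 6 and 20 on a line, the merge distances come out as 1, 1, 4 and 14; cutting the history at any height yields a flat clustering without fixing k beforehand.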

2016 Mar 10

Introduction and Doubts

...o xapian project.
Sorry if that was against the rules.
The algorithm was not developed by me; I came across it after a lot of
research on various clustering techniques.
I found that there is a new algorithm called CLUBS (Clustering Using Binary
Splitting), which gives better results than k-means++ and hierarchical
agglomerative clustering.
It is faster and produces good results on various metrics of cluster
quality.
The algorithm works in the following way.
The first phase of the algorithm is
divisive: the original data set (in this case, the set of search documents
to cluster) is split recursively into miniclusters...

2003 Dec 03

non-uniqueness in cluster analysis

Hi,
I'm clustering objects described by categorical variables with a hierarchical
algorithm (average linkage).
My distance matrix (general dissimilarity coefficient) contains several
distances with exactly the same value.
As far as I can see, a standard agglomerative procedure ignores this problem,
simply selecting, among equal distances, the one that comes first.
For this reason the output of the analysis depends strongly on the ordering
of the objects within the raw data matrix.
Is there a standard procedure to deal with this?
Thanks
Bruno
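The order dependence described above is easy to reproduce. The sketch below uses a made-up 4-object dissimilarity matrix in which two pairs are tied, picks the first closest pair in scan order, and shows that relabelling the objects changes which pair is merged first. The final loop is one simple sensitivity check (repeat the first step over random orderings and count the outcomes); it is offered as an illustration, not as an established standard procedure.

```python
import random


def first_merge(dist, labels):
    """Labels of the first pair merged, taking the first minimum in scan order."""
    n = len(dist)
    best = None
    for i in range(n):
        for j in range(i + 1, n):
            # Strict '<' means ties are resolved in favour of the pair seen first.
            if best is None or dist[i][j] < best[0]:
                best = (dist[i][j], i, j)
    _, i, j = best
    return tuple(sorted((labels[i], labels[j])))


def permute(dist, labels, order):
    """Reorder the rows/columns of the matrix (and the labels) by `order`."""
    return [[dist[i][j] for j in order] for i in order], [labels[i] for i in order]


# Made-up dissimilarities: pairs (A, B) and (C, D) are tied at 0.2.
labels = ["A", "B", "C", "D"]
dist = [
    [0.0, 0.2, 0.9, 0.8],
    [0.2, 0.0, 0.7, 0.9],
    [0.9, 0.7, 0.0, 0.2],
    [0.8, 0.9, 0.2, 0.0],
]

print(first_merge(dist, labels))                          # ('A', 'B')
print(first_merge(*permute(dist, labels, [2, 3, 0, 1])))  # ('C', 'D')

# Sensitivity check: how often does each pair come first over random orderings?
rng = random.Random(0)
counts = {}
for _ in range(200):
    order = list(range(len(labels)))
    rng.shuffle(order)
    pair = first_merge(*permute(dist, labels, order))
    counts[pair] = counts.get(pair, 0) + 1
print(counts)
```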

2017 Mar 09

GSoC 2017 Project Proposal

Hello devs.
I would like to propose how I plan to improve the clustering branch during
this GSoC and get it into a state where it can be integrated into Xapian.
I have identified three areas of work which were not touched last time.
1) Automated Performance Analysis
I had previously implemented rough versions of two evaluation techniques
(distance between documents and the centroids of their clusters, and...
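For reference, a minimal sketch of the first evaluation measure mentioned above (average distance between each document and the centroid of its own cluster). It assumes documents are dense numeric vectors and uses Euclidean distance; the clustering branch may use a different distance (e.g. cosine), and the function names here are hypothetical.

```python
import math


def centroid(vectors):
    """Component-wise mean of equal-length vectors."""
    n = len(vectors)
    return [sum(v[d] for v in vectors) / n for d in range(len(vectors[0]))]


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean_intra_cluster_distance(clusters):
    """Average distance from every document to the centroid of its own cluster.

    clusters: list of clusters, each a non-empty list of equal-length vectors.
    Lower values indicate tighter clusters.
    """
    total, count = 0.0, 0
    for members in clusters:
        c = centroid(members)
        for doc in members:
            total += euclidean(doc, c)
            count += 1
    return total / count


clusters = [[[0, 0], [2, 0]], [[5, 5]]]
print(mean_intra_cluster_distance(clusters))
```

On its own this measure always favours more clusters (singletons score 0), so it is usually read alongside a between-cluster measure.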

2005 Nov 02

x/y coordinates of dendrogram branches

Dear R-users,
I need some help concerning the plotting of dendrograms for hierarchical
agglomerative clustering.
The agglomeration level (height) of each step should be displayed at the
branches of the dendrogram.
For this I need the x/y coordinates of the branch-agglomerations of the
dendrogram.
The y-values are known (the heights of the agglomeration), but how can I
get the x-values?
>...
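A possible way to get the x-values, assuming the usual dendrogram layout: leaves sit at x = 1, ..., n in the order given by hclust's `$order`, and each merge is drawn at the midpoint of its two children's x positions. Under that assumption the x-values can be recomputed from `$merge` and `$order`. The sketch below does this in Python, using R's merge-matrix convention (negative entries are singletons, positive entries refer to earlier merge rows, both 1-based) and a made-up 4-leaf tree; `merge_x` is a hypothetical name.

```python
def merge_x(merge, order):
    """x-coordinate of each merge row for an hclust-style dendrogram layout.

    merge: list of (a, b) pairs; negative = singleton -a, positive = merge row a.
    order: leaf indices (1-based) from left to right, like hclust's $order.
    Assumes leaves sit at x = 1..n and each merge is drawn at the midpoint
    of its two children.
    """
    leaf_x = {leaf: pos for pos, leaf in enumerate(order, start=1)}
    xs = []
    for a, b in merge:
        # Children are leaves or earlier rows, so their x is already known.
        xa = leaf_x[-a] if a < 0 else xs[a - 1]
        xb = leaf_x[-b] if b < 0 else xs[b - 1]
        xs.append((xa + xb) / 2)
    return xs


merge = [(-1, -2), (-3, -4), (1, 2)]
order = [1, 2, 3, 4]
print(merge_x(merge, order))  # [1.5, 3.5, 2.5]
```

Pairing each x with the corresponding entry of `$height` gives the (x, y) position of every agglomeration.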

2002 Apr 29

cluster analyses

...ather large data sets and would like to cut the dendrograms
to get a better view of specific components. I calculate the dissimilarity
matrix using daisy() because I have a mixture of variable types: factors,
ordered factors and numerical variables. If I want one dendrogram, I use
agnes() for the agglomerative nesting and pltree() to draw the dendrogram.
That way, I get the row names as labels, but I can't cut the tree.
Alternatively, I use hclust() on the dissimilarity matrix from daisy().
This allows me to cut the dendrogram with cutree(), but I lose the labels,
so that isn't much use...
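What is needed is a cut that keeps the labels attached to the objects. The idea can be sketched in Python using the same merge-matrix convention as hclust's `$merge` (negative = singleton, positive = earlier merge row): applying only the first n - k merges leaves exactly k groups, which can then be reported per label. This illustrates the idea and is not R's cutree() code; `cut_tree` is a hypothetical name. In R itself, converting the agnes() result with as.hclust() from the cluster package and then calling cutree() may be the simpler route.

```python
def cut_tree(merge, labels, k):
    """Cut an hclust-style merge history into k groups, keeping labels.

    merge: list of (a, b) pairs in merge order (negative = singleton -a,
           positive = earlier merge row a, both 1-based).
    labels: one label per original object (object i has label labels[i - 1]).
    Returns a dict label -> group number (1..k), numbered by first appearance.
    """
    n = len(labels)
    parent = list(range(n + 1))  # union-find over 1-based object ids

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    rep = []  # one representative object per applied merge row
    for a, b in merge[: n - k]:
        ra = find(-a) if a < 0 else find(rep[a - 1])
        rb = find(-b) if b < 0 else find(rep[b - 1])
        parent[rb] = ra
        rep.append(ra)

    groups, out = {}, {}
    for i in range(1, n + 1):
        root = find(i)
        groups.setdefault(root, len(groups) + 1)
        out[labels[i - 1]] = groups[root]
    return out


merge = [(-1, -2), (-3, -4), (1, 2)]
print(cut_tree(merge, ["a", "b", "c", "d"], k=2))
```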

2008 Feb 28

intermediate linkage clustering

Does anybody know how to do an intermediate linkage clustering in R?
Is there a command already developed? Which package do I have to load?
Thanks in advance
yvo
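Assuming "intermediate linkage" means a compromise between single and complete linkage, the Lance-Williams update shows how such methods relate. After clusters i and j merge, the distance from another cluster k is d(k, i+j) = a_i d(k,i) + a_j d(k,j) + b d(i,j) + g |d(k,i) - d(k,j)|. With a_i = a_j = 1/2 and b = 0, g = -1/2 gives the minimum (single link), g = +1/2 the maximum (complete link), and g = 0 their plain average (McQuitty/WPGMA); values in between interpolate. In R, agnes() in the cluster package has a method = "flexible" option based on this scheme, which may be worth checking. The function below is a hypothetical illustration of the update rule only.

```python
def lance_williams(d_ki, d_kj, d_ij, gamma, alpha_i=0.5, alpha_j=0.5, beta=0.0):
    """Distance from cluster k to the merged cluster (i + j), Lance-Williams form."""
    return alpha_i * d_ki + alpha_j * d_kj + beta * d_ij + gamma * abs(d_ki - d_kj)


# d(k,i) = 2, d(k,j) = 6, d(i,j) = 3: gamma slides from single to complete link.
print([lance_williams(2, 6, 3, g) for g in (-0.5, -0.25, 0.0, 0.25, 0.5)])
# [2.0, 3.0, 4.0, 5.0, 6.0]
```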
--
2009 Dec 04

flexmix and mclust help

...really know what I'm doing.
Can anyone give me a simple example of how to take a vector of data and
perform a gamma mixture EM on it using flexmix, or, even more ideally, how
to do either lognormal or gamma mixtures? Also, are there any lists of
examples that people can use to test agglomerative hclust methods on this
same data (instead of using kmeans) as initialization for these EM methods?
Thanks a lot for your help.
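flexmix is an R package, so the following is only an illustration of the EM idea in Python, not flexmix's API. It fits a two-component lognormal mixture by taking logs and running EM for a Gaussian mixture on log(x); this gives the lognormal-mixture fit because the 1/x factor in the lognormal density does not depend on the parameters. A gamma mixture needs a numerical M-step for the shape parameter and is not shown. The initialisation simply splits the sorted data in half; an hclust or kmeans partition could be used instead. The data are simulated, and the function name is hypothetical.

```python
import math
import random


def normal_pdf(y, mu, sigma):
    return math.exp(-0.5 * ((y - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))


def lognormal_mixture_em(x, n_iter=100):
    """Two-component lognormal mixture via EM on log(x).

    x must be strictly positive. Returns (weights, mus, sigmas), where
    mus and sigmas are on the log scale.
    """
    y = sorted(math.log(v) for v in x)
    half = len(y) // 2
    # Simple initialisation: lower and upper halves of the sorted data.
    groups = [y[:half], y[half:]]
    w = [0.5, 0.5]
    mu = [sum(g) / len(g) for g in groups]
    sd = [max(math.sqrt(sum((v - m) ** 2 for v in g) / len(g)), 1e-3)
          for g, m in zip(groups, mu)]
    for _ in range(n_iter):
        # E-step: responsibility of each component for each point.
        resp = []
        for v in y:
            p = [w[k] * normal_pdf(v, mu[k], sd[k]) for k in range(2)]
            s = sum(p) or 1e-300
            resp.append([pk / s for pk in p])
        # M-step: weighted updates of weights, means and standard deviations.
        for k in range(2):
            nk = sum(r[k] for r in resp) or 1e-12
            w[k] = nk / len(y)
            mu[k] = sum(r[k] * v for r, v in zip(resp, y)) / nk
            var = sum(r[k] * (v - mu[k]) ** 2 for r, v in zip(resp, y)) / nk
            sd[k] = max(math.sqrt(var), 1e-3)  # guard against collapse
    return w, mu, sd


rng = random.Random(1)
data = ([math.exp(rng.gauss(0.0, 0.3)) for _ in range(200)]
        + [math.exp(rng.gauss(3.0, 0.3)) for _ in range(200)])
w, mu, sd = lognormal_mixture_em(data)
print([round(m, 2) for m in mu])  # should be close to the generating 0.0 and 3.0
```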
--
2013 Mar 10

RGL plot not working right..

Hi, I'm Carey. I'm trying to figure out how to get this vol surface
correct; I'm not sure where I'm going wrong.
http://pastebin.com/mmA4m4FJ

2016 Mar 10

Introduction and Doubts

...oposal.
>
> > The algorithm was not developed by me; I came across it after a lot
> > of research on various clustering techniques. I found that there is a
> > new algorithm called CLUBS (Clustering Using Binary Splitting), which
> > gives better results than k-means++ and hierarchical agglomerative
> > clustering. It is faster and produces good results on various
> > metrics of cluster quality.
>
> I've only skimmed the paper for now, but it certainly looks
> interesting. Do you have a reason for picking TFIDF for feature
> extraction? Are there other a...
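For context on the question above: TF-IDF weights a term by how often it occurs in a document, scaled down by how many documents contain it. A minimal sketch of one common variant (raw term frequency times log(N / df)) follows; the clustering branch, and Xapian's own weighting schemes, may use different formulas, and the function name is hypothetical.

```python
import math
from collections import Counter


def tfidf(docs):
    """docs: list of token lists. Returns one {term: weight} dict per document."""
    n = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({t: c * math.log(n / df[t]) for t, c in tf.items()})
    return vectors


docs = [["xapian", "search", "cluster"],
        ["cluster", "cluster", "kmeans"],
        ["search", "engine"]]
for vec in tfidf(docs):
    print(vec)
```

A term that appears in every document gets weight log(1) = 0 under this variant, which is why TF-IDF tends to suppress very common words before clustering.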