Comments on: LTE: Letters to the editor corpus analysis using machine learning
https://www.practicallinguist.com/lte-letters-to-the-editor-corpus-analysis-using-machine-learning/
Linguistics meets computers
Wed, 29 Nov 2017 14:36:08 +0000

By: Zhenya
https://www.practicallinguist.com/lte-letters-to-the-editor-corpus-analysis-using-machine-learning/#comment-4
Tue, 03 Oct 2017 23:47:32 +0000

Thanks for reading!

Supervised methods are those where you have the “right answer”, a label, for each item in your data. For example, if you are training a classifier to determine whether an email is spam or not, you start with a big set of actual emails, and each email comes with a label, “spam” or “not spam”. Because the correct labels are known, you can evaluate the classifier against them and estimate its error rate, precision, and recall.
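
To make the spam example concrete, here is a minimal sketch in plain Python. The emails, the word-count "classifier", and all function names are invented for illustration; a real spam filter would use far more data and a proper learning algorithm. The point is only that the labels let you both train the model and measure precision and recall afterwards.

```python
from collections import Counter

# Toy labeled data (invented): every email comes with its "right answer".
train = [
    ("win free money now", "spam"),
    ("free prize claim now", "spam"),
    ("meeting notes attached", "not spam"),
    ("lunch tomorrow at noon", "not spam"),
]
test = [
    ("claim your free prize", "spam"),
    ("free money", "spam"),
    ("notes from the meeting", "not spam"),
    ("free lunch", "not spam"),
]

def train_word_counts(examples):
    """Count how often each word appears under each label."""
    counts = {"spam": Counter(), "not spam": Counter()}
    for text, label in examples:
        counts[label].update(text.split())
    return counts

def predict(counts, text):
    """Call an email spam if its words were seen more often in spam."""
    words = text.split()
    spam_score = sum(counts["spam"][w] for w in words)
    ham_score = sum(counts["not spam"][w] for w in words)
    return "spam" if spam_score > ham_score else "not spam"

def precision_recall(gold, predicted, positive="spam"):
    """Compare predictions with the known labels."""
    pairs = list(zip(gold, predicted))
    tp = sum(g == positive and p == positive for g, p in pairs)
    fp = sum(g != positive and p == positive for g, p in pairs)
    fn = sum(g == positive and p != positive for g, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

counts = train_word_counts(train)
gold = [label for _, label in test]
predicted = [predict(counts, text) for text, _ in test]
precision, recall = precision_recall(gold, predicted)
print(predicted, precision, recall)
```

On this toy data the classifier catches both spam emails but also flags "free lunch" as spam, so recall is perfect while precision is not. Without the labels on the test emails there would be no way to notice that mistake.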

With unsupervised methods, you don’t have any “right answers”, or labels, for your data. One example is clustering documents by topic where the topics are not known in advance. You pick the number of clusters, and the algorithm gives you back groups of documents that look similar to each other. I would think Twitter does something like this, as they can have lots of new data that doesn’t have preexisting topic labels.
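
Here is a similarly small sketch of clustering, again in plain Python. It uses k-means over word counts, which is just one common clustering technique and not necessarily what any particular company uses. The documents are invented, and starting from the first k documents is a simplification to keep the result repeatable. Notice that no labels appear anywhere; the only thing supplied is the number of clusters.

```python
# Toy unlabeled documents (invented for illustration).
docs = [
    "election vote senate",
    "goal match football",
    "vote senate campaign",
    "football match league",
    "senate election campaign",
    "league goal football",
]

def vectorize(documents):
    """Turn each document into a vector of word counts."""
    vocab = sorted({w for d in documents for w in d.split()})
    return [[d.split().count(w) for w in vocab] for d in documents]

def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(vectors, k, max_iters=10):
    """Plain k-means; returns one cluster index per vector."""
    # Simplified, deterministic start: use the first k documents as centroids.
    centroids = [list(v) for v in vectors[:k]]
    assignment = None
    for _ in range(max_iters):
        # Assign each document to its nearest centroid.
        new_assignment = [
            min(range(k), key=lambda c: squared_distance(v, centroids[c]))
            for v in vectors
        ]
        if new_assignment == assignment:
            break  # nothing moved, so the clusters are stable
        assignment = new_assignment
        # Move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, a in zip(vectors, assignment) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment

labels = kmeans(vectorize(docs), k=2)
for label, doc in zip(labels, docs):
    print(label, doc)
```

The algorithm separates the politics-like documents from the sports-like ones, but it only outputs cluster numbers. Deciding that cluster 0 means "politics" is left to whoever reads the results.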

By: Katya
https://www.practicallinguist.com/lte-letters-to-the-editor-corpus-analysis-using-machine-learning/#comment-3
Tue, 03 Oct 2017 19:41:49 +0000

Very interested, looking forward to more posts!

What are “supervised and unsupervised methods”?
