CS 432/532 Web Science
Spring 2018
http://anwala.github.io/lectures/cs532-s18/
Assignment #9
Due: 11:59pm April 21
Support your answer: include all relevant discussion, assumptions,
examples, etc.
(10 points)
1. Using the data from A7:
- Consider each row in the blog-term matrix as a 1000 dimension vector,
corresponding to a blog.
- Use knnestimate() to compute the nearest neighbors for both:
http://f-measure.blogspot.com/
http://ws-dl.blogspot.com/
for k={1,2,5,10,20}.
Use cosine distance metric (chapter 8) not euclidean distance.
So you have to implement numpredict.cosine() instead of using
numpredict.euclidean() in:
https://github.com/arthur-e/Programming-Collective-Intelligence/blob/master/chapter8/numpredict.py
===================================================================
========The questions below is for 3 points extra credit===========
===================================================================
3. Re-download the 1000 TimeMaps from A2, Q2. Create a graph where
the x-axis represents the 1000 TimeMaps. If a TimeMap has "shrunk",
it will have a negative value below the x-axis corresponding to the
size difference between the two TimeMaps. If it has stayed the
same, it will have a "0" value. If it has grown, the value will be
positive and correspond to the increase in size between the two
TimeMaps.
As always, upload all the TimeMap data. If the A2 github has the
original TimeMaps, then you can just point to where they are in
the report.