The dataset I’m using was cleaned using my nearest neighbor algorithm. A column called “weekday” was added to the original dataset; it records the day of the week on which each data point was added. When computing the distance between points, I used all of the numerical columns: all the tags, id, rarity, sense of the modern, file count, owner id, weekday, and item type.
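The distance step could look something like the minimal sketch below. It assumes Euclidean distance (the text does not name the metric), assumes each row has already been reduced to a list of numbers in the column order listed above, and encodes weekday with Python's `date.weekday()` (0 = Monday). The `min_max_scale` helper is a hypothetical addition: the text does not say whether columns were rescaled, but without some scaling, large-valued columns such as id or owner id would dominate the distance.

```python
import math
from datetime import date


def weekday_of(d: date) -> int:
    """Encode the day of the week a point was added: 0 = Monday ... 6 = Sunday."""
    return d.weekday()


def min_max_scale(rows: list[list[float]]) -> list[list[float]]:
    """Hypothetical rescaling step: map every column into [0, 1].

    A column whose values are all equal is mapped to 0.0.
    """
    columns = list(zip(*rows))
    lows = [min(c) for c in columns]
    highs = [max(c) for c in columns]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0 for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]


def euclidean(a: list[float], b: list[float]) -> float:
    """Assumed metric: straight-line distance between two numeric rows."""
    if len(a) != len(b):
        raise ValueError("rows must have the same number of columns")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
```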
To build a network from my data, I represented each data point as a node and connected it to its 5 nearest neighbors (not including itself), storing these connections in an adjacency list. Each node had a name (which I later used as the data point's label), a pointvalue (a Point holding that row's data), and a PageRank. After building this network, I computed the PageRank of each node.
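A minimal sketch of the graph construction and PageRank step follows. The `Node` fields mirror the description (name, pointvalue, PageRank), but `pointvalue` here is a plain list standing in for the original Point class. Edges are assumed to point from each node to its neighbors, and PageRank is computed by plain power iteration with an assumed damping factor of 0.85; the text does not say which PageRank implementation or parameters were used.

```python
import math
from dataclasses import dataclass


@dataclass
class Node:
    name: str                 # later used as the data point's label
    pointvalue: list[float]   # the row's numeric data (stand-in for the Point class)
    pagerank: float = 0.0


def distance(a: list[float], b: list[float]) -> float:
    """Assumed Euclidean distance between two rows."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_adjacency(nodes: list[Node], k: int = 5) -> dict[int, list[int]]:
    """Map each node index to the indices of its k nearest neighbors, excluding itself."""
    adj = {}
    for i, node in enumerate(nodes):
        others = [j for j in range(len(nodes)) if j != i]
        # Stable sort: ties keep the lower index first.
        others.sort(key=lambda j: distance(node.pointvalue, nodes[j].pointvalue))
        adj[i] = others[:k]
    return adj


def pagerank(adj: dict[int, list[int]], n: int,
             damping: float = 0.85, iterations: int = 100) -> list[float]:
    """Power-iteration PageRank over directed edges i -> j for j in adj[i]."""
    ranks = [1.0 / n] * n
    for _ in range(iterations):
        # Every node receives the "random jump" share.
        new = [(1.0 - damping) / n] * n
        for i in range(n):
            out = adj.get(i, [])
            if out:
                # Split this node's rank evenly among the nodes it links to.
                share = damping * ranks[i] / len(out)
                for j in out:
                    new[j] += share
            else:
                # Dangling node: spread its rank evenly over all nodes.
                for j in range(n):
                    new[j] += damping * ranks[i] / n
        ranks = new
    return ranks
```

Typical use is `adj = knn_adjacency(nodes, k=5)`, then `ranks = pagerank(adj, len(nodes))`, and finally copying each `ranks[i]` into `nodes[i].pagerank`. Because every node has the same number of outgoing edges, differences in PageRank in this setup come from how many (and which) nodes list a point among their own nearest neighbors.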
I wrote this information (the nodes, their names, their PageRanks, and their connections) to a JSON file in order to visualize it in D3.
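One way to produce that file is sketched below. It writes the nodes-and-links layout that D3 force-directed examples commonly use. The field names (`id`, `name`, `pagerank`, `source`, `target`) are assumptions, since the text does not give the JSON schema. Links refer to nodes by array index, which matches d3-force's default `forceLink` id accessor (the node's index).

```python
import json


def network_to_json(names: list[str], ranks: list[float],
                    adj: dict[int, list[int]]) -> str:
    """Serialize the network as {"nodes": [...], "links": [...]} for D3."""
    nodes = [
        {"id": i, "name": name, "pagerank": rank}
        for i, (name, rank) in enumerate(zip(names, ranks))
    ]
    # One link per (node, neighbor) pair in the adjacency list.
    links = [
        {"source": i, "target": j}
        for i, neighbors in sorted(adj.items())
        for j in neighbors
    ]
    return json.dumps({"nodes": nodes, "links": links}, indent=2)
```

Writing it to disk is then a matter of `open("network.json", "w").write(...)`, where `network.json` is a hypothetical filename.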
To visualize my network, I had to decide which attributes would determine the size and color of each node. First, I based node size on PageRank, so a node with a larger PageRank is drawn larger. Then I tried out many color schemes to find which attribute correlated with how similar points were to each other. I chose to color each node by whether it carried a certain tag, "The Lancaster Avenue Arcades," because this tag was given to the points that were thought to be the most important and most representative of Walter Benjamin.
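The two visual encodings can be sketched as plain functions that compute each node's radius and fill color ahead of drawing. This assumes a square-root mapping, so that a circle's area rather than its radius grows linearly with PageRank, and assumes orange for tagged points and blue otherwise, matching the colors used in the discussion of the network. The radius bounds are illustrative, and the exact tag string is an assumption, since the text spells the tag both with and without a leading "The".

```python
import math

# Assumed spelling of the tag as stored in the data.
TAG = "Lancaster Avenue Arcades"


def node_radius(rank: float, min_rank: float, max_rank: float,
                r_min: float = 3.0, r_max: float = 20.0) -> float:
    """Map PageRank to a radius so that circle *area* grows linearly with rank."""
    if max_rank == min_rank:
        return r_min
    t = (rank - min_rank) / (max_rank - min_rank)
    # Interpolate the squared radius (proportional to area), then take the root.
    return math.sqrt(r_min ** 2 + t * (r_max ** 2 - r_min ** 2))


def node_color(tags) -> str:
    """Orange if the node carries the tag, blue otherwise."""
    return "orange" if TAG in tags else "steelblue"
```

Sizing by area rather than radius keeps large-PageRank nodes from looking disproportionately dominant; D3's `scaleSqrt` gives the same effect on the JavaScript side.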
As you can see, the network forms two general areas: one larger and one smaller. There is also one small cluster of all-blue points; however, these points come from a tutorial in which the class learned how to enter items into the dataset, so they can be disregarded. The smaller group (group A) consists mostly of points from the first few weeks of the class in Fall 2014, while the larger group (group B) consists of points from the middle and end of the semester. The two groups clearly differ in which of their points are tagged with "Lancaster Avenue Arcades" and which aren't.
In group B, the orange points (items tagged with "Lancaster Avenue Arcades") are the smaller points with fewer connections. This means they have smaller PageRanks and therefore less "importance" as measured by PageRank. However, PageRank measures importance through the connections an item has to and from other items. So during the middle and end of the semester, the points Professor Friedman's class considered most important were the ones with the fewest connections, that is, the most unique items, with few other items similar to them. During this part of the class, PageRank had an inverse relationship with the class's classification of points as "important": in general, the lower a point's PageRank, the more likely it was to be tagged as important.
Group A, on the other hand, consisted mostly of points tagged with "Lancaster Avenue Arcades," and these points had relatively higher PageRanks than the other points in the group. They were also highly connected to one another. So for items added at the beginning of the semester, having many connections to other points in the data (and thus a large PageRank) made them more likely to be chosen as "important" by the class later in the semester. This pattern is essentially the opposite of the one observed later in the semester.
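The group comparison in the last two paragraphs can also be checked numerically rather than by eye. The sketch below averages PageRank separately for tagged and untagged points within one group. The group membership lists are hypothetical inputs (for example, the indices of the points identified visually as group A or group B), and the function says nothing about this dataset until it is run on the real PageRank values.

```python
from statistics import mean


def mean_rank_by_tag(ranks: list[float], tagged: list[bool],
                     members: list[int]) -> tuple[float, float]:
    """Return (mean PageRank of tagged members, mean PageRank of untagged members).

    ranks   -- PageRank per node index
    tagged  -- True if the node carries the "Lancaster Avenue Arcades" tag
    members -- node indices belonging to one group (hypothetical input)
    """
    tagged_ranks = [ranks[i] for i in members if tagged[i]]
    untagged_ranks = [ranks[i] for i in members if not tagged[i]]
    # NaN signals that the group has no points of that kind.
    t = mean(tagged_ranks) if tagged_ranks else float("nan")
    u = mean(untagged_ranks) if untagged_ranks else float("nan")
    return t, u
```

If the visual reading holds, the first value should come out lower than the second for group B and higher for group A.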