IMDb offers a great deal of useful structured information for research. There are multiple ways to get small pieces of its database:
• Download a subset of the data.
• Use an API through one of the wrapper libraries (available in several languages).
• Use subsets from current papers on IMDb and its visualizations.
• Get a license from IMDb to use the API more intensively: $15K+.
• Wandora
• (But not by scraping IMDb pages, which IMDb does not allow.)

But these approaches do not allow a deeper study of relations within the database, for instance for economic research. Say, random sampling of a fraction of the database may miss important relations. Is there a better way to get IMDb data in bulk for research purposes?

The Movie Database (TMDb).

You can download the entire IMDb database, if the licenses around the IMDb data dump work for you.

Not sure if this would classify as a comment or an answer, but it's useful information nonetheless. In reading this question I have to point this out: have you heard of this paper?

Arvind Narayanan and Vitaly Shmatikov. 'Robust De-anonymization of Large Datasets (How to Break Anonymity of the Netflix Prize Dataset)'. The University of Texas at Austin, February 5, 2008.

It's quite a famous paper and was even in the news when it was published. Here's the abstract:

We present a new class of statistical de-anonymization attacks against high-dimensional micro-data, such as individual preferences, recommendations, transaction records and so on.
Our techniques are robust to perturbation in the data and tolerate some mistakes in the adversary's background knowledge. We apply our de-anonymization methodology to the Netflix Prize dataset, which contains anonymous movie ratings of 500,000 subscribers of Netflix, the world's largest online movie rental service. We demonstrate that an adversary who knows only a little bit about an individual subscriber can easily identify this subscriber's record in the dataset. Using the Internet Movie Database ...

+1 I think it's a useful article in many respects. I saw it some time ago. But the article's authors had to work 'with a very small sample of a few dozen IMDb users' due to IMDb's limitations on crawling. PS: A couple of ideas. (1) The Netflix dataset may be an alternative (or supplement) to IMDb itself. (2) It's a potentially interesting method of revealing preferences and conducting sociological surveys. PPS: There is also partly relevant work on anonymous datasets and identification from the Media Lab. – Sep 5 '13 at 10:04

As was alluded to in one of the comments, you should check out the data dump, which would allow you to download the data either manually via FTP or through a terminal interface.
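
The question's concern that random sampling of a fraction of the database may miss important relations can be illustrated with a small sketch. Everything here is an assumption made for illustration: the toy credit rows, the hypothetical `title…`/`actor…` names, and the helper functions `co_star_pairs` and `sample_credits`. None of it reflects the real IMDb schema. The idea is that when rows are sampled uniformly, a relation between two rows (here, two actors credited on the same title) survives only if both rows are drawn, so relations thin out much faster than rows do.

```python
import random
from itertools import combinations


def co_star_pairs(credits):
    """Return the set of unordered actor pairs that share at least one title."""
    cast_by_title = {}
    for title, actor in credits:
        cast_by_title.setdefault(title, set()).add(actor)
    pairs = set()
    for cast in cast_by_title.values():
        # Sort so each unordered pair has a single canonical representation.
        pairs.update(combinations(sorted(cast), 2))
    return pairs


def sample_credits(credits, fraction, seed=0):
    """Draw a uniform random sample of credit rows without replacement."""
    rng = random.Random(seed)
    k = round(len(credits) * fraction)
    return rng.sample(credits, k)


# Toy data (hypothetical): 200 titles, each with 5 actors who appear nowhere else.
credits = [
    (f"title{t}", f"actor{t * 5 + i}")
    for t in range(200)
    for i in range(5)
]

full_pairs = co_star_pairs(credits)
sampled = sample_credits(credits, fraction=0.2, seed=1)
sampled_pairs = co_star_pairs(sampled)

print("rows kept: ", len(sampled) / len(credits))
print("pairs kept:", len(sampled_pairs) / len(full_pairs))
```

In this toy setup a 20% row sample keeps, in expectation, only about 4% of the co-star pairs (roughly the square of the sampling fraction), which is why a full dump is preferable to a sample when the relations themselves are the object of study.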