Hi FA admin,
Over the last few days I've written my own program/scripts to log in to FA and sequentially download and catalog user submissions. It runs pretty smoothly, albeit extremely slowly. But before I burn through too many resources, I wanted to ask permission to continue to run such a program.
http://superwailingbonus.com/stuff/fash.php
Here is a prototype of the search engine. You can search the descriptions for a key phrase, but only a small portion of the submissions have been catalogued so far.
Here's how it works: it uses curl to grab a submission page, runs the page through a series of shell scripts to extract text-based information (description, tags, date posted, etc.), and then uploads that data to a database. User images are not stored in any way. However, this does require up to 100kB of data per page, not to mention processing time/power. I have temporarily suspended the program until I get permission to continue.
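A minimal sketch of that fetch, extract, and store pipeline, written in Python rather than the actual curl and shell scripts. Everything specific here is an assumption made for illustration: the sample page markup, the `id` attributes, the table layout, and the function names are hypothetical and do not reflect FA's real pages or the real database. The fetch step is replaced by a hard-coded sample page so the sketch makes no requests.

```python
import sqlite3
from html.parser import HTMLParser

# Hypothetical page layout: stands in for a page fetched with curl.
SAMPLE_PAGE = """
<html><body>
<div id="description">A sketch of a fox in the snow.</div>
<div id="tags">fox snow sketch</div>
<div id="posted">2008-01-15</div>
<img src="image.png">
</body></html>
"""

# Text fields to keep; images are ignored entirely.
FIELDS = ("description", "tags", "posted")


class TextFieldParser(HTMLParser):
    """Collects the text inside <div id=...> elements named in FIELDS."""

    def __init__(self):
        super().__init__()
        self.current = None
        self.record = {}

    def handle_starttag(self, tag, attrs):
        field = dict(attrs).get("id")
        if tag == "div" and field in FIELDS:
            self.current = field
            self.record[field] = ""

    def handle_endtag(self, tag):
        if tag == "div":
            self.current = None

    def handle_data(self, data):
        if self.current is not None:
            self.record[self.current] += data


def extract_record(html):
    """Return a dict of the text fields found in one page."""
    parser = TextFieldParser()
    parser.feed(html)
    return {name: text.strip() for name, text in parser.record.items()}


def store_record(conn, submission_id, record):
    """Insert one submission's text fields into the (assumed) table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS submissions "
        "(id INTEGER PRIMARY KEY, description TEXT, tags TEXT, posted TEXT)"
    )
    conn.execute(
        "INSERT OR REPLACE INTO submissions VALUES (?, ?, ?, ?)",
        (submission_id, *(record.get(name, "") for name in FIELDS)),
    )
    conn.commit()


def search_descriptions(conn, phrase):
    """Return ids of submissions whose description contains the phrase."""
    rows = conn.execute(
        "SELECT id FROM submissions WHERE description LIKE ? ORDER BY id",
        (f"%{phrase}%",),
    )
    return [row[0] for row in rows]
```

For example, calling `store_record(conn, 1, extract_record(SAMPLE_PAGE))` on an in-memory SQLite connection and then `search_descriptions(conn, "fox")` would return the id of the sample submission. Only text is kept; the `<img>` tag is never read.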
I'm concerned that this program may be considered harmful (a DoS risk) or an invasion of privacy, among other nuisances. Currently, the program runs at a top speed of one submission every 4.5 seconds. If I let it run continuously, it would catalog the entire FA archive in about 8-10 weeks. I'm also worried that, even though the images aren't stored, the search engine will make the text descriptions and other metadata available to anyone who uses it. However, users who are not logged in to FA still won't be able to see the submissions themselves, since they have to visit FA to view them.
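To put rough numbers on the load, here is a back-of-the-envelope calculation in Python using only the figures above (4.5 seconds per submission, up to 100kB per page, 8-10 weeks). The archive size it prints is merely what those figures imply, not a measured count, and the function names are made up for this sketch.

```python
SECONDS_PER_WEEK = 7 * 24 * 60 * 60


def submissions_in(weeks, seconds_per_submission=4.5):
    """Submissions fetched when running continuously for the given weeks."""
    return int(weeks * SECONDS_PER_WEEK / seconds_per_submission)


def transfer_gb(submissions, kb_per_page=100):
    """Upper bound on data transferred, in GB (1 GB = 1,000,000 kB)."""
    return submissions * kb_per_page / 1_000_000


def average_rate_kb_per_s(kb_per_page=100, seconds_per_submission=4.5):
    """Upper bound on the average download rate in kB/s."""
    return kb_per_page / seconds_per_submission


if __name__ == "__main__":
    for weeks in (8, 10):
        count = submissions_in(weeks)
        print(f"{weeks} weeks: {count} submissions, "
              f"up to {transfer_gb(count):.1f} GB")
    print(f"average rate: up to {average_rate_kb_per_s():.1f} kB/s")
```

Under these assumptions the run would cover roughly 1.1 to 1.3 million submissions and transfer on the order of 110 to 135 GB in total, at an average of about 22 kB/s.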
I just wanted to bring this to your attention in case it would be a problem. I don't want to cause any problems for the site, just enhance it for other users. What are your thoughts on this project?
Thanks in advance,
Taren