
Building my own search engine

Taren

New Member
Hi FA admin,

Over the last few days I've written my own program/scripts to log in to FA and sequentially download and catalog user submissions. It runs pretty smoothly, albeit extremely slowly. But before I burn through too many resources, I wanted to ask permission to continue to run such a program.

http://superwailingbonus.com/stuff/fash.php
Here is a prototype of the search engine. You can search the descriptions for a key phrase, but only a small portion of the submissions has been catalogued so far.

Here's how it works: it uses curl to grab a submission, then it runs it through a series of shell scripts to extract text-based information such as description, tags, date posted, etc., and then uploads that data to a database. User images are not stored in any way. However, this does require up to 100kB of data per page, not to mention processing time/power. Also, I have temporarily suspended this program, until I get permission to continue.
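For anyone curious what the extract-and-store step might look like, here is a minimal Python sketch. It is not the actual shell scripts; it only illustrates the same idea (pull text fields out of a page, store them, search descriptions). The HTML layout, the class names `description`, `tag`, and `posted`, and the `submissions` table are all assumptions made up for the example, and it works on a sample string rather than fetching anything from FA.

```python
import sqlite3
from html.parser import HTMLParser

# Hypothetical markup: the real FA page layout is not shown in this thread.
SAMPLE_PAGE = """
<html><body>
<div class="description">A red fox sitting in the snow.</div>
<span class="tag">fox</span><span class="tag">snow</span>
<span class="posted">2007-03-14</span>
</body></html>
"""


class SubmissionParser(HTMLParser):
    """Collects text from elements whose class is description, tag, or posted."""

    FIELDS = {"description", "tag", "posted"}

    def __init__(self):
        super().__init__()
        self.current = None  # field currently being read, if any
        self.found = {"description": [], "tag": [], "posted": []}

    def handle_starttag(self, tag, attrs):
        cls = dict(attrs).get("class")
        if cls in self.FIELDS:
            self.current = cls

    def handle_data(self, data):
        if self.current and data.strip():
            self.found[self.current].append(data.strip())

    def handle_endtag(self, tag):
        self.current = None


def extract(page_html):
    """Return only the text fields; image data is never touched."""
    parser = SubmissionParser()
    parser.feed(page_html)
    return {
        "description": " ".join(parser.found["description"]),
        "tags": ",".join(parser.found["tag"]),
        "posted": "".join(parser.found["posted"]),
    }


def store(conn, submission_id, record):
    conn.execute(
        "INSERT INTO submissions (id, description, tags, posted) VALUES (?, ?, ?, ?)",
        (submission_id, record["description"], record["tags"], record["posted"]),
    )


def search(conn, phrase):
    """Return ids of submissions whose description contains the phrase."""
    rows = conn.execute(
        "SELECT id FROM submissions WHERE description LIKE ? ORDER BY id",
        (f"%{phrase}%",),
    )
    return [row[0] for row in rows]


conn = sqlite3.connect(":memory:")  # throwaway in-memory database for the demo
conn.execute(
    "CREATE TABLE submissions "
    "(id INTEGER PRIMARY KEY, description TEXT, tags TEXT, posted TEXT)"
)
record = extract(SAMPLE_PAGE)
store(conn, 1, record)
print(search(conn, "fox"))
```

A plain `LIKE` scan is fine for a prototype, but it reads every row on each query, so a real catalog would likely want a proper full-text index.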

I'm concerned that this program may be considered harmful (DoS) or an invasion of privacy among other nuisances. Currently, the program can run at a top speed of one submission every 4.5 seconds. If I let it run continuously, it would be able to catalog the entire FA archive in about 8-10 weeks. I'm also worried that even though the images aren't stored, the search engine will make the text descriptions and such available to anyone who uses the search engine. However, the user won't be able to see the submissions if they are not logged in, as they will have to visit FA itself to see them.
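As a rough check on those figures, the numbers above (4.5 seconds per submission, 8-10 weeks of continuous running, up to 100 kB per page) can be turned into request and transfer estimates. This is back-of-the-envelope arithmetic only; the function names are made up for the example, and the transfer figure is an upper bound because 100 kB is the stated maximum per page, not an average.

```python
SECONDS_PER_SUBMISSION = 4.5   # top speed quoted above
MAX_KB_PER_PAGE = 100          # stated upper bound per page
SECONDS_PER_DAY = 24 * 60 * 60
SECONDS_PER_WEEK = 7 * SECONDS_PER_DAY


def submissions_in(weeks):
    """Submissions fetched when running non-stop for the given number of weeks."""
    return int(weeks * SECONDS_PER_WEEK / SECONDS_PER_SUBMISSION)


def max_transfer_gb(submissions):
    """Upper bound on data transferred, using 1 GB = 1,000,000 kB."""
    return submissions * MAX_KB_PER_PAGE / 1_000_000


requests_per_day = int(SECONDS_PER_DAY / SECONDS_PER_SUBMISSION)
low, high = submissions_in(8), submissions_in(10)

print(requests_per_day)                               # 19200
print(low, high)                                      # 1075200 1344000
print(max_transfer_gb(low), max_transfer_gb(high))    # roughly 107.5 and 134.4
```

So the 8-10 week estimate implies an archive on the order of 1.1 to 1.3 million submissions, about 19,200 page requests per day, and at most roughly 110 to 135 GB of transfer in total.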

I just wanted to bring this to your attention in case it would be a problem. I don't want to cause any problems for the site, just enhance it for other users. What are your thoughts on this project?

Thanks in advance,

Taren
 

tsawolf

Member
I can only speak to the technical side, since that is the only part of FA I work with. Policy, you'll have to talk to someone else. :)

Though it's not an invasion of privacy (the information could have been seen by anyone anyway just looking at FA, assuming they had an account - and we disabled search for technical reasons, not privacy concerns), I am very concerned about the load that the indexing would put on the server.

The problem isn't so much a function of bandwidth as CPU time. Currently, the servers that we are using are running quite hard just to keep up with the normal flow of traffic - I'd be very, very hesitant to authorize any kind of automated system.

From a technical perspective, there might be a way for us to get you some kind of database dump of the user content tables that you could then use as an offline search. But that would require permission from the people in charge, and I'd have to check for the feasibility of such a task. The database server may be choked as well.

Sorry this isn't much help, but hopefully it will tide you over until a policy person comes along.
 

Dragoneer

Site Developer
Site Director
Administrator
Please e-mail me a copy of the code at dragoneer@thedragoneer.com, and I'll pass it on to the coders for review. As it stands, I agree with Tsawolf: the script, at least from the general description, sounds like a potential resource hog and killer, and FA is stretched thin and exceptionally costly to run as it is. Automated scripts, at this time, are not welcome on the system due to, again, limited resources.

Let us review it and get back to you.
 

Taren

New Member
I sent a zip to your email. I'm not worried about getting this running again, just so you know. I kind of figured it would be too overwhelming when I realized how slow it was actually going, so I'll just keep it disabled.

Like I told Crypto, it was more a challenge to myself to see if it could be done.

So don't worry about it, I'm sure somewhere along the line a search will be implemented into the site itself, which would save on a lot of resources. I'll just write this project off as "works in theory".
 

Guano

Member
*Sigh*

And here I was, hoping we would finally have a search engine...

What the heck is this site built on? I really know nothing of codes and such, but I don't understand why a search engine would screw the site up. All it does is locate and retrieve pictures based on an input phrase, right? Why can't this site do that?
 

XeNoX

Member
There has already been a fully working 3rd-party search function once, so that is not the problem *nudge nudge*
 

Bokracroc

Bokra, come out to pla-ay
Guano said:
What the heck is this site built on?
Duct tape and Hopes.
 

Bokracroc

Bokra, come out to pla-ay
That was the city >:[
 

Arcturus

Banned
I used to run one. It used cached data. But I took it down.

I offered to write a new, super-speedy one. I was told no.

I even offered to go to the trouble of writing one to use super-sanitized data exports for FA. I was told that they were gonna write one of their own, and that if they didn't do it, then I could. They gave a deadline. Which got extended.. and extended.. and of course, where's the search? Nowhere :(

The original deadline being three months ago.

My offer still stands. I'll make a fast, highly indexed, search system for FA. Just give me the data to do it, so it doesn't impact FA's speed.
 

Acorndeer

Member
Arcturus said:
I used to run one. It used cached data. But I took it down.

I offered to write a new, super-speedy one. I was told no.

I even offered to go to the trouble of writing one to use super-sanitized data exports for FA. I was told that they were gonna write one of their own, and that if they didn't do it, then I could. They gave a deadline. Which got extended.. and extended.. and of course, where's the search? Nowhere :(

The original deadline being three months ago.

My offer still stands. I'll make a fast, highly indexed, search system for FA. Just give me the data to do it, so it doesn't impact FA's speed.

Good luck getting through, pal :p
 