
Main Site: Improving Search Function

se05239

Member
Hello.
Been a member of FA for like... 11 years now (wow) and the search function is still as awful as it was when I joined. My main gripe with it is the limited number of search results you can get before the search crashes. I think it's like 3200 results in total.

This causes problems because when you get 33k hits on a word (using 'Taur' under the General rating as an example), there's just a shitload of images that are lost to the void, because you can only reach 6400 unique images using Date + asc/desc. Relevancy and Popularity can add a bunch more unique ones to that mix if we include asc/desc for those too. Say we can reach 12k images of 'Taur' that way... but the rest are just stuck "in the middle" of it.

That's not even counting the lack of blacklist functionality, or a way to tag other artists' images (that they never bothered to tag themselves).

Are there any plans to actually improve the search function of FurAffinity?
 

Stratelier

Well-Known Member
This is actually something of a well-known, documented problem intrinsic to paginated search queries in general, and not specific to FA.

See, for efficiency purposes, when you perform a search query and the results are paginated, a database typically only generates the first page of results and discards everything else. (After all, if nothing else can possibly affect what's shown on page 1, why bother with the extra work?)
And who actually wants to sift through every single page of thousands, even millions, of results? At large scale, this is an EXTREMELY narrow use case that literally nobody plans or designs for. EVER. It's just not a good use of time and effort compared to other priorities.
The drawback? When you want to see page 2, the database simply cannot "resume" the previous query where it left off (any "extra" results from the previous query were discarded after the query finished) -- it starts a completely new one, meaning that in order to generate page 2 it first has to determine what's on page 1 all over again. Likewise, when you want page 3, it has to generate pages 1 and 2 all over again just to determine where page 3 starts.

Though this extra overhead is insignificant at first, it slowly adds up as the page count increases, and eventually (usually only after hundreds or thousands of pages) it hits a threshold where simply figuring out where a specific page of results starts is far more work than actually returning that page to the user.
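To put that in concrete terms, here's roughly what the naive approach looks like in SQL. The table name, filter, and 64-per-page size are all made up for illustration, not FA's actual schema:

-- Page 1: the database finds and orders 64 matching rows, then stops.
SELECT id, title
FROM submissions
WHERE keywords LIKE '%taur%'          -- stand-in for the real search filter
ORDER BY submission_date DESC
LIMIT 64 OFFSET 0;

-- Page 500: the database still has to find and order the first 31,936
-- matching rows, only to throw them away and return rows 31,937-32,000.
SELECT id, title
FROM submissions
WHERE keywords LIKE '%taur%'
ORDER BY submission_date DESC
LIMIT 64 OFFSET 31936;

The OFFSET doesn't let the engine jump straight to row 31,937; it still has to walk past everything before it, on every request.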

In practice, there is basically no "fix" for this, just some workarounds. For example:

1 - Use what's called an asynchronous query: execute the search once with no pagination, and initially hand the user just a 'token' that identifies the search internally. If the results are stored in their own table, that table can then be queried repeatedly (paginated or not) with vastly less effort on the database's part, since the bulk of the processing happens only once. However, this approach is more vulnerable to caching issues and to spamming/flooding by large numbers of users.

2 - "faux pagination": When the search results are sorted by some absolute metric (such as submission date), this metric can itself be added as a condition for subsequent pages. For example, if the results are shown "Newest First" and the last item on page 1 has a date of (D), then everything from page #2 onwards must logically be (submission_date < D) and therefore you literally add this requirement to the search query; if this metric is indexed (and something like submission date most likely is) then the overhead reduces from linear (by page number) to constant (regardless of page number). The drawback? This method only works when jumping between adjacent pages; you can no longer jump to an arbitrary page at any time (though depending on the metric you can jump to an arbitrary value at any time, though this will not correlate to any specific page number).
 

Lutro

Writer
Stratelier said:
This is actually something of a well-known, documented problem intrinsic to paginated search queries in general, and not specific to FA.

This isn't true. Most (if not all, these days) database engines have supported query pagination out of the box for decades. Pagination is 100% a solved problem and does not involve any sort of complexity.

SELECT * FROM foo WHERE {your complex filtering here} LIMIT 20, 10 <-- MySQL example. Skips the first 20 results (i.e. the first two pages, at ten per page) and gets the next ten. Basically, "get page 3."

You 100% do *not* need to "determine what's on page 1" to get to page 2. In fact, a ton of websites let you jump to arbitrary page numbers. (I'm honestly not sure why FA hides everything behind POST and doesn't let you alter your current page at all.) A lot of sites just put the page number in the querystring of the URL. Either way, the query the database ends up running is computationally equivalent.
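The page number in the querystring just feeds that same arithmetic. With, say, 64 results per page (a number I'm making up for the example), ?page=3 is nothing more than:

SELECT * FROM foo WHERE {your complex filtering here} LIMIT 64 OFFSET 128 <-- skip the first (3 - 1) * 64 = 128 rows

Same query, just with the numbers filled in from the URL.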
 

Stratelier

Well-Known Member
Lutro said:
You 100% do *not* need to "determine what's on page 1" to get to page 2.
Have you never actually worked on a large-scale SQL database before?

https://www.eversql.com/faster-pagination-in-mysql-why-order-by-with-limit-and-offset-is-slow/ said:
Well, in most cases, low offset queries are not slow. The problem starts with high OFFSET values.

If your query is using the following limit clause: "LIMIT 50000, 20", it's actually requesting the database to go through 50,020 rows and throw away the first 50,000.


https://developer.wordpress.com/2014/02/14/an-efficient-alternative-to-paging-with-sql-offsets/ said:
But on a performance level, this means you’re asking your DB engine to figure out where to start from all on its own, every time. Which then means it must be aware of every record before the queried offset, because they could be different between queries (deletes, etc). So the higher your offset number, the longer the overall query will take.

https://stackoverflow.com/questions/4481388/why-does-mysql-higher-limit-offset-slow-the-query-down said:
"It's normal that higher offsets slow the query down, since the query needs to count off the first OFFSET + LIMIT records (and take only LIMIT of them) [...] the query cannot go right to OFFSET because, first, the records can be of different length, and, second, there can be gaps from deleted records. It needs to check and count each record on its way.

https://www.postgresql.org/docs/8.0/queries-limit.html (official PostgreSQL documentation) said:
The rows skipped by an OFFSET clause still have to be computed inside the server; therefore a large OFFSET can be inefficient.

TL;DR: It's not that OFFSET queries are "bad"; rather, they scale badly.

To be fair, you are not wrong to call it a 'solved problem', since the workarounds are just as well-known as the problem itself -- what I referred to earlier as "faux pagination" is also called the "seek method", "subkey selection", etc., depending on who's writing the tutorial.

But if you have any further insights into the matter, you're still free to share them.
 

Lutro

Writer
I have worked with multiple large-scale databases. You are correct that the impact is non-zero (for MySQL, anyway), but it doesn't change anything else about what I said with regard to not having to 'compute/visualize' previous pages. Nor is it some mystical unsolved problem (pagination is well-solved precisely because it's such a common feature). I think it's disingenuous to say that "there's only two methods and both are sketchy." DB engines are really freakin' good at paging around data (particularly when indexes are set up properly).
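For example (guessing at table and column names here, since I don't know FA's actual schema), a composite index on the sort column plus id as a tiebreaker goes a long way:

CREATE INDEX idx_submissions_date_id
    ON submissions (submission_date, id);
-- lets the engine read rows already in sort order instead of sorting
-- the whole result set for every page

Whether you then page with OFFSET or with a seek condition, that index is doing the heavy lifting.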

I would like to see what FA does under the hood before commenting further, though. Particularly why they don't at least allow you to specify page numbers when browsing.
 