[PR #14557] [MERGED] Refactor query from EXISTS to JOIN to avoid API timeouts with large libraries #13852

Closed
opened 2025-12-22 10:06:18 +01:00 by backuprepo · 0 comments
Owner

📋 Pull Request Information

Original PR: https://github.com/jellyfin/jellyfin/pull/14557
Author: @evan314159
Created: 8/1/2025
Status: Merged
Merged: 8/11/2025
Merged by: @crobibero

Base: masterHead: query-optimisation


📝 Commits (2)

  • 9afa063 Refactor query from EXISTS to JOIN to avoid API timeouts with large libraries
  • a3d022d Merge branch 'jellyfin:master' into query-optimisation

📊 Changes

1 file changed (+10 additions, -5 deletions)

View changed files

📝 Jellyfin.Server.Implementations/Item/BaseItemRepository.cs (+10 -5)

📄 Description

Changes
This change refactors a critical query used by APIs like /Artists to replace nested EXISTS with explicit JOINs. This enables index usage and eliminates timeouts when querying large libraries.

A client like Manet requesting /Artists with limit = 200 could not complete library sync due to SQL command timeout because the query could not use indexes. With this change, a SQL query simulating /Artists retrieving all artists ran on my system in 0.173 ms, and Manet is able to complete full library sync successfully. /Artists performs OK overall even with limit = 0 following this change, while other APIs remain slow due to DTO overhead but at least all API used by Manet complete execution now where they previously could not.

My library is approximately 3200 artists, 2500 albums, 41.5k tracks. I am not sure at what point a library is large enough to cause API timeouts, but another user with a large library confirmed my experience with Manet in the developer chat.

I reviewed other nested EXISTS queries; some showed minor improvements, but none matched the significant benefit of this change. In some cases, performance even regressed.

Given the complexity of the original and improved queries, it is not obvious they produce identical results. I verified with Claude and ChatGPT, and have attached an explanation from ChatGPT.

Original version:

  • Filters ItemValues (f) by itemValueTypes.
  • Uses nested .Any() checks to verify existence of a related BaseItemsMap entry (w) whose ItemId matches any Id in innerQueryFilter (g).
  • Selects CleanValue of those filtered f.
  • Filters outer query on whether e.CleanName is in that selected set.

Improved version:

  • Filters ItemValues (f) by itemValueTypes.
  • Uses SelectMany to flatten BaseItemsMap entries (w). Performs an explicit Join between w.ItemId and g.Id in innerQueryFilter.
  • Selects the same CleanValue from matching f.
  • Filters outer query on whether e.CleanName is in this joined set.

Logical equivalence rationale:

  • The old .Any() corresponds to existential quantification (“there exists some g and some w matching”).
  • The new Join explicitly enumerates all such matching pairs.
  • Both yield the same set of CleanValues, differing only in possible duplication due to join multiplicity, which does not affect .Contains() semantics.
  • EF Core translates both forms into SQL queries with equivalent join and filter semantics, ensuring identical result sets.
  • The new form allows the query planner to leverage efficient index-based joins, reducing execution time and avoiding correlated subquery timeouts.

Issues
Fixes API timeouts for large libraries with Manet and similar apps, no issue #.


🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

## 📋 Pull Request Information **Original PR:** https://github.com/jellyfin/jellyfin/pull/14557 **Author:** [@evan314159](https://github.com/evan314159) **Created:** 8/1/2025 **Status:** ✅ Merged **Merged:** 8/11/2025 **Merged by:** [@crobibero](https://github.com/crobibero) **Base:** `master` ← **Head:** `query-optimisation` --- ### 📝 Commits (2) - [`9afa063`](https://github.com/jellyfin/jellyfin/commit/9afa0637e303ee7a59bb4f766fd017610a78f8c2) Refactor query from EXISTS to JOIN to avoid API timeouts with large libraries - [`a3d022d`](https://github.com/jellyfin/jellyfin/commit/a3d022d8f6f2299c69b3c7a3c288d0f1ee4143fd) Merge branch 'jellyfin:master' into query-optimisation ### 📊 Changes **1 file changed** (+10 additions, -5 deletions) <details> <summary>View changed files</summary> 📝 `Jellyfin.Server.Implementations/Item/BaseItemRepository.cs` (+10 -5) </details> ### 📄 Description **Changes** This change refactors a critical query used by APIs like /Artists to replace nested EXISTS with explicit JOINs. This enables index usage and eliminates timeouts when querying large libraries. A client like Manet requesting /Artists with limit = 200 could not complete library sync due to SQL command timeout because the query could not use indexes. With this change, a SQL query simulating /Artists retrieving all artists ran on my system in 0.173 ms, and Manet is able to complete full library sync successfully. /Artists performs OK overall even with limit = 0 following this change, while other APIs remain slow due to DTO overhead but at least all API used by Manet complete execution now where they previously could not. My library is approximately 3200 artists, 2500 albums, 41.5k tracks. I am not sure at what point a library is large enough to cause API timeouts, but another user with a large library confirmed my experience with Manet in the developer chat. I reviewed other nested EXISTS queries; some showed minor improvements, but none matched the significant benefit of this change. In some cases, performance even regressed. Given the complexity of the original and improved queries, it is not obvious they produce identical results. I verified with Claude and ChatGPT, and have attached an explanation from ChatGPT. Original version: - Filters ItemValues (f) by itemValueTypes. - Uses nested .Any() checks to verify existence of a related BaseItemsMap entry (w) whose ItemId matches any Id in innerQueryFilter (g). - Selects CleanValue of those filtered f. - Filters outer query on whether e.CleanName is in that selected set. Improved version: - Filters ItemValues (f) by itemValueTypes. - Uses SelectMany to flatten BaseItemsMap entries (w). Performs an explicit Join between w.ItemId and g.Id in innerQueryFilter. - Selects the same CleanValue from matching f. - Filters outer query on whether e.CleanName is in this joined set. Logical equivalence rationale: - The old .Any() corresponds to existential quantification (“there exists some g and some w matching”). - The new Join explicitly enumerates all such matching pairs. - Both yield the same set of CleanValues, differing only in possible duplication due to join multiplicity, which does not affect .Contains() semantics. - EF Core translates both forms into SQL queries with equivalent join and filter semantics, ensuring identical result sets. - The new form allows the query planner to leverage efficient index-based joins, reducing execution time and avoiding correlated subquery timeouts. <!-- Ensure your title is short, descriptive, and in the imperative mood (Fix X, Change Y, instead of Fixed X, Changed Y). For a good inspiration of what to write in commit messages and PRs please review https://chris.beams.io/posts/git-commit/ and our documentation. --> **Issues** Fixes API timeouts for large libraries with Manet and similar apps, no issue #. --- <sub>🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.</sub>
backuprepo 2025-12-22 10:06:18 +01:00
Sign in to join this conversation.
No milestone
No project
No assignees
1 participant
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference: starred/jellyfin#13852
No description provided.