This really is a deep rabbit hole and something I've played around with and considered devoting a lot of time to. Look into Expert Elicitation, Decision Theory and Order Theory.
There is no one-size-fits-all. This is the most important thing to keep in mind from the start.
This type of ranking is really all about UX. The math is just a tool to make it easier. It's a real trap to find some theory and think this will solve things, but if it doesn't actually make it easier for people to make decisions, you really didn't solve the problem.
Sometimes it looks like stack ranking would help. But often you don't really need a stack. Maybe you just need the top one or the top N. Maybe each item has a weight and you want to fit the most value into a given weight allocation (knapsack problem). Maybe the weights and values aren't actually known, only how they compare (this one is more work and more valuable than that one). Maybe value is compounding, like u({A, B}) > u({A}) + u({B}).
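To make the knapsack case concrete, here's a minimal sketch of the classic 0/1 dynamic program. It assumes integer weights and independent values (so it does not cover the compounding case), and the function name, item tuples and numbers are made up for illustration.

```python
def knapsack(items, capacity):
    """items: list of (name, weight, value) with integer weights.
    Returns (best total value, names chosen) within the capacity."""
    # best[c] = (total value, chosen names) achievable with capacity c
    best = [(0, [])] * (capacity + 1)
    for name, weight, value in items:
        # Walk capacities downward so each item is used at most once.
        for c in range(capacity, weight - 1, -1):
            candidate = best[c - weight][0] + value
            if candidate > best[c][0]:
                best[c] = (candidate, best[c - weight][1] + [name])
    return best[capacity]


# Hypothetical backlog: (name, effort, value)
items = [("A", 3, 4), ("B", 4, 5), ("C", 2, 3)]
print(knapsack(items, 5))  # (7, ['A', 'C'])
```

Note that nothing here produces a full ranking. B has the highest individual value and still gets left out, which is the point: the question was "what fits", not "what order".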
Maybe the preferences are circular, like A > B > C > A. But that's not possible! Well, that's what the user says and just throwing up an error screen probably won't fix it. You'll need to handle that gracefully.
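One way to handle it gracefully is to detect the cycle and show it back to the user, so they can revisit a single comparison instead of hitting an error. A rough sketch using depth-first search, assuming preferences are stored as a dict from each item to the items it beats (that layout and the name `find_cycle` are my own, not from any library):

```python
def find_cycle(prefers):
    """prefers: dict mapping an item to the items it is preferred over.
    Returns one cycle such as ['A', 'B', 'C', 'A'], or None if there is none."""
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on current path, finished
    color = {}
    path = []

    def visit(node):
        color[node] = GRAY
        path.append(node)
        for nxt in prefers.get(node, ()):
            state = color.get(nxt, WHITE)
            if state == GRAY:
                # nxt is already on the current path, so we closed a loop.
                return path[path.index(nxt):] + [nxt]
            if state == WHITE:
                found = visit(nxt)
                if found:
                    return found
        path.pop()
        color[node] = BLACK
        return None

    for start in list(prefers):
        if color.get(start, WHITE) == WHITE:
            found = visit(start)
            if found:
                return found
    return None


print(find_cycle({"A": ["B"], "B": ["C"], "C": ["A"]}))  # ['A', 'B', 'C', 'A']
```

With the cycle in hand you can ask something like "you said A over B, B over C, and C over A; which of these would you change?" rather than rejecting the input.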
My suggestion is to really stick to one specific problem and solve for that, versus something general. Also allow the input to be rich. Rather than a plain win/lose, you might be better off with -2, -1, 0, +1, +2 for each comparison (or words). Allow ties until they're actually a problem. Why make people struggle to choose between two options when neither of them ends up being used?
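Here's a small sketch of what "allow ties until they're actually a problem" could look like, assuming you only need the top N. Comparisons are (a, b, grade) with grade in -2..+2, where positive means a is preferred. Summing grades is a deliberately crude aggregation that I picked for illustration, and both function names are hypothetical.

```python
from collections import defaultdict


def net_scores(comparisons):
    """Sum graded comparisons into a net score per item."""
    scores = defaultdict(int)
    for a, b, grade in comparisons:
        scores[a] += grade
        scores[b] -= grade
    return dict(scores)


def ties_that_matter(scores, n):
    """Return the items tied across the top-n cutoff, or [] if the cut is clean."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    if n <= 0 or n >= len(ranked):
        return []
    cutoff = scores[ranked[n - 1]]
    if scores[ranked[n]] != cutoff:
        return []  # the last item in and the first item out differ
    return sorted(item for item in ranked if scores[item] == cutoff)


comparisons = [("A", "B", 2), ("B", "C", 0), ("C", "D", 1), ("A", "D", 1)]
scores = net_scores(comparisons)
print(scores)                       # {'A': 3, 'B': -2, 'C': 1, 'D': -2}
print(ties_that_matter(scores, 2))  # []
print(ties_that_matter(scores, 3))  # ['B', 'D']
```

B and D are tied, but for a top 2 nobody ever has to resolve that. Only when the cutoff lands between them (top 3) is it worth spending a decision on it.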
It can also help to see things as probabilistically better rather than strictly better. Elo scores help with this, like the other comment said.
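For reference, the standard Elo update is only a few lines. This sketch uses the usual logistic expectation with a 400-point scale; the K-factor of 32 and the 1500 starting rating are common conventions I'm assuming, not something specific to this problem.

```python
def expected_score(r_a, r_b):
    """Probability-like expectation that A is preferred over B."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))


def elo_update(r_a, r_b, score_a, k=32):
    """score_a is 1.0 if A won, 0.0 if B won, 0.5 for a tie.
    Returns the new (r_a, r_b)."""
    delta = k * (score_a - expected_score(r_a, r_b))
    return r_a + delta, r_b - delta


print(expected_score(1500, 1500))   # 0.5
print(elo_update(1500, 1500, 1.0))  # (1516.0, 1484.0)
```

Since score_a can be any value between 0 and 1, a graded answer can be mapped onto it (say, a strong preference as 1.0 and a weak one as 0.75), and a tie is simply 0.5. That mapping is an assumption on my part, not part of Elo itself.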
Decision ability is a resource. Decision fatigue is real and sets in fast. Optimize for taking up as little of that as possible from the user, especially if that user is you.