Minor point, but if you are using factual questions to filter out stupid people, asking for the "right" number of continents is probably not ideal. You (and I) might disagree with someone counting Eurasia (or North and South America) as a single continent, but it is not really an indicator of stupidity in the way that believing in astrology or thinking the earth is larger than the sun are.
According to the wikipedia article on continents, different models are taught in different countries, so you probably end up selecting for a different attribute than you thought.
People I don’t want to date: Anyone who gets hung up on arbitrary definitions and is ignorant of other cultures.
That question seems to be a pretty good filter, just not for the reasons he thinks it is.
(As a child – not in school but before – I learned that there are five continents: Africa, Australia, America, Asia, Europe. I don’t think I was ever told in school how many continents there are. All we were told were names of certain landmasses to make sure we are all talking about the same thing when saying America, Antarctica, South America or Eurasia. Why bother counting them? That doesn’t even make sense.)
See, but if you made that question mandatory, you'd weed out me, because I was taught in a different school in (probably) different country, and my set of default continents is different.
OTOH, I think I marked this question as irrelevant.
I think you didn’t understand me. That’s not what matters. There is no need to count continents. It’s unnecessary.
If you know every property of something it’s not necessary to ask whether it really is a continent or not.
There isn’t even a communication problem when it comes to this question so it isn’t really necessary to find some sort of great definition.
No matter how many continents you were told there were, if you paid some attention you know what people are talking about when they say America, North America, Europe, Asia or Eurasia.
That part actually surprised me quite a bit. I was taught in school that there are five continents (Eurasia, Africa, America, Antarctica, Oceania). I was also told that this is an arbitrary and unimportant definition, but I've always believed that to be the common convention.
I'd go even further: even the astrology question is not a good indicator of stupidity or compatibility.
There are lots of otherwise smart people who believe in "stupid" things. For example, in Japan many people believe that your blood type is an indicator of your personality. I'm talking about educated people.
So, there are many women who are interested in astrology. Some don't really believe that astrology is scientific, but they read the horoscope column anyway. Unless she takes it seriously and makes life/relationship decisions based on it, I wouldn't consider it a deal-breaker.
Many, many years ago I had a girlfriend who I adored because she was a wonderful musician. She had loads of conflicting crackpot spiritual opinions that we argued about, but I didn't care. It was fun whilst it lasted. It wasn't going to last, and I knew that and didn't mind because I liked being with her in the meantime.
I still have the astrology chart she drew for me.
I didn't know I didn't care about the nutty side until I met her.
OK, so -- the point of the article is not (1) whether astrology is a science, (2) that there are 7 continents, (3) whether the sun is bigger than the Earth, or (4) whether any of those are good filters on dates.
It's about the general principle that (a) OKC should make Mandatory questions only count as negatives, and (b) unless they do that, users shouldn't make "factual" questions including (1, 2, 3) mandatory.
Maybe they shouldn't have been mandatory in the first place because you can love someone who believes wrong facts, that's a valid point as well but doesn't relate to the algorithm design issues.
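To make point (a) concrete, here is a minimal sketch of the difference. It assumes a scoring scheme where each question carries importance points and a profile earns the points for every acceptable answer. Only the 250 for "mandatory" is mentioned in this thread; the other weights, the `satisfaction` function and the `mandatory_as_veto` flag are assumptions made up for illustration, not OKC's actual code.

```python
# Assumed importance weights; only the 250 for "mandatory" comes from this thread.
WEIGHTS = {"irrelevant": 0, "a little": 1, "somewhat": 10, "very": 50, "mandatory": 250}


def satisfaction(prefs, answers, mandatory_as_veto=False):
    """Fraction of importance points that `answers` earns against `prefs`.

    prefs:   {question: (set of acceptable answers, importance label)}
    answers: {question: answer given by the other person}
    """
    earned = possible = 0
    for question, (acceptable, importance) in prefs.items():
        if question not in answers:
            continue  # only questions the other person answered are scored
        ok = answers[question] in acceptable
        if mandatory_as_veto and importance == "mandatory":
            if not ok:
                return 0.0  # a failed mandatory question is a deal-breaker
            continue  # a passed mandatory question earns no credit
        weight = WEIGHTS[importance]
        possible += weight
        if ok:
            earned += weight
    # With nothing left to score, report 0.0 rather than guessing.
    return earned / possible if possible else 0.0


prefs = {
    "sun_bigger": ({"sun"}, "mandatory"),
    "smokes": ({"no"}, "very"),
    "likes_hiking": ({"yes"}, "somewhat"),
}
answers = {"sun_bigger": "sun", "smokes": "yes", "likes_hiking": "no"}

current = satisfaction(prefs, answers)                       # 250 of 310 points
veto = satisfaction(prefs, answers, mandatory_as_veto=True)  # 0 of 60 points
```

Under the additive scheme, getting one obvious fact right is worth about 81% even though every real preference is missed. With mandatory treated purely as a veto, the same person scores 0%.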
I wouldn't consider somebody an idiot if they misunderstood the word as astronomy. That seems like the kind of mistake normal people might make occasionally.
Given that the range of "right" answers is 4-7, that really only filters out people who respond <=3 or >=8. That's not much of a filter, and people who failed it would quickly be outed as not all that bright anyway. I agree with those saying it's not a great question to put any weight on.
>thinking the earth is larger than the sun are [is] an indicator of stupidity
The question is "which is bigger: the Earth or the Sun?". It doesn't specify how to measure big, so when answering the question I decided to consider angular size and answered that the Earth is the larger of the two.
Now you're just arguing for the sake of argument and throwing in semantics.
I could also say that I am bigger than the Sun because whenever I see the Sun it is only around 3 inches in diameter while I am 5 feet 11 inches. But that is a completely dumb statement, and so is yours.
It's true, but that's a valid way to approach the question. The problem is that grandparent will have more misses (people who legitimately believe that the earth is larger than the sun by the assumed metric) than hits (people who think outside of the box by using a different metric).
Actually, a lot of people use the "what's more directly relevant to me" metric. Thus, you have a lot of people who pick "earth" as an answer even though they are not delusional. At least not in this specific way.
As such, for people that don't like geeks who interpret everything literally, answering "the sun" might be a good strategy, as I doubt anyone else would make this question mandatory.
I can't address all the particulars raised in this analysis, but the biggest is essentially correct:
> The worst side effect of the current scoring system, is that a spammer could easily answer only the questions with obvious answers (basic facts and display of non-bigotry) and get a decently high match percentage with a lot of people. At which point, the spammer uploads a picture of an attractive guy/girl, writes some generic profile text, and scams away.
The algorithm as described in the FAQ does suffer from this problem. However, we have enhancements that address the issue very effectively. The FAQ is slightly out of date, and shouldn't be taken as a complete, exhaustive description of how we make matches.
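For anyone who wants to see the quoted problem in numbers, here is a rough sketch. It assumes the match score is the geometric mean of the two one-way satisfaction scores and that only questions the other person answered are counted. Both are assumptions about the FAQ-era algorithm, and the sketch deliberately leaves out any correction for having few questions in common, which is presumably the kind of thing the enhancements deal with. All names and profiles are made up.

```python
from math import sqrt


def satisfaction(prefs, answers):
    """Share of weighted points that `answers` earns against `prefs`."""
    earned = possible = 0
    for question, (acceptable, weight) in prefs.items():
        if question in answers:  # unanswered questions are ignored
            possible += weight
            if answers[question] in acceptable:
                earned += weight
    return earned / possible if possible else 0.0


def match(prefs_a, answers_a, prefs_b, answers_b):
    """Geometric mean of how well each side satisfies the other (assumed)."""
    return sqrt(satisfaction(prefs_a, answers_b) * satisfaction(prefs_b, answers_a))


# A real user with one "obvious" mandatory question and two real preferences.
user_prefs = {
    "sun_bigger": ({"sun"}, 250),
    "smokes": ({"no"}, 50),
    "likes_hiking": ({"yes"}, 10),
}
user_answers = {"sun_bigger": "sun", "smokes": "no", "likes_hiking": "yes"}

# An honest profile that answers everything and misses both real preferences.
honest_prefs = {"smokes": ({"yes", "no"}, 1)}
honest_answers = {"sun_bigger": "sun", "smokes": "yes", "likes_hiking": "no"}

# A spam profile that answers only the obvious question.
spam_prefs = {"sun_bigger": ({"sun"}, 1)}
spam_answers = {"sun_bigger": "sun"}

honest_score = match(user_prefs, user_answers, honest_prefs, honest_answers)
spam_score = match(user_prefs, user_answers, spam_prefs, spam_answers)
```

In this toy setup the spam profile comes out as a perfect match, while the honest profile lands at roughly 90%, because the honest one gave the algorithm something to penalise.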
Amusingly, you could. Create a new account, answer just those questions as "mandatory", and see who your matches are. That's a bit effort-intensive, obviously.
This actually lets you see people who've answered those questions privately. I've used it to snoop on personal questions that not everybody wants broadcast.
> The worst side effect of the current scoring system, is that a spammer could easily answer only the questions with obvious answers (basic facts and display of non-bigotry) and get a decently high match percentage with a lot of people.
That is exactly what I did which led to my meeting my (now long-term) girlfriend. I was receiving about 5 profile views/week with 500 questions answered. I scrapped them all, answered 20 or 30 questions with non-offensive answers, and skyrocketed to 60-100 profile views/week.
I don't know what your view/message ratio is, but I can't imagine any guy has a decent ratio. The thing I have noticed with online dating is this: girls get a gang of messages + views and guys get some views. This is true of all the guys I know who are on okcupid. The 60-100 views/week for you is what girls I have dated on there say they get in messages, which I'm sure you know is a much harder thing to accomplish.
The art of online dating, IMO, is not in the questions answered but the messages sent and secondly how non-offensive your profile description is. Include keywords "Hiking, Laughing, Friends/Family, Good Beer" and you're in like gin.
If you keep the denominator low, you can get a high view/message ratio. Mine is like 20%, but I only get like 5 views a month. ;-)
Side note: your view count seems to depend a lot on how active you are. I check back perhaps once every couple weeks, which means most of the time I never show up in the all-important "Online now" or "Online in the last day" search results. Hence girls who find my profile tend to be really looking, and more likely to message. If I'm online regularly for a couple days in a row, my view count goes up, my message count stays constant, and the view/message ratio goes way down.
A funny profile helps a lot too. I copied Dr Evil's family therapy monolog[1] into "The most private thing I'm willing to admit here," and got over a dozen messages in a week, including my first contact with the woman I'm now married to. :-)
But do you actually want 60-100 profiles/week? It's not like you can go on 60-100 dates/week.
I guess I can't argue with success, but it seems like the best use of the site is to select for the kind of people you would actually enjoy meeting. And the best way to get people to look at your profile is to write them a message which you spent ~5 minutes thinking about.
Of course views/week is not necessarily the best measure of success. Those 5 views a week for 500 questions may have been excellent matches, whereas hundreds might have been poor matches. I admit though that it's more likely a numbers game.
OKCupid would probably get better matches if it dropped the user submitted weights and ideal match preferences completely and instead used its database of people in relationships as training data for a proper machine learning algorithm.
The current approach is entirely oriented to give people what they think is important and what people think they want. It would probably be better to derive that from existing relationships (successes).
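A toy version of what that could look like, with plain logistic regression standing in for the "proper machine learning algorithm" (no particular algorithm is named here, so that choice is mine). Each training example is a pair of users, each feature is 1 if the pair agreed on a question, and the label is 1 if they ended up in a relationship. The data below is invented purely to show the mechanics.

```python
import math


def train(samples, epochs=200, lr=0.1):
    """Fit logistic regression weights by stochastic gradient ascent.

    samples: list of (features, label), where features[i] is 1 if the two
    users agreed on question i and label is 1 if they became a couple.
    """
    n = len(samples[0][0])
    weights = [0.0] * n
    bias = 0.0
    for _ in range(epochs):
        for features, label in samples:
            z = bias + sum(w * x for w, x in zip(weights, features))
            predicted = 1.0 / (1.0 + math.exp(-z))
            error = label - predicted
            bias += lr * error
            weights = [w + lr * error * x for w, x in zip(weights, features)]
    return weights, bias


# Invented data: agreement on question 0 goes with a relationship,
# agreement on question 1 is unrelated to the outcome.
samples = [
    ([1, 1], 1),
    ([1, 0], 1),
    ([0, 1], 0),
    ([0, 0], 0),
]
weights, bias = train(samples)
```

The learned weight for question 0 ends up large and the one for question 1 stays near zero, so the importance of each question comes from outcomes rather than from what users say matters to them.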
I would be a little surprised if people at OKCupid hadn't already thought about this. Whether there is actually any momentum to change the core matching mechanic or not remains to be seen.
I agree, I wouldn't be surprised if the weights people manually assign to the questions have very little correlation with the success of their relationships. People probably suck at knowing what they want.
That's not to say they're meaningless though; e.g., if someone puts "mandatory" for all his questions, that definitely says something about his personality, and should be used as a feature in the ML algorithm used for matching.
People may suck at knowing what they want, but I think it's more that people do not necessarily know what criteria they should want to select given their goals for finding someone.
Another problem is that a lot of users are probably more casual about the weights, such as selecting mandatory. (It's common for people to select an answer and then mark that same answer as unacceptable in a match, for questions in which it makes no sense for them to do that.)
If all OKC knows is "did we get these two people to enter a relationship" (because they know nothing about how successful the relationship was) they're maximizing for something pretty different from what most users want.
I know a couple of OKC founders. They are hardcore math/statistics geeks. I'm sure they realize the quality of their service matters more than making a bit of money quickly.
They are doing pretty well, considering their competitors have ads on TV.
They have produced some really cool blog posts, which I enjoyed reading. The site itself is fun, I've had 2 accounts on there - 1 I used in London, 1 in Shanghai.
I got some profile views and a couple of conversations, but never met anyone through it; with the exception of seeing my co-workers on it, in Shanghai.
In each city, within a few weeks of creating a profile, I've been in a relationship, but not through the site.
As a person with very strong religious views I've found the matching algorithm to be great at filtering out those with incompatible views. I think matching based on ethical, political, and religious views is the matching system's strength. The true weakness of the algorithm is that it matches poorly for personality. Two people can have very similar beliefs but be a terrible match in terms of personality. I would prefer an entirely separate score for that aspect.
Heh. Exact opposite here - I'm religious and just posted here about how bad I find it at filtering people by religion (or politics, for that matter) for me. Its ability to match for me based on personality seems rather higher.
I'd suspected for a while TBH that there were odd effects from the balance of what questions were answered; some topics have more data coverage than others in the question pool and that (for me) seems to put a noticeable damper on its precision in other areas.
I've noticed that too. There seem to be a lot of repetitious questions all aiming at or around faith vs atheism. "Do you believe in fate", "Do you believe in miracles", etc are talking about the same kind of thing.
> I would prefer an entirely separate score for [personality].
passionfruit, that does exist on OKC. Some of the tabs to the right portray personality aspects like messy, experienced, old-fashioned, indie, geeky, thrifty.
OKC has the worst matching algorithm on the Internet. I signed up for an account, spent a large amount of time getting my profile "100% complete", answered over 1,000 questions and was still consistently matched with people I had absolutely no interest in dating at all. I can't count how many times I saw 0% compatibility scores, people hundreds or even thousands of miles from me (I'd specifically said no more than 30 from my zip code) and, worst of all, several men (I'm not gay and didn't express any desire at all to meet men on the site). I deleted my profile after 6 months, and when they asked me why I was leaving I told them their matching process would be better using a random number generator. It's too bad, because OKC is one of the few free dating sites I know of that actually has a decent number of women using it.
That doesn't sound like a bad matching algorithm, it's more like you found a bug on the search form: distance and sex are presumably filters in the query, not input to the matching algorithm. And it sounds like a rare bug too, since if a lot of people were getting results of the wrong sex that would quickly become an issue.
I may be being harsh, but the comment sounds like a troll to me; he's describing horrendous, show-stopper bugs like failure to respect basic search filtering criteria which the apparently large number of HN readers with accounts on OKC have failed to report. I don't find the post in any way credible.
I would have thought that OKC would use some dynamic weighting scheme where the weights are not constant, but depend on how commonly people answer that question. For example, questions that just about everyone lists as mandatory wouldn't be weighted with 250, but with some number proportionately reduced to reflect the banality of the question.
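Something like the following is what I have in mind, as a minimal sketch. The linear discount and both function names are assumptions for illustration; the only number taken from the site is the 250 for "mandatory".

```python
from collections import Counter


def mandatory_share(importances):
    """Fraction of users who marked a question as mandatory."""
    if not importances:
        raise ValueError("need at least one importance label")
    return Counter(importances)["mandatory"] / len(importances)


def discounted_weight(base_weight, share):
    """Shrink a weight in proportion to how many users mark it mandatory."""
    if not 0.0 <= share <= 1.0:
        raise ValueError("share must be between 0 and 1")
    return base_weight * (1.0 - share)


# Hypothetical labels from ten users for one banal question.
labels = ["mandatory"] * 5 + ["very"] * 3 + ["irrelevant"] * 2
share = mandatory_share(labels)          # 0.5
weight = discounted_weight(250, share)   # 125.0
```

A question that nearly everyone marks mandatory would end up contributing almost nothing, while a question only a few people care that strongly about keeps most of its 250 points.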
Hmmm, I'm a bit disappointed or maybe I'm just missing something.
I guess I should have read the FAAAQ. I had no idea "mandatory" was being applied to the matching algorithm in such a naive manner. That sucks.
Of course, it doesn't really matter for me, anyway. I don't use OkCupid as a dating site. I use it as a way to find things like local libertarians, programmers, and other essentially platonic things. In fact, for a few years, I haven't even used it for that -- but that's how I did use it, so dating criteria are kinda irrelevant, which means answering questions is kinda irrelevant too. (I've answered quite a few just for shits and giggles, though.)
That's interesting, I have tried using OKCupid for dating, but it seems like all of my matches are women I might be interested in platonically, but not for dating.
It could be that OKC has developed a very good system for meeting new friends, but not necessarily for dating.
Out of curiosity, what qualities are desirable in a female friend that aren't in one you would date (aside from the obvious like "she's married" or "she's only interested in dating the wrong gender")
I'm guessing that it's the other way around; there are qualities that are desirable in a person GP would date that are not found in people who would still be great friends.
That's so weird - me too! I use it to find icosahedrons, perfect circles, and other essentially platonic things. PM me for a great story about this one time I thought I had found the Form of the Good.
I'd like to see an analysis between these two sets of data:
A) Dudes that have an opinion about the OKC algorithm
vs.
B) The amount of times they get laid each week
Jokes aside, I'm fascinated with this topic. For most of history, the likelihood of finding a mate was left to chance. Then OKC comes along and says it can leap past obstacles such as chance and geography to help you find your soul mate. That's an incredibly powerful idea.
They seem to have tweaked their algorithm after the Match acquisition. My ex, with whom I had about a 65% match (she signed up after we broke up), suddenly one day became an 80% match. She claimed she hadn't answered any more questions, and neither had I.
I like OKC, but they don't do even basic filtering of profiles. If they just verified, say, a mobile phone, it would get rid of the vast majority of fake accounts.
Interesting maths analysis. I'd certainly noticed that their ability to reliably rank inside a window of, say, 80% to 95% was rather low, along with the ability to filter out deal-breakers. Frankly that remained a manual process; no matter how much data people had supplied, I never could do that reliably from the numbers alone. As a liberal Christian, I end up doing quite a bit of filtering out of people with very great differences in either religion or politics, which the algorithms simply didn't pick up.
So anyway, some tweaking done. Let's see if it has any visible effect...
How about this: instead of showing you match %'s, they show you profiles and you rate them. Then use recommender systems [1,2] just like netflix. There are two immediate problems with this: you have the cold start problem [3] and also if you don't show random profiles, then you introduce bias into their rankings. The first problem could be solved with side information (you still answer questions) and the second with a nicely chosen exploration/exploitation trade off. PMF[4] for the win!
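For the exploration/exploitation part, the simplest thing that could work is epsilon-greedy: most of the time show the profile with the highest predicted rating, and occasionally show a random one so the ratings are not biased entirely toward what the model already believes. A minimal sketch, where the predicted ratings are assumed to come from whatever recommender is in use and every name is made up:

```python
import random


def pick_profile(predicted_ratings, epsilon=0.1, rng=random):
    """Choose the next profile to show using an epsilon-greedy rule.

    predicted_ratings: {profile_id: predicted rating}
    epsilon: probability of showing a uniformly random profile (explore)
    rng: object with random() and choice(), e.g. random.Random(seed)
    """
    if not predicted_ratings:
        raise ValueError("no profiles to choose from")
    if rng.random() < epsilon:
        return rng.choice(list(predicted_ratings))  # explore
    return max(predicted_ratings, key=predicted_ratings.get)  # exploit


ratings = {"alice": 4.6, "bea": 3.1, "chloe": 2.0}
rng = random.Random(0)
shown = [pick_profile(ratings, epsilon=0.2, rng=rng) for _ in range(20)]
```

This does nothing about cold start; that would still need the side information from answered questions.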
OKC does have this functionality, it's called quickmatch. I don't know how they use the ratings you give there, but I doubt they just let it go to waste.
* If you're going to do it properly and give them meaningful data, a 'quick' skim is the exact opposite of what you want to do. Yet it's what the function is set up for; you can't even see a person's questions and answers, which have frequently (to me) proved far more revealing than the profile.
* Again, if you're doing it properly, rating someone highly sends them a mail saying 'someone's interested in you'. Which always looks to me like saying 'Someone's interested in you but doesn't have the guts / energy / enthusiasm to write you a proper message so has gone for the easy way'. Not a great intro.
With a bit of tweaking I agree it could give some useful seed data, but not much more; fundamentally I think the question-based approach is very good, but I think their scoring algorithm could benefit from refinement.
In both cases, the key observation is to realize that the meaning of the numbers is more important than the numbers (or how they're calculated, per se). This point is kind of abstract--especially compared to the salient examples in both pieces--but if you're looking to understand why this error is so common, look first to the fact that it's not an error of stupidity or insufficient maths.
You can sometimes avoid a big practical problem (e.g., an avenue to attack your system via diluting meaning ("spam")) by abstractly considering the meaning behind the structures available to you.
I suspect that "doug" in the article's comments has it right.
Users are happier to see dozens/hundreds of "matches" that they know aren't all that good despite saying 90% than if they saw 3 matches that were more correct followed by a huge drop down to 50-60%.
It is crazy to think that a matching algorithm based on answers to seemingly pointless questions can predict the success of a future relationship. There is no substitute for interpersonal connection in real life. Those looking for love and meaningful relationships should get off of their computers and meet people in person. This is the problem with the majority of dating sites out there-too much focus on the online experience of users rather than the offline interaction. The internet is a great way to network and meet new people but nothing can replace spending time with a person in real life to determine whether or not you will get along with them.
The other problem is that many of the questions are redundant, ambiguous, or otherwise poorly constructed. I end up skipping a lot of them just because of this.
This reminds me of a recent Dinosaur Comics strip. It seems like everyone thinks they can make a better dating site these days: "BaguettesAll4Me.co.uk is complete! ... I realized my perfect woman won't say 'ew that's weird' as she watches me eat a whole baguette"
http://www.qwantz.com/index.php?comic=2088
>I can tell based on profile text alone whether I'm likely to get along with a person.
but this is age and experience, not math. //
Arguably your brain is running an algorithm that can be expressed mathematically, we simply don't know the exact nature of the algo.
Age is largely irrelevant, it's mainly a first pass indicator for experience. Your experience is probably largely a statistics based algorithm with some unsound choices caused by your psych make-up thrown in to keep things interesting.
> At least they’re not using a non-linear Bayesian splitting tree didactogram
This seems like a weakness to me. I'd love to be able to choose between different algorithms for matching features and set some relevant coefficients. It'd be fun to see the profiles that have a minimum Hamming distance from mine, or whatever.
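The Hamming version is only a few lines if each profile is treated as a list of answers to the same questions in the same order. A quick sketch with made-up profiles and helper names:

```python
def hamming(a, b):
    """Number of positions at which two equal-length answer lists differ."""
    if len(a) != len(b):
        raise ValueError("profiles must answer the same questions")
    return sum(x != y for x, y in zip(a, b))


def closest(me, others):
    """Name of the profile in `others` with the smallest Hamming distance."""
    return min(others, key=lambda name: hamming(me, others[name]))


me = ["yes", "no", "sun", "7"]
others = {
    "p1": ["yes", "no", "sun", "5"],     # differs on one answer
    "p2": ["no", "yes", "earth", "7"],   # differs on three answers
}
best = closest(me, others)
```

Every question counts equally here, which is exactly what the importance weights on the real site are meant to avoid.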
Unless it was a gay dating site, that would also worsen the endemic problem with dating sites: namely that there are far, far more men than women on them.
Actually back when I was using OKC I didn't have a problem finding women -- it was just that there were few dateable women.
I expect that to change though - one (of the many) differences between men and women is that a woman's worth as a date falls in her twenties while the typical man sees his dating value go up.
I am still not interested in an older woman (or a fat one, or a fundie or a single mom). If I wanted to date my mother's friends I would give them a call.
it's probably not a big deal from their perspective, since 99% of people wouldn't realize to do that.
It's essentially good enough for them, since the mainstream audience wouldn't realize that, and if a few geeks figure out a way to cheat the system, then that's fine too, since they need all the help that they can get
Hmm, this does make me wonder about some people that I'm "almost perfectly" matched with. We potentially had too many mandatory questions matches, but a lot of the more subtle things got lost?
Who said anything about 125lbs? That's a far way away from 280... I joined but only got recommendations for people my size, but when I go out in public my choices are always varied. It's not just an algorithm; a feature for random suggestions should be added.
What's wrong with OKCupid's matching algorithm for me is that I can't get to use it at all! I just can't log in. I've been trying for the last 2 months, and they just keep saying "Sorry, we're having technical difficulties right now. Check back later.". I have searched the news for an explanation of a major OKCupid outage and found nothing. Is it just me?
Yikes! Looks like that; I've tried logging in via a proxy and it worked. Never thought they would filter me out this way, saying it's "technical difficulties".
Is there a way to contact them? They don't reply to anything I send via the Feedback form. I seriously thought they had gone out of business (however improbable that might be, having been acquired just this year for $50 million).
caycep, there is some stochasticity. I think if you search for matches with the same criteria twice, you will not get the exact same sort (try searching in a populous area).
Assuming OKC's target market is the relative majority in English-speaking countries it really doesn't take much crazy math to find matches, as in the matching facility is not going to be as important as the random number generator (or trained monkey) that is filling in the matches. All that really needs to be taken into account are features of the prospective match that the two individuals WANT to be taken into account(age/height/race/income level/political/religious). It isn't that complicated, especially since one of OKC's tactics is to use quizzes/'looking for friends' to break the stigma of online dating.
What is the stigma of online dating?
Obviously the implied answer is that if you are using online dating then you cannot find a date IRL. Or perhaps a kinder answer would be that you have too much going on to find a date IRL. This is the same principle as a meat-market nightclub/singles bar where the bouncer lets the guy with the two attractive women in, but the pack of forever-alones get locked out. Then the slightly wealthier gentleman pays $ to the nightclub to get in, for the opportunity to pay $ to one of the attractive women (in the form of a drink) for the chance to talk to her and (hopefully) get sex/affection/phone #.
Of course this is all crazy, but it is a way in which many bars/nightclubs/online dating works. If OKC can attract attractive women with quizzes/validation/'looking for friends' and associated kitsch, they have it made.
The only online dating site that was worth a moment's time was Facebook, and then they removed the search facility.
OKCupid was interesting, but not a magic bullet. It was then bought by Match, which consists primarily of marketing and poor website construction.
Online 'dating' is a social problem more than a computing problem and, as such, no good solution will ever exist. Anyone male thinking of sinking time into one would be better served by simply getting out more, taking classes, adopting a dog and walking it, etc.
I have to agree with you in a sense that this is a social problem (but of course that's what's interesting with graph integration and social media now). As a guy who took Attraction classes (or PUA training as some might know it) and created a much higher success rate for myself by negating the traditional online dating sphere I do believe time is probably better spent offline--however, I still think the computing angle of this and the way in which society is already headed, this is one of those holy grails of the internet.
That reminds me... I had a female friend who would likely be considered attractive to most. She was yammering on about how terribly interesting this guy was who she met online. He kept asking such interesting questions, and gave her a "personality test" about a cube in the desert, and a ladder, and...
Several months later, when I was reading The Game, he got around to describing exactly this 'test'. I laughed for quite some time.
This is quite off topic, but because of how well known 'The Game' is, it should never be used as a how-to as it once was, but more as an inside look at what such a life entails. Quite a few girls have read such books, and actually look out for such tells to identify frustrated chumps--and well, the "cheesiness" and the hyper-convoluted dating approach of that time just doesn't cut it anymore (I've seen some old techniques done, and successfully, but it's not advisable haha--but again a lot of these things are built upon psychological understandings), and it's a big reason why that field has really had to evolve in the past few years. Simple common sense and psychological and biological foundations seem to be the way to go.
Now a good way to single out the Hacker News readers would be to identify males in the computer software industry who suddenly and significantly updated their profile questions shortly after November 23rd, 2011... would love to see some real data on how much action these users are getting!
What I hate about OKCupid is that all the girls I've met only wanted sex. I.e. we have sex, it's all fun, and then they leave saying to call them back whenever I feel like having fun again. It's just not serious.. I clearly checked "Long time relationship".