Show HN: Hacker News Simulator (ycombniator.com)
572 points by orf on Sept 20, 2015 | 163 comments



We got an email from someone asking why they were logged in as pg, so you've passed some sort of niche Turing test.


Notably, this is what happens most of the time when something from /r/SubredditSimulator gets popular and becomes intermixed with normal submissions on a subscriber's front page and /r/all.

The way to beat a Turing Test is through hiding in plain sight.


9gag automatically reposts reddit's /r/all onto their site. People on 9gag don't realize it's a bot and have one hell of a time trying to interpret the post. Examples:

1. https://www.reddit.com/r/SubredditSimMeta/comments/3fnkni/9g...

2. https://www.reddit.com/r/SubredditSimMeta/comments/3kwmnj/9g...



Shouldn't we worry that someone has been fooled by the ycombniator.com domain and lookalike design?


It seems pretty harmless.


I wouldn't exactly call this a look-alike...


And pg has 1337 karma! (Is that joke too old now?)


It is kinda lame. I prefer 31337 as it is a prime number, while 1337 is not.


I would say it is timeless.


It's as timeless as the concept of using Markov Chains to generate a fake Hacker News site.

Source: http://namuol.github.io/phaker-news/


I would have preferred 8675309 karma.


Leonardo da Vinci used to throw paint-filled sponges at walls and then force himself to make sense of the resulting irregular shapes in relation to a problem he was interested in. For example, he might be thinking about transportation and say "well, this looks like a horse drawing a carriage ..." and then use it as a basis to come up with new ideas. Forcing his brain to make connections between totally unrelated things enhanced his creativity (or at least that was the intention).

What comes to mind for this Hacker News Simulator is the modern equivalent of da Vinci's sponges: an ink blot which you can use to come up with completely new and novel ideas by forcing unexpected connections. And because the topics here are in some way related to hacker news the result could be filling in enough blanks to produce something newsworthy (i.e. an actual good idea instead of a garbled markov chain.)


For a while I ran a Markov chain blog poster that drew on my own notes and posts on a site I ran for a while. A phrase in one of those posts ("Only a matter of seeing simple data structures and designing lightweight tools that can cross the galaxy") actually led to a chain of association that has spawned a five-year research thread. This technique can actually work.


That's really interesting, hadn't heard that before. A big difference here though is that the software is making (the appearance of) very specific logical connections itself -- with the ink sponges, you're creating a vague image that is open to human interpretation.


Some of the software-generated "logical connections" are not quite logical though, and require a little massaging/tweaking (human interpretation) in order to make sense. This process can result in interesting new logical connections.


Is that da Vinci method a bit like interpreting I Ching hexagrams? Put some randomness in, add some constraints, and interpret it to your current context. Should be a similar creativity enhancer.


Kind of off topic, but I have been looking at I Ching hexagrams as design patterns for life. Each hexagram represents a state you can be in. You can transition from one state to any other state, but there are consequences. The consequences are described in the transitions of the moving lines.

To make the most sense of this, it helps to read Richard Wilhelm's translation and annotations of the I Ching. If you don't read German, then Cary Baynes's translation of Wilhelm's translation is pretty much the only one out there. Unfortunately it is still under copyright (ha ha... on a 5000-year-old document), but it is widely pirated on the web.

I often wonder if the random "predictions" of the I Ching were originally just a way of studying. There are 4096 different transitions to look at, and approaching them systematically would be painful in the extreme.

I've spent some time checking whether the transitions actually make sense and, from what I can tell, they do (modulo my ability to fool myself into seeing reason in things that have no actual reason). I wish I had enough patience to actually study it ;-)


Any LdV biography you could recommend?


"The Science of Leonardo" by Fritjof Capra is easy to read and a good intro. Martin Kemp's "Leonardo da Vinci" is a bit scattered but has great visuals. But most interesting is Leonardo's own "Notebooks."


This is perfect. I opened it in a new tab and went off to read another article. After coming back and opening multiple links I found myself thinking "where are all these posts with grammatical errors coming from" for a solid minute - until I noticed "pg" where my username should be!


I, too, did this. >.<

"Wait, how did Hacker News get opened twice?"

...

"Eh, I'll just close this one." (You can guess which one I closed. :P)

[One pile of middle-clicking later]

"Wait, ycombinator is defaulting to opening comments now?!"

I noticed the "ycombniator" shortly after that.

I wish those links were real :(

- http://news.ycombniator.com/comments/comments_198.html: "Git 2.0 changes push default to using only CSS3 - No more remote work work: An adventure in civic hacking (scienceblogs.com)"

- http://news.ycombniator.com/comments/comments_210.html: "Show HN: Farmly – find anyone for anything you want and keep the Olympic germs away. (nowthenapp.com)"

- http://news.ycombniator.com/comments/comments_208.html: "Twitter will pay for anything [Product] (skullsinthestars.com)"

- http://news.ycombniator.com/comments/comments_184.html: "Ask HN: Angry/hardcore rock music to code the summer away, stuck for ideas to practice/exercice a new UI designer/programmer (NYC)"

- http://news.ycombniator.com/comments/comments_190.html: "Is GTK+ the real reason people are startups using? (blog.cto.hiv)"

My responses ranged from "they're doing what to Linux now?" to "that one sounds cool..." to "that's true..." to "...of course I'm clicking that" to "hm... well they must be going for an Indian motif".

And somewhere in there I was like "wow, /r/titlegore meets HN."

I'm very dense though. This was really well-made! :D

I still wish that 4th link was real. Like, I have a different taste in music, but I'd so reply to that. lol


I did virtually the exact same thing - except my first reaction on seeing the fake HN was: "Why the heck am I logged in as pg? Security hole? This ought to be fun, let's see what kind of things he can do in here..."

Took me a while to figure out that this was actually a fake :-(


I had the same experience. LOL'd when I realized I'd clicked on 5-6 links that looked interesting that were actually auto-generated.


Imagine how it will feel at midnight after spending 15 min replying to ... whatever this is.

Hopefully you are real. :)


The titles suddenly got a lot better! Less grammatical errors. One of my favorites is "Hosted Continuous Integration Using Gradle, Android Studio And New York Times, Evernote, Gmail, and Quicksilver".

That is a CI solution I'd love to see :)


Another hilarious title: "Ask HN: Worst examples of really creative ways to combat late payments". This sounds like a real post on HN, and actually makes sense.

Or maybe "Swift Language Will Instantly Know Everything About the Origin of 'The World's Dumbest Idea': Milton Friedman".


*fewer grammatical errors (sorry)


Newsbot thanks you for proving that you are real.

You may return to your life simulation.


"Steve Jobs was right: Dropbox is not working on KVM/bare metal"


"I miss Google wave invites going for $26 a month" I forgot I was on the simulator and was interested in reading this.


I like this one "Why Android Will Soon Become Apple’s Most Important Thing I’ve Ever Done."


I think "Google to announce proof of Ikea's excellence" wins.


Was it in any way influenced by https://reddit.com/r/subredditsimulator?


Yep, it uses the same technique. I've pulled every comment and story from HN through the API and made a bunch of Markov chains to produce story titles and comments. The input corpus is a lot smaller than subredditsimulator's though, and the various subreddits have a large variation in the words used (meaning more funny comments), so it's nowhere near as good.

Perhaps a future enhancement might be to partition comments by the submission domain, i.e. use only comments on github.com stories for fake github submissions, which might generate very different text than a munge of every comment.
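For anyone curious what "a bunch of Markov chains" looks like in practice, here is a minimal word-level sketch. It is not the simulator's actual code (that had not been published at the time of this thread); the function names `build_chain` and `generate`, the order of 2, and the 20-word cap are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def build_chain(titles, order=2):
    """Map each run of `order` consecutive words to the words seen right after it."""
    chain = defaultdict(list)
    starts = []  # opening states, so generated titles begin the way real ones do
    for title in titles:
        words = title.split()
        if len(words) <= order:
            continue  # too short to contribute a transition
        starts.append(tuple(words[:order]))
        for i in range(len(words) - order):
            state = tuple(words[i:i + order])
            chain[state].append(words[i + order])
    return chain, starts


def generate(chain, starts, rng, max_words=20):
    """Random-walk the chain from a real opening until a dead end or the word cap."""
    state = rng.choice(starts)
    out = list(state)
    while len(out) < max_words:
        followers = chain.get(state)
        if not followers:
            break  # this state only ever appeared at the end of a title
        out.append(rng.choice(followers))
        state = tuple(out[-len(state):])
    return " ".join(out)
```

Partitioning by domain, as suggested above, would then just mean calling `build_chain` once per domain and picking the matching chain at generation time.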


You could also use the comments on reddit in /r/programming and /r/startups and other related reddits to help get more data for seeding your corpus.

Or if you want to get more complex, find the reddit comments for every link that was submitted to HN and use that (but you have to be careful that you use "hacker" related reddits or it will sound too "reddity")


Maybe aggregate over the subreddits listing (where HN links appear) and then whitelist some of them (after checking against a comment frequency count) like programming, linux etc.


I would love love love it if it could provide comments on a per-user basis. Use my comments for my name, use JohnDoe's comments for their name.

As well as keeping comments by submitted domain.


I made a similar thing a year or two ago, based off of another Markov Chain of HN headlines. I generalized it to allow you to mash up headlines from different news sources (Buzzfeed x Hacker News, for example). Still pretty funny, if anyone's interested: http://www.headlinesmasher.com/best/all


Scandal: Politician Goes to Work

LOL


These headlines are phenomenal.


I wish I could upvote this post multiple times.


Create some stories that support the... err... premise of those titles and I think you could automate Buzzfeed and such.


I lost it at "Nintendo steps into porn biz"


some of these could be Onion headlines, I love it.


Too funny for a Monday morning...


Oh wow, clearly my timing wasn't right...

https://news.ycombinator.com/item?id=9453454

Edit: To be fair, this version generates plausible comments and has subjectively funnier titles. :)


These post titles are really good!

Some of my favorites so far:

> Debunking Myths About Growth Hacking Goes Bad (infoq.com)

> First Firefox OS developer to come (businessmandi.com)

> Modern science and art go to jail? The law is dead in cinema (techcrunch.com)


Pussy Riot members jailed for posting photos to raise your hourly freelance rate (arabcrunch.com)

Hahahaha!


This is my new favourite!


This wins


> Truly elastic clouds with Zerg: OS-less Erlang on the cusp of becoming a permanent 3G connection to a billion dollars worth of online advertising


Mine are

> A Browser Package Manager Command Line Client Written in Lua

> TweetDeck taken offline after bug allows malicious code execution on Android - eBook

> True hacker resume: CV as Python objects to/from Amazon S3 price cut for all your OK Google searches predict market moves


My favorite is

> Show HN: nnmm – A feed of your print or paper books

Isn't this just a pile of books?


Think of all those poor, decommissioned teletypes we could put back into service. Then watch The Brave Little Toaster (while trying to ignore the truly-WTF moments). Then weep that this isn't a real future.


> Facebook bug report on using TrueCrypt safetly (yana.com)

> You don’t need any HTML theme/template into your murder trial (ft.com)


> Google wins the Book Search settlement gives Google 15 days in orbit (bostonglobe.com)

Google wins 15 days in orbit! Whee!

The comments are pretty great, too.

> Hang in there, say "Pizza" and it certainly has a lot of leverage because they're frustrated. Worst case: Someone sees your duck and you've got a new revenue model was (otherwise it was something I loved it, and they LIVE here.

Just hang in there, say "pizza", and make sure no one sees your duck.


> Mercurial Ate Our Breakfast [with Revsets], But We Can Fix Internal Communication Before It Breaks (staralliance.com)

> Ask HN: Optimal number of sets with high IQ users?

> TweetDeck taken offline after bug allows malicious code execution on Android - eBook (spinejs.com)

> Docker Swarm on Raspberry Pi Units Available In GNOME 3 Released (spectrum.ieee.org)


> Google's Grand Plan to Split Sentences (2014) [pdf] (techcrunch.com)


A lot of these are indistinguishable from regular HN babble:

> Protesting with a python port of ZFS on Linux

> Ten Rules for Web 2.0 sites and URL obfuscation

> From 0 to 8-figure revenue in spite of flat UI design and email

> How can I get asked how I thought we were wrong.

And the most HN one I've seen so far:

> Entrepreneur crowdsources decision to quit Silicon Valley


>Watching the Growth Is Harder Than It Sounds

>Small Utah ISP firm stands up to the Faces of Facebook popularity, I quit.

>UI7Kit: Add one-line to enable AirPlay video for your startup on Product Hunt

lol.


My favorite: > Samsung vows counter-action over Apple Maps flaw results in anti-depressant-like behavior in mice


> AWS to AWS APIs without a helmet (daringfireball.net)

I just ..


One really stood out for me:

> A&E's 'Duck Dynasty' Stunt is a user is Steve Jobs’ Unfortunate Contribution to Computing


Some of the golden comments on this one:

> You're trying to solve bugs or problems. >> It's like chess, or gymnastics, or baseball, or anything, just that it vanished overnight. > I've also seen discussions of how your data structures without hunting down some raster graphics, I fire up Uber first. > Your love of Pete, don't just repeat it with your keystrokes. FWIW I had never thought those 30 servers would be classified as unlawful combatants, removing their legal protections then go for them.


> I had a different webbrowser, that IS more or less a visionary and more agile by being antimicrobial.


> Why I design software, I want to want to live in it > Show HN: Solving the problem of what you read the Web > Think Apple Would Dare To Be Upset About Aaron Swartz's life


Or, 'How HN looks to non-hackers'.

Brilliant work. Truly monads in backport VM scandal-worthy.


I opened it in a background tab, lost track of it for a while, ended back there thinking it was the normal ycombinator and unwittingly spent a couple minutes thinking "wtf is up with HN today?"

Well played.

edit: lol, seems I'm not the only one.


In fact, I learned in a very silly thing to know. The risk is on controlling the hardware thats in there"Only when you get the impression that credit cards that my Time Warner unfamiliar with the goal these people as potential phishing as well as US government uses private business to sell your company in its own horn about being able to be great.


I kept looking at the URL and thinking: "How are they using the same domain?" I came back an hour later only to realize that the i and n are switched in ycombniator.

That was a real "smack my head" moment.


This reminds me of a Google exercise within their Python course, wherein it read in a text file and built dictionaries of every word and the words that followed said word in the text in order to create a prosaic style that mimed the author of the original text. It was quite interesting to run and throw a text file at it to see what it produced.

EDIT: I have a cached version of the exercise if anyone feels like looking at it:

https://github.com/AdmiralAsshat/learn_python/blob/master/go...

There should be an "alice.txt" in the same directory as a sample file to throw at it.



Sometimes the real HN reads like this for me. It's how I know I'm tired, and should go to bed immediately after reading one more story.


     10 I am too tired
     20 Just read one more story
     30 Goto 10


> Tell HN: my first dollar on the App Store due to a good review from Techcrunch, lessons learnt.

This is so real


Some of my favourites so far:

> Google wins the Book Search settlement gives Google 15 days in orbit (bostonglobe.com)

> Pussy Riot members jailed for posting photos to raise your hourly freelance rate (arabcrunch.com)

> Microsoft launches Bing Booster program for exploiting weakness in smart phone (blog.geeksphere.net)

> Tell HN: Rejected from App Store due to a charitable project, continue in 2012 (askgolang.com)


> Show HN : One click image optimization service for your terminal

This is perfect. Well done.


Markov chains are great fun

http://thedoomthatcametopuppet.tumblr.com/

HP Lovecraft + Puppet Documentation


Also, King James Programming [1].

[1] http://kingjamesprogramming.tumblr.com/


Absolutely fantastic, and probably will happen on Mars. Unfortunately, I can't read it. But Erlang is a human, but it's not unheard of, or even a real wood and lead pencil. A comment to mean what you mean how do we keep in mind when I can set a deadline, some basic programming with Scratch.


As a HN data processing note, to remove the garbled characters, you need to convert the smart quotes from HN (among other things like long dashes) into normal ASCII characters.

EDIT: Looks like the garbled characters were fixed.


Yeah thanks for pointing that out, I took the nuclear approach and just ran everything through unicodedata.normalize("NFC", ...) which seems to have done the trick.
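A small sketch of the smart-quote cleanup suggested above, for anyone processing HN text themselves. As far as I know, NFC normalization only composes characters and leaves curly quotes and dashes as they are, so this adds an explicit translation table on top of it. The table `PUNCTUATION_TO_ASCII` and the function name are made up for this example, and the list of characters is surely incomplete.

```python
import unicodedata

# Hypothetical mapping; extend it with whatever punctuation shows up in the corpus.
PUNCTUATION_TO_ASCII = str.maketrans({
    "\u2018": "'",    # left single quotation mark
    "\u2019": "'",    # right single quotation mark / apostrophe
    "\u201c": '"',    # left double quotation mark
    "\u201d": '"',    # right double quotation mark
    "\u2013": "-",    # en dash
    "\u2014": "--",   # em dash
    "\u2026": "...",  # horizontal ellipsis
})


def to_plain_ascii_punctuation(text):
    """Normalize to NFC, then swap typographic punctuation for ASCII lookalikes."""
    text = unicodedata.normalize("NFC", text)
    return text.translate(PUNCTUATION_TO_ASCII)
```

Letters with accents are left alone on purpose; only the punctuation listed in the table is rewritten.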


We're all markov chains when you get down to it.


Feels like HN when I am high


Oh, that's great. I have just spent five minutes trying to figure out why all these comments on HN just stopped making sense.


I had the exact opposite experience.


McAfee thinks this is porn. Cute.


> I, being born a woman was violently beaten and robbed in a project/problem.

That's a weird auto-generated comment[1]. I wonder how much of it is random selection and how much is seeded text. What were the units that combined to form this?

[1] From the "Ask HN: Pure client-side PadMapper would be great as jobs? What a senior Rails dev?" post.


Love this comment from the simulator:

> We had an inkling that something that someone is logged in to a traditional incandescent bulb


Lol has anyone really been far as decided to use even go want to do look more like?


You've got to be kidding me. I've been further even more decided to use even go need to do look more as anyone can. Can you really be far even as decided half as much to use go wish for that? My guess is that when one really been far even as decided once to use even go want, it is then that he has really been far even as decided to use even go want to do look more like. It's just common sense.


My favourite: "Myth: systemd is unstable and insecure people who outsource have no future. "


Nice work.

FYI, clicking on the "<#> Comments" link from within a comments page leads you to a 404.


LOL, everything sounds like a t-shirt from Japan.


It even uses real usernames for comments. I've apparently commented on "Watch Morley Safer Lie in Tech is Not a Single Blog Post" :P , and I've seen posts by patio11 (and referencing patio11 too!)

http://news.ycombniator.com/comments/comments_286.html#

And some of the Ask HN actually make sense too :) , I mean "Ask HN: Are there any good MOOCS/Online resources for learning TDD?"

But others are almost but not quite there: "I'm an 18 year-old front end GUI development?".


Have you open sourced the code on Github? Would love to play around with this.


Yeah, I'll be writing a blog post about it and release the code then :)


Also, is this static, or will it continue to generate new posts?


I tweeted a link to this, and it's already fooling people. Well done!


I absent-mindedly opened this in a new tab along with a few other Hacker News stories. Was very confused for a few minutes. Good job!


Quick note: You are not setting the page (tab) title with the <title> element. Right now it says "The Death of the Party" for every story.


Thanks, fixed


"From 0 to 8-figure revenue in spite of flat UI design and email (gizmodo.com)" My coworkers are looking at me like I've lost it. Oh my...


Some of my favorites:

>I miss Google wave invites going for $26 a month (srikarg.github.io)

>Ask HN: Is there a good, standard capped convertible note paperwork?

>NSA leaks: David Cameron cracks down on Apache Quitting JCP: 'Oracle Is the Fear of Macros (techcrunch.com)

>RSS.gd: the RSS icon was mistaken for the end of the Union at CoreOS Fest 2015 – Call for New Startup BitcoinDeals is Launching its Own URLs? (technokyle.com)


One thing I noticed is that comment lengths are almost all the same - I think some randomisation would be good.


Good idea, they are supposed to be between 50 and 150 characters, but that needs adjusting I think.


Nice. Was I the only one who took a minute to find the minute difference in the spelling of the hostname?


> Steve Jobs came within $5k of going to have cracked unsolved Zodiac Killer cipher

I wonder why he stopped then?


He didn't hit his kickstarter target, obviously


I kind of feel bad I discovered what was going on in like 20 seconds.

I had increased the zoom on this site (the original font size is just too small for me), so when I opened this link and noticed that the zoom had been reset I immediately became suspicious.


I'm confused. What is this?


An automated parody of HN, OP explains above: https://news.ycombinator.com/item?id=10248803

> I've pulled every comment and story from HN through the API and make a bunch of Markov chains to produce story titles and comments.


Does that mean that there is no deep learning here, only statistics and randomization?


> Does that mean that there is no deep learning here, only statistics and randomization?

Correct. It is a perfect simulation of HN.


Normally I don't like comments that are just jokes, but this one was too perfect. Well done.

And to add just a slight bit more substance to my comment, while I was reading through all the comments here my wife asked why I was laughing so hard. I found it really difficult to convey why, but I guess that's the nature of this type of humor.


Markov chains do learn in the sense that they model distributions of strings, can be trained on observed strings and used to generate strings, assign probabilities to strings, classify strings, etc. They have well developed treatments in multiple frameworks of computational learning theory, including Gold learnability and PAC learnability.

Is "deep learning" more than statistics and randomization?


Nothing wrong with statistics and randomization. :)


There isn't, but I don't think it should be difficult to feed this into 'char-rnn' if you wanted to do it with RNNs rather than Markov chains. The interface, such as it is, to char-rnn is very simple; you dump everything into a text file 'input.txt'.


I'm actually running that right now with HN comments. It's not done training, but it's not that much more interesting than OP's. Here's some example output:

{"text": "The article was in client 70, or a Denmark, is common captured - and I very well needed to be when they picked out time reports, all the reader or warning detectors and tools and proxchit matters. It also comes up with all levels of me using legality as it is that.<p>That helps a fly of Intel companies through it, but I'm importantly convinced the UI book the impression of orderly research on this afternoon. Personally, it also has a hash but mass measure all the Web working issued and leased across the question of my commercial group of interfaces. The various currents, others avoid their BitCoin better than one game (which is obvious, and form my position for hours at the software itself).", "author": "nostrademons"}

{"text": "Neither care to censor other people (\"infrastructure\" type by development! Neuroscience, flying migrations.)<p>Relevant comment was not finished at the moment. If that appears to be the case that, but ones are a real body.", "author": "pavel_liah"}

{"text": "<i>But just surely this should simply escruble him critically though I laudf. </i><p>It's taken as a more extreme shark to manage hcpm-infolves. However, I'm great, laser mortality, one of the aight payment.", "author": "jacquesm"}

{"text": "FtAhn is not all for violenceral campuse. Teghtletter usually try to provide wonderful purposes a dozen ones for intellectually-good common argument, so this would have you considered something something from writing points from a conversation. It's such a good idea and prohibition, disappeared. Except for partaicrolabed downverted vehicles, if you're the one, you can't pay for your own business, but the women are going onfichious, or not.<p>Edit: the processor thinks without searching stuff. <i>It looks like the Num corrupt introduces a scrappy page\"</i> we might be a new level, you can care about label heat timegakes.<p>&#62; Two individuals. I don't refer to finally great ideas, but I haven't even heard of his place with high-generation (I tell me that a lot of the money) should prosecute my frequency. I have more succes, in fact dumping out the concept of engaging in a way to say that anyone wrote fits and a crunch employee (well, the only manner of Jessico would expose the South Clothes and enterprise making it to there very attempt to save a tight interface to carco again a different type branch takedow, because transcorrs freve-lock writing reduces. Grannian raises <i>the major</i> responsibility.)<p>(Nope! Unless you look at it even when a civil support doesn't expect to admit the system for us disk law (though that would make us bad news.)", "author": "sp332"}

{"text": "Sigh HTML5 of the extradition to NELOANAG may changed.", "author": "davidw"}

Or if I sample with low temperature:

{"text": "I don't know what I want to do in the sense that I was a problem with the same problem with the same problem as a comment on the side of the site. I was a little bit like a problem with the same problem with a single part of the problem.<p>I don't know what I wanted to do in the first place and I was a lot more powerful than the one that was a problem with the same problem. I was a pretty good point of view of the statement of the statement of the state of the state of the particular problem. I would have to say that the problem is that the problem is that the problem is that the problem is that the problem is that the problem is that the problem is that the problem is that the problem is that the problem is that the problem is that the problem is that the problem is that the problem is that the problem is that the problem is the same as a single person who would have to pay for the same problem as a program that is a problem that is a problem with the same problem. ... <then it just keeps going like that>

I'm hoping to learn individual user's styles.
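For readers wondering why the low-temperature sample loops on "the problem is that the problem is": temperature divides the model's scores before they are turned into probabilities, so a small value pushes nearly all the mass onto the single most likely next token. The sketch below is a generic illustration in plain Python, not char-rnn's own code (which is Lua/Torch); the function names and the example scores are assumptions.

```python
import math
import random


def softmax_with_temperature(logits, temperature):
    """Turn unnormalised scores into probabilities after dividing by `temperature`."""
    scaled = [score / temperature for score in logits]
    peak = max(scaled)  # subtract the max so exp() cannot overflow
    weights = [math.exp(s - peak) for s in scaled]
    total = sum(weights)
    return [w / total for w in weights]


def sample_with_temperature(logits, temperature, rng):
    """Draw one index; low temperature approaches always picking the top score."""
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]
```

With scores like `[2.0, 1.0, 0.0]`, a temperature of 0.05 makes the first option win almost every draw, while a temperature of 10 makes the three options nearly equally likely, which is roughly the difference between the repetitive sample and the garbled ones above.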


> I'm hoping to learn individual user's styles.

You'll probably have to train separate char-rnn instances, unfortunately. For the past week or two, I've been experimenting with putting in a metadata prefix which I can use as a seed to specify author/style, but thus far it hasn't worked at all. The char-rnn just spits out a sort of average text and doesn't mimic specific styles.


Yup. That's been my finding as well. char-rnn was really just a diversion of curiosity after I'd cleaned up the data. My best idea right now is to make a generative model of p(next_token | previous_token(s), author), essentially connecting author directly to every observation. I'm mostly sure that using characters as tokens is overkill for this and requires a higher complexity model than I can afford with this dataset and my computational resources, so I'm going to stop using char-rnn with it.
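To make p(next_token | previous_token, author) concrete, here is the simplest thing that has that shape: a count table keyed on (author, previous word). It is only a stand-in for whatever model the commenter ends up building; the whitespace tokenization, the function names, and the toy data in the usage note are all assumptions.

```python
from collections import Counter, defaultdict


def train(comments):
    """comments: iterable of (author, text). Counts next words per (author, previous word)."""
    counts = defaultdict(Counter)
    for author, text in comments:
        tokens = text.split()
        for prev, nxt in zip(tokens, tokens[1:]):
            counts[(author, prev)][nxt] += 1
    return counts


def next_token_probs(counts, author, prev):
    """Estimate p(next | prev, author) from raw counts; empty dict if never observed."""
    seen = counts.get((author, prev))
    if not seen:
        return {}
    total = sum(seen.values())
    return {token: n / total for token, n in seen.items()}
```

Trained on `("alice", "I like Lisp")`, `("alice", "I like Erlang")` and `("bob", "I like Rust")`, the same previous word "like" gives different distributions for alice and bob. A real version would need smoothing or backoff to an author-independent table, since most (author, word) pairs will be rare.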


That's possible. My hope was that you could get authorial style by just including it inline as metadata rather than needing to hardwire it into the architecture (e.g. take 5 hidden nodes and have them specify an ID for the author so the RNN can't possibly forget). It would have been so convenient and made char-rnn much more useful, but I guess it's turning out to be too convenient to be true.


I see. Cool. I'm building an HN clone now. Hierarchical comments are a fun challenge. Any tips?


Postgres recursive queries make them pretty simple to deal with. This[1] is the query I used to pull a list of all of a story's children recursively.

1. https://gist.github.com/orf/5565a572c6ddda039d6f
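For those who have not used recursive queries before, here is a rough, self-contained sketch of the idea. It is not the query from the gist: it runs against an in-memory SQLite database so it can be tried without a server (the `WITH RECURSIVE` syntax is much the same in Postgres), and the `items` table, its columns, and the sample rows are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, parent_id INTEGER, body TEXT)")
conn.executemany(
    "INSERT INTO items VALUES (?, ?, ?)",
    [
        (1, None, "story"),
        (2, 1, "first comment"),
        (3, 2, "reply"),
        (4, 1, "second comment"),
        (5, None, "another story"),
    ],
)

# Start from one story, then keep joining in rows whose parent is already in the result.
DESCENDANTS = """
WITH RECURSIVE thread(id, parent_id, body, depth) AS (
    SELECT id, parent_id, body, 0 FROM items WHERE id = ?
    UNION ALL
    SELECT i.id, i.parent_id, i.body, t.depth + 1
    FROM items AS i JOIN thread AS t ON i.parent_id = t.id
)
SELECT id, depth FROM thread ORDER BY id
"""


def descendants(conn, story_id):
    """Return (id, depth) for the story and every comment beneath it."""
    return conn.execute(DESCENDANTS, (story_id,)).fetchall()
```

The `depth` column comes for free from the recursion and is handy for indenting comments when rendering.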


Hmmm, I'm trying to decide between this recursive query or something using ltrees.

Can't make up my mind . . .

[1] https://truongtx.me/2014/02/28/tree-structure-query-with-pos...

[2] http://stackoverflow.com/questions/603894/is-postgresqls-ltr...


Yeah. If you store them in a relational database you will grow a couple of grey hairs because you're essentially forcing a tree structure into a table structure. The concepts clash and it's kind of a pain, but possible since every post has 1 unique parent, so you can make upwards references and rebuild a tree from that.


A great technique I've used for many years to store hierarchical data is the Nested Set Model (https://en.wikipedia.org/wiki/Nested_set_model).

It works a treat as you can query the whole tree in one SQL statement but preserve the nesting for formatting.
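A small sketch of the "whole tree in one SQL statement" part, assuming a hypothetical `comments` table with `lft`/`rgt` columns and hand-assigned bounds. It uses SQLite via Python so it runs anywhere; the query is the textbook nested-set pattern rather than anything specific to the commenter's setup.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE comments (id INTEGER PRIMARY KEY, body TEXT, lft INTEGER, rgt INTEGER)")
# Tree: 1 has children 2 and 4; 2 has child 3. Each node's bounds enclose its descendants.
conn.executemany(
    "INSERT INTO comments VALUES (?, ?, ?, ?)",
    [
        (1, "story", 1, 8),
        (2, "first comment", 2, 5),
        (3, "reply", 3, 4),
        (4, "second comment", 6, 7),
    ],
)

# A node is inside the subtree if its lft falls within the root's bounds;
# its depth is the number of enclosing nodes inside that same subtree, minus itself.
SUBTREE = """
SELECT node.id, COUNT(ancestor.id) - 1 AS depth
FROM comments AS root
JOIN comments AS node ON node.lft BETWEEN root.lft AND root.rgt
JOIN comments AS ancestor
  ON node.lft BETWEEN ancestor.lft AND ancestor.rgt
 AND ancestor.lft BETWEEN root.lft AND root.rgt
WHERE root.id = ?
GROUP BY node.id
ORDER BY node.lft
"""


def subtree(conn, root_id):
    """Return (id, depth) rows in display order for the subtree under root_id."""
    return conn.execute(SUBTREE, (root_id,)).fetchall()
```

Ordering by `lft` yields the rows in display order with no recursion at read time; the cost is that inserting a comment means renumbering the bounds of everything to its right.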


Thank you sir for pointing that out. This looks very interesting and I might even implement that in my own blog at some point in time. I thought about doing something similar to that (without actually knowing this technique) but I shied away precisely because of the price you pay at insertion time.

EDIT: I've been thinking some more about this. Another possibility would be to limit the depth of the tree to, say, 8 (which should be reasonable) and then make 8 fields, one for each ancestor (parent, grandparent, and so on). Changing the tree will become a nightmare but all queries for subtrees will be blazingly fast.


Since individual comment threads are never that big, just store the root parent ID and query on that. Then reconstruct in code.
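A sketch of the "reconstruct in code" half, assuming the query on the root ID has already returned flat `(id, parent_id, body)` rows. The row shape and the name `build_thread` are assumptions for illustration.

```python
from collections import defaultdict


def build_thread(rows, root_id):
    """rows: iterable of (id, parent_id, body) for one thread; returns nested dicts."""
    children = defaultdict(list)  # parent id -> child ids, in the order the rows arrived
    bodies = {}
    for comment_id, parent_id, body in rows:
        bodies[comment_id] = body
        children[parent_id].append(comment_id)

    def node(comment_id):
        return {
            "id": comment_id,
            "body": bodies[comment_id],
            "children": [node(child) for child in children[comment_id]],
        }

    return node(root_id)
```

This is one pass over the rows plus one pass to nest them, so for threads of a few hundred comments the database side stays a plain indexed lookup.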


Yeah that's how I currently implemented it on my site, but it can't harm to overthink the solution of performance problems I don't (yet?) have ;)


You could probably solve more nonexistent problems with caching than by limiting thread depth.


To clarify - more users are going to read and refresh pages than actually post, so making certain not every GET request results in a new database query would probably improve performance more than trying to limit the number of rows in each query.

Query performance obviously matters, but with an HN-like site, it's probably not going to be so critical that limiting the depth of threads is even worth the effort.


The "closure table" approach works best if you have frequent reads and writes, see e.g. https://coderwall.com/p/lixing/closure-tables-for-browsing-t...


A simple method (a modified adjacency list) I've used just stores the root id, parent id and id of each post together. You can get the entire tree from any root post easily (everything shares the same root id) but getting the whole subtree beyond immediate children takes recursion.

I find that you don't even have to worry about treating the data as a tree in most cases until the very end. What you want to actually deal with is a flat array with the ids (root,parent,id) arranged in rendering order, and to have the tree built in the HTML. The data set from the DB doesn't even have to represent the tree structure directly, as long as you can sort it elsewhere.

You can even have two arrays - one (say, an associative array) with the data, and another basic array with the ids. Sort just the array with the ids, then use those as keys to iterate the data array when building the html, so you can avoid ever having to sort the larger array (which as luck has it just happens to be optimized for non-linear access anyway.)

I should probably mention, I thought this was terribly clever when I did it in PHP, before I was aware that all arrays in PHP are basically the same, so it was mostly pointless overoptimization.

Building something like an unordered list in HTML from that array then becomes a matter of adding or removing <UL> elements based on the relative change in depth for each subsequent id. Depth is easy to find by checking if an item's parent is (or isn't) the same as the id of the previous element. The actual tree never exists in code until the unordered list is rendered in the browser.
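
A sketch of that rendering loop in Python, assuming the items arrive as `(id, parent_id, text)` tuples already in rendering order. One detail the description above glosses over: when the parent is not the previous item, you need to remember which ancestors are still open to know how far to climb back, so this version keeps a small stack of ids.

```python
from html import escape


def render_thread(items):
    """Render (id, parent_id, text) tuples, in rendering order, as nested <ul>s."""
    out = ["<ul>"]
    ancestors = []  # ids of the currently open <li> elements, outermost first
    for post_id, parent_id, text in items:
        if ancestors and parent_id == ancestors[-1]:
            # Parent is the previous item: depth went up by one, open a list.
            out.append("<ul>")
        elif ancestors:
            # Same depth or shallower: close the previous item, then one
            # </ul></li> pair per level we climb back up.
            ancestors.pop()
            out.append("</li>")
            while ancestors and ancestors[-1] != parent_id:
                ancestors.pop()
                out.append("</ul></li>")
        ancestors.append(post_id)
        out.append("<li>" + escape(text))
    # Close whatever is still open at the end of the thread.
    if ancestors:
        out.append("</li>" + "</ul></li>" * (len(ancestors) - 1))
    out.append("</ul>")
    return "".join(out)
```

No tree object is ever built; the nesting only exists in the emitted markup.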

Also here's a good reference on Stack Overflow of different methods to do the same thing: https://stackoverflow.com/questions/2175882/how-to-represent...

If you actually know what you're doing beforehand, probably ignore everything I just said and go with nested sets. My method is, admittedly, naive and better programmers are probably chuckling at it over the beverage of their choice, but it does work and it seems to be decently fast.


Hmmm, toooo many options. How do I decide between them? I'm sort of a DB noob.


Well that was pretty awesome. Small suggestion. Fix the times on the comment threads to make it even more realistic. Threaded conversations have people starting a thread 11 minutes ago but getting replies 35 minutes ago. Busted :D


It would seem the texts were all written by Markov Chainy, which is why they are grammatically wrong. I would have preferred a grammar-oriented text generator. It's still too easy to tell the one simulation from the other.


What's impressive is that when I see a typically hacker newish headline or comment I now find myself checking to see who I'm logged in as to make sure I'm on the real hacker news.


yes I was having to read the real HN much more carefully right after I played with the simulator


favorite comments so far:

"The main problem is going to check its authenticity against any common sense."

"I use in your TOS, you say you're better than a bad example: a painting as an extreme advantage"


This is a solution in search of a problem. But nice anyway!


Oh man, I almost did this a while ago. I even registered http://ycornbinator.com

Very convincing fake.


I used to have yccombinator.com (interestingly it looks like somebody else picked it up after I let it expire) after noticing that there are quite a few places on the net that think that is the right name.


Pretty good. It struggles a bit with some punctuation, inserting spaces after "." in urls and not having space after "?" in sentences.


It seems your version has a higher average score per post than OG HN. Might want to apply a 0.75-0.85 multiplier to each post, as a crude estimate.


Out of curiosity, what is the status bar to the left of every comment for? Is it some sort of admin tool available to pg's account?


I think it might be a relevance or score of some kind.

The reason it might look weird (like a strange progress bar or something) is because it's an img link to "s.gif", but s.gif is actually a 404.

The td cell (yup it's a table structure) the img link is in is named "ind".

No idea what these two things put together mean.

EDIT: Then I noticed a thing between "Hacker News" and "new" at the top. s.gif yet again.

> "It's a spacer gif...! The img tag has an explicit width= set....

Wow, that takes me back a bit, and I'm not even a real webdev or anything (and I don't mind tables to boot). Blinks


Am I the only person who removed the `news.` from news.ycombniator.com to see if there was a parody of Y Combinator's main site?


Tell HN: my first dollar on the App Store due to a good review from Techcrunch, lessons learnt. (spreadsheets.google.com)


Maybe some sort of "this is not the real hn" and "we are not endorsed by ycombinator" could be good.


If it ever needs that, then that's serious grounds for worry.


I didn't understand what this is about.


I think it's just the Hacker News layout filled with randomly generated content. I don't get the point though. Maybe playing with some text generation algorithm?


All the content is generated from every post/comment on Hacker News, hence the somewhat plausible post titles (if you squint a bit). There isn't much point to it, other than I found it quite funny :)
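
For anyone wondering what "created based off" existing text can mean in practice: other comments in this thread assume a Markov chain, so here is a minimal word-level sketch of that technique in Python. It is an illustration under that assumption, not the simulator's actual code, and the function names are hypothetical.

```python
import random
from collections import defaultdict


def build_chain(titles):
    """Map each word to the words seen right after it (None marks the end)."""
    chain = defaultdict(list)
    for title in titles:
        words = title.split()
        for current, following in zip(words, words[1:] + [None]):
            chain[current].append(following)
    return chain


def generate(chain, start, rng, max_words=12):
    """Walk the chain from `start`, picking a random observed successor each step."""
    words = [start]
    while len(words) < max_words:
        options = chain.get(words[-1])
        if not options:
            break  # word never seen in the training text
        following = rng.choice(options)
        if following is None:
            break  # reached a point where a real title ended
        words.append(following)
    return " ".join(words)
```

Calling `generate(build_chain(real_titles), "Show", random.Random())` then stitches together word pairs that each occurred somewhere in the input, which is why the output is locally plausible but globally nonsense.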


For some reason, this site is blocked by the proxy at work, with "pornography" as the category.


Browsing this creeped me out in a weird, "is there a glitch in the matrix?" way.


This reminds me of reading in a dream: nonsense sentences that sound familiar.


Very cool!

> Entrepreneur crowdsources decision to quit Silicon Valley

Perfect :)


Damn, I thought I was having a stroke.


Why did YC not buy ycombniator.com too? Isn't that a phishing risk - like www.gmail.com and www.gmai.com?


Exactly 1337, eh?


sometimes all news looks like this to me



