Notably, this is what happens most of the time when something from /r/SubredditSimulator gets popular and becomes intermixed with normal submissions on a subscriber's front page and /r/all.
The way to beat a Turing Test is through hiding in plain sight.
9gag automatically reposts reddit's /r/all onto their site. People on 9gag don't realize it's a bot and have one hell of a time trying to interpret the post. Examples:
Leonardo da Vinci used to throw paint-filled sponges at walls and then force himself to make sense of the resulting irregular shapes in relation to a problem he was interested in. For example: he might be thinking about transportation and say "well, this looks like a horse drawing a carriage..." and then use it as a basis to come up with new ideas. By forcing his brain to make connections between totally unrelated things, he enhanced his creativity (or at least that was the intention).
What comes to mind for this Hacker News Simulator is the modern equivalent of da Vinci's sponges: an ink blot you can use to come up with completely new and novel ideas by forcing unexpected connections. And because the topics here are in some way related to Hacker News, the result could fill in enough blanks to produce something newsworthy (i.e. an actual good idea instead of a garbled Markov chain).
For a while I ran a Markov chain blog poster that drew on my own notes and posts from a site I used to run. A phrase in one of those posts ("Only a matter of seeing simple data structures and designing lightweight tools that can cross the galaxy") actually led to a chain of association that has spawned a five-year research thread. This technique can actually work.
That's really interesting, hadn't heard that before. A big difference here though is that the software is making (the appearance of) very specific logical connections itself -- with the ink sponges, you're creating a vague image that is open to human interpretation.
Some of the software-generated "logical connections" are not quite logical though, and require a little massaging/tweaking (human interpretation) in order to make sense. This process can result in interesting new logical connections.
Is that da Vinci method a bit like interpreting I Ching hexagrams? Put some randomness in, add some constraints, and interpret it to your current context. Should be a similar creativity enhancer.
Kind of off topic, but I have been looking at I Ching hexagrams as design patterns for life. Each hexagram represents a state you can be in. You can transition from one state to any other state, but there are consequences. The consequences are described in the transitions of the moving lines.
To make the most sense of this, it helps to read Richard Wilhelm's translation and annotations of the I Ching. If you don't read German, then Cary Baynes's translation of Wilhelm's translation is pretty much the only one out there. Unfortunately it's still under copyright (ha ha... on a 5000-year-old document), but it is widely pirated on the web.
I often wonder if the random "predictions" of the I Ching were originally just a way of studying. There are 4096 different transitions to look at, and approaching them systematically would be painful in the extreme.
I've spent some time checking whether the transitions actually make sense, and from what I can tell, they do (modulo my ability to fool myself into seeing reason in things that have no actual reason). I wish I had enough patience to actually study it ;-)
"The Science of Leonardo" by Fritjof Capra is easy to read and a good intro. Martin Kemp's "Leonardo da Vinci" is a bit scattered but has great visuals. But most interesting is Leonardo's own "Notebooks."
This is perfect. I opened it in a new tab and went off to read another article. After coming back and opening multiple links, I found myself thinking "where are all these posts with grammatical errors coming from" for a solid minute, until I noticed "pg" where my username should be!
My responses ranged from "they're doing what to Linux now?" to "that one sounds cool..." to "that's true..." to "...of course I'm clicking that" to "hm... well they must be going for an Indian motif".
And somewhere in there I was like "wow, /r/titlegore meets HN."
I'm very dense though. This was really well-made! :D
I still wish that 4th link was real. Like, I have a different taste in music, but I'd so reply to that. lol
I did virtually the exact same thing - except my first reaction on seeing the fake HN was: "Why the heck am I logged in as pg? Security hole? This ought to be fun, let's see what kind of things he can do in here..."
Took me a while to figure out that this was actually a fake :-(
The titles suddenly got a lot better! Fewer grammatical errors.
One of my favorites is "Hosted Continuous Integration Using Gradle, Android Studio And New York Times, Evernote, Gmail, and Quicksilver".
Another hilarious title: "Ask HN: Worst examples of really creative ways to combat late payments".
This sounds like a real post on HN, and actually makes sense.
Or maybe "Swift Language Will Instantly Know Everything About the Origin of 'The World's Dumbest Idea': Milton Friedman".
Yep, it uses the same technique. I pulled every comment and story from HN through the API and made a bunch of Markov chains to produce story titles and comments. The input corpus is a lot smaller than subredditsimulator's, though, and the various subreddits there have a large variation in the words used (meaning more funny comments), so it's nowhere near as good.
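Roughly, the chain building and sampling looks like this (a minimal sketch of the technique, not the site's actual code; the tiny corpus here is illustrative):

    import random
    from collections import defaultdict

    ORDER = 2  # words of context per Markov state

    def build_chain(titles):
        # Map each ORDER-word state to the list of words observed after it.
        chain = defaultdict(list)
        for title in titles:
            words = ["^"] * ORDER + title.split() + ["$"]
            for i in range(len(words) - ORDER):
                chain[tuple(words[i:i + ORDER])].append(words[i + ORDER])
        return chain

    def generate_title(chain, max_words=15):
        state, out = ("^",) * ORDER, []
        while len(out) < max_words:
            word = random.choice(chain[state])
            if word == "$":  # end-of-title marker
                break
            out.append(word)
            state = state[1:] + (word,)
        return " ".join(out)

    titles = ["Show HN: My weekend project", "Ask HN: My weekend reading list"]
    print(generate_title(build_chain(titles)))

The same kind of chain, fed comment text instead of titles, produces the comments.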
Perhaps a future enhancement might be to partition comments by submission ___domain, i.e. use only comments from github.com stories for fake github submissions; that might generate very different text than a munge of every comment.
You could also use the comments on reddit in /r/programming and /r/startups and other related subreddits to help get more data for seeding your corpus.
Or if you want to get more complex, find the reddit comments for every link that was submitted to HN and use those (but you have to be careful to use "hacker"-related subreddits or it will sound too "reddity").
Maybe aggregate over the listing of subreddits where HN links appear, and then whitelist some of them (after checking against a comment frequency count), like programming, linux, etc.
I made a similar thing a year or two ago, based on another Markov chain of HN headlines. I generalized it to let you mash up headlines from different news sources (Buzzfeed x Hacker News, for example). Still pretty funny, if anyone's interested: http://www.headlinesmasher.com/best/all
Think of all those poor, decommissioned teletypes we could put back into service. Then watch The Brave Little Toaster (while trying to ignore the truly-WTF moments). Then weep that this isn't a real future.
> Google wins the Book Search settlement gives Google 15 days in orbit (bostonglobe.com)
Google wins 15 days in orbit! Whee!
The comments are pretty great, too.
> Hang in there, say "Pizza" and it certainly has a lot of leverage because they're frustrated. Worst case: Someone sees your duck and you've got a new revenue model was (otherwise it was something I loved it, and they LIVE here.
Just hang in there, say "pizza", and make sure no one sees your duck.
> You're trying to solve bugs or problems.
>> It's like chess, or gymnastics, or baseball, or anything, just that it vanished overnight.
> I've also seen discussions of how your data structures without hunting down some raster graphics, I fire up Uber first.
> Your love of Pete, don't just repeat it with your keystrokes. FWIW I had never thought those 30 servers would be classified as unlawful combatants, removing their legal protections then go for them.
> Why I design software, I want to want to live in it
> Show HN: Solving the problem of what you read the Web
> Think Apple Would Dare To Be Upset About Aaron Swartz's life
I opened it in a background tab, lost track of it for a while, ended up back there thinking it was the normal ycombinator, and unwittingly spent a couple minutes thinking "wtf is up with HN today?"
In fact, I learned in a very silly thing to know. The risk is on controlling the hardware thats in there"Only when you get the impression that credit cards that my Time Warner unfamiliar with the goal these people as potential phishing as well as US government uses private business to sell your company in its own horn about being able to be great.
I kept looking at the URL and thinking: "How are they using the same ___domain?" I came back an hour later only to realize that the i and n are switched in ycombniator.
This reminds me of an exercise in Google's Python course, wherein it read in a text file and built a dictionary of every word and the words that followed that word, in order to generate prose that mimicked the style of the original author. It was quite interesting to run it against a text file and see what it produced.
EDIT: I have a cached version of the exercise if anyone feels like looking at it:
Absolutely fantastic, and probably will happen on Mars. Unfortunately, I can't read it. But Erlang is a human, but it's not unheard of, or even a real wood and lead pencil. A comment to mean what you mean how do we keep in mind when I can set a deadline, some basic programming with Scratch.
As an HN data-processing note: to remove the garbled characters, you need to convert the smart quotes from HN (among other things, like long dashes) into normal ASCII characters.
EDIT: Looks like the garbled characters were fixed.
Yeah, thanks for pointing that out. I took the nuclear approach and just ran everything through unicodedata.normalize("NFC", ...), which seems to have done the trick.
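For anyone doing similar cleanup: normalization fixes decomposed or mis-composed characters, but no normalization form turns curly quotes into ASCII, so an explicit translation table is one way to handle those (a sketch; the mapping is illustrative, not exhaustive):

    SMART_TO_ASCII = str.maketrans({
        "\u2018": "'",    # left single quote
        "\u2019": "'",    # right single quote
        "\u201c": '"',    # left double quote
        "\u201d": '"',    # right double quote
        "\u2013": "-",    # en dash
        "\u2014": "-",    # em dash
        "\u2026": "...",  # ellipsis
    })

    def to_ascii(text: str) -> str:
        return text.translate(SMART_TO_ASCII)

    print(to_ascii("\u201cIt\u2019s fine\u201d \u2013 really\u2026"))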
> I, being born a woman was violently beaten and robbed in a project/problem.
That's a weird auto-generated comment[1]. I wonder how much of it is random selection and how much is seeded text. What were the units that combined to form this?
[1] "Ask HN: Pure client-side PadMapper would be great as jobs? What a senior Rails dev?" post).
You've got to be kidding me. I've been further even more decided to use even go need to do look more as anyone can. Can you really be far even as decided half as much to use go wish for that? My guess is that when one really been far even as decided once to use even go want, it is then that he has really been far even as decided to use even go want to do look more like. It's just common sense.
It even uses real usernames for comments. I've apparently commented on "Watch Morley Safer Lie in Tech is Not a Single Blog Post" :P, and I've seen posts by patio11 (and referencing patio11 too!)
>I miss Google wave invites going for $26 a month (srikarg.github.io)
>Ask HN: Is there a good, standard capped convertible note paperwork?
>NSA leaks: David Cameron cracks down on Apache Quitting JCP: 'Oracle Is the Fear of Macros (techcrunch.com)
>RSS.gd: the RSS icon was mistaken for the end of the Union at CoreOS Fest 2015 – Call for New Startup BitcoinDeals is Launching its Own URLs? (technokyle.com)
I kind of feel bad I discovered what was going on in like 20 seconds.
I had increased the zoom on this site (the original font size is just too small for me), so when I opened this link and noticed that the zoom was reset, I immediately became suspicious.
Normally I don't like comments that are just jokes, but this one was too perfect. Well done.
And to add just a slight bit more substance to my comment, while I was reading through all the comments here my wife asked why I was laughing so hard. I found it really difficult to convey why, but I guess that's the nature of this type of humor.
Markov chains do learn in the sense that they model distributions of strings, can be trained on observed strings and used to generate strings, assign probabilities to strings, classify strings, etc. They have well developed treatments in multiple frameworks of computational learning theory, including Gold learnability and PAC learnability.
Is "deep learning" more than statistics and randomization?
There isn't, but I don't think it should be difficult to feed this into 'char-rnn' if you wanted to do it with RNNs rather than Markov chains. The interface, such as it is, to char-rnn is very simple; you dump everything into a text file 'input.txt'.
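Data prep for char-rnn is basically just concatenation (a rough sketch, assuming the dump is stored one JSON object per line with a "text" field, as in the samples further down; the filename is made up):

    import json

    # e.g. each line: {"text": "...", "author": "..."}
    with open("hn_comments.jsonl") as src, open("input.txt", "w") as dst:
        for line in src:
            comment = json.loads(line)
            # HN API text uses <p> as a paragraph separator; other tags are left as-is
            dst.write(comment["text"].replace("<p>", "\n\n") + "\n\n")

char-rnn then trains on the raw characters of that single file.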
I'm actually running that right now with HN comments. It's not done training, but it's not that much more interesting than OP's. Here's some example output:
{"text": "The article was in client 70, or a Denmark, is common captured - and I very well needed to be when they picked out time reports, all the reader or warning detectors and tools and proxchit matters. It also comes up with all levels of me using legality as it is that.<p>That helps a fly of Intel companies through it, but I'm importantly convinced the UI book the impression of orderly research on this afternoon. Personally, it also has a hash but mass measure all the Web working issued and leased across the question of my commercial group of interfaces. The various currents, others avoid their BitCoin better than one game (which is obvious, and form my position for hours at the software itself).", "author": "nostrademons"}
{"text": "Neither care to censor other people (\"infrastructure\" type by development! Neuroscience, flying migrations.)<p>Relevant comment was not finished at the moment. If that appears to be the case that, but ones are a real body.", "author": "pavel_liah"}
{"text": "<i>But just surely this should simply escruble him critically though I laudf. </i><p>It's taken as a more extreme shark to manage hcpm-infolves. However, I'm great, laser mortality, one of the aight payment.", "author": "jacquesm"}
{"text": "FtAhn is not all for violenceral campuse. Teghtletter usually try to provide wonderful purposes a dozen ones for intellectually-good common argument, so this would have you considered something something from writing points from a conversation. It's such a good idea and prohibition, disappeared. Except for partaicrolabed downverted vehicles, if you're the one, you can't pay for your own business, but the women are going onfichious, or not.<p>Edit: the processor thinks without searching stuff. <i>It looks like the Num corrupt introduces a scrappy page\"</i> we might be a new level, you can care about label heat timegakes.<p>> Two individuals. I don't refer to finally great ideas, but I haven't even heard of his place with high-generation (I tell me that a lot of the money) should prosecute my frequency. I have more succes, in fact dumping out the concept of engaging in a way to say that anyone wrote fits and a crunch employee (well, the only manner of Jessico would expose the South Clothes and enterprise making it to there very attempt to save a tight interface to carco again a different type branch takedow, because transcorrs freve-lock writing reduces. Grannian raises <i>the major</i> responsibility.)<p>(Nope! Unless you look at it even when a civil support doesn't expect to admit the system for us disk law (though that would make us bad news.)", "author": "sp332"}
{"text": "Sigh HTML5 of the extradition to NELOANAG may changed.", "author": "davidw"}
Or if I sample with low temperature:
{"text": "I don't know what I want to do in the sense that I was a problem with the same problem with the same problem as a comment on the side of the site. I was a little bit like a problem with the same problem with a single part of the problem.<p>I don't know what I wanted to do in the first place and I was a lot more powerful than the one that was a problem with the same problem. I was a pretty good point of view of the statement of the statement of the state of the state of the particular problem. I would have to say that the problem is that the problem is that the problem is that the problem is that the problem is that the problem is that the problem is that the problem is that the problem is that the problem is that the problem is that the problem is that the problem is that the problem is that the problem is that the problem is the same as a single person who would have to pay for the same problem as a program that is a problem that is a problem with the same problem. ... <then it just keeps going like that>
You'll probably have to train separate char-rnn instances, unfortunately. For the past week or two, I've been experimenting with putting in a metadata prefix which I can use as a seed to specify author/style, but thus far it hasn't worked at all. The char-rnn just spits out a sort of average text and doesn't mimic specific styles.
Yup. That's been my finding as well. char-rnn was really just a diversion of curiosity after I'd cleaned up the data. My best idea right now is to make a generative model of p(next_token | previous_token(s), author), essentially connecting author directly to every observation. I'm mostly sure that using characters as tokens is overkill for this and requires a higher complexity model than I can afford with this dataset and my computational resources, so I'm going to stop using char-rnn with it.
That's possible. My hope was that you could get authorial style by just including it inline as metadata rather than needing to hardwire it into the architecture (eg take 5 hidden nodes and have them specify an ID for the author so the RNN can't possibly forget). It would have been so convenient and made char-rnn much more useful, but I guess it's turning out to be too convenient to be true.
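A word-level, author-conditioned model along those lines can be as simple as keying bigram counts on (author, previous word) (a minimal sketch of the p(next_token | previous_token, author) idea; the training data here is made up):

    import random
    from collections import Counter, defaultdict

    counts = defaultdict(Counter)  # (author, prev_word) -> next-word frequencies

    def train(comments):
        for author, text in comments:
            words = ["^"] + text.split() + ["$"]
            for prev, nxt in zip(words, words[1:]):
                counts[(author, prev)][nxt] += 1

    def generate(author, max_words=30):
        prev, out = "^", []
        while len(out) < max_words:
            options = counts.get((author, prev))
            if not options:
                break
            nxt = random.choices(list(options), weights=options.values())[0]
            if nxt == "$":  # end-of-comment marker
                break
            out.append(nxt)
            prev = nxt
        return " ".join(out)

    train([("patio11", "Charge more for your product"),
           ("patio11", "Charge more and measure everything")])
    print(generate("patio11"))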
Yeah. If you store them in a relational database you will grow a couple of grey hairs because you're essentially forcing a tree structure into a table structure. The concepts clash and it's kind of a pain, but possible since every post has 1 unique parent, so you can make upwards references and rebuild a tree from that.
Thank you, sir, for pointing that out. This looks very interesting and I might even implement it on my own blog at some point. I thought about doing something similar (without actually knowing this technique) but I shied away precisely because of the price you pay at insertion time.
EDIT: I've been thinking some more about this. Another possibility would be to limit the depth of the tree to, say, 8 (which should be reasonable) and then make 8 fields, one for each ancestor (parent, grandparent, and so on). Changing the tree will become a nightmare but all queries for subtrees will be blazingly fast.
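In table terms, that ancestor-columns idea looks roughly like this (a sketch using sqlite from Python; the column count and names are illustrative):

    import sqlite3

    db = sqlite3.connect(":memory:")
    db.execute("""CREATE TABLE posts (
        id INTEGER PRIMARY KEY,
        anc1 INTEGER, anc2 INTEGER, anc3 INTEGER, anc4 INTEGER,
        anc5 INTEGER, anc6 INTEGER, anc7 INTEGER, anc8 INTEGER,
        body TEXT)""")

    # id 1 is a root; id 2 replies to 1; id 3 replies to 2.
    db.execute("INSERT INTO posts (id, body) VALUES (1, 'root')")
    db.execute("INSERT INTO posts (id, anc1, body) VALUES (2, 1, 'reply')")
    db.execute("INSERT INTO posts (id, anc1, anc2, body) VALUES (3, 1, 2, 'reply to reply')")

    # The whole subtree under post 1, in one flat query with no recursion:
    rows = db.execute(
        "SELECT id, body FROM posts"
        " WHERE ? IN (anc1, anc2, anc3, anc4, anc5, anc6, anc7, anc8)",
        (1,)).fetchall()
    print(rows)  # [(2, 'reply'), (3, 'reply to reply')]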
To clarify - more users are going to read and refresh pages than actually post, so making certain not every GET request results in a new database query would probably improve performance more than trying to limit the number of rows in each query.
Query performance obviously matters, but with a HN like site, it's probably not going to be so critical that limiting the depth of threads is even worth the effort.
A simple method (a modified adjacency list) I've used just stores the root id, parent id and id of each post together. You can get the entire tree from any root post easily (everything shares the same root id) but getting the whole subtree beyond immediate children takes recursion.
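A sketch of that approach (one query by root id, then recursion in memory; the row format here is assumed):

    from collections import defaultdict

    def build_tree(rows, root_id):
        # rows: (id, parent_id) pairs that all share the same root id,
        # e.g. from SELECT id, parent_id FROM posts WHERE root_id = ?
        children = defaultdict(list)
        for post_id, parent_id in rows:
            children[parent_id].append(post_id)

        def subtree(post_id):
            return {"id": post_id,
                    "children": [subtree(c) for c in children[post_id]]}

        return subtree(root_id)

    rows = [(2, 1), (3, 1), (4, 2)]  # 2 and 3 reply to 1; 4 replies to 2
    print(build_tree(rows, root_id=1))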
I find that you don't even have to worry about treating the data as a tree in most cases until the very end. What you want to actually deal with is a flat array with the ids (root,parent,id) arranged in rendering order, and to have the tree built in the HTML. The data set from the DB doesn't even have to represent the tree structure directly, as long as you can sort it elsewhere.
You can even have two arrays - one (say, an associative array) with the data, and another basic array with the ids. Sort just the array with the ids, then use those as keys to iterate the data array when building the html, so you can avoid ever having to sort the larger array (which as luck has it just happens to be optimized for non-linear access anyway.)
I should probably mention, I thought this was terribly clever when I did it in PHP, before I was aware that all arrays in PHP are basically the same, so it was mostly pointless overoptimization.
Building something like an unordered list in HTML from that array then becomes a matter of adding or removing <UL> elements based on the relative change in depth for each subsequent id. Depth is easy to find by checking whether an item's parent is (or isn't) the same as the id of the previous element. The actual tree never exists in code until the unordered list is rendered in the browser (roughly as in the sketch below).
If you actually know what you're doing beforehand, probably ignore everything I just said and go with nested sets. My method is, admittedly, naive and better programmers are probably chuckling at it over the beverage of their choice, but it does work and it seems to be decently fast.
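For the curious, a minimal sketch of that depth-delta rendering (rows are assumed to be pre-sorted into rendering order with depths already computed; the data is illustrative):

    def render(rows):
        # rows: (id, depth) pairs already sorted into rendering order.
        if not rows:
            return ""
        html, prev_depth = [], -1
        for post_id, depth in rows:
            if depth > prev_depth:
                html.append("<ul>" * (depth - prev_depth))  # open deeper levels
            else:
                # close the previous item, plus one level per step back up
                html.append("</li>" + "</ul></li>" * (prev_depth - depth))
            html.append(f"<li>post {post_id}")
            prev_depth = depth
        html.append("</li>" + "</ul></li>" * prev_depth + "</ul>")
        return "".join(html)

    # 1 is a root; 2 and 4 reply to 1; 3 replies to 2.
    print(render([(1, 0), (2, 1), (3, 2), (4, 1)]))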
Well, that was pretty awesome. Small suggestion: fix the times on the comment threads to make it even more realistic. Threaded conversations have people starting a thread 11 minutes ago but getting replies 35 minutes ago. Busted :D
It would seem the texts were all generated by Markov chains, which is why they are grammatically wrong. I would have preferred a grammar-oriented text generator. It's still too easy to tell the simulation from the real thing.
What's impressive is that when I see a typically Hacker News-ish headline or comment, I now find myself checking who I'm logged in as to make sure I'm on the real Hacker News.
I used to have yccombinator.com (interestingly it looks like somebody else picked it up after I let it expire) after noticing that there are quite a few places on the net that think that is the right name.
I think it's just the Hacker News layout filled with randomly generated content. I don't get the point though. Maybe playing with some text generation algorithm?
All the content is generated from every post/comment on Hacker News, hence the somewhat plausible post titles (if you squint a bit). There isn't much point to it, other than that I found it quite funny :)