I wrote a compression algorithm, hire me?
39 points by yoav on Jan 11, 2012 | 11 comments
https://github.com/YoavGivati/Givati-Compression

A smart, lossless data compression algorithm with a focus on repeating data structures like JSON/JSONH, network and language portability, decompression without requiring a pre-agreed dictionary, and minimum size over speed. Compression ratio and speed are comparable to Lempel–Ziv–Welch (LZW). Implemented in JavaScript; PHP and Python ports coming soon.
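
A rough usage sketch (the function names here are placeholders; the repo's actual API may differ):

    // Usage sketch only -- givati.compress / givati.decompress are placeholder
    // names, not necessarily the repo's real API.
    var records = JSON.stringify([
      { id: 1, first: 'Ada',  last: 'Lovelace' },
      { id: 2, first: 'Alan', last: 'Turing' }
    ]);

    // No pre-agreed dictionary: the compressed string carries whatever the
    // decompressor needs, so it can be decoded on another machine or language.
    var packed   = givati.compress(records);
    var restored = givati.decompress(packed);

    console.log(records.length + ' -> ' + packed.length + ' chars');
    console.log(restored === records); // lossless round trip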

I'm a software engineer/entrepreneur with design sense. I have 8 years of programming experience, 5+ years of experience building cool interfaces and apps, and have consulted for some well-known people. I'm a fast learner (I taught myself about data compression, then conceived of and wrote this in 3 days). I've been a technical co-founder at a stock-related startup and have managed a small team of developers.

html5, node.js, mongo, mysql, php, flex, python, jsp/jstl

I'm building [inkapp.co] as a part of [chalkhq.com] but need funding, or a steady income for a few years at a loving company that wants me on their team.

Anyone hiring in the Toronto-ish, Ontario-ish, Canada-ish area? Willing to relocate anywhere for the right gig; 3- and 10-page CVs available on request.

Dear Hacker News, please help me get hired before the end of January.




I respect the initiative you're showing, but I'm a little skeptical of your choice of presenting your own eponymous compression algorithm as part of a portfolio, especially without any theoretical guarantees or rigorous experimental data on its performance. It gives the impression that you're either not aware of the rich history of data compression algorithms, and the heavy burden of proof on any new algorithm, or you like re-inventing the wheel. Please take this as constructive feedback.


I appreciate it.

The first one is correct: I never thought about data compression at all until three days ago, when I had a lot of JSON to store. The best JavaScript compression implementation I could find by Googling around was LZW, which, due to the way it steps through the data and builds the dictionary as it goes, is inherently inefficient. Never mind that it compresses to an array of codes, which as far as I know can only be stored as comma-delimited text, which tends to increase size. So I wrote one that analyzes and prioritizes the dictionary before applying compression, and that doesn't need a predefined dictionary the way LZW does, which is a requirement for one of my use cases.
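
Roughly, the "analyze first, substitute second" idea looks something like this. It's a simplified sketch, not the actual repo code; the tokenizer and the escape-character code alphabet are just placeholders:

    // Simplified sketch of "build and prioritize the dictionary first, then
    // substitute" -- not the actual Givati-Compression implementation.
    function buildDictionary(text, maxEntries) {
      var counts = {};
      // Placeholder tokenizer: treat quoted JSON strings as candidate tokens.
      (text.match(/"[^"]+"/g) || []).forEach(function (tok) {
        counts[tok] = (counts[tok] || 0) + 1;
      });
      return Object.keys(counts)
        .filter(function (tok) { return counts[tok] > 1; })
        .sort(function (a, b) {      // rank by total bytes the token would save
          return counts[b] * b.length - counts[a] * a.length;
        })
        .slice(0, maxEntries);
    }

    function compressWithDictionary(text) {
      var dict = buildDictionary(text, 64);
      var out = text;
      dict.forEach(function (tok, i) {
        // Placeholder code alphabet: an escape char plus one printable char.
        out = out.split(tok).join('\u0001' + String.fromCharCode(48 + i));
      });
      // Ship the dictionary with the payload -- no pre-agreed dictionary needed.
      return JSON.stringify({ d: dict, s: out });
    }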

I wasn't going for "I'm a compression expert", I was just going for "Look at my latest weekend project" and "here's a code sample."

The decision to name it after me was a hard one, but I figured namespacing my projects would prevent me from using up all the good ones.

I should also emphasize that I have no idea whether this particular algorithm already exists somewhere; I've only looked at a few compression schemes. I didn't set out to invent anything, just to solve my particular JSON problem with client-side compression and get the data smaller than I could with existing libraries.


As I said, I appreciate the initiative. At the very least, consider posting benchmarks comparing your scheme to the output of gzip.


Agreed.

Also, be aware that many non-expert attempts to develop compression algorithms turn out to be flawed. However, it sounds like you've developed a compression pre-processor specifically targeting JSON-like structures, which seems like a much more plausible accomplishment than a non-expert developing a brand new general-purpose compression algorithm that can beat LZW.

All that said, if I were considering hiring you, I would be much more interested in your ability to rigorously analyze the strengths and limitations of your algorithm than in the algorithm itself. Without that rigorous analysis, it's also much more difficult for me to judge the algorithm's quality.


Will do


We're hiring in downtown Toronto. Contact me at [email protected].


Looks great; I'd love to see an eventual port to PHP. Incidentally, I know this was just a weekend project, but why not just rely on gzip/deflate content encoding?


I wrote an HTML5/canvas chalkboard the weekend before that lets you replay your drawings and get a link to send them to other people. I don't want to link it here because it'll crash my server if a Hacker News worth of people start saving drawings. Anyway, it stores the path data as JSON on the server as a text blob to keep database queries cheap; otherwise it would have to fetch thousands of rows just to get a single drawing. I wanted to reduce storage, but also bandwidth, specifically for uploads, which can take a while (a short drawing produced around 1MB of JSON). I also thought it would be cool to put the drawing data directly in the URL or send it to someone in an email (hence the modifiable alphabet for the hashes), or to allow unrestricted drawing and save it to HTML5 localStorage.
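
For the localStorage and URL cases, this is the sort of thing I have in mind (compress/decompress stand in for the library calls, and example.com is obviously a stand-in):

    // Sketch of the localStorage / shareable-link use cases.
    // compress()/decompress() stand in for the library calls.
    var drawing = JSON.stringify({ paths: [[0, 0, 10, 12, 25, 30, 42, 41]] });

    // Unrestricted drawing saved locally instead of uploaded:
    localStorage.setItem('chalk-drawing', compress(drawing));

    // A short drawing embedded directly in a shareable link:
    var link = 'http://example.com/draw#' + encodeURIComponent(compress(drawing));

    // Reading it back later:
    var restored = JSON.parse(decompress(localStorage.getItem('chalk-drawing')));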

I figured I might as well put the ideas I had into code and on GitHub so others could help refine them; if it turned out not to work that well, I was planning to attempt a gzip implementation in JavaScript.

Incidentally, in my admittedly unreliable, data-specific benchmarks, using my algorithm + gzip resulted in smaller files than gzip alone, but I'm not going to make that claim without more benchmarks. If you can exploit HTTP gzip compression and store the gzipped data on the server, it's hard to say whether the extra time spent using my algorithm as a pre-compressor would be worth whatever byte reduction it achieves for that use case.
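
The comparison looks roughly like this in Node (zlib.gzipSync is built in; compress() stands in for the pre-compressor, and the numbers obviously depend on the input):

    // Node.js sketch: gzip alone vs. pre-compress first, then gzip.
    var zlib = require('zlib');

    function compareSizes(json) {
      var gzipOnly    = zlib.gzipSync(json);
      var preThenGzip = zlib.gzipSync(compress(json)); // compress() = placeholder
      console.log('raw:        ' + Buffer.byteLength(json) + ' bytes');
      console.log('gzip only:  ' + gzipOnly.length + ' bytes');
      console.log('pre + gzip: ' + preThenGzip.length + ' bytes');
    }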

Also, the current implementation focuses on compressing ASCII characters into fewer ASCII characters. It should be possible to implement a version that packs the output into raw bits and achieves better compression for use cases where binary output is acceptable and ASCII isn't a requirement.
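
Something like this, where the compressor's integer codes get packed into bytes instead of costing one ASCII character each (illustration only):

    // Illustration: pack an array of fixed-width integer codes into bytes
    // instead of spending one ASCII character (or more) per code.
    function packCodes(codes, bitsPerCode) {
      var bytes = [];
      var buffer = 0;        // bit accumulator
      var bitsInBuffer = 0;
      codes.forEach(function (code) {
        buffer = (buffer << bitsPerCode) | code;
        bitsInBuffer += bitsPerCode;
        while (bitsInBuffer >= 8) {
          bitsInBuffer -= 8;
          bytes.push((buffer >> bitsInBuffer) & 0xff);
          buffer &= (1 << bitsInBuffer) - 1;   // drop the bits already emitted
        }
      });
      if (bitsInBuffer > 0) {
        bytes.push((buffer << (8 - bitsInBuffer)) & 0xff); // pad the final byte
      }
      return new Uint8Array(bytes);
    }

    // Three 12-bit codes fit in 5 bytes, vs. 3+ characters each as ASCII.
    console.log(packCodes([257, 300, 4000], 12).length); // 5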


You may want to put your contact info in your profile. The email field is only visible to admins.


Done, thanks for the heads-up.


We're hiring in Montreal. Would love to chat: [email protected]



