on my home page as a joke. Browsers at the time didn’t like that: they basically froze, sometimes taking the client system down with them.
Later on, browsers started to check for actual content I think, and would abort such requests.
bobmcnamara 2 hours ago [-]
I made a 64kx64k JPEG once by feeding the encoder the same line of macro blocks until it produced the entire image.
Years later I was finally able to open it.
ack_complete 3 minutes ago [-]
I once encoded an entire TV OP into a multi-megabyte animated cursor (.ani) file.
Surprisingly, Windows 95 didn't die trying to load it, but quite a lot of operations in the system took noticeably longer than they normally did.
opan 1 hours ago [-]
I had a ton of trouble opening a 10MB or so png a few weeks back. It was stitched together screenshots forming a map of some areas in a game, so it was quite large. Some stuff refused to open it at all as if the file was invalid, some would hang for minutes, some opened blurry. My first semi-success was Fossify Gallery on my phone from F-Droid. If I let it chug a bit, it'd show a blurry image; after a while longer it'd come into focus. Then I'd try to zoom or pan and it'd blur for ages again. I guess it was aggressively lazy-loading. What worked in the end was GIMP. I had the thought that the image was probably made in an editor, so surely an editor could open it. The catch is that it took like 8GB of RAM, but then I could see clearly, zoom, and pan all I wanted. It made me wonder why there's not an image viewer that's just the viewer part of GIMP or something.
Among things that didn't work were qutebrowser, icecat, nsxiv, feh, imv, mpv. I did worry at first the file was corrupt, I was redownloading it, comparing hashes with a friend, etc. Makes for an interesting benchmark, I guess.
For others curious, here's the file: https://0x0.st/82Ap.png
I'd say just curl/wget it, don't expect it to load in a browser.
beeslol 25 minutes ago [-]
For what it's worth, this loaded (slowly) in Firefox on Windows for me (but zooming was blurry), and the default Photos viewer opened it no problem with smooth zooming and panning.
bugfix 35 minutes ago [-]
IrfanView was able to load it in about 8 seconds (Ryzen 7 5800x) using 2.8GB of RAM, but zooming/panning is quite slow (~500ms per action)
Scaevolus 57 minutes ago [-]
That's a 36,000x20,000 PNG, 720 megapixels. Many decoders explicitly limit the maximum image area they'll handle, under the reasonable assumption that anything bigger would exceed available RAM, take too long, or was crafted maliciously or by mistake.
quickaccount 45 minutes ago [-]
Safari on my MacBook Air opened it fine, though it took about four seconds. Zooming works fine as well. It does take ~3GB of memory according to Activity Monitor.
koolba 4 hours ago [-]
I hope you weren’t paying for bandwidth by the KiB.
santoshalper 3 hours ago [-]
Nah, back then we paid for bandwidth by the kb.
slicktux 2 hours ago [-]
That’s even worse! :)
m463 3 hours ago [-]
Sounds like the favicon.ico that would crash the browser.
I think this was it:
https://freedomhacker.net/annoying-favicon-crash-bug-firefox...
https://medium.com/@bishr_tabbaa/when-smart-ships-divide-by-...
"On 21 September 1997, the USS Yorktown halted for almost three hours during training maneuvers off the coast of Cape Charles, Virginia due to a divide-by-zero error in a database application that propagated throughout the ship’s control systems."
" technician tried to digitally calibrate and reset the fuel valve by entering a 0 value for one of the valve’s component properties into the SMCS Remote Database Manager (RDM)"
astolarz 2 hours ago [-]
Bad bot
fuzztester 1 hours ago [-]
I remember reading about that some years ago. It involved Windows NT.
https://www.google.com/search?q=windows+nt+bug+affects+ship
Though, bots may not support modern compression standards. Then again, that may be a good way to block bots: every modern browser supports zstd, so just force that on non-whitelisted browser agents and you automatically confuse scrapers.
kevin_thibedeau 2 hours ago [-]
If you nest the gzip inside another gzip it gets even smaller since the blocks of compressed '0' data are themselves low entropy in the first generation gzip. Nested zst reduces the 10G file to 99 bytes.
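A rough way to see the effect at a smaller scale, using Python's gzip module (1 GB of zeros instead of 10 GB; exact sizes will vary with level and input size):

  import gzip
  import io

  chunk = bytes(1024 * 1024)              # 1 MiB of zero bytes
  buf = io.BytesIO()
  with gzip.GzipFile(fileobj=buf, mode="wb", compresslevel=9) as gz:
      for _ in range(1024):               # 1024 x 1 MiB = 1 GiB of zeros
          gz.write(chunk)

  layer1 = buf.getvalue()
  # The first-generation gzip is itself highly repetitive, so compressing
  # it again shrinks it by another large factor.
  layer2 = gzip.compress(layer1, compresslevel=9)

  print("gzip x1:", len(layer1), "bytes")
  print("gzip x2:", len(layer2), "bytes")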
bilekas 5 hours ago [-]
> At my old employer, a bot discovered a wordpress vulnerability and inserted a malicious script into our server
I know it's slightly off topic, but it's just so amusing (edit: reassuring) to know I'm not the only one who, after 1 hour of setting up Wordpress, finds a PHP shell magically deployed on my server.
protocolture 2 hours ago [-]
>Take over a wordpress site for a customer
>Oh look 3 separate php shells with random strings as a name
Never less than 3, but always guaranteed.
ianlevesque 4 hours ago [-]
Yes, never self host Wordpress if you value your sanity. Even if it’s not the first hour it will eventually happen when you forget a patch.
sunaookami 4 hours ago [-]
I've been hosting WordPress myself for 13 years now and have had no problems :) Just follow standard security practices and don't install a gazillion plugins.
carlosjobim 4 hours ago [-]
There's a lot of essential functionality missing from WordPress, meaning you have to install plugins, depending on what you need to do.
But it's such a bad platform that there really isn't any reason for anybody to use WordPress for anything. No matter your use case, there will be a better alternative to WordPress.
aaronbaugher 4 hours ago [-]
Can you recommend an alternative for a non-technical organization, where there's someone who needs to be able to edit pages and upload documents on a regular basis, so they need as user-friendly an interface as possible for that? Especially when they don't have a budget for it, and you're helping them out as a favor? It's so easy to spin up Wordpress for them, but I'm not a fan either.
I've tried Drupal in the past for such situations, but it was too complicated for them. That was years ago, so maybe it's better now.
> Can you recommend an alternative for a non-technical organization, where there's someone who needs to be able to edit pages and upload documents on a regular basis, so they need as user-friendly an interface as possible for that
25 years ago we used Microsoft Frontpage for that, with the web root mapped to a file share that the non-technical secretary could write to and edit it as if it were a word processor.
Somehow I feel we have regressed from that simplicity, with nothing but hand waving to make up for it. This method was declared "obsolete" and ... Wordpress kludges took its place as somehow "better". Someone prove me wrong.
bigfatkitten 2 hours ago [-]
A previous workplace of mine did the same with Netscape (and later, Mozilla) Composer. Users could modify content via WebDAV.
13_9_7_7_5_18 3 hours ago [-]
[dead]
2 hours ago [-]
shakna 4 hours ago [-]
I've had some luck using Decap [0] for that. An initial dev setup, followed by almost never needing support from the PR team running it.
[0] https://decapcms.org/
Yes I can. There's an excellent and stable solution called SurrealCMS, made by an indie developer. You connect it by FTP to any traditional web design (HTML+CSS+JS), and the users get a WYSIWYG editor where the published output looks exactly as it looked when editing. It's dirt cheap at $9 per month.
Edit: I actually feel a bit sorry for the SurrealCMS developer. He has a fantastic product that should be an industry standard, but it's fairly unknown.
willyt 4 hours ago [-]
Static site with Jekyll?
socalgal2 3 hours ago [-]
Jekyll and other static site generators do not replace Wordpress any more than Notepad replaces MS Word.
In one, multiple users can login, edit WYSIWYG, preview, add images, etc, all from one UI. You can access it from any browser including smart phones and tablets.
In the other, you get to instruct users on git, how to deal with merge conflicts, code review (two people can't easily work on a post like they can in wordpress), previews require a manual build, and you need a local checkout and a local build installation to do the build. There's no WYSIWYG, adding images is a manual process of copying a file, figuring out the URL, etc... No smartphone/tablet support. etc....
I switched my blog from a wordpress install to a static site generator because I got tired of having to keep it up to date, but my posting dropped because the friction of posting went way up. I could no longer post from a phone. I couldn't easily add images. I had to build to preview. And I had to submit via git commits and pushes. All of that meant what was easy became tedious.
pettycashstash2 2 hours ago [-]
What are your favorite static site generators? I googled it and a Cloudflare article came up with Jekyll, Gatsby, Hugo, Next.js, and Eleventy. But I'd like to avoid doing the research on the pros/cons of each if it can be helped.
justusthane 1 hours ago [-]
I don’t have much experience with other SSGs, but I’ve been using Eleventy for my personal site for a few years and I’m a big fan. It’s very simple to get started with, it’s fast to build, it’s powerful and flexible.
I build mine with GitHub Actions and host it free on Pages.
beeburrt 2 hours ago [-]
Jekyll and GitHub pages go together pretty well.
wincy 4 hours ago [-]
I do custom web dev so am way out of the website hosting game. What are good frameworks now if I want to say, light touch help someone who is slightly technical set up a website? Not full react SPA with an API.
carlosjobim 3 hours ago [-]
By the sound of your question I will guess you want to make a website for a small or medium sized organization? jQuery is probably the only "framework" you should need.
If they are selling anything on their website, it's probably going to be through a cloud hosted third party service and then it's just an embedded iframe on their website.
If you're making an entire web shop for a very large enterprise or something of similar magnitude, then you have to ask somebody else than me.
felbane 3 hours ago [-]
Does anyone actually still use jQuery?
Everything I've built in the past like 5 years has been almost entirely pure ES6 with some helpers like jsviews.
karaterobot 3 hours ago [-]
jQuery's still the third most used web framework, behind React and before NextJS. If you use jQuery to build Wordpress websites, you'd be specializing in popular web technologies in the year 2025.
https://survey.stackoverflow.co/2024/technology#1-web-framew...
Sure, why not? It's lightweight and works well, and there's a lot of good solutions that you can find already made for you online.
arcfour 3 hours ago [-]
Never use that junk if you value your sanity, I think you mean.
dx4100 2 hours ago [-]
There are ways to prevent it:
- Freeze all code after an update through permissions
- Don't make most directories writeable
- Don't allow file uploads, or limit file uploads to media
There's a few plugins that do this, but vanilla WP is dangerous.
colechristensen 2 hours ago [-]
>after 1 hour
I've used this teaching folks devops: here, deploy your first hello-world nginx server... huh, what are those strange requests in the log?
ChuckMcM 5 hours ago [-]
I sort of did this with ssh where I figured out how to crash an ssh client that was trying to guess the root password. What I got for my trouble was a number of script kiddies ddosing my poor little server. I switched to just identifying 'bad actors' who are clearly trying to do bad things and just banning their IP with firewall rules. That's becoming more challenging with IPV6 though.
Edit: And for folks who write their own web pages, you can always create zip bombs that are links on a web page that don't show up for humans (white text on white background with no highlight on hover/click anchors). Bots download those things to have a look (so do crawlers and AI scrapers)
grishka 3 minutes ago [-]
> you can always create zip bombs that are links on a web page that don't show up for humans
I did a version of this with my form for requesting an account on my fediverse server. The problem I was having is that there exist these very unsophisticated bots that crawl the web and submit their very unsophisticated spam into every form they see that looks like it might publish it somewhere.
First I added a simple captcha with distorted characters. This did stop many of the bots, but not all of them. Then, after reading the server log, I noticed that they only make three requests in rapid succession: the page that contains the form, the captcha image, and then the POST request with the form data. They load neither the CSS nor the JS.
So I added several more fields to the form and hid them with CSS. Submitting anything in these fields will fail the request and ban your session. I also modified the captcha: I made the image itself a CSS background, and made the src point to a transparent image instead.
And just like that, spam has completely stopped, while real users noticed nothing.
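A minimal sketch of the hidden-field check, using Flask purely for illustration (the field names, route, and ban logic here are made up; the real server will differ):

  from flask import Flask, abort, request

  app = Flask(__name__)

  # These fields are hidden with CSS, so humans never fill them in;
  # dumb form-spamming bots fill in everything they see.
  HONEYPOT_FIELDS = ("website", "company", "fax")

  @app.route("/request-account", methods=["POST"])
  def request_account():
      if any(request.form.get(field) for field in HONEYPOT_FIELDS):
          abort(403)   # a hidden field was filled in: almost certainly a bot
      # ... normal handling (captcha check, account request, etc.) ...
      return "ok"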
1970-01-01 5 hours ago [-]
Why is it harder to firewall them with IPv6? It seems this would be the easier of the two to firewall.
carlhjerpe 3 hours ago [-]
Manual banning is about the same, since you just block /56s or bigger, entire providers or countries.
Automated banning is harder, you'd probably want a heuristic system and look up info on IPs.
IPv4 with NAT means you can "overban" too.
malfist 1 hours ago [-]
Why wouldn't something like fail2ban work here? That's what it's built for and it has been around for eons.
firesteelrain 5 hours ago [-]
I think they are suggesting the range of IPs to block is too high?
CBLT 4 hours ago [-]
Allow -> Tarpit -> Block should be done by ASN
carlhjerpe 3 hours ago [-]
You probably want to check how many ips/blocks a provider announces before blocking the entire thing.
It's also not a common metric you can filter on in open firewalls since you must lookup and maintain a cache of IP to ASN, which has to be evicted and updated as blocks still move around.
echoangle 5 hours ago [-]
Maybe it’s easier to circumvent because getting a new IPv6 address is easier than with IPv4?
j_walter 5 hours ago [-]
Check this out if you want to stop this behavior...
https://github.com/skeeto/endlessh
These links do show up for humans who might be using text browsers, (perhaps) screen readers, bookmarklets that list the links on a page, etc.
ChuckMcM 3 hours ago [-]
true, but you can make the link text 'do not click this' or 'not a real link' to let them know. I'm not sure if crawlers have started using LLMs to check pages or not which would be a problem.
marcusb 4 hours ago [-]
Zip bombs are fun. I discovered a vulnerability in a security product once where it wouldn’t properly scan a file for malware if the file was or contained a zip archive greater than a certain size.
The practical effect of this was you could place a zip bomb in an office xml document and this product would pass the ooxml file through even if it contained easily identifiable malware.
secfirstmd 4 hours ago [-]
Eh I got news for ya.
The file size problem is still an issue for many big name EDRs.
marcusb 3 hours ago [-]
Undoubtedly. If you go poking around most any security product (the product I was referring to was not in the EDR space,) you'll see these sorts of issues all over the place.
kazinator 5 hours ago [-]
I deployed this, instead of my usual honeypot script.
It's not working very well.
In the web server log, I can see that the bots are not downloading the whole ten megabyte poison pill.
They are cutting off at various lengths. I haven't seen anything fetch more than around 1.5 Mb of it so far.
Or is it working? Are they decoding it on the fly as a stream, and then crashing? E.g. if something is recorded as having read 1.5 Mb, could it have decoded it to 1.5 Gb in RAM, on the fly, and crashed?
There is no way to tell.
arctek 32 minutes ago [-]
Perhaps need to semi-randomize the file size?
I'm guessing some of the bots have a hard limit to the size of the resource they will download.
Many of these are annoying LLM training/scraping bots (in my case anyway).
So while it might not crash them if you spit out a 800KB zipbomb, at least it will waste computing resources on their end.
MoonGhost 4 hours ago [-]
Try a content labyrinth, i.e. infinitely generated content with a bunch of references to other generated pages. It may help against simple wget-style scrapers, at least until the bots adapt.
PS: I'm on the bots side, but don't mind helping.
palijer 1 hours ago [-]
This doesn't work if you pay for bandwidth and CPU usage on your servers, though.
bugfix 29 minutes ago [-]
Wouldn't this just waste your own bandwidth/resources?
unnouinceput 4 hours ago [-]
Do they come back? If so, then they detect it and avoid it. If not, then they crashed and mission accomplished.
kazinator 3 hours ago [-]
I currently cannot tell without making a little configuration change, because as soon as an IP address is logged as having visited the trap URL (honeypot, or zipbomb or whatever), a log monitoring script bans that client.
Secondly, I know that most of these bots do not come back. The attacks do not reuse addresses against the same server in order to evade almost any conceivable filter rule that is predicated on a prior visit.
KTibow 5 hours ago [-]
It's worth noting that this is a gzip bomb (acts just like a normal compressed webpage), not a classical zip file that uses nested zips to knock out antiviruses.
wewewedxfgdf 5 hours ago [-]
I protected uploads on one of my applications by creating fixed size temporary disk partitions of like 10MB each and unzipping to those contains the fallout if someone uploads something too big.
warkdarrior 5 hours ago [-]
`unzip -p | head -c 10MB`
sidewndr46 5 hours ago [-]
What? You partitioned a disk rather than just not decompressing some comically large file?
2048 yottabyte Zip Bomb
This zip bomb uses overlapping files and recursion to achieve 7 layers with 256 files each, with the last being a 32GB file.
It is only 266 KB on disk.
When you realise it's a zip bomb it's already too late. Looking at the file size doesn't betray its contents. Maybe applying some heuristics with ClamAV? But even then it's not guaranteed. I think a small partition to isolate decompression is actually really smart. Wonder if we can achieve the same with overlays.
sidewndr46 5 hours ago [-]
What are you talking about? You get a compressed file. You start decompressing it. When the amount of bytes you've written exceeds some threshold (say 5 megabytes) just stop decompressing, discard the output so far & delete the original file. That is it.
tremon 5 hours ago [-]
That assumes they're using a stream decompressor library and are feeding that stream manually. Solutions that write the received file to $TMP and just run an external tool (or, say, use sendfile()) don't have the option to abort after N decompressed bytes.
overfeed 4 hours ago [-]
> Solutions that write the received file to $TMP and just run an external tool (or, say, use sendfile()) don't have the option to abort after N decompressed bytes
cgroups with hard-limits will let the external tool's process crash without taking down the script or system along with it.
pessimizer 1 hours ago [-]
> cgroups with hard-limits
This is exactly the same idea as partitioning, though.
AndrewStephens 2 hours ago [-]
I worked on a commercial HTTP proxy that scanned compressed files. Back then we would start to decompress a file but keep track of the compression ratio. I forget what the cutoff was but as soon as we saw a ratio over a certain threshold we would just mark the file as malicious and block it.
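Not the proxy's actual code, but the same idea is easy to sketch with zlib's streaming decompressor; the cutoff ratio here is arbitrary:

  import zlib

  MAX_RATIO = 100          # arbitrary: flag anything that expands more than 100x
  STEP = 1 << 20           # emit at most 1 MiB of output per call

  def looks_like_bomb(compressed: bytes) -> bool:
      d = zlib.decompressobj(wbits=47)   # 47 = auto-detect zlib/gzip framing
      produced = 0
      data = compressed
      while data:
          produced += len(d.decompress(data, STEP))
          if produced > MAX_RATIO * len(compressed):
              return True                # ratio exceeded: stop decompressing early
          if d.eof:
              break
          data = d.unconsumed_tail       # input that decompress() hasn't used yet
      return False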
maxbond 3 hours ago [-]
That is exactly what OP is doing, they've just implemented it at the operating system/file system level.
gruez 5 hours ago [-]
Depending on the language/library that might not always be possible. For instance python's zip library only provides an extract function, without a way to hook into the decompression process, or limit how much can be written out. Sure, you can probably fork the library to add in the checks yourself, but from a maintainability perspective it might be less work to do with the partition solution.
banana_giraffe 3 hours ago [-]
It also provides an open function for the files in a zip file. I see no reason something like this won't bail after a small limit:
  import zipfile

  with zipfile.ZipFile("zipbomb.zip") as zip:
      for name in zip.namelist():
          print("working on " + name)
          left = 1000000
          with open("dest_" + name, "wb") as fdest, zip.open(name) as fsrc:
              while True:
                  block = fsrc.read(1000)
                  if len(block) == 0:
                      break
                  fdest.write(block)
                  left -= len(block)
                  if left <= 0:
                      print("too much data!")
                      break
gchamonlive 5 hours ago [-]
Those files are designed to exhaust the system resources before you can even do these kinds of checks. I'm not particularly familiar with the ins and outs of compression algorithms, but it's intuitively not strange for me to have a zip that is carefully crafted so that memory and CPU go out the window before any check can be done. Maybe someone with more experience can give more details.
I'm sure though that if it was as simple as that we wouldn't even have a name for it.
crazygringo 3 hours ago [-]
Not really. It really is that simple. It's just dictionary decompression, and it's just halting it at some limit.
It's just nobody usually implements a limit during decompression because people aren't usually giving you zip bombs. And sometimes you really do want to decompress ginormous files, so limits aren't built in by default.
Your given language might not make it easy to do, but you should pretty much always be able to hack something together using file streams. It's just an extra step is all.
kulahan 3 hours ago [-]
Isn’t this basically a question about the halting problem? Whatever arbitrary cutoff you choose might not work for all inputs.
kam 2 hours ago [-]
No, compression formats are not Turing-complete. You control the code interpreting the compressed stream and allocating the memory, writing the output, etc. based on what it sees there and can simply choose to return an error after writing N bytes.
Rohansi 2 hours ago [-]
Not really. It's easy to abort after exceeding a number of uncompressed bytes or files written. The problem is the typical software for handling these files does not implement restrictions to prevent this.
kccqzy 5 hours ago [-]
Seems like a good and simple strategy to me. No real partition needed; tmpfs is cheap on Linux. Maybe OP is using tools that do not easily allow tracking the number of uncompressed bytes.
wewewedxfgdf 4 hours ago [-]
Yes I'd rather deal with a simple out of disk space error than perform some acrobatics to "safely" unzip a potential zip bomb.
Also zip bombs are not comically large until you unzip them.
Also you can just unpack any sort of compressed file format without giving any thought to whether you are handling it safely.
monster_truck 3 hours ago [-]
I do something similar using a script I've cobbled together over the years. Once a year I'll check the 404 logs and add the most popular paths trying to exploit something (ie ancient phpmyadmin vulns) to the shitlist. Requesting 3 of those URLs adds that host to a greylist that only accepts requests to a very limited set of legitimate paths.
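The commenter's script is their own; a rough sketch of the same idea in Python, assuming a common combined-log format and a hypothetical access.log path:

  import re
  from collections import Counter

  # Matches e.g.: "GET /phpmyadmin/index.php HTTP/1.1" 404
  REQ = re.compile(r'"(?:GET|POST|HEAD) (?P<path>\S+) HTTP/[\d.]+" (?P<status>\d{3})')

  counts = Counter()
  with open("access.log") as log:
      for line in log:
          m = REQ.search(line)
          if m and m.group("status") == "404":
              counts[m.group("path")] += 1

  # The most-requested missing paths are usually exploit probes
  # (old phpMyAdmin, wp-login.php, .env, ...) worth adding to the shitlist.
  for hits, path in ((n, p) for p, n in counts.most_common(25)):
      print(f"{hits:6d}  {path}")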
fracus 3 hours ago [-]
I'm curious why a 10GB file of all zeroes would compress only to 10MB. I mean theoretically you could compress it to one byte. I suppose the compression happens on a stream of data instead of analyzing the whole, but I'd assume it would still do better than 10MB.
philsnow 3 hours ago [-]
A compressed file that is only one byte long can represent at most 256 different uncompressed files.
Signed, a kid in the 90s who downloaded some "wavelet compression" program from a BBS because it promised to compress all his WaReZ even more so he could then fit moar on his disk. He ran the compressor and hey golly that 500MB ISO fit into only 10MB of disk now! He found out later (after a defrag) that the "compressor" was just hiding data in unused disk sectors and storing references to them. He then learned about Shannon entropy from comp.compression.research and was enlightened.
david422 59 minutes ago [-]
> He found out later (after a defrag) that the "compressor" was just hiding data in unused disk sectors and storing references to them
So you could access the files until you wrote more data to disk?
2 hours ago [-]
marcusf 2 hours ago [-]
man, a comment that brings back memories. you and me both.
suid 6 minutes ago [-]
Good question. The "ultimate zip bomb" looks something like https://github.com/iamtraction/ZOD - this produces the infamous "42.zip" file, which is about 42KiB, but expands to 3.99 PiB (!).
There's literally no machine on Earth today that can deal with that (as a single file, I mean).
tom_ 2 hours ago [-]
It has to cater for any possible input. Even with special case handling for this particular (generally uncommon) case of vast runs of the same value: the compressed data will probably be packetized somehow, and each packet can reproduce only so many repeats, so you'll need to repeat each packet enough times to reproduce the output. With 10 GB, it mounts up.
I tried this on my computer with a couple of other tools, after creating a file full of 0s as per the article.
gzip -9 turns it into 10,436,266 bytes in approx 1 minute.
xz -9 turns it into 1,568,052 bytes in approx 4 minutes.
bzip2 -9 turns it into 7,506 (!) bytes in approx 5 minutes.
I think OP should consider getting bzip2 on the case. 2 TBytes of 0s should compress nicely. And I'm long overdue an upgrade to my laptop... you probably won't be waiting long for the result on anything modern.
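For anyone who wants to reproduce this without writing 10 GB of zeros to disk, here is a scaled-down comparison using Python's built-in bindings (100 MiB in memory; absolute sizes will differ from the numbers above):

  import bz2
  import gzip
  import lzma

  data = bytes(100 * 1024 * 1024)    # 100 MiB of zero bytes

  print("gzip -9 :", len(gzip.compress(data, compresslevel=9)), "bytes")
  print("xz -9   :", len(lzma.compress(data, preset=9)), "bytes")
  print("bzip2 -9:", len(bz2.compress(data, compresslevel=9)), "bytes")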
rtkwe 3 hours ago [-]
It'd have to be more than one byte. There's the central directory, the zip header, and the local header, then the file itself; you also need to tell it how many zeros to produce when decompressing the actual file. Most compression algorithms don't work like that because they're designed for actual files, not essentially blank ones, so you get larger than the absolute minimum compression.
malfist 1 hours ago [-]
I mean, if I make a new compression algorithm that says a 10GB file of zeros is represented with a single specific byte, that would technically be compression.
All depends on how much magic you want to shove into an "algorithm"
kulahan 3 hours ago [-]
There probably aren’t any perfectly lossless compression algorithms, I guess? Nothing would ever be all zeroes, so it might not be an edge case accounted for or something? I have no idea, just pulling at strings. Maybe someone smarter can jump in here.
mr_toad 2 hours ago [-]
No lossless algorithm can compress all strings; some will end up larger. This is a consequence of the pigeonhole principle.
ugurs 3 hours ago [-]
It requires at least a few bytes; there is no way to represent 10GB of data in 8 bits.
msm_ 1 hours ago [-]
But of course there is. Imagine the following compression scheme:
0-253: output the input byte
254 followed by 0: output 254
254 followed by 1: output 255
255: output 10GB of zeroes
Of course this is an artificial example, but theoretically it's perfectly sound. In fact, I think you could get there with static huffman trees supported by some formats, including gzip.
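For illustration only, a decoder for roughly that toy scheme (chunked so the 10 GB of zeroes never has to sit in memory at once):

  TEN_GB = 10 * 1024**3
  CHUNK = 1 << 20

  def decode(encoded: bytes):
      it = iter(encoded)
      for b in it:
          if b <= 253:
              yield bytes([b])                  # literal byte
          elif b == 254:
              yield bytes([254 + next(it)])     # 254,0 -> 254 and 254,1 -> 255
          else:                                 # b == 255
              for _ in range(TEN_GB // CHUNK):  # 10 GB of zeroes from one byte
                  yield bytes(CHUNK)

  # A single 0xff input byte decodes to 10 GB of output:
  # sum(len(c) for c in decode(b"\xff")) == 10 * 1024**3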
dagi3d 3 hours ago [-]
I get your point (and have no idea why it isn't compressed more), but is the theoretical value of 1 byte correct? With just one single byte, how does it know how big the file should be after being decompressed?
hxtk 58 minutes ago [-]
In general, this theoretical problem is called the Kolmogorov Complexity of a string: the size of the smallest program that outputs a the input string, for some definition of "program", e.g., an initial input tape for a given universal turing machine. Unfortunately, Kolmogorov Complexity in general is incomputable, because of the halting problem.
But a gzip decompressor is not turing-complete, and there are no gzip streams that will expand to infinitely large outputs, so it is theoretically possible to find the pseudo-Kolmogorov-Complexity of a string for a given decompressor program by the following algorithm:
Let file.bin be a file containing the input byte sequence.
1. BOUNDS=$(gzip --best -c file.bin | wc -c)
2. LENGTH=1
3. If LENGTH==BOUNDS, run `gzip --best -o test.bin.gz file.bin` and HALT.
4. Generate a file `test.bin.gz` LENGTH bytes long containing all zero bits.
5. Run `gunzip -k test.bin.gz`.
6. If `test.bin` equals `file.bin`, halt.
7. If `test.bin.gz` contains only 1 bits, increment LENGTH and GOTO 3.
8. Replace test.bin.gz with its lexicographic successor by interpreting it as a LENGTH-byte unsigned integer and incrementing it by 1.
9. GOTO 5.
test.bin.gz contains your minimal gzip encoding.
There are "stronger" compressors for popular compression libraries like zlib that outperform the "best" options available, but none of them are this exhaustive because you can surely see how the problem rapidly becomes intractable.
For the purposes of generating an efficient zip bomb, though, it doesn't really matter what the exact contents of the output file are. If your goal is simply to get the best compression ratio, you could enumerate all possible files with that algorithm (up to the bounds established by compressing all zeroes to reach your target decompressed size, which makes a good starting point) and then just check for a decompressed length that meets or exceeds the target size.
I think I'll do that. I'll leave it running for a couple days and see if I can generate a neat zip bomb that beats compressing a stream of zeroes. I'm expecting the answer is "no, the search space is far too large."
kulahan 3 hours ago [-]
It’s a zip bomb, so does the creator care? I just mean from a practical standpoint - overflows and crashes would be a fine result.
Like, a legitimate crawler suing you and alleging that you broke something of theirs?
thayne 4 hours ago [-]
Disclosure: IANAL
The CFAA[1] prohibits:
> knowingly causes the transmission of a program, information, code, or command, and as a result of such conduct, intentionally causes damage without authorization, to a protected computer;
As far as I can tell (again, IANAL) there isn't an exception if you believe said computer is actively attempting to abuse your system[2]. I'm not sure if a zip bomb would constitute intentional damage, but it is at least close enough to the line that I wouldn't feel comfortable risking it.
[2]: And of course, you might make a mistake and incorrectly serve this to legitimate traffic.
jedberg 3 hours ago [-]
I don't believe the client counts as a protected computer because they initiated the connection. Also a protected computer is a very specific definition that involves banking and/or commerce and/or the government.
thayne 2 hours ago [-]
Part B of the definition of "protected computer" says:
> which is used in or affecting interstate or foreign commerce or communication, including a computer located outside the United States that is used in a manner that affects interstate or foreign commerce or communication of the United States
Assuming the server is running in the States, I think that would apply unless the client is in the same state as the server, in which case there is probably a similar state law that comes into effect. I don't see anything there that excludes a client, and that makes sense, because otherwise it wouldn't prohibit having a site that tricks people into downloading malware.
sinuhe69 55 minutes ago [-]
There is IMO no legal use case for an external computer system to initiate a connection with my system without prior legal agreement. It all happens on good will and therefore can be terminated at any time.
brudgers 4 hours ago [-]
Though anyone can sue anyone, not doing X is the simplest thing that might avoid being sued for doing X.
But if it matters pay your lawyer and if it doesn’t matter, it doesn’t matter.
bilekas 5 hours ago [-]
Please, just as a conversational piece, walk me through the potential issues you think there are.
I'll play the side of the defender and you can play the "bot"/bot deployer.
echoangle 4 hours ago [-]
Well creating a bot is not per se illegal, so assuming the maliciousness-detector on the server isn’t perfect, it could serve the zip bomb to a legitimate bot. And I don’t think it’s crazy that serving zip bombs with the stated intent to sabotage the client would be illegal. But I’m not a lawyer, of course.
bilekas 4 hours ago [-]
Disclosure, I'm not a lawyer either. This is all hypothetical high level discussion here.
> it could serve the zip bomb to a legitimate bot.
Can you define the difference between a legitimate bot and a non-legitimate bot for me?
The OP didn't mention it, but if we can assume they have SOME form of robots.txt (safe assumption given their history), would those bots who ignored the robots be considered legitimate/non-legitimate?
Almost final question, and I know we're not lawyers here, but is there any precedent in case law or anywhere which defines a 'bad bot' in the eyes of the law?
Final final question, as a bot, do you believe you have a right or a privilege to scrape a website?
brudgers 4 hours ago [-]
Anyone can sue anyone for anything and the side with the most money is most likely to prevail.
pessimizer 1 hours ago [-]
Mantrapping is a fairly good analogy, and that's very illegal. If the person reading your gas meter gets caught in your mantrap, you're going to prison. You're probably going to prison if somebody burglarizing you gets caught in your mantrap.
Of course their computers will live, but if you accidentally take down your own ISP or maybe some third-party service that you use for something, I'd think they would sue you.
bauruine 5 hours ago [-]
>User-agent: *
>Disallow: /zipbomb.html
Legitimate crawlers would skip it this way only scum ignores robots.txt
echoangle 5 hours ago [-]
I’m not sure that’s enough, robots.txt isn’t really legally binding so if the zip bomb somehow would be illegal, guarding it behind a robots.txt rule probably wouldn’t make it fine.
boricj 4 hours ago [-]
> robots.txt isn’t really legally binding
Neither is the HTTP specification. Nothing is stopping you from running a Gopher server on TCP port 80, should you get into trouble if it happens to crash a particular crawler?
Making a HTTP request on a random server is like uttering a sentence to a random person in a city: some can be helpful, some may tell you to piss off and some might shank you. If you don't like the latter, then maybe don't go around screaming nonsense loudly to strangers in an unmarked area.
lcnPylGDnU4H9OF 4 hours ago [-]
Has any similar case been tried? I'd think that a judge learning the intent of robots.txt and disallow rules is fairly likely to be sympathetic. Seems like it could go either way, I mean. (Jury is probably more a crap-shoot.)
thephyber 4 hours ago [-]
Who, running a crawler which violates robots.txt, is going to prosecute/sue the server owner?
The server owner can make an easy case to the jury that it is a booby trap to defend against trespassers.
crazygringo 3 hours ago [-]
> For the most part, when they do, I never hear from them again. Why? Well, that's because they crash right after ingesting the file.
I would have figured the process/server would restart, and restart with your specific URL since that was the last one not completed.
What makes the bots avoid this site in the future? Are they really smart enough to hard-code a rule to check for crashes and avoid those sites in the future?
fdr 3 hours ago [-]
Seems like an exponential backoff rule would do the job: I'm sure crashes happen for all sorts of reasons, some of which are bugs in the bot, even on non-adversarial input.
sgc 5 hours ago [-]
I am ignorant as to how most bots work. Could you have a second line of defense for bots that avoid this bomb: Dynamically generate a file from /dev/random and trickle stream it to them, or would they just keep spawning parallel requests? They would never finish streaming it, and presumably give up at some point. The idea would be to make it more difficult for them to detect it was never going to be valid content.
jerf 5 hours ago [-]
You want to consider the ratio of your resource consumption to their resource consumption. If you trickle bytes from /dev/random, you are holding open a TCP connection with some minimal overhead, and that's about what they are doing too. Let's assume they are bright enough to use any of the many modern languages or frameworks that can easily handle 10K/100K connections or more on a modern system. They aren't all that bright but certainly some are. You're basically consuming your resources to their resources 1:1. That's not a winning scenario for you.
The gzip bomb means you serve 10MB but they try to consume vast quantities of RAM on their end and likely crash. Much better ratio.
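The serving side is cheap because the 10 MB file is pre-compressed once and sent as-is with a Content-Encoding header, so the server never touches the 10 GB. The article's own middleware appears to be JavaScript; this is just the same idea sketched in Python with Flask, with placeholder detection logic and file name:

  from flask import Flask, request, send_file

  app = Flask(__name__)

  def is_malicious(req) -> bool:
      # Placeholder: real checks might use IP blocklists, user agents,
      # or requests for exploit paths like /wp-admin, /.env, etc.
      return req.path.startswith("/wp-admin")

  @app.before_request
  def maybe_serve_bomb():
      if is_malicious(request):
          # 10g.gzip is created ahead of time, e.g. by gzipping 10 GB of zeros.
          resp = send_file("10g.gzip")
          resp.headers["Content-Encoding"] = "gzip"  # client inflates to ~10 GB
          resp.headers["Content-Type"] = "text/html"
          return resp                                # short-circuits the request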
3np 5 hours ago [-]
Also might open up a new DoS vector on entropy consumed by /dev/random so it can be worse than 1:1.
sgc 5 hours ago [-]
That's clear. It all comes down to their behavior. Will they sit there waiting to finish this download, or just start sending other requests in parallel until you dos yourself? My hope is they would flag the site as low-value and go looking elsewhere, on another site.
For HTTP/1.1 you could send a "chunked" response. Chunked responses are intended to allow the server to start sending dynamically generated content immediately instead of waiting for the generation process to finish before sending. You could just continue to send chunks until the client gives up or crashes.
The idea is to trickle it very slowly, like keeping a cat occupied with a ball of fluff in the corner.
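A sketch of that trickle using a streamed (chunked) response in Flask; the chunk size and delay are arbitrary, and note it still ties up one of your own workers per connection:

  import time
  from flask import Flask, Response

  app = Flask(__name__)

  @app.route("/slow-drip")
  def slow_drip():
      def drip():
          while True:                # never finishes; client gives up or times out
              yield b"." * 16        # a few bytes per chunk
              time.sleep(10)         # then a long pause
      return Response(drip(), mimetype="text/html")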
uniqueuid 5 hours ago [-]
Cats also have timeouts set for balls of fluff. They usually get bored at some point and either go away or attack you :)
stavros 1 hours ago [-]
Yes but you still need to keep a connection open to them. This is a sort of reverse SlowLoris attack, though.
jeroenhd 3 hours ago [-]
If the bot is connecting over IPv4, you only have a couple thousand connections before your server starts needing to mess with shared sockets and other annoying connectivity tricks.
I don't think it's a terrible problem to solve these days, especially if you use one of the tarpitting implementations that use nftables/iptables/eBPF, but if you have one of those annoying Chinese bot farms with thousands of IP addresses hitting your server in turn (Huawei likes to do this), you may need to think twice before deploying this solution.
CydeWeys 5 hours ago [-]
Yeah but in the mean time it's tying up a connection on your webserver.
uniqueuid 5 hours ago [-]
Practically all standard libraries have timeouts set for such requests, unless you are explicitly offering streams which they would skip.
zzo38computer 1 days ago [-]
I also had the idea of zip bomb to confuse badly behaved scrapers (and I have mentioned it before to some other people, although I did not implemented it). However, maybe instead of 0x00, you might use a different byte value.
I had other ideas too, but I don't know how well some of them will work (they might depend on what bots they are).
ycombinatrix 17 hours ago [-]
The different byte values likely won't compress as well as all 0s unless they are a repeating pattern of blocks.
An alternative might be to use Brotli which has a static dictionary. Maybe that can be used to achieve a high compression ratio.
zzo38computer 7 hours ago [-]
I meant that all of the byte values would be the same (so they would still be repeating), but a different value than zero. However, Brotli could be another idea if the client supports it.
manmal 3 hours ago [-]
> Before I tell you how to create a zip bomb, I do have to warn you that you can potentially crash and destroy your own device
Surely, the device does crash but it isn’t destroyed?
4 hours ago [-]
mahi_novice 5 hours ago [-]
Do you mind sharing the specs of your digital ocean droplet? I'm trying to set up one at a lower cost.
foxfired 4 hours ago [-]
The blog runs on a $6 digital ocean droplet. It's 1GB RAM and 25GB storage. There is a link at the end of the article on how it handles typical HN traffic. Currently at 5% CPU.
Wouldn't it be cheaper to use Cloudflare than task a human to obsessively watch webserver logs on a box lacking proper filtering?
_QrE 5 hours ago [-]
There's a lot of creative ideas out there for banning and/or harassing bots. There's tarpits, infinite labyrinths, proof of work || regular challenges, honeypots etc.
Most of the bots I've come across are fairly dumb however, and those are pretty easy to detect & block. I usually use CrowdSec (https://www.crowdsec.net/), and with it you also get to ban the IPs that misbehave on all the other servers that use it before they come to yours. I've also tried turnstile for web pages (https://www.cloudflare.com/application-services/products/tur...) and it seems to work, though I imagine most such products would, as again most bots tend to be fairly dumb.
I'd personally hesitate to do something like serving a zip bomb since it would probably cost the bot farm(s) less than it would cost me, and just banning the IP I feel would serve me better than trying to play with it, especially if I know it's misbehaving.
Edit: Of course, the author could state that the satisfaction of seeing an IP 'go quiet' for a bit is priceless - no arguing against that
d--b 5 hours ago [-]
Zip libraries aren’t bomb proof yet? Seems fairly easy to detect and ignore, no?
java-man 1 days ago [-]
I think it's a good idea, but it must be coupled with robots.txt.
forinti 4 hours ago [-]
I was looking through my logs yesterday.
Bad bots don't even read robots.txt.
cratermoon 1 days ago [-]
AI scraper bots don't respect robots.txt
jsheard 1 days ago [-]
I think that's the point, you'd use robots.txt to direct Googlebot/Bingbot/etc away from countermeasures that could potentially mess up your SEO. If other bots ignore the signpost clearly saying not to enter the tarpit, that's their own stupid fault.
reverendsteveii 5 hours ago [-]
The ones that survive do
cynicalsecurity 5 hours ago [-]
This topic comes up from time to time and I'm surprised no one yet mentioned the usual fearmongering rhetoric of zip bombs being potentially illegal.
I'm not a lawyer, but I'm yet to see a real-life court case of a bot owner suing a company or an individual for responding to his malicious request with a zip bomb. The usual spiel goes like this: responding to his malicious request with a malicious response makes you a cybercriminal and allows him (the real cybercriminal) to sue you. Again, apart from cheap talk, I've never heard of a single court case like this. But I can easily imagine them trying to blackmail someone with such cheap threats.
I cannot imagine a big company like Microsoft or Apple using zip bombs, but I fail to see why zip bombs would be considered bad in any way. Anyone with an experience of dealing with malicious bots knows the frustration and the amount of time and money they steal from businesses or individuals.
os2warpman 3 hours ago [-]
Anyone can sue anyone else for any reason.
This is what trips me up:
>On my server, I've added a middleware that checks if the current request is malicious or not.
There's a lot of trust placed in:
>if (ipIsBlackListed() || isMalicious()) {
Can someone assigned a previously blacklisted IP or someone who uses a tool to archive the website that mimics a bot be served malware? Is the middleware good enough or "good enough so far"?
Close enough to 100% of my internet traffic flows through a VPN. I have been blacklisted by various services upon connecting to a VPN or switching servers on multiple occasions.
1 days ago [-]
harrison_clarke 5 hours ago [-]
it'd be cool to have a proof of work protocol baked into http. like, a header that browsers understood
codingdave 1 days ago [-]
Mildly amusing, but it seems like this is thinking that two wrongs make a right, so let us serve malware instead of using a WAF or some other existing solution to the bot problem.
imiric 5 hours ago [-]
The web is overrun by malicious actors without any sense of morality. Since playing by the rules is clearly not working, I'm in favor of doing anything in my power to waste their resources. I would go a step further and try to corrupt their devices so that they're unable to continue their abuse, but since that would require considerably more effort from my part, a zip bomb is a good low-effort solution.
bsimpson 5 hours ago [-]
There's no ethical ambiguity about serving garbage to malicious traffic.
They made the request. Respond accordingly.
petercooper 5 hours ago [-]
Based on the example in the post, that thinking might need to be extended to "someone happening to be using a blocklisted IP." I don't serve up zip bombs, but I've blocklisted many abusive bots using VPN IPs over the years which have then impeded legitimate users of the same VPNs.
joezydeco 5 hours ago [-]
This is William Gibson's "black ICE" becoming real, and I love it.
At least, not with the default rules. I read that discussion a few days ago and was surprised how few callouts there were that a WAF is just a part of the infrastructure - it is the rules that people are actually complaining about. I think the problem is that so many apps run on AWS and their default WAF rules have some silly content filtering. And their "security baseline" says that you have to use a WAF and include their default rules, so security teams lock down on those rules without any real thought put into whether or not they make sense for any given scenario.
chmod775 1 days ago [-]
Truly one my favorite thought-terminating proverbs.
"Hurting people is wrong, so you should not defend yourself when attacked."
"Imprisoning people is wrong, so we should not imprison thieves."
Also the modern telling of Robin Hood seems to be pretty generally celebrated.
Two wrongs may not make a right, but often enough a smaller wrong is the best recourse we have to avert a greater wrong.
The spirit of the proverb is referring to wrongs which are unrelated to one another, especially when using one to excuse another.
5 hours ago [-]
cantrecallmypwd 1 hours ago [-]
> "Hurting people is wrong, so you should not defend yourself when attacked."
This is exactly what Californian educators told kids who were being bullied in the 90's.
zdragnar 5 hours ago [-]
> a smaller wrong is the best recourse we have to avert a greater wrong
The logic of terrorists and war criminals everywhere.
impulsivepuppet 3 hours ago [-]
I admire your deontological zealotry. That said, I think there is an implied virtuous aspect of "internet vigilantism" that feels ignored (i.e. disabling a malicious bot means it does not visit other sites) While I do not absolve anyone from taking full responsibility for their actions, I have a suspicion that terrorists do a bit more than just avert a greater wrong--otherwise, please sign me up!
I did actually try zip bombs at first. They didn't work due to the architecture of how Amazon's scraper works. It just made the requests get retried.
wiredfool 6 hours ago [-]
Amazon's scraper has been sending multiple requests per second to my servers for 6+ weeks, and every request has been returned 429.
Amazon's scraper doesn't back off. Meta, google, most of the others with identifiable user agents back off, Amazon doesn't.
toast0 5 hours ago [-]
If it's easy, sleep 30 before returning 429. Or tcpdrop the connections and don't even send a response or a tcp reset.
deathanatos 5 hours ago [-]
So first, let me prefix this by saying I generally don't accept cookies from websites I don't explicitly first allow, my reasoning being "why am I granting disk read/write access to [mostly] shady actors to allow them to track me?"
(I don't think your blog qualifies as shady … but you're not in my allowlist, either.)
So if I visit https://anubis.techaro.lol/ (from the "Anubis" link), I get an infinite anime cat girl refresh loop — which honestly isn't the worst thing ever?
Neither xeserv.us nor techaro.lol are in my allowlist. Curious that one seems to pass. IDK.
The blog post does have that lovely graph … but I suspect I'll loop around the "no cookie" loop in it, so the infinite cat girls are somewhat expected.
I was working on an extension that would store cookies very ephemerally for the more malicious instances of this, but I think its design would work here too. (In-RAM cookie jar, burns them after, say, 30s. Persisted long enough to load the page.)
xena 2 hours ago [-]
You're seeing an experiment in progress. It seems to be working, but I have yet to get enough data to know if it's ultimately successful or not.
cycomanic 4 hours ago [-]
Just FYI temporary containers (Firefox extension) seem to be the solution you're looking for. It essentially generates a new container for every tab you open (subtabs can be either new containers or in the same container). Once the tab is closed it destroys the container and deletes all browsing data (including cookies). You can still whitelist some domains to specific persistent containers.
I used cookie blockers for a long time, but always ended up having to whitelist some sites even though I didn't want their cookies because the site would misbehave without them. Now I just stopped worrying.
lcnPylGDnU4H9OF 4 hours ago [-]
> Neither xeserv.us nor techaro.lol are in my allowlist. Curious that one seems to pass. IDK.
Is your browser passing a referrer?
cookiengineer 18 hours ago [-]
Did you also try Transfer-Encoding: chunked and things like HTTP smuggling to serve different content to web browser instances than to scrapers?
Later on, browsers started to check for actual content I think, and would abort such requests.
Years later I was finally able to open it.
Surprisingly, Windows 95 didn't die trying to load it, but quite a lot of operations in the system took noticeably longer than they normally did.
Among things that didn't work were qutebrowser, icecat, nsxiv, feh, imv, mpv. I did worry at first the file was corrupt, I was redownloading it, comparing hashes with a friend, etc. Makes for an interesting benchmark, I guess.
For others curious, here's the file: https://0x0.st/82Ap.png
I'd say just curl/wget it, don't expect it to load in a browser.
I think this was it:
https://freedomhacker.net/annoying-favicon-crash-bug-firefox...
https://medium.com/@bishr_tabbaa/when-smart-ships-divide-by-...
"On 21 September 1997, the USS Yorktown halted for almost three hours during training maneuvers off the coast of Cape Charles, Virginia due to a divide-by-zero error in a database application that propagated throughout the ship’s control systems."
" technician tried to digitally calibrate and reset the fuel valve by entering a 0 value for one of the valve’s component properties into the SMCS Remote Database Manager (RDM)"
https://www.google.com/search?q=windows+nt+bug+affects+ship
Though, bots may not support modern compression standards. Then again, that may be a good way to block bots: every modern browser supports zstd, so just force that on non-whitelisted browser agents and you automatically confuse scrapers.
I know it's slightly off topic, but it's just so amusing (edit: reassuring) to know I'm not the only one who, after 1 hour of setting up Wordpress there's a PHP shell magically deployed on my server.
>Oh look 3 separate php shells with random strings as a name
Never less than 3, but always guaranteed.
But it's such a bad platform that there really isn't any reason for anybody to use WordPress for anything. No matter your use case, there will be a better alternative to WordPress.
I've tried Drupal in the past for such situations, but it was too complicated for them. That was years ago, so maybe it's better now.
25 years ago we used Microsoft Frontpage for that, with the web root mapped to a file share that the non-technical secretary could write to and edit it as if it were a word processor.
Somehow I feel we have regressed from that simplicity, with nothing but hand waving to make up for it. This method was declared "obsolete" and ... Wordpress kludges took its place as somehow "better". Someone prove me wrong.
[0] https://decapcms.org/
Edit: I actually feel a bit sorry for the SurrealCMS developer. He has a fantastic product that should be an industry standard, but it's fairly unknown.
In one, multiple users can login, edit WYSIWYG, preview, add images, etc, all from one UI. You can access it from any browser including smart phones and tablets.
In the other, you get to instruct users on git, how to deal with merge conflicts, code review (two people can't easily work on a post like they can in wordpress), previews require a manual build, you need a local checkout and local build installation to do the build. There no WYSIWYG, adding images is a manual process of copying a file, figuring out the URL, etc... No smartphone/tablet support. etc....
I switched by blog from wordpress install to a static site geneator because I got tired of having to keep it up to date but my posting dropped because of friction of posting went way up. I could no longer post from a phone. I couldn't easily add images. I had to build to preview. And had to submit via git commits and pushes. All of that meant what was easy became tedious.
I build mine with GitHub Actions and host it free on Pages.
If they are selling anything on their website, it's probably going to be through a cloud hosted third party service and then it's just an embedded iframe on their website.
If you're making an entire web shop for a very large enterprise or something of similar magnitude, then you have to ask somebody else than me.
Everything I've built in the past like 5 years has been almost entirely pure ES6 with some helpers like jsviews.
https://survey.stackoverflow.co/2024/technology#1-web-framew...
There's a few plugins that do this, but vanilla WP is dangerous.
I've used this teaching folks devops, here deploy your first hello world nginx server... huh what are those strange requests in the log?
Edit: And for folks who write their own web pages, you can always create zip bombs that are links on a web page that don't show up for humans (white text on white background with no highlight on hover/click anchors). Bots download those things to have a look (so do crawlers and AI scrapers)
I did a version of this with my form for requesting an account on my fediverse server. The problem I was having is that there exist these very unsophisticated bots that crawl the web and submit their very unsophisticated spam into every form they see that looks like it might publish it somewhere.
First I added a simple captcha with distorted characters. This did stop many of the bots, but not all of them. Then, after reading the server log, I noticed that they only make three requests in a rapid succession: the page that contains the form, the captcha image, and then the POST request with the form data. They don't load neither the CSS nor the JS.
So I added several more fields to the form and hid them with CSS. Submitting anything in these fields will fail the request and ban your session. I also modified the captcha, I made the image itself a CSS background, and made the src point to a transparent image instead.
And just like that, spam has completely stopped, while real users noticed nothing.
Automated banning is harder, you'd probably want a heuristic system and look up info on IPs.
IPv4 with NAT means you can "overban" too.
It's also not a common metric you can filter on in open firewalls since you must lookup and maintain a cache of IP to ASN, which has to be evicted and updated as blocks still move around.
https://github.com/skeeto/endlessh
The practical effect of this was you could place a zip bomb in an office xml document and this product would pass the ooxml file through even if it contained easily identifiable malware.
The file size problem is still an issue for many big name EDRs.
It's not working very well.
In the web server log, I can see that the bots are not downloading the whole ten megabyte poison pill.
They are cutting off at various lengths. I haven't seen anything fetch more than around 1.5 Mb of it so far.
Or is it working? Are they decoding it on the fly as a stream, and then crashing? E.g. if something is recorded as having read 1.5 Mb, could it have decoded it to 1.5 Gb in RAM, on the fly, and crashed?
There is no way to tell.
Many of these are annoying LLM training/scraping bots (in my case anyway). So while it might not crash them if you spit out a 800KB zipbomb, at least it will waste computing resources on their end.
PS: I'm on the bots side, but don't mind helping.
Secondly, I know that most of these bots do not come back. The attacks do not reuse addresses against the same server in order to evade almost any conceivable filter rule that is predicated on a prior visit.
cgroups with hard-limits will let the external tool's process crash without taking down the script or system along with it.
This is exactly the same idea as partitioning, though.
I'm sure, though, that if it were as simple as that, we wouldn't even have a name for it.
It's just nobody usually implements a limit during decompression because people aren't usually giving you zip bombs. And sometimes you really do want to decompress ginormous files, so limits aren't built in by default.
Your given language might not make it easy to do, but you should pretty much always be able to hack something together using file streams. It's just an extra step is all.
Also zip bombs are not comically large until you unzip them.
Also you can just unpack any sort of compressed file format without giving any thought to whether you are handling it safely.
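In most languages the "limit during decompression" part really is just a few extra lines once you use a streaming API. A rough Python sketch with zlib's incremental decompressor; the 100 MB budget and chunk size are arbitrary:

```python
import zlib

MAX_OUTPUT = 100 * 1024 * 1024  # arbitrary 100 MB budget

def bounded_gunzip(path, max_output=MAX_OUTPUT, chunk_size=64 * 1024):
    """Decompress a .gz file, refusing to produce more than max_output bytes."""
    decomp = zlib.decompressobj(wbits=31)  # wbits=31 selects the gzip container
    total = 0
    parts = []
    with open(path, "rb") as f:
        while True:
            block = f.read(chunk_size)
            if not block:
                break
            # Ask for at most the remaining budget plus one byte, so going
            # over the budget is detectable without inflating everything.
            data = decomp.decompress(block, max_output - total + 1)
            total += len(data)
            if total > max_output:
                raise ValueError("output exceeds budget; refusing to continue")
            parts.append(data)
    return b"".join(parts)
```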
Signed, a kid in the 90s who downloaded some "wavelet compression" program from a BBS because it promised to compress all his WaReZ even more so he could then fit moar on his disk. He ran the compressor and hey golly that 500MB ISO fit into only 10MB of disk now! He found out later (after a defrag) that the "compressor" was just hiding data in unused disk sectors and storing references to them. He then learned about Shannon entropy from comp.compression.research and was enlightened.
So you could access the files until you wrote more data to disk?
There's literally no machine on Earth today that can deal with that (as a single file, I mean).
I tried this on my computer with a couple of other tools, after creating a file full of 0s as per the article.
gzip -9 turns it into 10,436,266 bytes in approx 1 minute.
xz -9 turns it into 1,568,052 bytes in approx 4 minutes.
bzip2 -9 turns it into 7,506 (!) bytes in approx 5 minutes.
I think OP should consider getting bzip2 on the case. 2 TBytes of 0s should compress nicely. And I'm long overdue an upgrade to my laptop... you probably won't be waiting long for the result on anything modern.
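If anyone wants to check that without first writing 2 TB of zeros to disk, you can stream zero blocks through an incremental compressor and just count the output. A rough Python sketch with bz2 (the library's defaults may not match the bzip2 CLI exactly, and it will take a while to chew through that much input):

```python
# Stream 2 TiB worth of zero blocks through bz2 and count the compressed
# output, without ever touching the disk.
import bz2

TARGET = 2 * 1024**4           # 2 TiB of zeros
BLOCK = 16 * 1024 * 1024       # feed 16 MiB at a time
zeros = bytes(BLOCK)

comp = bz2.BZ2Compressor(9)    # level 9, as with bzip2 -9
out_bytes = 0
remaining = TARGET
while remaining > 0:
    n = min(BLOCK, remaining)
    out_bytes += len(comp.compress(zeros[:n]))
    remaining -= n
out_bytes += len(comp.flush())

print(f"{TARGET} bytes of zeros -> {out_bytes} compressed bytes")
```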
All depends on how much magic you want to shove into an "algorithm"
But a gzip decompressor is not Turing-complete, and there are no gzip streams that will expand to infinitely large outputs, so it is theoretically possible to find the pseudo-Kolmogorov complexity of a string for a given decompressor program by the following algorithm:
Let file.bin be a file containing the input byte sequence.
1. BOUNDS=$(gzip --best -c file.bin | wc -c)
2. LENGTH=1
3. If LENGTH==BOUNDS, run `gzip --best -c file.bin > test.bin.gz` and HALT.
4. Generate a file `test.bin.gz` LENGTH bytes long containing all zero bits.
5. Run `gunzip -kf test.bin.gz` (if gunzip rejects the candidate as invalid, treat that as "not equal" in step 6).
6. If `test.bin` equals `file.bin`, halt.
7. If `test.bin.gz` contains only 1 bits, increment LENGTH and GOTO 3.
8. Replace test.bin.gz with its lexicographic successor by interpreting it as a LENGTH-byte unsigned integer and incrementing it by 1.
9. GOTO 5.
test.bin.gz contains your minimal gzip encoding.
There are "stronger" compressors for popular compression libraries like zlib that outperform the "best" options available, but none of them are this exhaustive because you can surely see how the problem rapidly becomes intractable.
For the purposes of generating an efficient zip bomb, though, it doesn't really matter what the exact contents of the output file are. If your goal is simply to get the best compression ratio, you could enumerate all possible files with that algorithm (up to the bounds established by compressing all zeroes to reach your target decompressed size, which makes a good starting point) and then just check for a decompressed length that meets or exceeds the target size.
I think I'll do that. I'll leave it running for a couple days and see if I can generate a neat zip bomb that beats compressing a stream of zeroes. I'm expecting the answer is "no, the search space is far too large."
https://www.hackerfactor.com/blog/index.php?/archives/762-At...
Like, a legitimate crawler suing you and alleging that you broke something of theirs?
The CFAA[1] prohibits:
> knowingly causes the transmission of a program, information, code, or command, and as a result of such conduct, intentionally causes damage without authorization, to a protected computer;
As far as I can tell (again, IANAL) there isn't an exception if you believe said computer is actively attempting to abuse your system[2]. I'm not sure if a zip bomb would constitute intentional damage, but it is at least close enough to the line that I wouldn't feel comfortable risking it.
[1]: https://www.law.cornell.edu/uscode/text/18/1030
[2]: And of course, you might make a mistake and incorrectly serve this to legitimate traffic.
> which is used in or affecting interstate or foreign commerce or communication, including a computer located outside the United States that is used in a manner that affects interstate or foreign commerce or communication of the United States
Assuming the server is running in the States, I think that would apply unless the client is in the same state as the server, in which case there is probably similar state law that comes into effect. I don't see anything there that excludes a client, and that makes sense, because otherwise it wouldn't prohibit having a site that tricks people into downloading malware.
But if it matters, pay your lawyer; and if it doesn't matter, it doesn't matter.
I'll play the side of the defender and you can play the "bot"/bot deployer.
> it could serve the zip bomb to a legitimate bot.
Can you define the difference between a legitimate bot and a non-legitimate bot for me?
The OP didn't mention it, but if we can assume they have SOME form of robots.txt (a safe assumption given their history), would those bots which ignored the robots.txt be considered legitimate or non-legitimate?
Almost final question, and I know we're not lawyers here, but is there any precedent in case law or anywhere which defines a 'bad bot' in the eyes of the law?
Final final question: as a bot, do you believe you have a right or a privilege to scrape a website?
https://en.wikipedia.org/wiki/Mantrap_(snare)
Of course their computers will live, but if you accidentally take down your own ISP or maybe some third-party service that you use for something, I'd think they would sue you.
>Disallow: /zipbomb.html
Legitimate crawlers would skip it this way; only scum ignores robots.txt.
Neither is the HTTP specification. Nothing is stopping you from running a Gopher server on TCP port 80, should you get into trouble if it happens to crash a particular crawler?
Making a HTTP request on a random server is like uttering a sentence to a random person in a city: some can be helpful, some may tell you to piss off and some might shank you. If you don't like the latter, then maybe don't go around screaming nonsense loudly to strangers in an unmarked area.
The server owner can make an easy case to the jury that it is a booby trap to defend against trespassers.
I would have figured the process/server would restart, and restart with your specific URL since that was the last one not completed.
What makes the bots avoid this site in the future? Are they really smart enough to hard-code a rule to check for crashes and avoid those sites in the future?
The gzip bomb means you serve 10MB but they try to consume vast quantities of RAM on their end and likely crash. Much better ratio.
[0]: https://en.wikipedia.org/wiki/Chunked_transfer_encoding
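For anyone curious how the serving side can look, a hypothetical sketch: keep the ~10 MB pre-gzipped file on disk and return its raw bytes with a `Content-Encoding: gzip` header, so a client that honours the header inflates the full payload on its side. The route, file path, and "malicious" check below are placeholders, not the article's actual middleware:

```python
# Serve a pre-compressed gzip bomb to requests flagged as malicious.
from flask import Flask, Response, request

app = Flask(__name__)
BOMB_PATH = "bomb.gz"  # generated once, e.g. gigabytes of zeros piped through gzip

def looks_malicious(req) -> bool:
    # Placeholder heuristic; real middleware would check IP reputation, paths, etc.
    return "sqlmap" in req.headers.get("User-Agent", "").lower()

@app.route("/", defaults={"path": ""})
@app.route("/<path:path>")
def maybe_bomb(path):
    if looks_malicious(request):
        with open(BOMB_PATH, "rb") as f:
            payload = f.read()  # ~10 MB on disk, far smaller than what it inflates to
        return Response(payload, headers={"Content-Encoding": "gzip",
                                          "Content-Type": "text/html"})
    return "Hello, regular visitor."
```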
I don't think it's a terrible problem to solve these days, especially if you use one of the tarpitting implementations that use nftables/iptables/eBPF, but if you have one of those annoying Chinese bot farms with thousands of IP addresses hitting your server in turn (Huawei likes to do this), you may need to think twice before deploying this solution.
I had other ideas too, but I don't know how well some of them will work (they might depend on what bots they are).
An alternative might be to use Brotli, which has a static dictionary. Maybe that can be used to achieve a high compression ratio.
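Easy enough to measure. A quick hypothetical comparison on a run of zeros, assuming the `Brotli` package is installed (quality 11 is Brotli's maximum; note this doesn't exercise the static dictionary, which mainly helps on web-text-like content):

```python
# Compare DEFLATE (zlib) and Brotli output sizes on the same run of zeros.
import zlib
import brotli

data = b"\x00" * (100 * 1024 * 1024)  # 100 MB of zeros, small enough for RAM

gz = zlib.compress(data, 9)
br = brotli.compress(data, quality=11)

print(f"zlib -9  : {len(gz):>10} bytes")
print(f"brotli 11: {len(br):>10} bytes")
```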
Surely, the device does crash but it isn’t destroyed?
Most of the bots I've come across are fairly dumb however, and those are pretty easy to detect & block. I usually use CrowdSec (https://www.crowdsec.net/), and with it you also get to ban the IPs that misbehave on all the other servers that use it before they come to yours. I've also tried turnstile for web pages (https://www.cloudflare.com/application-services/products/tur...) and it seems to work, though I imagine most such products would, as again most bots tend to be fairly dumb.
I'd personally hesitate to do something like serving a zip bomb since it would probably cost the bot farm(s) less than it would cost me, and just banning the IP I feel would serve me better than trying to play with it, especially if I know it's misbehaving.
Edit: Of course, the author could state that the satisfaction of seeing an IP 'go quiet' for a bit is priceless - no arguing against that
Bad bots don't even read robots.txt.
I'm not a lawyer, but I have yet to see a real-life court case of a bot owner suing a company or an individual for responding to his malicious request with a zip bomb. The usual spiel goes like this: responding to his malicious request with a malicious response makes you a cybercriminal and allows him (the real cybercriminal) to sue you. Again, apart from cheap talk, I've never heard of a single court case like this. But I can easily imagine them trying to blackmail someone with such cheap threats.
I cannot imagine a big company like Microsoft or Apple using zip bombs, but I fail to see why zip bombs would be considered bad in any way. Anyone with an experience of dealing with malicious bots knows the frustration and the amount of time and money they steal from businesses or individuals.
This is what trips me up:
>On my server, I've added a middleware that checks if the current request is malicious or not.
There's a lot of trust placed in:
>if (ipIsBlackListed() || isMalicious()) {
Can someone who's been assigned a previously blacklisted IP, or someone using a website-archiving tool that mimics a bot, be served malware? Is the middleware good enough, or just "good enough so far"?
Close enough to 100% of my internet traffic flows through a VPN. I have been blacklisted by various services upon connecting to a VPN or switching servers on multiple occasions.
They made the request. Respond accordingly.
https://williamgibson.fandom.com/wiki/ICE
"Hurting people is wrong, so you should not defend yourself when attacked."
"Imprisoning people is wrong, so we should not imprison thieves."
Also the modern telling of Robin Hood seems to be pretty generally celebrated.
Two wrongs may not make a right, but often enough a smaller wrong is the best recourse we have to avert a greater wrong.
The spirit of the proverb is referring to wrongs which are unrelated to one another, especially when using one to excuse another.
This is exactly what Californian educators told kids who were being bullied in the 90's.
The logic of terrorists and war criminals everywhere.
Do you really want to live in a society where all use of punishment to discourage bad behaviour in others is off the table? That is a game-theoretical disaster...
Crime and Justice are not the same.
If you cannot figure that out, you ARE a major part of the problem.
Keep thinking until you figure it out for good.
Amazon's scraper doesn't back off. Meta, Google, and most of the others with identifiable user agents back off; Amazon doesn't.
(I don't think your blog qualifies as shady … but you're not in my allowlist, either.)
So if I visit https://anubis.techaro.lol/ (from the "Anubis" link), I get an infinite anime cat girl refresh loop — which honestly isn't the worst thing ever?
But if I go to https://xeiaso.net/blog/2025/anubis/ and click "To test Anubis, click here." … that one loads just fine.
Neither xeserv.us nor techaro.lol are in my allowlist. Curious that one seems to pass. IDK.
The blog post does have that lovely graph … but I suspect I'll loop around the "no cookie" loop in it, so the infinite cat girls are somewhat expected.
I was working on an extension that would store cookies very ephemerally for the more malicious instances of this, but I think its design would work here too. (In-RAM cookie jar, burns them after, say, 30s. Persisted long enough to load the page.)
I used cookie blockers for a long time, but always ended up having to whitelist some sites even though I didn't want their cookies because the site would misbehave without them. Now I just stopped worrying.
Is your browser passing a referrer?