Monday 30 March 2026

Web preservation (or at least not immediate destruction) and you

This is a bit of a tautology, but most people who spend time debating on the internet like debating. As such, we often see things get meta, and people begin debating what people’s opinions were back then, how they were formulated, how they changed. The problem is that, unlike in other domains, the evolution of people’s takes on the internet is very poorly archived, often forcing people to rely on guesses and memories. But why should people trust your memories? Why should you trust theirs? Why should you trust your own memories, after all? The brain doesn’t retain things like a hard drive; it keeps the general idea based on your impression at the moment. Anything you read whose copies have all been deleted may as well never have been read in the first place.

Here is a concrete example: a few years ago, I saw someone on Twitter saying that, like Kingdom Hearts 3, Kingdom Hearts 2 was poorly received on release and that opinions only began shifting with the release of Final Mix. I wanted to object. Sure, I didn’t know what the climate was like during the release, but I was already terminally online before the international release of 2.5 HD ReMix, and as far as I remember Kingdom Hearts 2 was already seen as an even better sequel. But what could I say? I didn’t remember where I had read most of this stuff, and of the websites I did remember, most were already offline. The other guy told others to "check the forums, they are still online": so not the forums that I remembered, but the ones that were still around, probably the big websites like IGN. This is a silly problem, but I would like to make a point: if you don’t archive, then history is written by whoever has enough money to pay the bills. And small websites have no obligation to keep paying them just for you.

There is also another thing to point out. You may have noticed a bit of irony as I talk to you about posts I cannot link while using a tweet I did not link as an example. Well, the problem is that not only is finding old info on most social media really hard, but we recently had several big exoduses from Twitter, due to the new admin being an open fascist. Several people deleted their data, not necessarily because they wanted it off the internet, but because they didn’t want it benefiting a person they were opposed to. Some people didn’t even like deleting all of this, even if they did it on principle.

A more serious case than a bunch of old takes online would be a website I once found while trying to read Structure and Interpretation of Computer Programs. It was an extremely complete list of solutions to the exercises of the book that didn’t just tell you one way to do them, but also had multiple conversations on the ways to approach the problems, which could be very useful when stuck, to see if I was close to having an idea of how to solve them. However, this was back when I was mostly driven by my compulsions, so I ended up procrastinating on seriously working through the book, and the website, after a first offline period, closed for good. It probably wasn’t fully archived, and I lost the URL anyway.

What I am trying to get at is that the true cause of the erasure of the internet isn’t necessarily antipathy, but apathy. Lots of people post stuff without thinking about how solid the database is, talking without much difference on social network websites that will archive everything forever or on unindexed forums where everything will go down when the owner can’t pay the bills anymore. They say that "the internet never forgets", but it would be more correct to say that the internet doesn’t forget if it can be used against you; with how the web is structured, consider every page a future 404 error.
It’s a bit of a touchy subject with all the concern about privacy, and privacy and archiving are inevitably at odds with each other: the more copies of a piece of data exist, the better it is archived, but the harder it is for someone to assert their right to be forgotten. It’s also worth adding the concern over AI to the mix, since some of them leech off archives and use technology made for archiving to feed themselves. But I think that if there is something you care about on the internet, you can’t leave it at the whim of a random CEO who may decide to cut you off with his moderation team, or sell his company to someone you hate so much that you don’t want a word of your writing on his website.

So I would like to show the tools I found to create "extra copies" of webpages. Most of them can be used without necessarily being the kind of guy who has a NAS of multiple terabytes and several crawling scripts running in the background.

The easy combo (history archiver + external archiving service)

Not exactly the most robust thing, but good enough to have an additional layer of security. The idea is to use one extension that can send the page to a web archiving service, and another one that keeps your history beyond 90 days so that you keep access to the URL (because these services are useless if you don’t have the URL…). Of course, this isn’t really true archiving, as these services are out of your control and could shut down; it should rather be seen as a way to avoid a single point of failure.

For sending pages to these third-party services, we have the official Wayback Machine extension. It comes with interesting features like the ability to automatically save a page that hasn’t been archived for a while, or to automatically save every outlink of a page. But it fails on a lot of resource-intensive pages, so you often have to check whether the page was saved properly.
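If you want to do that check (or ask for a capture) without the extension, here is a minimal Python sketch. It assumes the two public Wayback Machine addresses I know of, the "Save Page Now" prefix `https://web.archive.org/save/` and the availability API at `https://archive.org/wayback/available`, and it assumes the JSON reply has the `archived_snapshots` / `closest` shape shown in the availability documentation. The function names are mine, and nothing here touches the network: it only builds the addresses and reads a reply you fetched yourself.

```python
import json
from urllib.parse import urlencode

# Assumed public endpoints of the Wayback Machine.
SAVE_ENDPOINT = "https://web.archive.org/save/"
AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def save_url(page_url: str) -> str:
    """Return the address that, when opened, asks the Wayback Machine for a capture."""
    return SAVE_ENDPOINT + page_url


def availability_url(page_url: str) -> str:
    """Return the availability API address that reports the closest snapshot."""
    return AVAILABILITY_ENDPOINT + "?" + urlencode({"url": page_url})


def closest_snapshot(response_text: str):
    """Extract (snapshot_url, timestamp) from an availability reply, or None."""
    data = json.loads(response_text)
    closest = data.get("archived_snapshots", {}).get("closest")
    if not closest or not closest.get("available"):
        return None  # no usable snapshot reported for this page
    return closest["url"], closest["timestamp"]
```

Open the result of `availability_url(...)` in a browser or with any HTTP client and pass the reply text to `closest_snapshot`; a `None` means the page probably needs another save attempt.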

More dubious morally is archive.today and its unofficial extension, Archive Page. Archive.today is much more powerful than the Wayback Machine, being able to quickly load webpages that the Wayback Machine would struggle to archive. Archive.today also has a weird case of a DDoS being enabled through it, and it ignores robots.txt, which is very useful for saving forums but arguably crosses the line on what should be archived. It’s ultimately up to you to see what you value the most.
It also has the flaw of reloading its page multiple times when archiving a website, meaning it can quickly clog up your history if you use an extension that retains multiple visits. Thankfully, you don’t actually need to keep the tab open once the archiving process has begun.

As for history archivers, there are a lot of options out there. History Trends Unlimited and History Plus seem to be the most popular for Chromium, but feel free to see what other options are out there. Chrome can show your history in the My Activity section, but on top of the privacy concerns, I find it much slower than a local database, Google might decide to delete your data, and it’s easy to accidentally do a full deletion while trying to clean the cache. So at least try to back it up using Google Takeout. It’s worth noting that, depending on the browser and the software you’re using, these extensions can randomly die, but thankfully there is an auto-backup feature in place, even if it can quickly clog up your download folder. If you’re using Safari or Firefox, your choice of extensions is much more limited, but Firefox can retain history data up to an arbitrary limit, and Safari stores up to six months by default, which can be changed in settings. Both only store the most recent access to a page.
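For people who would rather keep a plain copy themselves, the history of a Chromium browser is, as far as I know, an SQLite file named `History` in the profile folder. The sketch below is only that, a sketch: it assumes the `urls` and `visits` tables and the column names I believe Chromium uses, and that timestamps are microseconds counted from 1601-01-01 UTC. The function names are made up for the example. The browser seems to lock the file while running, so point it at a copy, and run it regularly since the browser still prunes old entries on its side.

```python
import csv
import sqlite3
from datetime import datetime, timedelta, timezone

# Assumption: Chromium stores times as microseconds since 1601-01-01 UTC.
CHROMIUM_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)


def chromium_time(microseconds: int) -> datetime:
    """Convert a Chromium timestamp to a timezone-aware datetime."""
    return CHROMIUM_EPOCH + timedelta(microseconds=microseconds)


def read_visits(con: sqlite3.Connection) -> list:
    """Return every visit (not just the latest per page) as (time, url, title)."""
    rows = con.execute(
        "SELECT urls.url, urls.title, visits.visit_time "
        "FROM visits JOIN urls ON urls.id = visits.url "
        "ORDER BY visits.visit_time"
    ).fetchall()
    return [(chromium_time(t).isoformat(), url, title) for url, title, t in rows]


def export_history(db_copy_path: str, csv_path: str) -> int:
    """Write the visits of a copied History file to CSV; return the row count."""
    con = sqlite3.connect(db_copy_path)
    try:
        visits = read_visits(con)
    finally:
        con.close()
    with open(csv_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["visited_at_utc", "url", "title"])
        writer.writerows(visits)
    return len(visits)
```

Something like `export_history("History-copy", "history.csv")` would then give a file any spreadsheet can open; both file names are placeholders.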

Actually having the page on your machine: SingleFile

SingleFile is an extension that is basically an all-around improvement on the default "save page" option of your browser. The two most interesting features of SingleFile, imo, are the network settings and how it works as a capture of the page at the moment you see it. The network settings allow you to select what kinds of resources you want to download from the page, which can be useful to avoid downloading over and over the same picture files you don’t need. The results are impressive: pages that would weigh multiple megabytes even when saved as raw HTML with the browser’s default download option can be only a few kilobytes here. The capture-at-the-moment thing means that, for example, if you have loaded information requested through an AJAX script or a GET request, SingleFile can capture it without problems, something that online archiving often can’t do and that I think might sometimes go beyond ArchiveBox’s capabilities, since ArchiveBox works by sending a headless browser to a designated address. To give a concrete example: I sometimes like to read Patreon blog posts about game design. However, some of the most interesting points are in the back-and-forth of the comment section. Archive.is, which can capture additional Reddit comments just fine, can’t get these in my experience. However, once they are loaded, SingleFile treats them like any other text on the page.

SingleFile comes with other features, like the ability to automatically download a copy of each page opened in a tab, or to load a list of URLs and archive them automatically. However, it still has heavy limitations and, being an extension, it’s stuck clogging your download folder if you don’t install the companion program, which makes me say that if you want to get into these kinds of advanced uses, you might be better off using ArchiveBox.

Getting videos on the computer: yt-dlp

yt-dlp is a command-line program that can directly download videos from YouTube if you give it the URL. The program is released under the Unlicense (effectively public domain), which means that other programs with similar features like 4K Video Downloader probably run a version of yt-dlp under the hood with a GUI on top. However, yt-dlp doesn’t have strict usage limitations like these programs do. Plus, what becomes really interesting is that yt-dlp allows you to save other types of data, even if your computer doesn’t have the space to save the full video, by using the "--skip-download" option. For example, by using the command "yt-dlp --skip-download --write-sub --sub-lang all --sub-format srv3" I was able to quickly download a bunch of subtitles, back when everybody was panicking because it seemed that YouTube was deleting all custom captions, and upload them to the Internet Archive. You can also directly save an entire playlist by using it as the argument, and download comments with the --write-comments option.
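To avoid retyping this, the same "everything but the video" command can be wrapped in a few lines of Python. This is a minimal sketch, assuming yt-dlp is installed and reachable on the PATH; the function names are invented for the example. It uses the option spellings from the current yt-dlp help text (`--write-subs`, `--sub-langs`), and, as far as I know, the comments end up in the `.info.json` file written next to the subtitles.

```python
import subprocess


def metadata_only_command(url: str, comments: bool = False) -> list:
    """Build a yt-dlp command that keeps subtitles (and optionally comments) but no video."""
    command = [
        "yt-dlp",
        "--skip-download",       # do not fetch the video itself
        "--write-subs",          # save the subtitle files
        "--sub-langs", "all",    # every available language
        "--sub-format", "srv3",  # same subtitle format as the command in the post
    ]
    if comments:
        command.append("--write-comments")  # also retrieve the comment section
    command.append(url)  # a video or a playlist address
    return command


def run(url: str, comments: bool = False) -> int:
    """Run the command and return yt-dlp's exit code (0 means success)."""
    return subprocess.run(metadata_only_command(url, comments), check=False).returncode
```

Passing a playlist address instead of a single video should work the same way, since yt-dlp accepts both as its argument.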

Backing up Discord conversations

I don’t think it’s really a mystery that Discord may kinda be getting worse and that a lot of people want to leave. However, a lot of guides and information have been written on these servers, and a lot of people are frustrated at the idea of leaving behind years of conversations.

The first, most morally indisputable tool is the request to get a copy of your personal data. Among other things, this will send you a list of all the messages you sent in all the different Discord channels. This isn’t the most human-friendly version of the data, but at least it’s there on your computer if you get banned or the server gets deleted. This option only saves your own messages and doesn’t give you the attached media, so you will be missing a lot of context.

In the unofficial tool category, the Discrub extension allows the user to download the content of the channels of their choice over a selected period, with other filters available. It takes a bit of time (especially if you use it to fetch reactions) and it can sometimes freeze during the process, so I would recommend working in chunks that each cover a certain period, but outside of that it works really well. The program can export to HTML to get directly readable files, but also to CSV and JSON to allow further manipulation later on. CSV also seems to store more data than the HTML export. I also briefly tested Discord Chat Exporter, which probably eats less RAM since it works as a standalone program, but I remember that doing stuff like quickly selecting all channels except one or two and saving them over a defined period was much more annoying than with Discrub, so I didn’t use it much.
Discord History Tracker, meanwhile, works more as a tool to capture what your current Discord session is viewing. It can be used in theory to archive old messages, but I think you’re gonna be here for a while if you ask the auto-scroll to deal with channels that are a decade old. Unlike Discrub, it can’t fetch reaction data beyond the number of people who reacted, to my knowledge. It also stores media in a database and only offers to "clean" everything, so I think it’s worse in that regard than Discrub, where you can target the heavier files if you want to free up space.
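For the idea of exporting a channel period by period, the date ranges can be planned in advance instead of being worked out by hand. A minimal sketch, with a made-up function name and an arbitrary default of 30 days per chunk; it knows nothing about any particular tool and only produces the start and end dates to type into whatever date filter the exporter offers.

```python
from datetime import date, timedelta


def period_chunks(start: date, end: date, days: int = 30):
    """Yield (chunk_start, chunk_end) pairs covering start..end inclusive, without overlap."""
    if days < 1:
        raise ValueError("days must be at least 1")
    chunk_start = start
    while chunk_start <= end:
        # The last chunk is cut short so it never goes past the end date.
        chunk_end = min(chunk_start + timedelta(days=days - 1), end)
        yield chunk_start, chunk_end
        chunk_start = chunk_end + timedelta(days=1)
```

If an export freezes, only the chunk that was running has to be redone, which is the whole point of splitting.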

This is worth repeating as a warning: downloading Discord conversations isn’t that hard. If you have a periodically deleted channel, giving people a 24-hour warning is basically an invitation to launch a scraping program. Deleting only stops people who come afterwards from seeing the data.

The most advanced solution: ArchiveBox

ArchiveBox basically aims to be an all-in-one solution for internet archiving. It uses several of the tools in this list, like SingleFile or yt-dlp, to keep a copy of the URLs you feed it. I have a Docker container installed on my Mac out of curiosity, but I find the setup a bit "heavy", especially with its emphasis on saving in multiple formats (even if you can pick which formats you want to use). If you want to save a lot of URLs and have disk space to spare, it’s probably a tool worth looking into.

- This is very creepy and I would like to not be archived

If you want to minimize the risk of your data existing in multiple places, the easiest way is to set up your social media account to only allow logged-in users to see it in the first place. This kills most online archiving tools, and can even make things difficult for tools that download pages. For example, ArchiveBox by default sends a headless browser without any cookies or login data, and in the current state of the application it’s recommended to use a burner account, because login information can be exposed if the settings for logging in to websites are enabled. There isn’t much you can do to stop people from downloading your data if they really want to, however.

Monday 23 March 2026

(Web) Compulsion and you.

For 12 years, I thought I was the laziest person in the world. I couldn’t get anything done, always procrastinating on almost literally everything. It cost me my studies, and good chances of integrating myself into society. And at the same time I thought that my time was gonna come, that I was gonna focus and make back these months, these years, at least some of the time lost. But then I woke up, I was already in my late twenties, and I began to get worried things would never change. So I tried to get back in contact with some old friends that I originally didn’t want to let see me in that state. But it was too late, and they had moved on already, if I was even able to trade a few words. Worried about how much I was forgetting, I tried to scrape together whatever traces these years had left behind. And then, stumbling on an old Firefox install, I discovered something: I didn’t give that much of a fuck about all the things I had searched for. I thought about all the stuff, even simple stuff, I could have done instead of reading the wiki for « Factorio », a game of which I only played a free alpha. It was frustrating. But it gave me more direction than I expected to move forwards. However, it seems that with the explosion of techniques to make the internet more and more addictive, I am not the only one with problems using my time well. So I thought that a few notes about what I observed could be useful to other people.


Identifying the problem.

Having a web compulsion kinda sucks. You are just living everything vicariously. I could wake up, eat my breakfast, go on my computer, eat my lunch, go back on my computer, eat a snack, go back on my computer, eat a bit of dinner, go back on my computer, thinking I would just do maybe one more click, until I went to sleep way too late, and if I had maybe played a bit of a video game, it was the most productive I had been. If you slightly recognize yourself in that, you can make things at least a little better for yourself.

A triangle whose three vertices read « suffering », « one of these days » and « compulsion »
I recognize that this isn’t really fine art, but that will do

(Disclaimer: I am not a medical specialist, an addiction specialist, or anything of the sort. This is just personal experience.)

This is the triangle. I believe that identifying each of the points in detail is key to managing a compulsion problem.


Compulsion:

This is the point you probably don’t know as well as you think you do. Our brains suck at remembering stuff; I remember, during my first attempt at keeping a diary, that identifying exactly what went on each day gets tricky after only 3 or 4 days of procrastinating. So you don’t remember your web sessions as well as you think you do. While you think of them as a waste of time in hindsight, you also only remember the best bits of them: the ones where you actually learned something (even if it wasn’t very useful), where you laughed, where you made someone else laugh. A compulsion is, in a way, an action that sounds more useful in the moment than it actually is. I think the first step is to look at what you were actually doing on the computer, say, a few years ago, and see how much of it is stuff you have already forgotten and didn’t care much about. This is easier said than done, and you may have to settle for only a few months ago, but this can already help a lot.

There are other reasons why I think you should check your web history. Most advice on how to treat addictions recommends looking at what triggers the compulsions, their frequency and so on, and web compulsion has the advantage of potentially being one of the compulsions that documents itself the most, so we may as well take advantage of that. Also, even if you don’t have pathological problems, keeping traces of your browsing history can be useful so you don’t have to default to « I remember reading somewhere by someone » more than you would like to.

How to read it is the tricky part. For the sake of the exercise, we want to read something a few months or years old, not just yesterday’s session that you’re currently feeling guilty about, but most browsers only keep this data for three months. Microsoft Edge, being a Chromium browser, only keeps the data for three months but allows manual exports. Safari keeps the data for six months by default unless changed in settings, deletes all visits to a page except the most recent one, and has a manual export option. Chrome may store data beyond three months in My Activity, which can also be used to retrieve your searches if you use Google, along with all kinds of other data. Firefox works with a variable entry limit instead, depending on your hard drive. Vivaldi only keeps history for three months unless set otherwise, but at least comes with a cool interface.

If you use a Chromium browser, you can bypass some of these limits with an extension, like History Plus or History Trends Unlimited, which I honestly recommend to anyone regardless of their internet usage. If anything, it can at least help make whatever little data has been saved more readable.

So, try looking at your Google activity or whatever old browsing data you could salvage. What do you think of it in hindsight? Were they all subjects that interested you? Was it the best way to learn about them? What kinds of patterns lead to you wasting your time? Do you really give a fuck in hindsight?
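If the raw list is too long to read line by line, counting visits per site gives a first picture of where the time went. A minimal sketch, assuming you already have the visited addresses as a plain list of strings (from an export or a history extension's backup, whatever its format); the function name is made up, and folding `www.` into the bare host name is just my choice to avoid counting the same site twice.

```python
from collections import Counter
from urllib.parse import urlsplit


def visits_per_site(urls):
    """Count visits per host name, most visited first."""
    counts = Counter()
    for url in urls:
        host = urlsplit(url).hostname
        if not host:
            continue  # skip entries without a host, such as about:blank
        if host.startswith("www."):
            host = host[4:]  # treat www.example.com and example.com as one site
        counts[host] += 1
    return counts.most_common()
```

The numbers don’t answer the questions above by themselves, but they point at the sites worth opening in the history to see what you were actually doing there.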

This is a painful but, I think, necessary experience. Don’t try to delete the data you’re ashamed of; I tried and I regret it. It doesn’t give you your time back.


One of these days:

The second point of the graph. Now that you’re looking at some old history, you can probably at least think of some stuff you would have liked to do instead, whether it’s productive or personal. This probably isn’t the first time you are thinking of stuff you need to do, but you should keep it broad for now. A « one of these days » is anything you are postponing without a real reason. It can be « working », sure, but it can also be spending time with your loved ones, sorting some papers, cleaning your room, or playing a video game. It can even be something related to the internet. As long as YOU want to do it but are putting it off for the sake of compulsion, it works. Don’t think too hard in terms of important/not important yet; we are trying to build the level 0 of « getting stuff I actually want to get done instead of whatever goes through my head at some precise moment ». For example, one of the first things I did was progressively cleaning up an old email address with over a thousand unread emails to make it actually usable. So, when you recognize that you are doing something by compulsion, try to grab something you wanted to do one of these days instead. There is a line between « one of these days » and « compulsion » because the two aren’t so distinct; stuff started by compulsion can slide into « one of these days » as you keep procrastinating on going further into it, or sometimes you do manage, by compulsion, to accidentally do something that you wanted to do one of these days. But compulsion is a poor guide, and you shouldn’t expect it by itself to point you towards things that are actually fulfilling to do. For some people, Facebook is a compulsion; for me, adding my family to stay in touch with them was something I wanted to do « one of these days ». Some Google searches can be compulsive while others can be something you wanted to do one of these days.

- Bu-But I want to lock in
Stop. Forget « locking in ». It ain’t gonna happen. It’s part of the problem; locking in is scary, so you keep thinking you’ll do it tomorrow. You aren’t gonna make back the days you lost to compulsive browsing by working twice as hard; they are just lost. Kinda sucks, but that’s it.

Even if you do want to lock in, you can’t make me believe that’s the only thing you want to do. The emails aren’t gonna read themselves. The documents aren’t gonna sort themselves. The food isn’t cooking itself, and your mom would like some help. Do some of the things you want to do one of these days first, and then we’ll talk about locking in.


Suffering

The third point, also important to understand. To some level, even if you don’t realize it, you may suffer at the idea of doing what you want to do one of these days. You can be afraid that it’s gonna be less than perfect, that it’s not gonna pay off, or that working on it will make you think about all the time you already wasted. That’s when a compulsion may kick in, to hide the pain. Identifying the suffering is easier when the two other points have been identified, and you can actually begin to answer some questions about it: can I do anything to make it less painful? Is it gonna lessen with time? If the answer is no, tough luck. Looks like you are gonna have to deal with that suffering, as unpleasant as it is, to actually get some stuff done.


Now that the triangle has been identified, we can actually take some steps forward:

1) Recognizing you are doing something by compulsion. Question whether you are actually going to give a fuck about that thing a few months from now.

2) Trying to think of something you wanted to do one of these days instead. Remembering why you actually want to make progress on it, why it’s actually important to you.

3) Questioning the pain and whether it can be lessened. If the answer is no, then you’ll have to deal with it.

You will not always manage to escape compulsions, so I’d recommend making sure they are finite, to make it easier to manage your time. Avoid stuff like discover pages and use features like subscriptions to focus on what you actually care about. I recommend getting the Old Reddit Redirect extension or a userscript, because the new version feels a lot more addictive: old Reddit will more or less show you the same links throughout a day, links you can realize you don’t actually care much about, while new Reddit only shows you one new link per screen that you may never find again later.


That’s more or less the guide. Does it work? I’m not really sure, but the fact that I managed to write a small blog post, even if it’s less than perfect, is a good sign and an improvement. I hope that this stuff may prove useful to some people.

The three game rules framework

 If you have ever seen game design discussion on the internet, especially if you saw say rpg fans and arcade fans fighting each other, you p...