Backing Up This Forum Elsewhere To Preserve The Work

Following this wget suggestion (How to create a read only, archive version of Discourse content - Support - Discourse Meta), I did a quick full scrape of this forum (public posts only). It works very well.

The browsable archive is now on GitHub Pages here:

https://johnk-.github.io/forum.rebol.info/index.html

For the moment I am running this from my laptop as a weekly cron job (with a few delays added to reduce the server load).

Not the most robust approach, but it should do for the moment. I think we have moved the risk of losing the forum content a few points down the scale.
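The thread doesn't include the actual script, so here is a minimal sketch of what a polite weekly wget mirror like this could look like. It is an assumption throughout: `build_wget_command`, `run_scrape`, the 2-second wait and the flag selection are illustrative, not the command from the Discourse Meta article. The flags shown (`--mirror`, `--convert-links`, `--adjust-extension`, `--page-requisites`, `--no-parent`, `--wait`, `--random-wait`) are standard GNU wget options.

```python
import shlex
import subprocess

# Hypothetical target; the real script and its settings are not given in the thread.
FORUM_URL = "https://forum.rebol.info/"


def build_wget_command(url: str, wait_seconds: int = 2) -> list[str]:
    """Build a wget mirror command that throttles itself between requests."""
    return [
        "wget",
        "--mirror",                # recursion + timestamping, infinite depth
        "--convert-links",         # rewrite links so the copy browses offline
        "--adjust-extension",      # save HTML pages with a .html suffix
        "--page-requisites",       # also fetch the CSS/images each page needs
        "--no-parent",             # never ascend above the starting URL
        f"--wait={wait_seconds}",  # pause between requests to reduce server load
        "--random-wait",           # vary the pause between 0.5x and 1.5x of --wait
        url,
    ]


def run_scrape(url: str = FORUM_URL) -> int:
    """Run the mirror and return wget's exit code (requires wget on PATH)."""
    cmd = build_wget_command(url)
    print("Running:", shlex.join(cmd))
    return subprocess.run(cmd, check=False).returncode


if __name__ == "__main__":
    raise SystemExit(run_scrape())
```

To run it weekly, a crontab line such as `0 3 * * 0 python3 /path/to/scrape.py` (path hypothetical) would start it every Sunday at 03:00.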


Wow--thank you! It's comforting to think that if I get hit by a bus these ideas will still be somewhere out there... :bus:


I couldn't find something here, so I did a Google site search to see if that worked better, and found...

no hits!

Google results for site:forum.rebol.info

I don't think this is all bad, as the plan is to relaunch the content under a new domain "when it's ready". I'm okay with being under the radar until then. And hey--it keeps one from worrying too much about keeping old forum.rebol.info links working.

--nevertheless-- it is a little unsettling, since there are plenty of inbound links, so zero Google indexing is surprising. Personally, I think the quality of the content is better than in many other places.

Not necessarily a particular action item--not trying to get found yet. But definitely a call to arms, and praise for @johnk making the valiant preservation effort!


I noticed that @hiiamboris has been assembling a corpus of Red material (issues, codebases, chats) to feed into LLMs:

That reminded me of this backup effort...

The last update was December 2024, which looks like it happened a bit before the move from forum.rebol.info to rebol.metaeducation.com in April 2025.

Would it be hard to get it updating again? I guess there's no need for the github.io URL to change just because the domain name had to; keeping it would give continuity for web scraping.

The idea of being able to talk to an AI that has read and digested the entire forum is appealing to me. I found ChatGPT to be surprisingly aware of Ren-C-isms (definitional returns and such), and I wonder whether this scrape might be why.

I should have checked first, but there's actually another scrape here!

https://johnk-.github.io/rebol.metaeducation.com/index.html

However, the last scrape was 28-Jul-2025 (so I guess the cron job is down...).

Having two copies is not necessary. In fact, I'd prefer just wiping out the existing scrape and any git history, overwriting it with a new one.

I don't know what consequences shifting the URLs and repositories around will have for content ingestion. Should a fresh scrape go up at https://johnk-.github.io/forum.rebol.info/index.html, since that has continuity from Feb 2022?

Content creators definitely need some kind of AI control panel. :robot:
