Make WordPress Core

Opened 5 months ago

Last modified 4 days ago

#64363 new enhancement

Add Markdown feeds

Reported by: pbearne
Owned by:
Milestone: Awaiting Review
Priority: normal
Severity: normal
Version:
Component: Feeds
Keywords: has-patch
Focuses:
Cc:

Description (last modified by westonruter)

Matt said it might be good to have a Markdown version of the content.

This is the start of doing this.

Change History (13)

This ticket was mentioned in PR #10599 on WordPress/wordpress-develop by @pbearne.


5 months ago
#1

  • Keywords has-patch added

#2 follow-up: @westonruter
5 months ago

Matt said it might be good to have a Markdown version of the content.

Where did he say that?

#3 @wildworks
5 months ago

Matt said it might be good to have a Markdown version of the content.

I've never heard of such a thing either. It would be nice to have some clear data and reasons why markdown feeds are useful.

#4 in reply to: ↑ 2 @NekoJonez
5 months ago

Replying to westonruter:

Matt said it might be good to have a Markdown version of the content.

Where did he say that?

State of the Word 2025. I'm unable to find the exact moment where he said it.

#5 @westonruter
5 months ago

  • Description modified (diff)

OK, I found it on YouTube at 1:39:49:

One thing I've been wondering about is a, a cool feature of WordPress is actually every webpage that you visit actually has, um, if you sort of twiddle the query string, you can actually get,
uh, a feed representation of that and it kind of goes through the feed engine of WordPress. So you can get a RSS 2 version of it or, you know, Atom version or RSS 0.92 or whatever.
And it basically allows you to create like a different representation. So you've got kind of the HTML page that gets served to humans, but, uh, bots could in
theory request a sort of XML version. Um, you know, something that we could add to this feed engine is a Markdown version.
Some people started to experiment with this where you essentially get like either with a sort of Vary header, HTTP header, or requesting like a .md version of a webpage could be something that,
um, is sort of, uh, smaller and easier to parse. So there's often, with a lot of the AI models, there's sort of these
context windows or token lengths. So essentially distilling down the sort of meat of a webpage or a blog post.
So it's sort of like barest form, which Markdown is a beautiful sort of text version of,
um, could essentially allow webpages to be ingested by AI things in a more efficient manner.
And, uh, something I could definitely see as building into 7.0 or 7.1. This is kind of just an idea right now.
It's more of in the notebook, mostly notebook phase. But, um, something I've been noodling on is how we can sort of make different, uh,
representations of websites accessible.

#6 @pbearne
5 months ago

This might die as a bad idea, or it might not.
If it lives, I am sure that better and more code will be needed.
Let's treat this ticket as a placeholder for now.

#7 @wildworks
5 months ago

At the very least, porting an external library directly into core as a parser may not be ideal.

Using the HTML Processor should allow for a more flexible and accurate implementation.

#8 @JeffPaul
3 months ago

Note that there's a conceptual exploration for this pending in the AI Experiments plugin: https://github.com/WordPress/ai/pull/194.

#9 @pbearne
3 months ago

Happy to use another method to create the Markdown.

#10 @pbearne
3 months ago

Removed the third-party library and copied the HTML_To_Markdown_Converter class from https://github.com/Jameswlepage/ai/tree/feature/markdown-feeds

#11 @pbearne
8 weeks ago

The current patch will be replaced with code from https://github.com/dmsnell/html-to-md once that project is finished.

#12 @justlevine
5 days ago

I'm finally giving html-to-md and the w.org endpoints a review, and I'm seeing some basics missing.
This one felt particularly important to crosslink here.

Last edited 5 days ago by westonruter

#13 @pbearne
4 days ago

We need to talk about the URL pattern.

The WP.org code adds ?output_format=md to the URL.

When I first coded this, I used the add_feed function to register the URL pattern, which just appends markdown to the end of the URL: domain.tld/page/markdown. To my view, this seems cleaner and follows the other views (RSS) in WordPress.

What do other people feel is the right solution, and what are the pros and cons?
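
To make the comparison concrete, the two patterns can be sketched outside WordPress. This is an illustration only, written in Python rather than core PHP, and it is not taken from either patch. The helper names wants_markdown_query and wants_markdown_path are hypothetical; only ?output_format=md and the trailing /markdown segment come from the comment above.

```python
from urllib.parse import urlsplit, parse_qs


def wants_markdown_query(url: str) -> bool:
    """Pattern A (hypothetical helper): ?output_format=md in the query string."""
    query = parse_qs(urlsplit(url).query)
    # parse_qs maps each key to a list of values; use the last one given.
    return query.get("output_format", [""])[-1] == "md"


def wants_markdown_path(url: str) -> bool:
    """Pattern B (hypothetical helper): a trailing /markdown path segment."""
    segments = [s for s in urlsplit(url).path.split("/") if s]
    return bool(segments) and segments[-1] == "markdown"


if __name__ == "__main__":
    for url in (
        "https://domain.tld/page/?output_format=md",
        "https://domain.tld/page/markdown",
        "https://domain.tld/markdown/",
    ):
        print(url, wants_markdown_query(url), wants_markdown_path(url))
```

One trade-off the sketch makes visible: under the path pattern, https://domain.tld/markdown/ could mean either the Markdown view of the front page or a page whose slug is markdown, so that pattern may need a rule for the collision. The query-string pattern leaves the path untouched and does not have that ambiguity.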
