WordPress.org

Make WordPress Core

Opened 7 years ago

Last modified 3 weeks ago

#14179 new defect (bug)

Theme asking to update (theme with same name on WordPress.org)

Reported by: design_dolphin Owned by:
Milestone: WordPress.org Priority: normal
Severity: normal Version:
Component: Themes Keywords:
Focuses: Cc:

Description

I have a theme with a certain name, but there is a theme with the same name in the WordPress themes directory. So now my theme keeps showing an update message.

Proposed solutions:

  1. Add a unique hash to each theme uploaded to the themes directory. This way, even if someone has a home-grown theme which happens to have the same name but is never going to be uploaded to the themes directory, there won't be a conflict. (You could change the name of the theme, but who is to say someone won't upload a theme that also has that name?)
  2. Add more fields to the updater check, such as author and date/time created. This could mean, though, that theme authors would always have to add these during theme creation and know why they are doing it, possibly adding an extra complicating step to WordPress theme design. Otherwise, at some point they or their client could start getting an update message for their theme.

Attachments (1)

23318-upgrade-header.diff (3.4 KB) - added by meloniq 3 years ago.
Adds optional "Upgrade" header for plugins and themes. New header defaults to 'true', when "Upgrade: false" is set, it excludes plugin/theme from checking for updates in WP API.


Change History (48)

#1 @GautamGupta
7 years ago

Same with plugins.

#2 @filosofo
7 years ago

  • Component changed from Themes to WordPress.org

See #13928 also for a similar issue with plugins.

#3 @sean212
7 years ago

I'd go with solution #2, because adding a hash for all already existing plugins would be quite a hassle, being that there's over 10,000. :D

#4 @nacin
7 years ago

  • Milestone changed from Awaiting Review to Future Release

#5 @SergeyBiryukov
5 years ago

  • Component changed from WordPress.org site to Themes
  • Milestone changed from Future Release to WordPress.org

Related: #23318

#6 @SergeyBiryukov
4 years ago

#26645 was marked as a duplicate.

#7 @meloniq
4 years ago

  • Cc meloniq@… added

#8 @dreamwhisper
4 years ago

  • Cc dreamwhisper added

#9 @SergeyBiryukov
3 years ago

#28954 was marked as a duplicate.


#10 @meloniq
3 years ago

Patch 23318-upgrade-header.diff adds optional "Upgrade" header for plugins and themes. New header defaults to 'true', when "Upgrade: false" is set, it excludes plugin/theme from checking for updates in WP API.

(Patch was originally submitted to #23318)

#11 @nhuja
3 years ago

This is a major issue. Many of our clients updated to a different, "free" theme by a different author with the same theme name, and we are at a loss. Can we not simply check the Author for this auto update? We are losing clients!!

#12 @dd32
3 years ago

#31969 was marked as a duplicate.

#13 @ocean90
2 years ago

#34370 was marked as a duplicate.

#14 @JeroenReumkens
2 years ago

Same thing here. It would be great if the check were a little more reliable. Luckily I noticed that WordPress told me about a theme update before my theme was released to the client, so I could change the theme name. But in theory it could mean that one day someone releases a theme with the exact same name as one of the themes I've built (probably months or years ago). If the client notices this message earlier than I do, they will have a broken site.

Hope the update check can be improved :)

This ticket was mentioned in Slack in #core by benoitchantre. View the logs.


21 months ago

This ticket was mentioned in Slack in #core by dd32. View the logs.


21 months ago

This ticket was mentioned in Slack in #core by sergey. View the logs.


13 months ago

#18 @infolu
10 months ago

Hello, how was this resolved? Was there any change in core to solve this?
I have some client themes with generic names such as Ecommerce, Dealership, Real Estate, or other ordinary words, and even though these themes are not on wordpress.org, update notices appear for them, breaking the websites.

#19 @SergeyBiryukov
5 months ago

#40961 was marked as a duplicate.

This ticket was mentioned in Slack in #themereview by greenshady. View the logs.


3 months ago

This ticket was mentioned in Slack in #themereview by dingo_d. View the logs.


3 months ago

This ticket was mentioned in Slack in #core by grapplerulrich. View the logs.


3 months ago

#23 @dingdang
3 months ago

Some ideas how to solve this while:
(A) Having backward compatibility - older versions of WordPress to handle (at least partially) the new implementation w/o any new code;
(B) Maximal prevention of abuses;
(C) Solve several problems at once as: 1. unwanted updates of not related themes 2. abuse of "Popular" themes page;
(D) Minimal needs for new code writing.

(I will use these (A),(B)... marks below when related, not to repeat the above.)

  1. For all NEW themes: add a generated prefix to the slug. The prefix should be unique and must not already exist in svn. An example of such a prefix is "tNNNN-". This is required so that old themes are not misinterpreted as new themes with a prefix. In the beginning, unique prefixes (IDs) can have 4 digits, like "t2345-"; once these are used up, 5 digits ("t23456-"), and so on. If a new theme is called "The Theme", its slug will be "t2345-the-theme". (A) (C) (D)
  2. For all UPDATES, authors should add an additional line to the style.css header: Uid: t2345 (C)
  3. A new section at profiles.wordpress.org for logged-in users, for theme name reservation: the user can check through a simple form whether a name is available (checked against the current svn and other reserved names). If it is available, the system reserves the new name for 1 hour (B), generates the next available UID for that name and provides it to the user so they can include it in the style.css file. After an hour, a garbage collector releases the name and UID if a theme with that name has not been uploaded in the meantime. (B)
  4. In the same section the system will permanently auto-assign UIDs for all current themes, so the author can get the UID for inclusion in style.css in future updates of that theme.
  5. Inclusion of the UID in style.css is mandatory for all updates, and not mandatory (it would be ignored) for all new themes, as it is already present in the slug. (A)
  6. CORE: Send to the API not just the slug but the UID as well, if present in style.css. The slug of new themes is the extended one, so no additional work is needed. (A)
  7. API: If the slug is of the extended type, do nothing special. If there is a UID in the data received, concatenate that UID to the slug in the form “UID-slug” and proceed as usual. (D)
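
The API-side rule in the last two steps can be sketched roughly as follows. This is only an illustration in Python: the function name `resolve_slug` and the prefix pattern are assumptions derived from the "t2345-" example above, not existing WordPress.org code.

```python
import re

# Assumed prefix shape, taken from the "t2345-" example: "t", 4+ digits, "-".
PREFIX_RE = re.compile(r"^t\d{4,}-")

def resolve_slug(slug, uid=None):
    """Return the slug the API would look up in the directory (hypothetical helper)."""
    if PREFIX_RE.match(slug):
        # New-style theme: the slug already carries its unique prefix.
        return slug
    if uid:
        # Old-style slug reported together with a "Uid:" header: build "UID-slug".
        return f"{uid}-{slug}"
    # Old theme without a UID: fall back to the plain slug, as today.
    return slug
```

For example, `resolve_slug("the-theme", "t2345")` and `resolve_slug("t2345-the-theme")` both give "t2345-the-theme", so an updated old theme and a new theme end up under the same kind of key.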

N.B. "Old" themes can be easily distinguished from "New" by looking at their slug and locating a presence of the prefix (so no need to store any additional data anywhere for the themes regarding this).

One decision has to be made about how long (possibly indefinitely) to count both the extended slug and the old slug as active installs for old themes (for example, the old slug could be ignored after, let's say, 1-3 updates). This is required in order to detach all counts of active installs that have nothing in common with that specific theme (such as child themes named the same way, themes from other marketplaces, random custom themes, etc).

All of the above solve all the problems related to name collisions; new themes extended slug works even on older WordPress installations; several lines of code needed for the core and API, not much for the name check and UID generator form for the logged in users and several lines for the sorting algorithm for the "Popular" page.

@grapplerulrich

Last edited 3 months ago by dingdang

This ticket was mentioned in Slack in #themereview by dingdang. View the logs.


3 months ago

#25 @joyously
3 months ago

@dingdang I don't see how your scheme helps the biggest category: old themes or child themes with no number that match a name in the repo.
And if old themes don't need a number, why do new themes need a number? In other words, trying to tie the site's theme to a repo theme by something besides a name is good, but it would have to work for all themes. So what do all themes in the repo already have that other themes do not? I'd say that the SVN info is unique, so even if the theme name matches, the style.css or the readme probably do not (or just the headers). Someone mentioned author, which sounds good but is often changed. What if the update API checked if their current version style.css matched the one in SVN for that version?

#26 @dingdang
3 months ago

@joyously

  • Re: the new themes. Adding the ID as a prefix to the slug in the svn is all that has to be done to ensure that there are no collisions, so that themes with the same names in wp.org's svn and from external providers will not interfere anymore.
  • Re: the old themes. These are something that has already happened to the WP world. Installations of the theme "X" from wp.org should continue to get updates from wp.org, and the slug of "X" can't be changed. We can't do anything about that.
  • The addition of a UID to the style.css of new versions (updates) of "X", however, helps to resolve the side effect of old themes colliding with wp.org (counting active installations of unrelated themes). After some number of updates of the theme and/or some period of time (1 or 2 years), only installations of "old" themes that have been updated and contain the new UID could be counted as active, putting them on equal terms with the "new" themes which have the UID in their slugs.
  • NEW: now this is something important that I forgot to mention. A convention for a predefined UID word like "external" should be adopted as an option (of course not mandatory) for any themes that are not in wp.org's svn. If style.css contains just one additional line, "Uid: external", then:
    • WordPress will NOT try to update the theme from wordpress.org. This will be the easiest way for theme developers to prevent unwanted updates;
    • wp.org will NOT count these as active installs. So, for example, if the authors of the original themes whose names were pointed out as allowed exceptions with name collisions ("Total", "Consulting", etc) add that line to style.css, those fake active install numbers will drop to the real values and the themes will fall to their real rank.

Then if I am an "external" author I would have the control to say "Never update my theme and don't count my installations on behalf of some 'smart' guys that caught my theme's name in the past"
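
A minimal sketch of how a client could honour such a line, assuming the header is read with a simple line-based pattern. Both function names are hypothetical and the "Uid" header itself is only the convention proposed above; nothing here is an existing WordPress API.

```python
import re

def read_uid(style_css):
    """Return the value of a "Uid:" header line in style.css text, or None."""
    match = re.search(r"^[ \t/*#@]*Uid:(.*)$", style_css,
                      re.IGNORECASE | re.MULTILINE)
    return match.group(1).strip() if match else None

def should_check_for_updates(style_css):
    # "external" is the reserved word from the proposal: never ask wp.org.
    return read_uid(style_css) != "external"
```

A theme without the line behaves exactly as it does now, so the convention stays optional.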

Last edited 3 months ago by dingdang

#27 @dingdang
3 months ago

While the above idea's top priority is minimal need of new code and looks a little "hacky" with slug prefixes it has:
PROS:

  • partially backward compatible with older versions of WP
  • little code to implement

CONS:

  • need a whole new functionality to generate UIDs
  • doesn't resolve the problem for old existing themes

The second idea below however addresses all the problems but needs more coding and server resources:
PROS:

  • solves ALL problems
  • fully backward compatible (old WP versions)
  • solves the problem for OLD existing themes as well
  • solves the "Active installs" count problem (and so the Popular page list will be automatically fixed counting only wp.org's theme active installs, not of the external)
  • no need to change/add anything from developer's standpoint
  • no need for UUIDs
  • no need for changes in the core

CONS:

  • a little more complicated to implement
  • requires (a little) more system resources

  1. For all OLD and NEW themes and all of their versions in the svn, an MD5 hash will be calculated from the concatenation of (slug)+(author)+(author uri). (These are reported even by WP 3.0; I haven't checked older versions.)
  2. A dedicated database relating theme slugs to MD5 hashes will be created. So for the theme "ABC" there will be a list of N hashes for the N existing versions of that theme. Adding new versions and new themes will expand those lists.
  3. The API will calculate the MD5 hash when it receives data from a site reporting an active theme, and will check in the above database, based on the theme's slug, whether the hash is present. If yes, the theme is native to wp.org; if not, the theme is an external one, and no update info is sent back to the site.
  4. The same will be done by the active installs calculator: it will count only those active themes whose hashes are present in the database for the specific theme slug. All the themes will get their REAL counts.
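
Steps 1 to 3 could look roughly like this. It is a sketch in Python under stated assumptions: the names `theme_hash`, `known_hashes` and `is_native` are made up, the in-memory dict stands in for the dedicated database, and UTF-8 is assumed as the string encoding.

```python
import hashlib

def theme_hash(slug, author, author_uri):
    # Plain concatenation of the three reported fields, as in step 1.
    return hashlib.md5((slug + author + author_uri).encode("utf-8")).hexdigest()

# Hypothetical stand-in for the "slug -> hashes" database of step 2.
known_hashes = {
    "abc": {theme_hash("abc", "XYZ", "http://xyz.com")},
}

def is_native(slug, author, author_uri):
    """True if the reported theme matches a version hosted in the directory."""
    return theme_hash(slug, author, author_uri) in known_hashes.get(slug, set())
```

A site reporting slug "abc" with a different author would fail the check and get no update info, which is the behaviour step 3 asks for.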

@joyously @grapplerulrich @otto42

This ticket was mentioned in Slack in #themereview by dingdang. View the logs.


3 months ago

#29 @earnjam
3 months ago

Your 2nd option is closer to what I was going to suggest. I think what gets used to compute the hash could be debated.

To save processing resources on the api.wordpress.org side, the best option would be to compute that hash on the end-user/requestee side before making the request and then pass that along as part of the API call. That would require an update to wp_update_themes() in Core.

I'm not sure whether it would be considered OK to add that to previous branches, but in theory it could be added to all the branches getting security patches, so all the way back to 3.7. It's not a security patch though, so I'm not sure it's something the lead devs would want to do, but that would essentially cover about 95% of installs based on the stats from https://wordpress.org/about/stats/. At least for those who allow automatic updates or install them themselves.

At the very least, making that update in the latest version would cover a big chunk of installs.

#30 follow-up: @dingdang
3 months ago

@earnjam that's good. And to keep backward compatibility with versions of WP that are not updated, the hash can be calculated by the API itself, but only if it is not provided by the client.

So there will be a hash in all the cases, while it will be calculated by the API only for outdated clients.

Checking for a hash match would take almost the same computing power: searching a list of 4K (themes) or 40K (hashes) may (?) cost the same.

#31 in reply to: ↑ 30 @earnjam
3 months ago

Thinking more about it, while I like the idea in theory of a per-version theme/plugin hash for full backwards compatibility and addressing all of the issues with the current system (including the "controversies" over developers gaming the popular themes algorithm), I'm not sure it's really the best option.

The process of generating the hashes for all the themes/plugins on .org would be a tremendous undertaking for a (relatively) small benefit and just wouldn't happen. Not to mention the extra complexity added to the update API, and the possibility of a user making a tiny edit to the theme/plugin header, screwing up their hash value and breaking update checks.

The related tickets have two other much simpler options suggested.

#23318 has a suggestion to include a GUID in the theme/plugin header. That basically just further specifies what to look for on the API side. It puts the onus on the developers to add, but could be verified as part of the submission/upload process before getting published, so everything on .org would have one.

#32101 suggests adding a private flag to prevent update checks from occurring at all. Again, onus there is on the developers of external themes/plugins to add, but it would be even simpler than the GUID.

Both of those make preventing bad auto-updates easy, and would help "verify" true installs on the API side. They don't solve the problems for older themes/plugins, but would stop the problems going forward and be far simpler to implement. You're much more likely to get some traction on either of those options.

#32 @joyously
3 months ago

The process of generating the hashes for all the themes/plugins on .org would be a tremendous undertaking for a (relatively) small benefit

That is just one way to do it. It would not need a database, or hashes generated for all versions. The API could generate the hash when the theme/plugin is checked for updates. No need to pre-calculate them all. A database could be used, however, to store the hashes already calculated.

Not to mention the extra complexity added to the update API

Extra complexity is needed, because the simple one that exists now has problems.

the possibility of a user making a tiny edit to the theme/plugin header, screwing up their hash value and breaking update checks.

If you only use a few fields, it's unlikely that tiny edits will affect the hash. Or, to make it more bulletproof, hash the entire file and present the user with enough information to decide for himself if the update is applicable or not.

a GUID in the theme/plugin header

That doesn't solve the problem of old themes/plugins or child themes. There would be a lot of them without a GUID. This will be the same as is being proposed for the minimum PHP version. It's fine for everything being updated, but all those things not updated are where the problem lies.

a private flag to prevent update checks from occurring at all

This is the same problem. Child themes, old themes/plugins won't have the new flag. The change needs to happen on the repo side of things to handle all the cases.

#33 follow-up: @dingdang
3 months ago

@earnjam it's about just 3 fields, that the user is not supposed to touch: theme name, author and author URI. If I'm a user and I change those, that means I don't want to get updates. I am not supposed to change the author or his URI nor the theme name. I think this is not a problem, but if there is a change in these 3 fields it is not a screw by accident, but with purpose.

So that's the same as adding UID manually, but with many additional benefits:

  • it is handled automatically
  • theme authors don't have to do anything - no changes to style.css or anywhere from their standpoint
  • external authors don't have to do anything to prevent their themes from being messed up by accidental updates - no need for a "private" tag
  • active installs will count automatically just the real active installs of the wp.org's theme even for the old so-discussed cases
  • very little code for the core (which is just for calculations optimization)
  • check for updates at the backend (API) is almost the same, the search is performed in a table of hashes instead of names (no real complexity that's just a tweak)
  • backward compatibility for the old versions of WP and old versions of the themes w/o the need to change them which is the most exciting part of the idea(!)

In simple words: implementing it that way will put everything in place as if it had been so from the beginning.

Generating of hashes for the current themes and all of their versions is a one-time job and is trivial, shouldn't take long:
themes: 4876
total versions: 56730
average versions per theme: 11.6

Last edited 3 months ago by dingdang

#34 in reply to: ↑ 33 @dingdang
3 months ago

By the way, out of curiosity I performed 10 million "if exists" checks against a hash table with 4500 keys and one with 45000 keys (10x), and the CPU time was almost identical. MD5 generation (just for cases where the client didn't provide it) is also fast enough, so I guess all of this will not impact the CPU load of the API's servers at all.

P.S. I also imagined that the actual number of hashes will be close to the number of themes, as for the different versions they will be the same if there is no change in the author fields which is really rare.
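
For anyone who wants to repeat that kind of measurement, here is a small sketch using Python's `timeit`. The key format and the number of checks are arbitrary assumptions, and the absolute numbers will differ from the PHP test described above.

```python
import timeit

def time_lookups(table_size, checks=100_000):
    """Time `checks` membership tests against a set of `table_size` keys."""
    table = {f"key-{i}" for i in range(table_size)}
    probe = f"key-{table_size // 2}"  # a key that is present in the table
    return timeit.timeit(lambda: probe in table, number=checks)

small = time_lookups(4_500)
large = time_lookups(45_000)
print(f"4500 keys: {small:.4f}s, 45000 keys: {large:.4f}s")
```

Hash-based lookups take roughly constant time on average, which is why a 10x larger table is not expected to cost 10x more.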

Last edited 3 months ago by dingdang

#35 @dingdang
3 months ago

Hey, since I can't find a way to attach .pdf files here, I've copy/pasted a final version of the proposal below. Most of the points have already been explained; some are new.

Proposal for a solution to the “collisions” of WordPress themes.

Table of contents:
Introduction
Formal composition of a unique ID.
API: determination of available theme updates.
Other: calculation of theme's active installs.
Benefits.
Technical data.

Introduction.

A collision is a slug match between two themes that are not related to each other but have the same name.

Two main problems are related to these cases of collisions:

  1. If there is a theme in wordpress.org's database of themes and another one with the same name created by another author, the second one gets an “Update” option and may be replaced by the theme published at wordpress.org. This can also happen to widely distributed themes: after a new theme with the same name is uploaded to wordpress.org, an unwanted update can unexpectedly replace the themes of web sites published long before that.
  2. The calculation of active installations takes into account not just the themes from wordpress.org's database, but also external themes and random child themes residing in a folder with a matching name. Authors exploit this to artificially place their new themes at the top of the list by grabbing the names of long-distributed, popular external themes.

The proposed techniques solve all of the problems, with very little coding, while keeping backward compatibility, and solving the related problems for the old themes as well, not just the newly released.

Formal composition of a unique ID.

  1. Choose a separator that is currently not allowed in theme names. Ex: “|”, which will be used below.
  2. For every theme since WordPress 3.0 (and maybe even earlier versions) the code already reports the following three strings:
    • theme Slug (ex: nicetheme)
    • theme Author (ex: John Doe). May not be present; if not, this is an empty string.
    • theme Author URI (ex: http://johndoe.com). May not be present; if not, this is an empty string.
  3. Compose a string: “slug|author|author_uri”. Ex: “nicetheme|John Doe|http://johndoe.com”
  4. Calculate the MD5. Ex: “fff8d626c2e8cd611f66827a55028d7a”
  5. Form the final UID for that specific theme by concatenating the slug with the MD5, using the same separator: “slug|MD5”. Ex: “nicetheme|fff8d626c2e8cd611f66827a55028d7a”
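
The composition steps above can be sketched in a few lines of Python. The function names are hypothetical, and UTF-8 is an assumed encoding (the proposal does not specify one), so the digest printed by this sketch is not claimed to match the example value quoted above.

```python
import hashlib

SEPARATOR = "|"  # the example separator chosen in the first step

def compose_uid(slug, author="", author_uri=""):
    """Build "slug|md5(slug|author|author_uri)" from the three reported fields."""
    source = SEPARATOR.join([slug, author, author_uri])
    digest = hashlib.md5(source.encode("utf-8")).hexdigest()
    return f"{slug}{SEPARATOR}{digest}"

def slug_from_uid(uid):
    # The slug is everything before the first separator.
    return uid.split(SEPARATOR, 1)[0]

print(compose_uid("nicetheme", "John Doe", "http://johndoe.com"))
```

Missing Author or Author URI fields default to empty strings, matching the rule in the second step.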

Since all three fields are present in the themes (through style.css) and are reported by WordPress (even by very old versions), there is no need to add any new data to the themes (like manually added codes/hashes), nor to change the code of the core or the API to handle them.

A one-time calculation of the UIDs for the current themes must be performed, and the resulting list stored in a table.

For all new theme version updates and new theme uploads, the UID will be calculated and added to the same table.

As the UID contains the theme slug as a prefix, it is trivial to relate a given UID unambiguously to the theme slug if needed.

Optimization: (not needed, see the technical data) several lines of code can be added to WordPress core to calculate and add to the API call the UID at the client side, to save some API CPU time.

API: determination of available theme updates.

If the calculated UID is not provided by the client, the API's engine calculates it from the three fields received through the API call (as described above).

A small update (several lines of code) is needed to identify themes not by just a slug, but by this new UID, checking in the table of UIDs. Only if the UID is present the algorithm continues by identifying the theme slug from the UID and checking as usual if there is newer version and if so – to send back an “update available” reply.
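
As a rough illustration of that check, here is a Python sketch. All names and data are hypothetical, versions are modelled as tuples to keep the comparison simple, and the real API request and response formats are not reproduced.

```python
import hashlib

def compose_uid(slug, author, author_uri):
    digest = hashlib.md5(f"{slug}|{author}|{author_uri}".encode("utf-8")).hexdigest()
    return f"{slug}|{digest}"

# Hypothetical data: the table of known UIDs and the latest version per slug.
uid_table = {compose_uid("abc", "XYZ", "http://xyz.com")}
latest_version = {"abc": (1, 2)}

def check_update(slug, author, author_uri, installed, client_uid=None):
    """Return the newer version tuple, or None if no update applies."""
    # Use the client's UID when it sent one, otherwise compute it here.
    uid = client_uid or compose_uid(slug, author, author_uri)
    if uid not in uid_table:
        return None  # external theme: no update info is sent back
    native_slug = uid.split("|", 1)[0]  # slug precedes the delimiter
    newest = latest_version[native_slug]
    return newest if newest > installed else None
```

An external theme that merely shares the slug "abc" produces a different UID and therefore never reaches the version comparison.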

Other: calculation of theme's active installs.

Active installations of a given theme are calculated by the sum of active installations for all the IDs related to that theme. This will result in real numbers and the “Popular themes” list will be sorted using the real numbers for the themes at wordpress.org, automatically excluding all the counts related to external themes (i.e. the wrong current numbers will be corrected to their true values).

Benefits.

  • it is handled automatically;
  • solves all the problems;
  • fully backward compatible (old WP versions);
  • solves the problem for the old existing themes as well;
  • solves the "Active installs" count problem – active installs will count automatically just the real active installs of the wordpress.org's theme even for the old cases and exploits;
  • theme authors don't have to do anything – no changes to style.css or anywhere from their standpoint;
  • external authors don't have to do anything to prevent their themes from being messed up by unwanted updates – no need for a "private" tag;
  • no need for changes in the core (unless for optimization);
  • the check for updates at the backend (API) is almost the same, the search is performed in a table of UIDs instead of theme slugs;
  • since there is no change in the theme's structure and new fields, the updates related to API and Active installations counting are independent; can be done at different points in time;
  • backward compatibility for the old versions of WordPress and old versions of the themes w/o the need to change them which is the best part of this proposal.

In simple words: implementing it the proposed way will put everything in place as if it had been so from the beginning of WordPress' existence.

Technical data.

Some tests were performed to help on decisions.

  1. An average single server executes 100,000,000 md5() calculations in PHP in 40 seconds. If this is the daily number of update-check requests to the API, only 40 seconds of additional CPU time per day will be needed, even if all the calculations are done by the API. So the optimization of doing the calculation in core is in fact not needed, which saves programmers from adding code to core and from changing the API protocol by adding new fields.
  2. There are:
    • 4876 total themes at wordpress.org;
    • 56730 total different versions;
    • 11.6 average versions per theme;
    • 1.5 different UIDs per theme on average (a single theme has more than one related UID if the author or author URI has been changed over time);
    • 7300 (approximately) generated UIDs for the current themes (the new list to search in, instead of 4876), i.e. no difference in the CPU time needed to process search requests.
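
The derived figures follow directly from the raw counts; a quick check in Python, using only the numbers quoted above:

```python
themes = 4876
versions = 56730
uids_per_theme = 1.5

print(round(versions / themes, 1))           # 11.6 versions per theme
print(round(themes * uids_per_theme))        # 7314, i.e. roughly 7300 UIDs
print(f"{40 / 100_000_000 * 1e6:.1f} microseconds per md5() call")
```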

07/21/2017
by dingdang

This ticket was mentioned in Slack in #themereview by dingdang. View the logs.


3 months ago

#37 @williampatton
3 months ago

I hate to play devil's advocate against something which sounds like a thoroughly thought out potential solution.

I can see an issue in the case of author and author URI changes. If either of those are changed then existing users from before the switch would lose benefits of update notifications (and lose the future updates too!).

Also sometimes themes are willingly traded to other users - I see it mostly with themes that authors do not have time to maintain, they pass to someone who wants to maintain it. Currently the new theme owner will inherit active installs with the theme.

If the UID changed with author switch then the new author will not inherit the userbase along with the theme and the pre-existing users will no longer get updates. It is expected that if a theme willingly changes hands then the existing users will come along with the theme.

To fix that newly added UIDs would need to be relational with any previously generated UIDs of the same theme so that pre-existing users are not lost (and they don't miss out on possible updates) when UID changes.

Linking UIDs in this way should not negate any of the fixes for popular/active-installs abuse, nor for the issue of non-org themes being updated from a .org source with a matching slug/name, which the above proposal intends to deal with.

P.S. If we're thinking about CPU load then making the UIDs relational like this will almost certainly result in many more lookups and additional processing to combine stats across different UIDs. I haven't tested total load if such changes were made but I would imagine it's not a negligible amount.

Last edited 3 months ago by williampatton

#38 @joyously
3 months ago

I can see an issue in the case of author and author URI changes.

That's because in the explanation, he left out that the update check is also looking at a version number. The update check should be asking the question "Does this name/author/version info match the one in the repository?" If yes, then "Is there a newer version for that theme?"

#39 @dingdang
3 months ago

@williampatton

Maybe I should have given some examples. The cases raised are handled by the method very well. There is no problem with changing the author and author URI any number of times across new versions at wordpress.org.

Example 1:

Let's say there is a theme uploaded at wordpress.org that is named "ABC", by author "XYZ" with author URL "http://xyz.com". That theme has UID: abc|fff8d626c2e8cd611f66827a55028d7a

Later the theme is acquired by another author: "MNO" with author URL "http://mno.com". This new version of the SAME theme (since the slug is not changed) has UID: abc|gfkshjg41765jg2j53nghg3ghf76323

Suppose both of these (the older and the newer versions) are installed at two separate WordPress sites.

Then another newer version gets published by the latest author. That new version of the theme will have the same UID (since neither of slug, author and author URI's are changed): abc|gfkshjg41765jg2j53nghg3ghf76323.

It's clear that the second site will get an update. It's not so clear on first look that the first site will be updated, too (so I guess that's why it was raised as a question) but in fact it will be updated:

  • the API gets a request to update a theme with UID abc|fff8d626c2e8cd611f66827a55028d7a
  • it checks against the list of native themes' UIDs and finds it (as it belongs to an old version in the SVN), so it continues as in the workflow described in the proposal; otherwise it would stop here
  • it extracts the actual theme slug, "abc", from the UID (the part that precedes the delimiter)
  • it continues with the algorithm and returns to the site a reply with the current version of the theme "abc"

Voila!

Example 2:

The same happens for developers who distributed their theme for some time before uploading it to wordpress.org. Their initial upload at wordpress.org will have the same UID that would be generated for their pre-upload installations, and so all of them automatically become "native" to wordpress.org. Later, if they upload updates, the new versions will either have the same UID or be linked to the same theme (as in example 1), so all of their installations (even those installed before the upload to wordpress.org) will get an authoritative update.

To explain again:

  • there are N themes "native" to wordpress.org (those that are currently active) for which the UID is precalculated (the "one time job" in the proposal) for all of their old versions and the current version in the SVN, and a table with that list is created
  • there are a total of N*1.56 UIDs (that's because some themes have "evolved" and changed author or, in most cases, the author URI). I got that real 1.56 coefficient from a partial download of about 1400 themes, for which I downloaded all versions (about 16000 in total)
  • so one theme is in general identified by more than one UID of the new type
  • any site with any of these UIDs is unambiguously linked by the API to a specific theme slug (the part that precedes the delimiter), and the API sends back the new version as usual (there are no changes in this part of the code at all)
  • any external theme with the same name, however, comes with a different UID, so the API stops at the point where this UID is unknown (not present in the table of UIDs) and neither sends back update info nor counts this as an active install.

It is very simple but efficient.

Last edited 3 months ago by dingdang

#40 @dingdang
3 months ago

@joyously there is no need to add version in the question, there is no need to change the API request (and this is the good part, as it is backward compatible with the API requests back to 3.0 or even earlier, I checked down to 3.0).

The API needs just the set of slug, author and author URI (all of which are currently being sent if present in style.css). Then everything is handled as described and magically it works and solves all the problems.

Last edited 3 months ago by dingdang

#41 @dingdang
3 months ago

P.S. Another good part of this is that only very simple code must be added in several places:

The API:

  1. calculate the UID based on slug, author, author URI
  2. check in the table of native UIDs
  3. if the UID is present, slug = the part that precedes the delimiter and continue as usual
  4. else, ignore that theme and continue (the same way it is ignored if the slug is not present in wordpress' database now)
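
The four API steps above can be sketched roughly as follows. This is only an illustration, not the wordpress.org code: the "|" delimiter, the exact UID layout (slug, delimiter, then an MD5 of the three fields) and the names `compose_uid_v1`, `NATIVE_UIDS` and `resolve_slug` are all assumptions made for the example.

```python
import hashlib

DELIMITER = "|"  # assumed delimiter, not confirmed by the proposal text here


def compose_uid_v1(slug, author, author_uri):
    """Step 1: build a UID from slug, author and author URI (assumed layout)."""
    fingerprint = DELIMITER.join([slug, author, author_uri])
    digest = hashlib.md5(fingerprint.encode("utf-8")).hexdigest()
    return slug + DELIMITER + digest


# Hypothetical table of native UIDs (filled by the "one time job").
NATIVE_UIDS = {compose_uid_v1("abc", "John Doe", "https://example.org/")}


def resolve_slug(slug, author, author_uri):
    """Steps 2-4: return the native slug, or None to ignore the theme."""
    uid = compose_uid_v1(slug, author, author_uri)
    if uid in NATIVE_UIDS:
        # The slug is the part that precedes the first delimiter.
        return uid.split(DELIMITER, 1)[0]
    return None
```

A request for "abc" by "John Doe" resolves to the slug "abc", while a same-named theme by a different author resolves to `None` and is ignored.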

The "one time job":

  1. for each active theme and each of its versions in the SVN
  2. read style.css and calculate the UID based on slug, author, author URI
  3. store the UID in the table of UIDs (if it does not exist yet)

On new theme/update approval:

  1. calculate the UID based on slug, author, author URI
  2. store the UID in the table of UIDs (if it does not exist yet)

The active themes counter/collector:

  1. calculate the UID based on slug, author, author URI
  2. check if it is present in the table of UIDs
  3. only if it is present, increase the counter for the slug (the part that precedes the delimiter)
  4. count active installs for non-existing UIDs in a second table as well (as it does now for non-existing slugs), to be able to show how many active installs a newly uploaded theme already has, so the reviewer can investigate whether it is a legitimate author that must be linked to these copies or someone uploaded someone else's theme.

The code that reports "currently has ... active installations"

  1. It must report not just the >500 cases but the exact number of installations for the exact UID match (that is, for the exact combination of slug, author, author URI). We have this in the second table from step 4 of the previous section.

All in all, this is maybe 30-50 lines of code.

Last edited 3 months ago by dingdang

#42 follow-up: @grapplerulrich
3 months ago

Thank you @dingdang for spending time thinking up a solution.

The main issue is with the author name. The author name in the style.css can be anything and the theme author could change this text whenever they wish. WordPress.org knows who the theme belongs to from the uploader process and not from the style.css.

We have at times had themes transferred from one account to another, after which the new owner changed the author information in the style.css.

A common method that most package managers use is to combine the developer's name with the theme or plugin name. So grapplerulrich/mytheme.

We could fix the problem with the active installs this way. There may be a small drop in active installs for those users who are still using the old version. Potentially in the future we could track the user changes.

This would not work for the updates until we have a full list of author name changes.

The code that handles this is on WordPress.org and I don't think it is public yet.

There is a bit of uncertainty for the future, as there is a plan to parse the readme file in themes.

If we look at the plugin headers, the author's name can be any text, while the contributors are defined in the readme file and need to match the w.org usernames. I expect the same to happen for themes. The author text in the style.css can be your full name and the w.org username is defined in the readme.

Once we are able to parse the readmes, a theme could have multiple owners.

We would need to check with the #meta team if there is a way to have a list of related usernames for a user.

#43 in reply to: ↑ 42 @dingdang
3 months ago

@grapplerulrich

Replying to grapplerulrich:

The main issue is with the author name. The author name in the style.css can be anything and the theme author could change this text whenever they wish. WordPress.org knows who the theme belongs to from the uploader process and not from the style.css.

The proposed method does just one thing: creates a new single table. That means it is not incompatible with any other parts of the code, even like in your example that wordpress.org knows the author by other means.

To understand better how it works, here is how I actually came up with that simple idea. I sniffed the API requests of WordPress versions back to 3.0 and found what they all have in common: every version sends these 3 values (slug, author, author URI) to the API. We also know there are N themes and M total versions in the SVN that I call "native". All we have to do is identify them by a unique string and put those strings in a table, and that's all. From that point forward the API will know whether an update check request is about a "native" theme or an external one (and the same can be used for counting active themes).

To clarify more: the MD5 is there not because it is needed, just for esthetics. Now, after more tests (I've partially downloaded 36000 different versions of 3100 unique themes), I can come up with VERSION 2 of the method, which is even simpler.

Key factors:

  • MD5 processing is not needed (so no increase in CPU time for the API at all)
  • the author URI does not need to be taken into consideration (which lowers the chance of a client losing future updates by accidentally changing the author URI line in style.css after installation); just the additional author string is enough for this to work.

VERSION 2 UPDATE

  1. No need of changes in the core even for optimization as there is no MD5 anymore
  2. UIDs are composed simpler by just concatenating the slug+delimiter+author strings
  3. No need to attach the slug a second time - it is already in the concatenated string
  4. The invention of the method is the table of UIDs and it is the same - just a simple list of UIDs
  5. Identification of the theme slug is the same (extracting from UID the string that precedes the delimiter)
  6. New uploads to the SVN result in addition (if not present) of a new entry to the table as in version 1

Additional benefits of version 2:

  • no need for MD5 calculations
  • less fields used for identification, the list of UIDs is almost the same in numbers as the count of the unique themes (as the author field is almost never changed, the approximate coefficient is 1.17 which means the table of UIDs will have approx. 5700 entries for 4900 themes, an excess of just 800)

Cheers

#44 @dingdang
3 months ago

Here is version 2, with no need for MD5 anymore and with an explanation in the "How it works" section that is easy for anyone to understand.

Proposal for a solution to the “collisions” of WordPress themes.
Simplified Version 2.

Table of contents:
Changes compared to Version 1.
Introduction.
Formal composition of a unique ID.
API: determination of available theme updates.
Other: calculation of theme's active installs.
Benefits.
How it works.
Technical data.
Software changes.

Changes compared to Version 1.

  • eliminated the need of the Author URI field
  • eliminated the need to calculate MD5 hashes
  • new section “How it works”
  • new section “Software changes”

After analysing the content of the current set of themes “native” to wordpress.org and all of their versions (4876 themes, 56730 versions), the conclusion is that only two fields are needed in the process: the theme slug and the author. The author URI is redundant.
As a result the composition of the UIDs is simplified and calculating MD5 hashes becomes unnecessary, which simplifies the changes to the system even more.

Introduction.

A collision is a slug match between two themes that are not related to each other but have the same name.

Two main problems are related to these cases of collisions:

  1. If there is a theme in wordpress.org's database of themes and another one with the same name created by another author, the second one gets an “Update” option and may be replaced by the theme published at wordpress.org. This can also happen to widely distributed themes: after a new theme with the same name is uploaded to wordpress.org, an unwanted update can unexpectedly replace the themes of web sites published long before that.
  2. The calculation of active installations counts not just installs of the themes from wordpress.org's database, but also external themes as well as random child themes residing in a folder with a matching name. Authors exploit this to artificially place their new themes at the top of the list by taking the names of popular, long-distributed external themes.

The proposed techniques solve all of the problems, with very little coding, while keeping backward compatibility, and solving the related problems for the old themes as well, not just the newly released.

Formal composition of a unique ID.

  1. Need to choose a separator that is currently not allowed in theme names. Ex: “|”, which will be used below.
  2. For every theme since WordPress 3.0 (and maybe even earlier versions) the core code is already reporting the following two strings:
  • theme Slug (ex: nicetheme)
  • theme Author (ex: John Doe). May not be present; if not, this is an empty string.
  3. Compose the UID: “slug|author”. Ex: “nicetheme|John Doe”

Since both fields are present in the themes (through style.css) and are reported by WordPress (even by very old versions), there is no need to add any new data to the themes (like manually added codes/hashes), nor code in the core or API to handle such data.

The invention: a one-time composition of the UIDs for the current themes and all of their versions must be performed and the resulting list stored in a table. For all new theme version updates and new theme uploads, the UID will be composed and added to the same table if it does not exist already.

As the UID contains the theme slug as a prefix, it is trivial to relate a given UID unambiguously to the theme slug if needed by extracting the string that precedes the first occurrence of the separator. No other relations need to be stored.
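
A minimal sketch of composing a UID and recovering the slug from it, assuming “|” as the separator. The function names `compose_uid` and `slug_from_uid` are hypothetical and only serve the illustration.

```python
SEPARATOR = "|"  # the example separator used in this proposal


def compose_uid(slug, author=""):
    """Compose "slug|author"; a missing author is an empty string."""
    if SEPARATOR in slug:
        # The scheme relies on the separator never appearing in a slug.
        raise ValueError("slug must not contain the separator")
    return slug + SEPARATOR + author


def slug_from_uid(uid):
    """Return the part that precedes the first occurrence of the separator."""
    return uid.split(SEPARATOR, 1)[0]


uid = compose_uid("nicetheme", "John Doe")
print(uid)                 # nicetheme|John Doe
print(slug_from_uid(uid))  # nicetheme
```

Because only the first separator is used for splitting, an author string that happens to contain “|” does not break the slug extraction.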

API: determination of available theme updates.

A small update (several lines of code) is needed to identify themes not by just a slug, but by this new UID, checking in the table of UIDs. Only if the UID is present the algorithm continues by identifying the theme slug from the UID and checking as usual if there is newer version and if so – to send back an “update available” reply.
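
The update check described above might look roughly like this. It is a sketch under assumptions: `NATIVE_UIDS`, `LATEST_VERSIONS`, `parse_version` and `check_update` are hypothetical names, the data is made up, and version comparison is simplified to purely numeric dotted versions.

```python
SEPARATOR = "|"

# Hypothetical data: the table of native UIDs and the latest version per slug.
NATIVE_UIDS = {"nicetheme|John Doe", "nicetheme|Jane Roe"}
LATEST_VERSIONS = {"nicetheme": "2.1"}


def parse_version(version):
    """Simplified comparison key; assumes versions like "1.10.2"."""
    return tuple(int(part) for part in version.split("."))


def check_update(slug, author, installed_version):
    """Return the newer version for a native theme, else None."""
    uid = slug + SEPARATOR + author
    if uid not in NATIVE_UIDS:
        return None  # external theme with a colliding name: no update offered
    native_slug = uid.split(SEPARATOR, 1)[0]
    latest = LATEST_VERSIONS[native_slug]
    if parse_version(latest) > parse_version(installed_version):
        return latest
    return None
```

Here `check_update("nicetheme", "John Doe", "1.9")` returns "2.1", while the same slug with an unknown author returns `None`.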

Other: calculation of theme's active installs.

Active installations of a given theme are calculated by the sum of active installations for all the UIDs related to that theme. This will result in real numbers and the “Popular themes” list will be sorted using the real numbers for the themes at wordpress.org, automatically excluding all the counts related to external themes (the wrong current numbers will be corrected to their true values).
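
As an illustration of that sum, here is a small sketch. The function name `installs_per_theme` and the numbers are invented for the example; only UIDs present in the table of native UIDs contribute to a theme's total.

```python
from collections import Counter

SEPARATOR = "|"


def installs_per_theme(installs_by_uid, native_uids):
    """Sum active installs over all native UIDs that share a slug."""
    totals = Counter()
    for uid, count in installs_by_uid.items():
        if uid in native_uids:
            slug = uid.split(SEPARATOR, 1)[0]
            totals[slug] += count
    return dict(totals)


native = {"nicetheme|John Doe", "nicetheme|Jane Roe"}
reported = {
    "nicetheme|John Doe": 300,
    "nicetheme|Jane Roe": 120,
    "nicetheme|Somebody Else": 900,  # external theme, same name
}
print(installs_per_theme(reported, native))  # {'nicetheme': 420}
```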

Benefits.

  • it is handled automatically;
  • solves all the problems;
  • fully backward compatible (old WP versions);
  • solves the problem for the old existing themes as well;
  • solves the "Active installs" count problem – active installs will count automatically just the real active installs of the wordpress.org's theme even for the old cases and exploits;
  • theme authors don't have to do anything – no changes to style.css or anywhere from their standpoint;
  • external authors don't have to do anything to protect their themes from being messed up by unwanted updates – no need for a "private" tag;
  • no need for changes in the core (unless for optimization);
  • the check for updates at the backend (API) is almost the same, the search is performed in a table of UIDs instead of theme slugs;
  • since there is no change in the theme's structure and new fields, the software updates related to the API and Active installations counting are independent; can be done at different points in time;
  • backward compatibility for the old versions of WordPress and old versions of the themes w/o the need to change them which is the best part of this proposal;
  • handles well the cases where a theme is acquired by another author – the theme will continue to catch updates;
  • handles well the cases of themes distributed by an author prior to uploading them to wordpress.org – all previous installations will continue to catch updates from wordpress.org.

In simple words: implementing it the proposed way will put everything in place as if it had been so from the beginning of WordPress's existence.

How it works.

  • There are N themes "native" for wordpress.org (those that are currently active) for which the UIDs are precomposed for all of their old and the current versions in the SVN, and a table with that list is created; only unique values are stored, they act like a database of fingerprints, like humans can have 10 different fingerprints that link to one and the same person;
  • There are a total of N*1.16 UIDs (that's because some themes have "evolved" and changed their authors);
  • Which means that one theme is identified in general by more than one UID;
  • Any site with any of these UIDs is unambiguously linked by the API to specific theme slug (the part that precedes the delimiter) and the API sends back the new version as usual;
  • Any external theme with the same name however comes with different UID and so the API stops at that point where this UID is unknown (not present in the table of UIDs) and as a result doesn't send back an update info, nor counts this as an active install.

Technical data.

Some tests were performed to help on decisions.

  1. There are:
  • 4876 total themes at wordpress.org;
  • 56730 total different versions;
  • 11.6 average versions per theme;
  • 1.16 the average ratio of different UIDs per theme (a single theme has more than one related UID if the author has been changed over time);
  • 5600 (approximately) generated UIDs for the current themes (the new list to search in, instead of 4876), i.e. no meaningful difference in the CPU time needed to process search requests.
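
The figures in this list can be re-derived from the two totals and the 1.16 ratio; the short check below does only that arithmetic (the variable names are mine).

```python
themes = 4876          # total themes at wordpress.org
versions = 56730       # total different versions
uids_per_theme = 1.16  # average ratio of different UIDs per theme

avg_versions = versions / themes          # about 11.6 versions per theme
uid_table_size = themes * uids_per_theme  # about 5656, i.e. roughly 5600
extra_entries = uid_table_size - themes   # about 780 more rows than slugs

print(round(avg_versions, 1), round(uid_table_size), round(extra_entries))
```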

Software changes.

This is a guess at where in the system software updates are needed.

The API:

  1. compose the UID based on slug, author
  2. check in the table of native UIDs
  3. if the UID is present, slug = the part that precedes the delimiter and continue as usual
  4. else, ignore that theme and continue (the same way it is ignored if the slug is not present in wordpress' database of slugs now)

The "one time job":

  1. for each active theme and each of its versions in the SVN
  2. read its style.css and compose the UID based on slug, author
  3. store the UID in the table of UIDs (only if it does not exist yet)
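
The one-time job could be sketched like this. It assumes the versions are already available as (slug, style.css text) pairs; how they are read from the SVN is left out. The header regex is a simplified stand-in for real header parsing, and `read_author` and `build_uid_table` are hypothetical names.

```python
import re

# Simplified pattern for the "Author:" header line of style.css.
# It does not match "Author URI:" because a colon must follow "Author".
AUTHOR_HEADER = re.compile(r"^[ \t/*#@]*Author:(.*)$", re.IGNORECASE | re.MULTILINE)


def read_author(style_css):
    """Return the Author header value, or an empty string if it is absent."""
    match = AUTHOR_HEADER.search(style_css)
    return match.group(1).strip() if match else ""


def build_uid_table(versions):
    """versions: iterable of (slug, style_css_text), one pair per stored version."""
    table = set()  # a set stores each UID only once
    for slug, style_css in versions:
        table.add(slug + "|" + read_author(style_css))
    return table


v1 = "/*\nTheme Name: Nice Theme\nAuthor: John Doe\nAuthor URI: https://example.org/\n*/"
v2 = "/*\nTheme Name: Nice Theme\nAuthor: Jane Roe\n*/"
print(sorted(build_uid_table([("nicetheme", v1), ("nicetheme", v1), ("nicetheme", v2)])))
# ['nicetheme|Jane Roe', 'nicetheme|John Doe']
```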

On new theme/update approval:

  1. compose the UID based on slug, author
  2. store the UID in the table of UIDs (only if it does not exist yet)

The active themes counter/collector:

  1. compose the UID based on slug, author
  2. check if it is present in the table of UIDs
  3. only if it is present, increase the counter for the slug, which is the part that precedes the delimiter
  4. count in a second table the active installs for non-existing UIDs as well (as it probably does now for non-existing slugs), to be able to show how many active installs a newly uploaded theme already has, so the reviewer can investigate whether it is a legitimate author that must be linked to these copies or someone uploaded someone else's theme

The code that reports "currently has ... active installations"

  1. it must report not just the >500 cases but the exact number of installations for the exact UID match (that is, for the exact combination of slug, author); we have this in the second table from step 4 of the previous section
  2. to prevent abuse on theme updates: if there is an author change (those cases are very rare) and the number of active installations of the newly composed UID is not 0 (or not close to 0, keeping in mind that there may be test installations of that version), the update shouldn't be auto-approved by themetrackbot; a reviewer must manually check the author change in style.css to avoid hijacking of an external theme's UID
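
The abuse check in item 2 could be expressed along these lines. This is a hedged sketch: `needs_manual_review` and its `tolerance` parameter (standing in for "close to 0") are hypothetical, and the data is invented.

```python
def needs_manual_review(slug, new_author, native_uids, installs_by_uid, tolerance=0):
    """Flag an upload whose new UID already has installs in the wild."""
    uid = slug + "|" + new_author
    if uid in native_uids:
        return False  # author unchanged, or this UID is already native
    # A not-yet-native UID with existing installs may belong to an external theme.
    return installs_by_uid.get(uid, 0) > tolerance


native_uids = {"nicetheme|John Doe"}
installs_by_uid = {"nicetheme|John Doe": 300, "nicetheme|Big Agency": 5000}

print(needs_manual_review("nicetheme", "John Doe", native_uids, installs_by_uid))    # False
print(needs_manual_review("nicetheme", "Big Agency", native_uids, installs_by_uid))  # True
print(needs_manual_review("nicetheme", "New Owner", native_uids, installs_by_uid))   # False
```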

07/22/2017
by dingdang

Last edited 3 months ago by dingdang

This ticket was mentioned in Slack in #themereview by dingdang. View the logs.


3 months ago

This ticket was mentioned in Slack in #themereview by dingdang. View the logs.


3 months ago

This ticket was mentioned in Slack in #themereview by dingdang. View the logs.


3 weeks ago
