#424 - Performance improvements: backend technologies
| Identifier | #424 |
|---|---|
| Issue type | Feature request or suggestion |
| Title | Performance improvements: backend technologies |
| Status | Closed (rejected) |
| Tags | Type: Performance (custom) |
| Handling member | Chris Graham |
| Addon | core |
| Description | 1) Automatic de-Tempcoding for blocks that don't need it: before anything goes into the block cache, check what it contains; if there are no 'non-static' symbols then it can be stored as plain text. 2) A combined self-learning cache for pages. This is currently done for language strings, but could now be used for many things (language, config, panels, blocks). It allows much smarter bulk pre-loading of only what we usually need, with the rest loaded on demand. 3) Flag when preprocessing is needed in any compiled block of Tempcode, so we don't pre-compute when it is not needed. 4) Drop SHIFT_ENCODING and just use SET/GET, as the current SET/GET (in place since 'preprocessing' was formalised in the runtime process) can do the same thing with less conceptual baggage. |
| Steps to reproduce | |
| Funded? | No |

Comments
------------------------
Currently each Tempcode object is:
- a list of 'closures' - PHP function calls with arguments (seq_parts).
- a lump of executable code that defines necessary functions (mostly the result of compiled independent segments of templates, e.g. bits of static text to output - or to template parameters).
- (NB: the engine supports both literal function calls, and fake function calls done via 'eval' and pre-setting of variables)
- a list of preprocessable seq_parts, that will be picked up on prior to the main output starting (indicating required loading of CSS, etc)
- 'bound' Tempcode parameters which the seq_parts may reference (optional)
- various kinds of auditing data for if the tree is being viewed (optional)
An is_empty/is_non_empty operation does not need to check all closures to finish, as the first usually lets the optimiser know the state of affairs.
However, depending on execution parameters, it may let the evaluation finish and cache it in the object, so that the evaluate/evaluate_echo call later on can run a bit faster.
An attach call appends seq_parts from what is being attached, to what it is being attached to.
An evaluate/evaluate_echo call works out the contents of the object as a string.
Tempcode objects are composed into a tree during Composr's execution. This is done using 'attach', and also via parameter passing. This tree is what you see when you view it using the contextual tools (more or less).
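To make the structure above concrete, here is a minimal sketch in Python (not Composr's PHP engine). The class, the `_bind` helper and the use of plain callables as seq_parts are assumptions for illustration only; the sketch shows attach flattening seq_parts, evaluate joining their output, and is_empty stopping at the first non-empty part.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical stand-in for a seq_part: a closure that takes the bound
# parameters and returns a piece of output.
SeqPart = Callable[[Dict[str, str]], str]


def _bind(part: SeqPart, params: Dict[str, str]) -> SeqPart:
    # Give the seq_part its own copy of the parameters it was bound with.
    return lambda _ignored: part(params)


@dataclass
class Tempcode:
    seq_parts: List[SeqPart] = field(default_factory=list)
    bound: Dict[str, str] = field(default_factory=dict)

    def attach(self, other: "Tempcode") -> None:
        # Attaching flattens: the other object's seq_parts are appended,
        # each one carrying the parameters of the object it came from.
        for part in other.seq_parts:
            self.seq_parts.append(_bind(part, other.bound))

    def evaluate(self) -> str:
        # Work out the contents of the object as a string.
        return "".join(part(self.bound) for part in self.seq_parts)

    def is_empty(self) -> bool:
        # No need to run every closure: stop at the first one with output.
        for part in self.seq_parts:
            if part(self.bound) != "":
                return False
        return True


header = Tempcode([lambda p: "<h1>" + p["title"] + "</h1>"], {"title": "News"})
page = Tempcode([lambda p: "<body>"])
page.attach(header)
```

The real engine also tracks preprocessable seq_parts and auditing data, which this sketch leaves out.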
This is an extremely elaborate and complex system that makes Composr possible, as an extremely flexible and modular system with huge scope where non-programmers can extend it and theme it easily.
It is the programming language at the heart of Composr, a bit like a virtual machine.
Tempcode is the biggest performance bottleneck in Composr, and the only 'expensive' thing we have to do that other systems don't. Other systems hard-code a lot more, or require PHP programming for theming, or just generally make a lot of assumptions about how stuff is modularised and composed.
Changes to boost performance:
- Allow closures to be defined using native function calls (especially to the 'ecv' function) without needing a wrapper function
- The Tempcode compiler currently identifies preprocessable seq_parts, let it also identify how a Tempcode object is 'context sensitive' -- that is, if it depends on Tempcode symbols like system time, or variables that may have been set in other parts of the Tempcode tree.
- If it is entirely non-context-dependent, we can let serialisation operations (caching operations - block cache & template cache) actually fully evaluate it and store a make_string_tempcode of that evaluated string (big optimisation)
- If it is partially context-dependent, we can let it get evaluated at the point of decaching, to simplify attach/parameter-pass/preprocessing operations later on in execution, and to reduce memory usage
- Otherwise we leave it in its full seq_parts form
- We can reduce what counts as preprocessable. We can use GET_LOCAL/SET_LOCAL instead of GET/SET, as a hint that we don't need to preprocess (SET) or be contextually-dependent (GET)
- Serialisation operations can do an 'optimise' operation that compiles in some bound Tempcode parameters, so that the seq_part absorbs them rather than references them. Unused Tempcode parameters would be thrown away. The algorithm must recognise when to compile in, and when not to, based on execution and memory complexity (i.e. a big Tempcode parameter used more than once in the template definitely should not be compiled in - it should stay a bound variable, so it can be evaluated just once)
- Drop the 'last_attach' feature from the Tempcode engine, as it slows down attach operations considerably (it turns an array merge into a loop with a write in it). It is very rarely used, and GET/SET can be used instead to provide cross-template hints. It matters even less now that CSS3 covers the main use (which used to be intelligent comma insertion, but now we'd use HTML lists and CSS for this kind of stylistic thing)
- Do array merge operations using 'array_splice' rather than array_merge or a "[]" loop.
- During evaluation make sure we use ob_get_clean rather than ob_get_contents+ob_clean -- this means dropping older versions of PHP
- As well as listing preprocessable seq_parts, also store a very simple flag to indicate if there are any. Potentially that replaces an empty loop with an 'if' test. Possibly we can even have it as NULL or true (not false) and therefore do an isset($variable->has_preprocessables) which would give the right answer even if $variable was a string (so that removes another 'if' check).
- Make code_to_prexecute an array, with codenames as keys, so we know quickly when we don't need to extend it on an attach (due to potentially having the code already) and also conserve memory (as PHP will share references). Actually why is it separate to seq_parts - check this out.
- Review if we really need a difference between is_definitely_empty/is_empty/is_really_empty in the engine; in particular is_empty just calls is_really_empty which is an extra unneeded function call
- Do we really need to give each seq_part and preprocessable_bit a copy of the parameters during a bind operation? We need to copy it through due to attaches squashing things and us not wanting to literally store a full complex tree (extra processing time/memory); maybe we can detect precisely what parameters are needed by what seq_parts using an array_intersect call on some kind of stored list.
- Add an option to disable pre-processing. HTML5 even makes in-body <link> tags valid, so it's not such an issue. The proposed self-learning algorithm would pick up on it, so this would only happen when the cache was empty anyway. The 'PAGE_LINK' moniker preprocessing is nice, but again the self-learner removes the performance gain from the mass-detection-and-loading of this.
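As a rough illustration of the context-sensitivity idea in the list above, the following Python sketch (not Composr code) decides at cache-save time whether a Tempcode object can be stored as a plain evaluated string. The `SeqPart` tuple, the `CONTEXT_SENSITIVE` set and the `SYSTEM_TIME` symbol name are assumptions; only GET and GET_LOCAL come from the discussion above. The "partially context-dependent" case is not modelled.

```python
from typing import Callable, FrozenSet, List, NamedTuple, Union

# Assumption: symbols whose value depends on the request or on other parts
# of the Tempcode tree. A real list would come from the Tempcode compiler.
CONTEXT_SENSITIVE = frozenset({"GET", "SYSTEM_TIME"})


class SeqPart(NamedTuple):
    render: Callable[[], str]   # produces this part's output
    symbols: FrozenSet[str]     # symbols the compiler saw in this part


def is_static(part: SeqPart) -> bool:
    # A part is static if it uses no context-sensitive symbol.
    return not (part.symbols & CONTEXT_SENSITIVE)


def prepare_for_cache(parts: List[SeqPart]) -> Union[str, List[SeqPart]]:
    """Return what would be written to the block/template cache."""
    if all(is_static(p) for p in parts):
        # Entirely non-context-dependent: store the evaluated string.
        return "".join(p.render() for p in parts)
    # Otherwise keep the full seq_parts form.
    return parts


static_parts = [
    SeqPart(lambda: "<p>", frozenset()),
    SeqPart(lambda: "hi", frozenset({"GET_LOCAL"})),
]
dynamic_parts = static_parts + [SeqPart(lambda: "now", frozenset({"SYSTEM_TIME"}))]
```

Treating GET_LOCAL as static here follows the hint described above: a local read does not make the object contextually dependent.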
What would be the required minimum version of PHP needed to accomplish "During evaluation make sure we use ob_get_clean rather than ob_get_contents+ob_clean -- this means dropping older versions of PHP"?
With an extended self-learning cache, certain API operations will not be needed. In particular, large parts of the language API would hardly ever need to be called. So it makes sense to split up this API so that most of it is usually not loaded.
Similarly, the database API is bigger than it needs to be - most page loads don't even need writes, and if they do (such as bumping view counts) we can code our writes in direct SQL so that less API is needed.
There's quite a lot in sources/support.php that is certainly not needed on average page loads.
Even something like 'get_bot_type' is almost a null-op in most cases (due to it detecting what is NOT a bot in just a few lines of code) - so we can make functions like these stubs, loading only the full implementation if really needed.
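A minimal sketch of the stub idea, in Python rather than PHP. Everything here is hypothetical: the user-agent checks, the bot table and `loader_state` are illustrative and are not how Composr's get_bot_type works. The point is only that the cheap common-case test runs first, and the larger implementation is loaded once, on first real need.

```python
from typing import Callable, Dict, Optional

# Tracks whether the (hypothetical) full implementation has been loaded.
loader_state: Dict[str, object] = {"impl": None, "loads": 0}


def _load_full_implementation() -> Callable[[str], Optional[str]]:
    """Stands in for loading a larger code file; counts how often it runs."""
    loader_state["loads"] = int(loader_state["loads"]) + 1
    known_bots = {"googlebot": "Google", "bingbot": "Bing"}  # illustrative

    def detect(user_agent: str) -> Optional[str]:
        ua = user_agent.lower()
        for needle, name in known_bots.items():
            if needle in ua:
                return name
        return None

    return detect


def get_bot_type(user_agent: str) -> Optional[str]:
    ua = user_agent.lower()
    # Stub: the common case (not a bot) is answered in a few cheap checks.
    if "bot" not in ua and "spider" not in ua and "crawl" not in ua:
        return None
    # Rare case: load the full implementation once, then reuse it.
    if loader_state["impl"] is None:
        loader_state["impl"] = _load_full_implementation()
    return loader_state["impl"](user_agent)
```

In PHP the equivalent would be a small function in an always-loaded file that only requires the larger file when the quick test fails.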
Another good way of thinking of it is that if we can have an algorithm that 'learns' common cases (or if we hard-code them) we can avoid full implementations in most cases.
One of the biggest slow-downs in Composr is simply loading up PHP code files, not executing them.
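The self-learning cache mentioned in this issue could look something like the following Python sketch. The class name, the `bulk_load` callback and the key format are assumptions, and the learned key sets are kept in memory here, whereas a real version would need to persist them between requests. It shows the two behaviours described: bulk pre-loading of what a page usually needs, and on-demand loading (plus learning) of the rest.

```python
from typing import Callable, Dict, Iterable, Set


class SelfLearningCache:
    def __init__(self, bulk_load: Callable[[Iterable[str]], Dict[str, str]]):
        self._bulk_load = bulk_load
        self._learned: Dict[str, Set[str]] = {}  # page -> keys needed before
        self._values: Dict[str, str] = {}
        self._page = ""

    def start_page(self, page: str) -> None:
        self._page = page
        wanted = self._learned.setdefault(page, set())
        # One bulk fetch for everything this page usually needs.
        self._values = dict(self._bulk_load(sorted(wanted))) if wanted else {}

    def get(self, key: str) -> str:
        if key not in self._values:
            # Miss: load on demand and remember the key for next time.
            self._values.update(self._bulk_load([key]))
            self._learned.setdefault(self._page, set()).add(key)
        return self._values[key]
```

On the first view of a page every lookup is a separate load; on later views the learned keys arrive in a single bulk load and lookups are served from memory.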
Here are a few possible Tempcode optimisations, to be performed at cache saving:
1) If a template parameter is used zero times in a template, discard it
2) If a template parameter is used one time, compile it in directly, rather than referencing it. That will make cache-dereferencing much more performant for a number of reasons (data quantity, nesting depth, evaluation time).
3) Compile out expressions we know as static, for example 'IF_NON_EMPTY' calls on parameters we know are fixed as non-empty.
4) Escapings can be compiled-in.
This is all a bit tricky, as our code is already compiled PHP at this point. We would probably need to invoke the static Tempcode implementation for cacheable operations and convert back to the PHP implementation via a cross-compile, after running our optimisation sweep. Essentially the static implementation becomes an abstract syntax tree (with an interpreted language for it) and we do a JIT-compile to the final implementation. Kinda.
This should all help quite a bit. We want the cached code to essentially be tuned for almost direct echoing out without intermediary reprocessing.
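Optimisations 1 and 2 above can be sketched on a toy template representation. This is Python, not the Tempcode engine; the `Ref` node type and the list-of-nodes template are assumptions made for illustration. Parameters used zero times are discarded, parameters used exactly once are compiled in as literal text, and parameters used more than once stay bound so they are only held once.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List, Tuple, Union


@dataclass(frozen=True)
class Ref:
    """Reference to a bound template parameter (hypothetical node type)."""
    name: str


Node = Union[str, Ref]


def optimise(nodes: List[Node], params: Dict[str, str]) -> Tuple[List[Node], Dict[str, str]]:
    # Count how many times each parameter is referenced in the template.
    uses = Counter(n.name for n in nodes if isinstance(n, Ref))
    out: List[Node] = []
    for n in nodes:
        if isinstance(n, Ref) and uses[n.name] == 1 and n.name in params:
            n = params[n.name]  # used once: compile the value in directly
        if isinstance(n, str) and out and isinstance(out[-1], str):
            out[-1] += n  # merge adjacent literal text into one node
        else:
            out.append(n)
    # Keep only parameters still referenced (those used more than once).
    kept = {k: v for k, v in params.items() if uses[k] > 1}
    return out, kept


nodes = ["<h1>", Ref("TITLE"), "</h1>", Ref("BODY"), "<hr>", Ref("BODY")]
params = {"TITLE": "News", "BODY": "text", "UNUSED": "x"}
optimised_nodes, kept_params = optimise(nodes, params)
```

Compiling out static expressions and escapings (points 3 and 4) would need knowledge of the directive semantics, which this sketch does not attempt.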