How much space do document versions take up in Alfresco?

by Heiko Robert
Categories: alfresco
cover-image

Reports on Alfresco

The question came up in the Alfresco Discord Channel: How to evaluate how much disk space is occupied by document versions in Alfresco?

Alfresco stores all information about documents in a relational database. However, this data is highly normalized and not so easy to use for professional evaluations. For efficient reporting, it is therefore recommended to collect the desired data per event in separate database tables so that they can be easily evaluated. For this you have to develop a corresponding behavior in Java, which creates entries in a custom reporting table for the events to be registered (e.g. create, update, delete). However, this also means that you have to know in advance what you want to evaluate. But what do you do if you have not collected such data?

If you understand the database model of the Alfresco repository, it is possible to run these reporting queries directly on the database, despite normalization. However, depending on the amount of data stored in Alfresco, this must be used with caution, as very large amounts of data must be combined and analyzed during the query in the database which may produce a multiple of data in RAM and/or disk of the queried tables itself. It may therefore be useful to split the subsequent query into sub-steps and store the intermediate results in new temporarily tables. In Postgres, this can even be cached in so-called materialized views and accelerated enormously.

Database Query

SELECT 
    n.id AS node_id,
    concat(s.protocol,'://',s.identifier,'/',n.uuid) AS node_ref,
    nnp.string_value AS file_name,
    sum(u.content_size) AS sum_kb_versions
FROM alf_node n
JOIN alf_store s ON (n.store_id=s.id AND s.protocol='workspace' AND s.identifier='SpacesStore')
-- find name prop
join alf_node_properties nnp ON (
    nnp.node_id=n.id AND
    nnp.qname_id = (SELECT id FROM alf_qname WHERE local_name='name' AND ns_id IN (SELECT id FROM alf_namespace WHERE uri='http://www.alfresco.org/model/content/1.0'))
)
-- filter nodes having version > 1.0
join alf_node_properties p1 ON (p1.node_id=n.id
            AND p1.qname_id = (SELECT id FROM alf_qname WHERE local_name='versionLabel' AND ns_id IN (SELECT id FROM alf_namespace WHERE uri='http://www.alfresco.org/model/content/1.0'))
            AND p1.string_value!='1.0'
        )
-- find version nodes having the same nodeRef as 'frozenNodeRef' property
join alf_node_properties vnp ON (
    vnp.qname_id = (SELECT id FROM alf_qname WHERE local_name='frozenNodeRef' AND ns_id = (SELECT id FROM alf_namespace WHERE uri='http://www.alfresco.org/model/versionstore/2.0'))
    AND concat('workspace://SpacesStore/',n.uuid) = vnp.string_value
)
-- also filter version nodes > 1.0
JOIN alf_node_properties vvnp ON (
    vnp.node_id = vvnp.node_id AND 
    vvnp.qname_id=(SELECT id FROM alf_qname WHERE local_name='versionLabel' AND ns_id IN (SELECT id FROM alf_namespace WHERE uri='http://www.alfresco.org/model/versionstore/2.0')) and
    vvnp.string_value!='1.0'
)
-- find content id of the version node
join alf_node_properties cnp ON (
    vnp.node_id=cnp.node_id AND 
    cnp.qname_id = (SELECT id FROM alf_qname WHERE local_name='content' AND ns_id IN (SELECT id FROM alf_namespace WHERE uri='http://www.alfresco.org/model/content/1.0'))
)
JOIN alf_content_data d ON (cnp.long_value=d.id)
JOIN alf_content_url u ON (d.content_url_id=u.id)
GROUP BY n.id, nnp.string_value, s.protocol, s.identifier

Picture credits title picture: aitoff (Andrew Martin) licensed under the pixabay license (free for commercial use)