The ability to periodically update the reference matrix (SCK data and mapping) while also preserving version history and correct historical mapping.
Data model adjustments.
Importing algorithm.
Exact requirements:
origin
sck_count
sck_to_jobrole_count
sck_to_trainingelement_count
badge_count
sck_to_trend_count
sck_to_externalsystem_count
sck_to_person_count
sck_to_acquire_count
person_to_academy_count
apprenticeship_to_sck_count
ESCO
13487
522
702
3482
566
13487
2881
1
0
0
DRIVES
782
836
901
12315
1518
1
10466
1
0
0
Making data model relevant
known and relevant: SCK, SCKToJobRole, SCKToTrainingElement, Badge, SCKToTrend, SCKToExternalSystem, SCKToPerson
unknown: SCKToAcquire, PersonToAcademy, ApprenticeshipToSCK
not even present in DB: SCKToProject
conclusion:
keep known and relevant
( ! ) SCKToExternalSystem could be replaced by new SCK column originKey
SCKToProject (and also JobRoleToProject) - unused in project - to delete
SCKToAcquire - still unknown and used somewhere in project - to keep
PersonToAcademy - old academy version - to delete
ApprenticeshipToSCK - not needed - to delete
SCK and linking tables - final list:
SCK, SCKToJobRole, SCKToTrainingElement, Badge, SCKToTrend, SCKToExternalSystem, SCKToPerson, SCKToAcquire
• Description:
The platform supports releasing skill updates (no change history).
• Principle:
A new table SCKRelease will be added. Attributes are id, releaseName, origin, validFrom, validTo, importedAt.
The origin attribute will be removed from the SCK table (it will be moved to SCKRelease).
A sckReleaseId FK to SCKRelease will be added to the SCK table.
The string attribute status in SCK table will be used to determine whether a skill is active or inactive to speed up data loading (it will not be calculated from validFrom and validTo in real time).
There is no need to create new history tables for the linking tables (e.g., SCKToJobRoleHistory). Mapping is stored in the existing mapping tables, and it is applied based on whether the SCK is active.
The main logic will be implemented in the algorithm that creates new SCK records.
All SCKs have unique IDs across time and are assigned to a specific version.
The algorithm performs remapping.
• Advantages:
Malý zásah do platformy : SCK zůstává current tabulka, linking tabulky beze změn.
Jednoduché vydávání verzí datasetu (versionName, validFrom/validTo).
Rychlé načítání přes status (žádné realtime počítání validity).
Ingest s delta files je přímočarý (toggle active/inactive + update atributů).
• Disadvantages:
Neuchovává historii atributů SCK : když přepíšeš SCK , staré hodnoty jsou pryč (a SCKRelease je neobsahuje - název, popis atd.).
Přesun origin mimo SCK může zkomplikovat:
identifikaci entity při ingestu,
unikátnost a referencování externích klíčů.
Historie mapování (linking tables) se ztratí: změny vazeb v čase nepůjdou rekonstruovat; “SCK active” není náhrada za historickou validitu vazby.
Hůř se bude dělat audit/reprodukce starých výsledků (doporučení, badge pravidla, reporting “jak to bylo tehdy”).
chatgpt
• Description :
The platform supports releasing skill updates while simultaneously creating skill history using snapshots.
• Data model :
New table RefDefRelease (id, releaseName, origin, validFrom, validTo, importedAt )
New table SCKSnapshot (id, sckId, refDefReleaseId, changeType, attrHash, payloadJson, createdAt )
attribute changeType - CREATED | CREATED_FROM | REPLACED_BY | UPDATED | DELETED | UNCHANGED | REACTIVATED_SAME | REACTIVATED_CHANGED
attribute attrHash - hash of the canonical representation of skill attributes (of payloadJson)
attribute payloadJson - snapshot of skill attributes (TODO - define as standard) { "preferredLabel": "string", "altLabels": ["string"], "hiddenLabels": ["string"], "description": "string", "reuseLevel": "string" }
this table says: this SCK at this Release looked as it is in payloadJson
for same Release there can be many snapshots of the SCK
existing table SCK
existing status - currently values: null | released → will be: active | inactive
new attribute originKey - for identifying skills in delta files
new attribute currentSnapshotId (FK) - for fast join
existing (mapping) tables SCKToJobRole , SCKToTrainingElement , Badge , SCKToTrend , SCKToExternalSystem , SCKToPerson , SCKToAcquire
new attribute sckSnapshotId - then it is possible to retrospectively display exactly the payload skill
• Constraints :
SCKSnapshot - unique (sck_id, release_id) or composite PK (id, sck_id, release_id)
performance indexes
index (refDefReleaseId, changeType)
index (sckId, createdAt DESC) or (sckId, refDefReleaseId) for history lookup
• Ingest delta file (algorithm):
prerequisite: delta file says (new/changed/deleted)
New release and snapshot for each SCK
create RefDefRelease
for each SCK (within the origin)
find instruction in delta file (using originKey)
assembly normalized payloadJson
use only selected SCK attributes that can change (final listing: name , description )
calculate attrHash
from payloadJson that must have alphabetically sorted keys and no whitespace within JSON
SHA-256
find the latest snapshot for the SCK row - take attrHash and mark it as previous hash
create new SCKSnapshot for (sckId, refDefReleaseId, attrHash, payloadJson)
changeType rules:
new originKey in delta file, no previous snapshot → CREATED
known originKey in delta file, existing previous snapshot, different hashes, previous state is ’active’ → UPDATED
known originKey in delta file, existing previous snapshot, same hashes, previous state is ’active’ → UNCHANGED
known originKey in delta file, explicit delete in delta file → DELETED (attrHash set same as previous one)
known originKey in delta file, existing previous snapshot, same hashes, previous state is ’inactive’ → UNCHANGED_REACTIVATED
known originKey in delta file, existing previous snapshot, different hashes, previous state is ’inactive’ → UPDATED_REACTIVATED
for SCK record according to SCKSnapshot changType:
CREATED - insert new SCK, status is ‘active’
UPDATED - update the SCK attributes
UNCHANGED - remain SCK
DELETED - remain SCK and set as ‘inactive’
UNCHANGED_REACTIVATED - set status as ‘active’
UPDATED_REACTIVATED - update the SCK attributes and set status as ‘active’
Mapping tables
SCKToJobRole, SCKToTrainingElement, Badge, SCKToTrend, SCKToExternalSystem, SCKToPerson, SCKToAcquire
two mapping table manipulation approaches are needed A) and B)
A) Update the mapping
description: object has new released skills mapped
applies to:
SCKToJobRole, SCKToTrainingElement, SCKToTrend, SCKToExternalSystem, SCKToAcquire
how :
take all mapping records (of the table)
duplicate mapping record but with new sckSnapshotId from new release
create also mapping record for skills with changType = DELETED from its its last snapshot
for the possibility of preserving mapping when the skill is reactivated in the future
when selecting which skills belong to a job role, it is necessary to omit all skills with changType equal to DELETED
when:
new release using delta file
SCK is changed by form
B) Do not update the mapping
description: object's mapping is associated with particular release
applies to:
SCKToPerson
person gained SCK in particular release
• Typical user queries:
What did the skill look like in release X?
SELECT payloadJson FROM SCKSnapshot WHERE sckId=? AND refDefReleaseId =?
for mapping tables
SELECT payloadJson FROM SCKSnapshot WHERE id=?
values from payloadJson will be set up for selected attribues
mapping tables ….
Algorithm perspective
ESCO skills data will be updated using a delta files.
For DRIVES skills, it is necessary to identify the current skills and find the transition (dela file) to the new skills (new reference definition) !!
In the future, everything will be updated using delta files.
Thoughts and notes:
after new release import - set validTo date up as a current date in previous release
• Web interface (RefDef update page) :