Enterprise · Full-Stack

Wolters Kluwer — Journal Automation Platform

A full-stack web app that turns 5 scattered data sources into 45 branded PowerPoint presentations — automatically. What took a marketing team weeks now takes under 10 minutes.

  • 45 journals
  • 5 sources → 1 click
  • ~95% time saved
  • 35/35 audit pass

The Problem

Wolters Kluwer's legal publishing division produces 45 academic law journals. Every year, the marketing team builds a branded partnership overview presentation for each journal — subscriber trends, usage analytics, geographic reach, citation rankings, page counts.

The data lives in 5 separate systems: Tableau (subscribers, geography, segments), SIQ (platform usage, top articles), HeinOnline (visit statistics), Scopus/Clarivate/Google Scholar (rankings), and internal spreadsheets (page counts per issue).

The old process: manually copy-paste data from each source into a 16-slide PowerPoint template, recreate charts, cross-reference journal names across systems (which don't agree on naming), and repeat 45 times. This took multiple weeks every year and was error-prone — the same journal might appear as "b-Arbitra | Belgian Review of Arbitration" in Tableau and "b-Arbitra: Belgian Review of Arbitration" in SIQ.

The Solution

A full-stack web application that automates the entire pipeline from raw data to finished, auditable presentations.

Upload Dashboard

Drag files or click to upload. All sources loaded: Tableau (12 files), SIQ (8 files), HeinOnline (3 files), Rankings (4 files), Page Counts (2 files), Marketing (6 files).

Frontend

React 19 + TypeScript + Tailwind + shadcn/ui

The upload dashboard uses a 6-zone drag-and-drop interface with auto-categorization — drop any file and the system identifies which data source it belongs to based on filename patterns and content structure.

A real-time data completeness matrix tracks coverage across all 7 data categories for each of the 45 journals. During generation, live WebSocket updates stream progress per journal so you can see exactly where the pipeline is.
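A minimal sketch of how a completeness matrix like this could be computed, assuming the parsed uploads are available as a mapping from journal to the set of categories found for it. The category names below are hypothetical placeholders; the page does not list the seven real ones.

```python
from typing import Dict, Iterable, Set

# Hypothetical category names standing in for the seven tracked categories.
CATEGORIES = [
    "subscribers", "geography", "usage", "hein_visits",
    "rankings", "page_counts", "marketing",
]


def completeness_matrix(
    journals: Iterable[str], loaded: Dict[str, Set[str]]
) -> Dict[str, Dict[str, bool]]:
    """Return {journal: {category: has_data}} for every journal and category."""
    return {
        journal: {cat: cat in loaded.get(journal, set()) for cat in CATEGORIES}
        for journal in journals
    }


def coverage(row: Dict[str, bool]) -> float:
    """Fraction of categories present for one journal (0.0 to 1.0)."""
    return sum(row.values()) / len(row)
```

Each row can be rendered as one line of the matrix in the UI, with `coverage` driving a per-journal progress indicator.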

The built-in file browser supports individual and bulk download of generated presentations. An audit UI renders chart-level validation results — every data point in every chart is traced back to its source row.

Backend

FastAPI + Python

The data pipeline fuzzy-matches journal names across all 5 sources using Python's SequenceMatcher combined with explicit alias tables. This handles the naming inconsistencies automatically — no manual mapping required for known journals.
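As an illustration of the fuzzy-matching step, here is a small sketch using `difflib.SequenceMatcher` on the two spellings quoted earlier. The `normalize` helper is an assumption about how separators might be neutralized before comparison, not the project's actual code.

```python
import re
from difflib import SequenceMatcher


def normalize(name: str) -> str:
    """Lowercase and collapse punctuation/whitespace so separators stop mattering."""
    return re.sub(r"[^a-z0-9]+", " ", name.lower()).strip()


def similarity(a: str, b: str) -> float:
    """SequenceMatcher ratio on normalized names (1.0 means identical)."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


tableau_name = "b-Arbitra | Belgian Review of Arbitration"
siq_name = "b-Arbitra: Belgian Review of Arbitration"
print(similarity(tableau_name, siq_name))  # both normalize to the same string
```

After normalization the "|" and ":" variants become identical, so the ratio is 1.0. Names that differ in wording rather than punctuation score lower and would need a threshold or an alias entry.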

Multi-year data aggregation pulls 3 years of subscriber and usage trends to build time-series charts. Template-based PPTX generation works via direct XML manipulation of PowerPoint's OOXML format, preserving full editability — the marketing team can still open and tweak any generated file in PowerPoint.

Conditional slide inclusion handles missing data gracefully: if a journal has no HeinOnline stats, that slide is deleted entirely rather than showing empty charts. A reverse-engineering audit engine extracts chart data from the generated PPTX XML and validates it against the source data.
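The audit idea can be sketched with the standard library alone: read the cached category and value points out of a chart part and compare them with the source values. This assumes the chart XML uses the usual DrawingML `c:ser` / `c:cat` / `c:val` layout with cached points. The function names and the tolerance are illustrative.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

NS = {"c": "http://schemas.openxmlformats.org/drawingml/2006/chart"}


def extract_series(chart_xml: str) -> List[Dict[str, float]]:
    """Return one {category: value} dict per c:ser, read from the cached points."""
    root = ET.fromstring(chart_xml)
    series = []
    for ser in root.iterfind(".//c:ser", NS):
        cats = {pt.get("idx"): pt.findtext("c:v", namespaces=NS)
                for pt in ser.iterfind("c:cat//c:pt", NS)}
        vals = {pt.get("idx"): float(pt.findtext("c:v", namespaces=NS))
                for pt in ser.iterfind("c:val//c:pt", NS)}
        # Join categories and values on the shared point index.
        series.append({cats[i]: v for i, v in vals.items() if i in cats})
    return series


def audit(chart_xml: str, expected: Dict[str, float], tol: float = 1e-6) -> List[str]:
    """Compare the first series against source values; return mismatch messages."""
    actual = extract_series(chart_xml)[0]
    problems = []
    for key, want in expected.items():
        got = actual.get(key)
        if got is None or abs(got - want) > tol:
            problems.append(f"{key}: chart={got} source={want}")
    return problems
```

An empty list from `audit` means every expected data point was found in the chart with a matching value.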

Integrations

A custom Tableau REST API client handles automated data pulls using Personal Access Token authentication, extracting subscriber demographics, geographic distribution, and segment breakdowns directly from published Tableau views.
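For reference, a hedged sketch of what the Personal Access Token sign-in request looks like when built with the standard library. The server URL, the site name, and the API version `3.19` are assumptions chosen for illustration. The request is only constructed here, not sent.

```python
import json
import urllib.request

API_VERSION = "3.19"  # assumption: use the version your Tableau deployment supports


def build_signin_request(server: str, token_name: str, token_secret: str,
                         site_content_url: str = "") -> urllib.request.Request:
    """Build (but do not send) the PAT sign-in request for the Tableau REST API."""
    payload = {
        "credentials": {
            "personalAccessTokenName": token_name,
            "personalAccessTokenSecret": token_secret,
            "site": {"contentUrl": site_content_url},
        }
    }
    return urllib.request.Request(
        url=f"{server.rstrip('/')}/api/{API_VERSION}/auth/signin",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "Accept": "application/json"},
        method="POST",
    )
```

The sign-in response carries a credentials token, which later calls send in the `X-Tableau-Auth` header.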

A rate-limited Google Scholar metrics scraper extracts h5-index and h5-median values for journal ranking slides. Bulk file upload with keyword-based auto-categorization routes uploaded files to the correct processing pipeline without manual sorting.
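Keyword-based routing of uploads might look roughly like the sketch below. The keyword lists and category labels are hypothetical. Only the filename is inspected here, while the real system is described as also looking at content structure.

```python
from pathlib import Path
from typing import Optional

# Hypothetical keyword rules; the first matching rule wins, so order matters.
RULES = [
    ("heinonline", ("hein",)),
    ("siq", ("siq",)),
    ("rankings", ("scopus", "clarivate", "scholar", "ranking")),
    ("page_counts", ("page count", "pagecount", "pages")),
    ("marketing", ("marketing",)),
    ("tableau", ("tableau", "subscriber", "geograph", "segment")),
]


def categorize(filename: str) -> Optional[str]:
    """Return the upload zone for a filename, or None if no keyword matches."""
    stem = Path(filename).stem.lower().replace("_", " ").replace("-", " ")
    for category, keywords in RULES:
        if any(keyword in stem for keyword in keywords):
            return category
    return None
```

Files that return `None` would fall through to a manual choice or a content-based check.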

Pipeline Flow

01 Data Upload
02 Auto-Processing
03 PPTX Generation
04 Audit & Validate
05 Download

Key Technical Challenges

1. PPTX XML Corruption

PowerPoint's OOXML format is unforgiving. Three separate corruption vectors were discovered and fixed: external data references to SharePoint in template charts, Python's XML writer dropping the standalone="yes" declaration, and dangling slide relationship entries after slide deletion. Each caused silent corruption that surfaced only when the file was opened in PowerPoint.

pptx_corruption_fix.xml
<!-- Template chart with external SharePoint reference -->
<c:externalData r:id="rId1">
  <c:autoUpdate val="0"/>
</c:externalData>

<!-- Fix: find every c:externalData element, e.g. chart_xml.findall(".//c:externalData", ns), -->
<!-- and remove each one from its parent before writing the chart part -->

Template charts contained SharePoint references that corrupted the output when the linked workbook wasn't available.
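A sketch of how two of these fixes could be combined with ElementTree, assuming the chart part is handled as raw bytes. ElementTree elements have no parent pointers, so the removal walks each parent's children. The `standalone="yes"` declaration is prepended by hand because `tostring()` does not emit it. The function name is illustrative.

```python
import xml.etree.ElementTree as ET

C_NS = "http://schemas.openxmlformats.org/drawingml/2006/chart"
R_NS = "http://schemas.openxmlformats.org/officeDocument/2006/relationships"
# Keep the conventional c:/r: prefixes instead of ns0:/ns1: on output.
ET.register_namespace("c", C_NS)
ET.register_namespace("r", R_NS)

XML_DECL = b'<?xml version="1.0" encoding="UTF-8" standalone="yes"?>\n'


def strip_external_data(chart_xml: bytes) -> bytes:
    """Remove every c:externalData element and re-serialize with standalone="yes"."""
    root = ET.fromstring(chart_xml)
    # No parent pointers in ElementTree: visit each parent and drop matching children.
    for parent in list(root.iter()):
        for child in list(parent):
            if child.tag == f"{{{C_NS}}}externalData":
                parent.remove(child)
    # Serialize without a declaration, then prepend the one PowerPoint expects.
    return XML_DECL + ET.tostring(root, encoding="utf-8", xml_declaration=False)
```

The relationship entry that `r:id` pointed to in the chart's `.rels` part probably needs removing as well. That step is not shown here.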

2. Journal Name Normalization

The same journal appears under different names, abbreviations, and Unicode variants across 5 data systems. The fix is a multi-strategy resolver that tries each strategy in order: explicit alias mapping → fuzzy SequenceMatcher → acronym lookup → keyword detection. It handles all 45 journals across every source with zero manual intervention.
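The four-stage chain could be sketched as follows. The stopword list, the 0.85 threshold, and the shape of the alias and keyword tables are assumptions made for this example, as is the journal name "Example Tax Law Review" in the usage note. The function expects a non-empty list of canonical names.

```python
import re
import unicodedata
from difflib import SequenceMatcher
from typing import Dict, List, Optional

STOPWORDS = {"of", "the", "and", "for", "in", "on"}  # assumed, for acronyms only


def _norm(name: str) -> str:
    """Fold Unicode variants to ASCII, lowercase, and collapse punctuation."""
    folded = unicodedata.normalize("NFKD", name).encode("ascii", "ignore").decode()
    return re.sub(r"[^a-z0-9]+", " ", folded.lower()).strip()


def _acronym(name: str) -> str:
    """Initials of the non-stopword words, e.g. 'Review of Law' -> 'rl'."""
    return "".join(w[0] for w in _norm(name).split() if w not in STOPWORDS)


def resolve(raw: str, canonical: List[str], aliases: Dict[str, str],
            keywords: Dict[str, str], threshold: float = 0.85) -> Optional[str]:
    """Try alias -> fuzzy -> acronym -> keyword; return a canonical name or None."""
    key = _norm(raw)
    # 1. Explicit alias table (keys are stored in normalized form).
    if key in aliases:
        return aliases[key]
    # 2. Fuzzy match against the canonical names.
    best = max(canonical, key=lambda c: SequenceMatcher(None, key, _norm(c)).ratio())
    if SequenceMatcher(None, key, _norm(best)).ratio() >= threshold:
        return best
    # 3. Acronym lookup: the raw value equals the initials of a canonical name.
    compact = key.replace(" ", "")
    for name in canonical:
        if compact == _acronym(name):
            return name
    # 4. Keyword detection: one distinctive word identifies the journal.
    for word, name in keywords.items():
        if word in key.split():
            return name
    return None
```

With `canonical = ["b-Arbitra | Belgian Review of Arbitration", "Example Tax Law Review"]`, the input `"b-Arbitra: Belgian Review of Arbitration"` resolves at the fuzzy stage and `"ETLR"` at the acronym stage. Anything that returns `None` would be flagged for a new alias entry.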

3. 3D Bar Chart Labels

OOXML's bar3DChart type does not support the dLblPos element at any level: any attempt to set label positioning corrupts the file. The fix was to accept PowerPoint's default label placement and toggle only showVal.
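One defensive way to apply that rule is to strip any `c:dLblPos` that ends up inside a `c:bar3DChart` and touch only `c:showVal`. The sketch below assumes the chart part has already been parsed with ElementTree. The function name is illustrative.

```python
import xml.etree.ElementTree as ET

C_NS = "http://schemas.openxmlformats.org/drawingml/2006/chart"
ET.register_namespace("c", C_NS)


def _tag(local: str) -> str:
    """Qualified ElementTree tag name in the chart namespace."""
    return f"{{{C_NS}}}{local}"


def sanitize_3d_bar_labels(root: ET.Element, show_values: bool = True) -> int:
    """Drop c:dLblPos inside every c:bar3DChart and set c:showVal; return removals."""
    removed = 0
    for chart in list(root.iter(_tag("bar3DChart"))):
        # Snapshot the parents first so removal does not disturb iteration.
        for parent in list(chart.iter()):
            for child in list(parent):
                if child.tag == _tag("dLblPos"):
                    parent.remove(child)
                    removed += 1
        for show_val in chart.iter(_tag("showVal")):
            show_val.set("val", "1" if show_values else "0")
    return removed
```

Label positions on other chart types, such as a 2D `c:barChart`, are left untouched.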

4. Conditional Slide Management

Each of the 16 template slides maps to a specific data requirement. If a journal has no HeinOnline data, that slide is deleted. Marketing slides from a separate PPTX are injected and matched via fuzzy matching. Orphaned media is cleaned up to prevent file bloat.
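Deleting a slide cleanly means removing it from two places at once: the `p:sldIdLst` in `presentation.xml` and the matching entry in `presentation.xml.rels`. A minimal sketch, assuming both parts are available as strings. The function name and the error handling are illustrative.

```python
import xml.etree.ElementTree as ET
from typing import Tuple

P_NS = "http://schemas.openxmlformats.org/presentationml/2006/main"
R_NS = "http://schemas.openxmlformats.org/officeDocument/2006/relationships"
PKG_NS = "http://schemas.openxmlformats.org/package/2006/relationships"
ET.register_namespace("p", P_NS)
ET.register_namespace("r", R_NS)
ET.register_namespace("", PKG_NS)  # keep the .rels part in its default namespace


def drop_slide(presentation_xml: str, rels_xml: str, target: str) -> Tuple[str, str]:
    """Remove the slide at `target` from both the slide list and the relationships."""
    pres = ET.fromstring(presentation_xml)
    rels = ET.fromstring(rels_xml)

    # Find and remove the relationship entry that points at the slide part.
    rel_id = None
    for rel in list(rels):
        if rel.get("Target") == target:
            rel_id = rel.get("Id")
            rels.remove(rel)
    if rel_id is None:
        raise KeyError(f"no relationship targets {target!r}")

    # Remove the matching p:sldId so nothing references the deleted relationship.
    sld_id_lst = pres.find(f"{{{P_NS}}}sldIdLst")
    for sld_id in list(sld_id_lst):
        if sld_id.get(f"{{{R_NS}}}id") == rel_id:
            sld_id_lst.remove(sld_id)

    return ET.tostring(pres, encoding="unicode"), ET.tostring(rels, encoding="unicode")
```

A complete removal would also delete the slide part itself, its own `.rels` file, any media only that slide used, and the slide's entry in `[Content_Types].xml`. Those steps are omitted here.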

Validation

35/35 charts validated against source data

The Result

  • 45 journal presentations generated from a single button click
  • Sub-10-minute full pipeline (was multiple weeks)
  • 35/35 audit pass rate — every chart validated against source data
  • Fully editable output — marketing team can still tweak in PowerPoint
  • Reproducible — regenerate any time data updates

Tech Stack

Layer          Technology
Frontend       React 19, TypeScript, Vite, Tailwind CSS, shadcn/ui, Recharts
Backend        FastAPI, Python 3.13, pandas, openpyxl
Real-time      WebSocket (native FastAPI)
PPTX Engine    Direct XML manipulation (ElementTree)
Integrations   Tableau REST API, Google Scholar scraper
Validation     Custom audit engine (PPTX XML ↔ source data)
