Architecture Overview¶
OpenEV Data is designed as a modular, scalable system following best practices in software architecture and data engineering.
System Components¶
The OpenEV Data ecosystem consists of four main components:
graph TB
subgraph "Data Layer"
Dataset[Dataset Repository<br/>Layered JSON Files]
end
subgraph "Processing Layer"
ETL[ETL Pipeline<br/>Rust]
end
subgraph "Storage Layer"
JSON[JSON Files]
SQLite[SQLite DB]
PostgreSQL[PostgreSQL]
CSV[CSV Export]
XML[XML Export]
end
subgraph "API Layer"
Server[REST API<br/>Rust/Axum]
end
subgraph "Presentation Layer"
Docs[Documentation<br/>MkDocs]
end
Dataset -->|Source| ETL
ETL -->|Generate| JSON
ETL -->|Generate| SQLite
ETL -->|Generate| PostgreSQL
ETL -->|Generate| CSV
ETL -->|Generate| XML
SQLite -->|Query| Server
PostgreSQL -->|Query| Server
Server -->|Reference| Docs
Dataset -->|Link| Docs Repository Structure¶
1. Dataset Repository¶
Purpose: Single source of truth for EV specifications
Technology: JSON, JSON Schema
Architecture: Layered Canonical Dataset (LCD)
The dataset follows a hierarchical structure:
src/
<make>/
<model>/
base.json # Shared specifications
<year>/
<vehicle>.json # Base vehicle
<vehicle>_<variant>.json # Variants
Key Features:
- Deep merge inheritance
- Strict schema validation
- Version controlled
- Community reviewed
Read detailed Dataset Architecture →
2. API Repository¶
Purpose: High-performance data processing and API serving
Technology: Rust, Axum, SQLite, PostgreSQL
Architecture: Hexagonal Architecture (Ports and Adapters)
The API follows a clean architecture pattern:
crates/
ev-core/ # Domain types (Pure Rust)
ev-etl/ # ETL pipeline (CLI)
ev-server/ # REST API server
Key Features:
- Type-safe domain model
- Multiple output formats
- Edge-ready deployment
- <50ms response times
Read detailed API Architecture →
3. Documentation Repository¶
Purpose: Centralized documentation and guides
Technology: MkDocs, Material Theme, Python
Architecture: Static site generation with versioning
Key Features:
- Automatic deployment
- Version management with mike
- Search functionality
- Material design
You're reading it right now!
4. Governance Repository¶
Purpose: Organization-wide policies and templates
Technology: Markdown, GitHub templates
Contents:
- Contributing guidelines
- Code of conduct
- Issue templates
- Pull request templates
- Workspace configuration
Data Flow¶
Build-Time Pipeline¶
sequenceDiagram
participant C as Contributor
participant D as Dataset Repo
participant CI as GitHub Actions
participant E as ETL
participant R as Releases
C->>D: Commit JSON changes
D->>CI: Trigger workflow
CI->>E: Run cargo build
E->>E: Load & validate
E->>E: Merge layers
E->>E: Generate outputs
E->>R: Publish artifacts
Note over R: vehicles.json<br/>vehicles.db<br/>vehicles.sql<br/>vehicles.csv<br/>vehicles.xml Runtime API Flow¶
sequenceDiagram
participant Client
participant API as ev-server
participant DB as Database
Client->>API: GET /v1/vehicles/tesla/model_3/2024
API->>DB: Query vehicle
DB-->>API: Vehicle data
API-->>Client: JSON response
Note over Client,API: <50ms response time Architectural Principles¶
1. Separation of Concerns¶
Each repository has a single, well-defined responsibility:
- Dataset: Data curation
- API: Data processing and serving
- Docs: Documentation
- Governance: Policies and templates
2. Data as Code¶
All vehicle specifications are:
- Version controlled (Git)
- Reviewed (Pull Requests)
- Validated (Schema + CI)
- Traceable (Sources)
3. Build-Time Optimization¶
Heavy processing happens once during builds:
- Layer merging
- Schema validation
- Format conversion
- Index generation
Runtime is optimized for reads.
4. Multiple Consumption Patterns¶
The same source data powers:
- Direct file downloads
- REST API access
- Embedded databases
- Bulk SQL imports
- CSV analysis
5. Type Safety¶
Rust's type system prevents invalid states:
- Compile-time guarantees
- No null pointer errors
- Exhaustive pattern matching
- Zero-cost abstractions
6. Global Performance¶
API deployed at the edge:
- Production-ready deployments
- Low-latency worldwide access
- Automatic scaling
Design Patterns¶
Dataset: Layered Canonical Dataset (LCD)¶
Inspired by configuration management systems, LCD provides:
- Inheritance: Base → Year → Variant
- DRY: No repeated data
- Deterministic: Same input = same output
- Traceable: Clear data lineage
API: Hexagonal Architecture¶
Clean separation between:
- Core: Pure domain logic
- Adapters: I/O implementations
- Ports: Abstract interfaces
Benefits:
- Testable without I/O
- Swappable implementations
- Framework independent
Documentation: Docs as Code¶
Documentation follows software practices:
- Version controlled
- CI/CD deployment
- Automated versioning
- Pull request reviews
Technology Choices¶
Why Rust?¶
- Performance: Near C/C++ speed
- Safety: No segfaults, no data races
- Tooling: Cargo, rustfmt, clippy
- Community: Growing ecosystem
Why MkDocs Material?¶
- Beautiful: Modern, responsive design
- Features: Search, navigation, tabs
- Extensions: Mermaid, admonitions, code highlighting
- Versioning: Built-in support with mike
- Fast: Static site generation
Why JSON Schema?¶
- Standard: Industry-standard validation
- Language-agnostic: Works everywhere
- Documentation: Self-documenting schema
- Tooling: Wide ecosystem support
- Strict: Catches errors early
Deployment Architecture¶
Current¶
graph LR
subgraph "GitHub"
Dataset[Dataset Repo]
API[API Repo]
Docs[Docs Repo]
end
subgraph "GitHub Actions"
ETL[ETL Pipeline]
DocsBuild[Docs Build]
end
subgraph "GitHub Releases"
Artifacts[Data Artifacts]
end
subgraph "GitHub Pages"
Site[Documentation Site]
end
Dataset -->|Trigger| ETL
ETL -->|Publish| Artifacts
Docs -->|Trigger| DocsBuild
DocsBuild -->|Deploy| Site Quality Assurance¶
Dataset Quality¶
- Schema validation
- Source verification
- Peer review
- Automated checks
Code Quality¶
- Unit tests
- Integration tests
- Clippy linting
- Format checking
Documentation Quality¶
- Link checking
- Spell checking
- Preview builds
- Version tracking
Scalability¶
Data Growth¶
The architecture supports:
- Thousands of vehicles: Efficient indexing
- Multiple variants: Layered structure
- Historical data: Version tracking
- International markets: Multi-source support
API Scaling¶
The API scales through:
- Horizontal scaling: Stateless design
- Caching: Database in memory
- CDN: Static asset delivery
- Edge deployment: Global distribution
Security¶
Data Integrity¶
- Git commit signing
- Pull request reviews
- CI validation
- Checksum verification
API Security¶
- HTTPS only
- CORS configuration
- Rate limiting (planned)
- API keys (planned)
Next Steps¶
Explore specific architectures:
-
Layered data structure
-
Hexagonal architecture in Rust
-
Start contributing
-
Complete API documentation