
【iOSDC2025】Sign Language Gesture Detection and Translation ~Possibilities and Limitations of Hand Tracking~

Contents
  1. Introduction
  2. Implementation Approach
  3. Implementation Mechanisms
  4. Conclusion
  5. Resources

Introduction

I presented at iOSDC Japan 2025 with the title “Sign Language Gesture Detection and Translation – The Possibilities and Limitations of Hand Tracking”.
In the session, I explore in depth the technical implementation of real-time sign language gesture recognition using the visionOS hand tracking features, and examine both its possibilities and its practical constraints.

The slides and information used in the presentation are available here:

  1. Slides (Download from TestFlight)
    1. https://testflight.apple.com/join/s5j2zJbS
  2. HandGestureKit
    1. https://github.com/u5-03/HandGestureKit

Implementation Approach

Gesture Detection Implementation Approach

The gesture detection system presented in this session is implemented in three steps (a compact end-to-end sketch follows the steps below):

1. Initialize the Hand Tracking System

  • Request permissions with ARKitSession: Requires adding a usage description to Info.plist
  • Enable .hand with SpatialTrackingSession: Leveraging new features in visionOS 2.0

2. Retrieve Hand Joint Position and Orientation Information

  • Select necessary joints from HandSkeleton: Choose from 27 joints as needed
  • Set up AnchorEntity for real-time joint tracking: Automatically updates joint positions
  • Access Entity joint positions and orientations via Components: Data access using ECS pattern

3. Determine Gestures from Joint Information

  • Determine whether joint positions and orientations match gesture conditions
  • Protocol-oriented design makes adding new gestures easy
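
A compact sketch of how these three steps fit together end to end (each step is covered in detail in the Implementation Mechanisms section; the function here is a simplified overview, not the exact package code):

import RealityKit

@MainActor
func setUpGesturePipeline(root: Entity) async {
    // Step 1: enable hand tracking (visionOS 2.0+).
    // In real code, keep a reference to the session so tracking stays active.
    let session = SpatialTrackingSession()
    _ = await session.run(SpatialTrackingSession.Configuration(tracking: [.hand]))

    // Step 2: anchor an Entity to a hand location; RealityKit updates its pose every frame
    let palmAnchor = AnchorEntity(
        .hand(.left, location: .palm),
        trackingMode: .predicted
    )
    root.addChild(palmAnchor)

    // Step 3: a per-frame System reads joint poses from components and matches them
    // against registered gesture conditions (see HandGestureTrackingSystem later in this post)
}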

Development Environment and Prerequisites

Required Environment

The implementation presented in this session requires the following development environment:

  • Xcode: 16.2 or later
  • visionOS: 2.0 or later (AnchorEntity was introduced in visionOS 2.0)
  • Swift: 6.0
  • Device: Apple Vision Pro (Hand tracking is limited in the simulator)

Important Dependencies

import ARKit        // SpatialTrackingSession
import RealityKit   // AnchorEntity, Entity-Component-System
import SwiftUI      // UI construction

Note: AnchorEntity is only available in visionOS 2.0 and later. Alternative implementation is required for visionOS 1.x.
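
If an app also needs to run on visionOS 1.x, the 2.0-only path can be gated with an availability check. A minimal sketch:

import RealityKit

@MainActor
func addPalmAnchorIfAvailable(to container: Entity) {
    if #available(visionOS 2.0, *) {
        // visionOS 2.0+: an AnchorEntity can follow a hand location automatically
        let palmAnchor = AnchorEntity(
            .hand(.left, location: .palm),
            trackingMode: .predicted
        )
        container.addChild(palmAnchor)
    } else {
        // visionOS 1.x: no SpatialTrackingSession-based anchoring;
        // an ARKit HandTrackingProvider update loop would be needed instead
        print("Falling back to ARKit hand tracking updates")
    }
}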

Session Overview

What You’ll Learn

This session comprehensively covers the following aspects of building a gesture recognition system for visionOS:

  1. Hand Tracking Fundamentals: Understanding visionOS SpatialTrackingSession and AnchorEntity
  2. Gesture Detection Architecture: Building a flexible, protocol-oriented gesture detection system
  3. Practical Implementation: Implementing sign language gestures with validation
  4. Performance Optimization: Real-time gesture processing strategies at 90Hz
  5. Limitations and Workarounds: Addressing spatial hand tracking challenges

Target Audience

This session is aimed at iOS/visionOS developers who:

  • Have basic knowledge of SwiftUI and RealityKit
  • Are interested in spatial computing and gesture-based interfaces
  • Want to understand practical aspects of hand tracking implementation
  • Are interested in the future of human-computer interaction

Implementation Mechanisms

How HandSkeleton Works

What is HandSkeleton?

A hand skeleton model provided by ARKit, consisting of 27 joint points.
This enables high-precision tracking of hand shapes.

Reference: https://developer.apple.com/videos/play/wwdc2023/10082/?time=970

Available Joint Information

  • Wrist: Hand reference point
  • Finger joints:
    • Metacarpal
    • Proximal phalanx
    • Intermediate phalanx
    • Distal phalanx
    • Tip
  • Forearm: Determines arm orientation

Data Available from Each Joint

// Position information (SIMD3<Float>)
let position = joint.position

// Orientation information (simd_quatf)
let orientation = joint.orientation

// Transform relative to the parent joint
let relativeTransform = joint.relativeTransform
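
For reference, when reading the raw ARKit HandSkeleton directly rather than through a wrapper like the snippet above, the same information comes from each joint's transform matrices. A minimal sketch, assuming a HandAnchor obtained from a hand tracking update:

import ARKit
import simd

func jointData(from handAnchor: HandAnchor) {
    guard let skeleton = handAnchor.handSkeleton else { return }

    // Transform of the index fingertip, expressed relative to the hand anchor
    let tip = skeleton.joint(.indexFingerTip)
    let anchorFromJoint = tip.anchorFromJointTransform          // simd_float4x4

    // Position: the translation column of the 4x4 transform
    let position = SIMD3<Float>(anchorFromJoint.columns.3.x,
                                anchorFromJoint.columns.3.y,
                                anchorFromJoint.columns.3.z)

    // Orientation: the rotation part of the transform as a quaternion
    let orientation = simd_quatf(anchorFromJoint)

    // Transform relative to the parent joint
    let parentFromJoint = tip.parentFromJointTransform

    _ = (position, orientation, parentFromJoint)
}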

Coordinate System

  • Right-handed coordinate system: +X is right, +Y is up, +Z points toward the viewer (forward is -Z)
  • Units: Meters
  • Origin: Based on device initial position

RealityKit and ECS (Entity-Component-System)

ECS Architecture Basics

RealityKit is a 3D rendering framework that adopts the ECS pattern:

1. Entity

Represents objects in 3D space.
Spheres, text, hands – all 3D elements are Entities.

2. Component

Adds functionality to Entities.
Appearance (ModelComponent), movement (Transform), physics (PhysicsBodyComponent), etc.

3. System

Processes Entities with specific Components every frame.
The core of the game loop.
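
As a small, self-contained illustration of the pattern (the component and system names here are made up for this example and are not part of HandGestureKit):

import RealityKit
import simd

// Component: plain data attached to an Entity
struct SpinComponent: Component {
    var speed: Float = 1.0
}

// System: processes every Entity that has the Component, once per frame
final class SpinSystem: System {
    static let query = EntityQuery(where: .has(SpinComponent.self))

    init(scene: Scene) {}

    func update(context: SceneUpdateContext) {
        for entity in context.entities(matching: Self.query, updatingSystemWhen: .rendering) {
            guard let spin = entity.components[SpinComponent.self] else { continue }
            // Rotate the entity around the Y axis at the component's speed
            let delta = simd_quatf(angle: spin.speed * Float(context.deltaTime), axis: [0, 1, 0])
            entity.transform.rotation = delta * entity.transform.rotation
        }
    }
}

// Register once, before the scene is created:
// SpinComponent.registerComponent()
// SpinSystem.registerSystem()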

Basic RealityView Structure

RealityView { content in
    // Create root Entity and add to scene
    let rootEntity = Entity()
    content.add(rootEntity)
    
    // Create hand entity container
    let handEntitiesContainerEntity = Entity()
    rootEntity.addChild(handEntitiesContainerEntity)
}

Enabling Hand Tracking with SpatialTrackingSession

Implementation using AnchorEntity introduced in visionOS 2.0:

// Enable hand tracking
let session = SpatialTrackingSession()
let config = SpatialTrackingSession.Configuration(tracking: [.hand])
await session.run(config)

// Auto-track joints with AnchorEntity
let anchorEntity = AnchorEntity(
    .hand(.left, location: .palm),
    trackingMode: .predicted  // Reduce tracking latency with prediction
)

// Auto-tracking starts just by adding
handEntitiesContainerEntity.addChild(anchorEntity)

Creating and Placing Joint Markers

// Create Entity for sphere marker
let sphere = ModelEntity(
    mesh: .generateSphere(radius: 0.005),
    materials: [UnlitMaterial(color: .yellow)]
)

// Add to AnchorEntity (follows joint)
anchorEntity.addChild(sphere)

HandGestureTrackingSystem Implementation

Create a custom System to monitor hand state every frame:

1. Get Hand Entities with EntityQuery
let handEntities = context.scene.performQuery(
    EntityQuery(where: .has(HandTrackingComponent.self))
)

2. Extract Information from HandTrackingComponent
for entity in handEntities {
    if let component = entity.components[HandTrackingComponent.self] {
        let chirality = component.chirality  // .left or .right
        let handSkeleton = component.handSkeleton
    }
}

3. Gesture Detection Processing
let detectedGestures = GestureDetector.detectGestures(
    from: handTrackingComponents,
    targetGestures: targetGestures
)

This System's update(context:) method is automatically called every frame, retrieving necessary information from SceneUpdateContext for processing.

Technical Architecture

Repository Structure

The project consists of three main packages, each serving a specific role in the gesture recognition pipeline:

Slidys/
├── Packages/
│   ├── iOSDC2025Slide/       # Presentation built with Slidys framework
│   ├── HandGestureKit/        # Core gesture detection library (OSS-ready)
│   └── HandGesturePackage/    # Application-specific implementation

HandGestureKit: Core Library

HandGestureKit functions as the foundation layer for gesture recognition.
It’s designed as a standalone open-source library that can be integrated into any visionOS project.

Key Components

1. Gesture Data Model

The library provides comprehensive data structures for hand tracking:

public struct SingleHandGestureData {
    public let handTrackingComponent: HandTrackingComponent
    public let handKind: HandKind
    
    // Threshold settings for gesture detection accuracy
    public let angleToleranceRadians: Float
    public let distanceThreshold: Float
    public let directionToleranceRadians: Float
    
    // Pre-computed values for performance optimization
    private let palmNormal: SIMD3<Float>
    private let forearmDirection: SIMD3<Float>
    private let wristPosition: SIMD3<Float>
    private let isArmExtended: Bool
}

This struct encapsulates all necessary hand tracking data and minimizes runtime overhead by pre-computing frequently used values.

2. Protocol-Oriented Design

The gesture system is built on a hierarchical protocol structure:

// Base protocol for all gestures
public protocol BaseGestureProtocol {
    var id: String { get }
    var gestureName: String { get }
    var priority: Int { get }
    var gestureType: GestureType { get }
}

// Single-hand gesture protocol with rich default implementations
public protocol SingleHandGestureProtocol: BaseGestureProtocol {
    func matches(_ gestureData: SingleHandGestureData) -> Bool
    
    // Finger state requirements
    func requiresFingersStraight(_ fingers: [FingerType]) -> Bool
    func requiresFingersBent(_ fingers: [FingerType]) -> Bool
    func requiresFingerPointing(_ finger: FingerType, direction: GestureDetectionDirection) -> Bool
    
    // Palm orientation requirements
    func requiresPalmFacing(_ direction: GestureDetectionDirection) -> Bool
    
    // Arm position requirements
    func requiresArmExtended() -> Bool
    func requiresArmExtendedInDirection(_ direction: GestureDetectionDirection) -> Bool
}

This protocol design allows easy addition of new gestures by overriding only the necessary conditions.

3. Gesture Detection Engine

The GestureDetector class evaluates registered gestures in priority order:

public class GestureDetector {
    private var gestures: [BaseGestureProtocol] = []
    
    public func detect(from handData: SingleHandGestureData) -> [BaseGestureProtocol] {
        return gestures
            .sorted { $0.priority < $1.priority }
            .filter { gesture in
                guard let singleHandGesture = gesture as? SingleHandGestureProtocol else {
                    return false
                }
                return singleHandGesture.matches(handData)
            }
    }
}

Implementation Examples: Sign Language Gestures

1. Thumbs Up Gesture
public class ThumbsUpGesture: SingleHandGestureProtocol {
    public var gestureName: String { "Thumbs Up" }
    public var priority: Int { 100 }
    
    // Requires only thumb extended
    public var requiresOnlyThumbStraight: Bool { true }
    
    // Requires thumb pointing up
    public func requiresFingerPointing(_ finger: FingerType, direction: GestureDetectionDirection) -> Bool {
        return finger == .thumb && direction == .up
    }
}

Detection Logic Details

The thumbs up gesture’s matches function leverages the protocol’s default implementation to check the following conditions:

// From SingleHandGestureProtocol default implementation
public func matches(_ gestureData: SingleHandGestureData) -> Bool {
    // 1. Check if only thumb is extended
    if requiresOnlyThumbStraight {
        // Internally validates these conditions:
        // - Thumb: isFingerStraight(.thumb) == true
        // - Index: isFingerBent(.index) == true
        // - Middle: isFingerBent(.middle) == true
        // - Ring: isFingerBent(.ring) == true
        // - Little: isFingerBent(.little) == true
        guard isOnlyThumbStraight(gestureData) else { return false }
    }
    
    // 2. Check if thumb is pointing up
    if requiresFingerPointing(.thumb, direction: .up) {
        // Calculate angle between thumb vector and up direction
        // True if within angleToleranceRadians (default: π/4)
        guard gestureData.isFingerPointing(.thumb, direction: .up) else { return false }
    }
    
    return true
}

Finger Bend Detection Implementation

// Detection logic in SingleHandGestureData
public func isFingerStraight(_ finger: FingerType) -> Bool {
    // Get joint angles for each finger
    let jointAngles = getJointAngles(for: finger)
    
    // "Straight" if all joints bend less than threshold
    return jointAngles.allSatisfy { angle in
        angle < straightThreshold // Default: 30 degrees
    }
}

public func isFingerBent(_ finger: FingerType) -> Bool {
    // "Bent" if at least one joint bends more than threshold
    let jointAngles = getJointAngles(for: finger)
    return jointAngles.contains { angle in
        angle > bentThreshold // Default: 60 degrees
    }
}
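
The snippets above rely on a getJointAngles(for:) helper that is not shown in the slides. One plausible way to compute a per-joint bend angle is from the positions of three consecutive joints; the function below and the position accessor in its usage comment are illustrative, not HandGestureKit's actual implementation:

import Foundation
import simd

/// Bend angle at the middle of three consecutive joint positions, in radians.
/// 0 means the two segments are aligned (the finger is straight at that joint).
func bendAngle(_ a: SIMD3<Float>, _ b: SIMD3<Float>, _ c: SIMD3<Float>) -> Float {
    let proximal = simd_normalize(b - a)   // direction of the segment entering the joint
    let distal = simd_normalize(c - b)     // direction of the segment leaving the joint
    let cosine = max(-1, min(1, simd_dot(proximal, distal)))
    return acos(cosine)
}

// Usage sketch (position(of:) is a hypothetical accessor returning joint
// positions in a common space):
// let pipAngle = bendAngle(position(of: .indexFingerKnuckle),
//                          position(of: .indexFingerIntermediateBase),
//                          position(of: .indexFingerIntermediateTip))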

2. Peace Sign
public class PeaceSignGesture: SingleHandGestureProtocol {
    public var gestureName: String { "Peace Sign" }
    public var priority: Int { 90 }
    
    // Requires only index and middle fingers extended
    public var requiresOnlyIndexAndMiddleStraight: Bool { true }
    
    // Requires palm facing forward
    public func requiresPalmFacing(_ direction: GestureDetectionDirection) -> Bool {
        return direction == .forward
    }
}

Detection Logic Details

public func matches(_ gestureData: SingleHandGestureData) -> Bool {
    // 1. Check if only index and middle fingers are extended
    if requiresOnlyIndexAndMiddleStraight {
        // Must satisfy all conditions:
        // - gestureData.isFingerStraight(.index) == true
        // - gestureData.isFingerStraight(.middle) == true
        // - gestureData.isFingerBent(.thumb) == true
        // - gestureData.isFingerBent(.ring) == true
        // - gestureData.isFingerBent(.little) == true
        guard isOnlyIndexAndMiddleStraight(gestureData) else { return false }
    }
    
    // 2. Check palm orientation
    if requiresPalmFacing(.forward) {
        // Calculate palm normal vector and check angle with forward direction
        let palmNormal = gestureData.palmNormal
        let forwardVector = SIMD3<Float>(0, 0, -1) // Forward direction
        let angle = acos(dot(palmNormal, forwardVector))
        
        guard angle < directionToleranceRadians else { return false }
    }
    
    return true
}

3. Prayer Gesture (Two Hands)
public class PrayerGesture: TwoHandGestureProtocol {
    public var gestureName: String { "Prayer" }
    public var priority: Int { 80 }
    
    public func matches(_ leftGestureData: SingleHandGestureData, _ rightGestureData: SingleHandGestureData) -> Bool {
        // Palms facing each other
        let palmsFacing = arePalmsFacingEachOther(leftGestureData, rightGestureData)
        
        // Hands are close together
        let handsClose = areHandsClose(leftGestureData, rightGestureData, threshold: 0.1)
        
        // All fingers extended
        let fingersStraight = areAllFingersStraight(leftGestureData) && 
                              areAllFingersStraight(rightGestureData)
        
        return palmsFacing && handsClose && fingersStraight
    }
}
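
The helpers used above (arePalmsFacingEachOther, areHandsClose) are not shown in the talk. A minimal sketch of what they could look like, assuming each hand's pre-computed palm normal and wrist position are available:

import simd

// Palms roughly facing each other: their normals point in opposite directions
// (dot product close to -1)
func arePalmsFacingEachOther(_ leftPalmNormal: SIMD3<Float>,
                             _ rightPalmNormal: SIMD3<Float>,
                             tolerance: Float = 0.7) -> Bool {
    simd_dot(simd_normalize(leftPalmNormal), simd_normalize(rightPalmNormal)) < -tolerance
}

// Hands close together: wrist-to-wrist distance below a threshold (in meters)
func areHandsClose(_ leftWrist: SIMD3<Float>,
                   _ rightWrist: SIMD3<Float>,
                   threshold: Float = 0.1) -> Bool {
    simd_distance(leftWrist, rightWrist) < threshold
}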

Mechanisms for Concise Gesture Detection Implementation

Conciseness Through Protocol Default Implementations

HandGestureKit’s greatest feature is that rich protocol default implementations allow new gestures to be defined with minimal code:

// Adding new gestures is extremely simple
public class OKSignGesture: SingleHandGestureProtocol {
    public var gestureName: String { "OK Sign" }
    public var priority: Int { 95 }
    
    // Declaratively define only necessary conditions
    public var requiresOnlyIndexAndThumbTouching: Bool { true }
    public var requiresMiddleRingLittleStraight: Bool { true }
}

With just this concise definition, complex gesture detection logic is automatically applied.

Condition Combination Patterns

Commonly used finger combinations are provided as dedicated properties:

// Convenient property set
public protocol SingleHandGestureProtocol {
    // Complex finger conditions (convenience properties)
    var requiresAllFingersBent: Bool { get }              // Fist (all fingers bent)
    var requiresOnlyIndexFingerStraight: Bool { get }     // Index finger only
    var requiresOnlyIndexAndMiddleStraight: Bool { get }  // Peace sign
    var requiresOnlyThumbStraight: Bool { get }           // Thumbs up
    var requiresOnlyLittleFingerStraight: Bool { get }    // Little finger only
    
    // Wrist states
    var requiresWristBentOutward: Bool { get }            // Wrist bent outward
    var requiresWristBentInward: Bool { get }             // Wrist bent inward
    var requiresWristStraight: Bool { get }               // Wrist straight
}

Validation Utilities

The GestureValidation class provides commonly used validation patterns:

public enum GestureValidation {
    // Validate that only specific fingers are extended
    static func validateOnlyTargetFingersStraight(
        _ gestureData: SingleHandGestureData,
        targetFingers: [FingerType]
    ) -> Bool {
        for finger in FingerType.allCases {
            if targetFingers.contains(finger) {
                guard gestureData.isFingerStraight(finger) else { return false }
            } else {
                guard gestureData.isFingerBent(finger) else { return false }
            }
        }
        return true
    }
    
    // Validate fist gesture
    static func validateFistGesture(_ gestureData: SingleHandGestureData) -> Bool {
        return FingerType.allCases.allSatisfy { 
            gestureData.isFingerBent($0) 
        }
    }
}

GestureDetector Processing Logic

Protocol Hierarchy

GestureDetector uses a hierarchical protocol design to uniformly process various types of gestures:

protocol BaseGestureProtocol {
    var gestureName: String { get }
    var priority: Int { get }
    var gestureType: GestureType { get }
}

protocol SingleHandGestureProtocol: BaseGestureProtocol {
    func matches(_ gestureData: SingleHandGestureData) -> Bool
}

protocol TwoHandsGestureProtocol: BaseGestureProtocol {
    func matches(_ gestureData: HandsGestureData) -> Bool
}

Detection Architecture

class GestureDetector {
    // Gesture array sorted by priority
    private var sortedGestures: [BaseGestureProtocol]
    
    // Dedicated serial gesture tracker
    private let serialTracker = SerialGestureTracker()
    
    // Type-based index (for fast lookup)
    private var typeIndex: [GestureType: [Int]]
}

Convenient Detection Methods

SingleHandGestureData provides convenient methods for concise gesture detection:

// Convenient methods provided by SingleHandGestureData
gestureData.isFingerStraight(.index)     // Is index finger extended?
gestureData.isFingerBent(.thumb)         // Is thumb bent?
gestureData.isPalmFacing(.forward)       // Is palm facing forward?
gestureData.areAllFingersExtended()      // Are all fingers extended?
gestureData.isAllFingersBent             // Is it a fist?

// Example of combining multiple conditions
guard gestureData.isFingerStraight(.index),
      gestureData.isFingerStraight(.middle),
      gestureData.areAllFingersBentExcept([.index, .middle])
else { return false }

Gesture Detection Conditions

Four main conditions are used for gesture detection:

  • Finger state: isExtended/isCurled
  • Hand orientation: palmDirection
  • Joint angles: angleWithParent
  • Joint distances: jointToJointDistance
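
Of these, the joint distance condition is the only one not illustrated elsewhere in this post. A hand-written matches implementation typically combines several condition types; the sketch below checks an OK-sign-style shape using the convenience accessors shown above plus a hypothetical fingertip distance helper:

// Sketch only: fingertipDistance(_:_:) is a hypothetical helper
func matchesOKSign(_ gestureData: SingleHandGestureData) -> Bool {
    // Finger state: middle, ring and little fingers extended
    guard gestureData.isFingerStraight(.middle),
          gestureData.isFingerStraight(.ring),
          gestureData.isFingerStraight(.little) else { return false }

    // Hand orientation: palm facing forward
    guard gestureData.isPalmFacing(.forward) else { return false }

    // Joint distance: thumb and index fingertips touching (closer than 2 cm)
    return gestureData.fingertipDistance(.thumb, .index) < 0.02
}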

Detection Flow

func detectGestures(from components: [HandTrackingComponent]) -> GestureDetectionResult {
    // 1. Check serial gesture timeout
    if serialTracker.isTimedOut() {
        serialTracker.reset()
    }
    
    // 2. Detect normal gestures in priority order
    for gesture in sortedGestures {
        if gesture.matches(handData) {
            return [gesture.gestureName]
        }
    }
    
    // 3. Update serial gesture progress
    if let serial = checkSerialGesture() {
        return handleSerialResult(serial)
    }
}

Sequential Gesture Tracking System

SerialGestureProtocol

A mechanism for detecting time-sequential gestures (like sign language):

protocol SerialGestureProtocol {
    // Array of gestures to detect in sequence
    var gestures: [BaseGestureProtocol] { get }
    
    // Maximum allowed time between gestures (seconds)
    var intervalSeconds: TimeInterval { get }
    
    // Step descriptions (for UI display)
    var stepDescriptions: [String] { get }
}

SerialGestureTracker – State Management

  1. Track current gesture index
  2. Monitor timeout between gestures
  3. Reset state after timeout or completion
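
The talk does not show SerialGestureTracker's internals; a minimal sketch of the state machine described above (current index, per-step timeout, reset) could look like this:

import Foundation

// Illustrative sketch of the tracker's state management, not the actual implementation
final class SerialGestureTracker {
    private(set) var currentIndex = 0
    private var lastMatchDate = Date()
    private let intervalSeconds: TimeInterval

    init(intervalSeconds: TimeInterval = 2.0) {
        self.intervalSeconds = intervalSeconds
    }

    // True when the time since the last matched step exceeds the allowed interval
    func isTimedOut() -> Bool {
        currentIndex > 0 && Date().timeIntervalSince(lastMatchDate) > intervalSeconds
    }

    /// Call when the gesture for the current step has matched.
    /// Returns true once the final step has been reached.
    func advance(totalSteps: Int) -> Bool {
        currentIndex += 1
        lastMatchDate = Date()
        if currentIndex >= totalSteps {
            reset()
            return true
        }
        return false
    }

    func reset() {
        currentIndex = 0
        lastMatchDate = Date()
    }
}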

Detection Flow Example

// Example: "Thank you" in sign language
let arigatouGesture = SignLanguageArigatouGesture()
gestures = [
    // Step 1: Initial position detection
    ArigatouInitialPositionGesture(),  // Both hands at same height
    // Step 2: Final position detection → completed ✅
    ArigatouFinalPositionGesture()     // Right hand moved to upper position
]

SerialGestureDetectionResult

Sequential gesture detection results have four states:

  • progress: Advancing to next step
  • completed: All steps completed
  • timeout: Time expired
  • notMatched: No match
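
Expressed as a Swift enum, these states could look like the following sketch (the associated values are assumptions, not the library's actual definition):

// Illustrative sketch of the four result states
enum SerialGestureDetectionResult {
    case progress(currentStep: Int, totalSteps: Int)   // advancing to the next step
    case completed(gestureName: String)                // all steps completed
    case timeout                                       // time between steps expired
    case notMatched                                    // no match for the current step
}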

This mechanism enables detection of dynamic gestures by dividing them into several phases.

GestureDetector: Detailed Gesture Detection Engine

GestureDetector Overview

GestureDetector is the core gesture detection engine of HandGestureKit.
This class efficiently evaluates registered gesture patterns and recognizes gestures in real-time.

Basic Usage

// 1. Initialize GestureDetector
let gestureDetector = GestureDetector()

// 2. Register gestures to recognize
gestureDetector.registerGesture(ThumbsUpGesture())
gestureDetector.registerGesture(PeaceSignGesture())
gestureDetector.registerGesture(PrayerGesture())

// 3. Detect gestures from hand data
let detectedGestures = gestureDetector.detect(from: handGestureData)

// 4. Process detection results
for gesture in detectedGestures {
    print("Detected gesture: \(gesture.gestureName)")
}

Internal Implementation and Design Points

1. Priority-Based Evaluation System
public class GestureDetector {
    private var gestures: [BaseGestureProtocol] = []
    
    public func detect(from handData: SingleHandGestureData) -> [BaseGestureProtocol] {
        // Sort by priority (lower numbers = higher priority)
        let sortedGestures = gestures.sorted { $0.priority < $1.priority }
        
        var detectedGestures: [BaseGestureProtocol] = []
        
        for gesture in sortedGestures {
            if let singleHandGesture = gesture as? SingleHandGestureProtocol {
                if singleHandGesture.matches(handData) {
                    detectedGestures.append(gesture)
                    
                    // End processing here for exclusive gestures
                    if gesture.isExclusive {
                        break
                    }
                }
            }
        }
        
        return detectedGestures
    }
}

Design Points:

  • Priority-based evaluation detects more specific gestures first
  • Exclusive flag skips other evaluations when specific gestures are detected
  • Supports cases where multiple gestures are valid simultaneously

2. Performance Optimization
// Optimization during gesture registration
public func registerGesture(_ gesture: BaseGestureProtocol) {
    // Duplicate check
    guard !gestures.contains(where: { $0.id == gesture.id }) else {
        return
    }
    
    gestures.append(gesture)
    
    // Pre-sort by priority to speed up detection
    gestures.sort { $0.priority < $1.priority }
}

// Optimization through batch registration
public func registerGestures(_ newGestures: [BaseGestureProtocol]) {
    gestures.append(contentsOf: newGestures)
    gestures.sort { $0.priority < $1.priority }
}

3. Debug and Logging Features
extension GestureDetector {
    // Detailed log output in debug mode
    public func detectWithDebugInfo(from handData: SingleHandGestureData) -> [(gesture: BaseGestureProtocol, confidence: Float)] {
        var results: [(BaseGestureProtocol, Float)] = []
        
        for gesture in gestures.sorted(by: { $0.priority < $1.priority }) {
            if let singleHandGesture = gesture as? SingleHandGestureProtocol {
                let confidence = singleHandGesture.confidenceScore(for: handData)
                
                if HandGestureLogger.isDebugEnabled {
                    HandGestureLogger.logDebug("Gesture: \(gesture.gestureName), Confidence: \(confidence)")
                }
                
                if singleHandGesture.matches(handData) {
                    results.append((gesture, Float(confidence)))
                }
            }
        }
        
        return results
    }
}

AnchorEntity Integration in visionOS 2.0

Implementation using AnchorEntity introduced in visionOS 2.0:

import RealityKit
import ARKit

@MainActor
class GestureTrackingSystem: System {
    private let gestureDetector = GestureDetector()
    
    static let query = EntityQuery(where: .has(HandTrackingComponent.self))
    
    required init(scene: Scene) {
        // Register gestures during system initialization
        setupGestures()
    }
    
    private func setupGestures() {
        gestureDetector.registerGestures([
            ThumbsUpGesture(),
            PeaceSignGesture(),
            OKSignGesture(),
            PrayerGesture()
        ])
    }
    
    func update(context: SceneUpdateContext) {
        for entity in context.entities(matching: Self.query, updatingSystemWhen: .rendering) {
            guard let handComponent = entity.components[HandTrackingComponent.self] else {
                continue
            }
            
            // Create SingleHandGestureData
            let handData = SingleHandGestureData(
                handTrackingComponent: handComponent,
                handKind: .left // or .right
            )
            
            // Detect gestures
            let detectedGestures = gestureDetector.detect(from: handData)
            
            // Notify detection results
            if !detectedGestures.isEmpty {
                notifyGestureDetection(detectedGestures)
            }
        }
    }
    
    private func notifyGestureDetection(_ gestures: [BaseGestureProtocol]) {
        let gestureNames = gestures.map { $0.gestureName }
        
        DispatchQueue.main.async {
            NotificationCenter.default.post(
                name: .gestureDetected,
                object: gestureNames
            )
        }
    }
}
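
For this System to run, it must be registered before the RealityView scene is created, and the app can then observe the notification posted above. A usage sketch (the gestureDetected notification name comes from the code above and is assumed to be defined elsewhere in the app):

// At app startup (e.g. in the App struct's init), register the System once
GestureTrackingSystem.registerSystem()

// Anywhere in the app: observe the detection notification.
// Keep the returned token to remove the observer later.
let observationToken = NotificationCenter.default.addObserver(
    forName: .gestureDetected,
    object: nil,
    queue: .main
) { notification in
    if let gestureNames = notification.object as? [String] {
        print("Detected gestures: \(gestureNames)")
    }
}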

HandGestureKit: Provided as OSS Library

HandGestureKit is published as an open-source library, free for anyone to use and improve.

Performance Optimization

1. Pre-calculation and Value Caching

Pre-calculate and cache frequently used values:

extension SingleHandGestureData {
    // Calculate values during initialization
    init(handTrackingComponent: HandTrackingComponent, handKind: HandKind) {
        self.handTrackingComponent = handTrackingComponent
        self.handKind = handKind
        
        // Pre-calculate frequently used values
        self.palmNormal = calculatePalmNormal(handTrackingComponent)
        self.forearmDirection = calculateForearmDirection(handTrackingComponent)
        self.wristPosition = handTrackingComponent.joint(.wrist)?.position ?? .zero
        self.isArmExtended = calculateArmExtension(handTrackingComponent)
    }
}

2. Early Return Optimization

Check most selective conditions first:

public func matchesWithOptimization(_ gestureData: SingleHandGestureData) -> Bool {
    // 1. Most selective conditions first (finger configuration)
    if requiresOnlyIndexAndMiddleStraight {
        guard validateOnlyTargetFingersStraight(gestureData, targetFingers: [.index, .middle]) 
        else { return false }
    }
    
    // 2. Direction check (medium selectivity)
    for direction in GestureDetectionDirection.allCases {
        if requiresPalmFacing(direction) {
            guard gestureData.isPalmFacing(direction) else { return false }
        }
    }
    
    // 3. Individual finger direction checks (potentially high cost)
    // ...other checks
    
    return true
}

3. Priority-Based Detection

Skip unnecessary checks using priority:

public func detect(from handData: SingleHandGestureData) -> BaseGestureProtocol? {
    let sortedGestures = gestures.sorted { $0.priority < $1.priority }
    
    for gesture in sortedGestures {
        if let singleHandGesture = gesture as? SingleHandGestureProtocol,
           singleHandGesture.matches(handData) {
            return gesture // Stop at first match
        }
    }
    
    return nil
}

Limitations and Possibilities

Limitations of Sign Language Detection with Apple Vision Pro

1. Camera Detection Range Limitations

visionOS hand tracking has physical constraints:

  • Cannot detect hands behind or beside the body: Hands outside camera field of view cannot be tracked
  • Blind spots near face and behind head: Detection is difficult in these positions due to device structure
  • Difficult to detect accurately when hands overlap: Occlusion reduces joint position estimation accuracy

2. Complex Hand Shape Recognition
  • Interlaced finger shapes prone to misrecognition: Difficult to accurately detect complex finger crosses and combinations
  • Detection accuracy of subtle hand tilts and rotations: Limited ability to distinguish fine angle differences

3. Sign Language-Specific Elements

Sign language consists of multiple elements beyond just hand shapes:

  • Meaning changes with facial expressions: Facial expressions have grammatical roles in sign language, but current APIs have difficulty detecting them
  • Recognition of movement speed and intensity: Important elements that change sign language meaning, but accurate detection is difficult

4. Technical Constraints
  • Registering recognition patterns is challenging: Enormous pattern definitions needed to accommodate sign language diversity
  • Balance with performance: Trade-off between real-time processing and accuracy
  • Handling individual differences: Recognition accuracy varies with hand size and flexibility differences
  • Cannot detect other person’s hands: Only the wearer’s own hands are detected (cannot read conversation partner’s sign language)
    • With Enterprise API main camera access, Vision Framework could potentially enable this
    • However, 3D information like HandSkeleton is unavailable, limiting to 2D image analysis with very high implementation difficulty

Still Expanding Possibilities

1. Basic Sign Language Word Recognition is Possible!

Current technology can already recognize the following at a practical level:

  • Standard expressions: Daily sign language like “thank you” and “please”
  • Numbers and simple words: Finger spelling and number expressions can be recognized with relatively high accuracy

2. First Step Toward Improved Accessibility

Even if not perfect, it can provide significant value:

  • Communication support between hearing-impaired and hearing individuals: Support for basic communication
  • Simple communication in emergencies: As a means to quickly convey important information
  • Promoting interest and understanding of sign language: Applications in sign language learning apps and interactive educational materials

3. Expectations for Future Technology Development
  • Improved accuracy through hardware evolution: Higher resolution cameras, wider field of view, faster processing
  • Combination with machine learning and AI: Improved pattern recognition accuracy and adaptation to individual differences
  • Utilizing EyeSight: Apple Vision Pro’s EyeSight feature allows the wearer’s facial expressions to be visible externally, addressing the importance of facial expressions in sign language

Demo video

AnchorEntity implementation and HandGestureKit demo

Sign language detection demo

Conclusion

visionOS hand tracking capabilities open new possibilities for natural user interfaces.
Using frameworks like HandGestureKit, developers can more easily implement complex gesture recognition systems.

While current technology has its limitations, careful design and optimization make it possible to build practical, responsive gesture-based applications.
As spatial computing continues to evolve, these technologies will become more sophisticated, and by combining them with other tools such as AI, I look forward to seeing even more accessible and accurate tools being built!

Resources
