Observation

Inspect the device screen, UI hierarchy, system state, and app performance.

These tools report what is on screen and how the device is performing. Prefer lightweight tools (get_what_changed, get_textual_state) over screenshots when possible: they are faster and cheaper.

get_visual_state

Capture a screenshot of the current screen. Returns a base64-encoded PNG image.

Best for first looks at new screens, visual regression checks, or when you need pixel-level detail. After interactions, prefer get_what_changed.

Returns an image. No text output.
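Because the result is a base64-encoded PNG, a client can sanity-check it without an imaging library. The following is a minimal sketch, assuming the payload is available to your code as a base64 string (how the tool result is delivered is not specified here). The helper name png_size is hypothetical; the byte offsets come from the PNG file format, where the IHDR chunk follows the 8-byte signature.

```python
import base64
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_size(b64_png: str) -> tuple[int, int]:
    """Decode a base64 PNG and return (width, height) from its IHDR chunk."""
    data = base64.b64decode(b64_png)
    if len(data) < 24 or data[:8] != PNG_SIGNATURE or data[12:16] != b"IHDR":
        raise ValueError("payload is not a PNG image")
    # IHDR stores width and height as big-endian 32-bit integers at bytes 16..24.
    width, height = struct.unpack(">II", data[16:24])
    return width, height


# Stand-in payload for illustration: only the signature and IHDR fields,
# which is enough for the header check (it is not a displayable image).
header = (
    PNG_SIGNATURE
    + struct.pack(">I", 13)
    + b"IHDR"
    + struct.pack(">IIBBBBB", 1080, 2400, 8, 6, 0, 0, 0)
)
print(png_size(base64.b64encode(header).decode("ascii")))  # (1080, 2400)
```

Comparing the decoded size with the display dimensions reported by get_general_state is one cheap way to confirm the screenshot covers the full screen.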

get_textual_state

Get a structured hierarchy of every UI element on screen: element type, text content, bounding-box coordinates, and interactability. Use this to find tap targets.

Window: com.example.myapp
  LinearLayout [0,0][1080,2400]
    TextView "Welcome back" [120,300][960,380] clickable=false
    EditText "Email" [120,420][960,520] clickable=true focused=false
    EditText "Password" [120,560][960,660] clickable=true focused=false
    Button "Log In" [120,700][960,800] clickable=true
    TextView "Forgot password?" [340,840][740,900] clickable=true

Coordinates are in the format [left,top][right,bottom]. Use the center of an element's bounds for tap and long_press.
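The center of an element is ((left + right) / 2, (top + bottom) / 2). As a rough sketch, the helper below extracts the bounds from one hierarchy line and returns that point. It assumes each line looks like the sample output above; center_of is a hypothetical name, not part of the Device API.

```python
import re

# Assumed line shape, based on the sample output:
#   <Type> "<text>" [left,top][right,bottom] key=value ...
BOUNDS_RE = re.compile(r"\[(\d+),(\d+)\]\[(\d+),(\d+)\]")


def center_of(line: str) -> tuple[int, int]:
    """Return the (x, y) center of the first bounds pair in a hierarchy line."""
    match = BOUNDS_RE.search(line)
    if match is None:
        raise ValueError(f"no bounds found in: {line!r}")
    left, top, right, bottom = map(int, match.groups())
    # Integer division keeps the result on a whole pixel.
    return (left + right) // 2, (top + bottom) // 2


login_button = 'Button "Log In" [120,700][960,800] clickable=true'
print(center_of(login_button))  # (540, 750)
```

The resulting pair is what you would pass as the coordinates for tap or long_press.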

get_general_state

Full device snapshot in a single call: foreground app, orientation, display dimensions, keyboard visibility, battery, connectivity, volume, GPS, clipboard, telephony, panel state, dark mode, and animation settings.

My App (com.example.myapp) is visible. Device is a phone, in portrait mode
at 1080x2400 @432dpi. The keyboard is hidden. Battery is at 87% and not
charging. WiFi on, Bluetooth off, Location on, Airplane mode off,
Auto-rotate on. Volume 5/15, silent mode off. GPS at 37.774929, -122.419416.
Clipboard is empty. No active call. All panels collapsed. Dark mode inactive.
Animations enabled. Font scale 1x.

One call replaces many separate queries. Start here when you need to orient yourself.
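The snapshot is prose, so any parsing is brittle and best kept to a few fields. As an illustration only, this sketch pulls the display size and density out of the summary and converts pixels to density-independent pixels using Android's standard relation dp = px * 160 / dpi. It assumes the "WIDTHxHEIGHT @Ndpi" wording shown in the sample; display_info is a hypothetical helper.

```python
import re


def display_info(summary: str) -> dict[str, float]:
    """Extract display size and density from a get_general_state summary."""
    match = re.search(r"(\d+)x(\d+)\s*@\s*(\d+)dpi", summary)
    if match is None:
        raise ValueError("display dimensions not found in summary")
    width_px, height_px, dpi = map(int, match.groups())
    return {
        "width_px": width_px,
        "height_px": height_px,
        "dpi": dpi,
        # Android defines 1 dp as 1 px on a 160 dpi screen.
        "width_dp": width_px * 160 / dpi,
        "height_dp": height_px * 160 / dpi,
    }


summary = (
    "My App (com.example.myapp) is visible. Device is a phone, in portrait mode\n"
    "at 1080x2400 @432dpi. The keyboard is hidden."
)
info = display_info(summary)
print(info["width_px"], info["height_px"], info["width_dp"])  # 1080 2400 400.0
```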

get_what_changed

Diff the current screen against the state before the last action. Shows elements added, removed, and modified.

Elements added:
  Toast "Login successful" [120,2100][960,2180]

Elements removed:
  Button "Log In" [120,700][960,800] (was clickable)

Elements changed:
  TextView "Welcome back" -> "Welcome, Alice"

Interaction tools already return this diff inline. Use get_what_changed only when you need to re-examine the last change.
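If you want to act on the diff programmatically, the three sections can be split into lists. The sketch below assumes the diff always uses "Elements <kind>:" headers followed by indented entries, as in the sample above; parse_diff is a hypothetical helper.

```python
def parse_diff(text: str) -> dict[str, list[str]]:
    """Group diff entries under 'added', 'removed', and 'changed'."""
    sections: dict[str, list[str]] = {"added": [], "removed": [], "changed": []}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("Elements ") and line.endswith(":"):
            # "Elements added:" -> "added"
            current = line[len("Elements "):-1]
            sections.setdefault(current, [])
        elif current is not None:
            sections[current].append(line)
    return sections


sample = """Elements added:
  Toast "Login successful" [120,2100][960,2180]

Elements removed:
  Button "Log In" [120,700][960,800] (was clickable)

Elements changed:
  TextView "Welcome back" -> "Welcome, Alice"
"""
diff = parse_diff(sample)
print(len(diff["added"]), len(diff["removed"]), len(diff["changed"]))  # 1 1 1
```

A check such as "did a Toast appear?" then becomes a simple scan over diff["added"].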

get_performance_metrics

Rendering performance for a specific app: total frames, janky frame count and percentage, render time at the 50th, 90th, 95th, and 99th percentiles.

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| packageName | string | Yes | App package name |

Performance for com.example.myapp:
  Frames: 342 total, 12 janky (3.5%)
  Render time: 50th=8ms, 90th=18ms, 95th=24ms, 99th=67ms
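The janky percentage is the janky frame count divided by total frames: 12 / 342 is about 3.5%. The sketch below parses the sample output and flags percentiles that exceed a frame budget. It assumes the output wording shown above and a 60 Hz display (a budget of roughly 16.7 ms per frame); both parse_performance and FRAME_BUDGET_MS are hypothetical names, and the right budget depends on the device's refresh rate.

```python
import re

FRAME_BUDGET_MS = 1000 / 60  # assumption: 60 Hz display


def parse_performance(text: str) -> dict:
    """Extract frame counts and render-time percentiles from the tool output."""
    frames = re.search(r"Frames:\s*(\d+)\s+total,\s*(\d+)\s+janky", text)
    if frames is None:
        raise ValueError("frame counts not found")
    total, janky = int(frames.group(1)), int(frames.group(2))
    percentiles = {int(p): int(ms) for p, ms in re.findall(r"(\d+)th=(\d+)ms", text)}
    return {
        "total": total,
        "janky": janky,
        "jank_pct": 100 * janky / total if total else 0.0,
        "percentiles": percentiles,
    }


sample = """Performance for com.example.myapp:
  Frames: 342 total, 12 janky (3.5%)
  Render time: 50th=8ms, 90th=18ms, 95th=24ms, 99th=67ms
"""
perf = parse_performance(sample)
slow = [p for p, ms in sorted(perf["percentiles"].items()) if ms > FRAME_BUDGET_MS]
print(round(perf["jank_pct"], 1), slow)  # 3.5 [90, 95, 99]
```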

get_memory_info

Current memory usage for a specific app: native heap, Dalvik heap, and total PSS.

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| packageName | string | Yes | App package name |

Memory for com.example.myapp:
  Native heap: 18.3 MB
  Dalvik heap: 9.7 MB
  Total PSS: 61.2 MB
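To track memory across a test run, the readings can be parsed into numbers and compared with a limit. This is a minimal sketch that assumes the "Label: N MB" layout shown above; parse_memory and the 100 MB budget are hypothetical and chosen only for illustration.

```python
import re

LINE_RE = re.compile(r"\s*(.+?):\s*([\d.]+)\s*MB\s*$")


def parse_memory(text: str) -> dict[str, float]:
    """Map each 'Label: N MB' line to a float value in megabytes."""
    values: dict[str, float] = {}
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match:  # the "Memory for ..." header line has no MB value and is skipped
            values[match.group(1)] = float(match.group(2))
    return values


sample = """Memory for com.example.myapp:
  Native heap: 18.3 MB
  Dalvik heap: 9.7 MB
  Total PSS: 61.2 MB
"""
memory = parse_memory(sample)
PSS_BUDGET_MB = 100.0  # hypothetical limit for this example
print(memory["Total PSS"], memory["Total PSS"] > PSS_BUDGET_MB)  # 61.2 False
```

Calling the tool at several points in a flow and comparing the parsed values is a simple way to spot steady growth.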
