Observation
Inspect the device screen, UI hierarchy, system state, and app performance.
Tools for understanding what's on screen and how the device is performing. When possible, prefer the lightweight text-based tools (get_what_changed, get_textual_state) over screenshots; they are faster and cheaper.
get_visual_state
Capture a screenshot of the current screen. Returns a base64-encoded PNG image.
Best for first looks at new screens, visual regression checks, or when you need pixel-level detail. After interactions, prefer get_what_changed.
Returns an image. No text output.
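How the client receives the image depends on the transport, which this page does not specify. As a minimal sketch, assuming the caller already holds the raw base64 string (no `data:` URI prefix) and using a hypothetical helper `save_screenshot`, decoding the payload and checking the standard 8-byte PNG signature before writing it to disk could look like this:

```python
import base64
from pathlib import Path

# Every valid PNG file begins with this fixed 8-byte signature.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def save_screenshot(b64_data: str, path: str) -> int:
    """Decode a base64-encoded PNG and write it to `path`.

    Hypothetical helper: assumes `b64_data` is the bare base64 payload
    from get_visual_state. Returns the number of bytes written.
    """
    # validate=True rejects characters outside the base64 alphabet.
    raw = base64.b64decode(b64_data, validate=True)
    if not raw.startswith(PNG_SIGNATURE):
        raise ValueError("decoded data is not a PNG image")
    Path(path).write_bytes(raw)
    return len(raw)
```

Checking the signature first means a truncated or mislabeled payload fails loudly instead of producing an unreadable file.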
get_textual_state
Get a structured hierarchy of every UI element on screen: element type, text content, bounding-box coordinates, and interactability (clickable, focused). Use this to find tap targets.
Window: com.example.myapp
LinearLayout [0,0][1080,2400]
TextView "Welcome back" [120,300][960,380] clickable=false
EditText "Email" [120,420][960,520] clickable=true focused=false
EditText "Password" [120,560][960,660] clickable=true focused=false
Button "Log In" [120,700][960,800] clickable=true
TextView "Forgot password?" [340,840][740,900] clickable=true
Coordinates are in the format [left,top][right,bottom]. Use the center of an element's bounds for tap and long_press.
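Because the hierarchy is plain text, a client can recover tap targets with a regular expression. The sketch below assumes each element line has the `Type "text" [left,top][right,bottom] key=value ...` shape seen in the sample output; that layout is inferred from the example rather than a documented contract, and `find_tap_point` is a hypothetical helper that returns the integer center of the first clickable match.

```python
import re
from typing import Optional, Tuple

# Matches lines such as: Button "Log In" [120,700][960,800] clickable=true
ELEMENT_RE = re.compile(
    r'^\s*(?P<kind>\w+)\s+"(?P<text>[^"]*)"\s+'
    r'\[(?P<l>-?\d+),(?P<t>-?\d+)\]\[(?P<r>-?\d+),(?P<b>-?\d+)\]'
    r'(?P<attrs>.*)$'
)


def find_tap_point(hierarchy: str, text: str) -> Optional[Tuple[int, int]]:
    """Return the center (x, y) of the first clickable element whose text
    equals `text`, or None if there is no such element."""
    for line in hierarchy.splitlines():
        m = ELEMENT_RE.match(line)
        if m is None or m["text"] != text:
            continue  # not an element line, or a different element
        if "clickable=true" not in m["attrs"]:
            continue  # element exists but is not a tap target
        left, top, right, bottom = (int(m[k]) for k in ("l", "t", "r", "b"))
        # Integer midpoint of [left,top][right,bottom].
        return ((left + right) // 2, (top + bottom) // 2)
    return None
```

On the sample hierarchy, `find_tap_point(hierarchy, "Log In")` yields (540, 750), the midpoint of [120,700][960,800]; "Welcome back" yields None because it is not clickable.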
get_general_state
Full device snapshot in a single call: foreground app, orientation, display dimensions, keyboard visibility, battery, connectivity, volume, GPS, clipboard, telephony, panel state, dark mode, and animation settings.
My App (com.example.myapp) is visible. Device is a phone, in portrait mode
at 1080x2400 @432dpi. The keyboard is hidden. Battery is at 87% and not
charging. WiFi on, Bluetooth off, Location on, Airplane mode off,
Auto-rotate on. Volume 5/15, silent mode off. GPS at 37.774929, -122.419416.
Clipboard is empty. No active call. All panels collapsed. Dark mode inactive.
Animations enabled. Font scale 1x.
One call replaces many separate queries. Start here when you need to orient yourself.
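The snapshot comes back as prose, so extracting a field means pattern-matching sentences whose wording could change. As a sketch under that assumption, the display phrase (`1080x2400 @432dpi` in the sample) can be parsed to check that a tap point taken from get_textual_state lies on screen. The helpers `parse_display`, `px_to_dp`, and `on_screen` are hypothetical; the dp conversion uses Android's standard definition, where 1 dp equals 1 px at 160 dpi.

```python
import re
from typing import NamedTuple, Optional


class Display(NamedTuple):
    width_px: int
    height_px: int
    dpi: int


# Matches the "<width>x<height> @<dpi>dpi" phrase in the sample state text.
DISPLAY_RE = re.compile(r"(\d+)x(\d+)\s*@(\d+)dpi")


def parse_display(state: str) -> Optional[Display]:
    """Extract display size and density, or None if the phrase is absent."""
    m = DISPLAY_RE.search(state)
    if m is None:
        return None
    return Display(*(int(g) for g in m.groups()))


def px_to_dp(px: float, dpi: int) -> float:
    # Android density-independent pixels: dp = px * 160 / dpi.
    return px * 160 / dpi


def on_screen(display: Display, x: int, y: int) -> bool:
    # True if the pixel (x, y) lies within the display bounds.
    return 0 <= x < display.width_px and 0 <= y < display.height_px
```

For the sample device, the 1080 px width works out to 400 dp.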
get_what_changed
Diff the current screen against the state before the last action. Shows elements added, removed, and modified.
Elements added:
Toast "Login successful" [120,2100][960,2180]
Elements removed:
Button "Log In" [120,700][960,800] (was clickable)
Elements changed:
TextView "Welcome back" -> "Welcome, Alice"
Interaction tools already return this diff inline. Use get_what_changed only when you need to re-examine the last change.
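This page does not say how get_what_changed matches elements between the two snapshots. Purely as a conceptual sketch, with a matching rule that is an assumption and not the tool's documented algorithm, keying each element by its type and bounds and then comparing text produces the same three categories as the sample: the Toast is added, the "Log In" button is removed, and the TextView at the same bounds is changed. `Element` and `diff_elements` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Tuple

Bounds = Tuple[int, int, int, int]  # (left, top, right, bottom)


@dataclass(frozen=True)
class Element:
    kind: str
    text: str
    bounds: Bounds


def _key(e: Element) -> Tuple[str, Bounds]:
    # Assumption: an element keeps its identity while type and bounds are unchanged.
    return (e.kind, e.bounds)


def diff_elements(before: List[Element], after: List[Element]):
    """Return (added, removed, changed) lists between two snapshots."""
    old = {_key(e): e for e in before}
    new = {_key(e): e for e in after}
    added = [e for k, e in new.items() if k not in old]
    removed = [e for k, e in old.items() if k not in new]
    # Same identity, different text: report as (old, new) pairs.
    changed = [(e, new[k]) for k, e in old.items()
               if k in new and new[k].text != e.text]
    return added, removed, changed
```

A real implementation would likely need a more robust identity rule, since elements that move or resize would show up here as a removal plus an addition.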
get_performance_metrics
Rendering performance for a specific app: total frames, janky frame count and percentage, and render time at the 50th, 90th, 95th, and 99th percentiles.
| Parameter | Type | Required | Description |
|---|---|---|---|
| packageName | string | Yes | App package name |
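The text output of this tool has a fixed line layout in the sample. As a minimal sketch, assuming that layout stays stable (it is inferred from the sample, not a documented format) and using a hypothetical `parse_performance` helper, a client can extract the frame counts and percentiles and recompute the janky-frame share:

```python
import re
from dataclasses import dataclass
from typing import Dict

FRAMES_RE = re.compile(r"Frames:\s*(\d+) total,\s*(\d+) janky")
PERCENTILE_RE = re.compile(r"(\d+)th=(\d+(?:\.\d+)?)ms")


@dataclass
class FrameStats:
    total: int
    janky: int
    render_ms: Dict[int, float]  # percentile -> render time in ms

    @property
    def janky_pct(self) -> float:
        # Janky share rounded to one decimal, as in the sample output.
        return round(100 * self.janky / self.total, 1) if self.total else 0.0


def parse_performance(text: str) -> FrameStats:
    """Parse get_performance_metrics text (format assumed from the sample)."""
    m = FRAMES_RE.search(text)
    if m is None:
        raise ValueError("no 'Frames:' line found")
    render = {int(p): float(ms) for p, ms in PERCENTILE_RE.findall(text)}
    return FrameStats(int(m[1]), int(m[2]), render)
```

For the com.example.myapp sample, 12 janky frames out of 342 gives `janky_pct == 3.5`, matching the percentage the tool reports.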
Performance for com.example.myapp:
Frames: 342 total, 12 janky (3.5%)
Render time: 50th=8ms, 90th=18ms, 95th=24ms, 99th=67ms
get_memory_info
Current memory usage for a specific app: native heap, Dalvik heap, and total PSS (proportional set size).
| Parameter | Type | Required | Description |
|---|---|---|---|
| packageName | string | Yes | App package name |
Memory for com.example.myapp:
Native heap: 18.3 MB
Dalvik heap: 9.7 MB
Total PSS: 61.2 MB
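Memory figures are most informative when compared across snapshots, for example before and after repeating the same flow. A minimal sketch, assuming the `Label: value MB` line format of the sample and using hypothetical helpers `parse_memory` and `memory_growth`:

```python
import re
from typing import Dict

# Matches lines such as "Native heap: 18.3 MB"; the header line has no MB value.
MEMORY_RE = re.compile(r"^(.+?):[ \t]*(\d+(?:\.\d+)?)[ \t]*MB[ \t]*$", re.MULTILINE)


def parse_memory(text: str) -> Dict[str, float]:
    """Map each "<label>: <n> MB" line to its value in MB."""
    return {label.strip(): float(mb) for label, mb in MEMORY_RE.findall(text)}


def memory_growth(before: str, after: str) -> Dict[str, float]:
    """Per-field change in MB between two get_memory_info outputs."""
    old, new = parse_memory(before), parse_memory(after)
    # Round to one decimal to match the tool's precision and hide float noise.
    return {k: round(new[k] - v, 1) for k, v in old.items() if k in new}
```

Only fields present in both snapshots are compared, so a label that appears in one output but not the other is silently skipped.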