Bitinforming Intelligence
2026.05.19 // 06:36 UTC
Track Record · last 365 days
5-day hit rate: 50.7% (n=438)
20-day alpha vs SPY: +8.14% (n=61)
20-day win rate vs benchmark: 74%
537 of 657 calls scored
Top call: RKLB +68.7% over 5d (2026-05-07)
Worst call: OKLO -26.3% over 5d (2026-05-11)
Executive Brief
RRFP is the dominant theme today: 7 stories surfaced, 7× its baseline.
Lead story: Data-Driven Dynamic Modeling of a Tendon-Actuated Continuum Robot.
Also notable: DashAttention: Differentiable and Adaptive Sparse Hierarchical Attention — touches MSFT, GOOGL, META, NVDA, MDB.
Large-cap exposure: 4
Market movers: 5
arXiv Preprints · 9
AX · 2026-05-18T17:59:02Z
Spatial intelligence unfolds through a perception-action loop: agents act to acquire observations, and reason about how observations vary as a function of action. Rather than passively processing what is seen, they actively uncover what is unseen - occluded structure, dynamics, containment, and functionality that cannot be resolved from passive sensing alone. We move beyond prior formulations of spatial intelligence that assume oracle observations by recasting the observer as an actor. We introduce ESI-BENCH, a comprehensive benchmark for embodied spatial intelligence spanning 10 task categories and 29 subcategories built on OmniGibson, grounded in Spelke's core knowledge systems. Agents must decide what abilities to deploy - perception, locomotion, and manipulation - and how to sequence them to actively accumulate task-relevant evidence. We conduct extensive experiments on state-of-the-art MLLMs and find that active exploration substantially outperforms passive counterparts, with agents spontaneously discovering emergent spatial strategies without explicit instructions, while random multi-view often adds noise rather than signal despite consuming far more images. Most failures stem not from weak perception but from action blindness: poor action choices lead to poor observations, which in turn drive cascading errors. While explicit 3D grounding stabilizes reasoning on depth-sensitive tasks, imperfect 3D representation proves more harmful than 2D baselines by distorting spatial relations. Human studies further reveal that unlike humans who seek falsifying viewpoints and revise beliefs under contradiction, models commit prematurely with high confidence regardless of evidence quality, exposing a metacognitive gap that neither better perception nor more embodied interaction alone can close.
AX · 2026-05-18T17:52:14Z
Robots struggle to navigate new places because they cannot learn useful lessons from past experience: each trip starts from scratch. We built **Robo-Cortex**, a system that lets robots teach themselves to navigate better over time. Instead of just reacting, Robo-Cortex writes down what worked and what didn't in plain language, building a library of navigation tips it can reuse later.
**How it works:**
- **Autonomous Knowledge Induction (AKI):** Turns past trips into a structured library of navigation rules.
- **Dual-Grain Cognitive Memory:** Two layers of memory: a short-term one that tracks current progress, and a long-term one that stores reusable do's and don'ts.
- **Imagine-then-Verify loop:** Before acting, the robot simulates what might happen and a vision-language model checks the plan.
**Results:** On three benchmarks (IGNav, AR, AEQA), Robo-Cortex beats the best existing methods, with up to +4.16% SPL on familiar tasks and up to +15.30% SPL when applying its lessons to new environments. Early real-robot tests back this up.
AX · 2026-05-18T17:51:34Z
Evaluating embodied systems on real dexterous hardware requires more than isolated primitive skills: an agent must perceive a changing tabletop scene, choose a context-appropriate action, execute it with a dexterous hand, and leave the scene usable for later decisions. We introduce DexHoldem, a real-world system-level benchmark built around Texas Hold'em dexterous manipulation with a ShadowHand. DexHoldem provides 1,470 teleoperated demonstrations across 14 Texas Hold'em manipulation primitives, a standardized physical policy benchmark, and an agentic perception benchmark that tests whether agents can recover the structured game state needed for embodied decision making. On primitive execution, $π_{0.5}$ obtains the highest task completion rate (61.2%), while $π_{0.5}$ and $π_0$ tie on scene-preserving success rate (47.5%). On agentic perception, Opus 4.7 obtains the best strict problem-level accuracy (34.3%), while GPT 5.5 obtains the best average field-wise accuracy (66.8%), exposing a gap between isolated visual sub-capabilities and complete routing-relevant state recovery. Finally, we instantiate the full embodied-agent loop in three case studies, where waiting, recovery dispatches, human-help requests, and repeated primitive execution reveal how perception and policy errors accumulate during closed-loop deployment. DexHoldem therefore evaluates dexterous tabletop execution, agentic perception, and embodied decision routing in a shared physical setting. Project page: https://dexholdem.github.io/Dexholdem/.
AX · 2026-05-18T17:50:32Z
Most VLA (Vision-Language-Action) models today handle only simple two-finger grippers or a single dexterous hand. Two-finger control is easy enough without VLA, but dexterous hands need full end-to-end learning to work well. We built Dexora, the first open-source VLA system made for two arms and two dexterous hands at the same time.
To collect training data, we built a hybrid teleoperation setup:
- An exoskeleton backpack tracks arm motion
- Apple Vision Pro tracks finger motion (no markers needed)
- The same setup drives both a real robot and a matching MuJoCo simulation
With this, we gathered:
- 100K simulated trajectories (6.5M frames)
- 10K real teleoperated episodes (2.92M frames)
Teleop data is often noisy, so we trained a discriminator to score each clip. The diffusion-transformer policy then learns less from low-quality clips.
Results:
- 66.7% average success on dexterous tasks (vs. 51.7% for baselines)
- 90% success on basic tasks
- Strong generalization to new objects and robot bodies
Ablations show real data and the discriminator both matter for dexterity.
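The idea of learning less from low-quality clips can be sketched as a quality-weighted loss. This is a minimal illustration only: it assumes the discriminator emits a per-clip score in [0, 1] and that the score is used directly as a loss weight. The function name `quality_weighted_loss` and the weighting scheme are hypothetical and are not taken from Dexora.

```python
import numpy as np

def quality_weighted_loss(per_clip_loss, quality_score):
    """Average per-clip losses, weighting each clip by its quality score.

    Assumption (not from the source): scores lie in [0, 1] and act as
    plain multiplicative weights on the policy loss.
    """
    w = np.clip(np.asarray(quality_score, dtype=float), 0.0, 1.0)
    loss = np.asarray(per_clip_loss, dtype=float)
    # Guard against an all-zero batch of scores.
    return float((w * loss).sum() / max(w.sum(), 1e-12))

# Two clips: a clean one with low loss and a noisy one with high loss.
losses = np.array([1.0, 4.0])
scores = np.array([0.9, 0.1])

weighted = quality_weighted_loss(losses, scores)
unweighted = float(losses.mean())
```

In this toy batch the noisy clip contributes far less to `weighted` than to `unweighted`, which is the qualitative effect the summary describes.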
AX · 2026-05-18T17:50:22Z
This paper compares data-driven methods (N4SID, ARX, SINDYc) for modeling a tendon-driven continuum robot built at CERN. These robots are hard to model because they are nonlinear, high-dimensional, and dominated by friction. Experiments show that a simple two-degree-of-freedom model captures the dynamics well, since the joints move in strongly linked ways. The models match experimental data and work inside a model predictive controller for real-time control.
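Of the three methods named, ARX is the simplest to illustrate: it is an ordinary least-squares fit of the output on past outputs and past inputs. The sketch below assumes a single-input, single-output second-order model and uses synthetic data in place of real robot measurements; the function `fit_arx` and all coefficients are illustrative and not taken from the paper.

```python
import numpy as np

def fit_arx(u, y, na=2, nb=2):
    """Least-squares ARX fit: y[t] = sum_i a_i*y[t-i] + sum_j b_j*u[t-j]."""
    n = max(na, nb)
    rows = []
    for t in range(n, len(y)):
        past_y = [y[t - i] for i in range(1, na + 1)]
        past_u = [u[t - j] for j in range(1, nb + 1)]
        rows.append(past_y + past_u)
    X = np.array(rows)
    theta, *_ = np.linalg.lstsq(X, y[n:], rcond=None)
    return theta[:na], theta[na:]

# Synthetic, noise-free second-order system standing in for measured data.
rng = np.random.default_rng(0)
u = rng.standard_normal(500)
y = np.zeros(500)
for t in range(2, 500):
    y[t] = 1.5 * y[t - 1] - 0.7 * y[t - 2] + 0.5 * u[t - 1] + 0.2 * u[t - 2]

a_hat, b_hat = fit_arx(u, y)
```

Because the data here are noise-free, the fit recovers the generating coefficients; with real sensor logs the estimate would only approximate the dynamics.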
AX · 2026-05-18T17:59:52Z
Current sparse attention methods like NSA and InfLLMv2 pick the top-k most relevant chunks of text, then run detailed attention on them. This has two problems: it always picks the same number of chunks regardless of the query, and it blocks gradients from flowing between the coarse and fine stages.
We introduce **DashAttention** (Differentiable and Adaptive Sparse Hierarchical Attention). Instead of top-k, it uses α-entmax to pick a flexible number of chunks based on the query. This keeps the whole process differentiable end-to-end. Unlike other hierarchical methods, DashAttention is non-dispersive, which helps it handle long contexts better.
In LLM experiments, it matches full attention's accuracy at 75% sparsity and beats NSA and InfLLMv2, especially when sparsity is high. We also built a GPU-optimized Triton version that runs faster than FlashAttention-3 at inference. In short, DashAttention is a cheaper way to handle long contexts.
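To see how an entmax-style mapping can select a query-dependent number of chunks, the sketch below implements sparsemax, which is the α = 2 special case of α-entmax. It is an illustration of the general mechanism only, not DashAttention's kernel; the chunk scores are made-up values.

```python
import numpy as np

def sparsemax(scores):
    """Sparsemax (alpha-entmax with alpha = 2): project scores onto the simplex.

    Unlike softmax, the result can contain exact zeros, so the number of
    selected entries depends on the scores themselves.
    """
    z = np.asarray(scores, dtype=float)
    z_sorted = np.sort(z)[::-1]
    cumsum = np.cumsum(z_sorted)
    k = np.arange(1, len(z) + 1)
    # Entries in the support satisfy 1 + k * z_(k) > sum of the top-k scores.
    support = 1 + k * z_sorted > cumsum
    k_max = k[support][-1]
    tau = (cumsum[k_max - 1] - 1) / k_max
    return np.maximum(z - tau, 0.0)

# Hypothetical chunk-relevance scores for two different queries.
peaked = sparsemax([3.0, 0.5, 0.2, 0.1])   # one chunk dominates
flat = sparsemax([1.0, 0.9, 0.8, 0.1])     # several chunks are comparable

n_selected_peaked = int(np.count_nonzero(peaked))
n_selected_flat = int(np.count_nonzero(flat))
```

With these scores the peaked query keeps a single chunk and the flatter query keeps three, whereas a top-k rule would keep the same count for both.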
AX · 2026-05-18T17:59:18Z
Pipeline parallelism is a key technique for scaling large-model training, but modern workloads exhibit runtime variability in computation and communication. Existing pipeline systems typically consume static, profiled, or adaptively generated schedules as pre-committed execution orders. When realized task readiness diverges from the pre-committed order, stages may wait for not-yet-ready work even though other executable work is available, creating stage misalignment, idle bubbles, and reduced utilization. We present Runtime-Readiness-First Pipeline (RRFP), a readiness-driven runtime for pipeline-parallel training. RRFP changes how schedules are consumed at runtime: instead of treating a schedule as a sequence that stages must wait to follow, it treats the schedule as a non-binding hint order for ranking currently ready work. To support this model, RRFP combines message-driven asynchronous communication, lightweight tensor-parallel coordination for collective consistency, and ready-set arbitration for low-overhead dispatch. We implement RRFP in a Megatron-based training framework and evaluate it on language-only and multimodal workloads at up to 128 GPUs. RRFP improves over fixed-order pipeline baselines across all settings. Using the BFW hint, RRFP achieves up to 1.77× speedup on language-only workloads and up to 2.77× on multimodal workloads. In cross-framework comparisons, RRFP with the default BF hint outperforms the faster available external system by up to 1.84× while preserving training correctness.
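The difference between following a pre-committed order and using it as a hint can be shown with a toy single-stage simulation. This sketch assumes unit-length tasks with no dependencies between them, and it uses the list index as the hint rank; the function names and the readiness times are hypothetical and are not RRFP's implementation.

```python
def run_fixed_order(ready_at, duration=1.0):
    """Execute tasks strictly in hint (index) order, waiting when one is not ready."""
    clock = 0.0
    for r in ready_at:
        clock = max(clock, r) + duration
    return clock

def run_readiness_first(ready_at, duration=1.0):
    """At each step run the ready task with the best (lowest) hint rank."""
    pending = set(range(len(ready_at)))
    clock = 0.0
    while pending:
        ready = [i for i in pending if ready_at[i] <= clock]
        if not ready:
            # Nothing is runnable: idle until the next task becomes ready.
            clock = min(ready_at[i] for i in pending)
            continue
        task = min(ready)
        pending.remove(task)
        clock += duration
    return clock

# Task 1 arrives late (for example, a delayed activation from an upstream stage).
ready_at = [0.0, 3.0, 0.0, 0.0]

fixed_finish = run_fixed_order(ready_at)
ready_first_finish = run_readiness_first(ready_at)
```

In this example the fixed-order stage idles while waiting for task 1, while the readiness-first stage fills that gap with tasks 2 and 3 and finishes earlier.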
AX · 2026-05-18T17:59:00Z
Diffusion models often use inference-time guidance—like drift terms or expert reweighting—to improve sample quality for specific tasks. But most methods need repeated score or gradient evaluations, which are biased, slow, or both. We introduce **URGE** (Unbiased Resampling via Girsanov Estimation), a derivative-free algorithm that reweights trajectories using a Girsanov change of measure. Instead of computing gradient-based particle weights, URGE attaches a simple multiplicative weight to each simulated path and resamples periodically. No scores, Hessians, or PDEs needed. We prove that path-wise and particle-wise SMC are equivalent: the Girsanov path weight, when averaged backward, recovers the standard particle weights, so both methods produce the same unbiased result. In experiments, URGE beats existing inference-time guidance baselines on synthetic tests and diffusion benchmarks. It generates better samples, is simpler to implement, and is fully gradient-free.
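A generic version of path reweighting with a Girsanov weight can be sketched in a few lines. The example assumes the simplest possible setting: reference paths are driftless Brownian motion, the guidance is a constant drift, and resampling is plain multinomial resampling done once at the end. These choices, and the name `girsanov_log_weights`, are illustrative assumptions and do not reproduce URGE itself.

```python
import numpy as np

def girsanov_log_weights(dW, drift, dt):
    """Log path weight sum(b * dW) - 0.5 * b^2 * T for a constant drift b.

    The weight uses only the simulated noise increments; no score or
    gradient of the model is evaluated.
    """
    n_steps = dW.shape[1]
    return drift * dW.sum(axis=1) - 0.5 * drift**2 * dt * n_steps

rng = np.random.default_rng(0)
n_paths, n_steps, dt, drift = 20000, 50, 0.02, 1.0   # total time T = 1.0

# Simulate driftless reference paths and record their endpoints.
dW = rng.standard_normal((n_paths, n_steps)) * np.sqrt(dt)
x_T = dW.sum(axis=1)

# Attach a multiplicative weight to each path, then resample by weight.
log_w = girsanov_log_weights(dW, drift, dt)
w = np.exp(log_w - log_w.max())
w /= w.sum()
idx = rng.choice(n_paths, size=n_paths, p=w)

reference_mean = x_T.mean()        # close to 0 under the reference process
guided_mean = x_T[idx].mean()      # close to drift * T under the guided process
```

After resampling, the endpoints behave as if the paths had been simulated with the drift applied, even though only the reference process was simulated.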
AX · 2026-05-18T17:57:04Z
Large multimodal AI models (MLLMs) often miss small but important details in images. We noticed something interesting: when you crop the image to show just the relevant part, the same model answers correctly. When you give it the full image, it fails. So the problem isn't that the model can't recognize things; it's that it can't focus on what matters.
To fix this, we built **Vision-OPD (Vision On-Policy Distillation)**. The idea: let the model teach itself. We run the same model in two modes:
- **Teacher**: sees the cropped, relevant part of the image
- **Student**: sees the full image
The student tries to answer, and we train it to match the teacher's predictions token by token. Over time, the student learns to "zoom in" mentally without anyone cropping for it.
What's nice about this approach:
- No bigger teacher model needed
- No labeled answers needed
- No reward model needed
- No extra tools at inference time
In tests on fine-grained visual benchmarks, Vision-OPD matches or beats much larger models, including closed-source ones and agentic "Thinking-with-Images" systems.
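Matching the teacher token by token is usually expressed as a per-token divergence between the two next-token distributions. The sketch below uses a KL divergence from the student to the teacher, averaged over tokens; the direction of the KL, the function names, and the toy logits are assumptions for illustration and are not confirmed details of Vision-OPD.

```python
import numpy as np

def log_softmax(logits):
    """Numerically stable log-softmax over the last (vocabulary) axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))

def token_kl(student_logits, teacher_logits):
    """Mean over tokens of KL(student || teacher); inputs have shape (tokens, vocab)."""
    log_p_s = log_softmax(np.asarray(student_logits, dtype=float))
    log_p_t = log_softmax(np.asarray(teacher_logits, dtype=float))
    kl = (np.exp(log_p_s) * (log_p_s - log_p_t)).sum(axis=-1)
    return float(kl.mean())

# Toy logits for 2 tokens over a 3-word vocabulary.
teacher = np.array([[4.0, 0.0, 0.0], [0.0, 3.0, 0.0]])   # saw the cropped region
student = np.array([[1.0, 1.0, 0.0], [0.0, 1.0, 1.0]])   # saw the full image

loss_before = token_kl(student, teacher)
loss_if_matched = token_kl(teacher, teacher)
```

Training would push `loss_before` toward `loss_if_matched`, which is zero when the student's distribution equals the teacher's at every token.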
Patent Filings · 5
PT · 2020-06-02
For example, in modern **robotic automation**, a part may be complex with many features like holes. The relationships between these features help the robot quickly learn part positions and other process settings.
PT · 2014-12-10
Method for teaching robot movement using a system with a robot, a robot controller (with automatic and teach modes), and a PLC connected to the robot controller.
PT · 2019-05-07
A method for programming a robot — especially one with a robotic arm — where you set up its movement using a preset motion template chosen from a database.
PT · 2025-10-03
A robotic controller for a robotic arm. It includes a first spatial shaping module that shapes a first-space target motion by combining it with a pulse train, producing a smoothed motion command.
PT · 2012-08-15
A robotic arm controller for motion and logic control, where the arm has multiple joints driven by several servo motors to control its movement path.
Accelerating Keywords · 15
#01 · rrfp · 7.53 · 7↑ / base 1.0
#02 · dashattention · 7.53 · 6↑ / base 0.9
#03 · continuum · 7.52 · 3↑ / base 0.4
#04 · dexholdem · 7.44 · 6↑ / base 0.9
#05 · vision-opd · 7.43 · 5↑ / base 0.7
#06 · runtime · 7.38 · 5↑ / base 0.7
#07 · hierarchical · 7.35 · 4↑ / base 0.6
#08 · hierarchical attention · 7.35 · 4↑ / base 0.6
#09 · continuum robot · 7.35 · 2↑ / base 0.3
#10 · data-driven · 7.35 · 2↑ / base 0.3
#11 · modeling tendon-actuated · 7.35 · 2↑ / base 0.3
#12 · tendon-actuated · 7.35 · 2↑ / base 0.3
#13 · tendon-actuated continuum · 7.35 · 2↑ / base 0.3
#14 · robo-cortex · 7.34 · 5↑ / base 0.7
#15 · workloads · 7.30 · 4↑ / base 0.6
Market Movers · 5
#01 · ARXIV · IMPACT 6.67
novelty spike·real-time/edge
Why it matters — Operators get tendon-driven robots that self-calibrate from sensor data, cutting hand-tuning and slashing downtime on surgical, inspection, and warehouse arms where rigid models drift after wear.
Why it matters — Investors and engineers see a path to deploying soft/continuum manipulators on edge hardware: lower compute, faster control loops, and a moat against rivals stuck with brittle analytical models.
#02 · ARXIV · IMPACT 4.83
MSFT $423.54 (+0.38%) · GOOGL $396.94 (+0.04%) · META $611.21 (-0.49%) · NVDA $222.32 (-1.33%) · MDB $330.00 (+5.72%)
novelty spike·multi-ticker·large-cap exposure·efficiency/cost·hardware
Why it matters — Sparse attention cuts inference cost on long contexts, where token spend is highest. Operators serving chatbots, RAG, and agents see lower GPU bills; NVDA still wins on volume, but margin pressure shifts toward efficient-model providers.
Why it matters — Differentiable sparsity is trainable, not hand-tuned — meaning hyperscalers (MSFT, GOOGL, META) can ship it without retraining from scratch. MDB benefits as cheaper long-context unlocks larger vector/document workloads.
#03 · ARXIV · IMPACT 3.58
MSFT $423.54 (+0.38%) · GOOGL $396.94 (+0.04%) · NVDA $222.32 (-1.33%) · META $611.21 (-0.49%)
novelty spike·multi-ticker·large-cap exposure·benchmark lead·safety/alignment·hardware
Why it matters — Cuts idle GPU stalls in pipeline-parallel training, lifting throughput on heterogeneous clusters; operators get cheaper trillion-param runs, NVDA/MSFT/GOOGL/META gain margin headroom on existing fleets without new silicon.
Why it matters — Engineers get a scheduling primitive that absorbs stragglers and preemptions automatically, shrinking training wall-clock and reducing reruns; investors see capex efficiency improve as utilization climbs on the same H100/TPU base.
#04 · ARXIV · IMPACT 1.75
MDB $330.00 (+5.72%) · MSFT $423.54 (+0.38%) · GOOGL $396.94 (+0.04%) · PATH $10.64 (+3.60%) · BBAI $3.92 (-3.92%) · AI $8.75 (+1.16%)
multi-ticker·large-cap exposure·open-source release·benchmark lead·safety/alignment
Why it matters — Impacts multiple players/supply chain; effects may propagate.
Why it matters — Large-cap ties can amplify market impact and adoption.
#05 · ARXIV · IMPACT 1.67
MSFT $423.54 (+0.38%) · GOOGL $396.94 (+0.04%) · META $611.21 (-0.49%) · NVDA $222.32 (-1.33%) · PATH $10.64 (+3.60%) · BBAI $3.92 (-3.92%) · AI $8.75 (+1.16%)
multi-ticker·large-cap exposure·open-source release·benchmark lead·efficiency/cost
Why it matters — Fine-detail vision unlocks higher-value enterprise workloads (document AI, medical imaging, industrial inspection) where current MLLMs fumble — expanding TAM for hyperscaler APIs (MSFT, GOOGL, META) and inference demand on NVDA silicon.
Why it matters — On-policy self-distillation cuts the labeled-data tax versus RLHF-style pipelines, lowering training cost per capability gain; operators shipping vertical agents (PATH, BBAI, AI) get a cheaper path to differentiated accuracy without proprietary datasets.
High-Potential Items · 5
arXiv
Why it matters — Operators get continuum robots that hit targets without hand-tuned controllers; investors see surgical/inspection robotics moving from lab demos toward shippable products with shorter calibration cycles.
Why it matters — Engineers can skip deriving Cosserat-rod physics and instead fit a data-driven model from sensor logs, cutting integration time for soft/tendon-driven hardware in new geometries.
arXiv
Why it matters — Cuts long-context inference cost by learning *where* to attend, so operators run 100K+ token agents at a fraction of the GPU spend and investors see clearer unit economics on context-heavy products.
Why it matters — Engineers get a drop-in attention module that's differentiable end-to-end (no hand-tuned sparsity masks), making it practical to fine-tune retrieval/RAG and document-analysis models without rewriting the stack.
arXiv
Why it matters — Pipeline-parallel training stalls when one stage lags; a readiness-driven scheduler reclaims idle GPU time, cutting wall-clock and dollar cost per run for operators and investors funding large training budgets.
Why it matters — Engineers get a runtime that adapts to stragglers and variable step times without manual rebalancing, so heterogeneous clusters and spot/preemptible nodes become viable infra instead of reliability liabilities.
arXiv
Why it matters — Tests dexterous manipulation on real ShadowHand hardware in a Texas Hold'em tabletop setting, pushing robotics past scripted pick-and-place toward closed-loop tasks where perception, decision routing, and fine motor control must work together while leaving the scene usable for later moves.
Why it matters — Signals a fundable robotics niche: casino/retail automation and embodied-agent benchmarks. Engineers get a reproducible testbed with 1,470 teleoperated demonstrations across 14 manipulation primitives plus an agentic perception benchmark; investors see a concrete moat beyond LLM wrappers.
arXiv
Why it matters — Multimodal LLMs often miss small UI text, chart labels, and document fields. On-policy self-distillation sharpens fine-detail vision without new labels, lifting OCR-heavy workflows like receipt parsing, doc review, and screenshot QA.
Why it matters — Engineers get a cheap post-training recipe (model teaches itself from its own rollouts) that boosts accuracy on detail-bound tasks; operators and investors see lower annotation spend and fewer hallucinated reads in production pipelines.
Entity Mentions
Apple 1 · Meta 1
Google Trends (US / 1m) · 8
Term                    Last   7dΔ     Corr
rrfp                    0.0
dashattention           0.0
continuum               25.0   -0.04   -0.139
dexholdem               0.0
vision-opd              0.0
runtime                 46.0   +0.03   -0.176
hierarchical            6.0    0.00    -0.011
hierarchical attention  0.0            -0.034