3 Comments
Boyuan Xiao

"It is lumpy, dependent on whether the new data happens to contain novel physical phenomena or merely more instances of phenomena the model has already seen."

Not sure I get the difference from scaling language data - there are a lot of texts that say the same thing

To me the biggest limitations of large-scale video data are:

- It's 2D - there aren't many multi-cam captures with calibration data

- We can't capture forces, so we're limited to kinematics rather than understanding dynamics

- Frame rates are too low in most videos, e.g. we see aliasing on wheels at 30 fps

Hugo

Really good point, and you're right that the original framing was weak. Your three points are sharper. I've actually updated the piece to reflect this.

Thanks for making the piece better. If you spot anything else or have other suggestions, please let me know. Thank you again.

Boyuan Xiao

That's very kind!

My pleasure - love all the updates on these new paradigms