XLNet Paper Presentation at ICML 2019
Zhilin Yang presents the XLNet paper at ICML 2019, introducing an autoregressive pretraining model that outperforms BERT on natural language understanding tasks.
Setting
Main presentation hall at the Long Beach Convention & Entertainment Center, filled with rows of seating and a large projection screen at the front. The hall is spacious with high ceilings and modern decor.
Characters
Zhilin Yang
primary
A lean, young Chinese man in his late 20s with short, neatly styled black hair and rectangular glasses. His posture conveys both academic rigor and the energy of a rising star in machine learning research.
Conference Moderator
secondary
A middle-aged academic with a professional demeanor, average height with a slightly rounded posture from years of desk work. His dark hair is neatly combed, with streaks of gray at the temples. He wears rectangular wire-framed glasses that give him a studious appearance.
Senior Researcher
secondary
A middle-aged man with a lean build, sharp facial features, and short, neatly trimmed salt-and-pepper hair. His piercing eyes are framed by thin, rectangular glasses that reflect the projection screen's light. He has a slight furrow between his brows, indicating deep concentration.
Graduate Student
background
A young adult in their mid-20s with a lean build, slightly tousled dark brown hair, and wire-rimmed glasses perched on their nose. Their face is framed by a few stray curls that escaped from their casual ponytail, and their bright, inquisitive eyes dart between the speaker and their notebook.
Dialog
Conference Moderator
Without further ado, it's my honor to introduce Dr. Zhilin Yang, presenting groundbreaking work on XLNet—a model that rethinks autoregressive pretraining for NLP.
Zhilin Yang
Thank you. Let me walk through how XLNet overcomes BERT's limitations—starting with permutation language modeling. (points to projection) Note the bidirectional context here—
Senior Researcher
(interrupting) Wait—how does your attention masking avoid target leakage during permutations? The ablation studies don’t isolate this—
Zhilin Yang
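(smiles) Excellent question. See appendix C. We use two-stream self-attention: the query stream sees a target's position but not its content, so the token being predicted never leaks into its own prediction. Segment recurrence is a separate piece, carried over from Transformer-XL. Here’s the empirical comparison...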
Senior Researcher
(muttering while scribbling) Hmph. Still seems like an overfit risk on short sequences with that attention span…
Zhilin Yang
To preempt further questions: yes, we’ve open-sourced the code. All hyperparameters are in section 4. Now, the SQuAD 2.0 results. (click) These EM and F1 scores speak for themselves.
Conference Moderator
(raising hand) We’ll hold Q&A until after the full presentation. Zhilin, proceed when ready.