1. Introduction & Overview

Lighting is a fundamental yet notoriously hard-to-control element of AI-generated video. Although text-to-video (T2V) models have made significant progress, disentangling lighting from scene semantics and applying it consistently remains a major challenge. LumiSculpt addresses this gap directly. It is a new framework that introduces precise, user-specified control over lighting intensity, position, and trajectory in video diffusion models. Its novelty is twofold. First, it introduces LumiHuman, a new dataset of more than 220,000 portrait videos with known lighting parameters, which addresses a critical data-scarcity problem. Second, it uses a learnable plug-and-play module that injects lighting conditions into pretrained T2V models without degrading other attributes such as content or color, making it possible to generate high-quality video with consistent lighting motion from simple text descriptions and lighting trajectories.

2. Core Methodology: The LumiSculpt Framework

The LumiSculpt pipeline is designed for easy integration and control. The user supplies a text prompt describing the scene together with a specification of a virtual light source (e.g., trajectory, intensity). The system then uses its trained components to generate a video in which the lighting evolves consistently according to the user's instructions.

2.1 The LumiHuman Dataset

A major bottleneck in lighting-control research is the lack of suitable data. Existing datasets, such as those captured on light stages (e.g., Digital Emily), are high quality but rigid and poorly suited to generative training. LumiHuman is built as a flexible alternative. Using a virtual rendering engine, it produces portrait videos in which the lighting parameters (direction, color, intensity) are known exactly and frames can be freely recombined. This "building block" approach makes it possible to simulate a nearly unlimited variety of lighting trajectories and conditions, providing the diverse training data the model needs to learn a disentangled lighting representation.

The LumiHuman Dataset at a Glance

  • Size: >220,000 video sequences
  • Content: Portraits with parameterized lighting
  • Key Feature: Freely composable frames for diverse lighting trajectories
  • Construction: Virtual-engine rendering with known lighting parameters

2.2 Lighting Representation & Control

Rather than modeling the full light-transport equations, LumiSculpt adopts a simplified but effective representation. The lighting condition of a frame is parameterized as a low-dimensional vector that encodes the attributes of an assumed light source (e.g., spherical coordinates for direction, a scalar for intensity). This representation is deliberately decoupled from surface albedo and geometry, which focuses the model's capacity on learning lighting effects. User control is implemented by defining a sequence of these parameter vectors over time, a "lighting trajectory", on which the model is then conditioned during video generation.
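The representation described above can be illustrated with a minimal Python sketch. The `LightVector` class, its field names, the angle convention, and the `sweep` helper are all hypothetical and chosen for illustration; the text does not specify the exact layout of the parameter vector.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class LightVector:
    """Hypothetical per-frame lighting parameters (not the paper's exact layout)."""
    azimuth_deg: float    # horizontal angle around the subject
    elevation_deg: float  # angle above the horizontal plane
    intensity: float      # scalar brightness, assumed to lie in [0, 1]

    def direction(self):
        """Convert the spherical angles to a unit direction vector (x, y, z)."""
        az = math.radians(self.azimuth_deg)
        el = math.radians(self.elevation_deg)
        return (math.cos(el) * math.sin(az), math.sin(el), math.cos(el) * math.cos(az))

    def as_list(self):
        """Flatten to the low-dimensional vector a model could be conditioned on."""
        return [*self.direction(), self.intensity]


def sweep(num_frames, start_az, end_az, elevation_deg=20.0, intensity=0.8):
    """Build a lighting trajectory: one LightVector per frame, azimuth swept linearly."""
    if num_frames < 2:
        raise ValueError("a trajectory needs at least two frames")
    step = (end_az - start_az) / (num_frames - 1)
    return [LightVector(start_az + i * step, elevation_deg, intensity)
            for i in range(num_frames)]


# A 16-frame trajectory moving the light from the subject's left to right.
trajectory = sweep(num_frames=16, start_az=-60.0, end_az=60.0)
```

Each element of `trajectory` carries only light-source attributes and nothing about the subject's albedo or geometry, which is the decoupling the paragraph describes.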

2.3 Plug-and-Play Module Architecture

The core of LumiSculpt is a lightweight neural network module that operates inside the denoising U-Net of a latent diffusion model. It takes two inputs: the noisy latent code $z_t$ at timestep $t$ and the lighting parameter vector $l_t$ for the target frame. The module's output is a feature-modulation signal (e.g., via spatial feature transform or cross-attention) that is injected into specific layers of the U-Net. Crucially, this module is trained separately on the LumiHuman dataset while the weights of the base T2V model stay frozen. This "plug-and-play" strategy means lighting control can be added to existing models without the cost of full retraining, and it minimizes interference with the model's prior knowledge of semantics and style.
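As a rough illustration of feature modulation on top of a frozen layer, the toy sketch below maps a lighting vector to a per-channel scale and shift. This is an assumption made for illustration, in the spirit of the spatial feature transform mentioned above; `LightingAdapter`, `frozen_base_layer`, and all numeric values are hypothetical and do not reproduce the paper's architecture.

```python
import random

# Stands in for frozen base-model weights; a tuple so they cannot be mutated.
FROZEN_WEIGHTS = (0.5, -0.25, 1.0)


def frozen_base_layer(x):
    """Toy 'U-Net layer' with fixed weights: one output channel per weight."""
    return [w * x for w in FROZEN_WEIGHTS]


class LightingAdapter:
    """Toy stand-in for the plug-and-play module: maps l_t to scale and shift."""

    def __init__(self, light_dim, channels, seed=0):
        rng = random.Random(seed)
        # The only trainable parameters in this sketch (small random init).
        self.w_scale = [[rng.gauss(0.0, 0.1) for _ in range(light_dim)]
                        for _ in range(channels)]
        self.w_shift = [[rng.gauss(0.0, 0.1) for _ in range(light_dim)]
                        for _ in range(channels)]

    def __call__(self, features, light):
        """Modulate each channel as f * (1 + scale) + shift."""
        out = []
        for c, f in enumerate(features):
            scale = sum(w * l for w, l in zip(self.w_scale[c], light))
            shift = sum(w * l for w, l in zip(self.w_shift[c], light))
            out.append(f * (1.0 + scale) + shift)
        return out


features = frozen_base_layer(2.0)
adapter = LightingAdapter(light_dim=4, channels=3)
modulated = adapter(features, [0.5, 0.3, 0.8, 0.9])
unchanged = adapter(features, [0.0, 0.0, 0.0, 0.0])
```

In this toy formulation an all-zero lighting vector leaves the base features untouched, so the adapter can only add to what the frozen layer already computes.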

3. Technical Details & Mathematical Formulation

LumiSculpt builds on the latent diffusion model (LDM) framework. The goal is to learn a conditional denoising process $\epsilon_\theta(z_t, t, c, l_t)$, where $c$ is the text condition and $l_t$ is the lighting condition at generation step $t$. The lighting control module $M_\phi$ is trained to predict a modulation map $\Delta_t = M_\phi(z_t, l_t)$. This map is used to adapt the features of the base denoiser: $\epsilon_\theta^{\text{adapted}} = \epsilon_\theta(z_t, t, c) + \alpha \cdot \Delta_t$, where $\alpha$ is a scaling factor. The training objective minimizes a reconstruction loss between the generated video frames and the ground-truth rendered frames from LumiHuman, with the lighting condition $l_t$ as the key conditioning signal. This forces the module to associate each parameter vector with the corresponding visual lighting effect.
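The additive adaptation in the formula above can be checked with a small numeric sketch. The vectors below are made-up numbers standing in for tensors, and mean squared error is used only as a stand-in for the reconstruction loss, whose exact form the text does not give.

```python
def adapted_noise(base_eps, delta, alpha=1.0):
    """eps_adapted = eps_theta(z_t, t, c) + alpha * Delta_t, element-wise."""
    if len(base_eps) != len(delta):
        raise ValueError("base prediction and modulation map must have the same shape")
    return [e + alpha * d for e, d in zip(base_eps, delta)]


def mse(pred, target):
    """Mean squared error, used here as a stand-in reconstruction loss."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


base_eps = [0.2, -0.1, 0.4]   # frozen denoiser output (made-up numbers)
delta = [0.1, 0.1, -0.2]      # output of M_phi(z_t, l_t) (made-up numbers)
target = [0.3, 0.0, 0.2]      # training target (made-up numbers)

loss_base = mse(adapted_noise(base_eps, delta, alpha=0.0), target)
loss_adapted = mse(adapted_noise(base_eps, delta, alpha=1.0), target)
```

Setting `alpha=0.0` recovers the base prediction exactly, which is why the scaling factor can act as a dial for how strongly the lighting module intervenes.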

4. Experimental Results & Analysis

The paper demonstrates LumiSculpt's effectiveness through comprehensive evaluations.

4.1 Quantitative Metrics

Performance was measured with standard video-quality metrics (e.g., FVD, FID-Vid) against baseline T2V models without lighting control. More importantly, custom metrics were developed for lighting consistency, likely including a measure of the correlation between the intended light position or intensity trajectory and the lighting observed across the frames of the output video. The results show that LumiSculpt preserves the quality of the base model while substantially improving adherence to the specified lighting conditions.
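One plausible form of such a lighting-consistency measure is sketched below: the Pearson correlation between the intended intensity trajectory and the mean brightness of each output frame. This is an assumption, not the paper's metric; the `mean_brightness` proxy and the tiny 2x2 grayscale "frames" are illustrative only.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def mean_brightness(frame):
    """Average pixel value of a frame given as rows of grayscale values."""
    pixels = [p for row in frame for p in row]
    return sum(pixels) / len(pixels)


# Intended intensity per frame, and four toy 2x2 frames that dim over time.
target_intensity = [1.0, 0.8, 0.6, 0.4]
frames = [
    [[200, 220], [210, 230]],
    [[160, 180], [170, 190]],
    [[120, 140], [130, 150]],
    [[90, 100], [95, 105]],
]
observed = [mean_brightness(f) for f in frames]
score = pearson(target_intensity, observed)
```

A score near 1 indicates the output brightness tracks the requested trajectory; a real evaluation would need a more robust estimate of the observed lighting than mean brightness.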

4.2 Qualitative Evaluation & User Study

Figure 1 in the PDF (described conceptually here) shows generated results. It would depict sequences in which a light source moves smoothly around a subject, for example from left to right across a face, with consistent shadows and highlights following the prescribed path. A user study likely rated LumiSculpt's outputs higher for lighting realism, consistency, and controllability than attempts that rely on text prompts alone (e.g., "light moving from the left") in standard models, which often produce flickering or semantically incorrect lighting.

4.3 Ablation Studies

The ablation studies confirmed that each component is necessary: training without the LumiHuman dataset led to poor generalization; using a more entangled lighting representation (such as full HDR environment maps) reduced control precision; and fine-tuning the base model directly instead of using the plug-and-play module caused it to forget other generative capabilities.

5. Analysis Framework & Case Study

Case Study: Creating a Dramatic Monologue Scene
Goal: Generate a video of a person delivering a monologue in which the lighting starts as a harsh side key light, then gradually softens and wraps around the subject as the emotional tone turns hopeful.

  1. Input Specification:
    • Text Prompt: "A middle-aged actor with a pensive expression, in a sparse rehearsal room, close-up shot."
    • Lighting Trajectory: A sequence of lighting vectors where:
      • Frames 0-30: Light direction at about 80 degrees from the camera axis (hard side light), high intensity.
      • Frames 31-60: Direction moves gradually to about 45 degrees, intensity decreases slightly.
      • Frames 61-90: Direction reaches about 30 degrees (soft fill), intensity decreases further, and a secondary fill-light parameter gradually increases.
  2. LumiSculpt Processing: The plug-and-play module interprets the lighting vector $l_t$ of each frame. It modulates the diffusion process to cast strong, sharply defined shadows at the start, which then soften and lose contrast as the vector changes, simulating an added diffuser or a moving source.
  3. Output: A consistent video in which the lighting change is visually coherent and supports the narrative arc, without affecting the actor's appearance or the details of the room. This demonstrates precise spatiotemporal control that cannot be achieved with text alone.
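The lighting trajectory in the input specification above could be produced from a handful of keyframes. The sketch below linearly interpolates between them; the keyframe layout, the `light_at` helper, and the specific intensity and fill values are assumptions chosen to match the qualitative description ("high", "slightly lower", "lower still"), not values from the paper.

```python
# (frame, angle_deg, intensity, fill) keyframes; all numeric values are illustrative.
KEYFRAMES = [
    (0, 80.0, 1.0, 0.0),    # hard side light, high intensity
    (30, 80.0, 1.0, 0.0),   # held until frame 30
    (60, 45.0, 0.85, 0.0),  # direction eases toward 45 degrees, slightly dimmer
    (90, 30.0, 0.6, 0.4),   # soft fill: dimmer key, secondary fill ramps up
]


def light_at(frame, keyframes=KEYFRAMES):
    """Linearly interpolate (angle, intensity, fill) between surrounding keyframes."""
    if not keyframes[0][0] <= frame <= keyframes[-1][0]:
        raise ValueError("frame outside the keyframed range")
    for (f0, *v0), (f1, *v1) in zip(keyframes, keyframes[1:]):
        if f0 <= frame <= f1:
            w = (frame - f0) / (f1 - f0)
            return tuple(a + w * (b - a) for a, b in zip(v0, v1))


# One (angle, intensity, fill) tuple per frame, for frames 0 through 90.
trajectory = [light_at(f) for f in range(91)]
```

Working from keyframes keeps the user-facing specification short while still giving the model one lighting vector per frame.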

6. Industry Analyst's Perspective

Core Insight

LumiSculpt is not just another incremental improvement in video quality; it is a strategic move to commoditize professional-grade cinematography. By decoupling lighting from scene generation, it creates a new "lighting layer" for AI video, akin to adjustment layers in Photoshop. This addresses a major bottleneck in professional content creation, where lighting setup takes time, skill, and resources. The real value proposition is that creators, from indie filmmakers to marketing teams, can iterate on lighting after the base scene has been generated, which is a major shift for both workflow and cost.

Logical Flow & Strategic Positioning

The paper's logic is commercially shrewd: identify locked-up value (lighting control) → solve the underlying data problem (LumiHuman) → engineer a non-disruptive integration path (the plug-and-play module). This mirrors the winning playbook of control networks such as ControlNet for images. By building on a robust diffusion framework, the authors ensure immediate usability. However, the focus on portrait lighting is both a clever tactic and a limitation. It allows for a manageable, high-impact dataset but leaves the harder problem of complex scene lighting (global illumination, interreflections) to future work. They are selling a brilliant version 1.0, not the final solution.

Strengths & Flaws

Strengths: The plug-and-play design is its killer feature. It dramatically lowers the barrier to adoption. The LumiHuman dataset, though synthetic, is a practical and scalable solution to a real research problem. The paper establishes that the model follows explicit trajectories, a more reliable form of control than ambiguous text.

Flaws & Risks: The elephant in the room is generalization. Portraits in controlled conditions are one thing; how would the method handle a complex prompt like "a knight in a forest at dusk with torchlight flickering on armor"? The simple lighting model is likely to break down with multiple light sources, colored lights, or non-Lambertian surfaces. There is also a dependency risk: its performance is tied to the capabilities of the base T2V model. If the base model cannot generate a coherent knight or forest, no lighting module can save it.

Actionable Insights

For AI Researchers: The next frontier is moving from single point lights to environment-map conditioning. Explore incorporating physical priors (e.g., estimating 3D geometry from the T2V model itself) to make the lighting more physically plausible, in line with advances in inverse rendering. For Investors & Product Managers: This technology is ripe for integration into existing video-editing suites (Adobe, DaVinci Resolve) as a premium feature. The immediate market is digital marketing, social-media content, and previsualization, and pilot projects should focus on these segments. For Content Creators: Start thinking about how post-generation lighting control could change your storyboarding and asset-creation workflows. The era of "fix it in post" for AI-generated video is arriving faster than people think.

7. Future Applications & Research Directions

  • Richer Lighting Models: Incorporate full HDR environment maps or neural radiance fields (NeRFs) for more complex, realistic lighting from any direction.
  • Interactive & Post-Production Editing: Integrate LumiSculpt-style modules into non-linear editors (NLEs) so that directors can relight AI-generated scenes after generation.
  • Cross-Modal Lighting Transfer: Use a single reference image or video clip to extract a lighting style and apply it to generated video, bridging explicit parameter control and artistic reference.
  • Physics-Informed Training: Incorporate basic rendering equations or differentiable renderers into the training loop to improve physical accuracy, especially for hard shadows, highlights, and transparency.
  • Beyond Portraits: Scale the approach to general 3D scenes, objects, and dynamic environments, which will require more complex data and scene understanding.

8. References

  1. Zhang, Y., Zheng, D., Gong, B., Wang, S., Chen, J., Yang, M., Dong, W., & Xu, C. (2025). LumiSculpt: Enabling Consistent Portrait Lighting in Video Generation. arXiv preprint arXiv:2410.22979v2.
  2. Rombach, R., Blattmann, A., Lorenz, D., Esser, P., & Ommer, B. (2022). High-resolution image synthesis with latent diffusion models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 10684-10695).
  3. Blattmann, A., Rombach, R., Ling, H., Dockhorn, T., Kim, S. W., Fidler, S., & Kreis, K. (2023). Align your latents: High-resolution video synthesis with latent diffusion models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.
  4. Zhang, L., Rao, A., & Agrawala, M. (2023). Adding conditional control to text-to-image diffusion models. In Proceedings of the IEEE/CVF International Conference on Computer Vision (pp. 3836-3847). (ControlNet)
  5. Debevec, P., Hawkins, T., Tchou, C., Duiker, H. P., Sarokin, W., & Sagar, M. (2000). Acquiring the reflectance field of a human face. In Proceedings of the 27th annual conference on Computer graphics and interactive techniques (pp. 145-156).
  6. Mildenhall, B., Srinivasan, P. P., Tancik, M., Barron, J. T., Ramamoorthi, R., & Ng, R. (2021). NeRF: Representing scenes as neural radiance fields for view synthesis. Communications of the ACM, 65(1), 99-106.
  7. Isola, P., Zhu, J. Y., Zhou, T., & Efros, A. A. (2017). Image-to-image translation with conditional adversarial networks. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 1125-1134). (Pix2Pix)